Tokens in C - A Complete Guide | upGrad (2024)

In C, tokens are the smallest meaningful elements used to create a program. They include keywords, identifiers, constants, string literals, operators, punctuation marks, and special symbols. When a C program is compiled, it is broken down into these tokens, enabling the compiler to analyse and understand the program's structure.

Tokenization is a crucial step of the compilation process, as it allows the compiler to generate executable code from the provided C program by organising and categorising its individual elements.

How Many Tokens in C

Tokens are fundamental building blocks used in the C language to construct programs. In C, a token is defined as the smallest individual element that holds significance to the compiler's functioning.

  • Keywords: These are reserved words in the C language with predefined meanings. Examples include int, float, if, for, and while. The original C standard (C89/C90) defines 32 keywords; later revisions such as C99 and C11 added more.


For example, if: The keyword "if" is used to define a conditional statement that executes a block of code if a certain condition is true.

if (x > 0) {
    printf("x is positive");
}
  • Identifiers: These tokens are user-defined names representing variables, functions, or entities. An identifier must follow certain naming rules and should not be the same as any keyword.

The following rules determine whether a name is a valid identifier -

1. The first character of an identifier must be either an underscore or a letter. It cannot be a numerical digit.

2. Identifiers in C are case-sensitive, so lowercase and uppercase letters are considered distinct.

3. An identifier may be of any length, but only a limited number of initial characters are guaranteed to be significant (31 in C89; C99 raised this to 63 for internal names and 31 for external names). The exact limit is implementation specific.

4. Only letters, digits, and underscores may appear in an identifier. Commas, blank spaces, and other special characters are not allowed.

5. Using C keywords as identifiers is not permissible since they have reserved meanings for specific purposes in the language.


  • Constants: Constants represent fixed values that cannot be altered during program execution. They can be numeric constants (e.g., 10, 3.14) or character/string constants (e.g., 'A', "Hello").

Types of Constants | Examples
Integer constant | 20, 41, 94, etc.
Octal constant | 011, 033, 077, etc.
Floating-point constant | 13.9, 25.7, 87.4, etc.
Character constant | 'p', 'q', 'r', etc.
String constant | "c++", ".net", "java", etc.
Hexadecimal constant | 0x5A, 0x1A, 0x8F, etc.

  • String literals: These tokens represent sequences of characters enclosed within double quotes. They are commonly used to represent text or messages in a program.

In C, strings are represented as arrays of characters, terminated by a null character '\0'. The null character denotes the end of the string. String literals are always enclosed within double quotes (" ").

When describing a string in C, you can use different syntaxes. For example:

1. Using character array initialization:

char string[10] = {'s', 'c', 'a', 'l', 'e', 'r', '\0'};

Here, string[10] indicates that 10 bytes of memory space are allocated to hold the string value. Each string character is explicitly specified within single quotes, and the null character '\0' marks the end of the string.

2. Using string literal initialization:

char string[10] = "scaler";

The string is directly initialized with the literal "scaler" in this case. The compiler automatically appends the null character '\0' at the end of the string. Again, string[10] indicates that 10 bytes of memory space are allocated.

3. Letting the compiler determine the array size:

char string[] = "scaler";

Here, the string is declared without specifying the size. The compiler determines the size from the initializer at compile time, so the array holds exactly 7 bytes: six characters plus the null character '\0', which is automatically included at the end of the string. No dynamic memory allocation is involved.

  • Operators: Operators are symbols used to perform various operations on data. Examples include arithmetic operators (+, -, *, /), logical operators (&&, ||), and relational operators (==, >, <).

Operators are grouped into three types according to the number of operands they take -

  • Unary Operator: Operates on a single operand. Examples include ++ (increment), -- (decrement), ! (logical negation), and sizeof.
  • Binary Operator: Operates on two operands. Examples include arithmetic operators (+, -, *, /), relational operators (==, !=, >, <), logical operators (&&, ||), and assignment operator (=).
  • Ternary Operator: Takes three operands and allows for conditional decision-making. Syntax: condition ? expression1 : expression2. It provides a concise alternative to if-else statements.
  • Special symbols: These tokens are punctuation characters with a fixed meaning in the language, such as brackets, braces, commas, and the hash sign. Escape sequences such as \n and \t begin with a backslash (\) and represent newlines or other special characters, but they appear inside character and string literals rather than as separate tokens.
    • () Parentheses: Used in function calls and declarations, and to group expressions.
    • [] Square brackets: Represent array subscripts.
    • , Comma: Used to separate function arguments, parameters, and the variables in a declaration.
    • {} Curly braces: Used to define code blocks, such as function and loop bodies.
    • * Asterisk: Used to declare and dereference pointers and as the multiplication operator.
    • # Hash/Preprocessor: Introduces preprocessor directives, such as #include for header files.
    • . Period: Used to access members of a structure or union.
    • ~ Tilde: The bitwise NOT (one's complement) operator, which inverts every bit of its operand.

Classification of Tokens

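Some tutorials informally categorise C tokens into primary and secondary tokens. This grouping is not part of the C standard, which simply lists keywords, identifiers, constants, string literals, and punctuators, but it can be a convenient way to think about them: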

Primary Tokens

These are the fundamental elements of a programming language. They are directly recognised by the lexer or tokenizer, the component responsible for breaking down the source code into tokens. Primary tokens include

  • Keywords
  • Identifiers
  • Constants
  • Operators

Secondary Tokens

In this classification, secondary tokens are those that combine or build on simpler elements to represent additional syntactic elements in a program. The lexer still recognises each of them as a single token. Secondary tokens include

  • Strings
  • Special Characters
  • Compound Operators (for example, +=)

Rules for Naming Identifiers

There are specific rules to be followed when naming identifiers -

  • Valid characters: An identifier can include letters (uppercase and lowercase), digits, and underscores.
  • First character: The first character of an identifier must be a letter or an underscore.
  • Avoid keywords: You cannot use reserved keywords, such as int, while, etc., as identifiers.
  • Length limitation: The language sets no upper limit on the length of an identifier, but it is best to keep names reasonably short. Some compilers treat only the first 31 characters as significant, so two long names that differ only after that point may be seen as the same identifier.

As long as these rules are followed, any name can be chosen for an identifier; however, it is important to ensure that the chosen name is valid and makes sense.

Some examples of identifiers include -

  • age
  • studentName
  • _count
  • total_marks
  • PI
  • MAX_VALUE
  • numberOfStudents
  • myVariable
  • isValid
  • bookTitle

These examples demonstrate valid identifiers that follow the rules mentioned earlier. They consist of letters (both uppercase and lowercase) and underscores; digits would also be allowed anywhere after the first character. The first character is either a letter or an underscore, and none of the names conflicts with a reserved keyword. Identifiers are essential for naming variables, functions, and other elements in a C program, providing meaningful names to represent data and logic.

Tokens and Expressions

In the C programming language, an expression is a combination of operands, operators, and function calls that are evaluated to produce a value. It represents a computation or a calculation that yields a result. Expressions can involve variables, constants, arithmetic operations, logical operations, function calls, etc.

An expression can be as simple as a single constant or variable, or as complex as a combination of multiple operators and operands. Expressions can also be used as parts of larger expressions or as function arguments.

Examples of Expressions:

Arithmetic Expression:

int result = 2 + 3 * 4;

In this example, 2 + 3 * 4 is an arithmetic expression that performs addition and multiplication. Because * has higher precedence than +, the multiplication is performed first and the expression evaluates to 14, which is stored in the variable ‘result’.

Relational Expression:

int x = 5, y = 7;
int isGreater = x > y;

Here, the expression x > y is a relational expression that compares the values of x and y. The result of this expression is either true (1) or false (0), depending on whether ‘x’ is greater than ‘y’. The result is stored in the variable ‘isGreater’.

Tokenization Process

Lexical Analysis:

Lexical analysis, also known as scanning, is the initial phase of the compiler where the source code is divided into individual tokens or lexemes. It analyses the characters of the source code to form these tokens, which are meaningful units such as keywords, identifiers, constants, operators, and punctuation marks.

Check out this C code example to better understand the tokenizing process -

#include <stdio.h>

int main() {
    int x = 5;
    printf("The value of x is %d\n", x);
    return 0;
}

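During lexical analysis, the source code is divided into tokens. The #include <stdio.h> line is a preprocessing directive handled by the preprocessor, so include and stdio.h are not keywords. The remaining code yields the following tokens:

  • Keywords: int, return
  • Identifiers: main, x, printf
  • Punctuation marks: {, }, (, ), ;, and the comma
  • Operators: =
  • Constants: 5, 0
  • Strings: "The value of x is %d\n"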

Syntax Analysis:

Syntax analysis, also known as parsing, is the second phase of the compiler. It checks whether the sequence of tokens formed during lexical analysis follows the syntax rules defined by the programming language. It builds a parse tree or syntax tree that represents the hierarchical structure of the program based on the language's grammar rules.

Example:

Continuing from the previous example, during syntax analysis, the compiler verifies if the tokens and their arrangement follow the syntax rules of the C language. It checks for the

  • Correct placement of keywords
  • Proper use of operators and punctuation marks
  • Adherence to language-specific grammar.

If the syntax analysis is successful, the program is considered syntactically correct. Otherwise, syntax errors are reported, indicating that the program violates the language's grammar rules.

Practice Problems on Tokens in C

1. Which of the following is not a valid C Token?

A. Identifier

B. Whitespace

C. Punctuation

D. Keyword

Answer: B. Whitespace

2. Which of these is not a valid identifier?

A. myVariable

B. 123cdd

C. _grade

D. variable_start

Answer: B. 123cdd

3. Find the number of Tokens in the following C statement.

printf("Hello, %s!", Bill);

A. 7

B. 8

C. 9

D. 11

Answer: A. 7 (the tokens are printf, (, "Hello, %s!", the comma, Bill, ), and ;)

Conclusion

Tokens in C are the smallest elements that make up a program. Understanding and using tokens correctly is essential for writing error-free C programs. They enable compilers to process and analyse code effectively. Knowledge of tokens empowers programmers to express logic, perform computations, manipulate data, and create efficient software solutions. A solid understanding of tokens is crucial for harnessing the power of the C programming language.


FAQs

1. What are the six types of Tokens in C?

The six types of Tokens in C programming include Keywords, Identifiers, Operators, Constants, Strings and Special Characters.

2. What is the role of operators in C programming?

In C programming, operators play a key role in manipulating values and regulating the flow of a program, performing a wide range of operations by implementing Arithmetic, Logical and Relational operators.

3. Can an identifier start with a numerical digit in C?

No. In C, an identifier must start with either an underscore or a letter. A name that starts with a numerical digit is not a valid identifier under the language's rules.



Article information

Author: Edwin Metz
