Understanding Compiler Design

Saksham
4 min read · Apr 29, 2021


Most of us have used a compiler at some point in our programming journey, or have at least read about one. But have you ever wondered how a compiler actually works?

Today I will be discussing the design structure of a compiler.

Source code has to pass through several phases to get converted from a high-level language into assembly or machine code, and all of these phases are hidden inside the compiler.

Here is a basic diagram of a compiler:

As we can see, a compiler consists of six phases. The first four are grouped together and called the frontend, and the last two are grouped together and called the backend.

The frontend analyzes the source code and reports lexical, syntax, and semantic errors, whereas the backend optimizes the intermediate code and generates the target program from it.

The Internal Processes in brief.

When we write code in any language and compile it, the first thing the compiler does is check the code for errors, so the code first passes through the frontend, also called the analysis part.

Lexical Analysis

Lexical analysis takes high-level source code and produces a stream of tokens. Before that, though, we should talk about the preprocessor. In a language like C, the preprocessor prepares the stream of characters by handling directives: it replaces each #include line with the contents of the named header file, expands macros, and then passes the result on to lexical analysis. (A stream of characters is nothing but the keyboard characters we type in the code: int a = 23; or printf("Hello World!");)

But what is a token?

Let’s take a small piece of code:

a = 10;

Here a, =, 10, and ; are all tokens, so the total number of tokens here is 4.

So basically a token is a meaningful unit of the code. Why do I say meaningful? Because comments and characters such as spaces, newlines, and tabs are not considered tokens. The identifiers found among these tokens are then entered into the symbol table. The role of the symbol table is to keep a record of the properties of identifiers such as variables: type, size, scope, etc. As we can see in the diagram, the symbol table is connected to all the phases of the compiler so that later phases can look this information up.

After the whole code has been converted to tokens, the tokens are sent to the syntax analysis phase.
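To make the idea of a token concrete, here is a minimal sketch of a lexer written in Python (my choice for illustration). The token categories and the names TOKEN_SPEC and tokenize are assumptions made up for this example, and it covers only a tiny C-like subset, not a real language.

```python
import re

# Hypothetical token categories for a tiny C-like subset (illustrative only).
# Order matters: COMMENT must come before OPERATOR so "//" is not read as two "/".
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*|/\*.*?\*/"),
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("SEPARATOR",  r"[;(),{}]"),
    ("WHITESPACE", r"\s+"),
    ("MISMATCH",   r"."),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)

def tokenize(source):
    """Turn a stream of characters into a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("WHITESPACE", "COMMENT"):
            continue  # whitespace and comments are not tokens
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        tokens.append((kind, text))
    return tokens

print(tokenize("a = 10;  // assign ten"))
# [('IDENTIFIER', 'a'), ('OPERATOR', '='), ('NUMBER', '10'), ('SEPARATOR', ';')]
```

Notice that the comment and the spaces disappear, and only the four tokens remain. A character the lexer does not recognize raises an error, which is the kind of lexical error the frontend reports.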

Syntax Analysis

Here the token stream is checked to see whether the code is syntactically correct. A parser does this job, and its output is called a parse tree because it comes in the form of a tree, as shown:

If there is a syntax error then an error is produced at this stage.
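As a rough illustration of what a parser does, here is a small recursive-descent sketch in Python. The grammar (an assignment whose right-hand side uses only + and *), the function name parse_assignment, and the tuple-based tree are all assumptions for this example, and the tree it builds is a simplified one rather than a full parse tree with every grammar rule shown.

```python
def parse_assignment(tokens):
    """Parse: IDENT '=' expr ';'  where expr uses '+' and '*' only.

    Returns a nested tuple (operator, left, right) or raises SyntaxError.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(predicate, description):
        nonlocal pos
        token = peek()
        if token is None or not predicate(token):
            raise SyntaxError(
                f"expected {description} at position {pos}, found {token!r}"
            )
        pos += 1
        return token

    def factor():  # factor -> IDENT | NUMBER
        return expect(lambda t: t.isidentifier() or t.isdigit(),
                      "identifier or number")

    def term():  # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            expect(lambda t: t == "*", "'*'")
            node = ("*", node, factor())
        return node

    def expr():  # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            expect(lambda t: t == "+", "'+'")
            node = ("+", node, term())
        return node

    target = expect(str.isidentifier, "identifier")
    expect(lambda t: t == "=", "'='")
    value = expr()
    expect(lambda t: t == ";", "';'")
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} after ';'")
    return ("=", target, value)

print(parse_assignment(["a", "=", "b", "+", "c", "*", "d", ";"]))
# ('=', 'a', ('+', 'b', ('*', 'c', 'd')))
```

Because term is called from inside expr, the multiplication ends up deeper in the tree than the addition, which is how the tree records that * binds tighter than +. Leaving out the ; makes the parser raise a SyntaxError.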

Semantic Analysis

Here the analyzer walks the parse tree and looks for semantic errors. Note that this part doesn’t check the syntax again; instead it checks whether the value assigned to a variable matches its declared type, whether a value exceeds the limit of its data type, and whether variables are declared properly before they are used.
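The checks above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the statements are hand-written tuples instead of a real parse tree, the function name check is made up, and int is assumed to be a 32-bit signed type.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumption: int is 32-bit signed

def check(statements):
    """Return a list of semantic error messages for a list of statements.

    Each statement is ("declare", type_name, name) or ("assign", name, value).
    """
    symbols = {}  # a very small symbol table: name -> declared type
    errors = []
    for stmt in statements:
        if stmt[0] == "declare":
            _, type_name, name = stmt
            if name in symbols:
                errors.append(f"'{name}' is already declared")
            else:
                symbols[name] = type_name
        elif stmt[0] == "assign":
            _, name, value = stmt
            if name not in symbols:
                errors.append(f"'{name}' is used before declaration")
            elif symbols[name] == "int":
                if isinstance(value, bool) or not isinstance(value, int):
                    errors.append(f"cannot assign {value!r} to int '{name}'")
                elif not INT_MIN <= value <= INT_MAX:
                    errors.append(f"{value} is out of range for int '{name}'")
    return errors

program = [
    ("declare", "int", "a"),
    ("assign", "a", 23),        # fine
    ("assign", "b", 5),         # b was never declared
    ("assign", "a", "hello"),   # wrong type
    ("assign", "a", 2**31),     # too large for a 32-bit int
]
for message in check(program):
    print(message)
```

All five statements would pass syntax analysis; the three problems only show up once the analyzer consults the symbol table.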

Intermediate Code Generator

This part is the middleman between the frontend and the backend. It generates code that the backend can understand, i.e. machine-independent code (independent of the target processor). It creates 3-address code, which means each statement can refer to at most 3 addresses, typically two operands and one result. If an expression needs 4 or 5, it is split into 2 or 3 lines of code, with temporary variables holding the intermediate results.

a = b + c * d;
// this is converted as
x = c * d;
a = b + x;

This 3-address code is sent to the next phase, i.e. code optimization.
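Here is a minimal sketch of how such a split could be automated. As a shortcut it borrows Python's built-in ast module to parse the statement, which only works because a = b + c * d happens to be valid Python too; a real compiler would use its own parse tree. The function name to_three_address and the temporary names t1, t2, ... are assumptions for this example.

```python
import ast
import itertools

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_three_address(statement):
    """Split one assignment into statements with at most three addresses."""
    assign = ast.parse(statement.strip().rstrip(";")).body[0]
    if not isinstance(assign, ast.Assign) or not isinstance(assign.targets[0], ast.Name):
        raise ValueError("expected a simple assignment such as 'a = b + c'")

    code = []
    temps = (f"t{n}" for n in itertools.count(1))  # t1, t2, t3, ...

    def lower(node):
        """Emit code for a sub-expression and return the name holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = lower(node.left)
            right = lower(node.right)
            temp = next(temps)
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    target = assign.targets[0].id
    value = assign.value
    if isinstance(value, ast.BinOp) and type(value.op) in OPS:
        # The outermost operation writes straight into the target variable.
        left = lower(value.left)
        right = lower(value.right)
        code.append(f"{target} = {left} {OPS[type(value.op)]} {right}")
    else:
        code.append(f"{target} = {lower(value)}")
    return code

print(to_three_address("a = b + c * d;"))
# ['t1 = c * d', 'a = b + t1']
```

The output matches the hand-written conversion above, with t1 playing the role of x.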

Code Optimisation

From here the backend begins, and code optimization is its first stage. As the name suggests, this part rewrites the intermediate code so that it runs faster or uses fewer resources without changing what it computes. One example is strength reduction: on many processors an addition is cheaper than a multiplication, hence code like:

x = 4 * y;
// can be converted to
x = y + y + y + y;
// or
z = y + y;
x = z + z;
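The second rewrite above can be sketched as a tiny optimization pass in Python. This is only an illustration under stated assumptions: the function name reduce_strength is made up, instructions are plain strings with spaces around the operator, only multiplications by a power of two are handled, and the temporaries t1, t2, ... are assumed not to clash with existing variable names.

```python
def reduce_strength(instruction):
    """Rewrite 'x = 4 * y' style multiplications into repeated doubling.

    Returns a list of instructions; anything it does not recognize is
    returned unchanged.
    """
    target, expr = [part.strip() for part in instruction.rstrip(";").split("=", 1)]
    parts = expr.split()
    if len(parts) == 3 and parts[1] == "*":
        left, _, right = parts
        if right.isdigit():
            left, right = right, left  # put the constant on the left
        if left.isdigit() and not right.isdigit():
            factor = int(left)
            is_power_of_two = factor >= 2 and (factor & (factor - 1)) == 0
            if is_power_of_two:
                code = []
                current = right
                step = 0
                while factor > 1:
                    factor //= 2
                    step += 1
                    # The last doubling writes into the real target.
                    dest = target if factor == 1 else f"t{step}"
                    code.append(f"{dest} = {current} + {current}")
                    current = dest
                return code
    return [f"{target} = {expr}"]

print(reduce_strength("x = 4 * y"))
# ['t1 = y + y', 'x = t1 + t1']
```

Doubling needs only two additions for a factor of 4, compared with three for y + y + y + y, which is why the z = y + y form is the better of the two rewrites. Real compilers usually go one step further and use a bit shift here.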

Code Generation

Here the code generator takes the optimized intermediate code and maps it onto the target machine language. It translates the intermediate code into a sequence of relocatable machine code that behaves exactly as the intermediate code would.
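To give a feel for this step, here is a minimal Python sketch that turns 3-address statements into assembly-like text. The instruction set (LOAD, ADD, SUB, MUL, DIV, STORE with a single register R0) is hypothetical and invented for this example; it is not the assembly of any real processor, and a real code generator would also handle register allocation and instruction selection.

```python
# Hypothetical mnemonics for a made-up one-register machine.
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(three_address_code):
    """Translate 3-address statements into pseudo-assembly lines."""
    asm = []
    for line in three_address_code:
        target, expr = [part.strip() for part in line.rstrip(";").split("=", 1)]
        parts = expr.split()
        if len(parts) == 1:
            # Plain copy: target = value
            asm += [f"LOAD R0, {parts[0]}", f"STORE {target}, R0"]
        elif len(parts) == 3 and parts[1] in MNEMONICS:
            left, op, right = parts
            asm += [
                f"LOAD R0, {left}",
                f"{MNEMONICS[op]} R0, {right}",
                f"STORE {target}, R0",
            ]
        else:
            raise ValueError(f"not a 3-address statement: {line!r}")
    return asm

for instruction in generate(["x = c * d", "a = b + x"]):
    print(instruction)
# LOAD R0, c
# MUL R0, d
# STORE x, R0
# LOAD R0, b
# ADD R0, x
# STORE a, R0
```

Each 3-address statement maps to a short, fixed pattern of instructions, which is one reason this intermediate form is convenient for the backend.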

Like the symbol table, there is an error handler linked to all the phases, which is responsible for handling the errors they detect, mainly in the frontend.

And at last, the code is converted to assembly code or machine code.
