We've all had that moment of accomplishment when our first ever program ran without any errors. Let me take you back to when I typed my first ever C program. I felt like a superhero with my code as my outfit, and all of a sudden a thought struck me: "How does my computer understand what I'm typing? How does it understand the language that I speak?" With this battle of thoughts in my mind, I went on trusting my imagination of an old woman dressed all in black, doing spooky stuff with a crystal ball and giving me the outputs. Well, that's what I believed until I came to know about compilers, machine language and their purpose.
• What does a Compiler do?
So, a compiler is an intermediary between the user who types the code and the machine that receives it. The code typed by the user is called the "source code" and the code the machine receives is called the "executable code". Let's slide in a real-life analogy. Let's say your friend Rakesh cracks a joke that leaves you confused, and you don't get it. Here comes your good old pal Joe to save you from embarrassment: he explains it, now you get it, and you can laugh along. All this happens very quickly.
In terms of the above analogy, you are the machine, Rakesh is the user typing the source code, and Joe, who saved you by helping you understand the joke, is the compiler.
Computer processors can do only a small number of things, like reading and writing memory and performing arithmetic operations, and their instructions are encoded in binary. A program written in this binary form is said to be in machine language. You can take a look at "Machine learning and how it helps in stock prediction" by our writer Samay in the blog section.
• How do Compilers read code?
Let's take a simple piece of code as an example.
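Going by the character table shown in the Lexical Analysis step, the sample looks to be a tiny C program along these lines. This is a reconstruction, so treat the exact layout as an assumption:

```c
int main() {
	int x;   /* reserve room for an integer named x */
	x = 3;   /* store the number 3 in it */
}
```

It prints nothing at all; it only exists to give the compiler something small to chew on.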
The source code is understandable to us, but to the machine it is just a series of meaningless characters stored in memory as text.
So, the compiler converts the source code into executable code through these main steps.
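To get a feel for what "meaningless characters" means, here is a small sketch in C. The helper name `char_codes` is made up for this post; it writes out the number stored for each character, which is roughly all the machine has before any compiling happens.

```c
#include <stdio.h>
#include <stddef.h>

/* Write the numeric code of each character in text into out, separated by
   spaces. Returns how many characters were converted. */
size_t char_codes(const char *text, char *out, size_t out_size)
{
    size_t used = 0, count = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (; text[count] != '\0'; count++) {
        int n = snprintf(out + used, out_size - used,
                         count ? " %d" : "%d", (unsigned char)text[count]);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';   /* out of room: drop the partial number */
            break;
        }
        used += (size_t)n;
    }
    return count;
}
```

Calling it on the text "int" fills the buffer with 105 110 116 (assuming the usual ASCII character set). To the machine, the keyword is just those three numbers sitting next to each other.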
• Lexical Analysis
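> This step scans the text and divides it into individual tokens. A token is a small meaningful unit made of one or more characters, such as the keyword int, the name main, the number 3, or a symbol like ; or =. In short, the compiler is figuring out what you've typed. It starts from the raw characters, shown here with "." standing for a space: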
| i | n | t | . | m | a | i | n | ( | ) | . | { | \n | \t | i | n | t | . | x | ; | \n | \t | x | . | = | . | 3 | ; | \n | } | \n |
• Syntactic Analysis
> This step builds a hierarchical structure of the code called a "parse tree". At this point the compiler is breaking down the grammar of the source code.
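As a rough illustration, the statement "x = 3;" could be turned into a three-node tree like the one below. This is only a sketch under heavy assumptions: the `Node` type and `parse_assignment` are invented here, and the parser understands nothing except a single-letter variable, an equals sign, a small whole number and a semicolon.

```c
#include <ctype.h>
#include <stdlib.h>

typedef enum { NODE_ASSIGN, NODE_VARIABLE, NODE_NUMBER } NodeKind;

typedef struct Node {
    NodeKind kind;
    char name;           /* NODE_VARIABLE: single-letter name */
    int value;           /* NODE_NUMBER: the number itself */
    struct Node *left;   /* NODE_ASSIGN: the variable being assigned */
    struct Node *right;  /* NODE_ASSIGN: the value being stored */
} Node;

static Node *new_node(NodeKind kind)
{
    Node *n = calloc(1, sizeof *n);
    if (n)
        n->kind = kind;
    return n;
}

static const char *skip_spaces(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Release a tree built by parse_assignment. */
void free_tree(Node *n)
{
    if (!n)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

/* Parse "<letter> = <digits> ;" and return its tree, or NULL if the text
   does not follow that grammar (or memory runs out). */
Node *parse_assignment(const char *text)
{
    const char *p = skip_spaces(text);
    if (!isalpha((unsigned char)*p))
        return NULL;
    char name = *p++;

    p = skip_spaces(p);
    if (*p != '=')
        return NULL;
    p = skip_spaces(p + 1);

    if (!isdigit((unsigned char)*p))
        return NULL;
    int value = 0;
    while (isdigit((unsigned char)*p))
        value = value * 10 + (*p++ - '0');

    p = skip_spaces(p);
    if (*p != ';')
        return NULL;

    Node *assign = new_node(NODE_ASSIGN);
    Node *var = new_node(NODE_VARIABLE);
    Node *num = new_node(NODE_NUMBER);
    if (!assign || !var || !num) {
        free(assign);
        free(var);
        free(num);
        return NULL;
    }
    var->name = name;
    num->value = value;
    assign->left = var;
    assign->right = num;
    return assign;
}
```

The tree has the assignment at the top with the variable hanging on its left and the number on its right. Text that breaks the grammar, such as "x 3;", gets no tree at all, which is the parser's way of reporting a syntax error.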
• Semantic Analysis
> This step records the context that we might need throughout the program, such as variables and constants. Finally, the machine code corresponding to the source code is generated, which involves additional steps like intermediate code, optimization, assembly code, object code and linking. Let's discuss this further.
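One common way to keep track of that context is a "symbol table", a list of every name the program has declared. The version below is a bare-bones sketch with made-up names (`SymbolTable`, `declare`, `lookup`, `can_assign`) and fixed size limits chosen only to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 32
#define MAX_NAME 31

typedef struct {
    char name[MAX_NAME + 1];
    char type[MAX_NAME + 1];
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int count;
} SymbolTable;

/* Return the entry for name, or NULL if it was never declared. */
const Symbol *lookup(const SymbolTable *table, const char *name)
{
    for (int i = 0; i < table->count; i++)
        if (strcmp(table->entries[i].name, name) == 0)
            return &table->entries[i];
    return NULL;
}

/* Record a declaration such as "int x". Fails if the name is already
   declared, a name is too long, or the table is full. */
bool declare(SymbolTable *table, const char *type, const char *name)
{
    if (table->count >= MAX_SYMBOLS || lookup(table, name) != NULL)
        return false;
    if (strlen(name) > MAX_NAME || strlen(type) > MAX_NAME)
        return false;
    strcpy(table->entries[table->count].name, name);
    strcpy(table->entries[table->count].type, type);
    table->count++;
    return true;
}

/* Semantic check for "name = ...": only allowed if name was declared. */
bool can_assign(const SymbolTable *table, const char *name)
{
    return lookup(table, name) != NULL;
}
```

With a table that starts out empty, declaring "int x" makes an assignment to x acceptable, while an assignment to some undeclared y would be flagged even though it is grammatically fine.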
Alright, now let's take the binary form of the simple sample code that we used above. When it's converted to binary it looks like this..
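As a small taste of what that looks like, the sketch below spells out a handful of bytes as ones and zeros, with the hexadecimal form alongside. The seven bytes in `STORE_THREE` are an assumption: they are the usual x86-64 encoding of the instruction that stores 3 into x, and a different processor or compiler would produce different bytes. The function names are made up for this post.

```c
#include <stdio.h>
#include <stddef.h>

/* Write the 8 bits of one byte, most significant bit first (out needs 9 chars). */
void byte_to_binary(unsigned char byte, char out[9])
{
    for (int bit = 7; bit >= 0; bit--)
        *out++ = ((byte >> bit) & 1) ? '1' : '0';
    *out = '\0';
}

/* Write the same byte as two hexadecimal digits (out needs 3 chars). */
void byte_to_hex(unsigned char byte, char out[3])
{
    snprintf(out, 3, "%02x", (unsigned)byte);
}

/* Assumed x86-64 encoding of "movl $3, -4(%rbp)"; other targets will differ. */
const unsigned char STORE_THREE[7] = { 0xc7, 0x45, 0xfc, 0x03, 0x00, 0x00, 0x00 };

/* Print every byte of a buffer in both notations, one byte per line. */
void dump_bytes(const unsigned char *bytes, size_t count)
{
    char bits[9], hex[3];
    for (size_t i = 0; i < count; i++) {
        byte_to_binary(bytes[i], bits);
        byte_to_hex(bytes[i], hex);
        printf("%s  %s\n", bits, hex);
    }
}
```

Calling `dump_bytes(STORE_THREE, 7)` prints one line per byte, starting with 11000111 next to c7. And that is just one instruction out of the whole program.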
Pretty messy, right? Let's convert it into hexadecimal code, which is still hardly any easier to read. So, let's convert it to assembly code, which is much easier to understand.
The code can be understood now. So, let's focus our attention on the third line and ignore the rest (let's just say that the rest are responsible for starting and ending the function). The third line, "MOVL $3, -4(%RBP)", corresponds to "x = 3" in our source code, which is nothing but storing the number 3 in the memory location of the variable x. Let's perform an arithmetic operation by incrementing x to 4, which would change our source code and assembly code to something like this..
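A sketch of the changed source is below, with the matching assembly instruction noted beside each statement. Two things here are assumptions: the increment is written as "x = x + 1" (it could just as well have been "x++"), and the body is wrapped in a hypothetical function, `sample_after_increment`, that returns x so the result can be checked. The exact assembly a compiler emits depends on the compiler and its settings.

```c
/* Same statements as the sample program, plus the increment.
   Hypothetical wrapper: returns x so the final value is visible. */
int sample_after_increment(void)
{
    int x;
    x = 3;      /* MOVL $3, -4(%RBP): store 3 in x's memory location */
    x = x + 1;  /* ADDL $1, -4(%RBP): add 1 to that same location */
    return x;
}
```

Both instructions name the same address, -4(%RBP), because both statements work on the same variable.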
The highlighted line, "ADDL $1, -4(%RBP)", indicates that the variable x is incremented by 1 and keeps the same memory location as before. The compiler analysed the code by separating the characters of the source code into tokens (lexical analysis), parsing the code (syntactic analysis) and putting it in context (semantic analysis) to complete the operation.
We have learnt the purpose of a compiler and its various processes in programming. Now an interesting question arises: who programmed the compiler? And who programmed the compiler's compiler? Well, if we follow the chain backwards, we reach the origin of development tools, where programs were written directly in machine code. Pretty tough, right? So, the next time we compile a program, let's remember the effort that went into giving us syntax highlighting, object-oriented programming, libraries, debuggers and more, all of which make our job simple yet effective. The innovation never ends.