CS333 Lecture Notes
Syntax
Fall 2019


Stages of Compilation

  • Lexical analysis
    • takes a source file as input
    • generates a sequence of valid tokens
    • character sequences that do not form valid tokens are discarded, after generating an error message
  • Syntactic analysis
    • takes the sequence of tokens as input
    • parses the token sequence and constructs a parse tree/abstract syntax tree according to the grammar
    • checks for syntax errors and ill-formed expressions
  • Semantic analysis
    • takes the parse tree/abstract syntax tree as input
    • generates intermediate code (a more explicit, detailed parse tree in which operators are generally specific to the data type they process)
    • catches semantic errors such as undefined variables, variable type conflicts, and implicit conversions
  • Code optimization
    • takes the intermediate code as input
    • identifies optimizations that speed up code execution without changing the program's functionality
  • Code generation
    • converts the intermediate code into machine code
    • machine code is tailored to a specific machine, while intermediate code is general across platforms
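As a rough picture of how the stages above fit together, here is a toy Python sketch in which each stage consumes the previous stage's output. Every function body is a hypothetical stand-in for illustration only, not the course's actual compiler:

```python
# Toy sketch of the compilation pipeline. All stage bodies are
# hypothetical stand-ins; only the stage boundaries match the notes.

def lexical_analysis(source):
    # source text -> sequence of tokens (real lexers use regex rules)
    return source.split()

def syntactic_analysis(tokens):
    # tokens -> (trivially flat) parse tree
    return ("program", tokens)

def semantic_analysis(tree):
    # parse tree -> intermediate code: here, one tagged tuple per token
    return [("push", tok) for tok in tree[1]]

def optimize(ir):
    # drop no-ops; the program's behavior is unchanged
    return [op for op in ir if op[1] != "nop"]

def generate_code(ir):
    # intermediate code -> target-specific output (here, plain text)
    return "\n".join(f"{name} {arg}" for name, arg in ir)

machine_code = generate_code(optimize(semantic_analysis(
    syntactic_analysis(lexical_analysis("x = 1 + 2")))))
print(machine_code)
```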

Lexical Analysis

  • Takes a source file as input and generates a sequence of valid tokens. Character sequences that do not form valid tokens are discarded after an error message is generated.
  • Tokens
    • Identifiers ‣ variable names, function names, labels
    • Literals ‣ numbers (e.g. integers and floats), characters, true and false
    • Keywords ‣ bool char else false float if int main true while
    • Operators ‣ for example, + - / * && || ==
    • Punctuation ‣ for example, ; . { } ( )
  • Tokenization, or lexical analysis, is simply the conversion of a string of characters (or whatever input format is being used) into a sequential string of symbols.
  • The lexer does not do syntax checking, but it can identify improperly defined identifiers.
    • In other words, it handles at least part or all of the grammar rules that have a terminal on the right-hand side.
    • In the case of something like an if statement, it converts the string if into a symbol that represents the keyword.
  • It is not a trivial part of the compiler.
    • It takes a significant percentage of compilation time: up to 75% for a non-optimizing compiler.
    • Most compilers separate tokenization, or lexical analysis, from syntactic analysis and program generation.
  • Because tokenization is such a common process, there are nice tools for generating lexical analyzers automatically from a description of the token grammar.
    • Examples include lex and flex, both freely available.
    • These tools let you write the lexical syntax components of a language as a set of rules, generally based on regular expressions; a minimal sketch of this rule-based style follows below.
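To illustrate that rule-based style, here is a minimal tokenizer driven by an ordered table of regular expressions. It uses Python's re module rather than lex/flex, and the token set is an assumption based on the lists above:

```python
import re

# Minimal lex-style tokenizer sketch: a table of (token name, regex)
# rules, tried in order at each position. Illustrative only; this is
# not the code that lex or flex would generate.
TOKEN_RULES = [
    ("WHITESPACE", r"\s+"),
    ("KEYWORD",    r"\b(?:bool|char|else|false|float|if|int|main|true|while)\b"),
    ("IDENT",      r"[A-Za-z_][A-Za-z0-9_]*"),
    ("FLOAT",      r"\d+\.\d+"),   # must come before INT
    ("INT",        r"\d+"),
    ("OPERATOR",   r"==|&&|\|\||[+\-*/=]"),
    ("PUNCT",      r"[;,{}()]"),
]

MASTER = re.compile("|".join(f"(?P<{name}>{rule})" for name, rule in TOKEN_RULES))

def tokenize(source):
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # invalid character: report an error, then discard it
            print(f"error: invalid character {source[pos]!r} at position {pos}")
            pos += 1
            continue
        if match.lastgroup != "WHITESPACE":
            yield (match.lastgroup, match.group())
        pos = match.end()

print(list(tokenize("if (x == 3.5) { y = y + 1; }")))
# [('KEYWORD', 'if'), ('PUNCT', '('), ('IDENT', 'x'), ('OPERATOR', '=='), ...]
```

Rule order matters: KEYWORD is tried before IDENT so that if is tagged as a keyword, and FLOAT before INT so that 3.5 is not split into 3, ., 5.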

Regular Expressions

  • Regular expressions are a language of their own, designed to compactly represent a set of strings as a single expression.
  • Special characters in regular expressions
    • [] : specifies a set of alternatives ‣ [AEIOU]: one uppercase vowel ‣ T[ao]p: tap, top
    • \ : used as an escape character to permit use of otherwise special characters ‣ \d: one digit from 0 to 9, e.g. CS\s\d\d\d matches CS 333, CS 232, … ‣ \s: one whitespace character
    • . : matches almost any character except line breaks ‣ a.e: water, ate, gate
    • * : matches the prior expression zero or more times ‣ \d*\.\d*: .3, 12.5, 139.
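To make these concrete, here is how the same patterns behave in Python's re module. Using Python here is a convenience assumption; lex/flex use a slightly different regex dialect:

```python
import re

# Trying the patterns from the notes with Python's re module.
print(re.findall(r"T[ao]p", "Tap Top Tip"))          # ['Tap', 'Top']
print(bool(re.search(r"CS\s\d\d\d", "CS 333")))      # True

# . matches almost any single character except a line break
for word in ["water", "ate", "gate", "apple"]:
    print(word, bool(re.search(r"a.e", word)))        # apple -> False

# * repeats the prior expression zero or more times
print([s for s in [".3", "12.5", "139.", "x.y"]
       if re.fullmatch(r"\d*\.\d*", s)])              # ['.3', '12.5', '139.']
```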