CS333 Lecture Notes
Syntax
Fall 2019


Stages of Compilation

  • Lexical analysis
    • takes a source file as input
    • generates a sequence of valid tokens
    • character sequences that do not form valid tokens are discarded, after generating an error message
  • Syntactic analysis
    • takes the sequence of tokens as input
    • parses the token sequence and constructs a parse tree/abstract syntax tree according to the grammar
    • checks for syntax errors and ill-formed expressions
  • Semantic analysis
    • takes the parse tree/abstract syntax tree as input
    • generates intermediate code (a more explicit, detailed parse tree in which operators are generally specific to the data type they process)
    • catches semantic errors such as undefined variables, variable type conflicts, and implicit conversions
  • Code optimization
    • takes the intermediate code as input
    • identifies optimizations that speed up code execution without changing the program's functionality
  • Code generation
    • converts the intermediate code into machine code
    • machine code is tailored to a specific machine, while intermediate code is general across platforms
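As a rough picture of how the stages above fit together, here is a toy Python sketch in which each stage consumes the previous stage's output. Every function body is a hypothetical stand-in for illustration only, not the course's actual compiler:

```python
# Toy sketch of the compilation pipeline. All stage bodies are
# hypothetical stand-ins; only the stage boundaries match the notes.

def lexical_analysis(source):
    # source text -> sequence of tokens (real lexers use regex rules)
    return source.split()

def syntactic_analysis(tokens):
    # tokens -> (trivially flat) parse tree
    return ("program", tokens)

def semantic_analysis(tree):
    # parse tree -> intermediate code: here, one tagged tuple per token
    return [("push", tok) for tok in tree[1]]

def optimize(ir):
    # drop no-ops; the program's behavior is unchanged
    return [op for op in ir if op[1] != "nop"]

def generate_code(ir):
    # intermediate code -> target-specific output (here, plain text)
    return "\n".join(f"{name} {arg}" for name, arg in ir)

machine_code = generate_code(optimize(semantic_analysis(
    syntactic_analysis(lexical_analysis("x = 1 + 2")))))
print(machine_code)
```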

Lexical Analysis

  • Takes a source file as input and generates a sequence of valid tokens. Character sequences that do not form valid tokens are discarded after an error message is generated.
  • Tokens
    • Identifiers ‣ variable names, function names, labels
    • Literals ‣ numbers (e.g. integers and floats), characters, true and false
    • Keywords ‣ bool char else false float if int main true while
    • Operators ‣ for example, + - / * && || ==
    • Punctuation ‣ for example, ; . { } ( )
  • Tokenization, or lexical analysis, is simply the conversion of a string of characters (or whatever input format is being used) into a sequential string of symbols.
  • The lexer does not do syntax checking, but it can identify improperly defined identifiers.
    • In other words, it handles at least part or all of the grammar rules that have a terminal on the right-hand side.
    • In the case of something like an if statement, it converts the string if into a symbol that represents the keyword.
  • It is not a trivial part of the compiler.
    • It takes a significant percentage of compilation time: up to 75% for a non-optimizing compiler.
    • Most compilers separate tokenization, or lexical analysis, from syntactic analysis and program generation.
  • Because tokenization is such a common process, there are nice tools for generating lexical analyzers automatically from a description of the token grammar.
    • Examples include lex and flex, both freely available.
    • These tools let you write the lexical syntax components of a language as a set of rules, generally based on regular expressions; a minimal sketch of this rule-based style follows below.
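To illustrate that rule-based style, here is a minimal tokenizer driven by an ordered table of regular expressions. It uses Python's re module rather than lex/flex, and the token set is an assumption based on the lists above:

```python
import re

# Minimal lex-style tokenizer sketch: a table of (token name, regex)
# rules, tried in order at each position. Illustrative only; this is
# not the code that lex or flex would generate.
TOKEN_RULES = [
    ("WHITESPACE", r"\s+"),
    ("KEYWORD",    r"\b(?:bool|char|else|false|float|if|int|main|true|while)\b"),
    ("IDENT",      r"[A-Za-z_][A-Za-z0-9_]*"),
    ("FLOAT",      r"\d+\.\d+"),   # must come before INT
    ("INT",        r"\d+"),
    ("OPERATOR",   r"==|&&|\|\||[+\-*/=]"),
    ("PUNCT",      r"[;,{}()]"),
]

MASTER = re.compile("|".join(f"(?P<{name}>{rule})" for name, rule in TOKEN_RULES))

def tokenize(source):
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # invalid character: report an error, then discard it
            print(f"error: invalid character {source[pos]!r} at position {pos}")
            pos += 1
            continue
        if match.lastgroup != "WHITESPACE":
            yield (match.lastgroup, match.group())
        pos = match.end()

print(list(tokenize("if (x == 3.5) { y = y + 1; }")))
# [('KEYWORD', 'if'), ('PUNCT', '('), ('IDENT', 'x'), ('OPERATOR', '=='), ...]
```

Rule order matters: KEYWORD is tried before IDENT so that if is tagged as a keyword, and FLOAT before INT so that 3.5 is not split into 3, ., 5.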

Regular Expressions

  • Regular expressions are a language of their own, designed to compactly represent a set of strings as a single expression.
  • Special characters in regular expressions
    • [] : specifies a set of alternatives ‣ [AEIOU]: one uppercase vowel ‣ T[ao]p: tap, top
    • \ : used as an escape character to permit use of otherwise special characters ‣ \d: one digit from 0 to 9, e.g. CS\s\d\d\d matches CS 333, CS 232, … ‣ \s: one whitespace character
    • . : matches almost any character except line breaks ‣ a.e: water, ate, gate
    • * : matches the prior expression zero or more times ‣ \d*\.\d*: .3, 12.5, 139.
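To make these concrete, here is how the same patterns behave in Python's re module. Using Python here is a convenience assumption; lex/flex use a slightly different regex dialect:

```python
import re

# Trying the patterns from the notes with Python's re module.
print(re.findall(r"T[ao]p", "Tap Top Tip"))          # ['Tap', 'Top']
print(bool(re.search(r"CS\s\d\d\d", "CS 333")))      # True

# . matches almost any single character except a line break
for word in ["water", "ate", "gate", "apple"]:
    print(word, bool(re.search(r"a.e", word)))        # apple -> False

# * repeats the prior expression zero or more times
print([s for s in [".3", "12.5", "139.", "x.y"]
       if re.fullmatch(r"\d*\.\d*", s)])              # ['.3', '12.5', '139.']
```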