Compilers - Semantic Analysis - Lecture Slides (CS 414)
06-0: Syntax Errors/Semantic Errors
- A program has syntax errors if it cannot be generated from the Context Free Grammar which describes the language
- The following code has no syntax errors, though it has plenty of semantic errors:
void main() { if (3 + x - true) x.y.z[3] = foo(z); }
06-1: Syntax Errors/Semantic Errors
- Why don’t we write a CFG for the language, so that all syntactically correct programs also contain no semantic errors?
- In general, we can’t!
- In simpleJava, variables need to be declared before they are used
- The following language:
- L = {ww | w ∈ {a, b}*} is not context-free. If we can't generate this language with a CFG, we certainly can't generate exactly the simpleJava programs in which all variables are declared before they are used.
06-3: JavaCC & CFGs
- JavaCC allows actions – arbitrary Java code – in rules
- We could use JavaCC rules to do type checking
- Why don’t we?
- JavaCC files become very long, hard to follow, hard to debug
- Not good software engineering – trying to do too many things at once
06-4: Semantic Errors/Syntax Errors
- Thus, we only build the Abstract Syntax Tree in JavaCC (not worrying about ensuring that variables are declared before they are used, or that types match, and so on)
- The next phase of compilation – Semantic Analysis – will traverse the Abstract Syntax Tree, and find any semantic errors – errors in the meaning (semantics) of the program
- Semantic errors are all compile-time errors other than syntax errors.
06-5: Semantic Errors
- Semantic Errors can be classified into the following broad categories:
- Definition Errors
- Most strongly typed languages require variables, functions, and types to be defined before they are used, with some exceptions:
- Implicit variable declarations in Fortran
- Implicit function definitions in C
06-6: Semantic Errors
- Semantic Errors can be classified into the following broad categories:
- Structured Variable Errors
- x.y = A[3]
- x needs to be a class variable, which has an instance variable y
- A needs to be an array variable
- x.y[z].w = 4
- x needs to be a class variable, which has an instance variable y, which is an array of class variables that have an instance variable w
06-7: Semantic Errors
- Semantic Errors can be classified into the following broad categories:
- Function and Method Errors
- foo(3, true, 8)
- foo must be a function which takes 3 parameters:
- integer
- boolean
- integer
06-8: Semantic Errors
- Semantic Errors can be classified into the following broad categories:
- Type Errors
- Built-in operators (/, *, ||, &&, etc.) need to be applied to operands of the correct types
- In simpleJava, +, -, *, / all take integers
- In simpleJava, ||, &&, ! take booleans
- Standard Java has polymorphic functions & type coercion
06-9: Semantic Errors
- Semantic Errors can be classified into the following broad categories:
- Names (and return types, and number and types of parameters) of functions
- As variables (functions, types, etc) are declared, they are added to the environment. When a variable (function, type, etc) is accessed, its definition in the environment is checked.
06-15: Environments & Name Spaces
- Types and variables have different name spaces in simpleJava, C, and standard Java:
simpleJava:
class foo { int foo; }
void main() { foo foo; foo = new foo(); foo.foo = 4; print(foo.foo); }
06-16: Environments & Name Spaces
- Types and variables have different name spaces in simpleJava, C, and standard Java:
C:
#include <stdio.h>
typedef int foo;
int main() { foo foo; foo = 4; printf("%d", foo); return 0; }
06-17: Environments & Name Spaces
- Types and variables have different name spaces in simpleJava, C, and standard Java:
Java:
class EnviornTest {
static void main(String args[]) {
Integer Integer = new Integer(4); System.out.print(Integer); } }
06-18: Environments & Name Spaces
- Variables and functions in C share the same name space, so the following C code is not legal:
int foo(int x) { return 2 * x; }
int main() { int foo; printf("%d\n",foo(3)); return 0; }
- The variable definition int foo; masks the function definition for foo
06-19: Environments & Name Spaces
- Both standard Java and simpleJava use different name spaces for functions and variables
- Defining a function and variable with the same name will not confuse Java or simpleJava in the same way it will confuse C
- Programmer might still get confused ...
06-20: simpleJava Environments
- We will break the simpleJava environment into 3 parts:
- type environment: class definitions, and the built-in types int, boolean, and void.
- function environment: function definitions (number and types of input parameters, and the return type)
- variable environment: definitions of local variables, including the type for each variable.
06-21: Changing Environments
int foo(int x) {
  boolean y;
  x = 2;
  y = false;
  /* Position A */
  {
    int y;
    boolean z;
    y = 3;
    z = true;
    /* Position B */
  }
  /* Position C */
}
06-22: Implementing Environments
- Environments are implemented with Symbol Tables
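long hash(char *key, int tableSize) {
  unsigned long h = 0;   /* unsigned, so the shifts below are well defined */
  unsigned long g;
  for (; *key; key++) {  /* walk the string up to the terminating '\0' */
    h = (h << 4) + (unsigned char)*key;
    g = h & 0xF0000000UL;  /* bits 28-31 of the running hash */
    if (g) h ^= g >> 24;   /* fold them back into the low bits */
    h &= ~g;               /* then clear them */
  }
  return (long)(h % tableSize);
}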
06-28: Implementing Symbol Tables
- What about beginScope and endScope?
- The key/value pairs are distributed across several lists. How do we know which key/value pairs to remove on an endScope?
- If we knew exactly which variables were inserted since the last beginScope command, we could delete them from the hash table
- If we always enter and remove key/value pairs from the beginning of the appropriate list, we will remove the correct items from the environment when duplicate keys occur.
- How can we keep track of which keys have been added since the last beginScope?
06-29: Implementing Symbol Tables
- How can we keep track of which keys have been added since the last beginScope?
- Maintain an auxiliary stack
- When a key/value pair is added to the hash table, push the key on the top of the stack.
- When a “Begin Scope” command is issued, push a special begin scope symbol on the stack.
- When an “End scope” command is issued, pop keys off the stack, removing them from the hash table, until the begin scope symbol is popped
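The auxiliary-stack scheme above can be sketched in Java. This is a minimal illustration, not the course's implementation: the class name ScopedTable and its method names are hypothetical, and a java.util.HashMap of per-key stacks stands in for the hand-written hash table with chained lists.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a scoped symbol table; all names are illustrative.
class ScopedTable<V> {
    // Marker pushed by beginScope. It is compared by identity, so a user
    // key with the same characters is never mistaken for it.
    private static final String SCOPE_MARKER = new String("<newscope>");

    // Each key maps to a stack of bindings with the newest on top, which
    // mirrors inserting at the front of a hash bucket's list.
    private final Map<String, Deque<V>> table = new HashMap<>();
    private final Deque<String> keyStack = new ArrayDeque<>();

    void beginScope() {
        keyStack.push(SCOPE_MARKER);
    }

    void insert(String key, V value) {
        table.computeIfAbsent(key, k -> new ArrayDeque<>()).push(value);
        keyStack.push(key);
    }

    // Returns the newest binding for key, or null if there is none.
    V find(String key) {
        Deque<V> bindings = table.get(key);
        return (bindings == null || bindings.isEmpty()) ? null : bindings.peek();
    }

    void endScope() {
        // Pop keys until the marker, removing the newest binding of each.
        while (!keyStack.isEmpty()) {
            String key = keyStack.pop();
            if (key == SCOPE_MARKER) {
                return;
            }
            Deque<V> bindings = table.get(key);
            bindings.pop();
            if (bindings.isEmpty()) {
                table.remove(key);
            }
        }
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        ScopedTable<String> vars = new ScopedTable<>();
        vars.beginScope();                   // body of foo
        vars.insert("x", "int");
        vars.insert("y", "boolean");         // Position A
        vars.beginScope();                   // inner block
        vars.insert("y", "int");
        vars.insert("z", "boolean");         // Position B
        System.out.println(vars.find("y"));  // int
        vars.endScope();                     // Position C
        System.out.println(vars.find("y"));  // boolean
        System.out.println(vars.find("z"));  // null
    }
}
```

The demo replays the Position A/B/C example from slide 06-21: the inner y hides the outer one until endScope pops it.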
06-30: Type Checking
- Built-in types: ints, floats, booleans, doubles, etc. simpleJava only has the built-in types int and boolean
- Structured types: collections of other types (arrays, records, classes, structs, etc.). simpleJava has arrays and classes
- Pointer types: int *, char *, etc. Neither Java nor simpleJava has explicit pointers, so there is no pointer type. (Class instances are represented internally as pointers, but there is no explicit representation)
- Subranges & enumerated types: C and Pascal have enumerated types (enum), and Pascal has subrange types. Java has no subrange types; enumerated types were added in Java 5
06-31: Built-In Types
- No auxiliary information required for built-in types int and boolean (an int is an int is an int)
- All types will be represented by pointers to type objects
- We will only allocate one block of memory for all integer types, and one block of memory for all boolean types
06-32: Built-In Types
void main() { int x; int y; boolean a; boolean b;
x = y; x = a; /* Type Error */ }
06-33: Built-In Types
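[Figure: the type environment maps int, boolean, and void to the shared INTEGERTYPE, BOOLEANTYPE, and VOIDTYPE objects. The variable environment maps x and y to INTEGERTYPE, and a and b to BOOLEANTYPE. Each environment has its own KeyStack, holding the inserted keys above a newscope marker.]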
06-35: Class Types
- For built-in types, we did not need to store any extra information.
- For Class types, what extra information do we need to store?
- The name and type of each instance variable
- How can we store a list of bindings of variables to types?
06-40: Array Types
- For arrays, what extra information do we need to store?
- The base type of the array
- For statically declared arrays, we might also want to store the range of indices, to add range checking for arrays
- This will add some run-time inefficiency: we need to add code to dynamically check each array access to ensure that it is within the correct bounds
- A large number of attacks are based on buffer overflows
06-41: Array Types
- Much like built-in types, we want only one instance of the internal representation for int[], one representation for int[][], and so on
- So we can do a simple pointer comparison to determine if types are equal
- Otherwise, we would need to parse an entire type structure whenever a type comparison needed to be done (and type comparisons need to be done frequently in semantic analysis!)
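One way to get a single shared instance per array type is to key the type environment by the printed type name and create an array type only when its name is missing. The sketch below assumes this approach. TypeEnvironment, arrayOf, and elementType are hypothetical names, and the Type and ArrayType classes here are simplified stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

class Type { }

class ArrayType extends Type {
    private final Type elementType;

    ArrayType(Type elementType) {
        this.elementType = elementType;
    }

    Type elementType() {
        return elementType;
    }
}

// Hypothetical type environment keyed by type name ("int", "int[]", ...).
class TypeEnvironment {
    private final Map<String, Type> types = new HashMap<>();

    void insert(String name, Type type) {
        types.put(name, type);
    }

    // Returns the one shared type for baseName followed by dims pairs of [],
    // adding any missing intermediate array types along the way.
    Type arrayOf(String baseName, int dims) {
        Type current = types.get(baseName);
        if (current == null) {
            throw new IllegalArgumentException("unknown type " + baseName);
        }
        String name = baseName;
        for (int i = 0; i < dims; i++) {
            name = name + "[]";
            Type existing = types.get(name);
            if (existing == null) {
                existing = new ArrayType(current);
                types.put(name, existing);
            }
            current = existing;
        }
        return current;
    }
}

class ArrayTypeDemo {
    public static void main(String[] args) {
        TypeEnvironment env = new TypeEnvironment();
        env.insert("int", new Type());
        Type x = env.arrayOf("int", 1);  // int x[];
        Type y = env.arrayOf("int", 1);  // int y[];
        Type z = env.arrayOf("int", 2);  // int z[][];
        System.out.println(x == y);                              // true
        System.out.println(((ArrayType) z).elementType() == x);  // true
    }
}
```

Because every int[] resolves to the same object, comparing an element of z with x is a single reference comparison.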
06-42: Array Types
void main () { int w; int x[]; int y[]; int z[][];
/* Body of main program */
}
06-43: Class Types
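[Figure: the type environment now also holds int[] and int[][], each bound to an ARRAY TYPE object; int[][] has base type int[], which has base type INTEGERTYPE. The variable environment maps w to INTEGERTYPE, x and y to the same int[] object, and z to the int[][] object. Each environment has its own KeyStack, holding the inserted keys above a newscope marker.]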
06-44: Semantic Analysis Overview
- A Semantic Analyzer traverses the Abstract Syntax Tree, and checks for semantic errors
- When declarations are encountered, proper values are added to the correct environment
06-45: Semantic Analysis Overview
- A Semantic Analyzer traverses the Abstract Syntax Tree, and checks for semantic errors
- When a statement is encountered (such as x = 3), the statement is checked for errors using the current environment
- Is the variable x declared in the current scope?
- Is x of type int?
06-46: Semantic Analysis Overview
- A Semantic Analyzer traverses the Abstract Syntax Tree, and checks for semantic errors
- When a statement is encountered (such as if (x > 3) x++;), the statement is checked for errors using the current environment
- Is the expression x > 3 a valid expression? (This will require a recursive analysis of the expression x > 3)
- Is the expression x > 3 of type boolean?
- Is the statement x++; valid? (This will require a recursive analysis of the statement x++;)
06-47: Semantic Analysis Overview
- A Semantic Analyzer traverses the Abstract Syntax Tree, and checks for semantic errors
- When a function definition is encountered:
- Begin a new scope
- Add the parameters of the functions to the variable environment
- Recursively check the body of the function
- End the current scope (removing definitions of local variables and parameters from the current environment)
06-48: Variable Declarations
- int x;
- Look up the type int in the type environment.
- (if it does not exist, report an error)
- Add the variable x to the current variable environment, with the type returned from the lookup of int
06-49: Variable Declarations
- foo x;
- Look up the type foo in the type environment.
- (if it does not exist, report an error)
- Add the variable x to the current variable environment, with the type returned from the lookup of foo
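The two declaration cases follow the same steps, which might be sketched as below. DeclarationChecker and its fields are hypothetical names, plain maps stand in for the scoped environments, and skipping the variable when its type is unknown is an assumption rather than something the slides specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Type {
    final String name;

    Type(String name) {
        this.name = name;
    }
}

// Hypothetical checker for declarations such as "int x;" or "foo x;".
class DeclarationChecker {
    final Map<String, Type> typeEnv = new HashMap<>();
    final Map<String, Type> variableEnv = new HashMap<>();
    final List<String> errors = new ArrayList<>();

    void declare(String typeName, String variableName) {
        Type type = typeEnv.get(typeName);  // look up the type by name
        if (type == null) {
            errors.add("undefined type " + typeName);
            return;  // assumption: do not add a variable of unknown type
        }
        variableEnv.put(variableName, type);  // bind the variable to that type object
    }
}

class DeclarationDemo {
    public static void main(String[] args) {
        DeclarationChecker checker = new DeclarationChecker();
        checker.typeEnv.put("int", new Type("int"));
        checker.declare("int", "x");  // succeeds
        checker.declare("foo", "y");  // no class foo has been defined
        System.out.println(checker.variableEnv.get("x").name);  // int
        System.out.println(checker.errors);  // [undefined type foo]
    }
}
```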
06-50: Array Declarations
06-55: Multidimensional Arrays
void main() { int A[][][]; int B[]; int C[][];
/* body of main */ }
- For B[]:
- int[] is already in the type environment.
- add B to variable environment, with the type found for int[]
06-56: Multidimensional Arrays
void main() { int A[][][]; int B[]; int C[][];
/* body of main */ }
- For C[][]:
- int[][] is already in the type environment
- add C to variable environment with type found for int[][]
06-57: Multidimensional Arrays
- For the declaration int A[][][], why add types int[], int[][], and int[][][] to the type environment?
- Why not just create a type int[][][], and add A to the variable environment with this type?
- In short, why make sure that all instances of the type int[] point to the same instance? (examples)
06-58: Multidimensional Arrays
void Sort(int Data[]);
void main() { int A[]; int B[]; int C[][];
/* Code to allocate space for A,B & C, and set initial values */
Sort(A); Sort(B); Sort(C[2]); }
06-60: Function Prototypes
- int foo(int a, boolean b);
- Add a description of this function to the function environment
- Type of each parameter
- Return type of the function
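A function-environment entry and the call check it enables might look like the following sketch. FunctionEntry, FunctionEnvironment, and checkCall are hypothetical names, and the error messages are illustrative. Types are compared with ==, relying on one shared object per type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Type {
    final String name;

    Type(String name) {
        this.name = name;
    }
}

// Hypothetical function-environment entry: return type plus parameter types.
class FunctionEntry {
    final Type returnType;
    final List<Type> parameterTypes;

    FunctionEntry(Type returnType, List<Type> parameterTypes) {
        this.returnType = returnType;
        this.parameterTypes = parameterTypes;
    }
}

class FunctionEnvironment {
    private final Map<String, FunctionEntry> functions = new HashMap<>();
    final List<String> errors = new ArrayList<>();

    void addPrototype(String name, Type returnType, List<Type> parameterTypes) {
        functions.put(name, new FunctionEntry(returnType, parameterTypes));
    }

    // Checks a call against the stored entry. Returns the declared return
    // type, or null if the function is undefined.
    Type checkCall(String name, List<Type> argumentTypes) {
        FunctionEntry entry = functions.get(name);
        if (entry == null) {
            errors.add("undefined function " + name);
            return null;
        }
        if (entry.parameterTypes.size() != argumentTypes.size()) {
            errors.add(name + ": wrong number of arguments");
        } else {
            for (int i = 0; i < argumentTypes.size(); i++) {
                // One shared object per type, so == is enough.
                if (entry.parameterTypes.get(i) != argumentTypes.get(i)) {
                    errors.add(name + ": argument " + (i + 1) + " has the wrong type");
                }
            }
        }
        return entry.returnType;
    }
}

class PrototypeDemo {
    public static void main(String[] args) {
        Type intType = new Type("int");
        Type booleanType = new Type("boolean");
        FunctionEnvironment env = new FunctionEnvironment();
        // int foo(int a, boolean b);
        env.addPrototype("foo", intType, Arrays.asList(intType, booleanType));
        env.checkCall("foo", Arrays.asList(intType, booleanType));      // fine
        env.checkCall("foo", Arrays.asList(booleanType, booleanType));  // type error
        System.out.println(env.errors);  // [foo: argument 1 has the wrong type]
    }
}
```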
06-61: Function Prototypes
int foo(int a, boolean b);
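[Figure: the type environment maps int, boolean, and void to the shared INTEGERTYPE, BOOLEANTYPE, and VOIDTYPE objects. The function environment maps foo to a FUNCTION TYPE object that stores the return type and the parameters, which point at those shared type objects. Each environment has its own KeyStack, holding the inserted keys above a newscope marker.]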
06-62: Function Prototypes
- int PrintBoard(int board[][]);
- Analyze types of input parameter
- Add int[] and int[][] to the type environment, if not already there.
06-63: Class Definitions
class MyClass { int integerval; int Array[]; boolean boolval; }
06-64: Class Definitions
- Analyze formal parameters & return type. Check against prototype (if there is one), or add function entry to function environment (if no prototype)
- Begin a new scope in the variable environment
- Add formal parameters to the variable environment
- Analyze the body of the function, using modified variable environment
- End current scope in variable environment
06-69: Expressions
- To analyze an expression:
- Make sure the expression is well formed (no semantic errors)
- Return the type of the expression (to be used by the calling function)
06-70: Expressions
- Simple Expressions
- 3 (integer literal)
- This is a well formed expression, with the type int
- true (boolean literal)
- This is a well formed expression, with the type boolean
06-71: Expressions
- Operator Expressions
- 3 + 4
- Recursively find types of left and right operand
- Make sure the operands have integer types
- Return integer type
- x > 3
- Recursively find types of left and right operand
- Make sure the operands have integer types
- Return boolean type
06-72: Expressions
- Operator Expressions
- (x > 3) || z
- Recursively find types of left and right operand
- Make sure the operands have boolean types
- Return boolean type
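The recursive analysis of literals and operator expressions can be sketched as follows. The AST classes (Expr, IntLiteral, BoolLiteral, BinaryOp) and ExprChecker are hypothetical, not the course's AST, and variables are left out to keep the sketch short. When an operand has the wrong type, the checker records an error and still returns the operator's result type so that analysis can continue.

```java
import java.util.ArrayList;
import java.util.List;

final class Type {
    static final Type INT = new Type("int");
    static final Type BOOLEAN = new Type("boolean");
    final String name;

    private Type(String name) {
        this.name = name;
    }
}

// Hypothetical, minimal expression AST.
abstract class Expr { }

class IntLiteral extends Expr { }

class BoolLiteral extends Expr { }

class BinaryOp extends Expr {
    final String op;
    final Expr left;
    final Expr right;

    BinaryOp(String op, Expr left, Expr right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
}

class ExprChecker {
    final List<String> errors = new ArrayList<>();

    // Checks the expression and returns its type.
    Type analyze(Expr e) {
        if (e instanceof IntLiteral) {
            return Type.INT;
        }
        if (e instanceof BoolLiteral) {
            return Type.BOOLEAN;
        }
        BinaryOp b = (BinaryOp) e;
        Type left = analyze(b.left);    // recursively type the operands
        Type right = analyze(b.right);
        switch (b.op) {
            case "+": case "-": case "*": case "/":
                requireBoth(b.op, Type.INT, left, right);
                return Type.INT;
            case "<": case ">":
                requireBoth(b.op, Type.INT, left, right);
                return Type.BOOLEAN;
            case "&&": case "||":
                requireBoth(b.op, Type.BOOLEAN, left, right);
                return Type.BOOLEAN;
            default:
                throw new IllegalArgumentException("unknown operator " + b.op);
        }
    }

    private void requireBoth(String op, Type expected, Type left, Type right) {
        if (left != expected || right != expected) {
            errors.add("operands of " + op + " must have type " + expected.name);
        }
    }
}

class ExprDemo {
    public static void main(String[] args) {
        ExprChecker checker = new ExprChecker();
        // (3 > 4) || true
        Expr ok = new BinaryOp("||",
                new BinaryOp(">", new IntLiteral(), new IntLiteral()),
                new BoolLiteral());
        System.out.println(checker.analyze(ok).name);   // boolean
        // 3 + true
        Expr bad = new BinaryOp("+", new IntLiteral(), new BoolLiteral());
        System.out.println(checker.analyze(bad).name);  // int
        System.out.println(checker.errors);  // [operands of + must have type int]
    }
}
```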
06-73: Expressions – Variables
- Simple (Base) Variables – x
- Look up x in the variable environment
- If the variable was in the variable environment, return the associated type.
- If the variable was not in the variable environment, display an error.
- Need to return something if variable is not defined – return type integer for lack of something better
06-74: Expressions – Variables
- Array Variables – A[3]
- Analyze the index, ensuring that it is of type int
- Analyze the base variable. Ensure that the base variable is an Array Type
- Return the type of an element of the array, extracted from the base type of the array
- int A[];
/* initialize A, etc. */ x = A[3];
06-75: Expressions – Variables
- Array Variables
- Analyze the index, ensuring that it is of type int
- Analyze the base variable. Ensure that the base variable is an Array Type
- Return the type of an element of the array, extracted from the base type of the array
- int B[][];
/* initialize B, etc. */ x = B[3][4];
06-76: Expressions – Variables
- Array Variables
- Analyze the index, ensuring that it is of type int
- Analyze the base variable. Ensure that the base variable is an Array Type
- Return the type of an element of the array, extracted from the base type of the array
- int B[][]; int A[];
/* initialize A, B, etc. */ x = B[A[4]][A[3]];
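The array-variable rules apply recursively, which is what makes nested subscripts such as B[A[4]][A[3]] work. A minimal sketch follows, with hypothetical node classes (Node, IntLiteral, BaseVar, Subscript) and a plain map as the variable environment. On an undefined variable or a non-array base it reports an error and returns int, for lack of something better.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Type {
    final String name;

    Type(String name) {
        this.name = name;
    }
}

class ArrayType extends Type {
    final Type elementType;

    ArrayType(Type elementType) {
        super(elementType.name + "[]");
        this.elementType = elementType;
    }
}

// Hypothetical AST nodes for integer literals and variables.
abstract class Node { }

class IntLiteral extends Node { }

class BaseVar extends Node {
    final String name;

    BaseVar(String name) {
        this.name = name;
    }
}

class Subscript extends Node {
    final Node base;
    final Node index;

    Subscript(Node base, Node index) {
        this.base = base;
        this.index = index;
    }
}

class VariableChecker {
    static final Type INT = new Type("int");
    final Map<String, Type> variables = new HashMap<>();
    final List<String> errors = new ArrayList<>();

    Type analyze(Node n) {
        if (n instanceof IntLiteral) {
            return INT;
        }
        if (n instanceof BaseVar) {
            String name = ((BaseVar) n).name;
            Type t = variables.get(name);
            if (t == null) {
                errors.add("undefined variable " + name);
                return INT;  // must return something; int is the fallback
            }
            return t;
        }
        Subscript s = (Subscript) n;
        if (analyze(s.index) != INT) {
            errors.add("array index must be an int");
        }
        Type baseType = analyze(s.base);
        if (!(baseType instanceof ArrayType)) {
            errors.add("subscripted variable is not an array");
            return INT;
        }
        return ((ArrayType) baseType).elementType;
    }
}

class VariableDemo {
    public static void main(String[] args) {
        VariableChecker checker = new VariableChecker();
        Type intArray = new ArrayType(VariableChecker.INT);
        checker.variables.put("A", intArray);                 // int A[];
        checker.variables.put("B", new ArrayType(intArray));  // int B[][];
        Node a4 = new Subscript(new BaseVar("A"), new IntLiteral());
        Node a3 = new Subscript(new BaseVar("A"), new IntLiteral());
        // B[A[4]][A[3]]
        Node expr = new Subscript(new Subscript(new BaseVar("B"), a4), a3);
        System.out.println(checker.analyze(expr).name);  // int
        System.out.println(checker.errors);              // []
    }
}
```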
06-77: Expressions – Variables
- Array Variables
- Analyze the index, ensuring that it is of type int
- Analyze the base variable. Ensure that the base variable is an Array Type
- Analyze the “if” statement
- Analyze the “else” statement (if there is one)
06-82: Statements
- Assignment statements
- Analyze the left-hand side of the assignment statement
- Analyze the right-hand side of the assignment statement
- Make sure the types are the same
- Can do this with a simple pointer comparison!
06-83: Statements
- Block statements
- Begin new scope in variable environment
- Recursively analyze all children
- End current scope in variable environment
06-84: Statements
- Variable Declaration Statements
- Look up type of variable
- May involve adding types to type environment for arrays
- Add variable to variable environment
- If there is an initialization expression, make sure the type of the expression matches the type of the variable.
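The three statement kinds above can be combined into one small checker. The sketch below makes several simplifying assumptions: the statement classes (Stmt, VarDecl, Assign, Block) are hypothetical, assignments are restricted to variable = variable, initializers are omitted, and a stack of maps stands in for the hash table with an auxiliary key stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Type {
    static final Type INT = new Type("int");
    static final Type BOOLEAN = new Type("boolean");
    final String name;

    private Type(String name) {
        this.name = name;
    }
}

// Hypothetical statement AST, restricted to three statement kinds.
abstract class Stmt { }

class VarDecl extends Stmt {
    final Type type;
    final String name;

    VarDecl(Type type, String name) {
        this.type = type;
        this.name = name;
    }
}

class Assign extends Stmt {  // target = source; both are variable names
    final String target;
    final String source;

    Assign(String target, String source) {
        this.target = target;
        this.source = source;
    }
}

class Block extends Stmt {
    final List<Stmt> body;

    Block(List<Stmt> body) {
        this.body = body;
    }
}

class StatementChecker {
    // Innermost scope is at the head of the deque.
    private final Deque<Map<String, Type>> scopes = new ArrayDeque<>();
    final List<String> errors = new ArrayList<>();

    StatementChecker() {
        scopes.push(new HashMap<>());
    }

    private Type lookup(String name) {
        for (Map<String, Type> scope : scopes) {  // innermost scope first
            Type t = scope.get(name);
            if (t != null) {
                return t;
            }
        }
        errors.add("undefined variable " + name);
        return Type.INT;  // fallback so analysis can continue
    }

    void analyze(Stmt s) {
        if (s instanceof VarDecl) {
            VarDecl d = (VarDecl) s;
            scopes.peek().put(d.name, d.type);
        } else if (s instanceof Assign) {
            Assign a = (Assign) s;
            // Types are shared objects, so a reference comparison suffices.
            if (lookup(a.target) != lookup(a.source)) {
                errors.add("type mismatch in " + a.target + " = " + a.source);
            }
        } else {
            scopes.push(new HashMap<>());  // begin scope
            for (Stmt child : ((Block) s).body) {
                analyze(child);
            }
            scopes.pop();                  // end scope
        }
    }
}

class StatementDemo {
    public static void main(String[] args) {
        StatementChecker checker = new StatementChecker();
        // { int x; int y; boolean a; x = y; x = a; }
        Stmt body = new Block(Arrays.asList(
                new VarDecl(Type.INT, "x"),
                new VarDecl(Type.INT, "y"),
                new VarDecl(Type.BOOLEAN, "a"),
                new Assign("x", "y"),
                new Assign("x", "a")));
        checker.analyze(body);
        System.out.println(checker.errors);  // [type mismatch in x = a]
    }
}
```

Once the block ends, x, y, and a are gone, so a later x = y at the outer level would report both names as undefined.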
06-85: Types in Java
- Each type will be represented by a class
- All types will be subclasses of the “Type” class:
class Type { }
06-86: Built-in Types
- Only one internal representation of each built-in type
- All references to INTEGER type will be a pointer to the same block of memory
- How can we achieve this in Java?
- Singleton software design pattern
06-87: Singletons in Java
- Use a singleton when you want only one instantiation of a class
- Every call to “new” creates a new instance
- – prohibit calls to “new”!
- Make the constructor private
- Obtain instances through a static method
06-88: Singletons in Java
public class IntegerType extends Type {
  private IntegerType() { }
  public static IntegerType instance() {
    if (instance_ == null) {
      instance_ = new IntegerType();
    }
    return instance_;
  }
  static private IntegerType instance_;
}
06-89: Singletons in Java
Type t1; Type t2; Type t3;
t1 = IntegerType.instance(); t2 = IntegerType.instance(); t3 = IntegerType.instance();
- t1, t2, and t3 all point to the same instance
06-90: Structured Types in Java
- Built-in types (integer, boolean, void) do not need any extra information
- An integer is an integer is an integer
- Structured types (arrays, classes) need more information
- An array of what?
- What fields does the class have?
06-91: Array Types in Java
- Internal representation of array type needs to store the element type of the array
class ArrayType extends Type {
  public ArrayType(Type type) { type_ = type; }
  public Type type() { return type_; }
  public void settype(Type type) { type_ = type; }
  private Type type_;
}