Compilers - Semantic Analysis - Lecture Slides | CS 414, Papers of Computer Science

Material Type: Paper; Class: Compilers; Subject: Computer Science; University: University of San Francisco (CA); Term: Summer II 2008;

Typology: Papers

Pre 2010

Uploaded on 07/30/2009

koofers-user-ajr
CS414-2008S-06 Semantic Analysis 1
06-0: Syntax Errors/Semantic Errors

  • A program has syntax errors if it cannot be generated from the Context Free Grammar which describes the language
  • The following code has no syntax errors, though it has plenty of semantic errors:

void main() {
    if (3 + x - true)
        x.y.z[3] = foo(z);
}

  • Why don’t we write a CFG for the language, so that all syntactically correct programs also contain no semantic errors?

06-1: Syntax Errors/Semantic Errors

  • Why don’t we write a CFG for the language, so that all syntactically correct programs also contain no semantic errors?
  • In general, we can’t!
    • In simpleJava, variables need to be declared before they are used
    • The following language:
      • L = {ww | w ∈ {a, b}*} is not context-free. If we can't generate this language from a CFG, we certainly can't generate only those simpleJava programs in which all variables are declared before they are used.

06-2: JavaCC & CFGs

  • JavaCC allows actions – arbitrary Java code – in rules
  • We could use JavaCC rules to do type checking
  • Why don’t we?

06-3: JavaCC & CFGs

  • JavaCC allows actions – arbitrary Java code – in rules
  • We could use JavaCC rules to do type checking
  • Why don’t we?
    • JavaCC files become very long, hard to follow, hard to debug
    • Not good software engineering – trying to do too many things at once

06-4: Semantic Errors/Syntax Errors

  • Thus, we only build the Abstract Syntax Tree in JavaCC (not worrying about ensuring that variables are declared before they are used, or that types match, and so on)
  • The next phase of compilation – Semantic Analysis – will traverse the Abstract Syntax Tree, and find any semantic errors – errors in the meaning (semantics) of the program
  • Semantic errors are all compile-time errors other than syntax errors.

06-5: Semantic Errors

  • Semantic Errors can be classified into the following broad categories:
    • Definition Errors
      • Most strongly typed languages require variables, functions, and types to be defined before they are used, with some exceptions:
        • Implicit variable declarations in Fortran
        • Implicit function definitions in C

06-6: Semantic Errors

  • Semantic Errors can be classified into the following broad categories:
  • Structured Variable Errors
    • x.y = A[3]
      • x needs to be a class variable, which has an instance variable y
      • A needs to be an array variable
    • x.y[z].w = 4
      • x needs to be a class variable, which has an instance variable y, which is an array of class variables that have an instance variable w

06-7: Semantic Errors

  • Semantic Errors can be classified into the following broad categories:
    • Function and Method Errors
      • foo(3, true, 8)
        • foo must be a function which takes 3 parameters:
        • integer
        • boolean
        • integer

06-8: Semantic Errors

  • Semantic Errors can be classified into the following broad categories:
    • Type Errors
      • Built-in functions (/, *, ||, &&, etc.) need to be called with the correct types
      • In simpleJava, +, -, *, / all take integers
      • In simpleJava, ||, &&, and ! take booleans
      • Standard Java has polymorphic functions & type coercion

06-9: Semantic Errors

  • Semantic Errors can be classified into the following broad categories:
  • Names (and return types, and number and types of parameters) of functions
  • As variables (functions, types, etc) are declared, they are added to the environment. When a variable (function, type, etc) is accessed, its definition in the environment is checked.

06-15: Environments & Name Spaces

  • Types and variables have different name spaces in simpleJava, C, and standard Java:

simpleJava:

class foo {
    int foo;
}

void main() {
    foo foo;
    foo = new foo();
    foo.foo = 4;
    print(foo.foo);
}

06-16: Environments & Name Spaces

  • Types and variables have different name spaces in simpleJava, C, and standard Java:

C:

#include <stdio.h>

typedef int foo;

int main() {
    foo foo;
    foo = 4;
    printf("%d", foo);
    return 0;
}

06-17: Environments & Name Spaces

  • Types and variables have different name spaces in simpleJava, C, and standard Java:

Java:

class EnviornTest {
    public static void main(String args[]) {
        Integer Integer = new Integer(4);
        System.out.print(Integer);
    }
}

06-18: Environments & Name Spaces

  • Variables and functions in C share the same name space, so the following C code is not legal:

int foo(int x) {
    return 2 * x;
}

int main() {
    int foo;
    printf("%d\n", foo(3));
    return 0;
}

  • The variable definition int foo; masks the function definition for foo

06-19: Environments & Name Spaces

  • Both standard Java and simpleJava use different name spaces for functions and variables
  • Defining a function and variable with the same name will not confuse Java or simpleJava in the same way it will confuse C
    • Programmer might still get confused ...

06-20: simpleJava Environments

  • We will break simpleJava environment into 3 parts:
    • type environment: class definitions, and the built-in types int, boolean, and void
    • function environment: function definitions (number and types of input parameters, and the return type)
    • variable environment: definitions of local variables, including the type of each variable

06-21: Changing Environments

int foo(int x) {
    boolean y;

    x = 2;
    y = false;
    /* Position A */
    {
        int y;
        boolean z;

        y = 3;
        z = true;
        /* Position B */
    }
    /* Position C */
}

06-22: Implementing Environments

  • Environments are implemented with Symbol Tables

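long hash(char *key, int tableSize) {
    long h = 0;
    long g;
    for (; *key; key++) {
        h = (h << 4) + *key;     /* shift in the next character */
        g = h & 0xF0000000;      /* isolate the top four bits */
        if (g) h ^= g >> 24;     /* fold them back into the low bits */
        h &= ~g;                 /* then clear them */
    }
    return h % tableSize;
}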

06-27: Implementing Symbol Tables

  • What about beginScope and endScope?
  • The key/value pairs are distributed across several lists – how do we know which key/value pairs to remove on an endScope?

06-28: Implementing Symbol Tables

  • What about beginScope and endScope?
  • The key/value pairs are distributed across several lists. How do we know which key/value pairs to remove on an endScope?
    • If we knew exactly which variables were inserted since the last beginScope command, we could delete them from the hash table
    • If we always enter and remove key/value pairs from the beginning of the appropriate list, we will remove the correct items from the environment when duplicate keys occur
    • How can we keep track of which keys have been added since the last beginScope?

06-29: Implementing Symbol Tables

  • How can we keep track of which keys have been added since the last beginScope?
  • Maintain an auxiliary stack
    • When a key/value pair is added to the hash table, push the key on the top of the stack.
    • When a “Begin Scope” command is issued, push a special begin scope symbol on the stack.
    • When an “End scope” command is issued, pop keys off the stack, removing them from the hash table, until the begin scope symbol is popped
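The auxiliary-stack scheme above can be sketched in Java as follows. This is a minimal sketch, not the course's implementation: the names ScopedTable, ScopeDemo, insert, find, and BEGIN_SCOPE are assumptions, and java.util.HashMap stands in for the hand-written hash table (a per-key stack of bindings plays the role of inserting at the front of a bucket's list).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; the slides do not give this code.
class ScopedTable<V> {
    // Special begin-scope symbol; a fresh String object so it can be
    // recognized by identity and never collides with a real key.
    private static final String BEGIN_SCOPE = new String("<begin scope>");

    // Each key maps to a stack of bindings, newest on top.
    private final Map<String, Deque<V>> table = new HashMap<>();
    // Auxiliary stack of keys added since each beginScope.
    private final Deque<String> keyStack = new ArrayDeque<>();

    void beginScope() {
        keyStack.push(BEGIN_SCOPE);
    }

    void insert(String key, V value) {
        table.computeIfAbsent(key, k -> new ArrayDeque<>()).push(value);
        keyStack.push(key);
    }

    // Returns the innermost binding for key, or null if there is none.
    V find(String key) {
        Deque<V> bindings = table.get(key);
        return (bindings == null || bindings.isEmpty()) ? null : bindings.peek();
    }

    void endScope() {
        // Pop keys, removing their newest binding, until the marker is popped.
        while (!keyStack.isEmpty()) {
            String key = keyStack.pop();
            if (key == BEGIN_SCOPE) return;
            Deque<V> bindings = table.get(key);
            bindings.pop();
            if (bindings.isEmpty()) table.remove(key);
        }
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        ScopedTable<String> vars = new ScopedTable<>();
        vars.beginScope();
        vars.insert("y", "boolean");
        vars.beginScope();
        vars.insert("y", "int");             // shadows the outer y
        System.out.println(vars.find("y"));  // prints "int"
        vars.endScope();
        System.out.println(vars.find("y"));  // prints "boolean"
    }
}
```

Because each key is pushed once per insertion, a key declared in both an inner and an outer scope appears twice on the stack, and endScope removes only the inner binding.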

06-30: Type Checking

  • Built-in types: ints, floats, booleans, doubles, etc. simpleJava only has the built-in types int and boolean
  • Structured types: collections of other types (arrays, records, classes, structs, etc.). simpleJava has arrays and classes
  • Pointer types: int *, char *, etc. Neither Java nor simpleJava has explicit pointers, so there is no pointer type. (Class instances are represented internally as pointers, with no explicit representation in the language)
  • Subranges & enumerated types: C and Pascal have enumerated types (enum), and Pascal has subrange types. Java has no subrange types; it has had enumerated types (enum) since Java 5

06-31: Built-In Types

  • No auxiliary information required for built-in types int and boolean (an int is an int is an int)
  • All types will be represented by pointers to type objects
  • We will only allocate one block of memory for all integer types, and one block of memory for all boolean types

06-32: Built-In Types

void main() {
    int x;
    int y;
    boolean a;
    boolean b;

    x = y;
    x = a; /* Type Error */
}

06-33: Built-In Types

[Figure: the type environment binds int, boolean, and void to the type objects INTEGERTYPE, BOOLEANTYPE, and VOIDTYPE; the variable environment holds x, y, a, and b. Each environment has its own KeyStack containing a newscope marker.]

06-34: Class Types

  • For built-in types, we did not need to store any extra information.
  • For Class types, what extra information do we need to store?

06-35: Class Types

  • For built-in types, we did not need to store any extra information.
  • For Class types, what extra information do we need to store?
    • The name and type of each instance variable
  • How can we store a list of bindings of variables to types?

06-40: Array Types

  • For arrays, what extra information do we need to store?
    • The base type of the array
    • For statically declared arrays, we might also want to store the range of indices, to add range checking for arrays
      • Will add some run time inefficiency: need to add code to dynamically check each array access to ensure that it is within the correct bounds
      • Large number of attacks are based on buffer overflows

06-41: Array Types

  • Much like built-in types, we want only one instance of the internal representation for int[], one representation for int[][], and so on
    • So we can do a simple pointer comparison to determine if types are equal
    • Otherwise, we would need to parse an entire type structure whenever a type comparison needed to be done (and type comparisons need to be done frequently in semantic analysis!)
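One way to get a single shared instance per array type is to look each type up by name before creating it. The sketch below assumes that approach; Ty, TypeTable, lookup, and arrayOf are hypothetical names, and a plain HashMap stands in for the type environment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative only.
class Ty {
    final String name;   // for example "int" or "int[]"
    final Ty element;    // element type for arrays, null for built-in types

    Ty(String name, Ty element) {
        this.name = name;
        this.element = element;
    }
}

class TypeTable {
    private final Map<String, Ty> types = new HashMap<>();

    TypeTable() {
        types.put("int", new Ty("int", null));
        types.put("boolean", new Ty("boolean", null));
    }

    Ty lookup(String name) {
        return types.get(name);
    }

    // Returns the one shared instance for `base` followed by `dims` pairs of [].
    // Intermediate types (int[] before int[][]) are created and shared as well.
    Ty arrayOf(String base, int dims) {
        Ty current = types.get(base);
        if (current == null) {
            throw new IllegalArgumentException("unknown type " + base);
        }
        String name = base;
        for (int i = 0; i < dims; i++) {
            name = name + "[]";
            Ty existing = types.get(name);
            if (existing == null) {
                existing = new Ty(name, current);
                types.put(name, existing);
            }
            current = existing;
        }
        return current;
    }
}
```

With this table, arrayOf("int", 2).element and arrayOf("int", 1) are the same object, so comparing two types reduces to comparing references with ==.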

06-42: Array Types

void main() {
    int w;
    int x[];
    int y[];
    int z[][];

    /* Body of main program */
}

06-43: Class Types

[Figure: the type environment binds int, boolean, and void to INTEGERTYPE, BOOLEANTYPE, and VOIDTYPE, and binds int[] and int[][] to two ARRAY TYPE objects; the variable environment holds w, x, y, and z. Each environment has its own KeyStack containing a newscope marker.]

06-44: Semantic Analysis Overview

  • A Semantic Analyzer traverses the Abstract Syntax Tree, and checks for semantic errors
    • When declarations are encountered, proper values are added to the correct environment

06-45: Semantic Analysis Overview

  • A Semantic Analyzer traverses the Abstract Syntax Tree, and checks for semantic errors
    • When a statement is encountered (such as x = 3), the statement is checked for errors using the current environment
      • Is the variable x declared in the current scope?
      • Is x of type int?

06-46: Semantic Analysis Overview

  • A Semantic Analyzer traverses the Abstract Syntax Tree, and checks for semantic errors
    • When a statement is encountered (such as if (x > 3) x++;), the statement is checked for errors using the current environment
      • Is the expression x > 3 a valid expression? (This will require a recursive analysis of the expression x > 3)
      • Is the expression x > 3 of type boolean?
      • Is the statement x++ valid? (This will require a recursive analysis of the statement x++;)

06-47: Semantic Analysis Overview

  • A Semantic Analyzer traverses the Abstract Syntax Tree, and checks for semantic errors
    • When a function definition is encountered:
      • Begin a new scope
      • Add the parameters of the functions to the variable environment
      • Recursively check the body of the function
      • End the current scope (removing definitions of local variables and parameters from the current environment)

06-48: Variable Declarations

  • int x;
    • Look up the type int in the type environment.
      • (if it does not exist, report an error)
    • Add the variable x to the current variable environment, with the type returned from the lookup of int

06-49: Variable Declarations

  • foo x;
    • Look up the type foo in the type environment.
      • (if it does not exist, report an error)
    • Add the variable x to the current variable environment, with the type returned from the lookup of foo
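The two declaration cases above follow the same steps, which might look like this in Java. This is a sketch under stated assumptions: DeclChecker and declare are hypothetical names, plain maps stand in for the type and variable environments, and skipping the variable when its type is unknown is a choice made here, not something the slides specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative only.
class DeclChecker {
    // Type environment: type name -> type object
    // (bare Objects stand in for real type objects).
    final Map<String, Object> typeEnv = new HashMap<>();
    // Variable environment: variable name -> type object.
    final Map<String, Object> varEnv = new HashMap<>();
    final List<String> errors = new ArrayList<>();

    DeclChecker() {
        typeEnv.put("int", new Object());
        typeEnv.put("boolean", new Object());
    }

    // Handles a declaration such as "int x;" or "foo x;".
    void declare(String typeName, String varName) {
        Object type = typeEnv.get(typeName);
        if (type == null) {
            errors.add("unknown type " + typeName);
            return;   // assumption: skip the variable when its type is unknown
        }
        // Store the looked-up type object itself, so later checks can use ==.
        varEnv.put(varName, type);
    }
}
```

Calling declare("int", "x") binds x to the same object that the type environment holds for int, while declare("foo", "x") records an error unless a class foo was added to the type environment first.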

06-50: Array Declarations

06-55: Multidimensional Arrays

void main() {
    int A[][][];
    int B[];
    int C[][];

    /* body of main */
}

  • For B[]:
    • int[] is already in the type environment.
    • add B to variable environment, with the type found for int[]

06-56: Multidimensional Arrays

void main() {
    int A[][][];
    int B[];
    int C[][];

    /* body of main */
}

  • For C[][]:
    • int[][] is already in the type environment
    • add C to variable environment with type found for int[][]

06-57: Multidimensional Arrays

  • For the declaration int A[][][], why add types int[], int[][], and int[][][] to the type environment?
  • Why not just create a type int[][][], and add A to the variable environment with this type?
  • In short, why make sure that all instances of the type int[] point to the same instance? (examples)

06-58: Multidimensional Arrays

void Sort(int Data[]);

void main() {
    int A[];
    int B[];
    int C[][];

    /* Code to allocate space for A, B & C, and set initial values */

    Sort(A);
    Sort(B);
    Sort(C[2]);
}

06-59: Function Prototypes

  • int foo(int a, boolean b);
  • Add a description of this function to the function environment

06-60: Function Prototypes

  • int foo(int a, boolean b);
  • Add a description of this function to the function environment
    • Type of each parameter
    • Return type of the function
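A function environment entry of this shape, and the check it enables at a call site, could be sketched as below. FunctionEntry, FunctionEnv, addPrototype, and checkCall are hypothetical names, and types are written as plain strings to keep the example short (the slides use shared type objects instead).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative, types are plain strings.
class FunctionEntry {
    final String returnType;
    final List<String> paramTypes;

    FunctionEntry(String returnType, String... paramTypes) {
        this.returnType = returnType;
        this.paramTypes = Arrays.asList(paramTypes);
    }
}

class FunctionEnv {
    private final Map<String, FunctionEntry> entries = new HashMap<>();

    void addPrototype(String name, FunctionEntry entry) {
        entries.put(name, entry);
    }

    // Returns the type of the call expression, or throws with a message
    // describing the semantic error.
    String checkCall(String name, String... argTypes) {
        FunctionEntry f = entries.get(name);
        if (f == null) {
            throw new IllegalStateException("undefined function " + name);
        }
        List<String> actual = Arrays.asList(argTypes);
        if (!f.paramTypes.equals(actual)) {
            throw new IllegalStateException(
                name + " expects " + f.paramTypes + " but got " + actual);
        }
        return f.returnType;
    }
}
```

For the prototype int foo(int a, boolean b), the entry would be new FunctionEntry("int", "int", "boolean"); a call with argument types (int, int) is then rejected because the parameter lists differ.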

06-61: Function Prototypes

int foo(int a, boolean b);

[Figure: the type environment binds int, boolean, and void to INTEGERTYPE, BOOLEANTYPE, and VOIDTYPE; the function environment binds foo to a FUNCTION TYPE object that stores a return type and a list of parameters. Each environment has its own KeyStack containing a newscope marker.]

06-62: Function Prototypes

  • int PrintBoard(int board[][]);
  • Analyze types of input parameter
    • Add int[] and int[][] to the type environment, if not already there.

06-63: Class Definitions

class MyClass {
    int integerval;
    int Array[];
    boolean boolval;
}

06-64: Class Definitions

  • Analyze formal parameters & return type. Check against prototype (if there is one), or add function entry to function environment (if no prototype)
  • Begin a new scope in the variable environment
  • Add formal parameters to the variable environment
  • Analyze the body of the function, using modified variable environment
  • End current scope in variable environment
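The steps above can be sketched in Java as follows. Everything here is an illustrative assumption: FunctionChecker, prototype, define, and lookup are hypothetical names, types are plain strings, the variable environment is a simple stack of maps rather than a hash table with a key stack, and the function body is reduced to the list of variable names it refers to.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; all names are illustrative, and types are plain strings.
class FunctionChecker {
    // Function environment: name -> signature text such as "int(int,boolean)".
    final Map<String, String> functionEnv = new HashMap<>();
    // Variable environment kept as a stack of scopes, innermost first.
    final Deque<Map<String, String>> scopes = new ArrayDeque<>();
    final List<String> errors = new ArrayList<>();

    // formals maps parameter name -> type, in declaration order.
    static String signature(String returnType, Map<String, String> formals) {
        return returnType + "(" + String.join(",", formals.values()) + ")";
    }

    void prototype(String name, String returnType, Map<String, String> formals) {
        functionEnv.put(name, signature(returnType, formals));
    }

    // usedVariables stands in for the body: the names the body refers to.
    void define(String name, String returnType, Map<String, String> formals,
                List<String> usedVariables) {
        String sig = signature(returnType, formals);
        String declared = functionEnv.get(name);
        if (declared == null) {
            functionEnv.put(name, sig);              // no prototype: add an entry
        } else if (!declared.equals(sig)) {
            errors.add(name + " does not match its prototype");
        }
        scopes.push(new HashMap<>(formals));         // begin scope, add formals
        for (String v : usedVariables) {             // analyze the body
            if (lookup(v) == null) errors.add("undeclared variable " + v);
        }
        scopes.pop();                                // end scope
    }

    // Searches scopes from innermost to outermost.
    String lookup(String variable) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(variable);
            if (type != null) return type;
        }
        return null;
    }
}
```

Pass the formals as a java.util.LinkedHashMap so that the parameter order is preserved when the signature is compared against the prototype.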

06-69: Expressions

  • To analyze an expression:
    • Make sure the expression is well formed (no semantic errors)
    • Return the type of the expression (to be used by the calling function)

06-70: Expressions

  • Simple Expressions
    • 3 (integer literal)
      • This is a well formed expression, with the type int
    • true (boolean literal)
      • This is a well formed expression, with the type boolean

06-71: Expressions

  • Operator Expressions
    • 3 + 4
      • Recursively find types of left and right operand
      • Make sure the operands have integer types
      • Return integer type
    • x > 3
      • Recursively find types of left and right operand
      • Make sure the operands have integer types
      • Return boolean type

06-72: Expressions

  • Operator Expressions
    • (x > 3) || z
      • Recursively find types of left and right operand
      • Make sure the operands have boolean types
      • Return boolean type
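The recursive analysis of operator expressions might be sketched like this. The mini-AST (Expr, IntLit, BoolLit, VarRef, BinOp) and ExprChecker are hypothetical stand-ins for the course's AST and analyzer, only a handful of operators are handled, and any operator other than ||, &&, > and < is treated as integer arithmetic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-AST; the real AST classes are not shown in these slides.
abstract class Expr { }
class IntLit extends Expr { final int value; IntLit(int v) { value = v; } }
class BoolLit extends Expr { final boolean value; BoolLit(boolean v) { value = v; } }
class VarRef extends Expr { final String name; VarRef(String n) { name = n; } }
class BinOp extends Expr {
    final String op;
    final Expr left, right;
    BinOp(String op, Expr l, Expr r) { this.op = op; left = l; right = r; }
}

class ExprChecker {
    // One shared object per built-in type, so == is a valid type comparison.
    static final Object INT = "int", BOOLEAN = "boolean";
    final Map<String, Object> varEnv = new HashMap<>();
    final List<String> errors = new ArrayList<>();

    // Checks e for semantic errors and returns its type.
    Object analyze(Expr e) {
        if (e instanceof IntLit) return INT;
        if (e instanceof BoolLit) return BOOLEAN;
        if (e instanceof VarRef) {
            String name = ((VarRef) e).name;
            Object t = varEnv.get(name);
            if (t == null) {
                errors.add("undefined variable " + name);
                return INT;   // return something so analysis can continue
            }
            return t;
        }
        BinOp b = (BinOp) e;
        Object l = analyze(b.left);    // recursively find operand types
        Object r = analyze(b.right);
        boolean logical = b.op.equals("||") || b.op.equals("&&");
        boolean comparison = b.op.equals(">") || b.op.equals("<");
        Object want = logical ? BOOLEAN : INT;
        if (l != want || r != want) {
            errors.add("operands of " + b.op + " must be " + want);
        }
        return (logical || comparison) ? BOOLEAN : INT;
    }
}
```

Analyzing (x > 3) || z with x bound to INT and z bound to BOOLEAN returns BOOLEAN with no errors; analyzing 3 + true returns INT and records one error.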

06-73: Expressions – Variables

  • Simple (Base) Variables – x
  • Look up x in the variable environment
  • If the variable was in the variable environment, return the associated type.
  • If the variable was not in the variable environment, display an error.
    • Need to return something if variable is not defined – return type integer for lack of something better

06-74: Expressions – Variables

  • Array Variables – A[3]
    • Analyze the index, ensuring that it is of type int
    • Analyze the base variable. Ensure that the base variable is an Array Type
    • Return the type of an element of the array, extracted from the base type of the array
int A[];
/* initialize A, etc. */
x = A[3];

06-75: Expressions – Variables

  • Array Variables
    • Analyze the index, ensuring that it is of type int
    • Analyze the base variable. Ensure that the base variable is an Array Type
    • Return the type of an element of the array, extracted from the base type of the array
int B[][];
/* initialize B, etc. */
x = B[3][4];

06-76: Expressions – Variables

  • Array Variables
    • Analyze the index, ensuring that it is of type int
    • Analyze the base variable. Ensure that the base variable is an Array Type
    • Return the type of an element of the array, extracted from the base type of the array
int B[][];
int A[];
/* initialize A, B, etc. */
x = B[A[4]][A[3]];
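The array-variable rules apply recursively, which is what makes nested accesses such as B[A[4]][A[3]] work. A minimal Java sketch follows; ArrTy, Node, Num, Name, Subscript, and ArrayChecker are hypothetical names, and returning int after an error is an assumption that mirrors the fallback used for undefined simple variables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative only.
class ArrTy {
    final String name;
    final ArrTy element;   // null for non-array types
    ArrTy(String name, ArrTy element) { this.name = name; this.element = element; }
    boolean isArray() { return element != null; }
}

abstract class Node { }
class Num extends Node { final int value; Num(int v) { value = v; } }
class Name extends Node { final String id; Name(String id) { this.id = id; } }
class Subscript extends Node {
    final Node base, index;
    Subscript(Node base, Node index) { this.base = base; this.index = index; }
}

class ArrayChecker {
    // Shared type objects, so types can be compared with ==.
    static final ArrTy INT = new ArrTy("int", null);
    static final ArrTy INT_ARRAY = new ArrTy("int[]", INT);
    static final ArrTy INT_ARRAY_2D = new ArrTy("int[][]", INT_ARRAY);

    final Map<String, ArrTy> varEnv = new HashMap<>();
    final List<String> errors = new ArrayList<>();

    ArrTy analyze(Node n) {
        if (n instanceof Num) return INT;
        if (n instanceof Name) {
            ArrTy t = varEnv.get(((Name) n).id);
            if (t == null) {
                errors.add("undefined variable " + ((Name) n).id);
                return INT;
            }
            return t;
        }
        Subscript s = (Subscript) n;
        // Analyze the index, ensuring that it is of type int.
        if (analyze(s.index) != INT) errors.add("array index must be int");
        // Analyze the base variable, ensuring that it is an array type.
        ArrTy base = analyze(s.base);
        if (!base.isArray()) {
            errors.add("subscripted value is not an array");
            return INT;   // assumption: fall back to int after an error
        }
        // The type of the access is the element type of the base.
        return base.element;
    }
}
```

With A bound to INT_ARRAY and B bound to INT_ARRAY_2D, the expression B[A[4]][A[3]] analyzes to INT: each index A[i] has type int, B[A[4]] has type int[], and subscripting that yields int.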

06-77: Expressions – Variables

  • Array Variables
    • Analyze the index, ensuring that it is of type int
    • Analyze the base variable. Ensure that the base variable is an Array Type
  • Analyze the “if” statement
  • Analyze the “else” statement (if there is one)

06-82: Statements

  • Assignment statements
    • Analyze the left-hand side of the assignment statement
    • Analyze the right-hand side of the assignment statement
    • Make sure the types are the same
      • Can do this with a simple pointer comparison!

06-83: Statements

  • Block statements
    • Begin new scope in variable environment
    • Recursively analyze all children
    • End current scope in variable environment

06-84: Statements

  • Variable Declaration Statements
    • Look up type of variable
      • May involve adding types to type environment for arrays
    • Add variable to variable environment
    • If there is an initialization expression, make sure the type of the expression matches the type of the variable.

06-85: Types in Java

  • Each type will be represented by a class
  • All types will be subclasses of the “type” class:

class Type { }

06-86: Built-in Types

  • Only one internal representation of each built-in type
    • All references to INTEGER type will be a pointer to the same block of memory
  • How can we achieve this in Java?
    • Singleton software design pattern

06-87: Singletons in Java

  • Use a singleton when you want only one instantiation of a class
  • Every call to “new” creates a new instance, so prohibit calls to “new”!
    • Make the constructor private
    • Obtain instances through a static method

06-88: Singletons in Java

public class IntegerType extends Type {

    private IntegerType() { }

    public static IntegerType instance() {
        if (instance_ == null) {
            instance_ = new IntegerType();
        }
        return instance_;
    }

    static private IntegerType instance_;
}

06-89: Singletons in Java

Type t1;
Type t2;
Type t3;

t1 = IntegerType.instance();
t2 = IntegerType.instance();
t3 = IntegerType.instance();

  • t1, t2, and t3 all point to the same instance

06-90: Structured Types in Java

  • Built-in types (integer, boolean, void) do not need any extra information
    • An integer is an integer is an integer
  • Structured types (Arrays, classes) need more information
    • An array of what
    • What fields does the class have

06-91: Array Types in Java

  • Internal representation of array type needs to store the element type of the array

class ArrayType extends Type {
    public ArrayType(Type type) { type_ = type; }
    public Type type() { return type_; }
    public void settype(Type type) { type_ = type; }
    private Type type_;
}