Chapter 5
|
OVERVIEW
Semantic Analysis phase:
- connects variable definitions to their uses
- type checks expressions
- translates abstract syntax to a simpler representation for code generation.
The following will focus on symbol table construction and type checking.
Semantic analysis operates upon the parse tree. Before plunging into semantic analysis it is important to develop a solid understanding of the grammar that describes the syntax of our MiniJava language from which the parse tree is produced.
Concrete vs Abstract
Using SableCC version 3, the grammar has been written as concrete syntax for parsing, that which describes precisely what tokens and production rules are valid. Concrete syntax contains the keywords, semicolon, non-terminals, etc. that make up a production rule, an if statement can be defined in concrete syntax by:
{if} if l_parenthese exp r_parenthese [true]:statement else [false]:statement Abstract syntax, generally ignores keywords and those parts of the syntax that carry no information about the meaning of the phrase, from the if the abstract syntax would be:
{if} exp [true]:statement [false]:statement Working with abstract syntax, as discussed in Chapter 4, results in simpler parse trees, often with fewer productions. SableCC3 has transformations from concrete to abstract syntax, these have been applied to the concrete to produce the complete abstract syntax below.
A simple transformation of concrete plus to abstract plus illustrates the basic transformation syntax.
The concrete defines a plus exp and {-> New exp.plus(exp.exp,term.exp) } defines the transformation to an abstract plus, the [l] and [r] of the abstract syntax serve to identify the two expressions in the parse tree.
Concrete
exp {-> exp} =
{plus} exp plus term
{-> New exp.plus(exp.exp,term.exp) }Abstract
exp = {plus} [l]:exp [r]:expexp_list = list* {-> New exp_list([list.exp])} ;
list {-> exp*}= exp exp_rest* {-> [exp, exp_rest.exp]};
exp_rest {-> exp} = comma exp {-> exp};
exp_list = exp*;
The second example transforms lists of comma separated expressions:
ExpList ® Exp ExpRest* | є
ExpRest ® , Expa definition which is clumsy to use as a list, to a true list of expressions that are simple to traverse:
ExpList ® Exp*
Important point to recognize is that the concrete syntax still defines parsing rules, the abstract syntax defines the syntax tree. Both are needed.
| Concrete (for exp only) | Abstract (complete) |
|
Productions exp {-> exp} = {lt} exp lt term {-> New exp.lt(exp.exp,term.exp) } | {plus} exp plus term {-> New exp.plus(exp.exp,term.exp) } | {minus} exp minus term {-> New exp.minus(exp.exp,term.exp) } | {arrayindex} [array]:exp l_bracket [index]:exp r_bracket {-> New exp.arrayindex(array.exp, index.exp) } | {newarray} new int l_bracket exp r_bracket {-> New exp.newarray(exp) } | {term} term {-> term.exp } ; term {-> exp}= {times} term star value {-> New exp.times(term.exp,value.exp) } | {and} term and value {-> New exp.and(term.exp,value.exp) } | {value} value {-> value.exp }; value {-> exp}= {not} complement object {-> New exp.not(object.exp) } | {object} object { -> object.exp } ; object {-> exp} = {length} factor dot length {-> New exp.length(factor.exp) } | {methodcall} factor dot id l_parenthese exp_list r_parenthese {-> New exp.methodcall(factor.exp, id, exp_list) } | {factor} factor {-> factor.exp } ; factor {-> exp} = {new} new id l_parenthese r_parenthese {-> New exp.new(id) } | {integerliteral} integer_literal {-> New exp.integerliteral(integer_literal) } | {trueliteral} true {-> New exp.trueliteral(true) } | {falseliteral} false {-> New exp.falseliteral(false) } | {id} id {-> New exp.id(id) } | {this} this {-> New exp.this(this) } | {exp} l_parenthese exp r_parenthese {-> exp.exp}; |
Abstract Syntax Tree program = main_class class_decl*; main_class = [classname]:id [arg]:id statement; class_decl = id var_decl* method_decl* | {extends} [classname]:id [extend]:id var_decl* method_decl*; var_decl = type id; method_decl = type id formal_list var_decl* statement* exp; statement = {statementlist} statement* | {if} exp [true]:statement [false]:statement | {while} exp statement | {println} exp | {assign} id exp | {array_assign} id [index]:exp [r]:exp; formal_list = type id formal_rest* | {empty}; formal_rest = type id; type= {int_array} | {boolean} | {int} | {id} id; exp = {and} [l]:exp [r]:exp | |
Exercise
- What do you think the following means as a transformation: value {-> exp}
- Most alternations for exp appear in the abstract syntax (e.g. plus), why doesn't value, factor, term, etc.?
- What would be the abstract parse of:
if (x==12+y) {
x=x+1;
y=y*x;
} else
x=5;
- Does concrete syntax: {exp} l_parenthese exp r_parenthese {-> exp.exp}
have any corresponding abstract syntax? Explain.
Example: (4*5)
Visiting the Parse Tree Nodes
Since the main point of writing the grammar is to visit do something useful at nodes on the generated parse tree, it is important to understand how nodes are visited. SableCC automatically generates from the grammar a DepthFirstAdapter (a visitor pattern) to invoke methods corresponding to tree nodes; the key then is how to connect the grammar with methods.
For the plus abstract syntax:
exp = {plus} [l]:exp [r]:exp
the corresponding method would be:
// Exp L, R;
public void caseAPlusExp(APlusExp node) {
node.getL().apply(this);
node.getR().apply(this);
}
- The method name and parameter is a combination of the production (exp) and alternative (plus). The abstract grammar labels [l] and [r] identify the two exp's which are accessed by getL() and getR() respectively.
- node.getL().apply(this); calls the appropriate method for the type of node getL(), which may be an integer literal (e.g. 4), an identifier or another exp. The key purpose of the apply() method is to continue calling methods until the parse tree nodes have been visited.
The parse tree at the APlustExp node for the string 4+3*2
APlusExp 4+3*2
L / \ R
AIntegerLiteral ATimesExp
| L / \ R
4 AIntegerLiteral AIntegerLiteral
| |
3 2Exercise
- What is the method signature for the grammar rule:
statement = {while} exp statement
5.1 SYMBOL TABLES
The symbol table stores attributes such as type and storage location associated to:
- declared constants
- method parameters
- instance variables
- local variables
- methods
- classes
The declarations are defined in the abstract MiniJava grammar as:
main_class = [classname]:id [arg]:id statement;
class_decl = id var_decl* method_decl* |
{extends} [classname]:id [extend]:id var_decl* method_decl*;
var_decl = type id;
method_decl = type id formal_list var_decl* statement* exp;
Essentially the symbol table is a dictionary structure (or a hash table) which contains:
- key is the name of a class, method, variable, etc.
- value is the object referenced by the key.
In a language such as MiniJava, where all symbols must be declared, the symbol table holds the names of all declared symbols. The symbol table organization corresponds directly to the organization of the language. Consider the following program and its hierarchical organization at right:
class A {
public static void main(String [] a) {
System.out.println(new B().Bf(1,2));
}
}
class B {
int instance1;
public int Bf(int formal1, int formal2) {
int local1;
local1=1;
instance1=local1;
if (formal1 < 1)
local1 = 1;
else
local1 = formal1 *
(this.Bf(formal1-1, formal2));
return local1;
}
}SymbolTable
Class
A
Globals
Methods
main
Parameters
a String[]
Variables
B
Globals
instance1
Methods
Bf int
Parameters
formal1 int
formal2 int
Variables
local1 intExercise
- What are the declarations of the above program?
- Sketch a reasonable organization for the bold-face symbols using hash tables, lists, etc.
SYMBOL TABLE ORGANIZATION
MiniJava has three major declarations: class, method and variable. Four classes to implement symbol tables corresponding to the declarations are then: SymbolTable, Class, Method, and Variable. A hierarchy of hash tables organize program symbols analogous to the declaration structure.
Examine the program above to observe the hierarchical relationship of declarations.
The relationship of each class implementing the SymbolTable is illustrated at right where, for example, SymbolTable class depends on Method and Class class definitions, Class depends on Method and Variable, etc.
Type - One key point is that each node is typed, it is an instance of a class. In the parse tree below at the APlustExp node for the string 4+3*2
APlusExp
L / \ R
AIntegerLiteral ATimesExp
| L / \ R
4 AIntegerLiteral AIntegerLiteral
| |
3 2each node is typed (e.g. APlusExp).
Exercise
- Examine the minijava\node sub-directory. Find APlusExp definition.
SymbolTable - primarily responsible for managing classes where each class name is unique and serves as a hash key. Fields are:
- Hashtable hashtable; - container for every Class object in the file keyed by the string identifying the Class.
package SymbolTable; public class SymbolTable {
private Hashtable hashtable;
public SymbolTable()
public boolean addClass(String id, String parent)
public Class getClass(String id)
public boolean containsClass(String id)
public Node getVarType(Method m, Class c, String id)
public Method getMethod(String id, String classScope)
public Node getMethodType(String id, String classScope)
public boolean compareTypes(Node t1, Node t2);Class - Each class requires a hash table each for methods and globals (instance variables) and the name of its parent. This has the effect of binding the class to parent, methods and globals. Fields are:
- Hashtable methods; - methods of the class
- Hashtable globals; - instance variables of the class
- String id; - the class name
- String parent; - the class parent, note this is the name, the actual parent Class object is stored in the SymbolTable hashtable keyed by the parent name.
- Node type; - the class type, always AIdType from the grammar:
type= {int_array}
| {boolean}
| {int}
| {id} id;
package SymbolTable; public class Class {
String id;
Hashtable methods;
Hashtable globals;
String parent;
Node type;public Class(String id, String parent)
public String getId(){ return id; }
public Node Type()
public boolean addMethod(String id, Node type)
public Enumeration getMethods()
public Method getMethod(String id)
public boolean addVar(String id, Node type)
public Variable getVar(String id)
public boolean containsVar(String id)
public boolean containsMethod(String id)
public String parent()
public void setVarOffset(String id, int offset)
public int getVarOffset(String id)Method - Each method requires a hash table for local variables and a vector for formal parameters. This has the effect of binding the method to its formal parameters and local variables. Fields are:
- Hashtable vars; - variables
- Vector params; - parameters
- String id; - the method name
- Node type; - the method type from the grammar:
type= {int_array}
| {boolean}
| {int}
| {id} id;
package SymbolTable; public class Method {
String id;
Node type;
Vector params;
Hashtable vars;public Method(String id, Node type)
public String getId()
public Node Type()
public Variable getLocal(String id)
public boolean addParam(String id, Node type)
public Variable getParam(String id)
public void setParamOffset(String id, int offset)
public int getParamOffset(String id)
public Enumeration getParams()
public Variable getParamAt(int i)
public boolean containsParam(String id)
public Variable getVar(String id)
public boolean addVar(String id, Node type)
public boolean containsVar(String id)
public Enumeration getVars()
public void setVarOffset(String id, int offset)
public int getVarOffset(String id)Variable - Each variable requires an offset that refers to a starting location allocated for storing the value of the variable. In some architectures variables can be on a stack or heap in which variables would typically address a location using a register + offset such as a stack pointer + offset. The offset is computed when the method call frame is constructed, the effect is to bind each variable to type and location. Fields are:
- String id; - the variable name
- int offset; - assigned offset
- Node type; - the method type from the grammar:
type= {int_array}
| {boolean}
| {int}
| {id} id;
package SymbolTable; public class Variable {
String id;
int offset;
Node type;public Variable(String id, Node type)
public String id()
public Node Type()
public void setOffset(int offset)
public int getOffset()
Exercise - Using the definitions for package SymbolTable, diagram the hash tables with the following program declarations:
class A {
int instanceA;
public void Af(int formalA1) {
int localA;
}
}
class B extends A {
int instanceB;
public int Bf1(int formalB1, int formalB2) {
int localB;
}
public int Bf2(int formalB1, int formalB2) {
int localB;
}
}
public class SymbolTable {
private Hashtable hashtable;public class Class {
String id;
Hashtable methods;
Hashtable globals;
String parent;
Node type;public class Method {
String id;
Node type;
Vector params;
Hashtable vars;public class Variable {
String id;
int offset;
Node type;
BUILDING A SYMBOL TABLE
Building a symbol table requires:
- visiting each parse tree node that declares a class, method or variable
- class declarations - add instance variables and methods
- method declarations - add formal parameters and local variables
- variable declarations - add type
The MiniJava grammar for these declarations and the type is:
main_class = [classname]:id [arg]:id statement;
class_decl = id var_decl* method_decl* |
{extends} [classname]:id [extend]:id var_decl* method_decl*;
var_decl = type id;
method_decl = type id formal_list var_decl* statement* exp;type= {int_array}
| {boolean}
| {int}
| {id} id;Visiting each node using SableCC DepthFirstAdapter visitor pattern is a matter of defining a method with the correct signature. The method naming convention can be seen by the example in the right column. Generally methods should be named by:
caseALabelProduction(ALabelProduction node)
for:
class_decl = id var_decl* method_decl*
is defined as:
void caseAClassDecl(AClassDecl node)
Class declaration
class_decl = id var_decl* method_decl*
The method body adds the class but does not directly add method or variable declarations to the symbol table. The node.apply(this) method visits the node, in this case the list of methods and variables are visited and added to the current class.
public void caseAClassDecl(AClassDecl node) {
if(!symbolTable.addClass( node.getId().toString(), null)){
System.out.println("Class " + node.getId().toString() + "is already defined" );
System.exit(-1);
}
currClass = symbolTable.getClass(node.getId().toString());LinkedList v = node.getVarDecl();
Iterator iv = v.iterator();
while ( iv.hasNext() ) ((AVarDecl)iv.next()).apply(this);LinkedList m = node.getMethodDecl();
Iterator im = m.iterator();
while ( im.hasNext() ) ((AMethodDecl)im.next()).apply(this);
}
- node.getId().toString() - returns the class_decl id as a string.
- symbolTable.addClass( node.getId().toString(), null)) - returns true if the class has been added to the symbolTable.
Variable declaration
var_decl = type id;
- Variables can be defined in classes (instance variables) or methods (locals). Note that the type of the node is computed by visiting the type by: node.getType().apply(this). Type may be primitive or a class.
- When currMethod == null is true an instance variable is being defined so add variable to a class.
- When currMethod == null is false a local method variable is being defined so add variable to a method.
public void caseAVarDecl(AVarDecl node) {
node.getType().apply(this); Node t = node.getType(); String id = node.getId().toString(); if (currMethod == null){ if (!currClass.addVar(id,t)){ System.out.println(id + "is already defined in " + currClass.getId()); System.exit(-1); } } else { if (!currMethod.addVar(id,t)){ System.out.println(id + "is already defined in " + currClass.getId() + "." + currMethod.getId()); System.exit(-1); } } }
Type int
type= {int_array}
| {boolean}
| {int}
| {id} id;As method and variable nodes are visited, type is determined and bound to the identifying symbol. The following sets the type of the node to AIntType, for example in: int x.
public void caseAIntType(AIntType node) {
setType(node, new AIntType());
}
5.2 TYPE-CHECKING MiniJava
Type checking is not part of the context free grammar; type checking is context sensitive, for example, a variable must be declared in the current scope.
Type checking uses the symbol table or implicit type (e.g. && result is boolean) to determine if each use of a symbol conforms with its declaration; an variable defined as an int is only used as an int, for example. The basic strategy is to visit each statement and expression node, checking each to verify typing.
The grammar for and expressions hints at the type checking steps needed; the left and right sides of the expression must both be type checked (by apply(this) on left and right expressions) and be boolean type (ABooleanType from the grammar). The type of the and is set as boolean.
exp = {lt} [l]:exp [r]:exp |
{and} [l]:exp [r]:exp;
public void caseAAndExp(AAndExp node) {
node.getL().apply(this);
if (! ((Node)getType(node.getL()) instanceof ABooleanType) ) { System.out.println("L side of And must be of type Boolean"); System.exit(-1); }
node.getR().apply(this); if (! ((Node)getType(node.getR()) instanceof ABooleanType) ) { System.out.println("R side of And must be of type Boolean"); System.exit(-1); }
setType(node, new ABooleanType()); }true && (3 < 2) AAndExp
L / \ R
ABooleanExp ALtExp
| L / \ R
true AIntegerLiteral AIntegerLiteral
| |
3 2Exercise
- Give a definition for the visitor method of ALtExp patterned after AAndExp.
type= {int_array}
| {boolean}
| {int}
| {id} id;Assignment
Note that the symbol table is not often needed for expressions because operands must be of some predetermined type for operations such as +, -, *, &&, etc. The symbol table is needed for assignment and method calls where user-defined types (e.g. objects) are allowed.
Assignment requires that left and right-hand sides are the same type. For example:
a = b;
requires that a and b are the same type which could be any of those listed above including a class id.
The assignment statement grammar is defined as:
statement = {assign} id exp
The assignment semantics require an id that has been typed in a declaration (e.g. boolean foo;) and an expression that must match the type of id.
public void caseAAssignStatement(AAssignStatement node) {
node.getExp().apply(this);
Node t1 = symbolTable.getVarType(currMethod,currClass,node.getId().toString()); Node t2 = getType(node.getExp());
if (symbolTable.compareTypes(t1,t2)==false){ System.out.println("Type error in assignment to "+node.getId().toString()); System.exit(0); } }
- node.getExp().apply(this); - visits child nodes to determine type of expression.
- symbolTable.getVarType(currMethod,currClass,node.getId().toString()); - looks up the type of the variable id in the current class and method; id can be a method parameter or local variable, or a class field.
- symbolTable.compareTypes(t1,t2) - returns true when the type of the two parameters is the same. Note that MiniJava uses name equivalence as the basis for type equivalence.
Exercise
class B { int a;
int m() {
int b;
b = 2;
a = b + 3;
b = a * 2 + b;
return 42;
}
}statement = {assign} id exp;
exp = {plus} [l]:exp [r]:exp |
{times} [l]:exp [r]:exp |
{integerliteral} integer_literal |
{id} id |
{new} id;type= {int_array}
| {boolean}
| {int}
| {id} id;
- What is the parse tree for the following as a statement:
- b = 2;
- a = b + 3;
- b = a * 2 + b;
- Add type to each node of the tree.
- Knowing that fields are part of the class declaration and local variables are part of the method declaration, why does the type of the id part of assignment statement need to be looked up?
Method calls
Method calls have the grammar:
exp = {methodcall} exp id exp_list
an example and tokens are: (new Fac()).ComputeFac( 5 );
- exp - new Fac() type check to verify that it is an AIdType, a class or object.
- id - ComputeFace with the class used to lookup method in symbol table, the method object is then used to lookup formal parameters and type. The type of formal and actual parameters must agree.
- exp_list - 5 is applied to determine type (instead of 5 it could be a method call, etc.) which is compared with the formal parameter in that position.
Consider a method signature and invocation:
void M (int a, int b, boolean c)
this.M (3, 2+5, true)The formals and actuals must agree in type and number by position, and do in this case.
// Exp Id ExpList
public void caseAMethodcallExp(AMethodcallExp node) {
node.getExp().apply(this);
if (! (getType(node.getExp()) instanceof AIdType)) {
System.out.println("method "+ node.getId().toString()
+ "called on something that is not a class or Object.");
System.exit(-1);
}
String cname = getClass(node.getExp());
String mname = node.getId().toString();
Method calledMethod = symbolTable.getMethod(mname, cname);
Node formal = null; Node actual = null;
node.getExpList().apply(this);
for ( int i = 0; i < ((AExpList)node.getExpList()).getExp().size(); i++ ) {
if (calledMethod.getParamAt(i)!=null)
formal = calledMethod.getParamAt(i).Type();
actual = getType( (Node)((AExpList)node.getExpList()).getExp().get(i));
if (!symbolTable.compareTypes(formal, actual)){
System.out.println("Type Error in arguments passed to " + cname+"." +mname);
System.exit(-1);
}
}
setType(node, symbolTable.getMethodType(mname,cname));
}Most of the work is matching formals with actuals to check that both are the same type.
void M (int a, int b, boolean c) formals
this.M (3, 2+5, true) actualsexp = {methodcall} exp id exp_list
- node.getExp().apply(this); - compute type of the exp
- getType(node.getExp()) instanceof AIdType - receiver must be id type, not a primitive type.
- symbolTable.getMethod(mname, cname); - lookup the method object in the symbol table
- node.getExpList().apply(this); - compute the type of all exp's in the list.
- formal = calledMethod.getParamAt(i).Type() - get type of the ith formal parameter.
- actual = getType( (Node)((AExpList)node.getExpList()).getExp().get(i)) - get the type of the ith actual parameter
- symbolTable.compareTypes(formal, actual) - is formal and actual the same?
- setType(node, symbolTable.getMethodType(mname,cname)); - If all parameters are same type, set the method type to that from method declaration in symbol table.
type= {int_array}
| {boolean}
| {int}
| {id} id;Exercise: use void M (int a, int b, boolean c) formals
this.M (3, 2+5, true) actuals
- Why in 2. be the exp part of a method call be an id type?
- The method call node has the id of the method called. Why is 3. needed to look up the method in a class declaration?
- How does 4. compute the type of each expression in the list?
- In 5. calledMethod is an object of class Method. What is formal when i is 1?
- What does 6. do?
- Explain 8.
Putting it together
Building the symbol table and type checking is done below as two separate operations after the parse tree is constructed. The key steps are:
- Start ast = parser.parse(); - parses and constructs the parse tree referenced by ast.
- ast.apply(bstv); - invokes the DepthFirstAdapter apply method of BuildSymbolTable object to the root node of the parse tree. BuildSymbolTable constructs the symbol table.
- ast.apply(new TypeCheckVisitor(bstv.getSymbolTable())); - constructs a TypeCheckVisitor object (a DepthFirstAdapter) passing the symbol table. Invokes the DepthFirstAdapter apply method of TypeCheckVisitor object to the root node of the parse tree.
import minijava.lexer.*;
import minijava.node.*;
import minijava.parser.*;
import java.io.*;
import BuildSymbolTableVisitor.BuildSymbolTableVisitor;
import TypeCheckVisitor.TypeCheckVisitor;
public class Main{
public static void main(String[] args){
try{
Lexer lexer = new Lexer(new PushbackReader(
new InputStreamReader(new FileInputStream(args[0])), 1024));
Parser parser = new Parser(lexer);
Start ast = parser.parse();
BuildSymbolTableVisitor bstv = new BuildSymbolTableVisitor();
ast.apply(bstv);
ast.apply(new TypeCheckVisitor(bstv.getSymbolTable()));
}
catch(Exception e){
System.out.println("Error: " + e.getMessage());
}
}
}
A few minor changes has been made to the operation of class Main and general organization:
- input file, rather using redirection of input the program reads the file name from the command line; it is now run by:
java -cp . Main file.java
- location is in the root directory rather than minijava; makes it simpler to locate, compile and execute.
- Homework 5 has been organized into multiple projects (directories) to better manage the growing complexity of the software project.
- A BlueJ project has been defined to provide better development tools, see Homework 5.