Chapter 5
Semantic Analysis

powered by FreeFind

Modified: 

OVERVIEW

Semantic Analysis phase:

The following will focus on symbol table construction and type checking.

Semantic analysis operates upon the parse tree. Before plunging into semantic analysis it is important to develop a solid understanding of the grammar that describes the syntax of our MiniJava language from which the parse tree is produced.

Concrete vs Abstract

Using SableCC version 3, the grammar has been written as concrete syntax for parsing, that which describes precisely what tokens and production rules are valid. Concrete syntax contains the keywords, semicolon, non-terminals, etc. that make up a production rule, an if statement can be defined in concrete syntax by:

{if}      if l_parenthese exp r_parenthese [true]:statement else [false]:statement

Abstract syntax, generally ignores keywords and those parts of the syntax that carry no information about the meaning of the phrase, from the if the abstract syntax would be:

{if} exp [true]:statement [false]:statement

Working with abstract syntax, as discussed in Chapter 4, results in simpler parse trees, often with fewer productions. SableCC3 has transformations from concrete to abstract syntax, these have been applied to the concrete to produce the complete abstract syntax below.

A simple transformation of concrete plus to abstract plus illustrates the basic transformation syntax.

The concrete defines a plus exp and {-> New exp.plus(exp.exp,term.exp) } defines the transformation to an abstract plus, the [l] and [r] of the abstract syntax serve to identify the two expressions in the parse tree.

Concrete

exp {-> exp} = 
   {plus}  exp plus term
            {-> New exp.plus(exp.exp,term.exp) }
Abstract

exp =  {plus}  [l]:exp [r]:exp
exp_list = list* {-> New exp_list([list.exp])} ;

list {-> exp*}= exp exp_rest* {-> [exp, exp_rest.exp]};
exp_rest {-> exp} = comma exp {-> exp};
 
exp_list = exp*;

    The second example transforms lists of comma separated expressions:

ExpList ® Exp ExpRest* | є
ExpRest ® , Exp

a definition which is clumsy to use as a list, to a true list of expressions that are simple to traverse:

ExpList ® Exp*

Important point to recognize is that the concrete syntax still defines parsing rules, the abstract syntax defines the syntax tree. Both are needed.

Concrete (for exp only) Abstract (complete)
Productions

 exp {-> exp} =           
                        {lt}    exp lt term
                                     {-> New exp.lt(exp.exp,term.exp) } | 
                        {plus}  exp plus term
                                     {-> New exp.plus(exp.exp,term.exp) } | 
                        {minus} exp minus term
                                     {-> New exp.minus(exp.exp,term.exp) } | 
                        {arrayindex} [array]:exp l_bracket [index]:exp r_bracket
                                     {-> New exp.arrayindex(array.exp, index.exp) } | 
                        {newarray} new int l_bracket exp r_bracket
                                     {-> New exp.newarray(exp) } | 
                        {term} term {-> term.exp }  ;  
term {-> exp}= 
                        {times} term star value
                                     {-> New exp.times(term.exp,value.exp) } |  
                        {and}   term and value
                                     {-> New exp.and(term.exp,value.exp) } | 
                        {value} value {-> value.exp }; 
value {-> exp}= {not} complement object {-> New exp.not(object.exp) } | 
                        {object} object { -> object.exp } ;  
object {-> exp} =
                        {length} factor dot length
                                     {-> New exp.length(factor.exp) } | 
                        {methodcall} factor dot id l_parenthese exp_list r_parenthese
                                     {-> New exp.methodcall(factor.exp, id, exp_list) } | 
                        {factor} factor {-> factor.exp } ; 
factor {-> exp} =
                        {new} new id l_parenthese r_parenthese
                                     {-> New exp.new(id) } | 
                        {integerliteral} integer_literal
                                     {-> New exp.integerliteral(integer_literal) } | 
                        {trueliteral} true
                                     {-> New exp.trueliteral(true) } | 
                        {falseliteral} false
                                     {-> New exp.falseliteral(false) } | 
                        {id} id
                                     {-> New exp.id(id) } | 
                        {this} this {-> New exp.this(this) } | 
                        {exp} l_parenthese exp r_parenthese {-> exp.exp};
Abstract Syntax Tree
 
program = main_class class_decl*;
main_class = [classname]:id [arg]:id statement;
class_decl =
     id var_decl* method_decl* |
             {extends} [classname]:id
                             [extend]:id var_decl* method_decl*;
var_decl = type id;
method_decl = type id formal_list var_decl* statement* exp;
statement =
             {statementlist} statement* |
             {if} exp [true]:statement [false]:statement |
             {while} exp statement |
             {println} exp |
             {assign} id exp |
             {array_assign} id [index]:exp [r]:exp;
formal_list = type id formal_rest* | {empty};
formal_rest = type id;
 
type=   {int_array} |
            {boolean}   |
            {int}        |
            {id} id;

exp =    {and}     [l]:exp [r]:exp |
             {lt}        [l]:exp [r]:exp |
             {plus}    [l]:exp [r]:exp |
             {minus} [l]:exp [r]:exp |
             {times}  [l]:exp [r]:exp |
             {length} exp                   |
             {arrayindex} [array]:exp [index]:exp |
             {methodcall} exp id exp_list |
             {integerliteral} integer_literal |
             {trueliteral} true |
             {falseliteral} false |
             {id} id |
             {this} this |
             {new} id |
             {newarray} exp |
             {not}     exp ;
exp_list = exp*;

Exercise

  1. What do you think the following means as a transformation: value {-> exp}
  2. Most alternations for exp appear in the abstract syntax (e.g. plus), why doesn't value, factor, term, etc.?
  3. What would be the abstract parse of:

        if (x==12+y)  {
                x=x+1;
                y=y*x;
        } else
                 x=5;      
     
  4. Does concrete syntax: {exp} l_parenthese exp r_parenthese {-> exp.exp}

have any corresponding abstract syntax? Explain.

Example: (4*5)

Visiting the Parse Tree Nodes

Since the main point of writing the grammar is to visit do something useful at nodes on the generated parse tree, it is important to understand how nodes are visited. SableCC automatically generates from the grammar a DepthFirstAdapter (a visitor pattern) to invoke methods corresponding to tree nodes; the key then is how to connect the grammar with methods.

For the plus abstract syntax:

exp =  {plus}  [l]:exp [r]:exp

the corresponding method would be:

 
 // Exp L, R;
public void caseAPlusExp(APlusExp node) {
     node.getL().apply(this);
     node.getR().apply(this);
}

The parse tree at the APlustExp node for the string 4+3*2

               APlusExp                4+3*2
            L  /          \   R
   AIntegerLiteral    ATimesExp
       |                 L  /          \ R
      4        AIntegerLiteral   AIntegerLiteral
                          |                   |
                         3                   2

Exercise

statement = {while} exp statement

 

5.1 SYMBOL TABLES

The symbol table stores attributes such as type and storage location associated to:

The declarations are defined in the abstract MiniJava grammar as:

main_class =    [classname]:id [arg]:id statement;

class_decl =                    id var_decl* method_decl* |
                     {extends} [classname]:id [extend]:id var_decl* method_decl*;

var_decl =       type id;

method_decl = type id formal_list var_decl* statement* exp;

 

Essentially the symbol table is a dictionary structure (or a hash table) which contains:

In a language such as MiniJava, where all symbols must be declared, the symbol table holds the names of all declared symbols. The symbol table organization corresponds directly to the organization of the language. Consider the following program and its hierarchical organization at right:

class A {
   public static void main(String [] a) {
     System.out.println(new B().Bf(1,2));
   }
}

class B {
   int instance1;

   public int Bf(int formal1, int formal2) {
     int local1;

     local1=1;

     instance1=local1;

     if (formal1 < 1)
        local1 = 1;
     else
        local1 = formal1 *
             (this.Bf(formal1-1, formal2));
     return local1;
   }
}
        SymbolTable
             Class
             A
                 Globals
                 Methods
                    main 
                       Parameters
                           a    String[]
                       Variables
             B
                 Globals
                    instance1
                 Methods
                    Bf      int
                       Parameters
                          formal1   int
                          formal2   int
                       Variables
                          local1      int

Exercise


SYMBOL TABLE ORGANIZATION

MiniJava has three major declarations: class, method and variable. Four classes to implement symbol tables corresponding to the declarations are then: SymbolTable, Class, Method, and Variable. A hierarchy of hash tables organize program symbols analogous to the declaration structure.

Examine the program above to observe the hierarchical relationship of declarations.

The relationship of each class implementing the SymbolTable is illustrated at right where, for example, SymbolTable class depends on Method and Class class definitions, Class depends on Method and Variable, etc.

Type - One key point is that each node is typed, it is an instance of a class. In the parse tree below at the APlustExp node for the string 4+3*2

               APlusExp               
            L  /            \   R
AIntegerLiteral    ATimesExp
       |                 L  /          \ R
      4    AIntegerLiteral   AIntegerLiteral
                          |                   |
                         3                   2

each node is typed (e.g. APlusExp).

Exercise

 

SymbolTable - primarily responsible for managing classes where each class name is unique and serves as a hash key. Fields are:

package SymbolTable; 

public class SymbolTable {
    private Hashtable hashtable;

    public SymbolTable()
    public boolean addClass(String id, String parent)
    public Class getClass(String id)
    public boolean containsClass(String id)
    public Node getVarType(Method m, Class c, String id)
    public Method getMethod(String id, String classScope)
    public Node getMethodType(String id, String classScope) 
    public boolean compareTypes(Node t1, Node t2); 

Class - Each class requires a hash table each for methods and globals (instance variables) and the name of its parent. This has the effect of binding the class to parent, methods and globals. Fields are:

package SymbolTable;

public class Class {
    String id;
    Hashtable methods;
    Hashtable globals;
    String parent;
    Node type;

    public Class(String id, String parent)
    public String getId(){ return id; }   
    public Node Type()
    public boolean addMethod(String id, Node type)
    public Enumeration getMethods()
    public Method getMethod(String id)
    public boolean addVar(String id, Node type)
    public Variable getVar(String id)
    public boolean containsVar(String id)
    public boolean containsMethod(String id)
    public String parent()
    public void setVarOffset(String id, int offset)
    public int getVarOffset(String id)

Method - Each method requires a hash table for local variables and a vector for formal parameters. This has the effect of binding the method to its formal parameters and local variables. Fields are:

package SymbolTable;

public class Method {
    String id;
    Node type;
    Vector params;
    Hashtable vars;

    public Method(String id, Node type)
    public String getId()
    public Node Type()
    public Variable getLocal(String id)
    public boolean addParam(String id, Node type)     
    public Variable getParam(String id)
    public void setParamOffset(String id, int offset)
    public int getParamOffset(String id)
    public Enumeration getParams()
    public Variable getParamAt(int i)
    public boolean containsParam(String id)     
    public Variable getVar(String id)
    public boolean addVar(String id, Node type)
    public boolean containsVar(String id)
    public Enumeration getVars()
    public void setVarOffset(String id, int offset)
    public int getVarOffset(String id)

Variable - Each variable requires an offset that refers to a starting location allocated for storing the value of the variable. In some architectures variables can be on a stack or heap in which variables would typically address a location using a register + offset such as a stack pointer + offset. The offset is computed when the method call frame is constructed, the effect is to bind each variable to type and location. Fields are:

package SymbolTable;

public class Variable {
    String id;
    int offset;
    Node type;   

    public Variable(String id, Node type)
    public String id()
    public Node Type()
    public void setOffset(int offset)
    public int getOffset()

 

Exercise - Using the definitions for package SymbolTable, diagram the hash tables with the following program declarations:

class A {
   int instanceA;
   public void Af(int formalA1) {
     int localA;
   }
}

class B extends A {
   int instanceB;

   public int Bf1(int formalB1, int formalB2) {
     int localB;
   }
   public int Bf2(int formalB1, int formalB2) {
     int localB;
   }
}
public class SymbolTable {
    private Hashtable hashtable;
public class Class {
    String id;
    Hashtable methods;
    Hashtable globals;
    String parent;
    Node type;
public class Method {
    String id;
    Node type;
    Vector params;
    Hashtable vars;
public class Variable {
    String id;
    int offset;
    Node type;

   

BUILDING A SYMBOL TABLE

Building a symbol table requires:

The MiniJava grammar for these declarations and the type is:

main_class =    [classname]:id [arg]:id statement;
class_decl =     id var_decl* method_decl* |
                      {extends} [classname]:id [extend]:id var_decl* method_decl*;
var_decl =       type id;
method_decl = type id formal_list var_decl* statement* exp;

type= {int_array}
       | {boolean}
       | {int}       
       | {id} id;

Visiting each node using SableCC DepthFirstAdapter visitor pattern is a matter of defining a method with the correct signature. The method naming convention can be seen by the example in the right column. Generally methods should be named by:

caseALabelProduction(ALabelProduction node)

for:

class_decl =     id var_decl* method_decl*

is defined as:

void caseAClassDecl(AClassDecl node)

Class declaration

class_decl =     id var_decl* method_decl*

The method body adds the class but does not directly add method or variable declarations to the symbol table. The node.apply(this) method visits the node, in this case the list of methods and variables are visited and added to the current class.

 public void caseAClassDecl(AClassDecl node) {
   if(!symbolTable.addClass( node.getId().toString(), null)){
      System.out.println("Class " + node.getId().toString() + "is already defined" );
      System.exit(-1);
   }
   currClass = symbolTable.getClass(node.getId().toString());

   LinkedList v = node.getVarDecl();
   Iterator iv = v.iterator();
   while ( iv.hasNext() ) ((AVarDecl)iv.next()).apply(this);

   LinkedList m = node.getMethodDecl();
   Iterator im = m.iterator();
   while ( im.hasNext() ) ((AMethodDecl)im.next()).apply(this);
}

 

Variable declaration

var_decl = type id;

  public void caseAVarDecl(AVarDecl node) {   
     node.getType().apply(this);

     Node t =  node.getType();

     String id =  node.getId().toString();

 

      if (currMethod == null){

              if (!currClass.addVar(id,t)){

                  System.out.println(id + "is already defined in " + currClass.getId());

                  System.exit(-1);

              }

      } else {            

                if (!currMethod.addVar(id,t)){

                      System.out.println(id + "is already defined in " + currClass.getId() + "." +

                                      currMethod.getId());

                      System.exit(-1);

               }

      }

  }

 

Type int

type= {int_array}
       | {boolean}
       | {int}       
       | {id} id;

As method and variable nodes are visited, type is determined and bound to the identifying symbol. The following sets the type of the node to AIntType, for example in: int x.

public void caseAIntType(AIntType node) {
     setType(node, new AIntType());
}


5.2 TYPE-CHECKING MiniJava

Type checking is not part of the context free grammar; type checking is context sensitive, for example, a variable must be declared in the current scope.

Type checking uses the symbol table or implicit type (e.g. && result is boolean) to determine if each use of a symbol conforms with its declaration; an variable defined as an int is only used as an int, for example. The basic strategy is to visit each statement and expression node, checking each to verify typing.

The grammar for and expressions hints at the type checking steps needed; the left and right sides of the expression must both be type checked (by apply(this) on left and right expressions) and be boolean type (ABooleanType from the grammar). The type of the and is set as boolean.

exp = {lt}    [l]:exp [r]:exp |
         {and} [l]:exp [r]:exp;
 

  public void caseAAndExp(AAndExp node) {


    node.getL().apply(this);
    if (! ((Node)getType(node.getL()) instanceof ABooleanType) ) {

       System.out.println("L side of And must be of type Boolean");

       System.exit(-1);

    }


    node.getR().apply(this);

    if (! ((Node)getType(node.getR()) instanceof ABooleanType) ) {

       System.out.println("R side of And must be of type Boolean");

       System.exit(-1);

    }


    setType(node, new ABooleanType());

  }

            true && (3 < 2)

               AAndExp               
            L  /            \   R
ABooleanExp       ALtExp
   |                       L  /     \ R
 true    AIntegerLiteral   AIntegerLiteral
                          |                   |
                         3                   2

Exercise

 

type= {int_array}
       | {boolean}
       | {int}       
       | {id} id;

Assignment

Note that the symbol table is not often needed for expressions because operands must be of some predetermined type for operations such as +, -, *, &&, etc. The symbol table is needed for assignment and method calls where user-defined types (e.g. objects) are allowed.

Assignment requires that left and right-hand sides are the same type. For example:

a = b;

requires that a and b are the same type which could be any of those listed above including a class id.

The assignment statement grammar is defined as:

statement = {assign} id exp

The assignment semantics require an id that has been typed in a declaration (e.g. boolean foo;) and an expression that must match the type of id.

  public void caseAAssignStatement(AAssignStatement node) {

    node.getExp().apply(this);
 

    Node t1 = symbolTable.getVarType(currMethod,currClass,node.getId().toString());

    Node t2 = getType(node.getExp());


    if (symbolTable.compareTypes(t1,t2)==false){

            System.out.println("Type error in assignment to "+node.getId().toString());  

            System.exit(0);

    }

  }

Exercise

class B {

 int a;

 int m() {

  int b;

  b = 2;

  a = b + 3;

  b = a * 2 + b;

  return 42;
 }
}

statement = {assign} id exp;

exp = {plus}  [l]:exp [r]:exp |
         {times} [l]:exp [r]:exp |
         {integerliteral} integer_literal |
         {id} id |
         {new} id;

type= {int_array}
       | {boolean}
       | {int}       
       | {id} id;

 

Method calls

Method calls have the grammar:

exp = {methodcall} exp id exp_list

an example and tokens are:   (new Fac()).ComputeFac( 5 );

Consider a method signature and invocation:

void M (int a, int b, boolean c)
 this.M (3,     2+5,  true)

The formals and actuals must agree in type and number by position, and do in this case.

 // Exp Id ExpList
public void caseAMethodcallExp(AMethodcallExp node) {
   node.getExp().apply(this);

   if (! (getType(node.getExp()) instanceof AIdType)) {
       System.out.println("method "+ node.getId().toString()
                                  + "called on something that is not a class or Object.");
       System.exit(-1);
   }
   String cname = getClass(node.getExp());
   String mname = node.getId().toString();

   Method calledMethod = symbolTable.getMethod(mname, cname);

   Node formal = null;    Node actual = null;
   node.getExpList().apply(this);

   for ( int i = 0; i < ((AExpList)node.getExpList()).getExp().size(); i++ ) {
     if (calledMethod.getParamAt(i)!=null)
        formal = calledMethod.getParamAt(i).Type();
     actual = getType( (Node)((AExpList)node.getExpList()).getExp().get(i));

     if (!symbolTable.compareTypes(formal, actual)){
       System.out.println("Type Error in arguments passed to " + cname+"." +mname);
       System.exit(-1);
     }
   }
   setType(node, symbolTable.getMethodType(mname,cname));
}

Most of the work is matching formals with actuals to check that both are the same type.

     void M (int a, int b, boolean c)    formals
     this.M  (3,     2+5,  true)            actuals

exp = {methodcall} exp id exp_list

  1. node.getExp().apply(this); - compute type of the exp
     
  2. getType(node.getExp()) instanceof AIdType - receiver must be id type, not a primitive type.
     
  3. symbolTable.getMethod(mname, cname);  - lookup the method object in the symbol table
     
  4. node.getExpList().apply(this); - compute the type of all exp's in the list.
     
  5. formal = calledMethod.getParamAt(i).Type() - get type of the ith formal parameter.
     
  6. actual = getType( (Node)((AExpList)node.getExpList()).getExp().get(i)) - get the type of the ith actual parameter
     
  7. symbolTable.compareTypes(formal, actual) - is formal and actual the same?
     
  8. setType(node, symbolTable.getMethodType(mname,cname)); - If all parameters are same type, set the method type to that from method declaration in symbol table.

type= {int_array}
       | {boolean}
       | {int}       
       | {id} id;

Exercise: use    void M (int a, int b, boolean c)    formals
                         this.M  (3,     2+5,  true)            actuals

 

Putting it together

Building the symbol table and type checking is done below as two separate operations after the parse tree is constructed. The key steps are:

  1. Start ast = parser.parse(); - parses and constructs the parse tree referenced by ast.
     
  2. ast.apply(bstv); - invokes the DepthFirstAdapter apply method of BuildSymbolTable object to the root node of the parse tree. BuildSymbolTable constructs the symbol table.
     
  3. ast.apply(new TypeCheckVisitor(bstv.getSymbolTable())); - constructs a TypeCheckVisitor object (a DepthFirstAdapter) passing the symbol table. Invokes the DepthFirstAdapter apply method of TypeCheckVisitor object to the root node of the parse tree.
import minijava.lexer.*;
import minijava.node.*;
import minijava.parser.*;
import java.io.*;
import BuildSymbolTableVisitor.BuildSymbolTableVisitor;
import TypeCheckVisitor.TypeCheckVisitor;

public class Main{
   public static void main(String[] args){
      try{
            Lexer lexer = new Lexer(new PushbackReader(
                                       new InputStreamReader(new FileInputStream(args[0])), 1024));

            Parser parser = new Parser(lexer);

            Start ast = parser.parse();

            BuildSymbolTableVisitor bstv = new BuildSymbolTableVisitor();

            ast.apply(bstv);

            ast.apply(new TypeCheckVisitor(bstv.getSymbolTable()));
       }
       catch(Exception e){
                      System.out.println("Error: " + e.getMessage());
       }
  }
}

 

A few minor changes has been made to the operation of class Main and general organization:

java -cp . Main file.java