Chapter 4
|
OVERVIEW
Parsing recognizes whether a sentence is part of a grammar and generates a parse tree; we have seen in Homework 1 that useful work, such as interpreting the meaning or semantics of code can be performed while walking the node of the parse tree. This chapter will examine how to perform semantic actions for recursive descent parsers and compiler/compilers such as JavaCC and SableCC.
4.1 SEMANTIC ACTIONS
Interpreting the straightline programs required performing some action at each node corresponding to a terminal or nonterminal in the grammar. From Homework 1 example:
5 + a
required the following 3 apply functions to be called for each corresponding node of the syntax tree. The syntax tree was walked by other calls to apply functions such as the following to interpret the right-hand part of an OpExp:
this.interpret(exp.right)
new OpExp(
new NumExp(5),
Plus,
new IdExp("a")
)int interpret(IdExp exp)
{ return ((Integer)table.get(exp.id)).intValue();}
int interpret(NumExp exp)
{ return exp.num; }
int interpret(OpExp exp)
{ int left = this.interpret(exp.left);
int right = this.interpret(exp.right);
return left + right;
}
RECURSIVE DESCENT
Recursive descent parsers are hand-written from the grammar, semantic actions can be performed during the parse. For example the following interprets an F nonterminal by looking up the value of the ID token.
PROGRAM 4.1. Recursive-descent interpreter for part of Grammar 3.15. class Token {int kind; Object val;
Token(int k, Object v) {kind=k; val=v;}
}
final int EOF=0, ID=1, NUM=2, PLUS=3, MINUS=4, ...
int lookup(String id) { ... }
int F_follow[] = { PLUS, TIMES, RPAREN, EOF };
int F() {switch (tok.kind) {
case ID: int i=lookup( (String) (tok.val)); advance(); return i;
case NUM: int i=((Integer) (tok.val)) .intValue();
advance(); return i;
case LPAREN: eat(LPAREN);
int i = E ( ) ;
eatOrSkipTo(RPAREN, F_follow);
return i;
case EOF:
default: print("expected ID, NUM, or left-paren"); skipto(F_follow); return 0;
} }
AUTOMATICALLY GENERATED PARSERSJavaCC generates a parser from a set of grammar rules each of which can have a semantic action. When the parser is executed, the semantic actions are preformed whenever that production rule is reduced. For example, in the following JavaCC whenever an Exp is reduced, code for interpreting as Term + Term is executed (or Term - Term). A disadvantage to this approach is that semantic actions must occur as each rule is encountered, the compiler must analyze the program in exactly the order parsed; others are the use of specialized embedded code versus a general programming language making maintenance difficult.
Program 4.2 JavaCC version of a variant of Grammar 3.15 void Start() :
{ int i; }
{ i=Exp() <EOF> {System.out.println(i);}
}
int Exp() :
{ int a,i; }
{ a=Term()
( "+" i=Term() { a=a+i; }
| "-" i=Term() { a=a-i; }
) *
{ return a; }
}
int Term() :
{ int a,i; }
{ a=Factor()
( "*" i=Factor() { a=a*i; }
| "/" i=Factor() { a=a/i; }
) *
{ return a; }
}
int Factor() :
{ Token t; int i; }
{ t=<IDENTIFIER> { return lookup(t.image); }
| t=<INTEGER_LITERAL> { return Integer.parseInt(t.image); }
| "(" i=Exp() ")" { return i; }
}
4.2 ABSTRACT PARSE TREES
Recall that a parse tree contains the grammar rules used to parse a program. Grammar 4.3 is impractical for parsing due to ambiguities, but the parse tree of a+b*c could be generated by hand. Grammar 3.8 can be parsed but it considerably more complex.
a+b*32 Concrete syntax - Grammar 3.8 can be parsed, representing the concrete syntax of the grammar.
Abstract syntax - Though Grammar 4.3 is not suited for parsing it is much more readable and, after the parse tree is constructed, there are no ambiguities; the example syntax tree above correctly places b*32 to be performed rather than a+b. The parse tree represents the abstract syntax of the grammar.
Example - The following PROGRAM 4.4 for JavaCC, builds a syntax tree during a parse using Grammar 3.15; PROGRAM 4.5 implements functions that interpret while walking the tree.
Tree construction - the parse proceeds as follows:
- as each rule is reduced the Exp, Term, etc. rule is executed
- a Exp node is constructed and returned as an Exp.
PROGRAM 4.4. Building syntax trees for expressions. Exp Start() : { Exp e; }
{ e=Exp() { return e; }
}
Exp Exp() :
{ Exp e1,e2; }
{ e1=Term()
( "+" e2=Term() {e1=new PlusExp(e1,e2); }
| "-" e2=Term() { el=new MinusExp(e1,e2); }
) *
{ return e1; }
}
Exp Term() :
{ Exp e1,e2; }
{ e1=Factor()
( "*" e2=Factor() { el=new TimesExp(e1,e2); }
| "/' e2=Factor() { el=new DivideExp(e1,e2); }
) *
{ return e1; }
}
Exp Factor() :
{ Token t; Exp e; }
{ ( t=<IDENTIFIER> { return new Identifier(t.image); } |
t=<INTEGER_LITERAL> { return new IntegerLiteral(t.image); } |
'(' e=Exp() ')' {return e; } )
}Interpret
Recall that Start() is typically called from a Java main function; the parse tree root is returned by Start() which can then be manipulated in Java. The Exp class defines an eval() function that evaluates an Exp by recursively traversing the parse tree.
PROGRAM 4.5 Partial Exp class for PROGRAM 4.4 public abstract class Exp {
public abstract int eval();
}
public class PlusExp extends Exp {
private Exp e1,e2;
public PlusExp(Exp a1, Exp a2) { e1=a1; e2=a2; }
public int eval() {
return e1.eval()+e2.eval();
}
}POSITIONS (for error reporting, etc.)
Violations of grammar can be detected and reported during parsing; errors, such as type misuse (e.g. "snafu"*10), are detected during semantic analysis of the parse. One-pass compilers (i.e. make one pass over the source file) perform lexical analysis, parsing and semantic analysis simultaneously, simplifying error reporting (i.e. the error occurred at the current position in the source) but complicating the compiler (i.e. scanning, parsing, analysis and error reporting are entangled).
Separating the lexical analysis and parsing (where grammar errors are detected) from the semantic analysis (where type errors are detected) requires that the source file position of the text be added to each syntax tree node. When a semantic error is detected, the position serves to mark the location of the error in the source text.
4.3 VISITORS
PROGRAMs 4.4 and 4.5 implement an object-oriented approach to constructing and performing semantic action on an syntax tree. PROGRAM 4.5 defines classes for each production rule with constructors and an eval method to perform semantic actions (i.e. interpret the int value of the node) by walking the syntax tree to leaf nodes. PROGRAM 4.4 constructs the tree as parsing occurs; a trivial separate program would construct the tree and start the walk: the following prints the value of the expression parsed, for example 4+5 prints 9:
Exp e = Start(); System.out.println( e.eval() );
This approach presents problems for a compiler writer; we'll soon see that rather than evaluating an expression our semantic action will be to generate code for a Pentium or PowerPC or whatever. By the above approach the syntax and code generation are combined making it difficult to separate code generation for the various machines.
The text presents a Visitor pattern in the style of syntax-separate-from-interpretation. Basically, there are two classes defined outside the parsing:
- for abstract syntax
- constructor of a node for a rule, called during parsing
- accept method that calls a visitor method with the rule object as a parameter and may return a result (as in the interpreter).
- for semantic actions
- visit method that calls the accept method on rule elements which, in turn, call visit method on rule elements until leaf node
PROGRAMs 4.7 and 4.8 public abstract class Exp {
public abstract int accept( Visitor v);
}
public class PlusExp extends Exp {
public Exp e1,e2;
public PlusExp(Exp a1, Exp a2) { e1=a1; e2=a2; }
public int accept(Visitor v) {
return v.visit(this); }
}public class IntegerLiteral extends Exp {
public String f0;
public IntegerLiteral(String n0) { f0=n0; }
public int accept(Visitor v) {
return v.visit(this); }
}public interface Visitor {
public int visit(PlusExp n);
public int visit(IntegerLiteral n);
}public class Interpreter implements Visitor {
public int visit(PlusExp n) {
return n.e1.accept(this) + n.e2.accept(this); }public int visit(IntegerLiteral n) {
return Integer.parseInt(n.f0); }
}The syntax tree constructed during the parse of 4+5 and code to interpret:
PlusExp
e1 / \ e2
IntegerLiteral IntegerLiteral
fO | | fO
4 5Exp e = Start(); System.out.println( e.accept( new Interpreter() ) );
Exercise - Trace the execution of the above, assume that Start() returns a PlusExp object.
ABSTRACT SYNTAX FOR MiniJava
SableCC automatically generates syntax tree classes, code to build syntax trees and visitor classes from the grammar rules. JavaCC has auxiliary modules to separately generate syntax tree. Given the simpler approach, we will use SableCC.
Example - StraightLine Interpreter
SableCC generates and automatically traverses (walks) a syntax tree; calling methods at each node. The methods called are generated from the grammar but do nothing. Our task is to over-ride generated methods with ones that perform some useful interpreter action.
First the straightline grammar. It has been modified to remove several shift/reduce problems (e.g. Stm ; Stm and Exp Binop Exp) and introduce standard binary operator precedence and associativity:
Package StraightLine;
Tokens
semi = ';';
lpar = '(';
rpar = ')';
comma = ',';
plus = '+';
minus = '-';
times = '*';
div = '/';
assign = ':=';
print = 'print';
whitespace = (' ' | '\t' | 13 10 | 10 | 13 )+;
num = ['0'..'9']+;
id = ['a'..'z'] (['a'..'z'] | ['0'..'9'])*;
Ignored Tokens
whitespace;
Productions
stm = {compoundstm} stm semi onestm |
{onestm} onestm;
onestm={assignstm} id assign exp |
{printstm} print lpar explist rpar;
exp = {plusexp} exp plus term |
{minusexp} exp minus term |
{term} term |
{eseqexp} lpar stm comma exp rpar;
term = {timesterm} term times factor |
{divterm} term div factor |
{factor} factor;
factor= {idexp} id |
{numexp} num;
explist={pairexplist} exp comma explist |
{lastexplist} exp;
SableCC calls methods for each production in the syntax tree so that when an assignment node is visited a user-written assignment method is called. The method naming convention can be seen by the example in the right column. Generally methods should be named by:
caseALabelProduction(ALabelProduction node)
where in our case:
Label = {assignstm}
Production = onestm
The parameter class follows the same convention.
Assignment or onestm ={assignstm} id assign exp
a:=37 assignstm
/ | \
id := exp
| |
a 37public void caseAAssignstmOnestm
(AAssignstmOnestm node)
onestm ={assignstm} id assign exp
The assignment statement action needed is to:
- interpret the value of the expression (integers only)
- place the expression value in a hash table keyed by the identifier.
public void caseAAssignstmOnestm(AAssignstmOnestm node)
{
node.getExp().apply(this);
table.put(node.getId().toString(), new Integer(getIntValue(node.getExp())));
}You'll notice that before the assignment can be performed the expression must be fully evaluated. That is performed by:
node.getExp().apply(this);
which gets the expression part (itself a node in the tree) of the assignment statement, node.getExp(), and calls the user defined method for expressions. The call will cascade through the nodes until a node is reached where a integer result can be produced.
For our interpreter it is enough that an expression node evaluates itself to an integer value (consider the 4 Exp definitions). On return from the apply method the expression node has an int value (37 in this example) available by:
getIntValue(node.getExp())
The rest of the assignment statement deals with hash table details:
- the id is the hash key and is easiest to use as a string - node.getId().toString()
- the hash container only holds objects not primitive types - new Integer(getIntValue(node.getExp()))
Multiplication or term = {timesterm} term times factor
a:=3*4 timesterm
/ | \
term * factor
| |
3 4public void caseATimestermTerm
(ATimestermTerm node)
term = {timesterm} term times factor
A second example illustrates how computation state is created and maintained as the nodes are visited. The action needed is to:
- interpret the term node, node.getTerm().apply(this);
- interpret the factor node, node.getFactor().apply(this);
- get the term and factor node values, getIntValue(node.getTerm())
- set the value of the current term node to the multiplication result, setIntValue(node, t*f)
public void caseATimestermTerm(ATimestermTerm node)
{
node.getTerm().apply(this);
int t = getIntValue(node.getTerm());
node.getFactor().apply(this);
int f = getIntValue(node.getFactor());
setIntValue(node, t*f);
}Notice that before the term or factor value can be used each must be evaluated. That is performed by:
node.getTerm().apply(this);
which gets the term part (itself a node in the syntax tree) of the timesterm statement, node.getTerm(), and calls the user defined method for terms. The call will cascade through the syntax tree nodes until a node is reached where a integer result can be produced.
The final but key part is to set the value of the current node to the computed value of term*factor so that higher level nodes (e.g. an expression) can retrieve the value of this term node using getIntValue(node.getTerm()):
setIntValue(node, t*f);
Putting it together
There are two main classes needed: the one with main function to construct parser and interpreter objects and call methods, and the Interpreter class which defines methods for each node in the syntax tree. First the important parts of the main function:
public static void main(String[] arguments){
Parser parser = new Parser(
new Lexer(
new PushbackReader(
new InputStreamReader(System.in), 1024)));
Start syntaxtree = parser.parse();
syntaxtree.apply( new Interpreter() );
}
- Parser parser = new Parser(...) - constructs the parser object using code generated from the supplied grammar rules.
- Start syntaxtree = parser.parse(); - parses the input and returns the syntax tree root.
- syntaxtree.apply( new Interpreter() ); - constructs an Interpreter object which is passed to apply method which calls our visitor methods written in the Interpreter class.
Interpreter class defines our visitor methods that are called by apply methods; note that our visitor methods call apply on nodes that need to be interpreted.
class Interpreter extends DepthFirstAdapter
{
public Interpreter() { System.out.println("\nStraightLine Interpreter"); }
public void caseAAssignstmOnestm(AAssignstmOnestm node)
{
node.getExp().apply(this);
table.put(node.getId().toString(), new Integer(getIntValue(node.getExp())));
}
public void caseATimestermTerm(ATimestermTerm node)
{
node.getTerm().apply(this);
int t = getIntValue(node.getTerm());
node.getFactor().apply(this);
int f = getIntValue(node.getFactor());
setIntValue(node, t*f);
}
- class Interpreter extends DepthFirstAdapter - DepthFirstAdapter, generated by SableCC, is a class that defines the do-nothing methods that we need to implement and over-ride. It also defines apply method to call our node semantic action methods we will over-ride.
- public void caseAAssignstmOnestm(AAssignstmOnestm node) - One of the methods we need to implement using exactly the signature.
- node.getExp().apply(this); - gets the Exp field value of the AAssignstmOnestm node and calls the apply method for an Exp. Remember that this is an Interpreter object used by apply to call a method for an Exp. Assume the source text parsed was 5*x matching the grammar rule of term = {timesterm} term times factor. The generated visitor method signature would be:
public void caseATimestermTerm(ATimestermTerm node)The actual apply method call for:
node.getExp().apply(this);
would then be:
ATimestermTerm.apply(this);
The apply would then call the method:
void caseATimestermTerm(ATimestermTerm node)
CONCRETE versus ABSTRACT SYNTAX
Languages have two syntax definitions: an abstract one for humans (on the left) and a concrete for defining productions for parsing (on the right).
SableCC3 supports translation of the concrete syntax for parsing to the abstract syntax for semantic actions (those performed in a Java program that walks the AST). The abstract syntax of the language now defines methods (and number of methods) generally creating a more stable definition and cleaner, simpler, more intuitive semantic actions. The size and complexity of the trees above indicate that the concrete syntax generates more nodes and types of methods (exp had to be divided down to terms and factors to enforce precedence and associativity).
The concrete syntax tree (defined in the Productions section) is transformed into the abstract syntax tree (defined in the Abstract Syntax Tree section) by transformations defined for each abstract non-terminal (in the Productions sections).
The respective trees are included below for:
a := 2; b:=3; c:=a*b; print(c)
|
stm
/ \ stm stm / \ \ stm stm print explist / \ / \ \ stm stm id times exp exp | / \ | / \ | onestm id exp c exp exp id / \ | | | | | id exp b num id id c | | | | | a num 3 a b | 2 |
stm
/ \ stm onestm / \ \ stm onestm print explist / \ / \ \ stm onestm id exp exp | / \ | | | onestm id exp c times term term / \ | | / \ | id exp b term term factor factor | | | | | | a term factor factor id id | | | | | factor num id b c | | | num 3 a | 2 |
|
Abstract Syntax Tree stm = {compoundstm} [head]:stm [tail]:stm | {assignstm} id exp | {printstm} [explist]:explist; exp = {idexp} id | {numexp} num | {plus} [left]:exp [right]:exp | {minus} [left]:exp [right]:exp | {times} [left]:exp [right]:exp | {div} [left]:exp [right]:exp | {eseqexp} [left]:stm [right]:exp ; explist={pairexplist} [head]:exp [tail]:explist | {lastexplist} exp; |
Productions stm = {compoundstm} stm semi onestm | {onestm} onestm; onestm={assignstm} id assign exp | {printstm} print lpar explist rpar; exp = {plusexp} exp plus term | {minusexp} exp minus term | {term} term | {eseqexp} lpar stm comma exp rpar; term = {timesterm} term times factor | {divterm} term div factor | {factor} factor; factor= {idexp} id | {numexp} num; explist={pairexplist} exp comma explist | {lastexplist} exp; |
The new SableCC3 productions with transformations for the AST are:
| Productions stm {-> stm} = {compoundstm} [head]:stm semi [tail]:onestm {-> New stm.compoundstm(head.stm, tail.stm) } | {onestm} onestm {-> onestm.stm } ; onestm {-> stm} = {assignstm} id assign exp {-> New stm.assignstm(id,exp)} | {printstm} print lpar [explist]:explist rpar {-> New stm.printstm(explist.explist)} ; exp {-> exp} = {plus} [left]:exp plus [right]:term {-> New exp.plus(left.exp,right.exp) } | {minus} [left]:exp minus [right]:term {-> New exp.minus(left.exp,right.exp) } | {term} term {-> term.exp }| {eseqexp} lpar [left]:stm comma [right]:exp rpar {-> New exp.eseqexp(left.stm,right.exp) } ; term {-> exp}= {times} [left]:term times [right]:factor {-> New exp.times(left.exp,right.exp) } | {div} [left]:term div [right]:factor {-> New exp.div(left.exp,right.exp) } | {factor} factor {-> factor.exp }; factor {-> exp} = {idexp} id {-> New exp.idexp(id) } | {numexp} num {-> New exp.numexp(num) }; explist {-> explist} = {pairexplist} [head]:exp comma [tail]:explist {-> New explist.pairexplist( head.exp, tail.explist) } | {lastexplist} exp {-> New explist.lastexplist(exp) }; |
| stm = {compoundstm} stm semi onestm
| {onestm} onestm; |
| stm {-> stm} = {compoundstm} [head]:stm semi [tail]:onestm {-> New stm.compoundstm(head.stm, tail.stm) } | {onestm} onestm {-> onestm.stm } ; |
The above example illustrates the addition of the transformations to stm that correspond to abstract syntax of:
stm = {compoundstm} [head]:stm [tail]:stm
Below are the meaning of each part:
- {-> stm} defines a transformation to a stm
- [head]:stm defines the first stm to be labeled head.
- {-> New stm.compoundstm(head.stm, tail.stm) } a transformation creating a New stm of compoundstm type having a head.stm and tail.stm. Note that compoundstm appears in the abstract syntax.
- {-> onestm.stm } does no create a onestm in abstract syntax.
Semantic Actions - Interpreter
The payoff of the extra effort to construct an AST is simpler and fewer semantic actions, as seen in the trees above and the code for our Interpreter below. Note that the only change is method and parameter names are somewhat different in a predicable manner (we no longer have a onestm non-terminal) and there are fewer methods (not obvious here but no longer have term and factor non-terminal either).
| Concrete public void caseAAssignstmOnestm(AAssignstmOnestm node) { node.getExp().apply(this); table.put(node.getId().toString(), new Integer(getIntValue(node.getExp()))); } public void caseATimestermTerm(ATimestermTerm node) { node.getTerm().apply(this); int t = getIntValue(node.getTerm()); node.getFactor().apply(this); int f = getIntValue(node.getFactor()); setIntValue(node, t*f); } |
| Abstract public void caseAAssignstmStm(AAssignstmStm node) { node.getExp().apply(this); table.put(node.getId().toString(), new Integer(getIntValue(node.getExp()))); } public void caseATimesExp(ATimesExp node) { node.getLeft().apply(this); int t = getIntValue(node.getLeft()); node.getRight().apply(this); int f = getIntValue(node.getRight()); setIntValue(node, t*f); } |
The following compares the output of the concrete versus abstract trees.
StraightLine Interpreter Results on Concrete Syntax Tree a := 2; b:=3;
c:=a*b;
print(c)
StraightLine Interpreter
onestm ={assignstm} a := 2
exp = {term} 2
term = {factor} 2
factor ={numexp} 2
onestm ={assignstm} b := 3
exp = {term} 3
term = {factor} 3
factor ={numexp} 3
onestm ={assignstm} c := a * b
exp = {term} a * b
term ={timesterm} a * b
term = {factor} a
factor ={idexp} a
factor ={idexp} b
onestm ={printstm} print ( c )
explist ={lastexplist} c
exp = {term} c
term = {factor} c
factor ={idexp} c
6
StraightLine Interpreter Results on Abstract Syntax Tree a := 2; b:=3;
c:=a*b;
print(c)
StraightLine Interpreter
stm ={assignstm} a 2
exp ={numexp} 2
stm ={assignstm} b 3
exp ={numexp} 3
stm ={assignstm} c a b
exp ={times} a b
exp ={idexp} a
exp ={idexp} b
stm ={printstm} c
explist ={lastexplist} c
exp ={idexp} c
6Files for Concrete and Abstract Interpreter
The AST StraightLine grammar file and the Interpreter are significantly different, the Main file is unchanged.
Concrete Abstract StraightLine.js
Interpreter.java
Main.javaStraightLine.js
Interpreter.java
Main.java