Homework 8
|
The purpose of the compiler is to generate code that corresponds precisely to the semantics of the high level language. The target architecture is the JVM, the code will be JVM assembly language. Just to recap previous work.
Homework 3 defined the concrete grammar necessary for SableCC to generate a parser, however an abstract grammar results in a simpler solution for the type checker.
Homework 4 introduced the concrete to abstract grammar transformations of SableCC3 which will be applied to the MiniJava language. The implementation of the interpreter used the the DepthFirstAdapter class to visit nodes of the AST.
Homework 5 combined those ideas with that of a symbol table to implement a type checker based on the complete abstract grammar definition of MiniJava in SableCC3 form and skeleton classes for the necessary Java.
Homework 6 combined those ideas to determine where data was to be located in the frame (offset) and update the symbol table with that information.
Homework 7 determined the stack size requirements of each method and updated the symbol table with that information.
Homework 8 will utilize that previous work. As before, the parse tree and symbol table hold the critical data needed to guide code generation. The assembly code generated for each class will be written to a separate file.
Organization
The organization of the code generator is illustrated and described below.
- Main calls the following compiler modules:
- BuildSymbolTableVisitor - visits program, class and method declaration AST nodes to add symbol and type to the symbol table.
- TypeCheckVisitor - visits each non-declaration AST node to verify each use of a symbol corresponds to the operations allowed on its type.
- FrameVisitor - visits program, class and method declaration AST nodes to calculate the offsets of variables within the frame and updates the symbol table with the offsets.
- StackVisitor - visits each method declaration and analyzes each statement and return expression of the declaration to determine the maximum stack space required by the method.
- CodeVisitor - visits each non-declaration AST node to generate JVM assembly code corresponding to the semantic meaning of the MiniJava.
JVM Assembler Code Generation
In what order should code be generated? The root node of the AST is the program, the results of which are computed by child nodes. Given the left-to-right, program at root construction of the AST, code can be generated directly from the AST while visiting nodes in depth-first left-to-right order, as in many of the previous compiler stages. For example, the code for each operand must be generated before the code for binary addition, exactly paralleling the AST structure.
Code generation is further simplified by the stack architecture of the JVM as there is an unlimited amount of operand locations available, unlike when using a finite number of registers. The stack allows each node to produce code independent (mostly) of others, the results of an instruction (e.g. add) are placed on the dynamic stack for operands of subsequent instructions; static numbers of registers requires some global allocation scheme to avoid conflicts.
One additional key simplification is we will generate assembly code rather than directly producing machine code; machine code requires some bookkeeping to calculate addresses of branching instructions; assembly merely needs to generate a unique label, the assembler will determine the final addresses.
A key means for determining appropriate code uses jasper to disassemble Java-compiled class files we can then study the JVM assembly code that should be produced by our compiler.
The simplest approach to code generation merely prints the assembly code that corresponds to the node. Consider the code necessary for addition which consumes two stack operands:
| 4+5 |
APlusExp / \ AIntegerliteralExp AIntegerliteralExp | | 4 5 |
public void caseAPlusExp(APlusExp node) { node.getL().apply(this); node.getR().apply(this); emit("iadd"); } public void caseAIntegerliteralExp(AIntegerliteralExp node) { emit("bipush "+node); } |
bipush 4 bipush 5 iadd |
At the APlusExp node, the left and right operands are 4 and 5 respectively. The operands will be placed on the stack by apply(this) regardless of the complexity of the the operands. In the following example, the left operand of the APlusExp node is ATimesExp which, because visited by node.getL().apply(this); before APlusExp, its code is printed first.
| 4*5+6 |
APlusExp / \ ATimesExp AIntegerliteralExp / \ | AIntegerliteralExp AIntegerliteralExp 6 | | 4 5 |
public void caseAPlusExp(APlusExp node) { node.getL().apply(this); node.getR().apply(this); emit("iadd"); } public void caseATimesExp(ATimesExp node) { node.getL().apply(this); node.getR().apply(this); emit("imul"); } public void caseAIntegerliteralExp(AIntegerliteralExp node) { emit("bipush "+node); } |
bipush 4 bipush 5 imul bipush 6 iadd |
Conditional Branching - Relational and Logical Operators
The above examples illustrate that arithmetic results are at TOS after the operation is performed. However, most architectures represent boolean results in flags, the test x==y would leave ZF=1 on an Intel processor. Most architectures do provide logical bitwise AND, OR, etc. that do produce a result other than flag settings.
JVM does not leave boolean results on the stack but does leave bitwise results on the stack, fine for AND and OR operations but how about logical results? Specifically how are conditional branching statements such as if, while etc. implemented?
Implementing the > operation
Consider implementing the > operation, though not a part of MiniJava. The abstract grammar could be:
exp = {gt} [l]:exp [r]:exp;
A conditional expression, such as 4 > 5, should leave the result at TOS; using true=1 and false=0 for the result.
Java Java using conditional branching JVM assembler using conditional branching void m() { boolean a;
a = 4 > 5;
}
void m() { boolean a;
if (4 <= 5 ) goto label1;
a = true;
goto label2;
label1:
a = false;
label2:
}
.method m()V bipush 4
bipush 5
ifcmple label1
bipush 1 ; true
istore 1 ; a = true
goto label2
label1:
bipush 0 ; false
istore 1 ; a = false
label2:
return
.end methoddo while Implementation
Consider implementing the do while operation, also not part of MiniJava. A hand-written implementation appears at right. This captures the flavor of the code generation needed but is too restrictive.
Java Java using conditional branching JVM assembler using conditional branching void m() { int a;
a = 6;
do {
a = a - 2;
while ( a > 3 );
}
void m() { int a;
a = 6;
do1:
a = a - 2;
if ( a <= 3 ) goto do1
}
.method m()V bipush 6
istore 1 ; a = 6
do1:
iload 1 ; a
bipush 2 ; 2
isub ; a - 2
istore 1 ; a = a - 2
iload 1 ; a
bipush 3 ; 3
ifcmple do1
return
.end methodThe real question is how to generate code for the do while when visiting the appropriate AST nodes. The abstract grammar for the do while statement and exp provides some clues:
statement = {dowhile} statement exp |
{assign} id exp;exp = {minus} [l]:exp [r]:exp |
{gt} [l]:exp [r]:exp;
Generating code for the do while depends upon code generated by child AST nodes.
- At the do while node, the statement must be visited first (using apply).
- Next, visiting the conditional exp must leave either true=1 or false=0 at TOS;
- ifne branches if TOS != 0 to the do1 label.
| Java | AST nodes and Generated Assembler Code | CodeVisitor | |
| do { a = a - 2; while (a > 3); |
ADowhileStatement / \ AAssignStatement AGtExp / \ / \ AIdType AMinusExp AIdExp AIntExp | / \ | | a AIdExp AIntExp a 3 | | a 2 The assembly code would appear similar to:
|
public void caseAMinusExp(AMinusExp node) { node.getL().apply(this); node.getR().apply(this); emit("isub"); } // Leaves true=1 or false=0 at TOS public void caseGtExp(AGtExp node) { node.getL().apply(this); node.getR().apply(this); emit(" ifcmple gt2"); emit(" bipush 1"); emit(" goto gt3"); emit("gt2:"); emit(" bipush 0"); emit("gt3:); } public void caseADowhileStatement(ADowhileStatement node) { emit("do1:"); node.getStatement().apply(this); node.getExp().apply(this); // exp result at TOS emit(" ifne do1"); // if(TOS != 0) goto do1 emit("while4:"); } |
CLASS DECLARATION - AClassDecl
Each class should be defined in a separate file, for example, code for class B should be written to a file named B.asm although the name nor extension is important to the jasmin assembler. All definitions for the class, basically fields and methods, should be written to the file.
Constructors
MiniJava classes cannot have a constructor but a default constructor must be supplied for each class. The default constructor below invokes the Object constructor:
.method public <init>()V
.limit stack 1
.limit locals 1
aload_0
invokespecial java/lang/Object/<init>()V
return
.end methodInheritance
Constructors for a subclass should invoke the super class constructor rather than the Object constructor.
METHOD DECLARATION - AMethodDecl
Code generation for method declaration depends heavily upon the symbol table for information on parameters and local variables, such as the frame offset of each. As the following example illustrates, each method must include a prologue that defines:
- method prototype - includes the method name, number and type of formal parameters and return type of method
- stack - limit of maximum used
- locals - number of locals
Frame access - the frame organization is part of the JVM discussion, parameters and locals are part of the frame and accessed by their offset within the current frame. The method bm2 code sequence below copies the two operands from the frame by offset to TOS; the binary multiplication consumes the two operands, leaving the result at TOS:
iload 1 // bp1
iload 2 // bp2
imul
class B {
public int bm1(int bp1, int bp2, int bp3) {
return this.bm2( bp1, bp2 );
}
public int bm2(int bp1, int bp2) {
int bl;
bl = bp1 * bp2 + bp1;
return bl;
}
}
bm1 Frame
0 this 1 bp1 2 bp2 3 bp3
bm2 Frame
0 this 1 bp1 2 bp2 3 bl
.class public B
.super java/lang/Object
; .method <init>()V same for all MiniJava classes
.method public bm1(III)I
.limit stack 3
.limit locals 4
aload 0 ; this
iload 1 ; bp1
iload 2 ; bp2
invokevirtual B/bm2(II)I ; this.bm2( bp1, bp2 )
ireturn
.end method
.method public bm2(II)I
.limit stack 2
.limit locals 4
iload 1 ; bp1 * bp2
iload 2
imul
iload 1 ; bp1 * bp2 + bp1
iadd
istore 3 ; bl = bp1 * bp2 + bp1
iload 3
ireturn ; return bl
.end method
METHOD INVOCATION - AMethodcallExp
exp = {methodcall} exp id exp_listexp_list = exp*;
Invoking a method is discussed in greater detail in the JVM notes. An example below illustrates the key interactions between methods using the stack to pass data. Essentially code generation for a method call is the same as with any operation, the operands must be placed on the stack prior to calling the method and any returned result is at TOS. The calling code generation consists of:
- copy object reference to TOS (i.e. this)
- apply each actual parameter in left-to-right order; the result of each apply is to generate code that leaves the parameter value on the stack.
- invoke the method
Caller Callee int x;
x = new B().m(11, 12);class B {
int m(int a, int b) { return 13; }Caller Stack before call
new B
bipush 11
bipush 12
invokevirtual B/m(II)I
reference to B object 11 12 Caller Stack after call
13 TOS Callee Frame Callee Stack
this
a 11
b 12
0 reference to B object 1 11 2 12
.method public m(II)I
bipush 13
ireturn
.end method
13
TOSApplying the parameters - AExpList
From the abstract grammar, the method parameters are defined as a list of expressions, requiring code to be generated for each. Of course, the code is generated when each parameter is applied so the problem becomes that of accessing each parameter in the expression list.
The AExpList node is where the parameters can be accessed one at a time, the grammar shows the exp* field to be a of list of 0 or more expressions. The exp* is a Java LinkedList object, allowing access to each exp of the list.
See TypeCheckVisitor or StackVisitor for an example of use.
Give the timing results of the following Fibonacci program compiled by your compiler and Javac. Fibonacci(43) seems to be the limit. The batch file at right displays the current time as the prompt and executes the program.
class Fib {
public static void main(String [] a) {
System.out.println(new Fibonacci().fib(43));
}
}
class Fibonacci {
public int fib(int n) {
int result;
if (n < 2)
result = 1;
else
result = this.fib(n-1)+this.fib(n-2);
return result;
}
}prompt $t
java -cp . Fib
Use HW7 as a starting point, add class CodeVisitor to the JVM package.
jasper is a disassembler of Java class
files to an assembly program. Basically the steps are:
jasmin assembles a JVM assembly file into a class file that can be executed by the JVM.
- Compile the Java program file to assembly language.
- Assemble a class file to assembly, using example.j
- Home: java -jar \jasmin\jasmin.jar example.j
java -cp . example.class
- IUS: java -jar v:\common\user\C431\jasmin.jar example.j
java -cp . example.class
- Cover sheet with your name, C431, date, and Homework 8.
- Print out of CodeVisitor.
- Print out of assembly programs produced for each of 3 test programs; TreeVisitor.java should be one of those tested.
- Print out of execution produced from 3 test results. Indicate the test programs.
- Give the timing results of the Fibonacci program compiled by your compiler and Javac.
- Email all files used as a zipped file to: with subject: HW8 Solution
FAQ
TypeCheckVisitor from Homework 5 shares much of the code needed, some code generation methods will be simpler.
| PrintStream classStream; try { classStream = new
PrintStream(new FileOutputStream("HW8.j"));} |
Assembly labels must be unique. One simple way to
ensure uniqueness is to generate a global count that is incremented and
appended to each label generated. So that the code is somewhat recognizable
create mnemonic labels such as: while1, if3, etc.
No. You will need to read and understand some symbol table definitions but no changes are needed.
Every node must be visited.
Abstract (complete) Abstract Syntax Tree
program = main_class class_decl*;
main_class = [classname]:id [arg]:id statement;
class_decl =
id var_decl* method_decl* |
{extends} [classname]:id
[extend]:id var_decl* method_decl*;
var_decl = type id;
method_decl = type id formal_list var_decl* statement* exp;
statement =
{statementlist} statement* |
{if} exp [true]:statement [false]:statement |
{while} exp statement |
{println} exp |
{assign} id exp |
{array_assign} id [index]:exp [r]:exp;
formal_list = type id formal_rest* | {empty};
formal_rest = type id;
type= {int_array} |
{boolean} |
{int} |
{id} id;
exp = {and} [l]:exp [r]:exp |
{lt} [l]:exp [r]:exp |
{plus} [l]:exp [r]:exp |
{minus} [l]:exp [r]:exp |
{times} [l]:exp [r]:exp |
{length} exp |
{arrayindex} [array]:exp [index]:exp |
{methodcall} exp id exp_list |
{integerliteral} integer_literal |
{trueliteral} true |
{falseliteral} false |
{id} id |
{this} this |
{new} id |
{newarray} exp |
{not} exp ;
exp_list = exp*;