Homework 8
Code Generation  

powered by FreeFind

Modified: 

Overview

The purpose of the compiler is to generate code that corresponds precisely to the semantics of the high level language. The target architecture is the JVM, the code will be JVM assembly language. Just to recap previous work.

Homework 3 defined the concrete grammar necessary for SableCC to generate a parser, however an abstract grammar results in a simpler solution for the type checker.

Homework 4 introduced the concrete to abstract grammar transformations of SableCC3 which will be applied to the MiniJava language. The implementation of the interpreter used the the DepthFirstAdapter class to visit nodes of the AST.

Homework 5 combined those ideas with that of a symbol table to implement a type checker based on the complete abstract grammar definition of MiniJava in SableCC3 form and skeleton classes for the necessary Java.

Homework 6 combined those ideas to determine where data was to be located in the frame (offset) and update the symbol table with that information.

Homework 7 determined the stack size requirements of each method and updated the symbol table with that information.

Homework 8 will utilize that previous work. As before, the parse tree and symbol table hold the critical data needed to guide code generation. The assembly code generated for each class will be written to a separate file.

Organization

The organization of the code generator is illustrated and described below.

JVM Assembler Code Generation

In what order should code be generated? The root node of the AST is the program, the results of which are computed by child nodes. Given the left-to-right, program at root construction of the AST, code can be generated directly from the AST while visiting nodes in depth-first left-to-right order, as in many of the previous compiler stages. For example, the code for each operand must be generated before the code for binary addition, exactly paralleling the AST structure.

Code generation is further simplified by the stack architecture of the JVM as there is an unlimited amount of operand locations available, unlike when using a finite number of registers. The stack allows each node to produce code independent (mostly) of others, the results of an instruction (e.g. add) are placed on the dynamic stack for operands of subsequent instructions; static numbers of registers requires some global allocation scheme to avoid conflicts.

One additional key simplification is we will generate assembly code rather than directly producing machine code; machine code requires some bookkeeping to calculate addresses of branching instructions; assembly merely needs to generate a unique label, the assembler will determine the final addresses.

A key means for determining appropriate code uses jasper to disassemble Java-compiled class files we can then study the JVM assembly code that should be produced by our compiler.

The simplest approach to code generation merely prints the assembly code that corresponds to the node. Consider the code necessary for addition which consumes two stack operands:

4+5                       APlusExp
                      /         \
AIntegerliteralExp   AIntegerliteralExp
                  |                |
                 4                5
public void caseAPlusExp(APlusExp node) {
  node.getL().apply(this);
  node.getR().apply(this);
  emit("iadd");
}

public void caseAIntegerliteralExp(AIntegerliteralExp node) {
  emit("bipush   "+node);
}
   bipush    4

   bipush    5

   iadd

At the APlusExp node, the left and right operands are 4 and 5 respectively. The operands will be placed on the stack by apply(this) regardless of the complexity of the the operands. In the following example, the left operand of the APlusExp node is ATimesExp which, because visited by node.getL().apply(this); before APlusExp, its code is printed first.

4*5+6                               APlusExp
                               /         \
                  ATimesExp   AIntegerliteralExp
                    /        \                         |
AIntegerliteralExp AIntegerliteralExp    6
              |                  |                     
              4                 5
public void caseAPlusExp(APlusExp node) {
  node.getL().apply(this);
  node.getR().apply(this);
  emit("iadd");
}

public void caseATimesExp(ATimesExp node) {
  node.getL().apply(this);
  node.getR().apply(this);
  emit("imul");
}

public void caseAIntegerliteralExp(AIntegerliteralExp node) {
  emit("bipush   "+node);
}
bipush  4

bipush  5

imul

bipush  6

iadd

Conditional Branching - Relational and Logical Operators

The above examples illustrate that arithmetic results are at TOS after the operation is performed. However, most architectures represent boolean results in flags, the test x==y would leave ZF=1 on an Intel processor. Most architectures do provide logical bitwise AND, OR, etc. that do produce a result other than flag settings.

JVM does not leave boolean results on the stack but does leave bitwise results on the stack, fine for AND and OR operations but how about logical results? Specifically how are conditional branching statements such as if, while etc. implemented?

Implementing the > operation

Consider implementing the > operation, though not a part of MiniJava. The abstract grammar could be:

exp = {gt} [l]:exp [r]:exp;

A conditional expression, such as 4 > 5, should leave the result at TOS; using true=1 and false=0 for the result.

Java Java using conditional branching JVM assembler using conditional branching
void m() {

  boolean a;

  a = 4 > 5;

}

void m() {

     boolean a;

     if (4 <= 5 ) goto label1;

     a = true;

     goto label2;

  label1:

     a = false;

  label2:

}

.method     m()V

   bipush     4

   bipush     5

   ifcmple     label1

   bipush      1        ; true

   istore        1        ; a = true

   goto         label2

 label1:

   bipush      0        ; false

   istore        1        ; a = false

 label2:

 return
.end method

do while Implementation

Consider implementing the do while operation, also not part of MiniJava. A hand-written implementation appears at right. This captures the flavor of the code generation needed but is too restrictive.

Java Java using conditional branching JVM assembler using conditional branching
void m() {

  int a;

  a = 6;

  do {

        a = a - 2;

  while ( a > 3 );

}

void m() {

     int a;

     a = 6;

  do1:

     a = a - 2;

     if ( a <= 3 ) goto do1

}

.method     m()V

   bipush     6

   istore       1        ; a = 6

 do1:

   iload        1         ; a

   bipush      2         ;  2

   isub                    ;  a - 2

   istore        1        ; a = a - 2

   iload          1       ; a

   bipush       3        ; 3

   ifcmple      do1

 return
.end method

The real question is how to generate code for the do while when visiting the appropriate AST nodes. The abstract grammar for the do while statement and exp provides some clues:

statement = {dowhile} statement exp |
                  {assign} id exp;

        exp = {minus} [l]:exp [r]:exp |
                 {gt} [l]:exp [r]:exp;


Generating code for the do while depends upon code generated by child AST nodes.

Java AST nodes and Generated Assembler Code CodeVisitor
do {

   a = a - 2;

while (a > 3);

 
                        ADowhileStatement
                               /             \
           AAssignStatement        AGtExp
           /        \                      /          \
AIdType  AMinusExp         AIdExp AIntExp
 |           /        \                    |            |
 a    AIdExp  AIntExp             a            3
           |            |
           a            2

The assembly code would appear similar to:
 

do1:                      
          iload       1    ; statement
          bipush    2    ; a = a - 2
          isub            
          istore      1   
          iload       1    ; a > 3  
          bipush    3   
          ifcmple   gt2
          bipush    1
          goto       gt3
   gt2:
          bipush    0
   gt3:                   
          ifne        do1
while4:   
public void caseAMinusExp(AMinusExp node) {
  node.getL().apply(this);
  node.getR().apply(this);
  emit("isub");
}
         // Leaves true=1 or false=0 at TOS
public void caseGtExp(AGtExp node) {
  node.getL().apply(this);
  node.getR().apply(this);
  emit("   ifcmple  gt2");
  emit("   bipush   1");
  emit("   goto gt3");
  emit("gt2:");
  emit("   bipush   0");
  emit("gt3:);
}

public void caseADowhileStatement(ADowhileStatement node) {
  emit("do1:");
  node.getStatement().apply(this);
  node.getExp().apply(this);            // exp result at TOS
  emit("    ifne do1");                      // if(TOS != 0) goto do1
  emit("while4:");
}

CLASS DECLARATION - AClassDecl

Each class should be defined in a separate file, for example, code for class B should be written to a file named B.asm although the name nor extension is important to the jasmin assembler. All definitions for the class, basically fields and methods, should be written to the file.

Constructors

MiniJava classes cannot have a constructor but a default constructor must be supplied for each class. The default constructor below invokes the Object constructor:

 .method               public <init>()V
 .limit stack           1
 .limit locals           1
     aload_0              
     invokespecial    java/lang/Object/<init>()V
     return               
 .end method

Inheritance

Constructors for a subclass should invoke the super class constructor rather than the Object constructor.

 

METHOD DECLARATION - AMethodDecl

Code generation for method declaration depends heavily upon the symbol table for information on parameters and local variables, such as the frame offset of each. As the following example illustrates, each method must include a prologue that defines:

Frame access - the frame organization is part of the JVM discussion, parameters and locals are part of the frame and accessed by their offset within the current frame. The method bm2 code sequence below copies the two operands from the frame by offset to TOS; the binary multiplication consumes the two operands, leaving the result at TOS:

  iload 1                             // bp1
  iload 2                             // bp2
  imul
 

class B {  
 



    public int bm1(int bp1, int bp2, int bp3) {
        return this.bm2( bp1, bp2 );
    }







    public int bm2(int bp1, int bp2) {
        int bl;
        bl = bp1 * bp2 + bp1;
        return bl;
    }
}
 

 

bm1 Frame

0 this
1 bp1
2 bp2
3 bp3

 

 

bm2 Frame

0 this
1 bp1
2 bp2
3 bl

 

 

 
.class public B
.super java/lang/Object

; .method <init>()V same for all MiniJava classes

.method public bm1(III)I
.limit stack 3
.limit locals 4
  aload 0                          ; this
  iload 1                           ; bp1
  iload 2                           ; bp2
  invokevirtual B/bm2(II)I  ; this.bm2( bp1, bp2 )
  ireturn
.end method

.method public bm2(II)I
.limit stack 2
.limit locals 4
  iload 1                             ; bp1 * bp2
  iload 2
  imul
  iload 1                             ; bp1 * bp2 + bp1
  iadd
  istore 3                            ; bl = bp1 * bp2 + bp1
  iload 3
  ireturn                             ; return bl
.end method

METHOD INVOCATION - AMethodcallExp

exp = {methodcall} exp id exp_list

exp_list = exp*;

Invoking a method is discussed in greater detail in the JVM notes. An example below illustrates the key interactions between methods using the stack to pass data. Essentially code generation for a method call is the same as with any operation, the operands must be placed on the stack prior to calling the method and any returned result is at TOS. The calling code generation consists of:

Caller Callee
int x;
x = new B().m(11, 12);
class B {
   int m(int a, int b) { return 13; }
                                     Caller Stack before call
new B                             
bipush            11
bipush            12

invokevirtual   B/m(II)I
reference to B object
11
12

Caller Stack after call

 
13
TOS
                     Callee Frame
this
a            11    
b            12
0 reference to B object
1 11
2 12
 
                    Callee Stack
.method public m(II)I    
   bipush   13   
   ireturn
.end method 
 
13

TOS

Applying the parameters - AExpList

From the abstract grammar, the method parameters are defined as a list of expressions, requiring code to be generated for each. Of course, the code is generated when each parameter is applied so the problem becomes that of accessing each parameter in the expression list.

The AExpList node is where the parameters can be accessed one at a time, the grammar shows the exp* field to be a of list of 0 or more expressions. The exp* is a Java LinkedList object, allowing access to each exp of the list.

See TypeCheckVisitor or StackVisitor for an example of use.

Timing

Give the timing results of the following Fibonacci program compiled by your compiler and Javac. Fibonacci(43) seems to be the limit. The batch file at right displays the current time as the prompt and executes the program.

 

class Fib {
  public static void main(String [] a) {
     System.out.println(new Fibonacci().fib(43));
  }
}

class Fibonacci {
   public int fib(int n) {
     int result;
     if (n < 2)
       result = 1;
     else
       result = this.fib(n-1)+this.fib(n-2);
     return result;
   }
}
prompt $t
java -cp . Fib

 

Assignment

  1. Generate JVM assembly code for each class by implementing the class in the JVM package: CodeVisitor.
  2. Generate code, assemble and execute using 3 provided MiniJava test programs; TreeVisitor.java should be one of those tested.  
  3. Give the timing results of the Fibonacci program compiled by your compiler and Javac.

Files

Use HW7 as a starting point, add class CodeVisitor to the JVM package.

Getting Started

  1. Compile the Java program file  to assembly language.
  2. Assemble a class file to assembly, using example.j
     
    • Home: java -jar \jasmin\jasmin.jar example.j
                java -cp . example.class
       
    • IUS:    java -jar v:\common\user\C431\jasmin.jar example.j
                java -cp . example.class

Turn In

  1. Cover sheet with your name, C431, date, and Homework 8.
  2. Print out of CodeVisitor.
  3. Print out of assembly programs produced for each of 3 test programs; TreeVisitor.java should be one of those tested.
  4. Print out of execution produced from 3 test results. Indicate the test programs.
  5. Give the timing results of the Fibonacci program compiled by your compiler and Javac.
  6. Email all files used as a zipped file to: with subject: HW8 Solution

 

FAQ

  1. Where to start?

    TypeCheckVisitor from Homework 5 shares much of the code needed, some code generation methods will be simpler.

     

  2. Create a new file for a class.

    Each class must be written to a separate file. The following opens a file named HW8.j and writes an assembly language comment.
     
    PrintStream classStream;

    try { classStream = new PrintStream(new FileOutputStream("HW8.j"));}
    catch(Exception e) {
        System.out.println("Cannot create HW8.j");
        System.exit(-1);
    }

    classStream.println(" ; Start of new class");

     

  3. Generate labels.

    Assembly labels must be unique. One simple way to ensure uniqueness is to generate a global count that is incremented and appended to each label generated. So that the code is somewhat recognizable create mnemonic labels such as: while1, if3, etc.
     

  4. Will earlier parts (SymbolTable, etc.) need changed?

    No. You will need to read and understand some symbol table definitions but no changes are needed.

     

  5. What AST nodes require code generation?

    Every node must be visited.

Abstract (complete)
Abstract Syntax Tree
 
program = main_class class_decl*;
main_class = [classname]:id [arg]:id statement;
class_decl =
     id var_decl* method_decl* |
             {extends} [classname]:id
                             [extend]:id var_decl* method_decl*;
var_decl = type id;
method_decl = type id formal_list var_decl* statement* exp;
statement =
             {statementlist} statement* |
             {if} exp [true]:statement [false]:statement |
             {while} exp statement |
             {println} exp |
             {assign} id exp |
             {array_assign} id [index]:exp [r]:exp;
formal_list = type id formal_rest* | {empty};
formal_rest = type id;
 
type=   {int_array} |
            {boolean}   |
            {int}        |
            {id} id;

exp =    {and}     [l]:exp [r]:exp |
             {lt}        [l]:exp [r]:exp |
             {plus}    [l]:exp [r]:exp |
             {minus} [l]:exp [r]:exp |
             {times}  [l]:exp [r]:exp |
             {length} exp                   |
             {arrayindex} [array]:exp [index]:exp |
             {methodcall} exp id exp_list |
             {integerliteral} integer_literal |
             {trueliteral} true |
             {falseliteral} false |
             {id} id |
             {this} this |
             {new} id |
             {newarray} exp |
             {not}     exp ;
exp_list = exp*;