Chapter 7
|
RESOURCES
Download Chapters 7 and 8 software
Test package contains test.java that can be used to test your understanding of Chapter 7 and 8. The file contains an example of:
- generating an IR tree (Chapter 7)
- rewriting an IR tree to a canonical tree (Chapter 8)
- linearization to canonical tree to statement list (Chapter 8)
- forming basic blocks (Chapter 8)
- traces (Chapter 8)
The file can be edited to verify your understanding of the input and results of each phase.
Directions:
- Unzip to directory: Chap8
- At command prompt, change to Chap8 directory.
- Compile by: javac -classpath . Test\test.java
- Execute by: java -cp . Test.test
OVERVIEW
There are two basic approaches for parse tree to code translation:
- directly target an architecture - simple but difficult to change architectures or perform optimizations
- translate to a intermediate code - complex but allows general optimization techniques to be applied and back-end code generation for any number of architectures.
Architecture
To understand the translation process we will examine a few examples translating an AST node to a linear representation targeted for a specific architecture.
The code will be generated essentially in post-order, after visiting each sub-node and generating code. In practice, the code could be written to a file and then assembled.
386 - For an expression node, we must decide where to place results of code execution. Some possibilities are to generate code to leave the result on stack or in registers. Let's use registers and specifically the eAx register since it is commonly used by commercial compilers.
APlusExp - Each sub-node of Plus will also use the eAx register to return a result; the result of the left node will be returned in eAx which is overwritten by execution of the code for the right node.
The apply() method generates the code for that node, the following generates code for the left node: node.getL().apply(this);
For the expression: a + 3 * 4, the code generated would be at right:
a + 3 * 4
+
/ \
Id *
| / \
a Literal Literal
| |
3 4public void caseAPlusExp(APlusExp node) {
node.getL().apply(this);System.out.println( "Push eAx");
node.getR().apply(this);
System.out.println( "Pop eBx");
System.out.println( "Add eAx, eBx");
}Mov eAx, a ; left exp
Push eAx
Mov eAx, 3 ; right exp
Mov eBx, 4
Imul eBx
Pop eBx ; eAx = left + right
Add eAx, eBxAWhileStatement - The while has two sub-nodes, an expression and a statement that is executed when the expression is true.
We will generate expression code first by visiting the Exp sub-node, leaving the result of 1 for true or 0 for false in eAx. Statement code will be generated by visiting the Statement sub-node.
Note that MiniJava statements generally produce side-effects but do not return results. Also that the variable a would need to be represented by an absolute address in the heap, frame offset, global variable, etc. The labels would, of course, need to be unique.
while( a < 3 ) a = a + 1;
while
/ \
Exp Statement
| |
a<3 a = a + 1public void caseAWhileStatement(AWhileStatement node) { System.out.println( "while:");
node.getExp().apply(this);
System.out.println( " Cmp eAx, 0");
System.out.println( " Je endWhile");node.getStatement().apply(this);
System.out.println( " Jmp while");
System.out.println( "endWhile:");
}while: Mov eAx, 1 ; Exp a < 3
Mov eBx, a
Cmp eBx, 3
Jb true
Mov eAx, 0true:
Cmp eAx, 0
Je endWhileAdd a, 1 ; Statement
Jmp while
endWhile:
Exercise
- Give the AIfStatement to generate 386 code.
- Give the APlusExp to generate JVM code.
7.1 INTERMEDIATE REPRESENTATION TREES
Requirements for intermediate language
- convenient to produce
- convenient to translate to real architecture
- each construct must have clear and simple meaning
Intermediate representation trees
The intermediate representation is an abstract architecture assembly language.
There are two groups of operators in the assembly language: expressions (Exp) and statements (Stm).
The following are abstract assembly language operations:
Exp - operation to compute a value with possible side-effects
- CONST(i) - The integer literal 12 would be:
CONST(12)
- NAME(n) - A symbolic constant n corresponding to an assembly language label:
NAME(new Label("printInt"))
- TEMP(t) - Temporary t, similar to a register but the abstract machine has infinite number. The following constructs a unique Temp number, TEMP(t) is similar then to eAx on an Intel processor, it distinguishes one temporary from another:
TEMP(new Temp())
- BINOP(o, e1, e2) - Apply operator o to operands e1 and e2 where e1 is evaluated first. The following is 2+5
BINOP(BINOP.PLUS,CONST(2),CONST(5))
- MEM(e) - Dereferences wordsize bytes starting at address e. On left side means store, on right means fetch. The contents of a new Temp numbered 4 is:
MEM(TEMP(new Temp(4)))
- CALL(f,l) - Applies function f to argument list l. f is evaluated before evaluating the arguments in left to right order. Apply printInt to the list args:
CALL( NAME(new Label("printInt")), args)
- ESEQ(s, e) - Statement s is evaluated with possible side-effects then e is evaluated for results. A side-effect statement that moves 12 to a temporary numbered 4 is:
MOVE(TEMP(new Temp(4)), CONST(12))
Performing the side-effect and expression that adds 5 to the temporary: (4) = 12; (4) + 5:
ESEQ( MOVE(TEMP(new Temp(4)), CONST(12)), BINOP(BINOP.PLUS, TEMP(new Temp(4)), CONST(5)))
or
Temp t = new Temp(4);
ESEQ( MOVE(TEMP(t), CONST(12)), BINOP(BINOP.PLUS, TEMP(t), CONST(5)))
Stm - perform side-effects and control flow
- MOVE( TEMP t, e) - Evaluate e and move result to temporary t. Move 12 to a temporary numbered 4:
MOVE(TEMP(new Temp(4)), CONST(12))
Store the address of temporary 4 to temporary 5 (i.e. 5 references 4):
MOVE(TEMP(new Temp(5)), MEM(TEMP(new Temp(4))))
- MOVE( MEM(e1), e2) - Evaluate e1 yielding address a. Evaluate e2 and store wordSize byte result at address a. The following stores CONST(12) at the address of temporary 5:
MOVE( MEM( TEMP( new Temp(5)), CONST(12))
- EXP( e ) - Evaluate e and discard result.
EXP( CONST(12) )
- JUMP( e, labels ) - Jump to address e which may be literal label or a computed address. C switch( i ) statements may be implemented by arithmetic on i to compute the label from the list to jump. For dataflow analysis of Chapter 17, all possible e locations must be specified. The common case of jumping to a known label is:
JUMP( new Label( "printInt") )
- CJUMP( o, e1, e2, t, f ) - Compare evaluation of e1 and e2 using relation operator o; if result is true jump to address t else jump to address f. To jump to while when 3 < 5 is true and endwhile when false:
CJUMP( CJUMP.LT, CONST(3), CONST(5), new Label("while"), new Label("endwhile"))
- SEQ( s1, s2 ) - Statement s1 followed by statement s2; the purpose of SEQ is to construct the IR tree, each SEQ defines a parent node of two child statements.
SEQ
/ \
s1 s2The following does nothing useful:
SEQ( EXP( CONST(12) ), EXP( CONST(12) ) );
The following is invalid because CONST is an Exp rather than a Stm:
SEQ( CONST(12), CONST(12) );
This initializes temporary 4 to integer 12 and jumps when less than 30 to while otherwise to endwhile
SEQ(
MOVE(TEMP(new Temp(4)), CONST(12)),
CJUMP( CJUMP.LT, TEMP(new Temp(4)), CONST(30), new Label("while"), new Label("endwhile"))
)- LABEL( n ) - Constant value of name n is the current machine code address, a label definition in assembly language. NAME( n ) may be used for jump, calls, etc. target. Defining a while of:
while( 2 < 1 )
Temp4 = 5;
// Assumes while, T and F are defined as new Labels(...)
SEQ(
SEQ(
SEQ(
LABEL( while ),
CJUMP(Tree.CJUMP.LT, CONST(2), CONST(1), T, F))
),
SEQ(
LABEL( T ),
SEQ(
MOVE(TEMP(new Temp(4)), CONST(5)),
JUMP( while )
)
)
),
LABEL( F )
)SEQ
/ \
SEQ LABEL(F)
/ \
SEQ SEQ
/ \ / \
LABEL(while) CJUMP LABEL(T) SEQ
/ \
MOVE JUMP(while)
TREE of SEQs
An AST of a program is a tree, so is an IR of the program. SEQs serve to construct the tree as in the above while implementation.
IR Instruction Implementation package Tree; abstract class Exp{}; public class CONST extends Exp {
public CONST(int v);public class NAME extends Exp {
public NAME(Label l) ;public class TEMP extends Exp {
public TEMP(Temp t)public class BINOP extends Exp {
public BINOP(int b, Exp l, Exp r);
public final static int PLUS=0, MINUS=1, MUL=2, DIV=3,
AND=4,OR=5,LSHIFT=6,RSHIFT=7,
ARSHIFT=8,XOR=9;public class MEM extends Exp {
public MEM(Exp e)
public class CALL extends Exp {
public CALL(Exp f, ExpList a)
public class ESEQ extends Exp {
public ESEQ(Stm s, Exp e)abstract class Stm{}; public class MOVE extends Stm {
public MOVE(Exp d, Exp s)
public class EXP extends Stm {
public EXP(Exp e)
public class JUMP extends Stm {
public JUMP(Exp e, LabelList t)
public JUMP(Label target)
public class CJUMP extends Stm {
public CJUMP(int rel, Exp l, Exp r, Label t, Label f)
public final static int EQ=0, NE=1, LT=2, GT=3, LE=4,
GE=5, ULT=6, ULE=7, UGT=8, UGE=9;
public class SEQ extends Stm {
public SEQ(Stm l, Stm r)
public class LABEL extends Stm {
public LABEL(Label l)public class ExpList extends Exp {
public ExpList(Exp head, ExpList tail)public class StmList {
public StmList(Stm h, StmList t)
Note
To improve readability of all IR examples, each class constructor is invoked via a static method similar to the following:
private static Tree.Exp CONST(int value) { return new Tree.CONST(value); } so that we can use:
CONST( 3 )
rather than:
new CONST( 3 )
7.2 TRANSLATION INTO TREES
Example - To get a feel for translating from AST into IR consider the expression 3 + 4.
Translation requires that each code producing AST node define the corresponding IR code; the result of translating 3 + 4 from AST to IR is one branch of an IR code tree. With most of the AST translating one-to-one IR, the IR tree will have the same structure but a different representation language, one more amenable to analysis and eventual machine language code generation.
3 + 4 public void caseAPlusExp(APlusExp node) {
node.getL().apply(this);
node.getR().apply(this);
// BINOP (PLUS, Exp(L), Exp(R) )
setExp(node,
new Ex(
BINOP(
Tree.BINOP.PLUS,
getExp(node.getL()).unEx(),
getExp(node.getR()).unEx()
)
)
);
}AST
APlusExp
L / \ R
/ \
AIntegerLiteral AIntegerLiteral
| |
3 4
IR
BINOP
/ | \
BINOP.PLUS CONST(3) CONST(4)MiniJava clearly differentiates between expressions and statements so there is not much need for such translation. However, C and many other languages have statements that behave as expression and vice-versa:
if( (a=3) > 4) ....
Translating Exp's to Stm's - In translating, there are 3 main types of expressions to consider:
- Ex an expression represented as a Tree.Exp
- Nx no result represented by a Tree.Stm
- Cx a conditional represented by a Tree.Stm
Exp - The base Exp translating class:
package Translate;
public abstract class Exp {
abstract Tree.Exp unEx();
abstract Tree.Stm unNx();
abstract Tree.Stm unCx(Temp.Label t, Temp.Label f);
}Nx - We'll first examine the simple Nx class translation.
Given a Stm the constructor returns null when translating to Ex or Cx and simply returns the Stm for a Nx.
package Translate;
public class Nx extends Exp {
Tree.Stm stm;
Nx(Tree.Stm s) { stm = s; }
Tree.Exp unEx() { return null; }
Tree.Stm unNx() { return stm; }
Tree.Stm unCx(Temp.Label t, Temp.Label f) { return null; }
}
These aren't good examples but the following does show that Nx prevents nonsense translations:
Translate.Nx a = new Nx(MOVE(TEMP(new Temp()), CONST(1)));
Tree.Exp e = a.unEx(); // e = null
Tree.Stm s1 = a.unCx(); // s1 = null
Tree.Stm s2 = a.unNx(); // s2 = MOVE(TEMP(new Temp()), CONST(1))
A more realistic example demonstrates the implementation of the while.
The while is a Nx so the node is new Nx( ... )
The body of the while must be a Stm of Nx so the following simply returns the body:
body.unNx();
Remember that Ex, Nx and Cx define nodes that can, in some cases be translated from an Ex to an Nx or Cx, etc. This is important in languages such a C which blur statements, expressions and conditionals but not for MiniJava.
public void caseAWhileStatement(AWhileStatement node) {
node.getExp().apply(this);
node.getStatement().apply(this);
Label test = new Label();
Label T = new Label();
Label F = new Label();
Exp exp = getExp(node.getExp());
Exp body = getExp(node.getStatement());
setExp(node,
new Nx(
SEQ(
SEQ(
SEQ(
LABEL(test),
CJUMP(Tree.CJUMP.EQ, exp.unEx(), CONST(1),T,F)
),
SEQ(
SEQ(
LABEL(T),
body.unNx()
),
Tree.JUMP( test )
)
),
LABEL(F)
)
)
);
}For:
while(i < 5) i = i + 1;
The interesting parts of the while are both Exp's:exp.unEx() which represents: i < 5
The section on Ex will show how the conditional expression is implemented.
body.unNx() which represents: i = i + 1
i + 1: Is also an Ex.
i = i + 1: Assignment is a Nx done below.
Exercise
- What is the purpose of:
SEQ(
LABEL(test),
CJUMP(Tree.CJUMP.EQ, exp.unEx(), CONST(1),T,F)
)
- WHat is the purpose of:
SEQ(
LABEL(T),
body.unNx()
)
public void caseAAssignStatement(AAssignStatement node) {
node.getId().apply(this);
node.getExp().apply(this);
Exp e1 = getExp(node.getId());
Exp e2 = getExp(node.getExp());
if (e1.unEx() instanceof Tree.TEMP)
setExp(node, new Nx(MOVE (e1.unEx(), e2.unEx())));
else {
Temp z = new Temp(0);
setExp(node,
new Nx(
MOVE(
MEM(
BINOP(
Tree.BINOP.PLUS,
TEMP(z),
e1.unEx()
)
),
e2.unEx()
)
)
);
}
}Assignment can be to a TEMP (temporary or register) or MEM (memory).
- TEMP use MOVE( TEMP t, e) - Evaluate e and move result to temporary t.
- MEM use MOVE( MEM(e1), e2) - Evaluate e1 yielding address a. Evaluate e2 and store wordSize byte result at address a.
For x = 5
Temp z = new Temp(0);
TEMP(z) + e1.unEx()
is the address of x in the current frame since x would be an offset and Temp(0) is the first allocation in the frame.
Ex - We'll examine the Ex class translation methods and try to place more in context of intermediate code generation.
package Translate;
public class Ex extends Exp {
Tree.Exp exp;
Ex(Tree.Exp e) { exp = e; }Tree.Exp unEx() { return exp; }
Tree.Stm unNx() { return new Tree.EXP(exp); }
Tree.Stm unCx(Label t, Label f) {
if (exp instanceof Tree.CONST) { // if exp is a constant, emit JUMP statement.
Tree.CONST c = (Tree.CONST)exp;
if (c.value == 0)
return new Tree.JUMP(f);
else
return new Tree.JUMP(t);
}
return new Tree.CJUMP(Tree.CJUMP.NE, exp, new Tree.CONST(0), t, f);
}
}
- unEx is straightforward since it merely returns the exp.
- unNx is also as Tree.EXP( exp ) evaluates exp and discards the results; the effect is to convert a Tree.Exp into a Tree.Stm. Not useful in MiniJava.
- unCx translates an Tree.Exp to a conditional Tree.Stm, given two labels to branch on true or false. For a contrived example:
Tree.Exp e = new Ex( CONST(12) );
Tree.Stm s = e.unCx(new Label("isTrue"), new Label("isFalse"));
s would now be: JUMP("isTrue")
Again, not terribly useful.
However, relations (conditional expressions) are implemented as Ex objects. The < has the following pseudo code that parallels the implementation:
t = 0
if left < right goto T else goto F
T: t = 1
F:
public void caseALtExp(ALtExp node) {
node.getL().apply(this);
node.getR().apply(this);
Exp expl= getExp(node.getL());
Exp expr= getExp(node.getL());
Label T = new Label();
Label F = new Label();
Temp t = new Temp();
setExp(node,
new Ex(
ESEQ(
SEQ(
SEQ(
SEQ(
MOVE(TEMP(t),CONST(0)),
CJUMP(Tree.CJUMP.LT, expl.unEx(), expr.unEx(), T, F)
),
SEQ(
LABEL(T),
MOVE(TEMP(t),CONST(1))
)
),
LABEL(F)
),
TEMP(t)
)
)
) ;
}Cx - Will be examined later in the context of conditionals.
Exercise - What is the purpose above of:
SEQ(
MOVE(TEMP(t),CONST(0)),
CJUMP(Tree.CJUMP.LT, expl.unEx(), expr.unEx(), T, F)
)
SIMPLE VARIABLES
InFrame
Variables declared in the current frame are represented by offsets from the current frame pointer.
Primitive data types
Recall that frames hold parameters and local variables which may be primitive (int) or an object (reference). It matters when accessing the variable: a primitive value is in the frame, an object holds a memory reference to structure.
Frame variables are accessed by:
frame pointer + offset(variable)
With fp (frame pointer) at address 101 and variable v at offset 2, address 103 is accessed:
fp + offset(v) = 101 + 2 = 103
The value of a primitive variable v is then accessed by MEM:
MEM(v) = MEM(fp + offset(v)) = MEM(103) = 100
Offset Frame MEM address
fp+0 fp+1 fp+2 v fp+3
34 104 106 100 43 23 121
100 101 102 103 104 105 106 Since offset(v) is fixed at compile time, the computation to access primitive v is then:
MEM(fp+offset(v)) = MEM(fp+2)
or in IR:
MEM(
BINOP(
BINOP.PLUS,
TEMP(fp),
CONST(2)
)
)where fp is the frame pointer register. You might recall the Frame class holds the definition of the frame pointer in terms of the specific target architecture.
Assignment to an InFrame Primitive
v = 3
Offset Frame MEM address
fp+0 fp+1 fp+2 v fp+3
34 104 106 100 43 23 121
100 101 102 103 104 105 106 MOVE(
MEM(
BINOP(
BINOP.PLUS,
TEMP(fp),
CONST(2)
)
),
CONST(3)
)References
References must be dereferenced. The general method where v is a reference in the current frame requires nested MEM:
MEM(MEM(v)) = MEM(MEM(fp + offset(v))) = MEM(MEM(103)) = MEM(100) = 34
or in IR:
MEM(
MEM(
BINOP(
BINOP.PLUS,
TEMP(fp),
CONST(2)
)
)
)Objects
Offset Frame MEM address
fp+0 this fp+1 fp+2 v x y f
34 104 106 100 43 23 121
100 101 102 103 104 105 106 Objects are references to structures that can only be validly dereferenced through this. Consider an object defined by:
class B { int x, y, f; }
The general method to dereference a field f where this is at the current frame offset 0 and the offset( f ) is relative to the starting address of the B object:
MEM(MEM(f)) = MEM(MEM(fp)+offset(f)) = MEM(MEM(101)+2) = MEM(104+2) =MEM(106) = 34
or in IR:
MEM(
BINOP(
BINOP.PLUS,
MEM(TEMP(fp)),
CONST(Offset(f))
)
)InReg
Registers are TEMP's and are not in the frame. TEMP variable a is then:
TEMP( a )
Translating variable access
Access by variable ID is then a matter of generating the correct IR node, either a TEMP or CONST for the offset.
public void caseAIdExp(AIdExp node) {
SimpleExp se = currMethod.getVar( node.getId().toString()).getExp();
if (se instanceof Temp) {
setExp(node, new Ex(
TEMP((Temp)(se))));
return;
}
else
if (se instanceof Offset) {
setExp(node, new Ex(
CONST( ((Offset)se).value())));
return;
}
setExp(node, null);
}
ARRAY VARIABLES
Some languages (Pascal) define an array variable where on assignment, the values of one array are copied from the right-side variable to the left-side variable. The following copies the 5 values of b to a.
var a, b : array[0..4] of integer;
begin
a := b;
endC defines pointers where arrays are constant pointers which cannot be assigned a new reference.
Not legal int a[5], b[5];
a = b;
Legal - a aliases b int *a, b[5];
a = b;
MiniJava defines arrays as variable pointers which can be assigned a new reference. Remember that every non-primitive is a pointer.
a aliases b int [] a;
int [] b;
b = new int[5];a = b;
STRUCTURED L-VALUES
- l-value result of an expression that can occur on left-side of assignment statement; denotes a location that can be assigned to.
Valid x = 3
a[i] = 3
x = a[i] + 3
- r-value can only appear on right-side of assignment.
Valid x = 3
Invalid 3 = x
Structured l-values
- scalar is a value of one component such as a pointer, int, etc. MiniJava all variables and l-values are a scalar; array and all other objects (non-primitives) are pointers, all occupy one memory location. The location of a scalar v is:
MEM(fp+offset(v))
- structure is a multi-component value such as a C struct. Each value occupies multiple memory locations. The MEM for structures must include the size as for structure s:
MEM(fp+offset(v), size(s) )
typedef struct List {
struct List *Next;
int Value;
} Link;Assuming that pointers and ints occupy one word each the size(Link) is 2.
SUBSCRIPTING AND FIELD SELECTION
C arrays are constant pointers (fixed addresses). Calculating the address of a one-dimensional array subscript based at 0 in C depends on the word size of the array components:
int a[5];
&a[i] = &a + i
For word size greater than 1 the computation is:
&a[i] = &a + i*word size
Java arrays are variable pointers which must be dereferenced. Calculating the address of a one-dimensional array subscript based at 0 in Java also depends on the word size of the array components (using the C meaning of * for dereferencing):
int [] a = int[5]; &a[i] = *a + i
145 245 765 476 189
100 101 102 103 104
100 a
For word size greater than 1 the computation is:
&a[i] = *a + i*wordsize
Translation of array variable subscripts then depends upon the language use of arrays. Java arrays are objects or a pointer that requires a dereference for each access or, more likely, the pointer reference would be copied to a register.
Access to a non-indexed object is:
MEM(MEM(v)) = MEM(MEM(fp + offset(v))) = MEM(MEM(103)) = MEM(100) = 34
or in IR:
MEM(
MEM(
BINOP(
BINOP.PLUS,
TEMP(fp),
CONST(2)
)
)
)
Offset Frame MEM address
fp+0 fp+1 fp+2 v fp+3
34 21 78 100 101 23 121
100 101 102 103 104 105 106 Array access is similar to regular objects with the addition of an index; for an InFrame array a[2] of wordsize 1:
MEM(MEM(a)+i) = MEM(MEM(fp + offset(a))+i) = MEM(MEM(105)+i) = MEM(102+2) = MEM(104) = 87
or in IR:
MEM(
BINOP(
BINOP.PLUS,
MEM(
BINOP(
BINOP.PLUS,
TEMP(fp),
CONST(4)
)
),
CONST(1)
)
)
Offset Frame MEM address
fp+0 this fp+1 a[0] fp+2 a[1] fp+3 a[2] fp+4 a
34 201 78 34 87 102 121
100 101 102 103 104 105 106 Translation from AST to IR for wordsize 4 is:
public void caseAArrayindexExp(AArrayindexExp node) {
node.getArray().apply(this);
node.getIndex().apply(this);
Temp t_index = new Temp();
Temp t_size = new Temp();
Tree.Exp e1 = getExp(node.getArray()).unEx();
Tree.Exp e2 = getExp(node.getIndex()).unEx();
Label F = new Label();
Label T = new Label();
LinkedList argList = new LinkedList();
Tree.ExpList args = new Tree.ExpList(argList);
Tree.Stm s1 =
SEQ(
SEQ(
SEQ(
SEQ(
SEQ(
MOVE(TEMP(t_index),BINOP(Tree.BINOP.MUL,e2,CONST(4))),
MOVE(TEMP(t_size),MEM(e1))
),
CJUMP(Tree.CJUMP.GE,TEMP(t_index),TEMP(t_size),T,F)
),
LABEL(T)
),
MOVE(TEMP(new Temp()), CALL(NAME(new Label("_error")),args))
),
LABEL(F)
);
Temp t = new Temp();
Stm s2 =
SEQ(
s1,
MOVE(TEMP(t),
MEM(
BINOP(BINOP.PLUS,
e1,
BINOP(BINOP.PLUS,
BINOP(BINOP.MUL, e2, CONST(4)),
)
)
)
)
);
setExp(node, new Ex(ESEQ(s2,TEMP(t))));
}INDEX CHECKING
Out of range index errors are a source of many program failures and buffer over-run is exploited by hackers. The following excerpt from the above IR tests that the address computed is within the range of some array addresses.
CJUMP(Tree.CJUMP.GE,TEMP(t_index),TEMP(t_size),T,F)
Exercise
- What boundary condition does the above check?
- Is the other boundary condition checked?
ARITHMETIC
IR arithmetic is a simple translation.
Though no IR unary operators, most can be implemented using binary; unary negation is subtraction from 0. An example:
3 + 4 public void caseAPlusExp(APlusExp node) {
node.getL().apply(this);
node.getR().apply(this);
//BINOP (PLUS, Exp(L), Exp(R) )
setExp(node,
new Ex(
BINOP(
BINOP.PLUS,
getExp(node.getL()).unEx(),
getExp(node.getR()).unEx()
)
)
);
}AST
APlusExp
L / \ R
/ \
AIntegerLiteral AIntegerLiteral
| |
3 4
IR
BINOP
/ | \
BINOP.PLUS CONST(3) CONST(4)
CONDITIONALS
We have already seen the while translation.
The conditional CJUMP must evaluate expression exp.unEx() to 1 in order to execute the body; the expression most often is a conditional such as x < 4 && y > 3.
The effect is to create a large number of labels and some rather inefficient branching. Think of the IR for each less than expression and the separate branching required by each.
public void caseAWhileStatement(AWhileStatement node) {
node.getExp().apply(this);
node.getStatement().apply(this);
Label test = new Label();
Label T = new Label();
Label F = new Label();
Exp exp = getExp(node.getExp());
Exp body = getExp(node.getStatement());
setExp(node,
new Nx(
SEQ(
SEQ(
SEQ(
LABEL(test),
CJUMP(Tree.CJUMP.EQ, exp.unEx(), CONST(1),T,F)
),
SEQ(
SEQ(
LABEL(T),
body.unNx()
),
Tree.JUMP( test )
)
),
LABEL(F)
)
)
);
}Cx - an abstract class that must be inherited and the unCx method over-ridden; there is no Cx constructor. It is designed to simplify and formalize the IR code for conditionals.
RelCx - translates conditionals. Its purpose is to extend Cx class by defining an unCx with the CJUMP needed by the unEx and unNx methods.
The following has been modified for readability; new Tree. has been removed.
package Translate;
public abstract class Cx extends Exp {
Tree.Exp unEx() {
Temp r = new Temp();
Label t = new Label();
Label f = new Label();
return ESEQ(
SEQ(
MOVE( TEMP(r), CONST(1)),
SEQ(
unCx(t, f),
SEQ(
LABEL(f),
SEQ(
MOVE(TEMP(r), CONST(0)),
LABEL(t)
)
)
)
),
TEMP(r)
);
}
abstract Tree.Stm unCx(Label t, Label f);
Tree.Stm unNx() {
Label join = new Label();
return SEQ(
unCx(join, join),
LABEL(join)
);
}
}package Translate;
public class RelCx extends Cx {
int op;
Tree.Exp left, right;
RelCx(int o, Tree.Exp l, Tree.Exp r) {
op = o;
left = l;
right = r;
}
Tree.Stm unCx(Label t, Label f) {
return CJUMP(op, left, right, t, f);
}
}
IfThenElseExp - Extends Exp to make use of the RelCx. It is listed below but will not be discussed.
package Translate;
public class IfThenElseExp extends Exp {
Exp cond, a, b;
Label t = new Label();
Label f = new Label();
Label join = new Label();
IfThenElseExp(Exp cc, Exp aa, Exp bb) {
cond = cc;
a = aa;
b = bb;
}
private static Tree.Stm SEQ(Tree.Stm left, Tree.Stm right) {
if (left == null)
return right;
if (right == null)
return left;
return new Tree.SEQ(left, right);
}
private static Tree.LABEL LABEL(Label l) { return new Tree.LABEL(l); }
private static Tree.Exp ESEQ(Tree.Stm stm, Tree.Exp exp) {
if (stm == null) return exp;
return new Tree.ESEQ(stm, exp);
}
private static Tree.Stm MOVE(Tree.Exp dst, Tree.Exp src) {
return new Tree.MOVE(dst, src);
}
private static Tree.Stm JUMP(Label l) { return new Tree.JUMP(l); }
private static Tree.Exp TEMP(Temp t) { return new Tree.TEMP(t); }
Tree.Stm unCx(Label tt, Label ff) {
Tree.Stm aStm = a.unCx(tt, ff);
if (aStm instanceof Tree.JUMP) {
Tree.JUMP aJump = (Tree.JUMP)aStm;
if (aJump.exp instanceof Tree.NAME) {
Tree.NAME aName = (Tree.NAME)aJump.exp;
aStm = null;
t = aName.label;
}
}
Tree.Stm bStm = b.unCx(tt, ff);
if (bStm instanceof Tree.JUMP) {
Tree.JUMP bJump = (Tree.JUMP)bStm;
if (bJump.exp instanceof Tree.NAME) {
Tree.NAME bName = (Tree.NAME)bJump.exp;
bStm = null;
f = bName.label;
}
}
Tree.Stm condStm = cond.unCx(t, f);
if (aStm == null && bStm == null)
return condStm;
if (aStm == null)
return SEQ(condStm, SEQ(LABEL(f), bStm));
if (bStm == null)
return SEQ(condStm, SEQ(LABEL(t), aStm));
return SEQ(condStm,
SEQ(SEQ(LABEL(t), aStm),
SEQ(LABEL(f), bStm)));
}
Tree.Exp unEx() {
Tree.Exp aExp = a.unEx();
if (aExp == null)
return null;
Tree.Exp bExp = b.unEx();
if (bExp == null)
return null;
Temp r = new Temp();
return ESEQ(SEQ(SEQ(cond.unCx(t, f),
SEQ(SEQ(LABEL(t),
SEQ(MOVE(TEMP(r), aExp),
JUMP(join))),
SEQ(LABEL(f),
SEQ(MOVE(TEMP(r), bExp),
JUMP(join))))),
LABEL(join)),
TEMP(r));
}
Tree.Stm unNx() {
Tree.Stm aStm = a.unNx();
if (aStm == null)
t = join;
else
aStm = SEQ(SEQ(LABEL(t), aStm), JUMP(join));
Tree.Stm bStm = b.unNx();
if (bStm == null)
f = join;
else
bStm = SEQ(SEQ(LABEL(f), bStm), JUMP(join));
if (aStm == null && bStm == null)
return cond.unNx();
Tree.Stm condStm = cond.unCx(t, f);
if (aStm == null)
return SEQ(SEQ(condStm, bStm), LABEL(join));
if (bStm == null)
return SEQ(SEQ(condStm, aStm), LABEL(join));
return SEQ(SEQ(condStm, SEQ(aStm, bStm)), LABEL(join));
}
}IfStatement
The regular IR for the AST AIfStatement node is surprisingly simple. For an if statement of:
if (a < 3 && x < a) x = 2 else a = 3;
- exp: a < 3 && x < a which evaluates to 0 or 1
- stmT: x = 2
- stmF: a = 3
public void caseAIfStatement(AIfStatement node) {
node.getExp().apply(this);
node.getTrue().apply(this);
node.getFalse().apply(this);
Label T = new Label();
Label F = new Label();
Label D = new Label();
Exp exp = getExp(node.getExp());
Exp stmT = getExp(node.getTrue());
Exp stmF = getExp(node.getFalse());
setExp(node,
new Nx(
SEQ(
SEQ(
SEQ(
SEQ(
CJUMP(Tree.CJUMP.EQ, exp.unEx(), CONST(1),T,F),
SEQ(
LABEL(T),
stmT.unNx()
)
),
JUMP(D)
),
SEQ(
LABEL(F),
stmF.unNx()
)
),
LABEL(D)
)
)
);
}
STRINGS
String literals are implemented as a constant address of initialized memory. In assembly language, a label would be generated for the string. For example, Intel assembly:
str1: db "Hello World"
In Java, a string is an object so that:
String s = "Hello World";
assigns s the address of the location of "Hello World" string.
The translator needs to make a new label for each literal by: Tree.NAME( new Label("str1") ) and generate an assembly language (not IR Tree) code fragment to reserve and initialize a block of memory with the string.
RECORD AND ARRAY CREATION
Primitive scalar variables can be dynamically allocated on the stack as part of the frame.
Records and arrays (objects) can outlive the method in which created to be accessed after the method has completed; the frame is deallocated on method completion so cannot be used for references. Consider:
int [] f() { int [] a = new int[5];
:
:
return a;
}
145 245 765 476 189 length 5
100 101 102 103 104
100 a
Heap - an area of static memory allocated for non-frame data.
Array creation
- Determine space requirements. Example requires 5 words+1 for length field of array.
- Call external (not IR) function to allocate heap space and return address of first location.
- Generate code for initializing array values to 0.
- Generate code for saving array length at offset 0.
The AST to IR translation for new int [ exp ] is:
public void caseANewarrayExp(ANewarrayExp node) {
node.getExp().apply(this);
Temp t1 = new Temp(); // array offset 0 address
Temp t2 = new Temp();
Label cj = new Label();
Label F = new Label();
Label T = new Label();
Exp exp1 = getExp(node.getExp());
// 1. Determine array size = exp + 1 for wordsize 1
Tree.Exp size =
BINOP(
Tree.BINOP.MUL,
BINOP(Tree.BINOP.PLUS,exp1.unEx(),CONST(1)),
CONST(1)); // wordsize 1
// 2. Call external _halloc get pointer to space allocated in t1
LinkedList argList = new LinkedList();
argList.add(size);
Tree.ExpList args = new Tree.ExpList(argList);
Tree.Stm s1 = // s1 allocates
MOVE(
TEMP(t1), // t1 is start of array
CALL(NAME(new Label("_halloc")), args));
// 3. Initialization
Tree.Stm s2 = // s2 initializes
SEQ(
SEQ(
SEQ(
SEQ(
SEQ(
SEQ(
MOVE(TEMP(t2),CONST(1)), // start at offset 1
SEQ (
LABEL(cj),
CJUMP(Tree.CJUMP.LT,TEMP(t2),size,F,T)
)
),
LABEL(T)
),
MOVE(
MEM(
BINOP(
Tree.BINOP.PLUS,
TEMP(t1),
TEMP(t2)
)
),
CONST(0) // initialize to 0
)
),
MOVE(
TEMP(t2),
BINOP(
Tree.BINOP.PLUS,
TEMP(t2),
CONST(1) // wordsize 1
)
)
),
JUMP(cj)
),
// 4. array length in array offset 0
SEQ(
LABEL(F),
MOVE(
MEM(
TEMP(t1)),
BINOP(
Tree.BINOP.MUL,
exp1.unEx(),
CONST(1) // wordsize 1
)
)
)
);
setExp(node,
new Ex(
ESEQ( // s1 allocates
SEQ(s1,s2), // s2 initialaizes
TEMP(t1) // t1 holds heap reference
)
)
);
}Calling runtime-system functions
The above presents an example of calling an external memory allocation function. To allocate size locations where size is an IR Exp:
MOVE(
TEMP(t1),
CALL(NAME(new Label("_halloc")), new ExpList(size, null))
);Exp size = BINOP(BINOP.PLUS,exp1.unEx(),CONST(1)); // for wordsize 1 which stores the returned address to t1, _halloc would likely be a C-library function.
WHILE LOOPS
MiniJava IR while has been examined. The more general case that allows break statement would require a label at the end of the while (e.g. done) to branch to unconditionally on execution of the break.
test:
if (!condition) goto done
body
goto test
done:
FOR LOOPS
Can be implemented as a while but fails on boundary conditions that exceed the control variable's representation. We have seen the translation of AST to IR while's. The straightforward translation of for to while is:
for( i=lo; i<=hi; i++)
bodyi=lo
while( i <= hi)
body
i++;
FUNCTION CALL
Function calls are:
CALL( NAME( new Label( "f"), new ExpList( exp1, exp2, null) )
Method calls require this as an argument and the class and method name be combined to uniquely define the method name:
CALL( NAME( new Label( "class_method"), new ExpList( this, exp1, exp2, null) )
The class_method label would need to correspond to that of a method declaration which defined the IR to execute.
STATIC LINKS - Skip
7.3 DECLARATIONS
VARIABLE DEFINITION
- Space is allocated in the frame for each variable declaration within a method.
- Initialization is a Tree expression placed prior to the body of the method, essentially assignment statement IR.
- MiniJava does not allow variable initialization so we're off the hook.
FUNCTION DEFINITION
prologue public void bm1(int a, int b, int c) { .method public bm1(III)V
.limit stack 1
.limit locals 4body b = 12; MOVE(MEM(BINOP(PLUS, fp, CONST(2))), CONST(12)) epilogue } return
.end methodprologue - in target assembly or machine language.
- pseudo-instructions needed to announce the beginning of a method
- method name label
- allocate new frame
- instructions to save 'escaping' arguments into frame and move 'non-escaping' arguments to registers
- save any registers, including return address
Example - JVM
.method public bm1(III)V
.limit stack 1
.limit locals 4body
- in IR
Example - IR
MOVE(MEM(BINOP(PLUS, fp, CONST(2))), CONST(12)) epilogue
- return value
- restore registers
- deallocate frame
- return
- pseudo-instructions for method end
Example - JVM
return
.end method
FRAGMENTSMethods consist of:
- body: The IR code. Method bodies are IR code fragments that will eventually be translated into assembly language
- frame: Parameters and local variables. The prologue and epilogue are part of the frame as machine-specific code.
Method body fragments and frame pairs are placed in a list.
Assembly code translation will walk the list generating method assembly code.
Code to add <frame, body> pair to fragment list.
- procEntryExit called with the IR code body fragment to pair with the corresponding frame.
private LinkedList frags = new LinkedList();
public void procEntryExit(Exp body) {
Frame.Frame myframe = level.frame;
Exp bodyExp = body.unEx();
Stm bodyStm;
if (bodyExp != null)
bodyStm = MOVE(TEMP(myframe.RV()), bodyExp);
else
bodyStm = body.unNx();
ProcFrag frag = new ProcFrag(bodyStm, myframe);
frags.add(frag);
}
public Iterator getResults() {
return frags.iterator();
}
CLASSES AND OBJECTS
class Vehicle {
int position;
int gas;
int move( int x ) {
this.position = x;
}
}
100
position
0gas
0Vehicle a = new Vehicle()
a
100
200
position
0gas
0Vehicle b = new Vehicle()
b
200 new Vehicle() allocates and initializes a new Vehicle object.
- allocate heap space for instance variables (position and gas), parameters and locals are in frame or registers.
- initialize by iterating through allocated memory
Similar to record and array creation.
Methods and the this pointer
Vehicle a = new Vehicle();
a.move( 5 );
- this is passed as a method parameter.
int move( int x ) {
this.position = x;
}Exercise - What is the value of this in above example?
JVM provides a new method to allocate and initialize objects.
this is the first method parameter at offset 0 of frame.
.method public static main([Ljava/lang/String;)V
.limit stack 2
.limit locals 1
new Vehicle // object address
bipush 5 // x parameter
invokevirtual Vehicle/move(I)I
return
.end method.method public move(I)V
.limit stack 3
.limit locals 2 // putfield Frame
aload 0 // this @ 0
iload 1 // x @ 1
putfield Vehicle/position I
return
.end method
class Vehicle {
int position;
int gas;
int move( int x ) {
this.position = x;
}
}
100
position
0gas
0Vehicle a = new Vehicle()
a
100
200
position
0gas
0Vehicle b = new Vehicle()
b
200 Accessing variables
frame - offsets from start of frame
- this is at offset 0
- arguments and local variables are in the frame
- arguments first followed by locals
fields - class and field name from symbol table
- access via method call to getfield or putfield
- this first argument