Chapter 9
|
OVERVIEW
Each IR tree node expresses one operation.
Most real architectures can perform several in one instruction as at right.
TREE PATTERNS
- Instruction selection - determining machine instructions to implement an IR tree
- Tree pattern - the IR tree fragment that corresponds to a valid instruction
- Tiling - instruction selection to cover IR tree with minimal tree patterns
Jouette Architecture
- r0 = 0 always
- TEMPi = ri
- Instructions above double-line alter registers
- Instructions below double-line alter memory
Exercise
- Find a tiling that cover the IR tree at right.
- Give the effect of the tilings.
- Are other tilings possible?
|
Jouette Architecture |
|
![]() |
|
a[i] = x where i is register, a and x are frame-resident a is frame offset to reference to an array |
Jouette Architecture |
![]() |
![]() |
|
One tile covers one node |
|
![]() Exercise
|
OPTIMAL AND OPTIMUM TILINGS
Example - Optimal is not necessarily optimum. See Figure 9.2
- Suppose all instructions have cost of 1 except MOVEM which has cost of m.
- m>1 then (a) optimum, (b) optimal:
- (a) = 6
- (b) > 6
- m<1 then (a) optimal, (b) optimum:
- (a) = 6
- (b) < 6
- m=1 then (a) and (b) are optimum:
- (a) = (b)
9.1 ALGORITHMS FOR INSTRUCTION SELECTION
CISC - Complex Instruction Set Computers often accomplish multiple operations per instruction, can have large tilings
RISC - Reduced Instruction Set Computers generally accomplish few operations per instruction, have smaller tilings
Optimum/Optimal differences larger on CISC than on RISC
MAXIMAL MUNCH - Optimal tiling
Algorithm
- Start at root
- Find largest tile that fits (covers the most nodes) leaving subtrees
- Generate instruction for tiling
- Repeat on each subtree
Implementation
- Two recursive functions
- munchStm for statements
- One clause for each statement
- munchExp for expressions
- each clause will match one tile
- clauses ordered by tile preference (biggest to smallest)
- Key is finding target instruction that covers the largest tiling
- Order into cases where largest tiling matches are checked first.
- Jouette STORE and MOVEM same size tiling, choose either
- Top-down, depth-first recursion over IR tree, emit instructions after visiting subtrees
- Only instruction is emitted, registers allocated later
Example
IR tree
2. ADDI r2 ¬ r0 + 6
1. ADDI r1 ¬ r2 + 5Subtree instructions generated first
Exercise - Munch the following IR into Jouette instructions using PROGRAM 9.3
- MEM(BINOP(PLUS,e1,CONST(2)))
- MOVE(TEMP(t0), CONST(3))
Jouette Architecture
Exercise
Match each tiling of the IR tree above with the corresponding tile node at left.
DYNAMIC PROGRAMMING
- Assigns a cost to every node in tree, the sum of the cost of subtree nodes
- Bottom-up, summing cost of node n subtrees first
- Matches n with each tile kind, minimum cost match selected
- cost n = sum subtree cost + cost tiling n
- leaves - places where subtrees can be attached
- Each tile has 0 or more leaves
Example - Cost is number of tilings
cost(ADD) = 3 since total 3 the number of tilings required to cover ADD and subtrees ADDI r2 ¬ r0 + 2
ADDI r1 ¬ r0 + 1
ADD r3 ¬ r1 + r2cost(ADI) = 2 since total 2 the number of tilings required to cover ADDI and subtrees
ADDI r2 ¬ r0 + 2
ADDI r3 ¬ r2 + 1
Example - Using minimum cost of subtree from previous example
- cost(MEM) = 3
ADDI r2 ¬ r0 + 2
ADDI r3 ¬ r2 + 1
LOAD r4 ¬ M[r3 + 0]- cost(MEM) = 2
ADDI r2 ¬ r0 + 2
LOAD r4 ¬ M[r2 + 1]- cost(MEM) = 2
ADDI r1 ¬ r0 + 1
LOAD r4 ¬ M[r1 + 2]
Note that other costs are possible:
- Number of memory accesses
- Number of registers used
Exercise - Assume a tiling cost = 1
- What is the cost at 8?
- What is the cost at 5?
- What is the cost at the root given the current tiling?
- FP is a register, is the tiling at 8 correct? Explain.
- What is an alternative tiling for 2?
Jouette Architecture
TREE GRAMMARS
Skip for now
FAST MATCHING
- Maximal munch and dynamic-programming algorithms examine all tiles that match each node.
- Tile match if each non-leaf node of tile is labeled with same operator as node of tree (MEM, CONST, etc.)
- More efficient to examine only tiles with matching root label
match(n) {
switch(label(n)) {
case MEM: ...
case BINOP: ...
case CONST: ...
}For example, the MEM root has only 4 possible subtrees, examine only those rather than recursively examining every possible expression again.
In practice, visitor pattern yields similar performance; discussed at end of notes.
EFFICIENCY OF TILING ALGORITHMS
- Linear runtime of both Maximal Munch and Dynamic Programming
- Dependent on number of nodes in input tree
- In practice
- Instruction selection is very fast
- Even lexical analysis takes more time to run
MM vs DP
- Maximal Munch
- Good for RISC
- Tile size is small and of uniform cost
- Usually no difference between optimum and optimal tiling
- Dynamic Programming
- Good for CISC
- Tile size is large and of widely varying cost
- Small, but noticeable differences between optimum and
optimal tiling
9.2 CISC MACHINES
CISC RISC
- few registers (16, 8, or 6)
- heterogeneous registers (e.g. MUL result only to eAx)
- multiple addressing modes (e.g. direct MOV M, 4 and indirect MOV [eBx], 4)
- 2 address instructions (e.g. ADD M, eAx or ADD eBx, eAx)
- variable-length instructions
- multiple effect instructions (e.g. MOVS which moves and indexes 2 registers)
- 32 registers
- homogenous registers
- arithmetic between registers only
- 3 address instructions
- load and store only with M[reg+const]
- fixed-length instructions
- one result per instruction
Problems
- Few registers - continue to generate TEMP nodes and assume register allocator can handle well later
- Classes of registers - Multiply typifies problem where only eAx can be used and is two of the operands. Also hoses eDx but high-level languages generally ignore anyway. Solution is to add move instructions, for temp3 = temp1 * temp2 is:
mov eAx, temp1
mul temp2
mov temp3, eAx
- Two-address instructions - Solution is to add move instructions, for temp3 = temp1 + temp2:
mov eAx, temp1
add eAx, temp2
mov temp3, eAx
- Arithmetic instructions address memory - Fetch all operands from memory and store result back, for temp3 = temp1 + temp2:
mov eAx, temp1
mov eBx, temp2
add eAx, eBx
mov temp3, eAx
- Several addressing modes - Generally no advantage to using CISC addressing modes over RISC-like instructions. For example, the following performs the same number of memory accesses:
CISC mov eCx, n
mov eDi, offset A
cld
xor Al, Al
repnz stosbRISC-like mov eCx, n
xor eDi, eDi
xor Al, Al
do:
mov A[eDi], Al
inc eDi
loopnz doRecall that stosb does:
mov [eDi], Al
inc eDi- Variable-length instructions - Not a problem for compiler since emitting assembler instructions rather than machine code.
- Instructions with side-effects - Instructions such as stosb hard to model using tree patterns since have more than one result (the auto-increment).
Exercise - Give corresponding Intel CISC instructions for the RISC three-address instructions:
temp1 = temp0 + 4 temp2 = temp0 + 8
temp3 = temp1 * temp2
9.3 - INSTRUCTION SELECTION FOR THE MiniJava COMPILER
- Figure 9.3 only prints instruction to use, does not give registers
- Root of each tile is a result in an intermediate register
- Register allocation is assignment of register numbers to each node
- Many allocation aspects independent of target machine
- Can allocate:
- during instruction selection phase but would create duplication for each target
- before but not know which nodes require registers since only tile roots require
- after but not know which registers instructions use.
ABSTRACT ASSEMBLY LANGUAGE INSTRUCTIONS
package Assem;
public abstract class Instr {
public String assem;
public abstract Temp[] use();
public abstract Temp[] def();
public abstract Targets jumps();
public String format(TempMap m);
}public Targets(LabelList labels);
public OPER(String assem, TempList dst, TempList src, LabelList jump);
public OPER(String assem, TempList dst, TempList src);
public MOVE(String assem, Temp dst, Temp src);
public LABEL(String assem, Label label);THREE CONSTRUCTORS
- OPER( assem, dst, src ) - operations that fall through to the next instruction. jumps() method returns null
- assem - assembly language instruction
- dst - list of result registers, include any side-effects (e.g. Mul eBx has eAx and eDx as result)
- src - list of operand registers, include any hidden operands (e.g. Div eBx has dividend eDx:eAx)
- OPER( assem, dst, src, labels ) - operations that can jump to a target label
- assem - assembly language instruction
- dst - list of result registers, include any side-effects (e.g. Mul eBx has eAx and eDx as result)
- src - list of operand registers, include any hidden operands (e.g. Div eBx has dividend eDx:eAx)
- labels - list of target labels that must include next instruction if possible to fall through
- LABEL( assem, label ) - a jump target pair in assembly and internal representation
- assem - assembly language form of label (e.g. aLabel: )
- label - identifier for label used (internal to Assem package).
- MOVE( assem, dst, src ) - similar to OPER but performs only data transfer. If src and dst assigned to same registers can later be deleted.
- assem - assembly language instruction
- dst - list of result registers, include any side-effects (e.g. Mul eBx has eAx and eDx as result)
- src - list of operand registers, include any hidden operands (e.g. Div eBx has dividend eDx:eAx)
Methods
- jumps() - returns list of jump targets
- def() - returns list of dst registers
- use() - returns list of src registers
- format( m ) - formats assembly instruction as a string where m is a TempMap object with a register assignment method.
Machine independence - Assem.Instr is independent of target machine; tuned for three-address architecture.
Abstract assem format - The assem format is abstract, details of the destination and source registers listed separate from instruction. Simplifies later register assignment.
Example - IR to Jouette assembly language. Result at right after register allocation where frame.FP assigned r27.
new OPER(
"LOAD 'd0 <- M['s0+8]",new TempList( new Temp(), null ), // d0 dst
new TempList( frame.FP, null) ); // s0 src
assem LOAD 'd0 <- M['s0+8]
dst r1
src r27
Jouette code LOAD r1 <- M[r27+8]
Example - IR to Jouette assembly language. Result at right after register allocation where frame.FP assigned r27.
assem ADDI 'd0 <- 's0+3
LOAD 'd0 <- M['s0+0]
MUL 'd0 <- 's0*'s1
dst t908
t909
t910
src t87
t92
t908, t909
Possible Jouette code ADDI r1 <- r12+3
LOAD r2 <- M[r13+0]
MUL r1 <- r1*r2
Example - IR to Mips assembly.
MOVE(
MEM(
BINOP(
BINOP.PLUS,
MEM(
BINOP(
BINOP.PLUS,
TEMP(new Temp(8)),
CONST(8)
)
),
BINOP(
BINOP.MUL,
TEMP(new Temp(4)),
CONST(4)
)
)
),
MEM(
BINOP(
BINOP.PLUS,
TEMP(new Temp(8)),
CONST(10)
)
)
)
Mips assem lw `d0 10(`s0)
lw `d0 8(`s0)
sll `d0 `s0 2
add `d0 `s0 `s1
sw `s0 (`s1)
dst t46
t48
t49
t47
src t8
t8
t4
t48 t49
t46 t47
Two-address instructions - Intel and other machines have instructions of the form for arithmetic of:
opr1 = opr1 + opr2
where opr1 serves as a source and destination.
Example
assem ADD 'd0 <- 's0
dst t1
src t1, t2
PRODUCING ASSEMBLY INSTRUCTIONS - Munch IR expressions into Assem instructions
- IR is architecture neutral
- Assem targeted to Jouette
- Assem instructions produced bottom-up from IR tree (root depends on children executed first)
- Temp munchExp( Exp e ) - Munch IR expressions into Assem instructions that produce result in a Temp
- void munchStm( Stm s ) - Munch IR statements into Assem instructions
- TempList L( Temp h, TempList t ) - TempList constructor method
- void emit( Assem.Instr inst ) - collects list of instructions in order emitted (i.e. first instruction emitted at head)
![]() |
![]() |
Example
Temp munchExp( MEM( BINOP( PLUS, e, CONST(i) ))) {
Temp r = new Temp();
emit( new OPER(
"LOAD 'd0 <- M['s0+" + i + "]",L( r, null ), // d0 dst
L( munchExp( e ), null) )); // s0 src
return r;}
Possible code LOAD r1 <- M[r27+8]

PROCEDURE CALLS
- trashes certain registers
- caller-save
- return-address
- return-value
- should be listed in calldefs list as destinations of CALL for later compiler phases
void f( parameters ) { return; } EXP( CALL( f, args ) )
type f( parameters ) { return type; } MOVE (TEMP t, CALL( f, args ) )
void munchStm( EXP( CALL( e, args ) )) { Temp r = new Temp();
TempList l = munchArgs( 0, args );
emit( new OPER( "CALL 's0", calldefs, L(r, l) ));
}
IF THERE IS NO FRAME POINTER
- Not all architectures have frame pointer
- Those that do copy stack pointer to frame pointer, then increment stack pointer by frame size
- Tree language uses FP+k to reference InFrame variables
- Those without use stack pointer + frame size as a virtual frame pointer
- saves time of copy to FP and increment of SP
- saves use of register as FP
- Codegen must replace FP+k with SP+k+frame size but cannot know frame size at this point, because frame size depends upon number of parameters placed in registers or on stack, not know until after register assignment.
- Codegen constructed with frame argument that would seem to need to calculate the frame size; however the author suggests:
- codegen knows at what label a function is emitted
- can replace FP+k with SP+k+label_framesize
- assume the function prologue will add definition label_framesize of generated by FRAME.procEntryExit3
SIMPLE MaximalMunch for Jouette
The following examples call the munch of IR at left producing the Jouette code at right:
IR Jouette new MaximalMunch(MEM(BINOP(BINOP.PLUS,TEMP(new Temp()),CONST(1)))); LOAD t1 <- M[t0+1] new MaximalMunch(MEM(TEMP(new Temp(3)))); LOAD t2 <- M[t3] new MaximalMunch( CONST(1)); ADDI t3 <- 1 new MaximalMunch(BINOP(BINOP.PLUS,TEMP(new Temp()),TEMP(new Temp()))); ADD t6 <- t4+t5 new MaximalMunch(MOVE(MEM(CONST(2)),CONST(3))); ADDI t10 <- 3
STORE M[2] <- t10Visitor pattern
MaximalMunch has been implemented somewhat differently from the text, using a visitor pattern rather than polymorphic recursive calls to explicit munch methods. The visitor pattern has been used heavily in this course, we have seen the benefits of having a common interface between the generic AST generated by SableCC and our interpretation (e.g. type-checking, code generation). Visitor patterns require two parts, SableCC generated and we the other. Now to see the priniciples of both.
Visitor patterns have 2 parts, for syntax-trees:
- Each syntax-tree element (SableCC's APlusExp) must implement an accept method (SableCC's apply). It passes control back to the appropriate method of the visit (SabelCC's caseAPlusExp).
- A visit method, as part of its work (e.g. type-checking) calls accept on a syntax-tree element. The accept then calls the visit method for that element.
Control then passes back-and-forth between accept and visit.
In the following example:
- MunchVisitor - defines the methods that must be implemented (this is not a complete list).
- MaximalMunch - implements the interface by defining a visit method for each MunchVisitor method.
- new MaximalMunch(MOVE ... ); - Constructs a MaximalMunch and passes a MOVE Stm to the constructor.
- public MaximalMunch(Stm s){ s.accept(this); } - Constructor invoked by MOVE Stm so s=MOVE; s.accept(this) is then essentially MOVE.accept(this) which invokes the MOVE accept method.
- public void accept(MunchVisitor v) - invokes the visitor method implemented by the MunchVisitor implementor MaximalMunch; invoked by:
v.visit(this).
The invocation order is listed below.
public interface MunchVisitor { public Temp visit(CONST n);
public void visit(MOVE n);
public Temp visit(MEM n);
}public class MaximalMunch implements MunchVisitor { public MaximalMunch(Stm s){ s.accept(this); } 2.
public MaximalMunch(Exp e){ e.accept(this); }
public void visit(MOVE s) { ... } 4.
public Temp visit(MEM n) { ... }
}
new MaximalMunch( 1.
MOVE(
MEM(
CONST(2)
),
CONST(3)
)
);public class MOVE extends Stm {
public void accept(MunchVisitor v) { 3.
v.visit(this);
}public class MEM extends Exp {
public Temp accept(MunchVisitor v) {
return v.visit(this);
}Exercise
- Why is statement 2 called from 1?
- What is s at statement 2? What is this?
- Why is statement 3 called from 2?
- What is v at statement 3? What is this?
- Why is statement 4 called from 3?
Code emission
The visitor methods, where the munching work is actually done, are all in the single MaximalMunch class. To understand how MaximalMunch generates Jouette code, consider invoking and implementing visit(CONST e), the simplest.
new MaximalMunch( CONST(1) ); 1. public MaximalMunch(Exp e){ e.accept(this); } 2.
public Temp visit( CONST e ) { 3.
Temp t = new Temp();
emit("ADDI", t.toString(), e.value+"");
return t;
}ADDI t3 <- 1 e.accept(this) invokes visit(CONST e) which emits (i.e. prints) the corresponding Jouette code. A Temp (t3 register) is created to hold the integer constant 1.
Note that return t returns the 1 to the caller in a register, t3. The caller may then use t3 in an expression. For example, CONST 12 emitted the ADDI t3 <- 12; t3 was returned to the visit(MOVE s) which emitted the STORE M[2] <- t3.
s.src.accept(this) in the following visit(MOVE s), invokes the visit(CONST e) which returns t3.
new MaximalMunch(
MOVE(MEM(CONST(2)),CONST(12)) 1.
)ADDI t3 <- 12
STORE M[2] <- t3public class MaximalMunch implements MunchVisitor {
public void visit(MOVE s) { 2.
if (s.dst instanceof MEM) {
MEM mem = (MEM)s.dst;
CONST exp = (CONST)mem.exp;
if (exp != null)
emit("STORE ", "M["+exp.value + "]", s.src.accept(this)+""); 3.
}
}public Temp visit(MEM n) { ... } 5.
:public class MOVE extends Stm { 4.
public void accept(MunchVisitor v) {
v.visit(this);
}Results of text Figure 9.2
MOVE(
MEM(
BINOP(
BINOP.PLUS,
MEM(
BINOP(
BINOP.PLUS,
TEMP(new Temp(0)),
CONST(8)
)
),
BINOP(
BINOP.MUL,
TEMP(new Temp(4)),
CONST(4)
)
)
),
MEM(
BINOP(
BINOP.PLUS,
TEMP(new Temp(0)),
CONST(10)
)
)
)
LOAD t13 <- M[t0+8]
ADDI t15 <- 4
MUL t14 <- t4*t15
ADD t12 <- t13+t14
ADDI t16 <- t0+10
MOVEM M[t12] <- M[t16]
The software implements the MaximalMunch for the Jouette computer described in the text and can be used to test your understanding of Chapter 9. The spirit of the text's munch has been maintained but simplifications have been made for clarity. Primary changes are:
- the frame and frame pointer FP are not explicitly defined
- a visitor (MunchVisitor) is used to do a depth-first traverse of the Tree in a general way rather than explicit method invocations
- constant 0 or a register with a 0 are not added to instructions
- Jouette instructions are printed in the order produced by depth-first traversal rather than added to a list of instructions for later processing in Chapter 10.
The file can be edited to verify your understanding of the input and results of the MaximalMunch.
Directions:
- Unzip to directory: Chap9
- At command prompt, change to Chap9 directory.
- Compile by: javac -classpath . Jouette\test.java
- Execute by: java -cp . Jouette.test
MaximalMunch for Jouette
| package Jouette; import Temp.Temp; import Tree.*; public class MaximalMunch implements MunchVisitor
{ |