Chapter 2

Processor Architecture

 

Computer Architecture

 

2.1.1 Basic Microcomputer design

Based on the stored-program model proposed by John von Neumann in the 1940s.

 

Clock

Synchronizes operations internal to the CPU (e.g. adding 2 numbers) with external devices (e.g. memory).

Rate measured in cycles/second (Hz); 1 GHz = 1 billion cycles/second.

Memory

External to the CPU and relatively slow, roughly an order of magnitude slower than CPU registers.

Holds program instructions and data.

CPU

Register - limited amount of high speed internal memory for program variables

Cache - On chip memory for buffering predicted instructions, usually the next sequential set.

Instruction register - holds most recently fetched program instruction.

ALU - Arithmetic Logic Unit, performs the arithmetic and logic operations that make up most program instruction execution.

 

2.1.2 Instruction Execution Cycle

Fetch/Execute Cycle

IP = 0
while( true )
    Fetch instruction at IP
    Increment IP
    Decode instruction
    Execute instruction

Example program:

0.  z++
1.  y = z

How each operation executes:

++     read memory at z to accumulator
       increment accumulator
       write accumulator to memory at z

=      read memory at z to accumulator
       write accumulator to memory at y

 

Multistage pipeline

Fetch/Execute cycle can operate in multiple stages, much like an assembly line.

Multiple instructions can be at different stages in Fetch/Execute cycle simultaneously.

Non-pipelined, six stages:
12 cycles/2 instructions = 6 cycles/instruction

Pipelined, six stages:
7 cycles/2 instructions = 3.5 cycles/instruction

One S4 stage requiring 2 cycles.
11 cycles/3 instructions = 3.67 cycles/instruction

Multiple S4 units, each requiring 2 cycles.
10 cycles/4 instructions = 2.5 cycles/instruction
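
The cycle counts above can be reproduced with a few closed-form expressions. This is only a sketch: it assumes a six-stage pipeline (inferred from 12 cycles for 2 non-pipelined instructions), and the function names are made up for illustration.

```python
def nonpipelined_cycles(stages, instructions):
    """Each instruction passes through every stage before the next one starts."""
    return stages * instructions


def pipelined_cycles(stages, instructions):
    """The first instruction fills the pipeline; each later one
    completes one cycle after its predecessor."""
    return stages + (instructions - 1)


def slow_s4_cycles(stages, instructions):
    """Single pipeline whose S4 stage needs 2 cycles: the first instruction
    takes stages + 1 cycles, each later one completes 2 cycles after the previous."""
    return (stages + 1) + 2 * (instructions - 1)


def multiple_s4_cycles(stages, instructions):
    """Assumed: enough 2-cycle S4 units that one instruction completes per
    cycle once the first instruction (stages + 1 cycles) has finished."""
    return (stages + 1) + (instructions - 1)


STAGES = 6  # assumption inferred from the figures quoted in the notes

cases = [
    ("non-pipelined", nonpipelined_cycles, 2),
    ("pipelined", pipelined_cycles, 2),
    ("one S4, 2 cycles", slow_s4_cycles, 3),
    ("multiple S4, 2 cycles", multiple_s4_cycles, 4),
]
for label, cycles_for, count in cases:
    cycles = cycles_for(STAGES, count)
    print(f"{label}: {cycles} cycles/{count} instructions = "
          f"{cycles / count:.2f} cycles/instruction")
```

As the instruction count grows, the pipelined cases approach 1 or 2 cycles per instruction, while the non-pipelined case stays at 6.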

 

2.1.3 Reading from Memory

Key insight is that reading/writing memory takes multiple clock cycles.

In figure below, 4 cycles from start of read until data available from external memory (to CPU chip).

CPU registers can be read about an order of magnitude faster.

Cache memory is onboard the CPU chip, holding the next set of instructions predicted to be executed.

One reasonable prediction is the next instruction executed is the one following the current.

Cache holds set of instructions before and after current.

Executing instructions outside cache incurs significant time penalty.

Question

Implication of function/method calls?

 

Hypothetical Machine - Hypothetical Machine Simulator

A hypothetical machine architecture (i.e. one that does not exist in hardware) is given here to examine a simple computer's operation.

The diagram at right is of the hypothetical machine:

CPU (with internal registers)

Memory (at the center of the diagram for storing program instructions and data).

The CPU's function is to execute program instructions stored in memory. How does the CPU execute a high-level language program? Consider the following statement: 

Y = S + T - R

Neither the Hypothetical Machine nor most other computers directly execute a program written in symbolic form, such as C++. Each symbolic statement must be translated to the corresponding machine instructions that can be executed by that CPU, in this case the Hypothetical Machine.

A compiler can translate Y = S + T - R into the Hypothetical Machine program code that follows. For the Hypothetical Machine, the single statement translates to four machine instructions. This is because the statement has four operands (variables) Y, S, R, and T, but each Hypothetical Machine instruction can have only one operand in memory.

How can the Hypothetical Machine do arithmetic operations with only one-operand instructions when most operations require three operands, for example X=Y+Z?

The Hypothetical Machine has an accumulator named A that, as in a calculator, is always one of the operands. The following illustrates how Y = S + T - R is implemented in Hypothetical Machine language, explicitly using one operand.
 

Y = S + T - R Translation to Hypothetical Machine Language

Operation     Hypothetical Machine Mnemonic   Comments
A = S         LDA  S                          Load A register with value of S
A = A + T     ADD  T                          Add to A register value of T
A = A - R     SUB  R                          Subtract from A register value of R
Y = A         STA  Y                          Store the A register value to Y
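
The translation above follows a mechanical pattern: load the first operand into A, apply each remaining operand with ADD or SUB, then store A. A minimal sketch of that pattern follows; the function name `translate` and its argument layout are hypothetical, and a real compiler does far more.

```python
def translate(target, first, rest):
    """Translate 'target = first (+|-) operand ...' into one-operand mnemonics.

    rest is a list of (operator, operand) pairs, e.g. [("+", "T"), ("-", "R")].
    """
    mnemonics = {"+": "ADD", "-": "SUB"}
    program = [f"LDA {first}"]            # A = first operand
    for operator, operand in rest:
        program.append(f"{mnemonics[operator]} {operand}")  # A = A (+|-) operand
    program.append(f"STA {target}")       # target = A
    return program


for line in translate("Y", "S", [("+", "T"), ("-", "R")]):
    print(line)
```

Each source operand appears in exactly one instruction, which is why four operands give four instructions.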

 

The CPU executes machine instructions that are stored in memory. How does the CPU of the machine work? 

Suppose we had written the program instructions to perform Z = Y + 921. The machine instructions of the algorithm and data (variables Z and Y, and constant 921) would be stored in memory.

How would the program, algorithm and data, appear in the Hypothetical Machine memory?  

Z=Y+921 in Memory

Address   Contents
000       01005
001       20004
002       02006
003       30007
004       00921
005       04800
006       00807
007       99000

Useful to remember that PROGRAM = ALGORITHM + DATA

 

PROGRAM Z=Y+921

To better see how the program is represented in memory we need to know addresses of program instructions (ALGORITHM) and variables (DATA) as below. The ALGORITHM is stored in memory addresses 000-003, and 007. The DATA (variables) are stored in memory addresses 004-006.

Address   Instruction   Mnemonic       Comments
          OP ADDR
000       01 005        LDA  Y         A = Y
001       20 004        ADD  921       A = A + 921
002       02 006        STA  Z         Z = A
003       30 007        JMP  007       GO TO Address 7
004       00921         921  00921     Constant 921
005       04800         Y    04800     Variable Y
006       00807         Z    00807     Variable Z
007       99 000        HLT            Halt Execution

NOTE: The MNEMONIC column entries correspond to the INSTRUCTION column, so mnemonic JMP 007 is stored as machine instruction 30007 (JMP is code 30) at memory ADDRESS 003. Variable Y is at address 005, so the mnemonic LDA Y has instruction code 01005 (LDA is code 01 and Y is at address 005). The constant 921 must also be stored in memory, at address 004.
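
The mapping from mnemonic to 5-digit instruction word can be sketched as below. The opcode values and data addresses are read off the Z=Y+921 program above; the names `OPCODES`, `SYMBOLS`, `encode` and `decode` are illustrative only and are not part of the Hypothetical Machine Simulator.

```python
# Opcode values as they appear in the Z=Y+921 program.
OPCODES = {"LDA": 1, "STA": 2, "ADD": 20, "JMP": 30, "HLT": 99}
# Addresses of the data words in that program.
SYMBOLS = {"921": 4, "Y": 5, "Z": 6}


def encode(mnemonic, operand=None):
    """Build the 5-digit word: 2-digit operation followed by 3-digit address."""
    if operand is None:
        address = 0                      # HLT carries no address
    elif operand in SYMBOLS:
        address = SYMBOLS[operand]       # symbolic name -> data address
    else:
        address = int(operand)           # literal address, e.g. "007"
    return f"{OPCODES[mnemonic]:02d}{address:03d}"


def decode(word):
    """Split a 5-digit word into (operation, address)."""
    return int(word[:2]), int(word[2:])


print(encode("LDA", "Y"))    # 01005
print(encode("JMP", "007"))  # 30007
print(decode("20004"))       # (20, 4)
```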

Fetch/Execute ALGORITHM:

How does the CPU execute a program? Simply stated, the CPU executes a program by repeatedly fetching one instruction from memory and executing that instruction, in an endless cycle called the FETCH/EXECUTE cycle. To execute a program, the Hypothetical Machine CPU initializes itself (step 0) and then implements this cycle in five steps as below.

General Fetch/Execute                 Hypothetical Machine Fetch/Execute
0. Initialize CPU                     PC = 0
1. Fetch instruction at PC            MAR = PC
                                      MDR = Memory[ MAR ]
                                      IR  = MDR
2. PC = next instruction address      PC = PC + 1
3. Decode instruction                 OP = IR operation field
                                      Addr = IR address field
4. Execute operation OP
5. Go to 1
 
CPU Architecture
As the CPU executes a program, it must maintain the cumulative result, or state, of the execution of all previous program instructions. That state is held in internal memory (the PC, IR, A, B, MDR, and MAR registers) and in external memory. The function, size, and range of each are defined as:

Register   Size       Function
PC         3 digits   holds address in range 000 to 999 of next instruction.
IR         5 digits   holds instruction (2-digit operation and 3-digit operand).
A, B       5 digits   holds numerical value in range -99999 to 99999.
MAR        3 digits   holds address in range 000 to 999 of Memory index.
MDR        5 digits   holds numerical value accessed to/from Memory[ MAR ].
Memory     5 digits   array indexed from 000 to 999, holds values -99999 to 99999.

 

 

INSTRUCTIONS - The CPU executes the following instructions.

Code   Operation   Result
01     LDA addr    A <- Memory[addr]
02     STA addr    Memory[addr] <- A
03     XAB         A <-> B
04     CLA         A <- 0
20     ADD addr    A <- A + Memory[addr]
21     SUB addr    A <- A - Memory[addr]
22     MUL addr    A B <- A * Memory[addr]
23     DIV addr    A <- A B / Memory[addr]
                   B <- A B % Memory[addr]
30     JMP addr    PC <- addr
99     HLT         Halt program execution
 

Fetch/Execute of Z = Y + 921 - A more detailed view of the Fetch/Execute of our above program:

0.    Initialize CPU                 PC = 0

LDA  Y
1.    Fetch instruction at PC        Memory ADDRESS 000 = 01005
2.    PC = PC + 1                    PC = 1
3.    Decode                         OP=01 ADDR=005
4.    Execute OP                     A = Y = 4800
5.    GO TO 1.

ADD 921
1.    Fetch instruction at PC        Memory ADDRESS 001 = 20004
2.    PC = PC + 1                    PC = 2
3.    Decode                         OP=20 ADDR=004
4.    Execute OP                     A = A + 921 = 5721
5.    GO TO 1.

STA Z
1.    Fetch instruction at PC        Memory ADDRESS 002 = 02006
2.    PC = PC + 1                    PC = 3
3.    Decode                         OP=02 ADDR=006
4.    Execute OP                     Z = A = 5721
5.    GO TO 1.

JMP 007
1.    Fetch instruction at PC        Memory ADDRESS 003 = 30007
2.    PC = PC + 1                    PC = 4
3.    Decode                         OP=30 ADDR=007
4.    Execute OP                     PC = 007
5.    GO TO 1.

HLT
1.    Fetch instruction at PC        Memory ADDRESS 007 = 99000
2.    PC = PC + 1                    PC = 8
3.    Decode                         OP=99 ADDR=000
4.    Execute OP
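
The trace above can be reproduced with a small interpreter. This is a sketch, not the course's Hypothetical Machine Simulator: it omits MUL and DIV (which use the combined A B pair), ignores the 5-digit range limit, and the `run` function and its `max_steps` guard are assumptions added for illustration.

```python
def run(memory, max_steps=1000):
    """Fetch/execute until HLT; return the final register values."""
    pc, a, b = 0, 0, 0                      # 0. initialize CPU
    for _ in range(max_steps):
        ir = memory[pc]                     # 1. fetch instruction at PC
        pc += 1                             # 2. PC = PC + 1
        op, addr = divmod(ir, 1000)         # 3. decode: 2-digit OP, 3-digit ADDR
        if op == 1:                         # 4. execute OP
            a = memory[addr]                #    LDA
        elif op == 2:
            memory[addr] = a                #    STA
        elif op == 3:
            a, b = b, a                     #    XAB
        elif op == 4:
            a = 0                           #    CLA
        elif op == 20:
            a += memory[addr]               #    ADD
        elif op == 21:
            a -= memory[addr]               #    SUB
        elif op == 30:
            pc = addr                       #    JMP
        elif op == 99:
            return {"PC": pc, "A": a, "B": b}   # HLT
        else:
            raise ValueError(f"unsupported operation code {op:02d}")
    raise RuntimeError("step limit reached without HLT")


# Z = Y + 921 as stored in memory. Python integer literals cannot have
# leading zeros, so 01005 is written as 1005.
memory = [1005, 20004, 2006, 30007, 921, 4800, 807, 99000] + [0] * 992
registers = run(memory)
print(registers, "Z =", memory[6])
```

After the run, address 006 (Z) holds 5721 and PC is 8, matching the last rows of the trace.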
 
Instruction Cycles

Clock rate is often used by computer manufacturers as a raw measure of performance. The Hypothetical Machine operation is timed by a central clock. The Fetch/Execute cycle consists of a series of steps, each taking one clock tick, so the faster the clock ticks, the faster the machine executes. The LDA Y instruction of the Hypothetical Machine requires 7 steps in total: 4 to fetch and 3 to execute. With a 1 Hz clock (one tick per second), the LDA Y instruction would require 7 seconds. With a 1000 Hz clock (1000 ticks per second), only 7/1000 seconds (7 ms). With a 400 MHz clock (400 million ticks per second), only 7/400,000,000 seconds (17.5 ns). Note that not all instructions take the same number of steps to execute, and modern CPUs can perform multiple steps simultaneously.
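
The arithmetic in the paragraph above is time = steps / clock rate. A short sketch, assuming one step per clock tick as stated (the function name is illustrative):

```python
def instruction_time(steps, clock_hz):
    """Seconds needed for an instruction of the given step count,
    assuming one step per clock tick."""
    return steps / clock_hz


LDA_STEPS = 7  # 4 steps to fetch + 3 steps to execute

for clock_hz in (1, 1_000, 400_000_000):
    seconds = instruction_time(LDA_STEPS, clock_hz)
    print(f"{clock_hz:>11,} Hz: {seconds:.3e} s")
```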

A more detailed (and accurate) examination of instruction execution is given below showing each data movement within the CPU registers. The Hypothetical Machine Simulator used in Homework 1 can be used to trace the Fetch/Execute cycle.

 

Example: LDA Y        01005
1.    Fetch      1.    MAR <- PC              = 000
                 2.    MDR <- (MAR)           = (000) = 01005
                 3.    IR  <- MDR             = OP = 01 ADDR = 005
2.               4.    PC  <- PC + 1          = 001
3.    Execute    5.    MAR <- ADDR            = 005
                 6.    MDR <- (MAR)           = (005) = 04800
                 7.    A   <- MDR             = 04800
 
Example: JMP 007      30007
1.    Fetch      1.    MAR <- PC              = 003
                 2.    MDR <- (MAR)           = (003) = 30007
                 3.    IR  <- MDR             = OP = 30 ADDR = 007
2.               4.    PC  <- PC + 1          = 004
3.    Execute    5.    PC  <- ADDR            = 007
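
The register transfers in the two examples can be generated step by step, which also shows why LDA takes 7 steps and JMP only 5. A minimal sketch covering just these two operations; `trace_instruction` is a hypothetical helper and not part of the course simulator.

```python
def trace_instruction(memory, pc):
    """Return the list of register transfers for the instruction at pc."""
    steps = []
    mar = pc
    steps.append(f"MAR <- PC = {mar:03d}")
    mdr = memory[mar]
    steps.append(f"MDR <- (MAR) = {mdr:05d}")
    ir = mdr
    op, addr = divmod(ir, 1000)            # 2-digit OP, 3-digit ADDR
    steps.append(f"IR <- MDR = OP {op:02d} ADDR {addr:03d}")
    pc += 1
    steps.append(f"PC <- PC + 1 = {pc:03d}")
    if op == 1:                            # LDA: one more memory read
        mar = addr
        steps.append(f"MAR <- ADDR = {mar:03d}")
        mdr = memory[mar]
        steps.append(f"MDR <- (MAR) = {mdr:05d}")
        a = mdr
        steps.append(f"A <- MDR = {a:05d}")
    elif op == 30:                         # JMP: no memory access to execute
        pc = addr
        steps.append(f"PC <- ADDR = {pc:03d}")
    else:
        raise NotImplementedError(f"operation {op:02d} is not traced in this sketch")
    return steps


# Z = Y + 921 as stored in memory (leading zeros dropped for Python literals).
MEMORY = [1005, 20004, 2006, 30007, 921, 4800, 807, 99000]

for address in (0, 3):
    transfers = trace_instruction(MEMORY, address)
    print(f"Instruction at {address:03d}: {len(transfers)} steps")
    for number, transfer in enumerate(transfers, start=1):
        print(f"  {number}. {transfer}")
```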

 

2.1.4 How Programs Run

Load and execute process.

  1. You click on or type an executable program name.
  2. OS allocates memory and copies the file into memory.
  3. OS hands over the CPU to the program.
  4. Program executes until one of:

    finished - OS takes back the CPU and reclaims memory

    waiting for input/output, or its time interval is exhausted - OS takes back the CPU and schedules a waiting program to execute.

 

2.2     IA-32 Processor Architecture

2.2.1    Modes of operation

Protected - each program assigned a separate segment of memory. Access up to 4 GB.

Real-address - 8086-style environment without memory protection, but a program can switch modes. Limit of 1 MB memory.

System management - allows OS to implement non-user functions (e.g. power management)

 

2.2.2 Basic Execution Environment

General-purpose registers accessible as 8, 16, and 32 bits

Pointer and index registers are 32 bits

32-bit addresses reach 4 GB (2^32 bytes)

 

2.2.3    Floating-Point Unit

CPU has ALU for integer arithmetic

FPU is a separate unit dedicated to floating-point math.

2.3     IA-32 Memory Management

2.3.1 Real-Address Mode

1 MB address space divided into 64 KB segments.

No hardware protection: a program can access any segment, including another program's.

16-bit segment and offset registers combine into a 20-bit address = 1 MB addressing.
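
The 1 MB limit comes from how real-address mode forms an address: a 16-bit segment value is shifted left 4 bits and added to a 16-bit offset, giving a 20-bit address. A short sketch of that calculation; the function name is illustrative, and masking to 20 bits models the wraparound of the original 8086, which is an assumption about the target processor.

```python
def linear_address(segment, offset):
    """Combine a 16-bit segment and 16-bit offset into a 20-bit address."""
    return ((segment << 4) + offset) & 0xFFFFF   # keep 20 bits (8086 wraparound)


print(hex(linear_address(0x08F1, 0x0100)))   # 0x9010
print(hex(linear_address(0xFFFF, 0x000F)))   # 0xfffff, the last byte of the 1 MB space
print(2 ** 20)                               # 1048576 addressable bytes = 1 MB
```

Since the offset is 16 bits, a single segment value can reach at most 64 KB without being changed.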

2.3.2    Protected mode

32-bit addressing = 4 GB.

Recall 32-bit pointer and index registers.

Flat memory model is 32-bit.

Paging

Programs can be larger than physical memory by swapping 4096-byte blocks (pages) between memory and disk.
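
With 4096-byte blocks, an address splits into a page number and an offset within that page. The sketch below shows only that arithmetic, not the page tables or the swapping itself, and the names are illustrative.

```python
PAGE_SIZE = 4096  # bytes per page, as stated above


def split_address(address):
    """Return (page number, offset within page) for a byte address."""
    return divmod(address, PAGE_SIZE)


page, offset = split_address(0x12345)
print(hex(page), hex(offset))        # 0x12 0x345
print((2 ** 32) // PAGE_SIZE)        # 1048576 pages in a 4 GB address space
```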

 

2.4    Components of an IA-32 Microcomputer

Motherboard

PCI and PCI Express Bus Architecture

Motherboard Chipset

Intel 8237 DMA controller - high speed transfers between external devices and RAM

Intel 8259A Interrupt controller - interface between external hardware interrupt requests and CPU

8254 Timer Counter

2.4.2     Video Output

Most current systems have a pipelined graphics processor (e.g. NVIDIA) capable of executing OpenGL and/or DirectX graphics instructions.

2.4.3    Memory

2.4.4    Input-Output Ports and Device Interfaces

CPU is connected to memory and the external world via the data and address buses.

Each external device has an address.

A printer with address 15 could be sent character 'X' to print by instruction similar to:

Out 15, 'X'

 

2.5    Input-Output System

A hierarchy of software layers defines the I/O system.

Application programs written in C++ or other high-level language should work correctly across computers and OS.

Application program calls an output operation (e.g. cout in C++ or System.out in Java) that is implemented in the next lower level.

OS (e.g. Windows) should handle I/O same on different hardware. OS is in large part a set of functions called by application programs.

OS layer calls BIOS layer.

BIOS is specific to the hardware.

BIOS is a set of Basic Input and Output System functions that interface non-hardware-specific I/O functions to specific hardware.

A BIOS function for drawing a character on the screen on a 1986 computer differs from the one on a 2010 computer, but the same C++ program could run on both.

In theory, assembly language programs have access to all lower level layers.

In reality, a multitasking OS such as Windows generally controls access, so all I/O necessarily goes through the OS.
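
The layering can be pictured as each level calling only the level directly beneath it. The sketch below is purely illustrative: every class and method name is made up, and no real OS or BIOS interface is being modeled.

```python
class Bios1986:
    """Hypothetical hardware-specific layer for an older machine."""
    def draw_char(self, ch):
        return f"[1986 video hardware] {ch}"


class Bios2010:
    """Hypothetical hardware-specific layer for a newer machine."""
    def draw_char(self, ch):
        return f"[2010 video hardware] {ch}"


class OperatingSystem:
    """Hardware-independent layer: offers the same write() on any BIOS."""
    def __init__(self, bios):
        self.bios = bios

    def write(self, text):
        return [self.bios.draw_char(ch) for ch in text]


def application(os_layer):
    """Application code is identical regardless of the hardware beneath."""
    return os_layer.write("Hi")


for bios in (Bios1986(), Bios2010()):
    print(application(OperatingSystem(bios)))
```

Only the BIOS class changes between the two machines; the application and OS code stay the same.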