Chapter 2 Processor Architecture
Computer Architecture
2.1.1 Basic Microcomputer design
Based on the stored-program model proposed by John von Neumann in the 1940s.

Clock
Synchronizes operations internal to the CPU (e.g. adding two numbers) with external devices (e.g. memory).
Rate measured in cycles/second; 1 GHz = 1 billion cycles/second.
Memory
External to the CPU and relatively slow, roughly an order of magnitude slower than CPU registers.
Holds program instructions and data.
CPU
Registers - a limited amount of high-speed internal memory for program variables.
Cache - on-chip memory for buffering predicted instructions, usually the next sequential set.
Instruction register - holds the most recently fetched program instruction.
ALU - Arithmetic Logic Unit, performs most of the work of executing program instructions.
2.1.2 Instruction Execution Cycle
Fetch/Execute Cycle
IP = 0
while( true )
    Fetch instruction at IP
    Increment IP
    Decode instruction
    Execute instruction
Example: each statement below expands into several accumulator operations.
0. z++
    read memory at z to accumulator
    increment accumulator
    write accumulator to memory at z
1. y = z
    read memory at z to accumulator
    write accumulator to memory at y
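The cycle above can be sketched as a toy accumulator machine in Python. This is only an illustration, not any real CPU: the operation names LOAD, INC, STORE and HALT, the tuple encoding, and the starting value z = 5 are assumptions. A HALT operation is added so the otherwise endless while( true ) loop can stop.

```python
def run(program, memory):
    """Toy accumulator machine: run program, a list of (op, operand) tuples."""
    ip = 0      # instruction pointer
    acc = 0     # accumulator
    while True:
        instruction = program[ip]   # fetch instruction at IP
        ip += 1                      # increment IP
        op, operand = instruction    # decode instruction
        # execute instruction
        if op == "LOAD":             # read memory to accumulator
            acc = memory[operand]
        elif op == "INC":            # increment accumulator
            acc += 1
        elif op == "STORE":          # write accumulator to memory
            memory[operand] = acc
        elif op == "HALT":           # assumed stop instruction
            return memory
        else:
            raise ValueError(f"unknown operation {op!r}")


program = [
    ("LOAD", "z"), ("INC", None), ("STORE", "z"),   # 0. z++
    ("LOAD", "z"), ("STORE", "y"),                  # 1. y = z
    ("HALT", None),
]
memory = run(program, {"z": 5, "y": 0})
print(memory)  # {'z': 6, 'y': 6}
```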
Multistage pipeline
Fetch/Execute cycle can operate in multiple stages, much like an assembly line.
Multiple instructions can be at different stages in Fetch/Execute cycle simultaneously.
Non-pipelined (six stages, S1 to S6): 12 cycles / 2 instructions = 6 cycles/instruction.
Pipelined: 7 cycles / 2 instructions = 3.5 cycles/instruction.
Pipelined with one stage S4 requiring 2 cycles: 11 cycles / 3 instructions ≈ 3.67 cycles/instruction.
Pipelined with multiple S4 units, each requiring 2 cycles: 10 cycles / 4 instructions = 2.5 cycles/instruction.
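The cycle counts above can be reproduced with a small model. This is a sketch under stated assumptions: k = 6 stages (inferred from 12 cycles for 2 non-pipelined instructions), every stage takes one cycle except S4 in the last two cases, and there are no stalls from data dependencies or branches.

```python
def pipeline_cycles(n, k=6):
    """Cycles to finish n instructions on a k-stage processor under four simple models."""
    return {
        # Each instruction passes through all k stages before the next one starts.
        "non-pipelined": k * n,
        # The first instruction finishes after k cycles, then one more finishes every cycle.
        "pipelined": k + n - 1,
        # S4 takes 2 cycles and there is only one S4 unit: the first instruction
        # needs k + 1 cycles, then one instruction finishes every 2 cycles.
        "one 2-cycle S4": k + 2 * n - 1,
        # Two S4 units take turns, so one instruction again finishes every cycle,
        # after one extra cycle of latency for the first.
        "two 2-cycle S4 units": k + n,
    }


for model, n in [("non-pipelined", 2), ("pipelined", 2),
                 ("one 2-cycle S4", 3), ("two 2-cycle S4 units", 4)]:
    cycles = pipeline_cycles(n)[model]
    print(f"{model}: {cycles} cycles / {n} instructions = {cycles / n:.2f} cycles/instruction")
```

The extra S4 unit pays off because the two-cycle stage no longer throttles the whole pipeline to one completion every two cycles.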
2.1.3 Reading from Memory
The key insight is that reading or writing memory takes multiple clock cycles.
In the example read cycle, 4 clock cycles elapse from the start of the read until the data from external memory is available to the CPU chip.
CPU registers can be read about an order of magnitude faster.
Cache memory is on board the CPU chip, holding the next set of instructions predicted to be executed.
One reasonable prediction is that the next instruction executed is the one following the current instruction.
The cache holds a set of instructions before and after the current one.
Executing instructions outside the cache incurs a significant time penalty.
Question
What are the implications for function/method calls?
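One way to think about the question is with a deliberately crude cache model. In this hypothetical sketch the cache holds every address within a fixed window of a base address, a hit costs 1 cycle, and a miss costs a flat 4 cycles (borrowing the 4-cycle memory read above) and re-centres the window. Real caches load fixed-size lines and have far more structure; the window size and costs here are assumptions.

```python
def fetch_cycles(trace, window=8, hit_cycles=1, miss_cycles=4):
    """Toy instruction-cache model: total cycles to fetch each address in trace.

    Addresses within `window` of the current base are hits. Any other address
    is a miss, which costs miss_cycles and re-centres the cache on that address.
    """
    base = None
    total = 0
    for addr in trace:
        if base is not None and abs(addr - base) <= window:
            total += hit_cycles
        else:
            total += miss_cycles
            base = addr
    return total


straight_line = list(range(9))                   # 9 sequential instructions
with_call = [0, 1, 2, 100, 101, 102, 103, 3, 4]  # call a routine at 100, then return
print(fetch_cycles(straight_line))  # 12
print(fetch_cycles(with_call))      # 18
```

In this model the call to distant code misses twice: once on the call and again on the return, even though the same number of instructions is fetched.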

A hypothetical machine architecture (i.e. one that does not exist in hardware) is given here to examine a simple computer's operation.
The hypothetical machine consists of:
CPU (with internal registers)
Memory (for storing program instructions and data).
The CPU's function is to execute program instructions stored in memory. How does the CPU execute a high-level language program? Consider the following statement:
Y = S + T - R
Neither the Hypothetical Machine nor most other computers directly execute a program written in a symbolic form such as C++. Each symbolic statement must be translated into the corresponding machine instructions that the CPU, in this case the Hypothetical Machine, can execute.
A compiler can translate Y = S + T - R into the Hypothetical Machine program code that follows. The single statement becomes four machine instructions because it has four operands (variables), Y, S, T, and R, while each Hypothetical Machine instruction can have only one operand in memory.
How can the Hypothetical Machine do arithmetic with one-operand instructions when most operations require three operands, for example X = Y + Z?
The Hypothetical Machine has an accumulator named A that, as in a calculator, is always one of the operands. The following illustrates how Y = S + T - R is implemented in Hypothetical Machine language, explicitly using one operand.
Y = S + T - R translated to Hypothetical Machine language:
| Operation | Hypothetical Machine Mnemonic | Comments |
| A = S | LDA S | Load A register with value of S |
| A = A + T | ADD T | Add value of T to A register |
| A = A - R | SUB R | Subtract value of R from A register |
| Y = A | STA Y | Store A register value to Y |
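The translation in the table can be mimicked by a few lines of Python. This is a hypothetical sketch, not a real compiler: translate only handles statements of the form target = var op var ..., with op limited to + and -, and the sample values S = 5, T = 7, R = 2 are made up for illustration. execute then runs the mnemonics with a single accumulator A.

```python
def translate(target, first, rest):
    """Translate target = first op var op var ... into one-operand instructions.

    rest is a list of (op, var) pairs where op is "+" or "-".
    """
    code = [f"LDA {first}"]                      # A = first
    for op, var in rest:
        if op not in ("+", "-"):
            raise ValueError(f"unsupported operator {op!r}")
        code.append(("ADD " if op == "+" else "SUB ") + var)  # A = A +/- var
    code.append(f"STA {target}")                 # target = A
    return code


def execute(code, memory):
    """Run mnemonic code with a single accumulator A; memory maps names to values."""
    a = 0
    for line in code:
        op, operand = line.split()
        if op == "LDA":
            a = memory[operand]
        elif op == "ADD":
            a += memory[operand]
        elif op == "SUB":
            a -= memory[operand]
        elif op == "STA":
            memory[operand] = a
        else:
            raise ValueError(f"unknown mnemonic {op}")
    return memory


code = translate("Y", "S", [("+", "T"), ("-", "R")])
print(code)                                          # ['LDA S', 'ADD T', 'SUB R', 'STA Y']
print(execute(code, {"S": 5, "T": 7, "R": 2})["Y"])  # 10
```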
The CPU executes machine instructions that are stored in memory. How does the CPU of the machine work?
Suppose we had written the program instructions to perform Z = Y + 921. The machine instructions of the algorithm and data (variables Z and Y, and constant 921) would be stored in memory.
How would the program, algorithm and data, appear in the Hypothetical Machine memory?
It is useful to remember that PROGRAM = ALGORITHM + DATA.
PROGRAM Z=Y+921 in memory:
| Address | Contents |
| 000 | 01005 |
| 001 | 20004 |
| 002 | 02006 |
| 003 | 30007 |
| 004 | 00921 |
| 005 | 04800 |
| 006 | 00807 |
| 007 | 99000 |
To better see how the program is represented in memory we need to know addresses of program instructions (ALGORITHM) and variables (DATA) as below. The ALGORITHM is stored in memory addresses 000-003, and 007. The DATA (variables) are stored in memory addresses 004-006.
| Address | Instruction (OP ADDR) | Mnemonic | Comments |
| 000 | 01 005 | LDA Y | A = Y |
| 001 | 20 004 | ADD 921 | A = A + 921 |
| 002 | 02 006 | STA Z | Z = A |
| 003 | 30 007 | JMP 007 | GO TO Address 7 |
| 004 | 00921 | 921 | Constant 921 |
| 005 | 04800 | Y | Variable Y |
| 006 | 00807 | Z | Variable Z |
| 007 | 99 000 | HLT | Halt Execution |
NOTE: The MNEMONIC column entries correspond to the INSTRUCTION column, so the mnemonic JMP 007 is stored as machine instruction 30007 (JMP is code 30) at memory ADDRESS 003. Variable Y is at address 005, so the mnemonic LDA Y is encoded as 01005 (LDA is code 01 and Y is at address 005). The constant 921 must also be stored in memory, at address 004.
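The encoding described in the note is simple decimal packing: the 2-digit operation code occupies the thousands place and above, and the 3-digit address the rest. A minimal Python sketch (the function names encode and decode are illustrative, not part of the Hypothetical Machine):

```python
def encode(op, addr):
    """Pack a 2-digit operation code and a 3-digit address into one 5-digit word."""
    if not (0 <= op <= 99 and 0 <= addr <= 999):
        raise ValueError("op must be 00-99 and addr 000-999")
    return op * 1000 + addr


def decode(word):
    """Split a 5-digit instruction word into (operation code, address)."""
    return divmod(word, 1000)


print(f"{encode(30, 7):05d}")  # 30007  (JMP 007)
print(f"{encode(1, 5):05d}")   # 01005  (LDA Y, with Y at address 005)
print(decode(20004))           # (20, 4) -> ADD, address 004
```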
Fetch/Execute ALGORITHM:
How does the CPU execute a program? Simply stated, the CPU executes a program by repeatedly FETCHING one instruction from memory and EXECUTING that instruction, in an endless cycle called the FETCH/EXECUTE cycle. The Hypothetical Machine CPU implements this cycle in the steps below.
| General Fetch/Execute | Hypothetical Machine Fetch/Execute |
| 0. Initialize CPU | PC = 0 |
| 1. Fetch instruction at PC | MAR = PC; MDR = Memory[ MAR ]; IR = MDR |
| 2. PC = next instruction address | PC = PC + 1 |
| 3. Decode instruction | OP = IR operation field; ADDR = IR address field |
| 4. Execute operation OP | |
| 5. Go to 1 | |
CPU Architecture
As the CPU executes a program, it must keep track of the cumulative result, or state, of executing all previous program instructions. This state is held in the CPU's internal memory (the PC, IR, A, B, MDR, and MAR registers) and in external memory. The function, size, and range of each register and of memory are defined as:
| Register | Size and function |
| PC | 3 digits, holds address in range 000 to 999 of next instruction. |
| IR | 5 digits, holds instruction (2-digit operation and 3-digit operand). |
| A, B | 5 digits, holds numerical value in range -99999 to 99999. |
| MAR | 3 digits, holds address in range 000 to 999 of Memory index. |
| MDR | 5 digits, holds numerical value accessed to/from Memory[ MAR ]. |
| Memory | 5 digits, array indexed from 000 to 999, holds values -99999 to 99999. |
INSTRUCTIONS - The CPU executes the following instructions.
| Code | Operation | Result |
| 01 | LDA addr | A <- Memory[addr] |
| 02 | STA addr | Memory[addr] <- A |
| 03 | XAB | A <-> B |
| 04 | CLA | A <- 0 |
| 20 | ADD addr | A <- A + Memory[addr] |
| 21 | SUB addr | A <- A - Memory[addr] |
| 22 | MUL addr | A B <- A * Memory[addr] |
| 23 | DIV addr | A <- A B / Memory[addr]; B <- A B % Memory[addr] |
| 30 | JMP addr | PC <- addr |
| 99 | HLT | Halt program execution |
Fetch/Execute of Z = Y + 921 - a more detailed view of the Fetch/Execute of the program above, starting from step 0, Initialize CPU: PC = 0.
| Instruction | 1. Fetch instruction at PC | 2. PC = PC + 1 | 3. Decode | 4. Execute OP | 5. GO TO 1 |
| LDA Y | Memory ADDRESS 000 = 01005 | PC = 1 | OP=01 ADDR=005 | A = Y = 4800 | yes |
| ADD 921 | Memory ADDRESS 001 = 20004 | PC = 2 | OP=20 ADDR=004 | A = A + 921 = 5721 | yes |
| STA Z | Memory ADDRESS 002 = 02006 | PC = 3 | OP=02 ADDR=006 | Z = A = 5721 | yes |
| JMP 007 | Memory ADDRESS 003 = 30007 | PC = 4 | OP=30 ADDR=007 | PC = 007 | yes |
| HLT | Memory ADDRESS 007 = 99000 | PC = 8 | OP=99 ADDR=000 | Halt program execution | no |
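The trace above can be reproduced with a short Python simulator. This is a minimal sketch and not the Hypothetical Machine Simulator from Homework 1: it models only LDA, STA, XAB, CLA, ADD, SUB, JMP and HLT (MUL and DIV are left out because the layout of the double-length A B pair is not spelled out here), and it does not enforce the 5-digit range of registers and memory.

```python
def run(memory, max_steps=1000):
    """Run a Hypothetical Machine program from PC = 0 until HLT.

    memory is a list of 1000 integers. Returns (A, B, memory, trace), where
    trace lists (address fetched, mnemonic) for every instruction executed.
    """
    names = {1: "LDA", 2: "STA", 3: "XAB", 4: "CLA",
             20: "ADD", 21: "SUB", 30: "JMP", 99: "HLT"}
    pc, a, b = 0, 0, 0                 # 0. Initialize CPU
    trace = []
    for _ in range(max_steps):         # 5. GO TO 1: the loop repeats the cycle
        mar = pc                       # 1. Fetch: MAR = PC
        mdr = memory[mar]              #    MDR = Memory[MAR]
        ir = mdr                       #    IR = MDR
        pc += 1                        # 2. PC = PC + 1
        op, addr = divmod(ir, 1000)    # 3. Decode: OP and ADDR fields
        trace.append((mar, names.get(op, "???")))
        if op == 1:                    # 4. Execute OP
            a = memory[addr]           # LDA
        elif op == 2:
            memory[addr] = a           # STA
        elif op == 3:
            a, b = b, a                # XAB
        elif op == 4:
            a = 0                      # CLA
        elif op == 20:
            a += memory[addr]          # ADD
        elif op == 21:
            a -= memory[addr]          # SUB
        elif op == 30:
            pc = addr                  # JMP
        elif op == 99:
            return a, b, memory, trace  # HLT
        else:                          # MUL, DIV and unknown codes are not modelled
            raise ValueError(f"unsupported operation {op:02d} at address {mar:03d}")
    raise RuntimeError("no HLT within max_steps")


memory = [0] * 1000
# Program Z = Y + 921 (Python int literals cannot have leading zeros, so 01005 is written 1005).
memory[0:8] = [1005, 20004, 2006, 30007, 921, 4800, 807, 99000]
a, b, memory, trace = run(memory)
print(trace)      # [(0, 'LDA'), (1, 'ADD'), (2, 'STA'), (3, 'JMP'), (7, 'HLT')]
print(memory[6])  # 5721 (variable Z)
```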
Instruction Cycles
Clock rate is often used by computer manufacturers as a raw measure of performance. The Hypothetical Machine's operation is timed by a central clock. We have seen that the Fetch/Execute cycle consists of a series of steps, each taking one clock tick, so the faster the clock ticks, the faster the machine executes. The LDA Y instruction of the Hypothetical Machine requires 7 steps in total: 4 steps to fetch and 3 steps to execute. With a 1 Hz clock (one tick per second), the LDA Y instruction would require 7 seconds. With a 1000 Hz clock (1000 ticks per second), it would take only 7/1000 seconds. With a 400 MHz clock (400 million ticks per second), it would take only 7/400,000,000 seconds (17.5 nanoseconds). Note that not all instructions take the same number of steps to execute, and modern CPUs can perform multiple steps simultaneously.
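The arithmetic in the paragraph is just steps divided by clock rate. A small Python sketch, assuming (as the paragraph does) that every step takes exactly one clock tick:

```python
def execution_time(steps, clock_hz):
    """Seconds needed for `steps` clock ticks at clock_hz ticks per second."""
    return steps / clock_hz


# LDA Y takes 7 steps; compare the three clock rates from the text.
for clock_hz in (1, 1_000, 400_000_000):
    print(f"{clock_hz:>11,} Hz: {execution_time(7, clock_hz):.3g} s")
```

This prints 7 s, 0.007 s and 1.75e-08 s (17.5 ns) for the three clocks.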
A more detailed (and accurate) examination of instruction execution is given below showing each data movement within the CPU registers. The Hypothetical Machine Simulator used in Homework 1 can be used to trace the Fetch/Execute cycle.
Example: LDA Y (01005)
1. Fetch
    1. MAR <- PC = 000
    2. MDR <- (MAR) = (000) = 01005
    3. IR <- MDR: OP = 01, ADDR = 005
2. Increment PC
    4. PC <- PC + 1 = 001
3. Execute
    5. MAR <- ADDR = 005
    6. MDR <- (MAR) = 04800
    7. A <- MDR = 04800
Example: JMP 007 (30007)
1. Fetch
    1. MAR <- PC = 003
    2. MDR <- (MAR) = (003) = 30007
    3. IR <- MDR: OP = 30, ADDR = 007
2. Increment PC
    4. PC <- PC + 1 = 004
3. Execute
    5. PC <- ADDR = 007
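The register transfers in the two examples can be generated step by step. This is a hypothetical Python sketch that models only LDA (01) and JMP (30), the two instructions worked above; the step wording mirrors the examples. Because LDA Y is fetched from address 000, the PC increment step yields 001.

```python
def micro_steps(memory, pc, a=0):
    """Register-transfer steps for the instruction at address pc (LDA and JMP only).

    Returns (steps, regs), where regs holds the final PC, MAR, MDR, IR and A values.
    """
    regs = {"PC": pc, "A": a}
    steps = []

    # Fetch
    regs["MAR"] = regs["PC"]
    steps.append(f"MAR <- PC = {regs['MAR']:03d}")
    regs["MDR"] = memory[regs["MAR"]]
    steps.append(f"MDR <- (MAR) = ({regs['MAR']:03d}) = {regs['MDR']:05d}")
    regs["IR"] = regs["MDR"]
    op, addr = divmod(regs["IR"], 1000)
    steps.append(f"IR <- MDR: OP = {op:02d}, ADDR = {addr:03d}")

    # Increment PC
    regs["PC"] += 1
    steps.append(f"PC <- PC + 1 = {regs['PC']:03d}")

    # Execute
    if op == 1:  # LDA
        regs["MAR"] = addr
        steps.append(f"MAR <- ADDR = {addr:03d}")
        regs["MDR"] = memory[regs["MAR"]]
        steps.append(f"MDR <- (MAR) = {regs['MDR']:05d}")
        regs["A"] = regs["MDR"]
        steps.append(f"A <- MDR = {regs['A']:05d}")
    elif op == 30:  # JMP
        regs["PC"] = addr
        steps.append(f"PC <- ADDR = {addr:03d}")
    else:
        raise ValueError(f"operation {op:02d} not modelled in this sketch")
    return steps, regs


memory = [1005, 20004, 2006, 30007, 921, 4800, 807, 99000]
for start in (0, 3):  # LDA Y at 000, JMP 007 at 003
    steps, regs = micro_steps(memory, start)
    print(f"{len(steps)} steps:", *steps, sep="\n  ")
```

The step counts (7 for LDA, 5 for JMP) show why instructions differ in how many clock ticks they need.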
2.1.4 How Programs Run
Load and execute process.
- You click on or type an executable program name.
- OS allocates memory and copies the program file into memory.
- OS hands over the CPU to the program.
- Program executes until:
  - it finishes: OS takes back the CPU and reclaims the memory, or
  - it waits for input/output or its time interval is exhausted: OS takes back the CPU and schedules a waiting program to execute.
2.2 IA-32 Processor Architecture
2.2.1 Modes of operation
Protected - each program is assigned a separate segment of memory. Access to up to 4 GB.
Real-address - implements the Intel 8086 programming environment, with the ability to switch into other modes. Limited to 1 MB of memory.
System management - allows OS to implement non-user functions (e.g. power management)
2.2.2 Basic Execution Environment
General-purpose 8-, 16-, and 32-bit registers.
Pointer and index registers are 32-bit.
32-bit addresses span 4 GB (2^32 bytes).
2.2.3 Floating-Point Unit
The CPU's ALU handles integer arithmetic.
The FPU is a separate unit for floating-point math.
2.3 IA-32 Memory Management
2.3.1 Real-Address Mode
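1 MB address space, divided into 64 KB segments.
No hardware protection: a program can access any memory location, including memory used by other programs or the OS.
16-bit segment and offset registers combine into a 20-bit address (segment * 16 + offset), giving 1 MB addressing.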
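The segment and offset combination can be checked in a couple of lines of Python. A minimal sketch: it shifts the 16-bit segment left 4 bits (multiplies by 16) and adds the 16-bit offset; it does not model the wraparound that occurs when the sum exceeds FFFFFh.

```python
def real_mode_address(segment, offset):
    """20-bit linear address formed from a 16-bit segment and a 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return segment * 16 + offset  # shift segment left 4 bits, then add offset


print(f"{real_mode_address(0x08F1, 0x0100):05X}h")  # 09010h
print(2 ** 20)                                      # 1048576 bytes = 1 MB
```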
2.3.2 Protected mode
32-bit addressing = 4 GB.
Recall the 32-bit pointer and index registers.
The flat memory model uses 32-bit linear addresses.
Paging
Programs can be larger than physical memory; 4096-byte blocks (pages) are swapped between memory and disk as needed.
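With 4096-byte pages, a 32-bit linear address splits into a page number (the upper 20 bits) and an offset within the page (the lower 12 bits). A minimal Python sketch of that split; it ignores the page directory and page table lookups the IA-32 hardware actually performs to find where a page lives.

```python
PAGE_SIZE = 4096  # bytes per page (2 ** 12)


def split_linear_address(address):
    """Split a 32-bit linear address into (page number, offset within page)."""
    if not 0 <= address < 2 ** 32:
        raise ValueError("address must fit in 32 bits")
    return divmod(address, PAGE_SIZE)  # same as (address >> 12, address & 0xFFF)


page, offset = split_linear_address(0x12345678)
print(hex(page), hex(offset))  # 0x12345 0x678
print(2 ** 32 // PAGE_SIZE)    # 1048576 pages cover the 4 GB address space
```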
2.4 Components of an IA-32 Microcomputer
Motherboard
PCI and PCI Express Bus Architecture
Motherboard Chipset
Intel 8237 DMA controller - high speed transfers between external devices and RAM
Intel 8259A Interrupt controller - interface between external hardware interrupt requests and CPU
Intel 8254 Timer/Counter
2.4.2 Video Output
Most current systems have a pipelined graphics processor (e.g. NVIDIA) capable of executing OpenGL and/or DirectX graphics instructions.
2.4.3 Memory
2.4.4 Input-Output Ports and Device Interfaces
The CPU is connected to memory and the external world via the data and address buses.
Each external device has an address.
A printer at address 15 could be sent the character 'X' to print by an instruction similar to:
Out 15, 'X'
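The idea that each device has an address can be modelled as a table lookup. In this hypothetical Python sketch, Bus and Printer are illustrative classes, not a real I/O API; Bus.out plays the role of the Out 15, 'X' instruction by routing the value to whichever device is attached at port 15.

```python
class Printer:
    """Hypothetical output device that records the characters it is sent."""

    def __init__(self):
        self.printed = []

    def write(self, value):
        self.printed.append(value)


class Bus:
    """Toy I/O bus: routes OUT requests to the device registered at a port address."""

    def __init__(self):
        self.devices = {}

    def attach(self, port, device):
        self.devices[port] = device

    def out(self, port, value):
        if port not in self.devices:
            raise KeyError(f"no device at port {port}")
        self.devices[port].write(value)


bus = Bus()
printer = Printer()
bus.attach(15, printer)
bus.out(15, "X")        # analogous to: Out 15, 'X'
print(printer.printed)  # ['X']
```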
2.5 Input-Output System
A hierarchy of software layers defines the I/O system.
Application programs written in C++ or another high-level language should work correctly across different computers and operating systems.
An application program calls an output operation (e.g. System.out) that is implemented in the next lower level.
The OS (e.g. Windows) should handle I/O the same way on different hardware. The OS is in large part a set of functions called by application programs.
OS layer calls BIOS layer.
BIOS is specific to the hardware.
The BIOS (Basic Input-Output System) is a set of functions that connect hardware-independent I/O requests to the specific hardware.
The BIOS function for drawing a character on the screen differs between a 1986 computer and a 2010 computer, but the same C++ program could run on both.
In theory, assembly language programs have access to all lower level layers.
In practice, a multitasking OS such as Windows controls access to the hardware, so all I/O must go through the OS.
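The layering can be illustrated with three functions that each call only the layer directly below. This is a hypothetical Python sketch: bios_put_char, os_write and application are made-up names, not real OS or BIOS entry points, and the "hardware" is just a list standing in for the screen.

```python
def bios_put_char(ch, screen):
    """BIOS layer (hypothetical): the only code that knows the hardware details."""
    screen.append(ch)


def os_write(text, screen):
    """OS layer (hypothetical): hardware-independent service built on the BIOS layer."""
    for ch in text:
        bios_put_char(ch, screen)


def application(screen):
    """Application layer: calls the OS, never the BIOS or the hardware directly."""
    os_write("Hello", screen)


screen = []
application(screen)
print("".join(screen))  # Hello
```

Moving this program to different hardware would, in this model, mean replacing only bios_put_char; application and os_write stay unchanged.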