

## Datapoint 2200



- Released in 1970
- 2 KB shift-register main memory
- CPU: ~100 TTL components
- Instruction set implemented by Intel in the 8008





# Legacy

- CISC ISA
- Few general-purpose registers (8)
- Four modes of operation
- Segmentation
- Superscalar
- Complex addressing modes

### Microarchitectures

- A new one "every two years"
- Focus on P6







## Instruction decode

- Pentium 4: Trace cache
- Pentium M, Core: µ-op fusion
- ARM: Single instruction per clock
- IA-64: Two identical decoders (currently)

# P6: "Register renaming"

- Performed by the ROB (reorder buffer)
- 40 physical registers (ROB entries), mapped through the RAT (register alias table)
- ARM: More GPRs (16)
- IA-64: Lots of GPRs (128) and register rotation

# Avoiding stalls

- Out-of-order execution
- Branch prediction
- Speculation









# P6: Branch prediction

- Branch target buffer: 512 entries
- Branch history and predicted address
- Misprediction penalty: 10-15 cycles
- Static prediction
- ~90% hit rate
- Pentium M: Loop detection

# P6: Speculation



# Registers

| Bits 31-0 | Bits 15-0 | Bits 15-8 | Bits 7-0 | Purpose                  |
|-----------|-----------|-----------|----------|--------------------------|
| EAX       | AX        | AH        | AL       | Accumulator              |
| ECX       | CX        | CH        | CL       | Count                    |
| EDX       | DX        | DH        | DL       | Data                     |
| EBX       | BX        | BH        | BL       | Base of data             |
| EBP       | BP        |           |          | Base of stack            |
| ESP       | SP        |           |          | Stack pointer            |
| ESI       | SI        |           |          | String source index      |
| EDI       | DI        |           |          | String destination index |
| EIP       | IP        |           |          | Instruction pointer      |
| EFLAGS    | FLAGS     |           |          | CPU flags                |
|           | CS        |           |          | Code segment             |
|           | SS        |           |          | Stack segment            |
|           | DS        |           |          | Data segment             |
|           | ES        |           |          | Extra data segment       |
|           | FS        |           |          | Extra data segment 2     |
|           | GS        |           |          | Extra data segment 3     |



| XMM register (bits 127-0) | Purpose             |
|---------------------------|---------------------|
| XMM0                      | SSE GP register 0   |
| XMM1                      | SSE GP register 1   |
| XMM2                      | SSE GP register 2   |
| XMM3                      | SSE GP register 3   |
| XMM4                      | SSE GP register 4   |
| XMM5                      | SSE GP register 5   |
| XMM6                      | SSE GP register 6   |
| XMM7                      | SSE GP register 7   |

| x87 register (bits 79-0) | MMX register (bits 63-0) | Purpose           |
|--------------------------|--------------------------|-------------------|
| FPR0                     | MM0                      | MMX GP register 0 |
| FPR1                     | MM1                      | MMX GP register 1 |
| FPR2                     | MM2                      | MMX GP register 2 |
| FPR3                     | MM3                      | MMX GP register 3 |
| FPR4                     | MM4                      | MMX GP register 4 |
| FPR5                     | MM5                      | MMX GP register 5 |
| FPR6                     | MM6                      | MMX GP register 6 |
| FPR7                     | MM7                      | MMX GP register 7 |









`compute_nodes()`, compiled for ARM and IA-32:

| C source                                    | ARM                                                   | IA-32                                                                                                                                                                                          |
|---------------------------------------------|-------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `other_value = localnode->from_values[i];`  | `ldr r4, [r0, #12]`<br>`ldr r2, [r4, lr, lsl #2]`     | `mov ecx, DWORD PTR [eax+12]`<br>`mov eax, DWORD PTR [ecx+ebx*4]`                                                                                                                              |
| `cur_value -= coeff * value;`               | `mufd f0, f2, f1`<br>`sufd f3, f3, f0`                | `mulsd xmm1, xmm2`<br>`subsd xmm0, xmm1`                                                                                                                                                       |
| (function prologue)                         | `stmdb sp!, {r4, r5, lr}`                             | `push ebp`<br>`mov ebp, esp`<br>`and esp, 0xfffffff8`<br>`sub esp, 0x10`<br>`mov DWORD PTR [esp], esi`<br>`mov DWORD PTR [esp+4], edi`<br>`mov DWORD PTR [esp+8], ebx`                          |
| (function epilogue)                         | `ldmia sp!, {r4, r5, pc}`                             | `mov esi, DWORD PTR [esp]`<br>`mov edi, DWORD PTR [esp+4]`<br>`mov ebx, DWORD PTR [esp+8]`<br>`mov esp, ebp`<br>`pop ebp`<br>`ret`                                                             |





- 8087: Stack architecture
- Pentium III, Pentium 4, Core: SSE, SSE2, SSE3, SSE4

## Thread-level parallelism

- SMP
- Pentium Extreme Edition
- Core Duo
- Hyper-Threading









