# THE MEMORY HIERARCHY



This work supported by UNSW and HP through the Gelato Federation

# CACHE DESIGN

- → Cache Levels
  - L1/L2 usually on chip
  - L2 usually unified
- L3 much larger
- L1 tied to clock rate, lower levels tied to miss cost of L1
- → Cache Lines
- → Split cache

Slide 3

Slide 4

- Instruction cache
- Data cache

# CACHE ORGANISATION

#### Where can a cache line live?:

- ➔ Direct Mapped
- $\rightarrow$  Fully Associative
- → Set Associative
  - *n-way set associative* is to divide total cache into *n* compartments.
  - Line may live anywhere in set *total\_blocks* mod *n*

How do we find a cache line?:

- → Index
- → Tag
- $index_bits = log_2(\frac{\frac{cache_size}{associativity}}{line_size})$





# → Fast == Expensive

Slide 2

Slide 1

- → Exploit locality
  Spatial
  - Temporal



Registers



WHERE IS THAT LINE - TAGS AND INDEXES

QUICKLY - OTHER CACHE PARAMETERS

- → Replacement Policy
  - LRU/Random/FIFO

# Slide 6

- Write policyThrough
  - Back
- → Inclusive or Exclusive?



## CHRONOLOGY OF A CACHE HIT

- ① Processor loads from an address
- ② Address is requested in the cache
- ③ Index selects offset within (all) ways
- ④ Tag selects correct entry from set
- Slide 8 <sup>⑤</sup> Data is retrieved from selected line

## Cache addressing: This address is a virtual address

- → Virtual addresses may alias
- Cache must be coherent
- **Synonyms** :  $\Delta VA$ ,  $\equiv$  page
- **Homonyms** :  $\equiv$  page,  $\Delta$  VA





PHYSICALLY TAGGED, VIRTUALLY INDEXED





**x** Flush cache on context switch (Sledgehammer)

#### Software Approaches:

- Slide 12 → SASOS
  - → Mungi
  - → Colouring
    - → Make shared items align in the cache (SunOS)
    - → Globally visible shared region (OS/2)

#### DEALING WITH ALIASING

#### Hardware Approaches:

- → Reverse Maps
  - → Back Pointers (MIPS R6000, Alpha?)
- Slide 13

Slide 14

- → Dual Directories→ ASID or segmentation
  - Add bits to distinguish VAs
  - 🗴 Makes sharing harder
  - Itanium Region Registers

### DEALING WITH MISSES

# So far, everything has been about hit latency

## Slide 15 Types of misses:

- Compulsory
- ② Capacity
- ③ Conflict



Way N Cache Ways

## MORE CACHE?

- Less compulsory misses
- 🗴 \$\$\$
- **Slide 16** *cache\_size = line\_size \* set\_index \* assocativity* 
  - $\uparrow$  cache means more *what*?
  - ➔ Greater line size
  - ➔ Greater associativity
  - → Greater index size

→ Data

# PREFETCHING EXAMPLE

## Walk an array in cache sized lines

| Metric              | ICC         | ecc         | Description                                                    |
|---------------------|-------------|-------------|----------------------------------------------------------------|
| Run Time (seconds)  | 0.824       | 1.507       | Program execution time                                         |
| L1D READ MISSES ALL | 12,514,832  | 12,505,200  | L1 Data Cache Read Misses                                      |
| BE_EXE_BUBBLE_GRALL | 129,629,838 | 737,891,717 | Full Pipe Bubbles in Main Pipe due to Execution Unit<br>Stalls |
| BE_EXE_BUBBLE_GRGR  | 0           | 0           | Back-end was stalled by exe due to GR/GR depen-<br>dency       |

# Why?:

Slide 19

| 400000000000940: | [MII] | (p16) | ld4 |
|------------------|-------|-------|-----|
| 40000000000946:  |       | (p17) | add |
| 4000000000094c:  |       |       | nop |
| 40000000000950:  | [MMB] | (p16) | lfe |
| 40000000000956:  |       |       | nop |
| 40000000000095c: |       |       | br. |
|                  |       |       |     |

[MII] (p16) ld4 r32=[r3],64 (p17) add r34=r35,r33 nop.i 0x0 [MMB] (p16) lfetch.nt1 [r2],64 nop.m 0x0 br.ctop.sptk.few 40000000000940 <walk+0x80>;;

PREFETCHING

**OTHER MISS PENALTY REDUCTION SCHEMES** 

## → Requires non-blocking cache

Slide 18 Concept Hide latency by overlapping execution with fetching.

→ Critical word first

→ Victim caches→ Way prediction

→ Trace Cache

Slide 17

- → Very useful for loops
- → *stride* is distance a loop jumps in memory
- → Hardware or Software based

|          | IF YOU ARE STILL AWAKE, I OWE YOU A BEER |  |  |  |  |  |
|----------|------------------------------------------|--|--|--|--|--|
| Slide 20 | QUESTIONS?                               |  |  |  |  |  |

#### Prefetching Example