



### **Pipeline Rules Operands and Execution** The register file is the primary source of bindings issued in the No overtaking : instructions proceed "up" the pipeline in order results pipeline, since it provides instruction operands. Algorithms for issuing operands must be carefully selected to Algorithms for issuing operands must be carefully selected to maximise throughput maximise throughput If the register file does not rovide the correct operands in the If an instruction's operand bindings are valid and the stage it is results pipeline, an instruction may stall the pipeline in contains the required functional unit for it to execute, it may do so. Once executed, the result is entered into the destination indefinitely binding, which is then marked valid Bindings are also provided by executed instructions, which require only local logic to ensure the correct values are passed Once an instruction is executed, at least one copy of its down the results pipeline destination binding is entered into the results pipeline An un-executed instruction cannot pass the last stage able to execute it. Therefore, an un-executed instruction must stay in the pipeline until executed When Pipeline Bindings Match **Operation Example** If a valid result binding matches an invalid source binding, copy the result value to the source value and mark the source valid. PC = 102 B := A + B

 $= 103 \quad D := C$ 

Result pipe

Remarks

Registers contain: A[14]B[2]C[3]D[21]

Fetch, send source names B, C to reg file

Stage

R

0

2

Instruction pipe

PC = 101

- If an invalid destination binding matches a valid result binding, mark the result binding invalid.
- If a valid destination binding matches a result binding, copy the destination value into the result value and mark the result valid.
- If an invalid destination binding matches a valid result binding, mark the result binding invalid.



# **Operation Example Continued**

| Stage | Instruction pipe            | Result pipe          | Remarks                                   |
|-------|-----------------------------|----------------------|-------------------------------------------|
| R.    |                             | $B[2]C[3]\downarrow$ | Registers contain: A[14]B[2]C[3]D[21]     |
| 0     |                             |                      |                                           |
| 1     |                             |                      |                                           |
| 2     | $A[] := B[] + C[] \uparrow$ |                      |                                           |
| 1     | $PC = 102^{+}$              |                      | Fetch, send source names A, B to reg file |

| Stage | Instruction pipe             | Result pipe | Remarks                               |
|-------|------------------------------|-------------|---------------------------------------|
| R.    |                              | A[14]B[2]   | Registers contain: A[14]B[2]C[3]D[21] |
| 0     |                              | B[2]C[3]    | Musta'i swap with instruction below.  |
| 1     | $[A[] := B[] + C[] \uparrow$ |             | Musta'i swap with result above.       |
| 2     | $B[] := A[] + B[] \uparrow$  |             |                                       |
| I     | PC = 103                     |             | Fetch delayed due to cache miss.      |





# **Operation Example Continued**

| Stage | Instruction pipe    | Result pipe          | Remarks                               |
|-------|---------------------|----------------------|---------------------------------------|
| R     |                     | A[14]B[2]            | Begisters contain: A[14]B[2]C[3]D[21] |
| 0     | A[] := B[2] + C[3]  | $B[2]C[3]\downarrow$ | Garner B, C; execute                  |
| 1     | B[] := A[] + B[]    |                      |                                       |
| 2     |                     |                      |                                       |
| 1     | $PC = 103 \uparrow$ |                      | Fetch, send source name C to reg file |



# Operation Example Continued

| Staye | Instruction pipe               | Result pipe | Remarks                               |
|-------|--------------------------------|-------------|---------------------------------------|
| R     |                                | A[14]B[2]   | Begisters contain: A[14]B[2]C[1]D[21] |
| 0     | $A[5] := B[2] + C[3] \uparrow$ | A[5]        | Insert result                         |
| 1     | $B[] := A[] + B[2] \uparrow$   | B[2]C[3]    | Garner B                              |
| 2     | $D[] := C[] - 1 \uparrow$      |             | Literal -1 held in hinding value      |
| 1     | PC = 104                       |             | Fetch, send source names to reg file  |

| Stage: | Instruction pipe               | Result pipe | Remarks                              |
|--------|--------------------------------|-------------|--------------------------------------|
| R      | $A[5] := B[2] + C[3] \uparrow$ | A[]B[2]     | Begisters contain: A[5]B[2]C[3]D[21] |
| 0      | $B := A[5] + B[2] \uparrow$    | A[5]        | Gamer A, essente                     |
| 1      | $D := C[3] - 1^{+}$            | B[2]C[3]    | Gamer C, execute                     |
| 2      |                                |             |                                      |
| I      | PC = 105                       |             | Fetch, send source names to reg file |

# Handling Traps

The instruction that generates the trap sends a special trap
binding down the results pipeline, thereby affecting subsequent instructions

When the special trap binding reaches the "bottom" of the pipeline, it is interpreted as such by the program control logic

When setting the trap or branch binding, information about the instruction can also be held, such as its program counter value





# Handling Branches

### Similar to traps

When a branch instruction enters the pipeline, it has the condition code register as a source

The processor makes a branch prediction and continues to issue instructions up the pipeline

If the prediction is correct, execution is not altered

If a misprediction occurs, the branch instruction sends a wrong-branch-result binding down the results pipeline, which disables subsequent (now invalid) instructions.





## **Functional Units**

Different stages can be capable of different types of processing

Multiple-cycle operations may be implemented as sidings

Instructions "drop off" the operands and retrieve the result further up the pipeline. Can be a functional unit or specialised coprocessor

This procedure removes the need for stalling while waiting for lengthy operations to complete, if sidings are pipelined

An invalidated instruction using a siding must still retrieve the results in order to ensure the sidings and pipeline are coordinated





## **Handling Branches**

When the result reaches the control logic at the bottom, the program counter is adjusted to the correct value.

Stages such as the instruction decoder and program control logic can also be separated by stages, with required communication occuring via the results pipeline.

Last Pipeline Rule : If a result binding is either trap-result or wrong-branch-result, mark the instruction invalid. The instruction may proceed up the pipeline, but it will have no side effects on the results pipeline or the register file.





# Functional Units





### **Register Files and Caches**

Multiple register files can be supported, for example a floating point register file located after all the floating point functional units and an integer register file after all integer units

Any instruction which may alter a register file must execute before passing it, so the results are not lost

To reduce the latency involved in fetching instructions from instruction register and passing them down the pipeline, register caches can be used

Locating a register cache above the instruction decoder enables bindings to be filled by the results pipeline



## **Register Files and Caches**

Source bindings are filled by the register cache if the register has a valid entry in the cache

- Results which pass the cache are entered as valid
- Any instruction passing the register cache which may alter a valid entry must invalidate it.
- Traps and mispredicted branch instructions must invalidate cache entries modified by instructions after the trap or branch

Rather than keep track of such instructions, the entire cache is swept clean.





### **Register Files and Caches**

Requests for source operands need only be issued when required registers are not valid in cache

- This reduces the number of operand requests sent to the register file
- The path from the register file to the decode and register cache stages is long.

Communications are pipelined and the register cache is considered as a regular pipeline stage, governed by the pipeline rules, in order to speed up the connection between the ends of the pipeline

Cache bindings are then maintained by the pipeline rules

### **Proposed Implementations**

Many things can be modified – number of stages, number of sidings, where sidings pick up operands and return results, caching, etc

Terminology : "Packet" refers to the bindings held at a stage in either the instruction or results pipeline. In the first example, the packet size of the results pipeline was two

The number and contents of the bindings in the instruction pipeline bindings and results pipeline bindings can be varied for different implementations

If the instruction pipeline has a packet size greater than one binding, then the implementation is like a conventional multi-scalar design

Instruction

Stage 0

Results







### **Rotary Pipeline**

- The results pipeline is converted to a loop so that registers circulate around the stages of the outer pipeline
- Pipeline stages are characterised by the functional unit stored there rather than data held there. Operands are sourced from the circulating registers
- This eliminates the need to stall if an instruction reaches the last stage able to execute it, since there is no longer any last stage.
- Registers/results are not kept in lock-step, but lapping is not allowed in order to maintain consistency
- If data dependencies exist, execution is prepared and begins as soon as the data is available



### **Rotary Pipeline Organisation**

- Only the required registers are passed around the pipeline
- The number of required registers is proportional to the number of functional units
- This means that rotary pipeline processors can be very large structures if many functional units are used, since support for each "loop" must be added
- Sequential instructions are issued in the same direction as register pipeline flow
- If a register holds condition codes, it can be passed around as a special type of register to pass the signal to each stage



### **Rotary Pipeline Organisation**



### **Rotary Pipeline Execution**

- The pipeline has to be stalled if no appropriate stages (functional units) are available
- Speculative execution can either be achieved by postponing committing the result until the speculation is confirmed or by writing to temporary registers
- Sequential instructions are issued in the same direction as register pipeline flow
- If an exception is raised, instructions must be cancelled behind the current instruction, but care must be taken not to cancel instructions at another point in the loop which is actually ahead of the instruction.





