# **Processes and Threads** Implementation

**Learning Outcomes** 

- A basic understanding of the MIPS R3000 assembly and compiler generated code.
- An understanding of the typical implementation strategies of processes and threads
  - Including an appreciation of the trade-offs between the implementation approaches
    - Kernel-threads versus user-level threads
- · A detailed understanding of "context switching"



1

2

1 🏭 UNSW

### **MIPS R3000**

- Load/store architecture
  - No instructions that operate on memory except load and
  - Simple load/stores to/from memory from/to registers
    - Store word: sw r4, (r5)
    - Store contents of r4 in memory using address contained in register r5
    - Load word: 1w r3, (r7)
      - Load contents of memory into r3 using address contained in r7
         Delay of one instruction after load before data available in destination

      - register

         Must always an instruction between a load from memory and the
  - subsequent use of the register.

    •lw, sw, lb, sb, lh, sh,...



### **MIPS R3000**

- Arithmetic and logical operations are register to register operations
  - E.g., add r3, r2, r1
  - No arithmetic operations on memory
- Example
  - •add r3, r2, r1  $\Rightarrow$  r3 = r2 + r1
- · Some other instructions
  - •add, sub, and, or, xor, sll, srl
  - •move r2, r1  $\Rightarrow$  r2 = r1

3

## **MIPS R3000**

- All instructions are encoded in 32-bit
- Some instructions have immediate operands
  - Immediate values are constants encoded in the instruction itself
  - Only 16-bit value
  - Examples
    - Add Immediate: addi r2, r1, 2048 ⇒ r2 = r1 + 2048 • Load Immediate : li r2, 1234
    - ⇒ r2 = 1234

Example code Simple code example: a = a + 1lw r4,32(r29) // r29 = stack pointerli r5, 1 add r4, r4, r r4,32(r29) Offset(Address)

5





• RISC architecture – 5 stage pipeline
• Instruction partially through pipeline prior to jmp having an effect

• Instruction partially through pipeline prior to jmp having an effect

| Deacho | Figure | 1.1 | Figure

Jump and Link Instruction jal 0x10 1f • JAL is used to implement function calls 0x14 nop • r31 = PC+8 0x18 lw r4,(r6) Return Address register (RA) is used to return from function call 1: 0x2a sw r2, (r3) 0x38 jr r31 0x3a nop 10

9

Compiler Register Conventions

• Given 32 registers, which registers are used for
• Local variables?
• Argument passing?
• Function call results?
• Stack Pointer?

11 12





**Function Stack Frames**  Fach function call allocates a Stack new stack frame for local variables, the return address, f1() stack previous frame pointer etc. - Frame pointer: start of current frame Frame stack frame - Stack pointer: end of current Pointer f2() stack stack frame frame Example: assume f1() calls Stack f2(), which calls f3(). Pointer

**Function Stack Frames**  Fach function call allocates a Stack new stack frame for local variables, the return address, f1() stack previous frame pointer etc. - Frame pointer: start of current frame stack frame Stack pointer: end of current f2() stack stack frame frame • Example: assume f1() calls Frame f2(), which calls f3(). Pointer f3() stack frame Stack Pointer

15 16



17 18

```
0040011c <m
  40011c:
                          addiu sp,sp,-40
  400120:
            afbf0024
                          sw
                                ra,36(sp)
            afbe0020
                                s8,32(sp)
  400124:
 400128:
            03a0f021
                                s8,sp
 40012c:
            24020005
                         li
                                v0,5
  400130:
            afa20010
                                v0,16(sp)
 400134:
            24020006
                         li
                                v0.6
            afa20014
                                v0,20(sp)
  400138:
 40013c:
            24040001
                                a0,1
 400140:
            24050002
                         1i
                                a1,2
            24060003
 400144:
                                a2,3
 400148:
            0c10002c
                         jal
                                4000b0 <sixargs>
            24070004
 40014c:
                         li
                                a3,4
  400150:
            afc20018
                                v0,24(s8)
 400154:
            03c0e821
                                sp,s8
 400158:
            8fbf0024
                         lw
                                ra,36(sp)
  40015c:
            8fbe0020
                                s8,32(sp)
 400160:
            03e00008
                         jr
                                ra
 400164:
            27bd0028
                         addiu sp,sp,40
                                                                  19 🐺 UNSW
```

004000b0 <sixargs>: 4000b0: 27bdfff8 addiu sp,sp,-8 4000b4: afbe0000 s8,0(sp) s8,sp 4000b8: 03a0f021 a0,8(s8) 4000bc: afc40008 afc5000c 4000c0: a1,12(s8) 4000c4: afc60010 a2,16(s8) 4000c8: afc70014 a3,20(s8) 4000cc 8fc30008 lw 4000d0: 8fc2000c v0,12(s8) 4000d4: 00000000 nop 4000d8: 00621021 addu v0.v1.v0 4000dc: 8fc30010 lw v1,16(s8) 4000e0: 00000000 4000e4: 00431021 v0,v0,v1 4000e8: 8fc30014 lw v1,20(s8) 00000000 4000ec: nop 4000f0 · 00431021 v0 v0 v1 4000f4: 8fc30018 lw v1,24(s8) 4000f8: 00000000 20 JUNSW

19 20

4000fc: 00431021 addu v0,v0,v1
400100: 8fc3001c lw v1,28(s8)
400104: 00000000 nop
400108: 00431021 addu v0,v0,v1
40010c: 03c0e821 move sp,s8
400110: 8fbe0000 lw s8,0(sp)
400114: 03e00008 jr ra
400118: 27bd0008 addiusp,sp,8



21 22

**Process Memory Process** Layout • Minimally consist of three segments Stack · contains the code (instructions) • Data • Global variables Stack Gap • Activation records of procedure/function/method Local variables • Note: data can dynamically grow up Data • E.g., malloc()-ing • The stack can dynamically grow down E.g., increasing function call depth or recursion Text 23 UNSW



23 24













29 30





### **User-level Threads**

- Implementation at user-level
  - User-level Thread Control Block (TCB), ready queue, blocked queue, and
  - Kernel has no knowledge of the threads (it only sees a single process)
  - If a thread blocks waiting for a resource held by another thread inside the same process, its state is saved and the dispatcher switches to another ready thread
  - Thread management (create, exit, yield, wait) are implemented in a runtime support library

33

### **User-Level Threads**

- Pros
  - Thread management and switching at user level is much faster than doing it in kernel level
    - No need to trap (take syscall exception) into kernel and back to switch
  - Dispatcher algorithm can be tuned to the application
  - · E.g. use priorities
  - Can be implemented on any OS (thread or non-thread aware)
  - Can easily support massive numbers of threads on a per-application basis
    - Use normal application virtual memory
    - Kernel memory more constrained. Difficult to efficiently support wildly differing numbers of threads for different applications.

34

### **User-level Threads**

- Cons
- Threads have to yield() manually (no timer interrupt delivery to user-

  - Co-operative multithreading
     A single poorly design/implemented thread can monopolise the available CPU time
  - There are work-arounds (e.g. a timer signal per second to enable pre-emptive multithreading), they are course grain and a kludge.
- Does not take advantage of multiple CPUs (in reality, we still have a single threaded process as far as the kernel is concerned)

35 JUNSW

35 36

# **User-Level Threads** • If a thread makes a blocking system call (or takes a page fault), the process (and all the internal threads) blocks Can't overlap I/O with computation User Mode Scheduler Scheduler Kernel Mode 36 JUNSW





# Kernel Threads • Threads are implemented in the kernel • TCBs are stored in the kernel • A subset of information in a traditional PCB • The subset related to execution context • TCBs have a PCB associated with them • Resources associated with the group of threads (the process) • Thread management calls are implemented as system calls • E.g. create, wait, exit

Kernel Threads

• Cons

• Thread creation and destruction, and blocking and unblocking threads requires kernel entry and exit.

• More expensive than user-level equivalent

39 40



1. Hardware stacks program counter, etc.
2. Hardware loads new program counter from interrupt vector.
3. Assembly language procedure saves registers.
4. Assembly language procedure sets up new stack.
5. C interrupt service runs (typically reads and buffers input).
6. Scheduler decides which process is to run next.
7. C procedure returns to the assembly code.
8. Assembly language procedure starts up new current process.

Skeleton of what lowest level of OS does when an interrupt occurs — a context switch

41 42





Context Switch
 Context switch must be transparent for processes/threads

 When dispatched again, process/thread should not notice that something else was running in the meantime (except for elapsed time)
 ⇒OS must save all state that affects the thread

 This state is called the process/thread context
 Switching between process/threads consequently

results in a context switch.

Simplified Explicit Thread Switch

45 46





47 48





• Call 'C' code to process syscall, exception, or interrupt
• Results in a 'C' activation stack building up

SP

(C' activation stack trapframe)

51

Example Context Switch

• Any other existing thread must
• be in kernel mode (on a uni processor),
• and have a similar stack layout to the stack we are currently using

Kernel stacks of other threads/processes

| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |
| Kernel State | C' activation stack | trapframe |

Example Context Switch

• We save the current SP in the PCB (or TCB), and load the SP of the target thread.

• Thus we have switched contexts

SP

Kernel State 'C' activation stack trapframe
Kernel State 'C' activation stack trapframe
Kernel State 'C' activation stack trapframe

53 54







Example Context Switch

• The user-level SP is restored

SP

Kernel State 'C' activation stack trapframe

Kernel State 'C' activation stack trapframe

58 UNSW

57 58

The Interesting Part of a Thread Switch

• What does the "push kernel state" part do???

SP

Kernel State C' activation stack trapframe

Kernel State C' activation stack trapframe

static
void
thread\_switch(threadstate\_t newstate, struct wchan \*wc)
{
 struct thread \*cur, \*next;
 cur = curthread;
 do {
 next = threadlist\_remhead(&curcpu->c\_runqueue);
 if (next == NULL) {
 cpu\_idle();
 }
 } while (next == NULL);

/\* do the switch (in assembler in switch.\$) \*/
switchframe\_switch(&cur->t\_context, &next->t\_context);

basics of pick next thread and run it remain

59 60





OS/161 switchframe\_switch

/\* Get the new stack pointer from the new thread \*/

lw sp, 0(a1)
nop /\* delay slot for load \*/

/\* Now, restore the registers \*/
lw s0, 0(sp)
lw s1, 4(sp)
lw s2, 8(sp)
lw s2, 8(sp)
lw s3, 12(sp)
lw s4, 16(sp)
lw s5, 20(sp)
lw s5, 20(sp)
lw s6, 24(sp)
lw s8, 28(sp)
lw gp, 32(sp)
lw gp, 32(sp)
lw nop /\* delay slot for load \*/



63 64

