MIPS R4x00 Instruction Set
Courtesy Don Miller, Arizona State University

Introduction

references

MIPS R4000 User's Manual, Joe Heinrich, Prentice-Hall, 1993 (class text).
MIPSpro™ Assembly Language Programmer's Guide, Huffman, L. and Graves, D., Silicon Graphics, 1996.
Inside L4/MIPS, Gernot Heiser, UNSW, Sydney, Australia, Version 2.14, July 12, 2000, pp. 10-11 and 24-33 (Class Notes I).
Computer Organization and Design: The Hardware/Software Interface, 2nd Edition, Patterson, D. and Hennessy, J., Morgan Kaufmann, 1998 (cse 330 text).
MIPS Handout #1 - MIPS Instructions - a Subset
MIPS Handout #2 - Endianness 
MIPS Handout #3 - System Coprocessor Status Register - CP0 Register 12
MIPS Handout #4 - Endianness (continued)
MIPS Handout #5 - Super Simple Macro Example
MIPS Handout #6 - MIPS R4600 Pipeline

All references below to listing numbers, table numbers, figure numbers and section numbers are from Inside L4/MIPS version 2.14:
L 4.1, lines 1, 7, and 15 page 24 means see listing L 4.1, lines 1, 7 and 15 on page 24 of Inside L4/MIPS.
T 2.1 page 11 is Table 2.1 on page 11, Figure 2.3 page 15 is Figure 2.3 on page 15 and S 3.2 is section 3.2.
Examples from Inside L4/MIPS are preceded by listing number, line number(s) and page number like this:  
L 4.1 lines 1, 7, 15 page 24

overview

MIPS processors - a bunch of families

R2000, R3000, R4000, R5000 and R10000    <-- close to latest family

R4000 family - R4x00

When the distinction doesn't matter we write R4x00; otherwise we write R4600 to refer to the MIPS R4600.
No significant differences between R4600 and R4700 
No significant differences between R4000 and R4400 (cse 530 text covers these)

R4x00

processor has 32  general registers - r0-r31
assemblers use symbolic names for these based on compiler conventions

Ref: Table 2.1 page 11
also see regdef.h

r0    zero    reads as 0 and writes to it are ignored
r1    at       assembler uses to store intermediate results of pseudo-instruction macros
r30  s8/fp  frame pointer (callee saved)
r31  ra       linkage register for jal instruction

.set at        a pseudo-op - assembler uses at without a warning message/users get a warning message if they use it
.set noat    user programs can use at without a warning message/if the assembler needs at you get a warning message

r26, r27    k0, k1    kernel's exception handlers use these as temporaries
                               don't use them if there can be an exception, i.e., only use them if interrupts are disabled
                               and you are sure no page fault can occur:
                                      no mapped addresses are used, or
                                      all virtual memory pages used are already mapped

ti, si and vi    temporaries (caller saved), saved registers (callee saved) and integer function results - see Table 2.1

also there are:

pc        program counter or instruction pointer
hi/lo      for multiply and divide - we may not need
SR       status register aka CP0 register 12 (CP0 is "System CoProcessor 0") - see MIPS HO #3
                in other machines this is called PS or PSW (processor status register or word)

syntax: MIPS HO #1 uses a $ prefix for registers - the defines in regdef.h get rid of the prefix in assembler source

MIPS R4x00 data sizes:

doubleword        dw        8 bytes
word                   w        4 bytes
halfword             hw        2 bytes
byte                    b        1 byte (8 bits)

There are 5 groups of instructions - see MIPS HO #1

· arithmetic
· logical
· data transfer
· conditional branch
· unconditional jump

and there are a few special ones (see below)

Note: RISC processors have
              · cpu operations
              · loads and stores
          RISC processors don't have
              · cpu operations on values in memory
                 e.g.,  no  add   A, B        where A and B are in memory
                 and   no  add   ri, A         where A is in memory
RISC processors have a lot of registers, e.g., >= 32, for intermediate storage in the cpu
            - they get a lot of their high performance from this

code listings - see L 1.1 page 2

.set reorder        default - allows assembler to reorder machine language instructions for efficiency
.set noreorder     enables you to Do It Yourself (DIY) for maximum efficiency

L 1.1 lines 0-3  page 2

0	mtc0    t0, C0_STATUS
1	    .....
2	    .....
3	move	a5, a4

assembler can't  reorder - this is indicated with  BOLD line numbers

L 1.1 lines 4-7  page 2

4 3:	lui	a0, KERNEL_BASE
5
6
7	and	a2, a0, a4
lines that can be reordered have non-bold line numbers

Notes below sort of follow Patterson and Hennessy's order of covering the instruction set - skipping a lot of things that should be familiar to you.

We only do 64-bit  cpu instructions and concentrate on the ones that come up in the L4 code covered in Inside L4/MIPS.

Arithmetic

add and subtract            

register format

op    rd, rs, rt                rd <-- rs   op    rt

3 operands        rs    source register
                         rd   destination register
                         rt   target register

leftmost is destination
right two are sources
op is add or subtract

add    s0, s1, s2           s0 <-- s1 + s2
add    s0, s0, s1           s0 <-- s0 + s1

sub    s0, s1, s2            s0 <-- s1 - s2
sub    s0, s0, s1            s0 <-- s0 - s1

immediate format

immediate - 16 bit constant that gets sign extended

add immediate

addi       s0, s1, 100        s0 <-- s1 + 100 (base 10)
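
The sign extension of the 16 bit immediate can be sketched in Python. This is only a model of the arithmetic, with registers treated as unsigned 64-bit integers and the overflow trap ignored; the helper names sign_extend and add_immediate are made up for this illustration.

```python
MASK64 = (1 << 64) - 1  # registers are modeled as unsigned 64-bit integers

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def add_immediate(rs, imm16):
    """Model rt <-- rs + sign_extend(imm16); overflow detection is left out."""
    return (rs + sign_extend(imm16, 16)) & MASK64

print(hex(add_immediate(0x1000, 100)))     # 0x1064
print(hex(add_immediate(0x1000, 0xFFFF)))  # 0xfff, the immediate means -1
```

So an immediate with bit 15 set subtracts from the source register rather than adding a large positive value.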

unsigned

add unsigned

unsigned  =>   overflow not detected
(signed    =>    overflow detected)

addu        s0, s1, s2       s0 <--  s1 + s2

immediate unsigned 

addiu        s0, s1, 100        s0 <-- s1 + 100
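
A rough Python model of the difference between add and addu. It assumes the 32-bit forms of the instructions (dadd/daddu are the doubleword counterparts) and uses a made-up IntegerOverflow exception to stand in for the processor's overflow exception.

```python
class IntegerOverflow(Exception):
    """Stands in for the processor's integer overflow exception (illustrative)."""

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def addu(rs, rt):
    """Wraps around silently on overflow."""
    return to_signed32(rs + rt)

def add(rs, rt):
    """Same sum as addu, but traps if the signed 32-bit result overflows."""
    total = to_signed32(rs) + to_signed32(rt)
    if not -(1 << 31) <= total < (1 << 31):
        raise IntegerOverflow("add: signed 32-bit overflow")
    return total
```

Note that "unsigned" here only means "no overflow trap" - when no overflow occurs the two instructions produce the same bits.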

moving things around

moving to and from system control coprocessor (CP0) registers - e.g., the Status Register (SR)
see MIPS HO #3 on SR

dmtc0        rt, rd           CP0 register rd <--  rt  
dmfc0        rt, rd             rt  <--  CP0 register rd

2 operands        rt    general register
                         rd   coprocessor 0 register

L 4.1 lines 3, 12 page 24

3	dmfc0        k0, C0_ENTRYHI	/* k0 <-- C0_ENTRYHI
12	dmtc0        k0, C0_ENTRYLO1	/* C0_ENTRYLO1 <--  k0

notes:       C0_ENTRYHI and C0_ENTRYLO1 are 64 bit registers
                info flow is right to left for mtc0 (like a store)

mtc0        rt, rd
mfc0        rt, rd

L 4.4  lines 0, 2 page 30

mtc0        k0, C0_STATUS            /* C0_STATUS <--  k0[31..0]
mfc0        k0, C0_STATUS            /* k0  <-- (C0_STATUS[31])^32 || C0_STATUS

notes:        x || y        =      x concatenated with y
              x[31]         =      bit number 31 of x (sign bit of a 32 bit 2's complement integer)
              (x[31])^32    =      32 copies of bit 31
so for mfc0 - coprocessor 0 status register contents are sign extended and the result is moved to k0
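
The same thing as a small Python sketch. The functions are named after the instructions for readability, but they only model the data movement (32-bit CP0 register, 64-bit general register) and are not part of any real library.

```python
def mfc0(cp0_value32):
    """general register <-- (bit 31 of CP0 register)^32 || CP0 register."""
    low = cp0_value32 & 0xFFFFFFFF
    upper = 0xFFFFFFFF if low & 0x80000000 else 0
    return (upper << 32) | low

def mtc0(gpr_value64):
    """CP0 register <-- low 32 bits of the general register."""
    return gpr_value64 & 0xFFFFFFFF

print(hex(mfc0(0x80000001)))  # 0xffffffff80000001
```

Moving a value out with mfc0 and back in with mtc0 leaves the 32-bit register contents unchanged, since mtc0 only looks at the low word.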

 

Data Transfer

load and store

doubleword (8 bytes)
word (4 bytes)
halfword (2 bytes)
byte (8 bits)

op    rt, offset (base)  

3 operands        rt          destination for loads/source for stores
                  offset      16 bit constant (written in base 10 here) - gets sign extended
                  base        a general register
op is load or store

load        rt, offset (base)                  rt <--  M [base + offset]
store       rt, offset (base)                  M [base + offset] <--  rt

load and store doubleword

ld    s0, 100(s1)        s0 <-- M [s1 + 100]
sd    s0, 100(s1)       M [s1 + 100] <-- s0

L 4.1 lines 1, 15, 7 page 24

1	sd	t0, K_TLB_T0_SAVE(k0)	/* M[k0 + 152] <-- t0	  [save t0]
	  .....
7	ld	t0, (k1)		/* t0 <-- M[k1 + 0]
  	  .....
15	ld	t0, K_TLB_T0_SAVE(k0)	/* t0 <-- M[k0 + 152]    [restore t0]

Notes:
L 4.1 line 0 loads the base address of the  kernel_vars structure into k0
K_TLB_T0_SAVE indexes into kernel_vars to access a save area for t0
K_TLB_T0_SAVE is defined as 152 in kernel.h  ~ page 5
S 3.2 has more info

load and store word

lw    rt, offset (base)
sw    rt, offset (base)

loads:  word that's loaded is sign extended
stores: least significant word of rt (bits 31..0) is stored at the memory address specified by base and offset

L 4.2 lines 17, 18 , 20, 21 page 26

17	lw	t0, (t2)	/* these instructions
18	lw	t1, 4(t2)	/* collectively do
	.....			/* the following:
20	sw	t0, 8(k1)	/* M[k1 + 8] <-- M[t2]
21	sw	t1, 12(k1)	/* M[k1 + 12] <-- M[t2 + 4]

load and store byte

same as load and store word except
loads:  byte is sign extended
stores: least significant byte is stored at address

 L4.3 line 20 page 28
Also see Figure 2.3 page 15 (and MIPS HO #3) and Figure 4.1 page 29.

20	sb	t0, -24(sp)	/* M[sp-24] <-- SR[7..0]

Note:    t0 has a copy of SR when line 20 is executed

endianness

See MIPS HOs #2 and #4 on Endianness
Some processors are Little-endian (LE) and some processors are Big-endian (BE) and some can be both.

Key point:
In LE format the least significant byte has the lowest byte number and is stored at the lowest address.
In BE format the most significant byte has the lowest byte number and is stored at the lowest address.

LE:    80x86/VAX/PDP-11
BE:    680x0/IBM 370/Sparc

MIPS and Alpha can be configured to be either. L4 configures the MIPS R4x00 as Big-endian.

Since we are always on the same processor - endianness is only important to us if we want to access a subpart of a word or doubleword.
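
A quick way to see the difference is to lay a word out in memory both ways and then read back a single byte. The sketch below uses Python's int.to_bytes; the value 0x11223344 and the load_byte helper are just for illustration.

```python
word = 0x11223344

big = word.to_bytes(4, "big")        # byte 0 holds the most significant byte
little = word.to_bytes(4, "little")  # byte 0 holds the least significant byte

def load_byte(memory, address):
    """Model an unsigned byte load from a stored word."""
    return memory[address]

print(big.hex(), little.hex())  # 11223344 44332211
```

Loading the whole word back gives the same value either way; only the sub-word access at address 0 differs (0x11 in BE, 0x44 in LE).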

load upper immediate

lui       rt, immediate    /*  main idea     rt[31..16] <--  immed
more exactly:        rt <-- (immed[15])^32 || immed || 0^16
Put the 16 bit immediate into bits 31..16 of the 64-bit register and zero the low order 16 bits. Then sign extend bit 31 to fill the upper half (bits 63..32) of the register.

L 4.1 line 0 page 24

lui    k0, kernel_base        /* k0 <-- 0xffff ffff 8004 0000

Notes:
kernel_base is defined as 0x8004 (8004H) in kernel.h ~ page 4.
k0 is loaded with base address of statically allocated kernel data - physical address 0x40000
see Figure  3.1 page 18 and  S 3.2.1
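
The value in the example can be checked with a short Python model of lui. The function name and the 0x1FFFFFFF mask (used here to strip the CKSEG0 segment bits and get the physical address) are assumptions made for this sketch, not taken from the L4 sources.

```python
def lui(imm16):
    """rt <-- (immed[15])^32 || immed || 0^16"""
    imm16 &= 0xFFFF
    value32 = imm16 << 16
    upper = 0xFFFFFFFF if imm16 & 0x8000 else 0
    return (upper << 32) | value32

k0 = lui(0x8004)
print(hex(k0))               # 0xffffffff80040000
print(hex(k0 & 0x1FFFFFFF))  # 0x40000, the physical address behind it
```

An immediate with bit 15 clear, e.g. lui(0x1234), gives 0x12340000 with an all-zero upper half.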

Conditional Branch

Key point - MIPS, like some other RISC processors, doesn't keep comparison state (e.g., <, >, =, etc.) around in condition code bits in the PS word between instructions - this speeds up pipeline processing. HW to handle interrupts/exceptions is less complicated - pipeline flushing is simpler/faster.

So how do RISC processors do conditional branching?
1.  branch based on the contents of 2 general registers
            conditional branch instructions.
2.  set a general register based on some comparison and then branch based on what's in this register
            set instruction
            conditional branch instruction

There are zillions of conditional branch instructions (37 of them).
We'll just consider a tiny subset that includes the ones in the L4 code.

branching based on the contents of 2 general registers

beq    rs, rt, label
bne    rs, rt, label

    rs   source register
    rt    target register
    label    branch target address

technically the branch instruction holds a 16 bit word offset, not the label itself. The branch target is computed by taking the 16 bit offset field in the branch instruction, shifting it left by 2 bits, sign extending it and adding the result to the address of the instruction following the branch.
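
The target computation can be sketched in Python as follows. branch_target and encode_offset are illustrative names; the second one shows what an assembler would have to put in the offset field for a given label.

```python
MASK64 = (1 << 64) - 1

def branch_target(branch_pc, offset16):
    """Target = address of the instruction after the branch + (sign-extended offset << 2)."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # sign extend the 16-bit field
        offset -= 1 << 16
    return (branch_pc + 4 + (offset << 2)) & MASK64

def encode_offset(branch_pc, target):
    """Inverse: the 16-bit field for a branch at branch_pc to target."""
    return ((target - (branch_pc + 4)) >> 2) & 0xFFFF

print(hex(branch_target(0x1000, 3)))       # 0x1010
print(hex(branch_target(0x1000, 0xFFFF)))  # 0x1000, offset -1 branches to itself
```

Because the offset counts words and is signed, a branch reaches roughly 128 KB in either direction.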

beq    s1, s2, foo

if s1 = s2    goto foo
else    pc <-- pc + 4        /* all instructions are 4 bytes


bne    s1, s2, foo

    if s1 != s2    goto foo
    else    pc <-- pc + 4

Typical usage - compare contents of a register to r0 (zero)

beq    s1, zero, foo

    if s1 = 0    goto foo
    else    pc <-- pc + 4

bne    s1, zero, foo
    if s1 != 0   goto foo
    else    pc <-- pc + 4

L 4.1 lines 8 and 17 page 24

8	bne	t0, k0, 2f	/* if t0 != k0 goto 2f
	 .....
17 2:	j	tlb2_miss

L 4.3 line 13 and 25 page 28

13	beq	k1, zero, 1f	/* if k1 = 0 goto 1f
	  .....
25 1:	slti	t1, AT, MAX_SYSCALL_NUMBER

set instructions

all are variants of set on less than

slt    rd, rs, rt

    if     rs < rt        rd <-- 1            /* rs, rt considered as signed integers
    else                  rd <-- 0

slti    rd, rs, immed

    if     rs < immed     rd <-- 1            /* rs and immed are signed integers
    else                  rd <-- 0

L 4.3 line 25 page 28

25 1:	slti	t1, AT, MAX_SYSCALL_NUMBER	/* if syscall # < 7 then t1 <-- 1
Notes:
#define MAX_SYSCALL_NUMBER 7 is in syscalls.h ~ page 1

valid system call numbers are 0-6  

  

sltu, sltiu    -    same idea with rs, rt, immed considered as unsigned integers (natural numbers)

        slt    t0, s0, s1
        bne  t0, zero, foo        /* if s0 < s1 goto foo (t0 will be != 0 in this case)
goo:       ....                         /* else continue in-line at goo
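
The difference between the signed and unsigned compares, and the slt/bne idiom, can be modeled in Python like this. Registers are treated as 64-bit values; branch_if_less is a made-up helper that returns the name of the label where execution continues.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Interpret a 64-bit register value as a two's complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def slt(rs, rt):
    return 1 if to_signed64(rs) < to_signed64(rt) else 0

def sltu(rs, rt):
    return 1 if (rs & MASK64) < (rt & MASK64) else 0

def branch_if_less(s0, s1):
    t0 = slt(s0, s1)                      # slt  t0, s0, s1
    return "foo" if t0 != 0 else "goo"    # bne  t0, zero, foo

minus_one = 0xFFFFFFFFFFFFFFFF
print(slt(minus_one, 1), sltu(minus_one, 1))  # 1 0
```

The all-ones pattern is -1 for slt but the largest possible number for sltu, which is why the two disagree.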

more branches

bgtz rs, foo        if rs >  0    goto foo
bgez rs, foo       if rs >= 0    goto foo
blez rs, foo        if rs <= 0    goto foo
bltz rs, foo         if rs <  0    goto foo

L5.1 line 0 page 48

0	bltz	sdesc, receive_only	/* if a0 < 0  goto receive_only

Note: register naming convention in IPC code is given in  T 5.1 page 48 -  sdesc is mnemonic for a0.

Unconditional Jumps

jump
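j    label        /* goto label

target field in the instruction is a 26 bit word address, not an offset: it is shifted left by 2 bits and
combined with the high order bits of the address of the instruction in the branch delay slot
so unlike a branch a jump is not pc-relative - it can reach anywhere in the current 256 MB aligned region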

jump register
jr    rs            /* goto address in rs

jump and link
jal label     /* goto label
                    ra <-- pc + 8
address of the instruction after the instruction in the branch delay slot is saved in r31
note: HO #1 seems wrong - it says:  ra <-- pc + 4

jalr    rs         /* goto address in rs
                   ra <-- pc + 8
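
For j and jal, the architecture manuals describe the target as the 26 bit field shifted left 2 bits and placed under the high order bits of the delay slot address, rather than as a pc-relative offset. A Python sketch of that computation and of the link value follows; the function names and the sample addresses are made up.

```python
MASK64 = (1 << 64) - 1

def jump_target(jump_pc, instr_index):
    """Keep the upper bits of the delay slot address, replace the low 28 bits."""
    delay_slot = (jump_pc + 4) & MASK64
    region = delay_slot & ~0x0FFFFFFF & MASK64
    return region | ((instr_index & 0x03FFFFFF) << 2)

def jal(jump_pc, instr_index):
    """Return (new pc, value saved in ra)."""
    return jump_target(jump_pc, instr_index), (jump_pc + 8) & MASK64

target, ra = jal(0xFFFFFFFF80001000, 0x100)
print(hex(target), hex(ra))  # 0xffffffff80000400 0xffffffff80001008
```

The link value skips both the jal and its delay slot, which is why it is pc + 8 rather than pc + 4.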

Logical Instructions

and, or, xor
register format

op    rd, rs, rt            rd <-- rs op rt                    /* bitwise operations
op is and, or, exclusive or

and    s0, s1, s2            s0 <-- s1 & s2
or      s0, s1, s2            s0 <-- s1 | s2
xor    s0, s1, s2            s0 <-- s1 XOR s2

L 4.4 lines 4, 6 page 30

4	and	k0, k0, k1	/* k0 <-- k0 & k1
		.....
6	or 	k0, k0, k1	/* k0 <-- k0 | k1

L 4.2 line 9 page 26

9	xor	k0, t0		/* k0 <-- k0 XOR t0

immediate format

op    rt, rs, immed        rt <-- rs op immed
immed is a 16-bit constant that gets 0-extended

andi    s0, s1, immed        s0 <-- 0^48 || (immed & s1[15..0])
ori      s0, s1, immed        s0 <-- s1[63..16] || (immed | s1[15..0])
xori    s0, s1, immed        s0 <-- s1 XOR (0^48 || immed)

andi    s1, s1, 0xfffe        /* clear bit 0 of s1 (bits 63..16 are cleared too since immed is 0-extended)
ori      s1, s1, 0x0002      /* set bit 1 of s1
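
Because the immediate is zero extended, andi behaves differently from ori and xori in the upper 48 bits. A small Python model (function names mirror the mnemonics, registers are 64-bit values):

```python
MASK64 = (1 << 64) - 1

def andi(rs, imm16):
    """Upper 48 bits of the result are always 0."""
    return rs & (imm16 & 0xFFFF)

def ori(rs, imm16):
    """Upper 48 bits of rs pass through unchanged."""
    return (rs | (imm16 & 0xFFFF)) & MASK64

def xori(rs, imm16):
    """Upper 48 bits of rs pass through unchanged."""
    return (rs ^ (imm16 & 0xFFFF)) & MASK64

print(hex(andi(0xFFFFFFFFFFFFFFFF, 0xFFFE)))  # 0xfffe
print(hex(ori(0x10, 0x0002)))                 # 0x12
```

So andi with 0xfffe is only a safe "clear bit 0" when the value is known to fit in 16 bits.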

L 4.3 lines 12, 13 page 28
see also MIPS HO #3

12	andi	k1, t0, ST_KSU	/* k1[4..3] <-- SR[4..3], processor mode at time of exception
13	beq 	k1, zero, 1f	/* if this mode was kernel mode goto 1f

Notes:
when line 12 is executed, t0 has SR at time exception occurred
SR[4..3] is the KSU bit field:  kmode = 00/umode = 10
ST_KSU = 0x0018 in r4kc0.h  ~ page 2 - used to access KSU bit field in SR
so line 12 is:    k1 <-- 0^59 || KSU || 0^3
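
In Python the mask-and-test on lines 12 and 13 looks roughly like this. ST_KSU is the value given above; the mode table and function names are added for illustration, and the sample SR values are made up.

```python
ST_KSU = 0x0018  # mask for SR bits 4..3

MODES = {0b00: "kernel", 0b01: "supervisor", 0b10: "user"}

def ksu_field(sr):
    """What andi k1, t0, ST_KSU leaves in k1: the KSU bits, still in place."""
    return sr & ST_KSU

def mode_at_exception(sr):
    return MODES[ksu_field(sr) >> 3]

print(hex(ksu_field(0x13)), mode_at_exception(0x13))  # 0x10 user
print(hex(ksu_field(0x03)), mode_at_exception(0x03))  # 0x0 kernel
```

The beq against zero works because kernel mode is the only encoding with both KSU bits clear.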

L 4.2.2 lines  0-2 page 30

0	mfc0	k0, C0_STATUS	/* collectively these lines
1	ori	k0, k0, ST_EXL	/* set the EXL bit in SR
2	mtc0	k0, C0_STATUS	/* to 1

Notes:
C0_STATUS =  Coprocessor 0 register 12 (SR) - see  r4kc0.h ~ page 2
ST_EXL =  0x0002 - see r4kc0.h ~ page 2
see also MIPS HO #3

L 5.7 line 17 page 56

17	xori	v0, v0, L4_IPC_SRC_MASK  /* v0 <-- v0 XOR 0x08

Note: L4_IPC_SRC_MASK =  0x08 - see ipc.h ~ page 2

shift instructions - dsll, dsrl, sll, srl, dsllv and dsrlv 

doubleword shifts

dsll    rd, rt, sa        rd <-- rt << sa        /* sa is shift amount, 0 into low order bits
dsrl    rd, rt, sa        rd <--rt >> sa         /* 0 into high order bits

L4.1 lines 4, 5 page 24

4    dsll	t0, k0, 38     /*    t0 <-- k0[25..0] || 0^38
5    dsrl	t0, t0, 47     /*    t0 <-- 0^47 || t0[63..47]
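
Taken together (and reading both as doubleword shifts, as the comments indicate), the shift-left-then-shift-right pair extracts a bit field: shifting left by 38 throws away bits 63..26 and shifting right by 47 then throws away bits 8..0, leaving k0[25..9] right justified. A Python sketch, with extract_field as a made-up name:

```python
MASK64 = (1 << 64) - 1

def dsll(rt, sa):
    """Doubleword shift left, 0 into low order bits."""
    return (rt << sa) & MASK64

def dsrl(rt, sa):
    """Doubleword shift right, 0 into high order bits."""
    return (rt & MASK64) >> sa

def extract_field(k0):
    t0 = dsll(k0, 38)    # line 4
    return dsrl(t0, 47)  # line 5

value = 0x123456789ABCDEF0  # arbitrary test pattern
print(extract_field(value) == (value >> 9) & 0x1FFFF)  # True
```

This avoids loading a 17 bit mask, which would not fit in a 16 bit immediate.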

word shifts

sll    rd, rt, sa        /* acts like a sll in a 32-bit register - result is sign extended
srl    rd, rt, sa        /* acts like a srl in a 32-bit register - result is sign extended

L 4.3 lines 9, 10 page 28

9	srl	k1, 5	/* collectively these lines clear
10	sll 	k1, 5	/* KSU ERL EXL and IE fields of SR

Note: at start of line 9 k1 has processor status at time of exception
see also MIPS HO #3
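
A Python model of the two word shifts shows why this clears SR bits 4..0 (the KSU, ERL, EXL and IE fields). The shifts are modeled as 32-bit operations with a sign-extended 64-bit result; the SR value is an arbitrary example.

```python
def sign_extend32(x):
    """Sign extend a 32-bit result into a 64-bit register value."""
    x &= 0xFFFFFFFF
    return x | 0xFFFFFFFF00000000 if x & 0x80000000 else x

def srl(rt, sa):
    return sign_extend32((rt & 0xFFFFFFFF) >> sa)

def sll(rt, sa):
    return sign_extend32((rt & 0xFFFFFFFF) << sa)

def clear_low5(sr):
    k1 = srl(sr, 5)    # line 9:  low 5 bits fall off the right end
    return sll(k1, 5)  # line 10: zeros come back in from the right

print(hex(clear_low5(0x3400FF13)))  # 0x3400ff00
```

All other SR bits end up back where they started, so only the five low order bits change.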

run-time variable length shifts

dsllv    rd, rt, rs       /* same as dsll and dsrl except rs holds shift amount
dsrlv   rd, rt, rs

L4.2 line 5 page 26

5 1:	dsrlv	t1, k0, t1	/* t1 <-- (k0 >> (t1))

Special Instructions

syscall, eret, nop, move, b, macros and dla

syscall - system call

generates a system call exception
simplified (not already in exception mode and not in bootstrap mode)

EPC  <--  pc        /* address of instruction that caused exception is stored in EPC (CP0 R14)  
CAUSE <-- info on exception cause is saved in CAUSE register (CP0 R13) - some info is:
                       did exception occur in a branch delay slot
                       coprocessor unit #
                       interrupt pending mask at time of exception
                       exception code, e.g., TLB exception, breakpoint, syscall, address error
pc <-- 0xffff ffff 8000 0180  /*  virtual address in CKSEG0 that maps to physical address (PA) 0x180
                             /* CKSEG0 virtual addresses start at 0xffff ffff 8000 0000 and map to PAs starting at 0

so boiled down

    EPC <-- pc
    CAUSE <--cause info
    pc <-- PA 0x180 (via CKSEG0) - the starting address of the general exception handler 

note: there's just one syscall instruction in the MIPS R4x00
         in L4 higher level code:        at <-- syscall #        and general exception handler examines it to determine the type of system call

L4.3 lines 21 and 27-30 page 28

21	bne	AT, zero, 1f	/* if syscall is not ipc goto 1f (ipc is syscall 0)
	  .....
27	dsll	AT, 3			    /* collectively these 3 lines
28	daddu	t0, k0, AT		    /* index into system call jump table
29	ld	t0, K_SYSCALL_JMP_TABLE(t0) /* get address of the system call routine
30	jr	t0			    /* and transfer control to it
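
The table lookup on lines 27-30 amounts to indexing an array of 8 byte entries. The Python sketch below models memory as a dictionary from address to doubleword; the table offset, the kernel_vars base and the handler addresses are all invented for the example.

```python
K_SYSCALL_JMP_TABLE = 0x40  # hypothetical offset of the table inside kernel_vars

def dispatch(memory, k0, at):
    at = at << 3                           # dsll  AT, 3  (8 bytes per table entry)
    t0 = k0 + at                           # daddu t0, k0, AT
    t0 = memory[t0 + K_SYSCALL_JMP_TABLE]  # ld    t0, K_SYSCALL_JMP_TABLE(t0)
    return t0                              # jr    t0  (returned as the new pc)

k0 = 0x1000  # hypothetical base address of kernel_vars
memory = {k0 + K_SYSCALL_JMP_TABLE + 8 * n: 0x5000 + 0x100 * n for n in range(7)}

print(hex(dispatch(memory, k0, 3)))  # 0x5300
```

The range check done by the slti on line 25 is what keeps the index inside the table before this code runs.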

eret - exception return

simplified
    pc <-- EPC
    SR <-- current status word with EXL = 0      /* processor no longer at exception level

tlbwr

the four doublewords of the TLB entry addressed by the RANDOM register (CP0 register 1)
are loaded from CP0 registers  5, 10, 2 and 3 (PageMask, EntryHi, EntryLo0 and EntryLo1) - RANDOM can be 0...47.
TLB G bit <-- EntryLo0[G] AND EntryLo1[G]

        TLB [RANDOM] <-- PageMask || (EntryHi AND NOT PageMask) || EntryLo1 || EntryLo0
        TLB [RANDOM] bit 140 <-- EntryLo0[G] AND EntryLo1[G]

simplified
        TLB[ RANDOM] <-- a tlb entry

nop, move and b

not HW instructions - assembler generates 1 instruction equivalents such as the following

pseudo-instruction        equivalent                         example
nop                       sll r0, r0, 0                      L 4.3 line 7 page 28
move t2, sp               addu t2, sp, r0                    L 4.3 line 14 page 28  - save sp in t2
b 5f                      beq r0, r0, 5f                     L 4.10 line 26 page 36
b send_only_short         beq r0, r0, send_only_short        L 4.11 line 23 page 38

macro instructions
see kernel/macros.h

L4.9 line 42 page 35 
see  MIPS HO#5 for the details

42	tcbtop(t8)    /*  t8 <-- top of current thread's tcb

in macros.h (~ page 3)
    #define    tcbtop(tcb) \
    ori    tcb, sp, TCB0
in kernel.h (~ page 3)
    #define TCB_SIZE 0x800
    #define TCB0 (TCB_SIZE - 1) /* so TCB0 = 0x7ff

syntax is same as C preprocessor
    replace  tcbtop(tcb)   with     "ori    tcb, sp, TCB0"
    a '\' at the end of a line continues the macro definition on the next line
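
What the expanded ori computes can be checked in Python. The sketch assumes that tcbs are TCB_SIZE bytes long and aligned on a TCB_SIZE boundary and that sp points somewhere inside the current thread's tcb; the sp value is invented.

```python
TCB_SIZE = 0x800
TCB0 = TCB_SIZE - 1  # 0x7ff

def tcbtop(sp):
    """ori tcb, sp, TCB0: set the low 11 bits, giving the last byte of the tcb."""
    return sp | TCB0

sp = 0x12345              # hypothetical stack pointer inside a tcb
print(hex(tcbtop(sp)))    # 0x127ff
```

Every sp value inside the same tcb gives the same result, so no load is needed to find the top.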

dla

not a HW instruction - dla (load doubleword address) is an assembler macro that
loads a 64 bit address into a register, i.e., it is the 64 bit counterpart of  la

Load and Branch Delay Slots and Programming

Introduction

MIPS R4600 has a 5 stage pipeline
See MIPS HO#6

load delays

can get a "data hazard" type pipeline stall if data being loaded is needed in the immediately following instruction 
this is called a "load delay" and the instruction position immediately following the load is called a "load delay slot"

good programmers and assemblers try to code so that the instruction in the load delay slot does not need the data that is being loaded 

L 4.1 lines 9-11 page 24

9	lwu	t0, 8(k1)	
10	lwu 	k0, 12(k1)	/* executes in load delay slot of line 9 lwu
11	dmtc0	t0, C0_ENTRYLO0	/* data from lwu on line 9 is available now

branch delays

Suppose an instruction (e.g., I1 in MIPS HO#6) fetched in clock cycle #1 (CC1) is a branch or jump e.g.,
                    beq    s1, zero, label
and suppose there is enough HW to do the branch computation in the decode stage (CC2) as occurs in many MIPS implementations (e.g., R4600)
still the instruction that logically follows the branch, I2 , can't be fetched until CC3
i.e., this instruction is delayed whether the branch is taken or not
this is called a "branch delay" and the possibly unused slot is called a "branch delay slot"

To reduce wasted instruction cycles we could place instructions in the branch delay slot that we would like executed 
        · independent of whether or not the branch is taken 
        · in one of the cases (taken or not taken) and we don't care in the other (not taken or taken)

L 4.3 lines 21-23, 25 page 28
example of instruction useful whether or not branch is taken

21	bne	AT, zero, 1f
22	dsubu	sp, 24	 /* instruction in branch delay slot always used
23	j	k_ipc	 /* i.e., it's used later in this routine and in k_ipc
	 .....
25 1:	slti	t1, AT, MAX_SYSCALL_NUMBER + 1

L 4.1 lines 8-10 page 24
example of instruction in branch delay slot useful if branch not taken and harmless if branch taken

8	bne	t0, k0, 2f	 
9	lwu	t0, 8(k1)/* always executed/used in-line and harmless if branch taken
10	lwu 	k0, 12(k1)	

 

L 4.2 lines 15, 16 page 26
sometimes can't use the branch delay slot.

15	bne	t1, zero, xtlb_refill_fail
16	nop

Note: could have left  nop out - Liedtke inserts  nops to make unfilled branch delay slots obvious and as an aid to cycle counting.


L4.2 lines 11-14 page 26
one last case - user branch prediction

uses a different branch instruction - branch on equal likely

beql    rs, rt, label

    if rs = rt    goto label and also execute instruction in branch delay slot
    else    don't execute instruction in branch delay slot and continue in-line

11	beql	t1, zero, 1b
12	dsrl	t1, t0, 6	/* only executed if branch is taken
13	dsrl	k0, t0, 6	/* normal in-line execution
14	dsllv	t1, t1, k0	/* uses unmodified t1
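
The difference between an ordinary branch and a "likely" branch comes down to when the delay slot instruction is executed. A toy Python model (the function and the step names are invented for illustration):

```python
def run_branch(taken, likely):
    """List what executes for a branch followed by its delay slot."""
    executed = ["branch"]
    # beq/bne: the delay slot always executes
    # beql:    the delay slot executes only if the branch is taken
    if taken or not likely:
        executed.append("delay slot")
    executed.append("target" if taken else "fall-through")
    return executed

print(run_branch(taken=False, likely=False))  # ['branch', 'delay slot', 'fall-through']
print(run_branch(taken=False, likely=True))   # ['branch', 'fall-through']
```

In the listing above this is why line 14 sees an unmodified t1 when the beql on line 11 is not taken: the dsrl in the delay slot on line 12 is skipped.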