School of Computer Science & Engineering
University of New South Wales
Advanced Operating Systems
COMP9242 2004/S2

Debugging in L4

When developing an operating system on top of L4 you do not have the luxury of a source-level debugger such as gdb. There are, however, still a number of techniques at your disposal to assist debugging.

The Pistachio Kernel debugger

If your operating system causes certain types of faults you are likely to be dropped into the L4 kernel debugger. You should also be able to break into the kernel debugger at any time with the escape (ESC) key (provided it is compiled into your kernel).

The kernel debugger has many options that can be displayed by hitting the ? key.

--- KD# breakin ---
-- current ASID is 0, CPU 0 --
> help
  BS - back up to previous menu
  ?  - this help message
 ESC - back to previous menu
  a  - architecure specifics
  c  - KDB configuration
  B  - generic bootinfo
 SPC - show current user exception frame
  f  - show floating point registers
  F  - show exception frame
  K  - dump kernel interface page
  s  - search for an exception frame
  g  - continue execution
  d  - dump memory
  D  - dump memory in other space
  6  - Reset system
  q  - show scheduling queue
  t  - show thread control block
  T  - shows thread control block (extended)
  0  - sigma0 interaction
  #  - statistics
  r  - enable/disable/list tracepoints
>

The functions you will most likely find useful are (in no particular order) SPC, F, K, g, d (and D), q, t (and T), r and 6. Take the time to learn to use them; they are very handy and will save you many hours (or even days) of debugging.

Debugging a stop

Often when using L4 you will cause the kernel to enter the debugger for one reason or another. This example takes you through debugging an unhandled exception, eg. the output below:


    :Exception occured: 0x07, EPC=0x0000000000a013f0 VA=0x000000001c800030
STATUS=0x040180e0
--- KD# Unhandled Exception ---

-- current ASID is 2, CPU 0 --

The first thing you want to do is show the thread control block (TCB) of the thread that caused the exception. The t command prints out the current TCB, eg:

> showtcb                                                          
tcb/tid [current]: current
=== TCB: 400000000002a000 === ID: 0000002a00000001 = 000000010000a880/ffffffff80ae6800 === PRIO: 0x64 ========
UIP: 0000000000a013f0   queues: Rswl      wait : 0000000000000000:0000000000000000   space: ffffffff80ab6000  
USP: 0000000000a5ab78   tstate: RUNNING   ready: 400000000002a000:400000000002a000   pdir : 0000000000000000
KSP: 400000000002ad58   sndhd : 0000000000000000  send : 0000000000000000:0000000000000000   pager: 0000002900000001
total quant:            0us, ts length:          10000us, curr ts:        10000us
abs timeout:            0us, rel timeout:            0us                         
sens prio: 100, delay: max=0us, curr=0us                
resources: 0000000000000000 []          
partner: 0000000000000000, saved partner: 0000000000000000, saved state: ABORTED, scheduler: 0000002900000001
utcb_page_area: 0000000100000000 40000

NB: Even though showtcb is shown after the prompt, you have to actually hit 't', not type showtcb.

The first thing you need to do is work out which thread is executing. The ID: field on the first line of output indicates the thread ID of the faulting thread. You will probably find it useful to print the thread ID of each thread you start in your operating system, at least while you are developing the system.
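As a rough sketch of the "print every thread ID" advice, the helpers below format an ID the way the kernel debugger prints it and split it into its two halves. They assume the 64-bit layout where the upper 32 bits are the thread number and the lower 32 bits are the version; the function names are made up for illustration, and in real code you would pass the raw word of your L4 thread ID rather than a bare integer.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout of a 64-bit L4 global thread ID:
 * upper 32 bits = thread number, lower 32 bits = version. */
static uint32_t tid_thread_no(uint64_t tid) { return (uint32_t)(tid >> 32); }
static uint32_t tid_version(uint64_t tid)   { return (uint32_t)(tid & 0xffffffffu); }

/* Format a thread ID as 16 hex digits, matching the ID: field in the TCB dump. */
static void tid_format(uint64_t tid, char *buf, size_t len)
{
    snprintf(buf, len, "%016llx", (unsigned long long)tid);
}

/* Hypothetical helper: call this right after starting a thread. */
static void log_thread_start(const char *name, uint64_t tid)
{
    char buf[17];
    tid_format(tid, buf, sizeof buf);
    printf("started %s: tid=%s (thread no 0x%x, version %u)\n",
           name, buf, (unsigned)tid_thread_no(tid), (unsigned)tid_version(tid));
}
```

With the faulting thread from the dump above, 0000002a00000001 splits into thread number 0x2a and version 1, so a log line printed at thread creation can be matched against the ID: field directly.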

Once you have determined which thread is executing you want to work out the code it is executing. The UIP: (user instruction pointer) field helps you here. In this case the code was executing at address 0xa013f0.

Now that we know the faulting address, we want to find out which line of code contains the fault. The first step is to find which executable file the fault is in. Dite helps us here:

% dite -d rootimage.dite
'rootimage.dite': (2 entries)
No.    Name           Base     Size     Entry    Init   Resource
0   sos              0xa00000 0x5af10  0xa00000   Yes    Yes
1   tty_test         0xa5b000 0x42ee0  0xa5b000   No     No

Dite shows us that in this case the fault is in the sos object code, since 0xa013f0 lies between the base 0xa00000 and 0xa00000 + 0x5af10. (This may be obvious from which thread is faulting, but not always.) We can now run objdump on the appropriate file:
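The range check done by eye on the dite listing can be written down as a small lookup. This is only a sketch: the table is copied by hand from the listing above, and the struct and function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the dite listing: name, load base and size. */
struct image {
    const char *name;
    uint64_t base;
    uint64_t size;
};

/* Values copied by hand from the dite output for rootimage.dite. */
static const struct image images[] = {
    { "sos",      0xa00000, 0x5af10 },
    { "tty_test", 0xa5b000, 0x42ee0 },
};

/* Return the name of the image whose [base, base + size) range
 * contains addr, or NULL if no image does. */
static const char *image_for_address(uint64_t addr)
{
    for (size_t i = 0; i < sizeof images / sizeof images[0]; i++) {
        if (addr >= images[i].base && addr - images[i].base < images[i].size)
            return images[i].name;
    }
    return NULL;
}
```

For the faulting UIP 0xa013f0 this returns "sos". An address that falls in the gap between the two images (for example 0xa5af10) returns NULL, which usually means the thread jumped somewhere it should not have.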

% mips64-elf-objdump -dl sos.app | less

You can now use the searching facility in less to search for the faulting address. In this case I find the following fragment of output:

00a013d8 <zsccgetreg>:
  a013d8:   30a500ff    andi  a1,a1,0xff
  a013dc:   50a00004    beqzl a1,a013f0 <zsccgetreg+0x18>
  a013e0:   dc830000    ld v1,0(a0)
  a013e4:   dc820000    ld v0,0(a0)
  a013e8:   a0450000    sb a1,0(v0)
  a013ec:   dc830000    ld v1,0(a0)
  a013f0:   03e00008    jr ra
  a013f4:   90620000    lbu   v0,0(v1)

In this case I am lucky and it is a small function, so tracking down the bug shouldn't be too hard. In large functions your skill at reading assembler code (that was one of the recommended skills for this course, remember) comes into play. Of course this should encourage you to keep your functions small.

More on `objdump`

Objdump is a very handy utility for working out exactly what is where in an executable so you can work out what exactly is going wrong.

The two standard incantations for objdump are:

% mips64-elf-objdump -dl my_elf.file | less

and

% mips64-elf-objdump -lx my_elf.file | less

The first command (-dl) disassembles the text segment and shows you all the instructions and their addresses. Using this information you can find out things such as:

What does my function compile to?
eg. Am I compiling the right function? Do the instructions make sense?
Where is this bad address coming from?
Your pager gets the IP and the BVA of a pagefault. If the BVA seems to make no sense, you should be able to work out what that address is calculated from.
Why is this instruction crashing?
Using the kernel debugger to dump the exception frame of a thread you can work out exactly what the instruction was trying to do.
Where is this crashing function (eg. memcpy) called from?
Use the kernel debugger to find the return address in the stack frame. Check the instruction stream to make sure the return address isn't stored anywhere (eg. before calling another function). You may need to dump memory in the kernel debugger to poke around in the stack.

NB: In case it's not yet obvious, you will need to get up to speed on your MIPS assembly and not be afraid to get your hands dirty if you want to minimise the time you spend debugging.

The second objdump command (-lx) is useful for when addresses appear inside an object file but outside of the text segment. This is especially useful when debugging ELF loading. The -lx option displays section and symbol information. Further options can be added to dump data segments etc. man objdump is your friend.

Kernel Debugger Tracing

Kernel debug tracing is a handy tool that can be used for many things:

Check the system for liveness
Check for repeated faulting
Check parameters for executed system calls
and more...

Kernel debugger tracing has to be compiled into the kernel: in make menuconfig select Debugger --> Trace Settings --> Enable Tracepoints. (This should already be enabled if you followed the instructions for m0 correctly.) It may also be desirable to set the Debugger --> Enter kernel debugger on startup option if you need to enable tracing early on during boot.

The simplest way to enable tracing from the kernel debugger is to use r to enter the tracing submenu, then E to enable all tracepoints. You can also hit ? in the tracing submenu to list all the options; it can be very handy to trace only certain kernel events. NOTE: If you enable the Fast IPC path you will not see any trace events for IPC messages delivered on the fastpath. There is no tracing for this case (otherwise it wouldn't be so fast!).

A partial trace from the default tty_test application looks something like the following:

 1   Available memory from 7dfff to 29f000 - 2MB
 2   Found: 1 tty_test
 3   Created tid: 40000000001
 4   task:   Hello world, I'm        0x40000000001!
 5   --- KD# breakin ---
 6   -- current ASID is 0, CPU 0 --
 7   > tracepoints
 8   /tracepoints> enableall
 9   /tracepoints> up
10   > go
11   wakeup timeout (curr=ffffffff80074000 wu=4000000000400000) Current time = 1006000
12   Unwind: tcb=0000040000000001 p=0000040000000001 s=WAIT_TIMEOUT (saved: p=0000000000000000 s=ABORTED)
13   task:   count is                1
14   SYS_IPC: current: 0000040000000001, to_tid: 0000000000000000, from_tid: 0000040000000001, to: 0x2bd0
15   wakeup timeout (curr=ffffffff80074000 wu=4000000000400000) Current time = 2006000
16   Unwind: tcb=0000040000000001 p=0000040000000001 s=WAIT_TIMEOUT (saved: p=0000000000000000 s=ABORTED)
17   task:   count is                2
18   SYS_IPC: current: 0000040000000001, to_tid: 0000000000000000, from_tid: 0000040000000001, to: 0x2bd0

Lines 1-4 are the end of the normal bootup for tty_test. At line 5 we break into the kernel debugger with ESC, then enable all tracepoints and continue (lines 7-10). At line 11 we see the thread wake up from its IPC timeout, and at line 12 the kernel performs an unwind on the thread to return it with a timeout code. Line 13 is the printout from the thread.

Line 14 is the IPC call from the thread with a timeout. The calling thread ID is 0000040000000001. The IPC destination TID is 0000000000000000 (no send phase). The from_tid is 0000040000000001 (L4_Myself, ie. wait for a timeout). The timeout specified is 0x2bd0.
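The raw timeout value in the trace can be turned back into microseconds. The sketch below assumes the relative time-period encoding from the L4 X.2 specification (bit 15 clear, a 5-bit exponent e in bits 14..10, a 10-bit mantissa m in bits 9..0, period = m * 2^e microseconds); the function name is invented, and the special encodings for "never" and "zero time" are deliberately not handled.

```c
#include <stdint.h>

/* Decode a relative L4 timeout into microseconds.
 * Assumed encoding: bit 15 = 0, bits 14..10 = exponent e,
 * bits 9..0 = mantissa m, period = m * 2^e microseconds.
 * Special values (eg. a raw value of 0 meaning "never") are not
 * treated specially here; callers should check for them first. */
static uint64_t rel_timeout_us(uint16_t raw)
{
    uint64_t e = (raw >> 10) & 0x1f;
    uint64_t m = raw & 0x3ff;
    return m << e;
}
```

Under that assumption 0x2bd0 decodes to e = 10 and m = 976, ie. 999424us. That is roughly one second, which is consistent with the "Current time" values in the trace advancing by about 1000000 between wakeups.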

This series of events then repeats.

The run queues

q in the kernel debugger shows all the current L4 threads, in order of priority, and whether or not they are runnable.

> showqueue
[255]: (0000002700000001) (0000002900000001)
[100]: (0000002a00000001) (0000040000000001)
[  0]: (0000000700000001)
idle : ffffffff80074000

This dump shows 6 threads: 5 real threads and the idle thread. The two threads at priority 255 are sigma0 and the first thread in the root task. The two threads at priority 100 are the SOS kernel init thread (blocked on IPC forever) and the user-level tty_test thread. The thread at priority 0 is the in-kernel interrupt handler thread, which delivers hardware interrupts to its pager.

Threads with brackets around their thread ID, eg. (0000002a00000001) are currently blocked. Thread IDs without brackets are runnable. In the above example the only runnable thread is the idle thread. Other threads may become runnable due to a timeout, IPC or interrupt.

Entering the kernel debugger

The kernel debugger is a life-line to debug your code. Being able to enter the kernel debugger to examine the state at the right time can make hours of difference to debugging.

L4_KDB_Enter
L4_KDB_Enter() is the standard way to programmatically enter the kernel debugger from C. You provide a string parameter to print when the debugger is entered. This is the function ultimately used by the assert() macro.
Unaligned Exceptions
Unaligned exceptions are an easy (one instruction) method of entering the kernel debugger. This can be desirable if you are in assembly language (eg. CRT stub code) or the L4_KDB_Enter macro is failing because of some bug (rare, but it has been known to happen). On MIPS any access to a type which is not naturally aligned will cause an unaligned exception (eg. accessing an 8-byte word that is not on an 8-byte boundary). The following C and assembly examples show how to do this:
```
        /* in C (volatile stops the compiler from optimising the store away) */
        *((volatile long *)1) = 0;

        /* and assembly */
        ld v0, 1(zero)
```
A handy feature of the unaligned access is that the memory address need not be mapped. A disadvantage is that you cannot continue from the debugger: the instruction is restarted and you take another exception.
Illegal Instructions
Another way to work out (or halt) when code reaches a particular point is to insert an illegal instruction. This can be useful when working in assembly, especially if you want to be certain code doesn't execute (eg. in the CRT after a jump to main). The .long directive as shown below can be used to create an illegal instruction.
```
        .long 0x1010101
      
```
Don't try using .long 0 - it's a NOP on MIPS!
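Of the three methods above, L4_KDB_Enter is the one you will use most from C, and it is convenient to wrap it in a check macro. The following is a minimal sketch, not the course's assert() implementation: KDB_CHECK and kdb_enter_count are made-up names, and the fallback definition of L4_KDB_Enter exists only so the example compiles on a host machine without the L4 headers (on the real system the macro is assumed to come from the L4 headers, eg. l4/kdebug.h).

```c
#include <stdio.h>

/* Host-side stand-in, used only when the real L4 macro is not available.
 * It counts and prints entries instead of stopping in the kernel debugger. */
#ifndef L4_KDB_Enter
static int kdb_enter_count = 0;
#define L4_KDB_Enter(msg)                      \
    do {                                       \
        kdb_enter_count++;                     \
        fprintf(stderr, "--- KD# %s ---\n", msg); \
    } while (0)
#endif

/* Hypothetical convenience macro: drop into the kernel debugger with the
 * failed condition as the message. The message is built from string
 * literals only, since L4_KDB_Enter takes a string to print. */
#define KDB_CHECK(cond)                              \
    do {                                             \
        if (!(cond))                                 \
            L4_KDB_Enter("check failed: " #cond);    \
    } while (0)
```

A call such as KDB_CHECK(ptr != NULL) then stops in the debugger with the condition text on screen, and unlike the unaligned access trick you can continue with g afterwards.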

Examining Thread State

There is a lot of information about a thread that you may want or need to find out. There are a number of ways to find thread state from within the kernel debugger.

showtcb

The t command can be used to dump a thread's TCB. This can be either the current thread (eg. during a debug enter, exception or whatever), or you can specify a thread ID. A sample TCB is shown here, and the interesting fields are explained below. The explanation is only a rough guide and may be incomplete. The exact semantics can be checked in the L4 source.

> showtcb
tcb/tid/name [current]: current
=== TCB: 4000000000400000 === ID: 0000040000000001 = 0000000100000000/ffffffff802fc000 === PRIO: 0x64 ========
UIP: 000000000025c000   queues: Rswl      wait : 0000000000000000:0000000000000000   space: ffffffff802f2000
USP: 0000000000252e58   tstate: RUNNING           ready: 0000040000000001:0000040000000001   pdir : 0000000000000000
KSP: 4000000000400d08   sndhd : 0000000000000000  send : 0000000000000000:0000000000000000   pager: 0000002900000001
total quant:                   0us, ts length  :                  10000us, curr ts:            10000us
abs timeout:                   0us, rel timeout:                      0us
sens prio: 100, delay: max=0us, curr=0us
resources: 0000000000000000 []
partner: 0000000000000000, saved partner: 0000000000000000, saved state: ABORTED, scheduler: 0000002900000001

ID
The ID field is the global thread ID of the thread.
PRIO
This is the priority of the thread. Always handy to check in case you're having a problem with the hard priorities.
UIP and USP
User IP and SP. Handy to find where a thread is and what's on its stack (eg. to poke around in the current frame and call history). Sometimes the UIP can actually be in kernel code depending on the state of the thread, or if it is a kernel thread.
space
This is the address space the thread is in. The value is only visible and useful in the kernel. It allows you to check whether two threads are in the same space, and to dump memory from a particular space (the D command).
tstate
The current state of the thread. This can include values such as RUNNING (ready to run), WAIT (waiting to receive an IPC), POLLING (waiting to deliver an IPC).
pager
The thread ID of the pager of this thread.
partner
The thread ID of the thread's current IPC partner (the thread it is sending to or waiting to receive from).

showtcbext

The T command can be used to dump a thread's TCB and extended information as shown below. The first part is basically the same; the interesting additions are described below.

> showtcbext
tcb/tid/name [current]: 40000000001
=== TCB: 4000000000400000 === ID: 0000040000000001 = 0000000100000000/ffffffff802fc000 === PRIO: 0x64 ========
UIP: 0000000200000808   queues: rsWl      wait : 0000040000000001:0000040000000001   space: ffffffff802f2000
USP: 0000000000252dd8   tstate: WAIT_TO           ready: 0000000000000000:0000000000000000   pdir : 0000000000000000
KSP: 4000000000400df8   sndhd : 0000000000000000  send : 0000000000000000:0000000000000000   pager: 0000002900000001
total quant:                   0us, ts length  :                  10000us, curr ts:            10000us
abs timeout:             1005424us, rel timeout:                 829424us
sens prio: 100, delay: max=0us, curr=0us
resources: 0000000000000000 []
partner: 0000040000000001, saved partner: 0000000000000000, saved state: ABORTED, scheduler: 0000002900000001
 
user handle:       0000000000000000  cop flags:      00                preempt flags:     00 [~~~]
exception handler: 0000000000000000  virtual sender: 0000002a00000001  intended receiver: 0000000000000000
xfer timeouts:     snd (never)
                   rcv (never)
 
mr( 0): 2bd000000000000b 0000000000000001 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
mr( 8): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
mr(16): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
mr(24): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
mr(32): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
mr(40): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
mr(48): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
mr(56): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Message Tag: 11 untyped, 0 typed, label = 2bd000000000, flags = ----
 
br( 0): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
br( 8): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
br(16): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
br(24): 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
br(32): 0000000000000000
Acceptor: 0000000000000000 (s)
  fpage : (NIL-FPAGE)

Message and Buffer registers

The extended information also lists message and buffer register contents. This can be useful to debug the exact contents of IPC messages, especially when used in conjunction with kernel break-in on IPC send events.

Message Tag

The IPC message tag is shown in a nice, easy to read format.
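If you only have the raw MR0 word (eg. from a memory dump), the tag can be decoded by hand. The sketch below assumes the L4 X.2 message tag layout on a 64-bit machine: untyped count in bits 5..0, typed count in bits 11..6, flags in bits 15..12 and the label in the remaining upper bits. The struct and function names are made up for the example.

```c
#include <stdint.h>

/* Decoded fields of an L4 message tag (MR0). */
struct msg_tag {
    unsigned untyped;   /* number of untyped words */
    unsigned typed;     /* number of typed words   */
    unsigned flags;     /* 4 flag bits             */
    uint64_t label;     /* user-defined label      */
};

/* Split MR0 into its fields, assuming the X.2 layout:
 * label | flags (4 bits) | typed (6 bits) | untyped (6 bits). */
static struct msg_tag decode_msg_tag(uint64_t mr0)
{
    struct msg_tag t;
    t.untyped = (unsigned)(mr0 & 0x3f);
    t.typed   = (unsigned)((mr0 >> 6) & 0x3f);
    t.flags   = (unsigned)((mr0 >> 12) & 0xf);
    t.label   = mr0 >> 16;
    return t;
}
```

Applied to the MR0 value in the dump above, 2bd000000000000b, this gives 11 untyped words, 0 typed words, no flags and label 2bd000000000, which agrees with the Message Tag line printed by the debugger.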

Acceptor

The fpage acceptor for received fpages. Handy to debug why pagefaults re-occur, and whether fpages are being over-mapped.

frame

The SPC (space bar) command can be used to dump a thread's register set (ie. its exception frame). Below is an example exception frame.

> frame
== Stack frame: ffffffff80074c20 ==
== STATUS:  40190e2 == CAUSE:               24 == EPC: ffffffff8006362c
at = ffffffffffffff9a, v0 = ffffffff80072dd8, v1 = ffffffff800636e8, sp = ffffffff80074d40
a0 =               1b, a1 =               18, a2 = ffffffff8006cdb0, a3 =                0
t0 =        300000000, t1 = ffffffffbc800032, t2 =               1b, t3 =                2
t4 =                3, t5 = fffffffffffffffe, t6 =                5, t7 = 4000000000400038
s0 = ffffffff80074000, s1 = ffffffffffff00ff, s2 = ffffffff802a0000, s3 = ffffffff802a1ff8
s4 = ffffffff80075400, s5 =  606060606060606, s6 = 1305050606060606, s7 =  e1d161718141513
t8 = ffffffffbc800032, t9 =                4, s8 = ffffffff800564b8, gp =                0
ra = ffffffff80063614, hi =                4, lo =                0

This dump shows the address of the stack frame, status and cause registers, EPC, and the general register file. Other stack frames can be inspected with the F command. You can find stack frames via the TCB and the s command. The STATUS and CAUSE registers are described in the processor manual (see chapter 5). The general registers are displayed with their typically used names. Name to number conversions can be found in sos/include/regdef.h (a hardcopy can be handy).
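As a worked example of reading the CAUSE register, the sketch below pulls out the fields you usually care about. It assumes the standard MIPS layout (exception code in bits 6..2, pending interrupts in bits 15..8, branch delay flag in bit 31) and that the debugger prints CAUSE in hexadecimal like the other fields; check the processor manual for the authoritative definitions. The function names are invented for the example.

```c
#include <stdint.h>

/* Exception code: bits 6..2 of CAUSE. */
static unsigned cause_exc_code(uint32_t cause) { return (cause >> 2) & 0x1f; }

/* Pending interrupt bits IP7..IP0: bits 15..8 of CAUSE. */
static unsigned cause_ip(uint32_t cause) { return (cause >> 8) & 0xff; }

/* BD flag: set when the exception was taken in a branch delay slot,
 * in which case EPC points at the branch, not the faulting instruction. */
static int cause_bd(uint32_t cause) { return (int)((cause >> 31) & 1); }

/* Mnemonics for the common exception codes; anything else is "other". */
static const char *exc_name(unsigned code)
{
    static const char *const names[] = {
        "Int", "Mod", "TLBL", "TLBS", "AdEL", "AdES", "IBE", "DBE",
        "Sys", "Bp",  "RI",   "CpU",  "Ov",   "Tr",
    };
    return code < sizeof names / sizeof names[0] ? names[code] : "other";
}
```

For the frame above, CAUSE = 24 gives exception code 9, which in the standard MIPS encoding is a breakpoint (Bp), with no pending interrupts and the BD flag clear. The BD flag is worth checking whenever EPC points at a jump or branch, since the real culprit is then the instruction in the delay slot.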

The U4600 Hardware Switch

The front panel of the U4600 box has two switches, a power button and another button. The other button is wired to interrupt pin 4 on the CPU. This generates an edge-triggered interrupt when you press this button. By default this interrupt is masked. This switch can be a very handy way to generate an event to enter your own OS debugger or display debugging information. Simply register a thread to handle interrupt 6 (4 + 2 software interrupts) and wait for an IPC.

Rebooting the machine

Often when debugging your code you will want to reboot the machine, but you may be working remotely or otherwise be unable to physically reboot it. The kernel debugger can do this for you. First drop into the kernel debugger by hitting ESC. Then from the prompt hit 6. This should restart the machine and get you back to the PMON prompt.

If for some reason this doesn't work, L4 may have gotten itself into an unexpected state and you will have to manually reboot the machine.

Last modified: Fri Aug 6 11:58:42 EST 2004