[CSE]  Advanced Operating Systems 
 COMP9242 2010/S2 
UNSW

Debugging in L4

When developing an operating system on top of L4 you do not have the luxury of using a source-level debugger such as gdb. There are, however, still a number of techniques at your disposal to assist debugging.

The OKL4 Kernel debugger

If your operating system causes certain types of faults you are likely to be dropped into the L4 kernel debugger. You should also be able to break into the kernel debugger at any time with the escape (ESC) key (provided it is compiled into your kernel).

The kernel debugger has many options that can be displayed by hitting the ? key.

--- KD# breakin ---
> ?
  BS - back up to previous menu
  ?  - this help message
 ESC - back to previous menu
  a  - architecture specifics
  c  - KDB configuration
 SPC - show current user exception frame
  F  - show exception frame
  p  - dump page table
  g  - continue execution
  L  - list all capability lists
  S  - list all address spaces
  d  - dump memory
  P  - dump physical memory
  D  - dump memory in other space
  6  - Reset system
  l  - dump clist contents
  G  - show system sync-point dependency graph
  m  - show created mutexes
  q  - show scheduling queue
  s  - show space info
  t  - show thread control block
  T  - shows thread control block (extended)
  #  - statistics
  b  - tracebuffer menu
  r  - enable/disable/list tracepoints
> 
  

The functions you will most likely find useful are (in no particular order) SPC, F, g, d (and D), q, t (and T), r and 6. Take the time to learn to use them; they are very handy and will save you many hours (or even days) of debugging.

L4 also allows you to assign a debug name to threads. This can be very handy when creating new threads to aid in debugging.

Debugging a stop

Often when using L4 you will cause the kernel to enter the debugger for one reason or another. This example takes you through debugging an unhandled exception, eg. the output below:

:Exception occured: 0x07, EPC=0x0000000000a013f0 VA=0x000000001c800030
STATUS=0x040180e0
--- KD# Unhandled Exception ---

-- current ASID is 2, CPU 0 -- 
    

The first thing you want to do is show the thread control block (TCB) of the thread that caused the exception. The command t will print out the current TCB. eg:

> showtcb
tcb/tid/name [current]: current
===  == TCB: e0020800 == ID: 00104001 = d1f00100/d1f00100 == PRIO: 0x64 ===
UIP: 00530d5c   queues: Rswl      wait : NIL_THRD:NIL_THRD   space: f0028000
USP: 00575668   tstate: RUNNING   ready: 00104001:00104001   pdir : 00000000
KSP: e0020e6c   sndhd : NIL_THRD  send : NIL_THRD:NIL_THRD   pager: roottask
total quant:    0x0 us, ts length  :       0x2710 us, curr ts: 0x2710 us
resources: 00000000 [ek], ARM [PID: 0, vspace: 0, domain: 1]
scheduler: roottask
  partner: ANY_THRD    saved partner: NIL_THRD      saved state: ABORTED
> 
    

NB: Even though showtcb is shown after the prompt you have to actually hit 't', not type showtcb

The next thing you need to do is work out which thread is executing. The ID: field on the first line of output indicates the thread ID of the faulting thread. You will probably find it useful to print the thread ID of each thread you start in your operating system, at least while you are developing the system.

Once you have determined which thread is executing you want to work out the code it is executing. The UIP: (user instruction pointer) field helps you here. In this case the code was executing at address 0xa013f0 (the EPC value reported in the exception message above).

% armv5b-softfloat-linux-objdump -dl build/l4_rootserver/bin/sos  | less

You can now use the searching facility in less to search for the faulting address. In this case I find the following fragment of output:


00a013d8 <zsccgetreg>:
  a013d8:   30a500ff    andi  a1,a1,0xff
  a013dc:   50a00004    beqzl a1,a013f0 <zsccgetreg+0x18>
  a013e0:   dc830000    ld v1,0(a0)
  a013e4:   dc820000    ld v0,0(a0)
  a013e8:   a0450000    sb a1,0(v0)
  a013ec:   dc830000    ld v1,0(a0)
  a013f0:   03e00008    jr ra
  a013f4:   90620000    lbu   v0,0(v1)

In this case I am lucky and it is a small function, so tracking down the bug shouldn't be too hard. In large functions your skill at reading assembler code (that was one of the recommended skills for this course, remember) comes into play. Of course this should encourage you to keep your functions small.
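The search in less can also be scripted. The following is a minimal sketch, not part of the course tools: it assumes the disassembly has been saved as text and that it follows the layout of the fragment above (a "address <name>:" line per function, then one "address: encoding mnemonic" line per instruction). The function name find_instruction is made up for this example.

```python
import re

# "00a013d8 <zsccgetreg>:" starts a function in the disassembly.
SYMBOL_RE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:")
# "  a013f0:   03e00008    jr ra" is one disassembled instruction.
INSN_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s+(.*)$")


def find_instruction(disassembly, address):
    """Return (function, instruction_text) for address, or None if absent."""
    function = None
    for line in disassembly.splitlines():
        symbol = SYMBOL_RE.match(line)
        if symbol:
            function = symbol.group(2)  # remember the enclosing function
            continue
        insn = INSN_RE.match(line)
        if insn and int(insn.group(1), 16) == address:
            # Collapse the column padding so the result is easy to read.
            return function, " ".join(insn.group(2).split())
    return None


# The fragment shown above, pasted verbatim.
SAMPLE = """\
00a013d8 <zsccgetreg>:
  a013d8:   30a500ff    andi  a1,a1,0xff
  a013dc:   50a00004    beqzl a1,a013f0 <zsccgetreg+0x18>
  a013e0:   dc830000    ld v1,0(a0)
  a013e4:   dc820000    ld v0,0(a0)
  a013e8:   a0450000    sb a1,0(v0)
  a013ec:   dc830000    ld v1,0(a0)
  a013f0:   03e00008    jr ra
  a013f4:   90620000    lbu   v0,0(v1)
"""

print(find_instruction(SAMPLE, 0xa013f0))  # ('zsccgetreg', '03e00008 jr ra')
```

In practice you would read the text from a file holding the saved objdump output rather than from an embedded string.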

More on objdump

Objdump is a very handy utility for working out exactly what is where in an executable so you can work out what exactly is going wrong.

The two standard incantations for objdump are:

% armv5b-softfloat-linux-objdump -dl my_elf.file | less

and

% armv5b-softfloat-linux-objdump -lx my_elf.file | less

The first command (-dl) disassembles the text segment and shows you all the instructions and their addresses. Using this information you can find out things such as:

  • What does my function compile to?

    eg. Am I compiling the right function? Do the instructions make sense?

  • Where is this bad address coming from?

    Your pager gets the IP and the BVA of a pagefault. If the BVA seems to make no sense, you should be able to work out what that address is calculated from.

  • Why is this instruction crashing?

    Using the kernel debugger to dump the exception frame of a thread you can work out exactly what the instruction was trying to do.

  • Where is this crashing function (eg. memcpy) called from?

    Use the kernel debugger to find the return address in the stack frame. Check the instruction stream to make sure the return address isn't stored anywhere (eg. before calling another function). You may need to dump memory in the kernel debugger to poke around in the stack.

NB: In case it's not yet obvious, you will need to get up to speed on your ARM assembly and not be afraid to get your hands dirty if you want to minimise the time you spend debugging.

The second objdump command (-lx) is useful for when addresses appear inside an object file but outside of the text segment. This is especially useful when debugging ELF loading. The -lx option displays section and symbol information. Further options can be added to dump data segments etc. man objdump is your friend.
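Mapping a return address or faulting address back to a function name can be automated from the symbol information. The sketch below is only an illustration: it assumes the usual objdump symbol line layout (address, flags, section, size, name), and the helper names load_functions and resolve are invented here. In the sample table, the size 0x20 for zsccgetreg is inferred from the eight 4-byte instructions in the disassembly shown earlier; the other two symbols are hypothetical.

```python
import bisect


def load_functions(symbol_table):
    """Collect (start, size, name) for every function symbol, sorted by start."""
    functions = []
    for line in symbol_table.splitlines():
        tokens = line.split()
        # Function symbols carry an "F" among the flag columns (assumed layout).
        if len(tokens) < 5 or "F" not in tokens[1:-3]:
            continue
        try:
            start = int(tokens[0], 16)
            size = int(tokens[-2], 16)
        except ValueError:
            continue  # not a symbol line after all
        functions.append((start, size, tokens[-1]))
    functions.sort()
    return functions


def resolve(functions, address):
    """Return (name, offset) of the function containing address, else None."""
    starts = [start for start, _, _ in functions]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, size, name = functions[index]
    if address >= start + size:
        return None  # address falls in a gap after this function
    return name, address - start


# Illustrative symbol lines; only zsccgetreg comes from the example above.
SAMPLE_SYMBOLS = """\
00a013d8 g     F .text  00000020 zsccgetreg
00a013f8 g     F .text  00000010 example_next
00b00000 g     O .data  00000004 example_counter
"""

functions = load_functions(SAMPLE_SYMBOLS)
name, offset = resolve(functions, 0xa013f0)
print(f"{name}+{offset:#x}")  # zsccgetreg+0x18
```

The result matches the <zsccgetreg+0x18> annotation that objdump itself printed for the branch target in the disassembly.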

Kernel Debugger Tracing

Kernel debug tracing is a handy tool that can be used for many things:

  • Check the system for liveness
  • Check for repeated faulting
  • Check parameters for executed system calls
  • and more...

Kernel debugger configuration is achieved by editing your top-level SConstruct file; specifically, you can add debug options to the line that instantiates the "pistachio" application. For instance, to enable tracing, add ENABLE_TRACEBUFFER = True as an option, i.e. l4kernel = kernel_env.Package("pistachio", ENABLE_TRACEBUFFER = True). It may also be desirable to set the enter_kdb = True option if you need to enable tracing early during boot. (You could also set these options in the command line arguments to the build system in the Makefile.)

The simplest way to enable tracing from the kernel debugger is to use r to enter the tracing submenu, then E to enable all tracepoints. You can also press ? in the tracing submenu to list all the options. It can be very handy to trace only certain kernel events. NOTE: You will not see any trace events for IPC messages delivered on the fastpath. There is no tracing for this case (otherwise it wouldn't be so fast!).

The run queues

q in the kernel debugger shows all the current L4 threads, in order of priority, and whether or not they are runnable.

> showqueue
[255]: (roottask)
[200]: (0010c001)
[100]: <00104001> (00108001) (00110001)
idle : idle

This dump shows 6 threads: 5 real threads and the idle thread. Four of the real threads are spawned by the roottask, one is the syscall and init loops, another is used by the timer.c service and the last thread is spawned by the IXP400's Operating System Access Layer (OSAL), libs/ixp_osal. The remaining thread is the user-level tty_test thread.

Threads with parentheses around their thread ID, eg. (00108001) are currently blocked. Thread IDs without parentheses are runnable. A thread ID with angle brackets is currently running. In the above example the only runnable real thread is 00104001, and it is of course also the currently running thread. Other threads may become runnable due to a timeout, IPC or interrupt.
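If you capture the console output, the bracket conventions described above are easy to decode mechanically. This is a rough sketch under the assumption that every priority line looks like the ones in the dump above; parse_queue is a made-up helper, and the idle line is deliberately skipped.

```python
import re

# "[100]: <00104001> (00108001)" is one priority level and its threads.
LEVEL_RE = re.compile(r"^\[(\d+)\]:\s*(.*)$")
# <id> = running, (id) = blocked, bare id = runnable.
TOKEN_RE = re.compile(r"<([^>]+)>|\(([^)]+)\)|(\S+)")


def parse_queue(dump):
    """Return a list of (priority, thread, state) tuples from a q dump."""
    threads = []
    for line in dump.splitlines():
        level = LEVEL_RE.match(line.strip())
        if not level:
            continue  # prompt echo, blank lines, or the "idle : idle" line
        priority = int(level.group(1))
        for running, blocked, runnable in TOKEN_RE.findall(level.group(2)):
            if running:
                threads.append((priority, running, "running"))
            elif blocked:
                threads.append((priority, blocked, "blocked"))
            else:
                threads.append((priority, runnable, "runnable"))
    return threads


# The dump shown above, pasted verbatim.
SAMPLE = """\
> showqueue
[255]: (roottask)
[200]: (0010c001)
[100]: <00104001> (00108001) (00110001)
idle : idle
"""

for priority, thread, state in parse_queue(SAMPLE):
    print(priority, thread, state)
```

Filtering the result for anything other than "blocked" is a quick liveness check: if only the idle thread can run, every real thread is waiting on something.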

Entering the kernel debugger

The kernel debugger is a life-line to debug your code. Being able to enter the kernel debugger to examine the state at the right time can make hours of difference to debugging.

  • L4_KDB_Enter

    L4_KDB_Enter() is the standard way to programmatically enter the kernel debugger from C. You provide a string parameter to print when the debugger is entered. This is the function ultimately used by the assert() macro.

Examining Thread State

There is a lot of information about a thread that you may want or need to find out. There are a number of ways to find thread state from within the kernel debugger.

showtcb

The t command can be used to dump a thread's TCB. This can be either the current thread (eg. during a debug enter, exception or whatever), or you can specify a thread ID. A sample TCB is shown here, and the interesting fields are explained below. The explanation is only a rough guide and may be incomplete. The exact semantics can be checked in the L4 source.

showtcb
tcb/tid/name [current]: roottask
=== roottask == TCB: e0020000 == ID: 00100001 = d1f00000/d1f00000 == PRIO: 0xff ===
UIP: 005302e8   queues: rswl      wait : NIL_THRD:NIL_THRD   space: f0028000
USP: 0056d3e4   tstate: WAIT_FE   ready: roottask:roottask   pdir : 00000000
KSP: e00206e4   sndhd : NIL_THRD  send : NIL_THRD:NIL_THRD   pager: NIL_THRD
total quant:    0x0 us, ts length  :       0x2710 us, curr ts: 0x2710 us
resources: 00000000 [ek], ARM [PID: 0, vspace: 0, domain: 1]
scheduler: roottask  exception handler: roottask
  partner: ANY_THRD    saved partner: NIL_THRD      saved state: ABORTED
  
  • ID

    The ID field is the global thread ID of the thread.

  • PRIO

    This is the priority of the thread. Always handy to check in case you're having a problem with the hard priorities.

  • UIP and USP

    User IP and SP. Handy to find where a thread is and what's on its stack (eg. to poke around in the current frame and call history). Sometimes the UIP can actually be in kernel code depending on the state of the thread, or if it is a kernel thread.

  • space

    This is the address space the thread is in. Using this value allows you to check whether two threads are in the same space, and to dump memory from that space.

  • tstate

    The current state of the thread. This can include values such as RUNNING (ready to run), WAIT (waiting to receive an IPC), POLLING (waiting to deliver an IPC).

  • pager

    The thread ID of the pager of this thread.

  • partner

    The thread ID of the partner of this thread.
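When comparing several threads it can help to pull these fields out of captured dumps rather than reading them by eye. The sketch below assumes the TCB layout shown in the sample above; parse_tcb and the dictionary keys are names chosen for this example, not anything provided by L4.

```python
import re

# Fields printed as hexadecimal numbers in the dump.
HEX_FIELDS = {
    "id": r"\bID:\s*([0-9a-fA-F]+)",
    "prio": r"\bPRIO:\s*0x([0-9a-fA-F]+)",
    "uip": r"\bUIP:\s*([0-9a-fA-F]+)",
    "usp": r"\bUSP:\s*([0-9a-fA-F]+)",
    "space": r"\bspace:\s*([0-9a-fA-F]+)",
}
# Fields printed as names (a state, a thread name or a thread ID).
NAME_FIELDS = {
    "tstate": r"\btstate:\s*(\S+)",
    "pager": r"\bpager:\s*(\S+)",
    "partner": r"\bpartner:\s*(\S+)",
}


def parse_tcb(dump):
    """Extract the fields discussed above from a pasted t dump."""
    tcb = {}
    for name, pattern in HEX_FIELDS.items():
        match = re.search(pattern, dump)
        if match:
            tcb[name] = int(match.group(1), 16)
    for name, pattern in NAME_FIELDS.items():
        match = re.search(pattern, dump)
        if match:
            tcb[name] = match.group(1)
    return tcb


# The sample TCB shown above, pasted verbatim.
SAMPLE = """\
=== roottask == TCB: e0020000 == ID: 00100001 = d1f00000/d1f00000 == PRIO: 0xff ===
UIP: 005302e8   queues: rswl      wait : NIL_THRD:NIL_THRD   space: f0028000
USP: 0056d3e4   tstate: WAIT_FE   ready: roottask:roottask   pdir : 00000000
KSP: e00206e4   sndhd : NIL_THRD  send : NIL_THRD:NIL_THRD   pager: NIL_THRD
resources: 00000000 [ek], ARM [PID: 0, vspace: 0, domain: 1]
scheduler: roottask  exception handler: roottask
  partner: ANY_THRD    saved partner: NIL_THRD      saved state: ABORTED
"""

tcb = parse_tcb(SAMPLE)
print(f"thread {tcb['id']:08x} prio {tcb['prio']} at {tcb['uip']:#x}: {tcb['tstate']}")
```

The word-boundary anchors keep ID: from matching inside PID: and space: from matching inside vspace:, both of which appear on the resources line.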

showtcbext

The T command can be used to dump a thread's TCB and extended information as shown below. The first part is basically the same; the interesting additions are described below.

showtcbext
tcb/tid/name [current]: roottask
=== roottask == TCB: e0020000 == ID: 00100001 = d1f00000/d1f00000 == PRIO: 0xff ===
UIP: 005302e8   queues: rswl      wait : NIL_THRD:NIL_THRD   space: f0028000
USP: 0056d3e4   tstate: WAIT_FE   ready: roottask:roottask   pdir : 00000000
KSP: e00206e4   sndhd : NIL_THRD  send : NIL_THRD:NIL_THRD   pager: NIL_THRD
total quant:    0x0 us, ts length  :       0x2710 us, curr ts: 0x2710 us
resources: 00000000 [ek], ARM [PID: 0, vspace: 0, domain: 1]
scheduler: roottask
  partner: ANY_THRD    saved partner: NIL_THRD      saved state: ABORTED

user handle:       00000000  cop flags:      00        preempt flags:     00 [~]
exception handler: NIL_THRD  intended receiver: NIL_THRD
incomming notify bits: 00000000  notify mask:         00000000
last preempted_ip:     00000000  preempt_callback_ip: 00000000

mr( 0): 000741c0 00701cc7 00700210 00000000 00000000 00000000 00000000 00000000
mr( 8): 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
mr(16): 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
mr(24): 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Message Tag: 0 untyped, label = 7, flags = -X--

Acceptor: 00000000 (a)  Error code: 0

  • Message and Buffer registers

    The extended information also lists message and buffer register contents. This can be useful to debug the exact contents of IPC messages, especially when used in conjunction with kernel break-in on IPC send events.

  • Message Tag

    The IPC message tag is shown in a nice, easy to read format.

frame

The SPC (space bar) command can be used to dump a thread's register set (eg. exception frame). Below is an example exception frame.

frame
== Stack frame: e0020eb4 == 
cpsr =        0, pc = e0020800, sp  =        0, lr  = f00159c0
r0  = f001b318, r1  = f0002d5c, r2  =        0, r3  =        0, r4  =        0
r5  =        0, r6  =        0, r7  =        0, r8  =        0, r9  =        4
r10 = f0002be0, r11 = e0020800, r12 =       ac

This dump shows the address of the stack frame, the status register (cpsr), pc, sp, lr, and the general registers r0 to r12. Other stack frames can be inspected with the F command. You can find stack frames via the TCB and the s command.

Rebooting the machine

Often when debugging your code you will want to reboot the machine; however, you may be working remotely, or otherwise not be able to physically reboot the machine. The kernel debugger can do this for you. First drop into the kernel debugger by hitting ESC. Then from the prompt hit 6. This should restart the machine, and after you type ^C you will get back to the RedBoot prompt.

If for some reason this doesn't work, L4 may have gotten itself into an unexpected state and you will have to manually reboot the machine.


    Last modified: Wed Aug 06 11:00:11 EST 2008