When developing an operating system on top of L4 you do not
have the luxury of a source-level debugger such as gdb.
However, you do have the L4 kernel debugger, which can be quite
useful if you know how to use it, and is certainly better than
plain printf-style debugging.
If your operating system causes certain types of faults, you are likely to be dropped into the L4 kernel debugger. Once you are in the kernel debugger, the thing you most likely want to do is find out which line of code caused the error. Example output from an exception is shown below.
:Exception occured: 0x07, EPC=0x0000000000a013f0 VA=0x000000001c800030
STATUS=0x040180e0
--- KD# Unhandled Exception ---
-- current ASID is 2, CPU 0 --
The first thing you want to do is show the thread control block
(TCB) of the thread that caused the exception. The command 't'
prints the current TCB, for example:
> showtcb
tcb/tid [current]: current
=== TCB: 400000000002a000 === ID: 0000002a00000001 = 000000010000a880/ffffffff80ae6800 === PRIO: 0x64 ========
UIP: 0000000000a013f0 queues: Rswl wait : 0000000000000000:0000000000000000 space: ffffffff80ab6000
USP: 0000000000a5ab78 tstate: RUNNING ready: 400000000002a000:400000000002a000 pdir : 0000000000000000
KSP: 400000000002ad58 sndhd : 0000000000000000 send : 0000000000000000:0000000000000000 pager: 0000002900000001
total quant: 0us, ts length: 10000us, curr ts: 10000us
abs timeout: 0us, rel timeout: 0us
sens prio: 100, delay: max=0us, curr=0us
resources: 0000000000000000 []
partner: 0000000000000000, saved partner: 0000000000000000, saved state: ABORTED, scheduler: 0000002900000001
utcb_page_area: 0000000100000000 40000
NB: Even though showtcb is shown after the prompt, you actually
have to hit 't', not type showtcb.
Next, work out which thread is executing. The ID: field on the
first line of output gives the thread ID of the faulting thread
(0000002a00000001 in this example). You will probably find it
useful to print the thread ID of each thread you start in your
operating system, at least while you are developing the system.
Once you have determined which thread is executing, you want to
work out which code it is executing. The UIP: (user instruction
pointer) field helps you here. In this case the code was
executing at address 0xa013f0.
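If you capture the kernel debugger's output (for example from the serial console), the two fields discussed above can be pulled out mechanically. This is a minimal sketch, assuming the labels ID: and UIP: appear exactly as in the dump above; tcb_fields is a hypothetical helper, not part of L4 or its tools.

```python
import re

# First two lines of the 't' (showtcb) output from the example above.
TCB_DUMP = """\
=== TCB: 400000000002a000 === ID: 0000002a00000001 = 000000010000a880/ffffffff80ae6800 === PRIO: 0x64 ========
UIP: 0000000000a013f0 queues: Rswl wait : 0000000000000000:0000000000000000 space: ffffffff80ab6000
"""


def tcb_fields(dump):
    """Return the thread ID and user instruction pointer from a TCB dump.

    Assumes the 'ID:' and 'UIP:' labels used in the example output.
    """
    tid = re.search(r"ID:\s*([0-9a-f]+)", dump)
    uip = re.search(r"UIP:\s*([0-9a-f]+)", dump)
    if tid is None or uip is None:
        raise ValueError("input does not look like a TCB dump")
    return {"id": int(tid.group(1), 16), "uip": int(uip.group(1), 16)}


fields = tcb_fields(TCB_DUMP)
print(f"thread {fields['id']:#x} faulted at {fields['uip']:#x}")
```

Run on the example dump, this prints "thread 0x2a00000001 faulted at 0xa013f0".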
Now that we know the faulting address, we want to find out which line of code contains the fault. The first step is to find which executable file the fault is in. Dite helps us here:
% dite -d rootimage.dite
'rootimage.dite': (2 entries)
No.  Name      Base      Size     Entry     Init  Resource
0    sos       0xa00000  0x5af10  0xa00000  Yes   Yes
1    tty_test  0xa5b000  0x42ee0  0xa5b000  No    No
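Each dite entry occupies the addresses from Base up to (but not including) Base + Size, so finding the image that holds a faulting address is a simple range check. This sketch does that check over the listing above. It assumes the column order No., Name, Base, Size printed by dite -d in this example; parse_dite and image_for are hypothetical helper names.

```python
# The 'dite -d rootimage.dite' listing from the example above.
DITE_LISTING = """\
'rootimage.dite': (2 entries)
No. Name Base Size Entry Init Resource
0 sos 0xa00000 0x5af10 0xa00000 Yes Yes
1 tty_test 0xa5b000 0x42ee0 0xa5b000 No No
"""


def parse_dite(listing):
    """Return (name, base, size) for each numbered row of a dite listing."""
    entries = []
    for line in listing.splitlines():
        cols = line.split()
        # Data rows start with the entry number; skip the title and header.
        if len(cols) >= 4 and cols[0].isdigit():
            entries.append((cols[1], int(cols[2], 16), int(cols[3], 16)))
    return entries


def image_for(addr, entries):
    """Return the name of the image whose [base, base + size) range holds addr."""
    for name, base, size in entries:
        if base <= addr < base + size:
            return name
    return None  # addr is not inside any image in the listing


print(image_for(0xA013F0, parse_dite(DITE_LISTING)))  # prints: sos
```

Addresses in the gap between images (for example 0xa5af10 up to 0xa5b000) return None, which is itself a useful hint that the fault is not in any loaded image.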
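In this case dite shows us that the fault is in the sos object
code, since 0xa013f0 lies between the sos base address 0xa00000
and its end at 0xa00000 + 0x5af10 = 0xa5af10. (This may be
obvious from which thread is faulting, but not always.) We can
now run objdump on the appropriate file: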
% mips64-elf-objdump -dl sos.app | less
You can now use the search facility in less (type / followed by
the address, e.g. /a013f0) to find the faulting address. In this
case I find the following fragment of output:
00a013d8 <zsccgetreg>:
a013d8: 30a500ff andi a1,a1,0xff
a013dc: 50a00004 beqzl a1,a013f0 <zsccgetreg+0x18>
a013e0: dc830000 ld v1,0(a0)
a013e4: dc820000 ld v0,0(a0)
a013e8: a0450000 sb a1,0(v0)
a013ec: dc830000 ld v1,0(a0)
a013f0: 03e00008 jr ra
a013f4: 90620000 lbu v0,0(v1)
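In this case I am lucky: the fault is in a small function, so tracking down the bug shouldn't be too hard. Note that EPC (and UIP) point at the jr ra instruction, which cannot itself access memory. On MIPS, when an exception occurs in a branch delay slot, EPC holds the address of the branch, so the instruction that actually faulted is most likely the lbu v0,0(v1) in the delay slot at a013f4. In large functions, your skill at reading assembly code (one of the recommended skills for this course, remember) comes into play. Of course, this should also encourage you to keep your functions small.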
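The search in less can also be scripted: walk the objdump listing, remember the most recent function header (and, when objdump -l emitted one, the most recent file:line marker), and stop at the matching address. This is a minimal sketch, assuming the listing format shown above; locate is a hypothetical helper. If the binary was built with debug information, the addr2line tool from the same binutils (here presumably mips64-elf-addr2line -f -e sos.app 0xa013f0) is a ready-made alternative.

```python
import re

# The fragment of 'mips64-elf-objdump -dl sos.app' output shown above.
OBJDUMP = """\
00a013d8 <zsccgetreg>:
a013d8: 30a500ff andi a1,a1,0xff
a013dc: 50a00004 beqzl a1,a013f0 <zsccgetreg+0x18>
a013e0: dc830000 ld v1,0(a0)
a013e4: dc820000 ld v0,0(a0)
a013e8: a0450000 sb a1,0(v0)
a013ec: dc830000 ld v1,0(a0)
a013f0: 03e00008 jr ra
a013f4: 90620000 lbu v0,0(v1)
"""

SYMBOL = re.compile(r"^[0-9a-f]+ <([^>]+)>:$")              # "00a013d8 <zsccgetreg>:"
INSN = re.compile(r"^\s*([0-9a-f]+):\s+[0-9a-f]+\s+(.*)$")  # "a013f0: 03e00008 jr ra"
SOURCE = re.compile(r"^(\S+:\d+)")                           # "file.c:42" lines from -l


def locate(listing, addr):
    """Return (function, source line or None, instruction) for addr, or None."""
    func = source = None
    for line in listing.splitlines():
        if m := SYMBOL.match(line):
            func, source = m.group(1), None  # new function: forget old source line
        elif m := INSN.match(line):
            if int(m.group(1), 16) == addr:
                return func, source, m.group(2).strip()
        elif m := SOURCE.match(line):
            source = m.group(1)
    return None


print(locate(OBJDUMP, 0xA013F0))  # ('zsccgetreg', None, 'jr ra')
```

The fragment above carries no file:line markers, so the source field is None here. When EPC lands on a branch or jump, looking up EPC + 4 as well (locate(OBJDUMP, 0xA013F4)) shows the delay-slot instruction, lbu v0,0(v1).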
Often when debugging your code you will want to reboot the
machine, but you may be working remotely or otherwise be unable
to physically reboot it. The kernel debugger can do this for you.
First drop into the kernel debugger by hitting ESC. Then, from
the prompt, hit 6. This should restart the machine and get you
back to the PMON prompt.
If for some reason this doesn't work, L4 may have gotten itself into an unexpected state, and you will have to reboot the machine manually.