I'm not interested in making devices look like user-level.
They aren't, they shouldn't, and microkernels are just stupid.
Linus Torvalds
Is Linus right?
- First-generation μ-kernel systems exhibited poor performance when compared
to monolithic UNIX implementations.
- particularly Mach, the best-known example
- Reasons are investigated by [Chen & Bershad 93]:
- instrumented user and system code to collect execution traces
- run on DECstation 5000/200 (25MHz R3000)
- run under Ultrix and Mach with Unix server
- traces fed to memory system simulator
- analyse MCPI (memory cycles per instruction)
- baseline MCPI (i.e. excluding idle loops)
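The MCPI metric used in this analysis can be sketched as a small calculation: total memory-system stall cycles divided by instructions executed. The event names, penalty values and counts below are assumptions chosen for illustration; they are not figures from [Chen & Bershad 93].

```python
# Assumed stall penalties in cycles per event; illustrative values only,
# not measurements from the DECstation 5000/200.
PENALTIES = {"icache_miss": 15, "dcache_miss": 15, "write_stall": 5}

# Hypothetical event counts, as a memory system simulator might report them.
EVENTS = {"icache_miss": 4000, "dcache_miss": 2000, "write_stall": 1000}


def mcpi(instructions, events, penalties=PENALTIES):
    """Return memory stall cycles divided by instructions executed."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    stall_cycles = sum(count * penalties[kind] for kind, count in events.items())
    return stall_cycles / instructions


def mcpi_breakdown(instructions, events, penalties=PENALTIES):
    """Return the MCPI contribution of each event kind."""
    return {kind: count * penalties[kind] / instructions
            for kind, count in events.items()}


if __name__ == "__main__":
    print(mcpi(100_000, EVENTS))
    print(mcpi_breakdown(100_000, EVENTS))
```

Splitting the total by event kind is what makes statements such as "MCPIs dominated by system i-cache misses" checkable against a trace.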
Observations:
- Mach memory penalty (i.e. cache misses or write stalls) higher
- Mach VM system executes more instructions than Ultrix
(but has more functionality).
Claim:
- Degraded performance is an (intrinsic?) result of OS structure.
- IPC cost (known to be high in Mach) is not a major
factor [Ber92].
1. OS has less instruction and data locality than user code.
- System code has higher cache and TLB miss rates.
- Particularly bad for instructions.
2. System execution is more dependent on instruction cache behaviour than is user execution.
- MCPIs dominated by system i-cache misses.
Note: most benchmarks were small, i.e. user code fits in cache.
3. Competition between user and system code is not a problem.
- Few conflicts between user and system caching.
- TLB misses are not a relevant factor
Note: the hardware used has direct-mapped physical caches.
==> Split system/user caches wouldn't help.
- Only examine system cache misses.
Figure: MCPI for system-only references, using the R3000 direct-mapped cache. Shaded: system cache misses removed by associativity. Reductions due to associativity were obtained by running the system on a simulator with a two-way associative cache of the same size.
4. Self-interference is a problem in system instruction reference streams.
- High internal conflicts in system code.
- System would benefit from higher cache associativity.
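The effect of associativity on self-interference can be illustrated with a toy cache simulator. This is a minimal sketch, not the simulator used in the study: the 64 KB cache size, 16-byte lines, LRU replacement and the two-address trace are all assumptions chosen to make the conflict visible.

```python
def count_misses(addresses, cache_bytes, line_bytes, ways):
    """Count misses for a set-associative cache with LRU replacement.

    ways=1 models a direct-mapped cache.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    # Each set holds up to `ways` tags, ordered least recently used first.
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        tags = sets[line % num_sets]
        tag = line // num_sets
        if tag in tags:
            tags.remove(tag)          # hit: refresh its LRU position below
        else:
            misses += 1
            if len(tags) == ways:
                tags.pop(0)           # evict the least recently used tag
        tags.append(tag)
    return misses


# Hypothetical trace: two code locations exactly one cache size apart,
# executed alternately (e.g. a loop calling a distant routine).
CACHE_BYTES = 64 * 1024   # assumed cache size
LINE_BYTES = 16           # assumed line size
TRACE = [0x00000, 0x10000] * 100

direct_mapped = count_misses(TRACE, CACHE_BYTES, LINE_BYTES, ways=1)
two_way = count_misses(TRACE, CACHE_BYTES, LINE_BYTES, ways=2)

if __name__ == "__main__":
    print(direct_mapped, two_way)
```

In the direct-mapped case both addresses map to the same line and evict each other on every access; with two ways of the same total size they coexist in one set, so only the two cold misses remain.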
5. System block memory operations are responsible for a large percentage of memory system reference costs.
- Particularly true for I/O system calls.
6. Write buffers are less effective for system references.
- A write buffer allows a limited number of asynchronous writes on cache misses.
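How a finite write buffer stops hiding write latency can be sketched with a simple timing model. The buffer depth, the drain time per write and the store patterns are assumptions for illustration; the model only shows that back-to-back stores fill the buffer and stall the CPU, while the same number of well-spaced stores do not.

```python
from collections import deque


def write_stall_cycles(gaps, depth, drain_cycles):
    """Return CPU stall cycles caused by a full write buffer.

    gaps[i] is the number of cycles between the previous store (or time 0)
    and store i when nothing stalls. Memory retires one buffered write
    every `drain_cycles` cycles, in order.
    """
    pending = deque()   # completion times of buffered writes, oldest first
    now = 0
    last_done = 0       # time at which memory finishes the newest write
    stalls = 0
    for gap in gaps:
        now += gap
        # Drop writes that memory has already retired.
        while pending and pending[0] <= now:
            pending.popleft()
        if len(pending) == depth:
            # Buffer full: wait until the oldest write retires.
            wait = pending.popleft() - now
            stalls += wait
            now += wait
        last_done = max(now, last_done) + drain_cycles
        pending.append(last_done)
    return stalls


# Hypothetical patterns: eight stores spread out versus eight in a burst,
# as in a block copy.
spread = write_stall_cycles([10] * 8, depth=4, drain_cycles=5)
burst = write_stall_cycles([1] * 8, depth=4, drain_cycles=5)

if __name__ == "__main__":
    print(spread, burst)
```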
7. Virtual-to-physical mapping strategy can have a significant impact on cache performance.
- Unfortunate mapping may increase conflict misses.
- "Random" mappings (Mach) are to be avoided.
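Why the mapping matters for a physically indexed cache can be sketched by computing the cache "colour" of each physical frame, i.e. the region of the cache that frame maps to. The cache size, page size and frame numbers are assumptions; the function names are hypothetical.

```python
from collections import Counter


def page_colour(frame, cache_bytes=64 * 1024, page_bytes=4096, ways=1):
    """Return the cache region (colour) a physical frame maps to."""
    colours = cache_bytes // (page_bytes * ways)
    return frame % colours


def conflicting_pages(frames, **cache):
    """Count frames that share a colour with at least one other frame."""
    counts = Counter(page_colour(f, **cache) for f in frames)
    return sum(n for n in counts.values() if n > 1)


# Hypothetical frame assignments for eight virtually contiguous pages.
contiguous = list(range(32, 40))               # consecutive frames
scattered = [32, 48, 64, 17, 33, 5, 6, 7]      # arbitrary placement

if __name__ == "__main__":
    print(conflicting_pages(contiguous), conflicting_pages(scattered))
```

With consecutive frames the eight pages occupy eight distinct colours and cannot evict each other; with the arbitrary placement five of the eight pages compete for two colours, so a working set that would fit in the cache can still suffer conflict misses.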
- System call costs are (inherently?) high.
- Typically hundreds of cycles, 900 for Mach/i486.
- Context (address-space) switching costs are (inherently?) high.
- Getting worse (in terms of cycles) with increasing CPU/memory
speed ratios [Ous90].
- IPC (involving system calls and context switches) is inherently expensive.
- μ-kernels heavily depend on IPC
- IPC is expensive
- Is the μ-kernel idea flawed?
- Should some code never leave the kernel?
- Do we have to buy flexibility with performance?
Gernot Heiser
2002-08-28