AOS Design Guide

Introduction

The AOS project is a very large and complex system. Often the system is built in an ad-hoc manner with little or no design beforehand, frequently with the attitude that it will be cleaned up later.

This approach typically produces a system with only partial functionality, and involves a lot of writing and re-writing of the same code. It is also usually the case that "I'll do it later" never eventuates.

This page lists some of the design choices involved when building SOS, to help you understand the difficulties before they occur. You are free to borrow or ignore as much or as little of this information as you like. This is only a guide, and any SOS project you want to produce is fine.

Data Structures and Subsystems

While designing SOS you will need to consider a number of different subsystems and data structures, and how they inter-relate.

Some systems you will need to consider include:

A task/thread startup protocol
I/O
- File system access
- Console access
- A VFS-like layer
[Virtual] Memory
- Page tables
- Swapping
- Frame management
- Heap management
Thread and Process management
Namespace(s)
Drivers
Scheduling, run queues and blocking system calls

Some systems you probably don't need to implement, but still might like to consider are:

Resource management
Access control / multi-user issues
Networking
A page-cache

Inter-process communication

In Milestone 0 you are required to pass data from an application to the OS in the form of text data for printing. This is an example of passing parameters and data between L4 address spaces.

There are many ways to pass data between applications, not limited to the following:

UNIX-like copyin/copyout
Many short-IPC messages
Long IPC strings (NB: These may not be supported by your kernel)
fpage mapping
Memory objects

Which model you use is up to you; however, some mechanisms suit some systems better than others (eg. copyin/copyout will not work well with a multi-server OS). A poorly designed IPC primitive is a common problem and leads to an inefficient system. You must ensure it provides fast communication for small requests (eg. simple system calls) and is also efficient when dealing with large amounts of data (eg. file access). You must also ensure your IPC mechanism cannot be used for denial-of-service (DoS) attacks.
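As a rough illustration of the "many short-IPC messages" option, the sketch below splits a buffer into fixed-size word payloads, much as a client stub might before sending each one, and reassembles it on the server side. It is plain C and makes no L4 calls. The names MSG_WORDS, struct short_msg, pack_chunk, unpack_chunk and msgs_needed are hypothetical, and the payload size is an assumption rather than the number of message registers your kernel actually provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed payload size per short message (hypothetical, not an L4 constant). */
#define MSG_WORDS 4
#define MSG_BYTES (MSG_WORDS * sizeof(uintptr_t))

/* One short message: a byte count plus a few payload words. */
struct short_msg {
    size_t    len;                /* valid payload bytes, at most MSG_BYTES */
    uintptr_t words[MSG_WORDS];
};

/* Sender side: pack up to MSG_BYTES of buf[off..total) into msg.
 * Returns the number of bytes packed. */
size_t pack_chunk(struct short_msg *msg, const char *buf,
                  size_t total, size_t off)
{
    size_t n;

    assert(off <= total);
    n = total - off;
    if (n > MSG_BYTES)
        n = MSG_BYTES;
    memset(msg->words, 0, sizeof msg->words);
    memcpy(msg->words, buf + off, n);
    msg->len = n;
    return n;
}

/* Receiver side: append the payload to out at out_off. The length comes
 * from an untrusted client, so it is checked against both the message
 * size and the space left in out. Returns bytes copied, 0 if rejected. */
size_t unpack_chunk(const struct short_msg *msg, char *out,
                    size_t out_cap, size_t out_off)
{
    if (msg->len > MSG_BYTES || out_off > out_cap ||
        msg->len > out_cap - out_off)
        return 0;
    memcpy(out + out_off, msg->words, msg->len);
    return msg->len;
}

/* Number of short messages needed to move total bytes. */
size_t msgs_needed(size_t total)
{
    return (total + MSG_BYTES - 1) / MSG_BYTES;
}
```

The message count grows linearly with the data size, which is why this approach tends to suit small system calls better than bulk file access. The length checks on the receiving side are the kind of validation needed to stop a misbehaving client from overrunning a server buffer.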

You are free to use existing systems for interface and stub generation, eg. the Magpie IDL generator.

Debugging

Debugging your SOS project can often be a very difficult task. For more information on debugging in L4 have a look at this page.

L4 Server Thread Management

Regardless of what kind of OS personality you choose to write, you will need to deal with multiple asynchronous requests to your L4 server(s). There are a number of ways this can be achieved. Once again, this list is neither exhaustive nor truly distinct.

Single threaded servers
Have a single server thread and use continuations to avoid blocking and denial of service.
Single thread per 'application'
At the cost of extra resources you can remove some of the problems of blocking threads.
Single thread per 'session'
Having more than one L4 server can create a headache for multi-threaded solutions. Using session semantics can help to solve this.
Worker threads per sub-system
Dedicating a thread to each subsystem can help place an upper-bound on the number of threads that need creating (and deleting).
Stack-switching
A light-weight alternative to multi-threading is to have a separate stack per client/application. This can be simpler than multiple threads to manage and easier than explicit continuations. Debugging, however, can be ... interesting.
Worker thread pool
Using a multi-threaded solution can lead to wasted resources. By creating a pool of worker threads that can grow and shrink, resource usage can be controlled.

All the above techniques are valid for use in SOS. Each has its own advantages that make it easier/nicer/faster/better/whatever. Each also has its own problems with implementation and testing. Make sure your solution is deadlock free, free of denial-of-service problems, thread safe, efficient and easy to debug.
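To make the single-threaded option a little more concrete, here is a minimal sketch of explicit continuations. A request that cannot finish immediately records what to run next in a bounded table and returns to the event loop. When the awaited event arrives, the saved function is called and the reply is produced. Everything here (struct request, read_start, read_finish, complete, MAX_PENDING) is a hypothetical name for illustration; no L4 calls are made, and a real server would receive requests and send replies via IPC.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8   /* assumed bound on outstanding requests */

struct request;
typedef void (*cont_fn)(struct request *req, int result);

/* State saved for a request that is waiting on a slow operation. */
struct request {
    int     client;   /* who to reply to once the request finishes */
    int     in_use;   /* slot is occupied */
    cont_fn resume;   /* continuation: what to run when the event arrives */
    int     reply;    /* value to send back to the client */
};

static struct request pending[MAX_PENDING];

/* Second half of a "read": runs once the device has produced a result. */
static void read_finish(struct request *req, int result)
{
    req->reply = result;
}

/* First half of a "read": start the slow operation, save the
 * continuation and return to the event loop instead of blocking.
 * Returns the slot used, or -1 if the table is full. Refusing when
 * full keeps one client from exhausting the server's memory. */
int read_start(int client)
{
    int i;

    for (i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].client = client;
            pending[i].in_use = 1;
            pending[i].resume = read_finish;
            pending[i].reply  = 0;
            return i;
        }
    }
    return -1;
}

/* Called by the event loop when the device signals completion for
 * slot. Runs the saved continuation, stores the reply in *reply_out
 * and returns the client to reply to, or -1 for a stale slot. */
int complete(int slot, int result, int *reply_out)
{
    struct request *req;

    if (slot < 0 || slot >= MAX_PENDING || !pending[slot].in_use)
        return -1;
    req = &pending[slot];
    req->resume(req, result);
    *reply_out = req->reply;
    req->in_use = 0;          /* slot may now be reused */
    return req->client;
}
```

Requests can complete in any order, and the server thread never waits on a single client. The cost is that each blocking handler has to be split by hand into "before" and "after" halves, which is the bookkeeping that stack-switching or multiple threads would otherwise hide.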

Advanced Work

In AOS you are encouraged to do extra work that you are specifically interested in. This can be in the form of additions to the SOS project, or entirely different OS related projects. Some suggested SOS enhancements for extra marks are listed below. Feel free to come up with your own ideas. If you would like to do one of these (or your own suggestion), talk to us about what is to be done and how marks will be awarded.

Disclaimer: None of these advanced features is easy or trivial. You should avoid making your project dependent on any of these as there is a good chance you will not complete it. You should try to make sure you are at least one to two milestones ahead of the marking schedule before you attempt one of these. The motto is "make it work, then make it fancy".

Device Driver
Write a more complex device driver than the OSTS timer. Examples include USB controllers, USB disk drives, etc. Marks can be awarded based on a partially functional driver.
Protocol Implementation
Implement a common network protocol within your SOS project. For example, port an ssh daemon to SOS (or implement your own). The difficulty of the protocol or port will dictate the bonus marks for it.
Virtualise SOS
OS virtualisation is a hot topic right now. Demonstrate two copies of SOS running on L4, each running user-land applications. Each copy of the OS must be segregated from the other, and share device drivers securely.
Distributed SOS
Add distributed shared memory and cross-node process creation support to SOS.
Orthogonal Persistence
Filesystems are passé. Implement orthogonal persistence in SOS so applications restart back to the point when the system was shut down. Fault tolerance is not necessary.
Multi-user SOS
The standard SOS project has memory protection, but no access control. Implement ACL- or capability-based access control and provide resource accounting. Multiple concurrent users can be provided with a trivial telnet server.

Alternate System Models

There are a number of different general system structures you can use when building an OS personality on top of L4. A few major categories are listed here. Be aware that this is in no way an exhaustive list, and you are encouraged to come up with or design your own. These categories are also very rough, and in no way well defined.

Monolithic System
Most operating systems, eg. Windows, Linux and any BSD (Net, Free, Open, ...) use a monolithic system model, with most, if not all, of the system implemented in a monolithic kernel.

By placing all system components in the same address space, communication is done trivially with shared memory and function calls. Of course, this means that all code in that address space must be trusted, and a fault in any of it can bring down the whole system.

This image shows an example of how a typical monolithic (UNIX) system is laid out.

This model is the most commonly used for SOS as it is the easiest to implement and debug. Within this model there is plenty of scope for creativity and ample work for two people. Other models are presented here so you can incorporate ideas and abstractions. Students with a keen caffeine dependence are welcome to try them for a system model. There are very few systems to source examples from and many have problems which are as yet unsolved. If you choose a system model other than monolithic, you have been warned.
Single Address Space (SAS)
Single-address-space systems are designed to make sharing between applications easier. By placing all applications and the kernel in the same address space (translation), pointers can be passed around while maintaining their meaning. To preserve security the system needs to implement protection as an orthogonal abstraction.

While a SAS is not necessarily an entirely different model (it could be monolithic or multi-server), it does offer some interesting design decisions. Instead of the typical SOS filesystem you could create a persistent system. A SAS also helps in making a distributed shared memory system.

Example single address space operating systems include Mungi, Nemesis and Sombrero.
Multi-server
A multi-server OS is the holy grail of microkernel systems. By decomposing the system into components (eg. VFS, file systems, memory management, naming), each in its own address space, the system can be constructed in a more flexible way. The layout of such a system is depicted in this image. Multi-server OSes have a strong tendency to move more work into application libraries rather than the OS modules, while still preserving security.

An example Multi-Server OS is SawMill.
Microkernel
L4 is a flexible and fast microkernel; however, there are other designs for microkernel (and micro-kernel-like) systems including L3, EROS, Mach, Exokernel, Topsy and K42.

You may choose to pick an existing microkernel (or come up with your own abstractions) and implement this on top of L4. You will also need to implement elements of SOS on this system to demonstrate its usefulness.
OS161 on L4
Many students are probably familiar with OS161 used in the introductory operating systems course. OS161 is a specific example of a (simple) monolithic kernel written for the MIPS32 architecture.

One option for the project is to port OS161 to L4 by adding a new 'L4' architecture. Because OS161 is already quite complete, it will also be necessary to add extensions and features to bring the workload of this project in line with writing SOS from scratch.

Last modified: 10 Sep 2007.