Session 1, 2009 - Assignment 1
Please read the assignment carefully! There are many suggestions
contained in the specification that will make your life significantly
easier. Furthermore, check back regularly, as we will put clarifications up
here if there are misunderstandings or bugs in the spec.
News:
There is a FAQ.
Introduction
Your task is to implement a user-space page-based distributed shared
memory (DSM) system that enables a set of processes located on a
collection of Linux hosts to share memory pages. Sequential
consistency is to be realised using a simple
multiple-reader/single-writer scheme implementing a write-invalidate
policy1.
Structure
The structure of the DSM system is as follows.
Node processes
running on separate hosts cooperate by reading and writing memory pages in
a shared virtual address space.
Each node process can access the whole virtual address space, and each page
is accessed at the same virtual address in all node processes (with
different pages being accessed at different addresses).
Client programs that are run by the node processes are linked to
an SM library, which implements the functionality that allows access
to the shared memory.
The library handles page faults when the client attempts to read pages
that are not locally available, or when it tries to write to pages that are
not both available and write-enabled.
The library also implements synchronisation functions and allows
clients to allocate shared memory from the virtual address space.
A centralised allocator process manages communication and tracks
the shared memory pages.
All communication in the system is between the allocator and the
node processes - node processes never communicate directly amongst
themselves.
The allocator manages the shared memory by keeping track of the
location and access rights of each page.
It receives and mediates client requests for pages, retrieving and
distributing pages as necessary.
When it receives a request for a read copy of a page, it simply
retrieves a copy of the page and sends it on to the client that requested
it.
When it receives a request for a write copy of a page however, it
must also send write-invalidate messages to all nodes that maintain a
copy of that page.
The allocator process and the node processes are all started from a
single node using a program called dsm.
This program takes as a parameter the name of the client program
executable and starts the allocator process and a given number of
instances of the client program on separate nodes.
Assignment
The goal of the assignment is to implement the above system.
Specifically, you are required to provide an implementation of the
SM library (that can be linked to a client program), an implementation
of the allocator process, and an implementation of the dsm program
(that is used to start up the node processes and then becomes the
allocator process).
Note that you are not supposed to provide any client programs.
To clarify this, it is useful for you to implement client programs in
order to test your implementation, however, you are not meant to
submit any of these programs.
We have our own set of client programs that we will use to test your
implementation (by linking them to your SM library and running them
together with your allocator).
Submissions will be tested on our vina cluster (hostnames
vina00 to vina19). Marks will be awarded on
the basis of performance on the vina cluster and no other
machines.
The assignment must be coded in ANSI C.
Please see the assignments page for more
details.
dsm
The program that starts the allocator and node processes is called
dsm.
It is started with command line parameters that specify a client
executable to run, a number of instances of the client to start, and a
set of hosts on which to start those clients.
It launches the specified number of client program instances on the
specified set of hosts.
After starting the clients, dsm becomes the allocator
process and
continues to manage the communication and shared pages in the system.
The following brief usage specification of dsm shows
exactly how the program must behave.
Usage: dsm [OPTION]... EXECUTABLE-FILE NODE-OPTION...
-H HOSTFILE list of host names
-h this usage message
-l LOGFILE log each significant allocator action to LOGFILE (read/write
fault, invalidate request)
-n N fork N node processes
-v print version information
Starts the allocator, which forks N copies (one copy if -n not given) of
EXECUTABLE-FILE. The NODE-OPTIONs are passed as arguments to the node
processes. The hosts on which node processes are started are given in
HOSTFILE, which defaults to `hosts'. If the file does not exist,
`localhost' is used.
In this specification EXECUTABLE-FILE is the client program, which
must be linked against the SM library.
The NODE-OPTION arguments are the parameters that are
passed to the client program.
The dsm program starts N node processes and assigns each
node process an identifier between 0 and N-1.
The HOSTFILE argument names a text file that specifies
the nodes on which the node processes will be started.
Each line of this file contains a single hostname.
The first node process is started on the first named host, the
second process on the second named host, and so on.
If there are more node processes than host names, the host
assignment process goes back to the the first host name after having
started a node process on the last host name in the list.
If the -H option is not given, the name of the hosts file defaults
to hosts.
If the file does not exist or is empty, localhost is used as the
only default hostname.
The parameter EXECUTABLE-FILE will usually be an absolute
file name,
and must refer to the exact same executable on all hosts listed in
the hosts file.
It is up to the user of the DSM system to ensure that this condition
is met.
On the vina cluster, NFS ensures that all home directories are
uniformly available on all nodes.
SM library API
The file sm.h (revision 1.2) defines
the shared memory library API.
Allocation of shared memory is done using the sm_malloc
function, which allocates a region of the virtual address space to use
as shared memory.
Note that there is no facility for freeing shared memory (it would
require significant work and is largely orthogonal to the purpose of this
exercise).
Synchronisation between the client processes is done using the
sm_barrier function.
In order to use a barrier for synchronisation all clients
must call the sm_barrier function after which they will wait
until each client has executed the barrier.
After the last process has executed the barrier all processes continue
with their execution simultaneously.
Note that all client processes must participate in each barrier,
otherwise the program will deadlock.
While this synchronisation primitive is rather simple, it is sufficient for
small example applications.
Clients exchange the addresses of allocated shared memory regions using
the sm_bcast function.
A full specification for this function can be found in the
sm.h file, and the example applications show how it can be used.
The bcast function also acts as a barrier, and as with the barrier
function, all client processes must participate in each bcast,
otherwise the program will deadlock.
Do not under any circumstances modify
sm.h.
Furthermore sm.h is the only
header that you may require client programs to include.
In other words do not provide any other header files in your
implementation that client programs will have to include.
The identifiers of all functions and non-static global
variables in
the SM library must be prefixed with sm_ to avoid name
clashes with client code. (Otherwise, you may lose marks due to
non-working test code.)
Communication
Use TCP stream sockets for communication between node processes and
the allocator.
Do not use fixed port numbers.
Since client processes have to honour write-invalidate messages
immediately this assignment will not work with a pure client/server
architecture.
You can use asynchronous socket communication together with signals in
order to be notified of incoming messages when they arrive.
Let all communication go via the allocator (i.e., node processes
never communicate directly). This is very harmful for scalability,
but also significantly simplifies the assignment. Synchronisation is
greatly simplified when all communication goes via the central
allocator, as this orders all messages in the system at the central
allocator. Hence, it is easier to implement a
multiple-reader/single-writer protocol correctly.
Clean-up and error handling
Make sure that the allocator and node processes clean up properly.
This includes wait()ing
for terminated child processes
and ensuring that node processes automatically terminate if the allocator
closes the TCP stream or does not respond anymore.
Failure to do so will cost style marks.
You must also ensure that your program cleans up after itself, even if it
terminates due to an error condition.
This implies the removal of all locally or remotely spawned processes
and the removal of all created files (other than the log file).
Also, make sure that you test for possible error conditions of system
calls and gracefully terminate where fatal errors are diagnosed.
Failure to do so will cost style marks.
The assignment requires the extensive use of signals.
Make sure that system calls are either restarted when interrupted or
that you handle EINTR properly, otherwise, your program
may be "flaky".
During autotesting, programs will be run multiple times on the same
input.
Programs must deterministically return the same results on every run
(with the exception of indeterminism due to concurrency in client
code and ordering of messages in the log file where appropriate).
Implementation Hints
The following are merely hints to help you. You are not required to proceed
as indicated here if you prefer an alternative implementation.
- Use
mmap() and mprotect() to allocate the shared
memory region and to alter the protection on individual pages in that region,
respectively. (Get the page size with getpagesize().)
- The allocator's core is essentially an event or message loop, where it
waits for requests from node processes, which trigger allocation actions and
messages to node processes.
- In the node processes, use
sigaction() to register handlers
for (1) catching segmentation faults due to protected or not mapped
shared memory and (2) catching signals due to asynchronous communication
from the allocator (e.g., to process write-invalidate messages). You may want
to take into account some further hints regarding
signal handlers.
- You may use
getopt() for command line processing.
- You may want to consider these debugging
tips.
The following three examples show how to use a signal handler for SEGV as well
as how to use mmap() and mprotect().
- The program
fault_addr.c
demonstrates how to catch a SEGV and how to determine the fault address in the
signal handler. This requires at least a 2.4 Linux kernel; it will not work with earlier kernels.
- The program
mmap_example.c
demonstrates the use of mmap().
- The program
signal_example.c
combines the previous two examples and enables access to mmap()ed
memory with mprotect() in a SEGV handler.
The following is a breakdown of the assignments into milestone-like
steps to give you an idea of what the assignment entails and let you
plan your time wisely from the start.
- basic
dsm application
- asynchronous socket communication
sm_barrier() and sm_bcast()
- page management
sm_malloc()
Note that this is only a suggestion, you may break the work down in any
way you like. Also note that documentation and testing are not in
this list since they should be done throughout the assignment, every
step of the way.
Example Applications
Two example applications of
the DSM system are provided separately.
Deadline & Submission Procedure
This is a team assignment. You are expected to solve the
assignment in groups of two or three. However, if you really don't want to work in a group you may also do it individually.
The submission deadline is
Sunday, 19 April 2009 (23:58 AEST).
In addition to your code, you have to submit
a design document, in a text file (pure ASCII text, nothing
else) called DESIGN, that outlines the design of your
implementation and highlights any special features or shortcomings. You will
receive style marks for this document.
The total assignment mark will be out of 50, with a maximum of 30 marks for correct code functionality, 10 marks for the code and code style, and 10 for the design documentation.
Please follow the submission guidelines
(otherwise, up to 10% of the full mark for the assignment may
be lost).
The penalty for late submission of assignments is 6% of the maximum
possible mark per day of being late. (Note that this is deducted from the achieved mark, not the maximum achievable mark - see the course outline page for an example.)
No assignment will be accepted later than one
week after the deadline.
There are harsh penalties for plagiarism (e.g., copying and teamwork
outside of official assignment teams),
including 0FL for the whole course. Please see the course outline page
for details.
Hacker's Edition
Important note: You need special permission to
submit a solution to the following challenge instead of the basic assignment.
To receive this permission, you will usually be required to produce a working
version of the basic assignment. You will then modify this to work without any central resource. Getting into time problems, because you started out on
implementing the Hacker's Edition and found it takes more time than you
thought is not a valid excuse for a late submission.
The challenge ammounts to implementing a DSM system that does
not use a central allocator but is based upon
peer-communication among the node processes. Such a system should not use any
central resource, so that it scales well.
In essence, you should implement page table management as used in the Ivy
system with the "Dynamic Distributed Caching Protocol" (cf. the book by
Coulouris, Dollimore & Kindberg). This protocol does not use a central
allocator. Instead, the page table on the nodes is extended with a
hint that specifies the "probable owner" of a page. In the case of a
page fault, a node first contacts the probable owner of that page. If the
probable owner no longer owns the page, it relays its own "probable owner"
information to the node that received the fault. The faulting node continues
its search for the actual owner using this new information. (An alternative
would be for the initially suspected owner to directly relay the request to
the node that it regards to be the probable owner. This saves
communication, but requires some more effort to implement.)
Solutions to this advanced part of the assignment will only be accepted with
detailed documentation of the adopted solution, which must include a precise
description of the protocol for handling page faults, transferal of page
ownership, and write-invalidate messages. Include this in the
DESIGN file.
You can earn up to 10 extra marks (20% of the maximum mark for this
assignment) in the Hacker's Edition. These bonus marks contribute only to the
assignment mark of the course (and not to the exam mark).
1.
Under the write-invalidate policy, at most one thread is allowed write
access to a given memory object. Hence, before a write may be acted on,
the thread must aquire write permission for that memory object, which
includes the invalidation of all other copies of the same object. If
multiple copies of a memory object can exist for read access, this
policy is known as multiple-reader/single-writer.
COMP9243,
School of Computer Science and Engineering,
University of New South Wales
CRICOS Provider Number: 00098G
This page is maintained by
cs9243@cse.unsw.edu.au
Last modified: Tuesday, 19-May-2009 08:51:50 EST
|