UNSW   Faculty of Engineering PRINT VERSIONSITE MAP  
cse | School of Computer Science and Engineering (CRICOS Provider No. 00098G)
    #About CSE     #Undergraduate Study     #Postgraduate Study     #Timetables & Courses     #Research & Publications     #People & Work Units     #Help & Resources     #News & Events     #High School Portal

Last updated 19.05.09

Distributed Systems [COMP9243]

Session 1, 2009 - Assignment 1


Please read the assignment carefully! There are many suggestions contained in the specification that will make your life significantly easier. Furthermore, check back regularly, as we will put clarifications up here if there are misunderstandings or bugs in the spec.

News:
There is a FAQ.

Introduction

Your task is to implement a user-space page-based distributed shared memory (DSM) system that enables a set of processes located on a collection of Linux hosts to share memory pages. Sequential consistency is to be realised using a simple multiple-reader/single-writer scheme implementing a write-invalidate policy1.

Structure

The structure of the DSM system is as follows. Node processes running on separate hosts cooperate by reading and writing memory pages in a shared virtual address space. Each node process can access the whole virtual address space, and each page is accessed at the same virtual address in all node processes (with different pages being accessed at different addresses). Client programs that are run by the node processes are linked to an SM library, which implements the functionality that allows access to the shared memory. The library handles page faults when the client attempts to read pages that are not locally available, or when it tries to write to pages that are not both available and write-enabled. The library also implements synchronisation functions and allows clients to allocate shared memory from the virtual address space.

A centralised allocator process manages communication and tracks the shared memory pages. All communication in the system is between the allocator and the node processes - node processes never communicate directly amongst themselves. The allocator manages the shared memory by keeping track of the location and access rights of each page. It receives and mediates client requests for pages, retrieving and distributing pages as necessary. When it receives a request for a read copy of a page, it simply retrieves a copy of the page and sends it on to the client that requested it. When it receives a request for a write copy of a page however, it must also send write-invalidate messages to all nodes that maintain a copy of that page.

The allocator process and the node processes are all started from a single node using a program called dsm. This program takes as a parameter the name of the client program executable and starts the allocator process and a given number of instances of the client program on separate nodes.

Assignment

The goal of the assignment is to implement the above system. Specifically, you are required to provide an implementation of the SM library (that can be linked to a client program), an implementation of the allocator process, and an implementation of the dsm program (that is used to start up the node processes and then becomes the allocator process). Note that you are not supposed to provide any client programs. To clarify this, it is useful for you to implement client programs in order to test your implementation, however, you are not meant to submit any of these programs. We have our own set of client programs that we will use to test your implementation (by linking them to your SM library and running them together with your allocator).

Submissions will be tested on our vina cluster (hostnames vina00 to vina19). Marks will be awarded on the basis of performance on the vina cluster and no other machines. The assignment must be coded in ANSI C. Please see the assignments page for more details.

dsm

The program that starts the allocator and node processes is called dsm. It is started with command line parameters that specify a client executable to run, a number of instances of the client to start, and a set of hosts on which to start those clients. It launches the specified number of client program instances on the specified set of hosts. After starting the clients, dsm becomes the allocator process and continues to manage the communication and shared pages in the system. The following brief usage specification of dsm shows exactly how the program must behave.

Usage: dsm [OPTION]... EXECUTABLE-FILE NODE-OPTION...

  -H HOSTFILE list of host names
  -h          this usage message
  -l LOGFILE  log each significant allocator action to LOGFILE (read/write
              fault, invalidate request)
  -n N        fork N node processes
  -v          print version information

Starts the allocator, which forks N copies (one copy if -n not given) of
EXECUTABLE-FILE.  The NODE-OPTIONs are passed as arguments to the node
processes.  The hosts on which node processes are started are given in
HOSTFILE, which defaults to `hosts'.  If the file does not exist,
`localhost' is used.

In this specification EXECUTABLE-FILE is the client program, which must be linked against the SM library. The NODE-OPTION arguments are the parameters that are passed to the client program. The dsm program starts N node processes and assigns each node process an identifier between 0 and N-1. The HOSTFILE argument names a text file that specifies the nodes on which the node processes will be started. Each line of this file contains a single hostname. The first node process is started on the first named host, the second process on the second named host, and so on. If there are more node processes than host names, the host assignment process goes back to the the first host name after having started a node process on the last host name in the list. If the -H option is not given, the name of the hosts file defaults to hosts. If the file does not exist or is empty, localhost is used as the only default hostname.

The parameter EXECUTABLE-FILE will usually be an absolute file name, and must refer to the exact same executable on all hosts listed in the hosts file. It is up to the user of the DSM system to ensure that this condition is met. On the vina cluster, NFS ensures that all home directories are uniformly available on all nodes.

SM library API

The file sm.h (revision 1.2) defines the shared memory library API. Allocation of shared memory is done using the sm_malloc function, which allocates a region of the virtual address space to use as shared memory. Note that there is no facility for freeing shared memory (it would require significant work and is largely orthogonal to the purpose of this exercise). Synchronisation between the client processes is done using the sm_barrier function. In order to use a barrier for synchronisation all clients must call the sm_barrier function after which they will wait until each client has executed the barrier. After the last process has executed the barrier all processes continue with their execution simultaneously. Note that all client processes must participate in each barrier, otherwise the program will deadlock. While this synchronisation primitive is rather simple, it is sufficient for small example applications. Clients exchange the addresses of allocated shared memory regions using the sm_bcast function. A full specification for this function can be found in the sm.h file, and the example applications show how it can be used. The bcast function also acts as a barrier, and as with the barrier function, all client processes must participate in each bcast, otherwise the program will deadlock.

Do not under any circumstances modify sm.h. Furthermore sm.h is the only header that you may require client programs to include. In other words do not provide any other header files in your implementation that client programs will have to include. The identifiers of all functions and non-static global variables in the SM library must be prefixed with sm_ to avoid name clashes with client code. (Otherwise, you may lose marks due to non-working test code.)

Communication

Use TCP stream sockets for communication between node processes and the allocator. Do not use fixed port numbers. Since client processes have to honour write-invalidate messages immediately this assignment will not work with a pure client/server architecture. You can use asynchronous socket communication together with signals in order to be notified of incoming messages when they arrive. Let all communication go via the allocator (i.e., node processes never communicate directly). This is very harmful for scalability, but also significantly simplifies the assignment. Synchronisation is greatly simplified when all communication goes via the central allocator, as this orders all messages in the system at the central allocator. Hence, it is easier to implement a multiple-reader/single-writer protocol correctly.

Clean-up and error handling

Make sure that the allocator and node processes clean up properly. This includes wait()ing for terminated child processes and ensuring that node processes automatically terminate if the allocator closes the TCP stream or does not respond anymore. Failure to do so will cost style marks.

You must also ensure that your program cleans up after itself, even if it terminates due to an error condition. This implies the removal of all locally or remotely spawned processes and the removal of all created files (other than the log file). Also, make sure that you test for possible error conditions of system calls and gracefully terminate where fatal errors are diagnosed. Failure to do so will cost style marks.

The assignment requires the extensive use of signals. Make sure that system calls are either restarted when interrupted or that you handle EINTR properly, otherwise, your program may be "flaky". During autotesting, programs will be run multiple times on the same input. Programs must deterministically return the same results on every run (with the exception of indeterminism due to concurrency in client code and ordering of messages in the log file where appropriate).

Implementation Hints

The following are merely hints to help you. You are not required to proceed as indicated here if you prefer an alternative implementation.

  • Use mmap() and mprotect() to allocate the shared memory region and to alter the protection on individual pages in that region, respectively. (Get the page size with getpagesize().)
  • The allocator's core is essentially an event or message loop, where it waits for requests from node processes, which trigger allocation actions and messages to node processes.
  • In the node processes, use sigaction() to register handlers for (1) catching segmentation faults due to protected or not mapped shared memory and (2) catching signals due to asynchronous communication from the allocator (e.g., to process write-invalidate messages). You may want to take into account some further hints regarding signal handlers.
  • You may use getopt() for command line processing.
  • You may want to consider these debugging tips.

The following three examples show how to use a signal handler for SEGV as well as how to use mmap() and mprotect().

  • The program fault_addr.c demonstrates how to catch a SEGV and how to determine the fault address in the signal handler. This requires at least a 2.4 Linux kernel; it will not work with earlier kernels.
  • The program mmap_example.c demonstrates the use of mmap().
  • The program signal_example.c combines the previous two examples and enables access to mmap()ed memory with mprotect() in a SEGV handler.

The following is a breakdown of the assignments into milestone-like steps to give you an idea of what the assignment entails and let you plan your time wisely from the start.

  • basic dsm application
  • asynchronous socket communication
  • sm_barrier() and sm_bcast()
  • page management
  • sm_malloc()
Note that this is only a suggestion, you may break the work down in any way you like. Also note that documentation and testing are not in this list since they should be done throughout the assignment, every step of the way.

Example Applications

Two example applications of the DSM system are provided separately.

Deadline & Submission Procedure

This is a team assignment. You are expected to solve the assignment in groups of two or three. However, if you really don't want to work in a group you may also do it individually.

The submission deadline is Sunday, 19 April 2009 (23:58 AEST).

In addition to your code, you have to submit a design document, in a text file (pure ASCII text, nothing else) called DESIGN, that outlines the design of your implementation and highlights any special features or shortcomings. You will receive style marks for this document.

The total assignment mark will be out of 50, with a maximum of 30 marks for correct code functionality, 10 marks for the code and code style, and 10 for the design documentation.

Please follow the submission guidelines (otherwise, up to 10% of the full mark for the assignment may be lost).

The penalty for late submission of assignments is 6% of the maximum possible mark per day of being late. (Note that this is deducted from the achieved mark, not the maximum achievable mark - see the course outline page for an example.) No assignment will be accepted later than one week after the deadline.

There are harsh penalties for plagiarism (e.g., copying and teamwork outside of official assignment teams), including 0FL for the whole course. Please see the course outline page for details.

Hacker's Edition

Important note: You need special permission to submit a solution to the following challenge instead of the basic assignment. To receive this permission, you will usually be required to produce a working version of the basic assignment. You will then modify this to work without any central resource. Getting into time problems, because you started out on implementing the Hacker's Edition and found it takes more time than you thought is not a valid excuse for a late submission.

The challenge ammounts to implementing a DSM system that does not use a central allocator but is based upon peer-communication among the node processes. Such a system should not use any central resource, so that it scales well.

In essence, you should implement page table management as used in the Ivy system with the "Dynamic Distributed Caching Protocol" (cf. the book by Coulouris, Dollimore & Kindberg). This protocol does not use a central allocator. Instead, the page table on the nodes is extended with a hint that specifies the "probable owner" of a page. In the case of a page fault, a node first contacts the probable owner of that page. If the probable owner no longer owns the page, it relays its own "probable owner" information to the node that received the fault. The faulting node continues its search for the actual owner using this new information. (An alternative would be for the initially suspected owner to directly relay the request to the node that it regards to be the probable owner. This saves communication, but requires some more effort to implement.)

Solutions to this advanced part of the assignment will only be accepted with detailed documentation of the adopted solution, which must include a precise description of the protocol for handling page faults, transferal of page ownership, and write-invalidate messages. Include this in the DESIGN file.

You can earn up to 10 extra marks (20% of the maximum mark for this assignment) in the Hacker's Edition. These bonus marks contribute only to the assignment mark of the course (and not to the exam mark).


1. Under the write-invalidate policy, at most one thread is allowed write access to a given memory object. Hence, before a write may be acted on, the thread must aquire write permission for that memory object, which includes the invalidation of all other copies of the same object. If multiple copies of a memory object can exist for read access, this policy is known as multiple-reader/single-writer.


COMP9243, School of Computer Science and Engineering, University of New South Wales CRICOS Provider Number: 00098G

This page is maintained by cs9243@cse.unsw.edu.au Last modified: Tuesday, 19-May-2009 08:51:50 EST

Top Of Page

 ###
Site maintained by webmistress@cse.unsw.edu.au
Please read the UNSW Copyright & Disclaimer Statement