Thesis Topic Details

Topic ID:
Automatic Checkpointing for Cloud Management
Ingo Weber
Research Area:
Cloud Computing, Distributed Systems, Artificial Intelligence
Associated Staff
Hiroshi Wada
Topic Details
R & D
Group Suitable:
COMP9322 or COMP9423
With the advent of cloud computing and related developments, more and more capabilities are becoming available as APIs. For instance, instead of ordering a new server, waiting a few weeks, and installing it in one's network, a few API calls now suffice to obtain a new server in a public cloud. While such powerful APIs can provide enormous gains in productivity and time-to-solution, they also open up new possibilities for significant mishaps: for example, if an administrator inadvertently deletes a virtual disk, all of the data it contains is irrecoverably lost. In essence, many administrators operate without a safety net.

In our work, we investigate the undoability of changes. On the one hand, we can check which operations can be undone, and under which circumstances. On the other hand, if an undo is required, we can find a sequence of operations that brings a system back to a previously defined, desirable state: a checkpoint. Both techniques make use of Artificial Intelligence (AI) planning, and both have been published (see the previous publications listed below).
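To make the two ideas concrete, the following is a minimal sketch, not the method from the publications. It assumes a toy model in which a system state is a set of facts and each operation has preconditions, delete effects, and add effects. A naive breadth-first search stands in for a real AI planner, and all operation and fact names are hypothetical.

```python
from collections import deque

# Hypothetical operations: name -> (preconditions, delete effects, add effects).
OPERATIONS = {
    "start vm": ({"vm stopped"}, {"vm stopped"}, {"vm running"}),
    "stop vm": ({"vm running"}, {"vm running"}, {"vm stopped"}),
    # Deleting a disk destroys its data; no operation can bring the data back.
    "delete disk": ({"disk with data"}, {"disk with data"}, set()),
    "create disk": (set(), set(), {"empty disk"}),
}


def apply_operation(state, operation):
    """Return the successor state, or None if the preconditions do not hold."""
    preconditions, delete_effects, add_effects = operation
    if not preconditions <= state:
        return None
    return frozenset((state - delete_effects) | add_effects)


def plan_undo(current, checkpoint, operations=OPERATIONS):
    """Search for a shortest operation sequence from current to checkpoint.

    Returns a list of operation names, or None if the checkpoint state
    cannot be reached, i.e. the change is not undoable in this toy model.
    """
    start, goal = frozenset(current), frozenset(checkpoint)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for name, operation in operations.items():
            successor = apply_operation(state, operation)
            if successor is not None and successor not in seen:
                seen.add(successor)
                queue.append((successor, path + [name]))
    return None


if __name__ == "__main__":
    checkpoint = {"vm stopped", "disk with data"}
    # The VM was started after the checkpoint: one operation undoes it.
    print(plan_undo({"vm running", "disk with data"}, checkpoint))
    # The disk was deleted after the checkpoint: no plan exists.
    print(plan_undo({"vm stopped"}, checkpoint))
```

In this sketch, a result of None corresponds to the "cannot be undone" case, while a returned list corresponds to the sequence of operations that restores the checkpoint.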

One problem with our undo approach is that it relies on the user setting a checkpoint before doing anything critical. If the user fails to do so, undo is not possible later on. The research question is: when should checkpoints be set, and how can we build or modify systems so that they set checkpoints automatically? This question applies to manual cloud operation as well as to automatic cloud management tools: both can benefit from the approach, but the techniques for automatic checkpointing will be slightly different.
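One simple policy, shown here only as an illustrative sketch, is to set a checkpoint automatically before every operation that is known to be destructive. The class, the operation names, and the in-memory dictionary standing in for cloud resources are all assumptions made for this example. In a real cloud, restoring a checkpoint would require executing a plan of API calls rather than reassigning a snapshot.

```python
import copy

# Hypothetical names of operations that are treated as destructive.
DESTRUCTIVE = {"delete_disk"}


class CheckpointingClient:
    """Toy API client that checkpoints before destructive operations."""

    def __init__(self, state=None):
        # In-memory stand-in for cloud resources: disk name -> contents.
        self.state = dict(state or {})
        # Stack of (operation name, snapshot taken before that operation).
        self.checkpoints = []

    def call(self, operation, **kwargs):
        """Run an operation, setting a checkpoint first if it is destructive."""
        if operation in DESTRUCTIVE:
            self.checkpoints.append((operation, copy.deepcopy(self.state)))
        getattr(self, "_" + operation)(**kwargs)

    def rollback(self):
        """Restore the most recent checkpoint and return the operation it guarded."""
        operation, snapshot = self.checkpoints.pop()
        self.state = snapshot
        return operation

    def _create_disk(self, name, data=""):
        self.state[name] = data

    def _delete_disk(self, name):
        del self.state[name]


if __name__ == "__main__":
    client = CheckpointingClient()
    client.call("create_disk", name="disk1", data="important")
    client.call("delete_disk", name="disk1")  # checkpoint is set automatically
    client.rollback()
    print(client.state)
```

A fixed list of destructive operations is the crudest possible answer to "when should checkpoints be set"; the research question above asks for something more principled.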

In the context of this work, there are numerous open topics for future research; see the other topics in the database supervised by me.

Previous publications:
- Ingo Weber, Hiroshi Wada, Alan Fekete, Anna Liu and Len Bass. Supporting undoability in systems operations. USENIX Large Installation System Administration Conference (LISA), 2013.
- Ingo Weber, Hiroshi Wada, Alan Fekete, Anna Liu and Len Bass. Automatic undo for cloud management via AI planning. Workshop on Hot Topics in System Dependability, 2012.
Students will work closely with senior researchers at National ICT Australia (NICTA) in a very friendly, diverse team environment. Suitable for students interested in software design, architecture, and practical industry development methods.
Students will be exposed to the latest cloud technologies and to advanced methods from business process management and process mining.
Past Student Reports
No Reports Available. Contact the supervisor for more information.
