Thesis Topic Details

Topic ID:
Compiler Techniques for Improving Cache Performance for Multi-Core Architectures
Jingling Xue
Research Area:
Programming Languages
Associated Staff
Hui Wu
Topic Details
R & D
Group Suitable:
Knowledge of Computer Architecture and Programming Languages
As multicore processors emerge in mainstream systems, last-level cache management presents a challenge. Some existing multicore processors use a shared last-level (on-chip) L2 cache to maximise on-chip capacity and minimise off-chip misses (e.g., IBM Power5), while others use private L2 caches to reduce cache access latency. In future multicores (with ever more cores), the last-level cache is likely to be distributed across cores. For example, each core in TILE64 has its own L1 and L2 caches, and the L3 cache is an aggregate of all the L2 caches.

Some recently proposed hybrid cache designs use cache line replication and migration to balance latency and capacity, including cooperative caching, victim migration, victim replication and adaptive selective replication.

Some researchers have also been exploring the use of OS-level page allocation to manage shared and distributed caches. Static and dynamic page coloring (and recoloring) have been shown to improve the performance of multiprogrammed and multithreaded workloads.
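
As a rough illustration of the idea behind page coloring, the sketch below computes the "color" of a physical page, i.e., the group of cache sets that the page maps to. It assumes a physically indexed, set-associative cache with power-of-two sizes; the function names and the example cache parameters are illustrative only and are not taken from any particular operating system.

```python
def num_colors(cache_bytes, associativity, page_bytes):
    """Number of distinct page colors for a physically indexed cache.

    One cache way covers cache_bytes / associativity bytes; each page-sized
    slice of a way corresponds to one color.
    """
    way_bytes = cache_bytes // associativity
    return max(1, way_bytes // page_bytes)


def page_color(phys_addr, cache_bytes, associativity, page_bytes):
    """Color of the physical page containing phys_addr.

    Pages with the same color compete for the same cache sets; pages with
    different colors never evict each other.
    """
    page_number = phys_addr // page_bytes
    return page_number % num_colors(cache_bytes, associativity, page_bytes)


# Hypothetical example: 2 MiB, 8-way shared cache with 4 KiB pages.
CACHE, WAYS, PAGE = 2 * 1024 * 1024, 8, 4096
colors = num_colors(CACHE, WAYS, PAGE)
color_of_page = page_color(0x41000, CACHE, WAYS, PAGE)
```

Under these assumptions, an allocator that hands out pages of different colors to different programs keeps them from conflicting in the shared cache.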

However, compiler work on improving the performance of shared and distributed caches is still in its infancy. One early compiler technique for SMP processors reduces cache conflicts by using information available in the compiler (e.g., data access patterns) to direct the operating system's page mapping policy. Due to the significant differences between SMP and multicore (i.e., CMP) processors, new compiler techniques for CMP processors are needed. In addition, traditional loop and data transformations for locality enhancement on uniprocessors cannot be applied directly to shared and distributed caches.
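
For readers unfamiliar with the traditional transformations mentioned above, loop tiling is a common example. The sketch below shows only the loop structure of a tiled matrix multiplication; the function name and the tile parameter are illustrative, and pure Python lists do not reproduce real cache behaviour. On a uniprocessor the tile size would typically be chosen to fit that processor's cache. One plausible reason this does not carry over directly is that, with a shared cache, the capacity available to each thread depends on what the other cores are doing.

```python
def matmul_tiled(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using loop tiling.

    The three outer loops walk over tile-sized blocks; the three inner
    loops work within one block so that its data is reused while it is
    (ideally) still in cache.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += a_ik * b[k][j]
    return c
```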

The aim of this ongoing project is to develop compiler techniques for improving the performance of shared and distributed caches in current and future multicore processors. Such compiler techniques (e.g., compiler-directed cache partitioning) can be used together with OS-based page allocation techniques to reduce both off-chip misses and on-chip access latency (for distributed caches).
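
To make the combination of compiler-directed partitioning and OS page allocation more concrete, the sketch below splits a cache's page colors among threads in proportion to a per-thread demand figure. This is a minimal sketch under stated assumptions, not the project's method: the demand values stand in for whatever estimate a compiler might derive (e.g., working-set size), and the function name and thread labels are hypothetical.

```python
def partition_colors(total_colors, demands):
    """Split colors 0..total_colors-1 into disjoint contiguous ranges.

    demands maps a thread name to a positive weight. Each thread receives
    a share of the colors roughly proportional to its weight; the ranges
    never overlap and together cover every color.
    """
    total_demand = sum(demands.values())
    partitions = {}
    start = 0
    cumulative = 0
    for name, demand in demands.items():
        cumulative += demand
        # Cumulative boundaries avoid losing colors to rounding.
        end = (total_colors * cumulative) // total_demand
        partitions[name] = range(start, end)
        start = end
    return partitions


# Hypothetical example: 64 colors, thread t0 estimated to need 3x as much as t1.
plan = partition_colors(64, {"t0": 3, "t1": 1})
```

An OS allocator could then serve each thread's page requests only from physical pages whose color falls in that thread's range, so that the threads do not evict each other's data from the shared cache.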

Past Student Reports
No Reports Available. Contact the supervisor for more information.
