FSAM: Sparse Flow-Sensitive Pointer Analysis for Multithreaded Programs
Authors
Yulei Sui    Peng Di    Jingling Xue   
School of Computer Science and Engineering , UNSW Australia
1. Description

FSAM is a new sparse Flow-Sensitive pointer Analysis for Multithreaded C programs (with Pthreads) conducted through a series of thread interference analysis phases.

2. Paper and Artifact

Sparse Flow-Sensitive Pointer Analysis for Multithreaded Programs 2016 International Symposium on Code Generation and Optimization (CGO' 16)

Virtual Image of FSAM Framework: FSAM.ova (4.5GB)

FSAM is built on top of SVF, a static value-flow analysis framework operating on LLVM-IR. FSAM will be integrated into SVF based on the latest release of LLVM (version 3.7.0). The core source code files reside under "include/MTA" and "lib/MTA" and can be downloaded here . Its executable file "mta" can be found at "Release+Asserts/bin/mta" after a successful build.

To help users reproduce the data in the paper, we have also provided a virtual machine image, which contains the source code of FSAM, together with scripts and benchmarks used in our experimental evaluation, as described in the paper.

The OS in the virtual machine image is Ubuntu 12.04. A user account has been created with both its username and password as "pta". To run FSAM, you are advised to allocate at least 16GB memory to the virtual machine. The traditional non-sparse analysis, denoted NonSparse in the paper, is slow and requires large memory, as a lower memory budget may force OS to kill the running process when it is used to analyse some large programs, e.g., bodytrack, x264. Finally, a VirtualBox with version 4.1.12 or newer is required to run the image.

3. License

GPLv3

4. Quick Guidelines

The FSAM project is located in the "pta" directory under the HOME directory of the user named "pta". To run the tests as we did in our evaluation, open a terminal and issue the following commands:

cd /home/pta/pta/ # Go to the FSAM project directory, denoted as $FSAMHOME
. ./setup.sh # Set up environment variables (note that there is a white space between the two dots)
cd cgo-exp # Go to the test directory

To reproduce our results for the 10 benchmarks evaluated, please do the following:

Collect the results of only FSAM in Figure 12:

./table2.sh # Run FSAM analysis only
./collect_table2.sh mta_res # Collect and print out the results to terminal

Collect the results of both FSAM and Non-Sparse analyses in Figure 12:

./table2.sh both # Run FSAM and Non-sparse analyses
./collect_table2.sh mta_res # Collect and print out the results to terminal

Collect the results in Table 2:

./figure12.sh # Run FSAM with each of its phases disabled
./collect_figure12.sh mta_res # Collect and print out the results to terminal

Please note that all the results related to the analysis times and memory usage reported in our paper were obtained on a 2.70GHz Intel Xeon 4-core CPU running Ubuntu Linux with 64GB memory, as described in Section 4.1. On this platform, obtaining the results for FSAM may take about 20 mins and obtaining the results for NonSparse may take more than five hours.

Initially, the users are advised to analyse a few benchmarks using the default configuration file 'list.stat'. To analyse all the benchmarks, please use another configuration file containing all the benchmarks:

cp list.stat.full list.stat

For comparison purposes, the experimental data used in the paper are provided under "pta/cgo-exp/mta_res_cgo/".

To reproduce the results shown in Table 2 and Figure 12 with the provided data, issue the following commands:

./collect_table2.sh mta_res_cgo
./collect_figure12.sh mta_res_cgo

You can also analyse a single program using run_single.sh under "pta/cgo-exp" folder, say, "word_count.orig", as follows:

./run_single.sh word_count.orig

If you would like to know more about how to use our pointer analysis framework or would like to add your own extensions, please refer to this wiki and this doxygen manual.

5. Benchmarks

The benchmarks used in our paper are stored under "cgo-exp" directory. There are 10 programs selected from Phoenix-3.0, PARSEC-3.0 and open-source applications:

word_count, kmeans, radiosity, ferret, bodytrack, raytrace, x264, automount, mt_daapd and httpd-server

The source code of each program is compiled into bit code files using clang and then merged together using LLVM Gold Plugin at link time stage (LTO) to produce a whole-program bc file.

We have also provided a micro-benchmark suite PTABen to validate the correctness of our pointer analysis algorithms for both single- and multi-threaded C programs. Please refer this link to see the source code hosted on GitHub. The layout of the test suite is as follows:

fi_tests : flow-insensitive tests
fs_tests : flow-sensitive tests
cs_tests : context-sensitive tests
path_tests : path-sensitive tests
complex_tests : complex test cases simplified from real programs
mta : pthread tests
scripts : scripts to run the tests

Some micro-benchmarks can be found under "/home/pta/pta/tests/". You can compile them into LLVM bitcode files by running:

clang -c -emit-llvm -g example.c -o example.bc # add "-g" option for debugging purpose
opt -mem2reg example.bc -o example.opt # optional

For each bitcode file (.opt file) generated, you can try to analyse it by using the script "run_single.sh" as before. For large programs with multiple bitcode files, please refer to

for details on how to link several bitcode files into one.

6. Printing Points-To Information

For the five toy programs given in Figure 1 of our paper, we describe below how to inspect the analysis results produced by the flow-insensitive Andersen's analysis (used as the pre-analysis in FSAM) and FSAM's flow-sensitive pointer analysis.

cd /home/pta/pta/
. ./setup
unzip toyprograms.zip
cd toyprograms
./run_all.sh # analyze the five programs one by one and dump the results to the terminal
./run_single.sh ex1.c.opt # analyze a single file

After a program has been analyzed, the points-to information, together with all the statistics, will be printed on the terminal. Note that when the "-print-pts" option is turned on, FSAM will first print the results from Andersen's analysis during its pre-analysis phase and then the results from its own flow-sensitive pointer analysis phase.

The following table shows the results for analyzing the five programs given in Figure 1 of our paper:

C file LLVM bit code file Andersen's points-to FSAM's flow-sensitive points-to PAG (Program Assignment Graph)
ex1.c ex1.c.opt.ll ex1.ander.pts ex1.fs.pts ex1.dot     ex1.pdf
ex2.c ex2.c.opt.ll ex2.ander.pts ex2.fs.pts ex2.dot     ex2.pdf
ex3.c ex3.c.opt.ll ex3.ander.pts ex3.fs.pts ex3.dot     ex3.pdf
ex4.c ex4.c.opt.ll ex4.ander.pts ex4.fs.pts ex4.dot     ex4.pdf
ex5.c ex5.c.opt.ll ex5.ander.pts ex5.fs.pts ex5.dot     ex5.pdf

Note that the symbol names in a PAG are extracted from its LLVM IR rather than its corresponding C program. For example, "y" in ex1.pdf represents the register pointer pointing to its global object (the address-taken variable with NodeID 10). As is standard, only the points-to results for the top-level pointers are printed, because the points-to results for the address-taken variables are not comparable between flow-sensitive and flow-insensitive analyses. (In a flow-sensitive analysis, separate solutions for different program points must be kept. In a flow-insensitive analysis, the single solution for all program points are maintained.)

For ex1.c and ex2.c (given Figure 1(a) and (b), respectively), both FSAM and Andersen's analysis have the same points-to results.

For ex3.c (Figure 1(c)), FSAM is more precise than Andersen's analysis when dereferencing the pointer "*p" at line 18 in ex3.c. The resulting pointer (with NodeID 50 representing the register pointer value "%4 = load i32** %3" in ex3.c.opt.ll) has different points-to sets in ex3.ander.pts and ex3.fs.pts:

NodeID 50 PointsTo: { 10 14 } # the points-to result in ex3.ander.pts
NodeID 50 PointsTo: { 10 } # the points-to result in ex3.fs.pts
Given a NodeID, its corresponding LLVM value is obtained by using PAGNode::getValue() method in file include/MemoryModel/PAGNode.h

For ex4.c (Figure 1(d)), both FSAM and Andersen's have the same points-to results.

For ex5.c (Figure 1(e)), FSAM is more precise than Andersen's analysis when dereferencing the pointer "*p" at line 21 in ex5.c. The resulting pointer (with NodeID 59 representing the register pointer value "%1 = load i32** %0" in "main" function of ex5.c.opt.ll) has different points-to sets in ex5.ander.pts and ex5.fs.pts:

NodeID 59 PointsTo: { 10 14 } # the points-to result in ex5.ander.pts
NodeID 59 PointsTo: { 10 } # the points-to result in ex5.fs.pts
7. Contacts
Feel free to contact : ysui AT cse.unsw.edu.au if you have any question.

This web site is hosted by the Programming Languages and Compiler Group at the University of New South Wales.