Performance Analysis

COMP2521 20T2 ♢ Performance Analysis [0/19]
❖ Performance

Why do we care about performance?

Good performance less hardware, happy users.

Bad performance more hardware, unhappy users.

Generally:   performance = execution time

Other measures: memory/disk space, network traffic, disk i/o.

Execution time can be measured in two ways:

COMP2521 20T2 ♢ Performance Analysis [1/19]
❖ ... Performance


In the (distant) past, performance was a significant problem

Unfortunately, there is usually a trade-off between ...
Knuth: "Premature optimization is the root of all evil"
COMP2521 20T2 ♢ Performance Analysis [2/19]
❖ Development Strategy

A pragmatic approach to efficiency:

Points to note:
Pike: "A fast program that gets the wrong answer saves no time."
COMP2521 20T2 ♢ Performance Analysis [3/19]
❖ ... Development Strategy


Strategy for developing efficient programs:

  1. Design the program well
  2. Implement the program well   (choose good algorithms)
  3. Test the program well
  4. Only after you're sure it's working, measure performance
  5. If (and only if) performance is inadequate, find the "hot spots"
  6. Tune the code to fix these
  7. Repeat measure-analyse-tune cycle until performance ok
COMP2521 20T2 ♢ Performance Analysis [4/19]
❖ Performance Analysis

Complexity/estimates give some idea of performance in advance.

Often, however ..

Best way to evaluate performance: measure program execution.

Performance analysis can be:

COMP2521 20T2 ♢ Performance Analysis [5/19]
❖ ... Performance Analysis

Coarse-grained performance analysis

The Unix time command provides a suitable mechanism

$ time ./myProg < LargeInput > /dev/null
real  0m5.064s
user  0m4.113s
sys   0m0.802s

COMP2521 20T2 ♢ Performance Analysis [6/19]
❖ ... Performance Analysis

Components of Unix time output:

real time

user time sys time
COMP2521 20T2 ♢ Performance Analysis [7/19]
❖ ... Performance Analysis


Using time for performance analysis

Note: on a very-lightly-loaded system  user+sysreal

Note also: some versions of time have a different output format

COMP2521 20T2 ♢ Performance Analysis [8/19]
❖ ... Performance Analysis


Decades of empirical study of program execution have shown ...

The 90/10 rule generally holds   (or 80/20 rule or ...):

Implications:

To significantly improve performance: make bottlenecks faster.
COMP2521 20T2 ♢ Performance Analysis [9/19]
❖ Profiles


Need a method for locating hot spots

An execution profile for a program is

Cost may be measured via Profiles typically collected at function level   (i.e. code block = function).
COMP2521 20T2 ♢ Performance Analysis [10/19]
❖ ... Profiles

A profile shows how much time is spent in each code block ...

[Diagram:Pics/prof-plot.png]

Software tools can generate profiles of program execution.

COMP2521 20T2 ♢ Performance Analysis [11/19]
gprof: A Profiler

The gprof command displays execution profiles ...

Example of use:

$ gcc -pg -o xyz xyz.c
$ xyz < data > /dev/null
$ gprof xyz | less

For further usage details, man gprof.

COMP2521 20T2 ♢ Performance Analysis [12/19]
❖ ... gprof: A Profiler

The gprof command works at the function level.

It gives a flat profile containing:

It also gives a call graph, containing:

COMP2521 20T2 ♢ Performance Analysis [13/19]
❖ Profile Example

Consider the following program ...

// search for words in text containing a given substring
// display each such word once (in alphabetical order)

int main(int argc, char*argv[])
{
    char  word[MAXWORD];  // current word
    List  matches;        // list of matched words
    char  *substring;     // string to look for
    FILE  *input;         // the input file

    // ... Check command-line args, open input file ... 

    // Process the file - find the matching words
    matches = newList();
    while (getWord(input, word) != NULL) {
        if (contains(word,substring)
                 && !member(matches,word))
            matches = insert(matches,word);
    }
    printWords(matches);
    return 0;
}

COMP2521 20T2 ♢ Performance Analysis [14/19]
❖ ... Profile Example

Flat profile for this program (xwords et data3):

 %  cumulative     self            self    total           
time   seconds  seconds  calls  us/call  us/call  name    
75.00     0.03     0.03  30212     0.99     0.99  getWord
25.00     0.04     0.01  30211     0.33     0.33  contains
 0.00     0.04     0.00    489     0.00     0.00  member
 0.00     0.04     0.00    267     0.00     0.00  insert
 0.00     0.04     0.00      1     0.00 40000.00  main
 0.00     0.04     0.00      1     0.00     0.00  printWords

The flat profile shows which functions contribute most to the execution cost.

For more info on how to interpret gprof flat profiles, look at:

Note:   wc data3    7439 30211 188259.

COMP2521 20T2 ♢ Performance Analysis [15/19]
❖ ... Profile Example

 %  cumulative     self            self    total           
time   seconds  seconds  calls  us/call  us/call  name    
75.00     0.03     0.03  30212     0.99     0.99  getWord
25.00     0.04     0.01  30211     0.33     0.33  contains
 0.00     0.04     0.00    489     0.00     0.00  member
 0.00     0.04     0.00    267     0.00     0.00  insert
 0.00     0.04     0.00      1     0.00 40000.00  main
 0.00     0.04     0.00      1     0.00     0.00  printWords

We can learn quite a lot from this:

COMP2521 20T2 ♢ Performance Analysis [16/19]
❖ ... Profile Example

Call graph for the same execution (xwords et data3):

index %time   self  children    called     name
              0.00    0.04       1/1           _start [2]
[1]   100.0   0.00    0.04       1         main [1]
              0.03    0.00   30212/30212       getWord [3]
              0.01    0.00   30211/30211       contains [4]
              0.00    0.00     489/489         member [5]
              0.00    0.00     267/267         insert [6]
              0.00    0.00       1/1           printWords [7]
-----------------------------------------------
[2]   100.0   0.00    0.04                 _start [2]
              0.00    0.04       1/1           main [1]
-----------------------------------------------
              0.03    0.00   30212/30212       main [1]
[3]    75.0   0.03    0.00   30212         getWord [3]
-----------------------------------------------
              0.01    0.00   30211/30211       main [1]
[4]    25.0   0.01    0.00   30211         contains [4]
----------------------------------------------
              0.00    0.00     489/489         main [1]
[5]     0.0   0.00    0.00     489         member [5]
-----------------------------------------------
              0.00    0.00     267/267         main [1]
[6]     0.0   0.00    0.00     267         insert [6]
-----------------------------------------------
              0.00    0.00       1/1           main [1]
[7]     0.0   0.00    0.00       1         printWords [7]

COMP2521 20T2 ♢ Performance Analysis [17/19]
❖ ... Profile Example

What does each entry mean?

index %time   self  children    called     name
              0.00    0.04       1/1           _start [2]
[1]   100.0   0.00    0.04       1         main [1]
              0.03    0.00   30212/30212       getWord [3]
              0.01    0.00   30211/30211       contains [4]
              0.00    0.00     489/489         member [5]
              0.00    0.00     267/267         insert [6]
              0.00    0.00       1/1           printWords [7]

This entry shows info about the main() function

COMP2521 20T2 ♢ Performance Analysis [18/19]
❖ ... Profile Example


The above call graph is unusual

Normally, each function would call other functions

For more info on how to interpret gprof call graphs, look at:

COMP2521 20T2 ♢ Performance Analysis [19/19]


Produced: 11 Jul 2020