Week 02
Efficiency |
Program Properties | 2/46 |
Recall that we want our programs to be:
... Program Properties | 3/46 |
Trade-off: efficiency vs clarity
"Premature optimization is the root of all evil" ... Don Knuth
"KnuthAtOpenContentAlliance" by Flickr user Jacob Appelbaum, uploaded to en.wikipedia by users BeSherman, Duozmo - Flickr.com (via en.wikipedia). Licensed under CC BY-SA 2.5 via Wikimedia Commons
Program/Algorithm Efficiency | 4/46 |
How to determine efficiency?
Measure program execution costs (experimental)
... Program/Algorithm Efficiency | 5/46 |
Example: finding max value in an unsorted array
int findMax(int a[], int N) { int i, max = a[0]; for (i = 1; i < N; i++) if (a[i] > max) max = a[i]; return max; }
Core operation? ... compare a[i]
max
How many times? ... clearly N-1
Execution cost grows linearly (i.e. 2 × #elements ⇒ 2 × time)
... Program/Algorithm Efficiency | 6/46 |
Example: finding max value in a sorted (ascending) array
int findMax(int a[], int N) { return a[N-1]; }
No iteration needed; max is always last.
Execution cost is constant (same regardless of #elements)
... Program/Algorithm Efficiency | 7/46 |
Example: finding given value k
int search(int k, int a[], int N) { int i; for (i = 0; i < N; i++) if (a[i] == k) return i; return -1; }
Core operation? ... compare a[i]
k
How many times? ... not so straightforward
Need to consider best/worst/average-case costs.
... Program/Algorithm Efficiency | 8/46 |
Best case: find k
a[0]
Worst case: no k
k
a[N-1]
N
Average case: find k
a
N
Could devise "overall average" if we know likelihood of each case.
If not, take pessimistic view ... worst case.
In fact, both worst and average cases grow linearly.
Algorithmic Complexity | 9/46 |
Cost is primarily interesting for large data
Definition: g(n) ∈ O(f(n)) if g(n) ≤ c.f(n) ∀ n > N0
Complexity classes
... Algorithmic Complexity | 10/46 |
Algorithms are "rated" by their complexity class.
Thus we might say "quicksort has worst case complexity O(n2) "
Assigning an algorithm to a complexity class ...
Exercise 1: Assigning Complexity Class | 11/46 |
Give complexity classes for above functions:
Abstract Data Types |
Abstract Data Types | 13/46 |
A data type is ...
FILE *
DTs, ADOs, ADTs, GADTs | 14/46 |
We want to distinguish ...
Set a, b, c;
GADTs ⇒ multiple instances/types (e.g. Set<int> a; Set<char> b;
Warning: Sedgewick's first few examples are ADOs, not ADTs.
Interface/Implementation | 15/46 |
ADT interface provides
FILE*
Collections | 16/46 |
Many of the ADTs we deal with ...
... Collections | 17/46 |
Collection structures:
... Collections | 18/46 |
Or even a hybrid structure like ...
... Collections | 19/46 |
Or this ...
... Collections | 20/46 |
Or this ...
... Collections | 21/46 |
Typical operations on collections
Example ADT: Sets of Integers |
Set ADT | 23/46 |
Set data type: collection of unique integer values.
"Book-keeping" operations:
Set newSet()
void dropSet(Set)
void showSet(Set)
{1,2,3...}
void readSet(FILE*,Set)
Set SetCopy(Set)
... Set ADT | 24/46 |
Data-type operations:
void SetInsert(Set,int)
void SetDelete(Set,int)
int SetMember(Set,int)
Set SetUnion(Set,Set)
Set SetIntersect(Set,Set)
int SetCard(Set)
Set
Exercise 2: pre- and post-conditions | 25/46 |
Each Set
Express these semantics in detail via statements of:
assert()
But only during the development/testing phase
assert()
... Set ADT | 26/46 |
Example set client: set of small odd numbers
Set s; int i; s = newSet(); for (i = 1; i < 26; i += 2) SetInsert(s,i); showSet(s); putchar('\n');
Outputs:
{1,3,5,7,9,11,13,15,17,19,21,23,25}
Set Applications | 27/46 |
Example: eliminating duplicates
// scan a list of items in a file Set seenItems = newSet(); FILE *in = fopen(FileName,"r"); while (fscanf(in, "%d", &item) == 1) { if (SetMember(seenItems, item))// ignore, already processed ; else { SetInsert(seenItems, item); process item; } } fclose(in);
Exercise 3: Set Lab | 28/46 |
Build an interactive tool to manipulate Set
Has 26 "built-in" Set
a
z
With commands:
s S = show Set S i V S = insert value V in Set S d V S = remove value V from Set S m V S = check if value V is in Set S c S = cardinality of Set S (#elems) + S T R = put (S Union T) in Set R * S T R = put (S Intersect T) in Set R r F S = read values from file F into Set S q = quit
Exercise 4: Set ADT Pre/Post-conditions | 29/46 |
If x
T
x
*x
*x
T
Set
Suggest an implementation for an isValid()
Sets as Unsorted Arrays | 30/46 |
... Sets as Unsorted Arrays | 31/46 |
Costs for set operations on unsorted array:
Sets as Sorted Arrays | 32/46 |
Costs for set operations on sorted array:
Sets as Linked Lists | 33/46 |
... Sets as Linked Lists | 34/46 |
Costs for set operations on linked list:
If we don't have nelems
Sets as Hash Tables | 35/46 |
... Sets as Hash Tables | 36/46 |
A hash table is a data structure that
... Sets as Hash Tables | 37/46 |
Hash function: h(value) = (value % TABSIZE).
Operations insert, delete, member
Hash Table Costs | 38/46 |
Costs for set operations via hash table:
Sets as Bit-strings | 39/46 |
... Sets as Bit-strings | 40/46 |
Restrict possible values that can be stored in the Set
bit|1
bit&0
i
#define NWORDS ??? unsigned int bits[NWORDS];
For most common C implementations, gives us #bits = 32*NWORDS
This gives us a set which can hold values in range 0..(#bits-1)
Exercise 5: Implement Bit operations | 41/46 |
Given the following definitions:
#define NBITS 1024 #define NWORDS (NBITS/32) typedef unsigned int Word; typedef Word BitS[NWORDS];
Implement functions for
Exercise 6: Implement Union via Bit Strings | 42/46 |
Given the following definitions:
#define NBITS 1024 #define NWORDS (NBITS/32) typedef unsigned int Word; typedef Word BitS[NWORDS]; BitS a, b, c;
Implement a function to store the union of a
b
c
void BitUnion(BitS a, BitS b, BitS c);
Do it once using getBit()
Setting and unsetting bits | 43/46 |
Setting and unsetting bits by &
|
... Setting and unsetting bits | 44/46 |
Powers of two by bit-shifting
Don't use pow(...)
math.h
Performance of Set Implementations | 45/46 |
Performance comparison:
Data Structure | insert | delete | member | ∪, ∩ | storage |
unsorted array | O(n) | O(n) | O(n) | O(n.m) | O(N) |
sorted array | O(n) | O(n) | O(log2n) | O(n+m) | O(N) |
unsorted linked list | O(n) | O(n) | O(n) | O(n.m) | O(n) |
sorted linked list | O(n) | O(n) | O(n) | O(n+m) | O(n) |
hash table (lists) | O(n) | O(n) | O(n) | O(n+m) | O(n+H) |
bit-maps | O(1) | O(1) | O(1) | O(1) | O(N) |
n,m = #elems, N = max #elems, H = size of hash table
... Performance of Set Implementations | 46/46 |
Notes on performance differences ...
Assume O(1) card
For sorted array, insert cost is: cost of find + cost to interpolate
Cost = log2n tests + (avg)n/2 moves ⇒ O(n)
Two O(n) costs: linked list = k.n, hash table = j.n but j≪k (lists are shorter)
If lists are sorted, use merge for ∪ and ∩ so O(n+m)
Bit-map version restricts range of possible elements to 0..N-1
Produced: 24 Jul 2017