❖ Strings |
A string is a sequence of symbols.
An alphabet Σ is the set of possible symbols in strings.
Examples of strings:
Examples of alphabets:
Often, the "symbols" are letters (so we call them characters from now on)
❖ ... Strings |
Notation:
❖ ... Strings |
Example ...
The string a/a
""
"a"
"a/"
"a/a"
"a/a"
"/a"
"a"
""
""
"a"
"/"
"a/"
"/a"
"a/a"
Note: ""
❖ ... Strings |
ASCII (American Standard Code for Information Interchange)
Often, the character with ascii code 0 is called "NUL" rather than "NULL"
❖ ... Strings |
Reminder ...
In C, a string is an array of char
'\0'
char str[] = "hello";
// same as
char str[] = {'h', 'e', 'l', 'l', 'o', '\0'};
Note: str[]
❖ ... Strings |
C provides string manipulation functions via #include <string.h>
Examples:
// return length of string int strlen(char *str) // copy one string into a string buffer char *strcpy(char *dest, char *src) // make a copy of a string (uses malloc()) char *strdup(char *) // concatenate two strings, into buffer holding dest char *strcat(char *dest, char *src) // find substring pat inside string str char *strstr(char *str, char *pat) // copy at most n chars from src into a dest buffer char *strncpy(char *dest, char *src, int n)
❖ ... Strings |
Details of one #include <string.h>
char *strncat(char *dest, char *src, int n)
src
dest
'\0'
dest
'\0'
dest[]
dest
n
strlen(src)
n
'\0'
dest
❖ Pattern Matching |
strstr()
The general string matching (aka pattern matching) problem
NULL
strstr(str,pat)
pat
str
T is sometimes called "haystack", P is sometimes called "needle"
Note that regular expression pattern matching is quite a different beast.
❖ ... Pattern Matching |
Brute-force pattern matching algorithm:
BruteForceMatch(T,P): | Input text T of length n, pattern P of length m | Output starting index of a substring of T equal to P | -1 if no such substring exists | | for all i = 0..n-m do | | j = 0 // check from left to right | | while j<m ∧ T[i+j]=P[j] do // test ith shift of pattern | | j = j+1 | | if j=m then | | return i // entire pattern checked | | end if | | end while | end for | return -1 // no match found
❖ ... Pattern Matching |
C version of above algorithm:
int bruteForce(char *t, char *p) { int n = strlen(t); //length of string int m = strlen(p); //length of pattern // for each starting position of pattern for (int i = 0; i <= n-m; i++) { // scan pattern, comparing against string for (int j = 0; j < m; j++) { // mismatch detected, terminate scan if (T[i+j] != P[j]) break; } // if reached end of pattern, have a match if (j == m) return i; } return -1; // no match }
❖ Complexity Analysis |
Brute-force pattern matching runs in O(n·m)
Examples of worst case (forward checking):
aaa … aaah
aaah
Produced: 25 Jul 2020