String matching consists of finding one or more generally, all of the occurrences of a pattern in a text. Finding a certain pattern in a text is a problem arises in text-editing programs and web "surfing". Here we study various fundamental text processing algorithms (CLR) for quickly performing string searching and string matching.
Problem Formulation
Given a text array, T[1 . . n], of n character and a pattern array, P[1 . . m], of m characters. The problem is to find an integer s, called valid shift where 0 £ s < n - m and T[s +1 . . . s + m] = P[1 . . m]. In other words, to find whether P in T i.e., where P is a substring of T. The elements of P and T are character drawn from some finite alphabet such as {0, 1} or {A, B, . . Z, a, b, . . . , z}.
Given a string T[1 . . n], the substrings is define as T[i . . j] for some 0 £ i £ j £ n -1, that is, the string formed by the characters in T from index i to index j, inclusive. This means that a string is a substring of itself (simply take i = 0 and j = m).
The proper substring, of string T[1 . . n] is T[i . . j] for some 0< i £ j £ n -1. That is, we must have either i > 0 or j < m -1.
Using these definition, we can say given any string T[1 . . n], the substrings are
T[i . . j] = T[i] T[i + 1] T[i + 2] . . . T[j] for some 0 £ i £ j £ n -1.
And proper substrings are
T[i . . j] = T[i] T[i + 1] T[i + 2] . . . T[j] for some 0 < i £ j £ n -1.
Note that if i > j, then T[i . . j] is equal to the empty string or null, which has length zero.
Using these notations, we can define of a given string T[1 . . n] as T[0 . . i] for some 0 £ i £ n -1 and suffix of a given string T[1- n] as T[i . . n -1] for some 0 £ i £ n -1.
For example, given string T[1 . . B] is
T: a b c a b a a b c a b a c
then a proper substring of T is
a b a a
a prefix string of T is
a b c
and a suffix string of T is
b a c
Suppose we given string w, x, y and z
w: a b c
x: a b c a
y: a b c a b
z: a b c a b a a b
Here
x is a substring of z.
y is a substring of z.
Since |x| £ |y| therefore, we have
x is a substring of y
ê y: a b c a b ê x: a b c a
In this case,
Since |y| ≥ |x|
therefore x is a substring of y
y: a b c a b
x: a b c a
In this case, since
|x| = |w|
therefore, x = y
x = a b c
w = a b c