|

Naïve String Matching

 

The naïve approach simply test all the possible placement of Pattern P[1 . . m] relative to text T[1 . . n]. Specifically, we try shift s = 0, 1, . . . , n - m, successively and for each shift, s. Compare T[s +1 . . s + m] to P[1 . . m]

 

NAÏVE_STRING_MATCHER (T, P)

  1. n length [T]
  2. m length [P]
  3. for s 0 to n - m do
  4.     if P[1 . . m] = T[s +1 . . s + m]
  5.         then return valid shift s

 

The naïve string-matching procedure can be interpreted graphically as a sliding a pattern P[1 . . m] over the text T[1 . . n] and noting for which shift all of the characters in the pattern match the corresponding characters in the text.

In other to analysis the time of naïve matching, we would like to implement above algorithm to understand the test involves in line 4.

Note that in this implementation, we use notation P[1 . . j] to denote the substring of P from index i to index j.
That is,
P[1 . . j] = P[i] P[i +1] . . . P[j].

 

NAÏVE_STRING_MATCHER (T, P)

  1. n length [T]
  2. m length [P]
  3. for s 0 to n-m do
  4.     j 1
  5.    while j m and T[s + j] = P[j] do
  6.         j j +1
  7.     If j > m then
  8.         return valid shift s
  9. return no valid shift exist     // i.e., there is no substring of T matching P.

 

Referring to implementation of naïve matcher, we see that the for-loop in line 3 is executed at most n - m +1 times, and the while-loop in line 5 is executed at most m times. Therefore, the running time of the algorithm is O((n - m +1)m), which is clearly O(nm). Hence, in the worst case, when the length of the pattern, m are roughly equal, this algorithm runs in the quadratic time.

One worst case is that text, T, has n number of A's and the pattern, P, has (m -1) number of A's followed by a single B.