In this post we'll see how to write program in Java for KMP algorithm.
The KMP (Knuth-Morris-Pratt) algorithm is an efficient algorithm for finding occurrences of a pattern string with in a given string. This algorithm does pattern searching in linear time by creating a longest proper prefix which is also a suffix (LPS) array as a preprocessing step. This LPS array helps in reducing redundant comparisons.
Refer this post to learn more about "longest proper prefix which is also a suffix of a given string" algorithm and its Java implementation- Longest Prefix Suffix Java Program
To understand KMP algorithm better you should also have idea about pattern searching algorithm using brute force approach. So, first we'll see how to write a Java program for pattern searching algorithm using brute force approach
Pattern searching in a String - Java program (Brute force approach)
In this approach you start from index 0 of the string in an outer loop and then try to match character by character with the pattern string in an inner loop. If there is a mismatch at any point, you start from the next value of i in the outer loop.
public class PatternSearch { public static void main(String[] args) { String sentence = "abcdeabcdf"; String word = "abcdf"; stringPatternSearch(sentence, word); } static void stringPatternSearch(String str, String pattern) { boolean patternFound = false; for(int i = 0; i <= str.length() - pattern.length(); i++) { System.out.println("Starting from i " + i); int j = 0; for(; j < pattern.length(); j++) { System.out.println("j value " + j); if(str.charAt(i+j) == pattern.charAt(j)) { continue; }else { break; } } if(j == pattern.length()) { System.out.println("Found pattern at index " + i); patternFound = true; } } if(!patternFound) { System.out.println("Pattern not found "); } } }Output
Starting from i 0 j value 0 j value 1 j value 2 j value 3 j value 4 Starting from i 1 j value 0 Starting from i 2 j value 0 Starting from i 3 j value 0 Starting from i 4 j value 0 Starting from i 5 j value 0 j value 1 j value 2 j value 3 j value 4 Found pattern at index 5
As you can see from the output for String as "abcdeabcdf" and pattern as "abcdf", when i=0, there is a match from 0..3 before a mismatch. Then the comparison starts from i=1 and j=0 again.
For String of length n and pattern of length m, in worst case, for each character in String you may match up to m-1 characters before finding a mismatch making the time complexity for this approach as O(n X m).
KMP algorithm for pattern searching - Java Program
KMP algorithm tries to avoid these redundant comparisons as we saw in brute force approach by using the LPS array.
By using LPS array, it can deduce how many characters from the previous match can be reused, avoiding redundant checking, especially with repeated characters.
KMP algorithm steps are as-
- Start loop for the String length. 0<=i<=stringlength
- Match each character in the String with each character in the pattern string, if character matches, move to the next character in both string and pattern. If match continues for the whole length of the pattern that means pattern is found in the String.
- If there is a mismatch-
- pattern pointer j is not 0 then consult the LPS array to find the previous longest prefix that is also the suffix and move j by that length rather than starting from 0 again.
- If j= 0, then increment i by 1. So, you don't reset i as done in the brute force approach you keep incrementing it to the next value.
Will try to clarify these conditions with example strings after the code.
public class KMPPatternSearch { public static void main(String[] args) { String sentence = "abcdeabcdf"; String word = "abcdf"; KMPSearch(sentence, word); } static void KMPSearch(String str, String pattern) { int patternLength = pattern.length(); int strLength = str.length(); // LPS array int lps[] = new int[patternLength]; // track index for pattern int j = 0; // calculate lps array computeLPSArray(pattern, patternLength, lps); System.out.println("LPS Array- " + Arrays.toString(lps)); // track index for string int i = 0; while (i < strLength) { System.out.println("Starting from i " + i + " and j "+ j); if (pattern.charAt(j) == str.charAt(i)) { j++; i++; if (j == patternLength) { System.out.println("Found pattern at index " + (i - j)); j = lps[j - 1]; } } // mismatch after j matches else { // Skip the previously matching characters if (j != 0) j = lps[j - 1]; else i = i + 1; } } } static void computeLPSArray(String pattern, int patternLength, int[] lps) { int length = 0; int i = 1; // lps[0] is always 0 lps[0] = 0; while (i < patternLength) { if (pattern.charAt(i) == pattern.charAt(length)) { lps[i] = ++length; i++; } else { if (length != 0) { // Use LPS array to find a smaller prefix length = lps[length - 1]; } else { lps[i] = 0; i++; } } } } }Output
LPS Array- [0, 0, 0, 0, 0] Starting from i 0 and j 0 Starting from i 1 and j 1 Starting from i 2 and j 2 Starting from i 3 and j 3 Starting from i 4 and j 4 Starting from i 4 and j 0 Starting from i 5 and j 0 Starting from i 6 and j 1 Starting from i 7 and j 2 Starting from i 8 and j 3 Starting from i 9 and j 4 Found pattern at index 5
As you can see from the output for String as "abcdeabcdf" and pattern as "abcdf". The LPS array is [0, 0, 0, 0, 0] as there are no matching prefixes and suffixes.
In the matching logic, when i=0, there is a match from 0..3 before a mismatch. At that point both i and j have value as 4. Since previous index value in LPS is 0 so j becomes 0 but i remains at 4, it is not reset to 1.
If you run with the following parameters-
String sentence = "abababc"; String word = "ababc";
Then the output is-
LPS Array- [0, 0, 1, 2, 0] Starting from i 0 and j 0 Starting from i 1 and j 1 Starting from i 2 and j 2 Starting from i 3 and j 3 Starting from i 4 and j 4 Starting from i 4 and j 2 Starting from i 5 and j 3 Starting from i 6 and j 4 Found pattern at index 2
The LPS array is [0, 0, 1, 2, 0] because of the matching prefixes and suffixes.
In the matching logic, when i=0, there is a match from 0..3 before a mismatch. At that point both i and j have value as 4.
So, j = lps[j-1] for j = 4 means j will have value lps[3] which is 2. After the mismatch the next comparison will start from j=2, not again from j=0.
This works, because we know the previous longest prefix suffix is "abab" or we can say that because of the previous value in LPS array we know that substring for j=0..1 in the pattern matches the substring for j = 2..3.
So, no need to start from j=0 as suffix "ab" of the matched portion is equal to the "ab" prefix of the pattern.
Time and space complexity of KMP algorithm
For String of length n and pattern of length m, you traverse the pattern to create LPS array. Which means time complexity of O(m).
While searching the pattern string n is traversed once, because of LPS array redundant comparisons are avoided.
Thus, the time complexity of KMP algorithm is O(n+m)
Auxiliary space for the LPS array of size m is required, thus the space complexity of KMP algorithm is O(m).
That's all for this topic KMP Algorithm For Pattern Search - Java Program. If you have any doubt or any suggestions to make please drop a comment. Thanks!
>>>Return to Java Programs Page
Related Topics
You may also like-
No comments:
Post a Comment