In this post we'll see how to write a Java program for Z algorithm which is used for pattern searching. In order to efficiently do searching in linear time i.e. O(n + m) there is pre-computation step in the Z algorithm to compute a Z array
Z array in the Z algorithm
For a string of length n, a Z array is also created of the same length and each index in the Z array i.e. Z[i] stores the length of the longest substring starting from i that is also a prefix of the string. Which means each index contains the length of the substring from i onward that matches the beginning of the String.
For example, if the String is "aabcbaab" then the computed Z array is-
[0, 1, 0, 0, 0, 3, 1, 0]
Z[0] is always 0 by convention.
Z[1] is 1 because from position one, single character matches from the beginning of the string.
Z[2], Z[3], Z[4] are having value as 0 as 'b', 'c' or 'b' doesn't match with 'a'.
Z[5] has value 3 because from index 5 we have a substring of length 3 'aab' that matches the prefix 'aab'.
Z[6] is 1 because there is only a single character match.
Z[7] is 0 because there is no match.
Optimization while calculating Z array
Rather than using nested loops, two pointers l and r are used to represent the borders of the current substring that matches the prefix of the string.
l is the start index of that substring and r is the end index of the substring.
In the String "aabcbaab", for index 5 we have a substring of length 3 matching the prefix of the same string.
With the help of these 2 pointers, it reuses previously computed values to avoid redundant comparisons, achieving O(n) time complexity. There are the following two scenarios-
When current index i is within the range of l and r, for example, if current index i is 6 then it is with in the range of l
and r when l is 5 and r is 7. Then Z[i] = min(r - i + 1, z[i - l])
because str[i] matches with str[i - l] for at least
right-i+1 characters. So, you can start comparison after those characters.
When current index i is greater than r which means it is outside the range of l and r then reset l and r to reflect current substring.
Note that r keeps moving only forward, it never decreases which avoids redundant comparisons.
Here is the Z function that computes the Z array.
public static int[] computeZArray(String s) { int len = s.length(); int[] z = new int[len]; // it’ll anyway initialize to 0, just to show it explicitly z[0] = 0; int l = 0, r = 0; for(int i = 1; i < len; i++) { if(i <= r) { z[i] = Math.min(r - i + 1, z[i - l]); } while((i+z[i]) < len && s.charAt(z[i]) == s.charAt(i + z[i])) { z[i]++; } if(i + z[i] - 1 > r) { l = i; r = i + z[i] - 1; } } return z; }
Z algorithm for pattern searching - Java Program
If the above precomputation of Z array is clear then there is not much left in the Z algorithm for pattern searching. Though there is one more pre-processing step here which is to combine the text and the pattern string using a character that doesn't appear in the string. In the program "$" is used to combine the strings.
Combined string = pattern + "$" + text
This combined string is passed to compute the Z array.
The idea here is; if pattern is considered the prefix and same pattern exists somewhere else in the text, then with in the Z array you should have the index that has the value equalling pattern length.
Here is the code snippet that shows the same logic-
for (int i = patLength + 1; i < z.length; i++) { if (z[i] == patLength) { // Match found at this index in text result.add(i - patLength - 1); } }
Here is the complete Z algorithm pattern searching program in Java.
import java.util.ArrayList; import java.util.Arrays; import java.util.List; public class ZAlgorithm { public static void main(String[] args) { String str = "aabcbaab"; String pattern = "aab"; //int[] z = computeZArray(str); //System.out.println(Arrays.toString(z)); List<Integer>result = findPattern(pattern, str); System.out.println("Pattern found at " + result); } public static int[] computeZArray(String s) { int len = s.length(); int[] z = new int[len]; z[0] = 0; int l = 0, r = 0; for(int i = 1; i < len; i++) { if(i <= r) { z[i] = Math.min(r - i + 1, z[i - l]); } while((i+z[i]) < len && s.charAt(z[i]) == s.charAt(i + z[i])) { z[i]++; } if(i + z[i] - 1 > r) { l = i; r = i + z[i] - 1; } } return z; } // Finds all occurrences of pattern String // in the text using Z-algorithm public static List<Integer> findPattern(String pattern, String text) { String concat = pattern + "$" + text; int[] z = computeZArray(concat); System.out.println("Computed Z array- " + Arrays.toString(z)); List<Integer> result = new ArrayList<>(); int patLength = pattern.length(); for (int i = patLength + 1; i < z.length; i++) { if (z[i] == patLength) { // Match found at this index in text result.add(i - patLength - 1); } } return result; } }
Output
Computed Z array- [0, 1, 0, 0, 3, 1, 0, 0, 0, 3, 1, 0] Pattern found at [0, 5]
Time and space complexity of Z algorithm
If original string is of length n and pattern string is of length m then Z array computation is O(n + m) because both pattern string and original string are combined.
Pattern searching is done only for length of the original string so that is O(n).
So, the total time complexity is O(n + m) + O(n) which can be termed as O(n + m).
Auxiliary space is needed for creating a new combined string of size m + 1 + n and Z array is also created of same m + 1 + n length. So, the space complexity is also O(n + m).
That's all for this topic Z Algorithm For Pattern Search - Java Program. If you have any doubt or any suggestions to make please drop a comment. Thanks!
>>>Return to Java Programs Page
Related Topics
You may also like-
No comments:
Post a Comment