Rabin-Karp Algorithm For Pattern Search - Java Program

In this post we'll see how to write a Java program for Rabin-Karp string matching algorithm.

Rabin-Karp string matching algorithm uses hashing to search for a pattern in a String. The idea is to calculate hash for the pattern and for the substring of the same length as pattern in the original string.

To determine that the pattern and substring are same you compare the calculated hash values.

If hash values match, then as an added precaution (to negate the scenario of hash collision), you do character by character comparison also for the pattern and that substring.

If hash values don't match, then in the original string you move by one character and recalculate hash for the next substring of the same length as pattern string.

For example, if the string is "aabcde" and the searched pattern is "abc" then calculate hash for the pattern string "abc" and start with the substring of the same length in the original string which will be "aab" and calculate hash for this substring.

Then compare the calculated hash of the pattern string with the hash of the substring. If they don't match then slide by one character in the original string to calculate hash for the next substring.

This sliding or rolling of hash is the core idea in the Rabin-Karp algorithm to make the algorithm efficient.

Rolling hash in Rabin-Karp algorithm

The concept of rolling hash allows you to compute the hash of a substring in constant time by reusing the already calculated hash of the previous substring. Instead of recalculating hash for the next substring from scratch, sliding window concept is used to roll the hash by-

Removing the hash of the leftmost character
Adding a new character at the end

Rolling hash formula in the Rabin-Karp algorithm

If str is the text, patLength is the length of the pattern string, strHash is the hash of the current substring, base and modulo are the base and prime modulus variables respectively and h is the precomputed weight, then the rolling hash formula is

strHash = (BASE * (strHash - str.charAt(i) * h) + str.charAt(i + patLength)) % MODULO;

Base and modulus in the Rabin-Karp algorithm

How base and modulo are chosen for calculating hash helps in the proper distribution of hash and preventing many hash collisions.

The base in Rabin-Karp should be at least the size of the number of possible characters used to represent the string, ensuring each character can be uniquely represented. For example, if using extended ASCII characters, the base should be at least 256 to cover all characters from 0 to 255.

The modulus is typically a large prime number which helps in distributing hash values more uniformly, thus reducing the likelihood of hash collisions. We have seen the formula for calculating hash in the above section and that can give a very large value; by using the modulus operation the size of the hash value is reduced to a smaller value that will be within the range of data type used (int or long).

Rabin-Karp String Matching Algorithm - Java Program

public class RobinKarp {
  private static final int R = 256; // Number of characters in the input alphabet
  static final int MODULO = 1000000009; // A prime number for modulo operations

  public static void main(String[] args) {
    String str = "This is a test, this test should pass";
    String pattern = "test";
    
    //String str = "abcabcabc";
    //String pattern = "ab";
    searchPattern(str, pattern);

  }
  
  public static void searchPattern(String str, String pattern) {
    int patLength = pattern.length();
    int strLength = str.length();
    long patHash = 0;
    long strHash = 0;
    long h = 1;
    // precomputed weight of the polynomial hash
    for (int i = 0; i < patLength - 1; i++)
      h = (h * R) % MODULO;

    // calculate hash for the pattern string and the 
    // substring of the same length in the original string
    for(int i = 0; i < patLength; i++) {
      patHash = (R * patHash + pattern.charAt(i)) % MODULO;
      strHash = (R * strHash + str.charAt(i)) % MODULO;
    }

    int j;
    
    for(int i = 0; i <= strLength - patLength; i++) {    
      // if hashes are same
      if(patHash == strHash) {
        // do character by character comparison to confirm 
        // strings equality and ruling out hash collision
        for(j = 0; j < patLength; j++) {
          
          if(str.charAt(i+j) !=  pattern.charAt(j)) {
            break;
          }
        }
        if(j == patLength) {
          System.out.println("Pattern found at index " + i);
        }
      }
      // Need this check to avoid index out of bound for (i+patlength)
      if(i < strLength - patLength) {
        // remove the char at the beginning and add one at the end ** Sliding Window
        strHash = (R * (strHash - str.charAt(i) * h) + str.charAt(i + patLength)) % MODULO;
        // in case calculated hash is negative bring it with in range 0..MODULO-1
        if(strHash < 0) {
          strHash += MODULO;
        }
      }
    }  
  }
}

Output

Pattern found at index 10
Pattern found at index 21

Now, the confusion in this algorithm is why do need to pre-calculate this h variable and what exactly is happening in this formula for rolling hash

strHash = (R * (strHash - str.charAt(i) * h) + str.charAt(i + patLength)) % MODULO;

Let's try to understand with the help of one example string, String str = "abcabcabc"; and pattern to be searched as String pattern = "abc";

Here length of the pattern is 3 so value of h will be

h = R² % MODULO

and strHash, which is calculated using the following formula

strHash = (R * strHash + str.charAt(i)) % MODULO;

Let's remove the Modulo part to make it easy to follow, then it will have the values as

For S₀- strHash = R * 0 + S₀ = S₀
For S₁- strHash = R * (S₀) + S₁ = RS₀ + S₁
For S₂- strHash = R * (RS₀ + S₁) + S₂ = R²S₀ + RS₁ + S₂
For S₃- strHash = R * (R²S₀ + RS₁ + S₂) + S₃

But, if we are taking substrings of length 3 then we should have removed hash for S₀, while calculating hash for S₃, so strHash for S₃ will actually be-

strHash  = R * (RS₁ + S₂) + S₃ = R²S₁+RS₂+S₃

If you have observed the pattern, leftmost character always have the exponent for RADIX as patternLength - 1, which is R² in case pattern length is 3. That's what precomputed h also gives us.

Now, when you are removing S₀ and adding S₃, what you need to do is remove hash for S₀ and add S₃, in the formula if we replace the values (here again let's remove the Modulo to make it easy to follow)-

strHash = R * (R²S₀ + RS₁ + S₂ - S0 * R²) + S₃ = R * (R²S0 + RS₁ + S₂ - R²S₀) + S₃
   = R * (RS₁ + S₂) + S₃ = R²S₁ + RS₂ + S₃

Which is same as strHash of S₃ as calculated above.

Hope that makes it easy to understand why we keep precomputed value of h and how it is used to get rolling hash.

Time and Space complexity of Rabin-Karp algorithm

If original string is of length n and pattern string is of length m then-

Initial hash of the pattern and substring of the same length- O(m)

Rolling hash for the rest of the String- O(n-m) which can be considered O(n)

So, the best case scenario where there are no hash collisions (spurious hits) and no full comparisons then time complexity is O(n + m)

In worst case scenario time complexity is O(n X m), which may happen if there are spurious hits in each sliding window triggering many full length comparisons.

That's all for this topic Rabin-Karp Algorithm For Pattern Search - Java Program. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Java Programs Page

Related Topics

You may also like-

Tech Tutorials

Monday, October 6, 2025