Several problems on rolling polynomial hashes

Published Nov 04, 2019Last updated Nov 07, 2019

One of the crutial techniques on the interview is "string hashing".
It allows to solve many problems in reasonable time which is essential for the interview.
In this post I explain some basics and provide solutions for several typical string hashing problems.
In the solutions I tried to highlight with comments places where people often do mistakes.

Polynomial hash function for a string is fhash(s) = (sum_i (p^i * s_i)%M)%M.
We have to use mod M due to finite size of integers on computers.
If we don't employ %M, we basically rely on mod 2^31 (if you are using int). Sometimes it is fine but generally undesirable because powers of 2 are not prime numbers.

Why is it better to use prime number for mod?
Because in this case it is easy to find the inverse element in the field.
From small Fetmat theorem, a^(M-1) = 1 (mod M) which means that a^(M-2) is the inverse element in the field.
From my experience, don't use this fact on the interview because rarely people understand the proof easily but you spend precious time. I used this fact only in the problem in the appendix 1 which is more advanced than the others here.

Shortest palindrome

Solution for https://leetcode.com/problems/shortest-palindrome/

    const int p = 31; // must be greater than the alphabet size
    				  // I like primes but it is not 
                      /// necessary to use prime for p
    const int M = 1e9 + 7; 
    // I prefer always use mod prime even if we don't need to 
  // find a reverse element. Just for consistancy, 
    // it is faster to write one standard way
  // every time instead of creating something ad hoc
    string shortestPalindrome(string s) {
        const int n = s.size();
        if (s.size() == 0) return "";
        // compute the length of longest prefix palindrome
        long long fhash = 0, bhash = 0;
        int lenMaxPrefixPalindrome = 0;
        long long pp = 1; // use to store the power in the sum
        for (int i = 0; i < n; ++i) {
            int c = s[i] - 'a' + 1; // To have values in the range [1,27]
            // it is essential to increment by 1 to distinguish 'aa' from ''
            fhash = (fhash + (c*pp)%M)%M;
            pp = (p * pp)%M;
            
            bhash = (bhash * p)%M + c;
            
            if (fhash == bhash) {
                lenMaxPrefixPalindrome = i;
            }
        }
        lenMaxPrefixPalindrome += 1;
        // now just copy the not palindromic suffix in reversed order
    // to the buffer and than copy original string there
        string res(2*n - lenMaxPrefixPalindrome, 'a');
        int j = s.size() - 1;
        for (int i = 0; i < 2*n - lenMaxPrefixPalindrome; ++i) {
            if (i < n - lenMaxPrefixPalindrome) {
                res[i] = s[j];
                --j;
            } else {
                res[i] = s[i - (n - lenMaxPrefixPalindrome)];
            }
        }
        
        return res;
    }

Note that this problem can be also solved with KMP.
Although the solution is somewhat simpler, it requires you to remember prefix_function implementation which is tricky.
I put the solution using KMP to the Appendix.

Count palindromic substrings

Now the task is to count all the palindromic substrings.

using lint = long long int;
const lint M = 1e9 + 7;
const lint p = 31;

int getCode(char c) { return c - 'a' + 1; } // be careful, +1 is important

lint count(string const& s) {
    lint res = 0;

    int n = s.size();
    lint pp = 1;
    vector<lint> bh(n+1, 0);
    vector<lint> powers(n, 0);

    for (int i = n - 1; i >= 0; --i) {
        bh[i] = (bh[i+1] + (getCode(s[i])*pp)%M)%M;
        powers[n - i - 1] = pp;
        pp = (pp*p)%M;
    }

    for (int i = 0; i < n; ++i) {
        lint fh = 0, pp = 1;
        for (int j = i; j < n; ++j) {
            fh = (fh + (getCode(s[j])*pp)%M)%M;
            auto curBH = (bh[i] - bh[j+1] + M)%M; // + M because mod might be negative in C++
            auto curFH = (fh * powers[n - 1 - j])%M; // p^(n-1-j) to have the same power for forward as for backward.
            pp = (pp*p)%M;
            if (curBH == curFH) ++res;
        }
    }

Note, p^(n-1-j) comes from desired property p^(j-i + x) == p^(n-i-1) from which x = n-1-j.
It is because we want the highest power in forward hash to be equal to the one in backward.

There is also a linear solution for this problem, thanks to Manacher.
A good explanation can be found elsewhere yet in russian.

Appendix 1. A more complex problem

Below is in example of a problem where one need to use an inverse element in a field.
The problem is to find the longest common substring by given set of strings.

// http://codeforces.com/gym/100126/attachments/download/1321/20122013-tryenirovka-spbgu-b-15-heshi-ru.pdf
// Problem A
#include <bits/stdc++.h>
using namespace std;
typedef long long int lint;
typedef long double ldouble;

const lint p = 307;
const lint M = lint(1e9) + 7;

vector<lint> mem_pow; // it makes sense to store only F(l) = (p^(M-2)*p^l)%M

vector<string> strs;
vector< vector<lint> > phashes;
void computePHash(int i) {
    phashes[i].resize(strs[i].length()+1, 0);
    lint p_cur = 1;
    for (int l = 1; l <= strs[i].length(); ++l) {
        phashes[i][l] = (phashes[i][l-1] + strs[i][l-1]*p_cur)%M;
        p_cur = (p_cur*p)%M;
    }
}

lint powmod(lint x, lint y) {
    if (y == 0)
        return 1;
    lint prev = powmod(x, y/2);
    return ((prev * prev)%M * (y%2 ? x : 1) )%M;
}
// returns hash for [l, r) substring
lint h(int i, int l, int r) {
    lint dif = (phashes[i][r] - phashes[i][l]);
    while (dif < 0)
        dif += M;
    lint res = (dif *mem_pow[l])%M;
    return res;
}

string check(int len) {
    // compute hashes for all substrings of length len
    // if some hashes values exist for all strings - means that exist substring
    // in all strings, return true
    // map: hash -> bit mask (known that K <= 10)
    // K*N complexity
    int mask = (1<<strs.size()) - 1;

    unordered_map< lint, int > hashes(strs.size());
    for (int i = 0; i < strs.size(); ++i) {
        for (int l = 0; l <= strs[i].length() - len; ++l) {
            lint curh = h(i, l, l + len);
            hashes[curh] |= 1<<i;
            if (hashes[curh] == mask)
                return strs[i].substr(l, len);
        }
    }
    return "";
}   

string solve() { // KNLogN complexity
    auto minStrLen = 100000;
    for (auto& s : strs)
        minStrLen = min(minStrLen, (int)s.length());
        
    // standard binary search below
    string candidate;
    int l = 0, r = minStrLen+1;
    while (l+1 < r) {
        int m  = l + (r-l)/2;
        candidate = check(m);
        if (candidate.length() != 0) {
            l = m;
        } else {
            r = m;
        }
    }
    return candidate;
}

int main(int, char**) {
    std::ios::sync_with_stdio(false);
    ifstream fin("substr.in");
    ofstream fout("substr.out");

    // fillin pow_mem
    mem_pow.resize(10001, 1);
    mem_pow[1] = powmod(p, M-2);
    for (int i = 2; i < 10001; ++i)
        mem_pow[i] = (mem_pow[i-1]*mem_pow[1])%M;    

    int K;
    fin >> K;
    strs.resize(K);
    phashes.resize(K);
    for (int i = 0; i < K; ++i) {
        fin >> strs[i];
        // 1. compute prefix hash
        computePHash(i);
    }
    fout << solve() << endl;
    return 0;
}

Appendix 2. Shortes palindrome using KMP

Below is the implementaion using finction prefix_function which I recommend you to take from the e-maxx (in russian).

    vector<int> prefix_function(const string& s) {
        auto n = s.length();
        vector<int> pi(n);
        for (size_t i=1; i<n; ++i) {
            int j = pi[i-1];
            while (j > 0 && s[i] != s[j])
                j = pi[j-1];
            if (s[i] == s[j])  ++j;
            pi[i] = j;
        }
        return pi;
    }
    string shortestPalindrome(string s) {
        if (s.size() == 0) return s;
        int n = s.size();
        
        string rev = s;
        reverse(rev.begin(), rev.end());
        vector<int> pif = prefix_function(s + "#" + rev);
        
        return rev.substr(0, n - pif.back()) + s;
    }

C++Algorithm C

Report

Enjoy this post? Give Kirill Lykov a like if it's helpful.

Kirill Lykov

Programmer, mathematician

Sr Software engineer and scientist

Discover and read more posts from Kirill Lykov

get started