Friday, September 30, 2011

Find the smallest window in a string containing all characters of another string

Problem
Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

Example
S = "ADOBECODEBANC",  T = "ABC",
Minimum window is "BANC".

Note:
  • If there is no such window in S that covers all characters in T, return the emtpy string "".
  • If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.

Solution
 Method 1 ( Brute force solution )
a) Generate all substrings of string1 (“this is a test string”)
b) For each substring, check whether the substring contains all characters of string2 (“tist”)
c) Finally print the smallest substring containing all characters of string2.

Method 2 - Using 2 hashmaps
The main idea is to maintain two pointers which are called begin and end, then use two HashMap to record current status, one record what char need to find, another used to record what has found.
if the two HashMaps show current substring of S has contain all of the chars in T, then we try to shrink our start and end pointer 

Lets see the example
string1 = "acbbaca" and string2 = "aba".
Here, we also use the term "window", which means a contiguous block of characters from string1 (could be interchanged with the term substring).
  1. string1 = "acbbaca" and string2 = "aba".

  2. The first minimum window is found. Notice that we cannot advance begin pointer as hasFound['a'] == needToFind['a'] == 2. Advancing would mean breaking the constraint.
  3. The second window is found. begin pointer still points to the first element 'a'. hasFound['a'] (3) is greater than needToFind['a'] (2). We decrement hasFound['a'] by one and advance begin pointer to the right.
  4. We skip 'c' since it is not found in string2. Begin pointer now points to 'b'. hasFound['b'] (2) is greater than needToFind['b'] (1). We decrement hasFound['b'] by one and advance begin pointer to the right.
  5. Begin pointer now points to the next 'b'. hasFound['b'] (1) is equal to needToFind['b'] (1). We stop immediately and this is our newly found minimum window.

The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing string1. needToFind stores the total count of a character in string2 and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in string2 that's met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals string2's length, we know a valid window is found.

Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to string2's size), we immediately advance begin pointer as far right as possible while maintaining the constraint.

How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint.

Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found.
Essentially, the algorithm finds the first window that satisfies the constraint, then continue maintaining the constraint throughout.

Code in java
public String minWindow(String S, String T) {
 if (S == null || T == null) {
  return null;
 }
 if (S.length() == 0 && T.length() == 0) {
  return "";
 }
 if (S.length() < T.length()) {
  return "";
 }
 HashMap<Character, Integer> needFind = new HashMap<Character, Integer>();
 HashMap<Character, Integer> alreadyFind = new HashMap<Character, Integer>();
 for (int i = 0; i < T.length(); i++) {
  alreadyFind.put(T.charAt(i), 0);
  if (needFind.containsKey(T.charAt(i))) {
   needFind.put(T.charAt(i), needFind.get(T.charAt(i)) + 1);
  } else {
   needFind.put(T.charAt(i), 1);
  }
 }
 int minStart = -1;
 int minEnd = S.length();
 int start = 0;
 int len = 0;
 for (int i = 0; i < S.length(); i++) {
  if (alreadyFind.containsKey(S.charAt(i))) {
   alreadyFind.put(S.charAt(i), alreadyFind.get(S.charAt(i)) + 1);
   if (alreadyFind.get(S.charAt(i)) <= needFind.get(S.charAt(i))) {
    len++;
   }
   if (len == T.length()) {
    while (!needFind.containsKey(S.charAt(start))
      || alreadyFind.get(S.charAt(start)) > needFind
        .get(S.charAt(start))) {
     if (needFind.containsKey(S.charAt(start))) {
      alreadyFind.put(S.charAt(start),
        alreadyFind.get(S.charAt(start)) - 1);
     }
     start++;
    }
    if (i - start < minEnd - minStart) {
     minStart = start;
     minEnd = i;
    }
   }
  }
 }
 if (minStart == -1) {
  return "";
 }
 return S.substring(minStart, minEnd + 1);
}


References
http://stackoverflow.com/questions/2459653/how-to-find-smallest-substring-which-contains-all-characters-from-a-given-string
http://rleetcode.blogspot.in/2014/01/minimum-window-substring-java.html

0 comments:

Post a Comment