Saturday, December 31, 2011

Find anagrams in the array of Strings

Question: You are given an array of strings and you are asked to display all the anagrams within the array. For those who don’t know, two words are anagrams if they contain the same characters.

We have already discussed how to check if 2 strings are anagrams here.

Answer: The brute force solution would be to take each string and compare it to every other string in the array one character at a time ( O(n^4) solution, assuming there are n strings with maximum n characters each). This will surely give you the answer but almost certainly not the job unless you improve it.

Let’s use this example to improve on the solution {“abc”, “albert”, “cat”, “gate”, “cab”, “grow”, “act”}

One way we can quicken the string comparison process is by pre-sorting each string alphabetically (any sorting would be fine actually). Pre-sorting helps you avoid comparing each character in a string to each character in another string. You only need to compare every character to the character in the corresponding position of the other string. This reduces the order of the solution ( O(n^2 log n (sorting n strings with n characters each) + n^3 (n^2 comparisons of n characters each)) = O(n^3)) assuming the sorting algorithm is O(n log n)). However, this increases memory usage since you now need to store the original strings so you can print them out later.

After sorting the characters in the string, the array would look like this {“abc”, “abelrt”, “act”, “aegt”, “abc”, “gorw”, “act”}.

Currently, we are comparing each string with every other string. That seems inefficient. How about in addition to sorting the characters in each string, we sort each string in the array alphabetically (again any other sorting will do). Sorting the strings in the array means you do not have to compare each string to every other string, you only have to compare it to the next string in line. If they happen to be the same (i.e. found an anagram), then you can compare with the one after that. Whichever the case is, the order of comparison in this case will be of the order O(n) instead of O(n^2). This sorting will reduce the order to ( O(n^2 log n (sorting characters) + n^2 log n (sorting strings) + n^2 (n comparisons of n characters each)) = O(n^2 log n) assuming the sorting algorithm is O(n log n)). Make sure you also move the strings from the original array in order to maintain the mapping of the sorted array to the original array.

After sorting the strings, the array would look like {“abc”, “abc”, ”abelrt”, “act”, “act”, “aegt”, “gorw”}.

There is an alternate solution using hashing. Put each string in a hash table ( O(n) ). If the spot in the hash table is already taken, then that is an anagram. The order for this solution is ( n^2 (hashing n strings) + n (n insertions into hash table) = O(n^2 ) assuming hashing each string is O(n) ).


0 comments:

Post a Comment