Friday, January 6, 2012

2 Sum Problem : Given an integer array and a number T, find all unique pairs of (a, b) whose sum is equal to T

You are given an array of n integers and a target sum T. The goal is to determine whether or not there are two numbers x,y in A with x+y=T.

Example : Suppose we have an int array = {5, 3, 7, 0, 1, 4, 2} and T = 5. The unique pairs that sum up to 5 are (5, 0) (3, 2) and (1, 4).

There are three approaches to solve this problem - 1) brute force, 2) sort the array and use binary and search, and 3) Using the hashtable. 
Please scroll down, if you are only interested in the best approach i.e. approach 3 using hashtables.

Approach 1 : Brute force method
The first approach is to use brute force which gives time complexity of O(n^2) and space complexity of O(n). We basically just have to loop through the array and for each number we add it to each of other numbers and see if they sum up to 5. If so, we print out the pair. Here is the code for brute force approach:
void findPairOfSum(int arrayOfNum[], int arraySize, int sum)
{
  for (int i = 0; i < arraySize; i++)
  {
    for (int j = i; j < arraySize; j++)
    {
      if (arrayOfNum[i] + arrayOfNum[j] == sum)
        cout << "(" << arrayOfNum[i] << ", " << arrayOfNum[j] << ")" << endl;
    }
  }
} 
The second approach is to use a hash table to store the difference between sum and each of the elements in the array. Then as we loop through the array, check each element against the hash table. If the element presents, then print out the key value pair. For example, if we hash the example array we'll have this hash table:
Key   Value
5        5 - 5 = 0
3        5 - 3 = 2
7        5 - 7 = -2
0        5 - 0 = 5
1        5 - 1 = 4
4        5 - 4 = 1
2        5 - 3 = 2

This approach will have the time complexity of O(n) and space complexity of O(n). Thus, chose your method wisely, depending on your need (speed or space efficiency).

2nd Approach - Use sorted array
A better way would be to sort the array. This takes O(n log n)
Then for each x in array A, use binary search to look for T-x. This will take O(nlogn).
So, overall search is  O(n log n)

Hers is how the pseudo code will look:
arr = {};//some array
sortedArr = sort(arr);
for( i = 0;i < arr.length - 1; i++)
{
   x = arr[i];
   bool found = binarySearch(sortedArr, T-x);//Search for T-x in sorted Arrary
   if(found)
    print "pair", x, T-x;
}

Approach 2b - Using sorting but using variant of binary search
Assuming array sorted in ascending order. Now we take first and last element, and sum them up. If sum is equal to T, we have found the pair, but if sum is greater than T, we reduce right pointer by 1, or increment left pointer otherwise.

arr = {};//some array
sortedArr = sort(arr);
left = start;
right= arr.length;
while(left < right)
{
   x = arr[left];
   y = arr[right];
   sum = x+y;
   if(sum == T)
      found=true;
   if(sum > T)
      right--;
   if(sum < T)
      left++;  
   
   if(found)
    print "pair", x, T-x;
}

3rd and Best - Use hash table
I have already mentioned in problem in the application of hash table here.
The best way would be to insert every element into a hash table (without sorting). This takes O(n) as constant time insertion.
Then for every x, we can just look up its complement, T-x, which is O(1).
Overall it takes will be O(n).

Here is how the pseudocode will look.
Let arr be the given array.
And T be the given sum

for (i=0 i<arr.length - 1 ;i++)
{
  hash(arr[i]) = i  // key is the element and value is its index.
}

for (i=0 i<arr.length - 1; i++)
{
  if (hash(T - arr[i]) != i ) // if T - ele exists and is different we found a pair
    print "pair i , hash(T - arr[i]) has sum T"
  
}


Please let us know if you know any other approach. Thanks.

Reactions:

15 comments:

  1. how to build the hash table in the approach 3. also i think you should check if hash(T-arr[i]) is empty first.

    ReplyDelete
    Replies
    1. Hi Jianchen, I think we can use hashmap or we can use element as an index in the array. I think only way hash() is empty when the given array is null or has no element. In that case we will not enter into any for loop.

      -Kinshuk

      Delete
  2. This comment has been removed by the author.

    ReplyDelete
  3. What if there are duplicates in the array? We should keep track of the number of occurrences of each number and update as we go.

    ReplyDelete
    Replies
    1. Still it work, as we have to just find the sum. If 1 element is in the hashmap, and say other element repeats, we won't care as their sum will not be equal to given sum T.

      Delete
  4. I don't want to find unique pairs,I want all the numbers that form the sum.
    eg: array = {5, 6, 7, 4, 1, 3, 2, 8, 9, 10} and T = 15
    result:
    1+9+5=15
    5+10=15
    6+4+5=15
    2+8+5=15
    3+2+6+4=15 ... etc

    ReplyDelete
    Replies
    1. Hi Vikram, this is a different problem all together. Please refer here for the solution - http://codereview.stackexchange.com/questions/56270/list-all-possible-numbers-from-a-given-array-that-sums-up-to-a-given-number.

      Delete
  5. The hashmap one is a fucking genius. Got excited a bit :x

    ReplyDelete
  6. The hashmap solution will double print many pairs. How do you handle that?

    ReplyDelete
    Replies
    1. Hi Anonymous, we will break out of the loop as soon as find the first pair :). Thanks

      Delete
  7. How to handle if negative & positive numbers exist and T(positive)?

    ReplyDelete
    Replies
    1. This problem will not occur if you use hashmap, instead of array. But here is a solution via array approach.
      Then we can create 2 arrays, hashN and hashP for negative and positive numbers respectively. While inserting, we can have something like hashN[absolute(arr[i])] = arr[i], and TP will fill normally, without absolute i.e. hashP[arr[i]] = arr[i]. Now iterate over the array arr again, and if (arr[i] > T) , then you need negative number, so refer hashN[T-arr[i]], if it exists then you got the number otherwise its a problem. Please let me know if I wasn't clear enough.

      - Kinshuk

      Delete
  8. hi, i think the last solution also takes O(n^2) you are making hashtable.get operation inside for loop hash table search also has worst case of of O(n) for search operation... so overall it will be O(n^2)

    ReplyDelete
    Replies
    1. Hi Janardan, creation of hashtable takes O[n] time. Getting a single element in hashtable takes O(1) times, hence as we are in the loop, that takes another O(n) time, hence O(n) as whole. Please let me know if that works for you.

      Delete