Hirschberg's algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the least cost sequence alignment between two strings, where cost is measured as Levenshtein edit distance, defined to be the sum of the costs of insertions, replacements, deletions, and null actions needed to change one string to the other. Hirschberg's algorithm is simply described as a divide and conquer version of the NeedlemanWunsch algorithm.
The NeedlemanWunsch algorithm finds an optimal alignment in O(nm) time, using O(nm) space. Hirschberg's algorithm is a clever modification of the NeedlemanWunsch Algorithm which still takes O(nm) time, but needs only O(min{m,n}) space.
Understanding algorithm
To understand Hirschberg's algorithm, it is important to first understand that edit distances can be computed using linear space.
What we call the "forward subprogram" computes the values of Edit(Pref[x,i],Pref[y,j]) for all i and j, just as the NeedlemanWunsch and returns the array {Edit(x,Pref[y,j])}0 ≤ j ≤ m. The "backward subprogram" is similar, except that the dynamic program is done in the opposite direction, i.e., starting from the right ends of the strings. It returns the array {Edit(x,Suff[y,j])}0 ≤ j ≤ m, where Suff[y,j] is the suffix of y of length j.
The forward and backward subprograms compute values in the matrix by using previously computed values, but save space by erasing values that will no longer be needed during that execution of the subprogram. Unfortunately, the erased values will need to be recomputed later; thus, Hirschberg's algorithm takes roughly twice as much time as NeedlemanWunsch.
Pseudocode
Forwards[x,y]
Backwards[x,y]
//High level description of Hirschberg's algorithm:
Hirschberg(x,y)
The NeedlemanWunsch algorithm finds an optimal alignment in O(nm) time, using O(nm) space. Hirschberg's algorithm is a clever modification of the NeedlemanWunsch Algorithm which still takes O(nm) time, but needs only O(min{m,n}) space.
 The edit distance Edit(x,y) is the least cost of changing x into y by using the operations Insert, Substitute, and Delete, where the cost of each kind of operation is given.
 We write Ins(a) for the cost of inserting a symbol a into a string, Sub(a,b) for the cost of substituting a symbol b for a in a string, and Del(a) for the cost of deleting a symbol a from a string.
 Pref[x,i] denotes the prefix of x of length i.
Understanding algorithm
To understand Hirschberg's algorithm, it is important to first understand that edit distances can be computed using linear space.
What we call the "forward subprogram" computes the values of Edit(Pref[x,i],Pref[y,j]) for all i and j, just as the NeedlemanWunsch and returns the array {Edit(x,Pref[y,j])}0 ≤ j ≤ m. The "backward subprogram" is similar, except that the dynamic program is done in the opposite direction, i.e., starting from the right ends of the strings. It returns the array {Edit(x,Suff[y,j])}0 ≤ j ≤ m, where Suff[y,j] is the suffix of y of length j.
The forward and backward subprograms compute values in the matrix by using previously computed values, but save space by erasing values that will no longer be needed during that execution of the subprogram. Unfortunately, the erased values will need to be recomputed later; thus, Hirschberg's algorithm takes roughly twice as much time as NeedlemanWunsch.
Pseudocode
Forwards[x,y]

Forwards[x,y] is

1. n = x; m = y

2. For all i from 1 to n:

Edit[Pref[x,i],Pref[y,0]] = 0

3. For all j from 1 to m:

A. Edit[Pref[x,0],Pref[y,j]] = Edit[Pref[x,0],Pref[y,j1]] + Ins(y_j)

B. For all i from 1 to n, execute the following steps:

i. Edit[Pref[x,i],Pref[y,j]] =

min{Edit[Pref[x,i1],Pref[y,j]] + Del(x_i),

Edit[Pref[x,i1],Pref[y,j1]] + Sub(x_i,y_j),

Edit[Pref[x,i],Pref[y,j1]] + Ins(y_j)}

ii. Erase Edit[Pref[x,i1],Pref[y,j1]]

C. Erase Edit[Pref[x,i1],Pref[y,j]]

4. RETURN Edit[x] %% an array of length m+1
Backwards[x,y]

1. n = x; m = y

2. For all i from 1 to n:

Edit[Suff[x,i],Suff[y,0]] = 0

3. For all j from 1 to m:

A. Edit[Suff[x,0],Suff[y,j]] = Edit[Suff[x,n],Suff[y,j1]] + Ins(y_{mj+1})

B. For all i from 1 to n:

i. Edit[Suff[x,i],Suff[y,j]] =

min{Edit[Suff[x,i1],Suff[y,j]] + Del(x_{ni1}),

Edit[Suff[x,i1],Suff[y,j1]] + Sub(x_{ni1},y_{mj+1}),

Edit[Suff[x,i],Suff[y,j1]] + Ins(y_{mj+1})}

ii. Erase Edit[Suff[x,i1],Suff[y,j1]]

C. Erase Edit[Suff[x,i1],Suff[y,j]]

4. RETURN Edit[x] %% an array of length m+1
//High level description of Hirschberg's algorithm:
Hirschberg(x,y)

1. n = x; m = y

2. If n <= 1 or m <= 1:

OUTPUT Alignment(x,y) using NeedlemanWunsch.

Else:

A. mid = floor(n/2)

B. x^left = Pref[x,mid]

C. x^right = Suff[x,nmid]

D. Edit[x^left] = Forwards(x^left,y) %% an array of length m+1

E. Edit[x^right] = Backwards(x^right,y) %% an array of length m+1

F. cut = ArgMin{Edit[x^left,Pref[y,j]] + Edit[x^right,Suff[y,mj]]} %% j ranges from 1 to m1

G. Hirschberg(x^left,Pref[y,cut])

H. Hirschberg(x^right,Suff[y,mcut])
0 comments:
Post a Comment