Graph Based Algorithm (GBA)

Proposed algorithm improvements

November 30, 2007 · Leave a Comment

Occasionally, GBA yields LCRs that appear fragmented. This flaw has been attributed to some lack of optimization in the steps the algorithm performs to find the LCRs in a sequence. Two improvements to two separate steps in the GBA algorithm have been proposed to remedy the situation.

First, note the four step process GBA uses to determine LCRs:

  1. Construct a graph
  2. Find the longest path
  3. Extend longest path intervals
  4. Post-process extended intervals

More details on each of these steps may be found in Xuehui Li’s paper listed in the Publications section.

The first proposed improvement pertains to the first step in the GBA algorithm. In the current set-up, a vertex is created whenever two letters in a sequence are the same as, or similar to, each other. Edges are inserted between two vertices if the combination of the first letters from each vertex and the combination of the second letters from each vertex form a repeat. The improvement to this process defines each vertex as a k-gram sequence similarity rather than just a 2-letter similarity.

The second proposed improvement takes aim at finding the longest path. Rather than use a purely greedy method for jumping from a source node to a destination (by traveling along the longest edge), the improvement suggests an N-lookahead before picking each edge on the path. So, for each node the improved algorithm will calculate which route yields the longest path N nodes down the line. It chooses the first edge on this path to follow, then looks ahead again. Note that while this method will add a certain amount of foresight when approximating the longest path, it will also add complexity that increases exponentially with the value of N.

Categories: Uncategorized

0 responses so far ↓

  • There are no comments yet...Kick things off by filling out the form below.

Leave a Comment