Authors
Juho Rousu,
John Shawe-Taylor,
Publication date
2004
Publisher
Total citations
Description
We present a sparse dynamic programming algorithm that, given two strings s, t, a gap penalty λ, and an integer p, computes the value of the gap-weighted length-p subsequences kernel. The algorithm works in time O (p| M| log min (| s|,| t|)), where M={(i, j)| si= tj} is the set of matches of characters in the two sequences. The new algorithm is empirically evaluated against a full dynamic programming approach and a trie-based algorithm on synthetic data. Based on the experiments, the full dynamic programming approach is the fastest on short strings, and on long strings if the alphabet is small. On large alphabets, the new sparse dynamic programming algorithm is the most efficient. On medium-sized alphabets the trie-based approach is best if the maximum number of allowed gaps is strongly restricted.