Authors
Niclas Thomas,
James Heather,
Wilfred Ndifon,
Benjamin Chain
Publication date
2013
Publisher
Oxford University Press
Description
Summary: High-throughput sequencing provides an opportunity to analyse the repertoire of antigen-specific receptors with unprecedented breadth and depth. However, the quantity of raw data produced by this technology requires efficient ways to categorize and store the output for subsequent analysis. To this end, we have defined a simple five-item identifier that uniquely and unambiguously defines each TcR sequence. We then describe a novel application of a finite-state automaton to map Illumina short-read sequence data for individual TcRs to their respective identifiers. We also describe an extension of the standard algorithm that allows for single-base-pair mismatches arising from sequencing error. The software package, named Decombinator, is tested first on a set of artificial in silico sequences and then on a set of published human TcR-β sequences. Decombinator assigned …
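The summary does not spell out which automaton is used. The Python below is a minimal sketch under that uncertainty. It assumes the automaton is an Aho-Corasick multi-pattern matcher run over short gene-specific "tag" substrings, and it makes every match in one left-to-right pass over a read. The tag sequences and gene names in the test (`TRBV-a`, `TRBV-b`) and the helper names `AhoCorasick` and `find_tag` are hypothetical placeholders, not part of Decombinator. The single-mismatch fallback splits each tag into two halves. Because one mismatch leaves at least one half intact, an exact hit on a half pins the tag's position, and a Hamming-distance check then confirms it. This pigeonhole trick is an illustrative choice and may differ from the extension the authors describe.

```python
from collections import deque
from typing import Dict, List, Tuple


class AhoCorasick:
    """Minimal Aho-Corasick automaton: reports every occurrence of many
    patterns in a single pass over the text (illustrative sketch)."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie transitions per state
        self.fail: List[int] = [0]              # failure link per state
        self.out: List[List[str]] = [[]]        # patterns recognised at each state
        for p in patterns:
            self._add(p)
        self._build_failure_links()

    def _add(self, pattern: str) -> None:
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self) -> None:
        # Breadth-first, so a state's failure target is finished before the state itself.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure target (suffix patterns).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text: str) -> List[Tuple[int, str]]:
        """Return (start_index, pattern) for every match in text."""
        hits = []
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for p in self.out[state]:
                hits.append((i - len(p) + 1, p))
        return hits


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def find_tag(read: str, tags: Dict[str, str]) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: locate gene tags in a read, tolerating at most
    one mismatch. Returns (gene_name, start, mismatches) tuples."""
    seq_to_gene = {seq: name for name, seq in tags.items()}
    exact = AhoCorasick(list(tags.values()))
    hits = [(seq_to_gene[p], start, 0) for start, p in exact.search(read)]
    if hits:
        return hits
    # Fallback: exact-match tag halves, remembering each half's offset in its tag.
    halves: Dict[str, List[Tuple[str, int]]] = {}
    for name, seq in tags.items():
        mid = len(seq) // 2
        halves.setdefault(seq[:mid], []).append((name, 0))
        halves.setdefault(seq[mid:], []).append((name, mid))
    found = set()
    for start, half in AhoCorasick(list(halves)).search(read):
        for name, offset in halves[half]:
            tag_start = start - offset
            if tag_start < 0:
                continue
            seq = tags[name]
            window = read[tag_start:tag_start + len(seq)]
            if len(window) == len(seq) and hamming(window, seq) <= 1:
                found.add((name, tag_start, hamming(window, seq)))
    return sorted(found)
```

Running the exact automaton first keeps the common, error-free case to a single pass over the read. The slower half-tag search runs only when no tag matches exactly.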