The project references and/or makes use of information provided by the following papers. All papers without links can be found in their respective journals/publications.
1. Li,X. and Kahveci,T. (2006) A novel algorithm for identifying low-complexity regions in a protein sequence. Bioinformatics, 22, 2980–2987. (PubMed) (Bioinformatics)
2. Li,X. and Kahveci,T. (2007) Quality-based similarity search for biological sequence databases. BIOCOMP 2007. (full text)
3. Albà,M.M. et al. (2002) Detecting cryptically simple protein sequences using the SIMPLE algorithm. Bioinformatics, 18, 672–678.
4. Altschul,S. et al. (1990) Basic Local Alignment Search Tool. JMB, 215, 403–410.
5. Bairoch,A. et al. (2004) Swiss-Prot: juggling between evolution and stability. Brief. Bioinform., 5, 39–55.
6. Benson,G. (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 27, 573–580.
7. Claverie,J.-M. and States,D. (1993) Information enhancement methods for large scale sequence analysis. Comput. Chem., 17, 191–201.
8. Dijkstra,E. (1959) A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269–271.
9. Drake,J. (1999) The distribution of rates of spontaneous mutation over viruses, prokaryotes, and eukaryotes. Ann. NY Acad. Sci., 870, 100–107.
10. Gilbert,A.C. et al. (2001) Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In Proc. VLDB 2001, 79–88.
11. Hancock,J. and Simon,M. (2005) Simple sequence repeats in proteins and their potential role in network evolution. Gene, 345, 113–118.
12. Henikoff,S. and Henikoff,J. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl Acad. Sci. USA, 89, 10915–10919.
13. Drake,J. et al. (1998) Rates of spontaneous mutation. Genetics, 148, 1667–1686.
14. Kurtz,S. and Schleiermacher,C. (1999) REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics, 15, 426–427.
15. Lanz,R. et al. (1995) A transcriptional repressor obtained by alternative translation of a trinucleotide repeat. Nucleic Acids Res., 23, 138–145.
16. Leadbetter,M.R., Lindgren,G. and Rootzén,H. (1983) Extremes and Related Properties of Random Sequences and Processes, Chapter 1. Springer.
17. Nandi,T. et al. (2003) A novel complexity measure for comparative analysis of protein sequences from complete genomes. J. Biomol. Struct. Dyn., 20, 657–668.
18. Pevzner,P.A. et al. (2001) An Eulerian path approach to DNA fragment assembly. Proc. Natl Acad. Sci. USA, 98, 9748–9753.
19. Promponas,V. et al. (2000) CAST: an iterative algorithm for the complexity analysis of sequence tracts. Bioinformatics, 16, 915–922.
20. Shannon,C. (1951) Prediction and entropy of printed English. Bell Syst. Tech. J., 30, 50–64.
21. Shin,S.W. and Kim,S.M. (2005) A new algorithm for detecting low-complexity regions in protein sequences. Bioinformatics, 21, 160–170.
22. Smith,T. and Waterman,M. (1981) Identification of common molecular subsequences. JMB, 147, 195–197.
23. Wan,H. et al. (2003) Discovering simple regions in biological sequences associated with scoring schemes. JCB, 10, 171–185.
24. Wise,M.J. (2001) 0j.py: a software tool for low complexity proteins and protein domains. Bioinformatics, 17, S288–S295.
25. Wootton,J. and Federhen,S. (1996) Analysis of compositionally biased regions in sequence databases. Methods Enzymol., 266, 554–571.
26. Wootton,J. (1994) Sequences with ‘unusual’ amino acid compositions. Curr. Opin. Struct. Biol., 4, 413–421.
27. Other papers related to and/or referenced by the papers listed above