- Title
A new statistic for efficient detection of repetitive sequences.
- Authors
Chen, Sijie; Chen, Yixin; Sun, Fengzhu; Waterman, Michael S; Zhang, Xuegong
- Abstract
Motivation: Detecting sequences that contain repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for various types of repeat detection, but an efficient, generic method for detecting most types of repetitive sequences is still desirable. Inspired by the excellent properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic, D2R, that can efficiently discriminate between sequences with and without repetitive regions.
Results: Using this statistic, we developed an algorithm with linear time and space complexity for detecting most types of repetitive sequences in multiple scenarios, including finding candidate clustered regularly interspaced short palindromic repeats (CRISPR) regions in bacterial genomic or metagenomic sequences. Experiments on simulated and real data show that the method works well on both assembled sequences and unassembled short reads.
Availability and implementation: The code is available at https://github.com/XuegongLab/D2R%5fcodes under the GPL 3.0 license.
Supplementary information: Supplementary data are available at Bioinformatics online.
- Subjects
Vector spaces; Spacetime; Sequence analysis; Comparative studies
- Publication
Bioinformatics, 2019, Vol. 35, Issue 22, p. 4596
- ISSN
1367-4803
- Publication type
Article
- DOI
10.1093/bioinformatics/btz262
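The abstract builds on the D2 family of alignment-free statistics. For orientation, the classic D2 statistic between two sequences is the sum, over every k-word w, of the product of its counts in each sequence, X_w * Y_w. The Python sketch below computes it with k-word count tables. It is NOT the paper's D2R statistic, whose exact definition is given in the article. The helper `self_match_pairs` is a hypothetical illustration of why k-word counts separate repetitive from non-repetitive sequences: it counts pairs of positions within one sequence that start identical k-words, which grows quickly when a sequence contains repeats.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-word in seq, one pass over the n - k + 1 start positions."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: str, y: str, k: int) -> int:
    """Classic D2 statistic: sum over shared k-words w of X_w * Y_w."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[w] for w, n in cx.items() if w in cy)


def self_match_pairs(seq: str, k: int) -> int:
    """Hypothetical repetitiveness score (NOT the paper's D2R): the number of
    unordered pairs of positions in seq that start identical k-words."""
    return sum(n * (n - 1) // 2 for n in kmer_counts(seq, k).values())


if __name__ == "__main__":
    print(d2("ACGT", "ACGT", 2))                # shared 2-words AC, CG, GT -> 3
    print(self_match_pairs("ACGTACGTACGT", 4))  # tandem repeat -> 6
    print(self_match_pairs("ACGTTGCA", 4))      # no repeated 4-word -> 0
```

A tandem repeat such as `ACGTACGTACGT` yields many matching k-word pairs, while a sequence without repeated k-words scores zero. The statistic in the paper additionally accounts for what is expected by chance, which this illustration does not attempt.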