- Title
DNACLUST: accurate and efficient clustering of phylogenetic marker genes.
- Authors
Ghodsi, Mohammadreza; Liu, Bo; Pop, Mihai
- Abstract
Clustering is a fundamental operation in the analysis of biological sequence data. New DNA sequencing technologies have dramatically increased the rate at which we can generate data, resulting in datasets that cannot be efficiently analyzed by traditional clustering methods. This is particularly true in the context of taxonomic profiling of microbial communities through direct sequencing of phylogenetic markers (e.g. 16S rRNA) - the domain that motivated the work described in this paper. Many analysis approaches rely on an initial clustering step aimed at identifying sequences that belong to the same operational taxonomic unit (OTU). When defining OTUs (which have no universally accepted definition), scientists must balance a trade-off between computational efficiency and biological accuracy, as accurately estimating an environment's phylogenetic composition requires computationally intensive analyses. We propose that efficient and mathematically well-defined clustering methods can benefit existing taxonomic profiling approaches in two ways: (i) the resulting clusters can be substituted for OTUs in certain applications; and (ii) the clustering effectively reduces the size of the datasets that need to be analyzed by complex phylogenetic pipelines (e.g., only one sequence per cluster needs to be provided to downstream analyses).
- Publication
BMC Bioinformatics, 2011, Vol 12, p271
- ISSN
1471-2105
- Publication type
Journal Article
- DOI
10.1186/1471-2105-12-271