- Title
qc3C: Reference-free quality control for Hi-C sequencing data.
- Authors
DeMaere, Matthew Z.; Darling, Aaron E.
- Abstract
Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and, more recently, the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while the key post-sequencing quality indicators have thus far relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, where an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is, to our knowledge, the first to implement reference-free Hi-C QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples; it also provides reference-based QC. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.
Author summary: The Hi-C sequencing technique offers the potential for significant scientific insight about the spatial arrangement of DNA; however, achieving such outcomes is highly dependent on the quality of the resulting sequencing library. Unlike conventional next-gen sequencing, only a fraction of a given Hi-C library contains this useful spatial information (the signal), with the remainder being effectively noise. As Hi-C remains a challenging laboratory technique, the signal strength of resulting libraries can vary greatly. As a quality metric, the quantification of a library's signal content is an essential asset in any quality mitigation strategy. Quality assessment of Hi-C data has until now relied on access to an (ideally refined) reference sequence, by which indirect indicators of quality are determined. Here we describe qc3C, a software tool capable of the direct, reference-free estimation of the signal content of a Hi-C library. With this estimate, researchers can make informed decisions on how to progress based on library information content, and eliminating the reference also enables Hi-C quality management for non-model organism and metagenomics researchers.
- Subjects
NUCLEOTIDE sequencing; PROBLEM solving; SPATIAL arrangement; TOTAL quality management; SOFTWARE development tools
- Publication
PLoS Computational Biology, 2021, Vol 17, Issue 10, p1
- ISSN
1553-734X
- Publication type
Article
- DOI
10.1371/journal.pcbi.1008839