- Title
A COMPARISON OF THREE VARIANT CALLING PIPELINES USING SIMULATED DATA.
- Authors
Nguyen Van Tung; Nguyen Thi Kim Lien; Nguyen Huy Hoang
- Abstract
Advances in next-generation sequencing allow DNA to be sequenced rapidly at relatively low cost. Multiple bioinformatics methods have been developed to identify genomic variants from whole-genome or whole-exome sequencing data. The development of better variant calling methods is limited by the difficulty of assessing the accuracy and completeness of a new method. Computational methods are commonly benchmarked using simulated data, which can be generated in any desired quantity and under controlled scenarios. In this study, we compared three variant calling pipelines, Samtools/VarScan, Samtools/Bcftools, and Picard/GATK, using two simulated datasets. The results showed significant differences among the three pipelines in both datasets. In the Chromosome 6 dataset, the GATK and Bcftools pipelines detected more than 90% of variants, whereas VarScan detected only 82.19% of mutations. In the NA12878 dataset, the GATK pipeline was more sensitive than the Bcftools and VarScan pipelines. All pipelines showed a high positive predictive value. In terms of run time, VarScan ranked highest, although GATK offers a multithreading option that can make it run faster. Therefore, GATK is more effective than Bcftools and VarScan for variant calling on lower-coverage datasets.
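Benchmarking against simulated data works because the true variant set is known in advance, so each pipeline's calls can be scored directly: sensitivity is TP / (TP + FN) and positive predictive value (PPV) is TP / (TP + FP). The abstract does not state how calls were matched to the truth set, so the sketch below assumes an exact match on chromosome, position, reference allele, and alternate allele. The `Variant` and `benchmark` names and the toy variants are hypothetical and illustrative only. Dedicated comparison tools also normalize indel representations, which this sketch does not do.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not from the paper)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def benchmark(truth: Iterable[Variant], calls: Iterable[Variant]) -> dict:
    """Score a call set against a known (simulated) truth set.

    Assumes exact matching on (chrom, pos, ref, alt); no allele normalization.
    """
    truth_set = set(truth)
    call_set = set(calls)
    tp = len(truth_set & call_set)   # simulated variants that were called
    fn = len(truth_set - call_set)   # simulated variants that were missed
    fp = len(call_set - truth_set)   # calls with no matching simulated variant
    sensitivity = tp / (tp + fn) if truth_set else 0.0
    ppv = tp / (tp + fp) if call_set else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "sensitivity": sensitivity, "ppv": ppv}


# Toy example: 5 simulated variants, 4 calls (3 correct, 1 spurious).
truth = [
    Variant("6", 100, "A", "G"),
    Variant("6", 250, "C", "T"),
    Variant("6", 400, "G", "GA"),
    Variant("6", 900, "T", "C"),
    Variant("6", 1200, "AT", "A"),
]
calls = [
    Variant("6", 100, "A", "G"),
    Variant("6", 250, "C", "T"),
    Variant("6", 400, "G", "GA"),
    Variant("6", 555, "A", "C"),
]
result = benchmark(truth, calls)
print(result)
```

Here sensitivity is 3/5 = 0.6 and PPV is 3/4 = 0.75. Using sets makes duplicate calls count once, which is a deliberate simplification.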
- Subjects
DNA; BIOINFORMATICS; GENOMICS; DATA; CHROMOSOMES
- Publication
Journal of Biology / Tạp chí Sinh học, 2021, Vol 43, Issue 2, p46
- ISSN
0866-7160
- Publication type
Article
- DOI
10.15625/2615-9023/16006