- Title
Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction.
- Authors
Yao Zhou; M. Isabel Vales; Aoxue Wang; Zhiwu Zhang
- Abstract
Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of correlations across folds. The other approach, Hold accuracy, predicts all phenotypes in all folds and calculates the correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula.
- Subjects
GENOMICS; STATISTICAL correlation; ACCURACY; PREDICTION models; PHENOTYPES
- Publication
Briefings in Bioinformatics, 2017, Vol 18, Issue 5, p744
- ISSN
1467-5463
- Publication type
Article
- DOI
10.1093/bib/bbw064
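
The two calculations named in the abstract, Instant accuracy and Hold accuracy, can be sketched in Python as follows. This is a minimal illustration assuming NumPy is available; the function names, the `fold_ids` labels, and the toy data are hypothetical and are not taken from the paper. The sketch does not reproduce the modified formula the authors propose, and the toy data only show that the two calculations can disagree on the same predictions; they do not demonstrate the bias itself.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])


def instant_accuracy(observed, predicted, fold_ids):
    """Mean of the per-fold correlations (the 'Instant' approach)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    fold_ids = np.asarray(fold_ids)
    per_fold = [
        pearson(observed[fold_ids == f], predicted[fold_ids == f])
        for f in np.unique(fold_ids)
    ]
    return float(np.mean(per_fold))


def hold_accuracy(observed, predicted):
    """One correlation over all held-out predictions (the 'Hold' approach)."""
    return pearson(observed, predicted)


# Hypothetical toy data: two folds of three individuals each.
# Within each fold the predictions track the observations perfectly,
# but the second fold's predictions are shifted by a constant offset.
observed = [1, 2, 3, 1, 2, 3]
predicted = [1, 2, 3, 11, 12, 13]
fold_ids = [0, 0, 0, 1, 1, 1]

instant = instant_accuracy(observed, predicted, fold_ids)
hold = hold_accuracy(observed, predicted)
print(f"Instant accuracy: {instant:.3f}")
print(f"Hold accuracy:    {hold:.3f}")
```

In this toy case the per-fold correlations are unaffected by the offset between folds, while the pooled correlation is pulled down by it, which is why the two summaries need not agree.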