- Title
A comparative analysis of methods for predicting clinical outcomes using high-dimensional genomic datasets.
- Authors
Jiang, Xia; Cai, Binghuang; Xue, Diyang; Lu, Xinghua; Cooper, Gregory F.; Neapolitan, Richard E.
- Abstract
Objective: The objective of this investigation is to evaluate binary prediction methods for predicting disease status using high-dimensional genomic data. The central hypothesis is that the Bayesian network (BN)-based method called the efficient Bayesian multivariate classifier (EBMC) will do well at this task because EBMC builds on BN-based methods that have performed well at learning epistatic interactions. Method: We evaluate how well eight methods perform binary prediction using high-dimensional discrete genomic datasets containing epistatic interactions. The methods are naive Bayes (NB), model averaging NB (MANB), feature selection NB (FSNB), EBMC, logistic regression (LR), support vector machines (SVM), Lasso, and extreme learning machines (ELM). Our evaluation uses one hundred simulated datasets of 1000 single-nucleotide polymorphisms (SNPs) each, ten 10 000-SNP datasets, six semi-synthetic sets, and two real genome-wide association study (GWAS) datasets. Results: In fivefold cross-validation studies, the SVM performed best on the 1000-SNP datasets, while the BN-based methods performed best on the other datasets, with EBMC exhibiting the best overall performance. In-sample testing indicates that LR, SVM, Lasso, ELM, and NB tend to overfit the data. Discussion: EBMC performed better than NB when there were several strong predictors, whereas NB performed better when there were many weak predictors. Furthermore, for all BN-based methods, prediction capability did not degrade as the dimension increased. Conclusions: Our results support the hypothesis that EBMC performs well at binary outcome prediction using high-dimensional discrete datasets containing epistatic-like interactions. Future research using more GWAS datasets is needed to further investigate the potential of EBMC.
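The abstract contrasts fivefold cross-validation with in-sample testing, which it uses to show overfitting. The sketch below illustrates that contrast for one of the baseline methods, naive Bayes, on discrete genotype data. It is a generic textbook construction in plain Python, not the authors' implementation and not EBMC. The 0/1/2 genotype coding, Laplace smoothing with alpha = 1, accuracy as the score, and the simulated data are all assumptions made only for illustration.

```python
import random
from collections import defaultdict
from math import log

# Assumed coding (illustration only): number of minor alleles at each SNP.
GENOTYPES = (0, 1, 2)


def fit_naive_bayes(X, y, alpha=1.0):
    """Fit a categorical naive Bayes model for a binary outcome (0/1),
    using additive (Laplace when alpha=1) smoothing."""
    n_features = len(X[0])
    class_counts = {c: 0 for c in (0, 1)}
    # counts[c][j][g] = number of class-c rows whose SNP j has genotype g
    counts = {c: [defaultdict(int) for _ in range(n_features)] for c in (0, 1)}
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, g in enumerate(row):
            counts[label][j][g] += 1
    n = len(y)
    log_prior = {c: log((class_counts[c] + alpha) / (n + 2 * alpha)) for c in (0, 1)}
    log_lik = {
        c: [
            {g: log((counts[c][j][g] + alpha) / (class_counts[c] + alpha * len(GENOTYPES)))
             for g in GENOTYPES}
            for j in range(n_features)
        ]
        for c in (0, 1)
    }
    return log_prior, log_lik


def predict(model, row):
    """Return the class with the higher log posterior (SNPs assumed independent given the class)."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][j][g] for j, g in enumerate(row)) for c in (0, 1)}
    return max(scores, key=scores.get)


def accuracy(model, X, y):
    return sum(predict(model, r) == t for r, t in zip(X, y)) / len(y)


def k_fold_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds; each fold is scored by a model trained on the rest."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_naive_bayes([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k


# Hypothetical simulated data: 200 samples, 50 SNPs; the outcome depends on SNP 0
# with 20% label noise, and the other 49 SNPs are pure noise.
rng = random.Random(1)
X = [[rng.choice(GENOTYPES) for _ in range(50)] for _ in range(200)]
y = [int(row[0] == 2) if rng.random() < 0.8 else int(row[0] != 2) for row in X]

model = fit_naive_bayes(X, y)
in_sample = accuracy(model, X, y)      # scored on the same data used for fitting
cv = k_fold_accuracy(X, y, k=5)        # scored only on held-out folds
print(f"in-sample accuracy: {in_sample:.2f}, 5-fold CV accuracy: {cv:.2f}")
```

In-sample accuracy is typically higher than the cross-validated figure, because the noise SNPs let the model fit idiosyncrasies of the training data. The size of the gap here depends on the seed and the simulated data; it is not a reproduction of the paper's results.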
- Subjects
MEDICAL forecasting; CLINICAL trials monitoring; BAYESIAN analysis; EVALUATION of clinical trials; FORECASTING methodology
- Publication
Journal of the American Medical Informatics Association, 2014, Vol 21, Issue e2, pe312
- ISSN
1067-5027
- Publication type
Article
- DOI
10.1136/amiajnl-2013-002358