Do, Kieu Trinh; Wahl, Simone; Raffler, Johannes; Molnos, Sophie; Laimighofer, Michael; Adamski, Jerzy; Suhre, Karsten; Strauch, Konstantin; Peters, Annette; Gieger, Christian; Langenberg, Claudia; Stewart, Isobel D.; Theis, Fabian J.; Grallert, Harald; Kastenmüller, Gabi; Krumsiek, Jan

doi:10.1007/s11306-018-1420-2

Title: Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies.
Authors: Do, Kieu Trinh; Wahl, Simone; Raffler, Johannes; Molnos, Sophie; Laimighofer, Michael; Adamski, Jerzy; Suhre, Karsten; Strauch, Konstantin; Peters, Annette; Gieger, Christian; Langenberg, Claudia; Stewart, Isobel D.; Theis, Fabian J.; Grallert, Harald; Kastenmüller, Gabi; Krumsiek, Jan
Abstract: Background: Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values and strategies to handle these data has received little attention. Missing data can occur systematically, e.g. from run day-dependent effects due to limits of detection (LOD); or it can be random as, for instance, a consequence of sample preparation. Methods: We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci. Results: Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable. Conclusion: Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend that KNN-based imputation is performed on observations with variable pre-selection since it showed robust results in all evaluation schemes.
Subjects: MASS spectrometry; DETECTION limit; K-nearest neighbor classification; SIMULATION methods & models; STATISTICAL correlation
Publication: Metabolomics, 2018, Vol 14, Issue 10, p1
ISSN: 1573-3882
Publication type: Article
DOI: 10.1007/s11306-018-1420-2
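
Illustrative sketch (not taken from the article): the abstract recommends KNN imputation on observations with variable pre-selection but does not give the procedure. The Python below is one plausible reading of that idea, under stated assumptions: for each variable with missing values, keep only the other variables whose pairwise-complete Pearson correlation with it reaches a threshold, find the k nearest samples using those variables, and fill the gap with the unweighted mean of the neighbors' values. The function names, the threshold of 0.2, the root-mean-square distance, and the unweighted average are all hypothetical choices made for this example; the authors' actual implementation may differ, and real data would normally be scaled first.

```python
import numpy as np


def pairwise_corr(X):
    """Pearson correlation between columns, using only rows observed in both."""
    n_vars = X.shape[1]
    corr = np.full((n_vars, n_vars), np.nan)
    for a in range(n_vars):
        for b in range(n_vars):
            both = ~np.isnan(X[:, a]) & ~np.isnan(X[:, b])
            if both.sum() >= 3:
                xa, xb = X[both, a], X[both, b]
                if xa.std() > 0 and xb.std() > 0:
                    corr[a, b] = np.corrcoef(xa, xb)[0, 1]
    return corr


def knn_impute_obs_sel(X, k=3, min_abs_corr=0.2):
    """Fill NaNs in X (samples x variables) from the k nearest samples.

    Assumption: neighbors are searched only over variables whose absolute
    correlation with the variable being imputed is at least min_abs_corr.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    corr = pairwise_corr(X)
    n_vars = X.shape[1]
    for j in range(n_vars):
        missing_rows = np.where(np.isnan(X[:, j]))[0]
        donors = np.where(~np.isnan(X[:, j]))[0]
        if missing_rows.size == 0 or donors.size == 0:
            continue
        # Variable pre-selection (NaN correlations compare as False).
        selected = [v for v in range(n_vars)
                    if v != j and abs(corr[j, v]) >= min_abs_corr]
        for i in missing_rows:
            dists = []
            for d in donors:
                shared = [v for v in selected
                          if not np.isnan(X[i, v]) and not np.isnan(X[d, v])]
                if shared:
                    diff = X[i, shared] - X[d, shared]
                    dists.append((float(np.sqrt(np.mean(diff ** 2))), int(d)))
            if not dists:
                # Fallback when no comparable donor exists: column mean.
                out[i, j] = np.nanmean(X[:, j])
                continue
            dists.sort()
            nearest = [d for _, d in dists[:k]]
            out[i, j] = X[nearest, j].mean()
    return out


# Toy data (made up for illustration): one missing value in the last column.
X_demo = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, np.nan],
    [10.0, 20.0, 30.0],
])
imputed = knn_impute_obs_sel(X_demo, k=2)
print(imputed[3, 2])
```

In this toy case the two samples closest to the incomplete one supply the fill value, and observed entries are left unchanged. Averaging neighbors without weights is a simplification; distance-weighted averaging is a common alternative.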