- Title
Systematic data quality assessment of electronic health record data to evaluate study-specific fitness: Report from the PRESERVE research study.
- Authors
Razzaghi, Hanieh; Goodwin Davies, Amy; Boss, Samuel; Bunnell, H. Timothy; Chen, Yong; Chrischilles, Elizabeth A.; Dickinson, Kimberley; Hanauer, David; Huang, Yungui; Ilunga, K. T. Sandra; Katsoufis, Chryso; Lehmann, Harold; Lemas, Dominick J.; Matthews, Kevin; Mendonca, Eneida A.; Morse, Keith; Ranade, Daksha; Rosenman, Marc; Taylor, Bradley; Walters, Kellie
- Abstract
Study-specific data quality testing is an essential part of minimizing analytic errors, particularly for studies making secondary use of clinical data. We applied a systematic and reproducible approach for study-specific data quality testing to the analysis plan for PRESERVE, a 15-site, EHR-based observational study of chronic kidney disease in children. This approach integrated widely adopted data quality concepts with healthcare-specific evaluation methods. We implemented two rounds of data quality assessment. The first produced high-level evaluation using aggregate results from a distributed query, focused on cohort identification and main analytic requirements. The second focused on extended testing of row-level data centralized for analysis. We systematized reporting and cataloguing of data quality issues, providing institutional teams with prioritized issues for resolution. We tracked improvements and documented anomalous data for consideration during analyses. The checks we developed identified 115 and 157 data quality issues in the two rounds, involving completeness, data model conformance, cross-variable concordance, consistency, and plausibility, extending traditional data quality approaches to address more complex stratification and temporal patterns. Resolution efforts focused on higher priority issues, given finite study resources. In many cases, institutional teams were able to correct data extraction errors or obtain additional data, avoiding exclusion of 2 institutions entirely and resolving 123 other gaps. Other results identified complexities in measures of kidney function, bearing on the study's outcome definition. Where limitations such as these are intrinsic to clinical data, the study team must account for them in conducting analyses. This study rigorously evaluated fitness of data for intended use. The framework is reusable and built on a strong theoretical underpinning. 
Significant data quality issues that would have otherwise delayed analyses or made data unusable were addressed. This study highlights the need for teams combining subject-matter and informatics expertise to address data quality when working with real world data.
Author summary: It is important to understand how the data used in a study meet the needs of analyses, especially when the study is reusing data collected for clinical care, such as electronic health records, and repurposed for analyses, rather than collecting new data. This effort, which we term data quality assessment (DQA), currently varies widely across studies. We describe how a process for systematic data quality assessment that is based on principles from informatics theory and practices from healthcare research can be applied to large, multicenter studies, using a study of chronic kidney disease as an example. We performed DQA in two rounds, first remote and then on the analytic dataset, showing that each approach identifies many data quality issues that can affect study results. Using our results, institutional data teams were able to fix over 120 issues, including 2 sites that would otherwise have needed to drop out of the study. The study team used the results to make final definitions of analytic variables, and can account for other intrinsic data problems through statistical methods. This standardized approach, including design, organized results visualization, and interaction with data providers and statisticians, can be replicated for other clinical studies.
- Subjects
KIDNEY physiology; VITAL signs; PROTEINS; RESEARCH funding; COMPUTER software; CREATININE; DATABASE management; SCIENTIFIC observation; HYPERTENSION; TREATMENT effectiveness; COMPUTER science; RETROSPECTIVE studies; EXPERIMENTAL design; RESEARCH bias; ELECTRONIC health records; INFORMATION science; DATA quality; CHRONIC kidney failure in children; QUALITY assurance; ANTHROPOMETRY; CALIBRATION; RELIABILITY (Personality trait); SENSITIVITY & specificity (Statistics); EVALUATION
- Publication
PLoS Digital Health, 2024, Vol 3, Issue 6, p1
- ISSN
2767-3170
- Publication type
Article
- DOI
10.1371/journal.pdig.0000527