- Title
Missing values in deduplication of electronic patient data.
- Authors
Sariyar M; Borg A; Pommerening K
- Abstract
Introduction: Systematic approaches to dealing with missing values in record linkage are still lacking. This article compares the ad-hoc treatment of unknown comparison values as 'unequal' with other, more sophisticated approaches. The methods were evaluated empirically on real-world data and on simulated data derived from them.
Material and Methods: Cancer registry data and artificial data with increased numbers of missing values in a relevant variable are used for empirical comparisons. Classification and regression trees were used as the classification method. On the resulting binary comparison patterns, the following strategies for dealing with missingness are considered: imputation with unique values, sample-based imputation, reduced-model classification and complete-case induction. These approaches are evaluated according to the amount of training data needed for induction and the F-scores achieved.
Results: The evaluations reveal that unique value imputation leads to the best results. Imputation with zero is preferred to imputation with 0.5, although the latter shows the highest median F-scores. Imputation with zero needs considerably less training data, shows only slightly worse results, and simplifies the computation by maintaining the binary structure of the data.
Conclusions: The results support the ad-hoc solution for missing values 'replace NA by the value of inequality'. This conclusion is based on a limited amount of data and on a specific deduplication method. Nevertheless, the authors are confident that their results should be confirmed by other empirical analyses and applications.
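As an illustration only, the unique-value imputation described in the abstract can be sketched as follows. This is not the authors' code: the names `impute_unique` and `f_score`, the three-field example patterns, and the encoding of a missing comparison as `None` are all assumptions made for this sketch, and the classification step (the study used classification and regression trees) is left out.

```python
from typing import List, Optional, Sequence

# A comparison pattern for one record pair (hypothetical encoding):
# 1 = field values agree, 0 = they disagree, None = comparison unknown.
Pattern = Sequence[Optional[int]]


def impute_unique(pattern: Pattern, fill: float = 0.0) -> List[float]:
    """Replace every missing comparison value with one fixed value.

    fill=0.0 treats a missing comparison as 'unequal' and keeps the
    pattern binary; fill=0.5 marks it as a separate intermediate value.
    """
    return [fill if value is None else float(value) for value in pattern]


def f_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for the 'duplicate' class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up comparison patterns over three fields.
patterns = [[1, 1, None], [1, 0, 0], [None, 1, 1]]
as_unequal = [impute_unique(p, 0.0) for p in patterns]
as_half = [impute_unique(p, 0.5) for p in patterns]
```

With `fill=0.0` the imputed patterns remain binary, which is the simplification the abstract credits to imputation with zero; with `fill=0.5` each field can take three values.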
- Publication
Journal of the American Medical Informatics Association, 2012, Vol 19, Issue e1, p e76
- ISSN
1067-5027
- Publication type
journal article