- Title
Reducing patient re-identification risk for laboratory results within research datasets.
- Authors
Atreya, Ravi V.; Smith, Joshua C.; McCoy, Allison B.; Malin, Bradley; Miller, Randolph A.
- Abstract
Objective: To try to lower patient re-identification risks for biomedical research databases containing laboratory test results while also minimizing changes in clinical data interpretation.
Materials and methods: In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61 280 patients was used. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied: simple random offsets and an expert-derived clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. The re-identification risk and the retention of clinical meaning for each model's perturbed laboratory results were assessed.
Results: Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).
Discussion and conclusion: With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.
- Subjects
RIGHT of privacy; MEDICAL records; MEDICAL databases; CLINICAL pathology; ELECTRONIC health records; MEDICAL record access control; MEDICAL informatics; MANAGEMENT; SECURITY systems
- Publication
Journal of the American Medical Informatics Association, 2013, Vol 20, Issue 1, p95
- ISSN
1067-5027
- Publication type
Article
- DOI
10.1136/amiajnl-2012-001026
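
The threat model and the simple random-offset perturbation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. The uniform offset distribution, the summed-absolute-difference distance, the function names (`perturb`, `rank_candidates`, `reidentification_rate`), and the toy values in `TOY_RESULTS` are all assumptions made for illustration; the abstract says only that offsets were random and that the attack was rank-based.

```python
import random

# Invented toy data: five laboratory values per patient (NOT from the TIME database).
TOY_RESULTS = {
    "p1": [140.0, 4.1, 1.0, 98.0, 13.5],
    "p2": [138.0, 3.9, 0.9, 101.0, 14.2],
    "p3": [142.0, 4.4, 1.2, 95.0, 12.8],
}


def perturb(records, max_offset, rng):
    """Add an independent uniform offset in [-max_offset, max_offset] to each value.

    Assumption: the abstract does not specify the offset distribution or its scale.
    """
    return {
        pid: [value + rng.uniform(-max_offset, max_offset) for value in values]
        for pid, values in records.items()
    }


def rank_candidates(search_key, dataset):
    """Return patient ids ordered from closest to farthest from the search key.

    Assumption: closeness is the sum of absolute differences between values.
    """
    def distance(values):
        return sum(abs(a - b) for a, b in zip(search_key, values))

    return sorted(dataset, key=lambda pid: distance(dataset[pid]))


def reidentification_rate(original, perturbed):
    """Fraction of patients whose own perturbed record is ranked first."""
    hits = sum(
        1
        for pid, search_key in original.items()
        if rank_candidates(search_key, perturbed)[0] == pid
    )
    return hits / len(original)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same result
    released = perturb(TOY_RESULTS, max_offset=0.5, rng=rng)
    print(reidentification_rate(TOY_RESULTS, released))
```

Raising `max_offset` in a sketch like this shows the trade-off the abstract discusses: larger offsets make the top-ranked candidate less likely to be the true patient, but they also move values further from their clinically meaningful ranges.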