- Title
MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks.
- Authors
Pang, Chao; van Enckevort, David; de Haan, Mark; Kelpin, Fleur; Jetten, Jonathan; Hendriksen, Dennis; de Boer, Tommy; Charbon, Bart; Winder, Erwin; van der Velde, K. Joeri; Doiron, Dany; Fortier, Isabel; Hillege, Hans; Swertz, Morris A.
- Abstract
Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. Results: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. It then generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). Compared with human experts, MOLGENIS/connect auto-generated 27% of the algorithms perfectly, with an additional 46% needing only minor editing, reducing the human effort and expertise needed to pool data.
- Subjects
BIOBANKS; BIOLOGICAL specimens; ACQUISITION of data; ONTOLOGY; PHENOTYPES; GENETICS
- Publication
Bioinformatics, 2016, Vol 32, Issue 14, p2176
- ISSN
1367-4803
- Publication type
Article
- DOI
10.1093/bioinformatics/btw155
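
The three kinds of transformation named in the abstract (unit conversion, categorical value matching, and a derived value such as BMI) can be illustrated with a minimal Python sketch. This is not MOLGENIS/connect code or its generated output. The function names, supported units, and category codes below are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: every name here is hypothetical and not part of
# MOLGENIS/connect. It shows the kind of source-to-target transformation the
# abstract describes, not the system's actual generated algorithms.

LB_TO_KG = 0.45359237  # exact definition of the international pound in kg


def to_kg(value, unit):
    """Unit conversion: return a weight in kilograms."""
    if unit == "kg":
        return value
    if unit == "lb":
        return value * LB_TO_KG
    raise ValueError(f"unsupported weight unit: {unit}")


def to_m(value, unit):
    """Unit conversion: return a height in metres."""
    if unit == "m":
        return value
    if unit == "cm":
        return value / 100.0
    raise ValueError(f"unsupported height unit: {unit}")


def bmi(weight, weight_unit, height, height_unit):
    """Derived attribute: BMI = weight (kg) / height (m) squared."""
    kg = to_kg(weight, weight_unit)
    m = to_m(height, height_unit)
    return kg / (m * m)


def match_category(source_value, mapping):
    """Categorical value matching: map a source code to a target code."""
    if source_value not in mapping:
        raise KeyError(f"no target category for source value: {source_value!r}")
    return mapping[source_value]


# Hypothetical mapping from one source's sex coding to a target coding.
SEX_SOURCE_TO_TARGET = {"M": "male", "F": "female"}
```

For example, `bmi(70, "kg", 175, "cm")` converts the height to 1.75 m before dividing, and `match_category("F", SEX_SOURCE_TO_TARGET)` returns the target code `"female"`. Raising an error on unknown units or codes, rather than guessing, keeps unmatched values visible for manual review.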