- Title
A self-supervised deep learning method for data-efficient training in genomics.
- Authors
Gündüz, Hüseyin Anil; Binder, Martin; To, Xiao-Yin; Mreches, René; Bischl, Bernd; McHardy, Alice C.; Münch, Philipp C.; Rezaei, Mina
- Abstract
Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models. Self-GenomeNet, a self-supervised learning technique, is trained by predicting unlabeled reverse-complement genome sequences of different lengths and improves the performance of models substantially when a limited amount of labeled data is available.
- Subjects
SUPERVISED learning; DEEP learning; MACHINE learning; GENOMICS; MACHINE performance
- Publication
Communications Biology, 2023, Vol 6, Issue 1, p1
- ISSN
2399-3642
- Publication type
Article
- DOI
10.1038/s42003-023-05310-2
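
The abstract notes that Self-GenomeNet leverages reverse-complement sequences. As a minimal sketch of that operation alone (not the authors' implementation; the function name `reverse_complement` and the handling of non-ACGT symbols are assumptions made for illustration), the reverse complement of a DNA string can be computed as follows:

```python
# Illustrative sketch only; not taken from the Self-GenomeNet code base.
# Maps each nucleotide to its Watson-Crick partner (A<->T, C<->G).
# Characters outside ACGT/acgt (e.g. N) are left unchanged: an assumption.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCC"))  # GGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check on any reverse-complement routine.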