- Title
Precious2GPT: the combination of multiomics pretrained transformer and conditional diffusion for artificial multi-omics multi-species multi-tissue sample generation.
- Authors
Sidorenko, Denis; Pushkov, Stefan; Sakip, Akhmed; Leung, Geoffrey Ho Duen; Lok, Sarah Wing Yan; Urban, Anatoly; Zagirova, Diana; Veviorskiy, Alexander; Tihonova, Nina; Kalashnikov, Aleksandr; Kozlova, Ekaterina; Naumov, Vladimir; Pun, Frank W.; Aliper, Alex; Ren, Feng; Zhavoronkov, Alex
- Abstract
Synthetic data generation in omics mimics real-world biological data, providing alternatives for training and evaluation of genomic analysis tools, controlling differential expression, and exploring data architecture. We previously developed Precious1GPT, a multimodal transformer trained on transcriptomic and methylation data, along with metadata, for predicting biological age and identifying dual-purpose therapeutic targets potentially implicated in aging and age-associated diseases. In this study, we introduce Precious2GPT, a multimodal architecture that integrates Conditional Diffusion (CDiffusion) and decoder-only Multi-omics Pretrained Transformer (MoPT) models trained on gene expression and DNA methylation data. Precious2GPT excels in synthetic data generation, outperforming Conditional Generative Adversarial Networks (CGANs), CDiffusion, and MoPT. We demonstrate that Precious2GPT is capable of generating representative synthetic data that captures tissue- and age-specific information from real transcriptomics and methylomics data. Notably, Precious2GPT surpasses other models in age prediction accuracy using the generated data, and it can generate data beyond 120 years of age. Furthermore, we showcase the potential of using this model in identifying gene signatures and potential therapeutic targets in a colorectal cancer case study.
- Subjects
COMPUTER simulation; DATABASE management; TISSUES; DATA analysis; COMPUTER software; MULTIOMICS; RESEARCH evaluation; BENCHMARKING (Management); PROBABILITY theory; COLORECTAL cancer; AGE distribution; MANN Whitney U Test; META-analysis; GENE expression; DNA methylation; CELL lines; GENES; DEEP learning; ANIMAL experimentation; GENE expression profiling; AGING; STATISTICS; ARTIFICIAL neural networks; COLLECTION & preservation of biological specimens; GENETIC mutation; DIGITAL image processing; REGRESSION analysis
- Publication
NPJ Aging, 2024, Vol 10, Issue 1, p1
- ISSN
2731-6068
- Publication type
Article
- DOI
10.1038/s41514-024-00163-3