- Title
Predicting regional somatic mutation rates using DNA motifs.
- Authors
Liu, Cong; Wang, Zengmiao; Wang, Jun; Liu, Chengyu; Wang, Mengchi; Ngo, Vu; Wang, Wei
- Abstract
How the locus-specificity of epigenetic modifications is regulated remains an unanswered question. A contributing mechanism is that epigenetic enzymes are recruited to specific loci by DNA binding factors recognizing particular sequence motifs (referred to as epi-motifs). Using these motifs to predict biological outputs that depend on local epigenetic state, such as somatic mutation rates, would confirm their functionality. Here, we used DNA motifs, including known TF motifs and epi-motifs, as a surrogate for epigenetic signals to predict somatic mutation rates in 13 cancers at an average resolution of 23 kbp. We implemented an interpretable neural network model, called contextual regression, to successfully learn the universal relationship between mutations and DNA motifs, and uncovered the motifs with the greatest impact on regional mutation rates, such as TP53 and epi-motifs associated with H3K9me3. Furthermore, we identified genomic regions with significantly higher mutation rates than the expected values in each individual tumor and demonstrated that such cancer-related regions can accurately predict cancer types. Interestingly, we found that the same mutation signatures often have different contributions to cancer-related and cancer-independent regions, and we also identified the motifs that contribute most to each mutation signature.
Author summary: Locus-specific epigenetic modifications play critical roles in various biological processes. However, it remains elusive how proteins and their binding motifs regulate such locus-specific epigenetic patterns. A contributing mechanism is that epigenetic enzymes are recruited to specific loci by DNA binding factors recognizing particular sequence motifs (referred to as epi-motifs). Using these motifs to predict biological outputs that depend on local epigenetic state, such as somatic mutation rates, would confirm their functionality. Here, we developed an interpretable neural network model using contextual regression (CR) to predict somatic mutation rates at kilobase resolution using DNA motifs in 13 diverse cancers and identified the most informative motifs, particularly epi-motifs. Furthermore, we showed that the genomic regions with significantly higher mutation rates than the predicted values can be used for cancer classification, thus facilitating discovery of the underlying mechanisms. Importantly, this study provides candidate motifs and TFs for the investigation of new mechanisms, and the trained CR model is readily applicable to new cancers and to identifying cancer-related regions. The CR model can also be applied to other biological questions, such as predicting histone modifications from DNA sequences.
- Subjects
SOMATIC mutation; DNA; EPIGENETICS; TUMOR classification; CARRIER proteins
- Publication
PLoS Computational Biology, 2023, Vol 19, Issue 10, p1
- ISSN
1553-734X
- Publication type
Article
- DOI
10.1371/journal.pcbi.1011536