- Title
Recognition of cyanobacteria promoters via Siamese network-based contrastive learning under novel non-promoter generation.
- Authors
Yang, Guang; Li, Jianing; Hu, Jinlu; Shi, Jian-Yu
- Abstract
It is a vital step to recognize cyanobacteria promoters on a genome-wide scale. Computational methods are promising to assist in difficult biological identification. When building recognition models, these methods rely on non-promoter generation to cope with the lack of real non-promoters. Nevertheless, the factitious significant difference between promoters and non-promoters causes over-optimistic prediction. Moreover, designed for E. coli or B. subtilis, existing methods cannot uncover novel, distinct motifs among cyanobacterial promoters. To address these issues, this work first proposes a novel non-promoter generation strategy called phantom sampling, which can eliminate the factitious difference between promoters and generated non-promoters. Furthermore, it elaborates a novel promoter prediction model based on the Siamese network (SiamProm), which can amplify the hidden difference between promoters and non-promoters through a joint characterization of global associations, upstream and downstream contexts, and neighboring associations w.r.t. k-mer tokens. The comparison with state-of-the-art methods demonstrates the superiority of our phantom sampling and SiamProm. Both comprehensive ablation studies and feature space illustrations also validate the effectiveness of the Siamese network and its components. More importantly, SiamProm, upon our phantom sampling, finds a novel cyanobacterial promoter motif ('GCGATCGC'), which is palindrome-patterned, content-conserved, but position-shifted.
- Subjects
ESCHERICHIA coli; PREDICTION models
- Publication
Briefings in Bioinformatics, 2024, Vol 25, Issue 3, p1
- ISSN
1467-5463
- Publication type
Article
- DOI
10.1093/bib/bbae193
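
The abstract refers to k-mer tokens and to a palindrome-patterned, position-shifted motif ('GCGATCGC'). The sketch below is only an illustration of those two ideas and is not taken from the paper: it assumes overlapping k-mers with stride 1, and it assumes that "palindrome" means a reverse-complement palindrome, the usual sense for DNA. The function names `kmer_tokens`, `reverse_complement`, `is_rc_palindrome`, and `motif_positions` are hypothetical helpers, not part of SiamProm.

```python
# Illustrative helpers only; not the SiamProm implementation.

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_tokens(seq, k):
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_rc_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def motif_positions(seq, motif):
    """Return every 0-based start index at which motif occurs in seq."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


if __name__ == "__main__":
    motif = "GCGATCGC"
    print(kmer_tokens(motif, 3))
    print(is_rc_palindrome(motif))
    # The two toy sequences below are made up to show a shifted motif position.
    print(motif_positions("TTGCGATCGCAA", motif))
    print(motif_positions("AAAAAGCGATCGCT", motif))
```

Reporting start indices rather than a yes/no match is what makes a position-shifted motif visible: the content is the same in each sequence while the offset differs.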