- Title
Enhanced Index Based DNA Sequence Compression Algorithm.
- Authors
Gurunathan, Arunachalaprabu; Moideen, Fathima Bibi Kaja
- Abstract
Biological data analyses involve researchers from several fields, and storing and manipulating the huge volume of biological data obtained from these different sources is difficult. Compression algorithms considerably increase the effective capacity of a storage medium by reducing the number of bits required to represent a sequence. This paper proposes the Enhanced Index Based DNA Sequence Compression Algorithm (EIBDNASCA). Its core concept is an optimized index file that stores the non-repetitive bases (run length less than 8). This index file plays a crucial role in quickly retrieving and reconstructing specific segments of the DNA sequence during decompression. In addition to the index file, EIBDNASCA uses a work file that stores the repetitive bases (run length above 8) in binary form. The work file allows the algorithm to perform various pre-processing and transformation tasks on the DNA sequence before generating the final compressed output. Finally, an enhanced Huffman coding technique is applied to the symbols in the index file to make the encoding more efficient. The proposed algorithm is evaluated on sequences from a variety of GenBank database sources, using compression ratio, compression gain, and compression and decompression time as the performance metrics. The experimental results indicate that EIBDNASCA achieves an average compression ratio of 1.23 bpb (bits per base) with an average compression gain of 84.52%. The average compression time is 0.590 seconds, and decompression takes 0.625 seconds.
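The abstract describes the index-file/work-file split only at a high level. The following is a minimal, hypothetical sketch of that idea, not the authors' EIBDNASCA implementation. Several points are assumptions and are not taken from the paper. Runs shorter than 8 bases go verbatim into an index stream. Every longer run is replaced there by a placeholder symbol `MARKER` and stored in a binary work stream as a 2-bit base plus a fixed `LEN_BITS`-bit length. Runs of exactly 8, which the abstract leaves unspecified, go to the work stream. Long runs must consist of A, C, G or T. The index stream is coded with ordinary Huffman coding rather than the paper's enhanced variant. All function and constant names are illustrative. Compression gain is computed against an 8-bit ASCII baseline. That assumption is roughly consistent with the reported figures, since 1 − 1.23/8 ≈ 84.6%.

```python
import heapq
from collections import Counter
from itertools import groupby

RUN_THRESHOLD = 8   # from the abstract; assigning runs of exactly 8 to the work stream is an assumption
MARKER = "#"        # hypothetical placeholder for a long run in the index stream
BASES = "ACGT"
BASE_CODE = {b: i for i, b in enumerate(BASES)}  # 2-bit code per base
LEN_BITS = 16       # hypothetical fixed width for a run length in the work stream


def split_runs(seq):
    """Split seq into an index stream (short runs) and work records (long runs)."""
    index, work = [], []
    for base, group in groupby(seq):
        n = len(list(group))
        if n < RUN_THRESHOLD:
            index.extend(base * n)   # short run: keep the bases verbatim
        else:
            index.append(MARKER)     # long run: leave a placeholder
            work.append((base, n))
    return index, work


def huffman_codes(symbols):
    """Standard Huffman code (not the paper's enhanced variant) for the index stream."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:               # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def compress(seq):
    """Return (codebook, Huffman-coded index bits, binary work bits)."""
    index, work = split_runs(seq)
    codes = huffman_codes(index)
    index_bits = "".join(codes[s] for s in index)
    work_parts = []
    for base, n in work:
        if n >= 1 << LEN_BITS:
            raise ValueError("run longer than LEN_BITS can store")
        work_parts.append(format(BASE_CODE[base], "02b") + format(n, f"0{LEN_BITS}b"))
    return codes, index_bits, "".join(work_parts)


def decompress(codes, index_bits, work_bits):
    """Rebuild the sequence, pulling a work record each time MARKER is decoded."""
    decode = {c: s for s, c in codes.items()}
    out, buf, pos = [], "", 0
    for bit in index_bits:
        buf += bit
        if buf in decode:            # prefix-free code: first match is the symbol
            sym, buf = decode[buf], ""
            if sym == MARKER:
                base = BASES[int(work_bits[pos:pos + 2], 2)]
                n = int(work_bits[pos + 2:pos + 2 + LEN_BITS], 2)
                out.append(base * n)
                pos += 2 + LEN_BITS
            else:
                out.append(sym)
    return "".join(out)


def metrics(seq, index_bits, work_bits):
    """Bits per base and compression gain (%) versus 8-bit ASCII storage."""
    if not seq:
        raise ValueError("empty sequence")
    bpb = (len(index_bits) + len(work_bits)) / len(seq)
    return bpb, (1 - bpb / 8) * 100
```

Note that `metrics` counts only the payload bits. It ignores the codebook and any file headers, so it overstates compression on short inputs. For example, `compress("A" * 100)` yields one index bit plus one 18-bit work record. That gives 0.19 bits per base.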
- Subjects
NUCLEOTIDE sequence; DNA sequencing; IMAGE compression; HUFFMAN codes; ALGORITHMS; DATABASES
- Publication
International Journal of Intelligent Engineering & Systems, 2024, Vol 17, Issue 1, p108
- ISSN
2185-310X
- Publication type
Article
- DOI
10.22266/ijies2024.0229.11