Chakravarty, Nidhi; Dua, Mohit

doi:10.1007/s10772-024-10093-w

Title: Feature extraction using GTCC spectrogram and ResNet50 based classification for audio spoof detection.
Authors: Chakravarty, Nidhi; Dua, Mohit
Abstract: With the increasing adoption of voice-based authentication systems, the threat of audio spoofing attacks has become a significant concern. These attacks aim to deceive voice authentication systems by manipulating or impersonating audio signals. To improve audio security, we introduce a spectrogram-based solution. Spectrograms, known for their effectiveness in audio analysis and feature extraction, offer valuable insights into combating audio spoofing. Our proposed model is divided into two parts: a frontend and a backend. For the frontend, we investigate the utility of four spectrograms: the Mel Spectrogram, the Gammatone Cepstral Coefficients (GTCC) Spectrogram, the Acoustic Ternary Pattern (ATP) Spectrogram, and the Mel-Frequency Cepstral Coefficients (MFCC) Spectrogram. For the backend, two deep learning models, a Convolutional Neural Network (CNN) and a Residual Network (ResNet50), are each paired individually with these four spectrograms. The proposed system is validated through experiments on the ASVspoof 2019 Logical Access (LA) and Physical Access (PA) evaluation datasets and on our own Voice Impersonation Corpus in Hindi Language (VIHL) dataset. The results show that the combination of GTCC spectrograms and ResNet50 outperforms all other tested combinations, achieving Equal Error Rates (EER) of 0.6%, 1.15%, and 4.3% for LA, PA, and VIHL, respectively.
Subjects: CONVOLUTIONAL neural networks; FEATURE extraction; SPECTROGRAMS; DEEP learning; HINDI language
Publication: International Journal of Speech Technology, 2024, Vol 27, Issue 1, p225
ISSN: 1381-2416
Publication type: Article
DOI: 10.1007/s10772-024-10093-w
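
Note on the reported metric: the abstract compares systems by Equal Error Rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (genuine audio rejected). The record does not describe the authors' scoring protocol, so the following is only a minimal sketch of how an EER is commonly estimated from classifier scores. The function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the example scores are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate (eer, threshold) from detection scores.

    Assumption: a higher score means "more likely genuine".
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([genuine, spoof]))

    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest,
    # and report their average there as the EER estimate.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


# Hypothetical scores, invented for this example only.
genuine_example = [0.9, 0.8, 0.3, 0.6]
spoof_example = [0.1, 0.2, 0.7, 0.4]
eer, threshold = equal_error_rate(genuine_example, spoof_example)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With finite score sets the two error rates rarely cross exactly, which is why this sketch averages them at the closest point; evaluation toolkits often interpolate instead, so small differences from officially reported figures are to be expected.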
