- Title
Feature extraction using GTCC spectrogram and ResNet50 based classification for audio spoof detection.
- Authors
Chakravarty, Nidhi; Dua, Mohit
- Abstract
With the increasing adoption of voice-based authentication systems, audio spoofing attacks have become a significant threat. These attacks aim to deceive voice authentication systems by manipulating or impersonating audio signals. To improve the security of such systems, we introduce a spectrogram-based solution. Spectrograms, known for their effectiveness in audio analysis and feature extraction, offer valuable insight into combating audio spoofing. The proposed model is divided into two parts: a frontend and a backend. For the frontend, the model investigates four spectrogram representations: the Mel Spectrogram, the Gammatone Cepstral Coefficients (GTCC) Spectrogram, the Acoustic Ternary Pattern (ATP) Spectrogram, and the Mel-Frequency Cepstral Coefficients (MFCC) Spectrogram. For the backend, two deep learning models, a Convolutional Neural Network (CNN) and a Residual Network (ResNet50), are each paired individually with these four spectrograms. The proposed system is validated experimentally on the ASVspoof 2019 Logical Access (LA) and Physical Access (PA) evaluation datasets and on our own Voice Impersonation Corpus in Hindi Language (VIHL) dataset. The results show that the combination of GTCC spectrograms and ResNet50 outperforms all other proposed combinations, achieving Equal Error Rates (EER) of 0.6%, 1.15%, and 4.3% on LA, PA, and VIHL, respectively.
- Subjects
CONVOLUTIONAL neural networks; FEATURE extraction; SPECTROGRAMS; DEEP learning; HINDI language
- Publication
International Journal of Speech Technology, 2024, Vol 27, Issue 1, p225
- ISSN
1381-2416
- Publication type
Article
- DOI
10.1007/s10772-024-10093-w
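The abstract reports results as Equal Error Rate (EER): the operating point where the false rejection rate of bona fide audio equals the false acceptance rate of spoofed audio. The following is a minimal illustrative sketch of that metric, not the authors' evaluation code or the official ASVspoof scoring script. It assumes that a higher detector score means "more likely bona fide" and approximates the EER by sweeping candidate thresholds over the observed scores. The function name `compute_eer` is hypothetical.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the Equal Error Rate by a threshold sweep.

    Assumption: higher scores indicate bona fide (genuine) speech.
    Returns (eer, threshold) at the threshold where FRR and FAR are closest.
    """
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([bona, spoof]))

    # FRR: fraction of bona fide trials rejected (score below threshold).
    frr = np.array([np.mean(bona < t) for t in thresholds])
    # FAR: fraction of spoof trials accepted (score at or above threshold).
    far = np.array([np.mean(spoof >= t) for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = int(np.argmin(np.abs(frr - far)))
    eer = (frr[idx] + far[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores with some overlap between classes (hypothetical data).
    eer, thr = compute_eer([0.6, 0.8, 0.9, 0.4], [0.1, 0.5, 0.3, 0.7])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With small trial counts this nearest-crossing estimate is coarse; evaluations on large sets such as ASVspoof 2019 typically rely on a ROC or DET curve computed over many trials, where the FRR and FAR curves cross much more smoothly.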