- Title
Closed-set speaker identification using VQ and GMM based models.
- Authors
Barai, Bidhan; Chakraborty, Tapas; Das, Nibaran; Basu, Subhadip; Nasipuri, Mita
- Abstract
An array of features and methods has been developed over the past six decades for Speaker Identification (SI) and Speaker Verification (SV), jointly known as Speaker Recognition (SR). Mel Frequency Cepstral Coefficients (MFCC) are generally used as feature vectors in most cases because they give higher accuracy than other features. The presented paper focuses on a comparative study of state-of-the-art SR techniques along with their design challenges, robustness issues and performance evaluation methods. Rigorous experiments have been performed using the Gaussian Mixture Model (GMM) with variations like Universal Background Model (UBM) and/or Vector Quantization (VQ) and/or VQ based UBM-GMM (VQ-UBM-GMM), with detailed discussion. Other popular methods have been included for comparative study only, namely Linear Discriminant Analysis (LDA), Probabilistic LDA (PLDA), Gaussian PLDA (GPLDA), Multi-condition GPLDA (MGPLDA) and Identity Vector (i-vector). Three popular audio data-sets have been used in the experiments, namely IITG-MV SR, Hyke-2011 and ELSDSR. Hyke-2011 and ELSDSR contain clean speech while IITG-MV SR contains noisy audio data with variations in channel (device), environment and speaking style. We propose a new data mixing approach for SR to make the system independent of recording device, speaking style and environment. The accuracy we obtained for VQ and GMM based methods on the Hyke-2011 and ELSDSR databases varies from 99.6% to 100%, whereas accuracy on IITG-MV SR is up to 98%. In some cases, however, the accuracies degrade drastically due to mismatch between training and testing data as well as the singularity problem of GMM. The experimental results serve as a benchmark for VQ/GMM/UBM based methods for the IITG-MV SR database.
- Subjects
GAUSSIAN mixture models; VECTOR quantization; AUTOMATIC speech recognition; SPEECH processing systems; PROSODIC analysis (Linguistics); LINEAR statistical models; EVALUATION methodology
- Publication
International Journal of Speech Technology, 2022, Vol 25, Issue 1, p173
- ISSN
1381-2416
- Publication type
Article
- DOI
10.1007/s10772-021-09899-9