Barai, Bidhan; Chakraborty, Tapas; Das, Nibaran; Basu, Subhadip; Nasipuri, Mita

doi:10.1007/s10772-021-09899-9

Title: Closed-set speaker identification using VQ and GMM based models.
Authors: Barai, Bidhan; Chakraborty, Tapas; Das, Nibaran; Basu, Subhadip; Nasipuri, Mita
Abstract: An array of features and methods has been developed over the past six decades for Speaker Identification (SI) and Speaker Verification (SV), jointly known as Speaker Recognition (SR). Mel Frequency Cepstral Coefficients (MFCC) are generally used as feature vectors in most cases because they give higher accuracy than other features. The paper presents a comparative study of state-of-the-art SR techniques along with their design challenges, robustness issues and performance evaluation methods. Rigorous experiments have been performed using the Gaussian Mixture Model (GMM) with variations such as Universal Background Model (UBM) and/or Vector Quantization (VQ) and/or VQ based UBM-GMM (VQ-UBM-GMM), with detailed discussion. Other popular methods are included for comparative study only, namely Linear Discriminant Analysis (LDA), Probabilistic LDA (PLDA), Gaussian PLDA (GPLDA), Multi-condition GPLDA (MGPLDA) and Identity Vector (i-vector). Three popular audio data-sets have been used in the experiments, namely IITG-MV SR, Hyke-2011 and ELSDSR. Hyke-2011 and ELSDSR contain clean speech, while IITG-MV SR contains noisy audio data with variations in channel (device), environment and speaking style. We propose a new data mixing approach for SR to make the system independent of recording device, speaking style and environment. The accuracy obtained for VQ and GMM based methods on the Hyke-2011 and ELSDSR databases varies from 99.6% to 100%, whereas accuracy on IITG-MV SR is up to 98%. In some cases the accuracies degrade drastically due to mismatch between training and testing data as well as the singularity problem of GMM. The experimental results serve as a benchmark for VQ/GMM/UBM based methods on the IITG-MV SR database.
Subjects: GAUSSIAN mixture models; VECTOR quantization; AUTOMATIC speech recognition; SPEECH processing systems; PROSODIC analysis (Linguistics); LINEAR statistical models; EVALUATION methodology
Publication: International Journal of Speech Technology, 2022, Vol 25, Issue 1, p173
ISSN: 1381-2416
Publication type: Article
DOI: 10.1007/s10772-021-09899-9
