- Title
G-Cocktail: An Algorithm to Address Cocktail Party Problem of Gujarati Language Using Cat Boost.
- Authors
Gupta, Monika; Singh, R. K.; Singh, Sachin
- Abstract
This paper attempts to address the problem of recognizing a native language in a mixed-voice environment. G-Cocktail would aid applications in identifying commands given in Gujarati, even from a mixed voice stream. G-Cocktail has two phases: in the first, it creates features after filtering the voices, and in the second, it trains and classifies the dataset. The trained model then helps recognize new voice signals. The challenge in training on a native language is that only a small dataset is available. The model uses single-word inputs and a phrase benchmark dataset from Microsoft and the Linguistic Data Consortium for Indian Languages (LDC-IL). To overcome the overfitting caused by the small dataset, we used the CatBoost algorithm and fine-tuned the classification model. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). MFCCs work well for human voices, but noise in the audio makes them less effective. To avoid this shortcoming, the voices are filtered first and the MFCCs are then calculated. Only the most relevant features are retained to make the model more robust. Alongside the MFCC features, the pitch of the voices is added, since pitch can vary with region, the speaker's mood and age, and the speaker's knowledge of the language. A voice print of each whole sound file is constructed and fed as features to the classification model. A 70%/30% training/testing split is used with algorithms such as K-means, Naïve Bayes, and LightGBM. The proposed model is compared on the given dataset, and the results show that G-Cocktail using XBoost performed better than the others on all parameters in the given scenario.
- Subjects
MICROSOFT Corp.; COCKTAIL parties; NATIVE language; HUMAN voice; VOICEPRINTS; VOICE culture; ABSOLUTE pitch
- Publication
Wireless Personal Communications, 2022, Vol 125, Issue 1, p261
- ISSN
0929-6212
- Publication type
Article
- DOI
10.1007/s11277-022-09549-6
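
The abstract describes MFCCs as a "spectrum-of-a-spectrum". The sketch below illustrates that idea for a single audio frame using only the Python standard library: power spectrum, mel filterbank, log energies, then a DCT. It is not the authors' implementation. The function names, the 8 kHz sample rate, the 256-sample frame, the 20 filters, and the 13 coefficients are all assumptions chosen for illustration, and the filtering, pitch, and classification steps are not covered.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum.

    Uses a naive O(n^2) DFT to stay dependency-free; fine for short frames.
    """
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build triangular filters whose centres are evenly spaced in mel."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Return the first n_coeffs MFCCs of one frame."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty or silent bands.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    # DCT-II of the log mel energies: the "spectrum of a spectrum".
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energies))
            for c in range(n_coeffs)]


# Synthetic 300 Hz tone standing in for a voiced frame (illustrative only).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 300 * i / sample_rate) for i in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs), [round(c, 2) for c in coeffs[:3]])
```

In practice a full clip would be split into overlapping frames and the per-frame coefficients summarized (for example, averaged) before being passed to a classifier; that aggregation choice is likewise an assumption, not something the abstract specifies.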