- Title
A Hybrid DMFCC-LPC Based Feature Extraction with DCNN Clustering for Speaker Diarization.
- Authors
Kaka, Jhansi Rani; Kangala, Vijay Kumar
- Abstract
Speaker Diarization (SD), or speaker indexing, is a procedure for automatically partitioning a conversation into homogeneous segments according to the number of speakers. A reliable diarization method accurately estimates the variable length assertion, and it involves major steps such as speech detection, speaker merges, and speaker change. The major problem with the SD method is enhancing the readability of speech transcription. In this study, a hybrid method of feature extraction based on the Dynamic Mel Frequency Cepstral Coefficient (DMFCC) and Linear Prediction Coding (LPC) is proposed for SD. The Voice Activity Detection (VAD) method is used to detect the presence or absence of a speaker in the audio lecture, and is followed by speaker segmentation using the extracted features. A Deep Convolutional Neural Network (DCNN) is used to determine the feature vector and cluster the speakers in the audio lecture. The results show that the proposed DMFCC-LPC delivers robust performance on the CALLHOME dataset, with accuracy, Diarization Error Rate (DER), and False Positive Rate (FPR) of 0.967, 0.31, and 0.119, respectively, in comparison with the Speaker Diarization System using HXLP-DCNN with Sailfish Optimization Algorithm (SDS-HXLP-DCNN-SOA), Feature-Level fusion, and Self-supervised clustering with Path Integral Clustering (SSC-PIC).
- Subjects
CONVOLUTIONAL neural networks; OPTIMIZATION algorithms; LINEAR codes; FEATURE extraction; TRANSCRIPTION (Linguistics); PATH integrals
- Publication
International Journal of Intelligent Engineering & Systems, 2024, Vol 17, Issue 3, p757
- ISSN
2185-310X
- Publication type
Article
- DOI
10.22266/ijies2024.0630.59
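- Illustrative sketch
The abstract names three front-end steps: VAD, LPC, and dynamic cepstral features. The sketch below illustrates generic textbook versions of each in plain Python and is not the authors' implementation. Several points are assumptions, since the abstract does not specify them: the VAD is a simple energy threshold, the LPC uses the autocorrelation method with Levinson-Durbin recursion, and "dynamic" is taken to mean first-order delta features. All function names are hypothetical. The MFCC computation and the DCNN clustering stage are omitted.

```python
def frame_signal(signal, frame_len, hop):
    """Split a list of samples into (possibly overlapping) frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def energy_vad(frames, threshold):
    """Assumed VAD: a frame counts as speech if its mean energy exceeds threshold."""
    return [sum(x * x for x in f) / len(f) > threshold for f in frames]


def lpc(frame, order):
    """LPC via the autocorrelation method and Levinson-Durbin recursion.

    Returns [1, a_1, ..., a_order], the coefficients of the
    prediction-error filter A(z) = 1 + a_1 z^-1 + ... + a_order z^-order.
    """
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    if err == 0.0:  # silent frame: nothing to predict
        return a
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
        if err <= 0.0:  # numerically degenerate; stop early
            break
    return a


def delta(features):
    """Assumed 'dynamic' features: central difference across frames,
    with the first and last frames repeated as padding."""
    last = len(features) - 1
    out = []
    for t in range(len(features)):
        prev = features[max(t - 1, 0)]
        nxt = features[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out
```

Under these assumptions, one way to build a hybrid vector would be to keep only the frames flagged by `energy_vad`, compute `lpc` on each, and concatenate the result with the `delta` of cepstral vectors produced by any MFCC front end.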