Longbin Lu; Xinman Zhang; Xuebin Xu; Dongpeng Shang

doi:10.1117/1.JEI.24.5.053023

Title: Video analysis using spatiotemporal descriptor and kernel extreme learning machine for lip reading.
Authors: Longbin Lu; Xinman Zhang; Xuebin Xu; Dongpeng Shang
Abstract: Lip-reading techniques have shown bright prospects for speech recognition under noisy environments and for hearing-impaired listeners. We aim to solve two important issues regarding lip reading: (1) how to extract discriminative lip motion features and (2) how to establish a classifier that can provide promising recognition accuracy for lip reading. For the first issue, a projection local spatiotemporal descriptor, which considers the lip appearance and motion information at the same time, is utilized to provide an efficient representation of a video sequence. For the second issue, a kernel extreme learning machine (KELM) based on the single-hidden-layer feedforward neural network is presented to distinguish all kinds of utterances. In general, this method has fast learning speed and great robustness to nonlinear data. Furthermore, quantum-behaved particle swarm optimization with binary encoding is introduced to select the appropriate feature subset and parameters for KELM training. Experiments conducted on the AVLetters and OuluVS databases show that the proposed lipreading method achieves a superior recognition accuracy compared with two previous methods.
Subjects: SPATIOTEMPORAL processes; MACHINE learning; LIPREADING; AUTOMATIC speech recognition; NEURAL circuitry
Publication: Journal of Electronic Imaging, 2015, Vol 24, Issue 5, p053023-1
ISSN: 1017-9909
Publication type: Article
DOI: 10.1117/1.JEI.24.5.053023
