- Title
Low-latency transformer model for streaming automatic speech recognition.
- Authors
Miao, Haoran; Cheng, Gaofeng; Zhang, Pengyuan
- Abstract
Transformer models have made great progress in automatic speech recognition. However, it is challenging for streaming transformer models to trade off output latency against recognition accuracy. This letter aims to propose a low-latency transformer model with satisfactory recognition accuracy. First, the authors design a streaming transformer and explain how it operates in a streaming fashion. Second, they propose using connectionist temporal classification (CTC) during training to minimise the latency of transformer models. Finally, they propose utilising CTC as a backup during decoding to ensure that the low-latency characteristic is maintained. The authors fairly compare their streaming transformer model to existing streaming models, particularly the transducer model, a popular low-latency approach. The experiments show that, with comparable output latency, the transformer model outperforms the transducer model by average relative character (or word) error rate reductions of 22.18%, 26.71% and 19.36% on HKUST, Switchboard and CallHome, respectively.
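
The record gives only the high-level recipe, but the training-time use of CTC that the abstract describes is commonly realised as a joint CTC/attention objective: a linear CTC head on the encoder output is trained alongside the decoder's cross-entropy loss. Below is a minimal PyTorch sketch of that idea; the class name `JointCTCAttentionLoss`, the weight `lambda_ctc` and the tensor shapes are illustrative assumptions, not the paper's actual implementation, and the decode-time CTC backup the abstract mentions is not specified in this record.

```python
import torch
import torch.nn as nn

class JointCTCAttentionLoss(nn.Module):
    """Sketch of a joint CTC/attention training objective (assumed setup,
    not taken from the paper): an auxiliary CTC loss on the encoder is
    interpolated with the decoder's cross-entropy loss."""

    def __init__(self, vocab_size: int, enc_dim: int, blank: int = 0,
                 lambda_ctc: float = 0.3):
        super().__init__()
        self.ctc_proj = nn.Linear(enc_dim, vocab_size)   # encoder states -> CTC logits
        self.ctc = nn.CTCLoss(blank=blank, zero_infinity=True)
        self.att = nn.CrossEntropyLoss(ignore_index=-1)  # -1 marks padded tokens
        self.lambda_ctc = lambda_ctc

    def forward(self, enc_out, enc_lens, dec_logits,
                ctc_targets, att_targets, target_lens):
        # enc_out: (B, T, D) encoder states; dec_logits: (B, U, V) decoder scores
        # CTC branch: log-probs in (T, B, V) layout, as nn.CTCLoss expects
        log_probs = self.ctc_proj(enc_out).log_softmax(-1).transpose(0, 1)
        ctc_loss = self.ctc(log_probs, ctc_targets, enc_lens, target_lens)
        # Attention branch: token-level cross-entropy on the decoder outputs
        att_loss = self.att(dec_logits.reshape(-1, dec_logits.size(-1)),
                            att_targets.reshape(-1))
        return self.lambda_ctc * ctc_loss + (1 - self.lambda_ctc) * att_loss
```

In this kind of setup, the CTC branch's frame-level alignment pressure is what discourages the encoder from delaying token emission, which matches the latency motivation stated in the abstract; the interpolation weight is a tunable hyperparameter here, not a value reported by the authors.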
- Subjects
AUTOMATIC speech recognition; TRANSDUCERS; PATTERN recognition systems; ELECTROMECHANICAL devices; ELECTRONIC equipment
- Publication
Electronics Letters (Wiley-Blackwell), 2022, Vol 58, Issue 1, p44
- ISSN
0013-5194
- Publication type
Article
- DOI
10.1049/ell2.12349