An integrated study of speaker normalisation and HMM adaptation for noise robust speaker-independent speech recognition
Article by: Hariharan, Ramalingam; Viikki, Olli
Abstract: Inter-speaker variability and sensitivity to background noise are two major problems in modern speech recognition systems. In this paper, we investigate different techniques that have been developed to overcome these issues. These methods include vocal tract length normalisation (VTLN), on-line HMM adaptation and gender-dependent acoustic modelling. Our objective in this paper is to combine these techniques so that the system recognition performance is maximised. Moreover, we propose a vocal tract length normalisation technique that is more implementation-friendly than the previously published utterance-specific VTLN (u-VTLN). To ensure the wide applicability of the methods studied, the performance evaluation is done both in connected digit recognition and in monophone-based isolated word recognition. The recognition results obtained indicate the importance of the combined use of these techniques. The integrated use of VTLN and on-line adaptation always provided the highest performance in both types of recognition experiments using gender-independent models. As expected, on-line HMM adaptation provided the major performance improvement with respect to a gender- and speaker-independent baseline system. Adding either speaker-specific VTLN (s-VTLN) or gender-dependent acoustic modelling further improved the system accuracy. However, while the joint use of s-VTLN and gender-dependent HMMs improved the recognition rate with original unadapted models, a minor performance degradation was observed when s-VTLN was applied to on-line adapted gender-dependent HMMs.
Language:
English