The development of the HTK Broadcast News transcription system
An overview
Article by: Woodland, P. C.
Abstract: This paper describes in detail the development of the HTK Broadcast News (BN) transcription system and presents full evaluation results from the 1996, 1997 and 1998 DARPA BN evaluations. It starts with a description of the underlying HTK large vocabulary recognition system and presents the modifications used in successive generations of the HTK BN system. Initially, acoustic models that relied on fairly precise manual audio-type classification were used. To enable the use of automatic segmentation and classification systems, acoustic models were developed that were independent of fine audio classifications. The basic structure of the current HTK BN system includes a high-quality segmentation stage and multiple decoding passes, which initially use triphones and trigrams, with quinphone acoustic models along with word 4-gram and category language models applied in the final pass. This system gave the lowest error rate in the 1997 BN evaluation by a statistically significant margin. Refinements to the system are then described that examine the use of a larger acoustic training set, vocal tract length normalisation, full variance transforms and improved language modelling. Furthermore, a version of the system was developed that ran in less than 10 times real time with only a small increase in error rate; this version has been used for the bulk transcription of broadcast news for information retrieval from audio data.
Language:
English