Research on Acoustic Model Compression for Speech Recognition
Abstract
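In HMM-based continuous speech recognition, modeling the influence of context requires expanding the model units, which greatly increases the number of models. To improve modeling accuracy, each model state is described by multiple Gaussian components. The result is a very large number of Gaussian components and a large acoustic model space. Storing the acoustic model therefore needs a great deal of memory, and computing the state likelihoods during decoding is very expensive, which leads to poor real-time performance. To overcome these two problems, this thesis studies acoustic model compression techniques, with the aim of finding effective ways to reduce storage overhead and increase decoding speed.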
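To make the decoding cost concrete, the sketch below computes the log-likelihood of one HMM state modeled as a Gaussian mixture. It is a minimal illustration that assumes diagonal covariances and combines components with log-sum-exp. These are common choices, not necessarily those of the thesis, and the function names are hypothetical. Every component is evaluated, so a state with M components of dimension D costs O(M·D) per frame. A decoder repeats this for every active state, and that repeated work is what Gaussian sharing and Gaussian selection try to cut.

```python
import math
from typing import List, Sequence, Tuple

# One diagonal-covariance mixture component: (weight, mean vector, variance vector).
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gaussian(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a diagonal-covariance Gaussian (O(D) work)."""
    acc = 0.0
    for xi, mi, vi in zip(x, mean, var):
        acc += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * acc


def state_log_likelihood(x: Sequence[float], mixture: List[Component]) -> float:
    """log sum_m w_m N(x; mu_m, diag(var_m)), combined with log-sum-exp for stability.

    Every component is evaluated, so one state costs O(M * D) per frame.
    """
    terms = [math.log(w) + log_gaussian(x, mu, var) for w, mu, var in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```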
     The thesis focuses on two classes of model compression methods: Gaussian component sharing and Gaussian selection.
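     1. Gaussian component sharing algorithms are studied, including traditional Gaussian component sharing and Gaussian component subspace sharing, and the two are compared experimentally.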
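     The thesis concentrates on the traditional Gaussian component sharing algorithm and the Gaussian component subspace sharing algorithm. For traditional Gaussian sharing, it studies the clustering algorithm and the distance measure, since a good clustering algorithm and distance measure yield better compression. For subspace sharing, it studies how the subspaces are partitioned and the size of the codebooks, both of which strongly affect the results of subspace clustering.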
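Traditional Gaussian sharing can be sketched as bottom-up clustering: repeatedly merge the closest pair of Gaussians until a target count remains. The sketch rests on three assumptions:

- covariances are diagonal;
- the distance measure is the symmetric Kullback-Leibler divergence;
- two Gaussians are merged by moment matching, weighted by their occupancy.

These are illustrative choices, not necessarily the clustering algorithm or criterion evaluated in the thesis, and all names are hypothetical.

```python
import math
from typing import List, Tuple

# (occupancy weight, mean, variance) of one diagonal-covariance Gaussian.
Gauss = Tuple[float, List[float], List[float]]


def sym_kl(p: Gauss, q: Gauss) -> float:
    """Symmetric KL divergence KL(p||q) + KL(q||p) for diagonal Gaussians."""
    _, mp, vp = p
    _, mq, vq = q
    return 0.5 * sum(a / b + b / a - 2.0 + (x - y) ** 2 * (1.0 / a + 1.0 / b)
                     for x, y, a, b in zip(mp, mq, vp, vq))


def merge(p: Gauss, q: Gauss) -> Gauss:
    """Moment-matched Gaussian that replaces p and q (weights act as occupancies)."""
    wp, mp, vp = p
    wq, mq, vq = q
    w = wp + wq
    mean = [(wp * x + wq * y) / w for x, y in zip(mp, mq)]
    var = [(wp * (a + x * x) + wq * (b + y * y)) / w - m * m
           for x, y, a, b, m in zip(mp, mq, vp, vq, mean)]
    return (w, mean, var)


def tie_gaussians(gs: List[Gauss], target: int) -> Tuple[List[Gauss], List[int]]:
    """Greedy agglomerative tying; returns shared Gaussians and original -> shared index map."""
    clusters = list(gs)
    members = [[i] for i in range(len(gs))]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sym_kl(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = merge(clusters[i], clusters[j])
        members[i] += members[j]
        del clusters[j], members[j]  # j > i, so index i stays valid
    assign = [0] * len(gs)
    for c, mem in enumerate(members):
        for g in mem:
            assign[g] = c
    return clusters, assign
```

The exhaustive pairwise search costs O(n³) distance evaluations overall and is only meant for small examples. Comparing distance criteria amounts to swapping `sym_kl` for another measure.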
     2. An improved monophone state-based Gaussian selection algorithm is proposed.
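     The thesis studies the standard Gaussian selection algorithm and its strengths and weaknesses, as well as traditional improved Gaussian selection algorithms and their shortcomings. To remedy those shortcomings, it proposes an improved Gaussian selection algorithm. Experiments show that this algorithm largely resolves the "state-restricted" drawback of Gaussian selection and achieves a good trade-off between recognition accuracy and real-time performance.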
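The standard, vector-quantization-based Gaussian selection idea works in two steps. Offline, a codebook partitions the feature space and each codeword keeps a shortlist of nearby Gaussians. At decode time, only a state's shortlisted Gaussians are evaluated. The minimal sketch below makes these assumptions, all labeled hypothetical:

- the codebook is already given (for example, trained with k-means);
- the shortlist criterion is a variance-normalized distance compared with a threshold `theta`;
- the nearest codeword is found by Euclidean distance;
- a state with no shortlisted Gaussian backs off to a fixed `floor`.

```python
import math
from typing import List, Sequence, Tuple

# (weight, mean, variance) for one diagonal-covariance Gaussian; indices are global.
Gauss = Tuple[float, List[float], List[float]]


def log_gaussian(x, mean, var) -> float:
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (a - m) ** 2 / v for a, m, v in zip(x, mean, var))


def normalized_dist(c: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Variance-normalized squared distance between a codeword and a Gaussian mean, per dimension."""
    return sum((ci - mi) ** 2 / vi for ci, mi, vi in zip(c, mean, var)) / len(c)


def build_shortlists(codebook: List[List[float]], gaussians: List[Gauss], theta: float) -> List[List[int]]:
    """Offline step: for each codeword, list the Gaussians within threshold theta."""
    return [[g for g, (_, mu, var) in enumerate(gaussians) if normalized_dist(c, mu, var) <= theta]
            for c in codebook]


def selected_state_loglik(x, codebook, shortlists, gaussians, state_members, floor) -> float:
    """Decode-time step: evaluate only this state's Gaussians that are in the shortlist."""
    # Nearest codeword by squared Euclidean distance (an assumption of this sketch).
    k = min(range(len(codebook)), key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))
    short = set(shortlists[k])
    terms = [math.log(gaussians[g][0]) + log_gaussian(x, gaussians[g][1], gaussians[g][2])
             for g in state_members if g in short]
    if not terms:
        return floor  # no Gaussian of this state was selected: back off to a fixed floor
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Lowering `theta` shortens the shortlists and saves more computation. The cost is that more states fall back to `floor`.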
     3. A Gaussian selection algorithm based on component sharing is proposed.
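     Traditional Gaussian selection only reduces computation and ignores memory usage, so it is of limited benefit in resource-constrained systems. The thesis therefore combines the ideas of Gaussian component sharing and Gaussian selection, reducing both memory usage and computation.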
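One way to picture combining sharing with selection is the hypothetical sketch below; it is not the thesis's algorithm. It makes three assumptions:

- all states draw on a single pool of shared Gaussians, and each state stores only pool indices and mixture weights;
- per frame, only the pool Gaussians in the nearest codeword's shortlist are evaluated, each once, into a cache that every state reuses;
- the floor back-off rule applies when none of a state's Gaussians is selected.

`frame_cache`, `state_loglik` and the floor back-off rule are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Shared pool entry: (mean, variance). States refer to pool entries by index only.
PoolGauss = Tuple[List[float], List[float]]


def log_gaussian(x, mean, var) -> float:
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (a - m) ** 2 / v for a, m, v in zip(x, mean, var))


def frame_cache(x: Sequence[float], pool: List[PoolGauss], codebook: List[List[float]],
                shortlists: List[List[int]]) -> Dict[int, float]:
    """Evaluate each shortlisted pool Gaussian once for this frame."""
    k = min(range(len(codebook)), key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))
    return {g: log_gaussian(x, *pool[g]) for g in shortlists[k]}


def state_loglik(cache: Dict[int, float], state_weights: Dict[int, float], floor: float) -> float:
    """Combine cached pool scores with this state's own mixture weights (no re-evaluation)."""
    terms = [math.log(w) + cache[g] for g, w in state_weights.items() if g in cache]
    if not terms:
        return floor  # none of the state's pool Gaussians was selected this frame
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

In this picture, memory falls because each Gaussian is stored once in the pool. Computation falls because each selected pool Gaussian is evaluated at most once per frame, however many states share it.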
To account for context, model units are expanded from monophones to triphones, and the number of acoustic models increases sharply. For high accuracy, each state of every model is described by multiple Gaussian mixture components, so a large-scale hidden Markov model contains a very large number of Gaussians. As a consequence, storing these Gaussians requires substantial memory, and computing the likelihoods of all states takes a lot of time. This results in a high real-time factor for most automatic speech recognition systems. This thesis studies acoustic model compression techniques to overcome these two problems.
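A back-of-the-envelope estimate shows how quickly memory and computation grow. The function below assumes diagonal covariances, that is, one mean and one variance per dimension plus one weight per Gaussian, all stored as 4-byte floats. The model size in the usage line (3000 states, 16 components each, 39-dimensional features) is hypothetical and is not taken from the thesis.

```python
from typing import Tuple


def gmm_footprint(n_states: int, mix_per_state: int, dim: int, bytes_per_float: int = 4) -> Tuple[int, int]:
    """Rough size of an untied diagonal-covariance HMM-GMM acoustic model.

    Returns (bytes to store means, variances and weights,
             per-dimension terms evaluated per frame if every Gaussian is scored).
    """
    n_gauss = n_states * mix_per_state
    n_floats = n_gauss * (2 * dim + 1)  # mean + variance per dimension, one weight per Gaussian
    return n_floats * bytes_per_float, n_gauss * dim


print(gmm_footprint(3000, 16, 39))  # hypothetical model size -> (15168000, 1872000)
```

Under these assumptions the model needs about 15 MB. It also requires roughly 1.9 million per-dimension terms per frame when every Gaussian is scored.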
     This thesis focuses on two techniques: Gaussian mixture tying and Gaussian selection.
     1. Gaussian mixture tying: traditional Gaussian mixture tying and subspace distribution clustering.
     We study traditional Gaussian mixture tying and subspace distribution clustering. For traditional Gaussian mixture tying, we examine how strongly the clustering algorithm and the distance measure influence the result. For subspace distribution clustering, we investigate its two key factors: how the feature space is partitioned into subspaces, and the number of codewords in each subspace.
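Subspace distribution clustering can be sketched in three steps. First, split the feature dimensions into subspaces. Next, vector-quantize the per-subspace parts of all Gaussians into a small codebook. Finally, store each Gaussian as one codeword index per subspace. The sketch simplifies in four ways:

- the subspace partition is fixed in advance (`bounds`);
- clustering is plain k-means with Euclidean distance;
- each point is the concatenated mean and variance vector;
- centroids are initialized from the first `codebook_size` points.

These are illustrative simplifications, not the clustering criterion of the original method or of the thesis.

```python
from typing import List, Sequence, Tuple


def nearest(p: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the closest centroid under squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd iterations, initialized with the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep an empty cluster's centroid unchanged
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def sdc_compress(means: List[List[float]], variances: List[List[float]],
                 bounds: List[Tuple[int, int]], codebook_size: int):
    """Quantize every Gaussian subspace by subspace.

    Returns one codebook per subspace and, for each Gaussian, one codeword index per subspace.
    """
    codebooks: List[List[List[float]]] = []
    indices: List[List[int]] = [[] for _ in means]
    for a, b in bounds:
        # A subspace Gaussian is represented as [means..., variances...] for Euclidean k-means.
        points = [list(m[a:b]) + list(v[a:b]) for m, v in zip(means, variances)]
        centroids, assign = kmeans(points, codebook_size)
        codebooks.append(centroids)
        for g, c in enumerate(assign):
            indices[g].append(c)
    return codebooks, indices
```

Changing `bounds` and `codebook_size` corresponds to the two factors named above. With K subspaces and at most 256 codewords each, a Gaussian's 2D mean and variance floats shrink to K one-byte indices, plus the shared codebooks.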
     2. A new monophone state-based Gaussian selection algorithm.
     We study standard Gaussian selection and traditional improved Gaussian selection methods, and find that each has drawbacks. A new Gaussian selection technique is presented to mitigate these drawbacks, and extensive experiments show that it is effective.
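The abstract does not spell out the improved algorithm, so the following is only a generic sketch of monophone-state-based selection. It works in two passes under stated assumptions:

1. Every monophone (context-independent) state GMM is scored each frame, and the `top_n` best monophone states are kept.
2. A triphone state is scored in full only if its parent monophone state was kept; otherwise it borrows the parent's score.

All names, and the borrowing rule, are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Gauss = Tuple[float, List[float], List[float]]  # (weight, mean, variance)


def log_gaussian(x, mean, var) -> float:
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (a - m) ** 2 / v for a, m, v in zip(x, mean, var))


def gmm_loglik(x: Sequence[float], mixture: List[Gauss]) -> float:
    terms = [math.log(w) + log_gaussian(x, mu, var) for w, mu, var in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def select_and_score(x, mono_models: Dict[str, List[Gauss]],
                     tri_models: Dict[str, Tuple[str, List[Gauss]]], top_n: int) -> Dict[str, float]:
    """Hypothetical two-pass scoring: monophone states first, triphone states on demand."""
    # Pass 1: score every monophone state (cheap, since there are few of them).
    mono = {s: gmm_loglik(x, mix) for s, mix in mono_models.items()}
    selected = set(heapq.nlargest(top_n, mono, key=mono.get))
    # Pass 2: full triphone scores only where the parent monophone state survived.
    return {t: gmm_loglik(x, mix) if parent in selected else mono[parent]
            for t, (parent, mix) in tri_models.items()}
```

There are far fewer monophone states than triphone states, so the first pass is cheap. `top_n` then trades accuracy against speed.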
     3. A new tied-mixture-based Gaussian selection algorithm.
     Gaussian selection aims to reduce the amount of computation during decoding, but it still requires substantial memory. This drawback prevents its widespread use, particularly in resource-constrained systems. This thesis proposes a new technique that combines Gaussian mixture tying and Gaussian selection to lower memory usage and reduce decoding time.
