Discriminative Methodologies for Tone Problem Solving in Mandarin Speech Recognition


English title: Discriminative Methodologies for Tone Problem Solving in Mandarin Speech Recognition
Author: Huang Hao (黄浩)
Thesis level: Doctoral
Discipline: Circuits and Systems
Keywords: Mandarin speech recognition; tone modeling; discriminative training; discriminative feature extraction; hidden conditional random fields; large margin; minimum phone error
Degree year: 2008
Advisor: Zhu Jie (朱杰)
Subject code: 080902
Degree-granting institution: Shanghai Jiao Tong University
Submission date: 2008-11-01

Abstract

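Chinese is a tonal language, and tone plays a very important role in spoken Chinese. Syllables made up of the same initial and final take on entirely different meanings as the tone changes, and correspond to different Chinese characters. In particular, when language-model context is missing, tone carries an important share of the work of distinguishing characters and meanings in Mandarin. Applying tone information in a Mandarin speech recognition system should therefore effectively improve recognition performance. In recent years, machine learning methods based on discriminative principles have become one of the most active research directions in pattern recognition, and in automatic speech recognition in particular. Methods for model training and feature optimization derived from discriminative principles have shown superior performance both on small-scale classification tasks and in large vocabulary continuous speech recognition systems.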
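     With a Mandarin large vocabulary continuous speech recognition system as the application background, this dissertation examines, from the perspective of discriminative principles and in light of the characteristics of Mandarin tone production, tone modeling for Chinese speech and the use of tone information in acoustic modeling. It reviews the history of speech recognition technology, describes the role of tone in Mandarin speech recognition, and gives a systematic account of discriminative training criteria and of the discriminative models and methods that have been applied most successfully. On this basis it proposes discriminative methods, under different models, for improving tone recognition and for using tone information to improve acoustic modeling, offering new lines of research for the tone problem in Mandarin speech recognition. These methods can be summarized as follows: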
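     First, tone modeling based on hidden Markov models (HMMs) is studied from the perspective of discriminative training. To improve the Mandarin tone recognition rate, the model parameters are re-estimated in the model space using discriminative-training parameter updates. In Mandarin, because of coarticulation, tone recognition in continuous speech is more complex than in isolated speech. Tonal coarticulation shows up as a strong dependence of the perceived tone of the current syllable on the tones of its context. Building on this, a discriminative tone feature extraction method is proposed for discriminative training in the feature space. Following the idea of discriminative linear feature compensation, the method uses linear transforms trained under a discriminative objective function to map the fundamental frequency of the context and add it as compensation to the fundamental frequency features of the current syllable. Experiments show that discriminative tone feature extraction significantly improves the tone recognition rate, and that joint training of the model parameters on top of the extracted features improves tone recognition further. An analysis in terms of recognition rate and of the feature transform parameters shows that the feature extraction method differs in nature from conventional tone feature normalization.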
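     Conditional random fields (CRFs) are a mathematical model that has been used successfully in natural language processing in recent years. The dissertation uses an extension of CRFs, hidden conditional random fields (HCRFs), to model Mandarin tones explicitly, and proposes generalized dynamic features, an extension of conventional dynamic features, to better capture the dynamics of the fundamental frequency contour. Tone recognition experiments show that, with the same features and structure, HCRFs give a significantly higher tone recognition rate than maximum-likelihood-trained HMMs, and that adding the generalized dynamic features brings consistent improvement. An important property that distinguishes HCRFs from HMMs is that features need not be used in a uniform way, which makes the model well suited to the acoustic phenomenon that, in Mandarin speech, fundamental frequency is continuous in voiced segments and discontinuous in unvoiced segments. An implicit tone modeling method is proposed in which HCRFs model the discontinuous F_0 directly. Tonal syllable classification experiments show that directly modeling the discontinuous fundamental frequency sequence with HCRFs gives a clearly higher recognition rate than using F_0 features smoothed across unvoiced segments. This result provides preliminary experimental evidence for directly modeling discontinuous fundamental frequency sequences with HCRFs in the acoustic modeling of large vocabulary continuous speech recognition systems.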
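     Tone modeling with large margin Gaussian mixture models (GMMs) is then discussed, with the model parameters trained discriminatively under a large margin criterion. For the parameter update, to address the slow convergence of the Quasi-Newton gradient descent method, a large margin GMM parameter update in extended Baum-Welch (EBW) form is proposed; it optimizes the Gaussian parameters by means of weak-sense auxiliary functions. Experiments show that, compared with the Quasi-Newton gradient method, this method needs only a few iterations to reach the same or even better recognition results. In addition, for GMMs based on segmental features, finding the features that give the best recognition rate usually takes repeated trial and error. This dissertation uses linear discriminant analysis (LDA) to reduce the dimensionality of the tone features and obtain segmental features better suited to discriminating tones. Tone recognition experiments show that GMM tone models built on the dimension-reduced features clearly outperform conventional overlapped ditone GMMs, which indicates that the features obtained by LDA are better than hand-selected suprasegmental tone features.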
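     Finally, a method for training discriminative model weights is discussed, in which explicitly trained tone models are added to a large vocabulary continuous speech recognition system to improve Mandarin continuous speech recognition. The method trains model-dependent probability weights discriminatively under the minimum phone error (MPE) criterion. These weights scale the probabilities of the conventional spectral-feature HMMs and of the tone models, and recognition improves by adjusting how strongly each model contributes. Weight update formulas based on the extended Baum-Welch algorithm are derived. In light of the characteristics of context-dependent acoustic modeling for Mandarin, weighting strategies that are tonal syllable dependent, final model dependent, model combination dependent, and whole word dependent are proposed, and the different strategies are evaluated. In the experiments, because training data is limited, the weighting strategies become prone to overtraining as the number of trainable parameters grows: the objective function on the training data increases, but the recognition rate on the test data drops. Smoothing between weights is proposed to overcome this overfitting. The discriminative model weight training is validated on two large vocabulary continuous speech recognition tasks, tonal syllable output and Chinese character output. The results show that on both tasks discriminative model weights significantly reduce the error rate compared with a global model weight, which demonstrates their effectiveness in improving tone model integration.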
Chinese is a tonal language, and tones are of fundamental importance to Mandarin speech recognition. Tones can be as important as phonemes when contextual information is limited or missing. The use of tone information to improve Mandarin speech recognition has been widely studied in recent research, and significant improvements have been achieved on speech recognition tasks of various scales in both clean and noisy environments. In recent years, discriminative machine learning has been one of the most active directions in pattern recognition, and especially in automatic speech recognition research. Several model parameter estimation and feature extraction methods based on discriminative principles have proved successful in both classification and continuous speech recognition tasks.
     This dissertation aims to solve tone problems that are unique to Mandarin speech recognition, and hence to improve the performance of large vocabulary speech recognition systems, by taking advantage of recently proposed discriminative training criteria, models and methods. A systematic overview of the discriminative training criteria, models and correspondingly derived discriminative techniques is provided. Several discriminative approaches to tone problem solving in Mandarin speech recognition are proposed, which can be summarized as follows:
     Traditional tone modeling based on hidden Markov models (HMMs) is first investigated from a new, discriminative training perspective. To improve tone recognition accuracy, discriminative training in both the model space and the feature space is proposed. In the model space, the model parameters are trained using an objective function termed minimum tone error, which is a smooth approximation of tone recognition accuracy. In the feature space, based on the fact that Mandarin tones are greatly influenced by the tones of their context, a tonal feature extraction method for HMM-based tone modeling is introduced. The method uses linear transforms to project the F_0 (fundamental frequency) features of neighboring syllables as compensations and adds them to the original F_0 features of the current syllable. The transforms are discriminatively trained according to the same objective function. Experiments show that the new tonal features achieve a significant improvement in tone recognition compared with a baseline using maximum-likelihood-trained HMMs on normal F_0 features. Overall discriminative training on the new features brings further improvement. It is also found that the discriminative tonal feature extraction (DTFE) method brings additional improvement on top of the traditional F_0 normalization technique.
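     The context compensation described above can be pictured with a small sketch. It only illustrates the "project neighbors and add" step: the function name `compensate_f0` and the parameters `m_prev` and `m_next` are hypothetical, and the transforms are simply passed in, whereas the dissertation trains them under a discriminative objective.

```python
import numpy as np

def compensate_f0(features, m_prev, m_next):
    """Add linearly projected neighbor-syllable features to each syllable.

    features: (n_syllables, dim) array, one F0 feature vector per syllable.
    m_prev, m_next: (dim, dim) transforms for the left and right neighbor.
    These are assumed to be given; training them is not shown here.
    """
    features = np.asarray(features, dtype=float)
    out = features.copy()
    # Syllable i receives m_prev @ x[i-1] from its left neighbor.
    out[1:] += features[:-1] @ m_prev.T
    # Syllable i receives m_next @ x[i+1] from its right neighbor.
    out[:-1] += features[1:] @ m_next.T
    return out

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = compensate_f0(x, 0.1 * np.eye(2), 0.5 * np.eye(2))
```

     With zero transforms the output equals the input, so in this sketch the original F_0 features are the special case of no compensation.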
     Conditional random fields (CRFs) are among the most successfully applied mathematical models in natural language processing research. Tone modeling using an extension of CRFs, hidden conditional random fields (HCRFs), is explored. To better capture the F_0 contour, a generalized dynamic feature is introduced. Experimental results on tone recognition show that the HCRF-based tone model outperforms both the maximum likelihood and the discriminatively trained HMM tone models when using the same model structure and observations. The generalized dynamic features give a consistent gain over the normal dynamic features. A key advantage of CRFs and HCRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. In Mandarin speech recognition, unlike the spectral features, no F_0 is observed in unvoiced regions. The discontinuity between voiced and unvoiced segments has traditionally made tone modeling difficult, so HCRFs are better suited to dealing with this phenomenon. A preliminary evaluation of HCRFs for embedded tone modeling in Mandarin speech recognition is presented. Experimental results on tonal syllable classification tasks show that HCRFs on discontinuous F_0 features perform better than HCRFs on smoothed F_0 features.
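     To make the contrast between smoothed and discontinuous F_0 concrete, here is a minimal sketch of one common way to fill unvoiced frames, by linear interpolation between voiced neighbors. The function name `interpolate_unvoiced` and the convention that unvoiced frames are stored as 0 are assumptions for illustration; the abstract does not say which smoothing the dissertation uses.

```python
import numpy as np

def interpolate_unvoiced(f0):
    """Fill unvoiced frames (assumed to be stored as 0) by linear interpolation.

    Returns the smoothed contour and the boolean voicing mask. Frames before
    the first or after the last voiced frame take the nearest voiced value.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        # Nothing to interpolate from; return the input unchanged.
        return f0.copy(), voiced
    idx = np.arange(len(f0))
    smoothed = np.interp(idx, idx[voiced], f0[voiced])
    return smoothed, voiced

contour = [0, 100, 0, 0, 130, 0]
smoothed, voiced = interpolate_unvoiced(contour)
```

     A model that accepts discontinuous input could instead be given the raw contour together with the `voiced` mask, which is the kind of non-uniform feature use that the paragraph above attributes to HCRFs.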
     Large margin methods have attracted a great deal of research attention in machine learning. The insight that the classification margin, rather than the raw training error, is what matters has become a key tool in recent years for discriminative classifiers. A tone classifier based on segmental features is built on Gaussian mixture models (GMMs). A discriminative objective function termed the large margin criterion is adopted to train the Gaussian mixture parameters. A novel model parameter update equation using a weak-sense auxiliary function is formulated to obtain an efficient iterative training procedure for the Gaussian parameters. Linear discriminant analysis (LDA) feature reduction is applied to extract the critical segmental features of the tones. Experimental results on tone recognition tasks show that the margin-based discriminative criterion is better than the empirical-risk-based objective function. The proposed extended Baum-Welch (EBW)-like update algorithm achieves comparable performance using only a few iterations. The GMMs trained on LDA-derived features are better than the previously proposed overlapped ditone Gaussian mixture models.
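     As an illustration of the LDA feature reduction step, the following is a minimal sketch of textbook Fisher LDA with NumPy. It is not the dissertation's implementation: the function name `lda_projection` is hypothetical, and the toy data stand in for segmental tone features.

```python
import numpy as np

def lda_projection(x, labels, n_components):
    """Return a (dim, n_components) LDA projection matrix.

    Columns are the leading eigenvectors of pinv(S_w) @ S_b, where S_w and
    S_b are the within-class and between-class scatter matrices.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    dim = x.shape[1]
    mean_all = x.mean(axis=0)
    s_w = np.zeros((dim, dim))
    s_b = np.zeros((dim, dim))
    for c in np.unique(labels):
        xc = x[labels == c]
        mean_c = xc.mean(axis=0)
        centered = xc - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        s_b += len(xc) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_w) @ s_b)
    order = np.argsort(-eigvals.real)  # largest eigenvalue first
    return eigvecs.real[:, order[:n_components]]

# Toy data: two classes that differ mainly along the first dimension.
x = np.array([[0.0, -1.0], [0.1, 0.0], [-0.1, 1.0], [0.0, 0.0],
              [5.0, -1.0], [5.1, 0.0], [4.9, 1.0], [5.0, 0.0]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = lda_projection(x, labels, n_components=1)
z = x @ w  # reduced features, one dimension per sample
```

     With C classes, LDA yields at most C - 1 useful directions, so the reduced dimensionality is bounded by the number of tone classes minus one.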
     For integrating explicitly trained tone models into lattice-based rescoring, a discriminative framework of tone model integration is proposed. The method uses model-dependent weights to scale the probabilities from the various models: the HMMs based on spectral features and the tone models based on F_0-related tonal features. The weights are discriminatively trained under the minimum phone error (MPE) criterion, and the update equation for the model weights is derived from the EBW algorithm. Various schemes of model weight combination, such as tonal syllable dependent, final model dependent, model combination dependent and word dependent, are evaluated, and a smoothing technique is introduced to make training robust to overfitting. The proposed method is evaluated on tonal syllable output and character output speech recognition tasks. Experimental results show that the proposed method obtains a significant relative error reduction over a global weight on both tasks, owing to a better interpolation of the given models.
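     The weighted combination of model scores can be sketched as follows. This is a hypothetical illustration, not the dissertation's formulation: the names `combined_log_score` and `smooth_weight`, the count-based interpolation toward a global weight, and the constant `tau` are all assumptions, since the abstract states only that weights are model dependent and smoothed. MPE training of the weights is not shown.

```python
def smooth_weight(unit_weight, global_weight, count, tau=10.0):
    """Interpolate a unit-specific weight toward the global weight.

    Assumed smoothing form: units with few training occurrences (small
    count) stay close to the global weight, which limits overfitting.
    """
    return (count * unit_weight + tau * global_weight) / (count + tau)

def combined_log_score(hmm_logprob, tone_logprob, unit, unit_weights,
                       global_weights=(1.0, 1.0)):
    """Scale the HMM and tone-model log probabilities with per-unit weights.

    unit_weights maps a unit label (for example a tonal syllable) to a pair
    (w_hmm, w_tone); unseen units fall back to the global weights.
    """
    w_hmm, w_tone = unit_weights.get(unit, global_weights)
    return w_hmm * hmm_logprob + w_tone * tone_logprob

weights = {"ma3": (1.0, smooth_weight(0.4, 1.0, count=30))}
score = combined_log_score(-10.0, -2.0, "ma3", weights)
```

     Using a single global pair for every unit corresponds to the global-weight baseline that the abstract compares against.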

