Research on Small-Vocabulary Isolated-Word Speech Recognition Methods (小词汇量的孤立词语音识别方法研究)


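English title: Small Vocabulary Isolated Words Speech Recognition Method Research
Author: Shu Qi (舒琦)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords (Chinese): 语音识别; 预处理; 特征提取; DTW
Keywords (English): speech recognition; preprocessing; feature extraction; DTW
Degree year: 2012
Supervisor: Lu Luoxian (卢珞先)
Discipline code: 081001
Degree-granting institution: Wuhan University of Technology (武汉理工大学)
Thesis submission date: 2012-05-01
Defense committee chair: Liu Quan (刘泉)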

Abstract

Computers play an increasingly important role in human life. People no longer want to communicate with computers through keyboard input alone; they want a faster, more direct form of interaction in which the computer understands natural human speech. Speech recognition developed in response to this expectation. It is an emerging, multidisciplinary field with broad application prospects. As computer processing power and speech processing technology have improved rapidly, speech recognition technology has advanced quickly as well, and research results continue to appear. A growing number of products that incorporate speech recognition have entered everyday life, and many smart-home products include speech recognition modules that make daily life more convenient.
This thesis reviews the history of speech recognition technology, analyzes the current state of research, and examines in detail the basic principles and processing pipeline of speech recognition. It analyzes the preprocessing of speech signals, including pre-emphasis, framing and windowing, and endpoint detection. Endpoint detection is studied in detail, and a double-threshold method is used to locate the start and end points of the experimental utterances. For feature extraction, the thesis analyzes methods for extracting LPC, LPCC, and MFCC parameters. Comparative experiments show that MFCC parameters combined with first-order differences and cepstral liftering offer some advantage over the other feature parameters in both recognition rate and noise robustness, and are therefore suitable as features for speech recognition. For recognition methods, the thesis first analyzes the principle of the DTW (dynamic time warping) algorithm and runs simulation experiments on it, then proposes an improved DTW algorithm that narrows the matching region relative to conventional DTW. Experimental results show that the improved algorithm significantly increases recognition speed without reducing the recognition rate, which helps the real-time performance of a speech recognition system. Finally, a DTW-based speaker-dependent isolated-word speech recognition system is designed. The system integrates recording, signal observation, preprocessing, parameter analysis, and speaker-dependent isolated-word recognition. Its recognition rate and real-time performance were tested: the system recognized the ten digits 0 to 9 well, indicating good recognition rate and real-time performance. A comparison of speaker-dependent and speaker-independent use shows that the system performs better on speaker-dependent recognition.
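To make the idea of narrowing the DTW matching region concrete, the following is a minimal sketch, not the thesis's algorithm. The abstract does not say how the region is reduced, so the sketch assumes a Sakoe-Chiba-style diagonal band. The names `dtw_distance`, `recognize`, and `band` are hypothetical, and the one-dimensional "features" stand in for per-frame MFCC vectors.

```python
import math


def frame_distance(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_distance(ref, test, band=None):
    """Accumulated DTW cost between two sequences of feature vectors.

    band: optional half-width of a diagonal band (an ASSUMED way to narrow
          the matching region); None searches the full grid.
    Returns (cost, cells_evaluated).
    """
    n, m = len(ref), len(test)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    if band is not None:
        # Widen the band if needed so the end point (n, m) stays reachable.
        band = max(band, abs(n - m))
    inf = math.inf
    # cost[i][j] = best accumulated distance aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        lo = 1 if band is None else max(1, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            d = frame_distance(ref[i - 1], test[j - 1])
            # Allowed predecessors: vertical, horizontal, diagonal step.
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = d + best_prev
            cells += 1
    return cost[n][m], cells


def recognize(templates, utterance, band=None):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates,
               key=lambda label: dtw_distance(templates[label], utterance, band)[0])


if __name__ == "__main__":
    ref = [[0.0], [1.0], [2.0], [3.0]]
    test = [[0.0], [1.0], [1.0], [2.0], [3.0]]
    print(dtw_distance(ref, test))          # (0.0, 20): full 4 x 5 grid
    print(dtw_distance(ref, test, band=1))  # (0.0, 11): same cost, fewer cells
```

In this toy case the banded search returns the same cost while evaluating 11 cells instead of 20, which is the kind of speed-up without accuracy loss that the abstract reports. Whether a band that narrow is safe on real utterances depends on how much speaking rate varies between template and test.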

