Research on Improving Speech Feature Parameter Extraction Methods Based on wav Files

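Author: Zhang Kaige (张凯歌)
Degree level: Master's
Discipline: Computer System Architecture
Keywords: Speech Recognition; Feature Extraction; LPCC; MFCC; Speech Signal Acceleration; Matlab
Degree year: 2012
Advisor: Zhang Li (张力)
Discipline code: 081201
Degree-granting institution: Kunming University of Science and Technology (昆明理工大学)
Thesis submission date: 2012-03-01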

Abstract

The basic task of speech recognition is to convert speech into the corresponding command or text. The technology has broad application prospects and, as an interdisciplinary field, considerable research value. Feature parameter extraction is a key technique in a speech recognition system, and the choice of features strongly affects the system. This is especially true for speaker-independent recognition: whether the features are appropriate, whether they represent the characteristics of the speech signal, and whether they remove as far as possible the differences between speakers in intonation, speaking rate, and volume are decisive for both the running efficiency and the recognition rate of the system.
     This thesis studies speech recognition technology and the extraction of speech feature parameters. A typical existing speech system consists of speech signal preprocessing, endpoint detection, feature extraction, pattern matching, and post-processing. In the feature extraction stage, the features mainly used today are linear prediction cepstral coefficients (LPCC), based on an acoustic model, and Mel-frequency cepstral coefficients (MFCC), based on an auditory model. From observation of human hearing, the author found that speech played back at an increased speed, up to a certain rate, is still easily recognized by the human ear, while the accelerated speech appears simpler in both waveform and spectrum. Based on this phenomenon, the thesis carries out feature extraction experiments on accelerated speech signals and analyzes the actual recognition performance obtained with the extracted features.
     The thesis first gives an overview of speech recognition technology, its applications, and the state of research in China and abroad. It then introduces the principles of speech recognition and analyzes in detail pre-emphasis, framing and windowing, and endpoint detection. Because the goal is to improve feature parameter extraction, it next discusses feature extraction in depth and proposes extracting features from the accelerated speech signal. Using Microsoft DirectShow and the VS2010 integrated development environment, a tool for accelerating and frequency-shifting speech signals was designed to supply suitable raw speech for the subsequent experiments. All speech signals are stored as wav files conforming to the RIFF specification, which is convenient for processing under Windows. Isolated-word speech recognition simulations were then run in Matlab using the DTW matching algorithm, the recognition results at normal speed and after acceleration were analyzed separately, and experimental conclusions were drawn. Finally, the thesis summarizes the work and outlines directions for future research.
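
The DTW matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's Matlab code: it is a plain Python illustration of textbook dynamic time warping applied to isolated-word template matching. The function names (`frame_distance`, `dtw_distance`, `recognize`), the two-dimensional toy feature vectors, and the word labels are all assumptions made for the example; in practice each frame would be an LPCC or MFCC vector.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning
    # the first i frames of seq_a with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Allowed predecessors: vertical, horizontal, diagonal step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical reference templates (toy values, not real speech features).
templates = {
    "yes": [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
    "no": [[3.0, 3.0], [3.0, 2.0], [4.0, 4.0]],
}

# A slower rendition of "yes": the first frame is repeated.
utterance = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
print(recognize(utterance, templates))
```

Because DTW lets one frame align with several frames of the other sequence, the repeated first frame adds no cost here. The same property is what allows utterances of different lengths, such as normal-speed and accelerated recordings, to be compared against templates.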

