Research on Speech Endpoint Detection Algorithms Based on Information Entropy and Neural Networks

English title: Research of Endpoint Detection Algorithms of Speech Based on Information Entropy and Neuron Network
Author: Qiao Feng
Degree: Master's
Major: Signal and Information Processing
Keywords: speech endpoint detection; amplitude entropy; spectral entropy; neural network; fuzzy
Degree year: 2007
Supervisor: Zhang Xueying
Discipline code: 081002
Degree-granting institution: Taiyuan University of Technology
Submission date: 2007-05-01

Abstract

Speech endpoint detection is an important step in speech analysis, speech synthesis, and speech recognition. In practice, a system usually has to examine its input first and accurately locate the beginning and end points of the speech, so that only true speech data is collected and the amount of data, the computation, and the processing time are all reduced. Research on speech endpoint detection algorithms is therefore of great significance.
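As a rough illustration of what locating the beginning and end points means once every frame has been judged speech or noise, the Python sketch below takes a list of per-frame decisions and returns the first and last frames of the speech region, then cuts the signal to that range. The function names, the `min_run` rule that ignores short isolated bursts, and the framing convention are assumptions made for illustration; the thesis abstract does not describe this step.

```python
from typing import List, Optional, Sequence, Tuple


def find_endpoints(is_speech: Sequence[bool], min_run: int = 3) -> Optional[Tuple[int, int]]:
    """Return (first_frame, last_frame) of the speech region, or None if no speech.

    Only runs of at least `min_run` consecutive speech frames are trusted
    (a hypothetical rule), so an isolated noise frame wrongly labelled as
    speech does not move the endpoints.
    """
    runs: List[Tuple[int, int]] = []  # (start, end) of each run of speech frames
    start = None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i                      # a run of speech frames begins
        elif not flag and start is not None:
            runs.append((start, i - 1))    # the run ended at the previous frame
            start = None
    if start is not None:                  # a run that reaches the last frame
        runs.append((start, len(is_speech) - 1))

    long_runs = [(s, e) for s, e in runs if e - s + 1 >= min_run]
    if not long_runs:
        return None
    return long_runs[0][0], long_runs[-1][1]


def trim_to_speech(signal: Sequence[float], is_speech: Sequence[bool],
                   frame_len: int, frame_shift: int, min_run: int = 3) -> Sequence[float]:
    """Keep only the samples between the detected start and end points.

    Frame i is assumed to cover samples [i * frame_shift, i * frame_shift + frame_len).
    """
    endpoints = find_endpoints(is_speech, min_run)
    if endpoints is None:
        return signal[:0]
    first, last = endpoints
    return signal[first * frame_shift: last * frame_shift + frame_len]
```

Everything outside the returned range is discarded, which is where the savings in data volume and computation come from; a practical detector would usually also keep a short margin around each endpoint.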
The thesis first reviews several typical speech endpoint detection algorithms and then studies three algorithms in detail.
Algorithm one: building on the principles of information entropy, the thesis analyzes how the amplitude entropy and spectral entropy of speech differ from those of background noise, and uses these differences to detect speech endpoints. Simulation experiments show that this method achieves high detection accuracy and is easy to implement.
Algorithm two: building on algorithm one and on the strength of neural networks in pattern classification, the thesis proposes an endpoint detection algorithm based on information entropy and a neural network. Models are built for the speech signal and for the background noise, which turns endpoint detection into the classification of each frame: a frame is assigned to speech or to background noise according to how well it matches each model. Simulation experiments show that this algorithm is more accurate than the entropy-only algorithm and avoids setting a detection threshold, a difficult step in conventional methods.
Algorithm three: endpoint detection essentially separates speech from background noise, but the division between the two is not absolute. A frame at the boundary may belong to either speech or background noise, so endpoint detection is a classification problem with fuzzy boundaries. Because fuzzy techniques are particularly well suited to such problems, the thesis fuzzifies the speech data and combines it with the neural network approach to propose an endpoint detection algorithm based on a fuzzy neural network.
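Algorithm one relies on two per-frame features. The Python sketch below computes them using common textbook definitions, which are assumptions here rather than the thesis's exact formulas: amplitude entropy as the Shannon entropy of a histogram of the frame's sample values (with the same bins, spanning the whole signal's range, for every frame), and spectral entropy as the Shannon entropy of the frame's normalized power spectrum after a Hann window. The abstract does not say how the two entropies are combined or how thresholds are chosen, so the sketch stops at the features; the frame length, hop, and bin count are hypothetical defaults.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int = 256, frame_shift: int = 128) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    return x[idx]


def shannon_entropy(p: np.ndarray) -> float:
    """H = -sum(p * log p) with the natural log, skipping zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def amplitude_entropy(frame: np.ndarray, bin_edges: np.ndarray) -> float:
    """Entropy of the histogram of sample values in one frame."""
    clipped = np.clip(frame, bin_edges[0], bin_edges[-1])  # keep every sample in range
    counts, _ = np.histogram(clipped, bins=bin_edges)
    return shannon_entropy(counts / counts.sum())


def spectral_entropy(frame: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy of the normalized power spectrum of one Hann-windowed frame.

    `eps` keeps an all-zero frame from dividing by zero (it then looks flat).
    """
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2 + eps
    return shannon_entropy(power / power.sum())


def entropy_features(x, frame_len: int = 256, frame_shift: int = 128, n_bins: int = 50):
    """Per-frame (amplitude entropy, spectral entropy) arrays for signal x."""
    x = np.asarray(x, dtype=float)
    frames = frame_signal(x, frame_len, frame_shift)
    peak = np.max(np.abs(x)) or 1.0              # avoid zero-width bins for pure silence
    edges = np.linspace(-peak, peak, n_bins + 1)  # identical bins for every frame
    amp = np.array([amplitude_entropy(f, edges) for f in frames])
    spec = np.array([spectral_entropy(f) for f in frames])
    return amp, spec
```

With fixed bins, a near-silent frame falls into very few bins and has low amplitude entropy. A flat, noise-like spectrum gives the highest spectral entropy, while a spectrum concentrated in a few peaks, as in voiced speech, gives a lower one.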
In simulation experiments, the fuzzy-neural-network algorithm was the most accurate of the three algorithms studied, although its computational complexity is high.
Finally, the thesis summarizes the three algorithms, identifies problems that need further study, briefly reviews new research directions that have appeared in recent years, and outlines the prospects of speech endpoint detection.
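Algorithm two recasts endpoint detection as classifying each frame as speech or background noise. The abstract does not specify the network, so the Python sketch below uses a small radial basis function (RBF) network as one plausible choice: Gaussian hidden units centred on training frames drawn from each class, and a linear output layer fitted by least squares. The class name, the centre selection, the width, and the feature layout are hypothetical; the input can be any per-frame feature vector, for example a frame's amplitude entropy and spectral entropy. The output is compared with 0.5, the midpoint of the 0/1 training targets, rather than with a hand-tuned threshold on a signal feature.

```python
import numpy as np


class RBFFrameClassifier:
    """Hypothetical RBF-network classifier for frames: 1 = speech, 0 = noise."""

    def __init__(self, centers_per_class: int = 3, width: float = 1.0, seed: int = 0):
        self.centers_per_class = centers_per_class
        self.width = width
        self.rng = np.random.default_rng(seed)
        self.centers = None
        self.weights = None

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Squared distance from every frame to every centre, shape (n_frames, n_centers).
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        H = np.exp(-d2 / (2.0 * self.width ** 2))      # Gaussian hidden-unit outputs
        return np.hstack([H, np.ones((len(X), 1))])    # extra column acts as the bias

    def fit(self, X: np.ndarray, y: np.ndarray) -> "RBFFrameClassifier":
        if not (np.any(y == 0) and np.any(y == 1)):
            raise ValueError("training data must contain both speech (1) and noise (0) frames")
        # Pick centres at random from each class so both are represented.
        centers = []
        for label in (0, 1):
            members = X[y == label]
            k = min(self.centers_per_class, len(members))
            idx = self.rng.choice(len(members), size=k, replace=False)
            centers.append(members[idx])
        self.centers = np.vstack(centers)
        # Linear output layer: least-squares fit of the 0/1 targets.
        H = self._hidden(X)
        self.weights, *_ = np.linalg.lstsq(H, y.astype(float), rcond=None)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # 0.5 is the midpoint between the 0/1 training targets.
        return (self._hidden(X) @ self.weights > 0.5).astype(int)
```

The least-squares output layer keeps training to a single linear solve once the centres are fixed; gradient-based training of all parameters is a common alternative.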
