Voice Activity Detection Method Based on Noise Classification and Double Adaptive Threshold Decision

English title: Voice Activity Detection Method Based on the Noise Classification and Double Adaptive Threshold Decision
Authors: YAO Rui; ZENG Zeqing; DU Junjie
Keywords: voice activity detection; double adaptive threshold; noise classification; feature conjunction
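Journal code: SCLH
Journal: Advanced Engineering Sciences (工程科学与技术)
Institution: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics
Publication date: 2018-07-11 12:04
Year: 2018
Volume: 50
Issue: 04
Funding: National Natural Science Foundation of China (Grant No. 61402226)
Language: Chinese
Article ID: SCLH201804022
Pages: 174-182 (9 pages)
CN: 51-1773/TB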

Abstract

To address the low hit rate of voice activity detection (VAD) in complex background-noise environments, an environment-aware VAD algorithm is proposed. Conventional algorithms use a single fixed threshold, which is not robust to noise; the proposed method instead applies different thresholds to the speech-to-noise and noise-to-speech frame transitions and updates both thresholds adaptively. Because no single feature can cope with complex environments, a feature-combination method is proposed that jointly uses the statistical-model likelihood ratio, the energy-entropy feature, and the mean harmonic number value feature. Environmental noise classification is also introduced: a support vector machine classifies the noise environment, and the optimal feature combination is selected according to the noise type, further improving performance. The algorithm was evaluated in simulation experiments on the NOIZEUS speech corpus with five types of background noise (babble, pink, white, f16, and volvo), and the hit rates of the various feature combinations were compared. Experimental results show that the proposed method outperforms existing algorithms, achieves an overall hit rate of about 80% across all noise types, and better balances the speech hit rate against the false-alarm rate.
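A minimal Python sketch of the double-threshold decision described in the abstract, assuming a hysteresis rule: a frame switches from noise to speech only when its feature exceeds an upper threshold, and switches back to noise only when the feature falls below a lower one, with both thresholds tied to a running noise-level estimate. The abstract does not give the authors' update rule, so the function `hysteresis_vad`, the multipliers `k_high` and `k_low`, and the smoothing factor `beta` are hypothetical.

```python
from typing import List, Sequence


def hysteresis_vad(features: Sequence[float], init_noise: float,
                   k_high: float = 3.0, k_low: float = 1.5,
                   beta: float = 0.95) -> List[int]:
    """Label each frame 1 (speech) or 0 (noise) using two adaptive thresholds.

    Hypothetical rule: both thresholds are multiples of a noise-level estimate
    that is smoothed exponentially and updated only on frames judged as noise.
    """
    noise = init_noise
    in_speech = False
    labels: List[int] = []
    for x in features:
        t_high = k_high * noise  # noise -> speech transition threshold
        t_low = k_low * noise    # speech -> noise transition threshold
        if in_speech:
            in_speech = x >= t_low  # stay in speech until the feature drops below t_low
        else:
            in_speech = x > t_high  # enter speech only when the feature exceeds t_high
        if not in_speech:
            # Adapt the noise level on noise frames only, so speech does not inflate it.
            noise = beta * noise + (1.0 - beta) * x
        labels.append(1 if in_speech else 0)
    return labels


if __name__ == "__main__":
    frames = [1.0, 1.0, 4.0, 2.0, 2.0, 1.0, 1.0]
    print(hysteresis_vad(frames, init_noise=1.0))  # [0, 0, 1, 1, 1, 0, 0]
```

With these inputs the two frames of value 2.0 stay in the speech segment. Setting `k_low = k_high = 3.0` (a single threshold) labels them as noise, which is the kind of mid-utterance dropout a second, lower exit threshold is meant to avoid.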
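For the statistical-model likelihood ratio, a standard formulation is the Gaussian-model test of Sohn, Kim and Sung (1999): each frequency bin k contributes log Λ_k = γ_k ξ_k / (1 + ξ_k) − log(1 + ξ_k), where γ_k is the a posteriori SNR and ξ_k the a priori SNR, and the frame statistic is the mean over bins. The sketch below assumes the noise power spectrum is already known and, as a simplification, replaces the decision-directed ξ_k estimate with the maximum-likelihood estimate max(γ_k − 1, floor). The abstract does not state which variant the paper uses; `mean_log_likelihood_ratio` and `floor` are hypothetical names.

```python
import math
from typing import Sequence


def mean_log_likelihood_ratio(power: Sequence[float], noise_psd: Sequence[float],
                              floor: float = 1e-3) -> float:
    """Mean per-bin log likelihood ratio (speech present vs. absent) for one frame.

    Assumes complex Gaussian speech and noise spectra, as in Sohn et al. (1999).
    """
    if len(power) != len(noise_psd) or not power:
        raise ValueError("power and noise_psd must be non-empty and of equal length")
    total = 0.0
    for p, lam in zip(power, noise_psd):
        gamma = p / lam               # a posteriori SNR of this bin
        xi = max(gamma - 1.0, floor)  # ML a priori SNR estimate (simplification)
        total += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
    return total / len(power)
```

Whenever γ_k exceeds 1 + floor, the ML estimate gives ξ_k = γ_k − 1 and each term reduces to γ_k − 1 − log γ_k, which is zero at γ_k = 1 and grows with the bin SNR, so noise-only frames score near zero.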
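The abstract does not define its energy-entropy feature. One form used in the VAD literature is an energy-to-spectral-entropy ratio: speech frames tend to have high energy and a peaky spectrum (low entropy), while many noises have lower energy and a flatter spectrum (high entropy). This sketch assumes the variant sqrt(1 + |log(1 + E) / H|), computed from a frame's power spectrum; the formula and the function names are illustrative assumptions, not the paper's definition.

```python
import math
from typing import Sequence


def spectral_entropy(power: Sequence[float], eps: float = 1e-12) -> float:
    """Shannon entropy (nats) of the power spectrum normalized to a distribution."""
    total = sum(power) + eps
    entropy = 0.0
    for p in power:
        q = p / total
        if q > 0.0:
            entropy -= q * math.log(q)
    return entropy


def energy_entropy_ratio(power: Sequence[float], eps: float = 1e-12) -> float:
    """Assumed variant: sqrt(1 + |log(1 + E) / H|), E = frame energy, H = spectral entropy."""
    energy = sum(power)
    h = spectral_entropy(power, eps)
    return math.sqrt(1.0 + abs(math.log1p(energy) / (h + eps)))


if __name__ == "__main__":
    flat = [1.0] * 8             # noise-like: flat spectrum, high entropy
    peaky = [0.1] * 7 + [20.0]   # speech-like: energy concentrated in one bin
    print(energy_entropy_ratio(flat), energy_entropy_ratio(peaky))
```

The peaky frame scores well above the flat one because its entropy is close to zero, which is why such a ratio separates voiced speech from broadband noise better than energy alone.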
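The noise-classification step can be sketched as a one-vs-rest linear SVM followed by a table lookup that picks the feature subset for the detected noise type. This is a hypothetical, dependency-free illustration: the SVM is trained by plain subgradient descent on the regularized hinge loss (the abstract does not specify the kernel, the classifier inputs, or the training method), the two-dimensional noise descriptors are made-up toy data, and the `FEATURE_SETS` table and the equal-weight averaging in `fused_score` are placeholders rather than the optimal combinations reported in the paper.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def _score(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1, epochs: int = 1000) -> Model:
    """Binary linear SVM (labels +1/-1): full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 term (bias not regularized)
        gb = 0.0
        for x, y in zip(xs, ys):
            if y * _score((w, b), x) < 1.0:  # hinge is active: add its subgradient
                for j in range(dim):
                    gw[j] -= y * x[j] / n
                gb -= y / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_one_vs_rest(xs: Sequence[Sequence[float]],
                      labels: Sequence[str]) -> Dict[str, Model]:
    """One binary SVM per noise class (that class vs. all others)."""
    return {c: train_linear_svm(xs, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}


def classify(models: Mapping[str, Model], x: Sequence[float]) -> str:
    """Pick the class whose classifier gives the largest decision value."""
    return max(models, key=lambda c: _score(models[c], x))


# Hypothetical noise-type -> feature-subset table (not the paper's results).
FEATURE_SETS: Dict[str, List[str]] = {
    "white": ["likelihood_ratio", "energy_entropy"],
    "babble": ["likelihood_ratio", "mean_harmonic_number"],
    "volvo": ["energy_entropy", "mean_harmonic_number"],
}


def fused_score(frame: Mapping[str, float], noise_type: str) -> float:
    """Equal-weight average of the (already normalized) features chosen for this noise."""
    names = FEATURE_SETS[noise_type]
    return sum(frame[name] for name in names) / len(names)


if __name__ == "__main__":
    # Toy 2-D noise descriptors forming three well-separated clusters.
    train_x = [(3.0, 0.0), (3.5, 0.5), (3.0, -0.5),
               (0.0, 3.0), (0.5, 3.5), (-0.5, 3.0),
               (-3.0, -3.0), (-3.5, -3.0), (-3.0, -3.5)]
    train_y = ["white"] * 3 + ["babble"] * 3 + ["volvo"] * 3
    models = train_one_vs_rest(train_x, train_y)
    noise_type = classify(models, (2.8, 0.2))
    frame = {"likelihood_ratio": 0.9, "energy_entropy": 0.7, "mean_harmonic_number": 0.2}
    print(noise_type, FEATURE_SETS[noise_type], fused_score(frame, noise_type))
```

Keeping classification and feature selection separate means the per-frame VAD only evaluates the features chosen for the current environment; a production system would more likely use a kernel SVM from an established library and learn the feature table from labeled data.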

