Speaker Recognition Using Discriminant Neighborhood Embedding (基于判别邻域嵌入算法的说话人识别)

English title: Speaker Recognition Using Discriminant Neighborhood Embedding
Authors: LIANG Chunyan (梁春燕); YUAN Wenhao (袁文浩); LI Yanling (李艳玲); XIA Bin (夏斌); SUN Wenzhu (孙文珠)
Affiliations: College of Computer Science and Technology, Shandong University of Technology; College of Computer and Information Engineering, Inner Mongolia Normal University
Keywords: Speaker recognition; Total variability factor analysis; Neighborhood Preserving Embedding (NPE); Discriminant Neighborhood Embedding (DNE)
Journal: Journal of Electronics & Information Technology (电子与信息学报)
Journal code: DZYX
Publication date: 2019-07-15
Year: 2019
Volume: 41
Issue: 07
Pages: 254-258
Page count: 5
CN: 11-4494/TN
Record ID: DZYX201907033
Funding: National Natural Science Foundation of China (11704229, 61701286, 61562068); Natural Science Foundation of Shandong Province (ZR2017LA011, ZR2015FL003, ZR2017MF047); Science and Technology Program of Shandong Higher Education Institutions (J17KA078); Natural Science Foundation of Inner Mongolia (2015MS0629)
Language: Chinese

Abstract

This paper introduces the Discriminant Neighborhood Embedding (DNE) algorithm into speaker recognition. DNE is a manifold learning method that captures the local neighborhood structure of the data by constructing an adjacency graph. It also makes full use of between-class discriminant information, which gives it stronger discriminative power. Experimental results on the telephone-telephone core condition of the NIST 2010 Speaker Recognition Evaluation (NIST SRE 2010) demonstrate the effectiveness of the algorithm.
