Abstract
This paper proposes a speaker recognition method based on the discriminant neighborhood embedding (DNE) algorithm. As a manifold learning method, DNE captures the local neighborhood structure of the data by constructing an adjacency graph. It also makes full use of between-class discriminant information, which gives it stronger discriminative power. Experimental results on the telephone-telephone core condition of the National Institute of Standards and Technology 2010 Speaker Recognition Evaluation (NIST SRE 2010) demonstrate the effectiveness of the algorithm.
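To make the adjacency-graph idea concrete, the following is a minimal sketch of how a signed neighborhood graph of the kind used in DNE could be built. It assumes the commonly described formulation in which two samples are linked if either is among the other's k nearest neighbors, with weight +1 for a same-class pair and -1 for a different-class pair. The function name `dne_adjacency`, the parameter `k`, and the toy data are illustrative assumptions, not details taken from the paper, and the projection step (an eigendecomposition built on this graph) is omitted.

```python
def dne_adjacency(points, labels, k):
    """Build a signed k-nearest-neighbor adjacency matrix (illustrative sketch).

    points: list of equal-length feature vectors (e.g. i-vectors)
    labels: list of class (speaker) labels, one per point
    k:      number of nearest neighbors per point (assumed hyperparameter)
    """
    n = len(points)

    def sq_dist(a, b):
        # Squared Euclidean distance between two vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # k nearest neighbors of each point, excluding the point itself
    knn = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sq_dist(points[i], points[j]),
        )
        knn.append(set(others[:k]))

    # +1 for same-class neighbors, -1 for different-class neighbors, 0 otherwise
    F = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in knn[i] or i in knn[j]):
                F[i][j] = 1 if labels[i] == labels[j] else -1
    return F


# Toy usage: two "speakers" on a line, two samples each
toy_points = [[0.0], [1.0], [10.0], [11.0]]
toy_labels = [0, 0, 1, 1]
toy_graph = dne_adjacency(toy_points, toy_labels, k=2)
```

Under this assumed formulation, positive entries pull same-speaker neighbors together and negative entries push different-speaker neighbors apart once a projection is learned from the graph.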