Research on Voiceprint Recognition Based on CNN-LSTM Network
  • English title: Voiceprint Recognition Based on CNN-LSTM Network
  • Authors: Yan He (闫河); Dong Yingyan (董莺艳); Wang Peng (王鹏); Luo Cheng (罗成); Li Huan (李焕)
  • Keywords: voiceprint recognition; CNN-LSTM network; spectrogram; temporal features
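  • Journal: 计算机应用与软件 (Computer Applications and Software); CNKI journal code: JYRJ
  • Institutions: College of Computer Science and Engineering, Chongqing University of Technology; Liangjiang College of Artificial Intelligence, Chongqing University of Technology
  • Publication date: 2019-04-12
  • Year: 2019; Volume: 36; Issue: 04
  • Funding: National Natural Science Foundation of China (61173184); Natural Science Foundation of Chongqing (cstc2018jcyjAX0694)
  • Language: Chinese
  • CNKI article ID: JYRJ201904027
  • Pages: 172-176 (5 pages)
  • CN: 31-1260/TP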
Abstract
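Traditional voiceprint recognition methods rely on complex processing pipelines and achieve low recognition accuracy, which is a key obstacle to the development of voiceprint recognition applications. Exploiting the ability of deep learning to extract features and classify autonomously, this paper combines a convolutional neural network (CNN) with a long short-term memory network (LSTM) and proposes a combined network model that learns voiceprint features and performs identity authentication. Raw speech is converted into fixed-length spectrograms, which are fed through the CNN and then the LSTM; the combined network is trained to learn voiceprint features. Comparisons with CNN, LSTM, and DNN networks show that the CNN-LSTM network reaches high accuracy with fewer training iterations. The experimental results indicate that both the spatial features and the temporal features of speech are important factors in voiceprint recognition. In the experiments, the CNN-LSTM model reaches an accuracy of 95.42% and a minimum loss of 0.0973. The method is beneficial for practical voiceprint recognition applications.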
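The abstract states only that raw speech is converted into a fixed-length spectrogram before it enters the network; the window, frame length, hop size, and target length the authors used are not given here. The following is a minimal pure-Python sketch of one common way to do this, assuming a Hann-windowed short-time Fourier transform with 400-sample frames, a 160-sample hop, a log-magnitude output, and crop-or-zero-pad to a fixed number of frames. All of these values, and the function names, are hypothetical choices for illustration, not the paper's settings.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def fix_length(signal, target_len):
    """Crop or zero-pad a 1-D signal to exactly target_len samples."""
    if len(signal) >= target_len:
        return list(signal[:target_len])
    return list(signal) + [0.0] * (target_len - len(signal))


def log_spectrogram(signal, frame_len=400, hop=160, n_frames=100, eps=1e-10):
    """Return an n_frames x (frame_len // 2 + 1) log-magnitude spectrogram.

    Rows are time frames, columns are frequency bins. The signal is first
    forced to the length that yields exactly n_frames frames, so every
    utterance produces a matrix of the same shape (hypothetical defaults).
    """
    target_len = frame_len + hop * (n_frames - 1)
    x = fix_length(signal, target_len)
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1
    spec = []
    for t in range(n_frames):
        start = t * hop
        frame = [x[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT bin k; keeps the sketch dependency-free.
            acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                      for i in range(frame_len))
            row.append(math.log(abs(acc) + eps))
        spec.append(row)
    return spec
```

With 16 kHz audio the defaults correspond to 25 ms frames and a 10 ms hop, and the bins are spaced 16000 / 400 = 40 Hz apart, so a pure 1 kHz tone peaks in bin 25. In practice an FFT routine such as `numpy.fft.rfft` would replace the naive inner loop.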
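The abstract describes the data flow (spectrogram into the CNN, then the LSTM) but not the layer configuration. This hedged illustration of that flow is a toy forward pass, not the authors' architecture. It uses one 3×3 convolution kernel with ReLU for local spectral (spatial) patterns, max-pooling along the frequency axis only so that each remaining row is still one time step, a standard LSTM over those rows for temporal patterns, and a softmax over a hypothetical number of speaker classes. The weights are random and untrained. The kernel size, pooling size, hidden size, and all function names are assumptions.

```python
import math
import random


def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation with 'valid' padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(matrix):
    return [[max(0.0, v) for v in row] for row in matrix]


def max_pool_freq(matrix, size):
    """Max-pool along the frequency (column) axis only; time rows are kept."""
    return [[max(row[j:j + size]) for j in range(0, len(row) - size + 1, size)]
            for row in matrix]


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def lstm_last_hidden(sequence, params, hidden):
    """Run a standard LSTM over the time steps and return the last hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x_t in sequence:
        z = list(x_t) + h  # concatenate current input and previous hidden state
        pre = {}
        for gate in ("i", "f", "o", "g"):
            weights, bias = params[gate]
            pre[gate] = [sum(w * v for w, v in zip(weights[k], z)) + bias[k]
                         for k in range(hidden)]
        i_g = [sigmoid(v) for v in pre["i"]]  # input gate
        f_g = [sigmoid(v) for v in pre["f"]]  # forget gate
        o_g = [sigmoid(v) for v in pre["o"]]  # output gate
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell state
        c = [f_g[k] * c[k] + i_g[k] * g[k] for k in range(hidden)]
        h = [o_g[k] * math.tanh(c[k]) for k in range(hidden)]
    return h


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cnn_lstm_forward(spec, n_speakers, hidden=8, seed=0):
    """Toy CNN-LSTM forward pass: (time x freq) matrix -> speaker probabilities.

    Weights are random (untrained); every size here is a hypothetical choice.
    """
    rng = random.Random(seed)
    kernel = rand_matrix(3, 3, rng, scale=0.5)
    # CNN stage: (T, F) -> (T - 2, (F - 2) // 2) after valid 3x3 conv and pooling.
    fmap = max_pool_freq(relu(conv2d_valid(spec, kernel)), 2)
    feat_dim = len(fmap[0])
    params = {gate: (rand_matrix(hidden, feat_dim + hidden, rng), [0.0] * hidden)
              for gate in ("i", "f", "o", "g")}
    # LSTM stage: each row of the CNN feature map is one time step.
    h_last = lstm_last_hidden(fmap, params, hidden)
    # Classifier: linear layer + softmax over speaker identities.
    w_out = rand_matrix(n_speakers, hidden, rng)
    logits = [sum(w * v for w, v in zip(row, h_last)) for row in w_out]
    return softmax(logits)
```

Pooling only along frequency keeps one feature vector per time step, which is what lets the LSTM read the CNN output as a sequence. Other designs reshape a multi-channel feature map into the same kind of sequence. Training, for example with cross-entropy over speaker labels, is omitted.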
