Abstract
This paper reviews the history and current state of deep learning and speech recognition technology, and sets out the significance and purpose of the research. It describes how acoustic models built with deep learning are introduced into speech recognition, and analyzes the prospects of this approach. By studying deep-learning-based speech recognition methods, the paper aims to improve the efficiency and accuracy of speech recognition and thereby promote the development of the technology.