Research on Low-Rate Speech Coding Algorithms
Abstract
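Low-rate speech coding algorithms are widely used in modern communication systems, and speech compression coding at ultra-low bit rates is currently one of the important research topics in speech signal processing. The sinusoidal excitation linear prediction (SELP) coding algorithm uses a linear-prediction-based sinusoidal mixed excitation technique and performs very well among speech compression coding algorithms at 2.4 kbps and lower rates. The aim of this dissertation is to analyze and study the key techniques of speech coding on the basis of the SELP model, and to design and implement a 150 bps ultra-low-rate speech compression coding algorithm.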
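     The dissertation first proposes efficient quantization algorithms for the feature parameters. For scalar quantization of the line spectral frequency (LSF) parameters, it proposes a globally optimal LSF difference quantization algorithm based on dynamic programming and uses multiple codebooks to further improve quantization performance; the algorithm achieves transparent quantization of the LSF parameters at 28 bits per frame. For vector quantization of the pitch period, it proposes a perceptually weighted distortion criterion that exploits the characteristics of human hearing, which improves quantization performance, and designs an integer optimization algorithm for the codeword search that lowers the probability of selecting the wrong optimal pitch-period codeword.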
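The LSF difference quantization described above can be illustrated with a minimal dynamic-programming sketch. It rests on assumptions that are not taken from the dissertation: LSFs are in Hz, each reconstructed value is the previous one plus a codeword from a small per-dimension scalar codebook of positive integers (which keeps the reconstruction ordered and the set of reachable values finite), and the cost is plain squared error. Because the reconstructed value alone determines every future cost, keeping only the cheapest path into each distinct value gives the same answer as exhaustive search. The function names and codebooks are hypothetical, and "multiple codebooks" is modeled here simply as trying each candidate codebook set and keeping the best.

```python
from typing import Dict, List, Sequence, Tuple


def dp_diff_quantize(lsf: Sequence[float],
                     codebooks: Sequence[Sequence[int]]) -> Tuple[List[int], float]:
    """Globally optimal difference quantization of one LSF vector (hypothetical sketch).

    Reconstruction: rec[i] = rec[i-1] + codebooks[i][idx[i]], with rec[-1] = 0.
    Cost: sum over i of (lsf[i] - rec[i]) ** 2.
    """
    # State = reconstructed value so far -> (accumulated cost, chosen indices).
    states: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for target, cb in zip(lsf, codebooks):
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for rec_prev, (cost, path) in states.items():
            for j, step in enumerate(cb):
                rec = rec_prev + step
                c = cost + (target - rec) ** 2
                # Future costs depend only on rec, so keep the cheapest path into it.
                if rec not in nxt or c < nxt[rec][0]:
                    nxt[rec] = (c, path + [j])
        states = nxt
    cost, path = min(states.values(), key=lambda s: s[0])
    return path, cost


def multi_codebook_quantize(lsf: Sequence[float],
                            codebook_sets: Sequence[Sequence[Sequence[int]]]):
    """Try every codebook set; return (set index, indices, cost) of the best one."""
    results = [(k, *dp_diff_quantize(lsf, cbs)) for k, cbs in enumerate(codebook_sets)]
    return min(results, key=lambda r: r[2])


if __name__ == "__main__":
    # A greedy per-dimension choice picks 300 then 600 (cost 1700);
    # the DP search picks 330 then 630 instead.
    print(dp_diff_quantize([310.0, 640.0], [[300, 330], [300, 400]]))  # ([1, 0], 500.0)
```

The example shows why a global search matters for difference coding: the locally best first codeword forces a poor second reconstruction, while a slightly worse first choice lets the second one land close to its target.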
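     Because ultra-low-rate speech coding leaves too few bits to quantize the feature parameters, the dissertation proposes decoder-side recovery algorithms that exploit the correlation between parameters. It first proposes an energy recovery algorithm based on a hidden Markov model (HMM), which estimates the trajectory of the energy parameter from the LSF parameters and the sub-band unvoiced/voiced (U/V) parameters. It then proposes a U/V recovery algorithm based on a Gaussian mixture model (GMM), which uses the LSF parameters and the normalized energy to estimate the probability distribution of the U/V parameters, thereby saving the bits otherwise needed to quantize them.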
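The HMM-based energy recovery in the paragraph above can be pictured with a toy Viterbi decoder. This is a minimal sketch under hypothetical assumptions, not the dissertation's model: hidden states are a few quantized energy levels, and each frame's observation is a discrete symbol that could, for example, be derived from the received LSF and U/V parameters. All probabilities and level values are invented for illustration.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[int], init: Sequence[float],
            trans: Sequence[Sequence[float]], emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden-state sequence, computed in the log domain to avoid underflow."""
    n = len(init)
    delta = [_log(init[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    back: List[List[int]] = []  # back[t-1][s] = best predecessor of state s at time t
    for o in obs[1:]:
        prev = delta
        ptr, delta = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(trans[r][s]))
            ptr.append(best)
            delta.append(prev[best] + _log(trans[best][s]) + _log(emit[s][o]))
        back.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def recover_energy(obs, init, trans, emit, levels_db: Sequence[float]) -> List[float]:
    """Map decoded hidden states to (hypothetical) energy levels in dB."""
    return [levels_db[s] for s in viterbi(obs, init, trans, emit)]


if __name__ == "__main__":
    # State 0 = low energy, state 1 = high energy; symbol 1 = "voiced-looking" frame.
    init = [0.5, 0.5]
    trans = [[0.8, 0.2], [0.2, 0.8]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    print(recover_energy([1, 1, 0, 0], init, trans, emit, [30.0, 60.0]))
```

The transition matrix is what lets the decoder produce a smooth energy trajectory instead of deciding each frame in isolation.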
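     Next, on the decoder side, an improved interpolation scheme for the feature parameters is proposed to make the vocoder's synthesized speech more natural at transitions between unvoiced and voiced speech. To improve the vocoder's robustness to consecutive packet losses, a packet loss concealment algorithm based on mode-dependent linear prediction is proposed, which improves the quality of the synthesized speech under consecutive packet loss.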
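     Finally, combining the above results, a 150 bps SELP speech coding algorithm is designed and implemented. The synthesized speech achieves an objective mean opinion score (MOS) of 2.424 and a diagnostic rhyme test (DRT) accuracy of 82.9%; the codebook storage is 120 Kwords and the algorithmic delay is 325 ms. Overall, the performance exceeds the requirements of the national special project under the Eleventh Five-Year Plan.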
Low bit rate speech coding is widely used in modern communication systems, and ultra-low bit rate speech compression coding is currently one of the most significant research topics in speech signal processing. The sinusoidal excitation linear prediction (SELP) algorithm uses a linear-prediction-based sinusoidal mixed excitation technique and performs very well among speech compression coding algorithms at bit rates of 2.4 kbps or less. The purpose of this dissertation is to analyze and study the essential techniques of speech coding and to design a 150 bps ultra-low bit rate speech compression coding algorithm based on the SELP model.
     High-efficiency quantization methods for the characteristic parameters are studied first. For scalar quantization of the line spectral frequencies (LSF), a globally optimal LSF difference quantization based on dynamic programming is proposed. It uses multiple codebooks to further improve quantization performance and attains transparent quantization of the LSF at 28 bits/frame. For vector quantization of the pitch parameter, a perceptually weighted distortion measure that exploits the auditory characteristics of the human ear is proposed to improve pitch quantization performance, and an integer optimization technique is developed to further reduce the error rate of the search for the optimal pitch codeword.
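The effect of a perceptually weighted distortion measure on pitch vector quantization can be shown with a small sketch. The weighting used here is an assumption, not the dissertation's measure: the error is taken relative to the true pitch period (since pitch discrimination is roughly proportional to F0, the same absolute error matters more for short periods), with optional per-frame weights that could, for instance, be set to zero for unvoiced frames. Function names and data are hypothetical.

```python
from typing import Callable, Optional, Sequence


def weighted_pitch_distortion(pitch: Sequence[float], codeword: Sequence[float],
                              weights: Optional[Sequence[float]] = None) -> float:
    """Relative squared error between pitch periods (periods must be positive)."""
    if weights is None:
        weights = [1.0] * len(pitch)
    return sum(w * ((p - c) / p) ** 2 for p, c, w in zip(pitch, codeword, weights))


def plain_distortion(pitch: Sequence[float], codeword: Sequence[float]) -> float:
    """Unweighted squared error, for comparison."""
    return sum((p - c) ** 2 for p, c in zip(pitch, codeword))


def search(pitch: Sequence[float], codebook: Sequence[Sequence[float]],
           dist: Callable[[Sequence[float], Sequence[float]], float]) -> int:
    """Full-search VQ: index of the codeword with the smallest distortion."""
    return min(range(len(codebook)), key=lambda k: dist(pitch, codebook[k]))


if __name__ == "__main__":
    pitch = [40.0, 120.0]                       # pitch periods in samples
    codebook = [[50.0, 120.0], [40.0, 135.0]]
    print(search(pitch, codebook, plain_distortion))           # 0
    print(search(pitch, codebook, weighted_pitch_distortion))  # 1
```

Plain squared error prefers codeword 0 (a 10-sample miss on the short period), while the relative measure prefers codeword 1 (a 15-sample miss on the long period, which is proportionally smaller).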
     In ultra-low bit rate speech coding, the bits assigned to each frame are severely inadequate for quantizing the characteristic parameters. To solve this problem, decoder-side recovery algorithms for the characteristic parameters are proposed based on the correlation between different parameters. First, the energy is recovered using a hidden Markov model (HMM), which uses the LSF and the sub-band unvoiced/voiced (U/V) parameters to estimate the trajectory of the energy parameter. Then a U/V recovery algorithm based on a Gaussian mixture model (GMM) is proposed, which uses the LSF and the normalized energy to estimate the probability distribution of the U/V parameters, thereby saving the bits that would otherwise be spent quantizing them.
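A minimal sketch of GMM-based U/V recovery at the decoder, assuming each U/V class (or, more generally, each sub-band voicing pattern) has its own diagonal-covariance GMM trained offline on features such as LSF-derived values and normalized energy. The decoder then picks the class with the highest prior times likelihood. The two-dimensional features, model parameters, and function names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (mixture weight, mean vector, diagonal variance vector)
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], comps: List[Component]) -> float:
    """log sum_k w_k N(x; mu_k, Sigma_k), using the log-sum-exp trick."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in comps]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def recover_uv(x: Sequence[float], models: Dict[str, List[Component]],
               priors: Dict[str, float]) -> str:
    """MAP choice of the U/V class given the received features x."""
    return max(models, key=lambda c: math.log(priors[c]) + gmm_log_likelihood(x, models[c]))


if __name__ == "__main__":
    # Features: [LSF-derived value, normalized energy] (illustrative only).
    models = {
        "U": [(0.5, [0.30, 0.10], [0.01, 0.01]), (0.5, [0.50, 0.20], [0.01, 0.01])],
        "V": [(0.6, [0.20, 0.80], [0.01, 0.01]), (0.4, [0.25, 0.60], [0.01, 0.01])],
    }
    priors = {"U": 0.5, "V": 0.5}
    print(recover_uv([0.22, 0.75], models, priors))  # V
    print(recover_uv([0.45, 0.15], models, priors))  # U
```

Since the U/V decision is inferred from parameters the decoder already has, no bits need to be transmitted for it; the cost is that misclassification is possible where the class distributions overlap.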
     On the decoder side, an improved interpolation algorithm for the characteristic parameters is developed to improve the naturalness of the synthesized speech in the transition from unvoiced to voiced speech. To improve the vocoder's resistance to packet loss, a mode-based linear prediction packet loss concealment algorithm is proposed, which improves the quality of the synthesized speech under consecutive packet loss.
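Mode-based predictive packet loss concealment can be sketched as follows. This is a hypothetical illustration, not the dissertation's algorithm: the mode is assumed to be the voicing class of the last correctly received frame, each mode has its own long-term LSF mean and first-order prediction coefficient, and energy is faded by a fixed number of dB per consecutive lost frame. All names and values are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def conceal_frame(last_lsf: Sequence[float], last_energy_db: float, mode: str,
                  n_lost: int, mode_mean: Dict[str, Sequence[float]],
                  mode_coef: Dict[str, float],
                  decay_db: float = 3.0) -> Tuple[List[float], float]:
    """Predict LSF and energy for the n_lost-th consecutive lost frame (n_lost >= 1)."""
    # Applying x_hat = mu + a * (x - mu) n_lost times gives mu + a**n_lost * (x - mu),
    # so the prediction drifts toward the mode's mean as the loss burst grows.
    a = mode_coef[mode] ** n_lost
    mu = mode_mean[mode]
    lsf = [m + a * (x - m) for x, m in zip(last_lsf, mu)]
    # Fade the energy during long bursts to avoid sustained artifacts.
    energy_db = last_energy_db - decay_db * n_lost
    return lsf, energy_db


if __name__ == "__main__":
    means = {"voiced": [500.0, 1500.0], "unvoiced": [800.0, 2200.0]}
    coefs = {"voiced": 0.5, "unvoiced": 0.3}
    for n in (1, 2):
        print(conceal_frame([600.0, 1700.0], 60.0, "voiced", n, means, coefs))
```

With 0 <= a <= 1 the prediction is a convex combination of two ordered LSF vectors, so the concealed LSFs remain ordered and the synthesis filter stays stable in that respect.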
     Finally, integrating the research results above, the 150 bps SELP speech coding algorithm is designed and implemented. The vocoder's objective mean opinion score (MOS) is 2.424, the diagnostic rhyme test (DRT) accuracy is 82.9%, the codebook size is 120 Kwords, and the algorithmic delay is 325 ms. Overall, the performance of the 150 bps SELP vocoder exceeds the requirements of the national Eleventh Five-Year major project.
