Research on an Objective Evaluation Platform for Speech Codecs Based on the MNB2 Algorithm
Abstract
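With the rapid development of information technology in the communications field, speech codecs have emerged in large numbers. To test the quality of these codecs, researchers have established many subjective evaluation methods for comprehensive testing. Subjective evaluation has the advantage of matching listeners' real perception of speech quality, but it consumes a great deal of time, manpower, and money, lacks flexibility, has poor repeatability and stability, and is strongly affected by the subjective factors of individual listeners. An objective evaluation method is therefore needed to overcome these shortcomings. Research on objective speech quality evaluation can also greatly advance related disciplines and fields such as speech recognition, speech coding, speaker emotion analysis, speech enhancement, and speech security.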
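This thesis first surveys the subjective and objective evaluation methods, compares the strengths of the objective methods, and looks ahead to the directions in which objective evaluation may develop. It then studies the MNB2 algorithm, which the thesis describes as the most advanced input/output-based objective evaluation algorithm at the time of writing. The algorithm combines a detectability model and a discriminability model, moving beyond the single-model design of earlier objective evaluation algorithms. The thesis examines the psychoacoustic principles behind the algorithm and the construction of the evaluation algorithm for telecommunication-band speech codecs, and reports a MATLAB simulation study together with a comprehensive set of experiments. The experiments examined how accurately the algorithm evaluates different speech codecs, how well it adapts to different languages, and how well it adapts to background noise. More than 1,500 speech segments with a total length of about 16,000 seconds were processed more than 5,000 times, yielding over 4,000 data points. The results show that the algorithm is highly adaptable and practical, and that it can serve as an important reference metric for evaluating the quality of new codecs. Building on the experimental study, the algorithm model was simplified, and the simplified model was tested as intensively as the complete model. The results show that evaluations made with the simplified model agree closely with those of the complete model, while the simplified model saves some of the storage the algorithm requires and is more straightforward to implement.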
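Finally, the algorithm was implemented in C++, and on this basis an objective evaluation platform for speech codecs, integrating the algorithm, was developed for the Windows 2000 operating system. The platform was developed with OOP (object-oriented programming) techniques and modeled with UML (Unified Modeling Language). DirectSound, part of DirectX, and multithreading were used to play large Wave files efficiently and to display speech waveforms, laying a good foundation for future extensions of the platform.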
With the rapid development of information technology in the communications field, many speech codecs have emerged. To measure the quality of these codecs, researchers have devised many subjective listening test plans that examine all aspects of a speech codec. The prime advantage of subjective evaluation is that its results represent real human perception. However, subjective listening tests are time-consuming, labor-intensive, expensive, inflexible, hard to repeat, and unstable. Objective evaluation is therefore needed to avoid the disadvantages of subjective listening tests. The study of objective evaluation can also greatly benefit related research fields, such as speech recognition and speech enhancement.
First, this dissertation summarizes the subjective and objective evaluation methods and describes the features of each objective method. It then studies the MNB2 algorithm, which the dissertation describes as the world's most advanced I/O-based objective evaluation algorithm. MNB2 combines a detectability model and a judgment model, improving on the single-model designs of earlier researchers. The dissertation examines the psychoacoustic principles of the algorithm and its concrete structure for telecommunication-band speech codecs, and then simulates the algorithm in MATLAB. During the simulation study, over 1,500 speech segments of about 11 s each were processed more than 5,000 times, yielding about 4,000 data points. The experiments covered the accuracy of the algorithm, the differences between evaluation results for different languages, and the effect of background noise on the evaluation. The results demonstrate that the algorithm is practical and can be applied to the evaluation of new speech codecs. A reduced MNB2 model, derived from the earlier experiments, was also examined. The results show that the reduced model and the complete model produce almost the same evaluation data, while the reduced model uses less memory.
Finally, a C++ implementation of the algorithm was integrated into an objective evaluation platform for speech codecs. OOP (object-oriented programming), UML (Unified Modeling Language), DirectX, and multithreading techniques were used in the development of this platform.
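The agreement between the reduced and complete models reported above could be quantified, for example, with a Pearson correlation coefficient over paired scores. The sketch below is only an illustration under stated assumptions: the abstract does not say which agreement statistic was used, the function name `pearson` is hypothetical, and the score lists are placeholders rather than data from the dissertation.

```python
import math


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores for five test conditions (hypothetical, NOT thesis data).
full_model = [0.91, 0.74, 0.55, 0.32, 0.18]
reduced_model = [0.90, 0.76, 0.53, 0.35, 0.17]

r = pearson(full_model, reduced_model)
print(f"correlation between full and reduced model scores: {r:.4f}")
```

A coefficient close to 1 would indicate that the two models rank and scale the test conditions almost identically. The same function could be applied to objective scores against subjective listening-test scores to gauge evaluation accuracy.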
