Research on Computational Auditory Scene Analysis of Concurrent Speech
Abstract
Computational auditory scene analysis (CASA) aims to give computers the human ear's ability to process sound, that is, to segregate and interpret it, by simulating the psychological and physiological processes of human hearing. It is an emerging interdisciplinary research topic. This dissertation studies the scene analysis of concurrent speech signals and establishes a preliminary but complete CASA system for concurrent speech. Its main novel contributions are as follows:
    1. To address the poor separation quality, complicated procedure, and heavy computation of existing CASA systems for concurrent speech, we draw on the Multi-Band Excitation (MBE) technique and propose a Double-Pitch Multi-Band Excitation Scene Analysis Model (DP-MBE SAM), together with a CASA system for concurrent voiced speech based on it. The model takes the two pitch tracks as cues and estimates the parameters of the mixed speech jointly, rather than mechanically decomposing the mixture with an array of fixed-bandwidth filters. DP-MBE SAM therefore adapts to the time-varying frequency structure of speech, and its separation comes closer to the robustness and flexibility of the human auditory system; unlike traditional CASA systems, the proposed system separates concurrent speech signals effectively. Because every parameter of the mixture is tied to one of the two pitches, the parameters fall into two groups as soon as they are estimated, which removes the complicated grouping stage of traditional CASA systems and greatly reduces the computational load. The DP-MBE SAM mechanism also extends to the scene analysis of more than two speech signals, and the speech parameters, as a set of digital information, are well suited to serve as the mid-level representation linking the low-level auditory system to the high-level brain. Experimental results show that the system effectively separates concurrent speech signals with different pitches; a sketch of the per-frame spectral model appears below.
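    To make the double-pitch idea concrete, the displays below are a minimal sketch of the kind of per-frame model DP-MBE SAM implies, assuming the standard MBE formulation from the vocoder literature; the notation (window spectrum E, pitches \omega_1 and \omega_2, harmonic amplitudes A) is illustrative rather than the dissertation's own:

    S_w(\omega) \;\approx\; \sum_{k=1}^{K_1} A_{1,k}\, E(\omega - k\omega_1) \;+\; \sum_{l=1}^{K_2} A_{2,l}\, E(\omega - l\omega_2)

    \{\hat A_{1,k}, \hat A_{2,l}\} \;=\; \arg\min_{A} \int \Big| S_w(\omega) - \sum_{k} A_{1,k}\, E(\omega - k\omega_1) - \sum_{l} A_{2,l}\, E(\omega - l\omega_2) \Big|^2 \, d\omega

    Here S_w is the spectrum of one windowed frame of the mixture and E is the spectrum of the analysis window. Because each estimated amplitude belongs to a specific harmonic of a specific pitch, the parameters are sorted into two streams by construction, which is exactly why the separate grouping stage can be skipped.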
    2. To address the remaining problems of the DP-MBE SAM based CASA system for concurrent voiced speech, we propose an improved CASA system based on DP-MBE SAM, with two parts. First, for the parameter ambiguity that arises in practice when the parameter-estimation matrix becomes singular, and for the ambiguity caused by coinciding harmonic frequencies of the mixed signals, we draw on the multi-frame interpolation method and propose an improved DP-MBE SAM. Second, an unvoiced-speech separation stage is added, extending the system's application from purely voiced speech to speech containing both voiced and unvoiced segments. Applying the improved system to the scene analysis of two concurrent speech signals, experimental results demonstrate its effectiveness; the condition that triggers the ambiguity, and the interpolation constraint used to resolve it, are sketched below.
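    As a hedged illustration of the ambiguity and its repair (again with illustrative notation, not the dissertation's exact formulation): when a harmonic of one pitch lands on a harmonic of the other, k\omega_1 \approx l\omega_2, the normal matrix of the least-squares problem above loses rank, and the split of the shared spectral peak between A_{1,k} and A_{2,l} is undetermined. A multi-frame interpolation constraint of the form

    \hat A_{1,k}(t) \;\approx\; \tfrac{1}{2}\big( \hat A_{1,k}(t-1) + \hat A_{1,k}(t+1) \big)

    borrows the amplitude trajectory from neighbouring frames, where the two harmonic combs generally no longer coincide, and thereby resolves the ambiguity.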
    3. To address the problems of Meddis' auditory psychophysiology based algorithm for extracting the pitches of concurrent speech, we propose a new psychophysiologically based pitch extraction algorithm for concurrent speech. A closed-loop adaptive extraction module, together with a corresponding procedure for determining candidate pitches, improves the robustness of the candidate-pitch search, and the candidate pitches are then used to re-partition the frequency bands, which effectively improves the extraction accuracy. Experimental results confirm that the new algorithm has the robustness and flexibility that auditory scene analysis requires, and the extracted pitches can serve as grouping cues for sound organization in a CASA system. A minimal sketch of the closed loop follows.
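    The dissertation's algorithm is not spelled out in this abstract, so the following is only a minimal Python sketch of the closed-loop idea, under the assumption that it operates like Meddis-style summary-autocorrelation pitch extraction with cancellation: estimate a first pitch from the summary autocorrelation of a cochlear channel bank, re-partition the channels by discarding those the first pitch dominates, and re-estimate on the remainder. All names below are hypothetical.

    import numpy as np

    def summary_acf(channels, max_lag):
        # channels: list of 1-D arrays, one per cochlear filterbank channel,
        # each holding one analysis frame of at least max_lag samples.
        acf = np.zeros(max_lag)
        for x in channels:
            x = x - x.mean()
            full = np.correlate(x, x, mode="full")[len(x) - 1:]
            acf += full[:max_lag]  # sum channel ACFs into a summary ACF
        return acf

    def pick_pitch(acf, fs, fmin=80.0, fmax=400.0):
        # Strongest summary-ACF peak inside the plausible pitch-lag range.
        lo, hi = int(fs / fmax), int(fs / fmin)
        return fs / (lo + int(np.argmax(acf[lo:hi])))

    def dual_pitch(channels, fs, max_lag=400):
        # Closed loop: pitch 1 from all channels, then drop the channels
        # dominated by pitch 1 and re-estimate pitch 2 on the rest.
        f1 = pick_pitch(summary_acf(channels, max_lag), fs)
        lag1 = int(round(fs / f1))
        keep = []
        for x in channels:
            x0 = x - x.mean()
            full = np.correlate(x0, x0, mode="full")[len(x0) - 1:]
            if full[lag1] < 0.5 * full[0]:  # weak peak at lag1: not dominated
                keep.append(x)
        f2 = pick_pitch(summary_acf(keep, max_lag), fs) if keep else None
        return f1, f2

    In the system described above, the candidate pitches found this way would then serve as the grouping cues fed to DP-MBE SAM.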