Structure Elucidation of Organic Compounds from Infrared Spectra Based on Support Vector Machines
Abstract
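The informatization of the natural and technical sciences is an important trend in the development of science and technology. The accumulation of large amounts of scientific data often leads to the discovery of important scientific laws, which creates opportunities for data-mining research in chemometrics. For decades, researchers have explored how to extract as much information as possible from infrared spectra and how to codify the experience of interpretation. With the computerization of commercial infrared spectrometers, many computer-assisted methods for infrared spectral identification have appeared. They fall roughly into three categories: expert systems, spectral library search systems, and pattern recognition methods. The most commonly used pattern recognition methods are artificial neural networks and partial least squares. Most of the literature uses them to identify substructures or specific classes of compounds; the infrared spectra of organic compounds as a whole have not been studied in depth, nor have the characteristic absorption bands of compounds been discussed thoroughly. Moreover, even artificial neural networks, the most widely applied method, do not predict structural fragments very accurately when identifying substructures, and they suffer from instability, a tendency to become trapped in local minima, and slow convergence.
     This work attempts to use the support vector machine (SVM) algorithm to explore regularities in the infrared spectra of organic compounds. Based on the differences in infrared absorption among classes of organic compounds, a hierarchical system was designed to classify 6352 organic compounds from the OMNIC database. The system first divides the compounds into four major classes: aromatic compounds, hydrocarbons, oxygen-containing compounds, and nitrogen-containing compounds. It then subdivides each class according to its infrared spectral features: aromatic compounds into four classes by substitution type and neighboring functional groups; hydrocarbons into saturated and unsaturated hydrocarbons; oxygen-containing compounds into four classes by the functional group attached to the oxygen atom (hydroxyl compounds, carbonyl compounds, ethers, and acids); and nitrogen-containing compounds, likewise by their spectral features, into hydrazines, amides, aromatic amines, and aliphatic amines. A still finer classification was then made according to the infrared absorption features of each class. The SVM results were compared with those of artificial neural networks; for most organic compounds, the SVM outperformed the neural network. On this basis, the SVM was used to study the identification of aromatic compounds in detail. Aromatic compounds have five characteristic frequency regions: the stretching vibration of the ring =C-H bond, the overtone and combination bands of the out-of-plane vibration of the ring =C-H bond, the ring skeletal vibration, the in-plane bending vibration of the ring =C-H bond, and the out-of-plane bending vibration of the ring =C-H bond. The effect on recognition ability of using spectral segments from these five regions, and combinations of them, as SVM inputs was examined, and the results were compared.
     The results show that the SVM outperforms the artificial neural network in recognizing the structures of organic compounds, which indicates that the SVM is well suited to studying spectrum-structure relationships in infrared spectroscopy. In the discussion of spectrum-structure relationships for aromatic compounds, among the five vibrational modes of benzene, the C-H and C-C out-of-plane bending vibrations are the most informative for distinguishing the substitution types of benzene derivatives, which agrees with classical infrared theory. When predictions from segmental spectra and the full spectrum were compared, the best results were not always obtained from the full spectrum. This conclusion offers a new direction for deeper mining of infrared spectral information.
     The SVM performs well in the field of infrared spectroscopy and is a useful tool for computer-assisted interpretation of infrared spectra. Using spectral segments that contain characteristic peaks for spectral recognition offers a new approach to the computer interpretation of infrared spectra, and lays a foundation for extracting as much infrared spectral information as possible and, ultimately, for fully computerized spectral interpretation.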
An important trend in the development of science and technology is its informatization. Historically, the accumulation of scientific data has often led to the discovery of important scientific laws, which gives chemometrics an opportunity for data mining. As infrared spectral databases grow and infrared and computer technology advance, it has become urgent to find ways to use infrared spectra more fully and to broaden their application. With the computerization of commercial infrared spectrometers, many computer-assisted methods for interpreting infrared spectra have emerged. Automatic structure elucidation from infrared spectra generally falls into three groups: library search, knowledge-based systems, and pattern recognition. Within the last group, artificial neural networks (ANNs) and partial least squares (PLS) are used most frequently. Automatic interpretation by pattern recognition techniques such as ANNs has focused mainly on predicting specific substructures, while the classification of organic compounds as a whole and the absorption bands of compounds have been neglected. Furthermore, ANNs have several major drawbacks: instability, local minima, and very slow convergence. This work therefore examines the regularities in the infrared spectra of organic compounds.
     A recently popular learning algorithm, the support vector machine (SVM), is used to build classifiers for a hierarchical classification of 6352 compounds. In this system, the organic compounds were first separated into four classes: aromatic compounds, hydrocarbons, oxygen-containing compounds, and nitrogen-containing compounds. Each class was then subdivided according to the characteristics of its infrared spectra: aromatic compounds into four kinds on the basis of the substitution type and the functional groups adjacent to the benzene ring; hydrocarbons into saturated and unsaturated hydrocarbons; oxygen-containing compounds into four classes (hydroxyl compounds, carbonyl compounds, ethers, and carboxylic acids); and nitrogen-containing compounds into aliphatic amines, aromatic amines, amides, and hydrazones. Next, a more detailed separation was carried out within each of these classes according to their characteristic infrared absorptions. The SVM results compared favorably with those obtained with artificial neural networks; the SVM showed better performance.
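The two upper levels of the hierarchy described above can be sketched as a small tree in which each node routes a spectrum to one of its children. This is only an illustration under stated assumptions: `Node`, `argmax_stub`, `build_tree`, and `predict_path` are hypothetical names, and the stub classifiers merely read scores from the input vector, standing in for the trained SVMs. The four aromatic subclasses are not named in the abstract, so the aromatic node is left as a leaf.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Spectrum = Sequence[float]


@dataclass
class Node:
    """One decision point in the hierarchy (hypothetical structure)."""
    name: str
    # Maps a spectrum to the name of one child; None for leaves.
    classify: Optional[Callable[[Spectrum], str]] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def argmax_stub(labels: List[str], offset: int) -> Callable[[Spectrum], str]:
    """Toy stand-in for a trained classifier (assumption, not an SVM):
    returns the label whose score, read from the input vector, is largest."""
    def classify(x: Spectrum) -> str:
        scores = x[offset:offset + len(labels)]
        best = max(range(len(labels)), key=lambda i: scores[i])
        return labels[best]
    return classify


def build_tree() -> Node:
    """Build the first two levels named in the abstract."""
    top = ["aromatic", "hydrocarbon", "oxygen-containing", "nitrogen-containing"]
    sub = {
        "hydrocarbon": ["saturated", "unsaturated"],
        "oxygen-containing": ["hydroxyl", "carbonyl", "ether", "carboxylic acid"],
        "nitrogen-containing": ["aliphatic amine", "aromatic amine",
                                "amide", "hydrazone"],
    }
    root = Node("organic compound", argmax_stub(top, 0))
    for name in top:
        kids = sub.get(name, [])
        node = Node(name, argmax_stub(kids, 4) if kids else None)
        node.children = {k: Node(k) for k in kids}
        root.children[name] = node
    return root


def predict_path(root: Node, x: Spectrum) -> List[str]:
    """Descend from the root, applying one classifier per level."""
    path: List[str] = []
    node = root
    while node.children:
        node = node.children[node.classify(x)]
        path.append(node.name)
    return path


tree = build_tree()
# Toy input: first four values score the top level, next four the sublevel.
example = [0.1, 0.2, 0.9, 0.3, 0.0, 0.8, 0.1, 0.2]
print(predict_path(tree, example))
```

In a real system each `classify` callable would wrap a classifier trained on the spectra that reach that node, so an error at an upper level propagates to every level below it.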
     In addition, aromatic compounds were studied in more detail with the SVM. Aromatic compounds show five characteristic infrared absorptions: the C-H stretching vibration, the overtone and combination bands of the benzene ring, the C=C stretching vibration, the in-plane C-H bending vibration, and the out-of-plane C-H bending vibration. The five spectral segments of aromatic compounds, and various combinations of these segments, were each fed to an SVM to build classifiers.
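A minimal sketch of how such segment inputs might be assembled is given below. The abstract does not state the wavenumber limits of the five regions, so the windows in `REGIONS` are approximate textbook values and are an assumption; the function names and the synthetic spectrum are likewise hypothetical. Enumerating every non-empty combination of five segments gives 31 candidate input sets.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Approximate textbook windows in cm^-1 (assumed, not taken from the source).
REGIONS: Dict[str, Tuple[float, float]] = {
    "CH_stretch": (3000.0, 3100.0),
    "overtone_combination": (1650.0, 2000.0),
    "ring_stretch": (1450.0, 1600.0),
    "CH_in_plane": (1000.0, 1300.0),
    "CH_out_of_plane": (675.0, 900.0),
}


def extract_segment(wavenumbers: Sequence[float],
                    absorbances: Sequence[float],
                    window: Tuple[float, float]) -> List[float]:
    """Keep the absorbance values whose wavenumber lies inside the window."""
    lo, hi = window
    return [a for w, a in zip(wavenumbers, absorbances) if lo <= w <= hi]


def segment_combinations(names: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty combinations of the named segments."""
    return [c for r in range(1, len(names) + 1)
            for c in combinations(names, r)]


def build_input(wavenumbers: Sequence[float],
                absorbances: Sequence[float],
                names: Sequence[str]) -> List[float]:
    """Concatenate the chosen segments into one classifier input vector."""
    features: List[float] = []
    for name in names:
        features.extend(extract_segment(wavenumbers, absorbances, REGIONS[name]))
    return features


# Synthetic spectrum sampled every 50 cm^-1 from 400 to 4000 cm^-1.
wn = list(range(400, 4001, 50))
ab = [0.01 * i for i in range(len(wn))]

combos = segment_combinations(list(REGIONS))
vector = build_input(wn, ab, ["CH_stretch", "CH_out_of_plane"])
print(len(combos), len(vector))
```

Each combination would then be used to train and evaluate a separate classifier, which is what allows segment inputs to be compared with the full spectrum.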
     The results showed that the SVM performed appreciably better than the ANN in distinguishing the organic compounds, which suggests that the SVM approach can be an efficient tool for extracting information from infrared spectra. Analysis of each spectral segment showed that, of the five absorptions of aromatic compounds, the C–H and C–C out-of-plane bending vibration was the most important vibrational mode for distinguishing the substitution types of ordinary benzene derivatives, which agrees with known results. When the results from segmental and entire spectra were compared, we found that some compounds can be recognized well from only one or two spectral segments. This means that some segments may carry the most significant structural information contained in the entire spectrum. In other words, the best results in computer-assisted interpretation of infrared spectra are not always obtained from the entire spectrum.
     As a tool for spectral interpretation, the support vector machine shows excellent performance in the field of infrared spectroscopy. This work provides quantitative methods and introduces a new strategy for building an intelligent system for infrared spectral interpretation.
