Analysis and Prediction of RNA-binding Sites in Protein Molecules

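English title: Analysis and Prediction of RNA-binding Residues in Protein Molecules
Author: Zha Lei (查磊)
Degree: Doctoral
Discipline: Biochemistry and Molecular Biology
Keywords (Chinese): protein-RNA interaction; binding site; prediction model; preference
Keywords (English): Protein-RNA interaction; binding site; prediction model; bias
Degree year: 2012
Supervisors: Li Wuju (李伍举); Ying Xiaomin (应晓敏)
Discipline code: 071010
Degree-granting institution: Academy of Military Medical Sciences of the Chinese People's Liberation Army
Thesis submission date: 2012-05-21
Chair of the defense committee: Xu Ningzhi (徐宁志)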

Abstract

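Interactions between proteins and RNA occur widely in RNA splicing, translation, viral replication and other biological processes in the cell. Identifying the amino acid residues in a protein that bind RNA is therefore important for understanding the mechanisms of protein-RNA interaction.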
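     Current research on this problem proceeds along two lines: experimental and bioinformatic. Experimentally, the three-dimensional structure of a protein-RNA complex is determined by X-ray crystallography, nuclear magnetic resonance (NMR) or similar methods, and the residues that interact with RNA are then identified from that structure. Experimental results are reliable, but they are costly in time and money and face practical obstacles; for example, crystals of some protein-RNA complexes are very difficult to obtain.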
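     As protein structure data have accumulated, researchers have begun to study the problem from a bioinformatics perspective. The approaches fall into three categories: inference from RNA-binding domains, molecular dynamics simulation, and statistical analysis or machine learning. In the domain-based approach, structure databases such as SCOP are searched to locate the RNA-binding domains of a protein, which roughly delimits its RNA-binding sites. This approach, however, applies only to proteins whose RNA-binding domains have already been characterized. Moreover, the mechanisms of RNA-binding domains are not yet fully understood: residues in a domain sometimes bind regions of the RNA other than the target region, or even cause the protein to bind another protein. Molecular dynamics simulation lets one observe the binding process fairly directly, together with the accompanying changes in energy and conformation, but simulations are time-consuming, practical only for small systems, and sensitive to parameter settings. At present, the most suitable bioinformatic route is to extract diverse features and build a machine learning model that discriminates binding from non-binding residues. The growth of three-dimensional structure data for protein-RNA complexes in recent years, especially since 2005, provides the data foundation for this approach, and researchers have begun related work on that basis. Previous studies, however, have the following shortcomings: ① small datasets, some based on only a dozen or so complexes, so that their conclusions may be biased; ② few features, with some studies considering only a single feature or a single class of features; ③ some features must be derived from three-dimensional structures, or are complex physicochemical properties that are hard to compute, which limits their applicability.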
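     Given these shortcomings, we argue that what is still missing is a prediction model for RNA-binding residues that ① is built on a large-scale dataset, to avoid potentially biased conclusions; ② integrates diverse features, to improve classification accuracy; and ③ relies only on features that can be obtained or computed from the protein sequence, so that the model is of practical use.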
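     To this end, we first extracted from the PDB all complexes released up to June 2011 that were determined by X-ray crystallography at a resolution better than 3 Å and contain only protein and RNA, 532 in total. After removing 90 complexes whose nucleic acid sequences were too short (4 nucleotides or fewer) or contained errors, 442 complexes remained, comprising 1,970 protein sequences and 823 RNA sequences. Because many of the protein sequences are highly similar, we clustered them with BLASTClust at a 25% sequence-identity threshold to avoid redundancy, obtaining 429 clusters. Taking the first sequence of each cluster as its representative gave 429 protein sequences containing 90,735 amino acid residues.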
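BLASTClust reports one cluster per line, with the identifiers of the member sequences separated by whitespace, so choosing "the first sequence of each cluster" amounts to taking the first token on each line. A minimal sketch, assuming that output layout; the chain identifiers in the example are made up:

```python
def cluster_representatives(blastclust_output: str) -> list:
    """Return the first sequence ID on each non-empty line of BLASTClust output.

    Assumes one cluster per line, with member IDs separated by whitespace.
    """
    representatives = []
    for line in blastclust_output.splitlines():
        members = line.split()
        if members:  # ignore blank lines
            representatives.append(members[0])
    return representatives


if __name__ == "__main__":
    # Hypothetical chain identifiers forming three clusters.
    example = "chainA chainB chainC\nchainD\n\nchainE chainF\n"
    print(cluster_representatives(example))  # ['chainA', 'chainD', 'chainE']
```

Which member BLASTClust lists first depends on the program, so "first sequence" is a convenient rule rather than a quality criterion such as best resolution.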
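     Using the PDB structures, we identified RNA-interacting residues with a distance criterion: an amino acid in a protein-RNA complex is considered to interact with RNA if at least one of its atoms lies within 3.5 Å of any atom of an RNA nucleotide. By this criterion, 10,525 of the 90,735 residues were labelled binding sites and the remaining 80,210 were labelled non-binding sites.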
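The distance criterion above can be sketched in plain Python as follows. The input layout (a dict from a residue identifier to its atom coordinates, plus a flat list of RNA atom coordinates) is an assumption for illustration; parsing the PDB files is omitted, and for whole complexes the brute-force double loop would normally be replaced by a spatial index such as a k-d tree.

```python
import math

CUTOFF_ANGSTROM = 3.5  # distance threshold stated in the text


def label_binding_residues(protein_residues, rna_atoms, cutoff=CUTOFF_ANGSTROM):
    """Label each residue True (RNA-binding) or False (non-binding).

    protein_residues: dict mapping a residue ID to a list of (x, y, z) atom coordinates
    rna_atoms: list of (x, y, z) coordinates of every RNA atom in the complex
    A residue binds RNA if at least one of its atoms is closer than `cutoff`
    to at least one RNA atom.
    """
    return {
        res_id: any(math.dist(a, r) < cutoff for a in atoms for r in rna_atoms)
        for res_id, atoms in protein_residues.items()
    }


if __name__ == "__main__":
    residues = {
        "ARG12": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # hypothetical coordinates
        "LEU40": [(10.0, 0.0, 0.0)],
    }
    rna = [(4.0, 0.0, 0.0)]
    print(label_binding_residues(residues, rna))  # {'ARG12': True, 'LEU40': False}
```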
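     After the binding sites were determined, we extracted for each amino acid 150 features in 9 classes: ① the number of atoms in the residue; ② its net electrostatic charge; ③ its number of potential hydrogen bonds; ④ the side-chain pKa; ⑤ hydrophobicity; ⑥ relative solvent-accessible surface area; ⑦ the secondary structure state of the residue; ⑧ a modified PSSM; ⑨ a classification of amino acids by dipole moment and side-chain volume.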
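The abstract does not say how these per-residue properties were turned into model inputs. One common approach, assumed here purely for illustration, is to describe each residue by the property values of the residues in a sliding window centred on it. The sketch does this for hydrophobicity using the Kyte-Doolittle scale; the window size and the zero padding at chain ends are hypothetical choices, not values from the thesis.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(sequence, half_window=3, pad=0.0):
    """Return one feature row per residue: the hydropathy values of the
    2 * half_window + 1 residues centred on it. Positions beyond the chain
    ends, and non-standard residues, get the value `pad`."""
    n = len(sequence)
    features = []
    for i in range(n):
        row = []
        for j in range(i - half_window, i + half_window + 1):
            if 0 <= j < n:
                row.append(KYTE_DOOLITTLE.get(sequence[j], pad))
            else:
                row.append(pad)
        features.append(row)
    return features


if __name__ == "__main__":
    print(window_hydropathy("MKR", half_window=1))
    # [[0.0, 1.9, -3.9], [1.9, -3.9, -4.5], [-3.9, -4.5, 0.0]]
```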
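     Using TClass, a classification program we developed, we performed automatic feature selection and model construction with the Naïve Bayes method and a forward feature-selection strategy, taking leave-one-out cross-validation accuracy as the criterion for model performance. We also applied attribute bagging for ensemble learning to further improve performance. On an independent test set, the model achieved an accuracy of 83.46%, a specificity of 84.33% and a sensitivity of 78.55%. Applied to the Xlrbpa protein, the model predicted binding sites that overlap well with the experimentally determined RNA-binding sites, confirming the practical effectiveness and applicability of the work.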
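TClass itself is not reproduced here. The following is an independent sketch of the same combination of ideas: a Naïve Bayes classifier, greedy forward feature selection, and leave-one-out cross-validation (LOOCV) accuracy as the selection criterion. Modelling each feature with a per-class Gaussian is an assumption; TClass may estimate the likelihoods differently.

```python
import math
from collections import defaultdict


def fit_gnb(X, y, feats):
    """Per-class prior, mean and variance for the selected feature indices."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    stats = {}
    for c, rows in by_class.items():
        params = []
        for f in feats:
            vals = [r[f] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6  # avoid zero variance
            params.append((mu, var))
        stats[c] = (len(rows) / len(y), params)
    return stats


def predict_gnb(stats, x, feats):
    """Return the class with the highest log posterior."""
    best, best_lp = None, -math.inf
    for c, (prior, params) in stats.items():
        lp = math.log(prior)
        for f, (mu, var) in zip(feats, params):
            lp += -0.5 * math.log(2 * math.pi * var) - (x[f] - mu) ** 2 / (2 * var)
        if lp > best_lp:
            best, best_lp = c, lp
    return best


def loocv_accuracy(X, y, feats):
    """Refit on all samples but one, test on the held-out sample, repeat."""
    correct = 0
    for i in range(len(y)):
        stats = fit_gnb(X[:i] + X[i + 1:], y[:i] + y[i + 1:], feats)
        correct += predict_gnb(stats, X[i], feats) == y[i]
    return correct / len(y)


def forward_select(X, y, max_feats=None):
    """Greedily add the feature that most improves LOOCV accuracy."""
    remaining = list(range(len(X[0])))
    selected, best_acc = [], 0.0
    while remaining and (max_feats is None or len(selected) < max_feats):
        acc, f = max((loocv_accuracy(X, y, selected + [f]), f) for f in remaining)
        if acc <= best_acc:  # stop when no candidate improves accuracy
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [1.0, 2.0], [1.1, 4.0], [0.9, 3.5]]
    y = [0, 0, 0, 1, 1, 1]
    print(forward_select(X, y))  # ([0], 1.0)
```

LOOCV refits the model once per sample, so on a dataset of roughly 90,000 residues this naive loop would be slow; it is meant only to make the procedure concrete.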
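     By analysing the relationship between feature values and RNA-binding preference, we reached the following conclusions: ① RNA binds amino acids with a strong preference: the most favoured amino acid (Arg) occurs 38 times as often as the least favoured (Cys); ② hydrophilic amino acids (e.g. Arg, Lys) are favoured over hydrophobic ones (e.g. Cys, Met), with a 4.38-fold difference in occurrence; ③ in terms of R-group polarity and net charge, positively charged polar amino acids (e.g. Arg, Lys) are the most favoured and non-polar amino acids (e.g. Trp, Met, Phe) the least favoured; ④ in the classification by dipole moment and side-chain volume, amino acids with a dipole moment greater than 3.0 D and a side-chain volume greater than 50 Å³ (e.g. Arg, Lys) are relatively favoured, whereas those with a dipole moment greater than 3.0 D but of opposite orientation (e.g. Asp, Glu) are disfavoured.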
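Preferences like these are often quantified with a propensity score, the fraction of binding residues that are amino acid a divided by the fraction of all residues that are a, i.e. (n_bind(a) / N_bind) / (n_all(a) / N_all). The abstract reports raw occurrence ratios rather than this score, so the definition below is an assumption about how one might reproduce the analysis, not the thesis's own formula.

```python
from collections import Counter


def binding_propensities(sequences, labels):
    """sequences: list of protein sequences.
    labels: parallel list of 0/1 lists (1 = RNA-binding residue).
    Returns {amino_acid: (binding_count, propensity)}, where
    propensity = P(aa | binding residue) / P(aa | any residue)."""
    all_counts, bind_counts = Counter(), Counter()
    for seq, lab in zip(sequences, labels):
        for aa, is_binding in zip(seq, lab):
            all_counts[aa] += 1
            if is_binding:
                bind_counts[aa] += 1
    n_all = sum(all_counts.values())
    n_bind = sum(bind_counts.values())
    result = {}
    for aa, total in all_counts.items():
        b = bind_counts[aa]
        propensity = (b / n_bind) / (total / n_all) if n_bind else 0.0
        result[aa] = (b, propensity)
    return result


if __name__ == "__main__":
    seqs = ["RRKC", "RCAA"]                    # hypothetical toy data
    labs = [[1, 1, 1, 0], [1, 0, 0, 0]]
    print(binding_propensities(seqs, labs))
    # {'R': (3, 2.0), 'K': (1, 2.0), 'C': (0, 0.0), 'A': (0, 0.0)}
```

A propensity above 1 means the amino acid is enriched at binding sites relative to its overall frequency, which corrects for the fact that some residues are simply more common than others.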
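     To make the model convenient for experimental researchers, we built the online prediction server RBRPre on top of the classification model using MATLAB Builder JA, MySQL and JSP. Users need only submit a protein sequence to receive, by email, the predicted RNA-binding residues of that sequence. Prediction jobs are queued and scheduled with a crontab-driven mechanism and a MySQL database, which avoids congestion under many concurrent requests.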
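The crontab-plus-database queue described above can be sketched as follows. SQLite from the Python standard library stands in for MySQL so that the example runs on its own, and the table layout, the `submit` and `process_pending` functions and the batch size are all hypothetical. The web front end only inserts a row; a script started periodically by cron (for example through a crontab entry such as `*/5 * * * * python3 run_queue.py`, where `run_queue.py` is a hypothetical script name) processes a bounded number of pending jobs per run.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT NOT NULL,
    sequence TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending, running, done or failed
    result TEXT
)
"""


def submit(conn, email, sequence):
    """Called by the web front end: enqueue a job and return its ID at once."""
    cur = conn.execute("INSERT INTO jobs (email, sequence) VALUES (?, ?)", (email, sequence))
    conn.commit()
    return cur.lastrowid


def process_pending(conn, predict, notify, batch_size=2):
    """Called by cron: run at most batch_size of the oldest pending jobs."""
    rows = conn.execute(
        "SELECT id, email, sequence FROM jobs WHERE status = 'pending' ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for job_id, email, sequence in rows:
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
        conn.commit()
        try:
            result = predict(sequence)
            conn.execute("UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
                         (result, job_id))
            notify(email, result)  # e.g. send the prediction by email
        except Exception:
            # Any failure (prediction or notification) marks the job as failed.
            conn.execute("UPDATE jobs SET status = 'failed' WHERE id = ?", (job_id,))
        conn.commit()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    for seq in ["MKR", "GAV", "RRK"]:
        submit(conn, "user@example.org", seq)
    sent = []
    print(process_pending(conn, predict=lambda s: "0" * len(s),
                          notify=lambda email, result: sent.append((email, result))))
    # [1, 2]
```

Capping the batch size bounds the work done per cron run, so a burst of submissions only lengthens the queue instead of launching many predictions at once.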
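     In summary, by collecting protein-RNA interaction data on a large scale and considering a broad range of features, this study built a model for predicting RNA-binding residues in proteins and characterized the relationship between feature values and the RNA-binding preferences of amino acids. An independent test set and a case study show that the model performs well. Based on this model, we also developed the online prediction server RBRPre.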
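     This work yields the following outcomes: ① researchers who know only the sequence of a protein can obtain fairly reliable information on its RNA-interaction sites; ② the analysis of how amino acid features affect RNA-binding preference provides useful information for studying the mechanisms of protein-RNA interaction; ③ the online prediction server RBRPre offers bioinformatic support for identifying RNA-binding residues in proteins and can speed up experimental work.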
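     The distinctive features and innovations of this work are: ① the model lets researchers obtain fairly reliable RNA-binding sites from the protein sequence alone, which makes it highly practical, and the RBRPre website provides a convenient and fast prediction service; ② the large dataset and broad feature coverage avoid the potentially biased conclusions of small-scale studies, and the independent test set and case study show that the predictions are reliable; ③ the RNA-binding preferences of the 20 amino acids, and the influence of different amino acid features on these preferences, were determined.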
English Abstract

Protein-RNA interaction plays an important role in many biological processes, such as RNA splicing, translation, protein synthesis and post-transcriptional regulation. Therefore, identifying RNA-binding residues in proteins provides valuable information for understanding the mechanisms of protein-RNA interaction.
     Present approaches to studying protein-RNA interaction can be divided into experimental methods and bioinformatics methods. Experimental methods, such as X-ray crystallography or nuclear magnetic resonance, can be used to determine the structure of a protein-RNA complex, from which the RNA-binding residues can be identified. The advantage of experimental methods is that the results are reliable. However, obtaining the crystal structure of a protein-RNA complex is time-consuming, and for some complexes crystals are difficult to obtain.
     With the increasing amount of structural data on protein-RNA interaction, researchers have been trying to find RNA-binding residues through bioinformatics methods, which fall mainly into three categories: structural domain methods, molecular dynamics simulations and machine learning methods. The core idea of structural domain methods is to find RNA-binding residues by locating RNA-binding domains in protein structure databases such as SCOP. However, this approach can only be used for proteins whose RNA-binding domains have already been determined. In addition, the mechanism of RNA-binding domains is not yet fully understood: residues in an RNA-binding domain sometimes interact with regions of the RNA other than the target region, or even with other proteins. Another way of finding RNA-binding residues is molecular dynamics simulation, which lets us observe the whole binding process and follow the changes in energy and conformation along the way. Its first drawback is that simulation is time-consuming and feasible only for small systems. The second is that the correctness of a simulation depends on parameter settings, and optimal parameters are sometimes very difficult to find. With the accumulation of large amounts of structural data, however, it has become possible to find RNA-binding residues by machine learning, and several models have recently been proposed for this task. Through a careful analysis of those models, we found the following shortcomings. First, the number of training samples is small, which may lead to biased results. Second, the number of features is small; some studies considered only a few features and may have missed important variables. Third, some models use features extracted from 3D structure data, or complex physicochemical features, and therefore cannot be applied to protein sequences without 3D structures.
     To solve these problems, we need a prediction model with the following characteristics: 1. the model should be built on a large dataset to avoid bias; 2. more features should be extracted to improve prediction performance; and 3. the features used to build the model should be derived from sequence information only. To this end, we developed such a model.
     First, we extracted 532 protein-RNA complexes released in the PDB up to June 2011. These complexes were determined by X-ray crystallography at a resolution better than 3 Å and contain only protein and RNA chains. After removing 90 complexes that have an RNA chain of 4 nucleotides or fewer or errors in their sequence data, we obtained a dataset of 442 complexes containing 1,970 protein sequences and 823 RNA sequences. To reduce data redundancy, the protein sequences were clustered into 429 groups by BLASTClust, with sequences sharing more than 25% identity placed in the same group. The first sequence of each group was selected as its representative, giving 429 non-redundant protein sequences that contain 90,735 amino acid residues.
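The selection criteria above can be expressed as a simple filter over structure metadata. The record layout below is hypothetical (in practice the fields would come from the PDB entry headers), reading the resolution criterion as "3 Å or better" is an assumption, and the manual check for erroneous sequences is not modelled.

```python
def filter_complexes(entries, max_resolution=3.0, min_rna_length=5):
    """Return the IDs of entries that pass the dataset criteria.

    entries: list of dicts with the hypothetical keys
      'id', 'method', 'resolution' (in Å, None if not applicable),
      'has_only_protein_and_rna' (bool),
      'rna_lengths' (lengths of the RNA chains, in nucleotides)
    """
    kept = []
    for e in entries:
        if e["method"] != "X-RAY DIFFRACTION":
            continue  # crystal structures only
        if e["resolution"] is None or e["resolution"] > max_resolution:
            continue  # assumed reading: resolution of 3 Å or better
        if not e["has_only_protein_and_rna"]:
            continue  # drop complexes containing anything besides protein and RNA
        if not e["rna_lengths"] or min(e["rna_lengths"]) < min_rna_length:
            continue  # drop complexes with an RNA chain of 4 nt or fewer
        kept.append(e["id"])
    return kept


if __name__ == "__main__":
    entries = [  # hypothetical records
        {"id": "E1", "method": "X-RAY DIFFRACTION", "resolution": 2.5,
         "has_only_protein_and_rna": True, "rna_lengths": [12, 5]},
        {"id": "E2", "method": "SOLUTION NMR", "resolution": None,
         "has_only_protein_and_rna": True, "rna_lengths": [20]},
        {"id": "E3", "method": "X-RAY DIFFRACTION", "resolution": 2.0,
         "has_only_protein_and_rna": True, "rna_lengths": [4, 30]},
    ]
    print(filter_complexes(entries))  # ['E1']
```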
     Binding sites are defined by the distance between atoms: if any atom of an amino acid residue lies within a cutoff distance of 3.5 Å of any atom of the RNA molecule in the complex, the residue is designated a binding site. Among the 90,735 amino acid residues in the dataset, we found 10,525 binding residues and 80,210 non-binding residues.
     After defining the binding sites, we characterized each amino acid residue by nine classes of features: ① the number of atoms; ② the net electrostatic charge; ③ the number of potential hydrogen bonds; ④ the side-chain pKa value; ⑤ the hydrophobicity index; ⑥ the relative accessible surface area; ⑦ the secondary structure; ⑧ the smoothed PSSM; ⑨ a classification of amino acids based on dipole moment and side-chain volume.
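The abstract does not define the "smoothed PSSM". A definition used in some earlier sequence-based binding-site predictors, assumed here, replaces each residue's PSSM row with the element-wise sum of the rows in a window centred on it, so that each residue also carries evolutionary information about its neighbours. The window half-width is a hypothetical parameter, and computing the PSSM itself (e.g. with PSI-BLAST) is outside the sketch.

```python
def smooth_pssm(pssm, half_window=3):
    """pssm: list of L rows, each a list of scores (20 per row for a real PSSM).

    Returns L rows; row i is the element-wise sum of rows
    i - half_window .. i + half_window. Rows that would fall outside the
    sequence contribute nothing (equivalent to zero padding).
    """
    n = len(pssm)
    width = len(pssm[0]) if pssm else 0
    smoothed = []
    for i in range(n):
        acc = [0.0] * width
        for j in range(max(0, i - half_window), min(n, i + half_window + 1)):
            for k in range(width):
                acc[k] += pssm[j][k]
        smoothed.append(acc)
    return smoothed


if __name__ == "__main__":
    toy = [[1, 2], [3, 4], [5, 6]]  # 3 residues, 2 columns instead of 20
    print(smooth_pssm(toy, half_window=1))  # [[4.0, 6.0], [9.0, 12.0], [8.0, 10.0]]
```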
     Finally, we used the TClass program to select features and construct the prediction model by combining Naïve Bayes classification with a forward feature selection strategy. An attribute bagging method was then used to improve classifier performance. Tests on an independent dataset show that the classifier achieves 83.86% overall accuracy with 83.32% sensitivity and 80.55% specificity. A case study of the Xlrbpa protein shows a good overlap between the positions predicted by our model and the experimentally determined RNA-binding sites.
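Attribute bagging (Bryll et al., 2003) trains each member of an ensemble on a random subset of the features and combines their predictions by voting. The sketch below uses a small Gaussian Naïve Bayes as the base learner to match the text; the ensemble size, subset size and random seed are hypothetical, since the thesis's settings are not given in the abstract. It also shows how sensitivity and specificity are computed, with RNA-binding residues as the positive class.

```python
import math
import random
from collections import Counter, defaultdict


def fit_gnb(X, y, feats):
    """Gaussian Naive Bayes restricted to the feature indices in `feats`."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append([xi[f] for f in feats])
    model = {}
    for c, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                     for col, m in zip(cols, means)]
        model[c] = (math.log(len(rows) / len(y)), means, variances)
    return model


def predict_gnb(model, x, feats):
    def log_posterior(c):
        log_prior, means, variances = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x[f] - m) ** 2 / (2 * v)
            for f, m, v in zip(feats, means, variances))
    return max(model, key=log_posterior)


def attribute_bagging(X, y, n_models=15, subset_size=2, seed=0):
    """Train n_models base classifiers, each on a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_models):
        feats = rng.sample(range(n_features), subset_size)
        ensemble.append((feats, fit_gnb(X, y, feats)))
    return ensemble


def predict_ensemble(ensemble, x):
    """Majority vote; an odd ensemble size avoids ties for two classes."""
    votes = Counter(predict_gnb(model, x, feats) for feats, model in ensemble)
    return votes.most_common(1)[0][0]


def sensitivity_specificity_accuracy(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


if __name__ == "__main__":
    X = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.1], [0.2, 0.2, 0.0],
         [1.0, 0.9, 1.1], [0.9, 1.1, 1.0], [1.1, 1.0, 0.9]]
    y = [0, 0, 0, 1, 1, 1]
    ens = attribute_bagging(X, y)
    print(predict_ensemble(ens, [0.1, 0.1, 0.1]), predict_ensemble(ens, [1.0, 1.0, 1.0]))  # 0 1
    print(sensitivity_specificity_accuracy([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
    # (0.5, 0.6666666666666666, 0.6)
```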
     By analyzing the relationship between amino acid usage propensities and the features, we obtained the following results: ① RNA shows a strong bias in amino acid selection: the most popular amino acid occurs 38 times as often as the most unpopular one. ② Hydrophilic amino acids are more popular than hydrophobic ones; their occurrence is 4.38 times that of hydrophobic amino acids. ③ Positively charged polar amino acids are more popular than non-polar amino acids. ④ Amino acid residues whose dipole moment is greater than 3.0 Debye and whose side-chain volume is greater than 50 Å³ are more popular with nucleotides, whereas amino acids whose dipole moment is greater than 3.0 Debye but with the opposite orientation are unpopular with nucleotides.
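The classification by dipole moment and side-chain volume groups the 20 amino acids into a small number of classes. The sketch below assumes the seven-class grouping of Shen et al. (2007), in which Arg and Lys form the class with a dipole moment above 3.0 D and a side-chain volume above 50 Å³, and Asp and Glu the class with a large dipole of opposite orientation; the exact grouping used in the thesis should be checked against its feature definitions. Each residue is encoded as a class index and as a one-hot vector.

```python
# Seven amino-acid classes by dipole moment and side-chain volume, assumed here to
# follow Shen et al. (PNAS, 2007); verify against the original table before reuse.
CLASSES = {
    1: "AGV",
    2: "ILFP",
    3: "YMTS",
    4: "HNQW",
    5: "RK",  # dipole > 3.0 D and side-chain volume > 50 Å^3
    6: "DE",  # dipole > 3.0 D with the opposite orientation
    7: "C",
}
AA_TO_CLASS = {aa: c for c, members in CLASSES.items() for aa in members}


def encode_classes(sequence):
    """Map each residue to its class index (0 for non-standard residues)."""
    return [AA_TO_CLASS.get(aa, 0) for aa in sequence.upper()]


def one_hot_classes(sequence):
    """Seven-dimensional one-hot vector per residue (all zeros if non-standard)."""
    vectors = []
    for c in encode_classes(sequence):
        v = [0] * len(CLASSES)
        if c:
            v[c - 1] = 1
        vectors.append(v)
    return vectors


if __name__ == "__main__":
    print(encode_classes("RKDECX"))  # [5, 5, 6, 6, 7, 0]
```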
     Based on the prediction model, we built an online prediction server called RBRPre, powered by MATLAB Builder JA, MySQL and JSP. Users visit the website and submit a protein sequence; the prediction result is then sent to them by email. To avoid crashes caused by high-concurrency access, we implemented a queue scheduling mechanism using MySQL and crontab.
     In summary, based on a large dataset of protein-RNA complexes and a broad set of features, we developed an RNA-binding residue prediction model and analyzed the relationship between amino acid usage propensities and the features. Test results on an independent dataset and the case study of the Xlrbpa protein show that the prediction model achieves good performance.
     This work yields the following results: ① it makes it possible to identify RNA-binding residues from sequence information alone; ② by analyzing the relationship between amino acid usage propensities and the features, it provides valuable information for understanding the mechanism of protein-RNA interaction; ③ through the online prediction server RBRPre, it provides bioinformatics support for finding RNA-binding sites in proteins and speeds up related experiments.
     The innovations of this work are: ① with this model, researchers can obtain the RNA-binding residues of a protein from sequence information alone, and the online prediction tool RBRPre provides an easy-to-use service for them; ② because the model is built on a large dataset and many features, its results are reliable and avoid the bias of small-sample studies; ③ the bias of amino acid selection at RNA-binding sites, and the relationship between amino acid features and this bias, are analyzed.

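[Funding] National Basic Research Program of China (973 Program) (2010CB912801); National Natural Science Foundation of China (31071157)
[About the author] Zha Lei, male, doctoral candidate; research area: bioinformatics
[Affiliation] Institute of Basic Medical Sciences, Academy of Military Medical Sciences, Beijing 100850
[Corresponding author] Li Wuju,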
