The Framework of Protein Function Prediction Based on Boolean Matrix Decomposition
  • Authors: Liu Lin; Tang Lin; Tang Mingjing; Zhou Wei
  • Affiliations: School of Information, Yunnan Normal University; Key Laboratory of Educational Informatization for Nationalities (Yunnan Normal University), Ministry of Education; President Office, Yunnan Normal University; National Pilot School of Software, Yunnan University
  • Keywords: multi-label classification; protein function prediction; label space dimension reduction; label-associated matrix; Boolean matrix decomposition
  • Journal: Journal of Computer Research and Development
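  • Publication date: 2019-05-15
  • Publisher: Journal of Computer Research and Development
  • Year: 2019
  • Issue: 05
  • Funding: National Natural Science Foundation of China (61862067, 61762089); Doctoral Research Startup Project of Yunnan Normal University (2016zb009); Yunnan Provincial Science and Technology Innovation Team on Data-Driven Software Engineering, Yunnan University (2017HC012)
  • Language: Chinese
  • Pages: 116-129
  • Page count: 14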
  • CN:11-1777/TP
  • ISSN:1000-1239
  • Classification number (CLC): TP181; Q51
Abstract
Proteins are the most important and most versatile macromolecules in the life activities of cells, so research on protein function is of great significance for decoding the secrets of life. Previous studies have shown that protein function prediction is essentially a multi-label classification problem; however, the very large number of functional annotation labels poses a major challenge for applying multi-label classifiers to it. Taking into account that protein functional labels are numerous and highly correlated, we propose a framework of protein function prediction based on Boolean matrix decomposition (PFP-BMD). Because current Boolean matrix decomposition algorithms have difficulty satisfying the exact-decomposition condition and the column-use condition at the same time, we also propose an exact Boolean matrix decomposition algorithm based on label clusters. The algorithm performs hierarchical extended clustering of labels through the label-associated matrix, and we prove through related corollaries that it achieves optimal exact Boolean matrix decomposition. Experimental results show that the proposed algorithm has a considerable advantage in computational complexity over existing algorithms, and that applying it in PFP-BMD effectively improves the accuracy of protein function prediction. More importantly, reducing and restoring the dimension of the protein functional label space with this algorithm lays the foundation for the efficient application of various multi-label classifiers to protein function prediction.
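The abstract describes reducing the functional label space with an exact Boolean matrix decomposition Y = C ∘ X and restoring it afterwards. The sketch below illustrates that idea on a toy label matrix, assuming the column-use condition means that the columns of C are taken from the columns of Y. It is not the label-cluster algorithm proposed in the paper: the selection step `irreducible_columns` is a naive, hypothetical stand-in that keeps every label column which cannot be rebuilt as the union of the smaller label columns contained in it. All function names and the toy matrix are illustrative assumptions.

```python
import numpy as np


def boolean_product(C, X):
    """Boolean matrix product: (C o X)[i, l] = OR over j of (C[i, j] AND X[j, l])."""
    return (C.astype(int) @ X.astype(int)) > 0


def maximal_x(C, Y):
    """Largest binary X with C o X <= Y.

    X[j, l] is True exactly when column j of C is contained in column l of Y,
    i.e. no row has C[i, j] = 1 while Y[i, l] = 0.
    """
    C = C.astype(bool)
    Y = Y.astype(bool)
    violations = C.T.astype(int) @ (~Y).astype(int)  # shape (k, m)
    return violations == 0


def irreducible_columns(Y):
    """Hypothetical column selection (NOT the paper's label-cluster algorithm).

    Keeps each distinct, non-zero label column that is not the union of the
    other label columns strictly contained in it.
    """
    Y = Y.astype(bool)
    m = Y.shape[1]
    chosen, seen = [], set()
    for l in range(m):
        col = Y[:, l]
        key = col.tobytes()
        if key in seen or not col.any():
            continue  # skip duplicate and all-zero columns
        seen.add(key)
        union = np.zeros_like(col)
        for t in range(m):
            other = Y[:, t]
            # union of the non-zero columns that are proper subsets of col
            if other.any() and np.all(other <= col) and not np.array_equal(other, col):
                union |= other
        if not np.array_equal(union, col):
            chosen.append(l)
    return chosen


# Toy label matrix: 4 proteins (rows) x 5 functional labels (columns).
Y = np.array([
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
], dtype=bool)

cols = irreducible_columns(Y)  # label columns kept as the reduced label space
C = Y[:, cols]                 # n x k reduced label matrix
X = maximal_x(C, Y)            # k x m restoration matrix
print(cols)                                       # [0, 1, 3]
print(np.array_equal(boolean_product(C, X), Y))   # True: the decomposition is exact

# Restoring a hypothetical k-dimensional prediction for a new protein.
c_pred = np.array([[True, False, True]])
print(boolean_product(c_pred, X).astype(int))     # [[1 0 1 1 0]]
```

In a PFP-BMD-style pipeline, one would presumably train the multi-label classifier on the k columns of C instead of the m columns of Y and map its predictions back to the full label space with `boolean_product(prediction, X)`; the column selection step is where the paper's own label-cluster algorithm would be used instead of this naive rule.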
