Data Models and Their Development (数据模型及其发展历程)
  • English title: State of the Art Data Model and Its Research Progress
  • Authors: XIN Jun-Chang (信俊昌); WANG Guo-Ren (王国仁); LI Guo-Hui (李国徽); GAO Yun-Jun (高云君); ZHANG Zhi-Qiang (张志强)
  • Keywords: data model; structured model; semi-structured model; OLAP analysis model; big data model
  • Journal: Journal of Software (软件学报); journal code: RJXB
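  • Affiliations: School of Computer Science and Engineering, Northeastern University; School of Computer Science and Technology, Beijing Institute of Technology; School of Computer Science and Technology, Huazhong University of Science and Technology; College of Computer Science and Technology, Zhejiang University; School of Information Management and Engineering, Zhejiang University of Finance and Economics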
  • Publication date: 2018-11-21 09:52
  • Publisher: Journal of Software (软件学报)
  • Year: 2019
  • Volume: 30
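  • Funding: National Natural Science Foundation of China (61472069, 61732003, U1401256, 61729201, 61572215, 61522208, 61672181, 61202090, 61272184); Heilongjiang Provincial Science Foundation (LC2017029, F2016005); Harbin Special Research Fund for Young Scientific and Technological Innovation Talents (2016RAXXJ036, 2015RQQXJ067)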
  • Language: Chinese
  • Issue: 01
  • Pages: 145-166 (22 pages)
  • CN: 11-2560/TP
  • Record ID: RJXB201901009
Abstract
Database technology, the technology of data management, is an important branch of computer science. After nearly half a century of development, it has acquired a solid theoretical foundation, mature commercial products, and a wide range of applications. A data model describes how data are stored in and operated on within a database. By the organizational form of the data, data models fall into four types: structured models, semi-structured models, OLAP analysis models, and big data models. Structured models, the earliest to be proposed (from the mid-to-late 1960s to the early 1990s), mainly include the hierarchical, network, relational, and object-oriented models. In the late 1990s, with the rapid development of complex applications such as Internet applications and scientific computing, semi-structured models emerged, including the XML, JSON, and graph models. In the 21st century, with the continued growth of applications such as e-commerce and business intelligence, data analysis models became a research hotspot; these mainly comprise relational OLAP (ROLAP) and multidimensional OLAP (MOLAP). Since 2010, with the rapid development of industrial big data applications, big data models represented by NoSQL and NewSQL database systems have become a new research hotspot. This paper surveys the above data models and, for each model, selects a typical database system and analyzes its performance.
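To make the classification by organizational form concrete, the following Python sketch (not taken from the paper; the toy sales data and every name in it are hypothetical) stores the same facts in structured, semi-structured (JSON), key-value, and graph form, and contrasts a ROLAP-style aggregation computed at query time with a MOLAP-style lookup in a precomputed array. It only illustrates the difference in data organization: real ROLAP engines typically issue SQL over relational tables, and real MOLAP engines usually rely on sparse or compressed multidimensional storage rather than a dense nested list.

```python
import json
from collections import defaultdict

# Hypothetical toy facts: (region, product, amount). Not data from the paper.
SCHEMA = ("region", "product", "amount")
rows = [
    ("north", "tea", 10),
    ("north", "coffee", 5),
    ("south", "tea", 7),
]

# Structured (relational) form: every tuple must match the fixed schema.
assert all(len(row) == len(SCHEMA) for row in rows)

# Semi-structured (JSON) form: nested and self-describing; a record may
# simply omit a field instead of storing NULL.
doc = json.dumps({"north": {"tea": 10, "coffee": 5}, "south": {"tea": 7}})

# Key-value form, as used by many NoSQL stores: opaque values addressed by key.
kv = {f"{region}:{product}": amount for region, product, amount in rows}

# Graph form: regions and products become nodes, each fact a labelled edge.
edges = [(region, "sold", product, amount) for region, product, amount in rows]
north_products = [dst for src, _label, dst, _w in edges if src == "north"]


def rolap_total_by_region(rows):
    """ROLAP-style: aggregate relational rows at query time (like SQL GROUP BY)."""
    totals = defaultdict(int)
    for region, _product, amount in rows:
        totals[region] += amount
    return dict(totals)


def build_molap_cube(rows):
    """MOLAP-style: pre-materialise a dense region x product array of sums."""
    regions = sorted({region for region, _, _ in rows})
    products = sorted({product for _, product, _ in rows})
    cube = [[0] * len(products) for _ in regions]
    for region, product, amount in rows:
        cube[regions.index(region)][products.index(product)] += amount
    return regions, products, cube


regions, products, cube = build_molap_cube(rows)
# Region totals are read straight from the precomputed array cells.
molap_totals = {region: sum(cube[i]) for i, region in enumerate(regions)}

print(rolap_total_by_region(rows))              # {'north': 15, 'south': 7}
print(molap_totals)                             # {'north': 15, 'south': 7}
print(json.loads(doc)["south"].get("coffee"))   # None (field absent)
print(kv["north:coffee"], north_products)       # 5 ['tea', 'coffee']
```

Both OLAP styles return the same totals; the design trade-off they illustrate is that ROLAP pays the aggregation cost on every query, while MOLAP pays it once when the cube is built and then answers from array cells.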
References
[1] Meng XF, Zhou LX, Wang S. State of the art and trends in database research. Ruan Jian Xue Bao/Journal of Software, 2004,15(12):1822-1836(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/20041208.htm
    [2] Barnett AJ, Lightfoot JA. A S/360 on-line production order location and reporting system using the information management system(IMS). In:Proc. of the IFIP Congress, Vol.2. 1968. 1192-1196.
    [3] Taylor RW. Data administration and the DBTG report. In:Proc. of the 1974 ACM-SIGMOD Workshop on Data Description,Access and Control, Vol.2. Ann Arbor, 1974. 431-444.
    [4] Codd EF. A relational model of data for large shared data banks. Communications of the ACM, 1970,13(6):377-387.
    [5] Codd EF, Date CJ. Interactive support for non-programmers:The relational and network approaches. In:Proc. of the 1974 ACM-SIGMOD Workshop on Data Description, Access and Control, Vol.2. 1974. 11-41.
    [6] Ronström M, Oreland J. Recovery principles in MySQL cluster 5.1. In:Proc. of the VLDB. 2005. 1108-1115.
    [7] Stonebraker M, Rowe LA. The design of POSTGRES. In:Proc. of the SIGMOD Conf. 1986. 340-355.
    [8] Snell B, Hardman RG. DB2 performance and capacity planning case study. In:Proc. of the Int'l CMG Conf. 1986. 63-70.
    [9] Gary H. The Oracle Warehouse. In:Proc. of the VLDB. 1995. 707-709.
    [10] Chaudhuri S, Narasayya VR. An efficient cost-driven index selection tool for Microsoft SQL Server. In:Proc. of the VLDB. 1997. 146-155.
    [11] Li JY. The Principle and Application System Development in Database. Beijing:China Water and Power Press, 2005. 20-22(in Chinese).
    [12] Chen PP. The entity-relationship model-toward a unified view of data. ACM Trans. on Database Systems, 1976,1(1):9-36.
    [13] Penney DJ, Stein J. Class modification in the GemStone object-oriented DBMS. In:Proc. of the OOPSLA'87. 1987. 111-117.
    [14] Soloviev V. An overview of three commercial object-oriented database management systems:ONTOS, ObjectStore, and O2. SIGMOD Record, 1992,21(1):93-104.
    [15] Lecluse C, Richard P, Velez F. O2, an object-oriented data model. In:Proc. of the DBPL'87. 1987. 257-276.
    [16] Barry DK. ITASCA distributed ODBMS. In:Proc. of the SIGMOD Conf. 1992. 70.
    [17] Wang J, Meng XF. Schema of semistructured data:A survey. Computer Science, 2001,28(2):6-10(in Chinese with English abstract).
    [18] Quin L. Extensible Markup Language(XML). World Wide Web Consortium(W3C), 2006. http://www.w3.org/XML/
    [19] W3C Consortium. XML Path Language(XPath)2.0. 2006. http://www.w3.org/TR/xpath20/
    [20] W3C Consortium. XQuery 1.0:An XML Query Language. 2006. http://www.w3.org/TR/xquery/
    [21] Le TN, Ling TW. Survey on keyword search over XML documents. ACM SIGMOD Record, 2016,45(3):17-28.
    [22] Liu Z, Chen Y. Processing keyword search on XML:A survey. World Wide Web, 2011,14(5-6):671-707.
    [23] Gou G, Chirkova R. Efficiently querying large XML data repositories:A survey. IEEE Trans. on Knowledge and Data Engineering, 2007,19(10).
    [24] Hachicha M, Darmont J. A survey of XML tree patterns. IEEE Trans. on Knowledge and Data Engineering, 2013,25(1):29-46.
    [25] Carlisle D, Ion P, Miner R. Mathematical Markup Language(MathML)Version 3.0. World Wide Web Consortium(W3C). 2010. http://www.w3.org/TR/MathML/
    [26] Murray-Rust P, Rzepa H. Chemical markup language CML. 1995. http://www.xml-cml.org/
    [27] Lake R, Burggraf DS, Trninic M, Rae L. Geography Mark-Up Language:Foundation for the Geo-Web. Wiley, 2004.
    [28] BaseX. http://basex.org/
    [29] eXist. http://exist-db.org/exist/apps/homepage/index.html
    [30] MarkLogic. https://www.marklogic.com/
    [31] JSON. json.org. http://www.json.org
    [32] Bourhis P, Reutter JL, Suarez F, et al. JSON:Data model, query languages and schema specification. In:Proc. of the 36th ACM SIGMOD-SIGACT-SIGAI Symp. on Principles of Database Systems. ACM Press, 2017. 123-135.
    [33] Kumar S, Morstatter F, Liu H. Twitter Data Analytics. New York:Springer-Verlag, 2014.
    [34] Lin J, Ryaboy D. Scaling big data mining infrastructure:The Twitter experience. ACM SIGKDD Explorations Newsletter, 2013,14(2):6-19.
    [35] Nurseitov N, Paulson M, Reynolds R, et al. Comparison of JSON and XML data interchange formats:A case study. Caine, 2009,9:157-162.
    [36] Angles R, Gutierrez C. Survey of graph database models. ACM Computing Surveys(CSUR), 2008,40(1):1.
    [37] Angles R. A comparison of current graph database models. In:Proc. of the 28th IEEE Int'l Conf. on Data Engineering Workshops(ICDEW). IEEE, 2012. 171-177.
    [38] Wu H, Cheng J, Huang S, et al. Path problems in temporal graphs. Proc. of the VLDB Endowment, 2014,7(9):721-732.
    [39] Bonnici V, Giugno R. On the variable ordering in subgraph isomorphism algorithms. IEEE/ACM Trans. on Computational Biology and Bioinformatics(TCBB), 2017,14(1):193-203.
    [40] Zhang HY, Wang LW, Chen YX. Research progress of probabilistic graphical models:A survey. Ruan Jian Xue Bao/Journal of Software, 2013,24(11):2476-2497(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/4486.htm[doi:10.3724/SP.J.1001.2013.04486]
    [41] Larranaga P, Moral S. Probabilistic graphical models in artificial intelligence. Applied Soft Computing, 2011,11(2):1511-1528.
    [42] Getoor L. An introduction to probabilistic graphical models for relational data. IEEE Data Engineering Bulletin, 2006,29(1):32-39.
    [43] Sucar LE. Probabilistic Graphical Models. London:Springer-Verlag, 2015. 978-981.
    [44] Lyu B, Qin L, Lin X, et al. Scalable supergraph search in large graph databases. In:Proc. of the 32nd IEEE Int'l Conf. on Data Engineering(ICDE). IEEE, 2016. 157-168.
    [45] Tong Y, Zhang X, Cao CC, et al. Efficient probabilistic supergraph search over large uncertain graphs. In:Proc. of the 23rd ACM Int'l Conf. on Information and Knowledge Management. ACM Press, 2014. 809-818.
    [46] Zhang W, Lin X, Zhang Y, et al. Efficient probabilistic supergraph search. IEEE Trans. on Knowledge and Data Engineering, 2016,28(4):965-978.
    [47] Hellmuth M, Ostermeier L, Stadler PF. A survey on hypergraph products. Mathematics in Computer Science, 2012,6(1):1-32.
    [48] Ausiello G, Laura L. Directed hypergraphs:Introduction and fundamental algorithms—A survey. Theoretical Computer Science, 2017,658:293-306.
    [49] Kostakos V. Temporal graphs. Physica A:Statistical Mechanics and its Applications, 2009,388(6):1007-1023.
    [50] Michail O. An introduction to temporal graphs:An algorithmic perspective. Internet Mathematics, 2016,12(4):239-280.
    [51] Han W, Miao Y, Li K, et al. Chronos:A graph engine for temporal graph analysis. In:Proc. of the 9th European Conf. on Computer Systems. ACM Press, 2014. 1.
    [52] Yan S, Xiong Y, Lin D. Spatial temporal graph convolutional networks for skeleton-based action recognition. arXiv Preprint arXiv:1801.07455, 2018.
    [53] Fernandes D, Bernardino J. Graph databases comparison:AllegroGraph, ArangoDB, InfiniteGraph, Neo4J, and OrientDB. In:Proc. of the DATA 2018. 2018. 373-380.
    [54] Martinez-Bazan N, Muntés-Mulero V, Gómez-Villamor S, Nin J, Sanchez-Martinez MA, Larriba-Pey JL. DEX:High-Performance exploration on large graphs for information retrieval. In:Proc. of the 16th Conf. on Information and Knowledge Management(CIKM). ACM Press, 2007. 573-582.
    [55] Iordanov B. HyperGraphDB:A generalized graph database. In:Proc. of the Int'l Conf. on Web-Age Information Management. Berlin, Heidelberg:Springer-Verlag, 2010. 25-36.
    [56] Webber J. A programmatic introduction to Neo4J. In:Proc. of the 3rd Annual Conf. on Systems, Programming, and Applications:Software for Humanity. ACM Press, 2012. 217-218.
    [57] Codd EF. Providing OLAP(On-line Analytical Processing)to User-Analysts:An IT Mandate. E.F. Codd and Associates, 1993.
    [58] Vieira M, Madeira H. A dependability benchmark for OLTP application environments. In:Proc. of the VLDB. 2003. 742-753.
    [59] Colliat G. OLAP, relational, and multidimensional database systems. SIGMOD Record, 1996,25(3):64-69.
    [60] Varga J, Etcheverry L, Vaisman AA, Romero O, Pedersen TB, Thomsen C. QB2OLAP:Enabling OLAP on statistical linked open data. In:Proc. of the ICDE 2016. 2016. 1346-1349.
    [61] Chen Y, Rau-Chaplin A, Dehne FKHA, Eavis T, Green D, Sithirasenan E. cgmOLAP:Efficient parallel generation and querying of terabyte size ROLAP data cubes. In:Proc. of the ICDE 2006. 2006. 164.
    [62] May N, Lehner W, Hameed PS, Maheshwari N, Müller C, Chowdhuri S, Goel AK. SAP HANA—From relational OLAP database to big data infrastructure. In:Proc. of the EDBT 2015. 2015. 581-592.
    [63] Golfarelli M, Graziani S, Rizzi S. Shrink:An OLAP operation for balancing precision and size of pivot tables. Data&Knowledge Engineering, 2014,93:19-41.
    [64] Hasan KMA, Tsuji T, Higuchi K. An efficient implementation for MOLAP basic data structure and its evaluation. In:Proc. of the DASFAA 2007. 2007. 288-299.
    [65] Salka C. Ending the MOLAP/ROLAP debate:Usage based aggregation and flexible HOLAP(abstract). In:Proc. of the ICDE'98. 1998. 180.
    [66] Antoniu G, Bouge L, Hatcher PJ, MacBeth M, McGuigan K, Namyst R. The hyperion system:Compiling multithreaded Java bytecode for distributed execution. Parallel Computing, 2001,27(10):1279-1297.
    [67] Im DH, Cho CH, Jung IIG. Detecting a large number of objects in real-time using Apache Storm. In:Proc. of the ICTC 2014. 2014. 836-838.
    [68] Yang MS, Ma RTB. Smooth task migration in Apache Storm. In:Proc. of the SIGMOD Conf. 2015. 2067-2068.
    [69] Schaefer C, Manoj PM. Enabling privacy mechanisms in Apache Storm. In:Proc. of the BigData Congress. 2015. 102-109.
    [70] Chen M, Mao S, Liu Y. Big data:A survey. Mobile Networks and Applications, 2014,19(2):171-209.
    [71] Shen DR, Yu G, Wang XT, Nie TZ, Kou Y. Survey on NoSQL for management of big data. Ruan Jian Xue Bao/Journal of Software, 2013,24(8):1786-1803(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/4416.htm[doi:10.3724/SP.J.1001.2013.04416]
    [72] Han J, Haihong E, Le G, et al. Survey on NoSQL database. In:Proc. of the 6th Int'l Conf. on Pervasive Computing and Applications(ICPCA). IEEE, 2011. 363-366.
    [73] Gessert F, Wingerath W, Friedrich S, et al. NoSQL database systems:A survey and decision guidance. Computer Science-Research and Development, 2017,32(3-4):353-365.
    [74] Memcached. https://github.com/memcached/memcached
    [75] Redis. https://redis.io/
    [76] LevelDB. https://github.com/google/leveldb
    [77] MongoDB. https://www.mongodb.com/
    [78] CouchDB. http://couchdb.apache.org/
    [79] Chang F, Dean J, Ghemawat S, et al. Bigtable:A distributed storage system for structured data. ACM Trans. on Computer Systems(TOCS), 2008,26(2):4.
    [80] HBase. https://hbase.apache.org/
    [81] Cassandra. http://cassandra.apache.org/
    [82] Gurevich Y. Comparative survey of NoSQL and NewSQL DB systems. The Open University, 2015. https://www.openu.ac.il/lists/mediaserver_documents/academic/cs/ComparativeSurvey.pdf
    [83] Grolinger K, Higashino WA, Tiwari A, et al. Data management in cloud environments:NoSQL and NewSQL data stores. Journal of Cloud Computing:Advances, Systems and Applications, 2013,2(1):22.
    [84] Kaur K, Sachdeva M. Performance evaluation of NewSQL databases. In:Proc. of the 2017 Int'l Conf. on Inventive Systems and Control(ICISC). IEEE, 2017. 1-5.
    [85] Moniruzzaman ABM. NewSQL:Towards next-generation scalable RDBMS for online transaction processing(OLTP)for big data management. arXiv Preprint arXiv:1411.7343, 2014.
    [86] Kallman R, Kimura H, Natkins J, et al. H-Store:A high-performance, distributed main memory transaction processing system. Proc. of the VLDB Endowment, 2008,1(2):1496-1499.
    [87] Cetintemel U, Du J, Kraska T, et al. S-Store:A streaming NewSQL system for big velocity applications. Proc. of the VLDB Endowment, 2014,7(13):1633-1636.
    [88] Bacon DF, Bales N, Bruno N, et al. Spanner:Becoming a SQL system. In:Proc. of the 2017 ACM Int'l Conf. on Management of Data. ACM Press, 2017. 331-343.
    [89] VoltDB. https://www.voltdb.com/
    [90] DB-Engines. https://db-engines.com/en/ranking