基于格雷码的结构化对等计算系统及其数据管理

设为首页

收藏本站

网站地图 | English | 公务邮箱

远程访问

NSTL服务站

基于格雷码的结构化对等计算系统及其数据管理

详细信息本馆镜像全文| 推荐本文 | | 获取CNKI官网全文

英文题名：Data Management in Structured Peer-to-Peer System Based on Gray Code
作者：周敏奇
论文级别：博士
学科专业名称：计算机软件与理论
中文关键词：对等计算 ; 分布无关 ; 密度估计 ; 偏好查找 ; 格雷码 ; 多属性范围查找 ; 索引结构
英文关键词：Peer-to-Peer ; distribution-free ; data density estimation ; preference query ; Gray Code ; multi-attribute query
学位年度：2008
导师：周傲英
学科代码：081202
学位授予单位：复旦大学
论文提交日期：2008-09-18

摘要

对等计算(Peer-to-Peer,简称P2P)是一个自组织的分布式网络系统。脱胎于文件共享,当前P2P系统的研究热点已经逐步过渡到:系统资源共享、分布式数据管理等。这类研究给现代网络应用注入了新的活力,因为P2P系统打破了现有网络结构,进而提供了一个自组织、高分散、可扩展、节点对等的分布式网络。同时,这种分布式网络结构也给网络应用带来了很多新的挑战,例如,如何提供有效的复杂数据查询、如何获取当前网络中数据的分布信息等等。
     尽管对等计算系统在文件共享、简单数据管理方面已经取得了不少研究成果,但是许多问题(如复杂数据管理等)依旧亟待解决。本文的研究重点是在现有成熟的P2P覆盖网络(Chord)基础上,对其改进、优化,并提出了一个新的P2P系统——GChord系统。结合了Chord系统的路由特性和格雷码(GrayCode)数据索引编码的优点,GChord(Gray Code based Chord System)系统由此具备了支持复杂数据管理能力。现有GChord系统支持的复杂数据管理主要包括:当前网络中数据密度分布函数的准确估计、一维(多维)精确匹配查找、一维(多维)范围查找、偏好查找、多属性查找等。以后我们将对GChord系统不断研究、充实,使之成为一个完备的P2P系统。下面给出本文的主要贡献:
     ●实现了GChord系统中节点管理和数据管理的无缝结合。GChord系统脱胎于Chord系统,但是青出于蓝而胜于蓝。GChord系统将原Chord系统中相互独立的节点管理和数据管理紧密地结合起来,进而获得了很多Chord系统不具备的特性。对于节点管理,GChord系统和Chord系统具有相同的组织、维护方式。对于数据管理,GChord系统结合了基于格雷码(任意相邻的两个格雷码相差一位)混洗编码(shuffle-based)的数据索引键值生成技术,实现了节点管理和数据管理的无缝结合。GChord系统的指表(fingertable,即路由表)不仅实现了节点路由的O(logn)复杂度,同时也实现了数据值域路由的O(logn)复杂度,其中n为系统中节点数目。突破了Chord系统只支持精确匹配查找的限制,GChord系统能够支持包括范围查找在内的许多复杂数据管理。
     ●对GChord系统中分布无关数据密度分布函数估计提供了支持,即让网络中的每个节点都能精确估计当前网络中数据的密度分布。P2P系统的许多应用均受益于密度分布估计,例如:负载平衡分析、复杂数据查询和数据挖掘等等。分布无关数据密度函数估计算法的主要特性是,不论底层数据按照何种模型分布抑或是根本不存在任何分布模型,该算法均能以近似的估计精度准确估计当前网络中的数据密度分布函数。分布无关密度估计算法首先将底层数据的任意分布转换成一中间分布——累计概率分布函数。由于累计概率分布函数的输出在[0,1]之间均匀分布,因此接着对累计概率分布函数的输出随机采样,可以准确估计当前网络中数据的密度分布。本文在GChord系统环境下,提出了三种计算累计概率分布函数的有效算法和两种对累计概率分布函数的输出随机采样的算法。累计概率分布函数计算算法让网络中每个节点只维护累计概率分布函数中相互不重叠的一个片段,而所有这些片断综合起来又构成了完整的累计概率分布函数,由此降低了各个节点的维护代价。我们从理论上证明了累计概率分布函数的计算误差和分布无关数据密度估计的估计误差,同时给出了详尽的算法。充分的实验也证明了该算法在估计数据分布方面的有效性和高效性。
     ●对GChord系统中偏好查找提供了支持。所谓偏好查找是指查询请求中包含有用户对各个属性项不同偏好程度并返回top-k条数据元组的查询。对于各个属性项具有不同偏好程度的查询请求,其查询结果可能大相径庭。现有数据索引技术均无法对偏好查找提供很好的支持。本文提出了一种在GChord系统上支持偏好查找的新颖方法。通过估计当前网络中的数据分布,并计算包含有top-k查询数据元组的区域范围,将偏好查找转换为范围查找并执行有效查找。通过对数据空间多维柱状图信息进行离散余弦变换可有效维护当前网络中数据的分布,从而实现数据密度分布的准确估计。本文也提出了对偏好矩阵(由查询请求给定)作奇异值分解,有效计算对应查找范围的算法。在实现偏好查找到范围查找的有效转换之后,通过多播区域(mutli-cast zone)的形式获取相关节点上的top-k条数据元组。本文给出了数据分布估计算法和范围计算的算法及其理论分析。详尽的实验证明了该方法在处理偏好查找方面的有效性和高效性。
     ●对GChord系统中多属性范围查找提供了支持。所谓多属性查找是指查询请求中仅包含有用户关心属性项的谓词(即查询中包含的属性项可以是数据元组属性项集合的子集)。因此,该类查询请求中包含的谓词可以针对任意数目的属性项,也可以针对任意属性项之间的组合。现有P2P系统都无法高效的处理该类查询。由于GChord系统实现了节点管理和数据管理之间的无缝结合,进而实现了相邻数据之间的快速路由,由此提供了对多属性查询的良好支持。本文提出了使用卡诺图(Karnaugh map)有效计算包含有查询结果节点的算法,并且以多播树(multi-cast tree)的方式获取这些节点上数据元组的算法及其理论分析。详尽的试验证明了该方法在处理多属性查询的有效性和高效性。
     综上所述,本文详细介绍了GChord系统,及其支持的复杂数据管理操作。GChord系统实现了节点管理和数据管理的无缝结合,并对当前网络中分布无关数据密度的精确估计、偏好查找、多属性范围查找、一维精确匹配查找(源于对Chord系统的继承)、多维精确匹配查找和一维(多维)范围查找(源于对多属性范围查找的支持)提供了支持。这些功能满足了当前P2P系统发展的需要,促进了P2P系统的发展。以后我们将不断充实和完善GChord系统,使之成为一个成熟、完备的P2P系统。
Peer-to-Peer(P2P in short) system is a self-organized distributed network system. Derived from file sharing,current research hot spots in P2P systems have been shifted to systemic resource sharing and distributed data management.Network applications are boosted immensely by such kind of research,since current network structure is broken through by P2P network systems which provided by a self-organized,highly distributed,scalable networks.Meanwhile,new challenges are brought into network applications under such kind of distributed network structures, such as providing complex query processing operators,estimating data density in the current system and so on.
     Much researches have been done and much goal has been achieved on resource sharing,simple data management in the P2P system environment,while lots of problems are still needed to be solved imperatively.The focus of this paper is to provide a new P2P system(called GChord),which is improved,optimized from one existing P2P overlay network(say Chord).GChord(Gray Code Based Chord) system provides well support to complex data management,since it combines the routing property of Chord and the excellency of data index based on shuffle-based Gray Code seamlessly.The complex data management supported by current GChord system can be listed as follows:data density estimation in the current network, one-dimensional(multi-dimensional) exact match query,one-dimensional(multidimensional) range query,preference query,multi-attribute(partial match) range query and so on.Moreover,new data management operators will be added to the new generation of GChord system,as our future research focus.The main contribution of this paper is to propose the new GChord system and its data management. The details are given as follows:
     ●GChord system provides seamless combination between node management and data management.Inspired by Chord,GChord system out performance much over Chord.Since its seamless combination of node management and data management,complex data management operators can be supported by GChord while not by Chord.For node management,GChord has the same construction and maintenance as Chord.For data management,GChord provides Gray Code(any two adjacent Gray Codes differ in one-bit) based data index strategies,which in further results in the seamless combination of node management and data management.GChord system not only provides the complexity of O(logn) for node access in the system under its finger table, but also provides the complexity of O(logn) in data access according to its domain routing under its finger tables,where n is the number of nodes in the current system.Besides exact match query which provided by Chord,GChord system gives support to plenty of complex data management types.
     ●GChord system gives support to the global data distribution estimation in the current system.It can benefit many P2P applications,such as load balancing analysis,query processing,data mining,and so on.In this paper,we propose a novel model named distribution-free data density estimation,which is based on distribution-free(i.e.,independent of data distributions) sampling on global cumulative distribution to achieve high estimation accuracy with low estimation cost regardless of distribution models of the underlying data.The idea of distribution-free estimation is deployed in GChord system by sampling a small subset of peers to estimate global data distribution over the data domain.Algorithms on computing and sampling global cumulative distribution based on which global data distribution is estimated are introduced with detailed theoretical analysis.Our extensive performance study confirms the effectiveness and efficiency of our methods under different data distributions in dynamic GChord networks.
     ●GChord system gives support to preference query evaluation which returns top-k results according to users' specific interests has been well studied in centralized databases.However,it has never been discussed in the Peer-to-Peer (P2P) environment and remains imperative to be solved.In this paper, we propose a novel method for efficient preference query processing in GChord system.By effectively calculating a corresponding range which contains top-k results based on estimated data density in the current system using discrete cosine transformed histogram information,a preference query is transformed into a range query for efficient processing.Singular value decomposition of preference matrix is deployed to simplify the progress of range computation. Algorithms on density estimation and range computation are given as well as the theoretical analysis.Our extensive performance study confirms the effectiveness and efficiency of our methods under different data distributions in GChord system.
     ●GChrod system gives support to multi-attribute query.Multi-attribute query is referred to as a query which only contains predicates on attributes that users are interest(i.e.subset of attributes contained in data tuples).That's to say,not only the number of attributes which have predicates in query could be arbitrary,but also the combination of attributes which have predicates in query could be arbitrary.That's why such kind of queries can't be supported well by existing P2P systems.However,GChord system gives well support to it,since it provides seamless combination between node management and data management,resulting in quick routing between data tuples according attribute domains.This paper proposes new algorithms on nodes computing which contains desired results based on Karnaugh map,and new algorithms on node accessing based on multi-cast tree constructed on-the-fly.Our extensive performance study confirms the effectiveness and efficiency of our methods for multi-attribute query in dynamic GChord networks.
     In a word,this paper gives a detailed discuss on GChord system and its functionality on data management,including overlay network and complex data management. GChord system achieves seamless combination between node management and data management.GChord system provides support to distribution-free data density estimation in the current network,preference query,multi-attribute(partial match) range query,one-dimensional exact match query(since the derivation from Chord system),multi-dimensional exact match and one-dimensional(multidimensional) range query(since the support for multi-attribute range query) and so on.For more applications to be supported in data management,we propose to give them with support in the next generation of GChord system.

引文

[1]FIPS 180-1.Secure Hash Standard.U.S.Department of Commerce/NIST,National Technical Information Service,Springfield,VA,1995.
    [2]K.Aberer.P-grid:A self-organizing access structure for p2p information systems.In Proceedings of the 6th International Conference on Cooperative Information Systems(CoopIS'2001),pages 179-194,September 2001.
    [3]A.Andrzejak and Z.Xu.Scalable,efficient range queries for grid information services.In Proceedings of the 2nd International Conference on Peer-to-Peer Computing(P2P'2002),pages 33-40,September 2002.
    [4]B.Arai,G.Das,D.Gunopulos,and V.Kalogeraki.Approximating aggregation queries in peer-to-peer networks.In Proceedings of the 22nd International Conference on Data Engineering(ICDE'2006),pages 42-44,April 2006.
    [5]B.Arai,S.Lin,and D.Gunopulos.Efficient Data Sampling in Heterogeneous Peer-to-Peer Networks.In Procceeding of 7th IEEE International Conference on Data Mining(ICDM'2007),pages 23-32,2007.
    [6]R.Baeza-Yates,B.Ribeiro-Neto,et al.Modern information retrieval.Addison-Wesley Harlow,England,1999.
    [7]N.Beckmann,H.-P.Kriegel,R.Schneider,and B.Seeger.The R~*-tree:An efficient and robust access method for points and rectangles.In Proceedings of the 1990 ACM SIGMOD.International Conference on Data Management (SIGMOD'1990),pages 322-331,June 1990.
    [8]J.L.Bentley.Multidimensional binary search trees used for associative searching.Communications of the ACM,18(9):509-517,September 1975.
    [9]A.Bharambe,M.Agrawal,and S.Seshan.Mercury:Supporting scalable multiattribute range queries.In Proceedings of the A CM SIGCOMM 2004 Conference on Applications,Technologies,Architectures,and Protocols for Computer Communication(SIGCOMM'2004),pages 353-366,August 2004.
    [10]BitTorrent.Bittorrent protocol 1.0.http://www.bittorrent.org//protocol.html,2002.
    [11]C.Bohm,G.Klump,and H.Kriegel.Xz-ordering:A space-filling curve for objects with spatial extension.In Advances in Spatial Databases,6th International Symposium(SSD'1999),pages 75-90,July 1999.
    [12]N.Bruno,S.Chaudhuri,and L.Gravano.STHoles:a multidimensional workload-aware histogram.In Proceedings of the 2001 ACM SIGMOD international conference on Management of data(SIGMOD'2001),pages 211-222,2001.
    [13]M.Cai,M.Frank,J.Chen,and P.Szekely.Maan:A multi-attribute address-able network for grid information services.In Proceedings of 4th International Workshop on Grid Computing(GRID'2003),pages 184-191,November 2003.
    [14]F.Chang,J.Dean,S.Ghemawat,W.C.Hsieh,D.A.Wallach,M.Burrows,T.Chandra,A.Fikes,and R.E.Gruber.Bigtable:a distributed storage system for structured data.In Proceedings of the 7th symposium on Operating systems design and implementation(OSDI'2006),pages 205-218,2006.
    [15]Y.C.Chang,L.Bergman,V.Castelli,C.S.Li,M.L.Lo,and J.R.Smith.The onion technique:indexing for linear optimization queries.In Proceedings of the 2000 ACM SIGMOD international conference on Management of Data (SIGMOD'2000),pages 391-402,June 2000.
    [16]S.Chaudhuri,G.Das,and U.Srivastava.Effective use of block-level sampling in statistics estimation.In Proceedings of the 2004 ACM SIGMOD international conference on Management of data(SIGMOD'2004),pages 287-298,June 2004.
    [17]C.M.Chen and N.Roussopoulos.Adaptive selectivity estimation using query feedback.In Proceedings of the 1994 ACM SIGMOD international conference on Management of data(SIGMOD'1994),pages 161-172,1994.
    [18]I.Clarke,O.Sandberg,B.Wiley,and T.Hong.Freenet:A distributed anonymous information storage and retrieval system.In Proceedings of ICSI Work-shop on Design Issues in Anonymity and Unobservability(ICSI'2001),pages 46-66,July 2001.
    [19]D.Comer.The ubiquitous b-tree.ACM Computing Surveys,11(2):121-137,1979.
    [20]A.Crainiceanu,P.Linga,J.Gehrke,and J.Shanmugasundaram.P-tree:a p2p index for resource discovery applications.In Proceedings of the 13th international World Wide Web conference on alternate track papers(?) posters (WWW'2004),pages 390-391,2004.
    [21]A.Crainiceanu,P.Linga,A.Machanavajjhala,J.Gehrke,and J.Shanmugasundaram.P-ring:an efficient and robust P2P range index structure.In Proceedings of the 2007 A CM SIGMOD international conference on Management of data(SIGMOD'2007),pages 223-234,2007.
    [22]Souptik Datta,Kanishka Bhaduri,Chris Giannella,Ran Wolff,and Hillol Kargupta.Distributed data mining in peer-to-peer networks.IEEE Internet Computing,10:18-26,2006.
    [23]J.Dean and S.Ghemawat.MapReduce:simplified data processing on large clusters.In Proceedings of the 6th conference on Symposium on Opearting Systems Design(?) Implementation-Volume 6 table of contents(OSDI'2004),pages 10-10,2004.
    [24]S.Deering,Dl Estrin,D.Farinacci,V.Jacobson,C.G.Liu,and L.Wei.The PIM architecture for wide-area multicast routing.IEEE/ACM transactions on networking,4(2):153-162,1996.
    [25]R.Fagin.Fuzzy queries in multimedia database systems.In Proceedings of the 17th A CM SIGA CT-SIGMOD-SIGART symposium on Principles of database systems(PODS'1998),pages 1-10,1998.
    [26]R.Fagin.Combining fuzzy information:an overview.SIGMOD Record,31(2):109-118,2002.
    [27]R.Fagin,A.Lotem,and M.Naor.Optimal aggregation algorithms for middleware.Journal of Computer and System Sciences,66(4):614-656,2003.
    [28]F.Gray.Pulse code communications.U.S.Patent 2632058,1953.
    [29]M.Fisz.Probability theory and mathematical statistics.Wiley New York,1963.
    [30]P.Ganesan,B.Yang,and H.G.Molina.One torus to rule them all:Multi-dimensional queries in p2p systems.In Proceedings of the 7th International Workshop on the Web and Databases(WebDB'2004),pages 19-24,June 2004.
    [31]H.Garcia-Molina,J.D.Ullman,and J.Widom.Database System:The Complete Book.Prentice Hall,2001.
    [32]Genome@home.Genome@home Homepage.http://www.stanford.edu/group/pandegroup/genome,2003.
    [33]S.Ghemawat,H.Gobioff,and S.T.Leung.The Google file system.In Proceedings of the 19th Symposium on Operating Systems Principles(OSDI'2003),pages 29-43,2003.
    [34]C.Gkantsidis,M.Mihail,and A.Saberi.Random walks in peer-to-peer networks.In Proceedings of the 23rd Annual Joint Conference of the IEEE Computer and Communications Societies(INFOCOM'2004),pages 1-14,2004.
    [35]Gnutella.Gnutella Homepage.http://www.Gnutella.com.
    [36]A.Gupta,D.Agrawal,and A.El Abbadi.Approximate range selection queries in peer-to-peer systems.In Proceedings of 1st Biennial Conference on Innovative Data Systems Research(CIDR'2003),pages 141-151,January 2003.
    [37]A.Guttman.R-trees:a dynamic index structure for spatial searching.In Proceedings of the 1984 ACM SIGMOD international conference on Management of Data(SIGMOD'1984),pages 47-57,June 1984.
    [38]G.Wahba.A polynomial algorithm for density estimation.The annals of mathematical statistics,6:1870-1886,1971.
    [39]P.J.Haas and C.K(o|¨)nig.A bi-level Bernoulli scheme for database sampling.In Proceedings of the 2004 ACM SIGMOD international conference on Management of data(SIGMOD'2004),pages 275-286,2004.
    [40]P.J.Haas,J.F.Nanghton,S.Seshadri,and L.Stokes.Sampling-Based Estimation of the Number of Distinct Values of an Attribute.In Proceedings of the 21th International Conference on Very Large Data Bases(VLDB'1995),pages 311-322,1995.
    [41]A.Y.Halevy,Z.G.Ives,D.Suciu,and I.Tatarinov.Schema Mediation in Peer Data Management Systems.In Proceedings of the 19nd International Conference on Data Engineering(ICDE'2003),pages 505-516,2003.
    [42]M.Harren,J.M.Hellerstein,R.Huebsch,B.T.Loo,S.Shenker,and I.Stoica.Complex queries in dht-based peer-to-peer networks.In Proceedings of the 1st International Workshop on Peer-to-Peer Systems(IPTPS'2002),pages 242-259,March 2002.
    [43]N.Harvey,M.B.Jones,S.Saroiu,M.Theimer,and A.Wolman.Skipnet:A scalable overlay network with practical locality properties.In Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems (USITS'2003),pages 113-126,March 2003.
    [44]O.Heckmann and A.Bock.The eDonkey 2000 Protocol.Technical report,Darmstadt University of Technology,2002.
    [45]K.Hinrichs and J.Nievergelt.The grid file:A data structure designed to support proximity queries on spatial objects.In Proceedings of the International Workshop on Graphtheoretic Concepts in Computer Science(WG'1983),pages 100-113,1983.
    [46]V.Hristidis,N.Koudas,and Y.Papakonstantinou.PREFER:a system for the efficient execution of multi-parametric ranked queries.In Proceedings of the 2001 A CM SIGMOD international conference on Management of Data (SIGMOD'2001),pages 259-270,2001.
    [47]Y.Hu,H.Chen,J.Lou,and J.Li.Distributed Density Estimation Using Nonparametric Statistics.In Proceedings of the 27th International Conference on Distributed Computing Systems(ICDCS'2007),2007.
    [48]R.Huebsch,J.M.Hellerstein,N.Lanham,B.T.Loo,S.Shenker,and I.Stoica.Querying the internet with PIER.In Proceedings of the 29th International Conference on Very Large Data Bases(VLDB'2003),pages 321-332,2003.
    [49]iClick,iClick Homepage.http://www.iclick.cn,2008.
    [50]Icq.ICQ Homepage.http://www.icq.com,2003.
    [51]Y.E.Ioannidis.Universality of Serial Histograms.In Proceedings of the 19th International Conference on Very Large Data Bases(VLDB'1993),pages 256-267,1993.
    [52]Y.E.Ioannidis and V.Poosala.Balancing histogram optimality and practicality for query result size estimation.In Proceedings of the 1995 ACM SIGMOD international conference on management of data(SIGMOD'1995),pages 233-244,1995.
    [53]H.V.Jagadish,B.C.Ooi,K.L.Tan,Q.H.Vu,and R.Zhang.Speeding up search in peer-to-peer networks with a multi-way tree structure.In Proceedings of the A CM SIGMOD International Conference on Management of Data (SIGMOD'2006),pages 1-12,June 2006.
    [54]H.V.Jagadish,B.C.Ooi,and Q.H.Vu.Baton:a balanced tree structure for peer-to-peer networks.In Proceedings of the 31st International Conference on Very Large Data Bases(VLDB'2005),pages 661-672,August 2005.
    [55]H.V.Jagadish,B.C.Ooi,Q.H.Vu,R.Zhang,and A.Y.Zhou.Vbitree:A peer-to-peer framework for supporting multi-dimensional indexing schemes.In Proceedings of the 22nd International Conference on Data Engineering(ICDE'2006),pages 34-43,April 2006.
    [56]HV Jagadish,N.Koudas,S.Muthukrishnan,V.Poosala,K.C.Sevcik,and T.Suel.Optimal Histograms with Quality Guarantees.In Proceedings of the 24rd International Conference on Very Large Data Bases(VLDB'1998),pages 275-286,1998.
    [57]HV Jagadish,B.C.Ooi,and Q.H.Vu.BATON:a balanced tree structure for peer-to-peer networks.In Proceedings of the 31st international conference on Very Large Data Bases(VLDB'2005),pages 661-672,2005.
    [58]H.V.Jagadish,Beng Chin Ooi,Heng Tao Shen,and Klan-Lee Tan.Towards efficient multi-feature query processing.IEEE Transactions on Knowledge and Data Engineering,18(3):350-362,2006.
    [59]J.Miller.Jabber:Conversational Technologies.In Peer-to-Peer:Harnessing the Power of Disruptive Technologies.Oram A Ed.O'Reilly,2002.
    [60]Joost.Joost Homepage.http://www.joost.com,2008.
    [61]D.Karger,E.Lehman,T.Leighton,R.Ranigrahy,M.Levine,and D Lewin.Consistent hashing and random trees:algorithms for caching in distributed networks.In Proceedings of the 29th Annual ACM Symposium on Theory of Computing(STOC'1997),pages 654-663,1997.
    [62]A.Kementsietsidis,M.Arenas,and R.J.Miller.Mapping data in peer-to-peer systems:semantics and algorithmic issues.Proceedings of the 2003 ACM SIGMOD international conference on Management of data(SIGMOD'2003),pages 325-336,2003.
    [63]V.King and J.Saia.Choosing a random peer.In Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing(PODC'2004),pages 125-130,July 2004.
    [64]J.Kleinberg.The Small-World Phenomenon:An Algorithmic Perspective.In Proceedings of the 32nd Annual Acre Symposium on Theory of Computing (STOC'2000),pages 21-23.
    [65]W.Kowalczyk and N.Vlassis.Newscast em.NIPS,2:713-720,2005.
    [66]Y.Kulbak and D.Bickson.The eMule Protocol Specification.http://sourceforge.net,2005.
    [67]J.Lee,H.Lee,S.Kang,S.M.Kim,and J.Song.Ciss:An efficient object clustering framework for dht-based peer-to-peer applications.In Pro- ceedings of Databases,Information Systems,and Peer-to-Peer Computing (DBISP2P'2004),pages 215-229,August 2004.
    [68]M.P.Lukas.Distributed Control Systems:Their Evaluation and Design.Van Nostrand Reinhold,1986.
    [69]D.Malkhi,M.Naor,and D.Ratajczak.Viceroy:A scalable and dynamic emulation of the butterfly.In Proceedings of the 21st Annual A CM Symposium on Principles of Distributed Cornputing(PODC'2002),pages 183-192,July 2002.
    [70]Y.Matins,J.S.Vitter,and M.Wang.Wavelet-based histograms for selectivity estimation.In Proceedings of the 1998 A CM SIGMOD international conference on Management of data(SIGMOD'1998),pages 448-459,1998.
    [71]Y.Matins,J.S.Vitter,and M.Wang.Wavelet-based histograms for selectivity estimation.Proceedings of the 1998 ACM SIGMOD international conference on Management of data(SIGMOD'1998),pages 448-459,1998.
    [72]P.Maymounkov and D.Mazieres.Kademlia:A peer-to-peer information system based on the XOR metric.In Proceedings of the 1st International Workshop on Peer-to-Peer Systems(IPTPS' 02),pages 258-263,2002.
    [73]MSN Messenger.MSN Messenger homepage,http://messenger.msn.com,2003.
    [74]M.Evans,N.Hastings,and B.Peacock.Statistical Distributions.Wiley,New York,2000.
    [75]M.Karnaugh.The map method for synthesis of combinational logic circuits.Journal of Symbolic Logic,72:593-599,1953.
    [76]M.Matsumoto and T.Nishimura.Mersenne twister:A 623-dimensionally equidistributed uniform pseudorandom number generator.ACM Transactions on Modeling and Computer Simulation,8(1):3-30,1998.
    [77]R.Motwani and P.Raghavan.Randomized algorithms.Cambridge University Press,1995.
    [78]P.Valduriez M.T.Ozsu.Principles of Distributed Database Systems.Second Edition Prentice-Hall,1999.
    [79]Napster.Napster Homepage.http://www.napster.com.
    [80]OceanStor.Oceanstore homepage,http://oceanstore.cs.berkeley.edu/,2003.
    [81]A.Oram.Peer-to-Peer:Harnessing the Power of Disruptive Technologies.SIGMOD Record,32(2):57,2003.
    [82]J.Orenstein and T.Merrett.A class of data structures for associative searching.In Proceedings of the Third ACM SIGACT-SIGMOD Symposium on Principles of Database Systems(PODS'1984),pages 181-190,April 1984.
    [83]M.T.Ozsu and P.Valduriez.Principles of distributed database systems.Prentice-Hall,Inc.Upper Saddle River,NJ,USA,1991.
    [84]T.Pitoura and P.Triantafillou.Load Distribution Fairness in P2P Data Management Systems.In Proceedings of 23rd International Conference on Data Engineering(ICDE'2007),pages 396-405,2007.
    [85]C.G.Plaxton,R.Rajaraman,and A.W.Richa.Accessing nearby copies of replicated objects in a distributed environment.In Proceedings of the gth Annual ACM Symposium on Parallel Algorithms and Architectures(SPAA'1997),pages 311-320,June 1997.
    [86]PPLive.PPLive Homepage.http://www.pplive.com,2008.
    [87]PPStream.PPStream Homepage.http://www.pps.tv,2008.
    [88]W.Pugh.A skip list cookbook.Technical Report CS-IR-2286.1,University of Maryland,Colledge Park,1989.
    [89]QQ.QQ Homepage.http://www.qq.com,2008.
    [90]QQ.QQ Homepage.http://www.qq.com,2008.
    [91]QQLive.QQLive Homepage.http://tv.qq.com,2008.
    [92]R.Ramakrishnan.Database Management Systems.WCB/McGraw-Hill,1998.
    [93]R.Ramakrishnan and J.Gehrke.Database Management Systems.McGraw-Hill Science/Engineering/Math,2003.
    [94]KR Rao and P.Yip.Discrete cosine transform:algorithms,advantages,applications.Academic Press Professional,Inc.San Diego,CA,USA,1990.
    [95]S.Ratnasamy,P.Francis,M.Handley,R.Karp,and S.Shenker.A scalable content-addressable network.In Proceedings of the ACM SIGCOMM 2001Conference on Applications,Technologies,Architectures,and Protocols for Computer Communication(SIGCOMM'2001),pages 161-172,August 2001.
    [96]W.D.Ray and R.M.Driver.Further decomposition of the Karhunen-Loeve series representation of a stationary random process.IEEE Transactions on Information Theory,16(6):663-668,1970.
    [97]A.Rowstron and P.Druschel.Pastry:Scalable,distributed object location and routing for large-scale peer-to-peer systems.In Proceedings of the 18th IFIP/A CM International Conference on Distributed Systems Platforms(Middleware'2001),pages 329-350,Novemeber 2001.
    [98]M.Ruhl.Efficient algorithms for new computational models.Doctoral Dissertation,Massachusetts Institute of Technology,2003.
    [99]S.Christodoulakis.Estimating record selectivities.Information Systems Journal,8:105-115,1983.
    [100]T.Seidl and H.P.Kriegel.Optimal multi-step k-nearest neighbor search.In Proceedings of the 1998 ACM SIGMOD international conference on Management of data(SIGMOD'1998),pages 154-165,1998.
    [101]SETI@home.SETI@home Homepage.http://www.setiathome.ssl.berkeley.edu,2003.
    [102]S.Shenker.The Data-centric Revolution in Networking.In Proceedings of the 29th International Conference on Very Large Data Bases(VLDB'2003),pages 15-26,2003.
    [103]C.Shirky.Listening to Napster.Peer-to-Peer:Harnessing the Benefits of a Disruptive Technology,pages 19-28,2001.
    [104]Y.Shu,B.C.Ooi,K.L.Tan,and A.Zhou.Supporting Multi-Dimensional Range Queries in Peer-to-Peer Systems.In Proceedings of the 5th IEEE International Conference on Peer-to-Peer Computing(P2P2005),pages 173-180,2005.
    [105]Y.Shu,K-L.Tan,and Aoying Zhou.Adapting the content native space for load balanced indexing.In Proceedings of Databases,Information Systems,and Peer-to-Peer Computing(DBISP2P'2004),pages 122-135,August 2004.
    [106]Skype.Skype Homepage.http://www.skype.com,2008.
    [107]S.N.Bernstein.Theory of Probability.Moscow,Russian,1927.
    [108]S.Seshadri.Probabilistic Methods in Query Processing.PhD thesis,University of Wisconsin,1992.
    [109]S.Steinmetz and K.Wehrle.Peer-to-Peer Networking & Computing.Informatik spektrum,27(1):51-54,2004.
    [110]I.Stoica,R.Morris,D.Karger,M.F.Kaashoek,and H.Balakrishnan.Chord:a scalable peer-to-peer lookup service for internet applications.In Proceedings of the ACM SIGCOMM 2001 Conference on Applications,Technologies,Architectures,and Protocols for Computer Communication(SIGCOMM'2001),pages 149-160,August 2001.
    [111]Y.Tao,V.Hristidis,D.Papadias,and Y.Papakonstantinou.Branch-and-bound processing of ranked queries.Information Systems,32(3):424-445,2007.
    [112]L.N.Trefethen and D.Bau.Numerical Linear Algebra.Society for Industrial Mathematics,1997.
    [113]TuDou.TuDou Homepage.http://www.tudou.com,2008.
    [114]UUSee.UUSee Homepage.http://www.uusee.com,2008.
    [115]A.Vlachou,C.Doulkeridis,K.Norvag,and M.Vazirgiannis.Skyline-based Peer-to-Peer Top-k Query Processing.In Proceedings of the 24th International Conference on Data Engineering(ICDE'2008),pages 1421-1423,2008.
    [116]S.Wang,B.C.Ooi,A.K.H.Tung,and L.Xu.Efficient skyline query processing on peer-to-peer networks.In Proceedings of 23rd International Conference on Data Engineering(ICDE'2007),pages 1126-1135,2007.
    [117]S.Wu,J.Li,B.C.Ooi,and K.L.Tan.Just-in-time query retrieval over partially indexed data on structured P2P overlays.In Proceedings of the 2008 ACM SIGMOD international conference on Management of data(SIGMOD'2008),pages 279-290,2008.
    [118]Xenoservers.Xenoserverse Homepage.http://www.cl.cam.ac.uk/research/srg/netos/xeno,2003.
    [119]D.Xin,C.Chen,and J.Han.Towards robust indexing for ranked queries.In Proceedings of the 32nd international conference on Very Large Data Bases (VLDB'2006),pages 235-246,2006.
    [120]D.Xin,J.Han,and K.C.Chang.Progressive and selective merge:computing top-k with ad-hoc ranking functions.In Proceedings of the 2007 ACM SIGMOD international conference on Management of data(SIGMOD'2007),pages 103-114,2007.
    [121]D.Xin,J.Han,H.Cheng,and X.Li.Answering top-k queries with multi-dimensional selections:the ranking cube approach.In Proceedings of the 32nd international conference on Very Large Data Bases(VLDB'2006),pages 463-474,2006.
    [122]B.Y.Zhao,J.Kubiatowicz,and A.D.Joseph.Tapestry:An Infrastructure for Fault-tolerant Wide-area Location and Routing.IEEE Journal on Selected Areas in Communications,22(1):41-53,2001.
    [123]A.Y.Zhou,L.H.Xu,and W.N.Qian.Adaptive.probabilistic search over unstructured peer-to-peer computing systems.World Wide Web archive,9(4):537-556,2006.
    [124]Zhou SG Ling B Xu LH Wee Siong Ng Beng Chin Ooi Klan-Lee Tan Zhou AY,Qian WN.Data Management in Peer-to-Peer Environment:A Perspective of BestPeer.Journal of Computer Science and Technoligy,18(4):452-461.
    [125]Y.Zhu and Y.Hu.Towards efficient load balancing in structured P2P systems.In Proceedings of the 18th International on Parallel and Distributed Processing Symposium(IPTPS'2004),2004.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700