Research on Cooperation and Coordination Mechanisms in Multi-Agent Systems
Abstract
Pervasive, networked, intelligent, agent-based, and humanized computation is the overall trend of automatic computing, and multi-agent computing is a new advanced computing paradigm that has emerged in this historical process, following distributed computing and peer-to-peer computing. Its problem-solving process resembles human thinking: unlike traditional algorithm design, it does not require a comprehensive analysis of the problem; one only needs to specify the agents' goals, and through mutual interaction they progressively achieve the user's goals on their own. Building multi-agent systems for large distributed problems makes computer systems more intelligent and able to take over more human work; agent-oriented software engineering makes programming more humanized, with a design process that better matches how people think; agent-based social simulation combines computer science with sociology, letting computer technology play a positive role in the humanities. Multi-agent computing thus helps promote the further prosperity of computer technology.
     For multi-agent computing to truly deliver the excellent properties its concept promises, considerable research effort is still needed. For agent-based systems, agent construction, the design of communication languages, and cooperation and coordination are the most immediate key problems that urgently need to be solved. The agents' ability to interact for the purpose of cooperation and coordination is precisely what distinguishes multi-agent computing from other computing paradigms. Just as in human society, cooperation and coordination are an important means of solving large, complex problems. This dissertation actively explores cooperation and coordination in multi-agent systems and obtains results in several sub-directions.
     Organization building, coalition formation, and task allocation are the main directions of multi-agent cooperation research. Organizations and coalitions are the foundation of multi-agent cooperation, while task allocation instantiates the cooperative relationship. For the task allocation problem in multi-agent systems, considering that agents differ in network topology and capability level, and building on earlier task-scheduling work in parallel computing, this dissertation proposes two topology-adaptive task allocation algorithms for cooperative heterogeneous agents. One obtains the optimal agent combination by exhaustive search over these two properties; the other uses heuristic search to reduce time complexity and obtain a suboptimal agent combination. For large-scale multi-agent systems in which tasks arrive dynamically, these algorithms no longer apply, so we go on to study the dynamic allocation of multiple task flows and propose a distributed, self-adaptive allocation algorithm based on Q-learning. The algorithm adapts not only to the arrival process of its own task flow but also fully accounts for the arrival processes and allocations of other task flows. Its distributed nature makes it suitable for open, locally observable multi-agent systems, and the use of reinforcement learning lets allocation decisions adapt to the system's task load and distribution. The algorithm exhibits high task throughput and low average task execution time.
     Research on coordination in multi-agent systems falls mainly into three parts: group mental-state models, multi-agent planning, and agent social laws. Each has its own advantages and effects on inter-agent coordination. This dissertation's work on this problem continues the line of multi-agent planning. The plans produced by the two models proposed here are no longer permutations of actions in the traditional sense, but policies for selecting actions while pursuing a goal, which makes plans more flexible. Multi-agent learning is a well-studied and promising approach to deriving action policies. This dissertation analyzes the conflict game, a common competitive relationship among agents, defines an agent's best-response policy based on the Nash equilibrium concept of matrix games, and then finds that policy with model-free reinforcement learning. The resulting policy greatly reduces the number of conflicts and enhances the coordination of agent behavior; moreover, in terms of long-term utility the policy is reasonably fair, which benefits system stability. For coordination in general-sum games, many existing algorithms are easily exploited and thereby lose utility. After analyzing two important attributes of agent policies, time variation and adaptability, this dissertation argues that dynamic policies with both attributes help agents make more rational decisions, avoid the risk of being exploited in mixed multi-agent environments, and respond to different types of agents so as to maximize their own payoff.
     Once agents are deployed on a large scale, agent society will become a special multi-agent system, and agents' social attributes will grow increasingly important. Besides mental attributes such as beliefs, desires, and intentions, personality will also strongly influence an agent's action selection, and modeling other agents by their personality helps formulate better-coordinated policies. This dissertation incorporates personality into the agent's action-selection process and, using qualitative decision theory, builds a personalized action-selection model. Different qualitative decision criteria correspond to different personality traits, and actions chosen under these criteria diversify agent behavior. Further, because personality is complex and hard to describe, while artificial neural networks excel at representing functions humans find hard to characterize, a new personalized action-selection model is proposed based on neural networks. Compared with the former, it has stronger expressive power and can capture subtler personality types. In addition, a multi-agent simulation platform is built on Swarm, a toolkit for simulating complex adaptive systems, and a case study examines the practical use of personality, clarifying the importance and practical value of personality research. Although these models rest on simple principles, they are a new, preliminary attempt to study agent mental states beyond traditional symbolic logic, and they open the possibility of reflecting the chaotic complexity of society from multiple angles.
     In summary, taking cooperation and coordination mechanisms in multi-agent systems as its subject, and through broad survey and deep exploration, this dissertation proposes the following models and algorithms for task allocation, learning-based behavior coordination, and personalized action selection:
     a static task allocation algorithm for cooperative heterogeneous agents that adapts to network topology;
     a Q-learning-based dynamic allocation algorithm for multiple task flows;
     a regret-based reinforcement learning mechanism for multi-agent conflict games;
     a reinforcement learning mechanism for dynamic policies in general-sum games under mixed multi-agent environments;
     a personalized agent action-selection model based on qualitative decision theory;
     a personalized agent action-selection model based on artificial neural networks.
Automatic computing is striding toward pervasive, networked, intelligent, agent-based, and humanized computation. Multi-agent computing is an advanced computing paradigm that emerged in this process, following distributed computing and peer-to-peer computing. Its problem-solving process is very close to the way human beings think. Unlike traditional algorithm design, which must analyze the problem comprehensively, multi-agent computing only needs to assign agents their goals; the agents then achieve the user's goals automatically through their interaction. Building multi-agent systems for large, distributed problems makes computer systems more intelligent and further relieves people of work. Agent-oriented software engineering makes programming more humanized, with software design that complies with how people think. Agent-based social simulation combines computer science and sociology, bringing computer technology into the humanities. Multi-agent computing can thus help computer technology prosper.
     However, much effort is still needed before multi-agent computing can really deliver the outstanding properties its concept promises. As far as agent-based systems are concerned, agent construction, communication-language design, and mechanisms of cooperation and coordination are three key problems that must be solved urgently. Among them, the ability to interact for cooperation and coordination is the very point that distinguishes multi-agent computing from other computing paradigms. As in human society, cooperation and coordination are an important way to solve large, complicated problems. This dissertation actively investigates this issue and makes some achievements in several sub-directions.
     Research on cooperation among multiple agents concentrates on organization building, coalition formation, and task allocation. Organizations and coalitions are the infrastructure of multi-agent cooperation, while task allocation instantiates the cooperative relationship among agents. For task allocation in multi-agent systems, considering agent network topology and differing capability levels, and building on earlier task-scheduling algorithms in parallel computing, we propose two topology-adaptive task allocation algorithms for cooperative, heterogeneous agents. One obtains the optimal agent combination by brute-force search over these two parameters, topology and heterogeneity; the other obtains a suboptimal allocation with lower time complexity via heuristic search. In large-scale multi-agent systems where tasks arrive dynamically, these algorithms are no longer competent, so we continue with the allocation of multiple task flows and propose a distributed, self-adaptive algorithm based on Q-learning. The algorithm adapts not only to the arrival process of its own task flow but also fully considers the influence of task flows arriving at other agents. Its distributed nature allows it to run in open multi-agent systems with only local views, and reinforcement learning makes allocation adapt to system load and node distribution. Experiments verify that the algorithm improves task throughput and decreases average execution time per task.
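The abstract gives no pseudocode for the Q-learning allocator; the minimal sketch below only illustrates the general idea of learning an allocation policy from execution-time feedback. The state encoding, reward, and all names are our assumptions, not the dissertation's design.

```python
import random
from collections import defaultdict

class QAllocator:
    """Toy Q-learning task allocator (illustrative sketch only).

    State  : a coarse load level per candidate agent (e.g. 0=idle, 1=busy).
    Action : index of the agent the incoming task is assigned to.
    Reward : negative task execution time, so faster completions are preferred.
    """

    def __init__(self, n_agents, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n_agents = n_agents
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)          # (state, action) -> value

    def choose(self, state):
        if random.random() < self.epsilon:   # occasional exploration
            return random.randrange(self.n_agents)
        return max(range(self.n_agents), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in range(self.n_agents))
        td = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td
```

Fed with observed execution times, such a learner gradually routes tasks toward the agents that finish them fastest under the current load, which is the adaptivity the paragraph above describes at a high level.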
     As for coordination in multi-agent systems, related work divides into three parts: group mental-state models, multi-agent planning, and social laws. Each has its own advantages and effectiveness. Our work extends research on multi-agent planning; however, the plans produced by our two models are action-selection policies for achieving a goal rather than a series of actions in the traditional manner, and such stochastic policies make plans more flexible. Multi-agent learning is a promising method for obtaining action policies. In this dissertation we analyze the conflict game, a competitive relationship between agents that arises frequently in multi-agent domains, define an agent's best-response policy based on the Nash equilibrium of matrix games, and then find that policy with model-free reinforcement learning. The resulting policy dramatically reduces the frequency of conflicts, enhancing the coordination of agents' behaviors. Furthermore, in terms of long-term utility the policy is fair to some extent, in favor of system stability. In general-sum games, many algorithms are easily exploited and consequently earn less utility. After examining two important attributes of policies, time variation and adaptability, we argue that a dynamic policy with both attributes helps agents make more rational decisions and responses that maximize their payoff while avoiding the risk of being exploited in mixed multi-agent environments.
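To make the conflict-game setting concrete, the toy below uses plain independent Q-learning on a two-agent anti-coordination game; it is not the dissertation's Nash-based best-response or regret mechanism, and the game, payoffs, and names are illustrative assumptions. Picking the same resource is a "conflict" (payoff 0); complementary picks succeed (payoff 1).

```python
import random

# Joint action -> (payoff to agent 0, payoff to agent 1).
PAYOFF = {(0, 0): (0, 0), (1, 1): (0, 0), (0, 1): (1, 1), (1, 0): (1, 1)}

def train(episodes=2000, alpha=0.1, epsilon=0.2, seed=1):
    """Two independent stateless Q-learners repeatedly play the game."""
    random.seed(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]            # q[agent][action]
    for _ in range(episodes):
        acts = []
        for i in (0, 1):
            if random.random() < epsilon:    # explore
                acts.append(random.randrange(2))
            else:                            # greedy (ties -> action 0)
                acts.append(0 if q[i][0] >= q[i][1] else 1)
        r = PAYOFF[tuple(acts)]
        for i in (0, 1):                     # stateless Q update
            q[i][acts[i]] += alpha * (r[i] - q[i][acts[i]])
    return q
```

Even this naive learner tends to settle into complementary choices, i.e. a conflict-free joint policy; the dissertation's mechanisms target the harder question of doing this with fairness and robustness against exploitation.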
     Once agents are deployed on a large scale, agent society will become a special multi-agent system, and agents' social attributes will become more and more important. Apart from mental states such as beliefs, desires, and intentions, personality will also play an important role in agent action selection, and modeling other agents by their personality helps in making more harmonious policies. Against this background, we put personality into agents' action selection and, based on qualitative decision theory, build an individualized action-selection model. Different qualitative decision-making principles correspond to different personalities, and selection according to these principles leads to diversity in agents' actions. Furthermore, since personality is complex and hard to describe, while artificial neural networks can depict functions that are difficult for humans to characterize, a new individualized action-selection model is proposed based on neural networks. Compared with the model based on qualitative decision theory, it describes personality more powerfully, from extreme to subtle types. Besides, a simulation platform for multi-agent systems is developed on the Swarm toolkit, which targets the modeling of complex adaptive systems, and the application of personality is investigated through a practical example, making the significance and realistic value of personality research more explicit. Although the principle behind these models is simple, they are a new, elementary exploration of agent mental states beyond traditional symbolic logic, making it possible to reflect the chaos and complexity of agent society from multiple aspects.
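The idea that different qualitative decision-making principles yield different "personalities" can be illustrated concretely. The mapping below (cautious → maximin, optimistic → maximax, regret-averse → minimax regret) and the payoff table are our illustrative assumptions, not the dissertation's model.

```python
def maximin(options):
    """Cautious trait: pick the action whose worst case is best."""
    return max(options, key=lambda a: min(options[a]))

def maximax(options):
    """Optimistic trait: pick the action whose best case is best."""
    return max(options, key=lambda a: max(options[a]))

def minimax_regret(options):
    """Regret-averse trait: minimize worst-case regret versus hindsight."""
    acts = list(options)
    n = len(next(iter(options.values())))
    col_max = [max(options[a][j] for a in acts) for j in range(n)]
    return min(acts, key=lambda a: max(col_max[j] - options[a][j]
                                       for j in range(n)))

# Payoff of each action under two possible scenarios (illustrative numbers).
options = {"bold": [9, -2], "safe": [3, 2], "hedge": [6, 0]}
print(maximin(options), maximax(options), minimax_regret(options))
# → safe bold hedge: three "personalities" choose three different actions.
```

Facing identical payoffs, the three criteria select three different actions, which is exactly the behavioral diversity the model above attributes to personality.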
     In summary, taking cooperation and coordination in multi-agent systems as its subject, and through broad investigation and deep exploration, this dissertation proposes several models and algorithms on task allocation, learning-based behavior coordination, and individualized action selection, as follows.
     A static task allocation algorithm adaptable to network topology among cooperative heterogeneous agents
     A dynamic allocation algorithm for multiple task flows based on Q-learning
     A regret-based reinforcement learning mechanism for multi-agent conflict games
     A reinforcement learning mechanism for dynamic policies in general-sum games under mixed multi-agent environments
     An individualized agent action-selection model based on qualitative decision theory
     An individualized agent action-selection model based on artificial neural networks
