
Multiagent Stochastic Planning with Bayesian Policy Recognition


Files in this item

File: Panella_Alessandro.pdf (PDF, 921 KB; no description provided)
Title: Multiagent Stochastic Planning with Bayesian Policy Recognition
Author(s): Panella, Alessandro
Advisor(s): Gmytrasiewicz, Piotr
Contributor(s): Berger-Wolf, Tanya; Di Eugenio, Barbara; Martin, Ryan; Ziebart, Brian
Department / Program: Computer Science
Graduate Major: Computer Science
Degree Granting Institution: University of Illinois at Chicago
Degree: PhD (Doctor of Philosophy)
Genre: Doctoral thesis
Subject(s): Artificial Intelligence; Multiagent Systems; Opponent Modeling; Stochastic Planning; Bayesian Inference; Bayesian Nonparametrics
Abstract: We consider an autonomous agent facing a stochastic, partially observable, multiagent environment. In order to compute an optimal plan, the agent must accurately predict the actions of the other agents, since they influence the state of the environment and ultimately the agent's utility. To do so, we propose a special case of the interactive partially observable Markov decision process (I-POMDP) in which the agent does not explicitly model the other agents' beliefs and intentions, but instead models the other agents as stochastic processes implemented by probabilistic deterministic finite state controllers (PDFCs). The agent maintains a probability distribution over the PDFC models of the other agents and updates this distribution using Bayesian inference. Since the number of nodes of these PDFCs is unknown and unbounded, the agent places a Bayesian nonparametric prior distribution over the infinite-dimensional set of PDFCs, which allows the size of the learned models to adapt to the complexity of the observed behavior. The posterior distribution is too complex to be amenable to analytical computation; we therefore provide a Markov chain Monte Carlo (MCMC) algorithm that approximates the posterior beliefs over the other agents' PDFCs, given a sequence of (possibly imperfect) observations of their behavior. Experimental results show that the learned models converge behaviorally to the true ones. We then describe how the learned PDFCs can be embedded in the learning agent's own decision-making process, considering two settings: one in which the agent first learns and then interacts with the other agents, and one in which learning and planning are interleaved. We show that the agent's performance increases as a result of learning in both settings. We also analyze the dynamics that ensue when two agents simultaneously learn about each other while interacting, showing in an example environment that coordination emerges naturally from our approach. Finally, we demonstrate how an agent can exploit the learned models to complement its possibly noisy observations of the environment.
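The abstract compresses two technical ideas worth unpacking: a PDFC, whose nodes emit actions stochastically but transition deterministically on observations, and a nonparametric prior that lets the number of controller nodes grow with the observed behavior. The Python sketch below illustrates both under stated assumptions: the class layout, the tiger-style example, and the use of a Chinese restaurant process as the adaptive-size prior are choices made here for concreteness, not details taken from the thesis.

    import random

    class PDFC:
        """Probabilistic deterministic finite state controller (sketch).

        Each node emits an action stochastically; the move to the next
        node is deterministic given the current node and the latest
        observation. Structure is an assumption, not the thesis's code.
        """

        def __init__(self, action_probs, transitions, start=0):
            self.action_probs = action_probs  # node -> {action: probability}
            self.transitions = transitions    # (node, observation) -> next node
            self.node = start

        def act(self):
            actions, probs = zip(*self.action_probs[self.node].items())
            return random.choices(actions, weights=probs)[0]

        def observe(self, obs):
            self.node = self.transitions[(self.node, obs)]

    def crp_assignments(n, alpha=1.0):
        """Assign n behavior steps to controller nodes via a Chinese
        restaurant process: a stand-in for the nonparametric prior, under
        which the number of distinct nodes is unbounded and adapts to the
        data rather than being fixed in advance."""
        counts, assignments = [], []
        for _ in range(n):
            weights = counts + [alpha]   # existing nodes, plus a new one
            k = random.choices(range(len(weights)), weights=weights)[0]
            if k == len(counts):
                counts.append(1)         # open a previously unseen node
            else:
                counts[k] += 1
            assignments.append(k)
        return assignments

    # A two-node controller in a tiger-like domain: node 0 mostly listens;
    # a growl on the left moves it to node 1, which opens the right door.
    ctrl = PDFC(
        action_probs={0: {"listen": 0.9, "open-left": 0.1},
                      1: {"open-right": 1.0}},
        transitions={(0, "growl-left"): 1, (0, "growl-right"): 0,
                     (1, "growl-left"): 0, (1, "growl-right"): 0},
    )
    for obs in ["growl-right", "growl-left"]:
        print(ctrl.act())
        ctrl.observe(obs)

    print(crp_assignments(10, alpha=1.0))

In the thesis's setting, candidate controllers drawn this way would be scored against observed action-observation sequences inside the MCMC sampler; the sketch only shows how behavior can be represented by a controller of unbounded, data-dependent size.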
Issue Date: 2016-07-01
URI: http://hdl.handle.net/10027/20865
Rights Information: Copyright 2016 Alessandro Panella
Date Available in INDIGO: 2016-07-01
Date Deposited: 2016-05
 


Statistics

Country Views
United States of America 105
China 54
Russian Federation 34
Germany 6
Switzerland 2
