Abstract
One design strategy for developing intelligent agents is to create N distinct behaviors. At each time step during task execution, the agent, framed as a bandit player, chooses which of the N behaviors to use. Traditional bandit algorithms often (1) assume the environment is stationary, (2) focus on asymptotic performance, and (3) ignore external information available to the agent. Each of these simplifications limits the algorithms' usefulness in practice. In this paper, we propose a new algorithm, called AlegAATr, as a step toward overcoming these deficiencies. AlegAATr uses Assumption-Alignment Tracking (AAT), a technique previously proposed in the robotics literature, to predict the performance of each behavior. We demonstrate the effectiveness of AlegAATr in selecting behaviors in three domains: repeated games, ad hoc teamwork, and a robot pick-and-place task.
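The per-step behavior-selection loop described in the abstract can be sketched as a classic multi-armed bandit. The following is a generic ε-greedy illustration of that framing only, not the AlegAATr/AAT method itself; the behaviors, reward distributions, and ε value are hypothetical:

```python
import random

def select_behavior(q_values, epsilon, rng):
    """Pick a behavior index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def run_bandit(reward_fns, steps=1000, epsilon=0.1, seed=0):
    """Generic epsilon-greedy loop over N candidate behaviors."""
    rng = random.Random(seed)
    n = len(reward_fns)
    q = [0.0] * n        # running mean reward estimate per behavior
    counts = [0] * n     # how often each behavior was chosen
    for _ in range(steps):
        i = select_behavior(q, epsilon, rng)
        r = reward_fns[i](rng)            # execute behavior i, observe reward
        counts[i] += 1
        q[i] += (r - q[i]) / counts[i]    # incremental mean update
    return q, counts

# Hypothetical behaviors: noisy rewards with different true means.
behaviors = [lambda rng: rng.gauss(0.2, 0.1),
             lambda rng: rng.gauss(0.8, 0.1)]
q, counts = run_bandit(behaviors)
```

Given enough steps, the loop concentrates its choices on the higher-reward behavior; note that this relies on the stationarity assumption (1) that the abstract identifies as a limitation of traditional bandit algorithms.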
Degree
MS
College and Department
Physical and Mathematical Sciences
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Pedersen, Ethan, "AlegAATr the Bandit" (2023). Theses and Dissertations. 9829.
https://scholarsarchive.byu.edu/etd/9829
Date Submitted
2023-03-01
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd12667
Keywords
Bandit algorithms, proficiency self-assessment, repeated games, ad hoc teams, human-robot interaction
Language
English