Abstract
One design strategy for developing intelligent agents is to create N distinct behaviors. At each time step during task execution, the agent, framed as a bandit player, chooses which of the N behaviors to use. Traditional bandit algorithms often (1) assume the environment is stationary, (2) focus on asymptotic performance, and (3) ignore external information available to the agent. Each of these simplifications limits the algorithms' usefulness in practice. In this paper, we propose a new algorithm, called AlegAATr, as a step toward overcoming these deficiencies. AlegAATr uses Assumption-Alignment Tracking (AAT), a technique previously proposed in the robotics literature, to predict the performance of each behavior. We demonstrate the effectiveness of AlegAATr in selecting behaviors in three domains: repeated games, ad hoc teamwork, and a robot pick-and-place task.
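The per-step behavior-selection loop described in the abstract can be sketched as a classic multi-armed bandit. The following is a generic ε-greedy illustration of that framing only, not the AlegAATr/AAT method itself; the behaviors, reward distributions, and ε value are hypothetical:

```python
import random

def select_behavior(q_values, epsilon, rng):
    """Pick a behavior index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def run_bandit(reward_fns, steps=1000, epsilon=0.1, seed=0):
    """Generic epsilon-greedy loop over N candidate behaviors."""
    rng = random.Random(seed)
    n = len(reward_fns)
    q = [0.0] * n        # running mean reward estimate per behavior
    counts = [0] * n     # how often each behavior was chosen
    for _ in range(steps):
        i = select_behavior(q, epsilon, rng)
        r = reward_fns[i](rng)            # execute behavior i, observe reward
        counts[i] += 1
        q[i] += (r - q[i]) / counts[i]    # incremental mean update
    return q, counts

# Hypothetical behaviors: noisy rewards with different true means.
behaviors = [lambda rng: rng.gauss(0.2, 0.1),
             lambda rng: rng.gauss(0.8, 0.1)]
q, counts = run_bandit(behaviors)
```

Given enough steps, the loop concentrates its choices on the higher-reward behavior; note that this relies on the stationarity assumption (1) that the abstract identifies as a limitation of traditional bandit algorithms.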
Degree
MS
College and Department
Physical and Mathematical Sciences
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Pedersen, Ethan, "AlegAATr the Bandit" (2023). Theses and Dissertations. 9829.
https://scholarsarchive.byu.edu/etd/9829
Date Submitted
2023-03-01
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd12667
Keywords
Bandit algorithms, proficiency self-assessment, repeated games, ad hoc teams, human-robot interaction
Language
English