Abstract
A common challenge in AI is for an agent possessing N distinct behaviors, each effective in certain tasks and circumstances, to choose among them at any particular time. Bandit algorithms are a popular approach for learning optimal selections in such settings. However, traditional bandits often (1) assume the environment is stationary, (2) focus on asymptotic performance, and (3) do not incorporate available external information about the environment. While contextual bandits, which do consider external information, are designed to combat one of these challenges, the others remain problematic in realistic domains. This is especially true of non-stationary domains, such as multi-agent environments (where agents adapt their behavior to each other), since most bandits assume that regret minimization, a reasonable goal in stationary environments, is always valid. Furthermore, existing contextual bandits have difficulties of their own: they typically either assume one bandit per context (infeasible in domains with large state spaces) or employ expert advice without considering how the current state might impact expert performance. To help address these issues, we explored the use of Assumption-Alignment Tracking (AAT) to design contextual bandit algorithms that succeed in complex and varied multi-agent domains. Specifically, this dissertation comprises four studies. The first three examine the ability of an AAT-based bandit to achieve strong performance in small multi-agent systems; a complex, real-world, zero-sum domain; and a complex, general-sum, collective-action game. The third study also explores the impact of a bandit's learning framework. The fourth and final study explores improvements to the AAT design process, which, while successful, can be time intensive and demanding; specifically, it explores how to automate that process to improve simplicity and ease of implementation.
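The selection problem the abstract describes can be made concrete with a minimal sketch of the "one bandit per context" design it critiques: a tabular epsilon-greedy learner that keeps a separate value estimate for every (context, arm) pair. All names and interfaces below are hypothetical illustrations, not the dissertation's AAT-based algorithm.

```python
import random
from collections import defaultdict


class ContextualEpsilonGreedy:
    """Toy contextual bandit keeping one value estimate per (context, arm).

    A minimal sketch of the 'one bandit per context' approach; it scales
    poorly with large state spaces, which is exactly the limitation the
    abstract notes. Hypothetical illustration, not the AAT algorithm.
    """

    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.values = defaultdict(float)  # (context, arm) -> mean reward

    def select(self, context):
        # Explore with probability epsilon; otherwise exploit the arm with
        # the highest estimated value for this context.
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.values[(context, a)])

    def update(self, context, arm, reward):
        # Incremental mean update for the chosen (context, arm) pair.
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

Because each context gets an independent estimate table, the learner transfers nothing between contexts; approaches based on expert advice (or, as in this dissertation, AAT) aim to share information instead of enumerating contexts.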
Degree
PhD
College and Department
Computer Science; Computational, Mathematical, and Physical Sciences
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Pedersen, Ethan, "Contextual Bandits for Non-Stationary Multi-Agent Environments Using Assumptions-Alignment Tracking" (2025). Theses and Dissertations. 11054.
https://scholarsarchive.byu.edu/etd/11054
Date Submitted
2025-11-13
Document Type
Dissertation
Keywords
bandit algorithms, contextual bandits, bandits with expert advice, proficiency self-assessment, multi-agent domains
Language
English