Policy Hill-Climbing (PHC) is a reinforcement learning algorithm that extends Q-learning to learn probabilistic policies for multi-agent games, and WoLF-PHC extends PHC with the "win or learn fast" principle. A proof is given that PHC diverges in self-play on Shapley's game, and WoLF-PHC is shown empirically to diverge as well. Several modifications of WoLF-PHC were created, evaluated, and compared in an attempt to achieve convergence to the single-shot Nash equilibrium of Shapley's game in self-play without using more information than WoLF-PHC requires. Partial Commitment WoLF-PHC (PCWoLF-PHC), which performs best on Shapley's game, is tested on other matrix games and shown to produce satisfactory results.
College and Department
Physical and Mathematical Sciences; Computer Science
BYU ScholarsArchive Citation
Cook, Philip R., "Limitations and Extensions of the WoLF-PHC Algorithm" (2007). Theses and Dissertations. 1222.
Keywords
Matrix Games, Multi-agent Systems, Nash equilibrium, convergence, reinforcement learning, machine learning, algorithm