Abstract
Policy Hill Climbing (PHC) is a reinforcement learning algorithm that extends Q-learning to learn probabilistic (mixed) policies in multi-agent games. WoLF-PHC extends PHC with the "win or learn fast" (WoLF) principle. This thesis gives a proof that PHC diverges in self-play on Shapley's game and shows empirically that WoLF-PHC diverges as well. Several modifications of WoLF-PHC were created, evaluated, and compared in an attempt to obtain convergence to the single-shot Nash equilibrium in self-play on Shapley's game without using more information than WoLF-PHC uses. Partial Commitment WoLF-PHC (PCWoLF-PHC), which performs best on Shapley's game, is then tested on other matrix games and shown to produce satisfactory results.
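To make the abstract's terminology concrete, the following is a minimal sketch of a WoLF-PHC agent for a single-state matrix game, played in self-play on matching pennies. The learning rates, the choice of game, and all variable names here are illustrative assumptions for exposition; they are not taken from the thesis itself.

```python
import random

class WoLFPHC:
    """Sketch of WoLF-PHC for a single-state (matrix) game.

    The agent keeps Q-values per action, a mixed policy pi, and a running
    average policy. Per the "win or learn fast" principle, it hill-climbs
    pi with a small step when winning and a larger step when losing.
    """

    def __init__(self, n_actions, alpha=0.1, delta_win=0.01, delta_lose=0.04):
        self.n = n_actions
        self.alpha = alpha                         # Q-learning rate (assumed value)
        self.dw, self.dl = delta_win, delta_lose   # WoLF step sizes, dw < dl
        self.Q = [0.0] * n_actions
        self.pi = [1.0 / n_actions] * n_actions        # current mixed policy
        self.avg_pi = [1.0 / n_actions] * n_actions    # running average policy
        self.count = 0

    def act(self, rng):
        # Sample an action from the current mixed policy.
        r, c = rng.random(), 0.0
        for a, p in enumerate(self.pi):
            c += p
            if r <= c:
                return a
        return self.n - 1

    def update(self, action, reward):
        # Single-state Q-update (no bootstrapped next-state value).
        self.Q[action] += self.alpha * (reward - self.Q[action])

        # Update the running average policy.
        self.count += 1
        for a in range(self.n):
            self.avg_pi[a] += (self.pi[a] - self.avg_pi[a]) / self.count

        # "Winning" means the current policy's expected value under Q
        # exceeds that of the average policy; then learn slowly.
        v_cur = sum(p * q for p, q in zip(self.pi, self.Q))
        v_avg = sum(p * q for p, q in zip(self.avg_pi, self.Q))
        delta = self.dw if v_cur > v_avg else self.dl

        # Hill-climb: shift probability mass toward the greedy action,
        # clipping so pi remains a valid distribution.
        best = max(range(self.n), key=lambda a: self.Q[a])
        for a in range(self.n):
            if a == best:
                continue
            step = min(self.pi[a], delta / (self.n - 1))
            self.pi[a] -= step
            self.pi[best] += step

# Self-play on matching pennies (zero-sum; player 1 wins on a match).
rng = random.Random(0)
p1, p2 = WoLFPHC(2), WoLFPHC(2)
for _ in range(5000):
    a1, a2 = p1.act(rng), p2.act(rng)
    r = 1.0 if a1 == a2 else -1.0
    p1.update(a1, r)
    p2.update(a2, -r)
```

Note that this sketch handles only single-state games; the thesis concerns convergence behavior on specific games such as Shapley's, where (as the abstract states) even WoLF-PHC can fail to converge to the single-shot Nash equilibrium.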
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Cook, Philip R., "Limitations and Extensions of the WoLF-PHC Algorithm" (2007). Theses and Dissertations. 1222.
https://scholarsarchive.byu.edu/etd/1222
Date Submitted
2007-09-27
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd2109
Keywords
Matrix Games, Multi-agent Systems, Nash equilibrium, convergence, reinforcement learning, machine learning, algorithm
Language
English