Abstract

Policy Hill Climbing (PHC) is a reinforcement learning algorithm that extends Q-learning to learn probabilistic policies for multi-agent games. WoLF-PHC extends PHC with the "win or learn fast" principle. This thesis proves that PHC diverges in self-play on Shapley's game and shows empirically that WoLF-PHC diverges as well. Several WoLF-PHC-based modifications are developed, evaluated, and compared in an attempt to obtain convergence to the single-shot Nash equilibrium of Shapley's game in self-play, without using more information than WoLF-PHC uses. Partial Commitment WoLF-PHC (PCWoLF-PHC), the best-performing variant on Shapley's game, is also tested on other matrix games and shown to produce satisfactory results.
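
For readers unfamiliar with the two base algorithms, the following is a minimal sketch of one PHC update and the WoLF step-size rule for a stateless repeated matrix game. It follows the commonly published formulation of PHC and WoLF-PHC, not code from the thesis. The function names (`phc_step`, `wolf_delta`), the parameter defaults, and the omission of a next-state term in the Q update are all assumptions made for illustration. PCWoLF-PHC itself is not sketched, since the abstract does not describe it.

```python
import numpy as np

def phc_step(q, pi, action, reward, alpha=0.1, delta=0.01):
    """One stateless PHC update (illustrative); returns new (q, pi)."""
    q = q.copy()
    pi = pi.copy()
    # Q-learning update; no next-state term because the game is
    # treated as a single-state repeated game (assumption).
    q[action] = (1 - alpha) * q[action] + alpha * reward
    best = int(np.argmax(q))
    n = len(pi)
    # Take at most delta / (n - 1) from each non-greedy action,
    # never pushing a probability below zero.
    step = np.minimum(pi, delta / (n - 1))
    step[best] = 0.0
    pi -= step
    # Give all of the removed probability mass to the greedy action.
    pi[best] += step.sum()
    return q, pi

def wolf_delta(q, pi, avg_pi, delta_win=0.01, delta_lose=0.04):
    """WoLF rule: small step when 'winning', larger step when 'losing'.

    The agent counts as winning when its current policy has a higher
    expected value under q than its average policy does.
    """
    return delta_win if pi @ q > avg_pi @ q else delta_lose

# Example: three actions, uniform start, action 1 just paid reward 1.
q0 = np.zeros(3)
pi0 = np.ones(3) / 3
q1, pi1 = phc_step(q0, pi0, action=1, reward=1.0)
d = wolf_delta(q1, pi1, avg_pi=pi0)
```

In WoLF-PHC the value returned by `wolf_delta` would be passed as `delta` to the next `phc_step` call, with `avg_pi` maintained as a running average of past policies; that bookkeeping is left out here to keep the sketch short.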

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2007-09-27

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd2109

Keywords

Matrix Games, Multi-agent Systems, Nash equilibrium, convergence, reinforcement learning, machine learning, algorithm
