Abstract

Value iteration is not typically considered a viable algorithm for solving large-scale MDPs because it converges too slowly. However, its performance can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We present several methods designed to help structure value dependency, and present a systematic study of companion prioritization techniques which focus computation in useful regions of the state space. In order to scale to solve ever larger problems, we evaluate all enhancements and methods in the context of parallelizability. Using the enhancements, we discover that in many instances the limiting factor of the algorithms is no longer time, but space. We thus evaluate all metrics and decisions with respect to cache performance. We generate a family of algorithms by combining several of the methods discussed, and present empirical evidence demonstrating that performance can improve by several orders of magnitude for real-world problems, while preserving accuracy and convergence guarantees.

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2004-06-14

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd437

Keywords

Machine learning, reinforcement learning, value iteration, Markov Decision Processes

Share

COinS