Policy Invariance under Reward Transformations for Multi-Objective Reinforcement Learning
Date
2017
Author
Mannion, Patrick
Devlin, Sam
Mason, Karl
Duggan, Jim
Abstract
Reinforcement Learning (RL) is a powerful and well-studied Machine Learning paradigm, where an agent learns to improve its performance in an environment by maximising a reward signal. In multi-objective Reinforcement Learning (MORL) the reward signal is a vector, where each component represents the performance on a different objective. Reward shaping is a well-established family of techniques that have been successfully used to improve the performance and learning speed of RL agents in single-objective problems. The basic premise of reward shaping is to add an additional shaping reward to the reward naturally received from the environment, to incorporate domain knowledge and guide an agent's exploration. Potential-Based Reward Shaping (PBRS) is a specific form of reward shaping that offers additional guarantees. In this paper, we extend the theoretical guarantees of PBRS to MORL problems. Specifically, we provide theoretical proof that PBRS does not alter the true Pareto front in both single- and multi-agent MORL. We also contribute the first published empirical studies of the effect of PBRS in single- and multi-agent MORL problems.
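For illustration only, the sketch below shows one way the shaping idea described in the abstract can be applied per objective to a vector-valued reward, using the standard PBRS form F(s, s') = gamma * Phi(s') - Phi(s). The potential function, discount factor, and two-objective reward used here are hypothetical placeholders and are not taken from the paper's experiments.

```python
import numpy as np

GAMMA = 0.99  # hypothetical discount factor

def potential(state):
    """Hypothetical potential Phi(s): one heuristic value per objective."""
    # e.g. negative distance to two different goal locations on a line
    return np.array([-abs(state - 10.0), -abs(state - 3.0)])

def shaped_reward(reward_vector, state, next_state, gamma=GAMMA):
    """Add the PBRS term F(s, s') = gamma * Phi(s') - Phi(s) to each objective's reward."""
    shaping = gamma * potential(next_state) - potential(state)
    return np.asarray(reward_vector) + shaping

# Example: a two-objective environment reward at a single transition
r = np.array([1.0, 0.0])
print(shaped_reward(r, state=5.0, next_state=6.0))
```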