Policy Invariance under Reward Transformations for Multi-Objective Reinforcement Learning
Date
2017
Author
Mannion, Patrick
Devlin, Sam
Mason, Karl
Duggan, Jim
Abstract
Reinforcement Learning (RL) is a powerful and well-studied Machine Learning paradigm, where an agent learns to improve its performance in an environment by maximising a reward signal. In multi-objective Reinforcement Learning (MORL) the reward signal is a vector, where each component represents the performance on a different objective. Reward shaping is a well-established family of techniques that have been successfully used to improve the performance and learning speed of RL agents in single-objective problems. The basic premise of reward shaping is to add an additional shaping reward to the reward naturally received from the environment, to incorporate domain knowledge and guide an agent's exploration. Potential-Based Reward Shaping (PBRS) is a specific form of reward shaping that offers additional guarantees. In this paper, we extend the theoretical guarantees of PBRS to MORL problems. Specifically, we provide theoretical proof that PBRS does not alter the true Pareto front in both single- and multi-agent MORL. We also contribute the first published empirical studies of the effect of PBRS in single- and multi-agent MORL problems.
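For illustration only, the sketch below shows one way the shaping idea described in the abstract can be applied per objective to a vector-valued reward, using the standard PBRS form F(s, s') = gamma * Phi(s') - Phi(s). The potential function, discount factor, and two-objective reward used here are hypothetical placeholders and are not taken from the paper's experiments.

```python
import numpy as np

GAMMA = 0.99  # hypothetical discount factor

def potential(state):
    """Hypothetical potential Phi(s): one heuristic value per objective."""
    # e.g. negative distance to two different goal locations on a line
    return np.array([-abs(state - 10.0), -abs(state - 3.0)])

def shaped_reward(reward_vector, state, next_state, gamma=GAMMA):
    """Add the PBRS term F(s, s') = gamma * Phi(s') - Phi(s) to each objective's reward."""
    shaping = gamma * potential(next_state) - potential(state)
    return np.asarray(reward_vector) + shaping

# Example: a two-objective environment reward at a single transition
r = np.array([1.0, 0.0])
print(shaped_reward(r, state=5.0, next_state=6.0))
```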