dc.contributor.author | Mannion, Patrick | |
dc.contributor.author | Devlin, Sam | |
dc.contributor.author | Mason, Karl | |
dc.contributor.author | Duggan, Jim | |
dc.date.accessioned | 2018-12-19T16:14:25Z | |
dc.date.available | 2018-12-19T16:14:25Z | |
dc.date.copyright | 2017 | |
dc.date.issued | 2017 | |
dc.identifier.uri | https://research.thea.ie/handle/20.500.12065/2390 | |
dc.description.abstract | Reinforcement Learning (RL) is a powerful and well-studied Machine Learning
paradigm, where an agent learns to improve its performance in an environment
by maximising a reward signal. In multi-objective Reinforcement
Learning (MORL) the reward signal is a vector, where each component represents
the performance on a different objective. Reward shaping is a well-established
family of techniques that have been successfully used to improve
the performance and learning speed of RL agents in single-objective problems.
The basic premise of reward shaping is to add an additional shaping
reward to the reward naturally received from the environment, to incorporate
domain knowledge and guide an agent's exploration. Potential-Based
Reward Shaping (PBRS) is a specific form of reward shaping that offers additional
guarantees. In this paper, we extend the theoretical guarantees of
PBRS to MORL problems. Specifically, we provide theoretical proof that
PBRS does not alter the true Pareto front in both single- and multi-agent
MORL. We also contribute the first published empirical studies of the effect
of PBRS in single- and multi-agent MORL problems. | en_US |
dc.format | PDF | en_US |
dc.language.iso | en | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Ireland | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/ie/ | * |
dc.subject | Reinforcement Learning | en_US |
dc.subject | Multi-Objective | en_US |
dc.subject | Potential-Based | en_US |
dc.subject | Reward Shaping | en_US |
dc.subject | Multi-Agent Systems | en_US |
dc.title | Policy Invariance under Reward Transformations for Multi-Objective Reinforcement Learning | en_US |
dc.type | Article | en_US |
dc.description.peerreview | yes | en_US |
dc.rights.access | Copyright | en_US |
dc.subject.department | Department of Computer Science & Applied Physics | en_US |