Analysing the Effects of Reward Shaping in Multi-Objective Stochastic Games
Date
2017-05
Author
Mannion, Patrick
Duggan, Jim
Howley, Enda
Abstract
The majority of Multi-Agent Reinforcement Learning (MARL) implementations aim to optimise systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL; however, it has been shown to alter the intended goals of the domain if misused, leading to unintended behaviour. Two popular shaping methods are Potential-Based Reward Shaping and difference rewards, both of which have been repeatedly shown to improve learning speed and the quality of joint policies learned by agents in single-objective problems. In this work we discuss the theoretical implications of applying these approaches to multi-objective problems, and evaluate their efficacy using a new multi-objective benchmark domain where the true Pareto optimal system utilities are known. Our work provides the first empirical evidence that agents using these shaping methodologies can sample true Pareto optimal solutions in multi-objective Stochastic Games.
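
For reference, the two shaping signals named in the abstract have standard single-objective definitions in the literature; the sketch below uses the conventional notation, which is not necessarily the paper's own.

% Potential-Based Reward Shaping (Ng et al., 1999): the shaping term added
% to the environment reward on a transition from state s to s', where \Phi
% is a potential function over states and \gamma is the discount factor.
F(s, s') = \gamma \Phi(s') - \Phi(s)

% Difference rewards (Wolpert & Tumer): agent i receives the global utility
% G of the joint state-action z minus the utility of z with agent i's
% contribution removed (z_{-i}), isolating agent i's marginal impact.
D_i(z) = G(z) - G(z_{-i})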