Abstract
Reward shaping has been proposed as a means to address the
credit assignment problem in Multi-Agent Systems (MAS).
Two popular shaping methods are Potential-Based Reward
Shaping and difference rewards, and both have been shown
to improve learning speed and the quality of joint policies
learned by agents in single-objective MAS. In this work we
discuss the theoretical implications of applying these approaches
to multi-objective MAS, and evaluate their efficacy
using a new multi-objective benchmark domain where
the true set of Pareto optimal system utilities is known.