Reinforcement Learning Under Moral Uncertainty
attributed to: Adrien Ecoffet, Joel Lehman

An ambitious goal for machine learning is to create agents that behave
ethically: The capacity to abide by human moral norms would greatly expand the
context in which autonomous agents could be practically and safely deployed,
e.g. fully autonomous vehicles will encounter charged moral decisions that
complicate their deployment. While ethical agents could be trained by rewarding
correct behavior under a specific moral theory (e.g. utilitarianism), there
remains widespread disagreement about the nature of morality. Acknowledging
such disagreement, recent work in moral philosophy proposes that ethical
behavior requires acting under moral uncertainty, i.e. taking into account,
when acting, that one's credence is split across several plausible ethical
theories. This paper translates such insights to the field of reinforcement
learning, proposes two training methods that realize different points among
competing desiderata, and trains agents in simple environments to act under
moral uncertainty. The results illustrate (1) how such uncertainty can help curb
the extreme behavior that can result from commitment to a single theory and (2)
several technical complications that arise when attempting to ground moral
philosophy in RL (e.g. how a principled trade-off between two competing but
incomparable reward functions can be reached). The aim is to catalyze progress towards
morally-competent agents and highlight the potential of RL to contribute
towards the computational grounding of moral philosophy.
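
As a concrete illustration of the core idea, the sketch below shows one simple way to operationalize acting under moral uncertainty in an RL reward signal: a credence-weighted combination of per-theory rewards, in the spirit of maximizing expected choiceworthiness. The theory names, reward functions, credence values, and outcome fields are illustrative assumptions for this sketch, not the paper's environments or training methods; in particular, the weighted sum presumes that the theories' reward scales are comparable, which is exactly the intertheoretic-comparison difficulty the abstract points to.

    # Illustrative credences over two competing ethical theories; the values
    # and theory names are assumptions for this sketch, not from the paper.
    CREDENCES = {"utilitarian": 0.6, "deontological": 0.4}

    def utilitarian_reward(outcome):
        # Placeholder: reward equals net lives saved by the chosen action.
        return outcome["lives_saved"] - outcome["lives_lost"]

    def deontological_reward(outcome):
        # Placeholder: flat penalty whenever the agent actively causes harm.
        return -10.0 if outcome["actively_harmed"] else 0.0

    THEORY_REWARDS = {
        "utilitarian": utilitarian_reward,
        "deontological": deontological_reward,
    }

    def combined_reward(outcome):
        # Credence-weighted sum of per-theory rewards; this combined signal
        # could replace a single theory's reward when training a standard
        # RL agent (tabular Q-learning, policy gradient, etc.).
        return sum(c * THEORY_REWARDS[name](outcome)
                   for name, c in CREDENCES.items())

    # Toy outcomes for two candidate actions in a trolley-style dilemma.
    push = {"lives_saved": 5, "lives_lost": 1, "actively_harmed": True}
    wait = {"lives_saved": 0, "lives_lost": 5, "actively_harmed": False}
    print(combined_reward(push))  # 0.6*4 + 0.4*(-10) = -1.6
    print(combined_reward(wait))  # 0.6*(-5) + 0.4*0  = -3.0

Aggregation rules that avoid direct value comparison across theories (e.g. having the theories vote over actions) form an alternative family of approaches; which points in this design space the paper's two training methods occupy is left to the paper itself.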