Reactions such as gestures, facial expressions, and vocalizations are an
abundant, naturally occurring channel of information that humans provide during
interactions. A robot or other agent could leverage an understanding of such
implicit human feedback to improve its task performance at no cost to the
human. This approach contrasts with common agent teaching methods based on
demonstrations, critiques, or other guidance that need to be attentively and
intentionally provided. In this paper, we first define the general problem of
learning from implicit human feedback and then propose to address this problem
through a novel data-driven framework, EMPATHIC. This two-stage method consists
of (1) mapping implicit human feedback to relevant task statistics such as
reward, optimality, and advantage; and (2) using such a mapping to learn a
task. We instantiate the first stage and three second-stage evaluations of the
learned mapping. To do so, we collect a dataset of human facial reactions while
participants observe an agent execute a sub-optimal policy for a prescribed
training task. We train a deep neural network on this data and demonstrate its
ability to (1) infer relative reward ranking of events in the training task
from prerecorded human facial reactions; (2) improve the policy of an agent in
the training task using live human facial reactions; and (3) transfer to a
novel domain in which it evaluates robot manipulation trajectories.
32
0
The EMPATHIC Framework for Task Learning from Implicit Human Feedback
attributed to: Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox
Reactions such as gestures, facial expressions, and vocalizations are an
abundant, naturally occurring channel of information that humans provide during
interactions. A robot or other agent could leverage an understanding of such
implicit human feedback to improve its task performance at no cost to the
human. This approach contrasts with common agent teaching methods based on
demonstrations, critiques, or other guidance that need to be attentively and
intentionally provided. In this paper, we first define the general problem of
learning from implicit human feedback and then propose to address this problem
through a novel data-driven framework, EMPATHIC. This two-stage method consists
of (1) mapping implicit human feedback to relevant task statistics such as
reward, optimality, and advantage; and (2) using such a mapping to learn a
task. We instantiate the first stage and three second-stage evaluations of the
learned mapping. To do so, we collect a dataset of human facial reactions while
participants observe an agent execute a sub-optimal policy for a prescribed
training task...
0
Vulnerabilities & Strengths