Full Abstract:
We survey eight research areas organized around one question: As learning
systems become increasingly intelligent and autonomous, what design principles
can best ensure that their behavior is aligned with the interests of the operators?
We focus on two major technical obstacles to AI alignment: the challenge of
specifying the right kind of objective functions, and the challenge of designing
AI systems that avoid unintended consequences and undesirable behavior
even in cases where the objective function does not line up perfectly with the
intentions of the designers.
Open problems surveyed in this research proposal include: How can we train
reinforcement learners to take actions that are more amenable to meaningful
assessment by intelligent overseers? What kinds of objective functions
incentivize a system to “not have an overly large impact” or “not have many side
effects”? We discuss these questions, related work, and potential directions
for future research, with the goal of highlighting relevant research topics in
machine learning that appear tractable today.
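To make the “not have an overly large impact” question concrete, one simple family of candidate objective functions subtracts an impact penalty from the task reward. The sketch below is purely illustrative and is not the proposal's own formalization: the feature-difference distance, the penalty weight `lam`, and the baseline state are all hypothetical choices made up for this example.

```python
def state_distance(state, baseline_state):
    """Toy impact measure: count how many state features differ from a
    do-nothing baseline. Real impact measures are an open problem."""
    return sum(1 for a, b in zip(state, baseline_state) if a != b)

def penalized_reward(base_reward, state, baseline_state, lam=0.5):
    """Task reward minus lam times the toy impact measure.

    `lam` trades off task performance against side effects; its value
    here is an arbitrary illustrative choice.
    """
    return base_reward - lam * state_distance(state, baseline_state)

# A policy earning reward 1.0 while changing three features scores worse
# than one earning 0.8 while changing none.
baseline = (0, 0, 0, 0)
print(penalized_reward(1.0, (1, 1, 1, 0), baseline))  # 1.0 - 0.5*3 = -0.5
print(penalized_reward(0.8, (0, 0, 0, 0), baseline))  # 0.8
```

Such naive penalties have well-known failure modes (e.g. the agent may prevent changes it should allow, or offset impacts in undesirable ways), which is part of why the survey treats impact measures as an open research question rather than a solved design pattern.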
Alignment for Advanced Machine Learning Systems
attributed to: Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch (Machine Intelligence Research Institute)
Vulnerabilities & Strengths