Learning to be Safe: Deep RL with a Safety Critic
attributed to: Krishnan Srinivasan, Benjamin Eysenbach, Sehoon Ha, Jie Tan, Chelsea Finn

Safety is an essential component for deploying reinforcement learning (RL)
algorithms in real-world scenarios, and is critical during the learning process
itself. A natural first approach toward safe RL is to manually specify
constraints on the policy's behavior. However, just as learning has enabled
progress in the large-scale development of AI systems, learning safety
specifications may also be necessary to ensure safety in messy open-world
environments where manual safety specifications cannot scale. Akin to how
humans learn incrementally, starting in child-safe environments, we propose to
learn how to be safe in one set of tasks and environments, and then use that
learned intuition to constrain future behaviors when learning new, modified
tasks. We empirically study this form of safety-constrained transfer learning
in three challenging domains: simulated navigation, quadruped locomotion, and
dexterous in-hand manipulation. In comparison to standard deep RL techniques
and prior approaches to safe RL, we find that our method enables learning new
tasks in new environments with both substantially fewer safety incidents, such
as falling or dropping an object, and faster, more stable learning. This
suggests a path forward not only for safer RL systems, but also for more
effective RL systems.
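
The core mechanism the abstract describes, using safety knowledge learned in earlier tasks to constrain behavior while learning new ones, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the critic q_safe estimating the probability of a future safety incident, the threshold EPSILON, and rejection-style action filtering are one plausible instantiation of a "safety critic", not the authors' published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in pretrained components (illustrative only):
# - policy(state) samples a candidate action for the new task
# - q_safe(state, action) estimates the probability of a future safety incident,
#   assumed to have been learned on earlier, safer tasks
policy = lambda s: rng.normal(size=2)
q_safe = lambda s, a: float(np.clip(np.linalg.norm(a) / 4.0, 0.0, 1.0))

EPSILON = 0.1      # assumed tolerable failure-probability threshold
N_CANDIDATES = 32  # candidate actions sampled per step

def safe_action(state):
    """Filter the new-task policy's samples through the learned safety critic."""
    candidates = [policy(state) for _ in range(N_CANDIDATES)]
    risks = np.array([q_safe(state, a) for a in candidates])
    safe_idx = np.flatnonzero(risks < EPSILON)
    if safe_idx.size == 0:
        # No candidate clears the threshold: fall back to the least risky one.
        return candidates[int(np.argmin(risks))]
    # Otherwise act on a candidate the critic deems safe.
    return candidates[int(safe_idx[0])]

action = safe_action(state=np.zeros(3))
```

The design choice worth noting is that the critic only vetoes actions; the new-task policy still drives exploration, so task learning proceeds while the transferred safety estimate bounds the risk of each executed action.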