QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
CoRL, June 27, 2018 (Best Systems Paper)
In this paper, we study the problem of learning vision-based dynamic
manipulation skills using a scalable reinforcement learning approach. We study
this problem in the context of grasping, a longstanding challenge in robotic
manipulation. In contrast to static grasping systems that choose a grasp
point and then execute the desired grasp, our method enables closed-loop
vision-based control, whereby the robot continuously updates its grasp strategy
based on the most recent observations to optimize long-horizon grasp success.
To that end, we introduce QT-Opt, a scalable self-supervised vision-based
reinforcement learning framework that can leverage over 580k real-world grasp
attempts to train a deep neural network Q-function with over 1.2M parameters to
perform closed-loop, real-world grasping that generalizes to 96% grasp success
on unseen objects. Aside from attaining a very high success rate, our method
exhibits behaviors that are quite distinct from more standard grasping systems:
using only RGB vision-based perception from an over-the-shoulder camera, our
method automatically learns regrasping strategies, probes objects to find the
most effective grasps, learns to reposition objects and perform other
non-prehensile pre-grasp manipulations, and responds dynamically to
disturbances and perturbations.
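To make the closed-loop idea concrete, the following is a minimal Python sketch of how a learned Q-function could drive grasping step by step: at every control step the robot re-observes the scene, scores a set of candidate actions with the Q-function, and executes the highest-scoring one. The names (`q_network`, `robot.get_camera_image`, `robot.execute`) are hypothetical placeholders, and the random-sampling action search is only a simplification of the action optimization used in the actual system.

```python
import numpy as np

# Hypothetical stand-ins: `q_network(image, actions)` returns one Q-value per
# candidate action, and `robot` exposes `get_camera_image()` / `execute(action)`.
# These names are illustrative, not from the paper's codebase.

def select_action(q_network, image, num_samples=64, action_dim=4):
    """Score randomly sampled candidate actions with the Q-function and
    return the best one (a crude stand-in for a proper optimizer over
    the continuous action space)."""
    candidates = np.random.uniform(-1.0, 1.0, size=(num_samples, action_dim))
    q_values = q_network(image, candidates)           # shape: (num_samples,)
    return candidates[int(np.argmax(q_values))]

def closed_loop_grasp(robot, q_network, max_steps=20):
    """Closed-loop control: re-observe and re-plan at every step instead of
    committing to a single precomputed grasp point."""
    for _ in range(max_steps):
        image = robot.get_camera_image()              # over-the-shoulder RGB
        action = select_action(q_network, image)
        done = robot.execute(action)                  # move gripper, close, or terminate
        if done:
            break
```

Because the policy re-plans from the latest image at every step, behaviors such as regrasping after a slip or repositioning an object before grasping can emerge without being programmed explicitly.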