Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation
CoRLJun 22, 2018Best Paper
What is the right object representation for manipulation? We would like
robots to visually perceive scenes and learn an understanding of the objects in
them that (i) is task-agnostic and can be used as a building block for a
variety of manipulation tasks, (ii) is generally applicable to both rigid and
non-rigid objects, (iii) takes advantage of the strong priors provided by 3D
vision, and (iv) is entirely learned from self-supervision. This is hard to
achieve with previous methods: much recent work in grasping does not extend to
grasping specific objects or other tasks, whereas task-specific learning may
require many trials to generalize well across object configurations or other
tasks. In this paper we present Dense Object Nets, which build on recent
developments in self-supervised dense descriptor learning, as a consistent
object representation for visual understanding and manipulation. We demonstrate
they can be trained quickly (approximately 20 minutes) for a wide variety of
previously unseen and potentially non-rigid objects. We additionally present
novel contributions to enable multi-object descriptor learning, and show that
by modifying our training procedure, we can either acquire descriptors which
generalize across classes of objects, or descriptors that are distinct for each
object instance. Finally, we demonstrate the novel application of learned dense
descriptors to robotic manipulation. We demonstrate grasping of specific points
on an object across potentially deformed object configurations, and demonstrate
using class general descriptors to transfer specific grasps across objects in a
class.