Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method
ICRA · Mar 9, 2022 · Outstanding Physical Human-Robot Interaction Paper
For robots to be effectively deployed in novel environments and tasks, they
must be able to understand the feedback expressed by humans during
intervention. This can either correct undesirable behavior or indicate
additional preferences. Existing methods either require repeated episodes of
interaction or assume prior-known reward features, which is data-inefficient
and transfers poorly to new tasks. We relax these assumptions by describing
human tasks in terms of object-centric sub-tasks and interpreting physical
interventions in relation to specific objects. Our method, Object Preference
Adaptation (OPA), is composed of two key stages: 1) pre-training a base policy
to produce a wide variety of behaviors, and 2) online-updating according to
human feedback. The key to our fast yet simple adaptation is that the general
interaction dynamics between agents and objects remain fixed, while only
object-specific preferences are updated. Our adaptation occurs online, requires
only one human intervention (one-shot), and produces new behaviors never seen
during training. Trained on cheap synthetic data instead of expensive human
demonstrations, our policy correctly adapts to human perturbations on realistic
tasks on a physical 7DOF robot. Videos, code, and supplementary material are
provided.
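The core idea of the two stages — fixed agent-object interaction dynamics with only per-object preference weights updated online from a single intervention — can be illustrated with a minimal sketch. This is a hypothetical 2-D simplification for intuition only: the function names, the attraction-field dynamics, and the gradient update are assumptions, not the actual OPA architecture, which uses learned networks over object-centric features.

```python
import numpy as np

def policy(pos, objects, prefs):
    """Action = preference-weighted sum of fixed per-object attraction fields.

    The unit-vector dynamics toward each object are frozen after pre-training;
    only the weights in `prefs` encode object-specific human preferences.
    """
    action = np.zeros(2)
    for obj, w in zip(objects, prefs):
        direction = obj - pos
        action += w * direction / (np.linalg.norm(direction) + 1e-8)
    return action

def adapt_one_shot(pos, objects, prefs, human_action, lr=0.5, steps=50):
    """Fit preference weights to a single observed human correction.

    Gradient descent on 0.5 * ||policy(...) - human_action||^2 with respect
    to `prefs` only; the interaction dynamics themselves are never updated.
    """
    prefs = prefs.copy()
    units = [(obj - pos) / (np.linalg.norm(obj - pos) + 1e-8)
             for obj in objects]
    for _ in range(steps):
        err = policy(pos, objects, prefs) - human_action
        # d(loss)/d(w_i) = err . unit_direction_i, since the policy is
        # linear in the preference weights
        grad = np.array([err @ u for u in units])
        prefs -= lr * grad
    return prefs
```

For example, a robot initially attracted to object 0 that is physically pushed toward object 1 ends up, after one such update, with a larger preference weight on object 1 — a behavior never seen during pre-training.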