Interactive Robotic Grasping with Attribute-Guided Disambiguation
ICRA · Mar 15, 2022 · Outstanding Student Paper
Interactive robotic grasping using natural language is one of the most
fundamental tasks in human-robot interaction. However, natural-language commands
are often ambiguous, particularly when the scene contains visually similar
objects or the command itself is underspecified. This paper investigates the use of object attributes in
disambiguation and develops an interactive grasping system capable of
effectively resolving ambiguities through dialogue. Our approach first predicts
target scores and attribute scores through vision-and-language grounding. To
handle ambiguous objects and commands, we propose an attribute-guided
formulation of the partially observable Markov decision process (Attr-POMDP)
for disambiguation. The Attr-POMDP utilizes target and attribute scores as the
observation model to calculate the expected return of an attribute-based (e.g.,
"what is the color of the target, red or green?") or a pointing-based (e.g.,
"do you mean this one?") question. Our disambiguation module runs in real time
on a real robot, and the interactive grasping system achieves a 91.43%
selection accuracy in the real-robot experiments, outperforming several
baselines by large margins.
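To make the idea of using target and attribute scores as an observation model more concrete, the sketch below scores three candidate actions (grasp now, ask an attribute question, or ask a pointing question) by their expected return. It is a minimal illustration under stated assumptions, not the paper's Attr-POMDP solver. It uses a myopic one-step lookahead, and everything in it is hypothetical: the names (`choose_action`, `attribute_question_value`, `pointing_question_value`), the reward constants, the pointing-answer noise, and the assumption that an object's attribute score equals the probability that the user answers with that attribute value when that object is the target.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical constants: the paper's actual rewards and noise levels are not given here.
R_CORRECT = 10.0      # reward for grasping the intended object
R_WRONG = -10.0       # penalty for grasping the wrong object
QUESTION_COST = 1.0   # cost of asking the user any question
POINT_NOISE = 0.05    # assumed chance the user answers a pointing question wrongly


def normalize(weights: Sequence[float]) -> List[float]:
    """Turn non-negative weights into a probability distribution (belief)."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def grasp_value(belief: Sequence[float]) -> float:
    """Expected reward of grasping the most likely object right now."""
    return max(p * R_CORRECT + (1 - p) * R_WRONG for p in belief)


def attribute_question_value(belief: Sequence[float],
                             attr_scores: Sequence[Dict[str, float]]) -> float:
    """One-step value of asking e.g. 'what is the color of the target?'.

    attr_scores[i][v] is treated (assumption) as P(answer = v | target = i).
    """
    value = -QUESTION_COST
    for v in attr_scores[0]:
        joint = [b * scores[v] for b, scores in zip(belief, attr_scores)]
        p_answer = sum(joint)
        if p_answer > 0:
            value += p_answer * grasp_value(normalize(joint))
    return value


def pointing_question_value(belief: Sequence[float], j: int) -> float:
    """One-step value of pointing at object j and asking 'do you mean this one?'."""
    value = -QUESTION_COST
    for answer_yes in (True, False):
        # A correct answer has probability 1 - POINT_NOISE; a wrong one POINT_NOISE.
        likelihood = [(1 - POINT_NOISE) if (i == j) == answer_yes else POINT_NOISE
                      for i in range(len(belief))]
        joint = [b * l for b, l in zip(belief, likelihood)]
        p_answer = sum(joint)
        if p_answer > 0:
            value += p_answer * grasp_value(normalize(joint))
    return value


def choose_action(target_scores: Sequence[float],
                  attributes: Dict[str, Sequence[Dict[str, float]]]
                  ) -> Tuple[str, Dict[str, float]]:
    """Pick the action with the highest one-step expected return."""
    belief = normalize(target_scores)
    options = {"grasp": grasp_value(belief)}
    for name, scores in attributes.items():
        options[f"ask:{name}"] = attribute_question_value(belief, scores)
    best_j = max(range(len(belief)), key=lambda i: belief[i])
    options[f"point:{best_j}"] = pointing_question_value(belief, best_j)
    return max(options, key=options.get), options


# Usage: "pick up the cup" with three equally likely cups of different colors.
action, values = choose_action(
    target_scores=[1.0, 1.0, 1.0],
    attributes={"color": [
        {"red": 0.9, "green": 0.05, "blue": 0.05},
        {"red": 0.05, "green": 0.9, "blue": 0.05},
        {"red": 0.05, "green": 0.05, "blue": 0.9},
    ]},
)
print(action)  # ask:color
```

In this toy scene a single color answer narrows three candidates to one, so the attribute question has a higher expected return (7.0) than pointing at one cup (about 1.67) or grasping blindly (about -3.33). With only two candidates, a pointing question can come out ahead instead, which is why comparing the two question types by expected return is useful.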