Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery
ICRA · Nov 3, 2020 · Best Medical Robotics Paper
Automatic surgical gesture recognition is fundamentally important to enable
intelligent cognitive assistance in robotic surgery. With recent advancement in
robot-assisted minimally invasive surgery, rich information including surgical
videos and robotic kinematics can be recorded, providing complementary
knowledge for understanding surgical gestures. However, existing methods either
solely adopt uni-modal data or directly concatenate multi-modal
representations, which cannot sufficiently exploit the informative
correlations inherent in visual and kinematics data to boost gesture
recognition accuracy. In this regard, we propose a novel online approach,
a multi-modal relational graph network (MRG-Net), to dynamically integrate
visual and kinematics information through interactive message propagation in
the latent feature space. Specifically, we first extract embeddings from video
and kinematics sequences with temporal convolutional networks and LSTM units.
Next, we identify multi-relations in these multi-modal embeddings and leverage
them through a hierarchical relational graph learning module. The effectiveness
of our method is demonstrated with state-of-the-art results on the public
JIGSAWS dataset, outperforming current uni-modal and multi-modal methods on
both suturing and knot tying tasks. Furthermore, we validate our method on
in-house visual-kinematics datasets collected with da Vinci Research Kit (dVRK)
platforms at two centers, achieving consistently promising performance.
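
The abstract describes the pipeline only at a high level; the sketch below is a minimal PyTorch illustration of the described fusion idea, where per-frame visual and kinematics embeddings act as graph nodes that exchange messages before per-frame gesture classification. All module names, dimensions, and the single shared message/update step are illustrative assumptions, not the authors' MRG-Net implementation.

```python
# Minimal sketch of visual-kinematics graph fusion, assuming PyTorch.
# Dimensions, layer choices, and the single round of message passing are
# illustrative assumptions, not the released MRG-Net code.
import torch
import torch.nn as nn


class VisualKinematicsGraphFusion(nn.Module):
    """Fuse per-frame visual and kinematics embeddings via message passing."""

    def __init__(self, vis_dim=128, kin_dim=64, hid_dim=128, num_classes=10):
        super().__init__()
        # Project both modalities into a shared latent space (graph node features).
        self.vis_proj = nn.Linear(vis_dim, hid_dim)
        self.kin_proj = nn.Linear(kin_dim, hid_dim)
        # One round of message passing between the two modality nodes
        # (message and update weights are shared across directions for brevity).
        self.message = nn.Linear(hid_dim, hid_dim)
        self.update = nn.GRUCell(hid_dim, hid_dim)
        self.classifier = nn.Linear(2 * hid_dim, num_classes)

    def forward(self, vis_emb, kin_emb):
        # vis_emb: (T, vis_dim) visual features, e.g. from a temporal conv network
        # kin_emb: (T, kin_dim) kinematics features, e.g. from an LSTM
        v = torch.relu(self.vis_proj(vis_emb))
        k = torch.relu(self.kin_proj(kin_emb))
        # Each node state is updated with the message from the other modality.
        v_new = self.update(torch.relu(self.message(k)), v)
        k_new = self.update(torch.relu(self.message(v)), k)
        # Concatenate updated node states and predict a gesture label per frame.
        return self.classifier(torch.cat([v_new, k_new], dim=-1))


if __name__ == "__main__":
    fusion = VisualKinematicsGraphFusion()
    vis = torch.randn(100, 128)   # 100 frames of visual embeddings
    kin = torch.randn(100, 64)    # 100 frames of kinematics embeddings
    logits = fusion(vis, kin)     # (100, num_classes) per-frame gesture logits
    print(logits.shape)
```

In the paper, the relational graph module is hierarchical and models multiple relation types between the modalities; the single shared message/update pair above is only meant to convey the general structure of fusing the two embedding streams through message propagation rather than plain concatenation.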