SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
ICRA 2024
Abstract
We present Self-Adaptive Robust Attention for Robotics Transformers
(SARA-RT): a new paradigm for addressing the emerging challenge of scaling up
Robotics Transformers (RT) for on-robot deployment. SARA-RT relies on a new
fine-tuning method that we propose, called up-training. It converts pre-trained
or already fine-tuned Transformer-based robotic policies of quadratic time
complexity (including massive billion-parameter vision-language-action models,
or VLAs) into efficient linear-attention counterparts while maintaining high
quality. We demonstrate the effectiveness of
SARA-RT by speeding up: (a) the class of recently introduced RT-2 models, the
first VLA robotic policies pre-trained on internet-scale data, as well as (b)
Point Cloud Transformer (PCT) robotic policies operating on large point clouds.
We complement our results with a rigorous mathematical analysis that provides
deeper insight into the SARA phenomenon.
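The core mechanism referenced in the abstract, replacing quadratic-time softmax attention with a linear-attention counterpart built from feature maps of the queries and keys, can be illustrated with a minimal sketch. The feature map below (a ReLU-based map) and all function and variable names are illustrative assumptions for exposition, not the exact construction used by SARA-RT.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: O(L^2) time and memory in sequence length L."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (L, L) score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Linear attention: O(L) in sequence length via a feature map phi.
    phi (here an ReLU-based map) is an illustrative choice, not the paper's."""
    Qp, Kp = phi(Q), phi(K)                            # (L, d) mapped queries/keys
    KV = Kp.T @ V                                      # (d, d) summary; no L x L matrix
    normalizer = Qp @ Kp.sum(axis=0)                   # (L,) per-query normalization
    return (Qp @ KV) / normalizer[:, None]

# Tiny usage example with random projections standing in for a trained policy.
L, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, L, d))
print(softmax_attention(Q, K, V).shape)   # (8, 4)
print(linear_attention(Q, K, V).shape)    # (8, 4)
```

The practical point of the conversion is visible in the shapes: the linear variant never materializes the L x L attention matrix, which is what makes long token sequences (e.g., large point clouds or VLA context windows) tractable for on-robot inference.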