Demonstrating Agile Flight from Pixels without State Estimation
RSS 2024
Abstract
Quadrotors are among the most agile flying robots. Despite recent advances in
learning-based control and computer vision, autonomous drones still rely on
explicit state estimation. In contrast, human pilots rely only on a
first-person-view video stream from the drone's onboard camera to push the
platform to its limits and fly robustly in unseen environments. To the best of
our knowledge, we present the first vision-based quadrotor system that
autonomously navigates through a sequence of gates at high speeds while
directly mapping pixels to control commands. Like professional drone-racing
pilots, our system does not use explicit state estimation and leverages the
same control commands humans use (collective thrust and body rates). We
demonstrate agile flight at speeds up to 40 km/h with accelerations up to 2 g.
This is achieved by training vision-based policies with reinforcement learning
(RL). Training is facilitated by an asymmetric actor-critic with access
to privileged information. To overcome the computational complexity during
image-based RL training, we use the inner edges of the gates as a sensor
abstraction. This simple yet robust, task-relevant representation can be
simulated during training without rendering images. During deployment, a
Swin-transformer-based gate detector is used. Our approach enables autonomous
agile flight with standard, off-the-shelf hardware. Although our demonstration
focuses on drone racing, we believe our method has impact beyond this domain
and can serve as a foundation for future research into real-world applications
in structured environments.
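
To make the training setup concrete, below is a minimal sketch of the asymmetric actor-critic mentioned in the abstract: the actor maps only the observation available onboard (e.g., detected gate corners and the previous command) to collective thrust and body rates, while the critic additionally receives privileged simulator state. The framework (PyTorch), dimensions, and network sizes are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of an asymmetric actor-critic (assumed PyTorch implementation;
# observation/state dimensions and network sizes are illustrative, not from the paper).
import torch
import torch.nn as nn

OBS_DIM = 16     # assumed: e.g., image-plane gate corners + previous action
PRIV_DIM = 13    # assumed: privileged simulator state (position, attitude, velocities)
ACT_DIM = 4      # collective thrust + three body rates, as stated in the abstract


def mlp(in_dim: int, out_dim: int, hidden: int = 128) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )


class Actor(nn.Module):
    """Policy: maps the onboard observation to thrust and body-rate commands."""
    def __init__(self):
        super().__init__()
        self.net = mlp(OBS_DIM, ACT_DIM)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.net(obs))  # normalized commands in [-1, 1]


class Critic(nn.Module):
    """Value function: sees the observation *and* privileged state (training only)."""
    def __init__(self):
        super().__init__()
        self.net = mlp(OBS_DIM + PRIV_DIM, 1)

    def forward(self, obs: torch.Tensor, priv: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, priv], dim=-1))


if __name__ == "__main__":
    actor, critic = Actor(), Critic()
    obs = torch.randn(32, OBS_DIM)     # batch of onboard observations
    priv = torch.randn(32, PRIV_DIM)   # privileged state, available only in simulation
    action = actor(obs)                # (32, 4): what is deployed on the drone
    value = critic(obs, priv)          # (32, 1): used only to train the actor
```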
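
The claim that the gate-edge abstraction can be simulated without rendering amounts to projecting the known 3D positions of the inner gate corners through a pinhole camera model at the drone's simulated pose. The sketch below assumes a body-fixed camera with placeholder intrinsics and gate geometry; none of the specific numbers come from the paper.

```python
# Minimal sketch of simulating the gate-corner observation without rendering:
# project known 3D inner-corner positions through a pinhole camera model.
# Intrinsics, gate geometry, and frame conventions are assumed placeholders.
import numpy as np

# Assumed pinhole intrinsics (focal lengths and principal point, in pixels).
K = np.array([[400.0,   0.0, 320.0],
              [  0.0, 400.0, 240.0],
              [  0.0,   0.0,   1.0]])


def project_corners(corners_w: np.ndarray,
                    R_wc: np.ndarray,
                    t_wc: np.ndarray) -> np.ndarray:
    """Project Nx3 world-frame corner positions into pixel coordinates.

    R_wc, t_wc: rotation and translation of the camera in the world frame
    (the camera pose follows from the simulated drone pose and a fixed mount).
    """
    # Transform corners from the world frame into the camera frame.
    corners_c = (corners_w - t_wc) @ R_wc          # row-wise R_wc.T @ (p - t)
    # Keep only corners in front of the camera (positive depth along the optical axis).
    corners_c = corners_c[corners_c[:, 2] > 0.0]
    # Perspective division, then apply the intrinsics.
    uv = (K @ (corners_c / corners_c[:, 2:3]).T).T
    return uv[:, :2]


if __name__ == "__main__":
    # Four inner corners of a 1.5 m square gate, 5 m ahead of the camera.
    half = 0.75
    gate = np.array([[-half, -half, 5.0],
                     [ half, -half, 5.0],
                     [ half,  half, 5.0],
                     [-half,  half, 5.0]])
    # Camera at the origin, looking down the +z axis.
    pixels = project_corners(gate, np.eye(3), np.zeros(3))
    print(pixels)  # the abstract observation fed to the policy; no image rendered
```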