Explicitly Incorporating Spatial Information to Recurrent Networks for Agriculture
IROS | Jun 27, 2022 | Best Agri-Robotics Paper
In agriculture, the majority of vision systems perform still-image
classification. Yet, recent work has highlighted the potential of spatial and
temporal cues as a rich source of information to improve the classification
performance. In this paper, we propose novel approaches to explicitly capture
both spatial and temporal information to improve the classification performance of deep
convolutional neural networks. We leverage available RGB-D images and robot
odometry to spatially register feature maps between frames. The registered
information is then fused within recurrent deep-learnt models to improve their
accuracy and robustness. We demonstrate that this can considerably improve the
classification performance with our best performing spatial-temporal model
(ST-Atte) achieving absolute improvements in intersection-over-union (IoU) of
4.7 percentage points for crop-weed segmentation and 2.6 percentage points for
fruit (sweet pepper) segmentation. Furthermore, we show that these approaches
are robust to variable framerates and odometry errors, which are frequently
observed in real-world applications.
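
The abstract does not spell out how the inter-frame registration is computed, so the following is only a minimal sketch of one plausible way to do it, not the authors' implementation. It assumes a pinhole camera, a depth map already resized to the feature-map resolution, intrinsics `K` scaled to match, and odometry expressed as a 4x4 rigid transform `T_prev_from_cur` that maps points from the current camera frame into the previous one. The function name `register_feature_map` and all parameter names are hypothetical. Each current-frame pixel is back-projected with its depth, moved into the previous frame, re-projected, and used to look up the previous feature map with nearest-neighbour sampling.

```python
import numpy as np


def register_feature_map(prev_feat, depth_cur, K, T_prev_from_cur):
    """Warp a previous-frame feature map (C, H, W) into the current frame.

    Assumptions (not taken from the paper): pinhole intrinsics K (3x3) at
    feature-map resolution, depth_cur (H, W) in metres, and a 4x4 rigid
    transform mapping current-camera points into the previous camera frame.
    Returns the warped features and a boolean (H, W) validity mask.
    """
    C, H, W = prev_feat.shape

    # Homogeneous pixel coordinates (u, v, 1) for every current-frame cell.
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)

    # Back-project to 3D points in the current camera frame using depth.
    pts_cur = (np.linalg.inv(K) @ pix) * depth_cur.reshape(1, -1)

    # Move the points into the previous camera frame using odometry.
    R, t = T_prev_from_cur[:3, :3], T_prev_from_cur[:3, 3:4]
    pts_prev = R @ pts_cur + t

    # Keep only cells with a depth reading that land in front of the camera.
    z = pts_prev[2]
    valid = (depth_cur.reshape(-1) > 0) & (z > 1e-6)

    # Project into the previous image plane (nearest-neighbour lookup).
    proj = K @ pts_prev
    safe_z = np.where(valid, z, 1.0)  # avoid dividing by zero on invalid cells
    u_prev = np.rint(proj[0] / safe_z).astype(int)
    v_prev = np.rint(proj[1] / safe_z).astype(int)
    valid &= (u_prev >= 0) & (u_prev < W) & (v_prev >= 0) & (v_prev < H)

    # Cells with no valid correspondence are left at zero.
    warped = np.zeros((C, H * W), dtype=prev_feat.dtype)
    warped[:, valid] = prev_feat[:, v_prev[valid], u_prev[valid]]
    return warped.reshape(C, H, W), valid.reshape(H, W)
```

In a recurrent model, the warped map would stand in for the previous hidden state or feature map before fusion with the current frame, and the validity mask indicates where no registered history exists. Bilinear sampling would be the usual choice if the warp needs to be differentiable; nearest-neighbour is used here only to keep the sketch short.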