BioCLIP: A Vision Foundation Model for the Tree of Life
CVPR 2024
Abstract
Images of the natural world, collected by a variety of cameras, from drones
to individual phones, are increasingly abundant sources of biological
information. There is an explosion of computational methods and tools,
particularly computer vision, for extracting biologically relevant information
from images for science and conservation. Yet most of these are bespoke
approaches designed for a specific task and are not easily adaptable or
extendable to new questions, contexts, and datasets. A vision model for general
organismal biology questions on images is urgently needed. To address this, we
curate and release TreeOfLife-10M, the largest and most diverse ML-ready
dataset of biology images. We then develop BioCLIP, a foundation model for the
tree of life, leveraging the unique properties of biology captured by
TreeOfLife-10M, namely the abundance and variety of images of plants, animals,
and fungi, together with the availability of rich structured biological
knowledge. We rigorously benchmark our approach on diverse fine-grained biology
classification tasks and find that BioCLIP consistently and substantially
outperforms existing baselines (by 16% to 17% absolute). Intrinsic evaluation
reveals that BioCLIP has learned a hierarchical representation conforming to
the tree of life, shedding light on its strong generalizability.
Models, data, and code are available at https://imageomics.github.io/bioclip.