On-Demand Sampling: Learning Optimally from Multiple Distributions
NeurIPS 2022 · Oct 22, 2022 · Outstanding Paper
Social and real-world considerations such as robustness, fairness, social
welfare and multi-agent tradeoffs have given rise to multi-distribution
learning paradigms, such as collaborative learning, group distributionally
robust optimization, and fair federated learning. In each of these settings, a
learner seeks to uniformly minimize its expected loss over n predefined data
distributions, while using as few samples as possible. In this paper, we
establish the optimal sample complexity of these learning paradigms and give
algorithms that meet this sample complexity. Importantly, our sample complexity
bounds for multi-distribution learning exceed that of learning a single
distribution by only an additive factor of n log(n)/ϵ². This
improves upon the best known sample complexity bounds for fair federated
learning by Mohri et al. and collaborative learning by Nguyen and Zakynthinou
by multiplicative factors of n and log(n)/ϵ³, respectively. We
also provide the first sample complexity bounds for the group DRO objective of
Sagawa et al. To guarantee these optimal sample complexity bounds, our
algorithms learn to sample from data distributions on demand. Our algorithm
design and analysis are enabled by our extensions of online learning techniques
for solving stochastic zero-sum games. In particular, we contribute stochastic
variants of no-regret dynamics that can trade off between players' differing
sampling costs.
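The closing sentences describe solving stochastic zero-sum games with no-regret dynamics, where players only see noisy loss estimates obtained by sampling on demand. A minimal, illustrative sketch of this general technique (not the paper's actual algorithm) pits two Hedge / multiplicative-weights players against each other on a finite loss matrix; the function name `hedge_game`, the Gaussian noise model, and all parameter choices below are hypothetical, chosen only to make the idea concrete.

```python
import math
import random

def hedge_game(loss_matrix, rounds=2000, eta=0.1, noise=0.1, seed=0):
    """Approximately solve min_q max_p sum_{i,j} p_i q_j L[i][j] with two
    Hedge (multiplicative-weights) players that observe only noisy loss
    estimates -- a stand-in for losses estimated from on-demand samples.
    Rows of L index distributions (the max player); columns index
    hypotheses (the min player). Returns the min player's averaged
    mixed strategy, which approximates a minimax-optimal mixture."""
    rng = random.Random(seed)
    n = len(loss_matrix)       # number of distributions
    m = len(loss_matrix[0])    # number of hypotheses
    w_dist = [1.0] * n         # max player's Hedge weights
    w_hyp = [1.0] * m          # min player's Hedge weights
    avg_hyp = [0.0] * m        # running average of the min player's play
    for _ in range(rounds):
        sd, sh = sum(w_dist), sum(w_hyp)
        p = [w / sd for w in w_dist]
        q = [w / sh for w in w_hyp]
        for j in range(m):
            avg_hyp[j] += q[j] / rounds
        # Stochastic loss estimates, as if each entry were measured from
        # a fresh sample drawn on demand from the relevant distribution.
        est = [[loss_matrix[i][j] + rng.gauss(0.0, noise)
                for j in range(m)] for i in range(n)]
        # Max player upweights distributions with high expected loss under q.
        for i in range(n):
            gain = sum(q[j] * est[i][j] for j in range(m))
            w_dist[i] *= math.exp(eta * gain)
        # Min player downweights hypotheses with high expected loss under p.
        for j in range(m):
            loss = sum(p[i] * est[i][j] for i in range(n))
            w_hyp[j] *= math.exp(-eta * loss)
    return avg_hyp
```

On the 2×2 "matching pennies"-style matrix `[[1, 0], [0, 1]]`, the averaged strategy drifts toward the uniform mixture, the game's minimax solution, despite the noisy feedback; averaging the iterates is what converts per-round no-regret guarantees into an approximate equilibrium.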