Privacy for Free: How does Dataset Condensation Help Privacy?
ICMLJun 1, 2022Outstanding Paper
To prevent unintentional data leakage, research community has resorted to
data generators that can produce differentially private data for model
training. However, for the sake of the data privacy, existing solutions suffer
from either expensive training cost or poor generalization performance.
Therefore, we raise the question whether training efficiency and privacy can be
achieved simultaneously. In this work, we for the first time identify that
dataset condensation (DC) which is originally designed for improving training
efficiency is also a better solution to replace the traditional data generators
for private data generation, thus providing privacy for free. To demonstrate
the privacy benefit of DC, we build a connection between DC and differential
privacy, and theoretically prove on linear feature extractors (and then
extended to non-linear feature extractors) that the existence of one sample has
limited impact (O(m/n)) on the parameter distribution of networks trained on
m samples synthesized from n(n≫m) raw samples by DC. We also
empirically validate the visual privacy and membership privacy of
DC-synthesized data by launching both the loss-based and the state-of-the-art
likelihood-based membership inference attacks. We envision this work as a
milestone for data-efficient and privacy-preserving machine learning.