Selective Pre-training for Private Fine-tuning

Da Yu; Sivakanth Gopi; Janardhan (Jana) Kulkarni; Zinan Lin; Saurabh Naik; Tomasz Lukasz Religa; Jian Yin; Huishuai Zhang

TMLR | May 2024

Suppose we want to train text prediction models for email clients or word processors. The models must preserve the privacy of user data and have a fixed size that meets memory and inference-time requirements. We introduce a generic framework for this problem. Specifically, we are given a public dataset \(D_{\text{pub}}\) and a private dataset \(D_{\text{priv}}\) corresponding to a downstream task \(T\). How should we pre-train a fixed-size model \(M\) on \(D_{\text{pub}}\) and fine-tune it on \(D_{\text{priv}}\) so that the performance of \(M\) on \(T\) is maximized and \(M\) satisfies differential privacy with respect to \(D_{\text{priv}}\)? We show that pre-training on a subset of \(D_{\text{pub}}\) that brings the public distribution closer to the private distribution is a crucial ingredient for maximizing the transfer learning ability of \(M\) after pre-training, especially for relatively small models. Beyond performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models, highlighting the promise of differentially private training as a tool for model compression and efficiency.
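The abstract does not say how the subset of \(D_{\text{pub}}\) is chosen. The sketch below shows one hypothetical way to do it, and it is not presented as the paper's exact procedure. It assumes each example is already a numeric feature vector. It trains a logistic-regression "domain classifier" to tell private examples (label 1) from public ones (label 0), using DP-SGD-style updates: per-example gradient clipping plus Gaussian noise. It then keeps the public examples the classifier ranks as most private-like. Only the trained weights depend on \(D_{\text{priv}}\), so the ranking is post-processing of a differentially private model. The function names, hyperparameters, and toy data are all illustrative assumptions. The sketch omits privacy accounting, so it computes no \(\varepsilon, \delta\).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, x):
    # Classifier logit for feature vector x; the last weight is the bias.
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_domain_classifier_dp(public, private, epochs=20, lr=0.5, clip=1.0,
                               noise_multiplier=1.0, batch_size=8, seed=0):
    """Hypothetical helper: logistic regression, private = 1 vs public = 0.

    DP-SGD-style step: clip each per-example gradient to L2 norm `clip`, sum,
    add Gaussian noise with std noise_multiplier * clip, then average.
    Simplifications: shuffled fixed-size batches instead of Poisson sampling,
    and no privacy accounting.
    """
    rng = random.Random(seed)
    data = [(x, 1.0) for x in private] + [(x, 0.0) for x in public]
    dim = len(data[0][0]) + 1  # features plus bias
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_sum = [0.0] * dim
            for x, y in batch:
                xb = list(x) + [1.0]
                err = sigmoid(score(w, x)) - y  # d(log-loss)/d(logit)
                g = [err * xi for xi in xb]
                norm = math.sqrt(sum(gi * gi for gi in g))
                factor = min(1.0, clip / (norm + 1e-12))  # per-example clipping
                for i in range(dim):
                    grad_sum[i] += g[i] * factor
            for i in range(dim):
                noisy = grad_sum[i] + rng.gauss(0.0, noise_multiplier * clip)
                w[i] -= lr * noisy / len(batch)
    return w


def select_public_subset(public, w, fraction):
    """Hypothetical helper: keep the `fraction` of public examples scored most private-like."""
    ranked = sorted(public, key=lambda x: score(w, x), reverse=True)
    k = max(1, int(len(public) * fraction))
    return ranked[:k]


# Toy data (illustrative only): private examples sit near (2, 2);
# half of the public examples do too, the other half sit near (-2, -2).
gen = random.Random(1)


def jitter(c):
    return (c + gen.uniform(-0.5, 0.5), c + gen.uniform(-0.5, 0.5))


private = [jitter(2.0) for _ in range(20)]
public = [jitter(2.0) for _ in range(10)] + [jitter(-2.0) for _ in range(10)]
w = train_domain_classifier_dp(public, private)
subset = select_public_subset(public, w, fraction=0.5)
print(sum(1 for x in subset if x[0] > 0), "of", len(subset), "selected examples lie near (2, 2)")
```

In this toy setup, the selected half of the public data should come from the cluster that matches the private data. That is the sense in which selection "brings the public distribution closer to the private distribution." A real pipeline would replace the feature vectors with learned text representations and would track the privacy budget spent on the classifier.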