Selective Pre-training for Private Fine-tuning
- Da Yu
- Sivakanth Gopi
- Janardhan (Jana) Kulkarni
- Zinan Lin
- Saurabh Naik
- Tomasz Lukasz Religa
- Jian Yin
- Huishuai Zhang
TMLR
Suppose we want to train text prediction models for email clients or word processors. The models must preserve the privacy of user data and adhere to a specific fixed size to meet memory and inference-time requirements. We introduce a generic framework to solve this problem. Specifically, we are given a public dataset \(D_{\text{pub}}\) and a private dataset \(D_{\text{priv}}\) corresponding to a downstream task \(T\). How should we pre-train a fixed-size model \(M\) on \(D_{\text{pub}}\) and fine-tune it on \(D_{\text{priv}}\) such that the performance of \(M\) on \(T\) is maximized and \(M\) satisfies differential privacy with respect to \(D_{\text{priv}}\)? We show that pre-training on a subset of \(D_{\text{pub}}\) that brings the public distribution closer to the private distribution is a crucial ingredient for maximizing the transfer learning abilities of \(M\) after pre-training, especially in regimes where model sizes are relatively small. Besides performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models, highlighting the promise of differentially private training as a tool for model compression and efficiency.
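The abstract does not specify how the subset of \(D_{\text{pub}}\) is chosen. Purely as an illustration, the sketch below assumes a simple swapped-in scoring rule, not necessarily the paper's procedure: rank each public document by its average per-token log-likelihood ratio between a unigram model of \(D_{\text{priv}}\) and one of \(D_{\text{pub}}\), then keep the top fraction. Because this selection step reads \(D_{\text{priv}}\), it must itself be private. The sketch therefore builds the vocabulary from public data only, bounds each private document's contribution, and optionally adds Gaussian noise to the private counts. The names `select_public_subset`, `unigram_log_probs`, `max_words_per_doc`, and `noise_multiplier` are hypothetical, and calibrating the noise to a concrete (ε, δ) budget is left out.

```python
import math
import random
from collections import Counter


def unigram_log_probs(counts, vocab, smoothing=1.0):
    """Add-`smoothing` unigram log-probabilities over a fixed vocabulary."""
    # Clamp noisy counts at zero (post-processing does not affect privacy).
    clamped = {w: max(counts.get(w, 0.0), 0.0) for w in vocab}
    total = sum(clamped.values()) + smoothing * len(vocab)
    return {w: math.log((clamped[w] + smoothing) / total) for w in vocab}


def select_public_subset(public_docs, private_docs, keep_fraction=0.5,
                         max_words_per_doc=32, noise_multiplier=0.0, seed=0):
    """Hypothetical helper: keep the public docs that look most like the private data."""
    # The vocabulary is built from public data only, so it reveals nothing about D_priv.
    vocab = sorted({w for doc in public_docs for w in doc.split()})
    vocab_set = set(vocab)
    public_counts = Counter(w for doc in public_docs for w in doc.split())

    # Each private document adds at most 1 to at most `max_words_per_doc` entries,
    # so the count vector has L2 sensitivity of at most sqrt(max_words_per_doc).
    private_counts = Counter()
    for doc in private_docs:
        words = sorted({w for w in doc.split() if w in vocab_set})[:max_words_per_doc]
        private_counts.update(words)

    # Optional Gaussian noise on the private histogram (hypothetical parameter).
    # Mapping noise_multiplier to an (epsilon, delta) guarantee is out of scope here.
    rng = random.Random(seed)
    sigma = noise_multiplier * math.sqrt(max_words_per_doc)
    noisy_private = {}
    for w in vocab:
        noise = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        noisy_private[w] = private_counts[w] + noise

    log_p_priv = unigram_log_probs(noisy_private, vocab)
    log_p_pub = unigram_log_probs(public_counts, vocab)

    def score(doc):
        # Mean per-token log-likelihood ratio; higher means "more private-like".
        words = doc.split()
        if not words:
            return float("-inf")
        return sum(log_p_priv[w] - log_p_pub[w] for w in words) / len(words)

    ranked = sorted(public_docs, key=score, reverse=True)
    k = max(1, round(keep_fraction * len(public_docs)))
    return ranked[:k]


public = [
    "meeting agenda for tomorrow",
    "please reply to my email",
    "the football match ended",
    "stock prices fell sharply",
]
private = [
    "reply to the email about the meeting",
    "email me the agenda",
    "can you reply soon",
]
print(select_public_subset(public, private, keep_fraction=0.5))
# ['please reply to my email', 'meeting agenda for tomorrow']
```

With email-like private text, the email-like public documents rank highest. In a full pipeline, \(M\) would then be pre-trained on the returned subset and fine-tuned on \(D_{\text{priv}}\) with a differentially private optimizer (for example DP-SGD), and the privacy cost of the selection step composes with that of fine-tuning.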