ProtoBandit: Efficient Prototype Selection via Multi-Armed Bandits
In this work, we propose a multi-armed bandit-based framework for identifying a compact set of informative data instances (i.e., the prototypes) from a source dataset \(S\) that best represents a given target set \(T\). Prototypical examples of a given dataset offer interpretable insights into the underlying data distribution and assist in example-based reasoning, thereby influencing every sphere of human decision-making. Current state-of-the-art prototype selection approaches require \(O(|S||T|)\) similarity comparisons between source and target data points, which becomes prohibitively expensive for large-scale settings. We propose to mitigate this limitation by employing stochastic greedy search in the space of prototypical examples and multi-armed bandits for reducing the number of similarity comparisons. Our randomized algorithm, ProtoBandit, identifies a set of \(k\) prototypes incurring \(O(k|S|)\) similarity comparisons, which is independent of the size of the target set. An interesting outcome of our analysis is for the \(k\)-medoids clustering problem (\(T=S\) setting) in which we show that our algorithm ProtoBandit approximates the BUILD step solution of the partitioning around medoids (PAM) method in \(O(k|S|)\) complexity. Empirically, we observe that ProtoBandit reduces the number of similarity computation calls by several orders of magnitudes (\(100-1000\) times) while obtaining solutions similar in quality to those from state-of-the-art approaches.