Gradient Coresets for Federated Learning

Abstract

Federated Learning (FL) is a key technique for training machine learning models on decentralized data spread across multiple clients, including resource-constrained edge devices. A central challenge is to design solutions that are computationally, communication-, and energy-efficient while respecting the privacy constraints of the FL setting. Existing approaches address these challenges by selecting a weighted subset of the training dataset, known as a coreset, which has been shown to be resilient to data noise. However, these methods rely on aggregate statistics of the training data and do not adapt readily to the FL setting. This paper introduces Gradient-based Coreset for Robust and Efficient Federated Learning (GCFL), an algorithm that selects a coreset at each client only once every K communication rounds, derives model updates exclusively from it, and assumes the availability of a small validation dataset at the server. The proposed coreset selection technique is highly effective at filtering out noise in clients' data, as demonstrated by experiments on four real-world datasets. The results show that GCFL (1) is more computationally and energy efficient than standard FL, (2) is robust to various forms of noise in both the feature space and labels, (3) preserves the privacy of the validation dataset, and (4) incurs minimal communication overhead while delivering substantial performance gains, particularly when clients' data is noisy.
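
To make the idea concrete, below is a minimal, illustrative Python sketch of one way gradient-based coreset selection of this kind could work: a client greedily picks a small weighted subset whose gradient sum approximates a reference gradient computed on the server's validation set (a matching-pursuit-style heuristic). The function names, array shapes, and selection rule are assumptions for illustration only, not the paper's exact algorithm.

    import numpy as np

    def select_gradient_coreset(example_grads, val_grad, k):
        """Greedily pick k examples whose weighted gradient sum
        approximates the validation gradient (matching-pursuit-style
        heuristic; illustrative, not the paper's exact method).

        example_grads: (n, d) per-example gradients at the client
        val_grad:      (d,)   reference gradient from the server's
                              validation set
        """
        selected, weights = [], []
        residual = val_grad.copy()
        for _ in range(k):
            # Pick the example gradient most aligned with the residual.
            scores = example_grads @ residual
            scores[selected] = -np.inf  # skip already-chosen examples
            i = int(np.argmax(scores))
            g = example_grads[i]
            # Nonnegative least-squares weight for this example.
            w = max(float(g @ residual) / (float(g @ g) + 1e-12), 0.0)
            selected.append(i)
            weights.append(w)
            residual = residual - w * g
        return np.array(selected), np.array(weights)

    # Hypothetical usage with synthetic gradients:
    rng = np.random.default_rng(0)
    grads = rng.normal(size=(500, 64))   # per-example client gradients
    val_grad = grads[:50].mean(axis=0)   # stand-in for the server's validation gradient
    idx, w = select_gradient_coreset(grads, val_grad, k=20)

In a GCFL-style training loop, such a selection step would run only once every K communication rounds; the intermediate rounds would train on the weighted coreset alone, which is where the computational and energy savings over standard FL would come from.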

Publication
IEEE/CVF Winter Conference on Applications of Computer Vision
