Author(s)

Kalathiya Parth, Shah Shrey, Shah Shivam, Budheliya Hetvi, Devang Bhatt, Prof. Dhaval Chandarana

  • Manuscript ID: 140065
  • Volume: 2
  • Issue: 2
  • Pages: 1–30

Subject Area: Computer Science

Abstract

The rapid advancement of Large Language Models (LLMs) has revolutionized natural language processing, yet their development requires massive centralized datasets, raising critical privacy concerns. Federated Learning (FL) offers a promising paradigm for training LLMs while preserving data privacy by enabling collaborative learning without direct data sharing. This comprehensive survey examines the integration of federated learning with large language models, covering pre-training, fine-tuning methodologies, deployment strategies, communication optimization, and security mechanisms. We systematically review state-of-the-art approaches including federated full-parameter tuning, parameter-efficient fine-tuning (PEFT), prompt learning, and homomorphic encryption-based methods. Key challenges such as communication costs, data heterogeneity, system scalability, and privacy preservation are analyzed alongside emerging solutions. We present detailed analyses of frameworks such as FederatedScope-LLM and techniques achieving communication costs under 18 KB for billion-parameter models. This survey synthesizes current progress, identifies critical research gaps, and outlines future directions for privacy-preserving LLM development across healthcare, finance, recommendation systems, and edge computing applications.
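To make the federated PEFT setting described above concrete, the following is a minimal illustrative sketch (not taken from the surveyed works) of weighted FedAvg aggregation over LoRA-style adapter matrices: clients exchange only the small adapter tensors, never raw data or the frozen base-model weights. The function and parameter names (`fedavg`, `lora_A`, `lora_B`, the layer size, and the rank) are hypothetical choices for the example.

```python
import numpy as np

def fedavg(client_updates, client_sizes):
    """Weighted average of client adapter parameters (FedAvg-style).

    client_updates: list of dicts mapping parameter name -> np.ndarray
    client_sizes:   list of local dataset sizes, used as aggregation weights
    """
    total = float(sum(client_sizes))
    averaged = {}
    for name in client_updates[0]:
        averaged[name] = sum(
            (size / total) * update[name]
            for update, size in zip(client_updates, client_sizes)
        )
    return averaged

# Toy example: two clients share only rank-4 LoRA-style adapter factors
# for a hypothetical 1024x1024 layer, keeping local data private.
rng = np.random.default_rng(0)
clients = [
    {"lora_A": rng.normal(size=(1024, 4)), "lora_B": rng.normal(size=(4, 1024))}
    for _ in range(2)
]
global_adapters = fedavg(clients, client_sizes=[300, 700])
print({name: tensor.shape for name, tensor in global_adapters.items()})
```

Because only the low-rank factors are transmitted, per-round communication scales with the adapter size rather than the billions of frozen base-model parameters, which is the intuition behind the communication-efficiency results discussed in the survey.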

Keywords
Federated Learning, Large Language Models, Privacy-Preserving AI, Parameter-Efficient Fine-Tuning, Communication Efficiency, Homomorphic Encryption, Edge Learning, Prompt Tuning