Real-time and offline large language models on edge devices: a systematic review

Dinçer, Erçin; Kilimci, Zeynep

doi:10.7717/peerj-cs.3769

Dinçer E., Kilimci Z. H.

PeerJ Computer Science, vol. 12, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Review
Volume: 12
Publication Date: 2026
DOI: 10.7717/peerj-cs.3769
Journal: PeerJ Computer Science
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, Directory of Open Access Journals
Keywords: Edge computing, Large language models, Offline deployment, Real-time deployment, Systematic review
Affiliated with Kocaeli University: Yes

Abstract

Large language models (LLMs) have recently gained prominence for deployment on edge devices, owing to their potential to support privacy-preserving, low-latency, and offline inference. Nevertheless, their considerable computational and memory requirements pose fundamental challenges in both real-time and offline scenarios. This systematic review identified 49 studies through a structured search and screening process, of which 34 were included in the qualitative synthesis. A subset of these, namely the studies that provided sufficient methodological detail and full-text access, was analyzed in depth to investigate the techniques, challenges, and applications of LLM deployment on edge devices. Data were extracted on model types, hardware platforms, optimization strategies, and performance outcomes. Findings indicate that, within the evaluated experimental settings, hardware acceleration, model compression, and hybrid edge–cloud strategies can yield latency reductions of up to 972×, memory savings of up to 130×, and energy efficiency improvements exceeding 1,600×, while largely preserving accuracy. Real-time deployments are predominantly applied in robotics, healthcare monitoring, and autonomous driving, whereas offline deployments are tailored to privacy-sensitive or batch-oriented contexts. The review also identifies persistent research gaps, including the absence of standardized benchmarks and the limited generalizability of results to real-world environments. It concludes by outlining future research directions, with particular emphasis on hardware–software co-design, federated learning, and secure task offloading.
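Model compression is one of the strategies credited above with memory savings. The Python below is a minimal sketch of one common compression technique, symmetric per-tensor int8 weight quantization, and shows where part of such savings comes from. It is an illustration only. It is not a method attributed to any reviewed study, and the function names (`weight_memory_bytes`, `quantize_int8`, `dequantize`) and the 7-billion-parameter example are hypothetical assumptions.

```python
import math
from typing import List, Tuple


def weight_memory_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store num_params weights at a given bit width (weights only)."""
    return math.ceil(num_params * bits_per_weight / 8)


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: each w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude onto 127 so every code fits in a signed 8-bit range.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter model
    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_bytes(n, bits) / 1e9:.1f} GB")

    w = [0.5, -1.27, 0.0, 0.031]
    q, s = quantize_int8(w)
    # Rounding error per weight is at most half a quantization step (scale / 2).
    print(q, max(abs(a - b) for a, b in zip(w, dequantize(q, s))))
```

For the hypothetical 7-billion-parameter model, the weights alone take 28.0 GB at 32 bits, 14.0 GB at 16 bits, 7.0 GB at 8 bits, and 3.5 GB at 4 bits. Quantization by itself therefore gives an 8× reduction from FP32 to 4-bit. The sketch ignores activations, the key-value cache, and runtime overhead, which also consume memory on a real edge device.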