Large Language Models for Machine Learning Design Assistance: Prompt-Driven Algorithm Selection and Optimization in Diverse Supervised Learning Tasks


KAYA GÜLAĞIZ F.

Applied Sciences (Switzerland), vol. 15, no. 20, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 15 Issue: 20
  • Publication Date: 2025
  • DOI: 10.3390/app152010968
  • Journal Name: Applied Sciences (Switzerland)
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, Applied Science & Technology Source, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
  • Keywords: code generation, large language models, prompt engineering
  • Kocaeli University Affiliated: Yes

Abstract

Large language models (LLMs) are playing an increasingly important role in data science applications. In this study, the performance of LLMs in generating code and designing solutions for data science tasks is systematically evaluated on a set of real-world tasks from the Kaggle platform. Models from different LLM families were tested both under default settings and in configurations with hyperparameter tuning (HPT) applied. In addition, the effects of few-shot prompting (FSP) and Tree of Thought (ToT) strategies on code generation were compared. Alongside technical metrics such as accuracy, F1 score, Root Mean Squared Error (RMSE), execution time, and peak memory consumption, LLM outputs were also evaluated against Kaggle user-submitted solutions, leaderboard scores, and two established AutoML frameworks (auto-sklearn and AutoGluon). The findings suggest that, with effective prompting strategies and HPT, models can deliver competitive results on certain tasks. The ability of some LLMs to suggest appropriate algorithms shows that LLMs can serve not only as code generators but also as systems capable of designing machine learning (ML) solutions. This study presents a comprehensive analysis of how strategic decisions such as prompting methods, tuning approaches, and algorithm selection affect the design of LLM-based data science systems, offering insights for future hybrid human–LLM systems.
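The evaluation protocol described above combines predictive metrics with resource measurements. As a minimal illustrative sketch (not the paper's actual code), the snippet below measures accuracy, F1 score, execution time, and peak memory for a candidate classification pipeline; the synthetic dataset and the RandomForest model stand in for a real Kaggle task and an LLM-suggested solution.

```python
import time
import tracemalloc

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Toy stand-in for a Kaggle classification task.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Stand-in for an LLM-suggested pipeline; the model choice is illustrative.
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Track peak memory and wall-clock time of fit + predict.
tracemalloc.start()
start = time.perf_counter()
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

metrics = {
    "accuracy": accuracy_score(y_te, pred),
    "f1": f1_score(y_te, pred),
    "runtime_s": elapsed,
    "peak_mem_mb": peak / 1e6,
}
print(metrics)
```

For a regression task, RMSE would replace accuracy and F1 (e.g. via `sklearn.metrics.mean_squared_error` with the square root applied); the timing and memory probes stay the same, which is what makes such a harness reusable across the heterogeneous tasks the study covers.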