Prediction of Software Security Vulnerabilities from Source Code Using Machine Learning Methods


Mandal D., KÖSESOY İ.

2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023, Sivas, Türkiye, 11 - 13 October 2023

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/asyu58738.2023.10296747
  • City of Publication: Sivas
  • Country of Publication: Türkiye
  • Keywords: AST Tree, Doc2Vec, Feature Extraction, Machine Learning Algorithms, Software Vulnerability, TF-IDF
  • Affiliated with Kocaeli University: Yes

Abstract

One of the most significant problems in software engineering is the presence of security vulnerabilities in software. Attackers can exploit these vulnerabilities to gain unauthorized access to systems, leak information, corrupt data, and cause service interruptions. Therefore, in addition to developing secure software, detecting existing security vulnerabilities is considered an important research topic. In this study, security vulnerabilities in software source code were predicted using machine learning methods. The OWASP Benchmark test suite was used as the dataset. This dataset consists of Java code and was used to train five machine learning models: Logistic Regression, Decision Tree, Support Vector Machines (SVM), K-Nearest Neighbors, and Random Forest. The TF-IDF and Doc2Vec methods were employed to extract feature vectors from the source code. In the experimental study, the highest prediction accuracy (0.97) was achieved with the TF-IDF feature extraction method combined with the Decision Tree, SVM, and Logistic Regression algorithms.
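
The general idea of turning source code into TF-IDF vectors and classifying them can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the smoothed IDF formula, the cosine-similarity nearest-neighbor classifier (standing in for the K-Nearest Neighbors model named in the abstract), and the four toy Java snippets with their labels are all assumptions made for illustration. The snippets are hypothetical and are not taken from the OWASP Benchmark.

```python
import math
import re
from collections import Counter


def tokenize(source):
    # Assumed tokenizer: keep identifiers/keywords and integer literals, drop punctuation.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+", source)


def fit_idf(token_lists):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Term count times IDF, scaled to unit length; unseen tokens are ignored.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(query_vec, train_vecs, train_labels, k=1):
    # Majority vote among the k most similar training vectors.
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: 1 = vulnerable, 0 = not vulnerable.
TRAIN_SOURCES = [
    'String q = "SELECT * FROM users WHERE name = " + param; statement.executeQuery(q);',
    'PreparedStatement ps = connection.prepareStatement("SELECT * FROM users WHERE name = ?"); ps.setString(1, param);',
    'Runtime.getRuntime().exec("cmd /c " + param);',
    'String safe = ESAPI.encoder().encodeForHTML(param); response.getWriter().write(safe);',
]
TRAIN_LABELS = [1, 0, 1, 0]

_train_tokens = [tokenize(s) for s in TRAIN_SOURCES]
IDF = fit_idf(_train_tokens)
TRAIN_VECS = [tfidf_vector(t, IDF) for t in _train_tokens]


def predict_label(source, k=1):
    # Vectorize an unseen snippet with the fitted IDF and classify it.
    return knn_predict(tfidf_vector(tokenize(source), IDF), TRAIN_VECS, TRAIN_LABELS, k)


if __name__ == "__main__":
    concat_query = 'String sql = "SELECT id FROM accounts WHERE owner = " + param; statement.executeQuery(sql);'
    print(predict_label(concat_query))
```

A real experiment would train on the full labeled benchmark, hold out a test split, and compare several classifiers and feature extractors, as the abstract describes. Four snippets only demonstrate the mechanics and say nothing about accuracy.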