A Mixed-Integer linear programming based training and feature selection method for artificial neural networks using piece-wise linear approximations

Sildir, Hasan; Aydin, Erdal

doi:10.1016/j.ces.2021.117273

Sildir H., Aydin E.

Chemical Engineering Science, vol. 249, 2022 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 249
Publication Date: 2022
DOI: 10.1016/j.ces.2021.117273
Journal Name: Chemical Engineering Science
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Aqualine, Biotechnology Research Abstracts, Chemical Abstracts Core, Communication Abstracts, INSPEC, Metadex, Pollution Abstracts, zbMATH, DIALNET, Civil Engineering Abstracts
Keywords: Machine learning, Artificial neural networks, Piece-wise linear artificial neural networks, Feature selection, Mixed-integer programming, Local minima
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Kocaeli University: No

Abstract

© 2021 Elsevier Ltd. Artificial Neural Networks (ANNs) may suffer from suboptimal training and poor test performance, not only because of the presence of a high number of features with low statistical contributions but also because of their non-convex nature. This study develops piecewise-linear formulations for the efficient approximation of the non-convex activation and objective functions in artificial neural networks for optimal, global and simultaneous training and feature selection in regression problems. Such formulations include binary variables to account for the existence of the features and piecewise-linear approximations, which in turn, after one exact linearization step, call for solving a mixed-integer linear programming problem with a global optimum guarantee because of convexity. The suggested formulation is implemented on two industrial case studies. Results show that efficient approximations are obtained with only a few breakpoints. A significant feature space reduction is observed, bringing about a notable improvement in test accuracy.
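
The idea of approximating a non-convex activation function with only a few breakpoints can be illustrated with a minimal sketch. It is not the authors' formulation: the choice of tanh, the interval [-4, 4], the evenly spaced breakpoints, and the helper names `make_breakpoints`, `pwl_eval` and `max_abs_error` are all assumptions made for illustration. It also covers only the approximation step, not the binary variables or the mixed-integer linear program.

```python
import math


def make_breakpoints(lo, hi, n):
    """Return n evenly spaced points on [lo, hi] (n >= 2). Hypothetical helper."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def pwl_eval(x, xs, ys):
    """Evaluate the piecewise-linear interpolant through (xs, ys) at x.

    Values outside [xs[0], xs[-1]] are clamped to the end values.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            t = (x - xs[k]) / (xs[k + 1] - xs[k])
            return (1 - t) * ys[k] + t * ys[k + 1]
    raise ValueError("x not covered by the breakpoints")


def max_abs_error(n_breakpoints, lo=-4.0, hi=4.0, n_samples=2001):
    """Largest |tanh(x) - interpolant(x)| on a dense sample grid (assumed range)."""
    xs = make_breakpoints(lo, hi, n_breakpoints)
    ys = [math.tanh(v) for v in xs]
    grid = make_breakpoints(lo, hi, n_samples)
    return max(abs(math.tanh(g) - pwl_eval(g, xs, ys)) for g in grid)


# Approximation error for an increasing number of breakpoints.
errors = {n: max_abs_error(n) for n in (3, 5, 9, 17)}
for n, err in errors.items():
    print(f"{n:2d} breakpoints: max abs error = {err:.4f}")
```

In this sketch the worst-case error shrinks as breakpoints are added. In a mixed-integer formulation each extra segment typically costs additional binary variables, so there is a trade-off between accuracy and problem size.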