Application of machine learning algorithms and feature selection methods for better prediction of sludge production in a real advanced biological wastewater treatment plant


Ekinci E., ÖZBAY B., İLHAN OMURCA S., SAYIN F. E., ÖZBAY İ.

JOURNAL OF ENVIRONMENTAL MANAGEMENT, vol.348, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 348
  • Publication Date: 2023
  • DOI: 10.1016/j.jenvman.2023.119448
  • Journal Name: JOURNAL OF ENVIRONMENTAL MANAGEMENT
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, International Bibliography of Social Sciences, PASCAL, Aerospace Database, Agricultural & Environmental Science Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), BIOSIS, CAB Abstracts, Communication Abstracts, Environment Index, Geobase, Greenfile, Index Islamicus, Metadex, Pollution Abstracts, Public Affairs Index, Veterinary Science Database, Civil Engineering Abstracts
  • Keywords: Feature selection, Machine learning models, Municipal wastewater, Prediction, Sludge production
  • Kocaeli University Affiliated: Yes

Abstract

Although sewage sludge management is an important and challenging part of wastewater treatment, few studies address the prediction of waste sludge production. To address this gap, the present work aims to develop a model that predicts sewage sludge production accurately and quickly. To this end, different machine learning (ML) algorithms were tested on data obtained from a real advanced biological wastewater treatment plant located in Kocaeli, Turkey. The modelling studies used a data set of 208 daily measurements collected from January 2022 to December 2022. The flow rate of the plant (Q), polyelectrolyte dosage (PD), and the removed amounts of total suspended solids (TSS), chemical oxygen demand (COD), biological oxygen demand (BOD), total phosphorus (TP), and total nitrogen (TN) were used as input parameters to predict sludge production (SP). Model performance was evaluated in terms of Mean Square Error (MSE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). Among the tested models, Kernel Ridge Regression provided the best accuracy, with an R² value of 0.94 and an MAE value of 3.25. Mutual information-based feature selection (MIFS) and correlation-based feature selection (CFS) algorithms were also applied to enhance model performance, and higher prediction accuracies were achieved using the selected subsets of features. Furthermore, the contribution of each feature was calculated and visualized using the SHapley Additive exPlanations (SHAP) technique. Overall, the results indicate the feasibility of ML models for describing the dynamic and complex nature of SP. Process operators may benefit from this modelling approach, since it enables accurate and fast estimation of sewage sludge production using fewer and easily measurable parameters.
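
The abstract names five error measures without giving their formulas. The following is a minimal sketch of how they are commonly computed, written in plain Python. It is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, and R² is assumed here to mean the coefficient of determination (1 minus the ratio of residual to total sum of squares).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (in percent) and R2 for paired values.

    Assumes no observed value is zero (MAPE divides by it) and that the
    observed values are not all identical (R2 divides by their variance).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae,
            "MAPE": mape, "R2": r2}


# Hypothetical observed and predicted sludge production values,
# made up for illustration only (not data from the study).
observed = [100.0, 120.0, 80.0, 100.0]
predicted = [110.0, 114.0, 84.0, 95.0]
metrics = regression_metrics(observed, predicted)
```

With these made-up inputs the sketch gives an MAE of 6.25 and an R² of about 0.78. MAE is expressed in the units of the target variable, so the reported MAE of 3.25 should be read against the scale of the plant's SP measurements.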