Advanced hyperparameter optimization for improved spatial prediction of shallow landslides using extreme gradient boosting (XGBoost)

Kavzoglu T., Teke A.

Bulletin of Engineering Geology and the Environment, vol.81, no.5, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 81 Issue: 5
  • Publication Date: 2022
  • Doi Number: 10.1007/s10064-022-02708-w
  • Journal Name: Bulletin of Engineering Geology and the Environment
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, IBZ Online, Aerospace Database, Aquatic Science & Fisheries Abstracts (ASFA), CAB Abstracts, Communication Abstracts, Compendex, Environment Index, Geobase, INSPEC, Metadex, Pollution Abstracts, Civil Engineering Abstracts
  • Keywords: Hyperparameter optimization, XGBoost, Bayesian optimization, Genetic algorithm, Hyperband, Landslide susceptibility, HYPER-PARAMETER OPTIMIZATION, LOGISTIC-REGRESSION, DECISION TREE, BAYESIAN OPTIMIZATION, FREQUENCY RATIO, MODELS, GIS, AREA, MACHINE, SEARCH
  • Kocaeli University Affiliated: No


© 2022, Springer-Verlag GmbH Germany, part of Springer Nature.

Machine learning algorithms have progressively become part of landslide susceptibility mapping practice owing to their robustness in dealing with the complicated, non-linear mechanisms of landslides. However, such algorithms expose a set of hyperparameters whose correct setting is crucial to achieving the highest attainable performance. This study investigates the effectiveness and robustness of advanced optimization algorithms, namely random search (RS), Bayesian optimization with Gaussian Process (BO-GP), Bayesian optimization with Tree-structured Parzen Estimator (BO-TPE), genetic algorithm (GA), and the Hyperband method, for optimizing the hyperparameters of the eXtreme Gradient Boosting (XGBoost) algorithm in the spatial prediction of landslides. Twelve causative factors were considered to produce landslide susceptibility maps (LSMs) for the Trabzon province of Turkey, where translational shallow landslides are ubiquitous. Five accuracy metrics, namely overall accuracy (OA), precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC), together with a statistical significance test, were employed to measure the effectiveness of the optimization strategies on the XGBoost algorithm. Compared to the XGBoost model with default settings, the optimized models provided a significant improvement of up to 13% in overall accuracy, which was confirmed by McNemar’s test. AUC analysis revealed that the GA (0.942) and Hyperband (0.922) methods, whose performances were statistically similar, had the highest predictive abilities, followed by BO-GP (0.920), BO-TPE (0.899), and RS (0.894). Analysis of computational cost showed that the Hyperband approach (40.3 s) was much faster (about 13 times) than the GA in hyperparameter tuning, and it thus appeared to be the best optimization algorithm for the problem under consideration.
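The abstract reports that the accuracy gain of the tuned models over the default XGBoost model was checked with McNemar’s test. As a rough illustration of how such a paired comparison can be computed, the sketch below uses the common chi-squared form with continuity correction. The function name `mcnemar_test` and the toy labels are hypothetical and are not taken from the study, which may have used a different variant of the test.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Return (statistic, p_value) for McNemar's test on two classifiers.

    Assumption: chi-squared approximation with continuity correction
    and 1 degree of freedom; the study's exact variant is not stated.
    """
    # b: cases classifier A gets right and classifier B gets wrong
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    # c: cases classifier A gets wrong and classifier B gets right
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The two classifiers never disagree, so there is no evidence of a difference.
        return 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy data for illustration only (not results from the paper)
y_true = [1] * 20
pred_default = [1] * 10 + [0] * 10  # hypothetical default model: 10 of 20 correct
pred_tuned = [1] * 18 + [0] * 2     # hypothetical tuned model: 18 of 20 correct

statistic, p_value = mcnemar_test(y_true, pred_default, pred_tuned)
print(f"statistic={statistic:.3f}, p={p_value:.4f}")
```

Only the samples on which the two models disagree enter the statistic, which is why the test suits comparing two classifiers evaluated on the same test set. A p-value below the chosen significance level (commonly 0.05) suggests that the difference in error rates is unlikely to be due to chance.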