Development of an Air Quality Classification System Using SMOTE-Based Random Forest and XAI Analysis

Authors

  • Arip Kristiyanto, Universitas Pamulang
  • Hirawati Lubis, Universitas Pamulang

Keywords:

air quality, Random Forest, SMOTE, SHAP, classification

Abstract

Air quality in South Tangerang City is a critical environmental issue that requires an accurate and transparent classification system. This study develops an air quality classification model that combines a machine learning algorithm with data balancing and model interpretation techniques. Air Pollutant Standard Index (ISPU) data for the 2020–2022 period are pre-processed into three categories: Good, Moderate, and Unhealthy. The dataset contains 1,096 records. The Synthetic Minority Over-sampling Technique (SMOTE) is applied to handle class imbalance, and hyperparameters are optimized using GridSearchCV. The experimental results show that the Random Forest algorithm outperforms the baseline SVM and KNN models, achieving an accuracy of 0.81 and an F1-score of 0.75 after SMOTE and tuning. Explainable AI (XAI) analysis using SHAP reveals that sulfur dioxide (SO₂) is the most dominant feature influencing model decisions and that it is spatially correlated with industrial activities and heavy transportation in the South Tangerang area. The final model was deployed to the Hugging Face Spaces cloud platform through a Gradio interface to provide a publicly accessible classification service. This study demonstrates that integrating Random Forest and SHAP produces a classification system that not only performs well but is also transparent, supporting air pollution mitigation.
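The workflow summarized in the abstract (balance the training classes, tune a Random Forest with GridSearchCV, then inspect which features drive predictions) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, the feature names other than SO₂ are guesses, `smote_oversample` is a simplified hand-written stand-in for a library SMOTE implementation, the parameter grid and macro-averaged F1 are arbitrary choices, and impurity-based feature importances stand in for the SHAP analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): grow each smaller class to the
    majority size by interpolating between a sample and one of its k nearest
    same-class neighbors."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0 or count < 2:  # nothing to add, or no neighbor to use
            continue
        X_cls = X[y == cls]
        n_neighbors = min(k, len(X_cls) - 1)
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X_cls)
        # Column 0 is normally the query point itself, so it is dropped.
        neighbors = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_cls), size=n_new)
        picked = neighbors[base, rng.integers(0, n_neighbors, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_parts.append(X_cls[base] + gap * (X_cls[picked] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic stand-in for the 1,096 ISPU records; feature names are assumed.
rng = np.random.default_rng(42)
features = ["pm10", "so2", "co", "o3", "no2"]
X = rng.gamma(shape=2.0, scale=20.0, size=(1096, len(features)))
score = X[:, 1] + 0.3 * X[:, 0] + rng.normal(0, 5, size=1096)
# 0 = Good, 1 = Moderate, 2 = Unhealthy, deliberately imbalanced (30/60/10).
y = np.digitize(score, [np.quantile(score, 0.30), np.quantile(score, 0.90)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the training split only, so the test set stays untouched.
X_bal, y_bal = smote_oversample(X_train, y_train)

# Small illustrative grid; the values are not taken from the paper.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    scoring="f1_macro",
    cv=3,
)
search.fit(X_bal, y_bal)

y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
macro_f1 = f1_score(y_test, y_pred, average="macro")
importances = dict(zip(features, search.best_estimator_.feature_importances_))

print("best params:", search.best_params_)
print("accuracy:", round(accuracy, 3), "macro F1:", round(macro_f1, 3))
print("feature importances:", importances)
```

Because the data here are synthetic, the printed scores say nothing about the reported 0.81 accuracy or 0.75 F1-score. Two caveats about the design: oversampling before cross-validation lets synthetic points derived from one fold appear in another, which a pipeline that resamples inside each fold would avoid, and a SHAP analysis would replace the importance dictionary with per-prediction attributions computed on the fitted forest.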

Published

2026-03-31

How to Cite

Arip Kristiyanto, & Hirawati Lubis. (2026). Development of an Air Quality Classification System Using SMOTE-Based Random Forest and XAI Analysis. JOURNAL ZETROEM, 8(1), 26–36. Retrieved from https://ejournal.unibabwi.ac.id/index.php/Zetroem/article/view/7586

Section

Article