Development of an Air Quality Classification System Using SMOTE-Based Random Forest and XAI Analysis
Keywords:
air quality, Random Forest, SMOTE, SHAP, classification
Abstract
Air pollution in South Tangerang City is a critical environmental issue that requires an accurate and transparent classification system. This study aims to develop an air quality classification model using a machine learning algorithm integrated with data balancing techniques and model interpretation methods. The methodology includes pre-processing Air Pollutant Standard Index (ISPU) data for the 2020–2022 period and grouping it into three categories: Good, Moderate, and Unhealthy. The dataset comprises 1,096 records. The Synthetic Minority Over-sampling Technique (SMOTE) is applied to handle class imbalance, and hyperparameters are optimized using GridSearchCV. The experimental results show that the Random Forest algorithm outperforms the baseline SVM and KNN models, achieving an accuracy of 0.81 and an F1-score of 0.75 after SMOTE and tuning. Explainable AI (XAI) analysis using SHAP reveals that sulfur dioxide (SO₂) is the most influential feature in the model's decisions and that it is spatially correlated with industrial activities and heavy transportation in the South Tangerang area. The final model was then deployed to the Hugging Face Spaces cloud platform through a Gradio interface to provide a publicly accessible classification service. This study demonstrates that integrating Random Forest and SHAP produces a classification system that is not only high-performing but also scientifically transparent, supporting air pollution mitigation efforts.
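To illustrate what the SMOTE step does to an imbalanced three-class dataset, here is a minimal from-scratch sketch using only the Python standard library. It is not the authors' code: the function name `smote`, the default `k=5`, the fixed `seed`, and the toy `[SO2, PM10]` rows are all hypothetical choices made for illustration. Each synthetic row is placed on the line segment between a minority-class row and one of its nearest same-class neighbours, and every class is topped up to the size of the largest class.

```python
import math
import random
from collections import Counter


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE sketch: oversample every minority class to the majority count.

    Each synthetic row is x + u * (nn - x), where x is a randomly chosen
    minority row, nn is one of its k nearest same-class neighbours
    (Euclidean distance) and u is drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res = [list(row) for row in X]  # original rows are kept unchanged
    y_res = list(y)

    for label, n in counts.items():
        members = [list(row) for row, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # majority class, or too few rows to interpolate between
        k_eff = min(k, len(members) - 1)

        # Precompute the indices of each member's k nearest same-class neighbours.
        neighbours = []
        for i, xi in enumerate(members):
            others = [j for j in range(len(members)) if j != i]
            others.sort(key=lambda j: math.dist(xi, members[j]))
            neighbours.append(others[:k_eff])

        # Generate synthetic rows until this class reaches the target count.
        for _ in range(target - n):
            i = rng.randrange(len(members))
            nn = members[rng.choice(neighbours[i])]
            u = rng.random()
            X_res.append([a + u * (b - a) for a, b in zip(members[i], nn)])
            y_res.append(label)

    return X_res, y_res


if __name__ == "__main__":
    # Hypothetical toy rows: [SO2, PM10] readings, not data from the study.
    X = [[10, 20], [12, 25], [11, 22], [14, 24],   # Good
         [30, 55], [34, 60], [32, 58],             # Moderate
         [70, 120], [75, 130]]                     # Unhealthy
    y = ["Good"] * 4 + ["Moderate"] * 3 + ["Unhealthy"] * 2
    X_res, y_res = smote(X, y, k=5, seed=42)
    print(sorted(Counter(y_res).items()))
    # [('Good', 4), ('Moderate', 4), ('Unhealthy', 4)]
```

In a scikit-learn workflow, a library implementation such as `imblearn.over_sampling.SMOTE` (via its `fit_resample` method) would typically be placed inside an `imblearn.pipeline.Pipeline` together with the classifier. That way `GridSearchCV` resamples only the training folds and synthetic rows never reach the validation data. The abstract does not state which implementation or pipeline layout the authors used.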
License
Copyright (c) 2026 Arip Kristiyanto, Hirawati Lubis

This work is licensed under a Creative Commons Attribution 4.0 International License.