Statistical and machine learning approaches for estimating pollution of fine particulate matter (PM2.5) in Vietnam
Abstract
This study aims to predict fine particulate matter (PM2.5) pollution in Ho Chi Minh City, Vietnam, using autoregressive integrated moving average (ARIMA), linear regression (LR), random forest (RF), long short-term memory (LSTM), bidirectional LSTM (Bi-LSTM), and a convolutional neural network (CNN) combined with Bi-LSTM (CNN+Bi-LSTM). Two experiments were set up. The first used data from 2018–2020 for training and data from 2021 for testing. The second used data from 2018–2021 for training and data from 2022 for testing. ARIMA showed the worst performance, while CNN+Bi-LSTM achieved the best accuracy, with an R² of 0.70, an MAE of 5.37 µg/m³, an MSE of 65.4 (µg/m³)², an RMSE of 8.08 µg/m³, and a MAPE of 29%. Additionally, the air quality indexes (AQIs) computed from predicted PM2.5 concentrations matched the observed ones in up to 96% of cases, which supports using predicted concentrations for AQI computation. Our study highlights the effectiveness of machine learning models in monitoring air pollution.
Keywords: PM2.5, machine learning, ARIMA, univariate time series, Ho Chi Minh City
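The evaluation setup summarized in the abstract (a split by calendar year, then MAE, MSE, RMSE, MAPE, and R² on the held-out year) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_by_year` and `regression_metrics` are hypothetical, the metric definitions are the standard textbook ones, and the sample values are made up for demonstration.

```python
import math


def split_by_year(records, test_year):
    """Split (year, value) pairs: earlier years train, test_year tests.

    Hypothetical helper mirroring the abstract's setup, e.g. test_year=2021
    trains on 2018-2020 and test_year=2022 trains on 2018-2021.
    """
    train = [value for year, value in records if year < test_year]
    test = [value for year, value in records if year == test_year]
    return train, test


def regression_metrics(observed, predicted):
    """Return MAE, MSE, RMSE, MAPE (in %) and R² for paired sequences.

    Assumes no observed value is zero (MAPE would be undefined) and that
    the observed values are not all identical (R² would be undefined).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Made-up concentrations in µg/m³, for illustration only.
records = [(2018, 21.0), (2019, 25.0), (2020, 19.0), (2021, 23.0)]
train, test = split_by_year(records, test_year=2021)

observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(observed, predicted)
```

Note that MAE and RMSE carry the unit of the concentrations (µg/m³), MSE carries its square, and MAPE and R² are unitless, which is why the metrics are reported with separate units above.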
This work is licensed under a Creative Commons Attribution 4.0 International License.