Automatic monitoring of treated water released from wastewater treatment plants using model-based clustering with density estimation

Sheik Faritha Begum; K Lokeshwaran

doi:10.3846/jeelm.2025.22953

Abstract

Reusing the treated water released from wastewater treatment plants (WWTPs) is one of the most promising responses to the threat of water scarcity. This paper proposes an integrated approach for continuously evaluating WWTP performance, focusing on assessing the quality of treated wastewater and on its reuse for beneficial purposes such as irrigation, aquariums, groundwater recharge, and discharge into rivers, depending on the pollution level of the treated water. The paper implements model-based clustering with density estimation to generate non-overlapping clusters and categorize them. Cluster analysis using the Euclidean distance resulted in three clusters, each labeled with a pollution category: non-polluted, lightly (slightly) polluted, or highly polluted. Standard clustering algorithms such as K-means and hierarchical clustering produce clusters that are optimized in statistical terms but can deviate from the naturally occurring categories; model-based clustering with density estimation instead assumes that each data object originates from a mixture of underlying probability distributions. Water quality parameters such as suspended solids (SS) were considered in the analysis. The experimental results show the pollution levels of the treated wastewater from the WWTP, as identified by the model-based clustering approach. The dataset used in this work comes from the wastewater treatment plant in Manresa, a town of 100,000 inhabitants near Barcelona (Catalonia). The plant treats a flow of 35,000 m³/day, mainly domestic wastewater, although it also receives wastewater from industries located inside the town. This research considers the plant's behavior over 527 days. The model-based density clustering algorithm discovers 3 clusters, with half lying in the size range of 14–89 and a maximum size of 352.
Based on the natural clusters generated, the results show that on 352 of 445 days the treated water is almost non-polluted. This allows the performance of the wastewater treatment plant to be assessed.
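
The idea of model-based clustering described in the abstract, where each observation is assumed to come from a mixture of probability distributions, can be illustrated with a minimal sketch. It is not the authors' implementation: it fits one-dimensional Gaussian mixtures by expectation-maximization on synthetic suspended-solids values (not the Manresa dataset), and it picks the number of components with the Bayesian information criterion (BIC), which is an assumption here because the abstract does not state the selection criterion. The function names, group sizes and parameter values are all hypothetical.

```python
import math
import random

def normal_pdf(v, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(x, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture by EM; return (weights, means, variances)."""
    n = len(x)
    lo, hi = min(x), max(x)
    mean_all = sum(x) / n
    var_all = sum((v - mean_all) ** 2 for v in x) / n
    # Deterministic start: means spread evenly over the data range.
    means = [mean_all] if k == 1 else [lo + j * (hi - lo) / (k - 1) for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for v in x:
            dens = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances

def bic_score(x, weights, means, variances):
    """BIC in the lower-is-better form: -2 * log-likelihood + parameters * ln(n)."""
    loglik = 0.0
    for v in x:
        p = sum(w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, variances))
        loglik += math.log(max(p, 1e-300))
    n_params = 3 * len(weights) - 1  # k means, k variances, k - 1 free weights
    return -2.0 * loglik + n_params * math.log(len(x))

# Synthetic SS values (mg/L) for three hypothetical pollution levels.
rng = random.Random(42)
ss = ([rng.gauss(15, 3) for _ in range(300)]
      + [rng.gauss(40, 5) for _ in range(80)]
      + [rng.gauss(90, 8) for _ in range(20)])

# Fit mixtures with 1 to 4 components and keep the one with the lowest BIC.
fits = {k: fit_gmm_1d(ss, k) for k in range(1, 5)}
bic_by_k = {k: bic_score(ss, *fits[k]) for k in fits}
best_k = min(bic_by_k, key=bic_by_k.get)
weights, means, variances = fits[best_k]

# Assign each day to the component with the highest posterior probability.
labels = []
for v in ss:
    post = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, variances)]
    labels.append(post.index(max(post)))

print("components selected by BIC:", best_k)
print("cluster sizes:", [labels.count(c) for c in range(best_k)])
```

In this sketch the cluster count is chosen by the data through the criterion rather than fixed in advance, which is the practical difference from K-means that the abstract points to. A multivariate analysis over several water quality parameters would need full covariance matrices, which this one-dimensional example omits.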

Keywords: clustering, density estimation, pollution, water quality

How to Cite

Begum, S. F., & Lokeshwaran, K. (2025). Automatic monitoring of treated water released from wastewater treatment plants using model-based clustering with density estimation. Journal of Environmental Engineering and Landscape Management, 33(1), 110–117. https://doi.org/10.3846/jeelm.2025.22953

Published in Issue

Feb 7, 2025

This work is licensed under a Creative Commons Attribution 4.0 International License.
