Automatic monitoring of treated water released from wastewater treatment plants using model-based clustering with density estimation
Abstract
Reusing the treated water released from wastewater treatment plants (WWTPs) is one of the most promising ways to counter the threat of water scarcity. This paper proposes an integrated approach for continuously evaluating WWTP performance, focusing on assessing the quality of treated wastewater and, depending on its pollution level, on reusing it for beneficial purposes such as irrigation, aquariums, groundwater recharge, and discharge into rivers. We implement model-based clustering with density estimation to generate non-overlapping clusters and to categorize them. Cluster analysis using the Euclidean distance yielded three clusters, each labeled with a water pollution category: non-polluted, lightly (slightly) polluted, or highly polluted. Standard clustering algorithms such as K-means and hierarchical clustering produce clusters that are optimized in statistical terms but deviate from the naturally occurring categories; model-based clustering with density estimation instead assumes that each data object originates from a mixture of underlying probability distributions. Water quality parameters such as suspended solids (SS) are considered in the analysis. Our experimental results conclusively show the pollution levels of the wastewater from the WWTP under the model-based clustering approach. The dataset used in this work comes from the wastewater treatment plant in Manresa, a town of 100,000 inhabitants near Barcelona (Catalonia). The plant treats a flow of 35,000 m³/day, mainly domestic wastewater, although it also receives wastewater from industries located within the town. This research considers the plant's behavior over 527 days. The model-based density clustering algorithm discovers 3 clusters, with half lying in the size range of 14–89 and a maximum size of 352.
Based on the natural clusters generated, our results show that on 352 out of 445 days the treated water is almost non-polluted. This allows us to assess the performance of the wastewater treatment plant.
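The mixture assumption described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a univariate Gaussian mixture fitted by expectation-maximization on a single parameter (suspended solids), and the function names (`fit_gmm_1d`, `assign_cluster`) and the `ss_values` readings are hypothetical, made up purely for illustration rather than taken from the Manresa dataset.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def component_densities(x, weights, means, variances):
    """Weighted density of x under each mixture component."""
    return [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]


def fit_gmm_1d(values, k, iterations=200, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    n = len(values)
    lo, hi = min(values), max(values)
    # Start the means evenly spaced across the observed range.
    means = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    grand_mean = sum(values) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in values) / n, min_var)
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in values:
            dens = component_densities(x, weights, means, variances)
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(spread, min_var)  # floor avoids a collapsed component
    return weights, means, variances


def assign_cluster(x, weights, means, variances):
    """Index of the component with the highest weighted density at x."""
    dens = component_densities(x, weights, means, variances)
    return max(range(len(dens)), key=lambda j: dens[j])


# Hypothetical suspended-solids readings (mg/L), one per day; NOT the plant's data.
ss_values = [8, 9, 10, 11, 12] * 24 + [35, 38, 40, 42, 45] * 8 + [80, 85, 90, 95, 100] * 4

weights, means, variances = fit_gmm_1d(ss_values, k=3)
labels = [assign_cluster(x, weights, means, variances) for x in ss_values]
cluster_sizes = [labels.count(j) for j in range(3)]
print("component means:", [round(m, 2) for m in means])
print("cluster sizes:", cluster_sizes)
```

Each day is assigned to the component under which its reading is most probable, so the cluster sizes give the number of days falling into each pollution category. A full analysis would work with several water quality parameters at once and would choose the number of components with a model selection criterion instead of fixing it at three.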
Keywords: clustering, density estimation, pollution, water quality
This work is licensed under a Creative Commons Attribution 4.0 International License.