- Title
Comparative Analysis of Supervised Classification Algorithms for Residential Water End Uses.
- Authors
Heydari, Zahra; Stillwell, Ashlynn S.
- Abstract
Water sustainability in the built environment requires an accurate estimation of residential water end uses (e.g., showers, toilets, faucets, etc.). In this study, we evaluate the performance of four models (Random Forest, RF; Support Vector Machines, SVM; Logistic Regression, Log‐reg; and Neural Networks, NN) for residential water end‐use classification using actual (measured) and synthetic labeled data sets. We generated synthetic labeled data using Conditional Tabular Generative Adversarial Networks. We then utilized grid search to train each model on their respective optimized hyperparameters. The RF model exhibited the best model performance overall, while the Log‐reg model had the shortest execution times under different balanced and imbalanced (based on number of events per class) synthetic data scenarios, demonstrating a computationally efficient alternative for RF for specific end uses. The NN model exhibited high performance with the tradeoff of longer execution times compared to the other classification models. In the balanced data set scenario, all models achieved closely aligned F1‐scores, ranging from 0.83 to 0.90. However, when faced with imbalanced data reflective of actual conditions, both the SVM and Log‐reg models showed inferior performance compared to the RF and NN models. Overall, we concluded that decision tree‐based models emerge as the optimal choice for classification tasks in the context of water end‐use data. Our study advances residential smart water metering systems through creating synthetic labeled end‐use data and providing insight into the strengths and weaknesses of various supervised machine learning classifiers for end‐use identification. Plain Language Summary: We looked at how well different computer models can tell apart types of water use in homes, like identifying when someone is taking a shower versus flushing a toilet. We used real water meter data and also created fake, but realistic, water use data to test these models. 
Among the models we tested, the Random Forest model (a method that uses a collection of decisions to make predictions) was the most accurate. However, the Logistic Regression model, another type of model we tested, was faster in analyzing the data, making it a good option for quickly identifying specific water uses without needing as much computer power. We also found that all the models we tested were close in detecting what type of water use event had occurred when the data were evenly distributed across different types of water use. But, when the data were uneven, more like real‐life situations, the Random Forest and Neural Network models were better than the others. This research helps improve systems that monitor how humans use water in homes, making it easier to identify where water is used and how we can save more of it, contributing to more sustainable living environments.
Key Points: We generated synthetic labeled data representing residential water end uses with a Conditional Tabular Generative Adversarial Network. In the context of water end‐use data, decision tree‐based models emerge as the optimal choice for classification tasks. Logistic function‐based models, such as logistic regression, are computationally efficient alternatives for classifying specific end uses.
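The abstract compares models by F1‐score under balanced and imbalanced class scenarios. As a rough illustration of why imbalance matters for that metric, the sketch below computes per‐class F1 and an unweighted (macro) average in plain Python. The labels and counts are hypothetical toy values, not data from the study, and the record does not state which F1 averaging the authors used, so macro averaging is an assumption here.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the truth but was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical imbalanced toy set: 8 toilet events, 2 shower events,
# and a classifier that always predicts the majority class.
y_true = ["toilet"] * 8 + ["shower"] * 2
y_pred = ["toilet"] * 10

scores = per_class_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(scores, macro_f1(y_true, y_pred), accuracy)
```

In this toy case the overall accuracy looks acceptable while the minority class scores an F1 of zero, which is the kind of gap that per‐class evaluation on imbalanced end‐use data is meant to expose.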
- Subjects
CLASSIFICATION algorithms; WATER use; SUPERVISED learning; RESIDENTIAL water consumption; DECISION trees; GENERATIVE adversarial networks; ARTIFICIAL neural networks; SUPPORT vector machines
- Publication
Water Resources Research, 2024, Vol 60, Issue 6, p1
- ISSN
0043-1397
- Publication type
Article
- DOI
10.1029/2023WR036690