Water & Wastewater Treatment Magazine
Issue link: https://fhpublishing.uberflip.com/i/826791
18 | June 2017 | WWT | www.wwtonline.co.uk

The Works: Data

Predictive model helps monitor service reservoir bacteria risk

[Photo: Kevin Parry with the bacterial compliance predictive tool]

The man behind the Bacterial Compliance Model is Kevin Parry, Principal Statistician at Welsh Water, who began it in 2015 as part of a master's degree he was completing in operational research and applied statistics.

The first step in the project was a literature review and research exercise, to establish a list of all possible contributory factors to a bacterial compliance failure in service reservoirs (SRVs). The laboratory testing results themselves provided the most obvious data source, giving information on water temperature, bacterial colony counts and residual chlorine levels. However, other asset data, such as the source of the water used and the distance between the treatment works and the service reservoir, were also used, along with rainfall data. The further water travels between treatment works and the SRV, the more likely it is that chlorine efficacy diminishes; rainfall is relevant because bacteria may enter the water through structural issues, such as cracks in the roof of the SRV (ingress).

However, one area in which data was insufficient for inclusion was asset condition: while Welsh Water employees were carrying out regular condition checks every three years, the documents being filled out were not in a form that could be easily harvested for data analysis in the model. In response to this finding, the company has started using a new e-form with simple drop-down menus for such visits, the information from which is automatically fed into a database.

Before building the model, Parry used two stages of statistical analysis. The first was univariate testing, looking for direct relationships between individual factors and failures; the second was multivariate testing, where multiple factors are examined to see how they relate to each other.
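
As a rough illustration of the univariate stage, the sketch below compares failure rates either side of a threshold on a single factor, free chlorine residual. It is not Parry's method: the sample values, the mg/l unit, the 0.2 threshold and the function names are all assumptions made for the example.

```python
# Hypothetical sample records: (free chlorine residual in mg/l, failed?).
# These values are invented for illustration, not Welsh Water data.
samples = [
    (0.05, True), (0.08, True), (0.10, False), (0.12, True),
    (0.30, False), (0.35, False), (0.40, False), (0.45, False),
    (0.07, False), (0.50, False), (0.09, True), (0.60, False),
]


def failure_rate(records):
    """Share of records flagged as a compliance failure."""
    if not records:
        return 0.0
    return sum(1 for _, failed in records if failed) / len(records)


def univariate_split(records, threshold):
    """Compare failure rates either side of a threshold on one factor."""
    low = [r for r in records if r[0] < threshold]
    high = [r for r in records if r[0] >= threshold]
    return failure_rate(low), failure_rate(high)


# The 0.2 mg/l threshold is an assumed cut-off chosen for the example.
low_rate, high_rate = univariate_split(samples, threshold=0.2)
print(f"failure rate below 0.2 mg/l: {low_rate:.2f}")
print(f"failure rate at or above 0.2 mg/l: {high_rate:.2f}")
```

A large gap between the two rates would mark the factor as worth carrying into the multivariate stage; a formal significance test would normally replace this simple comparison.
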
The findings from both stages of analysis were displayed visually and sense-checked by operational colleagues before being taken forward for use in the predictive model.

"This was very much a collaborative effort where we worked closely with our operational teams; they had plenty of buy-in and were very interested in the results as they were coming out," says Parry. "They were in a position to say 'that makes sense' or 'that's a bit surprising' or whatever it may be. They were very involved in that process and I used data visualisation techniques to make the more complex results as understandable as possible for them, which benefited both parties."

The strongest factors that emerged as predictors of failure were the residual amount of free chlorine in the water and the total colony count of bacteria present. Rainfall, on the other hand, was found to be less significant.

For building the predictive element of the model, Parry had a choice of 12 different machine learning and sampling techniques. The chosen method was ultimately dictated by the imbalance of the dataset: Parry was able to draw on six years of data in total, running from 2009 to 2015, but the number of failures in this period was still relatively small. To remedy this imbalance, a method was used which created additional synthetic observations to add to the dataset, before a more standard classifier based on regression analysis was applied.

The resulting model gives a single metric for the risk of failure at each SRV, and was found to be able to predict failure with an accuracy of 70%. The next stage was to present this metric in an interactive tool which would be a spur to action for operational staff.
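
The rebalancing step described above can be sketched in a few lines. The article does not name the oversampling method or the classifier, so this example makes assumptions: it creates synthetic failures by interpolating between pairs of real ones (in the spirit of SMOTE-style oversampling) and then fits a plain logistic regression. All data values, function names and settings are hypothetical.

```python
import math
import random

# Hypothetical rows: ((free chlorine mg/l, colony count scaled 0-1), failed?).
# Invented for illustration; failures (label 1) are deliberately rare.
majority = [((0.30, 0.10), 0), ((0.35, 0.05), 0), ((0.40, 0.20), 0),
            ((0.45, 0.15), 0), ((0.50, 0.10), 0), ((0.55, 0.05), 0),
            ((0.60, 0.20), 0), ((0.38, 0.12), 0)]
minority = [((0.05, 0.80), 1), ((0.10, 0.70), 1), ((0.08, 0.90), 1)]


def oversample(rows, n_new, rng):
    """Create synthetic rows by interpolating between random pairs of rows."""
    synthetic = []
    for _ in range(n_new):
        (a, label), (b, _) = rng.sample(rows, 2)
        t = rng.random()  # position along the line segment from a to b
        point = tuple(x + t * (y - x) for x, y in zip(a, b))
        synthetic.append((point, label))
    return synthetic


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, epochs=2000, lr=0.5):
    """Fit logistic regression by plain batch gradient descent."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def risk_score(model, x):
    """Single risk metric between 0 and 1 for one service reservoir."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
synthetic = oversample(minority, len(majority) - len(minority), rng)
balanced = majority + minority + synthetic
model = fit_logistic(balanced)

print(round(risk_score(model, (0.06, 0.85)), 2))  # low chlorine, high count
print(round(risk_score(model, (0.50, 0.10)), 2))  # high chlorine, low count
```

Oversampling only the training data, and leaving any held-out test data untouched, keeps the reported accuracy honest; that detail is a general modelling practice rather than something the article states.
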
The tool took the form of a 'dashboard' with the level of risk at each service reservoir represented as red, yellow or green; operational colleagues can see instantly where the points of greatest risk are by glancing at the colours on a map, and can also pull up a list of the five SRVs that present the highest risk in any geographical area.

"The advantage of using a single score is that our operational colleagues don't have to look through lots of different reports, graphs and tables to make the decision about where the highest risk is," concludes Parry. "It means that they can spend less time looking at data analysis and more time carrying out the remedial works that might be required."

THE USER

"The bacti-predictor model has allowed my colleagues and I in North East Wales to take a strategic and scientific approach to managing service reservoir actions and to track performance across distribution areas. The tool itself is easy to use and gives us a great overview of data which otherwise would be spread over several different locations. I can compare sites and review several years' worth of data."

Rosie Winter, Water Quality Scientist, Dwr Cymru Welsh Water

18 | June 2017 | WWT | www.wwtonline.co.uk
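
The dashboard logic described in the article, a colour band per reservoir plus a top-five list per area, could be sketched as follows. The reservoir names, areas, scores and band thresholds are all hypothetical; the article does not give the real cut-offs.

```python
# Hypothetical risk scores: (reservoir name, area, risk between 0 and 1).
# Names, areas and scores are invented for illustration.
reservoirs = [
    ("SRV A", "North East", 0.82), ("SRV B", "North East", 0.35),
    ("SRV C", "North East", 0.55), ("SRV D", "North East", 0.10),
    ("SRV E", "North East", 0.71), ("SRV F", "North East", 0.48),
    ("SRV G", "South West", 0.90), ("SRV H", "South West", 0.20),
]

# Assumed band thresholds; the real dashboard's cut-offs are not published.
RED_FROM = 0.7
YELLOW_FROM = 0.4


def band(risk):
    """Map a single risk score to a red, yellow or green band."""
    if risk >= RED_FROM:
        return "red"
    if risk >= YELLOW_FROM:
        return "yellow"
    return "green"


def top_risks(rows, area, n=5):
    """Return the n highest-risk reservoirs in one geographical area."""
    in_area = [r for r in rows if r[1] == area]
    return sorted(in_area, key=lambda r: r[2], reverse=True)[:n]


for name, _, risk in top_risks(reservoirs, "North East"):
    print(f"{name}: {risk:.2f} ({band(risk)})")
```
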