Utility Week - authoritative, impartial and essential reading for senior people within utilities, regulators and government
Issue link: https://fhpublishing.uberflip.com/i/1138546
UTILITY WEEK | 5TH - 11TH JULY 2019 | 23
Operations & Assets

Data lakes versus data warehouses

The first and crucial foundational step towards the creation of a data-driven utility is to develop a plan for collecting, managing and exploiting existing data within the enterprise into what is often called a "data lake."

The graphic below identifies one of the critical problems with the classical data warehouse approaches, especially as it relates to taking full advantage of data-driven AI and machine learning opportunities. Typically they require that data be organised, normalised (often resulting in some kinds of data loss) and put into predefined schemas before being stored in the data warehouse. This is an efficient and reasonable approach, however, only in a world where (a) the kinds and characteristics of the incoming data are well known, and (b) all of the potential uses of the data are known and understood beforehand.

However, when data-driven AI or machine learning applications become critical drivers of value, it is of paramount importance to have highly efficient, flexible ways of storing large amounts of multi-modal data in its original, unprocessed format. With this kind of architecture, maximum flexibility is maintained for defining the new types of data selection and pre-processing that will be necessary as new uses and applications for the data are discovered and developed.

For a utility, the data lake should begin by ingesting various kinds of existing historical information and prioritising those collections of data that are likely, when analysed with machine learning algorithms, to make predictions tied to achieving specific, agreed-upon business objectives. Initially, this can be existing data, used to develop applications such as the predictive analytics for asset maintenance that we describe first below.
But ultimately many different multi-modal types of data could go into the data lake, from customer-centric sentiment data such as Twitter data, to high-frequency time-series data such as voltage levels from transmission lines, to high-density image data from satellite images, drones and Lidar.

This is an extract from Cambridge Technology's white paper AI and the Data-Driven Utility. The paper includes further thoughts on AI and machine learning use cases for utilities and on approaches to organising data to exploit these opportunities fully. Read the full report here: http://bit.ly/AIandDDUtility

within the organisation and relevant external sources. (See right for more.)

With such structures in place, a key next-step challenge for optimising AI outputs was identified: finding ways to "publish" useful algorithms back to the business in a way that is trusted and facilitates uptake in business as usual.

Don't underestimate AI training needs

The need to train AI is often hugely underestimated. To create a reliable and useful "digital person" (as one participant described AI programs), a large quantity of high-quality data and significant human "teaching" time are required to establish smooth-running "neural networks". Gathering and preparing suitable data for training programmes can be challenging. For some AI use cases, suitable data sets may not yet exist, or may not be accessible to utilities. In future, advanced machine learning capability will be able to expedite this task but, for now, such AI maturity remains the domain of technology blue chips.

Align with strategic objectives

AI value will always remain limited unless it can be linked directly to an organisational vision and strategy, attendees agreed. This remains a key challenge for most utilities, where even the most advanced could be predominantly defined as having an "initiative-based" approach to using the technology.
A key challenge, therefore, for almost all utilities is to establish an enterprise data model that is specifically designed to support key strategic objectives. For some water companies at our roundtable, the approach of the next asset management plan period, AMP7, has been leveraged as a driver for this next stage of data maturity, and collaboration is under way with enterprise architecture teams to ensure data lake information is able to "talk" to core business systems.

Copyright © 2019 Cambridge Technology. All Rights Reserved.

1. Organizing and Exploiting Existing Data Sources

The Data Lake as a Key Piece to Enable the Data-driven Utility

The first and crucial foundational step towards the creation of a data-driven utility is to develop a plan for collecting, managing and exploiting existing data within the enterprise into what is often called a "data lake."

The figure below identifies one of the critical problems with the classical data warehouse (DWH) approaches, especially as it relates to taking full advantage of data-driven AI / ML opportunities. Typically they require that data be organized, normalized (often resulting in some kinds of data loss) and put into pre-defined schemas before being stored in the DWH. This is an efficient and reasonable approach, however, only in a world where (a) the kinds and characteristics of the incoming data are well known, and (b) all of the potential uses of the data are known and understood beforehand. However, when data-driven AI / ML applications become critical drivers of value, it is of paramount importance to have highly efficient, flexible ways of storing large amounts of multi-modal data in its original, unprocessed format.
With this kind of architecture, maximum flexibility is maintained for defining the new types of data selection and preprocessing that will be necessary as new uses and applications for the data are discovered and developed in the future.

Figure 1: A data lake future-proofs data for AI / ML applications versus a data warehouse.
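The contrast described above, commonly called schema-on-write (warehouse) versus schema-on-read (lake), can be illustrated with a short Python sketch. It is a minimal, hypothetical example, not Cambridge Technology's method: the sensor records, their field names (`asset_id`, `ts`, `voltage_kv`, `waveform`) and the functions `load_into_warehouse`, `load_into_lake` and `peak_waveform_amplitude` are all invented for illustration, and rounding voltages to whole kV simply stands in for the kind of normalisation that loses detail.

```python
import json
from dataclasses import dataclass

# Hypothetical raw sensor messages, kept exactly as received.
# Sources differ: TX-202 sends no waveform samples at all.
RAW_EVENTS = [
    '{"asset_id": "TX-101", "ts": "2019-07-05T10:00:00", "voltage_kv": 132.4, "waveform": [0.1, 0.3, -0.2]}',
    '{"asset_id": "TX-101", "ts": "2019-07-05T10:05:00", "voltage_kv": 131.9, "waveform": [0.2, 0.9, -0.8]}',
    '{"asset_id": "TX-202", "ts": "2019-07-05T10:00:00", "voltage_kv": 33.1}',
]


@dataclass(frozen=True)
class WarehouseRow:
    """Hypothetical pre-defined warehouse schema: unlisted fields are dropped on load."""
    asset_id: str
    ts: str
    voltage_kv: int  # normalised to whole kV, so precision is lost


def load_into_warehouse(raw_lines):
    """Schema-on-write: parse, normalise and keep only the predefined columns."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append(WarehouseRow(
            asset_id=record["asset_id"],
            ts=record["ts"],
            voltage_kv=round(record["voltage_kv"]),
        ))
    return rows


def load_into_lake(raw_lines):
    """Schema-on-read: store every record in its original, unprocessed form."""
    return list(raw_lines)


def peak_waveform_amplitude(lake):
    """A use case defined after collection: structure is applied only at read time."""
    peaks = {}
    for line in lake:
        record = json.loads(line)
        samples = record.get("waveform")
        if not samples:  # skip sources that never sent waveform data
            continue
        peak = max(abs(s) for s in samples)
        asset = record["asset_id"]
        peaks[asset] = max(peaks.get(asset, 0.0), peak)
    return peaks


warehouse = load_into_warehouse(RAW_EVENTS)
lake = load_into_lake(RAW_EVENTS)

print(warehouse[0])                   # WarehouseRow(asset_id='TX-101', ts='2019-07-05T10:00:00', voltage_kv=132)
print(peak_waveform_amplitude(lake))  # {'TX-101': 0.9}
```

Because the lake keeps the raw records, the waveform analysis can be defined after the data has been collected; the warehouse rows no longer hold the waveform samples or the original voltage precision, so the same question cannot be answered from them. A real data lake would sit on scalable storage rather than an in-memory list, which is used here only to keep the sketch self-contained.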