Why—and how—utilities are turning to multicloud architectures
Quick summary: Two key areas where utilities are shifting to multicloud configurations and some of the approaches that are helping them make the leap.
As utilities adapt to a rapidly changing environment, accelerating the path to digital transformation has become a top priority. Shifting customer expectations, new competitive pressures, evolving demand patterns, and new sources of data are creating an urgent need for effective technology solutions. While the digital demands stack up, so does the need for computing and data storage resources that can also help safeguard the reliability of vital utility services. So it should come as no surprise that many utilities are investing heavily in the cloud, particularly in multicloud architectures. Two areas that are clearly benefiting from multicloud architectures are emergency measures related to public safety power shutoffs (PSPS) and the emerging “smart grid.”
High availability for emergency-shutoff situations
When disasters such as wildfires, tornadoes, hurricanes, or earthquakes threaten a region, utilities are faced with life-and-death decisions that impact millions of people. For the applications supporting these decisions and the actions that follow, failure is simply not an option. Multicloud architectures offer the high availability needed to keep these resources available 24 hours a day, seven days a week, whenever a crisis arises.
In 2021, over 8,800 wildfires raged through more than 2.5 million acres of California terrain, damaging thousands of structures and impacting millions of people. When wildfire conditions threaten electricity infrastructure, utilities may be forced to order a PSPS to ensure the safety of their communities. Shutdown decisions and the communications surrounding them rely on an array of applications. Entrusting these apps to a single cloud provider (or to on-premises resources) leaves the utility vulnerable to potential failures at a moment of crisis, and the results could be disastrous.
Supporting the smart grid
The Department of Energy defines the smart grid as “a new kind of electric grid, one that is built from the bottom up to handle the groundswell of digital and computerized equipment and technology dependent on it—and one that can automate and manage the increasing complexity and needs of electricity in the 21st Century.” A new element of urgency was added to this development when the recently passed Infrastructure Investment and Jobs Act called for $65 billion to upgrade the nation’s electrical transmission infrastructure, including the implementation of smart technologies. The smart grid represents a huge leap forward in a variety of ways, including
•Two-way flows of electricity and data between utility companies and distributed energy resources, which have created a new class of “prosumers,” consumers who are also producers due to the decentralization of energy generation (microgrids)
•Grid self-healing, which leverages data for fault identification and enables predictive maintenance based on sensor data
•Smart metering leveraging sensors, internet-of-things (IoT) architectures, and edge computing—and the new cybersecurity measures needed to protect them
Multicloud architectures offer the flexible, scalable, and reliable data storage and computing power utilities need to handle these and many other smart-grid capabilities.
Building blocks for making the leap
When utilities decide to implement a multicloud architecture, the “how” can be as critical as the “why” in delivering the desired results. When we work with utility clients on multicloud implementation, we focus on proven best practices for getting the greatest benefit from their investment, including the following.
DevOps and container-ready applications
By developing a DevOps pipeline and refactoring applications to be container-ready, utilities can help ensure seamless releases across cloud environments. Teams must also be trained on why and how these changes are being made so that they can support them over the long term. When we work with clients, we provide runbooks and architecture artifacts to help with support—particularly with respect to Day 2 operations, in which reliability is the key strategic driver.
GitOps is an operational framework that applies DevOps best practices to infrastructure automation. GitOps practices include
•Infrastructure as code (IaC), which helps developers reduce or eliminate manual steps, version-control changes to infrastructure, and create automated processes for code reviews, testing, and deployment
•Merge requests, which serve as the change mechanism for infrastructure updates, giving teams a place to collaborate and record formal approvals
•Continuous integration/continuous delivery (CI/CD) pipelines, which automate infrastructure updates by applying changes to the environment when new code is merged
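To make the GitOps idea concrete, here is a minimal sketch of the comparison a pipeline might run when new code is merged: the state declared in the repository is compared with the live environment, and the differences become the changes to apply. The function names (`plan`, `apply`) and the resource names are hypothetical and chosen only for illustration; real IaC tools do considerably more.

```python
def plan(desired: dict, actual: dict) -> list:
    """Compare the state declared in Git with the live state and list the changes."""
    changes = []
    for name, config in sorted(desired.items()):
        if name not in actual:
            changes.append(("create", name))
        elif actual[name] != config:
            changes.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            changes.append(("delete", name))
    return changes


def apply(desired: dict, actual: dict) -> dict:
    """Return a new live state that matches the declared state."""
    result = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result


# Hypothetical declared state (what the merged code says should exist).
desired_state = {
    "outage-map-api": {"replicas": 3, "cloud": "provider-a"},
    "notification-queue": {"replicas": 2, "cloud": "provider-b"},
}
# Hypothetical live state (what is actually running today).
live_state = {
    "outage-map-api": {"replicas": 2, "cloud": "provider-a"},
    "legacy-batch-job": {"replicas": 1, "cloud": "provider-a"},
}

changes = plan(desired_state, live_state)
new_state = apply(desired_state, live_state)
```

Because the plan is derived entirely from version-controlled declarations, running it again after the changes are applied produces an empty list, which is what makes the update repeatable and reviewable.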
Reducing single points of failure
One of the key advantages of a multicloud architecture is the ability to develop robust failover strategies, so that if a failure occurs anywhere in the infrastructure, a backup resource is available. When utilities configure a multicloud architecture, two considerations should come into play:
•Provider diversity: When content delivery network Cloudflare experienced an outage in June 2022, the websites of Fitbit, Discord, Peloton, and other high-profile brands were disrupted—most likely due to simple human error. Multicloud enables utilities to heighten reliability and security by avoiding over-reliance on individual vendors and enabling automated failovers if an outage does occur. Another important issue to consider is shared core infrastructure between CSPs: if you opt for two clouds that share the same backbone, you gain limited benefits from a reliability standpoint.
•Regional diversity: Natural disasters such as wildfires, hurricanes, and tornadoes can impact extensive geographical areas. By ensuring that their cloud architecture spans multiple regions, utilities can help ensure that regional disruptions won’t impact the continuity of their services.
When we work with utilities on designing multicloud configurations for reliability, one thing we determine early in the process is whether to adopt an “active-active” or “active-passive” approach.
•An active-active configuration makes all resources available for a majority of the time.
•In an active-passive configuration, a select group of primary resources is available for a majority of the time, while secondary resources are placed on standby in case one or more primary resources become unavailable.
The active-passive approach can save on costs, but it can also leave gaps in reliability if secondary resources are not constantly being tested and maintained. When we implement active-passive configurations for clients, we ensure that with every release or other regular activity, we flip active resources to passive and vice versa to help ensure that all resources are ready for action.
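The role flip described above can be illustrated with a small sketch. It models one active and one passive environment, swaps them on each release, and fails over automatically if the active side is unhealthy. The class name, the environment names, and the in-memory health flags are assumptions made for illustration; an actual setup would rely on real health checks and traffic routing.

```python
class ActivePassivePair:
    """Hypothetical model of two environments in an active-passive setup."""

    def __init__(self, primary: str, secondary: str):
        self.active = primary
        self.passive = secondary
        self.healthy = {primary: True, secondary: True}

    def flip(self) -> None:
        """Swap roles, for example as part of every release."""
        self.active, self.passive = self.passive, self.active

    def mark_down(self, name: str) -> None:
        """Record that an environment failed its health check."""
        self.healthy[name] = False

    def route(self) -> str:
        """Return the environment that should serve traffic right now."""
        if self.healthy[self.active]:
            return self.active
        if self.healthy[self.passive]:
            self.flip()  # fail over to the standby environment
            return self.active
        raise RuntimeError("no healthy environment available")


pair = ActivePassivePair("cloud-a-west", "cloud-b-east")
served_by = [pair.route()]       # normal operation
pair.flip()                      # a release swaps the roles
served_by.append(pair.route())   # the former standby now serves traffic
pair.mark_down("cloud-b-east")   # simulated outage on the active side
served_by.append(pair.route())   # automatic failover back to the other cloud
```

Flipping on every release means the standby environment regularly carries real traffic, so a failover never depends on resources that have gone untested.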
Another strategy we employ in locating and addressing single points of failure is chaos engineering, which involves going through the system and simulating catastrophic events to assess the impact on reliability. If one of these simulations would cause an outage or other disruption, we know we’ve found a problem that needs to be addressed.
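As a simplified illustration of this idea, the sketch below removes one hosting location at a time from a model of the system and checks whether every application would still have somewhere to run. It works on a made-up inventory rather than injecting faults into live infrastructure, and the application and location names are hypothetical.

```python
def is_available(deployments: dict, failed: set) -> bool:
    """The system is up if every application still has one surviving location."""
    return all(
        any(location not in failed for location in locations)
        for locations in deployments.values()
    )


def find_single_points_of_failure(deployments: dict) -> list:
    """Simulate the loss of each location in turn and report those that cause an outage."""
    all_locations = sorted({loc for locs in deployments.values() for loc in locs})
    return [loc for loc in all_locations if not is_available(deployments, {loc})]


# Hypothetical inventory: application -> locations where it is deployed.
deployments = {
    "shutoff-decision-app": ["cloud-a/us-west", "cloud-b/us-east"],
    "customer-notifications": ["cloud-a/us-west", "cloud-b/us-east"],
    "weather-data-feed": ["cloud-a/us-west"],
}

weak_points = find_single_points_of_failure(deployments)

# After deploying the data feed to a second cloud, the simulation finds no weak point.
deployments["weather-data-feed"].append("cloud-b/us-east")
weak_points_after_fix = find_single_points_of_failure(deployments)
```

Here the first run flags `cloud-a/us-west`, because the weather data feed runs nowhere else; once a second location is added, the same simulation comes back clean.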
Cloud-agnostic app development
In a true multicloud environment, every app must be capable of running on every cloud platform, which requires a cloud-agnostic approach to app development. Cloud agnosticism emphasizes cross-cloud portability and functionality, with the goal of applications that can run seamlessly in any cloud environment, using a mix of vendor tools and open-source solutions. While cloud-agnostic development requires more work to build and integrate than cloud-native approaches, it enables utilities to more fully realize the advantages that multicloud offers, including
•Flexibility: Development teams are not wholly reliant on vendors to update their proprietary tools; they have the option of leveraging open-source tools, which tend to evolve more consistently.
•Scalability: As cloud-agnostic apps and services move across cloud platforms, they can quickly scale up to accommodate changes in demand.
•Reliability: In the event of a failure (or other reason for a platform going offline, such as routine maintenance), cloud-agnostic apps can quickly and seamlessly switch to another platform, enabling high-speed redundancy.
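One common way to pursue cloud agnosticism is to have application code depend on a neutral interface, with a thin adapter per provider behind it. The sketch below assumes a hypothetical `ObjectStore` interface and uses in-memory stand-ins in place of real vendor SDKs, so it shows the pattern only, not any provider's actual API.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Provider-neutral interface the application codes against (hypothetical)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(ObjectStore):
    """Stand-in for a provider adapter; a real one would wrap a vendor SDK."""

    def __init__(self, name: str):
        self.name = name
        self.available = True
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is offline")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ConnectionError(f"{self.name} is offline")
        return self._objects[key]


class FailoverStore(ObjectStore):
    """Writes to every reachable provider and reads from the first one that answers."""

    def __init__(self, stores: list):
        self.stores = stores

    def put(self, key: str, data: bytes) -> None:
        written = 0
        for store in self.stores:
            try:
                store.put(key, data)
                written += 1
            except ConnectionError:
                continue
        if written == 0:
            raise ConnectionError("no provider accepted the write")

    def get(self, key: str) -> bytes:
        for store in self.stores:
            try:
                return store.get(key)
            except (ConnectionError, KeyError):
                continue
        raise KeyError(key)


cloud_a = InMemoryStore("provider-a")
cloud_b = InMemoryStore("provider-b")
store = FailoverStore([cloud_a, cloud_b])

store.put("psps/notice-001", b"Shutoff scheduled")
cloud_a.available = False                  # simulated outage or maintenance window
notice = store.get("psps/notice-001")      # served from the second provider
```

The application only ever sees `ObjectStore`, so adding or replacing a provider means writing one new adapter rather than changing application code. The trade-off, as noted above, is the extra work of building and maintaining that abstraction.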
How multicloud is powering the future
As utilities evolve to meet the demands of an ever-shifting environment, the need for advanced digital solutions will only increase. By implementing multicloud architectures, built on proven best practices, utility companies can build a foundation for the solutions that will let them deal with emergencies reliably, meet the demands of the emerging smart grid, and face the future with confidence.
Lionel Bodin is a Senior Director in the Logic20/20 Digital Transformation Practice. He manages highly complex, multi-faceted digital programs related to CRM systems, cloud and on-premise implementations, big data, and more.
Mike Ashby is a Principal Architect in the Logic20/20 Digital Transformation Practice. He's experienced in cross-region, multi-team, real-time streaming, and cloud platform development and in providing technical leadership across cloud, microservices, web, security, performance, and DevOps.