CrowdStrike, Azure outages cause widespread chaos in the digital supply chain

Today was a hectic day for the digital supply chain. Over the last 12 hours, two unrelated major outages have caused widespread chaos, impacting thousands of users around the world:

     1. Microsoft Azure Central US Regional Cloud Outage

     2. CrowdStrike Flagship Falcon Product Outage

Despite what reports may say, the two incidents are not connected.

Parametrix has developed a proprietary global monitoring system that tracks the performance and availability of all SaaS, PaaS, and IaaS services. Here’s a breakdown of our monitoring and analysis:

CrowdStrike Outage (Global)

CrowdStrike, a cybersecurity company that provides endpoint protection, introduced a bug in a recent update to its flagship Falcon product. The update caused machines running Windows, whether on-premises or in the cloud, to crash with a Blue Screen of Death (BSOD). CrowdStrike has released a manual fix for this issue and is in the process of rolling back the software update.

The downstream effects of this outage have been felt by companies and organizations across sectors worldwide. Reports of downtime have come from the aviation and travel industry, impacting United Airlines and Delta Air Lines; the healthcare sector, affecting Epic Systems, Harris Health System, and England’s National Health Service; and the financial services industry, disrupting operations at Charles Schwab and at Bradesco in Brazil. Government and public services were also affected, including 911 services in several US states and the Georgia Department of Driver Services. In retail and pharmaceuticals, companies like CVS Health and Zara experienced interruptions, while in technology and internet services, Visa, Amazon, ADT Security, and Microsoft 365 faced significant disruptions.

Microsoft Azure Central US Outage (Regional)

Parametrix's Cloud Monitoring System (PCMS) detected an outage in Azure’s Central US (Iowa) cloud region lasting 4 hours and 16 minutes. The event began on July 18 at 21:56 UTC and was resolved on July 19 at 02:12 UTC. The root cause was a configuration change that blocked backend access between a subset of Azure Storage clusters and Compute resources in the region. As a result, Compute resources restarted automatically when they lost connectivity to virtual disks hosted on the impacted storage resources. The outage affected the major US airline Frontier, preventing passengers from checking in, booking flights, or accessing flight information.
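
For readers who want to check the reported duration, here is a minimal Python sketch of the arithmetic. It is an illustration only and not part of PCMS: the function name outage_duration is hypothetical, and the code assumes the 2:12 UTC resolution time falls on July 19, the day after the outage began.

```python
from datetime import datetime, timezone

# Start time as reported; the end date (July 19) is an assumption
# inferred from the reported 4 hour 16 minute duration.
start = datetime(2024, 7, 18, 21, 56, tzinfo=timezone.utc)
end = datetime(2024, 7, 19, 2, 12, tzinfo=timezone.utc)


def outage_duration(start, end):
    """Return the elapsed time between two timestamps as (hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


hours, minutes = outage_duration(start, end)
print(f"{hours} hours and {minutes} minutes")
```

Using timezone-aware timestamps keeps the calculation correct across the midnight boundary between the two dates.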


These incidents highlight the interconnectivity of today's digital world, where a single “bug” can cascade through various services, leading to significant downtime. Moreover, the CrowdStrike outage was caused by a cybersecurity tool that is meant to protect its users, demonstrating that downtime can originate from anywhere and have major implications. Such disruptions not only impact individual companies but can also have broader economic consequences. When critical services fail, the ripple effect can disrupt supply chains, financial transactions, and communication networks, potentially resulting in substantial economic losses and decreased productivity across multiple sectors.

--

Parametrix monitors third-party IT services across the globe and collects granular data on service interruptions. We will continue to keep our customers and partners updated. Feel free to reach out if you have any questions: info@parametrixinsurance.com

The Parametrix Team
Published
July 19, 2024