Inside the 78 minutes that took down millions of Windows machines



On Friday morning, shortly after midnight in New York, the disaster began to spread across the world. In Australia, shoppers were greeted with Blue Screen of Death (BSOD) messages in self-checkout lanes. In the United Kingdom, Sky News had to suspend its broadcast after servers and PCs began to crash. In Hong Kong and India, airport check-in counters began to fail. When morning arrived in New York, millions of Windows computers had crashed and a global technological disaster was underway.

In the early hours of the outage, there was confusion about what was happening. How did so many Windows machines suddenly display a blue crash screen? “Something very strange is happening right now,” wrote Australian cybersecurity expert Troy Hunt in a post on X. On Reddit, IT administrators raised the alarm in a thread titled “BSOD Error in the Latest CrowdStrike Update,” which has since racked up over 20,000 replies.

The problems led major US airlines to ground their fleets and left workers in Europe at banks, hospitals, and other major institutions unable to log in to their systems. And it quickly became clear that it was all due to one small file.

At 12:09 a.m. ET on July 19, cybersecurity company CrowdStrike released a faulty update for the Falcon security software it sells to help companies prevent malware, ransomware, and any other cyberthreats from taking down their machines. It is widely used by companies for important Windows systems, which is why the impact of the bad update was so immediate and felt so widely.

The CrowdStrike update should have been like any other silent update, automatically delivering the latest protections to its customers in a tiny file (just 40KB) distributed over the web. CrowdStrike issues these regularly without incident, and they are fairly common for security software. But this one was different. It exposed a massive flaw in the company’s cybersecurity product, a catastrophe that was only ever one bad update away, and one that could have easily been avoided.

How did this happen?

CrowdStrike’s Falcon protection software operates on Windows at the kernel level, the central part of an operating system that has unrestricted access to system memory and hardware. Most other applications run at the user mode level and do not need or get special access to the kernel. CrowdStrike’s Falcon software uses a special driver that allows it to run at a lower level than most applications so it can detect threats on a Windows system.

Running in the kernel makes CrowdStrike software much more capable as a line of defense – but also much more capable of causing problems. “This can be very problematic, because when an update comes in that isn’t formatted the right way or contains some malformations, the driver can ingest that and blindly trust that data,” Patrick Wardle, CEO of DoubleYou and founder of the Objective-See Foundation, tells The Verge.

Kernel access allows the driver to create a memory corruption issue, which is what happened on Friday morning. “The crash occurred in an instruction that tried to access some memory that was not valid,” says Wardle. “If you are running in the kernel and try to access invalid memory, it will crash and cause the system to crash.”

CrowdStrike detected the problems quickly, but the damage was already done. The company issued a fix 78 minutes after the original update was released. IT administrators tried restarting machines repeatedly and were able to get some of them back online if the machine received the fixed update over the network before the CrowdStrike driver crashed the server or PC. For many support workers, though, the fix involved manually visiting the affected machines and deleting the defective content from the CrowdStrike update.

While investigations into the CrowdStrike incident continue, the leading theory is that there was likely a bug in the driver that had been dormant for some time. It may not have correctly validated the data it was reading from the content update files, but that was never an issue until Friday’s problematic content update.

“The driver should probably be updated to do additional error checking, to ensure that even if a problematic configuration were pushed out in the future, the driver would have defenses to check and detect it…rather than acting blindly and crashing,” says Wardle. “I would be surprised if we didn’t see a new version of the driver that has additional integrity and error checks.”
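To make the kind of error checking Wardle describes concrete, here is a minimal sketch in Python. It is not CrowdStrike's code or file format: the `CFG1` signature, the header layout, and every function name are assumptions made up for illustration. The idea is simply that every length and offset read from the file is checked against the file's real size before it is used, and that a bad file is rejected in favor of the last known-good content.

```python
import struct

# Hypothetical layout, invented for this example (not CrowdStrike's format):
# a 4-byte signature, an entry count, then (offset, length) pairs.
MAGIC = b"CFG1"
HEADER = struct.Struct("<4sI")
ENTRY = struct.Struct("<II")


class InvalidContent(ValueError):
    """Raised when a content file fails validation."""


def parse_content(blob: bytes) -> list[bytes]:
    """Return the payloads in blob, or raise InvalidContent."""
    if len(blob) < HEADER.size:
        raise InvalidContent("file is shorter than the header")
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise InvalidContent("bad signature")
    table_end = HEADER.size + count * ENTRY.size
    if table_end > len(blob):
        raise InvalidContent("entry table runs past the end of the file")
    payloads = []
    for i in range(count):
        offset, length = ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
        # Reject entries that point into the header/table or beyond the file.
        if offset < table_end or offset + length > len(blob):
            raise InvalidContent(f"entry {i} points outside the file")
        payloads.append(blob[offset:offset + length])
    return payloads


def load_or_keep_previous(blob: bytes, previous: list[bytes]) -> list[bytes]:
    """Fall back to the last known-good content instead of failing hard."""
    try:
        return parse_content(blob)
    except InvalidContent:
        return previous
```

In a real kernel driver the same checks would be written in C or C++, where skipping them leads to an invalid memory access rather than a catchable exception.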

CrowdStrike should have caught this problem sooner. It’s a fairly standard practice to release updates gradually, allowing developers to test for any major issues before an update reaches their entire user base. If CrowdStrike had properly tested its content updates with a small group of users, Friday would have been a wake-up call to fix an underlying driver issue rather than a globe-spanning technological disaster.
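A gradual release of this kind can be sketched in a few lines of Python. The stage percentages, the crash-rate threshold, and the function names below are illustrative assumptions, not a description of how CrowdStrike or any other vendor ships updates. Each machine is hashed into a stable bucket, an update goes only to buckets below the current percentage, and the rollout halts if too many updated machines crash.

```python
import hashlib
from typing import Optional

STAGES = [1, 10, 50, 100]  # percent of the fleet per stage (illustrative)
MAX_CRASH_RATE = 0.01      # halt above 1% crashes (illustrative)


def bucket(host_id: str) -> int:
    """Map a host to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(host_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def in_stage(host_id: str, percent: int) -> bool:
    """True if this host should receive the update at the given percentage."""
    return bucket(host_id) < percent


def next_percent(current: int, updated: int, crashed: int) -> Optional[int]:
    """Return the next rollout percentage, or None to halt the rollout."""
    if updated and crashed / updated > MAX_CRASH_RATE:
        return None
    later = [p for p in STAGES if p > current]
    return later[0] if later else current
```

Because the bucket is derived from the host ID, a machine that received the update at 1 percent is still included at 10 percent, so each stage only ever widens the audience.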

Microsoft didn’t cause Friday’s disaster, but the way Windows operates allowed the entire operating system to crash. The widespread Blue Screen of Death messages are so synonymous with Windows errors from the 1990s onwards that many headlines initially said “Microsoft outage” before it became clear that CrowdStrike was to blame. Now, there are inevitable questions about how to avoid another CrowdStrike situation in the future – and that answer can only come from Microsoft.

What can be done to avoid this?

Despite not being directly involved, Microsoft still controls the Windows experience and there is a lot of room for improvement in how Windows handles issues like this.

In the simplest form, Windows can disable buggy drivers. If Windows determines that a driver is crashing the system at startup and forcing it into recovery mode, Microsoft can create smarter logic that allows a system to boot without the faulty driver after multiple boot failures.
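The boot logic described above might look something like the following Python sketch. This is a toy model under stated assumptions, not how Windows works today: the three-failure threshold, the `BootPolicy` class, and the driver names are all hypothetical.

```python
from dataclasses import dataclass, field

MAX_FAILED_BOOTS = 3  # illustrative threshold


@dataclass
class BootPolicy:
    # Driver name -> consecutive boots in which it was blamed for a crash.
    failures: dict[str, int] = field(default_factory=dict)

    def record_crash(self, driver: str) -> None:
        """Count one more failed boot attributed to this driver."""
        self.failures[driver] = self.failures.get(driver, 0) + 1

    def record_clean_boot(self) -> None:
        """Forgive drivers below the threshold; keep disabled ones disabled."""
        self.failures = {
            d: n for d, n in self.failures.items() if n >= MAX_FAILED_BOOTS
        }

    def reenable(self, driver: str) -> None:
        """Clear a driver's record, for example after it has been updated."""
        self.failures.pop(driver, None)

    def drivers_to_load(self, drivers: list[str]) -> list[str]:
        """Skip any driver that has hit the failure threshold."""
        return [d for d in drivers if self.failures.get(d, 0) < MAX_FAILED_BOOTS]
```

A real implementation would also have to decide which drivers are safe to skip, since booting without a storage driver or a security sensor carries risks of its own.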

But the biggest change would be blocking access to the Windows kernel to prevent third-party drivers from crashing an entire PC. Ironically, Microsoft tried to do just that with Windows Vista, but was met with resistance from cybersecurity vendors and EU regulators.

Microsoft attempted to implement a feature known at the time as PatchGuard in Windows Vista in 2006, restricting third-party access to the kernel. McAfee and Symantec, the two big antivirus companies at the time, opposed Microsoft’s changes, and Symantec even complained to the European Commission. Microsoft eventually backed down, allowing security vendors to access the kernel once again for security monitoring purposes.

Apple eventually took the same step, locking down its macOS operating system in 2020 so that developers could no longer gain access to the kernel. “It was definitely the right decision for Apple to discontinue third-party kernel extensions,” says Wardle. “But the path to actually achieving this has been fraught with problems.” Apple had some kernel bugs where security tools running in user mode could still trigger a crash (kernel panic), and Wardle says Apple “has also introduced some privilege execution vulnerabilities and there are still some other bugs that could allow security tools on the Mac to be unloaded by malware.”

Regulatory pressures may still be preventing Microsoft from acting in this case. The Wall Street Journal reported over the weekend that “a Microsoft spokesperson said it cannot legally wall off its operating system in the same way that Apple does because of an understanding it reached with the European Commission following a complaint.” The Journal paraphrases the anonymous spokesperson and also mentions a 2009 agreement to provide security vendors with the same level of access to Windows as Microsoft.

Microsoft reached an interoperability agreement with the European Commission in 2009, which was a “public undertaking” to allow developers to access technical documentation for building applications on Windows. The agreement was reached as part of a deal that included implementing a browser choice screen in Windows and offering special versions of Windows without Internet Explorer included in the operating system.

The deal to force Microsoft to offer browser options ended five years later in 2014, and Microsoft also stopped producing its special versions of Windows for Europe. Microsoft now includes its Edge browser in Windows 11, without challenge from European regulators.

It’s unclear how long this interoperability agreement was in place, but the European Commission doesn’t appear to believe it’s stopping Microsoft from overhauling Windows security. “Microsoft is free to decide on its business model and to adapt its security infrastructure to respond to threats, as long as this is done in accordance with EU competition law,” said European Commission spokesperson Lea Zuber in a statement to The Verge. “Microsoft never raised any security concerns with the Commission, neither before the recent incident nor since.”

The Windows lockdown backlash

Microsoft could try to follow the same path as Apple, but resistance from security vendors like CrowdStrike will be strong. Unlike Apple, Microsoft also competes with CrowdStrike and other security vendors that have made protecting Windows a business. Microsoft has its own paid Defender for Endpoint service, which offers similar protections to Windows machines.

CrowdStrike CEO George Kurtz is also regularly critical of Microsoft and its security record, and prides himself on winning customers away from Microsoft’s own security software. Microsoft has had a number of security issues in recent years, so it’s easy and effective for competitors to use them to sell alternatives.

Every time Microsoft tries to lock down Windows in the name of security, it also faces backlash. A special mode in Windows 10 that limited machines to Windows Store apps to avoid malware was confusing and unpopular. Microsoft also left millions of PCs behind with the release of Windows 11 and its hardware requirements designed to improve the security of Windows PCs.

Cloudflare CEO Matthew Prince is already warning about the effects of Microsoft locking down Windows further, framing it as a scenario in which Microsoft would favor its own security products. All this resistance means that Microsoft has a complicated path to follow here if it wants to avoid Windows being at the center of a CrowdStrike-like incident again.

Microsoft is stuck in the middle, with pressure from both sides. But at a time when Microsoft is revamping security, there should be room for security vendors and Microsoft to agree on a better system that will avoid another world of blue screen outages.


