February 26, 2018
The Purple Screen of Death (PSOD) is a fatal crash of a VMware ESX/ESXi host that kills every virtual machine running on it. In highly virtualized data centers running microservices and technologies like Docker, a single PSOD can terminate tens, hundreds or even thousands of services hosted on those virtual machines. Some PSODs are hardware-related, but most are caused by a combination of problematic drivers, BIOS issues or software bugs (assert errors). Proactively detecting and resolving these issues goes a long way toward preventing PSOD outages.
A PSOD is a diagnostic screen with white type on a purple background. The name is a play on the Blue Screen of Death, the informal name users gave to the Windows stop error screen. Typically, the PSOD details the memory state at the time of the crash and includes other information such as the ESXi version and build, the exception type, a register dump, what was running on each CPU at the time of the crash, a backtrace, server uptime, error messages and core dump information.
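To make the list above concrete, here is a minimal Python sketch that pulls two of those fields, the ESXi version and build and the exception type, out of PSOD text that has been captured as a string. The line formats in the regular expressions and the sample text are assumptions made up for illustration; real screens vary between ESXi releases, and `PsodSummary` and `summarize_psod` are hypothetical names, not part of any VMware or Runecast tool.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PsodSummary:
    """A few of the fields a PSOD typically shows (illustrative subset)."""
    version: Optional[str]
    build: Optional[str]
    exception: Optional[str]


# Assumed line formats; adjust to match the screens you actually see.
VERSION_RE = re.compile(r"VMware ESXi (\d+(?:\.\d+)+) \[Releasebuild-(\d+)")
EXCEPTION_RE = re.compile(r"(#[A-Z]+) Exception (\d+)")


def summarize_psod(text: str) -> PsodSummary:
    """Extract version, build and exception type; missing fields become None."""
    version_match = VERSION_RE.search(text)
    exception_match = EXCEPTION_RE.search(text)
    return PsodSummary(
        version=version_match.group(1) if version_match else None,
        build=version_match.group(2) if version_match else None,
        exception=(
            f"{exception_match.group(1)} {exception_match.group(2)}"
            if exception_match
            else None
        ),
    )


# Made-up sample text, not taken from a real crash.
SAMPLE = """VMware ESXi 6.5.0 [Releasebuild-4564106 x86_64]
#PF Exception 14 in world 68000:vmx IP 0x418000000000 addr 0x0
"""

if __name__ == "__main__":
    print(summarize_psod(SAMPLE))
```

A summary like this is only a starting point for triage; the backtrace and core dump remain the authoritative sources for root-cause analysis.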
Detecting these conditions before they cause a crash will greatly help you avoid future critical PSOD-related service outages. Runecast Analyzer was designed to minimize and even completely eliminate PSOD crashes of ESXi hosts. Many of the root causes behind PSODs are not easy to detect manually because they typically involve a combination of several factors, not just a software problem. Automating this process is the most viable way to ensure your data centers are as reliable as possible.
Below is an example of Runecast Analyzer detecting a PSOD problem.