DHU Health Care
DHU Health Care works with the UK’s National Health Service (NHS) to provide a range of services, including the 111 phone service and out-of-hours and integrated urgent care across the East Midlands and Milton Keynes. For nearly 30 years, DHU Health Care has grown and innovated in how it delivers high-quality treatment to patients over an expanding geographical area.
The organization promotes innovation, and its work with the NHS on policy design has put it at the forefront of integrated urgent care, with the patient at the center, whether via face-to-face consultations in a caring and safe environment or first contact over the telephone. DHU serves over 12,000 patients annually, covering nearly 800,000 patients in total, and operates 24/7 to deliver assessment, interventions, and other patient-centric outcomes.
For this case study, we spoke with Nathanael Pearson, Head of IT & Telecoms for DHU Health Care, and Jonathan Pratt, Cloud Services Engineer with Excell Group.
DHU was running 14 ESXi hosts and 2 Windows vCenters (6.5) with vSphere 6.5, vSAN, and vRO. As the VMware stack was relatively new, its complexity was not yet causing much pain, but without a more proactive approach, planning ahead and anticipating where problems would occur next was clearly going to become difficult. Additionally, the team had no visibility into compliance and no single place to find compliance recommendations.
In planning for version upgrades, they realized that they would need help to take a proactive approach and focus on what really matters. Rather than merely saying “we think we should do X,” they wanted better visibility, with high-risk priorities already highlighted.
As a public healthcare provider subject to numerous ISO standards and extensive regulation, DHU also had major security and compliance concerns. The team had no way of demonstrating security compliance across the VMware estate, and regulatory bodies were always looking for evidence of compliance, which was difficult to provide.
One challenge of lacking overall visibility was that BIOS, driver, and firmware compatibility sometimes required deliberately not updating a component. Vendor compatibility guidance was sometimes “Do not upgrade to the latest version, as it is known to cause issues,” without the integrated vendor being aware of it. Knowing about every such case was an ongoing problem for the team, who had to check each one individually against Knowledge Base (KB) articles.
Additionally, the team had experienced issues with vCenter a couple of times where it would suddenly lose its inventory. “We would reboot the server and everything would disappear,” stated Mr. Pearson. “It was a corruption in the database and there were only two ways to fix it: rebuild the server or log a ticket with VMware.”
“Without the visibility we needed, addressing issues only after there were problems and then trying to reactively juggle the firefighting priorities was a problem for us,” said Mr. Pratt.
“We didn’t really know that we had such a problem of addressing issues only reactively until we saw the contrast,” said Mr. Pearson. “Our trial period of running Runecast Analyzer revealed this.”
As Mr. Pearson and Mr. Pratt explained, it took them around 10 minutes to deploy the Runecast Analyzer OVA and get the first scan of their environment, then just a bit of tweaking after that.
According to Mr. Pearson, “We quickly realized that Runecast Analyzer was actually helpful in what it does – rather than providing only a detailed list of problems, which we had expected, it provided helpful remediation steps for each issue.” They finally had the transparency needed to determine what to work on proactively.
When asked how they made the decision to go with Runecast Analyzer, Mr. Pearson replied, “We are not aware of any other solution that integrates to such a level into the VMware suite, or one that can even do what Runecast does. Nothing else we saw covered the security compliance bits. It gives both the PCI definition and the technical definition, with info broken down into practical bite-sized bits for working on.”
Runecast Analyzer identified critical issues for the DHU Health Care team that could result in a purple screen of death (PSOD), with causes that included a driver version mismatch, an incompatible Intel processor, and background unmap running during host reboot. Most of the discovered issues related to their ability to upgrade vSphere 6.5, but some related to their telephone service provider. It also found three or four network driver issues and BIOS/firmware issues that could cause a PSOD.
- About 10 minutes to deploy & get a first scan
- Revealed pre-Runecast approach as reactive at best
- Critical issues now visible, able to be worked on proactively
- Know exactly which issues to prioritize and work on first
- Stability of mission-critical urgent-healthcare systems
- Easy to justify the investment
- Can now ensure stable services to patients
“Runecast is proactive enough that it gave us time to ‘play’ with stuff, so we didn’t need hard deadlines to fix things,” said Mr. Pratt. “It took away a lot of the pressure from firefighting mode. It shows you how you should have done things in the first place, so it’s also training people and giving them the confidence to do something they may not have known how to do before. With Runecast, we can now look at one screen and have the visibility and transparency to know what we are working on and why.”
Mr. Pearson added, “Runecast does the job of more than one person for the cost of less than one person, and helps us to have transparency of critical issues to ensure the stability of our mission-critical urgent-healthcare systems.”
Runecast now scans KBs and other information sources on the team’s behalf, a task that was nearly impossible to do manually, and frees them to spend more time looking at the VMware environment proactively and discussing what they want to do next. The analyzer’s upgrade simulation feature delivers recurring value each time they plan an upgrade.
“An obvious benefit of using Runecast is gaining visibility. And it’s not a person saying ‘this is critical to fix’ but actual data showing this,” said Mr. Pearson. “Looking ahead, we can see the bigger picture.”
Mr. Pratt added, “We have the ability to stand at 10,000 feet and look down. Before, our view of things was flat in front like a motorway. Now we can see everything around and focus on what needs to be a priority and what doesn’t – and not be stuck in a rut.”
When asked what advice they would give to peers about running Runecast Analyzer in their own environments, Mr. Pratt answered, “Break things down, don’t look at everything as one big list. Focus on the criticals, pick one host, put it into maintenance mode and apply the fixes. Bring the host back up, check the results, if the environment is stable, and then monitor for any other issues. If it looks good, start working on the rest of that site. Break down tasks to minimize risk.”
Mr. Pearson added, “It might be that you’ve got 8 problems across 28 hosts, so go fix those first to make the biggest impact. You need to translate the report according to the infrastructure first. Some issues are duplicated when VM issues show up on a host, so you may see a big reduction when you go into host maintenance mode. Figure out what you want to do, apply it to one host first. You can then quickly see 90 issues drop to 30.”
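The triage advice above can be illustrated with a short sketch. It assumes the findings have already been exported into simple records; the `Finding` type, its field names, the severity labels, and the host names are all hypothetical and do not reflect Runecast Analyzer's actual export format or API. The sketch collapses duplicate findings, ranks critical issues by how many hosts they affect, and suggests one pilot host to fix first. Choosing the host with the most critical issues as the pilot is an assumption made here for illustration, not something the team prescribed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One reported issue on one host (hypothetical record shape)."""
    issue_id: str
    severity: str
    host: str


def rank_critical_issues(findings):
    """Return (issue_id, sorted hosts) pairs for critical findings,
    ordered so the issue affecting the most hosts comes first."""
    hosts_by_issue = defaultdict(set)
    for f in findings:
        if f.severity == "critical":
            # A set collapses duplicate reports of the same issue on a host.
            hosts_by_issue[f.issue_id].add(f.host)
    return sorted(
        ((issue, sorted(hosts)) for issue, hosts in hosts_by_issue.items()),
        key=lambda pair: (-len(pair[1]), pair[0]),
    )


def pick_pilot_host(findings):
    """Suggest the host with the most distinct critical issues as the
    first one to put into maintenance mode; None if nothing is critical."""
    issues_by_host = defaultdict(set)
    for f in findings:
        if f.severity == "critical":
            issues_by_host[f.host].add(f.issue_id)
    if not issues_by_host:
        return None
    return min(issues_by_host, key=lambda h: (-len(issues_by_host[h]), h))


# Made-up sample data, including one duplicate and one non-critical finding.
findings = [
    Finding("driver-mismatch", "critical", "esx01"),
    Finding("driver-mismatch", "critical", "esx02"),
    Finding("driver-mismatch", "critical", "esx02"),
    Finding("bios-firmware", "critical", "esx02"),
    Finding("ntp-config", "low", "esx03"),
]

ranked = rank_critical_issues(findings)
pilot = pick_pilot_host(findings)

for issue, hosts in ranked:
    print(f"{issue}: {len(hosts)} host(s)")
print("Pilot host:", pilot)
```

After the fixes are applied to the pilot host and the environment has been checked for stability, the same ranking can be rerun on a fresh scan to see how far the issue count has dropped before moving on to the rest of the site.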
- Immeasurable costs of downtime to mission-critical urgent healthcare systems
- Associated costs of reactive troubleshooting time, in terms of man hours
- Potentially incalculable cost of reputational damage in case of service interruption