Proactive vSphere analysis based on VMware KBs
The purpose of this article is to shed more light on the benefits of using the VMware KBs (Knowledge Base articles) proactively to improve system resilience and performance. This means minimizing the likelihood of service-impacting incidents and mitigating their impact if they do occur.
What are the VMware KBs?
- The VMware Knowledge Base (http://kb.vmware.com) is a collection of about 30,000 articles (as of April 2016) covering VMware's entire product portfolio.
- The articles are written by VMware staff and fall into 9 categories. An article can be tagged with one or more categories:
  - Alerts - Highly critical KBs that describe a severe problem (example: KB1034262)
  - Best Practices - Configuration settings and/or steps to achieve the best performance, availability or security for specific use cases (example: KB2109712)
  - How to - Step-by-step guides on how to enable particular features or perform specific operations (example: KB2092403)
  - Informational - Facts about products or features: configuration maximums, interoperability, integration with 3rd party products, etc. (example: KB2111492)
  - Patch - Details about patch releases (example: KB2114884)
  - Security - Mostly security advisory articles (example: KB2121689)
  - Staff Picks - Articles from any of the other categories, picked for the value they provide (example: KB1003734)
  - Troubleshooting - Descriptions of issues along with their symptoms, causes and solutions/workarounds (example: KB2109922)
  - Video - Video guides for configuration or troubleshooting steps (example: KB2006980)
- Language: most of the articles are written in English, but many important KBs are also available in Chinese, Japanese, German, French, Portuguese and Spanish.
Why use the VMware KBs for proactive analysis?
We estimate that around 90% of the issues that occur on VMware vSphere are already known and documented in a VMware Knowledge Base article. This estimate is based on our experience working with many large enterprise vSphere environments. All of Runecast's technical staff are VCAP (VMware Certified Advanced Professional) certified, and we also have a VCDX (VMware Certified Design Expert) among us. Before Runecast, we were part of the VMware Center of Excellence within IBM GTS. We have seen many incidents manifest themselves over and over in different locations, most of them relating to a known issue described in a KB. The common conclusion of the support teams dealing with these issues was: "if only we had known about it before...". We thought there must be a better way to do this, so we quit our dream jobs and founded Runecast.
How best to do it?
This takes a tremendous effort. It is not impossible, but it is a huge challenge:
- Index the whole existing VMware Knowledge Base, regularly checking for new KBs and also keeping track of the KBs which are being updated or removed.
- Parse the articles and extract their meaning into a machine-usable form, such as key/value pairs of configuration settings.
- Develop scripts to compare the values and logic from the KBs with the actual data in the environment (hosts, vCenter, VMs, datastores, etc.).
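To make the three steps above more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how Runecast works internally: the `KbRule` shape, the `diff_index` and `find_violations` helpers, the `KB-EXAMPLE-*` identifiers and the `example.*` setting names are all hypothetical, and the inventory is a plain dictionary rather than data collected from a real vCenter.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class KbRule:
    """One machine-readable check extracted from a KB article (hypothetical shape)."""
    kb_id: str     # placeholder identifier, not a real KB number
    setting: str   # configuration key the article refers to
    expected: Any  # value the article recommends


def diff_index(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Step 1: compare two {kb_id: last_updated} snapshots of the KB index."""
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "updated": sorted(k for k in new if k in old and new[k] != old[k]),
    }


def find_violations(
    rules: List[KbRule], inventory: Dict[str, Dict[str, Any]]
) -> List[Tuple[str, str, str, Any]]:
    """Step 3: return (object, kb_id, setting, actual) for every mismatch.

    A rule is only evaluated on objects that actually expose the setting.
    """
    findings = []
    for name, config in inventory.items():
        for rule in rules:
            if rule.setting in config and config[rule.setting] != rule.expected:
                findings.append((name, rule.kb_id, rule.setting, config[rule.setting]))
    return findings


# Step 2 is assumed to have already produced these rules from the article text.
rules = [
    KbRule("KB-EXAMPLE-1", "example.heapMaxMB", 512),
    KbRule("KB-EXAMPLE-2", "example.ntpEnabled", True),
]

# Hypothetical configuration data for two hosts.
inventory = {
    "host-01": {"example.heapMaxMB": 256, "example.ntpEnabled": True},
    "host-02": {"example.heapMaxMB": 512, "example.ntpEnabled": False},
}

for finding in find_violations(rules, inventory):
    print(finding)
```

Real KBs often describe conditions that go beyond a single equality check (version ranges, combinations of settings, log patterns), so a production rule format would need to be considerably richer than this one.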
This is exactly what we did at Runecast: we developed a product that does all of the above automatically. You can deploy it as an OVA virtual appliance, connect it to vCenter and click "Analyze Now". It performs tens of thousands of configuration checks in a couple of minutes and reports any VMware KBs describing issues that could affect your environment.
It is a great effort, but thanks to the algorithms we use for parsing and classifying the KBs, we are able to cover the VMware Knowledge Base articles for the vSphere suite.
Analysis report - screenshot from the tool:
To see how many KBs are applicable in your environment, download the trial version and give it a try!