How to Troubleshoot VMware NSX


VMware announced 4,500 NSX users in Q2 2018. That is not a small number, but it is not a huge one either, considering VMware's plans to expand NSX to hundreds of thousands of deployments. It also means that NSX is not yet a commodity, so supporting and troubleshooting it is a challenge for admins who decided (or were told) to adopt it. Things are looking up: VMware's CEO recently announced a new version of NSX-V aimed at the mainstream (small and medium-sized companies), so we can expect faster adoption of NSX in the coming months.

As with every new product in your environment, it is just a matter of time until you run into issues and are forced to troubleshoot. In this post you will find a simplified guide for the fundamental troubleshooting of NSX for vSphere.

 


NSX host deployment troubleshooting tips

The first step is NSX-V installation and deployment. When you run into problems during host deployment, it is worth going through the following checks:

1. Start by checking the DNS, host, and firewall settings along the NSX Manager to vCenter to ESXi path. Misconfiguration here is a known cause of problems during host setup, and of many issues between NSX Manager and vCenter Server. Use the NSX Manager web interface to review the settings.

NSX network settings

2. Problems during host preparation? Check that all hosts are listed properly under the “Host Preparation” tab. To do this, open the vSphere Web Client of the vCenter that NSX-V is connected to. The picture below shows the Flash version of the UI.

NSX host preparation

3. Check vSphere ESX Agent Manager for errors. Open vCenter home > Administration > vCenter Server Extensions > vSphere ESX Agent Manager.

4. If the control plane between the hypervisors and the controllers is down, check the NSX Manager System Events to see why.

5. If more than one ESXi host is affected, check the status of the message bus service on NSX Manager. If RabbitMQ is stopped, restart it.

NSX status

6. Check communication channel health. From the vSphere Web Client, you can check the status of communication between various components:

- Between NSX Manager and the firewall agent;
- Between NSX Manager and the control plane agent;
- Between the control plane agent and the controllers.

In the vSphere Web Client, navigate to Networking & Security > Installation > Host Preparation. Select a cluster, or expand a cluster and select a host. Click Actions, then Communication Channel Health.

NSX Channel health
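The DNS and firewall checks from step 1 are easy to script. The Python sketch below checks that a name resolves forward and back to the same FQDN, and that a TCP port accepts connections. The function names are illustrative, and port 443 in the usage note is an assumption; the ports your NSX version actually needs are listed in VMware's documentation.

```python
import socket


def dns_consistent(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Forward-resolve fqdn, reverse-resolve the result, and compare the names.

    Returns (ok, detail). The resolver functions are parameters so the check
    can be exercised without touching real DNS.
    """
    try:
        ip = forward(fqdn)
        name = reverse(ip)[0]  # gethostbyaddr returns (hostname, aliases, addresses)
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        return False, f"{fqdn}: lookup failed ({exc})"
    ok = name.rstrip(".").lower() == fqdn.rstrip(".").lower()
    return ok, f"{fqdn} -> {ip} -> {name}"


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it from a machine on the management network, for example dns_consistent("nsxmgr.example.local") and port_open("vcenter.example.local", 443), where both host names are hypothetical. A failed or mismatched reverse lookup is worth fixing before you retry host preparation.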

 

NSX Manager troubleshooting tips

1. Verify that the NSX Manager and lookup service appliances are in time sync. Use the same NTP server configuration on both NSX Manager and the lookup service.

NSX NTP

2. Verify that the time on the ESXi hosts is in sync with NSX Manager. This can be checked in "Hosts and Clusters" -> ESXi host -> time configuration.

NSX host NTP

3. Verify that the account has a role on NSX Manager (Home > Networking & Security > NSX Managers > {IP of NSX Manager} > Manage > Users).

NSX accounts

4. Verify which port group and uplink NIC NSX Manager is using by running the “esxtop” command on the ESXi host. Press the "n" key to switch to the network view, where you can find the port ID and uplink used by the NSX Manager VM.

ESXi esxtop

5. Check network connectivity between NSX Manager and vCenter: log in to the NSX Manager CLI console and view the ARP and routing tables (show arp, show ip route).
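The time checks in steps 1 and 2 come down to comparing clocks. As a minimal sketch, assuming you have noted the time reported by each component at roughly the same moment, the Python function below flags any component that has drifted from a reference clock. The component names and the 5-second tolerance are assumptions for illustration, not VMware-defined values.

```python
from datetime import datetime, timedelta, timezone


def clock_skew_report(readings, reference, tolerance=timedelta(seconds=5)):
    """Return {component: offset_in_seconds} for clocks outside the tolerance.

    readings maps a component name to the time it reported; all readings are
    assumed to have been taken at (nearly) the same instant. Offsets are
    relative to the component named by reference; positive means "ahead".
    """
    ref_time = readings[reference]
    return {
        name: (reported - ref_time).total_seconds()
        for name, reported in readings.items()
        if name != reference and abs(reported - ref_time) > tolerance
    }


# Hypothetical readings, e.g. noted from each component's UI or CLI.
base = datetime(2018, 9, 1, 12, 0, 0, tzinfo=timezone.utc)
readings = {
    "nsx-manager": base,
    "lookup-service": base + timedelta(seconds=2),
    "esxi01": base + timedelta(seconds=90),
}
print(clock_skew_report(readings, reference="nsx-manager"))  # {'esxi01': 90.0}
```

Any component that shows up in the report is a candidate for an NTP configuration review.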

NSX routing troubleshooting tips

1. The vSphere Web Client UI provides two major sections relevant to NSX routing: one covering the L2 and control-plane infrastructure dependencies, and one covering the routing subsystem configuration. Check that clusters and hosts are in a healthy state.

2. Check that the VXLAN components are correctly configured. The MTU needs to be at least 1600, and the VTEP VMkernel NICs (vmknics) should not have addresses in the 169.254.x.x range, which would mean they failed to get an address from DHCP.

NSX dhcp fail

3. Check that the teaming policy is consistent across all clusters, and that the number of VTEPs equals the number of dvUplinks.

4. Run: esxcli network vswitch dvs vmware vxlan network list --vds-name=Compute_VDS and check that the control plane is enabled. Also check that a valid IP address is listed under Controller Connection, that the connection is up, and that the port count is at least 1.

The fundamental checks above are just an entry point to more advanced troubleshooting of NSX components, which typically requires routing and switching skills and, most likely, network packet analysis.
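The VXLAN checks in step 2 can be expressed in a few lines. The Python sketch below flags a VTEP vmknic whose MTU is too small or whose address is link-local (169.254.0.0/16), and computes the ICMP payload size for an unfragmented ping at a given MTU. The function names, the vmknic name, and the addresses are illustrative assumptions.

```python
import ipaddress

MIN_VXLAN_MTU = 1600  # VXLAN encapsulation adds roughly 50 bytes to a 1500-byte frame
IPV4_HEADER = 20      # bytes, without IP options
ICMP_HEADER = 8       # bytes, ICMP echo header


def ping_payload_for_mtu(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def vtep_problems(vmknic, ip, mtu):
    """Return a list of human-readable problems found for one VTEP vmknic."""
    problems = []
    if mtu < MIN_VXLAN_MTU:
        problems.append(f"{vmknic}: MTU {mtu} is below {MIN_VXLAN_MTU}")
    if ipaddress.ip_address(ip).is_link_local:
        problems.append(f"{vmknic}: {ip} is link-local, DHCP probably failed")
    return problems


# Hypothetical values, as they might be read from the host's VMkernel adapter list.
print(vtep_problems("vmk3", "169.254.12.7", 1500))
print(vtep_problems("vmk3", "192.168.250.51", 1600))  # []
print(ping_payload_for_mtu(1600))  # 1572
```

The 1572-byte payload is the size commonly used to test the transport network end to end from an ESXi host, for example with vmkping ++netstack=vxlan -d -s 1572 followed by a remote VTEP IP. If that ping fails while a smaller one succeeds, the MTU is probably too low somewhere on the physical path.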

If you want to speed up troubleshooting, you can try Runecast Analyzer, which can identify problems and tell you exactly what to fix. Runecast Analyzer can help with:

1. Detection of known NSX issues on VMware NSX-V versions 6.2 - 6.4.x
2. Automated scanning and evaluation of NSX-V Best Practices violations
3. Automatic NSX-V VMware Security Hardening profile analysis and reporting
4. NSX-V DISA-STIG profile analysis and reporting
5. Automatic discovery of NSX Managers linked to VMware vCenters
6. vSphere web console plugin: NSX update with a new issue summary widget
 

For more information on all the specific menus and settings, watch the VMware NSX troubleshooting webinar.

 

 


 

Michal Hrncirk | Head of Product Management