
Sunday, August 17, 2014

Handling Outage with Double Mix - vCenter Server 5.5 and Webclient

Hi,

Hope you have read my earlier post on recovering vCenter Server from an unexpected outage in the datacenter, and on how to revive vCenter Server and regain proper access to the inventory.

Now let's go a little deeper into another situation. With vSphere 5.5, a typical vSphere environment nowadays looks something like this:

vCenter Server 5.5 with the latest updates, installed on a virtual machine with the latest hardware version (10).

Inventory Service on a separate virtual machine, also hardware version 10.

A Nexus 1000v VDS / VMware VDS carrying all traffic types: Management, VM, vMotion, iSCSI, NFS, FT, plus Control and Packet (for the Nexus 1000v), and so on.

The ESXi host Management Network is on the VDS as well, so keep this in mind.

Now let's assume an outage occurred: you were able to connect to the ESXi host and eventually tracked down the vCenter Server VM. Oh man, that was painful. I know that feeling, and hence I posted earlier about the available ways to recover vCenter Server from an outage. So let's continue.

You verify that VM traffic goes through the VDS and that the vCenter Server VM is connected to one of the dvPortGroups. You have restored a standard vSwitch on the ESXi host through the DCUI, so you now have connectivity to the host using the vSphere Client.

The vCenter service is not starting because the Inventory Service runs on another virtual machine. Luckily you found that VM and registered it on the same host as the vCenter Server.

But the two VMs, vCenter Server and Inventory Service, still can't reach each other, as both use a dvPortGroup on the VDS.

Now you have a standard vSwitch on the ESXi host which you can use temporarily, but how do you change the vNIC settings on the VMs?

This is where the trouble starts.

You can't edit the virtual machine settings, because you are connected to the ESXi host directly with the vSphere Client.

To change the vNIC settings of a hardware version 10 virtual machine you need the Web Client, but the Web Client needs vCenter, and the vCenter service won't start because it depends on the Inventory Service running on the other virtual machine. You are stuck in a loop: a chain of dependencies that prevents vCenter Server and the Inventory Service from talking to each other.

By design, you can't edit the settings of a hardware version 10 virtual machine without the Web Client.

So here is the trick on how to fix this.

First of all, shut down both the vCenter Server and Inventory Service virtual machines and remove them from the inventory: right-click each virtual machine and select the "Remove from Inventory" option.

Lets assume you have a standard portgroup created on the Standard vSwitch called "Test".

Now enable SSH access on the ESXi host and connect with PuTTY, or use the DCUI instead.

Go to the virtual machine's directory:

cd /vmfs/volumes/datastore-name/vcenter-vm/

Now use "vi" to modify the .vmx file of the vCenter Server virtual machine, and repeat for the Inventory Service virtual machine.

Go to the line with the hardware version and enter insert mode by pressing "i" (or "a") in vi:

virtualHW.version = "10"

Change the value from 10 to 8, save the file with :wq!, and you will be back at the ESXi shell prompt.
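If you prefer a one-liner over editing in vi, the same change can be scripted. This is a minimal sketch with hypothetical datastore and VM names; the .vmx key for the hardware version is virtualHW.version, and it's worth keeping a backup copy of the file first:

```shell
# Hypothetical paths -- substitute your actual datastore and VM directory.
VMX="/vmfs/volumes/datastore-name/vcenter-vm/vcenter-vm.vmx"
cp "$VMX" "$VMX.bak"                         # keep a backup of the original
# Downgrade the virtual hardware version key in place.
sed -i 's/virtualHW.version = "10"/virtualHW.version = "8"/' "$VMX"
grep 'virtualHW.version' "$VMX"              # verify the change took effect
```

If your ESXi build's BusyBox sed lacks -i, redirect the output to a temporary file and move it back over the original instead.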

Now go back to the vSphere Client, browse the respective datastore one VM at a time, and register both the vCenter Server and Inventory Service virtual machines back into the inventory.
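Alternatively, the registration can be done from the same SSH session instead of the vSphere Client. A sketch with hypothetical .vmx paths (these commands run only on the ESXi host itself):

```shell
# Hypothetical .vmx paths -- adjust to your datastore and VM directories.
vim-cmd solo/registervm /vmfs/volumes/datastore-name/vcenter-vm/vcenter-vm.vmx
vim-cmd solo/registervm /vmfs/volumes/datastore-name/inventory-vm/inventory-vm.vmx
vim-cmd vmsvc/getallvms    # list registered VMs to confirm both are present
```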

Now you can edit the settings using the vSphere Client and select the standard portgroup "Test" for both VMs.

Power on both VMs and verify that the vCenter Service is running.

Once verified, you can connect to vCenter Server with the Web Client or the vSphere Client. Make sure all the host and cluster settings are as they were before the outage.

Now you should be back in business with minimal downtime.

Hope you find this information useful when hit by this limitation of hardware version 10. Hopefully future products will let you change virtual machine settings without any such requirement. In fact, with ESXi 5.5 U2 you can now edit the settings of a hardware version 10 VM using the vSphere Client, so update your ESXi hosts to get this benefit; that release resolves other issues as well.

Please share and care !!

Enjoy !!



Saturday, August 10, 2013

Simulating Management Network Redundancy

Hello Readers,

First of all, thank you for the time you will invest in reading this post. As it is a long one, I appreciate your patience; the information documented here is very important and will help you design, configure, troubleshoot, and maintain your VMware environment accordingly.


Many times during discussions I have come across situations where a power outage took an environment down completely or partially. What happens really depends on the configuration and design you chose on the network side of things once you lose connectivity to your Management Network (either the Service Console on ESX 3.x/4.x, or the vmkernel interface on ESXi 5.x and later).


The purpose of this article is to simulate conditions under which you can test the redundancy of the Service Console/vmkernel interface carrying your Management Network.

Assumptions made before we proceed:

1) The ESX/ESXi host is configured with at least two uplinks for redundancy: either NIC-teamed on one vSwitch/VDS, or with a second vSwitch/dvportgroup configured with one or two uplinks of its own.

2) Both NICs are connected to one physical switch, or, if connected to two separate physical switches, the upstream switches are connected at L2 using a workable technology such as vPC, MLAG, SMLT, etc.

3) The vSwitch portgroup or VDS dvportgroup is configured with correct load balancing policies corresponding to the configuration on the upstream physical switches.

4) The environment is not running any production workload (Dev/Test/Lab), so there is no impact on running production VMs.

5) The default gateway is reachable from the ESX/ESXi host.

6) If more than one isolation address is required for the cluster/hosts, the necessary configuration is done on ESX/ESXi.

7) Physical connectivity is configured properly for each ESX/ESXi host, irrespective of hardware and other necessary components.

8) Access to the ESX/ESXi host is available through any of the usual tools: SSH (PuTTY), DCUI, iLO, DRAC, RSA, KVM, etc.

9) All network cables used as uplinks for the ESX/ESXi host are identified and marked/documented properly on both the host and physical switch side.

10) Familiarity with vSphere networking and with the necessary commands where not listed.

11) Make changes at the vSwitch level only when you have a separate vSwitch configured for the Management Network with its own uplink, or, if using a VDS, a separate dvportgroup configured for the Management Network.


1st Method


Go into the VSS portgroup settings, open the NIC Teaming tab where the two uplinks show as Active/Active, check the "Override switch failover order" box, select one of the NICs in the team, and click "Move Down" twice so the NIC moves from Active to Unused state.

At this point the vSwitch/VDS itself still shows both NICs as Active/Active. Once the NIC is in Unused state, the Management Network has only one uplink and you will receive the warning on the Summary tab: "Host currently has no management network redundancy".

HA may also raise an error; the related details are outside the scope of this article, but you should be aware of the impact and able to recover if needed.

Note: If the warning continues to appear, disable and re-enable VMware High Availability/FDM (depending on the version of ESX/ESXi in the cluster).

To set das.ignoreRedundantNetWarning to true (for ESX 3.x and 4.x):

   1. From the VMware Infrastructure Client, right-click on the cluster and click Edit Settings.
   2. Select vSphere HA and click Advanced Options.
   3. In the Options column, enter das.ignoreRedundantNetWarning
   4. In the Value column, type true.

      Note: Steps 3 and 4 create a new option.
   5. Click OK.
   6. Right-click the host and click Reconfigure for vSphere HA. This reconfigures HA.

To set das.ignoreRedundantNetWarning to true in the vSphere 5.1 Web Client:

   1. From the vSphere Web Client, right click on the cluster.
   2. Click on the Manage tab for the cluster, then under Settings click vSphere HA.
   3. Click on the Edit button in the top right corner.
   4. Expand the Advanced Options section, and click Add.
   5. In the Options column, type das.ignoreRedundantNetWarning.
   6. In the Value column, type true.
   7. Click OK.
   8. Right-click the host and click Reconfigure for vSphere HA.

2nd Method

If you do not have access through the vSphere Client or Web Client, you can set the NIC to Unused state from the command line over an SSH session to the ESX/ESXi host. Note that this command moves the NIC to unused for the whole vSwitch or VDS, and will NOT make the change at the individual portgroup (VSS) or dvportgroup (VDS) level.

If using vSwitch (VSS)

#esxcfg-vswitch -U vmnicX vSwitchX

(Note: Replace X with the actual NIC number and vSwitch Number)

For ESXi 5.x

Remove a network card (vmnic) from a standard vSwitch with this command:

#esxcli network vswitch standard uplink remove --uplink-name=vmnicX --vswitch-name=vSwitchX

Or change the link status of one of the uplink vmnics with this command:

#esxcli network nic down -n vmnicX
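Once the test is done, remember to undo the change. A sketch of the reverse operations on ESXi 5.x, with vmnicX/vSwitchX as placeholders (these run only on the ESXi host):

```shell
# Bring the physical NIC link back up.
esxcli network nic up -n vmnicX
# Re-add the uplink to the standard vSwitch.
esxcli network vswitch standard uplink add --uplink-name=vmnicX --vswitch-name=vSwitchX
# Confirm both uplinks are attached again.
esxcli network vswitch standard list -v vSwitchX
```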

If using VDS

To get the dvport ID for the dvportgroup

#esxcfg-vswitch -l | more

To unlink the uplink from VDS

#esxcfg-vswitch -Q vmnicX -V dvPort_ID_of_vmnic dvSwitch
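To reverse the VDS change afterwards, link the uplink back to the same dvUplink port. A sketch (the IDs are placeholders, and the commands run only on the ESX/ESXi host):

```shell
# Re-link the physical NIC to the dvUplink port it was removed from.
esxcfg-vswitch -P vmnicX -V dvPort_ID_of_vmnic dvSwitch
# Confirm the uplink is attached again.
esxcfg-vswitch -l | more
```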

3rd Method

For ESX/ESXi 3.x/4.x and ESXi 5.0 you can change the VLAN on the VSS/VDS portgroup that carries the Management Network through SSH/DCUI:

#esxcfg-vswitch -v VLAN_ID -p "Service Console" vSwitch0

(Note: Replace VLAN_ID with an unused VLAN number. On ESXi 5.1 the network rollback feature will revert a change that disconnects the ESXi host from vCenter Server, so this method may not stick there, but you can still try the command as one uplink is still available.)

4th Method

You can also change the load balancing policy from the command line for the vSwitch where the Management Network (Service Console or vmkernel) is configured. This applies when the Management Network uses one vSwitch with a single uplink, and a second vSwitch is configured separately with a second Management Network (Service Console/VMkernel) and its own uplink.

Here you deliberately use a teaming policy different from the physical switch configuration, which results in loss of network connectivity on the uplink(s) used for the Management Network portgroup/dvportgroup.

Run the command on ESX/ESXi 3.x/4.x:

#vim-cmd /hostsvc/net/vswitch_setpolicy --nicteaming-policy='loadbalance_ip' vSwitch0

To change the load balancing policy on an ESXi 5.x host, run this command:

#esxcli network vswitch standard policy failover set -l iphash -v vSwitch0

* To change the load balancing policy to a route based on the originating virtual port ID, run this command:

#vim-cmd /hostsvc/net/vswitch_setpolicy --nicteaming-policy='loadbalance_srcid' vSwitch0

* To change the load balancing policy to a route based on the MAC hash, run this command:

#vim-cmd /hostsvc/net/vswitch_setpolicy --nicteaming-policy='loadbalance_srcmac' vSwitch0

Or set the load balancing policy directly. The options are:

    *      Port ID = loadbalance_srcid
    *      IP Hash = loadbalance_ip
    *      MAC = loadbalance_srcmac
    *      Failover Only = failover_explicit

Here is an example command:

#vimsh -n -e "hostsvc/net/vswitch_setpolicy --nicteaming-policy loadbalance_ip vSwitchX"

Refresh network settings with the command:
#vimsh -n -e "/internalsvc/refresh_network"

Restart the management service with the command:

#service mgmt-vmware restart

On ESXi 5.x,

* To change the load balancing policy to a route based on the originating virtual port ID, run this command:

#esxcli network vswitch standard policy failover set -l portid -v vSwitch0

* To change the load balancing policy to a route based on the MAC hash, run this command:

#esxcli network vswitch standard policy failover set -l mac -v vSwitchX

* To change the load balancing policy to a route based on the IP hash, run this command:

#esxcli network vswitch standard policy failover set -l iphash -v vSwitchX

Restart the management service with the command:

#/etc/init.d/hostd restart
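Before changing the policy, it helps to record the current setting so you can revert it exactly once the test is over. A sketch for ESXi 5.x (runs only on the ESXi host):

```shell
# Show the current failover/load-balancing policy for the vSwitch.
esxcli network vswitch standard policy failover get -v vSwitch0
# Revert to the default policy (route based on originating virtual port ID).
esxcli network vswitch standard policy failover set -l portid -v vSwitch0
```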

5th Method

Using the CDP protocol, find the upstream switch port (assuming the upstream switches are Cisco branded), access the switch over SSH, go to the port configuration for one of the NICs used as a Management Network uplink, and administratively shut the port down.

You can click the small blue icon beside the vmnic to find the CDP information, as shown in the attachment.





You will see a red X against the disconnected NIC in the vSphere Client, which means the link is down.

The command "esxcfg-nics -l" will also show the link as down for that particular NIC.

6th Method

By changing the MTU of the portgroup/dvportgroup used for the Service Console/vmkernel Management Network.

For ESX/ESXi 3.x/4.x: if the current MTU is the default of 1500, use the commands below to set a non-matching value such as 9000; if jumbo frames are already configured, change it the other way around.

Find the vmkernel or Service Console interface first using these commands:

#esxcfg-vswif -l

#esxcfg-vmknic -l

Find the portgroup/dvportgroup name from the output of the above command(s).

# esxcfg-vmknic -m 9000 portgroup_name

Run this command for ESXi 5.x:

# esxcli network ip interface set -m 9000 -i vmk_interface
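Remember to revert the MTU once the test is complete. A sketch for ESXi 5.x, with vmk_interface as a placeholder (runs only on the ESXi host):

```shell
# Restore the default MTU of 1500 on the vmkernel interface.
esxcli network ip interface set -m 1500 -i vmk_interface
# Verify the MTU value on all vmkernel interfaces.
esxcli network ip interface list
```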

7th Method (needs more testing)

Connect to the ESX/ESXi host using SSH (PuTTY) or any other tool and run the following command, which blinks the NIC's LED for 5-6 seconds:

#ethtool -p vmnicX

8th Method

You can physically pull the cable carrying the Management Traffic from the back of the ESX/ESXi host. For this you need accurate information about the port mapping and which cables serve the Management Network ports on the VSS/VDS.

9th Method

Warning: This uses the vsish interface, which is currently NOT SUPPORTED by VMware, so be careful when using it.


To block the link on a particular physical NIC, run this command:

#vsish -e set /net/pNics/vmnicX/block 1
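To unblock the NIC afterwards, the same (unsupported) interface takes a 0. A sketch (runs only on the ESXi host):

```shell
# Unblock the physical NIC link, reversing the block above.
vsish -e set /net/pNics/vmnicX/block 0
```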

All the above methods isolate the uplink connectivity of one NIC, which in turn should break the redundancy status for the Management Network while the host stays connected in VirtualCenter Server/vCenter Server.

Enjoy the information, and test your Management Network properly. DO NOT use production hosts to test the above methods.

Share and Care !!

Cheers !!