Showing posts with label HA.

Saturday, August 10, 2013

Simulating Management Network Redundancy

Hello Readers,

First of all, thank you for the time you will invest in reading this post. It is a long one, so I appreciate your patience; the information documented here is important and will help you design, configure, troubleshoot and maintain your VMware environment.


Many times during discussions I have come across situations where a power outage hit the environment and everything went down, completely or partially. How badly you are affected when connectivity to the Management Network is lost (the Service Console on ESX 3.x/4.x, or the VMkernel management interface on ESXi) depends on the network configuration and design you have chosen.


The purpose of this article is to show how to simulate a failure so that you can test the redundancy of the Service Console/VMkernel interface that carries the Management Network.

Assumptions made before we proceed:

1) The ESX/ESXi host has at least two uplinks for Management Network redundancy: either two or more uplinks in a NIC team on one vSwitch/VDS, or a second vSwitch (or a second dvportgroup on the VDS) for management, configured with one or two uplinks of its own.

2) Both NICs are connected either to one physical switch or to two separate physical switches that are interconnected at L2 using a workable technology such as vPC, MLAG, SMLT etc.

3) The vSwitch port groups or VDS dvportgroups are configured with load balancing policies that correspond to the configuration of the uplink physical switches.

4) The environment is not running any production workload (Dev/Test/Lab), so there is no impact on production VMs.

5) The default gateway is reachable from the ESX/ESXi host.

6) If more than one isolation address is required for the cluster/host, the necessary configuration is already done on ESX/ESXi.

7) Physical connectivity is configured properly for each ESX/ESXi host, including the hardware and other components it needs.

8) The ESX/ESXi host can be reached through any of the available tools such as SSH (PuTTY), DCUI, iLO, DRAC, RSA, KVM etc.

9) All network cables that provide uplinks to the ESX/ESXi host are identified and labelled/documented on both the ESX/ESXi side and the physical switch side.


10) You are familiar with vSphere networking and with the necessary commands where they are not listed here.

11) Make changes at the vSwitch level only when you have a separate vSwitch configured for the Management Network with a different uplink or, if using a VDS, a separate dvportgroup configured for the Management Network.
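Assumption 5 can be checked from the host itself before trying any of the methods. The sketch below is a minimal example, assuming `esxcfg-route` prints the gateway as `VMkernel default gateway is <IP>` (as on ESX/ESXi 4.x/5.x) and that `vmkping` is available; the function names are hypothetical. On classic ESX the Service Console has its own gateway, which this sketch does not check.

```shell
# Hypothetical helpers for checking assumption 5 before a test run.
# Paste into the ESX/ESXi shell (or source the file), then run: check_gateway

# Read `esxcfg-route` output on stdin and print the gateway IP, i.e. the
# last field of the "VMkernel default gateway is <IP>" line.
parse_gateway() {
  awk '/default gateway is/ { print $NF }'
}

# Look up the VMkernel default gateway and ping it three times from the
# VMkernel stack. Returns non-zero if no gateway is found or the ping fails.
check_gateway() {
  gw=$(esxcfg-route | parse_gateway)
  if [ -z "$gw" ]; then
    echo "No VMkernel default gateway found" >&2
    return 1
  fi
  echo "Pinging VMkernel default gateway $gw"
  vmkping -c 3 "$gw"
}
```

Parsing is kept in its own function so it can be tried against saved `esxcfg-route` output without touching a host.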


1st Method


Open the settings of the VSS port group that carries the Management Network and go to the NIC Teaming tab, where you can see the two uplinks as Active/Active. Check the box Override failover order, select one of the NICs in the team and click Move Down twice, so that the NIC moves from the Active state to the Unused state.

The vSwitch/VDS itself will still show both NICs as Active/Active, because the override applies only to the port group. Once the NIC is in the Unused state, the Management Network has only one uplink and you will receive the warning "Host currently has no management network redundancy" on the Summary tab.

HA may also report an error; that is outside the scope of this article, so be aware of the impact and be able to recover if needed.

Note: If the warning continues to appear, disable and re-enable VMware High Availability/FDM (depending on the version of ESX/ESXi in the cluster).
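On ESXi 5.x the same port-group-level override can also be made from the command line. This is an illustrative sketch, not a tested procedure: it assumes the Management Network is on a standard vSwitch, the port group and vmnic names are placeholders, and `run`/`DRY_RUN` are hypothetical helpers that only print the commands when `DRY_RUN` is set. Verify the resulting failover order with `esxcli network vswitch standard portgroup policy failover get -p "Management Network"` before relying on it.

```shell
# Hypothetical helpers: ESXi 5.x CLI counterpart of the 1st Method.
# Set DRY_RUN=1 to print the esxcli commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Override the failover order on port group $1 so that only the uplinks
# in $2 (comma-separated, no spaces) are active.
pg_active_uplinks() {
  run esxcli network vswitch standard portgroup policy failover set \
    --portgroup-name="$1" --active-uplinks="$2"
}
```

For example, `pg_active_uplinks "Management Network" vmnic0` leaves a single active uplink, and `pg_active_uplinks "Management Network" vmnic0,vmnic1` puts both back in the active list (the port group override itself stays in place).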

To suppress the warning instead, set das.ignoreRedundantNetWarning to true (for ESX 3.x and 4.x):

   1. From the VMware Infrastructure Client, right-click on the cluster and click Edit Settings.
   2. Select vSphere HA and click Advanced Options.
   3. In the Options column, enter das.ignoreRedundantNetWarning
   4. In the Value column, type true.

      Note: Steps 3 and 4 create a new option.
   5. Click OK.
   6. Right-click the host and click Reconfigure for vSphere HA. This reconfigures HA.

To set das.ignoreRedundantNetWarning to true in the vSphere 5.1 Web Client:

   1. From the vSphere Web Client, right click on the cluster.
   2. Click on the Manage tab for the cluster, then under Settings click vSphere HA.
   3. Click on the Edit button in the top right corner.
   4. Expand the Advanced Options section, and click Add.
   5. In the Options column, type das.ignoreRedundantNetWarning.
   6. In the Value column, type true.
   7. Click OK.
   8. Right-click the host and click Reconfigure for vSphere HA.

2nd Method

If you do not have access through the vSphere Client or Web Client, you can take the NIC out of use from the command line through an SSH session to the ESX/ESXi host. Note that these commands act on the whole vSwitch or VDS and will NOT make the change at the level of a particular port group (VSS) or dvportgroup (VDS).

If using vSwitch (VSS)

#esxcfg-vswitch -U vmnicX vSwitchX

(Note: Replace X with the actual NIC number and vSwitch Number)
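A test usually needs the uplink back afterwards, and `esxcfg-vswitch -L` links it again. The sketch below wraps both steps with a wait in between so there is time to watch the redundancy warning; it assumes a standard vSwitch, the vmnic/vSwitch names are placeholders, and `run`/`DRY_RUN` are hypothetical helpers that print the commands instead of running them.

```shell
# Hypothetical helpers for the 2nd Method on a standard vSwitch.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = vmnic, $2 = vSwitch, $3 = seconds to stay degraded (default 120)
unlink_then_relink() {
  run esxcfg-vswitch -U "$1" "$2"   # unlink the uplink from the vSwitch
  run sleep "${3:-120}"             # window to observe the warning
  run esxcfg-vswitch -L "$1" "$2"   # link the uplink back
}
```

Example: `unlink_then_relink vmnic1 vSwitch0 300` keeps vSwitch0 on one uplink for five minutes.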

For ESXi 5.x

To remove a network card (vmnic) from a standard vSwitch, use this command:

#esxcli network vswitch standard uplink remove --uplink-name=vmnicX --vswitch-name=vSwitchX

Alternatively, change the link status of one of the uplink vmnics with this command:

#esxcli network nic down -n vmnicX
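Both ESXi 5.x commands above have counterparts that undo them: `esxcli network nic up` and `esxcli network vswitch standard uplink add`. A minimal sketch, assuming placeholder vmnic/vSwitch names and hypothetical `run`/`DRY_RUN` helpers that only print the commands when `DRY_RUN` is set:

```shell
# Hypothetical helpers for the ESXi 5.x variant of the 2nd Method.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Take uplink $1 down, wait $2 seconds (default 120), then bring it up.
nic_bounce() {
  run esxcli network nic down -n "$1"
  run sleep "${2:-120}"
  run esxcli network nic up -n "$1"
}

# Re-attach uplink $1 to standard vSwitch $2 after "uplink remove".
uplink_readd() {
  run esxcli network vswitch standard uplink add \
    --uplink-name="$1" --vswitch-name="$2"
}
```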

If using VDS

To get the dvPort ID of the uplink on the VDS:

#esxcfg-vswitch -l | more

To unlink the uplink from VDS

#esxcfg-vswitch -Q vmnicX -V dvPort_ID_of_vmnic dvSwitch
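The `-Q` (delete DVPort uplink) option has an `-P` (add DVPort uplink) counterpart that takes the same arguments, so the uplink can be linked back to the same dvPort. A sketch under assumptions: the dvPort ID comes from `esxcfg-vswitch -l`, names are placeholders, and `run`/`DRY_RUN` are hypothetical print-only helpers.

```shell
# Hypothetical helpers for the VDS variant of the 2nd Method.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = vmnic, $2 = dvPort ID of that uplink, $3 = dvSwitch name
dvs_unlink() {
  run esxcfg-vswitch -Q "$1" -V "$2" "$3"
}

# Same arguments as dvs_unlink: link the vmnic back to its dvPort.
dvs_relink() {
  run esxcfg-vswitch -P "$1" -V "$2" "$3"
}
```

Note the dvPort ID before unlinking; it is needed again for the relink.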

3rd Method

On ESX/ESXi 3.x/4.x and ESXi 5.0 you can change the VLAN ID of the port group that carries the Management Network through SSH/DCUI:

#esxcfg-vswitch -v VLAN_ID -p "Service Console" vSwitch0

(Note: Replace VLAN_ID with the VLAN to test with. On ESXi 5.1 the network rollback feature will not allow a change that disconnects the ESXi host from vCenter Server, but you can still try the command, as one uplink is still available.)
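The same VLAN change is available through esxcli on ESXi 5.x. A sketch under assumptions: port group names, VLAN IDs and `run`/`DRY_RUN` (print-only helpers) are placeholders, and you should record the original VLAN ID from `esxcfg-vswitch -l` first so you can set it back (0 means no VLAN tag on a standard vSwitch).

```shell
# Hypothetical helpers for the 3rd Method.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# ESX/ESXi 3.x/4.x/5.0: set VLAN $2 on port group $1 of vSwitch $3.
pg_vlan_legacy() {
  run esxcfg-vswitch -v "$2" -p "$1" "$3"
}

# ESXi 5.x: set VLAN $2 on standard port group $1 with esxcli.
pg_vlan_5x() {
  run esxcli network vswitch standard portgroup set -p "$1" -v "$2"
}
```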

4th Method

You can also change the load balancing policy from the command line on the vSwitch that carries the Management Network (Service Console or VMkernel). This assumes that vSwitch has only one uplink, and that a second vSwitch with a second Management Network (Service Console/VMkernel) and its own uplink is configured separately.

Set a teaming policy that does not match the physical switch configuration; the mismatch results in loss of network connectivity on the uplink(s) used by the Management Network VSS/dvportgroup.

To change the load balancing policy to IP hash on ESX/ESXi 3.x/4.x, run:

#vim-cmd /hostsvc/net/vswitch_setpolicy --nicteaming-policy='loadbalance_ip' vSwitch0

To change the load balancing policy to IP hash on an ESXi 5.x host, run this command:

#esxcli network vswitch standard policy failover set -l iphash -v vSwitch0

* To change the load balancing policy to Route based on the originating virtual port ID (ESX/ESXi 3.x/4.x), run this command:

#vim-cmd /hostsvc/net/vswitch_setpolicy --nicteaming-policy='loadbalance_srcid' vSwitch0

* To change the load balancing policy to Route based on source MAC hash (ESX/ESXi 3.x/4.x), run this command:

#vim-cmd /hostsvc/net/vswitch_setpolicy --nicteaming-policy='loadbalance_srcmac' vSwitch0

Or, using vimsh, set the load balancing policy. The options are:

    *      Port ID = loadbalance_srcid
    *      IP Hash = loadbalance_ip
    *      MAC = loadbalance_srcmac
    *      Failover Only = failover_explicit

Here is an example command:

#vimsh -n -e "hostsvc/net/vswitch_setpolicy --nicteaming-policy loadbalance_ip vSwitchX"

Refresh network settings with the command:

#vimsh -n -e "/internalsvc/refresh_network"

Restart the management service with the command:

#service mgmt-vmware restart
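The option list above maps directly onto a small lookup, which avoids typing the wrong value (for example `loadbalance_iphash` instead of `loadbalance_ip`). A sketch for ESX/ESXi 3.x/4.x, assuming `vim-cmd` is available; the short names, function names and `run`/`DRY_RUN` print-only helpers are hypothetical. The network refresh and management service restart above still apply afterwards.

```shell
# Hypothetical helpers for the 4th Method on ESX/ESXi 3.x/4.x.
# Set DRY_RUN=1 to print the command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Translate a short policy name into the vim-cmd value from the list above.
vimcmd_policy() {
  case "$1" in
    portid)   echo loadbalance_srcid ;;
    iphash)   echo loadbalance_ip ;;
    mac)      echo loadbalance_srcmac ;;
    failover) echo failover_explicit ;;
    *)        echo "unknown policy: $1" >&2; return 1 ;;
  esac
}

# Set teaming policy $1 (portid|iphash|mac|failover) on vSwitch $2.
set_lb_4x() {
  policy=$(vimcmd_policy "$1") || return 1
  run vim-cmd /hostsvc/net/vswitch_setpolicy --nicteaming-policy="$policy" "$2"
}
```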

On ESXi 5.x:

* To change the load balancing policy to Route based on the originating virtual port ID, run this command:

#esxcli network vswitch standard policy failover set -l portid -v vSwitch0

* To change the load balancing policy to Route based on source MAC hash, run this command:

#esxcli network vswitch standard policy failover set -l mac -v vSwitchX

* To change the load balancing policy to Route based on IP hash, run this command:

#esxcli network vswitch standard policy failover set -l iphash -v vSwitchX

Restart the management service with the command:

#/etc/init.d/hostd restart
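On ESXi 5.x the `-l` option accepts `portid`, `mac`, `iphash` and `explicit` (use explicit failover order, not shown in the commands above). The sketch below rejects anything else before calling esxcli; the function name and the `run`/`DRY_RUN` print-only helpers are hypothetical.

```shell
# Hypothetical helper for the 4th Method on ESXi 5.x.
# Set DRY_RUN=1 to print the command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Set load balancing $1 (portid|mac|iphash|explicit) on vSwitch $2.
set_lb_5x() {
  case "$1" in
    portid|mac|iphash|explicit) ;;
    *)
      echo "unknown policy: $1 (use portid, mac, iphash or explicit)" >&2
      return 1
      ;;
  esac
  run esxcli network vswitch standard policy failover set -l "$1" -v "$2"
}
```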

5th Method

Using the Cisco Discovery Protocol (CDP), find the uplink switch port (assuming the uplink switches are Cisco branded), connect to the switch using SSH, open the configuration of the port used by one of the Management Network uplinks, and shut the port down administratively.

You can click the small blue icon beside the vmnic to see the CDP information, as shown in the attachment.





In the vSphere Client, the disconnected NIC shows a red X, which means the link is down.

The command "esxcfg-nics -l" also shows the link as Down for that NIC.
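The `esxcfg-nics -l` check can be scripted to confirm from the host that the switch port really went down. A minimal sketch, assuming the ESX/ESXi 4.x/5.x output layout in which Link (Up/Down) is the fourth column; the function names and the 5-second polling interval are illustrative choices, not part of any VMware tool.

```shell
# Hypothetical helpers for the 5th Method.

# Read `esxcfg-nics -l` output on stdin and print the Link column
# (Up/Down) of the NIC named $1.
link_state() {
  awk -v nic="$1" '$1 == nic { print $4 }'
}

# Poll every 5 seconds, up to $2 times (default 24), until NIC $1
# reports Down. Returns 0 once it is down, 1 if it never goes down.
wait_for_link_down() {
  n=0
  while [ "$n" -lt "${2:-24}" ]; do
    if [ "$(esxcfg-nics -l | link_state "$1")" = "Down" ]; then
      echo "$1 link is down"
      return 0
    fi
    n=$((n + 1))
    sleep 5
  done
  echo "$1 link is still up" >&2
  return 1
}
```

Run `wait_for_link_down vmnic1` on the host just before shutting the switch port.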

6th Method

By changing the MTU of the port group/dvportgroup used for the Service Console/VMkernel Management Network.

On ESX/ESXi 3.x/4.x, if the interface uses the default MTU of 1500, set it to 9000 with the command below; if it is already set to 9000, set it to 1500 instead.

First find the VMkernel or Service Console interface using these commands:

#esxcfg-vswif -l

#esxcfg-vmknic -l

Find the port group/dvportgroup name in the output of the above commands, then run:

# esxcfg-vmknic -m 9000 portgroup_name

Run this command for ESXi 5.x:

# esxcli network ip interface set -m 9000 -i vmk_interface
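On ESXi 5.x the change and its rollback can be done in one go. A sketch under assumptions: record the current MTU first with `esxcli network ip interface list`, the vmk name and timings are placeholders, and `run`/`DRY_RUN` are hypothetical print-only helpers.

```shell
# Hypothetical helper for the 6th Method on ESXi 5.x.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Set MTU $2 on VMkernel interface $1, wait $4 seconds (default 120),
# then restore the original MTU $3.
mtu_flip() {
  run esxcli network ip interface set -m "$2" -i "$1"
  run sleep "${4:-120}"
  run esxcli network ip interface set -m "$3" -i "$1"
}
```

Example: `mtu_flip vmk0 9000 1500` for an interface that normally runs at 1500.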

7th Method (needs more testing)

Connect to the ESX/ESXi host using SSH (PuTTY) or any other tool and run the following command, which will blink the NIC for 5-6 seconds:

#ethtool -p vmnicX

8th Method

You can physically pull the cable that carries the management traffic from the back of the ESX/ESXi host. For this you need to know the port mapping and which cable serves the port(s) used by the Management Network on the VSS/VDS.

9th Method

Warning: This method uses the vsish interface, which is currently NOT SUPPORTED by VMware, so be careful when using it.


To block the link of a particular physical NIC in the VMkernel, run this command:

vsish -e set /net/pNics/vmnicX/block 1
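Because vsish is unsupported, it is worth scripting the block together with its reversal. This sketch assumes that writing 0 to the same node removes the block (verify this on a lab host first); the function names and the `run`/`DRY_RUN` print-only helpers are hypothetical.

```shell
# Hypothetical helpers for the 9th Method (unsupported vsish node).
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Write $2 to the block node of pNIC $1: 1 blocks (as in the command
# above); 0 is ASSUMED to unblock.
pnic_block() {
  run vsish -e set "/net/pNics/$1/block" "$2"
}

# Block pNIC $1 for $2 seconds (default 120), then unblock it.
pnic_block_for() {
  pnic_block "$1" 1
  run sleep "${2:-120}"
  pnic_block "$1" 0
}
```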

All of the above methods isolate the uplink connectivity of one NIC, which should break the redundancy status of the Management Network while the host stays connected to VirtualCenter Server/vCenter Server.

Enjoy the information, test your Management Network properly, and DO NOT use production hosts to test the above methods.

Share and Care !!

Cheers !!

Thursday, July 25, 2013

Simulating HA/FDM Failover on vSphere

While working on cases I have been asked this question many times, so I thought I would write a few words about it.

How do you simulate an HA/FDM failover in a vSphere environment and check the functionality of HA/FDM, its settings and the VMs?


Depending on the version of vSphere you are running, you can simulate HA/FDM in a few ways.


1st Method

In a 4.x environment running AAM-based HA, with two redundant NICs for the Management Network (VMkernel on ESXi, Service Console on ESX), track down the uplink switch ports and shut them down manually on the switch, or unplug the cables at the back of the physical ESX/ESXi host. This disconnects the Management Network/Service Console of the host.


2nd Method

In 5.x, HA uses FDM, which relies not only on network heartbeats but also on datastore heartbeats before the cluster declares a host isolated and applies the configured isolation response.

In our example we have only two ESX/ESXi hosts, each with two uplinks, and a single shared datastore.


So first, at the cluster level, set datastore heartbeating to None so that no datastore is used for heartbeating.


Now, for the Management Network, either remove the cables from the back of the ESXi host or shut down the uplink switch port(s).

Once the network is disconnected, the HA cluster waits for the default timeouts; when they expire it declares the host isolated, applies the Isolation Response configured for the VMs (the default is Leave VMs powered on) and may restart the VMs on the other, surviving host. We are not going into the details of the resources or capacity available to hold the VMs.
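If you cannot reach the switch or the back of the host, the network side of this method can be approximated on ESXi 5.x by taking every management uplink down with `esxcli network nic down` (the same command as in the Management Network redundancy post) and bringing the uplinks back once the isolation timeouts have expired. This is a sketch under assumptions: vmnic names and timings are placeholders, `run`/`DRY_RUN` are hypothetical print-only helpers, downing a vmnic affects every port group that uses it, and it must be started from the local console, DCUI/ESXi Shell or iLO/DRAC, because an SSH session over those uplinks will drop.

```shell
# Hypothetical helper for a lab-only HA/FDM isolation test on ESXi 5.x.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# isolate_for SECONDS VMNIC...: take every listed uplink down, wait,
# then bring them all back up.
isolate_for() {
  secs=$1
  shift
  for nic in "$@"; do
    run esxcli network nic down -n "$nic"
  done
  run sleep "$secs"
  for nic in "$@"; do
    run esxcli network nic up -n "$nic"
  done
}
```

Example: `isolate_for 300 vmnic0 vmnic1` isolates a host with two management uplinks for five minutes.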


3rd Method

Disconnecting the network keeps the host running, so if you have the ability you can instead shut the host down by powering it off physically or by using a KVM, DRAC, iLO, RSA or similar utility.


4th Method


This last method may or may not work: it involves unloading the network and storage driver modules, and in-depth knowledge is required to find and unload/load them. Depending on the components used and the version of ESX/ESXi, the commands may vary.

If you feel any information needs to be added or modified, please leave a comment and I will update the post.

Thanks for your time.

Share and Care !!

Saturday, September 15, 2012

Critical Articles Repository for vSphere 5.1


Hi,

As everyone now knows, vSphere 5.1 has been released, and much of the community may have started installing or upgrading their environments with the new bits.

Assuming all the necessary downloads are done, planning for setting up the new components must have started.

Here are some of the critical articles that will make your install/upgrade process smoother. Go through them if possible to avoid last-minute surprises and gotchas.

Installing/Upgrading (Including SSO) KB Articles:
vCenter Single Sign On KB articles:

Troubleshooting Single Sign On (SSO) KB articles:


vSphere 5.1 Logs related KB articles:


Other vSphere 5.1 new features KB articles:
Other vSphere 5.1 new Storage/Networking/Fault KB articles:

Wednesday, June 27, 2012

VMworld Session rejected, now What?

A few days back I saw my Twitter feed full of VMworld session submissions, with requests to vote for them so they get accepted.

AFAIK it is a 5-day event in the US and a 3-day event in Europe.

Each presenter gets about 20-45 minutes for their presentation, though I could be wrong about this duration.

Within those 5 days, maybe 200 or so people whose presentations were accepted will be invited, and those are the lucky ones whose trip to SF will be worth it. The last I heard, there were around 1600 submissions this year.

So if about 200 people are invited, around 1400 will be rejected. There should be no hate or sorrow about being rejected, as anyone who is going to present has to beat the other 1599 presentations (in terms of quality and content). This is not everyone's cup of tea. I am aware of the kind of presentations that get accepted at VMworld and who presents them: top-notch professionals with lots of experience with VMware and the partner products that go together with it. If your presentation focuses on only one area, it competes with ones that cover more than one area. A very simple mechanism, but very hard to incorporate.

One day I thought about how to use this uncovered knowledge within the community for a better cause and spread it further: giving people a second chance to share their knowledge and experience through a presentation that did not make it through VMworld approval. I tweeted about it too, with a Dropbox folder link.


https://www.dropbox.com/sh/1r69spjan76tdvn/QRF17nGJ2R

Now you may ask why I am not using Octopus. My answer for now is that it is still in BETA and still proving itself, so to save time I decided to go with Dropbox for now.


I have not attended a single VMworld to date (a surprise, but a fact), and seeing blogs and tweets about the experiences of those who attend gives me a feeling that one day I will attend it.

So my suggestion to those people is to get ready and roll up your sleeves for VMworld 2013 from today.

But first share the knowledge you have, which will be obsolete by 2013. Sharing is caring.

Thanks for the support. I have received a few inquiries on how to upload presentations and have contacted those people offline, as there is no easy way to share the folder so that people can upload anything directly to it.

So I only need the email address of the person, to which I can send this link so they can upload to it.


https://www.dropbox.com/sh/1r69spjan76tdvn/QRF17nGJ2R


I suggest doing two things with your presentation before you upload it:

1) Watermark it if you want, so that viewers know who the contributor is, or leave the presentation without a watermark.

2) Save it as PPT or PDF so that people can download it and view it offline at any time.

If you don't have a Dropbox account, use this link http://db.tt/SgOFdTa8 to create one.

I hope this will help spread knowledge within the VMware Community.

Thanks again for making this happen and for setting an example of a community where knowledge does not stay in one location but travels quickly across regions.

Cheers