Tuesday, August 19, 2014

Web-Scale Infrastructure by Nutanix

virtualpatel.blogspot.com: Web-Scale Infrastructure by Nutanix: Hello, For last few weeks I was reading this term a lot on social media and kept wondering what exactly Web-Scale is. Nutanix has relea...

Sunday, August 17, 2014

Handling Outage with Double Mix - vCenter Server 5.5 & Web Client


Handling Outage with Double Mix - vCenter Server 5.5 and Webclient


Hope you have read my earlier post on recovering vCenter Server from the unexpected Outage in the Datacenter and how to revive vCenter Server and gain access to the inventory properly.

Now let's go a little deeper into another situation. With vSphere 5.5, a typical vSphere environment these days looks something like this:

vCenter Server 5.5 with the latest updates, installed on a virtual machine with the latest Hardware Version 10

Inventory Server on a separate virtual machine with Hardware Version 10

Nexus 1000v VDS / VMware VDS carrying all types of traffic: Management, VM, vMotion, iSCSI, NFS, FT, plus Control, Packet and Management for the Nexus 1000v

The ESXi host Management Network is on the VDS as well, so keep this in mind.

Now let's assume an outage occurred, you were able to connect to the ESXi host, and you were able to track down the vCenter Server. Oh man, that was a pain. I know that feeling, and hence I posted earlier about the available ways to recover vCenter Server from an outage. So let's continue.

If you check, the VM traffic goes through the VDS and the vCenter Server VM has one of the dvPortGroups selected. You have restored a standard vSwitch on the ESXi host through the DCUI, so you now have connectivity to the ESXi host using the vSphere Client.

The vCenter service does not start because the Inventory Service is running on another virtual machine. Luckily you found that VM and registered it on the same host as the vCenter Server.

But still the two VMs, vCenter Server and Inventory Server, can't connect to each other because both use a dvPortGroup on the VDS.

You have the vSwitch on the ESXi host which you can use temporarily, but how do you change the vNIC settings on the VMs?

This is where the trouble starts.

You can't edit the virtual machine settings, because you are connected to the ESXi host through the vSphere Client.

To change the vNIC settings of a Hardware Version 10 virtual machine you need to connect using the Web Client. But the Web Client needs vCenter, and the vCenter service won't start because it depends on the Inventory Service running on another virtual machine. So you are stuck in a loop: a chain reaction in which you can't make vCenter Server and Inventory Server talk to each other.

By design, you can't edit the settings of a HW 10 virtual machine without the Web Client.

So here is the trick on how to fix this.

First of all, shut down both the vCenter Server and Inventory Server virtual machines and remove them from the inventory. Right-click each virtual machine and select the "Remove from Inventory" option.

Let's assume you have a standard portgroup called "Test" created on the standard vSwitch.

Now you need to enable SSH access on the ESXi host and connect through PuTTY, or you can use the DCUI too.

Go to the virtual machine directory

cd /vmfs/volumes/datastore-name/vcenter-vm/

Now you can use the "vi" command to modify the .vmx file of the vCenter Server virtual machine, and then the .vmx file of the Inventory Server virtual machine.

Go to the line that holds the hardware version and enter edit mode by pressing a in vi.

virtualHW.version = "10"

Change the value from 10 to 8, save the file with :wq! and you will be back at the root prompt of the ESXi host.
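If you would rather not edit the file by hand, the same change can be sketched as a small shell function. This is only a rough example and not a tested procedure: the function name set_hw_version is made up for illustration, and it assumes the VM is powered off and removed from inventory, and that the hardware version is stored in the .vmx file as a line of the form virtualHW.version = "10".

```shell
# Sketch: set virtualHW.version in a .vmx file, keeping a backup copy.
# set_hw_version is a hypothetical helper name, not a VMware tool.
set_hw_version() {
    vmx_file="$1"      # path to the .vmx file
    new_version="$2"   # target hardware version, e.g. 8

    # Keep a copy so the original can be restored if something goes wrong.
    cp "$vmx_file" "$vmx_file.bak" || return 1

    # Rewrite only the virtualHW.version line; every other line is copied as-is.
    sed "s/^virtualHW\.version *= *\"[0-9]*\"/virtualHW.version = \"$new_version\"/" \
        "$vmx_file.bak" > "$vmx_file"
}
```

You would call it once per VM, for example set_hw_version /vmfs/volumes/datastore-name/vcenter-vm/vcenter-vm.vmx 8, where the .vmx file name is a placeholder for your own.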

Now go back to the vSphere Client, browse the respective datastores one at a time, and register both the vCenter Server virtual machine and the Inventory Server virtual machine in the inventory.

Now you can edit the settings using the vSphere Client and select the standard portgroup "Test" for both VMs.

Power on both VMs and verify that the vCenter service is running.

Once verified, you can open the Web Client and connect to vCenter Server, or you can use the vSphere Client as well. Make sure you have all the hosts and cluster settings as they were before the outage.

Now you should be back in business with minimal downtime.

I hope you find this information useful when you are hit by this limitation of Hardware Version 10. Hopefully it will be addressed in future products, so that you can change the settings of a virtual machine without any specific client requirement. With ESXi 5.5 U2 you can now edit the settings of a Hardware Version 10 VM using the vSphere Client. So update your ESXi hosts to get this benefit; other issues are resolved in this release too.

Please share and care !!

Enjoy !!

Sunday, August 10, 2014

Recovering #vCenter Server after the major Outage


Recovering vCenter Server after the major Outage


I have discussed this issue many times with colleagues and friends who experienced a situation where the whole Datacenter had an outage for some reason, and where, depending on the products running in the environment, it became difficult to restore everything back to normal.

The products which can add complexity to the restoration process include Nexus 1000v, SRM and VMware VDS. Let me emphasize that there is nothing wrong with these products; the point is that they make it harder to recover vCenter Server, which is a critical component of the whole infrastructure.

I am going to explain a few scenarios for recovering vCenter Server and getting everything up and running. This may take anywhere from minutes to a few hours, and sometimes a day or two, depending on the size and inventory of the Datacenter.

1) If running VMware VDS, Nexus 1000v VDS

First of all, you need to find out where the vCenter Server was last running.

Let's assume you are running a small number of ESXi hosts in the cluster/Datacenter (say 1-20). If the location is documented, or the vCenter was tied to certain hosts using DRS rules, you can connect directly to that particular ESXi host (using PuTTY or the DCUI, and later the vSphere Client) and find out whether the vCenter virtual machine is still registered on that host. This assumes the network connection is working; otherwise we need to get that working first. (If you are running the Web Client, I will cover that in an upcoming blog post.)

vmware-cmd -l

The above command tells us which VMs are still registered on the host, and you can power on the vCenter Server from the command line or from the vSphere Client. Once it is up, open the console of the vCenter VM and log in using the local administrator account. There is another assumption here: if the vCenter database resides on another VM or a physical machine, then there must be network connectivity between the DB machine and vCenter Server.
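If vmware-cmd is not available in your ESXi shell, vim-cmd can list the registered VMs and power one on. Here is a minimal sketch, assuming a VM display name without spaces. The helper name vmid_by_name and the VM name vcenter are made up for illustration.

```shell
# Sketch: look up a VM's numeric ID by display name from the output of
# `vim-cmd vmsvc/getallvms`, read on stdin. vmid_by_name is a hypothetical helper.
# Assumes the display name contains no spaces (it is matched as the second column).
vmid_by_name() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}

# On the ESXi host itself you would then run something like:
#   vmid=$(vim-cmd vmsvc/getallvms | vmid_by_name vcenter)
#   vim-cmd vmsvc/power.getstate "$vmid"
#   vim-cmd vmsvc/power.on "$vmid"
```

The first line of the getallvms output is a header, which is why the sketch skips it.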

Now let's take a situation where the Management Network of the ESXi host is configured on a VMware VDS. Here you need to restore the standard switch on the ESXi host so that at least you can connect to the management interface. Then you can try accessing vCenter Server from the console of the vCenter Server VM itself, using the localhost option, and you can see all the inventory from within the VC virtual machine. Once you power on all the ESXi hosts you should be able to see all the VMs and other inventory items.

If the VMs show as inaccessible or orphaned, you can simply unregister them and register them again. (This requires documentation of the virtual machine names, the networks they are connected to, and the datastores they are configured on.)

If you are running Nexus 1000v, make sure both VSMs are powered on and can reach vCenter Server and the ESXi hosts. If needed, you can register the VSMs on the same host as vCenter Server to make things easy. Once everything is in working order, you can put them back on the hosts where they belong as per the DRS rules (if any are specified for the Nexus VSMs).

2) Where you have a large number of hosts in the cluster (more than 20-50), you at least need the name of the datastore where the vCenter Server files reside (assuming no Storage vMotion occurred before the outage). It is good practice to designate a specific datastore for the vCenter Server virtual machine files, so that you can simply connect over SSH to any ESXi host that mounts that shared datastore.

You can set up SSH to the host, or use the local ESXi Shell option from the DCUI, then browse to the datastore and register the vCenter Server on that ESXi host.

vmware-cmd -s register .vmx


vim-cmd solo/registervm /vmfs/volumes/datastore_name/VM_directory/VM_name.vmx

Once you see the vCenter entry in the inventory, just power it on. Make sure the vNIC is connected. If the VM was on the VDS, it will be connected to one of the dvPortGroups on the VDS, which is not accessible because the vCenter service is not available yet. So you need to create a standard vSwitch on the ESXi host where you registered the vCenter Server VM. Provide at least one uplink which can carry the same VLAN traffic (if a VLAN was configured for the vCenter network) to get connectivity. Check whether you have a spare NIC which you can use on the VSS. If not (assuming all the NICs are used by the VDS), you need to take one of the NICs that was assigned to the VDS before the outage and use it on the VSS.
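The standard vSwitch can also be created from the ESXi shell with esxcli. The sketch below only prints the commands so you can review them before running anything. The helper name print_vss_commands and the example values (vSwitch1, vmnic1, Test, VLAN 100) are assumptions for illustration, and it assumes the uplink has already been freed from the VDS.

```shell
# Sketch: print the esxcli commands for a temporary standard vSwitch.
# print_vss_commands is a hypothetical helper; all four arguments are placeholders.
print_vss_commands() {
    vswitch="$1"     # name for the new standard vSwitch
    uplink="$2"      # physical NIC to attach, e.g. vmnic1
    portgroup="$3"   # port group name for the vCenter VM
    vlan="$4"        # VLAN ID of the vCenter network (0 for none)

    echo "esxcli network vswitch standard add --vswitch-name=$vswitch"
    echo "esxcli network vswitch standard uplink add --uplink-name=$uplink --vswitch-name=$vswitch"
    echo "esxcli network vswitch standard portgroup add --portgroup-name=$portgroup --vswitch-name=$vswitch"
    echo "esxcli network vswitch standard portgroup set --portgroup-name=$portgroup --vlan-id=$vlan"
}
```

For example, print_vss_commands vSwitch1 vmnic1 Test 100 prints four commands which you can paste into the ESXi shell one at a time once you are happy with them.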

3) Now assume you don't know the datastore or the ESXi host where the vCenter Server was last residing and running, and you have more than 50 or 100 hosts in the cluster.

Here comes the real part of this post, so be patient and read on.

The question is how to find the vCenter Server virtual machine directory.

There are a few methods you can go with.

a) Run a PowerCLI script across all the ESXi hosts. This is definitely a time-consuming task, as you need to connect to each host individually and run the command.

I can update the post if someone comes up with a PowerCLI one-liner, so please leave a comment.
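Until someone posts a PowerCLI one-liner, here is a rough alternative. It is not PowerCLI, just a plain shell loop over SSH that asks each host whether the VM is registered there. It assumes SSH is enabled on the hosts, root login works, and you have a text file with one host name per line. The function name find_vm_host and the SSH_CMD override (there only so the loop can be tried without real hosts) are made up for this sketch.

```shell
# Sketch: print every ESXi host on which a VM with the given name is registered.
# find_vm_host and SSH_CMD are hypothetical names, not part of any VMware tool.
find_vm_host() {
    vm_name="$1"     # display name of the vCenter VM
    host_list="$2"   # text file with one ESXi host name per line

    while read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue
        # -n stops ssh from swallowing the rest of the host list on stdin.
        if ${SSH_CMD:-ssh} -n "root@$host" vim-cmd vmsvc/getallvms 2>/dev/null \
            | grep -qw "$vm_name"; then
            echo "$host"
        fi
    done < "$host_list"
}
```

You would run it from an admin workstation, for example find_vm_host vcenter hosts.txt, where hosts.txt is your own list of hosts. It is still one connection per host, so on a large cluster it will take a while.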

b) If you are running a SQL database for vCenter Server, you need to find that VM first.
Log in to the SQL VM with a local admin account, or with a Domain Admin account if the DC/AD machine is available and accessible. Then log in to the SQL database using the Administrator account and run the following queries against the VC database.

The first query returns only a host ID:

select HOST_ID from VPX_VM where DNS_NAME like '%vcenter%'

Replace "vcenter" in the query above with the actual name of your vCenter Server VM. Then use the value it returns in place of 'x' in the second query:

select * from VPX_ENTITY where ID='x'

The result of the second query is the ESXi host where the vCenter Server VM was last registered and running.

Hopefully you can use the same queries on an Oracle database as well, but I am not sure. If someone is an Oracle expert, please leave a comment and I will update the post with the actual query for Oracle.

The above methods rely on certain assumptions, such as connectivity and having login information for vCenter Server, SQL Server, the ESXi hosts and so on, which are needed throughout the restore process.

4) Now the last situation, where you don't know the name of the vCenter Server, the ESXi host or the datastore. Here you just cross your fingers, pray to God and start digging for the VM in each and every direction possible. And first make a resolution that you will DOCUMENT everything about your virtual inventory going forward. I am not joking here, as I have seen instances like this too.

Let me know if you would like something added to or updated in the existing information and I will be happy to do it. I just need your comment through any available medium.

Please share and care !!

Thanks for your time.

Wednesday, August 6, 2014

#Web-Scale Infrastructure by #Nutanix


Web-Scale Infrastructure by Nutanix


For the last few weeks I have been reading this term a lot on social media and kept wondering what exactly Web-Scale is. Nutanix released their hyper-converged solution 3 years back. It is meant to meet the needs of varying industries and a variety of requirements, from SMB to mid-size, and from Enterprise level to Federal customers. Their NX- range of solutions covers this range of requirements.

The main components that make up Web-Scale can be listed as follows.

Hyper-Convergence on x86 nodes, which gives you the freedom to scale as required, one server at a time.

For an organization that has a budget for a certain department which is growing, this gives the ability to keep up with the growth by adding one Nutanix node (server) at a time. It gives you the freedom of not making a large investment all at once; you can spread it over the years to come. From an accounting and business management perspective this is very crucial, as you are not stuck with a single large investment and the ROI questions around it. Depreciation also counts at some point, but the Finance Dept. will be happy as they don't have to deal with that kind of situation.

Intelligence - This is provided by software developed by Nutanix itself, which runs on each individual node and is known as the CVM. The CVMs are in contact with each other all the time over the backend network connection. Even if one CVM goes down due to an underlying hardware failure or some other reason, none of the features and operations served by the nodes are affected. Everything keeps functioning as it is supposed to.

Distributed Architecture - This is another plus point. All the infrastructure-related information is stored in a distributed fashion across the Nutanix nodes. That covers almost everything, including data, metadata and operations occurring between virtual machines. Because this information is stored across the whole cluster, there is no information loss if one component fails. With this feature you can plan ahead and scale the infrastructure without worrying about hitting a limit. This scalability lets you plan appropriately without wasting resources and time (time is a factor too, as we need it to set up, implement and later maintain the infrastructure), which is very crucial for any organization from small to large.

Self-healing System : Nodes are built in such a way that the failure of a component does not cause a system crash or an operational impact on the production workload. Fault isolation and automatic recovery make this possible, keeping the nodes active and the overall system running.

API based automation and Analytics : These two features are the pillars of automation: analytics and system monitoring to achieve data-driven efficiency, and a REST-based programmable interface for Datacenter management. This is a big one, as going forward you can use it when working with various cloud-based systems, whether Private Cloud, Public Cloud or Hybrid Cloud.

Multi-application support : On Nutanix nodes you can run various types of applications, such as VDI, Big Data, Private Cloud, Enterprise Branch Office etc.

The following is an extract I have taken from one of the posts on the Nutanix blog:

  • Web-scale system enables a non-disruptive approach to disruptive tasks, such as rolling or forklift upgrades, expandable clusters, always-on clusters, and all workflows always done online. Examples of Nutanix scalability include ability of adding and removing nodes dynamically, rolling upgrades of one node at a time ( different nodes can be at different version during the rolling upgrade), datastore availability when there is a disk failure and many other features. Nutanix Engineering is adding more features to improve resiliency and simplify workflows.

This lets any IT admin breathe easier. Upgrades become critical where there are SLAs to meet, which can sometimes range from 99.99% to 99.999%. You can upgrade or scale the infrastructure without any disruption, which is very valuable to any type of industry working 24x7x365.


The above diagram is about the CAP theorem (Consistency, Availability and Partition Tolerance). You can see the clear relationship between the various components, and what Nutanix is offering pretty much aligns with each of them. The diagram again is courtesy of the Nutanix blog post.

I am finishing here with a pic of the very nice T-shirt I received from Nutanix. As it's a Wednesday today, it is a perfect day to Web-Scale it. ;-)

Go and use Hyper-Convergence with ease using the Web-Scale Infrastructure offered by Nutanix !!

Please share and care !!

Thanks for your time.

Saturday, August 2, 2014

Some facts about vCenter Server Inventory Service

The Inventory Service uses an XML database called xDB. Unfortunately, that database uses a flat file whose name ends in .log and which is stored on the filesystem of the VC. During housekeeping, it is not uncommon for administrators to search for files that end in .log and remove them.

The Inventory Service is used to search the VC inventory and is also used when checking the VC Service status. It was created to offload the processing required to search the inventory, as the C# client hammers vpxd when searching.

By default, scripts are installed on the VC to back up and restore the IS database.

Under 5.0, the Inventory Service and vCenter Server both exist on the same machine. Under 5.1, vCenter Server and Inventory Service can exist on different machines.

After my digging, it appears a popular troubleshooting step is to reset the IS database. Our official documentation outlines how to do that:

http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.install.doc_50%2FGUID-EBB03FB7-F1AE-433C-A78D-A0E345EB0986.html

The instructions to reset the IS database are also supplied in the KB article below:

Logging in to the vCenter Server 5.0 Web Client fails with the error: unable to connect to vCenter Inventory Service http://kb.vmware.com/kb/2017750

Under 5.0, it appears you can reset the DB with no adverse effects, as the information in the database will repopulate, although I'm not sure how long that takes or how it is done.

Under 5.1, it appears the IS database includes the new inventory tagging information. I believe this feature is not well known or used very often, so most customers probably do not have any tagging information in their IS database.

In 5.5 the behavior still needs to be verified. If anyone has seen this, please leave a comment and I will update the post for 5.5.

Thanks for reading and share please !!

Sharing is caring !