Sunday, May 25, 2014

EMC VNXe 5100 and Checkpoint Folders on VMFS volumes


Came across an interesting situation where a virtual machine stopped responding when the external backup application tried to take a snapshot and the storage ran out of disk space.

Infrastructure details

ESXi: 5.x

VM: Windows 2Kx OS

Storage: EMC VNXe 5100

4 virtual disks (VMDKs) are attached to the VM, each on its own separate VMFS datastore.

Together, the 4 VMDKs consume about 4 TB.

The datastore summary shows 18.7 TB consumed out of 28.25 TB provisioned. This is the only VM using those 4 LUNs, one per VMDK.

So the question is: where did the additional 14 TB or so go, beyond the 4 TB used by the VM?

Here is the output of du -h for the VM folder and for the datastore under /vmfs/volumes:

/vmfs/volumes/50cc5dd8-555143cbd-44d2-ac162d783818/SVRAP01 # du -h
332.6G  .
/vmfs/volumes/50cc5dd8-555143cbd-87d5-ac162d783818/SVRAP01 # du -h
3.7T    .

du -h
8.0K    ./.ckpt_group.vmware_21_sg_443.fs.13/lost+found
272.0K  ./.ckpt_group.vmware_21_sg_443.fs.13/.etc
3.7T    ./.ckpt_group.vmware_21_sg_443.fs.13/SVRAP01
8.0K    ./.ckpt_group.vmware_21_sg_443.fs.13/.vSphere-HA
3.7T    ./.ckpt_group.vmware_21_sg_443.fs.13
8.0K    ./.ckpt_group.vmware_21_sg_441.fs.13/lost+found
272.0K  ./.ckpt_group.vmware_21_sg_441.fs.13/.etc
3.7T    ./.ckpt_group.vmware_21_sg_441.fs.13/SVRAP01
8.0K    ./.ckpt_group.vmware_21_sg_441.fs.13/.vSphere-HA
3.7T    ./.ckpt_group.vmware_21_sg_441.fs.13
8.0K    ./.ckpt_root_rep_ckpt_51_832916_2/lost+found
272.0K  ./.ckpt_root_rep_ckpt_51_832916_2/.etc
3.7T    ./.ckpt_root_rep_ckpt_51_832916_2/SVRAP01
8.0K    ./.ckpt_root_rep_ckpt_51_832916_2/.vSphere-HA
3.7T    ./.ckpt_root_rep_ckpt_51_832916_2
8.0K    ./.ckpt_root_rep_ckpt_51_832916_1/lost+found
272.0K  ./.ckpt_root_rep_ckpt_51_832916_1/.etc
3.7T    ./.ckpt_root_rep_ckpt_51_832916_1/SVRAP01
8.0K    ./.ckpt_root_rep_ckpt_51_832916_1/.vSphere-HA
3.7T    ./.ckpt_root_rep_ckpt_51_832916_1
8.0K    ./lost+found
272.0K  ./.etc
3.7T    ./SVRAP01
8.0K    ./.vSphere-HA
18.7T   .

So as you can see, the space used by the VM is approximately 4 TB, while the space used at the datastore level is 18.7 TB.
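To make the gap easier to see, here is a minimal Python sketch that tallies the top-level entries of the du -h output above, separating the .ckpt_* folders from everything else. The names parse_size, summarize and SAMPLE are my own for illustration, and the script assumes du -h prints binary suffixes (K, M, G, T). It only adds up numbers that du already reported; it does not touch the datastore.

```python
import re

# Multipliers for the suffixes printed by du -h (assumed to be binary units).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Top-level lines copied from the datastore-level du -h output above.
SAMPLE = """\
3.7T    ./.ckpt_group.vmware_21_sg_443.fs.13
3.7T    ./.ckpt_group.vmware_21_sg_441.fs.13
3.7T    ./.ckpt_root_rep_ckpt_51_832916_2
3.7T    ./.ckpt_root_rep_ckpt_51_832916_1
8.0K    ./lost+found
272.0K  ./.etc
3.7T    ./SVRAP01
8.0K    ./.vSphere-HA
18.7T   .
"""


def parse_size(text):
    """Convert a du -h size such as '3.7T' or '272.0K' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognized du size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit]


def summarize(du_output):
    """Return (checkpoint sizes by folder name, live bytes) for top-level entries."""
    checkpoints = {}
    live = 0.0
    for line in du_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        size, path = fields
        parts = path.strip().split("/")
        # Keep only "./name" entries; skip nested paths and the "." grand total.
        if len(parts) != 2 or parts[0] != ".":
            continue
        name = parts[1]
        if name.startswith(".ckpt_"):
            checkpoints[name] = parse_size(size)
        else:
            live += parse_size(size)
    return checkpoints, live


if __name__ == "__main__":
    checkpoints, live = summarize(SAMPLE)
    tib = 1024**4
    for name, size in checkpoints.items():
        print(f"{size / tib:6.1f}T  {name}")
    print(f"checkpoint total: {sum(checkpoints.values()) / tib:.1f}T")
    print(f"live data total:  {live / tib:.1f}T")
```

Four checkpoint folders at 3.7T each come to about 14.8 TB, which is the "missing" space. Adding the live 3.7T gives about 18.5 TB, close to the 18.7T that du reports for the whole datastore; the small difference is most likely rounding in the individual 3.7T figures.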

After more research, I found that these folders are created during EMC SAN root replication to preserve a replication pair which had no prior replication relationship. So technically this includes all 4 LUNs used by the virtual machine.

A few articles, here (EMC community) and here (by Justin Paul), point to the same culprit for the datastore space consumption and explain how to reclaim it.

As a workaround, we Storage vMotioned the smaller VMDK to another datastore, and now the VM can be powered on and users can work with it.

The recommended plan is to contact EMC support for guidance on how to reclaim that space properly without losing any data.

Hope this helps in tracking down lost space on a VMFS datastore!

Share and care please !!

Sunday, May 11, 2014

Virtual Disk Service entered into stopped State and Sophos AV


Recently I was part of a discussion involving the following infrastructure.

ESXi 5.1.0

VM - Windows 2008 R2

The user was getting the following error in Event Viewer inside the guest operating system:

"Virtual disk service entered into a Stopped State"

There was no operational impact on the VM, and multiple VMs were in the same situation.

No errors were logged in the virtual machine's vmware.log either. The VMkernel log had no entries for such events, so the focus shifted to the guest OS of the VM.

Upon investigating further, we found that the user was running Sophos AV inside the virtual machine at the guest operating system level. I am unsure of the exact AV version, but once we stopped the service/application within the guest operating system, the error no longer appeared in Event Viewer.

A thread discussing version 7.6.18 (updated 4/15/2010) is what led the investigation toward the guest OS level.

Hope this helps other users who are encountering the same issue.

Please share and care !!