Thursday, December 17, 2015

vCenter Server Maintenance Best Practices





I was looking for best practices for vCenter Server maintenance that cover everything, not just the database. I went through the public documentation from VMware and other sources and compiled what I found in this post. If you feel anything is missing or needs an update, please reach out to me through any available channel and I will update the post.

While you are at it, you should also look at the database maintenance plan (if you do not have one set up already) in the blog post by my friend and fellow blogger Erik Bussink (@ErikBussink).

You can change the frequencies depending on your requirements; the values posted here are not hard-coded and will differ from organization to organization.

Virtual Center Roles (Yearly)

• VirtualCenter Administrators: super users who have all privileges on all systems
• Virtual Machine Administrators: administrators on a subset of servers; can perform all operations on their servers, including VM provisioning, resource allocation and VMotion
• Virtual Machine User: access to a subset of VMs; can use remote console, perform power operations, view performance graphs, but cannot create/delete VMs, set resources or move VMs.
• Read-Only User: can only view information on a subset of VMs
• Privilege Management
• Administrators on the Windows system running the Management Server are automatically assigned VirtualCenter Administrator privileges
• VirtualCenter Administrators can delegate privileges to other users by accessing an existing Active Directory or domain controller

Best Practices for Templates (Quarterly)

Virtual machine templates are very powerful and versatile. The following best practices, culled from many different areas of IT infrastructure management, will enable you to derive the most value from templates and avoid starting ineffective habits.

• Install Antivirus software and keep it up to date: In today’s world of viruses that are hyper efficient at exploitation and replication, an OS installation routine has to merely initialize the network subsystem to be vulnerable to attack. By deploying virtual machines with up to date antivirus protection, this exposure is limited. Keep the antivirus software current every month by converting the templates to VMs, powering on, and updating the signature files.

• Install the latest operating system patches, and stay current with the latest releases: Operating system vulnerabilities and out of date antivirus software can increase exposure to exploitation significantly, and current antivirus software isn’t enough to keep exposure to a minimum. When updating a template’s antivirus software, apply any relevant OS patches and hotfixes.

• Use the template notes field to store update records: A good habit to get into is to keep information about the maintenance of the template in the template itself, and the Notes field is a great place to keep informal update records.

• Plan for ESX Server capacity for template management: The act of converting a template to virtual machine, powering it on, accessing the network to obtain updates, shutting down, and converting back to template requires available ESX Server resources. Make sure there are ample resources for this very important activity.

• Use a quarantined network connection for updating templates: The whole point of keeping antivirus and operating systems up to date is to avoid exploitation, so leverage the ability of ESX Server to segregate different kinds of network traffic and apply updates in a quarantined network.

• Use the same datastore for storing templates and for powered on templates: During the process of converting templates to virtual machines, do not deploy the template to another datastore. It is faster and more efficient to keep the template’s files in the same place before and after the update.

• Install the VMware Tools in the template: The VMware Tools include optimized drivers for the virtualized hardware components that use fewer physical host resources. Installing the VMware Tools in the template saves time and reduces the chance that a sub optimally configured virtual machine will be deployed to your production ESX Server infrastructure.

• Use a standardized naming convention for templates: Some inventory panel views do not offer you the opportunity to sort by type, so create a standard prefix for templates to help you intuitively identify them by sorting by name. Also, be sure to include enough descriptive information in the template name to know what is contained in the template.

• Defragment the guest OS filesystem before converting to template: Most operating system installation programs create a highly fragmented filesystem even before the system begins its useful life. Defragment the OS and convert to template, and that way you won’t have to worry about it again until the system has been in production for a while.

• Remove Nonpresent Hidden Devices from Templates: This problem occurs when Windows keeps configuration information about certain devices, notably network devices, even after they are removed from the system. Refer to Microsoft TechNet article 269155 for removal instructions.

• Use Folders to Organize and Manage Templates: Folders can be both an organizational and security container. Use them to keep templates organized and secure.

• Create Active Directory groups that map to VirtualCenter roles: Rather than assign VirtualCenter roles to individual user accounts, create dedicated Active Directory groups, and place user accounts in those groups.

• Store templates on a shared VMFS volume on the SAN (dedicated LUN) and enable access to the SAN-based template volume from all ESX servers

• SAN templates may only be provisioned to target hosts connected to SAN

• The VC Mgmt Server’s local template repository can be used to provision VMs onto ESX Servers that are not connected to the SAN

• If template deployments to a LUN fail due to SCSI reservations, increase the “Scsi.ConflictRetries” parameter to a value of “10” through the Advanced Settings menu

Roll up Jobs (Quarterly) 

Ensure that the jobs listed in this table are installed:

Note: For managing an Oracle vCenter Server database, you can use Oracle SQL Developer and for managing a DB2 vCenter Server database, you can use DB2 Control Center.

Rollup Job                                 Corresponding File
Event Task Cleanup myDB                    job_cleanup_events_DB.sql
Past Day stats rollup myDB                 job_schedule1_DB.sql
Past Month stats rollup myDB               job_schedule3_DB.sql
Past Week stats rollup myDB                job_schedule2_DB.sql
Process Performance Data myDB              job_dbm_performance_data_DB.sql
Property Bulletin Daily Update myDB (*)    job_property_bulletin_DB.sql
Topn past day myDB                         job_topn_past_day_DB.sql
Topn past month myDB                       job_topn_past_month_DB.sql
Topn past week myDB                        job_topn_past_week_DB.sql
Topn past year myDB                        job_topn_past_year_DB.sql

(*) Note: This job only applies to vCenter Server 5.x.

where DB is db2, mssql, or oracle.


Note: Ensure that myDB references the vCenter Server database and not the master or some other database. If these jobs reference any other database, you must delete and recreate the jobs. 
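
The check against the table above can be scripted once the job names have been read from SQL Server Agent. The following is only a minimal Python sketch, not VMware tooling: the function name missing_rollup_jobs and its inputs are hypothetical, and it assumes you have already copied the installed job names into a plain list.

```python
# Hypothetical helper (not part of vCenter Server): compares installed
# SQL Server Agent job names against the rollup jobs in the table above.
EXPECTED_JOB_PREFIXES = [
    "Event Task Cleanup",
    "Past Day stats rollup",
    "Past Week stats rollup",
    "Past Month stats rollup",
    "Process Performance Data",
    "Property Bulletin Daily Update",  # applies to vCenter Server 5.x only
    "Topn past day",
    "Topn past week",
    "Topn past month",
    "Topn past year",
]


def missing_rollup_jobs(installed_jobs, db_name):
    """Return the expected job names for db_name that are not installed.

    A job that exists but points at another database (for example
    "Topn past day master") is reported as missing, because the job name
    must end with the vCenter Server database name.
    """
    installed = {name.strip().lower() for name in installed_jobs}
    expected = [f"{prefix} {db_name}" for prefix in EXPECTED_JOB_PREFIXES]
    return [job for job in expected if job.lower() not in installed]
```

For example, missing_rollup_jobs(["Event Task Cleanup VCDB", "Topn past day master"], "VCDB") reports every job except the event cleanup job, including the Topn past day job that references the wrong database.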


Stored Procedures (Quarterly)

Verifying the stored procedures installed in vCenter 5.5 and 6.0
To check the stored procedures installed in vCenter Server 5.5 and 6.0 using MS SQL:
1.     Navigate to vCenter DB > Programmability > Stored Procedures.
2.     Ensure that the stored procedures listed in this table are installed:
Stored Procedure                  Corresponding File
calc_topn1_proc                   calc_topn1_proc_DB.sql
calc_topn2_proc                   calc_topn2_proc_DB.sql
calc_topn3_proc                   calc_topn3_proc_DB.sql
calc_topn4_proc                   calc_topn4_proc_DB.sql
cleanup_events_tasks_proc         cleanup_events_DB.sql
clear_topn1_proc                  clear_topn1_proc_DB.sql
clear_topn2_proc                  clear_topn2_proc_DB.sql
clear_topn3_proc                  clear_topn3_proc_DB.sql
clear_topn4_proc                  clear_topn4_proc_DB.sql
delete_stats_proc                 delete_stats_proc_DB.sql
insert_stats_proc                 insert_stats_proc_DB.sql
l_purge_stat2_proc                l_purge_stat2_proc_DB.sql
l_purge_stat3_proc                l_purge_stat3_proc_DB.sql
l_stats_rollup1_proc              l_stats_rollup1_proc_DB.sql
l_stats_rollup2_proc              l_stats_rollup2_proc_DB.sql
l_stats_rollup3_proc              l_stats_rollup3_proc_DB.sql
load_stats_proc                   load_stats_proc_DB.sql
load_usage_stats_proc             load_usage_stats_proc_DB.sql
process_license_snapshot_proc     process_license_snapshot_DB.sql
process_performance_data_proc     process_performance_data_DB.sql
purge_stat2_proc                  purge_stat2_proc_DB.sql
purge_stat3_proc                  purge_stat3_proc_DB.sql
purge_usage_stat_proc             purge_usage_stats_proc_DB.sql
rule_topn1_proc                   rule_topn1_proc_DB.sql
rule_topn2_proc                   rule_topn2_proc_DB.sql
rule_topn3_proc                   rule_topn3_proc_DB.sql
rule_topn4_proc                   rule_topn4_proc_DB.sql
stats_rollup1_proc                stats_rollup1_proc_DB.sql
stats_rollup2_proc                stats_rollup2_proc_DB.sql
stats_rollup3_proc                stats_rollup3_proc_DB.sql
upsert_last_event_proc            upsert_last_event_proc_DB.sql

where DB is db2, mssql, or oracle.
If any of these jobs or stored procedures are missing, you must install them by running the corresponding .sql file on the vCenter Server database using a database management tool such as SQL Server Management Studio. For more information on running these .sql files, see the section Adding the SQL Server Agent Jobs in the VMware Knowledge Base article Updating rollup jobs after the error: Performance data is currently not available for this entity (1004382).

Notes

  • The upsert_last_event_proc procedure is not required for the Oracle database.
  • If there is a custom schema, the following command must also be run:

    alter schema schema_name transfer dbo.stored_procedure_name
All SQL scripts are located in the vCenter Server installation folder:
  • vCenter Server 5.1 and 5.5: C:\Program Files\VMware\Infrastructure\VirtualCenter Server\sql.  
  • vCenter Server 6.0: C:\Program Files\VMware\vCenter Server\vpxd\sql
For more information on commonly used vCenter Server installation paths, see Common vCenter Server and vSphere Client Windows paths (1028185).


Ensure that the vCenter Server database is the target before executing the SQL file.
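
To turn the stored procedure table into a to-do list, the mapping from a missing procedure to its .sql file can be sketched in Python. This is an illustration under assumptions: expected_procedures and scripts_to_run are hypothetical names, the list of installed procedures is assumed to have been copied from Programmability > Stored Procedures, and the file name exceptions are taken from the table above.

```python
# File names that do not follow the default "<procedure>_<DB>.sql" pattern,
# copied from the table above ("{db}" stands for db2, mssql, or oracle).
SPECIAL_FILES = {
    "cleanup_events_tasks_proc": "cleanup_events_{db}.sql",
    "process_license_snapshot_proc": "process_license_snapshot_{db}.sql",
    "process_performance_data_proc": "process_performance_data_{db}.sql",
    "purge_usage_stat_proc": "purge_usage_stats_proc_{db}.sql",
}


def expected_procedures():
    """Build the 31 procedure names listed in the table above."""
    names = []
    for family in ("calc_topn", "clear_topn", "rule_topn"):
        names += [f"{family}{i}_proc" for i in range(1, 5)]
    for family in ("l_stats_rollup", "stats_rollup"):
        names += [f"{family}{i}_proc" for i in range(1, 4)]
    for family in ("l_purge_stat", "purge_stat"):
        names += [f"{family}{i}_proc" for i in (2, 3)]
    names += [
        "delete_stats_proc",
        "insert_stats_proc",
        "load_stats_proc",
        "load_usage_stats_proc",
        "upsert_last_event_proc",
    ]
    names += list(SPECIAL_FILES)
    return sorted(names)


def scripts_to_run(installed, db):
    """Return the .sql files for procedures that are missing from `installed`."""
    if db not in ("db2", "mssql", "oracle"):
        raise ValueError("db must be db2, mssql, or oracle")
    have = {name.strip().lower() for name in installed}
    files = []
    for proc in expected_procedures():
        if proc == "upsert_last_event_proc" and db == "oracle":
            continue  # not required for the Oracle database (see Notes)
        if proc not in have:
            template = SPECIAL_FILES.get(proc, proc + "_{db}.sql")
            files.append(template.format(db=db))
    return files
```

The returned file names are relative to the sql folder of the vCenter Server installation listed above; the sketch only names the files and does not run them.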


Growth of a Database (Quarterly)

Determining what is growing in the vCenter Server database

The vCenter Server database is a complex database and there are several areas that can cause problems. Out of the many tables in vCenter Server, there are very few which accumulate data during regular operation. These tables do accumulate data during regular operation:
  • vpx_hist_stat1 to vpx_hist_stat4 in vCenter Server 4.x and vpx_hist_stat1_n to vpx_hist_stat4_n in vCenter Server 5.x – These tables store the collected performance data information.
  • vpx_sample_time1 to vpx_sample_time4 – These tables store the reference time frames for the performance data in the vpx_hist_stat tables.
  • vpx_event and vpx_event_arg – These tables store the event information from the Tasks and Events tab in vCenter Server.
  • vpx_task – stores the task information from the Tasks and Events tab in vCenter Server.
This small subset of the vCenter Server tables accounts for the majority of cases of substantial database growth. If any other table is showing growth, file a support request with VMware Technical Support and mention the affected table in the problem description. For more information, see How to Submit a Support Request.
Microsoft SQL
If you are using Microsoft SQL, there are three ways to validate where space is being consumed within a Microsoft SQL database. Select one method.
  • From the SQL Management Studio interface, navigate to the database, right-click the table, and select Properties. See the Data space in the Storage section of the screen.
  • Manually run this SQL query against the vCenter Server database:

    select object_name(id) [Table Name], 
    [Table Size] = convert (varchar, dpages * 8 / 1024) + 'MB' 
    from sysindexes where indid in (0,1) 
    order by dpages desc

    This query lists all tables in the vCenter Server database by table size in MB.
  • Manually run this SQL query for individual tables:

    exec sp_spaceused tablename;

    See the data column of the output.

    Note: Querying the database one table at a time may be time consuming. To query all tables in one pass, use this SQL query:

    EXEC sp_MSforeachtable @command1="EXEC sp_spaceused '?'"
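
Once you have the table sizes, the rule described above (only the stats, sample time, event and task tables are expected to grow) can be applied automatically. Here is a small Python sketch of that rule. The function name, the 1024 MB threshold and the sample figures are my assumptions for illustration, and the sizes are assumed to have been exported from one of the queries above into a dictionary.

```python
import re

# Tables that normally accumulate data, per the list above
# (vpx_hist_stat1..4 with an optional _n suffix for vCenter Server 5.x).
EXPECTED_GROWTH = re.compile(
    r"^(vpx_hist_stat[1-4](_\d+)?|vpx_sample_time[1-4]"
    r"|vpx_event|vpx_event_arg|vpx_task)$",
    re.IGNORECASE,
)


def unexpected_large_tables(table_sizes_mb, threshold_mb=1024):
    """Return (table, size_mb) pairs at or above threshold_mb that are NOT
    in the expected-growth set, largest first.

    threshold_mb is an arbitrary example value, not a VMware recommendation.
    """
    flagged = [
        (name, size)
        for name, size in table_sizes_mb.items()
        if size >= threshold_mb and not EXPECTED_GROWTH.match(name)
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up sizes in MB, for illustration only.
sample = {
    "VPX_HIST_STAT1_12": 5000,
    "VPX_EVENT_ARG": 3000,
    "VPX_TEXT_ARRAY": 2048,
    "VPX_HOST": 10,
}
print(unexpected_large_tables(sample))
```

Anything this returns is a table that is large but not on the expected list, which is the case where the post suggests opening a support request.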

Growth of Transaction Logs (Quarterly)

vCenter Server Transaction log growth when using Microsoft SQL

The Transaction log records all transactions that occur on the database.
Depending on the recovery model that is set on the database, you may notice growth of the transaction log. The recovery model for the database can dramatically affect database growth for any database.
There are three different recovery models for Microsoft SQL:
  • Full Recovery Model

    This model logs all transactions, which makes full failure recovery possible. It provides the greatest amount of recovery potential in case of a failure that impacts the database, but it uses the most disk space of all of the models.
  • Bulk-Logged Recovery Model 

    This model logs all transactions except for certain large scale operations such as Index creation or bulk load operations. A full backup is typically performed after a large insert of information, but this model does not consume as much disk space.
  • Simple Recovery Model 

    This model logs all transactions, but the log records of a completed transaction are discarded rather than kept for later recovery. It uses the least amount of disk space of all the models, but it also offers the least amount of recovery. As such, regular full backups need to be taken.
By default, Microsoft SQL uses the full recovery model for its databases. Because of the large number of transactions in the vCenter Server database, the vCenter Server installer displays a warning that indicates the recovery model set on the database.

Regardless of the recovery model, VMware recommends that you take regular backups of the database and that the transaction log is truncated at the same time as the backup. This regular maintenance prevents the transaction logs from consuming the disk space available to the system. For more information on the transaction logs and how to shrink them, see the Microsoft SQL Server documentation.


Reducing the size of SQL Database (Quarterly) – to be verified with DMT
Reducing the size of the vCenter Server database
To reduce the size of the vCenter Server database:

Warning: This procedure erases all historical data. If you want to retain some historical performance data instead of deleting all of it, see Purging old data from the database used by vCenter Server (1025914) or Purging old data from the database used by VirtualCenter 2.x (1000125).


Note: The steps below are not applicable to vCenter Server 5.1 and 5.5. To truncate performance data in a vCenter Server 5.1 or 5.5 database, see the section Truncating all performance data from vCenter Server 5.1.


To reduce the data, perform these steps:
1.     If your database is Microsoft SQL Server, Oracle or PostgreSQL, obtain the vCenter Server database password (see Obtain the vCenter Server database password below).
2.     Stop the vCenter Server service. If you installed vCenter Server on a Windows machine:
a.     Log in as an administrator to the Windows machine on which vCenter Server is installed.
b.     Navigate to Start > Administrative Tools > Services.
c.     Right-click VMware VirtualCenter Server and select Stop.
3.     Back up the vCenter Server database. For MS SQL, see your database vendor's documentation.
4.     Run the script for your database (see Run the script for your database below).

Obtain the vCenter Server database password

As vCenter Server is installed on a Windows machine:
1.     Log in as an administrator to the Windows machine on which vCenter Server is installed.
2.     Locate the vcdb.properties file and open it in a text editor.
o    For vCenter Server 5.1 and 5.5, the file is located in the C:\ProgramData\VMware\VMware VirtualCenter\ folder.
o    For vCenter Server 6.0, the file is located in the C:\ProgramData\VMware\vCenterServer\cfg\vmware-vpx\ folder.
3.     In the vcdb.properties file, locate the password of the vCenter Server database user and record it.

Run the script for your database

For Microsoft SQL Server:
1.     Log in to the Microsoft SQL Server machine as an administrator.
2.     Download and save the 2110031_MS_SQL_task_event_stat.sql script attached to the VMware Knowledge Base article (2110031).
3.     Open the command prompt and run the script:

sqlcmd -S IP-address-or-FQDN-of-the-database-machine\instance_name -U vCenter-Server-database-user -P password -d database-name -v TaskMaxAgeInDays=task-days -v EventMaxAgeInDays=event-days -v StatMaxAgeInDays=stat-days -i download-path\2110031_MS_SQL_task_event_stat.sql

The script takes three main parameters:
  • TaskMaxAgeInDays: all tasks older than TaskMaxAgeInDays days are deleted.
  • EventMaxAgeInDays: all events older than EventMaxAgeInDays days are deleted.
  • StatMaxAgeInDays: all statistics older than StatMaxAgeInDays days are deleted.
The possible values for all of the parameters are:
  • -1: Skips the respective historical data deletion. For example, TaskMaxAgeInDays = -1 means that no task records will be deleted.
  • 0: Deletes all historical data for the respective component. For example, TaskMaxAgeInDays = 0 deletes all task records.
  • 1 and more: Deletes data older than the number you enter, in days. For example, TaskMaxAgeInDays = 10 keeps the task records gathered within the last 10 days and deletes all of the records gathered before that.
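
Because a wrong value here deletes history, it can help to build the sqlcmd command line in a small script that validates the three parameters first. The Python sketch below is an assumption-laden illustration: build_cleanup_command is a hypothetical helper, it only uses the sqlcmd switches shown in the command above, and it defaults every parameter to -1 (skip) so that nothing is deleted unless you ask for it.

```python
def _age_argument(name, value):
    """Validate a MaxAgeInDays value (-1, 0, or a positive day count)."""
    if isinstance(value, bool) or not isinstance(value, int) or value < -1:
        raise ValueError(f"{name} must be an integer >= -1, got {value!r}")
    return f"{name}={value}"


def build_cleanup_command(server, user, password, database, script_path,
                          task_days=-1, event_days=-1, stat_days=-1):
    """Return the sqlcmd argument list for the cleanup script.

    -1 skips deletion, 0 deletes everything, N keeps the last N days.
    """
    return [
        "sqlcmd",
        "-S", server,
        "-U", user,
        "-P", password,
        "-d", database,
        "-v", _age_argument("TaskMaxAgeInDays", task_days),
        "-v", _age_argument("EventMaxAgeInDays", event_days),
        "-v", _age_argument("StatMaxAgeInDays", stat_days),
        "-i", script_path,
    ]


# Example values below are placeholders, not real servers or credentials.
command = build_cleanup_command(
    "db01.example.local\\VCINSTANCE", "vpxuser", "secret", "VCDB",
    "C:\\temp\\2110031_MS_SQL_task_event_stat.sql",
    task_days=30, event_days=30,
)
print(" ".join(command))
```

The list can be handed to subprocess.run(command, check=True) on a machine where sqlcmd is installed; keeping it as a list avoids quoting problems with paths and passwords.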

Rebuilding indexes (Quarterly)

To rebuild the vCenter Server database indexes:
1.     Download and extract the .sql files from the 2009918_rebuild.zip file attached to VMware Knowledge Base article 2009918.

Note: For vCenter Server 5.1 and 5.5 databases, download and extract the .sql files from the 2009918_rebuild_51.zip file attached to the same article.

2.     Connect to the vCenter Server database, for example using Management Studio for SQL Server or SQL*Plus for Oracle.
3.     Execute the .sql file to create the REBUILD_INDEX stored procedure:

o    Oracle: rebuild_indexes_oracle.sql or rebuild_indexes_oracle_51.sql
o    SQL Server: rebuild_indexes_sql.sql or rebuild_indexes_sql_51.sql


4.     Execute the stored procedure for either Oracle or SQL Server that was created in the previous step:

execute REBUILD_INDEX


Backup of vCenter SSL Certificates (Yearly)
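The vCenter Server SSL certificates are stored under the following folder, depending on the Windows version:
· Windows 2003: %ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter
· Windows Vista and 2008 Server: %ALLUSERSPROFILE%\VMware\VMware VirtualCenter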


Windows OS Patches (Monthly)
Install the Windows critical and security patches after taking a snapshot of the vCenter Server virtual machine.
Test all the functionality of the vCenter Server machine and check whether the recent patches had any other impact. If they did, roll back the recently installed patches or uninstall them from Add/Remove Programs.
Once verification is finished and all applications run without issues, remove the snapshot of the vCenter Server virtual machine.
A Change Request is required to apply the patches and to schedule maintenance on the vCenter Server.

Upgrade to the new version release with same build or patch/es (3-6 months)


Every six months, check VMware’s web site for new vCenter Server updates, download them, and schedule a maintenance window with a Change Request to apply the patches or updates.

A major version upgrade is not discussed here because it requires upgrading the whole environment.
As usual, take a snapshot of the virtual machine and make sure a full backup of vCenter Server and the necessary databases is taken before applying the patch or update.
Check the download site for critical issues that VMware has resolved, whether security fixes or other areas flagged as product bugs. It is very important to apply such patches or updates under an Emergency Change Request to avoid any impact on the existing environment, e.g. an SSL issue or another security flaw.
Check http://kb.vmware.com/kb/ and http://blogs.vmware.com/ frequently for recently created or modified vCenter Server articles, which carry similar information about known issues resolved by a single patch or update or by several.

vCenter Server Service Restart (as needed)
Sometimes, depending on the nature of the issue, restarting the vCenter Server service is enough and you do not need to restart the whole vCenter Server. In such cases, verify all the dependent applications, services and other components that rely heavily on and integrate with the vCenter Server service. To avoid disrupting the production workload in the private cloud, inform the necessary stakeholders about the possible impact on the functionality and availability of the services offered by the vRealize suite and vCenter Server combination. Do this only under a Change Request and after hours.
For more information, refer to http://kb.vmware.com/kb/1003895

Resource Availability on ESXi  (upon scheduling maintenance)



Make sure there are enough resources on the ESXi host where the vCenter Server will power on and run (in case a disaster occurred and the vCenter Server was shut down).
If any resource pools or memory/CPU reservations are configured at the cluster level, make sure those resources are available to the vCenter Server virtual machine.
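
As a rough illustration of the check above, here is a Python sketch that compares free host capacity with what the vCenter Server virtual machine needs. Everything in it is an assumption for the example: the HostCapacity type, the hosts_that_fit function, the 10% headroom and all the numbers, which you would replace with figures read from the vSphere Client.

```python
from dataclasses import dataclass


@dataclass
class HostCapacity:
    """Free capacity of one ESXi host (values entered by hand)."""
    name: str
    free_cpu_mhz: int
    free_memory_mb: int


def hosts_that_fit(hosts, vm_cpu_mhz, vm_memory_mb, headroom=0.10):
    """Return the names of hosts that can run the VM with spare headroom.

    vm_cpu_mhz / vm_memory_mb should include any reservation configured
    for the vCenter Server VM; headroom is an arbitrary safety margin.
    """
    need_cpu = vm_cpu_mhz * (1 + headroom)
    need_memory = vm_memory_mb * (1 + headroom)
    return [
        host.name
        for host in hosts
        if host.free_cpu_mhz >= need_cpu and host.free_memory_mb >= need_memory
    ]


# Hypothetical figures for two hosts and a 2 GHz / 12 GB vCenter Server VM.
cluster = [
    HostCapacity("esx01", free_cpu_mhz=4000, free_memory_mb=16384),
    HostCapacity("esx02", free_cpu_mhz=2000, free_memory_mb=8192),
]
print(hosts_that_fit(cluster, vm_cpu_mhz=2000, vm_memory_mb=12288))
```

An empty result before a maintenance window means no host could power on the vCenter Server VM with the chosen margin.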

vCenter Server and vRealize Automation Center portal (as and when required)
During vCenter Server maintenance, the consoles of the virtual machines are not available, and the vRA portal (if you are using vRA) cannot be used to deploy virtual machines.


Changing IP address of vCenter Server virtual machine (as and when required)

Whenever you need to change the IP address of the vCenter Server, proceed very carefully, as many other components depend on vCenter Server, e.g. plugins, vRA, vRO etc.


First of all, create backups of the vCenter Server VM and the underlying SQL database.


·         Set DRS to manual mode to avoid anything moving around (optional, as it depends on the configuration in place at cluster level).
·         Identify the ESXi host running the vCenter Server VM and connect directly to that host with the vSphere Client.
·         Close any sessions you have open to the vCenter Server (Web Client, vSphere Client, RDP, etc.).
·         Open a console window to the vCenter Server by way of the ESXi host.
·         Stop all VMware services.
·         Change the IPv4 address and IPv4 gateway.
·         Ping the new default gateway as a test, then ping other ESXi hosts in the same cluster and in other clusters, virtual machines in the same datacenter and in other datacenters, and the Internet to confirm end-to-end connectivity.
·         Restart the vCenter Server.
·         Put DRS back to fully automated (optional, based on your setup).
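
The ping checks in the list above can be collected into one small script so that nothing is forgotten after the change. This Python sketch is only an illustration: the function names and the target addresses are hypothetical, and it relies on the exit code of the operating system's ping command, which on Windows can report success for some "destination unreachable" replies, so treat a pass as a hint and not as proof.

```python
import platform
import subprocess


def ping(host, count=2):
    """Return True if the OS ping command exits with code 0 for host."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, str(count), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_connectivity(targets, ping_fn=ping):
    """Ping every target and return a {label: reachable} dictionary.

    targets maps a label ("new gateway", "ESXi host in cluster A", ...)
    to an address; ping_fn can be swapped out for testing.
    """
    return {label: ping_fn(address) for label, address in targets.items()}


def failed_targets(results):
    """Return the labels that did not answer."""
    return [label for label, reachable in results.items() if not reachable]
```

A typical call would be failed_targets(check_connectivity({"new gateway": "192.0.2.1", "ESXi host": "192.0.2.11"})), where the addresses are placeholders for your own gateway, ESXi hosts, virtual machines and an Internet address.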


Changing FQDN of vCenter Server virtual machine (as and when required)

The same recommendations as for changing the IP address apply to changing the FQDN of the vCenter Server virtual machine. Also make sure Active Directory reflects the new name and all DNS records are updated after the change. Check with the Server Team if you need any help updating the DNS records.

Plugins and other Extensions of vCenter Server (as and when required)



Check and verify all plugin operations, e.g. the storage plugin, Update Manager, external backup solution plugin or vRealize Orchestrator, whenever vCenter Server is patched, updated or upgraded from one version to another. A major version upgrade always requires updating the plugins and then testing their functionality.

Backup of vCenter server (monthly)

Make sure the backup is done and includes a full backup of the vCenter Server virtual machine each month. Verify this with the Backup Team.

In short, keep the following points in mind:

1)      For troubleshooting or maintenance, try restarting the vCenter Server service first and see whether the issue is resolved; if not, shut down the service and restart the virtual machine. For more information, see the vCenter Server Service Restart section above.
2)      Make sure you have enough resources on the ESXi host in case a disaster occurred and the vCenter Server was shut down.
3)      For patching, schedule a maintenance window with a proper Change Request and then patch the VM with the relevant critical and important OS patches.
4)      For database maintenance, refer to the database sections above and make sure a full backup is done before you proceed.
5)      For purging old data from the vCenter Server database, refer to http://kb.vmware.com/kb/1025914
6)      During vCenter Server maintenance, the consoles of the virtual machines are not available, and the vRA portal (if used to deploy VMs) cannot be used to deploy virtual machines.
7)      Whenever you need to change the IP address of the vCenter Server, proceed very carefully, as many other components depend on vCenter Server, e.g. plugins, vRA, vRO etc. Follow the steps in the Changing IP address section above.

8)      The same recommendation as in item 7 applies when changing the FQDN of the vCenter Server virtual machine. Also make sure Active Directory reflects the new name and all DNS records are updated after the change.
9)      Check the registration of all templates and do a test deployment of a virtual machine from the templates to see whether any error occurs.
10)   Check and verify all plugin operations, e.g. the storage plugin, Update Manager, external backup solution plugin or vRealize Orchestrator.
11)    Make sure the backup is done and includes a full backup of the vCenter Server virtual machine each month. Verify this with the Backup Team.

I hope you find the above useful in your environment. As vCenter Server is a crucial core component, the necessary steps need to be taken to make sure it runs smoothly without any issues.

I have not included anything about VCSA, but you can apply the same information from the Windows-based vCenter Server: just replace the services portion accordingly and add SSH to the connection list.

If you have any feedback, please do let me know.

Please share and care ! 

Enjoy !!

