November 07, 2012

VMware vSphere data backup and recovery

vStorage APIs

Perhaps the biggest benefit to backups and storage in vSphere is the new vStorage APIs that VMware developed. These APIs allow third-party applications to interface directly with the VMkernel without the need for scripts or agents. The vStorage APIs existed in VI3, where they were referred to as the VMware Consolidated Backup (VCB) Backup Framework. Unlike VMware Consolidated Backup, however, they are not a separate standalone application; they are built directly into ESX(i) and require no additional software installation. While the VCB Backup Framework still exists in vSphere and can also be used by backup applications, the vStorage APIs are the successor to VCB and will eventually replace it completely. The vStorage APIs are broken into four groups with different types of functionality, as listed below:
  • vStorage APIs for array integration: Currently being developed with specific third-party storage vendors (e.g., EMC Corp., Hewlett-Packard [HP] Co. and NetApp), these APIs will allow vendors to leverage their storage array-based capabilities directly from vSphere. This includes things such as array-based snapshots, hardware offloaded storage device locking, integration between VMware and array-level thin provisioning, storage provisioning, replication, and more. This will make some storage-related operations in vSphere more efficient by letting the storage array perform them.
  • vStorage APIs for multi-pathing: These APIs enable third-party storage vendors to leverage array multipathing functionality through plug-ins that they can develop. These plug-ins allow for more intelligent storage multipathing to achieve better storage I/O throughput and storage path failover for a specific vendor storage array.
  • vStorage APIs for Site Recovery Manager: These APIs are part of VMware's Site Recovery Manager (SRM) and are used to integrate SRM with array-based replication for block and network-attached storage (NAS) models. This allows SRM to seamlessly handle virtual machine failover, host failover and storage replication failover, and also enables SRM to control the underlying array-based replication that it relies on.
  • vStorage APIs for Data Protection: These APIs are the most important ones for third-party backup and replication vendors, as they enable better and more seamless integration with virtual machine disks. Designed to be the successor to VCB, they include the functionality that was available in VCB and add new functionality such as Changed Block Tracking (CBT) and the ability to interact directly with the contents of virtual disks via the VDDK.
The vStorage APIs are not really a single API; the term is just a name for a collection of interfaces that third-party applications can use to interact with storage devices in vSphere. These interfaces consist of various SDKs that exist in vSphere, along with the Virtual Disk Development Kit (VDDK). The VDDK is a combination API and SDK that enables vendors to develop applications that create and access virtual disk storage. The VDDK is used in conjunction with the other vStorage APIs to offer a complete, integrated solution for managing storage in vSphere. For example, while VM snapshots can be managed using the SDK functionality, other operations such as mounting virtual disks are handled through the VDDK.

Changed Block Tracking is supported on any storage device and datastore in vSphere, including iSCSI, VMFS, NFS and local disks; the one exception is physical mode Raw Device Mappings. It also works with both thin and thick disk types. CBT is a new feature in vSphere and requires virtual machine hardware version 7, which is the default in vSphere. By default, CBT is disabled, as there is a very slight performance penalty when using it. It can be enabled on select VMs by adding parameters (ctkEnabled=true and scsi#:#.ctkEnabled=true) to the configuration file of the virtual machine; backup applications can also enable it using the SDKs. Once enabled, a VM must go through a stun/unstun cycle for the change to take effect. This cycle happens during certain VM operations, including power on/off, suspend/resume and create/delete snapshot. During this cycle, a VM's disk is re-opened, which allows a change tracking filter to be inserted into the storage stack for that VM.
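As a rough illustration of the configuration-file change described above, here is a minimal Python sketch that adds the two kinds of parameters to the text of a VM configuration file. The function name `enable_cbt`, the `key = "value"` line layout, and the sample disk label `scsi0:0` are assumptions made for the example; only the parameter names come from the text. The VM would still need a stun/unstun cycle before the settings take effect.

```python
def enable_cbt(vmx_text: str, disks: list[str]) -> str:
    """Return the configuration text with CBT enabled for the VM and each disk.

    `disks` holds device labels in the scsi#:# form, e.g. "scsi0:0"
    (a hypothetical example label).
    """
    # The VM-wide switch plus one switch per virtual disk.
    wanted = {"ctkEnabled": "true"}
    for disk in disks:
        wanted[f"{disk}.ctkEnabled"] = "true"
    wanted_keys = {key.lower() for key in wanted}

    kept = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip().lower()
        if key in wanted_keys:
            continue  # drop any stale setting; it is re-added below
        kept.append(line)

    # Assumed layout: one `key = "value"` pair per line.
    kept.extend(f'{key} = "{value}"' for key, value in wanted.items())
    return "\n".join(kept) + "\n"


sample = 'displayName = "web01"\nctkEnabled = "false"\n'
print(enable_cbt(sample, ["scsi0:0"]))
```

The sketch only edits text in memory. In practice the file should be changed while the VM is powered off, or the setting applied through a backup application that uses the SDKs.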

The Changed Block Tracking feature stores information about changed blocks in a special "-ctk.vmdk" file that is created in each VM's home directory. This file has a fixed length and does not grow; its size depends on the size of the virtual disk (approximately 0.5 MB per 10 GB of virtual disk size). Inside this file, the state of each block is stored using sequence numbers that tell applications whether a block has changed. One of these files exists for each virtual disk that has CBT enabled.
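The sizing rule above is simple arithmetic. The short Python sketch below applies it; the function name is made up for the example, and the result is only as precise as the "approximately 0.5 MB per 10 GB" figure it is based on.

```python
# Approximate ratio quoted in the text: 0.5 MB of tracking file per 10 GB of disk.
CTK_MB_PER_10_GB = 0.5


def estimate_ctk_size_mb(disk_size_gb: float) -> float:
    """Estimate the size of the -ctk.vmdk file for one virtual disk."""
    return disk_size_gb / 10 * CTK_MB_PER_10_GB


for size_gb in (40, 500):
    print(f"{size_gb} GB disk -> about {estimate_ctk_size_mb(size_gb)} MB")
```

By this estimate, a 40 GB virtual disk needs a tracking file of about 2 MB and a 500 GB disk about 25 MB, so the overhead on the datastore is small.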

The vStorage APIs for Data Protection and the CBT feature make backups quicker and easier in vSphere and are a big improvement over VCB. VMware has provided third-party vendors with a much improved backup interface in vSphere; now it's up to the vendors to adapt their products to take advantage of it.

Thin provisioning and backups

Thin provisioned disks are virtual disks that start small and grow as data is written to them. With thick disks, all space is allocated at the time of disk creation. A thin disk, by contrast, has an initial size of 1 MB (or up to 8 MB, depending on the default block size) and then grows, as the guest OS writes data to it, up to the maximum size that was defined when it was created. The benefit of thin provisioned disks is that they allow for the over-allocation of storage on a VMFS volume, making use of the often wasted unused space inside a VM's disk. Thin provisioned disks are not new to vSphere and also existed in VI3; however, numerous changes have made them more usable in vSphere.

Why are thin disks important to backups? Many backup applications for virtualization operate not inside the guest operating system but outside of it, at the virtualization layer. Instead of backing up individual files inside the guest OS, they back up the single large virtual disk files (vmdk) that contain the encapsulated VM. Because of this, backup applications must search for empty disk blocks inside the virtual disk file so they do not back them up. This process of identifying empty blocks takes additional time and resources to complete. With thick disks, all space is allocated at once, so a 40 GB virtual disk will actually take up 40 GB of disk space on a datastore regardless of how much space is used by the guest OS running on it. So, if only 10 GB of disk space is in actual use by the guest OS, you will want to avoid backing up the extra 30 GB of empty space inside the virtual disk file.

Thin disks only take up as much space on a datastore as what is actually used by the guest OS, so if only 10 GB of a 40 GB virtual disk is in use the virtual disk file will only be 10 GB in size. Because of this, backup applications no longer have to worry about searching for those empty disk blocks because there are none in a thin disk. Not having to do this results in faster and more efficient backups which is just one of the advantages of using thin disks.
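To make the cost of the empty-block search concrete, here is a toy Python sketch. It models a thick disk as a byte string of 40 tiny blocks, of which only the first 10 hold data, mirroring the 40 GB and 10 GB example. The block size and the function name are invented for the illustration, and real backup products will detect empty blocks in their own ways.

```python
def non_empty_blocks(disk: bytes, block_size: int) -> list[int]:
    """Return the indexes of blocks that contain at least one non-zero byte."""
    indexes = []
    for offset in range(0, len(disk), block_size):
        chunk = disk[offset:offset + block_size]
        if any(chunk):  # every block must be read to find out whether it is empty
            indexes.append(offset // block_size)
    return indexes


BLOCK = 4  # toy block size in bytes
thick = b"\x01" * (10 * BLOCK) + bytes(30 * BLOCK)  # 10 used blocks, 30 empty

to_copy = non_empty_blocks(thick, BLOCK)
print(len(thick) // BLOCK, "blocks scanned,", len(to_copy), "blocks backed up")
```

All 40 blocks are read, yet only 10 need to be copied. With a thin disk holding the same data, the file contains just the 10 used blocks, so the scan over the 30 empty ones never happens.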

Hot-add of virtual disks

The hot-add of virtual disks feature allows a virtual machine to mount the disk of another virtual machine while it is running so it can be backed up. This is similar to what was first introduced in VCB where a virtual disk can be mounted by another server to be backed up. The hot-add feature in vSphere allows one virtual machine running a backup application to mount the disk of another so it can read the data from it and write it to destination media. Doing this removes the backup traffic from the network as the VM running the backup application uses the VDDK to access the disk and all I/O requests to it are sent directly down the VMkernel I/O path.

The hot-add feature works by taking a snapshot of the virtual disk, which deflects writes to a separate delta file. Once this is complete, the now read-only disk can be mounted by another VM so the data can be copied from it. Hot-add takes advantage of the part of the SCSI specification that allows SCSI devices to be added to or removed from a server without powering it down. It works with disks on any type of storage supported by vSphere, as long as the VM running the backup application is on a host that can access the storage of the target VM (i.e., shared storage). However, it does not work with VMs that have IDE virtual disks, which are now supported in vSphere.

Several data backup applications have already taken advantage of the hot-add feature, including VMware Data Recovery and Veeam Backup and Replication. The hot-add feature is not available in all editions of vSphere; it requires one of the more costly Advanced, Enterprise or Enterprise Plus editions.

iSCSI improvements

VMware made significant improvements to the iSCSI storage protocol in vSphere that resulted in increased performance and greater efficiency of virtual machines on iSCSI datastores. This is also beneficial to backup applications as the increased efficiencies with the iSCSI protocol are a direct benefit to heavy disk I/O operations that occur during virtual machine backups. The improvements to iSCSI in vSphere included the following:

  • In vSphere, VMware made significant updates to iSCSI for both software and hardware initiators. The software initiator that is built into ESX was completely rewritten, tuned and optimized for virtualization I/O. The result is a marked improvement in performance as well as greater CPU efficiency, with a significant reduction in CPU usage when using software initiators.
  • Support for Jumbo Frames was introduced in VI 3.5, but was not officially supported for use with storage protocols. With vSphere, VMware officially supports the use of Jumbo Frames with the iSCSI and NFS storage protocols. In addition, VMware now supports 10 Gb Ethernet with iSCSI, which results in much greater I/O throughput.
  • Provisioning iSCSI storage is easier because the iSCSI stack no longer requires a Service Console connection to communicate with an iSCSI target. Configuration steps for iSCSI have been simplified, and global configuration settings now propagate down to all targets. Additionally, bi-directional CHAP authentication is now supported for increased security.

These improvements make the use of iSCSI a more attractive choice over the more expensive Fibre Channel storage area network (SAN) for either virtual machine datastores or backup targets.

VMware Data Recovery

With vSphere, VMware introduced VMware Data Recovery (VDR), a disk-to-disk backup application developed by VMware to provide basic backup capabilities natively in vSphere. VMware Data Recovery provides an alternative to the traditional OS agent methods that are used to back up machines in physical environments. While not as feature rich as some third-party backup applications, it does provide some advanced features such as inline data deduplication and compression, and a centralized management console that is integrated into the vSphere Client. In addition, VDR takes full advantage of new features in vSphere such as Changed Block Tracking and hot-add of disks to ensure faster and more efficient backups. VDR is available as part of the Essentials Plus, Advanced, Enterprise and Enterprise Plus editions, or can be purchased a la carte with the Standard edition.

All the new and improved features that I covered make upgrading to vSphere very compelling as backup and recovery is much improved. The vStorage APIs provide much better integration for third-party backup applications and enable vendors to develop more efficient products to safeguard virtual machine data. If you have been putting off upgrading to vSphere, these new data backup-related features, along with the many other great and improved features in other areas, may persuade you to upgrade.

Changed Block Tracking

The vStorage APIs for Data Protection are the most beneficial to backup and replication applications, and vendors seem to be most excited about the new Changed Block Tracking feature that is included in them. This feature allows third-party applications to query the VMkernel to find out which disk blocks have changed in a virtual machine's disk file since the last backup operation. Without this feature, applications would have to figure this out on their own, which can be time-consuming. With CBT they can find this out almost instantly, so they know exactly which disk blocks need to be backed up. This enables much faster incremental backups and also allows for near continuous data protection (CDP) when replicating virtual disk files to other locations. In addition, point-in-time restore operations are much quicker, as CBT can tell exactly which disk blocks need to be restored to the virtual machine.
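The idea of asking which blocks have changed since the last backup can be pictured with a toy model in Python. This is only a conceptual sketch and not how the VMkernel implements CBT: the class `ToyChangeTracker` and its methods are invented for the illustration. It stamps each written block with an increasing sequence number, and a backup remembers the number that was current when it ran.

```python
class ToyChangeTracker:
    """Toy model of per-block change tracking using sequence numbers."""

    def __init__(self, block_count: int) -> None:
        self._seq = 0                            # increases with every write
        self._last_changed = [0] * block_count   # sequence of last write per block

    def record_write(self, block: int) -> None:
        """Stamp a block with a new sequence number when it is written."""
        self._seq += 1
        self._last_changed[block] = self._seq

    def checkpoint(self) -> int:
        """Return a marker that a backup run can store for its next run."""
        return self._seq

    def changed_since(self, checkpoint: int) -> list[int]:
        """Return the blocks written after the given checkpoint."""
        return [i for i, seq in enumerate(self._last_changed) if seq > checkpoint]


tracker = ToyChangeTracker(block_count=8)
tracker.record_write(1)
tracker.record_write(5)
last_backup = tracker.checkpoint()   # a full backup runs here

tracker.record_write(5)
tracker.record_write(6)
print(tracker.changed_since(last_backup))   # blocks for the incremental backup
```

An incremental backup then copies only the blocks returned by `changed_since`, instead of comparing the whole disk against the previous backup.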
