What Is Live Migration?
Live Migration is the equivalent of vMotion. The purpose of this feature is to move virtual machines from one location to another without any downtime. Well, that’s the perception of Live Migration and vMotion. As anyone who has ever used these features in a lab will know, there is actually some downtime when vMotion or Live Migration is used. A better definition would be: Live Migration (or vMotion) allows you to move virtual machines without losing service availability. That’s a very subtle difference in definitions, which we will explain later on in this article.
The purpose of Live Migration is flexibility. Virtual machines are abstracted from the hardware on which they run. This flexibility allows us to match our virtual machines to our resources and to replace hardware more easily. It makes IT and the business more agile and responsive – all without impacting the operations of the business.
Back to Basics
Often there is confusion between Live Migration and high availability (HA). This is due to the fact that Live Migration (and vMotion) historically required a host cluster with shared storage. But things have changed, and it’s important to understand the differences between Live Migration and HA.
Live Migration is a proactive operation. Maybe an administrator wants to power down a host and is draining it of virtual machines. The process moves the virtual machines, over a designated Live Migration network, with no drop in service availability. Maybe System Center wants to load balance virtual machines (VMM Dynamic Optimization). Live Migration is a planned and preventative action – virtual machines move with no downtime to service availability.
High availability, on the other hand, is reactive and unplanned. HA is the function of failover clustering in the Windows Server world. Hosts are clustered and virtual machines are marked as being highly available. Those virtual machines are stored on some shared storage, such as a SAN, a shared Storage Pool, or a common SMB 3.0 share. If a host fails, all of the virtual machines that were running on it stop. The other hosts in the cluster detect the failure via failed heartbeats. The remaining hosts fail over the virtual machines that were on the now dead host. Those failed-over virtual machines automatically power up. You’ll note that there is downtime.
Read those two paragraphs again. There was no mention of failover clustering when Live Migration was discussed as a planned operation. Windows Server 2012 Hyper-V Live Migration does not require failover clustering: You can do Live Migration without the presence of a cluster. However, HA is the reason that failover clustering exists.
Promises of Live Migration
There are two very important promises made by Microsoft when it comes to Live Migration:
- The virtual machine will remain running no matter what happens. Hyper-V Live Migration does not burn bridges. The source copy of a virtual machine and its files remain where they are until a move is completed and verified. If something goes wrong during the move, the virtual machine will remain running in the source location. Those who stress-tested Live Migration in the beta of Windows Server 2012 witnessed how this worked. It is reassuring to know that you can move mission critical workloads without risk to service uptime.
- No new features will prevent Live Migration. Microsoft understands the importance of flexibility. All new features will be designed and implemented to allow Live Migration. Examples of features that have caused movement restrictions on other platforms are Single Root I/O Virtualization (SR-IOV) and virtual Fibre Channel. There are no such restrictions with Hyper-V – you can quite happily move Hyper-V virtual machines with every feature enabled.
Live Migration Changes in Windows Server 2012 Hyper-V
Windows Server 2012 features a number of major changes to Live Migration, some of which shook up the virtualization industry when they were first announced.
- Performance enhancements: Some changes were made to the memory synchronization algorithm to reduce page copies from the source host to the destination host.
- Simultaneous Live Migration: You can perform multiple simultaneous Live Migrations across a network between two hosts, with no arbitrary limits.
- Live Migration Queuing: A clustered host can queue up lots of Live Migrations so that virtual machines can take it in turn to move.
- Storage Live Migration: We can move the files (all or some) of a virtual machine without affecting the availability of services provided by that virtual machine.
- SMB 3.0 and Live Migration: The new Windows Server shared folder storage system is supported as shared storage for Live Migration with or without a Hyper-V cluster.
- Shared Nothing Live Migration: We can move virtual machines between two non-clustered hosts, between a non-clustered host and a clustered host, and between two clustered hosts.
Performance Enhancements
Let’s discuss how Live Migration worked in Windows Server 2008 R2 Hyper-V before we look at how the algorithm was tuned. Say a virtual machine, VM01, is running on HostA. We decide we want to move the virtual machine to HostB via Live Migration. The process will work as follows:
- Hyper-V will create a copy of VM01’s specification and configure dependencies on HostB.
- The memory of VM01 is tracked by a bitmap that records changes to its pages. Each page is copied, from the first to the last, from HostA to HostB. Each page is marked as clean after it is copied.
- The virtual machine is running so memory is changing. Each changed page is marked as dirty in the bitmap. Live Migration will copy the dirty pages again, marking them clean after the copy. The virtual machine is still running, so some of the pages will change again and be marked as dirty. The dirty copy process will repeat until (a) it has been done 10 times or (b) there is almost nothing left to copy.
- What remains of the VM01 that has not been copied to HostB is referred to as the state. At this point VM01 is paused on HostA.
- The state is copied from HostA to HostB, thus completing the virtual machine copy.
- VM01 is resumed on HostB.
- If VM01 runs successfully on HostB then all trace of it is removed from HostA.
This process moves the memory and processor of the virtual machine from HostA to HostB, both in the same host cluster. The files of the virtual machine are on some shared storage (a SAN in Windows Server 2008 R2) that is used by the cluster.
It is between the pause in step 4 and the resume in step 6 that the virtual machine is actually offline. This is where a ping test drops a packet. Ping is a tool based on the ICMP diagnostic protocol, and it is designed to detect latency. That’s exactly what it does when a ping goes unanswered during Live Migration or vMotion: the virtual machine is briefly unavailable. Most applications are based on more tolerant protocols which will allow servers several seconds to respond. Both vMotion and Live Migration take advantage of that during the switch over of the virtual machine from the source to the destination host. That means your end users can be reading their email, using the CRM client, or connected to a Citrix XenApp server, and they might not notice anything other than a slight dip in performance for a second or two. That’s a very small price for a business-friendly feature like Live Migration or vMotion.
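The copy loop in the Windows Server 2008 R2 steps above can be sketched as a toy model. This is only an illustration under stated assumptions, not Hyper-V's actual implementation: the function name `simulate_precopy`, the `dirty_percent` rate, and the `threshold` for "almost nothing left to copy" are all made up for the example. Only the 10-pass cap comes from the description above.

```python
def simulate_precopy(total_pages, dirty_percent, max_passes=10, threshold=8):
    """Toy model of the copy loop; returns (passes, pages_sent, state_pages)."""
    to_copy = total_pages   # the first pass copies every page
    pages_sent = 0
    passes = 0
    while passes < max_passes and to_copy > threshold:
        pages_sent += to_copy   # copy the pages and mark them clean
        passes += 1
        # Assumption: while a pass runs, the VM dirties dirty_percent
        # of the number of pages that were just copied.
        to_copy = to_copy * dirty_percent // 100
    # Whatever is still dirty is the state, copied while the VM is paused.
    return passes, pages_sent, to_copy


quiet = simulate_precopy(1000, dirty_percent=20)   # (3, 1240, 8)
busy = simulate_precopy(1000, dirty_percent=90)    # (10, 6508, 347)
```

In this model the quiet virtual machine converges after three passes and pauses with 8 pages of state, while the busy one hits the 10-pass cap with 347 pages still dirty, so its pause would be longer.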
Aside from the cluster requirement, the other big change in this process in Windows Server 2012 is that the first memory copy from HostA to HostB has been tuned to reflect memory activity. The initial page copy is prioritized, with least used memory being copied first, and the most recently used memory being copied last. This should lead to fewer copy iterations and faster Live Migration of individual virtual machines.
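As a rough illustration of that ordering, the sketch below sorts pages so the least recently used ones go first. How Hyper-V actually tracks memory activity is not described here, so the `last_used` timestamps and the function name `first_pass_order` are assumptions made purely for the example.

```python
def first_pass_order(last_used):
    """Return page numbers sorted coldest first.

    last_used maps a page number to a (hypothetical) timestamp of its
    most recent access; a smaller value means less recently used.
    """
    return sorted(last_used, key=last_used.get)


# Page 1 was touched longest ago, page 2 most recently.
order = first_pass_order({0: 50, 1: 5, 2: 99})   # [1, 0, 2]
```

Copying the hottest pages last means they have less time to be dirtied again before the next pass, which is why fewer iterations should be needed.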
Simultaneous Live Migration
In Windows Server 2008 R2, we could perform only one Live Migration at a time between any two hosts within a cluster. With host capacities growing (up to 4 TB RAM and 1,024 VMs on a host) we need to be able to move virtual machines more quickly. Imagine how long it would take to drain a host with 256 GB RAM over a 1 GbE link! Hosts of this capacity (or greater) should use 10 GbE networking for the Live Migration network. Windows Server 2008 R2 couldn’t make full use of this bandwidth – but Windows Server 2012 can. Combined with simultaneous Live Migration, Hyper-V can move lots of virtual machines very quickly, taking advantage of 10 Gbps, 40 Gbps, or even 56 Gbps networking! This makes large data center operations happen very quickly.
The default number of simultaneous Live Migrations is two, as you can see in the screenshot below. You can tune the host based on its capabilities. Running too many Live Migrations at once is expensive; not only does it consume the bandwidth of the Live Migration network (which might be converged with other networks) but it also consumes resources on the source and destination hosts. Don’t worry – Hyper-V will protect you from yourself. Hyper-V will only perform the number of concurrent Live Migrations that it can successfully do.
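To put a rough number on the 256 GB example, here is a back-of-the-envelope sketch. It assumes decimal gigabytes, a link that is fully available to Live Migration, and no re-copying of dirty pages, so it gives a best-case lower bound rather than a prediction. The name `drain_seconds` and its `efficiency` parameter are illustrative only.

```python
def drain_seconds(ram_gb, link_gbps, efficiency=1.0):
    """Best-case seconds to copy ram_gb gigabytes over a link_gbps link."""
    gigabits = ram_gb * 8   # 1 byte = 8 bits
    return gigabits / (link_gbps * efficiency)


one_gbe = drain_seconds(256, 1)    # 2048 seconds, roughly 34 minutes
ten_gbe = drain_seconds(256, 10)   # about 205 seconds, under 4 minutes
```

Real drains take longer because memory keeps changing during the copy and the link is rarely used at full line rate, but the tenfold gap between 1 GbE and 10 GbE holds.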
A common question is this: My source host is configured to allow 20 concurrent Live Migrations and my destination host will allow five. How many Live Migrations will be done? The answer is simple: Hyper-V will respect every host’s maximum, so only five Live Migrations will happen at once between these two hosts.
You might also notice in the above screenshot that Storage (Live) Migration also has a concurrency limit, which defaults to two.
Live Migration Queuing
Imagine you have a cluster with two nodes, HostA and HostB. Both nodes are configured to allow ten simultaneous Live Migrations. HostA is running 100 virtual machines and you want to place this host in maintenance mode. Failover Cluster Manager will orchestrate the Live Migration of the virtual machines. All virtual machines will queue up, and up to ten (depending on host resources) will live migrate at the same time. As virtual machines leave HostA, other virtual machines will start to live migrate, and eventually all of the virtual machines will be running on HostB.
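The queuing behaviour, together with the rule that the lower of the two hosts' limits wins, can be modelled in a few lines. This is a simplified simulation, not how Failover Cluster Manager is implemented: the function `drain_host` and the per-VM migration durations are hypothetical, and the model ignores the way real hosts may run fewer migrations when resources are tight.

```python
import heapq
from collections import deque


def drain_host(durations, source_max, destination_max):
    """Simulate queued migrations; returns (total_seconds, finish_order).

    durations maps a VM name to the (assumed) seconds its migration takes.
    """
    limit = min(source_max, destination_max)   # the lower host limit wins
    waiting = deque(durations.items())
    running = []    # min-heap of (finish_time, vm_name)
    finished = []
    now = 0
    while waiting or running:
        # Start queued migrations while a concurrency slot is free.
        while waiting and len(running) < limit:
            vm, seconds = waiting.popleft()
            heapq.heappush(running, (now + seconds, vm))
        # Jump to the moment the next migration completes.
        now, vm = heapq.heappop(running)
        finished.append(vm)
    return now, finished


# 100 VMs at an assumed 60 seconds each, ten at a time.
total, _ = drain_host({f"vm{i}": 60 for i in range(100)}, 10, 10)   # total == 600
```

With a source limit of 20 and a destination limit of five, the same function would run only five migrations at once.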
Storage Live Migration
A much sought-after feature for Hyper-V was the ability to relocate the files of a virtual machine without affecting service uptime. This is what Storage Live Migration gives us. The tricky bit is moving the active virtual hard disks because they are being updated. Here is how Microsoft made the process work:
- The running virtual machine is using its virtual hard disk which is stored on the source device.
- An administrator decides to move the virtual machine’s files and Hyper-V starts to copy the virtual hard disk to the destination device.
- The IO for the virtual hard disk continues as normal but now it is mirrored to the copy that is being built up in the destination device.
- Live Migration has a promise to live up to; the new virtual hard disk is verified as successfully copied.
- Finally the files of the virtual machine can be removed from the source device.
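The steps above can be sketched as a toy model in which a disk is just a list of blocks. This illustrates the idea of mirroring writes while a background copy runs; it is not Hyper-V's implementation, and the function `storage_migrate` and the `writes_during_copy` schedule are invented for the example.

```python
def storage_migrate(source_blocks, writes_during_copy):
    """Copy a disk block by block while mirroring the VM's writes.

    writes_during_copy maps a copy step to a list of (block, value)
    writes that the running VM issues right after that step.
    Returns (source_disk, destination_disk) once the copy finishes.
    """
    source_disk = list(source_blocks)          # the live virtual hard disk
    destination_disk = [None] * len(source_disk)
    for step in range(len(source_disk)):
        destination_disk[step] = source_disk[step]   # background copy
        for block, value in writes_during_copy.get(step, []):
            source_disk[block] = value         # IO continues as normal
            destination_disk[block] = value    # and is mirrored to the copy
    return source_disk, destination_disk


src, dst = storage_migrate(
    ["a", "b", "c", "d"],
    {0: [(2, "C")], 2: [(0, "A")]},   # writes that land mid-copy
)
verified = src == dst   # only now is it safe to remove the source files
```

Because every write goes to both copies, a block that changes after it has been copied is still correct on the destination, and the source is never touched until the comparison succeeds.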
Storage Live Migration can move all of the files of a virtual machine as follows:
- From one folder to another on the same volume
- To another drive
- From one storage device to another, such as from a local drive to an SMB 3.0 share
- From one server to another
When using Storage Live Migration, you can choose to:
- Move all files into a single folder for the virtual machine
- Move only some of the files
- Scatter the various files of a virtual machine to different specified locations
SMB 3.0 and Live Migration
Windows Server 2012 introduces SMB 3.0 – an economical, continuously available, and scalable storage strategy that is supported by Windows Server 2012 Hyper-V. Live Migration supports storing virtual machines on SMB 3.0 shared storage. This means that a virtual machine can be running on HostA and be quickly moved to run on HostB, without moving the files of the virtual machine. Scenarios include a failover cluster of hosts using a common SMB 3.0 share and a collection of non-clustered hosts that have access to a common SMB 3.0 share.
Shared-Nothing Live Migration
Thanks to Shared-Nothing Live Migration we can move virtual machines between any two Windows Server 2012 Hyper-V hosts that do not have any shared storage. This means we can:
- Move a virtual machine that is stored on the local drive of a non-clustered host to another non-clustered host, and store the files on the destination host’s storage.
- Move a virtual machine from a non-clustered host to a clustered host, with the files placed on the cluster’s shared storage. Then we can make the virtual machine highly available to add it to the cluster.
- Remove the highly available attribute of a virtual machine and move it from a clustered host to a non-clustered host.
- Remove the highly available attribute of a virtual machine and move it from a host in a source cluster to a host in a destination cluster, where the virtual machine will be made highly available again.
In other words, it doesn’t matter what kind of Windows Server 2012 or Hyper-V Server 2012 host you have. You can move that virtual machine.
Hyper-V Is Breaking Down Barriers to IT Agility
You can easily move virtual machines from one host to another in Windows Server 2012 Hyper-V. The requirements for this are as follows.
- You are running Windows Server 2012 Hyper-V on the source and destination hosts.
- The source and destination hosts have the same processor family (all Intel or all AMD).
- If there are mixed generations of processor then you might have to enable processor compatibility mode in the settings of the virtual machine. It is a good idea to always buy the newest processor available when acquiring new hosts – this gives you a better chance of buying compatible host processors in 12-18 months’ time.
- The hosts must be in the same domain.
- You have at least 1 GbE of connectivity between the hosts. This bandwidth can be a share of a greater link that is guaranteed by the features of converged networking (such as QoS). Ideally, you will size the Live Migration network according to the amount of RAM in the hosts and the time it takes to drain the host for maintenance.
With all of those basic requirements configured, the only barrier remaining to Live Migration is the network that the virtual machine is running on. For example, if the virtual machine is running on 192.168.1.0/24 then you don’t want to live migrate it to a host that is connected to 10.0.1.0/24. You could do that, but the virtual machine would be unavailable to clients… unless the destination host was configured to use Hyper-V Network Virtualization, but that’s a whole other article!
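A quick pre-flight check for that last barrier can be written with Python's standard `ipaddress` module. This is only a sketch: the helper name `stays_reachable` and the address 192.168.1.50 are made up for the example, and it ignores Hyper-V Network Virtualization entirely.

```python
import ipaddress


def stays_reachable(vm_ip, destination_subnet):
    """True if the VM's address belongs to the destination host's subnet."""
    return ipaddress.ip_address(vm_ip) in ipaddress.ip_network(destination_subnet)


same_network = stays_reachable("192.168.1.50", "192.168.1.0/24")   # True
other_network = stays_reachable("192.168.1.50", "10.0.1.0/24")     # False
```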