January 15, 2013

Shared Storage Considerations for Hyper-V

Storage is one of the biggest considerations when deploying a Hyper-V server. Not only must your server have adequate storage capacity, but the storage subsystem needs to be able to deliver sufficient I/O to meet the virtual machines' demands. Furthermore, storage should offer some sort of redundancy so as to avoid becoming a single point of failure.

As you can imagine, planning for Hyper-V storage is not a task to be taken lightly. Fortunately, there are a lot of options. The trick is to assess the available options and pick a solution that meets your performance, fault tolerance, and budgetary requirements.

My goal in this article is not to discuss every possible storage option for Hyper-V, but rather to give you some insight as to what works and what doesn't work based on my own experiences in the field.

The Most Important Fault Tolerance Consideration

Even though server virtualization is widely regarded as a revolutionary technology, there are a few negative aspects to using it. Perhaps one of the biggest pitfalls of server virtualization is that a single host server runs multiple virtual machines. If the host server fails, all of the virtual machines residing on that host also drop offline, resulting in a major outage.

That being the case, I always recommend to my clients that they implement a failover cluster as a way of preventing a host server from becoming a single point of failure. The problem is that failover clustering for Hyper-V requires the use of shared storage, which can be expensive. When Microsoft releases Windows Server 2012 (code-named Windows Server 8) and Hyper-V 3.0, the shared storage requirement will go away. For right now though, Hyper-V clusters are often beyond the budget of smaller organizations. Such organizations typically end up using direct attached storage.

Direct Attached Storage

Even though local, direct attached storage does not offer the same degree of redundancy as a full blown clustering solution, it is still possible to build in at least some redundancy by making use of an internal storage array. A storage array won't protect you against a server level failure, but it will protect against a disk failure (if implemented correctly).

Before I talk about your options for RAID storage, I want to talk about a situation that I ran into a few weeks ago. I have a friend who owns a small business and runs three production servers. The servers had been in place for a long time and the hardware was starting to age. My friend didn't really have the money to replace all of the server hardware and asked me if virtualizing the servers might be a good alternative.

After considering my friend's very tight budget and his business needs we decided to purchase a high end PC rather than a true server. The computer had six SATA ports so we planned to use one port for a boot drive, one port for a DVD burner (which was a business requirement), and the remaining four ports for a RAID 5 array.

Even though RAID 5 has fallen out of fashion over the last few years, it made sense in this case because combining the four disks into a RAID 5 array would deliver higher performance (from an I/O perspective) than a mirror set would. Although RAID 5 doesn't perform as well as RAID 0, the built-in parity more than makes up for any loss in performance or capacity.
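
To make the parity idea a little more concrete, here is a minimal Python sketch of how a lost block can be rebuilt from the surviving blocks and the parity block by using XOR. The block contents and the helper name xor_blocks are made up for illustration, and a real RAID 5 implementation rotates the parity block across the disks from one stripe to the next, which this sketch does not attempt to model.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, as they might sit on three of the four
# disks within a single stripe.
data_blocks = [b"HYPR", b"V-VM", b"DISK"]

# The fourth disk holds the parity block for that stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk, then rebuild its block from the
# two surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

The same property is why a RAID 5 array can keep running after one disk fails, but not after two: with two blocks missing from a stripe, a single parity block no longer carries enough information to rebuild them.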

When all of the parts arrived, I set up the new computer in the way that we had planned. However, even though we had built the array from SATA 3 disks which were rated at 6 gigabits per second, the array was painfully slow. In fact, copying files to the array yielded a sustained transfer rate of only about 1 MB per second. Furthermore, the array would almost always fail to copy large files.

I have built similar arrays on comparable hardware in lab environments before, so I knew that the array should perform much better than it was. My initial assumption was that the problem was driver related, but a check of all of the system's drivers revealed that everything was up to date.

The next thing that I decided to do was to update the computer's firmware. Over the years I have had a few bad experiences with firmware updates, so there are a couple of things that I always do prior to updating the firmware. First, I plug the computer into a UPS in case there is a power failure during the update. I have actually had the electricity go out during a firmware update and it ruined the system board.

The other thing that I do is document all of the BIOS settings. While documenting the BIOS settings I noticed that the computer's BIOS had identified all of the hard drives as IDE rather than AHCI. While this could certainly account for the performance problems, the system would not let me set the drives to AHCI.

After a lot of trial and error I discovered that SATA ports one through four could operate in either IDE or AHCI mode, but ports five and six could only operate in IDE mode. To fix the problem I moved the boot drive from port 1 to port 5 and moved the DVD burner from port 2 to port 6. I then set ports one through four to use AHCI mode and attached the drives for the storage array.

Before I could use the server, I had to reconfigure the BIOS to boot from the drive on Port 5. I also had to use the Windows Disk Management Console to completely rebuild the RAID array. Once I did that the disk array began delivering the expected level of performance.

I chose to tell this story because anyone who decides to store virtual machines on an internal RAID array could potentially run into similar problems. Since I have already worked through the troubleshooting process, I wanted to pass along my solution in the hopes that I could help someone.

RAID Selection

If you end up setting up Hyper-V to use a local RAID array then you will have to decide what type of RAID array you want to use. Your options vary depending on the number of disks that you have to work with. Here are a few thoughts on some common RAID levels:

| RAID Level | Description | Comments |
| --- | --- | --- |
| 0 | Striping | RAID 0 delivers high performance, but does not provide any fault tolerance. |
| 1 | Mirroring | Disk mirroring is great for redundancy, but RAID 1's performance is almost always inadequate for hosting virtual machines. |
| 5 | Striping with parity | RAID 5 delivers the performance of a stripe set (although not as good as RAID 0) and the array can continue functioning even if one disk fails. |
| 6 | Striping with double parity | RAID 6 has a higher degree of overhead than RAID 5 but the array can survive a double disk failure. |
| 10 | Mirrored Stripe Set | RAID 10 (or RAID 1+0 as it is sometimes called) offers the performance of a stripe set, with full mirroring. RAID 10 typically delivers the best bang for the buck, but it takes a lot of hard disks to build an adequately performing RAID 10 array. |
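
As a rough way to compare these RAID levels for a given set of disks, the following Python sketch estimates usable capacity and the number of disk failures each level is guaranteed to survive. The function name raid_summary and the 1000 GB disk size are illustrative assumptions, and the figures ignore controller overhead, hot spares, and formatting losses.

```python
def raid_summary(level, disk_count, disk_size_gb):
    """Return (usable_gb, guaranteed_disk_failures_survived) for a RAID level."""
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if disk_count < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == 10 and disk_count % 2:
        raise ValueError("RAID 10 needs an even number of disks")

    if level == 0:
        # Pure striping: all capacity is usable, nothing is protected.
        return disk_count * disk_size_gb, 0
    if level == 1:
        # Every disk holds a full copy of the data.
        return disk_size_gb, disk_count - 1
    if level == 5:
        # One disk's worth of capacity goes to parity.
        return (disk_count - 1) * disk_size_gb, 1
    if level == 6:
        # Two disks' worth of capacity goes to parity.
        return (disk_count - 2) * disk_size_gb, 2
    # RAID 10: half of the disks hold mirror copies. One failure is always
    # survivable; more are survivable only if they hit different mirror pairs.
    return disk_count // 2 * disk_size_gb, 1


# Hypothetical example: four disks of 1000 GB each.
for level in (0, 5, 6, 10):
    usable, failures = raid_summary(level, disk_count=4, disk_size_gb=1000)
    print(f"RAID {level}: {usable} GB usable, survives {failures} disk failure(s)")
```

With four disks, RAID 5 gives up one disk's worth of capacity while RAID 6 and RAID 10 each give up two, which is part of the trade-off to weigh against the performance comments above.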



Using Direct Attached Storage is acceptable for implementing basic server virtualization in a small organization, but is inadequate for use in medium and large sized organizations.

The reason why this is the case has to do with the very nature of server virtualization. Server virtualization uses a single physical server to host multiple virtualized workloads. The problem with doing so is that the cost of failure goes way up. In a physical data center for example, a server failure might be a big inconvenience, but it is rarely catastrophic. Server failures in a virtual data center are another story. If a host server fails then every virtual server residing on that host will also fail. When you consider that a single host might contain dozens of virtual machines you can begin to understand why it is so critically important to protect virtualization hosts.

So what does all this have to do with storage? Well, the only way to protect against a server level failure is to build a failover cluster. If a host within a failover cluster drops off-line then the virtual machines themselves are simply moved to another host that is still functioning. Of course virtual machine migrations can also occur even without a server failure. Oftentimes, for example, a virtual machine may be moved to another host in an effort to balance the host workload or in preparation for taking the host off-line for maintenance.
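
Failover only helps if the surviving hosts have room for the virtual machines that get moved. The Python sketch below is one simple way to sanity check that on paper, using memory as the only constraint. The host names, memory sizes, and function names are all hypothetical, and the placement logic is a basic first-fit heuristic rather than anything Hyper-V itself does, so it may report a failure in edge cases where a smarter placement would fit.

```python
def can_absorb_failure(host_memory_gb, vm_placement, failed_host):
    """Check whether the surviving hosts can take the failed host's VMs."""
    # Free memory on each surviving host before the failover.
    free = {
        host: host_memory_gb[host] - sum(vm_placement.get(host, []))
        for host in host_memory_gb
        if host != failed_host
    }
    # Place the largest displaced VMs first (first-fit decreasing heuristic).
    for vm_gb in sorted(vm_placement.get(failed_host, []), reverse=True):
        target = next((h for h, gb in free.items() if gb >= vm_gb), None)
        if target is None:
            return False
        free[target] -= vm_gb
    return True


def survives_any_single_failure(host_memory_gb, vm_placement):
    """Return True if the cluster can lose any one host and restart its VMs."""
    return all(
        can_absorb_failure(host_memory_gb, vm_placement, host)
        for host in host_memory_gb
    )


# Hypothetical three-node cluster: memory available for VMs on each host (GB)
# and the memory assigned to the VMs currently running on each host (GB).
hosts = {"node1": 64, "node2": 64, "node3": 64}
placement = {"node1": [16, 16, 8], "node2": [16, 8], "node3": [32]}

print(survives_any_single_failure(hosts, placement))
```

A real capacity plan would also account for CPU, storage I/O, and the memory that the host operating system itself needs.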

In the Windows Server 2008 and 2008 R2 versions of Hyper-V, the only way to provide virtual machine failover and live migration capabilities is to implement shared storage. Shared storage consists of a storage device that is treated as a local storage resource by all of the nodes in a failover cluster.

Unfortunately shared storage can be expensive to implement. In fact, the cost is one of the major barriers to entry for smaller organizations. Thankfully, Windows Server 2012 will do away with the shared storage requirements for Hyper-V (although shared storage will still be supported).

In the case of Windows Server 2008 and 2008 R2, building a failover cluster for Hyper-V means storing virtual machines on a cluster shared volume. As previously mentioned, the cluster shared volume is networked storage that is accessible to each node in the cluster. Cluster shared volumes tend to be expensive to implement because the storage must be seen as local to each cluster node. This rules out connecting cluster nodes to file server storage (although doing so will be supported in Windows Server 2012). For the time being, your only options for implementing shared storage are to use either iSCSI or Fibre Channel.

As is the case for Direct Attached Storage, connectivity is far from being the only consideration that should be taken into account with regard to the storage unit. Other important considerations are the number of IOPS that the storage unit is capable of delivering, resilience to failure, and the bandwidth available for storage connectivity.

When it comes to storage bandwidth, higher bandwidth is obviously better. However, it is important to keep in mind that raw throughput is not always an accurate reflection of storage bandwidth. For example, iSCSI can be utilized over a ten gigabit Ethernet connection. Likewise, there is a flavor of Fibre Channel called Fibre Channel Over Ethernet that can also be used over ten gigabit Ethernet. If one were to only look at raw throughput then it would be easy to assume that Fibre Channel Over Ethernet and iSCSI could both outperform Fibre Channel because Fibre Channel communications are currently limited to 8 gigabits per second. However, Fibre Channel is actually the faster medium in spite of the fact that it has a lower raw throughput. The reason for this is that Fibre Channel Over Ethernet and iSCSI both require storage transmissions to be encapsulated into Ethernet packets. There is quite a bit of overhead associated with the encapsulation process and that overhead causes iSCSI and Fibre Channel Over Ethernet to be slower than Fibre Channel. Network cards with TCP/IP offloading capabilities can help to bridge the gap between the various technologies, but Fibre Channel still comes out ahead.

As previously mentioned, storage bandwidth is not the only consideration that must be taken into account with regard to building a cluster shared volume. IOPS and resiliency to failure are also major concerns. The RAID level used by the storage array directly impacts both of these factors. As a general rule RAID 10 (also called RAID 1+0) is the preferred RAID level because it delivers the highest IOPS while also protecting against hard drive failure.
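
One common back-of-the-envelope way to see how the RAID level affects IOPS is to apply a write penalty to the array's raw IOPS. The Python sketch below uses the rule-of-thumb penalties that are widely quoted for each RAID level. Those penalties, the per-disk IOPS figure, and the read/write mix are assumptions for illustration rather than measurements, and real arrays with caching controllers will behave differently.

```python
# Rule-of-thumb back-end I/O operations generated per front-end write.
# These values are commonly quoted estimates, not vendor specifications.
WRITE_PENALTY = {0: 1, 1: 2, 5: 4, 6: 6, 10: 2}


def effective_iops(level, disk_count, iops_per_disk, write_fraction):
    """Estimate the front-end IOPS an array can sustain for a workload mix."""
    if level not in WRITE_PENALTY:
        raise ValueError(f"unsupported RAID level: {level}")
    if not 0 <= write_fraction <= 1:
        raise ValueError("write_fraction must be between 0 and 1")
    raw_iops = disk_count * iops_per_disk
    read_fraction = 1 - write_fraction
    return raw_iops / (read_fraction + write_fraction * WRITE_PENALTY[level])


# Hypothetical array: 8 disks at 150 IOPS each, with a 30% write workload.
for level in (5, 6, 10):
    estimate = effective_iops(level, disk_count=8, iops_per_disk=150,
                              write_fraction=0.3)
    print(f"RAID {level}: roughly {estimate:.0f} IOPS")
```

Under these assumptions RAID 10 comes out ahead of RAID 5 and RAID 6 for any workload that includes writes, and the gap widens as the write fraction grows.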

Implementing a Cluster Shared Volume

The process for creating a cluster shared volume differs depending on what type of storage medium you are using and on whether you are using Windows Server 2008 or Windows Server 2008 R2. As a general rule, however, you must begin the process by installing Windows onto each cluster node and then using the Server Manager to add the Failover Clustering feature. It is important that each cluster node be configured in an identical manner aside from its computer name and IP addresses.

Once Windows has been installed then the next step in the process is to use an initiator to establish connectivity to the shared storage. Each cluster node must use the same drive letter for the shared storage.

At this point you would open the Failover Cluster Manager and create the cluster. The cluster creation process is beyond the scope of this article since my primary focus is on storage.

Once the cluster has been created, you can select the cluster name within the Failover Cluster Manager and then click on the Enable Cluster Shared Volumes link. When you do, a new container named Cluster Shared Volumes will be created within the console tree. Now you must tell Windows to treat your shared storage as a cluster shared volume. To do so, simply select the Cluster Shared Volume container and then click on the Add Storage link found in the Actions pane. Windows will now ask you which disk you want to use as a cluster shared volume. Make your selection and click OK. The disk that you have selected now appears as a cluster shared volume.

Configuring Virtual Machines to Use the Cluster Shared Volume

After the cluster shared volume is in place the next step is to configure your virtual machines to use it. The first step in doing so is to install Hyper-V onto each cluster node. Once Hyper-V is up and running then you can begin creating virtual machines. As you create the virtual machines you must tell Hyper-V to store the virtual machines and their associated virtual hard disk files on the cluster shared volume. Keep in mind that a cluster shared volume is exposed at the same path on every node (under C:\ClusterStorage by default), so the system drive must use the same drive letter on each cluster node.

Believe it or not, merely storing the virtual machine files and the virtual hard disks on the cluster shared volume will not make the virtual machines fault tolerant. To achieve fault tolerance you must shut down the virtual machines (or place them in a saved state) and then take some steps to make the Failover Clustering Service aware of your virtual machines. To do so, open the Failover Cluster Manager and then select the Services and Applications container in the console tree. Next, click the Configure a Service or Application link found in the Actions pane. This will cause Windows to launch the High Availability Wizard.

The wizard's initial screen asks you which service or application you want to configure for high availability. Choose the Virtual Machines option and then click Next. On the following screen select the check boxes that correspond to the virtual machines that you want to add to the failover cluster and click OK. Now just click Next and Finish. When you are done the virtual machines should be listed in the Failover Cluster Manager.
