March 31, 2013

QoS in Windows Server 2012

As organizations begin to depend more heavily on cloud services, network bandwidth management becomes even more critical. Thankfully, bandwidth can be managed through a Windows component known as Quality of Service (QoS). In this article, I will explain what QoS is, how it works, and what you need to know about using QoS in Windows Server 2012.

An Introduction to QoS

Although the main focus of this article series is on using QoS with Windows Server 2012, there are two important things that you need to know right off the bat. First of all, QoS is not new to Windows Server 2012. Microsoft introduced QoS support over a decade ago, in Windows 2000. Of course, Windows support for QoS has been modernized in Windows Server 2012.

The other thing that I want to clear up right away is the notion that QoS is a Microsoft technology. Even though QoS is built into the Windows operating system (and has been for quite some time), it is an industry standard rather than a Microsoft technology. Microsoft has a long history of including industry standards in Windows. For example, IPv4 and IPv6 are also industry standard networking protocols that are included in the Windows operating system.

So with that out of the way, I want to go ahead and talk about what QoS is. In order to really understand what QoS is and why it is important, you have to consider the nature of networks in general. Without a mechanism such as QoS in place, most networks use what is known as best effort delivery. In other words, when a computer sends a packet to another computer, the two machines and the networking hardware between them will make an earnest attempt to deliver the packet. Even so, delivery is not guaranteed. Even if the packet does make it to its destination, there are no guarantees as to how quickly it will get there.

Oftentimes the delivery speed is based on the network speed. For example, if a packet is being sent between two PCs that reside on the same gigabit network segment, then the packet will most likely be delivered very quickly. However, this is anything but a guarantee. If anything, the network speed (1 gigabit per second in this example) can be thought of as an unbreakable speed limit rather than a guarantee of fast delivery. Of course, having a fast network certainly increases the chances that a packet will be delivered quickly, but there are no guarantees. An application that consumes a lot of bandwidth has the potential to degrade performance for every other device on the network.

This is where QoS comes into play. QoS is essentially a set of standards that are based on the concept of bandwidth reservation. What this all boils down to is that network administrators are able to reserve network bandwidth for mission critical applications so that those applications can send and receive network packets in a reasonable amount of time.

It is important to understand that although QoS is implemented through the Windows operating system, the operating system is not the only component that is involved in the bandwidth reservation process. In order for QoS to function properly, each network device that is involved in communications between two hosts (including the hosts themselves) must be QoS aware. This can include network adapters, switches, routers, and other networking hardware such as bridges and gateways. If the traffic passes through a device that is not QoS aware, then the traffic is dealt with on a first come, first served basis just like any other type of traffic would be.

Obviously not every type of networking supports QoS, but Ethernet and Wireless Ethernet do offer QoS support (although not every Ethernet device is QoS aware). One of the best networking types for use with QoS is Asynchronous Transfer Mode (ATM). ATM works so well with QoS because it is connection oriented. When QoS is used, ATM can enforce the bandwidth requirements at the hardware level.

Before I move on, I want to clear up what might seem like a contradiction. When I talked about Ethernet, I said that Ethernet supports QoS, but that the underlying hardware must be QoS aware. Even so, Ethernet does not enforce QoS at the hardware level the way that ATM does. So what gives?

Ethernet does not enforce QoS at the hardware level because it is a very old networking technology that has been retrofitted many times over the last couple of decades. The concept of bandwidth reservation did not exist when Ethernet was created, and bandwidth reservation at the hardware level just does not work with the existing Ethernet standard. That being the case, QoS is implemented at a higher level in the OSI model. The hardware does not perform true bandwidth reservation, but rather emulates bandwidth reservation through traffic prioritization based on the instructions provided by QoS.

Additional Considerations

Although I have given you an overview of what is required for implementing QoS, there are a few other considerations that should be taken into account. For starters, Windows Server 2012 does not impose any bandwidth requirements that would keep you from using QoS in certain situations. Even so, Microsoft states that QoS works best on 1 gigabit and 10 gigabit network adapters.

Presumably the main reason behind Microsoft's statement is that adapters that operate at speeds below a gigabit simply do not provide enough bandwidth to make bandwidth reservation worthwhile.

I might be reading too much into Microsoft's recommendation, but there is something that I just can't help but notice. Microsoft said that QoS works best on 1 gigabit or 10 gigabit adapters – not connections. Although this might at first seem trivial, I think that Microsoft's wording is deliberate.

One of the new features in Windows Server 2012 is NIC teaming. NIC teaming will allow multiple network adapters to work together as one in order to provide higher overall throughput and resilience against NIC failure. I have not seen any official word as to whether or not NIC teaming will work with QoS, but I would be very surprised if Microsoft did not allow the two features to be used together.

One last thing that I want to quickly mention about QoS is that it is designed for traffic management on physical networks. As such, Microsoft recommends that you avoid using QoS from within a virtual server. However, QoS can be used on a physical server that is acting as a virtualization host.

So far, I have explained that Quality of Service (QoS) is a networking standard, and that Microsoft has offered QoS support within the Windows operating system since Windows 2000. That being the case, it is easy to dismiss Windows Server 2012's support for QoS as being nothing more than a legacy feature that is still being supported. However, QoS has evolved to meet today's bandwidth reservation related needs.

Legacy Bandwidth Management

In order to truly appreciate how QoS has been improved in Windows Server 2012, you have to understand some of the QoS limitations in previous versions of the Windows Server operating system. In the case of Windows Server 2008 R2, QoS could only be used to enforce maximum bandwidth consumption. This type of bandwidth management is also sometimes referred to as rate limiting.

With careful planning it was often possible to achieve effective bandwidth management even in Windows Server 2008 R2. However, in the case of Hyper-V it was impossible to achieve granular bandwidth management for an individual virtual machine.

Granular Bandwidth Management

The reason why granular bandwidth management is so important within a virtual datacenter is because virtual machines produce at least four different types of traffic. Limiting bandwidth consumption for all four types of network traffic in a consistent way can sometimes be counterproductive.

To show you what I mean, here are the four main types of network traffic that can be produced by virtual machines in a Hyper-V environment:
  • Normal network traffic – This is network traffic that flows between the virtual machine and other servers or workstations on the network. These machines can be both physical and virtual.
  • Storage traffic – This is the traffic that is generated when virtual hard disk files reside on networked storage rather than directly on the host server that is running the virtual machine.
  • Live migration traffic – This is the traffic that is created by the live migration process. It typically involves storage traffic and traffic between two host servers.
  • Cluster traffic – There are several different forms of cluster traffic. Cluster traffic can be the traffic between a cluster node and a cluster shared volume (which is very similar to storage traffic). It can also be inter-node communications such as heartbeat traffic.

The point is that network traffic within a virtual datacenter can be quite diverse. Because of this, the type of bandwidth management provided by QoS in Windows Server 2008 R2 simply does not lend itself well to virtual datacenters.

There are two reasons why the concept of bandwidth rate limiting doesn't work so well for virtual machines. For one thing, limiting a virtual machine to using a certain amount of bandwidth might lead to unnecessary performance problems. Suppose for instance that a host server had a 10 gigabit connection and you limited a particular virtual machine to consuming 1 gigabit of bandwidth. By doing so, you could prevent the virtual machine from robbing bandwidth from other virtual machines, but you also prevent the virtual machine from using surplus bandwidth. Imagine for instance that at a given point in time there were seven gigabits of available bandwidth. The virtual machine would still only be able to use one gigabit, even though it could benefit from additional bandwidth and that bandwidth could be provided at that moment without taking anything away from other virtual machines.
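The surplus bandwidth problem can be put into numbers with a minimal Python sketch. The 1 gigabit cap and the seven gigabits of idle bandwidth come from the example above. The 4 gigabit demand figure and both function names are assumptions that I made up for illustration.

```python
def capped_throughput(demand_gbps, cap_gbps, available_gbps):
    """Bandwidth a VM actually receives when a hard cap is in force."""
    return min(demand_gbps, cap_gbps, available_gbps)


def stranded_bandwidth(demand_gbps, cap_gbps, available_gbps):
    """Bandwidth the VM wanted, and the link could supply, but the cap blocked."""
    could_use = min(demand_gbps, available_gbps)
    return could_use - capped_throughput(demand_gbps, cap_gbps, available_gbps)


# 1 Gbps cap and 7 Gbps idle are from the example; the 4 Gbps demand is assumed.
received = capped_throughput(demand_gbps=4.0, cap_gbps=1.0, available_gbps=7.0)
stranded = stranded_bandwidth(demand_gbps=4.0, cap_gbps=1.0, available_gbps=7.0)
print(f"received {received} Gbps, {stranded} Gbps went unused because of the cap")
```

In this scenario the virtual machine receives 1 gigabit even though another 3 gigabits of its demand could have been met from idle capacity.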

Of course the opposite is also true. Without proper planning, limiting bandwidth can lead to bandwidth deprivation for specific virtual machines. Suppose for example that a host server is running twelve virtual machines and that those virtual machines all share a single, ten gigabit network adapter. Now let's suppose that you were to configure each virtual machine so that it can never consume more than 1 gigabit of network bandwidth.

Given the fact that the host server is running twelve virtual machines, the server's bandwidth has actually been over committed at that point. During a period of high demand, each virtual machine will try to use up to 1 gigabit of network bandwidth. Because the physical hardware cannot provide a full twelve gigabits of bandwidth, some of the virtual machines could end up suffering from poor performance because they are unable to get the bandwidth that they need.
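The over commitment in this example is simple arithmetic, sketched below in Python. The function names are hypothetical, and the even split under contention is an assumption made purely for illustration. Real hardware will not necessarily divide bandwidth evenly.

```python
def overcommit_ratio(vm_count, cap_gbps, adapter_gbps):
    """Total capped demand divided by what the physical adapter can deliver."""
    return (vm_count * cap_gbps) / adapter_gbps


def even_share_under_contention(vm_count, cap_gbps, adapter_gbps):
    """Per-VM bandwidth if every VM pushes to its cap and the adapter is
    split evenly (an assumption, not documented Windows behavior)."""
    return min(cap_gbps, adapter_gbps / vm_count)


# Twelve VMs, each capped at 1 Gbps, sharing one 10 Gbps adapter.
ratio = overcommit_ratio(vm_count=12, cap_gbps=1.0, adapter_gbps=10.0)
share = even_share_under_contention(vm_count=12, cap_gbps=1.0, adapter_gbps=10.0)
print(f"over commitment ratio: {ratio:.2f}")
print(f"even share per VM at peak: {share:.3f} Gbps")
```

A ratio above 1.0 means that the caps alone cannot protect every virtual machine during peak demand.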

QoS in Windows Server 2012

As I previously explained, the Windows Server 2008 R2 implementation of QoS isn't exactly a bandwidth reservation system (even though QoS is technically a bandwidth reservation protocol). Instead, it can be thought of more as a bandwidth throttling solution. In other words, Windows Server 2008 R2's QoS implementation allows an administrator to dictate the maximum amount of bandwidth that a virtual machine can consume. This is similar to the technology that Internet Service Providers (ISPs) use to offer various rate plans. For example, my own ISP offers a seven megabit, a ten megabit, and a fifteen megabit package. The more you pay, the faster the Internet connection that you get.

Even though the concept of bandwidth throttling still exists in Windows Server 2012, Microsoft is also introducing a concept known as minimum bandwidth. Minimum bandwidth is a bandwidth reservation technology that makes it possible to make sure that various types of network traffic always receive the bandwidth that they need. This is really what QoS was designed for in the first place.

Obviously the biggest benefit to using this approach is that the concept of minimum bandwidth makes it possible to reserve bandwidth in a way that ensures that each virtual machine receives enough bandwidth to do its job. However, that is not the only benefit.

A second benefit is that Windows Server 2012 will make it possible to differentiate between the various types of network traffic that are produced by virtual machines. For example, an administrator could theoretically reserve more bandwidth for storage traffic than for regular virtual machine traffic.

Arguably the greatest benefit, however, is that minimum bandwidth reservations are different from bandwidth caps. Although it is still possible (and sometimes necessary) to set bandwidth caps, minimum bandwidth settings do not cap bandwidth consumption.

Let's assume for example that you wanted to reserve 30% of your network bandwidth for virtual machine traffic, and the remaining 70% of bandwidth for things like live migration and storage traffic. If you don't have any live migrations happening at the moment then you might not need any bandwidth for live migrations at all. It would be silly to lock up that bandwidth to prevent it from being used for other types of network traffic.

In this type of situation, the virtual machine traffic receives the 30% of the network bandwidth that has been reserved for it. If the virtual machine traffic could benefit from additional bandwidth, and the other services that hold a reservation are not presently using their share, then that idle bandwidth is made available to virtual machine traffic. It remains available until one of the other traffic types needs it in order to fulfill its own minimum bandwidth reservation. Of course I am only using virtual machine traffic as an example. The concept applies to any type of traffic.
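The difference between a reservation and a cap can be sketched in a few lines of Python. This is only an illustration of the idea, not the algorithm that Windows Server 2012 uses. The allocate function is hypothetical, as is the way the remaining 70% is split between live migration and storage. Handing out idle bandwidth in proportion to each reservation is also my own assumption.

```python
def allocate(capacity, reservations, demands):
    """Give each traffic class its reserved minimum, then lend idle bandwidth.

    reservations maps a class name to a fraction of capacity.
    demands maps a class name to the bandwidth that class currently wants.
    """
    eps = 1e-9
    # Step 1: every class gets what it asks for, up to its reservation.
    alloc = {n: min(demands[n], reservations[n] * capacity) for n in reservations}
    spare = capacity - sum(alloc.values())
    hungry = {n for n in alloc if demands[n] - alloc[n] > eps}
    # Step 2: lend the idle bandwidth to classes that still want more,
    # in proportion to their reservations (an assumed sharing rule).
    while spare > eps and hungry:
        total_weight = sum(reservations[n] for n in hungry)
        if total_weight <= 0:
            break
        granted = 0.0
        for n in list(hungry):
            share = spare * reservations[n] / total_weight
            extra = min(share, demands[n] - alloc[n])
            alloc[n] += extra
            granted += extra
            if demands[n] - alloc[n] <= eps:
                hungry.discard(n)
        spare -= granted
    return alloc


# 30% for VM traffic is from the example; the 20/50 split is assumed.
reservations = {"vm": 0.3, "live_migration": 0.2, "storage": 0.5}

# No live migration in progress, so VM traffic borrows idle bandwidth.
quiet = allocate(10.0, reservations, {"vm": 6.0, "live_migration": 0.0, "storage": 3.0})
print(quiet)

# A live migration starts, so VM traffic gives back what it borrowed.
busy = allocate(10.0, reservations, {"vm": 6.0, "live_migration": 5.0, "storage": 3.0})
print(busy)
```

In the first case VM traffic gets its full 6 gigabit demand, which is double its reservation. In the second case it drops back, but never below the 3 gigabits that were reserved for it.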

Policy Based QoS

In Windows Server 2012, QoS is implemented through the use of group policy settings. Microsoft refers to this as Policy Based QoS. You can access the QoS portion of the Group Policy Editor by navigating through the Group Policy Editor's console tree to Computer Configuration | Windows Settings | Policy Based QoS.

It is worth noting that a QoS policy can be created at either the computer level or at the user level (or both). It is generally preferred to implement QoS policies at the computer level.

Creating a QoS Policy

You can create a new QoS policy by right clicking on the Policy Based QoS container and selecting the Create New Policy command from the shortcut menu. When you do, Windows will launch the Policy Based QoS Wizard.

After the policy name, the next option in the wizard gives you the opportunity to specify a DSCP value. DSCP is an acronym standing for Differentiated Services Code Point. In spite of its rather cryptic sounding name, the DSCP value's job is actually quite simple. The value that is assigned here designates the policy's traffic priority.

You might have noticed that the DSCP field has a default value of zero. The DSCP can be set to a value ranging from zero to 63. The higher the value, the higher the traffic priority. Therefore, a default QoS policy has the lowest possible priority.
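The range of zero to 63 exists because the DSCP occupies the upper six bits of a single byte in the IP header (the byte that was formerly called Type of Service). The Python sketch below shows that bit layout. The helper names are hypothetical. IP_TOS is a standard socket option, but some platforms ignore it when an application sets it directly, so treat this as an illustration of the header field rather than a description of how Policy Based QoS marks traffic.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Shift a DSCP value into the upper six bits of the header byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the OS to mark this socket's outbound IPv4 packets (may be ignored)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


print(dscp_to_tos(0))   # 0, the default
print(dscp_to_tos(63))  # 252, the highest value that fits in six bits
```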

When you assign a DSCP value to a QoS policy, you are essentially creating a queue for outbound network traffic. By default the traffic passing through the queue is not throttled. QoS only limits the traffic when bandwidth contention becomes an issue. In those types of situations lower priority queues yield to higher priority queues.
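The idea of lower priority queues yielding to higher priority queues can be sketched as a strict priority scheduler. The PriorityScheduler class and the packet labels below are hypothetical, and real network hardware may use more forgiving scheduling schemes than this one.

```python
import heapq
from itertools import count


class PriorityScheduler:
    """Strict priority: the highest DSCP waiting always goes out first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # preserves first-in, first-out order per DSCP

    def enqueue(self, dscp, packet):
        if not 0 <= dscp <= 63:
            raise ValueError("DSCP must be between 0 and 63")
        # heapq is a min-heap, so the DSCP is negated to pop the highest first.
        heapq.heappush(self._heap, (-dscp, next(self._arrival), packet))

    def dequeue(self):
        _, _, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)


scheduler = PriorityScheduler()
scheduler.enqueue(0, "backup-1")
scheduler.enqueue(46, "voice-1")
scheduler.enqueue(0, "backup-2")
scheduler.enqueue(26, "video-1")

while scheduler:
    print(scheduler.dequeue())  # voice-1, video-1, backup-1, backup-2
```

Notice that the backup packets only go out once nothing with a higher DSCP is waiting. If high priority traffic never stops arriving, the low priority traffic never gets its turn.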

In some situations it is possible that a high priority queue could choke out a lower priority queue if a large amount of traffic passes through the higher priority queue. To guard against this, the wizard lets you specify an outbound throttle rate for the policy. Doing so implements a bandwidth cap that prevents the queue from consuming an excessive amount of bandwidth. The throttle rate can be specified in terms of either kilobits per second or megabits per second.

When you click Next, you are given the opportunity to specify the traffic stream to which the QoS policy should apply. Rather than requiring you to identify traffic by TCP/IP port numbers, the QoS policy is designed to be bound to specific applications.

If you want to bind the QoS policy to a specific application then all you have to do is specify the name of the application's executable. In some cases however, there might be multiple applications on a system that use duplicate executable file names even though the applications themselves are different. In those types of situations you can specify the path to the application. If a path is required then you should use environment variables (such as %ProgramFiles%) whenever possible.

Your other option for binding the new QoS policy to a traffic stream is to use the policy to regulate HTTP traffic. In doing so, you must specify a specific URL or domain. That way the QoS policy will only regulate traffic for that specific site or Web application rather than applying to all HTTP traffic. Of course if your goal is to put a bandwidth cap on Web browsing then you always have the option of binding the policy to Internet Explorer.

Oftentimes QoS policies need to have a granular scope. Imagine for example that your goal was to regulate the traffic produced by applications on one specific server. The problem with doing so is that QoS policies are really nothing more than group policy settings. Therefore, if you were to simply create a QoS policy within the Default Domain Policy then that policy would apply to all of the computers (or users) in the domain.

You could resolve this problem by segmenting your Active Directory into a series of organizational units, but that gets complicated and can be difficult to manage.

Fortunately, the wizard also lets you limit a policy by source and destination IP address. Specifying a source address effectively limits the policy so that it only applies to the specified source computer. As an alternative you can specify an IP address range so that the policy will apply to a specific subnet rather than to an individual computer.

The destination option allows you to apply the policy only to traffic that is destined for a specific computer or a specific subnet. If you need even tighter granular control then you have the option of specifying both a source and a destination address so that only the traffic flowing between designated hosts is regulated.

The wizard's protocol and port settings give you tighter control over the types of traffic that are regulated. Here you can specify TCP and UDP port numbers for both the source and destination. You can use this as an alternative to specifying an application to which the policy should be bound.
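To show how the address and port settings combine to narrow a policy's scope, here is a minimal Python sketch. The QosRule class, its field names, and the sample addresses are all hypothetical. This is not how Windows stores or evaluates a QoS policy. It simply treats every setting that has been filled in as a condition that the traffic must meet.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


def _address_in_scope(scope: Optional[str], address: str) -> bool:
    """True if no scope is set, or if the address falls inside the scope."""
    if scope is None:
        return True
    # A bare address such as "10.0.2.15" is treated as a one-host network.
    return ip_address(address) in ip_network(scope, strict=False)


@dataclass(frozen=True)
class QosRule:
    """Any field left as None means 'match anything'."""
    source: Optional[str] = None       # single address or subnet prefix
    destination: Optional[str] = None  # single address or subnet prefix
    protocol: Optional[str] = None     # "tcp" or "udp"
    source_port: Optional[int] = None
    dest_port: Optional[int] = None

    def matches(self, src_ip, dst_ip, protocol, src_port, dst_port) -> bool:
        return all([
            _address_in_scope(self.source, src_ip),
            _address_in_scope(self.destination, dst_ip),
            self.protocol is None or self.protocol == protocol.lower(),
            self.source_port is None or self.source_port == src_port,
            self.dest_port is None or self.dest_port == dst_port,
        ])


# Regulate only TCP port 443 traffic from one subnet to one server.
rule = QosRule(source="10.0.1.0/24", destination="10.0.2.15",
               protocol="tcp", dest_port=443)

print(rule.matches("10.0.1.7", "10.0.2.15", "TCP", 50000, 443))  # True
print(rule.matches("10.0.3.7", "10.0.2.15", "TCP", 50000, 443))  # False
```

The second packet is left alone because its source address falls outside of the subnet that the rule names.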
