Traditional Network Protection
When we decide that a virtual machine must have fault-tolerant networking, we normally deploy the network configuration depicted below. Two or more NICs are connected to different top-of-rack (TOR) access switches, and those NICs are teamed to provide load balancing and failover (LBFO).
A virtual switch is connected to that NIC team, and virtual machines then connect to that virtual switch. The result is a fault-tolerant network connection: the virtual machines stay online if a single TOR switch or a single NIC fails.
Although this design has been around for years and is considered good practice, times are changing. There are some data centers where this design might not be considered suitable, or it might not offer the right kind of fault tolerance.
The following two examples are scenarios where this design wouldn't be suitable:
- Fault domain: In this scenario, a data center is concerned that all of the TOR switching in a rack (usually two independent switches) might fail. They need a solution to get virtual machines out of that rack as quickly as possible to another rack with working network connectivity.
- Reduced costs: A data center decides that the cost of supporting double the number of NICs and switch ports is not sustainable. Instead, they consider a rare brief outage to virtual machines to be more tolerable than the guaranteed CAPEX (purchase cost) and OPEX (operational cost) of additional hardware and management.
What Are Protected Networks?
Protected networks is a feature of Failover Clustering that is available when you deploy Windows Server 2012 R2 Hyper-V clusters. The cluster monitors the network connection of each virtual machine. If a virtual machine loses network connectivity, it is live migrated, with no perceivable downtime from the migration itself, to another host that has an identical but working network connection.
Each host in the Hyper-V cluster has a single NIC that is dedicated to virtual machine networking. Note that there is no NIC team. A virtual switch is created in each host and connected to that dedicated NIC. A virtual machine is created in Host A and connected to the virtual switch. Failover Clustering monitors the connection status of the physical NICs. If the physical NIC in Host A fails, then the virtual machine will be live migrated to Host B.
Note: A cluster failover did not occur because the clustering NetFT driver was still able to send and receive heartbeats across the other network(s) on the host.
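To make the behavior concrete, here is a minimal Python sketch of the decision just described. It is a toy model, not how Failover Clustering is implemented: the `Host` class, the `plan_migrations` function, and the "fewest virtual machines" placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    switch_name: str            # virtual switch the VMs connect to
    nic_up: bool = True         # link state of the NIC behind that switch
    vms: list = field(default_factory=list)


def plan_migrations(hosts):
    """Return (vm, source, target) moves for VMs on hosts whose NIC is down."""
    moves = []
    # Track VM counts so planned moves are spread out (an assumed policy).
    planned = {h.name: len(h.vms) for h in hosts}
    for source in hosts:
        if source.nic_up:
            continue  # network is healthy, nothing to do
        # A valid target has an identically named switch and a working NIC.
        targets = [h for h in hosts
                   if h.nic_up and h.switch_name == source.switch_name]
        if not targets:
            continue  # no healthy host available, the VMs stay where they are
        for vm in source.vms:
            target = min(targets, key=lambda h: planned[h.name])
            planned[target.name] += 1
            moves.append((vm, source.name, target.name))
    return moves


hosts = [
    Host("HostA", "External1", nic_up=False, vms=["VM01", "VM02"]),
    Host("HostB", "External1", vms=["VM03"]),
    Host("HostC", "External1"),
]
print(plan_migrations(hosts))
# [('VM01', 'HostA', 'HostC'), ('VM02', 'HostA', 'HostB')]
```

The point of the sketch is only that a move is planned when the NIC behind the virtual switch is down and another host offers the same switch with a working link; the real cluster makes its own placement decisions.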
Failover Clustering will not react immediately to a network outage. Each virtual machine has its own cluster resource, and that resource reports network failures for that virtual machine to Failover Clustering individually.
The status report might be executed immediately or it might take up to 60 seconds. As a result, a brief network disconnection might occur after the network failure before the virtual machine is live migrated to another host. The fact that each virtual machine has an individual report means that you won't see all virtual machines initiate a live migration at the same time, assuming that your faulty host is configured to accommodate that load.
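The staggering effect can be illustrated with a small Python simulation. This is only a sketch under a loud assumption: the text gives a range of zero to 60 seconds, and the uniform random delay used here is my own modeling choice, not documented cluster behavior. The function name `detection_schedule` is hypothetical.

```python
import random

MAX_REPORT_DELAY_S = 60  # upper bound quoted in the text


def detection_schedule(vm_names, seed=None):
    """Return (vm, delay_seconds) pairs sorted by when each VM reports."""
    rng = random.Random(seed)
    # Assumption: each VM's report lands at a uniformly random point
    # between "immediately" and the 60-second upper bound.
    delays = {vm: rng.uniform(0, MAX_REPORT_DELAY_S) for vm in vm_names}
    return sorted(delays.items(), key=lambda item: item[1])


for vm, delay in detection_schedule(["VM01", "VM02", "VM03", "VM04"], seed=1):
    print(f"{vm}: failure reported after {delay:5.1f} s")
```

Because every virtual machine draws its own delay, the live migrations start at different moments rather than all at once.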
For most of us, an outage of that length would be intolerable, so we'll continue to invest in dual TOR switches and NIC teams. But if you have a truly massive installation, or if the cost of insurance is higher than the cost of brief outages, then protected networks might seem better than NIC teaming.
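That trade-off is simple arithmetic, and a short Python sketch shows its shape. Everything here is hypothetical: the function names and all of the figures in the example are placeholders, so substitute your own numbers.

```python
def expected_outage_cost(failures_per_year, outage_minutes, cost_per_minute):
    """Expected yearly cost of riding out brief outages without a NIC team."""
    return failures_per_year * outage_minutes * cost_per_minute


def teaming_is_cheaper(annual_redundancy_cost, failures_per_year,
                       outage_minutes, cost_per_minute):
    """True if paying for extra NICs and switch ports costs less per year."""
    outage_cost = expected_outage_cost(
        failures_per_year, outage_minutes, cost_per_minute)
    return annual_redundancy_cost < outage_cost


# Placeholder inputs: one failure every two years, a one-minute outage,
# 200 per minute of downtime, 10,000 per year for the extra hardware.
print(expected_outage_cost(0.5, 1, 200))        # 100.0
print(teaming_is_cheaper(10_000, 0.5, 1, 200))  # False
```

With these made-up inputs the insurance costs more than the outages it prevents, which is the situation where protected networks start to look attractive.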
Controlling Protected Networks
By default, protected networks is turned on. Yes, this feature protects you even if you have a traditional NIC team connection for your virtual switches; in that case it covers the fault domain scenario described above. You can view this setting by opening the properties of a virtual machine and expanding the Advanced Features of the desired virtual NIC. You can disable or re-enable this setting for individual virtual NICs. For example, you might tolerate outages on one virtual NIC but want an automated response for another virtual NIC.
Protected networks gives you additional protection for the network connectivity of your highly available virtual machines. And if you work in one of those data centers that considers NIC teaming to be too expensive, you can use protected networks to move virtual machines to another host in the cluster soon after a network outage.