PCI compliance and the ability to secure virtual machines connected to the same vSwitch and port group are becoming critical requirements in most large enterprise virtualisation deployments, as organisations drive adoption further and consider Business Critical Applications. This is important because the network traffic would otherwise not be visible to the traditional network security appliances and security administrators. Because a virtualized environment is more dynamic, more automated enforcement of security policies is needed, as an individual virtual machine can run on any host at any time. The problems with the traditional security methods can be solved in a couple of different ways, such as using a Cisco Nexus 1000V with port policies and ACLs (access control lists), private VLANs, or VMware vShield. Here I will discuss design considerations and limitations that will be important in enterprise deployments of vShield App.
There are considerations that impact both the management and managed infrastructure. The goal is to ensure the architecture is deployed in a secure and scalable manner. This avoids traditional security bottlenecks, while allowing consistent security policy to be applied in a more automated way so that further business critical applications can be virtualized with confidence.
Below I go over a bit of the history behind vShield before we dive into the design and deployment of the most recent versions. This article will not go into deployment of vShield Zones 1.0 or use of vShield on vSphere 4.0. Instead we will focus on deployment of vShield App from v4.1 Update 1 and above. It is important to note that vShield only provides security protection for virtual machines, not host management interfaces, vMotion, network storage, or VMware Fault Tolerance network communications.
The original version of vShield, vShield Zones, was made available as part of the Enterprise license in vSphere 4.0. vShield Zones offered simple firewall and flow monitoring for virtual machines. The architecture had to be deployed in such a way that the virtual machines connected to an internal-only vSwitch, and the vShield Zones agent listened to all traffic on this internal-only vSwitch (using promiscuous mode) and only passed allowed traffic to or from the physical network. This original architecture, as displayed in the diagram below, was quite limited and had a high impact on performance and scalability. Fortunately, as vShield developed, this architecture greatly improved.
Multiple vShield Zones Agents could be deployed on a host to protect multiple port groups, each containing related virtual machines. vShield Zones originally only protected virtual machines from traffic originating from the physical network or from a different vSwitch (which would have to first pass out to the physical network). It did not protect virtual machines from traffic within a vShield Zone, i.e. connected to the same internal-only vSwitch port group. This may sound similar to what vShield Edge does today, by protecting the perimeter of a zone. The original version of vShield Zones could also be hard to implement on Standard vSwitches, as multiple vSwitches were needed to protect multiple zones, each requiring multiple uplinks for redundancy and needing to be configured identically on every host in the cluster. Virtual Distributed Switches made the process somewhat more palatable.
Due to the way vShield Zones was deployed and its architecture, it was not recommended to protect the vShield Manager or vCenter virtual machines using vShield Zones. I will discuss solutions to this problem later in this article. Management of vShield Zones was via either a direct https web interface to the vShield Manager appliance, or via a vCenter plug-in. vShield Zones 1.0 U1 is compatible with vSphere 4.0 and above (including vSphere 5.0).
With vSphere 4.1 vShield underwent an architectural evolution that made it a lot more scalable and easier to implement, and greatly improved performance. Additional automation was included in the way that the vShield Manager deployed and managed the vShield Firewall Appliances. With the introduction of vSphere 4.1 vShield was also split into a suite of security products: vShield Zones, still available with the vSphere Enterprise license, was restricted to firewall only, whereas vShield App was introduced with firewall and improved Flow Monitoring. vShield Edge and vShield Endpoint were added to the suite (not covered in this article).
vShield Zones and App 4.1 no longer required virtual machines to connect to an internal-only port group and then pass the traffic through the vShield agent to the outside network. Instead vShield 4.1 leveraged a vSphere kernel module, and only the firewall appliance was deployed to an internal vSwitch, along with a vmkernel port, to inspect and allow/deny traffic flows. This took much of the processing overhead off the vShield Firewall Appliance itself and put it into the vSphere kernel, which was better able to handle high throughput. This also meant that only a single vShield Firewall Appliance virtual machine was required per host to provide protection of all port groups. It was also much more secure, as it no longer relied on promiscuous mode being enabled on the vSwitches. The diagram below depicts the logical architecture for vShield Zones and App since 4.1, which is still current as of vShield 5.0.
vShield App is broken down into three major components:
- vShield Manager, which manages the rule base, role based access controls, stores flow data etc.
- vShield Firewall Service VM, which is deployed per host and inspects traffic flows and allows/denies traffic based on the rule base
- vSphere Kernel Modules (vShield 4.1 and above)
Even though vShield 4.1 is a big improvement over vShield Zones 1.0, there are a few design considerations with regard to its deployment to ensure it will work correctly.
- There must be a 1:1 mapping of vShield Manager to a vCenter. A single vShield Manager can’t be used to protect multiple vCenters, even in linked mode. If you have more than one vCenter you need to have a vShield Manager for each vCenter that will have protected VMs.
- The vShield Manager must be managed by the vCenter where it is providing the protection. This means it can’t be in an independent management cluster that is managed by a different vCenter. This also means that each vCenter that will host resources that need protection must itself have a management host or management cluster to run vShield Manager. This is due to the way it deploys vShield App Firewall Service Appliance virtual machines using linked-clones.
- It is not supported to run vShield Zones or vShield App on the management cluster in which vShield Manager and vCenter virtual machines reside. This is due to the circular dependency that is created that could prevent vCenter and vShield Manager traffic from reaching the hosts and vShield Firewall Service Appliance virtual machines.
- Even though vShield Manager has a single vCPU, you can’t use VMware Fault Tolerance to protect it when implementing vShield App, because it uses linked clones and snapshots as part of the deployment process for the vShield Firewall Service Appliance virtual machines. This limitation doesn’t apply to vShield Edge (where FT protection is best practice as per the VMware vCloud Architecture Toolkit).
- It is not supported to rename the vShield App Firewall Service Appliance virtual machines. vShield Manager uses the default virtual machine name in order to find and manage the deployment and update of the vShield App Firewall Service Appliances.
- If deploying the vShield App Firewall Service Appliance virtual machines on shared storage, DRS Host Affinity Rules (Must Run on Host) need to be defined and applied to enforce that the appliances run on the host they are provisioned on, and are not migrated off when the host is put into maintenance mode.
- If the vShield App Firewall Service Appliance virtual machine goes offline for any reason no traffic will pass to any of the protected virtual machines. Availability of the vShield App Firewall Service Appliance VM is critical to the service availability of the virtual machines.
- All hosts within a cluster need to run vShield App if protection of virtual machines is to be achieved as vShield Manager will unprotect a virtual machine if it is migrated to a host that does not have vShield App deployed.
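The last consideration above can be checked programmatically. The sketch below is a minimal illustration of that rule (the host names and the helper functions are hypothetical, not a real inventory query): a virtual machine's protection can only be relied on when every host in the cluster runs vShield App, because vMotion to an unprotected host leaves the VM unprotected.

```python
# Hypothetical sketch: verifying that every host in a cluster runs the
# vShield App Firewall Service Appliance before relying on DRS/vMotion.
# Host names are illustrative placeholders.

def unprotected_hosts(cluster_hosts, vshield_app_hosts):
    """Return hosts in the cluster that lack a vShield App appliance."""
    return sorted(set(cluster_hosts) - set(vshield_app_hosts))

def cluster_fully_protected(cluster_hosts, vshield_app_hosts):
    """Protection is only guaranteed if *all* hosts run vShield App,
    since a vMotion to an unprotected host unprotects the VM."""
    return not unprotected_hosts(cluster_hosts, vshield_app_hosts)

cluster = ["esx01", "esx02", "esx03"]
protected = ["esx01", "esx02"]

print(cluster_fully_protected(cluster, protected))  # False
print(unprotected_hosts(cluster, protected))        # ['esx03']
```

A check like this could be run as part of a change-control process whenever hosts are added to a protected cluster.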
The diagram below displays the logical deployment of vShield App 4.1 in an enterprise environment.
In the above logical architecture diagram you can see that the management cluster contains the vShield Manager and, in this case, also the vCenter virtual machine that is responsible for managing the virtual datacentre. The App cluster in this diagram would contain all non-management virtual machines that need protecting by vShield App. When vCenter is deployed in a management cluster inside the same virtual datacentre it is managing, it is a good idea to pin it to a particular host using a DRS affinity rule (should run on host), so it can be easily found if necessary for troubleshooting purposes. By using the should run on host or preferred host rule, the vCenter can still be relocated automatically by DRS in the case of host maintenance. There are a few important drawbacks with the above architecture:
- The Management Cluster cannot be protected itself due to the circular dependency that this would create and the difficulties in troubleshooting.
- In a small deployment it may cause inefficient use of resources as the management host or cluster may not be highly utilized.
- Does not allow for a completely independent management infrastructure that is separate from the infrastructure being managed.
In spite of these drawbacks the design is very scalable and allows the critical application virtual machines to be protected by vShield App. This means consistent firewall policies can be applied to logical resource groupings, such as a datacentre, cluster, resource pool, or user defined security group (with vNIC membership), and all virtual machines provisioned within those resource groupings automatically inherit the defined security policy. The architecture allows for a large number of application clusters to be protected by one very small management cluster. Due to the way that vShield App 4.1 uses a vSphere kernel module, all traffic between virtual machines on different hosts (inter-host), and between virtual machines on the same host, even on the same port group (intra-host), is inspected and monitored.
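The policy inheritance from logical resource groupings can be illustrated with a small sketch. This is not the exact vShield App rule-precedence semantics, just an assumed first-match walk from the most specific container to the least specific, with hypothetical container and rule names:

```python
# Illustrative sketch of container-inherited firewall policy. Containers,
# rules, and the first-match precedence order are assumptions for the
# example, not the documented vShield App evaluation algorithm.

RULES = {
    # container -> ordered (src, dst, port, action) rules
    "security-group/app-tier": [("web-tier", "app-tier", 8080, "allow")],
    "cluster/app-cluster":     [("any", "any", 22, "deny")],
    "datacenter/dc01":         [("any", "any", "any", "deny")],  # default deny
}

def evaluate(containers, src, dst, port):
    """Walk containers most-specific-first; the first matching rule wins."""
    for container in containers:
        for r_src, r_dst, r_port, action in RULES.get(container, []):
            if (r_src in (src, "any") and r_dst in (dst, "any")
                    and r_port in (port, "any")):
                return action
    return "deny"  # implicit deny if nothing matched
```

A VM provisioned into the `app-tier` security group would inherit the 8080 allow rule automatically, while SSH from anywhere would be caught by the cluster-level deny, regardless of which host the VM runs on.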
Inspection and monitoring of inter-VM traffic on the same port group and host, and of inter-host traffic, is very important when customers are trying to meet PCI compliance. All VM traffic flows can be easily monitored, rules defined and enforced regardless of virtual machine location, and if necessary important allow/deny rules can be set to log to the standard logging facilities. vShield App supports logging to a centralized Syslog infrastructure and this should be set up during deployment. vShield Manager itself stores system events and audit records of changes made to the rule base and these can be inspected easily. vShield App v4.1 U1 is compatible with vSphere 4.0 to vSphere 4.1 U1. It is not compatible with vSphere 5.0.
On 01 September 2011 VMware released the latest version of vShield, vShield 5.0, which is part of the Cloud Infrastructure Management suite of products. vShield 5.0 is compatible with all versions of vSphere (4.x – 5.0). Many of the drawbacks with the vShield 4.1 architecture have been addressed as part of this product release. Some of the major improvements are:
- It is now possible to have a completely separate and independent management cluster.
- Although there is still a 1:1 mapping between a vShield Manager and a vCenter it is possible for the management cluster to host multiple vShield Managers for their associated vCenter Servers.
- The same management cluster can also host multiple vCenters for a very large scale deployment involving multiple resource clusters and resource vCenters.
- DRS Affinity Rules are no longer required if deploying vShield App Firewall Service virtual machines on shared storage as vShield Manager will ensure they run on the correct host and power them off automatically when a host enters maintenance mode.
- It is possible to design an architecture that allows for the management clusters to be protected without introducing circular dependencies. This is described below and is very useful for large enterprise deployments and especially where business critical applications will be protected.
- An alarm will be triggered if a virtual machine is migrated using vMotion to a host that does not have vShield App configured and running. The alarm can be configured to trigger an alert or take an action you define.
With the above improvements in vShield App 5.0 it is ready for the mainstream. However, please bear in mind that some of the considerations for vShield 4.1 still remain with vShield App 5.0:
- It is not supported to rename the vShield App Firewall Service virtual machines.
- If vShield App is offline no traffic will flow to or from the protected virtual machines.
- vShield Manager and vCenter Server should not be deployed into the same clusters that they themselves are managing. See below for a solution to this problem.
As I mentioned above it is now possible to create an architecture design without the circular dependencies and still allow for the protection of the management cluster and resource clusters. The diagram below describes how this is achieved.
As you can see from the above diagram, vCenter C and D and vShield Manager C and D manage and protect Resource Clusters 1 and 2 respectively. vCenter A and vShield Manager A protect Management Cluster 2, while vCenter B and vShield Manager B protect Management Cluster 1, so neither management cluster protects itself. This architecture design is suitable for large scale deployments and between well connected datacentres. If the datacentres where this architecture is being deployed span long distances, then in addition to the above components vCenter Heartbeat should be considered for additional availability in case of a WAN isolation event between sites. The Management Clusters in the above design would also host any other management components that are necessary to run the infrastructure, to ensure the most efficient use of resources. It is recommended that the hosts are sized just sufficiently to run all of the required management functions and allow for host failure. Normal VMware best practices apply to the configuration of all the clusters. It is recommended VMware HA and VMware DRS are enabled on all the clusters, and for the management clusters that the vCenter Servers and their dependencies are set to a high priority start order.
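The key property of this cross-protection design, that no vShield Manager runs inside the cluster it protects, can be sketched as a simple validation check. The manager and cluster names below are illustrative placeholders following the diagram's pattern:

```python
# Hedged sketch of the cross-management pattern: each vShield Manager must
# run *outside* the cluster it protects, or the circular dependency the
# design avoids is reintroduced. All names are illustrative.

MANAGERS = {
    # manager -> (cluster it runs in, cluster it protects)
    "vsm-a": ("mgmt-1", "mgmt-2"),
    "vsm-b": ("mgmt-2", "mgmt-1"),
    "vsm-c": ("mgmt-1", "resource-1"),
    "vsm-d": ("mgmt-2", "resource-2"),
}

def circular_managers(managers):
    """Return managers that protect the cluster they themselves run in."""
    return [m for m, (runs_in, protects) in managers.items()
            if runs_in == protects]

print(circular_managers(MANAGERS))  # [] -> no circular dependencies
```

An empty result confirms the design intent: the two management clusters protect each other, and the resource clusters are protected from the management clusters.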
Important things to note about the above design:
- Ensure you configure Allow rules from vCenter (VC) to the ESXi hosts of the cluster it manages.
- Ensure you configure Allow rules from vShield Manager (VSM) to the ESXi hosts of the cluster it manages.
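These two allow rules can also be expressed as data for documentation or pre-change validation. The sketch below is illustrative only; the ports shown (TCP 443 and TCP/UDP 902 for vCenter-to-host management traffic, TCP 443 for vShield Manager) are commonly documented defaults and should be verified against the vShield and vSphere documentation for your version:

```python
# Illustrative representation of the two management allow rules as data.
# Endpoint names are placeholders; port numbers are assumed defaults and
# must be verified against the documentation for your vShield version.

MGMT_ALLOW_RULES = [
    {"name": "vc-to-hosts",  "src": "vcenter",         "dst": "esxi-hosts",
     "ports": [(443, "tcp"), (902, "tcp"), (902, "udp")], "action": "allow"},
    {"name": "vsm-to-hosts", "src": "vshield-manager", "dst": "esxi-hosts",
     "ports": [(443, "tcp")], "action": "allow"},
]

def is_allowed(rules, src, dst, port, proto):
    """Check whether a given flow is covered by an explicit allow rule."""
    return any(r["src"] == src and r["dst"] == dst
               and (port, proto) in r["ports"] and r["action"] == "allow"
               for r in rules)
```

Capturing the rules this way makes it easy to assert, before a maintenance window, that the management flows the design depends on are explicitly allowed.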
These changes have really opened up new possibilities for what vShield App can be used to protect and the scale of the environments it can be deployed into:
- vShield App to protect VMware View or other VDI type implementations, to ensure only allowed traffic flows between VDI desktops and to/from the correct systems.
- Integrating protection of virtual machines with SRM, so VMs are protected the same way when they fail over to the recovery site as they are at the protected site.
- Large business critical applications.
VMware has really done a good job with the enhancements to vShield App 5.0 and I would encourage everyone to seriously and carefully consider it for their environments. This information has been based on recent design engagements where I’ve been working with large enterprise clients that are adopting vShield App for some of their security requirements. As with any solution being deployed in an enterprise a thorough planning, design, testing and implementation process should be followed to ensure that the solution achieves your objectives. I hope this article has been helpful and wish you the best of luck with your vShield designs and deployments.
For additional information please review the vShield Product Documentation, vShield Design Guide, and the vShield App Design Guide.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com. By Michael Webster +. Copyright © 2012 – IT Solutions 2000 Ltd and Michael Webster +. All rights reserved. Not to be reproduced for commercial purposes without written permission.
Great article, thanks very much!
[…] Creating a small management cluster of hosts to host the vShield Manager, vCenter and vCenter DB and other supporting infrastructure. This is important to allow the management infrastructure to be independent of the infrastructure that is being managed. If vShield Manager is deployed within the infrastructure it is protecting you will suffer circular dependencies that will prevent the solution, including vCenter, from functioning, as all the traffic will be blocked during the implementation process. In larger environments this can be solved by using the design I described in vShield App Design for the Enterprise. […]
This article is a must read for anyone using vShield firewalls. Thanks.
In order to satisfy the separate management cluster requirement, couldn't you simply set up DRS host groups within a cluster – one for vShield hosts and one for non-vShield hosts? Then you can set affinity rules so that the mgmt VMs MUST run on the non-vShield hosts and the protected VMs MUST run on the vShield hosts. We want to implement a limited vShield implementation and would prefer not to break our cluster into two just for a few mgmt VMs. It seems to me that this would work, especially since HA respects the 'Must' rules.
Granted if you're trying to protect many VMs this may not be the way to go, but we're looking at a few for now (rules at the port group level) and maybe 25 max in the future. This is out of nearly 500 running in our cluster now. I'd rather the rest of the VMs have access to all resources while limiting only the mgmt and protected VMs.
You could try that. But it wouldn't be a supported configuration, so it would be at your risk. Although the DRS Rules would keep the Mgmt VMs on the designated hosts and the protected VMs on the protected hosts, what about all the other VMs in the environment? Every time a new VM is provisioned or maintenance needs to be done you would potentially have to change all the DRS Rules. So from a pure management overhead perspective it might not be a good idea. Also within vShield Manager the VMs wouldn't show up as protected, as not every host in the cluster has vShield App installed. So at best it's a short term solution until there is a management cluster. The management cluster only has to be big enough to run the management workloads though, so it doesn't have to be too big. Breaking out a couple of hosts from a cluster to run management workloads shouldn't be that onerous. I recently did an implementation at a customer that only had 3 hosts and a physical vCenter Server. As part of the project we virtualized the vCenter server to act as the management server to host the vCenter, vCenter DB and vShield Manager. This worked great for them.
Could you do a blog on how to configure vshield edge portion only? Not very many options but everyone seems to be missing steps in their documentation.
Thank you much!
Looking at the logical diagram, it looks like all traffic is being inspected by the kernel, which should yield high performance and traffic throughput, but if the vShield App FW VM is in user space, that would negate the performance benefits of being inspected by the kernel. In your diagram, shouldn't there be a link shown from the FW VM back to the distributed virtual switch if all traffic is passing through it?
Can you clarify this?
Hi Jay, The traffic passes through the kernel to the vShield App Firewall VM on an isolated standard vSwitch via a special vmkernel port. So there is no link back to the vDS. The performance will be hardware dependent, but I've seen over 6Gb/s throughput possible on vShield 4.1 and it's probably improved a lot in the latest versions.