Many of you may recall the article I wrote titled The Case for Larger Than 2TB Virtual Disks and The Gotcha with VMFS. In that article I put forward the pros and cons of larger than 2TB virtual disks, some solutions suitable for large storage VMs in vSphere, and an issue caused by the VMFS Heap Size. The VMFS Heap Size issue was partially addressed with updates, which I wrote about in Latest ESXi 5.0 Patch Improves VMFS Heap Size Limits. When architecting a large amount of storage for a virtual machine there are a lot of things to consider, and it's not all about capacity. I gave a considerable overview of storage sizing for business critical applications in my article Storage Sizing Considerations when Virtualizing Business Critical Applications. While the content and considerations in those articles remain valid in many respects, a lot changed a couple of weeks ago.
At VMworld US 2013 in San Francisco VMware announced the launch of vSphere 5.5. There were many significant changes and enhancements in the release, some of which I mentioned briefly in my article VMworld USA 2013 By The Numbers. This article will focus squarely on the new 62TB VMDK and VMFS enhancements, which I'm calling Jumbo VMDK. These changes address the gotchas with VMFS and the size limitations that I have highlighted previously. I predict this will be one of the most welcome enhancements for customers considering running business critical applications on vSphere 5.5, especially for those Monster VMs with large storage requirements.
The first thing you might notice from the opening paragraphs of this article is that I mentioned a 62TB VMDK, not the 64TB VMDK you may have heard elsewhere. You might be wondering why. The reason is simple and straightforward: to allow the snapshots, additional metadata and log files for a VM to fit within a 64TB VMFS volume, you can't consume all of the space with a single VMDK. So with a VMFS maximum volume size of 64TB, the maximum supported VMDK size is 62TB.
With the previous 2TB maximum VMDK size it was theoretically possible to have a VM with 120TB of storage assigned, which is quite a lot. With the new Jumbo VMDK size it is theoretically possible for a single VM to have either 3,720TB assigned, assuming 4 vSCSI controllers with 15 devices each (60 VMDKs total), or 7,440TB assigned using 4 of the new AHCI SATA controllers, which allow up to 30 devices each (120 VMDKs total, requires vHW v10). This would be a truly massive VM, and I would say it's not very practical right now for many reasons, one being that you'd need more than one fully populated enterprise storage array for this one VM, but it is theoretically possible. Significantly, this is all without resorting to raw device mappings (RDMs). If you were to try this configuration you would consume 120 datastores per host for this one VM, using up almost half of the maximum number of datastores (255) currently supported on an ESXi host. This is because each host in the cluster needs to see every datastore of every VM for vMotion and VMware DRS to work properly. In any case, I think that's enough theory. Let's get into some of the practical considerations and specifics of the 62TB virtual disk feature of vSphere 5.5.
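To make the arithmetic above concrete, here is a quick back-of-the-envelope sketch; the controller and device counts are the vSphere 5.5 figures discussed above:

```python
# Theoretical maximum storage per VM with 62TB Jumbo VMDKs in vSphere 5.5.
MAX_VMDK_TB = 62  # maximum supported VMDK size (TB)

# 4 vSCSI controllers x 15 devices each = 60 VMDKs
vscsi_disks = 4 * 15
vscsi_total_tb = vscsi_disks * MAX_VMDK_TB

# 4 AHCI SATA controllers x 30 devices each = 120 VMDKs (requires vHW v10)
sata_disks = 4 * 30
sata_total_tb = sata_disks * MAX_VMDK_TB

print(vscsi_disks, "VMDKs ->", vscsi_total_tb, "TB")  # 60 VMDKs -> 3720 TB
print(sata_disks, "VMDKs ->", sata_total_tb, "TB")    # 120 VMDKs -> 7440 TB

# With one VMDK per datastore, the SATA case alone consumes 120 of the
# 255 datastores supported per ESXi host.
print(f"{sata_disks} of 255 datastores consumed")
```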
Highlights of the Jumbo VMDK and VMFS Heap Enhancements
VMware has made the use of the VMFS Heap much more efficient in vSphere 5.5:
- Reduced memory consumption (256MB vs 640MB in previous releases)
- VMFS Heap is no longer a limiting factor for the maximum amount of open VMDKs
- Pointer Block Cache Eviction (described in more detail later) swaps out unused VMDK pointer blocks from memory
- New MaxAddressableSpaceTB advanced parameter controlling the size of the Pointer Block Cache kept in memory
- Supported on VMFS5 or NFS (NFS depends on the array's supported maximum file size)
- No specific virtual hardware requirement (except the AHCI SATA controller, which requires vHW v10)
- Requires ESXi 5.5
- 62TB Virtual Mode RDMs also supported (vRDM)
Supported and Unsupported Cases in Detail
I have given you some of the highlights of the 62TB VMDKs and vRDMs, but now let's look at what's supported and what's not supported in a bit more detail.

Supported:

- Provisioning of VMDKs and vRDMs up to 62TB on ESXi 5.5 via the vSphere Web Client
- Provisioning of 64TB physical mode RDM (pRDM) is supported in ESXi 5.0 and above
- Offline extension or growing of a 2TB (-512B) VMDK up to 62TB on ESXi 5.5, provided the disk uses GPT partitioning
- 62TB VMDKs are supported on VMFS5 or NFS
- Storage vMotion
- Storage DRS
- vCloud Director 5.5
- SRM/vSphere Replication*
- vSphere Flash Read Cache (However only 16TB of vSphere Flash Read Cache is supported, so not everything can be cached)
- Linked Clones
- SE Sparse Disks
Not Supported:

- Online or hot extension of VMDKs beyond 2TB
- BusLogic Virtual SCSI Adapter
- Virtual SAN (VSAN)
- Fault Tolerance
- VI (C#) Client
- MBR Partitioned Disks
- VMFS Sparse Disks
* For SRM and vSphere Replication to work, Jumbo VMDKs need to be created in the vSphere Web Client and can't be managed by the VI Client. SRM still requires the VI Client for management in order to function. If you try to manage a VM with Jumbo VMDKs from the VI Client you will likely get errors. Remember that all new features and enhancements are supported only via the vSphere Web Client.
Important Considerations when using Jumbo VMDKs
It's fantastic that we can now have up to 62TB VMDKs on a VM in vSphere 5.5. But as with all storage architecture decisions, you need to consider a number of things; there are always tradeoffs. This section briefly covers some of the important considerations when using Jumbo VMDKs above 2TB.
Firstly we have queue depth. The SCSI queue depth for a device inside most Guest OSs is 32 by default (LSI Logic) or 64 with PVSCSI. Each vSCSI adapter has a queue depth as well, and how big it is depends on which vSCSI adapter you've chosen. Just because you increase the size of your VMDKs doesn't mean your queue depth increases, and this will limit the number of concurrent IOs that can be serviced by the device. Even if you can increase the per-device queue depth on your host HBAs, which you shouldn't do without careful consideration and speaking to your storage team or storage vendor, it won't help much if you only have one VMDK per datastore and you can't change the Guest OS queue depth settings. But if the IO service times from your storage are fast enough, a limited queue depth might not be much of a problem. It's all a function of service time and concurrency. Understanding your IOPS per TB and your service time will be a very useful metric when deciding on your VM storage architecture. Note that PVSCSI queue depths can be changed from their defaults; see KB 2053145 – Large-scale workloads with intensive I/O patterns might require queue depths significantly greater than PVSCSI default values.
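The service time and concurrency relationship above is just Little's Law. A minimal sketch, using the default device queue depths mentioned above; the service times are illustrative assumptions, not figures from any particular array:

```python
# Little's Law: outstanding IOs = IOPS x service time. A fixed device queue
# depth therefore caps the IOPS one VMDK can sustain at a given service time.

def max_iops(queue_depth, service_time_ms):
    """Upper bound on IOPS for a single device at a given service time."""
    return queue_depth / (service_time_ms / 1000.0)

LSI_LOGIC_QD = 32  # default per-device queue depth, LSI Logic
PVSCSI_QD = 64     # default per-device queue depth, PVSCSI

# Illustrative (assumed) service times in milliseconds.
for qd in (LSI_LOGIC_QD, PVSCSI_QD):
    for svc_ms in (1.0, 5.0, 10.0):
        print(f"QD={qd}, service={svc_ms}ms -> max ~{max_iops(qd, svc_ms):.0f} IOPS")
```

The point is that the same 32-deep queue supports ten times fewer IOPS at 10ms service time than at 1ms, which is why fast service times can make a limited queue depth a non-issue.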
Then you have to consider your RPO (Recovery Point Objective) and RTO (Recovery Time Objective): how much data can you afford to lose, and how long will it take to back up and recover? It will also take significantly longer to check a disk for integrity if there is any type of corruption. What method will you use to back up and recover these Monster Storage VMs? Having fewer large devices means fewer devices overall to manage, but it also means the impact could be higher if one of these VMDKs needs to be recovered.
Now you have to consider how you will grow your VM storage when capacity is close to being exceeded. Do you want to have to shut down the VM, or do you want to hot add the storage? If you want to extend a VMDK beyond 2TB you have to do that offline. However, you can hot add another VMDK to the VM of up to 2TB (minus 512B).
Do you need to be able to use Fault Tolerance or VSAN? If so, you won't be able to use Jumbo VMDKs for the VMs that need to leverage these features.
If you want to look at storage architecture, design and sizing for business critical applications in more detail and some of the considerations then I’d recommend you take a look at my article Storage Sizing Considerations when Virtualizing Business Critical Applications.
VMFS Heap Enhancements
Before I deep dive into this, remember that for the vast majority of environments you don't need to change the default settings, and this information should be considered carefully alongside your knowledge and understanding of your particular environment, circumstances and requirements. This really is for when you're considering virtualizing business critical apps and Monster VMs with large storage footprints. As I covered in The Case for Larger Than 2TB Virtual Disks and The Gotcha with VMFS, previous versions of ESXi used a VMFS Heap value to control how much memory was consumed to manage the VMFS filesystem and the open or active VMDK capacity on a single ESXi host. This limit was not documented in the vSphere Maximums product document, and by default, with a 1MB block size on ESXi 5.0 GA, it would limit a host to 8TB of open VMDKs before errors could occur (not very much). The maximum on ESXi 5.0 GA was 25TB with a 1MB block size, which required adjusting an advanced parameter. This was later increased to a default of 60TB on ESXi 5.0 by applying patches, as I outlined in Latest ESXi 5.0 Patch Improves VMFS Heap Size Limits, and in ESXi 5.1 Update 1. The only downside was that 640MB of RAM was consumed for the VMFS Heap.
The good news is that in vSphere 5.5 the whole VMFS Heap size problem has been addressed. The VMFS Heap is now irrelevant as a measure of how much open and active VMDK capacity a single ESXi 5.5 host can handle, thanks to major improvements in the way the VMFS Heap and Pointer Blocks are managed.
What are Pointer Blocks? A Pointer Block is a pointer to a VMFS block on disk. When a VMDK is opened on an ESXi 5.5 host, its pointer blocks are cached in the Pointer Block Cache, which is not part of the main VMFS Heap (where pointer blocks were stored in prior versions of ESXi). This allows the open VMFS blocks to be addressed, accessed and managed as fast as possible without having to read metadata from the VMFS filesystem directly. The Pointer Blocks remain in use as long as a VMDK or other file is open. However, many blocks in any individual VMDK are not often active; it's usually only a percentage of the blocks that are actively used (say 20%).
This is where the new Pointer Block Eviction process introduced in ESXi 5.5 comes in. If the number of open and active VMFS blocks reaches 80% of the capacity of the Pointer Block Cache, the eviction process commences: the pointer blocks that are not active, or least active, are evicted from memory, and only the active blocks remain in the cache. This new process greatly reduces the amount of ESXi host memory consumed to manage VMFS filesystems and the open VMDK capacity per host. The VMFS Heap itself in ESXi 5.5 consumes 256MB of host RAM (down from 640MB), and the Pointer Block Cache by default consumes 128MB of host RAM. You no longer have to worry about adjusting the size of the VMFS Heap at all. A new advanced parameter has been introduced to control the size of the Pointer Block Cache: MaxAddressableSpaceTB.
As with all advanced parameters, you should not change MaxAddressableSpaceTB without good justification, and in most cases it will not be necessary. Here I will explain how it works and what the considerations are when using it. MaxAddressableSpaceTB defaults to 32, with a maximum of 128. This controls the amount of host RAM the Pointer Block Cache consumes: at the default setting of 32 it consumes 128MB of host RAM (as mentioned previously), and at the maximum setting of 128 it consumes 512MB. However, it's important to note that this does not limit the capacity of open VMDKs on the ESXi 5.5 host, just how many of the pointer blocks can stay cached in RAM. If only 20% of all VMDK blocks are active, you could conceivably have 640TB or more of open VMDK capacity on the host while still having the active pointer blocks cached, without much if any performance penalty.
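The two data points above imply the cache scales linearly at 4MB of host RAM per addressable TB; a quick sketch of that relationship, including the 20%-active example:

```python
# Pointer Block Cache RAM as a function of MaxAddressableSpaceTB.
# 128MB at a setting of 32 and 512MB at 128 implies 4MB per addressable TB.
MB_PER_TB = 128 / 32  # 4.0

def pb_cache_mb(max_addressable_space_tb):
    """Host RAM (MB) consumed by the Pointer Block Cache for a given setting."""
    return max_addressable_space_tb * MB_PER_TB

print(pb_cache_mb(32))   # 128.0 MB (default)
print(pb_cache_mb(64))   # 256.0 MB
print(pb_cache_mb(128))  # 512.0 MB (maximum)

# If only ~20% of VMDK blocks are active, the maximum setting of 128TB of
# cached pointer blocks can cover roughly 128 / 0.20 = 640TB of open VMDKs.
print(128 / 0.20)  # 640.0 TB
```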
The way this new Pointer Block Eviction process works gives you a sense of an almost unlimited amount of open VMDK capacity per ESXi 5.5 host. But it's not quite unlimited; there is a tradeoff as the amount of active VMDK capacity on a host increases. The tradeoff is possible Pointer Block Cache thrashing, which may impact performance.
With the default setting of MaxAddressableSpaceTB=32, the Pointer Block Eviction process won't kick in until the amount of open VMDKs exceeds 25.6TB (80% of 32TB). So if you aren't expecting the VMs on your hosts to routinely exceed 25TB of open and active VMDK blocks, there is probably no need to even look at adjusting MaxAddressableSpaceTB, which saves you some host RAM for other things. In most cases you would only adjust MaxAddressableSpaceTB if the active part of all open VMDKs on a host exceeds 25TB. If the active VMDK blocks exceed the capacity of the Pointer Block Cache, thrashing could result from constantly evicting and reloading pointer blocks, which may carry a performance penalty.
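To make the thresholds concrete, here is a small sketch of where eviction begins for a few settings; the 80% watermark is the eviction trigger described above:

```python
# Pointer Block Eviction starts when open/active VMFS blocks reach 80% of
# the cache capacity implied by MaxAddressableSpaceTB.
EVICTION_WATERMARK = 0.80

def eviction_threshold_tb(max_addressable_space_tb):
    """Open/active VMDK capacity (TB) at which eviction begins."""
    return max_addressable_space_tb * EVICTION_WATERMARK

for setting in (32, 64, 128):
    print(f"MaxAddressableSpaceTB={setting} -> eviction from "
          f"{eviction_threshold_tb(setting):.1f}TB of open/active blocks")
# 32 -> 25.6TB (the default case), 64 -> 51.2TB, 128 -> 102.4TB
```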
You will see signs of Pointer Block Eviction in the VMkernel logs on your hosts if it is occurring (Syslog, vCenter Log Insight or Splunk will help you spot this type of activity). If you start to notice any sort of performance impact, such as additional storage latency (KAVG in ESXTOP), and a correlation with Pointer Block Eviction, that is a sign you should consider adjusting MaxAddressableSpaceTB. If you're planning to routinely have 100TB of open VMDKs per host, as in the case of large file servers, large Exchange mailbox servers or large database servers, then I would suggest setting MaxAddressableSpaceTB=64 to start with and adjusting upwards if necessary. If you're not concerned about the amount of RAM the Pointer Block Cache consumes, you could set it to the maximum of 128 and leave it there. Doing so may consume host RAM unnecessarily, so it should be considered alongside the total RAM per host and the RAM likely to be consumed by all VMs. 512MB of RAM consumed for Pointer Block Cache on a host with 512GB of RAM or more is probably not significant enough to worry about, but could be worth considering carefully if your hosts only have 32GB of RAM. Remember, of course, that any time you change an advanced parameter it's something that has to be managed and considered when you change your environment.
PDL AutoRemove

This helps storage in general, not just Jumbo VMDKs, but I thought it was important to mention. One of the features introduced in vSphere 5.0 Update 1 was support for a new type of storage behaviour called Permanent Device Loss, or PDL. This is a state common in vSphere Metro Storage Cluster (vMSC) environments where a device becomes unavailable at one site, or where a device is administratively removed and will not be coming back. PDL handling interprets the SCSI sense codes sent back from the storage arrays, and vSphere then stops sending IOs to the failed devices. Use of PDL is described in Duncan Epping's article vSphere Metro Storage Cluster solutions and PDL's. In vSphere 5.5 the PDL behaviour has been further enhanced.
vSphere 5.5 introduces PDL AutoRemove, which automatically removes a device in a PDL state from a host. A device in a PDL state cannot accept more IOs, but needlessly consumes one of the 256 devices supported per host. Now PDL devices are automatically removed, freeing up device slots, given that devices in a PDL state are not coming back. In a vMSC environment the devices may eventually come back at some point in the future, and you can simply rescan to pick them up again, but that is a manual action. Therefore in vSphere Stretched Metro Cluster environments it's recommended to disable the PDL AutoRemove functionality by changing the advanced host setting Disk.AutoremoveOnPDL=0 (zero).
I hope this has given you a good understanding of the new enhancements in vSphere 5.5 with regard to Jumbo VMDKs and the VMFS Heap, and how VMware has solved the open VMDK capacity per ESXi host problem. This really makes it possible to virtualize Monster VMs with Jumbo VMDKs with much more confidence. I think Jumbo VMDKs in vSphere 5.5 will be very popular. Although I don't think everyone will go out and provision 62TB VMDKs all over the place, I can see 3TB, 4TB and even 6TB or 8TB VMDKs being used in certain scenarios.
Statistically, storage contributes to more than 80% of problems in VMware vSphere environments, so it's important to architect and operate your storage properly. I covered some aspects to help avoid storage problems in my article titled 5 Tips to Help Prevent 80% of Virtualization Problems. Hopefully this article helps you understand the new VMFS Heap and 62TB VMDK features of vSphere 5.5, so you can make use of these new capabilities confidently. If you'd like to review more vSphere 5.5 features, I recommend checking out the What's New in vSphere 5.5 Platform – Quick Reference that Alan Renouf has been kind enough to put together.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster +. Copyright © 2013 – IT Solutions 2000 Ltd and Michael Webster +. All rights reserved. Not to be reproduced for commercial purposes without written permission.