Jetze's blog: Hyper-V home lab? Deduplication is awesome!

Tuesday, February 10, 2015

Data deduplication has been a Windows feature since Server 2012. Deduplication identifies identical chunks of data, stores a single copy and replaces the remaining copies with a pointer to that copy. Just as with compression, the maximum deduplication rate depends on your data. If every single file is 100% unique and shares no chunks with other files, then there's not much to deduplicate. But what if your files are VHDX files and every file contains the same Windows operating system files?
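To make the idea a bit more concrete, here is a toy sketch in PowerShell. It is not how the Windows feature is implemented: I'm assuming fixed-size chunks of a few characters and a simple hashtable as the "chunk store", and Measure-ChunkDedup is a name I made up for this example.

```powershell
# Toy illustration of chunk-level deduplication (hypothetical helper, not a Windows cmdlet).
function Measure-ChunkDedup {
    param(
        [string[]] $Files,      # each string stands in for the contents of one file
        [int] $ChunkSize = 4    # assumption: fixed-size chunks, chosen for readability
    )
    $sha   = [System.Security.Cryptography.SHA256]::Create()
    $store = @{}                # chunk hash -> chunk content, each chunk stored once
    $total = 0
    foreach ($file in $Files) {
        for ($i = 0; $i -lt $file.Length; $i += $ChunkSize) {
            $len   = [Math]::Min($ChunkSize, $file.Length - $i)
            $chunk = $file.Substring($i, $len)
            $bytes = [System.Text.Encoding]::UTF8.GetBytes($chunk)
            $hash  = [System.BitConverter]::ToString($sha.ComputeHash($bytes))
            # Only the first occurrence is stored; later ones would become pointers
            if (-not $store.ContainsKey($hash)) { $store[$hash] = $chunk }
            $total++
        }
    }
    $sha.Dispose()
    [pscustomobject]@{
        TotalChunks  = $total
        UniqueChunks = $store.Count
    }
}

# Two "disk images" that share the same "operating system" part
$result = Measure-ChunkDedup -Files 'WINDOWSFILESdataAAAA', 'WINDOWSFILESdataBBBB'
$result
```

The two strings are cut into 10 chunks in total, but only 6 distinct chunks need to be stored, because the shared prefix is kept once.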

In Server 2012 R2 the Data Deduplication feature was improved and deduplication of virtual machine files is now supported. Several limitations apply: for instance, it is only supported for VDI workloads, and the server you enable deduplication on cannot be the Hyper-V server itself. The reason is that the actual deduplication process, which runs as a scheduled task, requires quite a bit of resources, and we don't want the performance of the VMs running on the server to be affected.

Microsoft claims that storage savings up to 95% can be achieved.

That's very interesting for business purposes, but for my home lab too. My two Hyper-V servers have limited storage capacity and I have to remove unused files now and then to free up disk space. The workload is not VDI and the storage is on the local server, so my configuration is not supported. That is a risk I'm willing to take for my home lab.

A couple of days ago I enabled deduplication on the local data volume of both servers, using the HyperV usage type to enable low-level optimizations for the deduplication of running Hyper-V images. Before I could do that, I had to install the Windows feature:

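# Install the Data Deduplication role service
Add-WindowsFeature FS-Data-Deduplication
# Enable deduplication on the D: data volume with the Hyper-V usage type
Enable-DedupVolume D: -UsageType HyperV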
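To double-check what the usage type actually changed, a quick look at the volume settings helps. This is only a sketch: the property names are the ones I'd expect from Get-DedupVolume on Server 2012 R2 and they may differ on other versions.

```powershell
# Show the deduplication settings that are now active on D:
Get-DedupVolume -Volume D: |
    Format-List Volume, Enabled, UsageType, MinimumFileAgeDays, OptimizeInUseFiles, OptimizePartialFiles
```

If MinimumFileAgeDays is larger than zero, files are only optimized once they have gone unchanged for that many days, so savings will not show up right away.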

Enabling deduplication added three scheduled tasks under \Microsoft\Windows\Deduplication.

The tasks call ddpcli.exe with various parameters; the Optimization task runs once a day. Ddpcli.exe is not meant for manual use. For that we have Get-DedupJob, Start-DedupJob and Stop-DedupJob.
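If you don't want to wait for the schedule, the job cmdlets can be used roughly like this. Treat it as a sketch for my setup: the D: volume is from my lab, and running an optimization job by hand will put extra load on the host while it runs.

```powershell
# List the scheduled tasks that deduplication created
Get-ScheduledTask -TaskPath '\Microsoft\Windows\Deduplication\'

# Start an optimization job on D: right away instead of waiting for the schedule
Start-DedupJob -Volume D: -Type Optimization

# Show queued and running jobs, including their progress
Get-DedupJob

# Cancel the jobs on D: if the host gets too busy
Stop-DedupJob -Volume D:
```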

I was patient and checked the result after a few days with Get-DedupStatus.
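The default table view only shows a few columns, so piping to Format-List is the easiest way to see everything. A minimal example, again assuming the D: volume from my lab:

```powershell
# Show all deduplication status properties for D:
Get-DedupStatus -Volume D: | Format-List
```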

After reviewing the full output I noticed that deduplication achieved a whopping 49% savings rate, even 52% on my second server!
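As far as I understand it, the savings rate is the saved space divided by the size the data would have had without deduplication (used space plus saved space). The sketch below shows that arithmetic; Get-SavingsRate is a helper I made up, and the sizes are invented numbers, not the values from my servers.

```powershell
# Hypothetical helper: savings rate in percent from used and saved space (in bytes)
function Get-SavingsRate {
    param(
        [double] $UsedSpace,    # space currently occupied on the volume
        [double] $SavedSpace    # space reclaimed by deduplication
    )
    $unoptimized = $UsedSpace + $SavedSpace   # size without deduplication
    if ($unoptimized -eq 0) { return 0 }
    [Math]::Round(($SavedSpace / $unoptimized) * 100)
}

# Invented example: 510 GB still in use, 490 GB saved
$rate = Get-SavingsRate -UsedSpace 510GB -SavedSpace 490GB
$rate
```

With these made-up numbers the result is 49, so a 49% rate means the volume holds almost twice as much data as it occupies on disk.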

So bear in mind: unless you're deduplicating VDI VM files on a remote Server 2012 R2 file server, you're unsupported. If that's not a problem for your lab, try it for yourself. Before you do, make sure you've read the following articles:

What's New in Data Deduplication in Windows Server

Deploying Data Deduplication for VDI storage in Windows Server 2012 R2