2024-08-05 Update: Added “Method 2: Recovery using the DMDE utility,” renumbered earlier sections 2-6 (now 3-7)
This article explains various techniques and readily available tools for extracting data from an encrypted virtual disk. For incident-response situations in which the entire virtual disk has been encrypted, these tools and techniques may – may – enable the investigating team to retrieve data from the encrypted system.
Efforts to extract data from encrypted virtual disks can potentially lead to multiple positive outcomes: recovering customer data that is irretrievable via standard methods, helping rebuild virtualized customer infrastructure that has been compromised, and / or enriching an incident investigation timeline. So far, we’ve used these techniques successfully in DFIR investigations involving the LockBit, Faust / Phobos, Rhysida, and Akira ransomware groups.
We’ll say this at the beginning of the article and we’ll say it again at the end: Results are not guaranteed. No data-extraction method in existence is certain to yield full data from an encrypted VM. We will also highlight that while these methods have seen quite a high success rate in extracting forensic data that is valuable for the investigation (such as event logs, registry forensics, and the like), the success rate of retrieving data that can be used as part of the recovery process of production systems, such as databases, is much lower.
We strongly recommend that any recovery attempts be conducted on “working copies” and not the originals, lest the attempts cause unintended further damage to the devices.
In the next section we’ll discuss in which situations retrieval may be possible and to what extent. After that, we’ll list some factors to take into consideration as you select which methods you’ll attempt. Finally, we’ll look at each method, listing the prerequisites (the tools required to attempt the method; all are required) and flagging other considerations. In the discussion of the most labor-intensive method, we’ll walk through the details of the process. In this article, references to “virtual disks,” “VMs,” or “disk images” all refer to the same thing and can be any image of a disk such as VHD, VHDX, VMDK, RAW, and so on. All seven techniques apply to Windows; a few may also work on Linux or other platforms, and we’ll note those in each case.
What is file / disk encryption?
When ransomware encrypts a virtual disk (or any file), the data has been essentially randomized, rendering the file unreadable by the operating system. The most well-known method of decrypting a file (returning the file to its original, readable state) is via a decryptor, a software tool or program designed to reverse the process of encryption, making encrypted files readable again.
In ransomware attacks, the decryptor is created and controlled by the threat actor. In those situations, unless the ransom is paid or the decryptor becomes publicly available, other methods of data recovery must be considered.
Ransomware binaries prioritize speed over thorough encryption. Encrypting entire files would be too time-consuming, so the attackers aim to inflict maximum damage swiftly, minimizing the window for intervention. Consequently, while smaller files like documents are usually fully encrypted, larger ones such as virtual disks may have significant portions left unencrypted. This provides investigators with opportunities to employ diverse techniques for extracting information from these virtual disks.
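One quick way to see this partial encryption for yourself is to measure the Shannon entropy of each block of a working copy of the disk image. Encrypted data looks random and scores close to 8 bits per byte, while unencrypted file-system structures and empty space usually score lower. The Python sketch below is a rough triage aid under stated assumptions, not a detection method. The 1 MiB block size and the 7.9 threshold are hypothetical choices. Compressed data (archives, media files) also scores high, so a high-entropy block is a hint rather than proof of encryption.

```python
import io
import math
from collections import Counter


def shannon_entropy(block):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def entropy_map(source, block_size=1024 * 1024, threshold=7.9):
    """Yield (offset, entropy, looks_encrypted) for each block of source.

    source may be a bytes object or a binary file opened with "rb".
    block_size (1 MiB) and threshold (7.9 bits/byte) are hypothetical
    defaults; compressed data can also exceed the threshold.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    offset = 0
    while True:
        block = source.read(block_size)
        if not block:
            break
        entropy = shannon_entropy(block)
        yield offset, entropy, entropy >= threshold
        offset += len(block)
```

Iterating over `entropy_map()` on a file opened with `open(path, "rb")` and looking for where a long run of high-entropy blocks gives way to lower values may show roughly where the encryption stopped.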
Which method to use: Considerations
There are multiple methods that can be used when looking to extract data from an encrypted Windows VM. (A few of these techniques are applicable to Linux recovery attempts as well, and we’ll indicate those.) In this article we will cover seven:
- Method 1: Mounting the drive
- Method 2: Recovery using the DMDE utility
- Method 3: RecuperaBit
- Method 4: bulk_extractor
- Method 5: EVTXtract
- Method 6: Scalpel, Foremost, and other file-recovery tools
- Method 7: Manual carving of the NTFS partition
Which to try first? The following six considerations may help you in deciding which method is appropriate.
File size
Experience has shown that the larger the virtual disk, the greater the chance of successful recovery. For Windows machines, this is largely because most VMs will have multiple partitions, usually three: recovery, boot, and the C: (user-visible) partition. (For this article, let’s assume the drive is mapped to the usual C:.) The first two partitions hold little data of use for an incident investigation. However, ransomware commonly encrypts only the first portion of the VM file, so these are often the only partitions that end up encrypted.
This, therefore, often leaves the C: partition, where customer data and potential forensic data are housed, untouched. This can help investigators rebuild a compromised virtual device and enrich an incident investigation.
Conversely, if the VM file is relatively small, the likelihood of recovering data is lessened. However, there still may be an opportunity to harvest event logs or registry hives.
Tools
As with any other problem in incident response, there exist multiple methods and tools for tackling the same issue. Some tools may perform better than others depending on the type of encryption. It is worth trying multiple tools to get the result you need if your first attempt fails or only partially works.
It is also important to note that tools do stop getting updated and / or supported, so consider looking for additional tools not mentioned in this guide. The tools that we are using are third-party tools, or in some cases tools that are already part of Windows or Linux (this includes Windows Subsystem for Linux [WSL]). Throughout this article and in our everyday investigations, we acknowledge the great contribution the creators of those tools have made to defense efforts, especially in those cases in which the tools were not designed with encryption in mind.
Time
The time available to complete the task is something worth considering; the hardware / equipment you have available may play a part in this. For instance, manual carving (Method 7) is one available option, but it can take a long time; specifically, it can require a lot of processor power, which could slow down your device during the process. As a result, you may be unable to use your forensic examination device for other daily duties whilst the process completes. (Because of this, if it is not time-sensitive, we recommend you start the manual carving process towards the end of the working day and leave your device running overnight.) Different solutions take varying amounts of time, and this needs to be considered.
Storage
Available storage space should be factored into your decision. Manual carving, for instance, can require quite a bit of storage space, as it will recreate a copy of the file; in other words, if you are trying to recover a 1TB virtual hard disk, you may well need at least another 1TB for the results. This is also true with some of the file recovery tools (Method 6), particularly if the master file table (MFT) is corrupt, since in that situation the tool could “recover” huge files that do not actually exist.
File types and priorities
Clients occasionally ask us to recover specific files (particularly Word documents and PDFs), as they are not interested in anything else. If that is the case, and you do not need any further data for the investigation as all the TTPs have been accounted for, it may be more useful for you to run an automated media file recovery tool over the VM, rather than doing a full recovery of the whole disk.
Need
In a related vein, the enterprise’s need to recover the data should be weighed in recovery decisions. For example, if the business plans to rebuild the device, they have a working backup of the data, and it’s not crucial to the investigation, what is to be gained by recovering data from it? Does it need to happen? (Probably not.) A clear understanding of the business need for recovery of this specific VM leads to better allocation of precious incident-response resources.
Methods of extraction: Seven techniques
The methods below cover multiple ways of attempting to extract data from a virtual machine. This is not an exhaustive list, since new methods and tools are being developed all the time. Researching newer techniques and / or tools is always encouraged, and we ourselves have updated this article as we added techniques to our own repertoire. With such a variety of options available, the best approach is likely to familiarize yourself with the basics of each one and then apply that knowledge to the considerations listed above. That approach gets easier with experience and practice.
All that said, though the list that follows is not in a strict order, we suggest that Method 1 should be the first step in any attempted recovery, for reasons that will be clear.
Method 1: Just mount it
Just because you have been told that the VM is encrypted doesn’t necessarily mean that it is. (Yes, cybercriminals sometimes lie.) We have encountered clients who have mistakenly thought their files were encrypted when, in fact, the attacker had simply changed the file extensions. In addition, we have seen instances where attackers’ encryption processes have failed and actually just renamed the file.
Always try this method first as it just might work — and save a lot of time. If it doesn’t succeed, you’ll have lost little time and have done nothing to impede other methods of retrieval. If, on the other hand, the method succeeds and the drive does mount, you can then access the file(s) and copy and paste from them as desired. In addition, because you are simply mounting the VM, endpoint protection (that is, antimalware / antivirus packages) should not detect or remove any malicious files. This will be useful if you plan to collect samples for labs submission. Some tips for success with this method:
- Try the 7-Zip GUI archiver; we have had a lot of success with 7-Zip in this situation
- Mount the drive natively (Windows, for example, can attach VHD and VHDX files directly)
- If that’s not working, try FTK or any other third-party mounting tool
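Before or alongside mounting, it can help to check whether the file still starts with a recognizable virtual-disk header. The Python sketch below compares the first and last 512 bytes of a file against a few well-known signatures:

- `vhdxfile` at the start of a VHDX
- `KDMV` at the start of a sparse VMDK extent
- the text header of a VMDK descriptor file
- the `conectix` cookie in a VHD footer, which dynamic VHDs also copy to the start of the file
- the 0x55AA boot signature that ends the first sector of a raw disk

A recognized header suggests the start of the file was not encrypted, or that only the extension was changed, and makes mounting worth a try. It does not prove the rest of the disk is intact. An "unknown" result may simply mean a format this short list doesn’t cover. The function names are our own.

```python
import os


def identify_disk_image(head, tail):
    """Guess the container type from a file's first and last 512 bytes."""
    if head.startswith(b"vhdxfile"):
        return "VHDX"
    if head.startswith(b"KDMV"):
        return "VMDK sparse extent"
    if head.startswith(b"# Disk DescriptorFile"):
        return "VMDK descriptor (disk data lives in separate extent files)"
    if head.startswith(b"conectix"):
        # Dynamic and differencing VHDs keep a copy of the footer at offset 0.
        return "VHD (dynamic or differencing)"
    if tail.startswith(b"conectix"):
        # Fixed VHDs are raw data followed by a 512-byte footer.
        return "VHD (fixed)"
    if head[510:512] == b"\x55\xaa":
        # Boot signature ending the first sector of an MBR/GPT disk or raw volume.
        return "raw disk or volume (0x55AA boot signature present)"
    return "unknown: no recognized header (possibly encrypted, or another format)"


def read_head_tail(path, size=512):
    """Read the first and last `size` bytes of a working copy of a disk image."""
    with open(path, "rb") as f:
        head = f.read(size)
        f.seek(0, os.SEEK_END)
        length = f.tell()
        f.seek(max(length - size, 0))
        tail = f.read(size)
    return head, tail
```

For example, `identify_disk_image(*read_head_tail("working_copy.vhdx"))` gives a one-line verdict. The file name here is hypothetical.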
Method 2: Recovery using the DMDE utility
DMDE (DM Disk Editor and Data Recovery Software) is a utility designed for data recovery and disk management, developed and published by Dmitry Sidorov since 2006. It’s particularly useful for reconstructing data and volumes from partially encrypted virtual disks, such as those formatted with NTFS, FAT, and other file systems.
When dealing with a partially encrypted virtual disk, DMDE can locate and identify accessible sectors of data that remain unencrypted. It allows users to manually inspect and recover file structures, including directories and files within recognized file systems, using its hex editor and partition manager.
If recovery (even partial) is successful, DMDE is able to extract many crucial system files including EVTX logs, amcache, registry hives, and NTUSER.dat files. By leveraging the software’s ability to reconstruct lost partitions and restore the file system’s metadata, users can piece together fragmented or damaged data sectors. This facilitates the extraction and rebuilding of files even when portions of the disk are inaccessible due to encryption. With DMDE, it is possible to reconstruct what a standard forensic artifact collection tool would typically gather, making it an invaluable asset for data recovery professionals.
There is both a GUI and a command-line (console) version available for Windows, DOS, Mac, and Linux, and for each OS there is both a free and a fuller-featured paid version. We’ll demonstrate recovery techniques using the free version. It can only export individual files, or up to 4000 files from a specific directory; recursive file export is available only in the paid version. (As stated on the site, the paid and the free versions are capable of recovering all the same files. Without recursive file export, though, the process takes much longer and involves a lot of repetitive human actions.)
It should be emphasized that this is a recovery tool, not a file-system repair tool. Nothing will be written to the damaged disk, so you will need to define a separate storage location for your recovered files.
The tool offers numerous options and extensive functionality for various specific scenarios. To illustrate the steps for recovering files from possibly encrypted virtual disks, the following screenshots from the Windows GUI version demonstrate a typical and common scenario we encounter, with annotations indicating points of interest. The full DMDE manual is available on the product site.
Download the free version of the DMDE tool from the download page, choosing your preferred OS and interface. (Again, we’re strictly looking at Windows in this walkthrough.) Once downloaded and extracted, launch dmde.exe.
From the top toolbar, select Disk > Select Disk / Task. You’ll be presented with the window shown in Figure 1.
Figure 1: The task at hand, and other DMDE tasks
A window showing the available images opens; here you’ll select the encrypted virtual disk file. Make sure to change the drop-down to Any File so the file shows in the window. In the window that opens next (shown in Figure 2), select the file of interest and click Full Scan.
Figure 2: The image we hope to recover is selected
In the Scan Parameters window that opens next, shown in Figure 3, ensure that the correct file-system format (for example, NTFS if attempting to recover a Windows machine) is selected. Click Scan.
Figure 3: First choose the format, then start the scan
Wait for the scan to complete. Depending on the size of the virtual disk, this may take some time. Leave the Open Volume button alone during this process. As shown in Figure 4, the process will return an assortment of partitions. As a rule of thumb, the largest partition DMDE finds is most likely the one you most want to recover.
Figure 4: Progress bars and patience as the recovery process grinds on
Once the recovery process is completed, click on the largest recovered volume (note the difference in its status bar between Figure 4 and Figure 5) and click Open Volume.
Figure 5: Your recovered volume is ready
If you see errors such as the one shown in Figure 6, ignore them.
Figure 6: A transient error; it needs nothing from you
When you open the recovered volume, you will see a window similar to the one in Figure 7. The window on the right should show a reconstruction of the original non-encrypted drive, which you can now browse.
Figure 7: The available files found in the recovery process
Once you find a file of interest, select it by clicking. Right-click on the highlighted file and select the Recover / Create File List option.
Figure 8: The NTUSER.DAT file can be recovered
In the Recover window that opens as shown in Figure 9, select a destination for the file to be recovered (again, files can’t be saved back to their original disk) and click OK.
Figure 9: The recovered file is on its way to safety. One down…
This will copy the selected files to your chosen directory.
Again, if you are using the free version, you will need to hand-select your chosen files and export them to the output directory one by one to recreate a typical collection of artifacts from a recovered device. The free and paid versions can recover exactly the same files. So if the free version shows a good rate of recovery and you want everything the program says it can recover, it may be worth paying for a fully functional copy.
Method 3: RecuperaBit
RecuperaBit, created by Andrea Lazzarotto, is an automated tool that will rebuild any NTFS partitions that it can find in the encrypted VM. If it can find an NTFS partition, it will re-create the folder structure of that partition on the device being used for examination. If successful, you can then browse the newly created directory structure and copy files out of it as desired.
It is a Python script, so it will work on any OS that supports Python 3. It’s easy to use, and only a few options are needed to get it to rebuild the encrypted VM. Experience has shown that, on average, you should get a ‘yes’ or ‘no’ as to whether it can rebuild anything of use within about 20 minutes. After that, if it can manage the rebuild, it will take approximately another 20 minutes to recreate the partition for you.
It’s important to know that running RecuperaBit will likely set off endpoint-protection detections if ransom.exe or other malicious files are present. For this reason, if you hope to use RecuperaBit to recover that executable for further analysis, run it in an environment where endpoint protections can be safely disabled (hence the prerequisite of a sandbox).
At the time of this writing, RecuperaBit can be downloaded from GitHub. There is a user guide on the GitHub page for the tool.
Method 4: bulk_extractor
Bulk_extractor (called bulk-extractor on its kali.org page, but the same program in either case) is a free tool that runs on Windows or Linux. It was created by Simson Garfinkel. It can recover system files such as Windows event logs (.EVTX) as well as media files. This tool is automated, so the investigator can start it and let it run, perhaps after hours, in hope it will recover something.
It is possible to configure it for specific file types or other artifacts by altering its config file. This can be very useful to speed analysis up in scenarios where you’re hoping for quick, focused, or specific results — for example, EVTX files only — rather than trying to recover the whole of the partition.
As with RecuperaBit in Method 3, running bulk_extractor will likely set off endpoint-protection detections if ransom.exe or other malicious files are present. For this reason, if you choose to use bulk_extractor in situations where you hope to recover that executable for labs submission or similar analysis, you should run it in an environment where endpoint protections can be safely disabled — hence the above prerequisite of a sandbox.
At the time of this writing, bulk_extractor for Linux can be downloaded from GitHub. There is a user guide on the GitHub page for the tool.
Method 5: EVTXtract
This specialized tool searches a block of data (in this case, an encrypted VM) for complete or partial .evtx files. If it finds any, the tool reconstructs the event records and renders them as XML. This is an automated tool that is built to run on Linux only.
XML files are notoriously difficult to work with. In this case, the output will consist of incorrectly embedded EVTX fragments, so expect it to be a bit unwieldy. To make this tool’s output easier to review, you will have to massage the data. A couple of suggestions for doing this effectively:
- Attempt to convert the file to CSV format for easier viewing
- Use the grep command to filter the output for dates (YYYY-MM-DD or any other date format), event IDs, keywords, or known IoCs indicating activity on the day of interest
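If you prefer to script the CSV conversion, the Python sketch below pulls the event ID, timestamp, and provider out of each rendered `<Event>` element and writes them as CSV. The sketch assumes the output contains standard Windows event XML (`<EventID>`, `<TimeCreated SystemTime="...">`, `<Provider Name="...">`); we have not pinned down EVTXtract’s exact output layout here. Any wrapper elements are ignored. Truncated fragments without a closing `</Event>` are skipped rather than guessed at. The regex approach is deliberate: fragmentary output often isn’t well-formed XML, so a strict XML parser may reject the whole file.

```python
import csv
import io
import re

# Assumption: output contains standard Windows <Event>...</Event> XML blocks.
EVENT_RE = re.compile(r"<Event\b.*?</Event>", re.DOTALL)
EVENT_ID_RE = re.compile(r"<EventID\b[^>]*>\s*(\d+)\s*</EventID>")
TIME_RE = re.compile(r'<TimeCreated\s+SystemTime="([^"]*)"')
PROVIDER_RE = re.compile(r'<Provider\s+Name="([^"]*)"')
FIELDS = ["time", "event_id", "provider"]


def extract_events(xml_text, event_ids=None, date_prefix=None):
    """Return one dict per complete <Event>, optionally filtered by ID and date."""
    rows = []
    for match in EVENT_RE.finditer(xml_text):
        block = match.group(0)
        eid = EVENT_ID_RE.search(block)
        ts = TIME_RE.search(block)
        prov = PROVIDER_RE.search(block)
        row = {
            "time": ts.group(1) if ts else "",
            "event_id": int(eid.group(1)) if eid else None,
            "provider": prov.group(1) if prov else "",
        }
        if event_ids is not None and row["event_id"] not in event_ids:
            continue
        # SystemTime begins with YYYY-MM-DD, so a prefix match selects a day.
        if date_prefix is not None and not row["time"].startswith(date_prefix):
            continue
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text for review in a spreadsheet."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For example, `extract_events(text, event_ids={4624, 4625}, date_prefix="2024-05-01")` would keep only successful and failed logon events from that day. The IDs and date are illustrative.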
Please note that this tool, just as the name indicates, recovers EVTX files or fragments only. If you are seeking other artifacts, you will need to use a different tool.
At the time of this writing, EVTXtract can be downloaded from GitHub. There is a user guide on the GitHub page for the tool.
Method 6: Scalpel, Foremost, or other file-recovery tools
Turning our attention from EVTX-recovery tools back to those designed to restore other types of files, Scalpel and Foremost are two of many free file recovery tools currently available. (We covered another one, DMDE, separately in Method 2.) Though both are older tech, the Sophos IR team has had excellent results with these two in our investigations.
The original version of Scalpel, released in 2005, was based on Foremost, and the two carving and indexing applications are similar in approach. Both mainly recover media and document files, which makes them useful if your investigation is seeking documents, PDFs, or the like. For either one, the config file can be modified to focus on specific file types, or be left alone for a fuller (though slower) catch-all effort.
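This style of carving ignores the file system entirely. It looks for a file type’s known header bytes, then for its footer within a maximum size. The toy Python sketch below illustrates that idea for PDFs (header `%PDF-`, footer `%%EOF`). It is not Scalpel’s or Foremost’s code, and the size limit is arbitrary. It also works on an in-memory byte string, whereas real tools stream through the image rather than loading it all at once.

```python
def carve(data, header, footer, max_size):
    """Return (offset, carved_bytes) pairs for each header...footer run in data.

    After each header, look for the first footer that ends within max_size
    bytes of it; if there is none, skip to the next header occurrence.
    """
    results = []
    start = data.find(header)
    while start != -1:
        end = data.find(footer, start + len(header), start + max_size)
        if end != -1:
            stop = end + len(footer)
            results.append((start, data[start:stop]))
            start = data.find(header, stop)
        else:
            start = data.find(header, start + 1)
    return results
```

For example, `carve(data, b"%PDF-", b"%%EOF", 1000)` returns the offset and bytes of each candidate PDF. Because the sketch stops at the first footer, it would truncate a PDF that contains several `%%EOF` markers, as incrementally updated PDFs do.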
As mentioned, neither of these programs retrieves system files; other tools will be needed for that work. In addition, files recovered from these may kick off endpoint-protection detections if any malicious files are present (for instance, malicious PDFs from a phishing campaign). For this reason we recommend that investigators run these tools in a sandbox environment, where endpoint protection can be disabled, if such files must be preserved for the investigation.
As noted above, both these programs are older technology, which means that recovery of newer filetypes may not be feasible with these tools. Other tools exist, and the reader is invited to investigate those, but as easily available options these are both solid performers.
Foremost can be downloaded from GitHub, and there is a user guide on the GitHub page for the tool. It was originally developed by the US Air Force Office of Special Investigations and The Center for Information Systems Security Studies and Research. The version on GitHub does not appear to be actively maintained.
Likewise, at the time of this writing, Scalpel can be downloaded from GitHub. There is a user guide on the GitHub page for the tool. As stated on its GitHub page, this tool is not actively maintained.
Method 7: Manual carving of the NTFS partition
In contrast to the tools and techniques summarized above, manual carving takes preparation and some finer understanding of the options available to you. We’ll make some recommendations for how to plan your effort, and then walk you through the specifics of working with dd, the powerful Linux utility you’ll use for this work.
(Some background: DD originally stood for “data definition” and is truly one of computing’s Elder Gods; it celebrated its 50th anniversary of existence in June 2024. New dd users are warned that typos can be catastrophic in this utility, earning it its alternate name of “disk destroyer”; it has been described as “a Swiss Army knife, but one that’s all blades and no handle.” It is recommended that investigators familiarize themselves with dd basics before proceeding. We also suggest typing the dd command into a text editor, making sure everything is correct, and then copying and pasting the command at the command line.)
Proper manual carving requires that investigators set three switches in dd before running the utility:

- bs: the block size, which for this procedure is the partition’s bytes per sector
- skip: the offset, in sectors, of the NTFS partition you aim to recreate
- count: the size of that partition, in sectors

These calculations aren’t necessarily difficult, but they do take time and they are not optional. This section walks you through the steps for calculating all three.
In addition, the processing itself is rather slow, potentially taking hours to complete correctly. (As mentioned above, we generally recommend you start the manual carving process at the end of the working day and leave your device running overnight.) With some practice, however, the calculation of the switch values may take the investigator only a few minutes — and if you calculate the size of the partition you are going to carve before attempting to carve the partition, you reduce the likelihood of wasting time and processing power. So do that.
Note finally that this process is space-intensive. Because you are essentially copying the VM, it will likely take up roughly the same amount of space as the VM itself. For example, if you’re working with a 100GB VM file, you’ll need another 100GB, plus space in which to extract the files you want.
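A quick pre-flight check can save a failed overnight run. The Python sketch below compares the expected size of the carved file against the free space at the destination, using the standard library’s `shutil.disk_usage()`. The size you pass in could be the VM file size or, once you have calculated it later in this section, count × bs. The 10% headroom for extracted files is a hypothetical margin, not a rule.

```python
import shutil


def carve_space_check(carve_bytes, dest_dir, headroom=0.10):
    """Return (ok, needed, free) for carving carve_bytes into dest_dir.

    headroom is a hypothetical safety margin (10% by default) for the
    artifacts you extract afterwards; adjust it to suit the case.
    """
    needed = carve_bytes + int(carve_bytes * headroom)
    free = shutil.disk_usage(dest_dir).free
    return free >= needed, needed, free
```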
The process has four main steps:
- Analyze the encrypted VM for available NTFS partitions
- Carve the largest NTFS partition out and into a new file
- If the newly created file is intact enough, mount it in Windows
- Extract the artifacts you need
The utility that does the copying, dd, is built into Linux. The command is as follows:
sudo dd if=*** of=***.img bs=*** skip=*** count=*** status=progress
Again – and this cannot be emphasized enough – dd is entirely unforgiving of typos. Proceed with caution. The command and its switches may be understood as follows:
sudo = Runs the command with the highest (root) privileges, which the user needs for this tool
dd = The utility itself
if = Stands for ‘input file’ — this value is the path and file name of the encrypted VM
of = Stands for ‘output file’ — this is the name of the recreated partition. Suggested file extension is newfilename.img
bs = The bytes per sector of the partition you are carving out; this value must be entered in bytes
skip = The offset value, in sectors, of the NTFS partition you are carving out, from the start of the disk / VM file
count = The size of the partition, in sectors, of the NTFS partition you are carving out
status = An optional switch that displays periodic progress statistics, showing how many bytes have been copied so far
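Because dd punishes typos, you can also have a short script assemble the command from the three values calculated below, instead of editing it by hand in a text editor. The Python sketch that follows builds, but does not run, the command line shown above. It refuses obviously wrong input: negative or non-integer values, a bs that isn’t a power of two, a zero count, or an output path identical to the input. Arguments are quoted with `shlex.quote()` so that spaces in paths don’t split them. The function name and checks are our own suggestion, and you should still read the resulting command before pasting it.

```python
import os
import shlex


def build_dd_command(image_path, output_path, bs, skip, count, use_sudo=True):
    """Assemble (but do not run) the carving command, with basic sanity checks."""
    for name, value in (("bs", bs), ("skip", skip), ("count", count)):
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    if bs == 0 or bs & (bs - 1):
        raise ValueError(f"bs={bs} is not a power of two (expected e.g. 512 or 4096)")
    if count == 0:
        raise ValueError("count is 0; nothing would be copied")
    # Never let the output overwrite the evidence (or its working copy).
    if os.path.abspath(output_path) == os.path.abspath(image_path):
        raise ValueError("output_path must differ from image_path")
    args = ["sudo"] if use_sudo else []
    args += [
        "dd",
        f"if={image_path}",
        f"of={output_path}",
        f"bs={bs}",
        f"skip={skip}",
        f"count={count}",
        "status=progress",
    ]
    # shlex.quote protects paths containing spaces or shell metacharacters.
    return " ".join(shlex.quote(arg) for arg in args)
```

For example, `build_dd_command("/cases/vm.vhdx", "/carve/c_drive.img", 512, 1126400, 1995745279)` (hypothetical paths) returns `sudo dd if=/cases/vm.vhdx of=/carve/c_drive.img bs=512 skip=1126400 count=1995745279 status=progress`.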
As mentioned above, there are three values you must calculate and provide for the switches in this command: bs, skip, and count. The easiest way to work these values out is to use a GUI hex editor such as Maël Hörz’s HxD (which is Windows freeware), but a command-line tool such as xxd will work if preferred. The screen captures below show the steps using HxD.
Switches: Gathering the basic values
Start HxD and load in the encrypted VM file. Click the Offset column at the far left to change it to show values in decimal (base 10). In HxD this is denoted by the letter D in brackets, as shown in Figure 10.
Figure 10: The offset values are now displayed in decimal numbers
Next, open Data inspector from the View dropdown, as shown in Figure 11.
Figure 11: The View dropdown in HxD with the Data inspector option selected
Now find the potential NTFS partitions. Highlight the very top left byte, then use the search function to search for the following hexadecimal string — as opposed to a decimal string or a text string, if such options are available.
EB 52 90 4E 54 46 53 20 20 20 20
Pay attention to which tab is open in the Find box, as shown in Figure 12.
Figure 12: Seeking the hex string that indicates the start of an NTFS sector
The above hexadecimal string is the signature at the start of an NTFS boot sector: a jump instruction (EB 52 90) followed by the OEM ID “NTFS” padded with four spaces. This search will therefore find any potential NTFS partitions that you can carve out. There will likely be many presented in a list, as shown in Figure 13.
Figure 13: A fruitful search for potentially salvageable NTFS partitions
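The HxD search can also be reproduced in a script, which is handy for very large images. The Python sketch below reads the image in chunks and records every offset where the 11-byte signature begins. It assumes, as is normal, that partitions start on 512-byte sector boundaries, so it keeps only sector-aligned hits. HxD’s search is byte-by-byte, so HxD may list extra, unaligned matches that are unlikely to be real partition headers. The returned offsets are in bytes from the start of the VM file, the same decimal values HxD displays.

```python
import io

# Jump instruction EB 52 90, then the OEM ID "NTFS" plus four spaces.
NTFS_SIGNATURE = bytes.fromhex("EB 52 90 4E 54 46 53 20 20 20 20")


def find_ntfs_headers(source, sector_size=512, chunk_sectors=8192):
    """Return byte offsets of sector-aligned NTFS boot-sector signatures.

    source may be a bytes object or a binary file opened with "rb".
    Chunks are whole multiples of sector_size, so an aligned signature
    (11 bytes) can never straddle two chunks.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    chunk_size = sector_size * chunk_sectors
    offsets = []
    base = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        pos = chunk.find(NTFS_SIGNATURE)
        while pos != -1:
            if pos % sector_size == 0:
                offsets.append(base + pos)
            pos = chunk.find(NTFS_SIGNATURE, pos + 1)
        base += len(chunk)
    return offsets
```

For example, `with open(path, "rb") as f: print(find_ntfs_headers(f))` lists the candidate header offsets to examine in HxD.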
When you select one of these results, you will be presented with the header of the NTFS partition in the hex viewer window, as shown in Figure 14.
Figure 14: The header is shown above the selected NTFS partition
The header contains the basic information you need for the bs, skip, and count values required in the dd command. Next, we’ll explain how to calculate those three values. You’ll want to do these in order.
To calculate the bs (bytes per sector) value
Working from the start of the NTFS partition you have selected, highlight the bytes at offsets 11 and 12, as shown in Figure 15. The value shown as Int16 in the Data inspector is the value needed. In this example, the bs value is 512. (This value will almost always be 512. Almost.)
Figure 15: The bytes for the bs value are highlighted, and the data inspector shows that the value is indeed 512
To calculate the skip value
Now that you have the bs value, calculate the skip value by dividing the header offset value by the bs value. This calculation gives the sector number at which the NTFS partition starts.
For instance, the header offset decimal value for the NTFS partition highlighted in Figure 16 is 00576716800. (So we’re clear, the following screen captures are not from the same partition as the one in the screen captures shown above. As predicted above, though, you can see that the bs value for this NTFS partition, stored in the bytes at offsets 11 and 12, is once again 512.)
Figure 16: The header offset value is shown in the green box
In order to calculate the skip value, divide that value by the bs value (that is, 512). In other words, do the following:
576716800 / 512 = 1126400
1126400 is the skip value.
To calculate the count value
Locate the eight bytes that start at the 41st byte (offset 40) from the start of the NTFS header. To find them in the hex view, go down two rows from the first (EB) byte of the header and across to the 08 column, as shown in Figure 17.
Figure 17: Finding the count value (highlighted)
Highlight those eight bytes, all the way to column 15 (so, bytes 41-48). The value shown as Int64 in the Data inspector is the count value; in Figure 17, it is 1995745279. This value is in sectors, which is the unit the dd command needs, so no conversion is required. Note the value and you’re done.
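Once you have a candidate header offset, the three values can also be read programmatically. The Python sketch below takes the decimal header offset and the first bytes of that header. It reads bs from offsets 11-12 and count from offsets 40-47, both as little-endian integers, which is how HxD’s Data inspector interprets them. It then derives skip by dividing the offset by bs, as in the manual steps above. Rather than guessing, it raises an error if the signature is missing or the offset isn’t a whole number of sectors. The function name is our own.

```python
import struct

NTFS_SIGNATURE = bytes.fromhex("EB 52 90 4E 54 46 53 20 20 20 20")


def dd_values(header_offset, boot_sector):
    """Return (bs, skip, count) for dd from one NTFS boot sector.

    header_offset: decimal byte offset of the header in the VM file
    (the value HxD shows in its Offset column).
    boot_sector: at least the first 48 bytes of that header.
    """
    if not boot_sector.startswith(NTFS_SIGNATURE):
        raise ValueError("data does not start with the NTFS signature")
    # Bytes per sector: 2-byte little-endian integer at offsets 11-12.
    (bs,) = struct.unpack_from("<H", boot_sector, 11)
    # Total sectors: 8-byte little-endian integer at offsets 40-47
    # (the 41st to 48th bytes of the header).
    (count,) = struct.unpack_from("<Q", boot_sector, 40)
    if bs == 0 or header_offset % bs:
        raise ValueError("header offset is not a whole number of sectors")
    skip = header_offset // bs
    return bs, skip, count
```

To get `boot_sector`, seek to the header offset in a working copy of the VM file and read 512 bytes (`f.seek(offset)` followed by `f.read(512)`). Given a header at offset 576716800 whose fields hold 512 and 1995745279, the function returns (512, 1126400, 1995745279).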
Which partition to choose?
As noted above, you should choose the largest available partition to carve out, since that gives you the best chance of recovering the C: drive. The count value indicates how large each partition is; a partition only a few sectors in size is likely not worth carving out.
The largest partition should be approximately the same size as the overall VM file. However, the VM file size is shown in bytes, whereas the NTFS size is given in total sectors, so you’ll need to convert the partition’s sector count into bytes before comparing them.
To do so, multiply the count value (as shown in the Data inspector) by the bs value. So, using the numbers we found in the above examples:
1995745279 x 512 = 1021821582848 bytes (roughly 951.65 GiB)
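When the search turns up many candidates, comparing them by hand gets tedious. The Python sketch below takes (header offset, bs, count) tuples, such as those gathered in the steps above, and picks the largest partition. It then summarizes that partition’s size in bytes and GiB, along with its ratio to the VM file size, which you could obtain with `os.path.getsize()`. The tuple layout and function names are our own. Keep in mind that a thin-provisioned (dynamic or sparse) VM file can be smaller than the partition it contains, so the ratio is a sanity check rather than a hard rule.

```python
GIB = 1024 ** 3


def partition_bytes(bs, count):
    """Partition size in bytes: total sectors multiplied by bytes per sector."""
    return count * bs


def pick_largest(candidates):
    """Pick the largest of several (header_offset, bs, count) candidates."""
    return max(candidates, key=lambda c: partition_bytes(c[1], c[2]))


def describe(candidate, vm_file_bytes):
    """One-line summary comparing a candidate partition with the VM file size."""
    offset, bs, count = candidate
    size = partition_bytes(bs, count)
    return (f"offset {offset}: {size} bytes ({size / GIB:.2f} GiB), "
            f"{size / vm_file_bytes:.0%} of the VM file")
```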
Ready, set…
You now have the three values you require to use the dd utility. Enter them into the dd command and paste the command at the command line (if you followed our advice to compose it in a text editor first). Hit Enter, and dd will carve out the chosen NTFS partition.
When completed, mount the new file that you just carved. You should then be able to recover what you need. If the drive does not mount, try 7-Zip (or other archiving tools), other mounting tools, or FTK.
To recap, Figure 18 shows an annotated diagram of the NTFS header and where the values are located.
Figure 18: A colorful look at an NTFS header (count value is marked as “total sectors in file system”)
Conclusion
Once more, we caution the reader that results are not guaranteed; the best method of retrieving data encrypted in an attack is to pull a copy from a clean, unaffected backup. However, these methods may help the investigating team claw back data in situations where there’s no other choice.
When is it time to give up? Sadly, data cannot always be recovered fully, in part, or even at all. Expect results to vary, sometimes for no reason that can be determined. It’s up to you, in consultation with the business stakeholder, to decide when to walk away from the process.
Acknowledgements
The authors wish to thank the creators of the software mentioned above. The editor wishes to thank Jonathan Espenschied for the Swiss-Army-knife-with-no-handle description of dd. Some information in this article was originally presented as part of CyberUK in May 2024.