Over the last two years, ransomware has been all over the news. Hardly a week goes by without a report of a large ransomware outbreak or the emergence of a new ransomware family. Despite all this attention, very little is known about how profitable ransomware is and who the criminals are that benefit from it.

To answer these questions and expose the inner workings of the ransomware economy, our research team at Google, in partnership with NYU, UCSD and Chainalysis, has developed a new methodology and a set of technologies to trace bitcoin ransom payments at scale. Over the last 12 months or so we have applied it to hundreds of thousands of ransomware binaries from over 30 ransomware families. This large-scale tracing has enabled us to build up a precise picture of the ransomware economy and identify the key ransomware groups.

This series of three blog posts summarizes the key findings of our large-scale study. By the end of the series you will have a clear understanding of how the ransomsphere is structured and who its kingpins are.

This blog post, the first in the series, explains the methodology and techniques we developed to trace ransomware payments end-to-end. The next post will discuss the key insights we garnered about the ransomware ecosystem as a whole. The last post will take a detailed look at some of the major ransomware players and analyze what made them successful.

We presented the findings discussed in these posts at Black Hat USA this summer, in a talk called “Tracking desktop ransomware payments end-to-end”. You can check out the slides here.

People don’t back up their data

Before delving into the ransomware world, it is important to take a step back and acknowledge that the fundamental reason why ransomware is so successful, and here to stay, is that people simply don’t back up their data.

I couldn’t find recent data on this, so in July I ran a survey to ask US internet users about their backup habits. The results, summarized in the chart above, clearly show that despite all the awareness campaigns about ransomware, backup habits are still very poor: only 37% of respondents reported that they back up their data.

Even within the minority who do back up, the results are still grim. Over half of them back up only infrequently, which still leaves them vulnerable to ransomware attacks. Given this, it seems that changes at the operating-system level are in order, to protect people by default, as they won’t do it themselves.

The lifecycle of a ransomware infection

Let’s start by quickly recapping how ransomware works, so that everyone is on the same page.

As illustrated in the figure above, the ransomware lifecycle can be broken down into four phases. Let’s briefly walk through each of them.

Ransom

After a victim is infected by ransomware, their personal files are encrypted and a ransom note is displayed. As can be seen in the screenshot above, the ransom note usually points to a Tor website that contains a unique identifier used by the ransomware author to identify the victim. These days, most ransomware puts pressure on users to pay quickly by threatening to increase the price after a certain amount of time. Some even threaten to delete all of the victim’s files if the ransom is not paid in time.

Payment site

Once the victim is able to visit the Tor website, they end up on a page that tells them how many bitcoins to pay, where to buy them, and which bitcoin wallet to pay into. Because Bitcoin lets anyone create as many wallets as they want, instantly, most ransomware generates one bitcoin wallet per victim.

Bitcoin buyout

The victim goes to buy bitcoin at one of the bitcoin exchanges, such as LocalBitcoins. Then they transfer the bitcoin to the criminal’s wallet that is identified on the ransom site.

Decryption

Once payment is confirmed by the ransomware group, the victim receives decryption keys and is able to recover their files. Most ransomware criminals do honor their promises to recover files and provide decryption keys. The most notable exception to this is wipeware, such as WannaCry and NotPetya, which only pretended to be ransomware in order to hide their true purpose (see the next post in this series for more details!).

However, paying the ransom is ill-advised (although this is open for debate), as there is still a chance that the victim will be scammed. Furthermore, it is not uncommon for decryption keys to be made available for free by the authorities or researchers once the ransomware authors are arrested or a vulnerability in the ransomware’s encryption scheme is found.

Why does ransomware use Bitcoin and Tor?

Before explaining how we can trace ransomware payments, I first need to explain why criminals favor Bitcoin and Tor, to show the challenges we faced while tracing payments.

Why are ransoms in Bitcoin?

  • Anonymous: As with other virtual currencies, creating a bitcoin wallet does not require any form of ID. This makes it ideal for conducting cybercrimes.
  • Fully Automatable: It is easy to fully automate all aspects of ransom payments, from wallet creation to payment monitoring to moving money to cashing it out.
  • Irreversible: Bitcoin transactions cannot be reversed, which guarantees that once the ransom is paid, the money will not be charged back, unlike credit card transactions.
  • Fungible: Bitcoin is the only virtual currency with enough people who want to buy it to make it fungible. This alone explains why bitcoin is the virtual currency of choice for ransomware. No other currency would so easily allow cybercriminals to cash out the large proceeds of their criminal activities (tens of millions of dollars).

Why does ransomware use so many Bitcoin wallets?

Cybercriminals exploit the fact that Bitcoin wallets can be created and monitored automatically, which helps them work out which victims have paid. Ransomware creates one wallet for each infection, so it is easy to tie a specific payment to a given ransomware infection and a given victim.
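
To make the one-wallet-per-infection idea concrete, here is a toy model of the bookkeeping it implies. It is only a sketch: the class name, the infection identifiers, and the address strings are invented placeholders, not taken from any ransomware family or from our tooling.

```python
from typing import Dict, Optional, Set


class PerInfectionLedger:
    """Toy model: each infection is assigned exactly one deposit address."""

    def __init__(self) -> None:
        self.address_to_infection: Dict[str, str] = {}
        self.paid_infections: Set[str] = set()

    def register(self, infection_id: str, address: str) -> None:
        # Reusing an address would make payments ambiguous, so reject it.
        if address in self.address_to_infection:
            raise ValueError("address already assigned to another infection")
        self.address_to_infection[address] = infection_id

    def attribute_payment(self, address: str) -> Optional[str]:
        # A payment to a known address identifies the infection unambiguously.
        infection_id = self.address_to_infection.get(address)
        if infection_id is not None:
            self.paid_infections.add(infection_id)
        return infection_id


ledger = PerInfectionLedger()
ledger.register("infection-001", "addr-A")  # placeholder strings, not real addresses
ledger.register("infection-002", "addr-B")
print(ledger.attribute_payment("addr-B"))  # infection-002
```

The same property works in favor of anyone tracing payments: once a wallet is linked to a ransomware sample, every payment into that wallet can be attributed to that sample's family.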

Why does ransomware use Tor?

Tor makes it difficult for law enforcement authorities to locate ransom websites and shut them down. As a result, it is no longer possible to rely on one of the most effective tactics against botnets: shutting down the control site. Tor also makes it harder to crawl the sites to get ransom wallet addresses, as the crawler must support the Tor protocol.

Tracing ransomware payments: overview

Tracing ransomware payments is done in four phases:

  1. Gathering ransomware samples: We build an initial dataset by finding and labeling ransomware samples for all the families we are studying.
  2. Increasing coverage via clustering: Using the binaries from the original dataset as seeds, we are able to use clustering to double the number of ransomware binaries in our dataset.
  3. Finding the bitcoin wallets associated with each ransomware family: By applying dynamic execution and machine learning to the ransomware binaries, and crawling the payment sites associated with the ransomware included in our dataset, we are able to link ransomware families to specific bitcoin wallets.
  4. Identifying ransomware cashout wallets: Making micro-payments to the wallets we identified in the previous phase enables us to trace how ransom payments are transferred through the bitcoin chain and uncover the wallets used by cybercriminals to cash out.

Let’s discuss each of these phases in more detail.

Gathering ransomware samples

The first phase of our research involved creating a corpus of ransomware binaries for all the major ransomware families. The key difficulty of that phase was to find which of the malicious binaries collected by VirusTotal belonged to a given family. To find these needles in the haystack we wrote rules that matched each variant of each ransomware family. This was a titanic task, as there were 34 families (as shown in the tag cloud above) and hundreds of variants. These rules allowed us to collect an initial dataset of around 154,000 ransomware binaries.
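
To give a flavor of what rule-based labeling looks like, here is a minimal sketch. The family names and byte patterns are invented placeholders, not real signatures, and rules that cover hundreds of real variants are necessarily more elaborate than plain substring checks.

```python
from typing import Dict, List, Optional

# Hypothetical rules: a binary is labeled with a family only if ALL of the
# family's byte patterns occur somewhere in the file.
RULES: Dict[str, List[bytes]] = {
    "example_family_a": [b"YOUR FILES ARE ENCRYPTED", b".exa"],
    "example_family_b": [b"decrypt_instructions.html"],
}


def label_binary(data: bytes) -> Optional[str]:
    """Return the first family whose patterns all match, or None."""
    for family, patterns in RULES.items():
        if all(pattern in data for pattern in patterns):
            return family
    return None


sample = b"\x00\x01 YOUR FILES ARE ENCRYPTED ... renamed to photo.jpg.exa"
print(label_binary(sample))          # example_family_a
print(label_binary(b"hello world"))  # None
```

Requiring every pattern to match keeps the rules precise at the cost of missing samples, which is the trade-off the next phase compensates for.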

Increasing coverage via clustering

Armed with our initial dataset, we turned to clustering and code similarity to find additional ransomware binaries automatically. This phase was essential because we knew that our rules, while very precise, missed a lot of ransomware binaries. As shown in the diagram above, the code similarity analysis helped find more ransomware, whereas the clustering algorithm, which looked at the domains contacted, the files dropped, and other dynamic execution indicators, allowed us to assign each newly discovered binary to its correct family and variant.

This phase allowed us to almost double the size of our dataset by uncovering an additional 147,361 binaries. Adding all those extra binaries ensured good coverage and made our dataset representative of ransomware activity.
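
As a rough illustration of how dynamic indicators can be used to assign a new binary to a family, the sketch below compares indicator sets with Jaccard similarity and picks the closest labeled seed. This is a simple nearest-seed assignment that stands in for the clustering described above, not the algorithm we actually used; the indicator strings, family names, and the 0.5 threshold are all assumptions.

```python
from typing import Dict, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_family(
    indicators: Set[str],
    seeds: Dict[str, List[Set[str]]],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the family of the most similar seed, if it is similar enough."""
    best_family, best_score = None, 0.0
    for family, samples in seeds.items():
        for seed_indicators in samples:
            score = jaccard(indicators, seed_indicators)
            if score > best_score:
                best_family, best_score = family, score
    return best_family if best_score >= threshold else None


# Hypothetical indicators observed while executing labeled seed binaries.
seeds = {
    "fam_a": [{"domain:a.example", "file:note.txt", "ext:.aaa"}],
    "fam_b": [{"domain:b.example", "file:readme.html"}],
}
unknown = {"domain:a.example", "file:note.txt"}
print(assign_family(unknown, seeds))  # fam_a (similarity 2/3)
```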

Finding the bitcoin wallets associated with each ransomware family

Getting from the ransomware binaries to the bitcoin wallets was achieved in three steps:

  1. The ransomware binaries were executed to produce ransom notes.
  2. The ransom notes were analyzed with deep learning to extract the payment sites’ Tor addresses.
  3. Our Tor-aware web crawler scraped payment sites to retrieve details of the bitcoin wallets where ransoms must be paid.
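
The last two steps boil down to pulling structured identifiers out of unstructured text. The sketch below shows a much simpler, regex-based stand-in for that extraction (we used deep learning for the ransom notes, which this does not reproduce), plus a Base58Check checksum test to discard strings that merely look like legacy bitcoin addresses. The sample note, the onion host, and the function names are made up; the valid address in the example is Bitcoin's well-known genesis-block address, used only because its checksum is valid.

```python
import hashlib
import re
from typing import List

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
# v2 onion hosts are 16 base32 characters; v3 hosts are 56.
ONION_RE = re.compile(r"\b[a-z2-7]{16}(?:[a-z2-7]{40})?\.onion\b")
# Legacy addresses start with 1 or 3 and use the Base58 alphabet.
BTC_RE = re.compile(r"\b[13][1-9A-HJ-NP-Za-km-z]{25,34}\b")


def base58check_ok(address: str) -> bool:
    """Verify the 4-byte double-SHA256 checksum of a legacy address."""
    number = 0
    for char in address:
        index = BASE58.find(char)
        if index < 0:
            return False
        number = number * 58 + index
    try:
        # Legacy addresses decode to 25 bytes: version, 20-byte hash, checksum.
        raw = number.to_bytes(25, "big")
    except OverflowError:
        return False
    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum


def extract_onion_hosts(note_text: str) -> List[str]:
    return sorted(set(ONION_RE.findall(note_text.lower())))


def extract_btc_addresses(page_text: str) -> List[str]:
    candidates = set(BTC_RE.findall(page_text))
    return sorted(a for a in candidates if base58check_ok(a))


note = "Open http://abcdefghij234567.onion/ID-42 in the Tor browser."
page = ("Send 1 BTC to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa "
        "(not 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb).")
print(extract_onion_hosts(note))
print(extract_btc_addresses(page))
```

The checksum step matters when scraping at scale: a mistyped or truncated address fails validation instead of silently polluting the wallet dataset.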

Identifying ransomware cashout wallets

Once we had the bitcoin addresses associated with the ransomware binaries, the final step in closing the loop was to trace the payments through the bitcoin chain, to find out where the money flowed and where it was cashed out.

Anatomy of a ransomware payment via Bitcoin

Tracing bitcoin movements, while difficult, is possible. Bitcoin transactions are public and include all the information we need to trace ransom payments, as long as we know which wallets to look at.

To illustrate this, let’s look at a real Locky ransom payment from 2016. The screenshot above shows two transactions in this wallet. Each transaction contains the following information: the amount transferred, the sender wallet, the recipient wallet, and the date of the transaction.

The difficulty in tracing ransom payments through the blockchain is that you have to identify which wallets are used to pay the ransom and which wallets are used to cash out.

The ransom wallets were identified by completing the first three phases of our research. For the cash-out wallets associated with bitcoin exchanges, we relied on Chainalysis’s dataset, as they did a fantastic job of identification.

Combining these two datasets of wallets with bitcoin transaction records allowed us to attribute the transactions above to a ransom payment for the Locky ransomware family that was made in August 2016.

We are also able to infer that the ransom amount of four bitcoins was bought on localbitcoins.com, a popular exchange, because we know it controls the sender wallet 1N1NnUFAxbJScsDN6fVuoNMsCtbWwnE1Ji that you can see in the first transaction. Similarly, we know that those four bitcoins were ultimately cashed out by the Locky gang via BTC-e, as BTC-e controls the recipient wallet of the second transaction: 152LfB5rEXnWvk2W2GvvcQWjX6ibC4kKna.
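
The attribution above is essentially a join between transaction records and two sets of labeled wallets. Here is a minimal sketch of that join using the two exchange wallets named in this post. The ransom wallet address is not given in the text, so `LOCKY_RANSOM_WALLET` is a placeholder, and the record layout and function name are my own illustration rather than our actual tooling.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Transaction:
    sender: str
    recipient: str
    amount_btc: int
    date: str  # only the month is known from the post


# Exchange-controlled wallets named in the post.
EXCHANGE_WALLETS: Dict[str, str] = {
    "1N1NnUFAxbJScsDN6fVuoNMsCtbWwnE1Ji": "localbitcoins.com",
    "152LfB5rEXnWvk2W2GvvcQWjX6ibC4kKna": "BTC-e",
}
# Placeholder key: the real ransom wallet address is not given in the post.
RANSOM_WALLETS: Dict[str, str] = {"LOCKY_RANSOM_WALLET": "Locky"}


def attribute(tx: Transaction) -> str:
    """Describe a transaction using the two labeled wallet datasets."""
    if tx.recipient in RANSOM_WALLETS:
        source = EXCHANGE_WALLETS.get(tx.sender, "an unknown source")
        return f"{RANSOM_WALLETS[tx.recipient]} ransom payment, coins bought via {source}"
    if tx.sender in RANSOM_WALLETS:
        sink = EXCHANGE_WALLETS.get(tx.recipient, "an unlabeled wallet")
        return f"{RANSOM_WALLETS[tx.sender]} proceeds moved to {sink}"
    return "unrelated to known ransom wallets"


payment = Transaction("1N1NnUFAxbJScsDN6fVuoNMsCtbWwnE1Ji", "LOCKY_RANSOM_WALLET", 4, "2016-08")
cashout = Transaction("LOCKY_RANSOM_WALLET", "152LfB5rEXnWvk2W2GvvcQWjX6ibC4kKna", 4, "2016-08")
print(attribute(payment))  # Locky ransom payment, coins bought via localbitcoins.com
print(attribute(cashout))  # Locky proceeds moved to BTC-e
```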

Dealing with intermediary wallets

In most cases, tracing payments is not that easy, as cybercriminals move the bitcoins through multiple wallets in an attempt to evade payment tracing. Some use bitcoin mixers to make it even harder to trace the payments.

However, this is not insurmountable, because no matter how many times the bitcoins are moved, ultimately they must be cashed out at exchange points. So we just need to keep tracing movements until we reach a cash-out wallet.

Doing this at scale is possible because cybercriminals, in order to simplify cash-out operations, move multiple ransom payments into a single wallet designed to be cashed out. We call these wallets accumulation wallets. During the course of our study we found out that these wallets are fairly stable and are used to cash out up to a million dollars over the course of a few weeks.

To identify these accumulation wallets, we made micro-payments to the ransom wallets we had identified in the earlier phase of our study. We then followed each payment as it was moved from one wallet to another until it reached an accumulation wallet.
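
Following payments forward until they converge can be sketched as a graph search. In the example below, wallets are nodes and an edge means "coins moved from this wallet to that one"; a wallet reached from several distinct ransom wallets, and not already known to belong to an exchange, is flagged as an accumulation candidate. This is a simplification under stated assumptions: real bitcoin transactions have multiple inputs and outputs, and the wallet names, the edge list, and the `min_sources` threshold are invented for illustration.

```python
from collections import Counter, deque
from typing import Dict, Iterable, List, Set


def downstream(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """All wallets reachable from `start` by following outgoing transfers."""
    seen = {start}
    queue = deque([start])
    while queue:
        wallet = queue.popleft()
        for nxt in edges.get(wallet, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def accumulation_candidates(
    ransom_wallets: Iterable[str],
    edges: Dict[str, List[str]],
    exchange_wallets: Set[str],
    min_sources: int = 2,
) -> Set[str]:
    """Wallets fed by at least `min_sources` ransom wallets, exchanges excluded."""
    counts: Counter = Counter()
    for ransom_wallet in ransom_wallets:
        counts.update(downstream(ransom_wallet, edges))
    return {
        wallet
        for wallet, n in counts.items()
        if n >= min_sources and wallet not in exchange_wallets
    }


# Hypothetical transfers observed after sending micro-payments to r1, r2, r3.
edges = {
    "r1": ["hop1"],
    "hop1": ["acc"],
    "r2": ["acc"],
    "r3": ["other"],
    "acc": ["exchange"],
}
print(accumulation_candidates(["r1", "r2", "r3"], edges, {"exchange"}))  # {'acc'}
```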

Identifying ransom payments

Accumulation wallets are the key to identifying ransomware payments. Once one of these wallets is identified, we look at its transaction ledger to trace back all the payments that ended up in it. This tells us how many ransoms were paid to this wallet and when they were paid. Armed with this last piece of information, we are able to close the loop and tie back ransom payments and temporal data to a given binary and ransomware family.
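
Tracing back from an accumulation wallet is the reverse search: find every ransom wallet that eventually fed it, then count the payments made into those ransom wallets. The sketch below assumes a flat list of `(sender, recipient, satoshis)` transfers and a mapping from ransom wallet to family; the wallet names and amounts are invented, and real ledgers need more care (multi-input transactions, change outputs, timestamps).

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Transfer = Tuple[str, str, int]  # (sender, recipient, amount in satoshis)


def upstream(target: str, transfers: List[Transfer]) -> Set[str]:
    """All wallets whose coins can reach `target`."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for sender, recipient, _ in transfers:
        parents[recipient].add(sender)
    seen = {target}
    queue = deque([target])
    while queue:
        wallet = queue.popleft()
        for parent in parents.get(wallet, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(target)
    return seen


def ransom_income(
    accumulation_wallet: str,
    transfers: List[Transfer],
    ransom_wallet_family: Dict[str, str],
) -> Dict[str, Tuple[int, int]]:
    """Per family: (number of ransom payments, total satoshis paid)."""
    feeders = {
        w for w in upstream(accumulation_wallet, transfers)
        if w in ransom_wallet_family
    }
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for _, recipient, amount in transfers:
        if recipient in feeders:  # a victim paying into a ransom wallet
            family = ransom_wallet_family[recipient]
            totals[family][0] += 1
            totals[family][1] += amount
    return {family: (count, total) for family, (count, total) in totals.items()}


transfers = [
    ("victim_x", "r1", 200_000_000), ("r1", "acc", 200_000_000),
    ("victim_y", "r2", 150_000_000), ("r2", "hop", 150_000_000),
    ("hop", "acc", 150_000_000),
    ("victim_z", "r3", 400_000_000), ("r3", "elsewhere", 400_000_000),
]
families = {"r1": "fam_a", "r2": "fam_a", "r3": "fam_b"}
print(ransom_income("acc", transfers, families))  # {'fam_a': (2, 350000000)}
```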

By repeating this procedure over and over we were able to uncover the inner workings of the ransomware economy and identify the kingpins of this underworld. The next post in this series will focus on the key insights we garnered about the ransomware ecosystem while analyzing the data collected via the methodology described in this blog post.
