The Chainalysis Data Accuracy Flywheel (2024)

We believe that cryptocurrency is the future. As our CEO Michael Gronager explained in a recent blog post, we expect that blockchains will soon be the world’s primary mechanism for the exchange of value, just as the internet has become the primary mechanism for the exchange of information. Our mission is deeply tied to that future — by building trust in blockchains, we’re working to set the stage for mass adoption of crypto in a way that gives participants safety and security. But how are we doing that on a practical level? By building the world’s most robust knowledge graph of blockchain activity, so that stakeholders across the public and private sectors can see the entities transacting on-chain, the activity those entities are involved in, and the connections between them.

At its core, that knowledge graph is made up of raw, public on-chain data augmented by Chainalysis attributions of cryptocurrency addresses to specific services or wallets, plus our ability to then group together more addresses belonging to each service or wallet in a process we call clustering. Given our long-term mission, as well as the more immediate fact that our solutions are used to detect and investigate criminal activity, it’s of the utmost importance that our attribution and clustering of blockchain entities be accurate and verifiable. For that reason, we take a conservative approach and only attribute addresses to a service or wallet when we have ground truth data (meaning directly observable, empirical evidence) demonstrating that the address belongs to that service or wallet. Only then do we cluster new addresses together with those ground truth addresses, and only on a deterministic basis. Deterministic methods of data analysis will always yield the same outputs when given the same inputs because they operate based on a set of predefined rules. In the case of our clustering process, those rules are based on the empirically observed on-chain behavior of wallets and services.

In other words, the information users see in Chainalysis solutions is there because it’s accurate. Our approach to address attribution and clustering allows us to live up to three important principles:

    1. Consistency of results. Ground truth attributions and deterministic clustering mean that our solutions always produce the same measurements of any service or wallet’s on-chain activity given the same data inputs. [1]
    2. Auditability and transparency. Since our methods always produce the same results given the same data inputs, those results are fully auditable. Unlike probabilistic methods, the deterministic clustering process enables us to reconstruct a cluster representing a service from scratch step by step. This auditability also makes internal peer review possible and provides unparalleled transparency into how we arrived at our conclusions. As such, law enforcement agencies can conduct their own investigations to verify our conclusions for prosecutions.
    3. Always backed by human review. A ground truth standard for address attribution requires that humans start the process by personally verifying the ground truth data used to attribute an address to a given wallet or service. Other humans then double check that verification. Next, deterministic clustering identifies further addresses belonging to that wallet or service based on rules created, reviewed, and constantly improved by humans. Finally, automation and humans also watch for evidence that a service has changed its on-chain behavior or infrastructure in a way that limits our ability to continuously cluster more of its addresses — if it has, we need to come up with new clustering methods. This is a major differentiator from blockchain analysis tools based only on machine learning.
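To make the consistency and auditability principles concrete, here is a minimal sketch of replaying a recorded audit trail to rebuild a cluster from its ground truth seed. The step format (the `address`, `linked_to`, `rule`, and `evidence` keys) and the function name are assumptions made for illustration, not a description of how Chainalysis stores its data.

```python
def replay_audit_trail(seed_addresses, steps):
    """Rebuild a cluster from its ground truth seeds by replaying recorded steps.

    Each step is a dict with hypothetical keys: "address" (the address being
    added), "linked_to" (an address already in the cluster), "rule" (the
    heuristic that fired), and "evidence" (for example, a transaction ID).
    """
    cluster = set(seed_addresses)
    for step in steps:
        if step["linked_to"] not in cluster:
            # A step that does not connect to the cluster cannot be justified.
            raise ValueError(f"unlinked step for address {step['address']}")
        cluster.add(step["address"])
    return sorted(cluster)


# Made-up example data: two co-spend steps starting from one seed address.
steps = [
    {"address": "addr_B", "linked_to": "addr_A", "rule": "co-spend", "evidence": "tx_1"},
    {"address": "addr_C", "linked_to": "addr_B", "rule": "co-spend", "evidence": "tx_2"},
]
first = replay_audit_trail(["addr_A"], steps)
second = replay_audit_trail(["addr_A"], steps)
assert first == second  # same inputs, same outputs
```

Because every step names the rule and the evidence behind it, a reviewer could in principle check each link independently, which is the property the auditability principle describes.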

Our focus on the principles above drives accuracy in the information displayed in our solutions. But more importantly, we get confirmations of its accuracy every day in the natural course of how customers in both the public and private sectors interact with our solutions, creating a flywheel effect through which our methodology and results are constantly affirmed. Below, we’ll break down in more detail how we attribute and cluster addresses with the highest possible standard for accuracy, and how the results from those processes are confirmed to be accurate.

Building the Chainalysis blockchain knowledge graph

At a high level, building the Chainalysis knowledge graph consists of the two processes we discussed in the introduction:

  1. Ground truth address attribution
  2. Deterministic address clustering

We’ll break them both down below.

Ground truth address attribution

Chainalysis has a large team of blockchain intelligence and forensics experts collecting cryptocurrency addresses they can attribute to the specific real-world entities who control them.

The simplest and most common way of doing this in the case of services is by observing known transactions involving them, which allows us to identify deposit and withdrawal addresses for a given service.

We can also collect deposit addresses via open-source intelligence (OSINT), which usually means scouring online platforms (e.g. YouTube, Telegram, Reddit), clearnet and darknet web forums, and other similar platforms. Keep in mind that simply seeing the address isn’t enough — we also need to check the blockchain and confirm that someone has transacted with that address before.
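The on-chain confirmation step can be sketched as a simple filter over candidate addresses. The transaction format (dicts with `senders` and `receivers` lists) and both function names are hypothetical, and a real check would query a blockchain node or index rather than an in-memory list.

```python
def has_on_chain_activity(address, transactions):
    """Return True if the address appears as a sender or receiver in any transaction."""
    return any(
        address in tx["senders"] or address in tx["receivers"]
        for tx in transactions
    )


def confirmed_candidates(candidates, transactions):
    """Keep only the OSINT-sourced addresses that have actually transacted."""
    return sorted(a for a in candidates if has_on_chain_activity(a, transactions))


# Made-up example data.
transactions = [
    {"senders": ["addr_user"], "receivers": ["addr_seen_on_forum"]},
]
candidates = ["addr_seen_on_forum", "addr_never_used"]
confirmed = confirmed_candidates(candidates, transactions)
```

An address that has been posted online but never used on-chain stays out of the candidate set until activity appears.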

Finally, Chainalysis customers also regularly share with us their own specific address attributions for inclusion in our dataset. We forward those addresses, along with our customers’ proof, to our intelligence team, which then verifies them before including them as ground truth addresses and pushing them into our knowledge graph.

Deterministic address clustering

Chainalysis has clustered over a billion addresses across more than 55,000 services, wallets, and protocols. In order to do this at such scale, we build heuristics that can start with a ground truth-attributed address for a given service or wallet, then automatically identify new addresses belonging to the same service or wallet and group them all together. Those deterministic clustering heuristics are built based on the observable on-chain behavior of different types of wallets and services, and are refined through careful human review by blockchain analysis experts trained in computer science, mathematics, and cryptography.

Our deterministic clustering heuristics generally fall into one of two categories:

  • Generic clustering heuristics, also known as network-wide heuristics, which can be applied to more than one service or wallet on the blockchain.
  • Service-specific heuristics, which are needed when generic or network-wide clustering heuristics cannot comprehensively capture the transactional behavior of a given service. Service-specific heuristics are particularly necessary to comprehensively cluster services that intentionally obfuscate their on-chain behavior, such as mixers or darknet markets. More information on service-specific heuristics and how we develop them is available to Chainalysis customers through their account teams.

Below, we’ll take a closer look at some of our most frequently used generic clustering heuristics.

Generic clustering heuristics

Generic clustering heuristics are sufficient to cluster addresses belonging to the vast majority of services and wallets on the blockchain. These heuristics include (but are not limited to) the following.

Co-spend heuristic. On UTXO-based blockchains like Bitcoin, a single transaction can spend several Unspent Transaction Outputs (UTXOs) as inputs at once. Because spending each input requires a valid signature for its address, the co-spend heuristic treats addresses whose UTXOs are spent as inputs in the same transaction as co-owned. Co-spend only applies to UTXO-based blockchains. When properly implemented, this heuristic must also account for CoinJoin transactions, which are specifically used in an attempt to spoil co-spend analysis.
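As an illustration of the co-spend idea, the sketch below merges input addresses with a union-find structure and skips transactions that look like CoinJoins. The transaction format and the `looks_like_coinjoin` test are deliberately naive assumptions made for this example; they do not represent the detection logic used in Chainalysis products.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure over address strings."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Pick the lexicographically smaller root so results are reproducible.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra


def looks_like_coinjoin(tx):
    """Naive placeholder: at least 3 inputs and at least 3 equal-valued outputs."""
    values = [value for _, value in tx["outputs"]]
    most_repeated = max((values.count(v) for v in set(values)), default=0)
    return len(tx["inputs"]) >= 3 and most_repeated >= 3


def co_spend_clusters(transactions):
    """Group input addresses that are spent together, ignoring CoinJoin-like transactions."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for address in inputs:
            uf.find(address)  # register every input address
        if looks_like_coinjoin(tx):
            continue  # do not merge owners that only share a CoinJoin
        for other in inputs[1:]:
            uf.union(inputs[0], other)
    groups = defaultdict(set)
    for address in uf.parent:
        groups[uf.find(address)].add(address)
    return sorted(sorted(group) for group in groups.values())


# Made-up example data: two ordinary spends and one CoinJoin-like transaction.
transactions = [
    {"inputs": ["A", "B"], "outputs": [("X", 5)]},
    {"inputs": ["B", "C"], "outputs": [("Y", 2)]},
    {"inputs": ["D", "E", "F"], "outputs": [("P", 1), ("Q", 1), ("R", 1)]},
]
clusters = co_spend_clusters(transactions)
```

Addresses A, B, and C end up in one cluster because B links the first two transactions, while D, E, and F stay separate because their only shared transaction is treated as a CoinJoin.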

Deposit heuristic. This heuristic allows us to identify and cluster the addresses of a centralized service such as an exchange, starting from deposit addresses and following them to consolidation addresses, which are addresses used by the exchange to hold and co-mingle cryptocurrency deposited by many different users (similar to how traditional banks don’t simply hold each customer’s balance in a dedicated individual account). Deposit heuristics are primarily used to cluster addresses of services in account-based blockchains such as Ethereum.
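A simplified sketch of the deposit idea on an account-based chain follows. It assumes transfers are given as `(sender, receiver)` pairs, treats every destination of a ground truth deposit address as a consolidation address, and accepts a new deposit address only if all of its outgoing transfers go to those consolidation addresses. These rules and names are assumptions chosen for illustration; real service behavior is more varied.

```python
def deposit_cluster(transfers, seed_deposit_addresses):
    """Expand a set of ground truth deposit addresses via shared consolidation addresses."""
    seeds = set(seed_deposit_addresses)

    # Map each sender to the set of addresses it has sent funds to.
    outgoing = {}
    for sender, receiver in transfers:
        outgoing.setdefault(sender, set()).add(receiver)

    # Consolidation addresses: wherever the known deposit addresses forward funds.
    consolidation = set()
    for address in seeds:
        consolidation |= outgoing.get(address, set())

    # New deposit addresses: senders that forward exclusively to consolidation addresses.
    deposits = set(seeds)
    for sender, receivers in outgoing.items():
        if receivers <= consolidation:
            deposits.add(sender)

    return {"deposit": sorted(deposits), "consolidation": sorted(consolidation)}


# Made-up example data.
transfers = [
    ("dep1", "hot"),         # known deposit address forwards to the hot wallet
    ("dep2", "hot"),         # unknown address with the same forwarding pattern
    ("user", "dep2"),        # a user funding dep2
    ("other", "elsewhere"),  # unrelated activity
    ("mixed", "hot"),        # sends to the hot wallet, but also elsewhere
    ("mixed", "elsewhere"),
]
result = deposit_cluster(transfers, ["dep1"])
```

The "forwards exclusively" rule is a conservative design choice for the sketch: an address that also sends funds elsewhere, like `mixed`, is left out rather than risk a wrong merge.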

Event-based heuristics. These heuristics cluster together addresses associated with decentralized protocols on smart contract-enabled blockchains like Ethereum by monitoring specific events carried out by the protocol’s factory contract.
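A rough sketch of an event-based heuristic is shown below. It assumes event logs have already been decoded into dicts with `emitter`, `name`, and `created` keys, and that a protocol's factory contract announces each contract it deploys through a named event (for example, `PairCreated`). The log format, addresses, and function name are hypothetical.

```python
def cluster_from_factory_events(events, factory_address, event_name):
    """Collect contracts announced by a factory's creation events into one cluster."""
    cluster = {factory_address}
    for event in events:
        # Only trust events emitted by the attributed factory itself.
        if event["emitter"] == factory_address and event["name"] == event_name:
            cluster.add(event["created"])
    return sorted(cluster)


# Made-up example data.
events = [
    {"emitter": "0xFactory", "name": "PairCreated", "created": "0xPair1"},
    {"emitter": "0xFactory", "name": "PairCreated", "created": "0xPair2"},
    {"emitter": "0xImpostor", "name": "PairCreated", "created": "0xFake"},
    {"emitter": "0xFactory", "name": "OwnerChanged", "created": "0xIgnored"},
]
protocol_cluster = cluster_from_factory_events(events, "0xFactory", "PairCreated")
```

Filtering on the emitter matters because anyone can deploy a contract that emits an event with the same name.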

Potentially unnamed service heuristics. In some cases, we identify wallets that exhibit behavior that can only be exhibited by a service, but whose real-world custodian is unknown. When that happens, we cluster the addresses together and simply label them as an unnamed service until we can identify that service based on a ground truth attribution.

How our attribution and clustering are constantly verified

While it’s not possible to audit the ownership of every address on the blockchain, in addition to the ongoing internal review and monitoring that we perform, we receive external verification of our data accuracy in two key ways. First, many of the services (e.g. exchanges) whose addresses we cluster are also Chainalysis customers who use us for transaction monitoring. As such, those customers share thousands to tens of thousands of addresses with us per day (depending on the scale of transaction activity on their platforms), allowing us to validate our clustering accuracy. As of the publication date of this blog post, we have never found a discrepancy between our data and the addresses provided to us by these customers. That is, a customer has never shared an address with us via transaction monitoring that we then found to be incorrectly clustered with another service or wallet.
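The kind of cross-check described above can be sketched as a comparison between existing cluster labels and the addresses a customer shares. The `cluster_labels` mapping and the function name are assumptions made for illustration.

```python
def find_discrepancies(cluster_labels, shared_addresses, customer_name):
    """Return shared addresses that are already clustered under a different entity.

    Addresses with no label yet are not discrepancies; they are simply new.
    """
    return sorted(
        address
        for address in shared_addresses
        if cluster_labels.get(address) not in (None, customer_name)
    )


# Made-up example data.
cluster_labels = {
    "addr_1": "Exchange A",
    "addr_2": "Exchange A",
    "addr_9": "Exchange B",
}
shared_by_exchange_a = ["addr_1", "addr_2", "addr_3"]
discrepancies = find_discrepancies(cluster_labels, shared_by_exchange_a, "Exchange A")
```

An empty result means every shared address was either already labeled as belonging to that customer or was not yet clustered at all.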

Our public sector customers are the second source of independent accuracy verification. Law enforcement agencies around the world routinely send subpoenas to cryptocurrency businesses asking for information on the owners of specific deposit addresses based on their research using Chainalysis solutions. If exchanges were responding to those subpoenas by denying ownership of the addresses in question, it would create a steady flow of inquiries from our law enforcement customers questioning the validity of our data, and it would cause them to look for other blockchain analysis solutions.

Chainalysis has been a key solution in many law enforcement investigations that have resulted in successful convictions, as well as over $11 billion worth of asset seizures from criminals. These cases span several forms of crime, including ransomware, CSAM, stolen funds, terrorism, and scams. Documents filed by the U.S. Department of Justice also note several examples of our data’s validity, including statements from FBI agents that they validate Chainalysis attributions and clusters thousands of times per day, including by checking them against transactions carried out by undercover agents. In one case, agents reviewed Chainalysis data identifying 50 wallets as belonging to customers of a CSAM website on the darknet. Law enforcement was able to confirm the data was accurate in all 50 cases upon interviewing suspects and obtaining search warrants for their devices.

Our unique place in the world of cryptocurrency, serving both the public and private sectors, means that our data is tested every day in high-stakes situations, which often require collaboration among the law enforcement agencies and crypto businesses we serve. We take a comprehensive but careful and conservative approach to the attribution and clustering process, and maintain a high standard for pushing new data into our solutions.

If you’d like to learn more about Chainalysis data and how it powers our solutions, please contact us here.

End notes

[1] Note that as new transactions occur, the data inputs can change in ways that alter the information displayed in our solutions. For instance, imagine that we have a cluster for a cryptocurrency exchange that includes a consolidation wallet (meaning, an internal wallet controlled by the exchange where funds deposited by many users are held and co-mingled). If new on-chain transactions showed a new set of addresses sending funds to that consolidation wallet, we could deterministically cluster those new deposit addresses in with the exchange, and the historic transaction activity of those new addresses would now be grouped in with that of the exchange. Our solutions would therefore display previously unseen transactions for that exchange, and data points such as its lifetime value received would be different than they were previously, but only because the data inputs changed as new information came to light on-chain.
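The scenario in this note can be illustrated with a small calculation. Transfers are assumed to be `(sender, receiver, amount)` tuples, and "lifetime value received" is taken here to mean funds arriving from outside the cluster; both are simplifying assumptions for the sketch.

```python
def lifetime_received(transfers, cluster):
    """Sum the amounts sent into the cluster from addresses outside it."""
    return sum(
        amount
        for sender, receiver, amount in transfers
        if receiver in cluster and sender not in cluster
    )


# Made-up example data: dep2 has received funds but has not yet forwarded them.
transfers = [
    ("user1", "dep1", 5),
    ("dep1", "hot", 5),
    ("user2", "dep2", 7),
]
before = lifetime_received(transfers, {"dep1", "hot"})

# A new on-chain transaction links dep2 to the consolidation wallet,
# so dep2 (and its history) joins the exchange cluster.
transfers.append(("dep2", "hot", 7))
after = lifetime_received(transfers, {"dep1", "dep2", "hot"})
```

The total rises from 5 to 12 even though the rules did not change: the earlier deposit into `dep2` only counts once new on-chain data ties that address to the exchange.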

This material is for informational purposes only, and is not intended to provide legal, tax, financial, investment, regulatory or other professional advice, nor is it to be relied upon as a professional opinion. Recipients should consult their own advisors before making these types of decisions. Chainalysis does not guarantee or warrant the accuracy, completeness, timeliness, suitability or validity of the information herein. Chainalysis has no responsibility or liability for any decision made or any other acts or omissions in connection with Recipient’s use of this material.
