What is tokenization? (2024)

[Image: A terracotta soldier figurine emerging from a digital tablet. The soldier looks digitized at its base but becomes a solid form at its top.]

Events of the past few years have made it clear: we’re hurtling toward the next era of the internet with ever-increasing speed. Several new developments are leading the charge. Web3 is said to offer the potential of a new, decentralized internet, controlled by participants via blockchains rather than a handful of corporations.

Get to know and directly engage with senior McKinsey experts on tokenization.

Matt Higginson is a partner in McKinsey’s Boston office, and Prashanth Reddy is a senior partner in the New Jersey office.

Web3 applications rely on a process called tokenization. In this case, tokenization is a digitization process that makes assets more accessible. (AI models and new modes of payment also use a process called tokenization, both of which have little to do with Web3 tokenization—or each other, for that matter. In payments, tokenization is used for cybersecurity: sensitive payment data are replaced with surrogate tokens, essentially to prevent fraud. For a detailed description of tokenization in AI, see sidebar, “How does tokenization work in AI?”)

How does tokenization work in AI?

Tokenization in AI is used to break down data for easier pattern detection. Deep learning models trained on vast quantities of unstructured, unlabeled data are called foundation models. Large language models (LLMs) are foundation models that are trained on text. Adapted to specific tasks via a process called fine-tuning, these models can not only process massive amounts of unstructured text but also learn the relationships between sentences, words, or even portions of words. This in turn enables them to generate natural-language text or perform summarization or other knowledge-extraction tasks.

Here’s how tokenization makes this possible. When an LLM is fed input text, it breaks the text down into tokens. Each token is assigned a unique numerical identifier, which is fed back into the LLM for processing. The model learns the relationships between the tokens and generates responses based on the patterns it learns.
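To make the token-to-ID step concrete, here is a minimal Python sketch. It uses a hypothetical word-level tokenizer with a toy vocabulary; production LLMs use learned subword vocabularies with tens of thousands of entries.

```python
def build_vocab(corpus):
    """Assign each unique whitespace-delimited token a numerical ID."""
    vocab = {}
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Map text to the sequence of token IDs the model would process."""
    return [vocab[word] for word in text.split()]

vocab = build_vocab("the cat sat on the mat")
print(vocab)                          # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4}
print(encode("the cat sat", vocab))   # [0, 1, 2]
```

The model never sees the words themselves, only these IDs; the patterns it learns are relationships between ID sequences.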

There are a number of tokenization techniques commonly used in LLMs:

  • Word tokenization splits text into individual words or word-like units, and each word becomes a separate token. Word tokenization might struggle with contractions or compound words.
  • Character tokenization makes each character in text its own separate token. This method works well when dealing with languages that don’t have clear word boundaries or with handwriting recognition.
  • Subword tokenization breaks down less frequently used words into units of frequently occurring sequences of characters. Subword tokens are bigger than individual characters but smaller than entire words. By breaking words into subword tokens, a model can better handle words that were not present in the training data. Byte pair encoding (BPE) is one subword tokenization algorithm. BPE starts with a vocabulary of characters or words and iteratively merges the tokens that most often appear together.
  • Morphological tokenization uses morphemes, which are individual words or parts of words that carry specific meanings or grammatical functions. The word “incompetence,” for example, can be broken down into three morphemes: “in-” (a prefix indicating negation), “competent” (the root), and “-ence” (a suffix indicating a state or quality). In morphological tokenization, each morpheme becomes a token, which enables LLMs to handle word variations, understand grammatical structures, and generate linguistically accurate text.
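The byte pair encoding merge described in the subword bullet above can be sketched in a few lines of Python. This is a simplified illustration on a made-up toy corpus (word frequencies are assumed), not a production tokenizer.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words (each word is a tuple of symbols)."""
    pairs = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word -> frequency, each word split into characters.
words = {("h", "u", "g"): 10, ("p", "u", "g"): 5, ("h", "u", "g", "s"): 5}
pair = most_frequent_pair(words)   # ('u', 'g'), seen 20 times
words = merge_pair(words, pair)    # 'u' and 'g' fuse into the subword 'ug'
```

Repeating this merge step until the vocabulary reaches a target size yields the subword units BPE uses.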

The type of tokenization used depends on what the model needs to accomplish. Different tokenization methods may also be combined to achieve the required results.

After a couple of false starts, tokenized financial assets are moving from pilot to at-scale development. McKinsey analysis indicates that tokenized market capitalization could reach around $2 trillion by 2030 (excluding cryptocurrencies like Bitcoin and stablecoins like Tether). Specifically, we expect that organizations working with certain asset classes will be the quickest adopters; these include cash and deposits, bonds and exchange-traded notes, mutual funds and exchange-traded funds, as well as loans and securitization. Larry Fink, the chairman and CEO of BlackRock, said in January 2024: “We believe the next step going forward will be the tokenization of financial assets, and that means every stock, every bond … will be on one general ledger.”

But before we get specific, let’s get the basics down.

Learn more about McKinsey’s Financial Services Practice.

How does tokenization work?

In general, tokenization is the process of issuing a digital, unique, and anonymous representation of a real thing. In Web3 applications, the token is used on a (typically private) blockchain, which allows the token to be utilized within specific protocols. Tokens can represent assets, including physical assets like real estate or art, financial assets like equities or bonds, intangible assets like intellectual property, or even identity and data.

Web3 tokenization can create several types of tokens. One example from the financial-services industry is stablecoins, a type of cryptocurrency pegged to real-world money and designed to be fungible, or interchangeable. Another type of token is an NFT—a nonfungible token, meaning a token that is provably scarce and can’t be replicated—which is a digital proof of ownership people can buy and sell.

As noted earlier, AI also uses a concept called tokenization, which is quite different from Web3 tokens (despite their shared name). A large language model (LLM) used in an AI application could tokenize the word “cat” and use it to understand relationships between “cat” and other words. (For a more detailed explanation of what tokenization means in an AI context, see sidebar, “How does tokenization work in AI?”)

The benefits of Web3 tokenization for financial institutions include the following:

  • Programmability. Programmability is the ability to embed code in the token and to have the token engage with smart contracts, enabling higher degrees of automation. (For more on smart contracts, read our blockchain Explainer.)
  • Composability. This is the ability to interact with other assets and applications on the network.
  • Operational efficiency. Web3 tokenization can help streamline processes, automate transactions, and more, making operations more efficient.

These, in turn, can mean increased efficiency, liquidity, and new revenue opportunities.
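As a rough illustration of programmability, the sketch below embeds a hypothetical compliance rule (an investor allow-list) directly in a token's transfer logic. This is a toy Python model, not a real smart contract; an actual implementation would live on-chain.

```python
class ProgrammableToken:
    """Toy token whose transfer rules are code, so compliance runs automatically."""

    def __init__(self, balances, allow_list):
        self.balances = dict(balances)
        self.allow_list = set(allow_list)   # e.g., KYC-approved wallets (assumed rule)

    def transfer(self, sender, recipient, amount):
        # The compliance check is embedded in the asset itself.
        if recipient not in self.allow_list:
            raise ValueError("recipient not on the allow-list")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

token = ProgrammableToken({"alice": 100}, allow_list={"alice", "bob"})
token.transfer("alice", "bob", 40)
print(token.balances)   # {'alice': 60, 'bob': 40}
```

Because the rule travels with the asset, no back-office process needs to re-check it on every trade.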

What’s an example of tokenization in practice?

Financial-services incumbents like BlackRock, WisdomTree, and Franklin Templeton, as well as Web3 natives Ondo Finance, Superstate, and Maple Finance, are increasingly adopting tokenized money market funds. In the first quarter of 2024, these funds surpassed $1 billion in total value (not much compared with total market size, but a milestone nonetheless).

Immutable data on the shared ledger reduces data errors associated with manual reconciliation, while 24/7 instant settlement and composability provide better user experience and new revenue sources.

What types of technologies make Web3 possible?

As we’ve seen, Web3 is a new type of internet, built on new types of technology. Here are the three main types:

  • Blockchain. A blockchain is a digitally distributed, decentralized ledger that exists across a computer network and facilitates the recording of transactions. As new data are added to the network, a new block is created and appended permanently to the chain. All nodes on the blockchain are then updated to reflect the change. This means the system is not subject to a single point of control or failure.
  • Smart contracts. Smart contracts are software programs that are automatically executed when specified conditions are met, like terms agreed on by a buyer and seller. Smart contracts are established in code on a blockchain that can’t be altered.
  • Digital assets and tokens. These are items of value that only exist digitally. They can include cryptocurrencies, stablecoins, central bank digital currencies (CBDCs), and NFTs. They can also include tokenized versions of assets, including real things like art or concert tickets.
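The append-only, tamper-evident structure described in the blockchain bullet above can be illustrated with a minimal Python sketch: each block stores the hash of its predecessor, so altering any earlier block breaks the chain. (Hash-chaining only; real networks add consensus, signatures, and distribution across nodes.)

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 hash of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain, data):
    """Append a new block that commits to the previous block's hash."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"data": data, "prev_hash": prev})

def is_valid(chain):
    """Verify every block still matches the hash recorded by its successor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, {"tx": "alice pays bob 10"})
append_block(chain, {"tx": "bob pays carol 4"})
print(is_valid(chain))                            # True
chain[0]["data"]["tx"] = "alice pays bob 1000"    # tamper with history
print(is_valid(chain))                            # False
```

The tamper is detectable because changing a block changes its hash, which no longer matches the value its successor recorded.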

As we’ll see, these technologies come together to support a variety of breakthroughs related to tokenization.

What are the potential benefits of tokenization for financial-services providers?

Some industry leaders believe tokenization stands to transform the structure of financial services and capital markets because it lets asset holders reap the benefits of blockchain, such as 24/7 operations and data availability. Blockchain also offers faster transaction settlement and a higher degree of automation (via embedded code that only gets activated if certain conditions are met).

While yet to be tested at scale, tokenization’s potential benefits include the following:

  • Faster transaction settlement, fueled by 24/7 availability. At present, most financial settlements occur two business days after the trade is executed (or T+2; in theory, this is to give each party time to get their documents and funds in order). The instant settlements made possible by tokenization could translate to significant savings for financial firms in high-interest-rate environments.
  • Operational cost savings, delivered by 24/7 data availability and asset programmability. This is particularly useful for asset classes where servicing or issuing tends to be highly manual and hence error-prone, such as corporate bonds. Embedding operations such as interest calculation and coupon payment into the smart contract of the token would automate these functions and require less hands-on human effort.
  • Democratization of access. By streamlining operationally intensive manual processes, servicing smaller investors can become an economically attractive proposition for financial-services providers. However, before true democratization of access is realized, tokenized asset distribution will need to scale significantly.
  • Enhanced transparency powered by smart contracts. Smart contracts are sets of instructions coded into tokens issued on a blockchain that can self-execute under specific conditions. One example could be a smart contract for carbon credits, in which blockchain can provide an immutable and transparent record of credits, even as they’re traded.
  • Cheaper and more nimble infrastructure. Blockchains are open source, thus inherently cheaper and easier to iterate than traditional financial-services infrastructure.

There’s been hype around digital-asset tokenization for years, ever since its introduction back in 2017. But despite the big predictions, it hasn’t yet caught on in a meaningful way. We are, though, seeing some slow movement: as of mid-2023, US-based fintech infrastructure firm Broadridge was facilitating more than $1 trillion monthly on its distributed-ledger platform.


How does a Web3 asset get tokenized?

There are four typical steps involved in asset tokenization:

  1. Asset sourcing. The first step of tokenization is figuring out how to tokenize the asset in question. Tokenizing a money market fund, for example, will be different from tokenizing a carbon credit. This process will require knowing whether the asset will be treated as a security or a commodity and which regulatory frameworks apply.
  2. Digital-asset issuance and custody. If the digital asset has a physical counterpart, the latter must be moved to a secure facility that’s neutral to both parties. Then a token, a network, and compliance functions are selected, coming together to create a digital representation of the asset on a blockchain. The digital asset is then held in custody pending distribution.
  3. Distribution and trading. The investor will need to set up a digital wallet to store the digital asset. Depending on the asset, a secondary trading venue—an alternative to an official exchange that is more loosely regulated—may be created for the asset.
  4. Asset servicing and data reconciliation. Once the asset has been distributed to the investor, it will require ongoing maintenance. This should include regulatory, tax, and accounting reporting; notice of corporate actions; and more.
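The four steps above form an ordered pipeline. The toy Python model below (step names are illustrative only) simply enforces that no step runs out of sequence, which is the essence of the process:

```python
# The four stages of asset tokenization, in the order described above.
STEPS = ["sourced", "issued", "distributed", "serviced"]

class TokenizedAsset:
    """Tracks an asset through the tokenization pipeline, enforcing step order."""

    def __init__(self, name):
        self.name = name
        self.completed = []

    def advance(self, step):
        expected = STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected step '{expected}', got '{step}'")
        self.completed.append(step)

fund = TokenizedAsset("money market fund")
for step in STEPS:
    fund.advance(step)
print(fund.completed)   # ['sourced', 'issued', 'distributed', 'serviced']
```

Attempting to distribute an asset that has not yet been issued, for example, raises an error rather than silently proceeding.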

Learn more about McKinsey’s Financial Services Practice, and check out Web3-related job opportunities if you’re interested in working at McKinsey.

Articles referenced:

  • “From ripples to waves: The transformational power of tokenizing assets,” June 20, 2024, Anutosh Banerjee, Julian Sevillano, and Matt Higginson
  • “Tokenization: A digital-asset déjà vu,” August 15, 2023, Anutosh Banerjee, Ian De Bode, Matthieu de Vergnes, Matt Higginson, and Julian Sevillano
  • “Tokenizing nontraditional assets: A conversation with Ascend Bit’s Brian Clark,” March 17, 2023, Andrew Roth
  • “Web3 beyond the hype,” September 26, 2022, Anutosh Banerjee, Robert Byrne, Ian De Bode, and Matt Higginson
  • “How can healthcare unlock the power of data connectivity?,” December 9, 2021, Prashanth Reddy

This article was updated in July 2024; it was originally published in October 2023.

FAQs

What is tokenization in simple terms?

Tokenization protects sensitive, private information by replacing it with a surrogate value called a token. A token cannot be reversed to reveal the original data without access to the secure mapping that links the two.

What is a simple example of tokenization?

Consider the sentence “Never give up.” The most common way of forming tokens is based on whitespace. Using space as the delimiter, tokenizing the sentence yields three tokens: “Never,” “give,” and “up.” Because each token is a word, this is an example of word tokenization.

What is tokenization in AI?

Tokenization, in the realm of Artificial Intelligence (AI), refers to the process of converting input text into smaller units or 'tokens' such as words or subwords. This is foundational for Natural Language Processing (NLP) tasks, enabling AI to analyze and understand human language.

What is an example of data tokenization?

When a merchant processes a customer’s credit card, the PAN is substituted with a token: 1234-4321-8765-5678 is replaced with, for example, 6f7%gf38hfUa. The merchant can use the token to retain records of the customer; for example, 6f7%gf38hfUa is linked to John Smith.
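The PAN-substitution idea above can be sketched as a toy token vault in Python. This is purely illustrative; real vaults are hardened, access-controlled systems.

```python
import secrets

class TokenVault:
    """Toy vault: the merchant stores only tokens; the vault holds the mapping."""

    def __init__(self):
        self._vault = {}   # token -> original PAN, kept server-side

    def tokenize(self, pan):
        # Token is random: it has no mathematical relationship to the PAN.
        token = secrets.token_urlsafe(9)
        self._vault[token] = pan
        return token

    def detokenize(self, token):
        # Only a party with vault access can recover the original value.
        return self._vault[token]

vault = TokenVault()
token = vault.tokenize("1234-4321-8765-5678")
print(vault.detokenize(token) == "1234-4321-8765-5678")   # True
```

A stolen token is useless on its own, which is the security property payment tokenization relies on.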

What is the main reason for tokenization?

The purpose of tokenization is to protect sensitive data while preserving its business utility. This differs from encryption, which transforms sensitive data into an unreadable form that cannot be used for business purposes until it is decrypted.

What are the problems with tokenization?

Tokenization faces challenges such as ambiguity, where words have multiple meanings, and context is needed for accurate segmentation. Out-of-vocabulary (OOV) words pose another challenge when encountering terms not present in the model's vocabulary.

How do you do tokenization?

The simplest way to tokenize text is to use whitespace within a string as the “delimiter” of words. This can be accomplished with Python's split function, which is available on all string object instances as well as on the string built-in class itself. You can change the separator any way you need.
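The whitespace approach described above looks like this in Python:

```python
sentence = "Never give up"
tokens = sentence.split()     # default: split on any whitespace
print(tokens)                 # ['Never', 'give', 'up']

# The separator can be changed as needed, e.g. for comma-delimited input:
print("a,b,c".split(","))     # ['a', 'b', 'c']
```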

Which algorithm is used for tokenization?

Common subword tokenization algorithms include byte pair encoding (BPE), WordPiece, and Unigram. The Unigram algorithm, for example, always keeps the base characters in its vocabulary so that any word can be tokenized.

Why does AI use tokens?

In such models, tokens serve as inputs for algorithms to analyse and learn patterns. For instance, in a chatbot development, each word in the user's input is treated as a token, which helps the AI understand and respond appropriately. In advanced AI models like transformers, tokens are even more crucial.

Is Bitcoin a tokenization?

Bitcoin itself is a cryptocurrency rather than a tokenized asset, but blockchains such as Bitcoin SV (BSV) permit tokenizing full ownership of an asset, such as a condominium. Tokenization also enables splitting large, illiquid assets into smaller, more liquid segments; in the condo example, multiple parties can own the unit, with tokens representing each owner’s stake.

Who invented tokenization?

The concept of tokenization was created in 2001 by a company called TrustCommerce for their client, Classmates.com, which needed to significantly reduce the risks involved with storing cardholder data. From this, TC Citadel was developed, allowing customers to reference a token in place of their sensitive card data.

Is tokenization the same as encryption?

Tokens serve as reference to the original data, but cannot be used to guess those values. That's because, unlike encryption, tokenization does not use a mathematical process to transform the sensitive information into the token. There is no key, or algorithm, that can be used to derive the original data for a token.

What is tokenization in the real world?

It is possible to tokenize anything, from real estate property to trademarks, patents, and fine art. The use of tokenization in finance is also growing, with stocks, bonds, and treasuries also being tokenized to allow greater access to financial instruments among underbanked and unbanked populations.

What are the key benefits of tokenization?

Tokenization can allow for increased liquidity of traditionally illiquid assets; greater accessibility to otherwise cloistered investment opportunities; greater transparency regarding ownership and ownership history; and a reduction in the administrative costs associated with trading these assets.

What is an example of payment tokenization?

For example, a credit card number of “1234 5678 9012 3456” might become “1234 5698 3211 3456,” or “1234 XYZ# ABC& 3456.” Partial replacement tokens can be helpful in situations where a merchant might need to verify a cardholder by asking them for the last four digits of their SSN or PAN.

What is the difference between encryption and tokenization?

Tokenization focuses on replacing data with unrelated tokens, minimizing the exposure of sensitive information, and simplifying compliance with data protection regulations. Encryption, on the other hand, secures data by converting it into an unreadable format, necessitating a decryption key for access.
