type/token ratios (2024)

If a text is 1,000 words long, it is said to have 1,000 "tokens". But many of these words will be repeated, and there may be only, say, 400 different words in the text. "Types", therefore, are the different words.

The ratio between types and tokens in this example would be 40% (400 types divided by 1,000 tokens).
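As a rough illustration of the arithmetic, here is a minimal Python sketch of the conventional ratio. The function name `type_token_ratio` and the simple lowercase, letters-only tokenisation are assumptions made for this example; they are not how WordList itself splits text into words.

```python
import re

def type_token_ratio(text):
    """Return (types, tokens, TTR as a percentage) for a text."""
    # Assumed tokenisation: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0, 0, 0.0
    types = set(tokens)  # each different word form counts once
    return len(types), len(tokens), 100.0 * len(types) / len(tokens)

sample = "the cat sat on the mat and the dog sat on the rug"
n_types, n_tokens, ttr = type_token_ratio(sample)
print(n_types, n_tokens, round(ttr, 1))  # 8 13 61.5
```

With 400 types in 1,000 tokens the same function would report 40.0.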

But this type/token ratio (TTR) varies very widely with the length of the text -- or corpus of texts -- which is being studied. A 1,000-word article might have a TTR of 40%; a shorter one might reach 70%; 4 million words will probably give a type/token ratio of about 2%, and so on. Such type/token information is rather meaningless in most cases, though it is supplied in a WordList statistics display. The conventional TTR is informative, of course, if you're dealing with a corpus comprising lots of equal-sized text segments (e.g. the LOB and Brown corpora). But in the real world, especially if your research focus is the text as opposed to the language, you will probably be dealing with texts of different lengths, and the conventional TTR will not help you much.

WordList therefore uses a different strategy for computing this. The standardised type/token ratio (STTR) is computed every n words as WordList goes through each text file. By default, n = 1,000. In other words, the ratio is calculated for the first 1,000 running words, then calculated afresh for the next 1,000, and so on to the end of your text or corpus. A running average is computed, which means that you get an average type/token ratio based on consecutive 1,000-word chunks of text. Texts with fewer than 1,000 words (or whatever n is set to) will get a standardised type/token ratio of 0.
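The chunk-by-chunk procedure described above can be sketched as follows. This is an illustration under stated assumptions, not WordSmith's own code: the function name `standardised_ttr` is hypothetical, the input is assumed to be an already tokenised list, and a trailing chunk shorter than n is assumed to be left out of the average.

```python
def standardised_ttr(tokens, n=1000):
    """Mean type/token percentage over consecutive n-token chunks."""
    full_chunks = len(tokens) // n
    if full_chunks == 0:
        # Fewer than n tokens: the STTR is reported as 0.
        return 0.0
    ratios = []
    for i in range(full_chunks):
        chunk = tokens[i * n:(i + 1) * n]
        # The ratio is calculated afresh for each chunk.
        ratios.append(100.0 * len(set(chunk)) / n)
    # Assumption: any leftover tokens after the last full chunk are ignored.
    return sum(ratios) / len(ratios)

tokens = ["a", "b", "a", "b", "c", "d", "e", "f"]
print(standardised_ttr(tokens, n=4))   # 75.0 (chunks score 50.0 and 100.0)
print(standardised_ttr(tokens, n=10))  # 0.0 (text shorter than n)
```

The tiny n here only keeps the example readable; with real texts n would stay at 1,000 or whatever boundary you have chosen.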

Setting the n boundary

Adjust the n number in Minimum & Maximum Settings to any number between 100 and 20,000.

What STTR actually counts

Note: The ratio is computed a) counting every different form as a word (so say and says are two types), b) using only the words which are not in a stop-list, c) using only those words which are within the length you have specified, and d) taking your preferences about numbers and hyphens into account.
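To make these four conditions concrete, here is a hedged Python sketch of a filter that decides which tokens would be counted. Everything in it is an assumption for illustration: the function name `countable_tokens`, its parameter names, the lowercasing, and the regular expressions are hypothetical and do not reproduce WordSmith's actual settings or behaviour.

```python
import re

def countable_tokens(text, stop_list=frozenset(), min_len=1, max_len=50,
                     include_numbers=False, hyphens_join=True):
    """Return the tokens that would count towards the ratio."""
    # (d) Hyphen preference: keep "cumulo-nimbus" whole, or split it in two.
    if hyphens_join:
        pattern = r"[a-z0-9]+(?:-[a-z0-9]+)*"
    else:
        pattern = r"[a-z0-9]+"
    kept = []
    for tok in re.findall(pattern, text.lower()):
        if tok in stop_list:                     # (b) skip stop-list words
            continue
        if not (min_len <= len(tok) <= max_len):  # (c) word-length limits
            continue
        # (d) Number preference: drop tokens made only of digits.
        if not include_numbers and tok.replace("-", "").isdigit():
            continue
        kept.append(tok)  # (a) no lemmatisation: "say" and "says" both stay
    return kept

text = "He says the cumulo-nimbus clouds say 42 things"
print(countable_tokens(text, stop_list={"he", "the"}))
# ['says', 'cumulo-nimbus', 'clouds', 'say', 'things']
```

Switching `hyphens_join` to `False` would turn cumulo-nimbus into two tokens, which changes both the type and the token counts.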

The number shown is a percentage of new types for every n tokens. That way you can compare type/token ratios across texts of differing lengths. This method contrasts with that of Tuldava (1995:131-50), who relies on a notion of 3 stages of accumulation. The WordSmith method of computing STTR was my own invention but parallels one of the methods devised by the mathematician David Malvern working with Brian Richards (University of Reading).

Further discussion

TTR and STTR are both pretty crude measures, even if they are often assumed to imply something about "lexical density". Suppose you had a text which spent 1,000 words discussing ELEPHANT, LION, TIGER etc., then 1,000 discussing MADONNA, ELVIS, etc., then 1,000 discussing CLOUD, RAIN, SUNSHINE. If you set the STTR boundary at 1,000 and happened to get, say, 48% or so for each section, the statistic in itself would not tell you there was a change involving Africa, Music, Weather. And if the boundary between Africa and Music came at word 650 instead of at word 1,000, I guess there'd be little or no difference in the statistic. But what would make a difference? A text which discussed clouds and was written by a person who distinguished a lot between types of cloud might also use MIST, FOG, CUMULUS, CUMULO-NIMBUS. This would be higher in STTR than one written by a child who kept referring to CLOUD but used adjectives like HIGH, LOW, HEAVY, DARK, THIN, VERY THIN to describe the clouds, and who repeated DARK, THIN, etc. a lot in describing them.
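The cloud comparison can be tried out on a toy scale. The two eight-word strings below are invented purely for illustration, using only words mentioned above, and the helper name `chunk_ratio` is hypothetical; real texts would of course be far longer.

```python
def chunk_ratio(tokens):
    """Type/token percentage for a single chunk of tokens."""
    return 100.0 * len(set(tokens)) / len(tokens)

# Invented samples: a writer who distinguishes kinds of cloud,
# and a child who keeps repeating CLOUD with a few adjectives.
specialist = "cloud mist fog cumulus cumulo-nimbus cloud mist fog".split()
child = "dark cloud thin cloud dark cloud thin cloud".split()

print(chunk_ratio(specialist))  # 62.5 (5 types in 8 tokens)
print(chunk_ratio(child))       # 37.5 (3 types in 8 tokens)
```

The figure rises with the variety of word forms, but it says nothing about what the words are about, which is the limitation discussed above.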

(NB. Shakespeare is well known to have used a rather limited vocabulary in terms of measures like these!)
