PEP 3127 – Integer Literal Support and Syntax | peps.python.org (2024)

Author: Patrick Maupin <pmaupin at gmail.com>
Discussions-To: Python-3000 list
Status: Final
Type: Standards Track
Created: 14-Mar-2007
Python-Version: 3.0
Post-History: 18-Mar-2007

Table of Contents
  • Abstract
  • Motivation
  • Specification
    • Grammar specification
    • int() specification
    • long() specification
    • Tokenizer exception handling
    • int() exception handling
    • oct() function
    • Output formatting
    • Transition from 2.6 to 3.0
  • Rationale
    • Background
    • Removal of old octal syntax
    • Supported radices
    • Syntax for supported radices
  • Open Issues
  • References
  • Copyright

Abstract

This PEP proposes changes to the Python core to rationalize the treatment of string literal representations of integers in different radices (bases). These changes are targeted at Python 3.0, but the backward-compatible parts of the changes should be added to Python 2.6, so that all valid 3.0 integer literals will also be valid in 2.6.

The proposal is that:

  1. octal literals must now be specified with a leading “0o” or “0O” instead of “0”;
  2. binary literals are now supported via a leading “0b” or “0B”; and
  3. provision will be made for binary numbers in string formatting.

Motivation

This PEP was motivated by two different issues:

  • The default octal representation of integers is silently confusing to people unfamiliar with C-like languages. It is extremely easy to inadvertently create an integer object with the wrong value, because ‘013’ means ‘decimal 11’, not ‘decimal 13’, to the Python language itself, which is not the meaning that most humans would assign to this literal.
  • Some Python users have a strong desire for binary support in the language.

Specification

Grammar specification

The grammar will be changed. For Python 2.6, the changed and new token definitions will be:

    integer       ::= decimalinteger | octinteger | hexinteger | bininteger | oldoctinteger
    octinteger    ::= "0" ("o" | "O") octdigit+
    bininteger    ::= "0" ("b" | "B") bindigit+
    oldoctinteger ::= "0" octdigit+
    bindigit      ::= "0" | "1"

For Python 3.0, “oldoctinteger” will not be supported, and an exception will be raised if a literal has a leading “0” and a second character which is a digit.
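As a rough illustration of the Python 3.0 rule, the sketch below checks a few literals against a current Python 3 interpreter. The helper name is_valid_literal is hypothetical and exists only for this example; it assumes that the built-in compile() raises SyntaxError for a rejected token.

```python
def is_valid_literal(text):
    """Return True if `text` compiles as an expression (illustrative helper)."""
    try:
        compile(text, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# The new prefixes are accepted in either case.
assert 0o17 == 0O17 == 15
assert 0b101 == 0B101 == 5

# A leading "0" followed by another digit is rejected in Python 3.
assert is_valid_literal("0o17")
assert not is_valid_literal("017")
```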

For both versions, this will require changes to PyLong_FromString as well as the grammar.

The documentation will have to be changed as well: grammar.txt, as well as the integer literal section of the reference manual.

PEP 306 should be checked for other issues, and that PEP should be updated if the procedure described therein is insufficient.

int() specification

int(s, 0) will also match the new grammar definition.

This should happen automatically with the changes to PyLong_FromString required for the grammar change.

Also the documentation for int() should be changed to explain that int(s) operates identically to int(s, 10), and the word “guess” should be removed from the description of int(s, 0).
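The two int() behaviours described here can be sketched as follows, assuming a current Python 3 interpreter. The helper parse_with_prefix is a hypothetical name introduced for the example.

```python
def parse_with_prefix(text):
    """Parse `text` with base 0; return None if it is rejected (illustrative helper)."""
    try:
        return int(text, 0)
    except ValueError:
        return None


# Base 0 follows the literal grammar, including the new prefixes.
assert parse_with_prefix("0o17") == 15
assert parse_with_prefix("0b101") == 5
assert parse_with_prefix("0x1F") == 31
assert parse_with_prefix("017") is None  # old-style octal is not accepted

# With no explicit base, int(s) behaves like int(s, 10): leading zeros are harmless.
assert int("017") == int("017", 10) == 17
```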

long() specification

For Python 2.6, the long() implementation and documentation should be changed to reflect the new grammar.

Tokenizer exception handling

If an invalid token contains a leading “0”, the exception error message should be more informative than the current “SyntaxError: invalid token”. It should explain that decimal numbers may not have a leading zero, and that octal numbers require an “o” after the leading zero.

int() exception handling

The ValueError raised for any call to int() with a string should at least explicitly contain the base in the error message, e.g.:

ValueError: invalid literal for base 8 int(): 09
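One way to observe the message on a current Python 3 interpreter is sketched below. The helper base_error_message is hypothetical, and the exact wording of the message is an implementation detail that differs from the sample above and may vary between releases, so the check only looks for the base.

```python
def base_error_message(text, base):
    """Return the ValueError text from int(text, base), or None on success."""
    try:
        int(text, base)
    except ValueError as exc:
        return str(exc)
    return None


message = base_error_message("09", 8)
print(message)

# The base is named explicitly in the message.
assert message is not None and "base 8" in message
```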

oct() function

oct() should be updated to output ‘0o’ in front of the octal digits (for 3.0, and 2.6 compatibility mode).
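A short check of this behaviour, assuming a current Python 3 interpreter:

```python
# oct() emits the new-style prefix, so its output is itself a valid literal.
assert oct(8) == "0o10"
assert oct(0o755) == "0o755"

# Round trip: the output parses back under base 0.
assert int(oct(493), 0) == 493
```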

Output formatting

In 3.0, the string % operator alternate syntax for the ‘o’ option will need to be updated to add ‘0o’ in front, instead of ‘0’. In 2.6, alternate octal formatting will continue to add only ‘0’. In neither 2.6 nor 3.0 will the % operator support binary output. This is because binary output is already supported by PEP 3101 (str.format), which is the preferred string formatting method.
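The formatting rules for 3.0 can be illustrated with a few checks, assuming a current Python 3 interpreter. The variable percent_supports_binary is introduced only for this example.

```python
# The alternate form of the "o" conversion adds the "0o" prefix.
assert "%o" % 8 == "10"
assert "%#o" % 8 == "0o10"

# The % operator on strings has no binary conversion ...
try:
    "%b" % 5
except ValueError:
    percent_supports_binary = False
else:
    percent_supports_binary = True
assert not percent_supports_binary

# ... while str.format() and format() provide one.
assert "{0:b}".format(5) == "101"
assert format(5, "b") == "101"
```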

Transition from 2.6 to 3.0

The 2to3 translator will have to insert ‘o’ into any octal string literal.

The Py3K compatible option to Python 2.6 should cause attempts to use oldoctinteger literals to raise an exception.
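The rewrite that the translator has to perform on a single token can be sketched as below. This is not the 2to3 implementation; modernize_octal and its regular expression are hypothetical, and the sketch assumes the token has already been isolated by a tokenizer and carries no Python 2 long suffix.

```python
import re

# Matches the old grammar rule: "0" followed by one or more octal digits.
_OLD_OCTAL = re.compile(r"0([0-7]+)")


def modernize_octal(token):
    """Rewrite an old-style octal token such as '0755' to '0o755'."""
    match = _OLD_OCTAL.fullmatch(token)
    if match is None:
        return token  # decimal, hex, binary, or already new-style
    return "0o" + match.group(1)


assert modernize_octal("0755") == "0o755"
assert modernize_octal("0x1F") == "0x1F"
assert modernize_octal("0") == "0"
```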

Rationale

Most of the discussion on these issues occurred on the Python-3000 mailing list starting 14-Mar-2007, prompted by an observation that the average human being would be completely mystified upon finding that prepending a “0” to a string of digits changes the meaning of that digit string entirely.

It was pointed out during this discussion that a similar, but shorter, discussion on the subject occurred in January 2006, prompted by a discovery of the same issue.

Background

For historical reasons, Python’s string representation of integers in different bases (radices), for string formatting and token literals, borrows heavily from C. [1] [2] Usage has shown that the historical method of specifying an octal number is confusing, and also that it would be nice to have additional support for binary literals.

Throughout this document, unless otherwise noted, discussions about the string representation of integers relate to these features:

  • Literal integer tokens, as used by normal module compilation, by eval(), and by int(token, 0). (int(token) and int(token, 2-36) are not modified by this proposal.)
    • Under 2.6, long() is treated the same as int()
  • Formatting of integers into strings, either via the % string operator or the new PEP 3101 advanced string formatting method.

It is presumed that:

  • All of these features should have an identical set of supported radices, for consistency.
  • Python source code syntax and int(mystring, 0) should continue to share identical behavior.
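The second presumption can be spot-checked with a small sketch, assuming a current Python 3 interpreter. It uses ast.literal_eval as a stand-in for compiling source code; the helper agrees_with_source is a hypothetical name.

```python
import ast


def agrees_with_source(text):
    """True if int(text, 0) yields the same value as the compiler does for `text`."""
    return int(text, 0) == ast.literal_eval(text)


# Every supported radix, in both prefix cases, parses identically on both paths.
for literal in ("0", "42", "0o17", "0O17", "0b101", "0B101", "0x1f", "0X1F"):
    assert agrees_with_source(literal)
```

Both paths also reject an old-style literal such as "017", although they signal it differently: int() raises ValueError, while the compiler raises SyntaxError.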

Removal of old octal syntax

This PEP proposes that the ability to specify an octal number by using a leading zero will be removed from the language in Python 3.0 (and the Python 3.0 preview mode of 2.6), and that a SyntaxError will be raised whenever a leading “0” is immediately followed by another digit.

During the present discussion, it was almost universally agreed that:

eval('010') == 8

should no longer be true, because that is confusing to new users. It was also proposed that:

eval('0010') == 10

should become true, but that is much more contentious, because it is so inconsistent with usage in other computer languages that mistakes are likely to be made.

Almost all currently popular computer languages, including C/C++, Java, Perl, and JavaScript, treat a sequence of digits with a leading zero as an octal number. Proponents of treating these numbers as decimal instead have a very valid point – as discussed in Supported radices, below, the entire non-computer world uses decimal numbers almost exclusively. There is ample anecdotal evidence that many people are dismayed and confused if they are confronted with non-decimal radices.

However, in most situations, most people do not write gratuitous zeros in front of their decimal numbers. The primary exception is when an attempt is being made to line up columns of numbers. But since PEP 8 specifically discourages the use of spaces to try to align Python code, one would suspect the same argument should apply to the use of leading zeros for the same purpose.

Finally, although the email discussion often focused on whether anybody actually uses octal any more, and whether we should cater to those old-timers in any case, that is almost entirely beside the point.

Assume the rare complete newcomer to computing who does, either occasionally or as a matter of habit, use leading zeros for decimal numbers. Python could either:

  1. silently do the wrong thing with their numbers, as it does now;
  2. immediately disabuse them of the notion that this is viable syntax (and yes, the SyntaxWarning should be more gentle than it currently is, but that is a subject for a different PEP); or
  3. let them continue to think that computers are happy with multi-digit decimal integers which start with “0”.

Some people passionately believe that (3) is the correct answer, and they would be absolutely right if we could be sure that new users will never blossom and grow and start writing AJAX applications.

So while a new Python user may (currently) be mystified at the delayed discovery that their numbers don’t work properly, we can fix it by explaining to them immediately that Python doesn’t like leading zeros (hopefully with a reasonable message!), or we can delegate this teaching experience to the JavaScript interpreter in the browser, and let them try to debug their issue there.

Supported radices

This PEP proposes that the supported radices for the Python language will be 2, 8, 10, and 16.

Once it is agreed that the old syntax for octal (radix 8) representation of integers must be removed from the language, the next obvious question is “Do we actually need a way to specify (and display) numbers in octal?”

This question is quickly followed by “What radices does the language need to support?” Because computers are so adept at doing what you tell them to, a tempting answer in the discussion was “all of them.” This answer has obviously been given before – the int() constructor will accept an explicit radix with a value between 2 and 36, inclusive, with the latter number bearing a suspicious arithmetic similarity to the sum of the number of numeric digits and the number of same-case letters in the ASCII alphabet.
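The arithmetic behind the upper limit of 36 can be checked directly, assuming a current Python 3 interpreter. The variable base_37_allowed is introduced only for this example.

```python
import string

# Ten digits plus twenty-six same-case letters give thirty-six symbols.
assert len(string.digits) + len(string.ascii_lowercase) == 36

# The last symbol, "z" (in either case), has the value 35 in base 36.
assert int("z", 36) == 35
assert int("Z", 36) == 35

# A radix outside 2..36 (other than the special value 0) is refused.
try:
    int("10", 37)
except ValueError:
    base_37_allowed = False
else:
    base_37_allowed = True
assert not base_37_allowed
```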

But the best argument for inclusion will have a use-case to back it up, so the idea of supporting all radices was quickly rejected, and the only radices left with any real support were decimal, hexadecimal, octal, and binary.

Just because a particular radix has a vocal supporter on the mailing list does not mean that it really should be in the language, so the rest of this section is a treatise on the utility of these particular radices, vs. other possible choices.

Humans use other numeric bases constantly. If I tell you that it is 12:30 PM, I have communicated quantitative information arguably composed of three separate bases (12, 60, and 2), only one of which is in the “agreed” list above. But the communication of that information used two decimal digits each for the base 12 and base 60 information, and, perversely, two letters for information which could have fit in a single decimal digit.

So, in general, humans communicate “normal” (non-computer) numerical information either via names (AM, PM, January, …) or via use of decimal notation. Obviously, names are seldom used for large sets of items, so decimal is used for everything else. There are studies which attempt to explain why this is so, typically reaching the expected conclusion that the Arabic numeral system is well-suited to human cognition. [3]

There is even support in the history of the design of computers to indicate that decimal notation is the correct way for computers to communicate with humans. One of the first modern computers, ENIAC [4], computed in decimal, even though there were already existing computers which operated in binary.

Decimal computer operation was important enough that many computers, including the ubiquitous PC, have instructions designed to operate on “binary coded decimal” (BCD) [5], a representation which devotes 4 bits to each decimal digit. These instructions date from a time when the most strenuous calculations ever performed on many numbers were the calculations actually required to perform textual I/O with them. It is possible to display BCD without having to perform a divide/remainder operation on every displayed digit, and this was a huge computational win when most hardware didn’t have fast divide capability. Another factor contributing to the use of BCD is that, with BCD calculations, rounding will happen exactly the same way that a human would do it, so BCD is still sometimes used in fields like finance, despite the computational and storage superiority of binary.
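The display property of BCD can be sketched in a few lines. The helpers to_bcd and bcd_to_text are hypothetical names for this illustration, not instructions of any real processor; the point is that decoding for display needs only shifts and masks, one 4-bit group per digit.

```python
def to_bcd(number):
    """Pack the decimal digits of a non-negative int into 4-bit groups."""
    packed = 0
    for position, digit in enumerate(reversed(str(number))):
        packed |= int(digit) << (4 * position)
    return packed


def bcd_to_text(packed, digits):
    """Recover the decimal text using only shift and mask, no division."""
    return "".join(
        str((packed >> (4 * position)) & 0xF)
        for position in reversed(range(digits))
    )


# In BCD, the hexadecimal rendering of the packed value shows the decimal digits.
assert hex(to_bcd(1234)) == "0x1234"
assert bcd_to_text(to_bcd(1234), 4) == "1234"
```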

So, if it weren’t for the fact that computers themselves normally use binary for efficient computation and data storage, string representations of integers would probably always be in decimal.

Unfortunately, computer hardware doesn’t think like humans, so programmers and hardware engineers must often resort to thinking like the computer, which means that it is important for Python to have the ability to communicate binary data in a form that is understandable to humans.

The requirement that the binary data notation must be cognitively easy for humans to process means that it should contain an integral number of binary digits (bits) per symbol, while otherwise conforming quite closely to the standard tried-and-true decimal notation (position indicates power, larger magnitude on the left, not too many symbols in the alphabet, etc.).

The obvious “sweet spot” for this binary data notation is thus octal, which packs the largest integral number of bits possible into a single symbol chosen from the Arabic numeral alphabet.

In fact, some computer architectures, such as the PDP8 and the 8080/Z80, were defined in terms of octal, in the sense of arranging the bitfields of instructions in groups of three, and using octal representations to describe the instruction set.

Even today, octal is important because of bit-packed structures which consist of 3 bits per field, such as Unix file permission masks.

But octal has a drawback when used for larger numbers. The number of bits per symbol, while integral, is not itself a power of two. This limitation (given that the word size of most computers these days is a power of two) has resulted in hexadecimal, which is more popular than octal despite the fact that it requires a 60% larger alphabet than decimal, because each symbol contains 4 bits.

Some numbers, such as Unix file permission masks, are easily decoded by humans when represented in octal, but difficult to decode in hexadecimal, while other numbers are much easier for humans to handle in hexadecimal.
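A permission mask makes the contrast concrete. In the sketch below, describe_mode is a hypothetical helper that decodes the three 3-bit fields; each octal digit maps to exactly one field, while the hexadecimal form of the same value splits the fields across digits.

```python
def describe_mode(mode):
    """Render the low nine permission bits as an 'rwxrwxrwx'-style string."""
    parts = []
    for shift in (6, 3, 0):  # owner, group, other
        bits = (mode >> shift) & 0o7
        parts.append(
            ("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-")
        )
    return "".join(parts)


# Octal digits 7, 5, 5 correspond directly to rwx, r-x, r-x.
assert describe_mode(0o755) == "rwxr-xr-x"

# The same value in hexadecimal no longer lines up with the fields.
assert hex(0o755) == "0x1ed"
```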

Unfortunately, there are also binary numbers used in computers which are not very well communicated in either hexadecimal or octal. Thankfully, fewer people have to deal with these on a regular basis, but on the other hand, this means that several people on the discussion list questioned the wisdom of adding a straight binary representation to Python.

One example of where these numbers are very useful is in reading and writing hardware registers. Sometimes hardware designers will eschew human readability and opt for address space efficiency, by packing multiple bit fields into a single hardware register at unaligned bit locations, and it is tedious and error-prone for a human to reconstruct a 5 bit field which consists of the upper 3 bits of one hex digit, and the lower 2 bits of the next hex digit.
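The register case can be sketched with a made-up 8-bit value; the register layout and the helper extract_field are assumptions for illustration only. The field occupies bits 1 to 5, that is, the upper 3 bits of the low hex digit and the lower 2 bits of the next one.

```python
def extract_field(value, shift, width):
    """Return the `width`-bit field of `value` starting at bit `shift`."""
    return (value >> shift) & ((1 << width) - 1)


register = 0b10110110  # hypothetical register contents

# In hexadecimal the field straddles both digits and is hard to read off.
assert hex(register) == "0xb6"

# In binary the five bits can be read directly from the literal.
field = extract_field(register, shift=1, width=5)
assert field == 0b11011
assert format(field, "05b") == "11011"
```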

Even if the ability of Python to communicate binary information to humans is only useful for a small technical subset of the population, it is exactly that population subset which contains most, if not all, members of the Python core team, so even straight binary, the least useful of these notations, has several enthusiastic supporters and few, if any, staunch opponents, among the Python community.

Syntax for supported radices

This proposal is to use a “0o” prefix with either uppercase or lowercase “o” for octal, and a “0b” prefix with either uppercase or lowercase “b” for binary.

There was strong support for not supporting uppercase, but this is a separate subject for a different PEP, as ‘j’ for complex numbers, ‘e’ for exponent, and ‘r’ for raw string (to name a few) already support uppercase.

The syntax for delimiting the different radices received a lot of attention in the discussion on Python-3000. There are several (sometimes conflicting) requirements and “nice-to-haves” for this syntax:

  • It should be as compatible with other languages and previous versions of Python as is reasonable, both for the input syntax and for the output (e.g. string % operator) syntax.
  • It should be as obvious to the casual observer as possible.
  • It should be easy to visually distinguish integers formatted in the different bases.

Proposed syntaxes included things like arbitrary radix prefixes, such as 16r100 (256 in hexadecimal), and radix suffixes, similar to the 100h assembler-style suffix. The debate on whether the letter “O” could be used for octal was intense – an uppercase “O” looks suspiciously similar to a zero in some fonts. Suggestions were made to use a “c” (the second letter of “oCtal”), or even to use a “t” for “ocTal” and an “n” for “biNary” to go along with the “x” for “heXadecimal”.

For the string % operator, “o” was already being used to denote octal. Binary formatting is not being added to the % operator because PEP 3101 (Advanced String Formatting) already supports binary, and % formatting will be deprecated in the future.

At the end of the day, since uppercase “O” can look like a zero and uppercase “B” can look like an 8, it was decided that these prefixes should be lowercase only, but, like ‘r’ for raw string, that can be a preference or style-guide issue.

Open Issues

It was suggested in the discussion that lowercase should be used for all numeric and string special modifiers, such as ‘x’ for hexadecimal, ‘r’ for raw strings, ‘e’ for exponentiation, and ‘j’ for complex numbers. This is an issue for a separate PEP.

This PEP takes no position on uppercase or lowercase for input, just noting that, for consistency, if uppercase is not to be removed from input parsing for other letters, it should be added for octal and binary, and documenting the changes under this assumption, as there is not yet a PEP about the case issue.

Output formatting may be a different story – there is already ample precedent for case sensitivity in the output format string, and there would need to be a consensus that there is a valid use-case for the “alternate form” of the string % operator to support uppercase ‘B’ or ‘O’ characters for binary or octal output. Currently, PEP 3101 does not even support this alternate capability, and the hex() function does not allow the programmer to specify the case of the ‘x’ character.
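The existing precedent for case sensitivity in hexadecimal output can be seen in a few checks, assuming a current Python 3 interpreter:

```python
# hex() always produces a lowercase prefix and lowercase digits.
assert hex(255) == "0xff"

# The % operator already distinguishes case through the conversion letter.
assert "%x" % 255 == "ff"
assert "%X" % 255 == "FF"

# In the alternate form, the case of the prefix follows the conversion letter.
assert "%#x" % 255 == "0xff"
assert "%#X" % 255 == "0XFF"
```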

There are still some strong feelings that ‘0123’ should be allowed as a literal decimal in Python 3.0. If this is the right thing to do, this can easily be covered in an additional PEP. This proposal only takes the first step of making ‘0123’ not be a valid octal number, for reasons covered in the rationale.

Is there (or should there be) an option for the 2to3 translator which only makes the 2.6 compatible changes? Should this be run on 2.6 library code before the 2.6 release?

Should a bin() function which matches hex() and oct() be added?

Is hex() really that useful once we have advanced string formatting?

References

Copyright

This document has been placed in the public domain.
