Python is one of those languages that fills many roles. It can be usedfor prototyping, for writing actual production code, as an interfacebetween software components, or as a handy tool for easily writing quickscripts. For many of these purposes, cryptography can be a usefulcapability. Some relevant modules come with the standard Pythondistribution; there's already a module supporting the MD5 hashalgorithm, and there's a demo implementing the RSA public key system.However, the core distribution can't support everything, or it wouldhave to come on its own CD-ROM. The Python Cryptography Toolkit is acollection of extension modules for Python. One part of the Toolkit isa number of different algorithms. The list includes most of the commonones:
- Encryption algorithms: Alleged RC4, Blowfish, DES, Diamond,IDEA, LOKI91, RC5, REDOC III, Sapphire.
- Hash algorithms: MD2, MD4, MD5, Secure Hash Algorithm
- Public key algorithms: ElGamal, RSA
An important design criterion was that, assuming the Python code to becarefully written, it should be trivial to replace one algorithm withanother. To this end, modules that implement a particular class ofalgorithm share identical interfaces, and variables parameterizing themodule's characteristics are available to help in programming portably.
Encryption algorithms transform their input data (calledplaintext) in some way that isdependent on a variable key, producing ciphertext;this transformation can easily be reversed, if (and, hopefully, onlyif) one knows the key. The key can be varied by the user orapplication, chosen from some very large space of possible keys.
For block private-key encryption, the new()
function is calledwith the key and an encryption mode parameter. Block algorithms operateon fixed chunks of plaintext, usually 8 or 16 bytes. In ElectronicCodebook (ECB) mode, each block is encrypted independently of eachother. This is the fastest mode, but long strings of repeatedcharacters in the plaintext encrypt to repeating blocks, which may behelpful to an adversary. In the Cipher Block Chaining (CBC) and CipherFeedback (CFB) modes, plaintext is XORed with the previous ciphertext;this breaks up any such repeating patterns.
As an example, let us encrypt a horrifying message:
>>> import des>>> obj=des.new('abcdefgh', des.ECB)>>> plain="Guido van Rossum is a space alien.">>> len(plain)34>>> obj.encrypt(plain)Traceback (innermost last): File "", line 1, in ?ValueError: Strings for DES must be a multiple of 8 in length>>> ciph=obj.encrypt(plain+'XXXXXX')>>> ciph'\021,\343Nq\214DY\337T\342pA\372\255\311s\210\363,\300j\330\250\312\347\342I\3215w\03561\303dgb/\006'>>> obj.decrypt(ciph)'Guido van Rossum is a space alien.XXXXXX'
Hash functions produce short "fingerprints" of arbitrary data. Unlikesimple checksums, it is very difficult to find two messages that producethe same hash value, or to modify a message without changing theresulting hash value. Hash functions can be used as checksums, or aspart of a digital signature system.
>>> import md5>>> obj=md5.new()>>> obj.digest()'\324\035\214\331\217\000\262\004\351\200\011\230\354\370B~'>>> obj.update("This is a test message for a hashing function.")>>> obj.digest()'F\315\344~\032\234\227\2627\276\366d\255\262%r'>>> obj.update('\000')>>> obj.digest()'\222?:\262g\252\255\023?w\314\305~\374=6'
Public-key cryptography uses two keys; one encrypts, one decrypts. Thepublic key encrypts data, producing ciphertext that can only bedecrypted by the private key, which is only known to the legitimateowner (we hope). The public key can then be listed in a directory orhanded out to correspondents, who can then send the owner securemessages without having to arrange a key beforehand. Also, digitalsignatures can be created by decrypting data with the private key;anyone can then encrypt the data with the corresponding public key andverify that the signature and the message match. The PCT allows both the generation and use of various public-key systems.
As an example, let us generate an RSA private key to sign a text stringplaintext
. (Actually, we will sign a hash of the plaintext.)randfunc()
is a random number generation (not shown) functionthat accepts a single integer parameter N, and returns anN-byte string of random data; it's used in generating the primenumbers required for an RSA key. A class for cryptographically strongrandom number generation is provided with the Toolkit, or you canimplement your own technique.
>>> import md5, RSA>>> RSAkey=RSA.generate(384, randfunc)>>> hash=md5.new(plaintext).digest()>>> signature=RSAkey.sign(hash, "")>>> signature # Print what an RSA sig looks like--you don't really care.('\021\317\313\336\264\315' ...,)>>> RSAkey.validate(hash, signature) # This sig will check out1>>> RSAkey.validate(hash[:-1], signature)# This sig will fail0
Change md5
to SHA
or MD4
, and everything will workidentically.
The PGP module is only partially implemented; currently, public orprivate PGP keys can read from a file or a string, and be written outagain. Message support is not yet implemented, but the module is usefulfor manipulating keyrings; for example, a simple script toalphabetically sort PGP keyrings by user ID is included with theToolkit.
Future plans
In future, adding algorithms will be of less importance in development.There are two classes of cryptographic algorithms: the innumerable onesthat are proposed, and the few ones that are actually used. Newalgorithms are constantly being invented, but relatively few of thembecome part of the security engineer's toolchest, and there's no pointin trying to implement every single one of them. There are a few thatshould still be implemented: the only one remaining on my list at thispoint is the Haval hashing algorithm.
Afterwards, I will change my focus to optimizing and clarifyingthe existing C and Python code, and implementing some interestingprotocols. Also, most of the implementations have used existing code asmuch as possible. However, some of that code comes with annoyingconditions that forbid commercial use; I will begin the slow work ofreimplementing algorithms and placing them in the public domain.(Specifically, MD2 and MD4 are problems in this respect; MD5 is apublic-domain implementation by Colin Plumb.)
There are many protocols that could be implemented. As the useof the Internet for commercial purposes spreads, network securitybecomes more important as more and more financial data and proprietaryinformation is transmitted across the network. There is no singlestandard for secure Internet communications yet. Instead, there arevarious standards being proposed, and it is probably best that Python beable to use them all. I am not currently planning to work on any of thefollowing, though I'd certainly be willing to provide assistance.
- Kerberos: Kerberos is an identification andauthentication protocol developed at MIT and described in RFC 1510,designed to operate in an environment of insecure individualworkstations and a few trusted servers. The Kerberos protocol is fairlycomplex and specialized, so it may not be worth the effort of actuallyimplementing the RFC in Python. Rather, it would be much simpler towrite a wrapper module around the appropriate library routines forKerberos version 4 or 5.
- Netscape's Secure Socket Layer: After glancing at thespecification, SSL doesn't seem to be too difficult to implement.However, a complete implementation is probably impossible, because ituses the RC2 encryption algorithm, which is proprietary to RSA DataSecurity, Inc., and has not been publicly described. Thus, any Pythonimplementation will not be able to fully interoperate with officialimplementations.
- Secure HTTP: SHTTP is yet another proposed standard that allowsmore freedom in the algorithms used. Version 1.1 of the specificationis available at http://www.eit.com/projects/s-http/shttp.txt, and hasbeen submitted as an Internet draft. Because of SHTTP's more openand flexible nature, I'd prefer that it be implemented first. Whycontribute to making SSL a de facto standard when we'll probably neverbe able to fully implement it?
There are also Python-specific applications for cryptography. One ofthe demo scripts included with the Toolkit implements a crude version ofa secure import statement. Armed with a public key and a list ofsignatures (which are assumed to be magically available and known to becorrect), any compiled module is run through a hash function beforebeing imported. The hash value is then checked against thecorresponding signature; if it fails, an ImportError exception israised. This has obvious relevance to implementing distributed systemsand agents in Python.
The next release of the Toolkit is 0.0.3, and will include the DigitalSignature Algorithm and improved documentation, plus some other goodies.Hopefully, the encumbered modules will have been reimplemented aswell. I will also tackle implementing the issues on my wish list, whichleads naturally to...
My wish list
There are some enhancements to Python that would improve the Toolkit bymaking it simpler or faster.
- The
struct
module does not yet allow specifying the byteorder, nor does it support character arrays, C strings, or converting astring to a Python long integer. This would really speed up the PGPmodule, and might make it almost as fast as PGP itself, which is writtenpurely in C. - Your integer arithmetic routines can never be too fast. Python'slong integers are reasonably speedy, but I intend to benchmark various arbitrary-precision integer libraries, and see which comes out on top.