Introduction to Cryptography
Lecturer: Professor Fred B. SchneiderLecture notes byLynette I. Millett
Revised by Jed Liu
Introduction to Cryptography
The term cryptography comes from the Greek, and means hidden or secretwriting. Cryptography includes:- techniques for implementing secrecy -- ways to obscure the content of a message from eavesdroppers,
- integrity checking -- ways to reassure the recipient that a messagewas not altered since it was generated by the sender, and
- authentication -- ways to verify the identity of the source of the message. There are two classes of applications: verifying the original sender of the message and making sure that the source of a givenmessage in a protocol is the same as the source of a previous message.
Some terminology:
- Cryptographer -- invents codes.
- Cryptanalyst -- breaks codes.
- Cryptology -- the study of cryptography (encryption/decryption).
One should note that it is far easier to break codes than it is to createthem. In fact, only a few people in the world really know how tocreate codes; hence, we should be very skeptical of any newly proposedcryptosystem.
Today we will discuss a few primitive cryptographic systems in order to build intuition about cryptanalysis.
Monoalphabetic Substitution Cipher
The first scheme is called a monoalphabetic substitution cipher.In this cipher, we encrypt a given letter in the message by shifting it to theright (in the alphabet) by some number n. For example, in the Caesar cipher,n = 3. That is, an 'a' becomes 'd', 'b' becomes 'e' and so on. This worksfor any n not greater than 25 (assuming a 26 letter alphabet). In thiscipher, the number n is the 'key.'
Another—somewhat cryptographically stronger—example of amonoalphabetic substitution cipher is to use an arbitrary permutation of thealphabet, rather than shifting by a certain number. Rather than only 25possible keys, we have 26! (26 factorial, the number of permutations of thealphabet, assuming a 26 letter alphabet), or roughly 4 x 1026. Abrute force approach to cracking this cipher, even if one spends only 1microsecond per permutation, would take roughly 10 trillion years. It isimportant to note that two successive encryptions do not increase the strengthof this cryptosystem because the product of two permutations (keys) is anotherpermutation (key).
In actuality, 10 trillion years are not needed to crack a monoalphabeticcode. We can take advantage of the fact that sentences in natural language(for the sake of argument, assume English) do not have uniform frequencydistributions over the entire alphabet. Thus, we can calculate the frequencydistribution for each character in the cipher text and compare it to what weknow about the English language. (For example, 'e' is the most frequentletter, occurring 14% of the time, 't' occurs 9.85%, while 'q' only occurs0.26%.) Monoalphabetic substitution ciphers have the property that frequencydistributions are preserved. Therefore, if we know the original language, andwe know that a monoalphabetic substitution was used, then we have a goodchance of cracking the code. This kind of attack is referred to as aciphertext only attack.
Cryptanalytic Attacks
There are several types of cryptanalytic attacks:
- Ciphertext only: the attacker has only the ciphertext.
- Known plaintext: the attacker has full or partial (e.g. messageheaders) plaintext, or probable plaintext.
- Chosen plaintext: the attacker can encode any arbitrary message.The attacker then cracks an encrypted message by guessing the contents andsubmitting the guess for encryption and comparing. This is especially usefulif there are only a small number of possibilities for what the plaintext mightbe.
- Algorithm and ciphertext (also known as a 'dictionary attack'):the attacker runs the algorithm on massive amounts of plaintext and find theone plaintext message that encrypts to the ciphertext you are analyzing. Thisis the reason behind the 'salt' in UNIX password representations: the passwordgets contaminated with 'salt' (a random bit string) and then encrypted beforebeing stored in /etc/passwd. As a result, cracking passwords gets much moredifficult than simply encrypting a dictionary once and comparing: it is nownecessary to separately encrypt each salt value with the entire dictionary.
Polyalphabetic Substitution Cipher
The problem with monoalphabetic substitution ciphers is that the preservationof alphabet distributions makes them vulnerable to frequency-based attacks.We would like a scheme that encrypts plaintext (manifesting a particulardistribution) into ciphertext that has a smooth distribution. Therefore,instead of mapping a letter to a fixed symbol, we will map both high and lowfrequency symbols to the same symbol by using a different permutation of thealphabet for different character positions in the message. This isaccomplished through what is known as a Vigenère Tableaux, a list ofpossible permutations of the alphabet. The key is a sequence of lines in thetableaux. If there are four permutations in the tableaux, then the firstcharacter in the message is substituted based on the first line in the table,the second character by the second line, the third by the third line, thefourth by the fourth line, the fifth character by the first line, and so on ina cyclical fashion.How can such a substitution be cracked? Note that if we know the keylength (how many different permutations are used), then, in the above examplewe would know that the first, fifth, ninth, etc. characters were encryptedunder the same permutation. Thus, knowing that the key length is n reducesthe problem to cracking n monoalphabetic substitution ciphers. Cryptanalysisof a polyalphabetic substitution cipher is therefore accomplished in threesteps:
- Determine the key length.
- Break ciphertext into separate pieces; one piece per permutation.
- Solve each piece as a separate monoalphabetic substitution usingfrequency distributions.
- Identify repeated patterns of 3 or more letters.
- For each pattern, write down the starting positions for all the instancesof the pattern.
- Compute the differences between the starting positions of successiveinstances.
- Determine all the factors of these differences.
- The key length will be one of the factors that appears often.
Transposition Ciphers
The problem the Kasiski method exposes is that with substitution ciphers theinformation in the message does not get 'spread out' enough. That is, thetrigram 'the' is still a trigram in the ciphertext (albeit encoded.) One easyscheme to accomplish this 'spreading' is by using transposition. In thisscheme, the message is written out in rows, but the columns are written down.For exampleTHIS IS ANEXAMPLE OFA MESSAGEbecomes
TEAHX IAMSME PSILSSEA GAOENFThis distributes the n-grams, and once the transposition is done asubstitution can also be performed. To attack such a transposition, one needsto determine the dimension of the transposition matrix (in the above example:10x3). This adds one more level of complexity, but is not fundamentallydifferent than the other schemes we have examined. Spreading informationthrough a message like this is called diffusion.
Perfect Substitution Cipher
Many ciphers in existence only provide computational security.That is, the encoded information is kept secret only because of limits in ourcurrent processing capacity. However, given enough computing power, one canalways launch a brute force attack to break the cipher.
One might ask, then, what are the characteristics of a perfectsubstitution cipher? Such a cipher would provide us with unconditionalsecrecy, where the probability that a message can be decrypted isunaltered by knowledge of the ciphertext for that message. This can beaccomplished with a one-time pad, an infinite sequence of randombits. This sequence is XORed with the message. If it is a random sequence,then knowing any one bit will tell you nothing about what the next bit willbe. Moreover, decoding is simple because (key XOR message) XOR key = message.The problem is that unlimited key material is needed, and absolute synchronyis necessary between the sender and the receiver.
There are two properties of an encryption scheme that are desirable:
- confusion -- the interceptor should not be able to predict theeffect of changing one character in the plaintext on the ciphertext.
- diffusion -- changes in the plaintext should affect many partsof the ciphertext. (Substitution and permutation do not exhibit diffusion.)
Everlasting Security
In January 2002, Michael Rabin of Harvard University announced a schemefor what he called "everlasting security". Here, he uses a bounded storagemodel, a refinement of computational security, in which it is assumed that anyadversary has a bounded amount of storage available, but no assumption is madeabout computational power.
To encryt using this scheme, the sender A and the receiver B use apublicly accessible sequence α of random bits. This sequence should belonger than their adversary's storage capacity and can come, for example, froman intercepted digital streams from a TV channel. The key that A and B shareis an algorithm for picking out bits from α to use for a one-time pad.To solve the problem of perfect synchony, it will suffice for A and B to use aself-synchronizing stream, such as that from a digital TV broadcast or asatelite signal.
To see why this works, suppose an adversary captures the ciphertext andlater captures the key. But since the adversary lacks the ability to storeα, she will not be able to construct the one-time pad, and so, will notbe able to decrypt the message.