Introduction to Cryptography
Lecturer: Professor Fred B. SchneiderLecture notes byLynette I. Millett
Introduction to Cryptography
The term cryptography comes from the Greek, and means hidden or secretwriting. Cryptography includes:- techniques for implementing secrecy -- ways to obscure the content of a message from eavesdroppers,
- integrity checking -- ways to reassure the recipient that a messagewas not altered since it was generated by the sender, and
- authentication -- ways to verify the identity of the source of the message. There are two classes of applications: verifying the original sender of the message and making sure that the source of a givenmessage in a protocol is the same as the source of a previous message.
Some terminology:
- Cryptographer -- invents codes.
- Cryptoanalyst -- breaks codes.
- Cryptology -- the study of cryptography (encryption/decryption).
Monoalphabetic Substitution Cipher
The first scheme is called a monoalphabetic substitutioncipher. An example is the Caeser cipher. Here, for a givenletter in the message, shift to the right (in the alphabet) bythree. That is, an 'a' becomes 'd', 'b' becomes 'e' and so on. Thiscan be generalized to work for any n not greater than 25 (assuming a26 letter alphabet). In this cipher, the number n is the 'key.'
Another--somewhat stronger, cryptographically--example of amonoalphabetic substitution cipher is to use an arbitrary permutationof the alphabet, rather than shifting by a certain number. Ratherthan only 25 possible keys, we have 26! (26 factorial, the number ofpermutations of the alphabet, assuming a 26 letter alphabet.) A bruteforce approach to cracking this cipher, even if one spends only 1millisecond per permutation, would take roughly 10 trillionyears. Note: two successive encryptions do not increase the strengthof this cryptosystem because the product of two permutations (keys) isanother permutation (key).
In fact, 10 trillion years are not needed to crack amonoalphabetic code. We can take advantage of the fact that sentencesin natural language (for the sake of argument, assume English) do nothave uniform frequency distributions over the entire alphabet. Thus,we can calculate the relative frequency per character in the ciphertext and compare it to what we know about the English language. (Forexample, 'e' is the most frequent letter, occurring 14% of the time,'t' occurs 9.85%, and so on.) Monoalphabetic substitution cipherspreserve this distribution. Therefore, if we know the originallanguage, and we know that a monoalphabetic substitution was used,then we have a good chance of cracking the code. This kind of attackis referred to as a ciphertext only attack.
Cryptoanalytic Attacks
There are several types of cryptoanalytic attacks:
- Ciphertext only -- attacker has only the ciphertext.
- Known plaintext -- attacker has full or partial (e.g. message headers)plaintext, or probable plaintext.
- Chosen plaintext -- attacker can get an arbitrary message encoded.Attackerthen cracks an encrypted message by guessing the contents and submitting the guess for encryption and comparing. This is especially useful if there are only a small number of possibilities for what the plaintext might be.
- Algorithm and ciphertext -- also known as a 'dictionary attack.' Run the algorithm on massive amounts of plaintext and find the one plaintextmessage that encrypts to the ciphertext you are analyzing. This is the reason behind the 'salt' in UNIX password representations. The password contaminated with 'salt' (a random bit string) and then encrypted beforebeing stored in /etc/passwd. Cracking passwords is now much more difficultthan simply encrypting a dictionary once and comparing. It is now necessaryto separately encrypt each salt value with the entire dictionary.
Polyalphabetic Substitution Cipher
The problem with monoalphabetic substitution ciphers is that thepreservation of alphabet distributions makes them vulnerable tofrequency-based attacks. We would like a scheme that encryptsplaintext (manifesting a particular distribution) into ciphertext thathas a smooth distribution. Therefore, instead of mapping a letter toa fixed symbol, we will map both high and low frequency symbols to thesame symbol by using a different permutation of the alphabet fordifferent character positions in the message. This is accomplishedthrough a Vigenere Tableaux: a list of possible permutations of thealphabet. The key is a sequence of lines in the tableaux. If there arefour permutations in the tableaux, then the first character in themessage is substituted based on the first line in the table, thesecond character by the second line, the third by the third line, thefourth by the fourth line, the fifth character by the first line andso on in a cyclical fashion.How can such a substitution be cracked? Note that if we know the keylength (how many different permutations are used), then, in the above example we would know that the first, fifth, ninth, etc. characterswere encrypted under the same permutation. Thus, knowing that the key length is n reduces the problem to cracking n monoalphabetic substitution ciphers.Cryptoanalysis of a polyalphabetic substitution cipher is thereforeaccomplished in three steps:
- Determine the key length.
- Break ciphertext into separate pieces; one piece per permutation.
- Solve each piece as a separate monoalphabetic substitution using frequency distributions.
- Identify repeated patterns of 3 or more letters.
- For each pattern, write down the starting positions for all the instancesof the pattern.
- Compute the differences between the starting positions of successiveinstances.
- Determine all the factors of these differences.
- The key length will be one of the factors that appears often.
Transposition Ciphers
The problem the Kasiski method exposes is that with substitutionciphers the information in the message does not get 'spread out'enough. That is, the trigram 'the' is still a trigram in theciphertext (albeit encoded.) One easy scheme to accomplish this'spreading' is by using transposition. In this scheme, the message iswritten out in rows, but the columns are written down. For exampleTHIS IS ANEXAMPLE OFA MESSAGEbecomes
TEAHX IAMSME PSILSSEA GAOENFThis distributes the n-grams, and once the transposition is done asubstitution can also be performed. To attack such a transposition,one needs to determine the dimension of the transposition matrix (inthe above example: 10x3). This adds one more level of complexity, butis not fundamentally different than the other schemes we haveexamined. Spreading information through a message like this is calleddiffusion.
Perfect Substitution Cipher
What are the characteristics of a perfect substitution cipher? Wedesire perfect secrecy which we define as: The probabilitythat a message can be decrypted is unaltered by knowledge of theciphertext for that message. A perfect substitution cipher wouldtherefore destroy all n-gram frequencies. This can be accomplishedwith a one-time pad, an infinite sequence of randombits. This sequence is XORed with the message. If it is a randomsequence, then knowing any one bit will tell you nothing about whatthe next bit will be. Moreover, decoding is simple because (key XORmessage) XOR key = message. The problem is that unlimited key materialis needed, and absolute synchrony is necessary between the sender andthe receiver.There are two properties of an encryption scheme that are desirable:
- confusion -- the interceptor should not be able to predict the effect of changing one character in the plaintext on the ciphertext.
- diffusion -- changes in the plaintext should affect many parts ofthe ciphertext. (Substitution and permutation do not exhibit diffusion.)