CS 513 System Security -- Introduction to Cryptography (2024)

Introduction to Cryptography

Lecturer: Professor Fred B. Schneider
Lecture notes byLynette I. Millett

Introduction to Cryptography

The term cryptography comes from the Greek, and means hidden or secretwriting. Cryptography includes:

techniques for implementing secrecy -- ways to obscure the content of a message from eavesdroppers,
integrity checking -- ways to reassure the recipient that a messagewas not altered since it was generated by the sender, and
authentication -- ways to verify the identity of the source of the message. There are two classes of applications: verifying the original sender of the message and making sure that the source of a givenmessage in a protocol is the same as the source of a previous message.

Cryptography involves converting plaintext into ciphertext through aprocess known as encryption. Ciphertext is converted back to plaintextby decryption. Usually the algorithms are public, but an input, calledthe key, is secret. Note that the key for encryption does notnecessarily have to be the same as the key for decryption.

Some terminology:

Monoalphabetic Substitution Cipher

The first scheme is called a monoalphabetic substitutioncipher. An example is the Caeser cipher. Here, for a givenletter in the message, shift to the right (in the alphabet) bythree. That is, an 'a' becomes 'd', 'b' becomes 'e' and so on. Thiscan be generalized to work for any n not greater than 25 (assuming a26 letter alphabet). In this cipher, the number n is the 'key.'

Another--somewhat stronger, cryptographically--example of amonoalphabetic substitution cipher is to use an arbitrary permutationof the alphabet, rather than shifting by a certain number. Ratherthan only 25 possible keys, we have 26! (26 factorial, the number ofpermutations of the alphabet, assuming a 26 letter alphabet.) A bruteforce approach to cracking this cipher, even if one spends only 1millisecond per permutation, would take roughly 10 trillionyears. Note: two successive encryptions do not increase the strengthof this cryptosystem because the product of two permutations (keys) isanother permutation (key).

In fact, 10 trillion years are not needed to crack amonoalphabetic code. We can take advantage of the fact that sentencesin natural language (for the sake of argument, assume English) do nothave uniform frequency distributions over the entire alphabet. Thus,we can calculate the relative frequency per character in the ciphertext and compare it to what we know about the English language. (Forexample, 'e' is the most frequent letter, occurring 14% of the time,'t' occurs 9.85%, and so on.) Monoalphabetic substitution cipherspreserve this distribution. Therefore, if we know the originallanguage, and we know that a monoalphabetic substitution was used,then we have a good chance of cracking the code. This kind of attackis referred to as a ciphertext only attack.

Cryptoanalytic Attacks

There are several types of cryptoanalytic attacks:

Ciphertext only -- attacker has only the ciphertext.
Known plaintext -- attacker has full or partial (e.g. message headers)plaintext, or probable plaintext.
Chosen plaintext -- attacker can get an arbitrary message encoded.Attackerthen cracks an encrypted message by guessing the contents and submitting the guess for encryption and comparing. This is especially useful if there are only a small number of possibilities for what the plaintext might be.
Algorithm and ciphertext -- also known as a 'dictionary attack.' Run the algorithm on massive amounts of plaintext and find the one plaintextmessage that encrypts to the ciphertext you are analyzing. This is the reason behind the 'salt' in UNIX password representations. The password contaminated with 'salt' (a random bit string) and then encrypted beforebeing stored in /etc/passwd. Cracking passwords is now much more difficultthan simply encrypting a dictionary once and comparing. It is now necessaryto separately encrypt each salt value with the entire dictionary.

Polyalphabetic Substitution Cipher

The problem with monoalphabetic substitution ciphers is that thepreservation of alphabet distributions makes them vulnerable tofrequency-based attacks. We would like a scheme that encryptsplaintext (manifesting a particular distribution) into ciphertext thathas a smooth distribution. Therefore, instead of mapping a letter toa fixed symbol, we will map both high and low frequency symbols to thesame symbol by using a different permutation of the alphabet fordifferent character positions in the message. This is accomplishedthrough a Vigenere Tableaux: a list of possible permutations of thealphabet. The key is a sequence of lines in the tableaux. If there arefour permutations in the tableaux, then the first character in themessage is substituted based on the first line in the table, thesecond character by the second line, the third by the third line, thefourth by the fourth line, the fifth character by the first line andso on in a cyclical fashion.

How can such a substitution be cracked? Note that if we know the keylength (how many different permutations are used), then, in the above example we would know that the first, fifth, ninth, etc. characterswere encrypted under the same permutation. Thus, knowing that the key length is n reduces the problem to cracking n monoalphabetic substitution ciphers.Cryptoanalysis of a polyalphabetic substitution cipher is thereforeaccomplished in three steps:

Determine the key length.
Break ciphertext into separate pieces; one piece per permutation.
Solve each piece as a separate monoalphabetic substitution using frequency distributions.

We present Kasiski's method for determining the key length. In asubstitution cipher, patterns in the plaintext will manifestthemselves in ciphertext. For instance, the digram 'th' occursfrequently in English. In a polyalphabetic substitution, it is likelythat 'th' will be permuted using permutations 1 and 2 at multiplepoints. If the message is long enough, this will happen repeatedly andwe can look for these repeated patterns. Kasiski's method works ontrigrams (or more) as follows:

Identify repeated patterns of 3 or more letters.
For each pattern, write down the starting positions for all the instancesof the pattern.
Compute the differences between the starting positions of successiveinstances.
Determine all the factors of these differences.
The key length will be one of the factors that appears often.

Once a probable key length has been found, rather than attempting todecode the entire message, one can quickly check and see whether theresult of partitioning the message according to that key length hasthe same kind of frequency distribution that English has.

Transposition Ciphers

The problem the Kasiski method exposes is that with substitutionciphers the information in the message does not get 'spread out'enough. That is, the trigram 'the' is still a trigram in theciphertext (albeit encoded.) One easy scheme to accomplish this'spreading' is by using transposition. In this scheme, the message iswritten out in rows, but the columns are written down. For example

THIS IS ANEXAMPLE OFA MESSAGE

becomes

TEAHX IAMSME PSILSSEA GAOENF

This distributes the n-grams, and once the transposition is done asubstitution can also be performed. To attack such a transposition,one needs to determine the dimension of the transposition matrix (inthe above example: 10x3). This adds one more level of complexity, butis not fundamentally different than the other schemes we haveexamined. Spreading information through a message like this is calleddiffusion.

Perfect Substitution Cipher

What are the characteristics of a perfect substitution cipher? Wedesire perfect secrecy which we define as: The probabilitythat a message can be decrypted is unaltered by knowledge of theciphertext for that message. A perfect substitution cipher wouldtherefore destroy all n-gram frequencies. This can be accomplishedwith a one-time pad, an infinite sequence of randombits. This sequence is XORed with the message. If it is a randomsequence, then knowing any one bit will tell you nothing about whatthe next bit will be. Moreover, decoding is simple because (key XORmessage) XOR key = message. The problem is that unlimited key materialis needed, and absolute synchrony is necessary between the sender andthe receiver.

There are two properties of an encryption scheme that are desirable:

confusion -- the interceptor should not be able to predict the effect of changing one character in the plaintext on the ciphertext.
diffusion -- changes in the plaintext should affect many parts ofthe ciphertext. (Substitution and permutation do not exhibit diffusion.)