Twofish For Python
Documentation

Menu:

Home

News
Software
Documentation
Links
EMAIL Me!

EFF
OpenDVD

Some Handy Documentation

  1. About Ciphers and Encryption: A Beginner's Tutorial
  2. Using twofish.py for Work and Entertainment
Note: This is under construction. In the meantime, the tutorial starts below:

About Ciphers and Encryption: A Beginner's Tutorial

By Eric Lee Green

Encryption is the process of scrambling data in a reversible manner, such that the recipient cannot un-scramble it without one or more "keys". The goal is to keep information private from those who would intercept it during transmission and use it for their own ulterior purposes.

There are two basic types of encryption, based on the kind of keys needed to scramble and unscramble the data:

  1. Symmetric (shared private key) encryption, and
  2. Asymetric (public key) encryption.
Public key encryption relies upon each side of the communication having two keys -- a public (shared) key, and a mathematically-related private key. Thus there is no need for a shared private key, and no problem with communicating such a key. The downside is extremely poor performance, since public key encryption involves mathematical operations on very large numbers (generally at least 512 bit exponentiation and multiplication operations, and often much more). I will discuss public key encryption further in a future tutorial.

Symmetric key encryption is the classic encryption that relies upon having a shared secret at each end. This shared secret might be a one time pad, or a code phrase book, or a set of settings for a cipher machine. Most modern encryption uses a "password" or pass phrase, which, when condensed down, is used as a set of settings for a very complex (virtual) cipher machine, implemented as software within your computer. The problem, of course, is how do you get the key to the person you want to read your message, without telling everybody in the whole world what your password is? That's where public key encryption comes into play, but we're not going there yet.

Private key encryption can be basically broken down into three categories:

  1. One Time Pads: The "key" here is a "pad" of data the exact same size as the message. It is added into the data using what's called an "exclusive or" operation, which has the property that it is reversable by merely "exclusive-or'ing" with the pad again. Thus both ends must have a rather bulky pad that was somehow obtained without anybody else seeing it, and the pad can only be used once (because otherwise, thanks to various mathematical properties of the "exclusive or" operation, the pad data can be recovered and used to decrypt all messages encoded with the pad).
  2. Stream Ciphers: These basically use a pseudo-random-number generator to create a "pseudo-one-time-pad". The key is used to seed the generator to an initial state, and then the "one time pad" is generated "on the fly" as bytes are needed. As with one-time pads, the keys cannot be re-used for a "pure" stream cipher. However, there are derivatives of the "classic" stream cipher that allow key re-use (generally using a combination of a unique but not-necessarily-secret initialization value and a "feedback" mechanism to alter the "one time pad" for each encryption).
  3. Block Ciphers: These operate upon a small block of data, generally 64 to 128 bits. Within that small block of data, bits are shuffled and folded and mixed and mashed until the result is statistically unrelated to the original input value. You can view a block cipher as a mixing machine, with its internal settings being set by the key data. The other end uses that same key data to reverse the original mixing and mashing to get back the original result.

Benefits of block ciphers

Generally, most emphasis today in private key encryption is placed upon block ciphers. This is because modern block ciphers have a number of characteristics that are useful in today's world:
  1. Resistence to 'known text' attacks: With one-time pads, if you know the plain text being encrypted, you can recover the key data. That's one reason why you must use a different pad for each encryption.
  2. No Unique Initialization Values needed: Since you don't need a different key for each encryption, you don't need to somehow generate a unique initialization value for each encryption, the way you must do with many stream ciphers.
  3. No leaking of key data: The output is statistically unrelated to the key data, for a properly designed block cipher. By contrast, the padstream is directly mathematically related to the key data in a stream cipher. A poorly implemented stream cipher using easily reversible operations to produce its pad data could thus be "backtracked" to its initial state, i.e., the key, if somehow the actual pad data is intercepted (e.g., if 'known text' was fed to the padstream generator).

Disadvantages of block ciphers

Those are only a few reasons why block ciphers have become somewhat popular recently, but there are also drawbacks to block ciphers.
  1. Block ciphers operate upon blocks of data, not upon streams of data. With a block cipher, we must chunk data before we can encrypt it. There's some data, like keystrokes being typed via a "telnet" session, that cannot be easily chunked in that manner. Thus a "stream" mode has been created that uses a block cipher to create a "one time pad", but this mode shares the same problem as most stream ciphers (i.e., must use a unique "initialization value" for each message encrypted, so that you never create the same "one time pad" for different messages).
  2. Block ciphers don't care what order you put the blocks in. Thus an attacker can snip 128 bits of data from the stream, insert it elsewhere in the stream, and it decodes to valid data. Cipher block feedback modes have been created so that if data is injected out-of-order, it not only will not decode to valid text, but all further data will similarly not decode.
  3. If you encrypt the phrase "now is the time." twice with a 128-bit block cipher, it encrypts to the exact same text. The attacker could possibly use that fact to gain unwanted insights about your data, and possibly even to compromise your network security. For example, if the same 128 bit encryption of your passphrase flows between the server and your desktop computer each time you log in, the attacker does not need to know your password -- he can merely re-transmit that 128 bits of data without knowing what it means. Using a block cipher feedback mode with a random initialization "challenge" value (provided by the server end, hopefully) will foil this.
TwoFish is a block cipher. It operates upon blocks that are 128 bits (16 characters) in size. It uses keys that are either 128, 192, or 256 bits in length. For the Python module, the block cipher machinery itself is implemented in "C". This is because Python itself is notoriously inefficient at doing bit-mixing type operations, while "C", being one step removed from pure machine language, is excellent at such operations. The various feedback modes used to either implement a stream cipher using TwoFish as the "one time pad" generator, or to "feed back" blocks into themselves in order to avoid the re-ordering and repeated-text problems mentioned above, are implemented as Python routines with "C" helper routines to do the actual low-level bit-boffing.

Feedback Modes

CBC Mode:

Cipher Block Chaining mode is a full-block feedback mode which masks its input (via the "exclusive-or" operation) using the output of the previous block's encipherment. The first mask is 128 bits of random data, which is used to mask the initial input block and which is also sent to the recipient so that it can do the same, in reverse, when deciphering the information. From then on, the output of each block enciphered is used to mask the next 128 bits of input data prior to feeding it to the encryption routine. The other end is doing the reverse in order to The net result is that
  1. Data is order-dependent. You cannot insert data at random and have it decode to something that makes sense, and
  2. The phrase "this is the time" will not encrypt to the same 128 bits of output data in any block even if your entire file consists of that phrase repeated over and over again.
This is a block mode. This means that all data must be collected into 128-bit chunks prior to encoding it. That makes this mode useful for any block-oriented transactions (such as, e.g., encoding network packets), but not useful for stream-oriented sessions such as, e.g., a telnet session or a terminal session, where bytes come and go and never in a predictable manner. On the other hand, this is by far the most powerful and cryptographically strong encryption mode, with all of the native power of a block cipher and more.

Cipher Feedback Mode (128-bit Mode)

CFB-128 mode uses the block cipher as the "one time pad" generator for a stream cipher. First, a unique 16-byte initialization value is generated. This initialization value must be different for each encipherment made with the same key, else the generated "one time pad" will be the same for multiple encipherments (meaning that your data is toast). Next, this 16 byte initialization vector is encrypted by TwoFish using your key. The results of the encryption are used as a 16-byte "pad" to exclusive-or with the next 16 bytes of data as they come in. Note that this is a one-byte-at-a-time process, the input data does not have to be gathered into chunks in order to be encrypted. The output encrypted data is collected until 16 bytes of output data have been collected. This 16 bytes is then encrypted, and the encrypted 16 bytes of data are used as the next 16 bytes of the "one time pad". This process continues until the last byte of input has been processed.

The problem with Cipher Feedback Mode is guaranteeing a unique initialization value. In most implementations that's hard to guarantee. The best I've come up with is to use a random number generator to create it, but eventually, the same initialization value will come up again. Perhaps the best solution would be to use CBC mode or public key encryption to handshake across a session key the first time the two ends of the communications connect with each other, and simply increment a counter from 1 through 2^128 during subsequent communications in order to generate initialization values. Remember, it does not matter whether the attacker knows the initialization value, as long as it is different (and thus generates a different "one time pad" each time).

Conclusions

This introduction to encryption with TwoFish has already gone on long enough. I have tried to stay fairly non-mathematical here, but there is a large body of mathematics concerned with the subject. Evaluation of symmetric ciphers requires a huge dollop of statistics in order to measure unpredictability, randomness, diffusion, etc. For public key encryption, I'd suggest that you study number theory, especially regarding math within a prime field, properties of exponentiation and logs within a prime field, prime multiplication/division/factoring, elliptic curves, and other such esoteric subjects.

If, on the other hand, you just want to use encryption... the problem here is that, without at least an intuitive grasp of the mathematics involved, it is very easy to make mistakes. I am not a mathematician. But when somebody described CFB-128 mode and asked if it were cryptographically secure (as compared to CFB-8 mode), it was immediately obvious to me that if TwoFish was a strong cipher (i.e., one whose output showed no statistical relationship to its input), every byte resulting from encrypting a unique value would be statistically unpredictable and thus suitable for use as keypad material.

If you want the minimum of math, and a maximum of applications, I suggest Bruce Schneir's "Applied Cryptography" (hit "Links" in the left margin). Unfortunately, you'll need another book to learn about stream ciphers and public key encryption -- Bruce is a block cipher man, and it shows. I will certainly accept suggestions, with the understanding that it cannot be a highly mathematical textbook -- I suspect that most people reading this page are working engineers looking for a solution, rather than academics looking for math problems. In other words, the math is necessary, but should not be the whole point of the book.


Eric Lee Green

Note that everything on this page represents my own opinions and not the opinions of my employer, SourceForge, my mother, my best friend, or my cat (well, MAYBE the cat, since we all know who the boss is around the house!).

Created by 'm4web'. Last modified: Mon Jan 31 21:10:17