Menu:
Home
News
Software
Documentation
Links
EMAIL Me!
|
Some Handy Documentation
- About Ciphers and Encryption: A Beginner's Tutorial
- Using twofish.py for Work and Entertainment
Note: This is under construction. In the meantime, the tutorial starts
below:
About Ciphers and Encryption: A Beginner's Tutorial
By Eric Lee Green
Encryption is the process of scrambling data in a reversible manner,
such that the recipient cannot un-scramble it without one or more
"keys". The goal is to keep information private from those who would
intercept it during transmission and use it for their own ulterior
purposes.
There are two basic types of encryption, based on the kind of keys needed to
scramble and unscramble the data:
- Symmetric (shared private key) encryption, and
- Asymetric (public key) encryption.
Public key encryption relies upon each side of the communication
having two keys -- a public (shared) key, and a mathematically-related
private key. Thus there is no need for a shared private key, and no
problem with communicating such a key. The downside is extremely poor
performance, since public key encryption involves mathematical
operations on very large numbers (generally at least 512 bit
exponentiation and multiplication operations, and often much more). I
will discuss public key encryption further in a future tutorial.
Symmetric key encryption is the classic encryption that relies upon
having a shared secret at each end. This shared secret might be a one
time pad, or a code phrase book, or a set of settings for a cipher
machine. Most modern encryption uses a "password" or pass phrase,
which, when condensed down, is used as a set of settings for a very
complex (virtual) cipher machine, implemented as software within your
computer. The problem, of course, is how do you get the key to the person
you want to read your message, without telling everybody in the whole world
what your password is? That's where public key encryption comes into play,
but we're not going there yet.
Private key encryption can be basically broken down into three categories:
- One Time Pads: The "key" here is a "pad" of data the exact same
size as the message. It is added into the data using what's called an
"exclusive or" operation, which has the property that it is reversable
by merely "exclusive-or'ing" with the pad again. Thus both ends must
have a rather bulky pad that was somehow obtained without anybody else
seeing it, and the pad can only be used once (because otherwise,
thanks to various mathematical properties of the "exclusive or"
operation, the pad data can be recovered and used to decrypt all
messages encoded with the pad).
- Stream Ciphers: These basically
use a pseudo-random-number generator to create a
"pseudo-one-time-pad". The key is used to seed the generator to an
initial state, and then the "one time pad" is generated "on the fly"
as bytes are needed. As with one-time pads, the keys cannot be re-used
for a "pure" stream cipher. However, there are derivatives of the
"classic" stream cipher that allow key re-use (generally using a
combination of a unique but not-necessarily-secret initialization
value and a "feedback" mechanism to alter the "one time pad" for each
encryption).
- Block Ciphers: These operate upon a small block of data, generally
64 to 128 bits. Within that small block of data, bits are shuffled and folded
and mixed and mashed until the result is statistically unrelated to the
original input value. You can view a block cipher as a mixing machine, with
its internal settings being set by the key data. The other end uses that
same key data to reverse the original mixing and mashing to get back the
original result.
Benefits of block ciphers
Generally, most emphasis today in private key encryption is placed upon
block ciphers. This is because modern block ciphers have a number of
characteristics that are useful in today's world:
- Resistence to 'known text' attacks: With one-time pads, if you
know the plain text being encrypted, you can recover the key
data. That's one reason why you must use a different pad for each
encryption.
- No Unique Initialization Values needed: Since you don't need a different
key for each encryption, you don't need to somehow generate a unique
initialization value for each encryption, the way you must do with many
stream ciphers.
- No leaking of key data: The output is statistically unrelated to the
key data, for a properly designed block cipher. By contrast, the padstream
is directly mathematically related to the key data in a stream cipher.
A poorly implemented stream cipher using easily reversible operations
to produce its pad data could thus be "backtracked" to its initial state,
i.e., the key, if somehow the actual pad data is intercepted (e.g., if
'known text' was fed to the padstream generator).
Disadvantages of block ciphers
Those are only a few reasons why block ciphers have become somewhat
popular recently, but there are also drawbacks to block ciphers.
- Block ciphers operate upon blocks of data, not upon streams of data.
With a block cipher, we must chunk data before we can encrypt
it. There's some data, like keystrokes being typed via a "telnet"
session, that cannot be easily chunked in that manner. Thus a "stream" mode
has been created that uses a block cipher to create a "one time pad", but
this mode shares the same problem as most stream ciphers (i.e., must
use a unique "initialization value" for each message encrypted, so that
you never create the same "one time pad" for different messages).
- Block ciphers don't care what order you put the blocks in. Thus an
attacker can snip 128 bits of data from the stream, insert it elsewhere in
the stream, and it decodes to valid data. Cipher block feedback modes have
been created so that if data is injected out-of-order, it not only will
not decode to valid text, but all further data will similarly not
decode.
- If you encrypt the phrase "now is the time." twice with a 128-bit
block cipher, it encrypts to the exact same text. The attacker could
possibly use that fact to gain unwanted insights about your data, and
possibly even to compromise your network security. For example,
if the same 128 bit encryption of your passphrase flows between the server
and your desktop computer each time you log in, the attacker does not need to
know your password -- he can merely re-transmit that 128 bits of data without
knowing what it means. Using a block cipher feedback mode with a random
initialization "challenge" value (provided by the server end, hopefully) will
foil this.
TwoFish is a block cipher. It operates upon blocks that are 128 bits
(16 characters) in size. It uses keys that are either 128, 192, or 256
bits in length. For the Python module, the block cipher machinery
itself is implemented in "C". This is because Python itself is
notoriously inefficient at doing bit-mixing type operations, while
"C", being one step removed from pure machine language, is excellent
at such operations. The various feedback modes used to either
implement a stream cipher using TwoFish as the "one time pad"
generator, or to "feed back" blocks into themselves in order to avoid
the re-ordering and repeated-text problems mentioned above, are
implemented as Python routines with "C" helper routines to do the
actual low-level bit-boffing.
Feedback Modes
CBC Mode:
Cipher Block Chaining mode is a full-block feedback mode which
masks its input (via the "exclusive-or" operation) using the output of
the previous block's encipherment. The first mask is 128 bits of
random data, which is used to mask the initial input block and which is
also sent to the recipient so that it can do the same, in reverse, when
deciphering the information. From then on,
the output of each block enciphered is used to mask the next 128 bits of
input data prior to feeding it to the encryption routine. The other end is
doing the reverse in order to The net result
is that
- Data is order-dependent. You cannot insert data at random and have it
decode to something that makes sense, and
- The phrase "this is the time" will not encrypt to the same 128
bits of output data in any block
even if your entire file consists of that phrase
repeated over and over again.
This is a block mode. This means that all data must be collected into 128-bit
chunks prior to encoding it. That makes this mode useful for any block-oriented
transactions (such as, e.g., encoding network packets), but not useful for
stream-oriented sessions such as, e.g., a telnet session or a terminal
session, where bytes come and go and never in a predictable manner.
On the other hand, this is by far the most powerful and cryptographically
strong encryption mode, with all of the native power of a block cipher
and more.
Cipher Feedback Mode (128-bit Mode)
CFB-128 mode uses the block cipher as the "one time pad" generator for a
stream cipher. First, a unique 16-byte initialization value is
generated. This initialization value must be different for each
encipherment made with the same key, else the generated "one time pad"
will be the same for multiple encipherments (meaning that your data is
toast). Next, this 16 byte initialization vector is encrypted by
TwoFish using your key. The results of the encryption are used as a
16-byte "pad" to exclusive-or with the next 16 bytes of data as they
come in. Note that this is a one-byte-at-a-time process, the input
data does not have to be gathered into chunks in order to be
encrypted. The output encrypted data is collected until 16 bytes of
output data have been collected. This 16 bytes is then encrypted, and the
encrypted 16 bytes of data are used as the next 16 bytes of the "one time
pad". This process continues until the last byte of input has been processed.
The problem with Cipher Feedback Mode is guaranteeing a unique
initialization value. In most implementations that's hard to
guarantee. The best I've come up with is to use a random number
generator to create it, but eventually, the same initialization value
will come up again. Perhaps the best solution would be to use CBC
mode or public key encryption to handshake across a session key the
first time the two ends of the communications connect with each other,
and simply increment a counter from 1 through 2^128 during subsequent
communications in order to generate initialization values. Remember,
it does not matter whether the attacker knows the initialization
value, as long as it is different (and thus generates a different "one
time pad" each time).
Conclusions
This introduction to encryption with TwoFish has already gone on long
enough. I have tried to stay fairly non-mathematical here, but there
is a large body of mathematics concerned with the subject. Evaluation
of symmetric ciphers requires a huge dollop of statistics in order to
measure unpredictability, randomness, diffusion, etc. For public key
encryption, I'd suggest that you study number theory, especially
regarding math within a prime field, properties of exponentiation and
logs within a prime field, prime multiplication/division/factoring, elliptic
curves, and other such esoteric subjects.
If, on the other hand, you just want to use encryption... the problem
here is that, without at least an intuitive grasp of the mathematics
involved, it is very easy to make mistakes. I am not a
mathematician. But when somebody described CFB-128 mode and asked if
it were cryptographically secure (as compared to CFB-8 mode), it was
immediately obvious to me that if TwoFish was a strong cipher (i.e.,
one whose output showed no statistical relationship to its input),
every byte resulting from encrypting a unique value would be
statistically unpredictable and thus suitable for use as keypad
material.
If you want the minimum of math, and a maximum of applications, I suggest
Bruce Schneir's "Applied Cryptography" (hit "Links" in the left margin).
Unfortunately, you'll need another book to learn about stream ciphers
and public key encryption -- Bruce is a block cipher man, and it shows.
I will certainly accept suggestions, with the understanding that it cannot
be a highly mathematical textbook -- I suspect that most people reading
this page are working engineers looking for a solution, rather than
academics looking for math problems. In other words, the math is necessary,
but should not be the whole point of the book.
|