Encryption software under the hood

UniCipher's encryption software uses the company's own homemade encryption algorithm to safeguard your files and private data. In this essay we'll explore this algorithm and reach the conclusion that it is far from secure; I'll also show a successful attack against it. By the end of this essay I hope you will agree that this is a definite case of "snake oil": a product sold without regard to its quality or to its ability to live up to its vendor's claims. Most of this essay is technical and deals with cryptography, yet I hope it at least gets the following point across: good cryptography is not easy to build. It is, in fact, very difficult. So whenever you have a choice between a well-known encryption algorithm and a new-home-made-super-secret algorithm, be sure to choose the former.

C.R.O.E.S. is the name of UniCipher's encryption algorithm.
This is the description you'll get from the help file:

	C.R.O.E.S. is a stream cipher. The key array is fed into the shift register
	and serves as a rotor. A second, larger rotor is generated from the key 
	prior to encryption and is also fed into the shift register during encryption.
	For those of you who study encryption, these rotors would be described  to you
	as s-boxes  that change with each use.
	
	Excellent results are achievable by using passwords of moderate length 
	(8 characters and up). The values obtained from both rotors compound 
	into the shift register  with single byte of data encrypted. Both rotors 
	are changed byte by byte with each operation  and  fed back into the shift 
	register.  The methods used to permute the rotor, the key  and  the shift 
	register can produce patternless cipher with very short keys.  Encryption 
	can further be enhanced by activating CBC mode, whereby the key (rotor 1)
	is permuted by previously encrypted bytes of data. No initialization 
	vector is created. C.R.O.E.S. is an algorithm developed in 1995 by 
	NaillonWorks programming.

Help files will get us nowhere. A plaintext encrypted with a simple XOR cipher and one encrypted with DES will both look like the same random 'garbage' in the ciphertext. To find out how good the algorithm really is, we'll have to reverse engineer it (the RC4 algorithm, for example, was published after it was reverse engineered).
I won't bother you with SoftICE breakpoints and IDA disassembly; it takes only a few hours of tracing to reach a good understanding of the algorithm. Here is its real description:

1. The C.R.O.E.S. encryption algorithm is a stream cipher.

2. The keystream is produced in blocks of 500 bytes. The first 500 keystream bytes are generated from the password; the rest of the keystream (byte 501 onward) is generated from the preceding keystream bytes.

3. During encryption/decryption there's a running key byte, hereafter referred to as keyValue, which is the cumulative sum of all keystream bytes up to the current position (modulo 0xff).
keyValue is updated after each encryption step.

4. To encrypt, the current keyValue is added to the plaintext byte (modulo 256), and the result is the current ciphertext byte.

5. To decrypt, the current keyValue is subtracted from the ciphertext byte (modulo 256), and the result is the plaintext byte.

Points 4 and 5 are the engine of the algorithm, while points 2 and 3 are its heart, and we'll explore them closely.
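Points 3 to 5 fit in a few lines. The Python sketch below covers the engine only: it takes the keystream bytes and the starting keyValue as given inputs (how C.R.O.E.S. produces them comes next), and `run_engine` is a hypothetical name. Following the description above, it reduces keyValue modulo 0xff (255) while truncating the output byte to 8 bits.

```python
def run_engine(data: bytes, keystream: list[int], key_value: int,
               decrypt: bool = False) -> bytes:
    """Points 3-5 only: combine each byte with the running keyValue.

    keystream must hold at least len(data) bytes, each in 0..254.
    """
    out = bytearray()
    for i, byte in enumerate(data):
        if decrypt:
            out.append((byte - key_value) & 0xFF)   # point 5
        else:
            out.append((byte + key_value) & 0xFF)   # point 4
        # point 3: keyValue accumulates the keystream, modulo 0xff (255)
        key_value = (key_value + keystream[i]) % 0xFF
    return bytes(out)


msg = b"attack at dawn"
ks = [(37 * i + 11) % 0xFF for i in range(len(msg))]   # arbitrary example keystream
ct = run_engine(msg, ks, key_value=42)
print(run_engine(ct, ks, key_value=42, decrypt=True))   # b'attack at dawn'
```

Because keyValue never depends on the data being processed, decryption regenerates exactly the same keyValue sequence that encryption used.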

The keystream is a 500-byte array (dwords actually, but they are treated as unsigned chars). The first keystream block is initialized from the password:

	// at start up
	seed = sum of password chars	// seed0
	key = seed
	i = 0
	// build 500bytes keystream loop
	do
		// update seed
		seed *= 0x80884505
		seed++

		// update key
		key = key + ((key*seed) >> 0x20) + password[i % passwordLength]
		// make sure key is byte size
		key = key % 0xff
		// put it in the keystream buffer
		K[i] = key

		// next byte
		i++
	loop until i == 500
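For readers who want to run the loop above, here is a minimal Python transcription. It is a sketch under assumptions, not UniCipher's code: `seed` is taken to be an unsigned 32-bit value, so `seed *= 0x80884505` wraps modulo 2^32, and `(key*seed) >> 0x20` is taken to be the high dword of an unsigned 64-bit product. The names `build_key_block0`, `MASK32`, `LCG_MUL` and `BLOCK_SIZE` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
LCG_MUL = 0x80884505
BLOCK_SIZE = 500


def build_key_block0(password: bytes) -> list[int]:
    """Generate KeyBlock0 (500 keystream bytes) from the password."""
    seed = sum(password) & MASK32          # seed0 = sum of password chars
    key = seed
    block = []
    for i in range(BLOCK_SIZE):
        # update seed; assumption: 32-bit register, wraps modulo 2^32
        seed = (seed * LCG_MUL + 1) & MASK32
        # assumption: >> 32 keeps the high dword of the 64-bit product key*seed
        key = key + ((key * seed) >> 32) + password[i % len(password)]
        key %= 0xFF                        # reduce modulo 0xff (255)
        block.append(key)
    return block


k0 = build_key_block0(b"phySics")
print(len(k0), k0[:8])
```

Python integers never overflow, so the explicit `& MASK32` is what models the assumed 32-bit register, while the product `key * seed` is exact and shifting it right by 32 keeps its high dword.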

For the record, the password string is used cyclically: each time (i % passwordLength) wraps past the last char of the password, indexing starts again at the first char.
When you encrypt a plaintext document bigger than 500 bytes, new keys have to be generated for the stream. All keystream blocks other than the first are generated during encryption, one byte per encryption step, from keyValue, which is updated after each byte is encrypted:

	// KeyBlock₀ is described above
	// KeyBlock_n is the current 500-byte keystream block used for the encryption;
	// keyValue is its value before the current step's update
	KeyBlock_(n+1)[i] = (KeyBlock_n[i] + keyValue) % 0xff

Basically, encryption is performed at the byte level. A byte is fetched from the plaintext, its value is added to the current keyValue, and the result truncated to a byte (AND 0xff) is the ciphertext byte. After each encryption step, keyValue is updated both for the next encryption step and for the next keystream block.

	// init keyValue with the last key from the first 500bytes keystream block
	keyValue = KeyBlock₀[499]
	i = 0
	n = 0
	// encryption loop
	do
		// encryption
		C[i] = (P[i] + keyValue) & 0xff

		// update keyValue for next encryption step
		keyValue = keyValue + KeyBlock_n[i%500]
		keyValue = keyValue % 0xff

		// update next keystream block as well
		KeyBlock_(n+1)[i%500] = keyValue

		i++
		// if we're out of keys, move to the next keystream block
		if i%500 == 0
			n++
		end if	
	loop until no more plaintext bytes to encrypt

Decryption is much like encryption. As this is a stream cipher, the keystream and keyValue are generated exactly as in encryption; the only difference is that keyValue is subtracted from the ciphertext byte to give back the original plaintext byte:

	// init keyValue with the last key from the first 500bytes keystream block
	keyValue = KeyBlock₀[499]
	i = 0
	n = 0
	// decryption loop
	do
		// decryption
		P[i] = (C[i] - keyValue) & 0xff

		// update keyValue for next decryption step
		keyValue = keyValue + KeyBlock_n[i%500]
		keyValue = keyValue % 0xff

		// update next keystream block as well
		KeyBlock_(n+1)[i%500] = keyValue

		i++
		// if we're out of keys, move to the next keystream block
		if i%500 == 0
			n++
		end if	
	loop until no more ciphertext bytes to decrypt
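Since the two loops differ only in the sign of one operation, one Python sketch can cover both. It contains a transcription of the KeyBlock₀ loop so that it runs on its own, with two unconfirmed assumptions: seed wraps modulo 2^32, and `>> 0x20` keeps the high dword of an unsigned 64-bit product. As a design choice of this sketch, the keystream block is updated in place: each slot is read once per 500-byte pass and only then overwritten with keyValue, so the result is the same as keeping separate KeyBlock_n and KeyBlock_(n+1) buffers. `croes` and `build_key_block0` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF
LCG_MUL = 0x80884505
BLOCK_SIZE = 500


def build_key_block0(password: bytes) -> list[int]:
    # assumptions: seed wraps modulo 2^32; >> 32 keeps the high dword of the product
    seed = sum(password) & MASK32
    key = seed
    block = []
    for i in range(BLOCK_SIZE):
        seed = (seed * LCG_MUL + 1) & MASK32
        key = (key + ((key * seed) >> 32) + password[i % len(password)]) % 0xFF
        block.append(key)
    return block


def croes(data: bytes, password: bytes, decrypt: bool = False) -> bytes:
    block = build_key_block0(password)
    key_value = block[BLOCK_SIZE - 1]        # init with KeyBlock0[499]
    out = bytearray()
    for i, byte in enumerate(data):
        if decrypt:
            out.append((byte - key_value) & 0xFF)
        else:
            out.append((byte + key_value) & 0xFF)
        j = i % BLOCK_SIZE
        # update keyValue for the next step (modulo 0xff)
        key_value = (key_value + block[j]) % 0xFF
        # slot j now holds KeyBlock_(n+1)[j]; it is next read one full block later
        block[j] = key_value
    return bytes(out)


plain = bytes(range(256)) * 5               # 1280 bytes: spans three keystream blocks
cipher = croes(plain, b"phySics")
print(croes(cipher, b"phySics", decrypt=True) == plain)   # True
```

The 1280-byte test input crosses two block boundaries, so the round trip exercises the keystream refresh as well as the first block.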

Let's keep things simple and work with a small plaintext document (much smaller than 500 bytes), so we'll only have to deal with the first keystream block. This means we can drop the 'update next keystream block' part, and we are left with the following algorithm:

	// Build KeyBlock₀ using the password as described above
	buildKeyBlock₀()

	// init keyValue with the last key from the first 500bytes keystream block
	keyValue = KeyBlock₀[499]
	i = 0
	// encryption loop - suppose plaintext document size is smaller than 500 bytes
	do
		// encryption
		C[i] = (P[i] + keyValue) & 0xff

		// update keyValue for next encryption step
		keyValue = keyValue + KeyBlock₀[i]
		keyValue = keyValue % 0xff

		i++
	loop until no more plaintext bytes to encrypt

Bear in mind that the keystream (KeyBlock₀) depends only on the password, and keyValue depends only on the previous keyValue and the keystream bytes.
Now suppose we know a few plaintext bytes (this is well within reach when you encrypt files like EXEs and ZIPs, which have headers with known constant bytes). From every plaintext-ciphertext pair we can deduce the keyValue used for the encryption. The beautiful thing is that when we have a sequence of known plaintext bytes (and hence a sequence of known keyValue bytes), we can reconstruct a sequence of keystream bytes:

	Known plaintext bytes:      P₀ , P₁ , P₂ , P₃ , P₄ , P₅
	Known ciphertext bytes:     C₀ , C₁ , C₂ , C₃ , C₄ , C₅
	Deduced keyValues:          keyValue_x = (C_x - P_x) mod 256
	                            keyValue₀ , keyValue₁ , keyValue₂ , keyValue₃ , keyValue₄ , keyValue₅
	During encryption we have:  keyValue_(x+1) = (keyValue_x + K_x) mod 0xff
	So from every two consecutive keyValue bytes we can deduce a key of the keystream:
	                            K_x = (keyValue_(x+1) - keyValue_x) mod 0xff
	                            K₀ , K₁ , K₂ , K₃ , K₄
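The deduction takes only a few lines of code. This Python sketch mirrors the six-byte example above and assumes the known plaintext lies inside the first 500-byte block at a known offset (here 10, chosen arbitrarily). It encrypts a test buffer with the simplified loop and checks that the recovered bytes match KeyBlock₀. The helper names are hypothetical, and the key-schedule transcription inside rests on unconfirmed assumptions (seed wraps modulo 2^32, `>> 0x20` keeps the high dword of the product).

```python
MASK32 = 0xFFFFFFFF
LCG_MUL = 0x80884505


def build_key_block0(password: bytes) -> list[int]:
    # assumptions: seed wraps modulo 2^32; >> 32 keeps the high dword of the product
    seed = sum(password) & MASK32
    key, block = seed, []
    for i in range(500):
        seed = (seed * LCG_MUL + 1) & MASK32
        key = (key + ((key * seed) >> 32) + password[i % len(password)]) % 0xFF
        block.append(key)
    return block


def encrypt_short(plain: bytes, password: bytes) -> bytes:
    """The simplified loop above: valid only for inputs shorter than 500 bytes."""
    k = build_key_block0(password)
    key_value, out = k[499], bytearray()
    for i, p in enumerate(plain):
        out.append((p + key_value) & 0xFF)
        key_value = (key_value + k[i]) % 0xFF
    return bytes(out)


def recover_keys(known_plain: bytes, known_cipher: bytes) -> list[int]:
    """Deduce keystream bytes from consecutive known plaintext/ciphertext pairs."""
    # keyValue never exceeds 254, so the 8-bit difference recovers it exactly
    key_values = [(c - p) & 0xFF for p, c in zip(known_plain, known_cipher)]
    # keyValue_(x+1) = (keyValue_x + K_x) mod 0xff, and every K_x is below 0xff
    return [(b - a) % 0xFF for a, b in zip(key_values, key_values[1:])]


plain = bytes(range(100))                            # any plaintext under 500 bytes
cipher = encrypt_short(plain, b"phySics")
keys = recover_keys(plain[10:16], cipher[10:16])     # six known bytes -> five keys
print(keys == build_key_block0(b"phySics")[10:15])   # True
```

Note that Python's `%` always returns a non-negative residue, which is what the subtraction needs; in C the same step would have to add 0xff before taking the remainder.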

OK, so we have a sequence of known plaintext bytes from the file (located within the first 500-byte block of the file), and from it we have built a sequence of keystream bytes. Now let's take another look at how the stream of keys is generated from the password:

	// at start up
	seed = sum of password chars	// seed0
	key = seed
	i = 0
	// build 500bytes keystream loop
	do
		// update seed
		seed *= 0x80884505
		seed++

		// update key
		key = key + ((key*seed) >> 0x20) + password[i % passwordLength]
		// make sure key is byte size
		key = key % 0xff
		// put it in the keystream buffer
		K[i] = key

		// next byte
		i++
	loop until i == 500

Analysis:

We start with seed0 = 0 and work upward. For every pair of consecutive keys we deduced, we reconstruct the seed value (it depends only on seed0 and the position in the file) and calculate the password char. If we get a typeable char, we move on to the next key pair; otherwise we try the next seed0, since we know this can't be the correct password.
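Written out as equations, the step the analysis relies on is the key update solved for the password char. In this sketch S_i is our own notation for the value of `seed` after i updates (so S_0 is seed0), K_i is keystream byte i, and L is the password length; the 32-bit wraparound of seed and the unsigned 64-bit product are assumptions, not confirmed details of UniCipher's code. Because `seed` is updated at the top of the loop, K_i is computed with S_(i+1), and for i ≥ 1 the incoming key is the already reduced K_(i-1):

```latex
\begin{aligned}
S_0 &= \textstyle\sum_{j} \mathrm{pw}[j], \qquad
S_{i+1} = \left(S_i \cdot \mathtt{0x80884505} + 1\right) \bmod 2^{32} \\
K_i &= \left(K_{i-1} + \left\lfloor \frac{K_{i-1}\, S_{i+1}}{2^{32}} \right\rfloor + \mathrm{pw}[i \bmod L]\right) \bmod 255, \qquad i \ge 1 \\
\Longrightarrow\quad \mathrm{pw}[i \bmod L] &= \left(K_i - K_{i-1} - \left\lfloor \frac{K_{i-1}\, S_{i+1}}{2^{32}} \right\rfloor\right) \bmod 255
\end{aligned}
```

The last step holds because a typeable char is below 255, so the residue is the char itself. Every guess of S_0 fixes all the S_i, so each consecutive key pair yields exactly one candidate char.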
Here is the attack formally:

Bottom line: from every 2 keys we can deduce, we "guess" seed0 and calculate the password char. If the password char is not typeable, we try a different seed0. If we have a stream of keys and each pair produces a printable char, the odds are we have just found the encryption password. If we try a seed0 other than the correct one, the test is very likely to yield an untypeable password char in at least one of the key pairs.
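A minimal Python sketch of this search follows. It is not the author's attack program. It assumes the keystream bytes K[offset], K[offset+1], ... have already been deduced from known plaintext as described above, that "typeable" means printable ASCII (0x20 to 0x7E), and that seed is a 32-bit value wrapping modulo 2^32. `chars_for_seed0`, `guess_password` and `MAX_SEED0` are hypothetical names; the demo generates the keystream directly instead of recovering it from a file.

```python
MASK32 = 0xFFFFFFFF
LCG_MUL = 0x80884505
MAX_SEED0 = 2500     # upper bound on seed0 (the sum of the password chars)


def lcg(seed: int) -> int:
    # assumption: seed is an unsigned 32-bit value that wraps modulo 2^32
    return (seed * LCG_MUL + 1) & MASK32


def chars_for_seed0(keys: list[int], offset: int, seed0: int):
    """keys[j] is K[offset + j]. Return the password chars for positions
    offset+1 .. offset+len(keys)-1, or None if any of them is untypeable."""
    seed = seed0 & MASK32
    for _ in range(offset + 1):        # the seed used to generate K[offset]
        seed = lcg(seed)
    chars = bytearray()
    for prev, cur in zip(keys, keys[1:]):
        seed = lcg(seed)               # the seed used to generate cur
        # invert key = (prev + hi32(prev*seed) + char) % 0xff for char
        c = (cur - prev - ((prev * seed) >> 32)) % 0xFF
        if not 0x20 <= c <= 0x7E:      # assumption: typeable = printable ASCII
            return None
        chars.append(c)
    return bytes(chars)


def guess_password(keys: list[int], offset: int):
    """Try every seed0 from 0 to MAX_SEED0 and yield the (seed0, chars) survivors."""
    for seed0 in range(MAX_SEED0 + 1):
        chars = chars_for_seed0(keys, offset, seed0)
        if chars is not None:
            yield seed0, chars


def build_key_block0(password: bytes) -> list[int]:
    """Demo only: stands in for keystream bytes recovered from known plaintext."""
    seed = sum(password) & MASK32
    key, block = seed, []
    for i in range(500):
        seed = lcg(seed)
        key = (key + ((key * seed) >> 32) + password[i % len(password)]) % 0xFF
        block.append(key)
    return block


k0 = build_key_block0(b"phySics")
for seed0, chars in guess_password(k0[100:112], offset=100):
    print(seed0, chars.decode("ascii"))
```

For the correct guess, seed0 is 739 (the sum of the chars of "phySics") and the chars read "SicsphySics": the password repeated and rotated, since the first recovered char sits at position 101, and 101 mod 7 = 3.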

The MS-DOS header at the start of every PE file has a 20-byte field (e_res2, at offset 0x28) marked as reserved, i.e. all null. This means that for every EXE you encrypt with this stream algorithm, you know a sequence of 20 plaintext bytes. That's a lot!
A password as long as 19 chars can be recovered, so the "moderate length (8 characters and up)" recommended in the help file is no obstacle at all. If you set the maximum seed0 value to 2500 (a little above the sum of 20 'z' chars, 20 × 122 = 2440), the full attack takes only a couple of seconds.
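To tie the pieces together, here is an end-to-end Python sketch of the known-plaintext attack on an encrypted EXE. The 64-byte header is a hypothetical stand-in for a real MS-DOS header; only its 20 zero bytes at offset 0x28 are treated as known. The file is encrypted with the simplified first-block loop (valid because it is shorter than 500 bytes), and the same unconfirmed assumptions apply: seed wraps modulo 2^32, `>> 0x20` keeps the high dword of the product, and "typeable" means printable ASCII. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
LCG_MUL = 0x80884505
E_RES2 = 0x28    # offset of the 20-byte reserved field (e_res2) in the MS-DOS header


def lcg(seed: int) -> int:
    # assumption: seed is an unsigned 32-bit value that wraps modulo 2^32
    return (seed * LCG_MUL + 1) & MASK32


def build_key_block0(password: bytes) -> list[int]:
    seed = sum(password) & MASK32                     # seed0
    key, block = seed, []
    for i in range(500):
        seed = lcg(seed)
        # assumption: >> 32 keeps the high dword of an unsigned 64-bit product
        key = (key + ((key * seed) >> 32) + password[i % len(password)]) % 0xFF
        block.append(key)
    return block


def encrypt_short(plain: bytes, password: bytes) -> bytes:
    """Simplified first-block encryption; valid for inputs under 500 bytes."""
    k = build_key_block0(password)
    key_value, out = k[499], bytearray()
    for i, p in enumerate(plain):
        out.append((p + key_value) & 0xFF)
        key_value = (key_value + k[i]) % 0xFF
    return bytes(out)


def attack(cipher: bytes, max_seed0: int = 2500):
    """Yield (seed0, chars) candidates using only the all-zero e_res2 bytes."""
    # the plaintext there is zero, so each ciphertext byte equals keyValue
    kv = list(cipher[E_RES2:E_RES2 + 20])
    keys = [(b - a) % 0xFF for a, b in zip(kv, kv[1:])]   # K[0x28] .. K[0x3A]
    for seed0 in range(max_seed0 + 1):
        seed = seed0
        for _ in range(E_RES2 + 1):                # seed used to generate K[0x28]
            seed = lcg(seed)
        chars = bytearray()
        for prev, cur in zip(keys, keys[1:]):
            seed = lcg(seed)                       # seed used to generate cur
            c = (cur - prev - ((prev * seed) >> 32)) % 0xFF
            if not 0x20 <= c <= 0x7E:              # untypeable: wrong seed0
                break
            chars.append(c)
        else:
            yield seed0, bytes(chars)


# hypothetical 64-byte header: only the zeros at 0x28..0x3B matter to the attack
header = b"MZ" + bytes(range(2, E_RES2)) + bytes(20) + b"\x80\x00\x00\x00"
cipher = encrypt_short(header, b"phySics")
for seed0, chars in attack(cipher):
    print(seed0, chars.decode("ascii"))
```

With this header and the password "phySics", the candidates include seed0 739 with "sphySicsphySicsphy": 18 chars from 19 keystream bytes, the password repeated and rotated because the first recovered char belongs to position 0x29.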
Here's a sample output of the attack program. The password I used was "phySics".

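The recovered password comes out rotated, but that's expected (the first recovered char belongs to whichever password position lines up with the known bytes), and it doesn't take long to spot the correct password.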

Source code: I've included the source code of my cryptanalytic attack on the UniCipher algorithm. You are welcome to experiment with it. You can find it here.

Snake Oil FAQ: Please have a look at the snake-oil FAQ ("Snake Oil Warning Signs: Encryption Software to Avoid"), written in 1996. It's surprising how many of the warning signs listed there show up in the UniCipher case. The sad truth is that there are more such cases of poor cryptography being sold. I personally encourage everyone to study cryptography and try to invent algorithms (it's great fun), but please, don't sell such things.

Disclaimer: This essay and the attack program source code are provided for educational purposes only. Any use, misuse or illegal activity is the sole responsibility of the reader.

Take care,
The+Q

(you can reach me at qster@oldleetos.net)