[Essay00 - even the steel chain has it's weak link] Published by tsehp Sept 2000.

A $19 shareware program with an extremely messy interface, limited use (there exists better freeware tools), but an interesting protection.

In this essay, I will not give a complete walkthrough of the program with tons of addresses that are useless to people with a different version. I do not know whether older versions use the same scheme, nor if future versions will.

Furthermore, I will only discuss a specific part of the protection, namely the encrypted code block. If you are not able to find this encrypted code block, or the other parts of the protection, without help, I suspect that this essay might be over your head.

IDA 4.0.4 - http://www.datarescue.com
GNU GCC, djgpp distribution
Windows calculator, of course in scientific mode
Hexit - http://mklasson.cjb.net
I didn't use softice, as I have only been working on dead listings.

You may choose your own tools. Especially on the topic of hex editors, a Jihad seems to be raging. Who cares, as long as your editor of choice has the necessary functionality. If you're setting out to make a keygen, I recommend you to fetch a copy of softice. Hm, this shouldn't really be necessary to write here, as I believe most of you will have it installed [running] already.

It would be fair to say I was brought in on this project from the outside, by my good friend KaKTuZ, who had a little trouble with the bruteforcing part of the protection. He had done all the bothersome work of figuring out the program logic, ripping out code and tables, and converting assembly dump to C source code. This made my work much easier. Had I been hadnling this protection myself, I would have used softice to help identify the interesting parts of the code, rather than focusing 100% on the bruteforcing.

Before reading on, I advise you to take a look at the target yourself, as this will give the deepest satisfaction when you make the final breakthrough. Of course I cannot stop you from reading on and spoiling all the fun. I do suggest you to have a look first, though, as it will make the rest of the essay so much clearer (if the target still uses the same protection ;-).

Good, you have returned (or perhaps you never left). You should have reached the following conclusion (this is only part of it, as I only concentrate on part of the protection). The name/serial combination is used to form a 5-byte decryption key. This key is, oddly enough, used to decrypt part of the code (a 220byte area in my case).

After the code has been decrypted, an industry-standard CRC32 checksum is computed, and used to verify that the code has indeed been correctly decrypted (meaning the user entered a valid name/serial combination). Note, however, that a CRC32 isn't 100% reliable. During the tests of my bruteforce keymaker, at least two wrong keys were found that produced a correct CRC32. The chances of this happening by the user entering a incorrect name/serial combination are minimal, though.

If you have just come back to the essay after failing at writing a bruteforcer, I must congratulate you on your interest in the matter, and your desire to stand on your own legs. You should have realized that bruteforcing five bytes isn't an easy task, with the amount of operations that need to be done here. My semi-optimized code would probably take about 50-60 days to search the entire 2^40 keyspace that five bytes give.

What we need to do is reducing the keyspace. One way is to do some advanced mathematical stuff involving the CRC32 algorithm, but I am not at a sufficient mathematical level to do that. I will instead present an easier way: known-plaintext attack.

Of course we don't really have any known plaintext (unless of course you have access to a valid user/serial combination), but we can make some qualified guesses. My first guesses were that we'd have either some standard C prologue code (push ebp / mov ebp, esp), or a ret at he end of the code block. This turned out to be a wrong assumption.

Had I taken a closer look at the target, rather than spending my waking hours on trying to optimized the CRC32 function and the decryption function, I would probably have thought of the [now obvious] bytes I should search for in the known-plaintext approach. However, I had gotten my mind locked too tightly on the idea that if I optimized enough, I wouldn't have to do a cryptographical attack, and it was the good duelist that opened my eyes so I gave the known-plaintext another shot...

If you look at winsin.exe in an IDA disassembly, you will notice that the encrypted code is only referenced directly from one location, and at this location you will see a jump the encrypted code. Just between the jump and the encrypted code, there are two dword values. The most interesting of those two dword values is 0xFCFDFEFF. Why? Because it appears at another location in the program. Right after the 220 bytes of encrypted code. Not that the other dword isn't interesting... it's the CRC32 value of the decrypted code.

Stop reading now. Think a bit on the information I have given you above. Try to visualize the implications this brings...

If you have found the key by now, I congratulate you. Otherwise, read on and smile at the beauty of all this.

It seems interesting with those two DWORD values, doesn't it? After all, if you view them as instructions, you will see that it's garbage. And the dword value is right after the encrypted code block. Do you suppose the decrypted code block has a ret somewhere in it? Or perhaps... yes, you should have come to the conclusion that the decrypted code will jump over the dword value.

What can we deduct from this? The last few bytes of the code block must be a jump opcode. This is speculation, yes, but a rather good guess.

I hope you're not asking yourself "now why is this so great?". If so, you should have a big glass of whiskey and relax. Get your mind in the special non-focused-but-focused-anyhow mode, and think. Why of course, it means we have made two bytes of the 5byte key constant - removed 16 bits from the 40 bit key - now we don't have a 2^40 keyspace, but only a 2^24 keyspace! With my semi-optimized decryption and table-driven CRC32 functions, my athlon700 is able to search this space (0x1000000 iterations) in about 62 seconds - which is a lot shorter time than the 2^40 range takes ;-). And I'll bet you a box of sardines that you'll find the correct key before you've searched the entire space.

Now, assuming that my previous assumptions are true :-), what jump is used? Most like the short one - EB xx. We guess that 4 bytes are skipped (the magic dword), which would require an EB 04 jump opcode. After all, if you disassemble from right after the second magic dword, sensible code is revealed.

The last two bytes of the [encrypted] code are E3 followed by D1. We assume those two bytes should be EB 04 when decrypted. Luckily, the XOR function is very easy to reverse:

E3 XOR ?? = EB <=> ?? = E3 XOR EB <=> ?? = 08

D1 XOR ?? = 04 <=> ?? = D1 XOR 04 <=> ?? = D5

This means that, when decrypting the last two bytes, the indices in "serial" that are used for decryption, must be 08 D5. Since the serial is modified during the decryption loop, we also have to reverse the decryption function, to find what values we should place in the key before starting.

First, we should know which key positions are used to decrypt the last two bytes. We know the key is five bytes. The decryption algorithm starts by using the first entry in the key (key[0]), then increments the key index. When the fifth key index (key[4]) has been used, the first key index is reused, and so on. Thus, the coherence between code index i and table index j can be defined as `j = i modulus 5'.

Since we're working 0-based, second-last entry is 218 and last entry is 219. 218 % 5 = 3, 219 % 5 = 4 (duh!). This could hardly be better! It means the last two bytes of the serial can be kept constant, while we're iterating through the three first. To remove the number of compares (and thus slow conditional jumps) in the code, you can treat the first four bytes of the code as an integer, with a little careful handling. I will leave that up to you. Now, how to determine what values to put in the last two key positions? After all, the key loop adds one to key[j] after using the value of key[j] for the XOR operation.

Well. We are proceeding mathematically here. You could easily make a little "first-part" bruteforcer to run through the 65536 possible combinations to see which two bytes would yield the right result after being run through the decryption function. But that would not be very sophisticated approach.

Let's think. When reaching index 218 in the encrypted code, key[3] must have been incremented (218/5) = 43 times. Remember to round down and not up :-). Same result for key[4]. Thus,

'key[3] = 0x08 -
43'

and 'key[4] = 0xD5 - 43'. What a pity that the magic subtract number wasn't 42...

As this is my first bruteforcer, I made a number of mistakes on this journery. The first mistake was to concentrate so much on optimization, rather than subject the protection to a cryptoanalytic assault (a field where I am too `green' for my liking.) As for the known-plaintext (or `wild-guess') approach, I had a couple of unsuccessful tries at first, with the "C prologue" and "it must end with a ret" assumptions. As for "interesting side notes", KaKTuZ had in his original version inserted the encrypted bytes as an array of dwords rather than an array of bytes. Probably due to late night cracking, eh my dear KaKTuZ? :-). This turned out to be a rather good thing: the versions of the bruteforcer working with the array of dwords was acutally a lot faster than the byte-array ones. Another thing that amazed me a lot was that, for this program, GCC turned out to provide faster code than Micro$oft's Visual C++, probably because GCC dares to perform non-downward compatible optimizations when you use the "-march" switch. This for example means that it uses the CMOV instruction rather than construction slow conditonal jump code.

And thus, this essay is finished. If you want to make a keygen for this application, I wish you luck with reversing the necesarry parts of the application. If you understood this essay, programmed a bruteforcer, and found the 5byte key, this shouldn't be too hard for you. If not...well, I suggest you to either try on, or try [on] another target.