Interactive Disassembler Pro v3.7 Demo (II)
(How to load the previous databases)


by Quine
(30 October 1997)


Advanced cracking series
Courtesy of Fravia's page of reverse engineering

Well, this is SERIOUS ADVANCED CRACKING once more. Once more a fundamental tool of the trade (IDA). Once more a function re-enabling job (the loading of previous databases, one of the most important functions crippled in the demo version: you do not want to start everything anew every time you use IDA, do you?). Once more something we all need: new knowledge that you can immediately apply to other targets and reverse engineering endeavours.
Quine is getting us used to this kind of well-crafted essay. I'm afraid newbyes will not understand much here: please read the 'basic' essays first, and peruse the other +HCU page (where you'll find a lot of help for newbyes) before delving into this.
For all the other Fravias, some considerations:
1) Read it twice (at least): what you find daunting at the first quick glance will be understandable on the second pass and easy to follow on the third. In my opinion, if you read this at least a couple of times you will not even need a crippled IDA target on your HD to follow the path;
2) Note how all this is built on, and builds upon, the contributions of others. Quine is an extraordinary Fravia and a very good teacher, witty and deep, whose mighty essays, unlike some others :-(, never forget to credit all the contributions to the crumb trail he has followed and left behind;
3) I believe it is time to tackle a VITAL sector: compiler specifics.
Admittedly, what we are doing is not very professional. This has some advantages (only non-programmers can see the non-obvious) and some disadvantages (we find call relocation tables and don't even know that they are called vtables :-)
Time to correct the disadvantages while keeping the advantages. Let's begin to work on 'compiler-specific differences', or whatever you want to call what we should understand perfectly if we really want to migrate from 'good dilettante' to 'elite' reverse engineers.

This said, here you have a real reverse engineering essay in all its glory... enjoy!

Target:

Interactive Disassembler Pro v3.7 Demo
We're going to enable the loading of saved databases.


Source:
                                
http://www.datarescue.com/ida.htm (homepage)
http://www.datarescue.com/ida/demo37.zip (9,884,100 bytes)
 or, rather, ftp://195.0.122.253/pub/ida/375evl.zip (thank Ed!) 
read on for more web resources :-)

Tools used:

IDA Pro 3.7 itself (nothing comes close to it)
SoftICE for NT v3.2 (that new video driver is amazing)
(if you can't get Pinnacle's version to install, change the byte at
2B6C in setup.ins from 2B to B8)
HexWorkshop32 (for patching - any hex editor will do)

In this essay, I will assume that you are familiar with my previous
essay here on Fravia+'s site about cracking the file size limit on the
IDA Pro demo.  I am also assuming that you have ida.wll loaded into
IDA Pro.

Ok, the first thing to do is to find the place where it puts up the
message that it can't load old databases.  Our previous work on the
file size limit suggests that this message will be in ida.hlp, which
it is.  Using the method I outlined in that article, compute the index
number for the help message and use IDA's 'search for immediate value'
function to find the place where 36Eh is moved into a register or
pushed on the stack.  Sure enough, we find it early on at 403520.
This routine is not called directly anywhere in the program, but IDA very
helpfully tells us that the value 403520 is referenced at 403D3F
(actually it will give you the starting address of the function it
occurs in and an offset, but I will usually translate that into an
address in this article).  Follow the hyperlink and we find ourselves
in a familiar place:  just past the explicit file size check and
before the rather vulgar expiration date check (now would be a good
time to patch it, before you forget: Jan 1st isn't that far away).
403520 is moved into ebx, then we get the expiration check, then call
ebx.  But look, depending on the value of esi at 403CDE, either 403520
or 40352C is moved into ebx.  Let's jump back to 40352C and see what's
going on there.

40352C calls two functions, one indirectly, and returns.  A little
thought will tell you that this routine must be called when a new file
(i.e., a file to be disassembled rather than a saved database) is
loaded into IDA.  Why?  Well, we ought to assume that the expiration
date is always checked and therefore, since execution follows
immediately to the call ebx and we don't get the message about loading
old databases when we load new files (obviously), it must call 40352C
(this can be verified in SoftICE if you feel like it).  Therefore, the
value of esi at 403CDE must indicate whether or not we've got a new
file or a saved database.

This, unfortunately, is the point at which pedagogy must depart from
actual practice, because explaining everything I tried at this point
would take far too long and furthermore I can't even remember
everything I did that might be significant.  Instead, I will attempt a
rational reconstruction of the process and try to cover the major
points of interest.  Remember, though, that this crack required a lot
of tedious poring over code and slowly but surely putting
together a rather detailed picture of everything that IDA does between
starting up and displaying the message saying that it can't load old
databases.  I could not have hoped to put such a picture of IDA
together without IDA itself.  The commenting and renaming features and
all the other features that make it a truly interactive disassembler
(unlike w32dasm which is basically a text viewer) are what saved me
hours of scratching down notes and trying to remember where I had been
and what functions did what.

Enough of that and on with the crack.  There are two reasonable things
to do here.  One is to trace back the value of esi and see how it gets
set.  The other is to simply force the code to jump to 40352C no
matter what.  Let's try the second first, since it sounds like more fun.  Fire
up the hex editor and change the byte at file offset 3340h from 20h
to 2Ch.  Start up IDA, load a saved database and see what happens.
Well, what happens is that it crashes at location 44D2BB trying to
access memory at 00000064.  That's no good.  No memory access down there,
not even for humble console applications.  Load up SoftICE and try it again.
Now when it crashes, we'll be able to look at what's going on.  It turns
out that the function in which it crashes got a null pointer from
somewhere, because the offending instruction is mov edi, [eax+64h] and
eax is 0.  This is bad news for us because there are any number of
ways that that pointer could be set.  Also, patching the code to jump
to 40352C could have introduced further problems.  This is a tough
position to be in when cracking a target.  So, let's sit back and
evaluate the situation and try to gather everything we know about
IDA's start up code so far.

First, how perceptive were you when you loaded your old database?
I'll tell you that I (stupidly) wasn't very perceptive at all for
quite some time.  Part of the problem is that I was using as a test
database one that I had created from the tiny hello.exe sample program
included in IDA.  The fact that this program is small means that it
disassembles quickly and produces a small database (which is why I
chose it).  With a bigger database what I'm about to point out is much
more obvious.  Look at the screen when you load the old database.  IDA
allocates memory for the database, it unpacks the file, it compiles
the default macro (ida.idc) and at least begins to execute it.
Furthermore, it does this whether you've patched the program or not.
Only after having done all this does it display the message box/crash.
What does this tell us?  Well, it tells us that Mr. Guilfanov did not
remove large sections of the code that have to do with loading saved
files and furthermore that that code actually executes.  However,
there's still something that needs to be done to get the database
loaded.  Here's the crucial point:  INSTEAD OF DOING THAT EXTRA THING,
IDA CALLS THE ROUTINE THAT DISPLAYS THE MSG BOX.  In other words, the
call to 403520 needs to be replaced with a call to a function that
works the missing magic.  I wouldn't expect anyone to figure out this
last point exactly without more work, because it certainly took me
forever to figure it out, but it does seem rather obvious in
hindsight.  Of course, we still don't know what the missing magic is
or what function does it.  We do know that 40352C doesn't do it.
Also, having 403520 simply return instantly doesn't work either.

Now, let's get back into SoftICE and learn a little more about our
crashing patched program.  When the crash puts you into SoftICE, we're
going to walk the stack back as far as we can go and find out where that
null pointer comes from (see my previous article on IDA for a
description of my particular stack walking technique).

44D2BB is in sub_44D2AC, which was called at 417CC4 in sub_417CA4.  eax
came into sub_44D2AC live, and it was assigned a value just before the
call in sub_417CA4 by the following instruction: mov eax, [ebx+0Eh].
Great.  Another pointer.  What are these pointers that have immediate
values added to them?  

Brief digression about the importance of understanding the compiler
When I ask this question, I am asking "What is the program's author
doing here that causes the compiler to generate such commands?".  This
is THE SINGLE MOST IMPORTANT QUESTION a reverse engineer can ask
him/herself when dealing with compiled code (which is almost always
unless you're in a library routine where you shouldn't be in the first
place).  No person wrote the code you are looking at.  Who would write
the following?
0041809A mov     edi, edx
0041809C mov     ebx, eax
0041809E mov     edx, edi
004180A0 mov     eax, ebx
Only an idiot or a compiler.  This is taken straight out of ida.wll
which was compiled by Borland C++ 5.01 with the optimization level set
to maximize speed of execution (I know because I have the
makefile; read on :-).  Compilers just have their ways of doing
things and it is very helpful to figure out just what those ways are.
End digression.

They are pointers into structures whenever the immediate value is a
small member offset rather than an address.  Here's the situation.  You
pass a pointer or a reference as an argument to a function and that
function is then able to get at the members of the structure by adding
fixed offsets to the pointer that was passed.  Keep in mind, also, that
the same goes for C++ classes (but they have added complexities that
I'll get into in a moment).  So, in the struct/class based at [ebx] in
sub_417CA4 there is a member at offset 0Eh which should point to
another struct/class with a member at offset 64h, but that pointer is
00000000h and we don't want it to be.  On with the tracing!

sub_417CA4 is called at 417B89, which is in sub_417B50.  We see that
again in this function ebx is used to hold the pointer to the struct.
(Since writing my last article, I have read that most Win32 compilers
use ebx, edi, and esi as holders for enregistered local variables).
Anyway, it came in through eax from a call at 41E941 in sub_41E934.
Here's where we get our first break.  eax is assigned the address of
export _637.  Now, let's dump the memory at _637 and see what we get.
What we get are mostly zeros and FFFFh at one point.  And of course at
_637+0Eh we get zeros.  However, significant progress has been made.
We can safely assume that the struct/class at _637, which must have been
declared at global scope in the program (only globals can be exported),
isn't filled properly.  Furthermore, this must be a fairly
significant struct or else it wouldn't be exported.  Before we get too
excited, though, let's continue tracing backwards as far as we can.

There are quite a few functions that you will go through before you
get back to the function that has the file size check, the expiration
check, etc.  I won't go through all the details here, but it is worth
looking at one function call in particular.  sub_422AF8 calls
sub_403300 at 422B23, but it does it with the following instruction:
call dword ptr [esi+2Ch].  This is interesting.  I trust that everyone
has read Fravia+'s fine article on call relocation tables.  But now we
must ask what these call relocation tables are.  Are they a structure
or array of pointers to functions declared explicitly by the
programmer?  Almost always not.  In fact, the average programmer
probably only has the vaguest idea that they exist at all.  They are
an invention of the compiler used to deal with virtual function calls
in C++ and are commonly referred to as vtables.  I am still working on
the details of how they are implemented (it differs from compiler to
compiler and the optimization settings also affect it), but notice
that the pointer to the beginning of the vtable is obtained in a fairly
roundabout way.  You will see these offsets (7D9, 7DD) in other places
in the program and if you get an indirect call right after that,
you'll be able to know which vtable you are dealing with.  This
particular vtable starts at 489191.  Another thing you know is that
all the functions in a particular vtable are member functions of the
same class and are therefore related.

Ok, so you end up in sub_403934, the big routine that has the date
check, etc., and specifically at loc_403E2E, which calls sub_408100.
This tells us quite a lot.
First, the call to 40352C does not crash when we're loading a saved
file.  Second, an obvious test with SoftICE will tell us that
sub_408100 is only called when a saved file is loaded.  That means
that at 403E23 esi==0 if and only if we're loading a database.
Furthermore, sub_408044 isn't called at 403E27 when we are loading a
database.  Finally, at 403E33, the load new file and the load database
paths meet up.  Something has happened in the routines that handle
loading a new file that hasn't happened in the routines that load a
saved database and that something has to do with the struct/class at
export _637.

So, let's start up IDA, load a new file, let it run for a minute and
then break in with SoftICE and see what's going on at _637.  With
ida.wll loaded into IDA, this is what I get (remember, ida.wll is
relocated to BB0000 on my machine):

00C47A74 09 00 00 FF 29 00 90 AE-C6 00 00 00 00 00 28 E6
00C47A84 C7 00 90 AE C6 00 E0 B8-C6 00 B0 B8 C6 00 00 00

Good, we've got some pointers in here.  In particular we've got one at
_637+0Eh.  Dumping the memory at [_637+0Eh] doesn't tell us much (try
it), so let's look at some of these other pointers:

00C6AE90: 00401000  00489000  FF001CF5  FF001CF5
00C6AEA0: 00000000  01000203  00010000  FFFFFFFF

00C6B8E0: 00489000  004A1000  FF001CF6  FF001CF6
00C6B8F0: 00000000  01000203  00020000  FFFFFFFF

00C6B8B0: 004A118C  004A12DC  FF001CF7  FF001CF6
00C6B8C0: 00000000  01000203  00030000  FFFFFFFF

Now we're getting somewhere.  These ought to look familiar because the
first two dwords at each pointer are the begin and end addresses of
the various segments in ida.wll!  So, it looks like the struct at _637
holds information about all the segments in the open file.  No wonder
the program couldn't get anywhere without this information.  What we
need to do now is figure out how to get this information out of the
saved database and into the struct at _637 before getting to the call
to sub_408100.  Is this our missing magic that 403520 was supposed to
do?

Well, this is where I got stuck for a long time.  I was pretty sure I knew
what had to be done, but had no idea how to do it.  Furthermore, I
wasn't sure that this was the only thing that had to be done.  Were
there other structures that needed to be filled in?  I wouldn't know
until I figured out how to get the segment structure filled in.  What
saved me is what some might consider cheating, because it involves
having access to way more information than you usually do when
reversing.  Here's the story.  On IDA's US web site
(www.datarescue.com) there is a mention of an SDK (Software
Developer's Kit) for IDA that enables you to write processor modules
for IDA (see my first essay on IDA).  This sounded very helpful, but
it wasn't available for download.  They said to e-mail them for
information on it.  So I did.  This was the response:

It is free to registered users of IDA Pro. Have you registered your
copy ?

Well, no, I was planning to crack my copy instead.  I went out in
search of more information on IDA.  Maybe there was some out of the
way web site containing more info.  There was and still is.  IDA is
written by a brilliant Russian man named Ilfak Guilfanov and Mr.
Guilfanov has his own IDA web site on a server in Russia
(http://www.unibest.ru/~ig/index.html).  Go there now and download
everything you can, because it has, among other things, the IDA SDK.
The IDA SDK has very well commented C++ header files for most of the
program.  This was an unimaginable boon.  Even better, it has a
Borland lib file for accessing the exported functions in ida.wll.
This lib file contains the real names of all those 500+
functions/global variables.  To get at this information, you need a
program that dumps out the contents of Borland lib files (which are a
proprietary format).  tdump.exe, which is included in most Borland
development products, does it, and you can easily find that or a
freeware equivalent on the web.  Now you can go into IDA and start
renaming the exports to their real names instead of those meaningless
numbers.  Between the headers and the lib file I had more than enough
to finish the job.

Sure enough, export _637 is called _segs (this made me feel pretty
good).  In the header files you can find a complete description of the
class object that resides there (it's an area control block
(areacb_t)).  Furthermore, looking through the segment.hpp and
area.hpp headers you'll see some very interesting functions, including
the following:

// Link area control block to Btree. Allocate cache, etc.
// Btree should contain information about the specified areas.
// After calling this function you may work with areas.
// ... some comments deleted ...
  int link(const char *file,		// Access to existing areas
  	   const char *name,
  	   int useva,
  	   int infosize);
  	   
// Initializa work with segments
// Called by the kernel itself.
//	file - name of input file

void	initSegment	(const char *file);

Btree is the database.  Calling one of these two functions seemed like
the thing to do.  However, neither of them is exported by ida.wll, so
we've got to find them.  Finding them took a while, but I realized an
interesting fact about executable files in the course of doing it.
What determines where a particular function is put inside of an
exe/dll/etc.?  When a programmer compiles a project, each source file
is compiled into .obj files, which contain the machine code to be
processed by the linker.  The linker then combines all the obj files
into the finished product, changing the addresses appropriately so
that everything works out.  What does this mean?  It means that all
functions in the same source file will be adjacent to one another.
Now, of course, different programmers arrange their source files in
different ways, but we still know that adjacent functions tend to be
conceptually linked in some way.  And when we have the header
files, we have a very good idea where to look for functions.

To make a long story short, here's how I found the link and
initSegment functions.  First, we know in general where to look.
Second, we know what parameters each function takes and that they,
like just about every function in ida.wll, were compiled with the
__fastcall option (see the appendix to my last essay).  Borland
implements __fastcall in the following way:

arg1:  eax
arg2:  edx
arg3:  ecx
arg4:  last thing pushed on stack
arg5:  second to last pushed
etc.

I looked for link first because it has more arguments and ought to be
easier to find.  Well, I found what I'm pretty sure is it at
sub_4399AC, but more importantly, in the course of looking, I found
the right function, which is initSegment (with a name like that and
given our problem, you may be wondering how I could have thought that
any other function could possibly be the right one: well, it was late
and I'd been looking at this program for days and had managed to get
myself to believe all sorts of crazy things about it).  initSegment is
at sub_456D70.  The first thing it does is call areacb_t::create to
create the _segs area control block.  It then calls another function
which in turn calls link.

Ok, what we need to try now is to rewrite the function at sub_403520
to call initSegment.  However, we need to pass it the name of the EXE
file that was saved in the database, while eax comes into sub_403520
holding a pointer to the name of the DATABASE file.  So, how do we get a
pointer to the right filename?  Well, in the course of studying IDA, I
discovered that there is a very easy way to do this.  Look at this
code snippet which is straight out of my ida.wll database:

00403DB0    mov     eax, offset _RootNode ; idb specific
00403DB5    call    @netnode@value$xqqrv ; netnode::value(void)
00403DBA    push    eax         ; pointer to exe filename from dbase
00403DBB    push    244h        ; Database for file '%s' is loaded.
00403DC0    call    @Message$qie    ; Message(int,...)
00403DC5    add     esp, 8          ; end idb specific

To get the filename pointer into eax, all we have to do is call
netnode::value and pass it the address of _RootNode (4998B0).  So,
sub_403520 needs to be this:

mov  eax, 4998B0h
call 425F5C ; netnode::value
call 456D70 ; initSegment
ret

Unfortunately, we've got two problems.  (1) This code takes 10h bytes
and we've only got 0Fh in the area of sub_403520.  (2) We're
referencing a global variable in a program that is inevitably going to
be relocated.  That means that _RootNode is never actually going to be
at 4998B0.  Windows deals with this little issue in the .reloc section
of PE files.  This section contains the addresses of all the places in
the program that make absolute references to an address (note that most
jmp and call instructions use relative offsets and are therefore not
affected by relocation).

The first problem is easy to get around.  Take a look at PNA's essay
on adding a save function to the demo of w32dasm.  We'll just stick
the code at the end of the CODE segment where there are about 190h
free bytes.  The second problem involves patching the relocation
table.  I won't describe the details of this table, because they are
somewhat hairy and you can find many good descriptions elsewhere.
The best I have seen is in the ESSENTIAL and INVALUABLE book "Windows
95 System Programming Secrets" by Matt Pietrek (a NuMega employee no
less), but descriptions of the PE file format are a dime a dozen on
the web (I think there is even one on the site).

Let's start patching.  The first thing is to change the value assigned to ebx
at 403D3F to point to our new routine.  We're going to put the routine
at 488875, right after the dll import jump stubs.  So, patch
403D3F  mov     ebx, offset loc_403520
to
403D3F BB 75 88 48 00   mov  ebx, offset loc_488875

Notice that we don't have to worry about relocations here because
there already was an absolute address reference at the location where
we've stuck the new one, so the loader already knows to fix it up.
Now, let's insert our new routine:

488875 B8 B0 98 49 00  mov  eax, 4998B0h
48887A E8 DD D6 F9 FF  call 425F5C ; netnode::value
48887F E8 EC E4 FC FF  call 456D70 ; initSegment
488884 C3              ret

The last thing left to do is patch the relocation table.  We need the
dword at 488876 to be adjusted.  The necessary patch is to change the
two bytes at offset 96DA4 in ida.wll from 00 00 to 76 38.  I'll leave
it as an exercise to figure out exactly how this works if you don't
already know. 

A small correction: the relocation table patch must be applied at offsets 9EDA4 and 9EDA5, not at 96DA4 and 96DA5 as Quine says. It may be a typo. Greets as usual, zeezee
Now, the moment of truth. Run it. Load a database. It works. That's it. I really didn't think it would work, to be honest. I assumed that all the other global areacb_t's (_funcs, etc.) would have to be initialized also. That, however, gets done eventually in the call to sub_408100. Could I have done it without the header files and the export names? Who knows. Even if I could have, I might well have given up in frustration after weeks of trying before I ever got it. I was glad to know that I was at least on the right track.

Re-enabling demo functions is, I suppose, what I find most enjoyable in cracking, and I have a word of advice for demo writers: TAKE AS MANY FUNCTIONS AS YOU CAN OUT OF THE DEMO. Mr. Guilfanov took one very small function out and left in a ton of code that he never intended the demo to execute. Had those functions been removed, it would simply be impossible, barring an act of God, to re-enable the function. I don't care how good a cracker you are: there would be no conceivable way to reconstruct what happens in the call to 408100.

Future Plans for IDA Pro
First, enable the saving of ASM files. This code really is gone from the demo, but with the information I have about the program, I'm halfway there. It's going to involve inserting rather a lot of code into the target, so hopefully I'll come up with tricks for that.

Second, and more interesting, adding features. This can be done in part through the IDC macro language, but also through code patching. To get an idea of the features I have in mind, check out http://www.cs.uq.edu.au/groups/csm/dcc.html and anything you can find written by Cristina Cifuentes (she is truly BRILLIANT). DCC is a full-fledged DOS deCOMPILER! That's right, it kicks out real C code. Admittedly this can be done only to a limited extent with large and complex programs, but the concepts she discusses are very deep and important for understanding how to reverse engineer. Ilfak Guilfanov has certainly read her work (see his web site). Well, I'm tired and I am very behind in my work. Good Night.

And you'd better not joke with this request of Quine's either, because I'll find out if you do.
 Fravia+,

	I was wondering if you could add to my essay(s) the note that I
would very much like for no one to release the cracks to IDA as crack
programs (those vulgar little .com files) on the web or for anyone to
publish the cracks without the full essay.  I have too much respect
for the author of the program to have the demo crack tossed about the
web for people who are not serious about reverse engineering.  He has
written such a beautiful program that those of us who really cannot 
afford to buy it ought to -at least- earn the right to use it.

Thanks,
Quine
(c) Quine 1997. All rights reversed