isDcc
An installshield Decompiler
Advanced reversing
30 October 1998
by adq
Courtesy of Fravia's page of reverse engineering
slightly edited
by Fravia+
Well, well, well... a "real" reversing essay! Andrew shows us all here what it's like when you seriously work on an 'our tools' project. You'll be able to download isDcc either here or on Andrew's main page. Installshield decompiling is developing into a full-fledged science, after NaTzGUL's groundbreaking InstallSHIELD Script Cracking. That's GOOD. In a software world where more and more processes are hidden or concealed from the users, the fact that (some) users are re-gaining control is a very positive development. Transparency and free knowledge are our (very strong) weapons, dark hiding and hideous concealing are the (very strong) weapons of our enemies... There is a crack, a crack in everything. That's how the light gets in... how true!
Our tools
Rating
( )Beginner ( )Intermediate (X)Advanced ( )Expert

This is an overview of the main structure of isDcc. It describes how the main algorithms work, so you should have a good understanding of computer algorithm design.


Introduction
Like wisdec, isDcc decompiles a compiled installshield script (.ins file) into source code (.rul file). Due to the nature of the original installshield compiler, the "original" source code cannot be recovered exactly, but the scripts produced are compilable, provided the same version of the installshield compiler is used to recompile them.

Tools required
GNU Emacs, and MS Visual C++ would be useful if you intend to recompile the thing.

Target's URL/FTP
http://www.tardis.ed.ac.uk/~adq http://www.installshield.com/

Program History
v1.00 - Initial release
v1.01 - Couple of bugs fixed

Essay
isDcc was written with the help of wisdec v1.0, by NaTzGUL/SiraX (see NaTzGUL's InstallSHIELD Script Cracking). Wisdec is a masterful piece of work, which obviously involved reverse engineering the installshield compiler to discover the format of the compiled script files.

I used wisdec to explore the compiled files, changing scripts, recompiling them, and observing the differences reported by wisdec. It took a couple of evenings to divine the format of an installshield 2 file in this manner.

Here, I shall describe a couple of things about installshield files which are essential to understanding what follows:

Installshield scripts consist of a header, describing global features, for example function prototypes. The actual body of the script consists of a set of opcodes, describing the series of operations to take. Each opcode has some associated information after it (e.g. for a function call, you will find the parameters for the function call immediately after the opcode).

The first thing written was a parser for the header. This code is fairly simple: it reads values in from the file, processes them, and stores them in appropriate data structures.

However, the main script decoder is rather more complex. It involves three passes through the script code:
The first pass actually reads the raw opcodes from the file, and transforms them into an internal structure describing the code. This is implemented as a massive table-driven algorithm. The table is keyed by opcode. Each entry contains a function pointer to a specific parser function, along with some extra information, such as the parameter count. For an "installshield system function", a generic decoder function is available, since they all have the same format. The main loop of this stage reads in an opcode, looks it up in the table, and executes the associated function there. This function takes care of the specific processing for that opcode, before returning to the main loop. This continues until the end of the file is encountered.
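As a rough illustration of the table-driven first pass, here is a minimal sketch in Python. It is not isDcc's actual code: the opcode values, the 16-bit little-endian layout, and the names `Entry`, `Node`, `parse_generic`, `parse_goto` and `decode` are all assumptions made up for the example. Only the overall shape (a table keyed by opcode, each entry holding a parser function and a parameter count, with a shared generic decoder for system functions) follows the description above.

```python
import struct
from typing import Callable, NamedTuple


class Node(NamedTuple):
    """One decoded operation in the in-memory representation."""
    name: str
    args: tuple


class Entry(NamedTuple):
    """One row of the opcode table."""
    name: str
    argc: int          # parameter count stored alongside the parser
    parser: Callable   # function that decodes this opcode's trailing data


def read_u16(buf, pos):
    """Read one little-endian 16-bit value (assumed layout)."""
    (value,) = struct.unpack_from("<H", buf, pos)
    return value, pos + 2


def parse_generic(buf, pos, entry):
    """Generic decoder: every 'system function' is followed by argc operands."""
    args = []
    for _ in range(entry.argc):
        value, pos = read_u16(buf, pos)
        args.append(value)
    return Node(entry.name, tuple(args)), pos


def parse_goto(buf, pos, entry):
    """Specific decoder: turn the raw jump target into a label name."""
    target, pos = read_u16(buf, pos)
    return Node(entry.name, ("label_%d" % target,)), pos


# Hypothetical opcode numbers; the real values are not given in this essay.
OPCODES = {
    0x01: Entry("MessageBox", 2, parse_generic),
    0x02: Entry("Delay", 1, parse_generic),
    0x10: Entry("goto", 1, parse_goto),
}


def decode(buf):
    """Main loop: read an opcode, look it up, run its parser, repeat to EOF."""
    pos, nodes = 0, []
    while pos < len(buf):
        opcode, pos = read_u16(buf, pos)
        entry = OPCODES[opcode]
        node, pos = entry.parser(buf, pos, entry)
        nodes.append(node)
    return nodes
```

Supporting a new opcode in this arrangement means adding one table row, and writing a new parser function only when the generic one does not fit.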

The second pass works out function/prototype pairings, and fixes local variable counts. Because of the way a compiled script works, it is only possible to work out which function prototype is associated with which function body after a call has been made to that function. This stage goes through the interpreted code, looking for function calls, and associating function bodies with prototypes when it finds one. It also works out which variables in the function are locals, and which are parameters, since, again, this is not possible until it is discovered which function prototype pairs with which function. It doesn't actually alter the code in the function to reflect this; it just works out which variables are which. A consequence is that any function which is never called cannot be matched to its prototype, and therefore has to be discarded.
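The pairing step might be sketched as follows. This is an illustration under stated assumptions, not the real isDcc data model: it assumes each call site records both a prototype index and the address of the body it jumps to, and that a body's local count is its total variable count minus the prototype's parameter count. The names `Prototype`, `Body` and `pair_functions` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Prototype:
    name: str
    param_count: int


@dataclass
class Body:
    address: int                 # where the function body starts (assumed)
    var_count: int               # total variable slots the body uses (assumed)
    calls: list = field(default_factory=list)  # (prototype_index, address)
    prototype: Optional[Prototype] = None
    local_count: int = 0


def pair_functions(prototypes, bodies, main_calls):
    """Scan every call site and attach prototypes to the bodies they target."""
    by_address = {body.address: body for body in bodies}

    # Collect calls from the main program and from every function body.
    all_calls = list(main_calls)
    for body in bodies:
        all_calls.extend(body.calls)

    for proto_index, address in all_calls:
        body = by_address[address]
        if body.prototype is None:
            body.prototype = prototypes[proto_index]
            # Whatever is not a parameter must be a local.
            body.local_count = body.var_count - body.prototype.param_count

    # Bodies that were never called stay unpaired and are dropped.
    return [body for body in bodies if body.prototype is not None]
```

Note that the function only records which variables are which; the instructions inside each body are left untouched at this stage.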

The third, and final pass, goes through the code again, this time transforming the code in function bodies to reflect whether a local or a parameter variable is being accessed, to simplify any later processing.
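A minimal sketch of this rewriting pass, again in Python and again built on assumptions: operands are modelled as `(kind, index)` tuples, and parameter slots are assumed to be numbered before local slots within a function. Neither detail comes from the real file format, and `classify_operand` and `rewrite_body` are hypothetical names.

```python
def classify_operand(operand, param_count):
    """Turn a generic function-level variable reference into param or local."""
    kind, index = operand
    if kind != "var":
        return operand                       # constants, globals: unchanged
    if index < param_count:
        return ("param", index)              # assumed: parameters come first
    return ("local", index - param_count)    # the remaining slots are locals


def rewrite_body(instructions, param_count):
    """Return a copy of a function body with every operand classified."""
    return [
        (name, tuple(classify_operand(op, param_count) for op in operands))
        for name, operands in instructions
    ]
```

With the references classified once here, later stages such as the output step no longer need to consult the prototype to decide how to name a variable.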

Now, we have a huge memory structure, representing the compiled file. The next step will be to optimise the code sequences, and recover more of the original structure, for example FOR loops, and IF/ELSE sequences. However, this part is still under development.

Finally, the memory structure is decoded into a .RUL file and written out.


Installshield 5 brings a few changes. Some functions have had extra parameters added, necessitating special decoder functions for some installshield system functions. Also, user defined datatypes are possible, which changed the header slightly.

It has also had a large number of functions added to it. To find these, I examined the handy installshield documentation. (I even found some hidden features - see below)

Several functions have been removed from installshield 5, notably the CompressGet family. Installshield have completely revamped their method of installation, and have unfortunately decided to drop support for the previous method entirely.

All this means that you cannot recompile an installshield 3 script with the installshield 5 compiler, and vice versa.



Final Notes
One of the main problems with the decompiler at the moment is that it cannot recover higher level code structures (e.g. FOR loops) from the sequence of GOTOs they are transformed into by the compiler. This means that if one of these structures is used in a function, we will end up with GOTOs in a function, which is not allowed by the installshield script compiler version 3. However, the compiler for installshield version 5 does not check for this, so installshield 5 scripts are recompilable as they are at the moment.

Due to the fact that the code automatically discards unused functions, installshield scripts tend to halve in size when recompiled. For example, even if you only use one of the SdDialog functions, the compiler includes all of them in the compiled file.

Incidentally, I discovered a hidden feature of installshield scripts: the call statement. It seems you can have subroutines based on call/return as well as functions. I saw one script which used this feature, which prompted me to investigate further. I wonder why they don't tell anyone about it, since it is still in the compiler.

Currently I am developing code to recover higher level code structures, so that installshield 3 scripts should soon be recompilable too.



Ob Duh

Doesn't apply: we are reversing on our own and creating our own tools
