Decompiler

The Decompiler plugin is a sophisticated transformation engine which automatically converts the binary representation of individual functions into a high-level C representation. The Decompiler presents a view of a program which is interactive and dynamically updated as the user adds or makes changes to the annotations associated with the program. A Decompiler window maintains the correspondence between the C representation and the assembly representation displayed in the Code Browser window, to the extent possible. The window allows instant visual association and navigation between C language expressions and their corresponding assembly instructions.

To display the decompiler window, position the cursor on a function in the Code Browser, then select the Decompiler icon from the tool bar, or the Decompile option from the Window menu in the tool.

Some of the primary capabilities of the decompiler include:

Variables

The decompiler attempts to combine different storage locations (stack, memory, register) into single variables within a function. Data type information for variables is gathered automatically from several sources. Annotated function signatures, both of the function itself and of any sub-functions it calls, provide type information. Data types applied to global memory locations that the function references are also used, and local variables of the function can be annotated directly with data types. The user can provide data type information to the decompiler by annotating all of these sources; the more information provided, the better the resulting C code.

Variables that are not annotated directly are assigned types through local type propagation. Typically, assigning data types to a few key variables dramatically improves the readability of the C code, because propagation can then fill in many of the other data types. Assigning types in function signatures and to global variables is particularly effective because they affect multiple functions at once.

If you have C header files for an API that a program uses, a prototype C parser can extract the data type information from them and create a Ghidra Data Type Archive (.gdt). The interface is currently fairly crude, but it handles most C syntax, including macro expansion. The extracted function signatures and data types can then be applied to the program: open the archive in the Data Type Manager window, select the archive, right-click, and select Apply Function DataTypes. Ghidra currently provides definitions for the majority of Windows API functions and data types automatically.

Parameter Variables

Specifying data types for function parameters is especially useful. A function that has data types defined for its parameters will propagate these types into the variables of any calling functions.

C variable argument conventions (varargs) are also supported. For instance, if the user has identified the standard C library routine printf, its signature can be defined as int printf(char *, ...). From then on, wherever printf() is called, the decompiler displays the correct number of variable arguments.
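To see why the "..." matters, consider an ordinary C variadic function. This is a generic C illustration, not code from Ghidra, and sum_ints is a hypothetical name. Only the fixed parameter appears in the signature, and each call site may pass a different number of extra arguments, so the signature alone does not say how many values a given call passes.

```c
#include <stdarg.h>

/* Hypothetical variadic function: the fixed parameter says how many int
   arguments follow, much as printf's format string describes its arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int); /* fetch the next variable argument */
    va_end(ap);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) and sum_ints(1, 10) are both valid calls to the same function, each passing a different number of arguments.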

Function signatures can be applied from a Ghidra data type database. Windows data types and standard C library function signatures are included with the standard distribution. More definitions will be added in the future.

Internal Decompiler Functions

Occasionally, the decompiler may use one of several internal decompiler functions that don't get transformed into more 'C'-like expressions. Use of these can indicate that the pcode is incorrect or needs to be "Tuned" to make the decompiler output better. It can also mean that the decompiler needs an additional simplification rule to take care of that particular situation.

SUB42(0xaabbccdd,1) = 0xbbcc

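In SUB42(x,c), the 4 is the size of x in bytes, the 2 is the size of the result in bytes, x is the value being truncated, and c is the number of least significant bytes of x that are discarded. When c is 0, the operation is almost always a cast between integer sizes that the decompiler didn't quite figure out, usually because it did not recognize that x was an integer type or was forced to assume otherwise.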

SUB41(x,0) is usually a cast from "int" to "char".
SUB42(x,0) is a cast from "int" to "short" and so on.
SUB84(x,4) is probably part of an extended precision multiplication but also turns up in other things like division strength reduction.
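The truncation arithmetic can be checked with a small C model. This is a sketch based on the example SUB42(0xaabbccdd,1) = 0xbbcc: the first digit in the name is the input size in bytes, the second is the output size, and the second argument counts the least significant bytes to discard. sub_op is a hypothetical helper, not a Ghidra API.

```c
#include <stdint.h>

/* Hypothetical model of a SUB operation: discard the c least significant
   bytes of x, then keep out_size bytes. The input size only documents how
   wide x is, so it is not needed for the arithmetic. */
uint64_t sub_op(uint64_t x, unsigned out_size, unsigned c)
{
    if (c >= 8)
        return 0; /* every byte was discarded; also avoids an oversized shift */
    uint64_t shifted = x >> (8 * c);
    uint64_t mask = (out_size >= 8) ? UINT64_MAX
                                    : ((UINT64_C(1) << (8 * out_size)) - 1);
    return shifted & mask;
}
```

With this model, sub_op(0xaabbccdd, 2, 1) corresponds to SUB42(0xaabbccdd,1) and yields 0xbbcc, while sub_op(x, 1, 0) keeps only the low byte, which is the int-to-char cast described above.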

CONCAT31(0xaabbcc,0xdd) = 0xaabbccdd

CONCAT31 is a concatenation: the 3-byte value supplies the most significant bytes and the 1-byte value supplies the least significant byte of the 4-byte result. It usually crops up when a 1-byte (char) variable is stored in a 4-byte register. The basic arithmetic and logical operations on the 4-byte register give the correct result for the 1-byte variable; the compiler just has to ignore the 3 most significant bytes of the register. The CONCAT31 reflects the decompiler keeping track of those most significant bytes, which the compiler was ignoring, because the decompiler has mistakenly interpreted the register variable as a 4-byte variable. In many cases the decompiler can figure this out, but it often cannot, especially in looping constructs. This is really a dead code issue: the decompiler currently makes judgments about dead code for entire varnodes, and a full fix would require a dead code elimination algorithm that could decide whether part of a varnode is dead.
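Both the CONCAT31 arithmetic and the reason a compiler can keep a char in a 4-byte register can be checked in C. This is a sketch; concat31 and low_byte_add_matches are hypothetical helpers that model the behavior described above, not decompiler internals.

```c
#include <stdint.h>

/* Hypothetical model of CONCAT31(x, y): x supplies the 3 most significant
   bytes and y the least significant byte of a 4-byte result. */
uint32_t concat31(uint32_t x, uint8_t y)
{
    return ((x & 0xFFFFFFu) << 8) | y;
}

/* The low byte of a 32-bit addition equals the 8-bit addition of the low
   bytes, so a compiler may do char arithmetic in a full register and simply
   ignore the upper 3 bytes. */
int low_byte_add_matches(uint32_t reg_a, uint32_t reg_b)
{
    uint8_t narrow = (uint8_t)((uint8_t)reg_a + (uint8_t)reg_b);
    uint8_t wide_low = (uint8_t)(reg_a + reg_b);
    return narrow == wide_low;
}
```

concat31(0xaabbcc, 0xdd) reproduces the 0xaabbccdd example, and low_byte_add_matches holds even when the upper bytes of the registers contain unrelated data.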

This is almost always a cast from a small integer type to a larger unsigned type.

This is probably a cast from a small signed integer type to a larger signed integer type.
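In C terms, the two casts above correspond to widening a small value either with zero fill (unsigned) or by copying the sign bit (signed). The 1-byte and 4-byte sizes below are an assumption chosen for illustration, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Zero extension: widen a 1-byte value to 4 bytes, filling the new bytes with 0. */
uint32_t zero_extend_1_to_4(uint8_t x)
{
    return (uint32_t)x;
}

/* Sign extension: widen a 1-byte signed value to 4 bytes, copying the sign bit. */
int32_t sign_extend_1_to_4(int8_t x)
{
    return (int32_t)x;
}
```

The same byte 0xff becomes 255 under zero extension but -1 (0xffffffff) under sign extension, which is why the two operations signal unsigned and signed casts respectively.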

Returns "true" if there is an arithmetic overflow when subtracting "y" from "x" as signed integers. These are generated particularly by signed integer comparisons. Rules exist for recovering the original comparison, so an occurrence usually indicates a special case those rules missed. These could also conceivably be generated by extended-precision subtraction.

Returns "true" if there would be a carry when adding x and y.
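The two flag computations above can be modeled in C for 4-byte operands. The sketch also shows why signed comparisons produce the signed-overflow test: a flag-based x < y is the sign of the wrapped difference, flipped whenever that subtraction overflowed. The helper names are hypothetical, and the 32-bit operand size is an assumption.

```c
#include <stdint.h>

/* True if x - y overflows when x and y are treated as signed 32-bit values. */
int signed_sub_overflows(int32_t x, int32_t y)
{
    int64_t wide = (int64_t)x - (int64_t)y;
    return wide < INT32_MIN || wide > INT32_MAX;
}

/* True if x + y carries out of bit 31 when treated as unsigned 32-bit values. */
int unsigned_add_carries(uint32_t x, uint32_t y)
{
    return (uint32_t)(x + y) < x;
}

/* Signed x < y the way flag-based code computes it: the sign bit of the
   wrapped 32-bit difference, flipped when the subtraction overflowed. */
int signed_less(int32_t x, int32_t y)
{
    int sign = (int)((((uint32_t)x - (uint32_t)y) >> 31) & 1u);
    return sign != signed_sub_overflows(x, y);
}
```

When the decompiler recognizes this sign-versus-overflow pattern, it can print a plain x < y; when it does not, the overflow test shows up in the output.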

If these turn up everywhere in a particular binary, they may indicate a missed simplification that could be easily fixed.

Register Settings

Occasionally a program uses a register to store a global constant. Using the Set Register function on the right-click pop-up menu, the user can specify this value to the decompiler. The constant is propagated automatically throughout the function, and the resulting code may be greatly simplified.

Decompiler Options

The following Decompiler Analysis Options are available ( Edit->Options Decompiler/Analysis ):


Data Mutability

Decompiler output can be influenced by the mutability of data locations within memory.  Supported mutability settings include:

Data mutability may be controlled by the user in one of two ways:

  1. Memory Block Settings
  2. Data Settings

Tips:

It is important to note that the decompiler is only as good as the definition of the underlying assembly language. Each assembly instruction has an associated PCODE definition that describes what the instruction does, essentially an RTL (Register Transfer Language). For example, the following MOV instruction, which stores a constant at an offset on the stack, has this PCODE definition:

MOV local_1c[ESP], 0x804aac8

temp1 = INT_ADD 0x4, ESP
temp2 = COPY 0x804aac8
STORE ram(temp1), temp2
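One way to read the three PCODE operations is to execute them against a toy memory. In this sketch a small byte array stands in for the ram space, the ESP value is supplied by the caller (a hypothetical value), and the 4-byte STORE is written little-endian as on x86. It illustrates the semantics only; it is not how Ghidra emulates PCODE.

```c
#include <stdint.h>

/* Toy "ram" address space; 64 bytes is an arbitrary size for the sketch.
   Callers must keep esp + 7 below 64. */
static uint8_t ram[64];

/* Write a 4-byte value little-endian, as a 32-bit x86 store would. */
static void store32_le(uint32_t addr, uint32_t value)
{
    for (unsigned i = 0; i < 4; i++)
        ram[addr + i] = (uint8_t)(value >> (8 * i));
}

/* Read the 4-byte little-endian value back, to check the result. */
static uint32_t load32_le(uint32_t addr)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < 4; i++)
        value |= (uint32_t)ram[addr + i] << (8 * i);
    return value;
}

/* The three PCODE operations for MOV local_1c[ESP], 0x804aac8. */
static void run_mov_example(uint32_t esp)
{
    uint32_t temp1 = esp + 0x4;   /* temp1 = INT_ADD 0x4, ESP */
    uint32_t temp2 = 0x804aac8;   /* temp2 = COPY 0x804aac8   */
    store32_le(temp1, temp2);     /* STORE ram(temp1), temp2  */
}
```

Running the sequence with ESP set to 16 leaves 0x804aac8 at offset 20 of the toy memory. An error in any one of these three lines of the definition would show up directly in the decompiled C code.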

Irregularities in the produced C code can often be attributed to errors in this underlying definition. Such errors can usually be fixed quickly, so please report any problems or issues you find.

Decompiler Window


Errors from the decompiler process are reported in the status area of the tool and sometimes at the end of the C code in the decompiler window.

Mouse Actions

You can navigate to the target of a goto statement by double-clicking its label (you can also double-click a brace to navigate to the matching brace).



Other actions available in the decompiler are described in the following paragraphs.

Copy/Copy Special...

C code from the decompiler window can be copied and pasted into any other text window on the system. Select the text to copy, then choose Copy from the pop-up menu.

Comments->Set...

Set a comment on a line of C-Code.  The comment will be stored in the program database at the closest assembly line associated with the generated C-Code.  Any type of comment (EOL, Post, Pre, Plate) can be attached to the representative C-Code.  When this function is re-displayed at some later point, the comment will persist.

Commit Params/Return

By default, the decompiler will analyze the code to try to discover function parameters, return type, and local variables. Each time the decompiler displays C code for a function it does this analysis again. Commit Params/Return causes any parameter names and types and the return type to be saved in the program database so that the next time the function is decompiled, the current definitions will be used. This is useful for "syncing" the function signature with the disassembly display: the names and types of the parameters and return value in the disassembly then agree with the decompiler's names and types.

Ghidra's own stack analysis also recovers parameters and return types, but for many programs the decompiler's analysis is better.

There is a prototype plug-in that automatically pulls in the decompiler-derived information and applies it to each function as the function is created.

If a variable displayed in the assembly window has an undefined type, the decompiler will still respect the name of the variable.

Commit Locals

By default, the decompiler will analyze the code to try to discover function parameters, return type, and local variables. Each time the decompiler displays C-code for a function it does this analysis again. Commit Locals causes any local variable names and types to be saved in the program database so that next time the function is decompiled the current local variable definitions will be used. This is useful for "syncing" the local variable definitions with the disassembly display. This causes the names and types of locals in the disassembly to agree with the decompiler names and types.

Ghidra's own stack analysis also recovers local variables on the stack, but for many programs the decompiler's analysis is better.

There is a prototype plug-in that automatically pulls in the decompiler-derived information and applies it to each function as the function is created. By default, the plugin does not commit local variable definitions, whether stack or register locals. Committing locals automatically can be turned on by changing the analysis options for the Decompiler Parameter ID plugin. In most cases it is better to commit locals only for specific functions you really care about, or after the data type definitions (structures, etc.) for the program you are reverse engineering have settled down.

If a variable displayed in the assembly window has an undefined type, the decompiler will still respect the name of the variable.

Auto Create Structure / Auto Fill in Structure

Automatically creates a structure definition for the pointed-to structure and fills it out based on the references found by the decompiler.

To use this, place the cursor on a function parameter variable, or on any variable within a function that is a pointer to a structure. The variable can currently have a data type of undefined, int, void *, char *, etc. For example: func(int *this), for a C++ thiscall function.

If the variable is already a structure pointer, any new references found are added to the structure, even if the structure must grow in size. This is very useful as you find more places where the structure is used. If you have already started recovering a portion of a structure and find it used in another function, retype the variable to be the structure, and then use Auto Fill in Structure to add any new fields recovered for the structure.
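The effect of recovering a structure can be sketched in C. Before retyping, the pointer accesses can only be shown as raw offset arithmetic; once the pointer is typed as a structure, the same accesses read as field references. The structure name, field names, and offsets below are illustrative assumptions; a real recovered structure depends on the references found in the program.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative structure with fields at byte offsets 0, 4 and 8. */
typedef struct {
    int32_t field_0x0;
    int32_t field_0x4;
    int32_t field_0x8;
} astruct;

/* Before: the parameter is an untyped pointer, so the fields appear as
   byte offsets plus casts. */
int32_t sum_raw(void *param_1)
{
    uint8_t *base = param_1;
    return *(int32_t *)(base + 0) + *(int32_t *)(base + 8);
}

/* After: the same accesses once the parameter is typed as astruct *. */
int32_t sum_typed(astruct *param_1)
{
    return param_1->field_0x0 + param_1->field_0x8;
}
```

Both functions read the same memory and return the same value; only the second makes the layout visible, which is the readability gain a filled-in structure provides.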

This feature is also available in the assembly listing when the cursor is placed on a defined parameter or return variable.

Currently this only recovers the structure by following the structure pointer through the current function and any function the structure is passed into within the current function.  Eventually this will be put into a global type analyzer, but for now it is most useful interactively.

For best results, the function should be well formed with good flow, and all the switch statements should be recoverable.

There is also a script called CreateStructure that you can use for automated structure recovery. For instance, if you have a set of thiscall routines where the first parameter to every routine is a pointer to a shared class structure, the script could be modified to recover the structure for each this parameter.

Highlight Def-Use

Highlights all places a value is used, starting at the place where it is written and including every place where that one value is read. This is usually a proper subset of all the places the variable appears in the function. Place the cursor over a variable you would like to highlight and select Highlight Def-Use from the pop-up menu.

As an example, the a at the top of the function is under the cursor when Highlight Def-Use is chosen.

Notice that the first three references to a are highlighted but the final use of a is not because the value might have changed in the else clause.

Highlight Forward Slice

Highlight Forward Slice highlights each variable whose value may be affected by the value in the variable under the cursor.

As an example, b, the output of max_alpha, is under the cursor when Highlight Forward Slice is chosen.

We can see that c is tainted by the value of b all the way through to the bottom of the function.

Highlight Backward Slice

Highlight Backward Slice highlights all points in the function that contain a value involved in the creation of the value in the variable under the cursor.

As an example, the final a of the function is under the cursor when Highlight Backward Slice is chosen.

We can see that the final value of a is affected by the loop and by the input parameter but never by b and c.
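The def-use and slice highlights can be approximated on straight-line code with a few bit sets. This sketch is a simplification: it ignores branches and loops (and therefore control dependences), it assumes each statement defines exactly one variable, and all names are hypothetical. It is meant to show what each highlight computes, not how the decompiler implements it.

```c
/* Variables are numbered; sets of variables or statements are bit masks. */
enum { VAR_A, VAR_B, VAR_C };

typedef struct {
    int def;       /* variable written by the statement */
    unsigned uses; /* bit mask of variables read by the statement */
} Stmt;

/* Example program (a is an input):
   s0: b = a + 1    s1: c = b * 2    s2: a = a * 3    s3: c = c + a */
static const Stmt example[4] = {
    { VAR_B, 1u << VAR_A },
    { VAR_C, 1u << VAR_B },
    { VAR_A, 1u << VAR_A },
    { VAR_C, (1u << VAR_C) | (1u << VAR_A) },
};

/* Def-use: statements that read the value defined at statement d.
   The use test runs before the redefinition test, so "a = a * 3" counts
   as a use of the old a before it is overwritten. */
unsigned def_uses(const Stmt *prog, int n, int d)
{
    unsigned result = 0;
    int var = prog[d].def;
    for (int i = d + 1; i < n; i++) {
        if (prog[i].uses & (1u << var))
            result |= 1u << i;
        if (prog[i].def == var)
            break;
    }
    return result;
}

/* Forward slice: statements whose result may be affected by the value
   defined at statement seed (the seed itself is included). */
unsigned forward_slice(const Stmt *prog, int n, int seed)
{
    unsigned tainted = 1u << prog[seed].def;
    unsigned slice = 1u << seed;
    for (int i = seed + 1; i < n; i++) {
        if (prog[i].uses & tainted) {
            slice |= 1u << i;
            tainted |= 1u << prog[i].def;
        } else {
            tainted &= ~(1u << prog[i].def); /* redefinition kills the taint */
        }
    }
    return slice;
}

/* Backward slice: statements that contribute to the final value of var. */
unsigned backward_slice(const Stmt *prog, int n, int var)
{
    unsigned needed = 1u << var;
    unsigned slice = 0;
    for (int i = n - 1; i >= 0; i--) {
        if (needed & (1u << prog[i].def)) {
            slice |= 1u << i;
            needed &= ~(1u << prog[i].def);
            needed |= prog[i].uses;
        }
    }
    return slice;
}
```

On the example program, the forward slice from s0 is {s0, s1, s3} because b feeds c, while the backward slice of the final a is only {s2}, so, as in the example above, the final a does not depend on b or c.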

Highlight Forward Instruction Slice

Highlight Forward Inst Slice highlights each instruction that may be affected by the value in the variable under the cursor, rather than the affected variables themselves.

Highlight Backward Instruction Slice

Highlight Backward Inst Slice highlights all instructions in the function that contribute to the creation of the value in the variable under the cursor.

Rename Variable

Any parameter or local variable can be renamed. Just place the cursor over a variable definition, or any use of the variable and choose Rename Variable from the popup menu. The name will now be saved for this function, so the next time the decompiler displays the code for the function, the same name is used.

Rename Function

A shortcut for renaming the function from within the decompiler window.

Retype Variable

The decompiler does its best to recover the type of a variable automatically but often only has limited information for analysis. Explicitly changing the type of a variable can dramatically improve the C-code produced. This is especially true for structures. Changing the type of a parameter variable will affect the display for every place the function is called.

To change a variable's data type, place the cursor over the variable definition or a use of the variable, select Retype Variable from the pop-up menu, and then enter the name of the type. The name of any data type known to the program can be used.

A simple code improvement is to locate any functions with obvious string parameters and re-type the parameter to be a "char *". Any references to defined memory will now display the passed parameter as a character "string".

Edit Data Type of Variable

Only structure, union, and enum data types can be edited. If a variable's data type is one of these, it can be edited. Also, if the data type is a type definition, array, or pointer based on an editable data type, then the base data type can be edited. For example, if a variable is a structure pointer, you can edit the structure. To edit a variable's data type, place the cursor over the variable definition or a use of the variable and select Edit Data Type... from the pop-up menu. For structures and unions, the structure editor appears, and for enums, the enum editor appears. If the data type for a variable can't be edited, the action is disabled in the pop-up menu.

Edit Function Signature

The Edit Function Signature dialog allows you to change the function's signature, the calling convention, whether the function is inline and whether the function has no return.


The function signature includes

This feature allows you to edit the function signature as a text string to change any of these.
For example, if a function is actually printf(), then instead of changing the name, return type, and parameters individually, you can change the entire function signature at once by entering
    int printf( char *fmt, ...)
in the Signature field and then selecting the OK button.

In addition, you can select the Calling Convention for this function from a list of available calling conventions as determined by the program's language. Selecting the Inline checkbox indicates that the function is in-lined. Selecting the No Return checkbox indicates that the function does not return.

The signature of the current function, or any called function can be changed.

To edit a function's signature from the Decompiler window, place the cursor over any function name and select Edit Function Signature from the pop-up menu; the dialog appears with the function's current information.

Override Signature

Overrides the signature of a called function at the point where it is called. This allows you to specify the parameters for that particular call.

Remove Signature Override

This action allows you to remove a previously added function signature override.

Find...

Find any string of text within the currently decompiled function.

Debug Function Decompilation

For certain functions, the decompiler may produce an error message, produce incorrect code, or simply exit without producing results. Selecting Debug Function Decompilation from the decompiler provider window toolbar runs the decompiler again and saves all relevant information to an XML file. Instead of submitting the entire program for analysis to discover the problem, only this small XML file is needed.

Graph AST Control Flow

Selecting Graph AST Control Flow, from the decompiler provider window toolbar, will generate an abstract syntax tree (AST) control flow graph based upon the decompiler results and render the graph within the current Graph Service.

If no Graph Service is available then this action will not be present.

Export to C

You can export the current decompiled function to a file by selecting the icon in the local tool bar of the decompiler window. A file chooser dialog is displayed for you to select the name of the output file. If you do not specify a file extension, ".c" is appended to the filename.

Snapshot

Creates a Snapshot of the current decompiler window, which allows you to leave the current decompiled function in place while navigating to other functions.

Properties

The colors used in the decompiler window can be changed by editing the C Display Options through the Edit Options dialog. To edit the options, choose Edit Tool Options... from the tool menu. Click on the C Display node in the Options tree. A panel shows the colors that can be customized. Click on the color bar to bring up the color chooser to change the color.

The other options allow you to change the maximum characters in a line displayed in the decompiler window, and the number of characters for indenting in the code.

Mouse Hovers

These function similarly to Code Browser Mouse Hovers.

Provided by: Decompiler Plugin
