Overview

RetDec is a retargetable machine-code decompiler based on LLVM.

The decompiler has been open source since December 2017, shortly after the Botconf conference. It supports several executable file formats used by different operating systems, such as ELF, PE, Mach-O, COFF, AR (archive), Intel HEX, and raw machine code.

It also supports the following architectures:

  • 32-bit: Intel x86, ARM, MIPS, PIC32, and PowerPC
  • 64-bit: x86-64, ARM64 (AArch64)

RetDec has several features that are useful in binary analysis for reverse engineering, malware analysis, vulnerability detection (e.g., DAST), binary verification, binary comparison, etc.

Features:

  • Static analysis of executable files with detailed information.
  • Compiler and packer detection.
  • Loading and instruction decoding.
  • Signature-based removal of statically linked library code.
  • Extraction and utilization of debugging information (DWARF, PDB).
  • Reconstruction of instruction idioms.
  • Detection and reconstruction of C++ class hierarchies (RTTI, vtables).
  • Demangling of symbols from C++ binaries (GCC, MSVC, Borland).
  • Reconstruction of functions, types, and high-level constructs.
  • Integrated disassembler.
  • Output in two high-level languages: C and a Python-like language.
  • Generation of call graphs, control-flow graphs, and various statistics.

RetDec supports installation on Microsoft Windows, Linux, macOS, and FreeBSD (experimental). In this article, however, we run RetDec in a Docker container. One reason for running it as a container is that the tool can then be integrated into the Kubernetes cluster in our lab and become part of our DevSecOps pipeline.

How does it work?

Jakub Kroustek, Peter Matula, and Petr Zemek from Threat Labs @ AVAST presented RetDec at Botconf 2017. Their presentation material can be found here. Below I explain briefly how RetDec works; some of the material is ‘borrowed’ from that presentation.

RetDec is essentially a decompilation tool. The picture below shows, in simplified form, how decompilation works.

Figure 1: Compilation vs. Decompilation

In short, a programmer writes source code in C and compiles it into an executable binary. The operating system then executes that binary. Malware is one example of such an executable binary.

To give a quick insight into how RetDec works, I will first show how a binary executable is compiled and run on macOS. After that, I will decompile that binary back into source code. We’ll look at what the decompiled code looks like, and then we’ll submit it to Synopsys Code Dx®. Code Dx is an application vulnerability correlation (AVC) solution that consolidates application security (AppSec) results to provide a single source of truth, prioritize critical work, and centrally manage software risk.

In this exercise, we are going to use a very simple program:

#include <stdio.h>

int main(void) {  
  printf("Hello World\n");
}

macOS (Mach-O)

Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically-loaded code, and core dumps. It was developed to replace the a.out format.

Mach-O is used by some systems based on the Mach kernel. NeXTSTEP, macOS, and iOS are examples of systems that use this format for native executables, libraries, and object code.

We compile the C source code with GCC, as follows:

Figure 2: Compilation of C source code in macOS

Running the RetDec container shows several tools that are part of the RetDec project:

Figure 3: Run RetDec container

Let's run one simple tool called retdec-fileinfo:

Figure 4: retdec-fileinfo output

As you can see, the output shows information about the binary file. Specifically, it shows that the file is a 64-bit Mach-O binary for the ARM architecture (Apple’s M1 processor).

The following output shows the decompilation results:

Figure 5: Decompile hello-world

RetDec decompilation produces several files:

Figure 6: Outputs of retdec decompilation

The file extensions hint at the content of each output file. For example, .dsm stands for disassembly, so that file contains assembly-language code such as the following:

Figure 7: ASM of hello-world

And finally, the decompiled output in C:

Figure 8: Decompiled output in C language

It differs from the original source code but is essentially equivalent from the machine’s perspective.

Binary to SAST

In many cases, especially in the security industry, the requirement is to analyze a binary executable file to find its vulnerabilities. This applies not only to ‘good’ executable files but also to ‘bad’ ones such as malware.

Several tools and methods can be used for this. Decompiling a binary into source code and feeding that code into SAST tools is one way to find vulnerabilities in binary code.

I’ve compiled a very simple program with a buffer-overflow bug into a binary file. Here’s the decompiled source produced by RetDec:

Figure 9: Decompilation of binary with buffer-overflow bug

Now I will submit the decompiled source code to Synopsys Code-Dx:

Figure 10: Decompiled binary submitted to Synopsys Code-Dx

Code-Dx detects that the file is written in C and then performs static analysis on the source code. Its analysis shows that the source code contains one critical vulnerability.

Figure 11: Code-Dx detail output on vulnerable code

Code-Dx not only shows the location of the vulnerable code but also, in its Data Flow section, shows how that code can be reached.

Decompiling a real-world binary is not that easy; in many cases it requires human intervention. The basic walkthrough presented here demonstrates how decompilation software such as RetDec can be used to analyze binary files, with the possibility of automation when it is integrated into a DevSecOps pipeline, for example with Jenkins.

In a future article, we'll talk more about this automation.
