ARM recap and OS basics

Since there are already some great tutorials on ARM assembly and reverse engineering out there, I will only cover the points I consider important for understanding ROP chaining.

If you have worked through Azeria's great ARM Assembly labs, you are well equipped for this series; in fact, I assume you have. In the next post I will use the TCP connect-back shellcode she explains in her tutorials, since I will not develop my own during this series.

Leaf and non-leaf functions

Consider the following source:

int leaf(int a){          /* calls no other function: leaf */
        return a+5;
}
int nonleaf(int a){       /* calls leaf(): non-leaf */
        return leaf(a+2);
}

Since nonleaf() calls leaf(), it is a non-leaf function. The distinction matters because the two kinds of functions differ in their prologue and epilogue:

[Figure: leaf vs. non-leaf function]

Functions in ARM are called using the branch with link (BL) instruction, which stores the return address (the address of the instruction following the BL) in LR and then branches to the called function by loading its address into PC. The called function can then use BX LR to return to the caller. But think about a function which itself calls a function: the nested BL leaf would overwrite its LR. Such a function therefore has to preserve LR on the stack, using PUSH {FP, LR} in its prologue and POP {FP, PC} in its epilogue (see nonleaf() in the example above). Leaf functions, which do not call other functions, do not need to preserve LR, since it will not be overwritten.
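To make the LR bookkeeping concrete, here is a toy C model. It is a hypothetical illustration, not an emulator: it does not decode instructions, it only saves LR (the real nonleaf() also saves FP), and the addresses in the usage note are made up.

```c
#include <stdint.h>

/* Toy ARM (A32) state, just enough to show why non-leaf functions save LR.
 * Hypothetical model for illustration only: it does not decode instructions. */
struct toy_cpu {
    uint32_t pc, lr;
    uint32_t sp;          /* index into stack[]; the stack grows downwards */
    uint32_t stack[16];
};

/* BL target: LR = address of the BL instruction + 4, then branch */
void toy_bl(struct toy_cpu *c, uint32_t target)
{
    c->lr = c->pc + 4;
    c->pc = target;
}

/* BX LR: return through LR (how a leaf function returns) */
void toy_bx_lr(struct toy_cpu *c)
{
    c->pc = c->lr;
}

/* PUSH {LR}: save the return address on the stack (non-leaf prologue) */
void toy_push_lr(struct toy_cpu *c)
{
    c->stack[--c->sp] = c->lr;
}

/* POP {PC}: load the saved return address straight into PC (non-leaf epilogue) */
void toy_pop_pc(struct toy_cpu *c)
{
    c->pc = c->stack[c->sp++];
}
```

Usage: a caller at 0x1000 does toy_bl() to nonleaf at 0x2000 (LR becomes 0x1004), nonleaf saves LR with toy_push_lr(), then its own BL at 0x2008 calls leaf and overwrites LR with 0x200c. leaf returns with toy_bx_lr(), and nonleaf's toy_pop_pc() still brings PC back to 0x1004. Without the push/pop pair, nonleaf would "return" to 0x200c, into itself.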

The classic stack overflow overwrites the saved LR and hijacks the execution flow when the function restores the saved LR into PC to return to its caller. A leaf function never stores LR on the stack, so to exploit an overflow in a leaf function we have to keep overwriting the stack until we reach the saved LR of the enclosing non-leaf function and hijack the execution flow when that function returns.

Summary: leaf and non-leaf functions differ in their prologue and epilogue. We'll need to handle BX LR instructions in a special way; more on that in the following post. For the simple example below we can ignore this.

Use the following simple example to get the basic idea of ROP chains, and the next post as an in-depth explanation.

ROP theory

Suppose we could hijack the process execution flow: what next? Many modern CPUs and operating systems make life harder for an attacker who has gained control over the execution flow by implementing countermeasures like DEP and ASLR.

In a process protected by DEP (Data Execution Prevention), no part of memory should be writable and executable at the same time: libraries and .text segments have only read and execute permissions (r-x), while .data, the stack and the heap are read and write only (rw-).
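On Linux, the permissions of every mapping of a process are listed in /proc/<pid>/maps as a field such as r-xp or rw-p. A minimal sketch, assuming a Linux system, that counts the mappings of the current process which violate this rule; the function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a /proc/<pid>/maps permission field such as "r-xp" or "rw-p"
 * describes a region that is both writable and executable, else 0. */
int perms_violate_wx(const char *perms)
{
    return strlen(perms) >= 3 && perms[1] == 'w' && perms[2] == 'x';
}

/* Counts writable+executable mappings of the current process by parsing
 * /proc/self/maps (Linux-specific). Returns -1 if the file cannot be opened. */
int count_wx_mappings(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512], perms[8];
    int count = 0, at_line_start = 1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        /* each line starts with "start-end perms offset dev inode path" */
        if (at_line_start && sscanf(line, "%*s %7s", perms) == 1 &&
            perms_violate_wx(perms))
            count++;
        /* a chunk without '\n' means the line continues in the next fgets() */
        at_line_start = strchr(line, '\n') != NULL;
    }
    fclose(f);
    return count;
}
```

For a typical DEP-protected process you would expect count_wx_mappings() to return 0; a program with an executable stack or a JIT compiler may report some.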

In the case of a buffer overflow, an attacker is able to write more data than the developers expected into a buffer located either on the stack or on the heap, i.e. into memory regions which are only (rw-). An attacker can therefore still inject his own code, like shellcode, into the process, but cannot execute it there, even after hijacking the execution flow, because those regions are not executable. By using a technique called ROP (Return Oriented Programming), an attacker can nonetheless regain many of the possibilities lost due to DEP. The following sections explain how.

The basic idea of ROP: search the executable sections of the vulnerable binary (.text, mapped libraries) for gadgets. A gadget consists of any number of instructions, the last of which lets us keep control of the execution flow so that multiple gadgets can be chained; therefore let's call these last instructions chaining instructions. The other instructions of the gadget help us to fulfill our main goal, step by step. If your goal is complex, the ROP chain can get pretty long.
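To make "chaining instruction" concrete: in 32-bit ARM (A32) code, POP {..., PC}, BX Rm and BLX Rm are typical examples. The sketch below recognises only the unconditional A32 encodings of these instructions and scans a little-endian, 4-byte-aligned code buffer for them. It ignores Thumb code and many other possible chaining instructions, and a real gadget finder would additionally look at the instructions preceding each hit; treat it as an illustration, not as the tool used in this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if insn is one of a few unconditional A32 chaining instructions:
 * POP {..., pc}, POP {pc}, BX Rm or BLX Rm. Thumb code and many other forms
 * (e.g. LDR pc, ... or conditional variants) are deliberately ignored. */
int is_chaining_insn(uint32_t insn)
{
    if ((insn & 0xFFFF8000u) == 0xE8BD8000u)   /* LDMIA sp!, {..., pc} = POP {..., pc} */
        return 1;
    if (insn == 0xE49DF004u)                   /* LDR pc, [sp], #4 = POP {pc} */
        return 1;
    if ((insn & 0xFFFFFFF0u) == 0xE12FFF10u)   /* BX Rm */
        return 1;
    if ((insn & 0xFFFFFFF0u) == 0xE12FFF30u)   /* BLX Rm */
        return 1;
    return 0;
}

/* Scans a little-endian, 4-byte-aligned A32 code buffer loaded at `base`
 * and stores the addresses of chaining instructions in out[].
 * Returns the number of addresses stored (at most max_out). */
size_t find_chaining_insns(const uint8_t *code, size_t len, uint32_t base,
                           uint32_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t off = 0; off + 4 <= len && n < max_out; off += 4) {
        uint32_t insn = (uint32_t)code[off]
                      | (uint32_t)code[off + 1] << 8
                      | (uint32_t)code[off + 2] << 16
                      | (uint32_t)code[off + 3] << 24;
        if (is_chaining_insn(insn))
            out[n++] = base + (uint32_t)off;
    }
    return n;
}
```

For example, assuming the second gadget of this post is A32 code, its bytes are ldr r0, [sp, #4] (0xE59D0004) followed by blx r4 (0xE12FFF34); scanning them with base 0x00039074 reports only the BLX at 0x00039078.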

In this blog post series we will achieve two separate goals using ROP chains:

  • execute system('/bin/sh') (this post)
  • regain an executable stack region and execute our own shellcode (next post)

I like to think of gadgets as little functions which have side effects on other gadgets and whose "parameters" are passed on the stack, if needed. The ROP artist's task is to orchestrate the gadgets in a way that

  • the chain is small
  • the side effects are used and combined wisely
  • the side effects are minimal
  • the goal is achieved

A word on ROP and its dependency on library and binary versions: using gadgets from the binary and its libraries makes the ROP chain dependent on the exact versions and compiler optimizations used. If the target is an embedded system, that's not a problem at all, since the exact same firmware is flashed onto thousands of devices.

Example 1 - Executing system() using ROP

Let's try to understand the basic idea using a simple example. A detailed explanation with a more complex example will follow in the next post.

The following picture shows a stack overflow happening in a function while it executes. As soon as the function returns, we control the execution flow through the LR value that was saved, then overwritten and finally popped into PC (POP {..., PC}). When the function epilogue executes, it cleans up the local variables and restores the saved registers. Two things happen:

  1. The stack pointer SP moves towards higher addresses (towards 0xFFFFFFFF) and points to the memory location following the saved LR.
  2. The saved LR is popped into PC to return the execution flow to the calling function. Since we have overwritten the saved LR value, we control the execution flow.
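A toy C model of these two effects, assuming the epilogue is POP {FP, PC} as in nonleaf() and using the overwritten values from the overflow string shown at the end of this example (0x51525354 for the saved FP, 0x0001053c for the saved LR). It is an illustration, not an emulator; struct epi_regs and epilogue_pop_fp_pc() are hypothetical names, and a real epilogue first releases the local variables (e.g. by adjusting SP) before the POP.

```c
#include <stdint.h>

/* Toy register state. sp is a word index into a simulated stack in which
 * higher indices mean higher addresses, i.e. towards 0xFFFFFFFF. */
struct epi_regs {
    uint32_t fp, pc;
    uint32_t sp;
};

/* Models POP {FP, PC} once the locals have been released: the lower word
 * goes into FP, the next one (the saved LR) into PC, and SP moves past both,
 * to the word following the saved LR. */
void epilogue_pop_fp_pc(struct epi_regs *r, const uint32_t *stack)
{
    r->fp = stack[r->sp];
    r->pc = stack[r->sp + 1];
    r->sp += 2;
}
```

With a stack of { 0x41414141, 0x41414141, 0x51525354, 0x0001053c, 0x000aabbc } and SP at index 2, the model ends with FP = 0x51525354, PC = 0x0001053c and SP at index 4, pointing at 0x000aabbc: the next value the first gadget will pop.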

The stack layout after the overflow, before the epilogue executes:

[Figure: basic stack overflow during execution]

The state after the epilogue has executed:

[Figure: basic stack overflow after epilogue]

Imagine that we want to exploit a stack overflow in a local process running as root by executing system("/bin/sh;#"). For a successful system() call we have to load a pointer to the string "/bin/sh;#" into R0, the register that holds the first function argument on ARM. We are going to put the string "/bin/sh;#" onto the stack, then use gadgets to put a pointer to this string into R0, and finally redirect the execution flow to system().
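Why the trailing ";#" (see also the remarks at the end of this post): system() hands its argument to /bin/sh -c, so it is the shell that interprets ';' as the end of a command and '#' as the start of a comment. A small POSIX sketch to convince yourself; the helper run_exit_status() is hypothetical, it uses "exit 3" in place of /bin/sh so the command returns immediately, and it appends arbitrary bytes to mimic whatever follows the string in memory up to the next NUL. This only works as long as those bytes contain no newline, since a comment ends at the end of the line.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Runs cmd via system() (i.e. /bin/sh -c cmd) and returns the command's
 * exit status, or -1 if the shell could not run or did not exit normally. */
int run_exit_status(const char *cmd)
{
    int status = system(cmd);

    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Both run_exit_status("exit 3") and run_exit_status("exit 3;#\x01\x02 stack garbage") return 3: everything after ';#' is a comment to the shell.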

Let's look at the following gadgets I found. They reside in the (r-x) section of libc, which is dynamically loaded by myhttpd.

1st gadget: 0x0001053c: pop {r4, pc};
2nd gadget: 0x00039074: ldr r0, [sp, #4]; blx r4;

For now, take these as given. I will explain how to find gadgets in the next post of this series.

We are going to redirect the execution flow from the overwritten LR to the first gadget. Then, using the chaining instruction of the first gadget (POP {..., PC}), we forward the execution flow to the second gadget. Additionally, we use a side effect of the first gadget (loading R4) to prepare where the chaining instruction of the second gadget (BLX R4) will forward the flow to.

The first gadget has two "parameters", which we provide on the stack: the values for R4 and PC. Starting from the state pictured above, after the epilogue of the vulnerable function has executed, we prepare our overflow data for the execution of the 1st gadget:

[Figure: basic stack overflow after epilogue, prepared for 1st gadget]

Since we control PC (through the saved LR), we put the address of the first gadget there. After the epilogue, execution continues at the first gadget, which POPs the two values we prepared from the stack: 0x000aabbc (the address of system()) into R4 and 0x00039074 (the second gadget) into PC. With that, R4 is prepared for the second gadget and the ROP flow continues with the execution of the second gadget.

Let's execute the first gadget using the prepared stack:

1st gadget: 0x0001053c: pop {r4, pc};

[Figure: basic stack overflow after execution of the 1st gadget]

SP moved up by 8 bytes (two popped registers), and R4 and PC changed.

Let's continue with the second gadget. It has one "parameter": the value we want to load into R0, i.e. the address of "/bin/sh;#", which we provide on the stack at SP+4. Strictly speaking, the second gadget uses two words (8 bytes) of the stack, but the first one (SP+0) is just garbage.

With the appropriate stack layout, the second gadget achieves two things: first, it loads a pointer to "/bin/sh;#" into R0, and then it jumps to system(), the address we loaded into R4 using the 1st gadget. Again: since the chaining instruction in this case is BLX R4, we need to put the address of system() into R4.

Let's prepare the stack for the execution of the second gadget:

[Figure: basic stack overflow after execution of the 1st gadget, stack prepared for execution of the 2nd gadget]

And then execute the second gadget:

2nd gadget: 0x00039074: ldr r0, [sp, #4]; blx r4;

[Figure: basic stack overflow after execution of the 2nd gadget]

The second gadget loaded the word stored at SP+4, the pointer to "/bin/sh;#", into R0 and then branched to R4, which we had already set to system() using the first gadget: we executed system("/bin/sh;#").
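The whole chain can be replayed in a toy C simulation. Everything here is an assumption for illustration: the stack is modelled as an array of words starting at a made-up address STACK_BASE (index 0 holds the saved FP), the gadgets are assumed to be A32 code, and the gadget functions only mimic the effects of the two instruction sequences. Note that ldr r0, [sp, #4] loads the word stored at SP+4, so that word is the address of the command, which follows it on the stack.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BASE 0xbefff000u   /* hypothetical address of stack[0] (saved FP) */

struct rop_regs { uint32_t r0, r4, lr, pc, sp; };   /* sp holds an address */

/* Reads the 32-bit word stored at a simulated stack address. */
static uint32_t load_word(const uint32_t *stack, uint32_t addr)
{
    return stack[(addr - STACK_BASE) / 4];
}

/* Lays out the stack as it looks right after the overflow. */
void build_sim_stack(uint32_t stack[9])
{
    stack[0] = 0x51525354u;             /* saved FP: garbage */
    stack[1] = 0x0001053cu;             /* saved LR: 1st gadget */
    stack[2] = 0x000aabbcu;             /* popped into R4: system() */
    stack[3] = 0x00039074u;             /* popped into PC: 2nd gadget */
    stack[4] = 0x51525354u;             /* SP+0 for the 2nd gadget: garbage */
    stack[5] = STACK_BASE + 24;         /* SP+4: address of the command */
    memcpy(&stack[6], "/bin/sh;#", 9);  /* the command at STACK_BASE + 24 */
}

/* 1st gadget at 0x0001053c: pop {r4, pc} */
void gadget1_pop_r4_pc(struct rop_regs *r, const uint32_t *stack)
{
    r->r4 = load_word(stack, r->sp);
    r->pc = load_word(stack, r->sp + 4);
    r->sp += 8;
}

/* 2nd gadget at 0x00039074: ldr r0, [sp, #4]; blx r4 */
void gadget2_ldr_r0_blx_r4(struct rop_regs *r, const uint32_t *stack)
{
    r->r0 = load_word(stack, r->sp + 4);   /* the word stored at SP+4 */
    r->lr = 0x00039078u + 4;               /* BLX sits at 0x00039078 */
    r->pc = r->r4;                         /* branch to system() */
}
```

Starting with SP = STACK_BASE + 8 (just past the saved LR, as after the epilogue), gadget1_pop_r4_pc() sets R4 = 0x000aabbc, PC = 0x00039074 and SP = STACK_BASE + 16; gadget2_ldr_r0_blx_r4() then sets PC = 0x000aabbc and R0 = STACK_BASE + 24, which points at "/bin/sh;#".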

Depending on the size of the overflowed buffer and other local variables, the injected overflow string will look something like this:

'A' * n + 0x51525354 + 0x0001053c + 0x000aabbc + 0x00039074 + 0x51525354 + addr_of_cmd + '/bin/sh;#'

Here n depends on the stack layout of the vulnerable function (e.g. the size m of local variables like buffer[m], ...), the 0x51525354 values are garbage, and addr_of_cmd is the stack address at which '/bin/sh;#' ends up (SP+8 at the time the second gadget runs), so we need to know where the stack is. The pointer is needed because LDR R0, [SP, #4] loads the word stored at SP+4 into R0, not the address SP+4 itself. Remember: it is highly likely that you are exploiting a little-endian system, so when injecting the string into a process we have to adjust the byte order of the values. In the next post I will introduce a script which takes care of this.

Some remarks on that example:

  • You might have noticed that we use ';#' at the end of our string. That solves an important problem: normally we would have to insert a NULL byte to tell system() where the command string ends. In our case ';' ends the command and '#' introduces a comment, so everything following the '#' on the same line is ignored by the shell that system() runs. Alternatively you could use gadgets to write a NULL byte at the end of the string...
  • I wanted to introduce the basic idea, so I left out one important point: how do we find the addresses we inserted into our ROP chain (like 0x00039074, 0x000aabbc, ...)? As explained above, we find gadgets in the .text segment of our binary or its loaded libraries. In the chain we use the base address (the address at which the library we searched for gadgets is loaded into memory) plus the offset of the gadget. I will explain in the following post how to find the gadgets, their offsets and base addresses.
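The "base address plus offset" rule can be sketched in C. This assumes a Linux target where the gadget offsets are relative to the library's load address and that load address is the start of the library's mapping with file offset 0 in /proc/<pid>/maps, which is a common layout but not guaranteed; with ASLR the base also changes from run to run. The maps line and library name in the usage note are made-up examples, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parses one /proc/<pid>/maps line. If it maps a file whose path contains
 * `lib` at file offset 0, stores the mapping's start address in *base and
 * returns 1; otherwise returns 0. */
int maps_line_base(const char *line, const char *lib, uint32_t *base)
{
    unsigned long start, file_off;
    char perms[5];

    /* format: start-end perms offset dev inode path */
    if (sscanf(line, "%lx-%*lx %4s %lx", &start, perms, &file_off) != 3)
        return 0;
    if (file_off != 0 || strstr(line, lib) == NULL)
        return 0;
    *base = (uint32_t)start;
    return 1;
}

/* Runtime address of a gadget: library load base + gadget offset. */
uint32_t gadget_addr(uint32_t lib_base, uint32_t offset)
{
    return lib_base + offset;
}
```

For the line "b6e5e000-b6f8a000 r-xp 00000000 b3:02 1234 /lib/libc-2.19.so", maps_line_base() yields the base 0xb6e5e000, and the second gadget at offset 0x00039074 would then be at 0xb6e97074.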
