[Miyagi karate-chops the tops off three beer bottles]
Daniel: How did you do that? How did you do that?
Miyagi: Don't know. First time.
(Karate Kid)

Currently I feel like a k̵a̵r̵a̵t̵e̵ binary kid on its long, hard way to become a binary ninja ;) To train my reverse engineering skills I started some time ago to reverse engineer the NoPetya/Wiper variant 027cc450ef5f8c5f653329641ec1fed9.

tux can karate
source: CC BY_NC

As I am a self-taught reverse engineer and this is my second, "comprehensive" project, be aware there might be mistakes and wrong assumptions. Suggestions and comments are, as always, highly welcome!

notes

  • In the following I will mark my own function and variable names in the disassembly in bold. These names are not embedded in the binary but were chosen by me.
  • In most cases I write my blog posts to clarify things for myself. In this case I had to correct some assumptions and findings while writing, since I learned a whole lot of new things about PE/COFF files and their loading process. So some comments on the screenshots might not fit the descriptions I give in the text of the post.
  • My understanding of the code changed a whole lot while writing this post. I noticed that I mainly analyzed pretty much standard PE loader code with some modifications (see parse_reloc_for_absolute()). Nevertheless I decided to publish this for documentation purposes... ;)

the storyline

Since this will be a pretty dense post I want to give you a brief overview of the storyline of this post, so you can cherry-pick the topics you want to know more about.

In this post I am going to first describe how NoPetya loads itself through DllMain() and its single exported function which has ordinal #1. See section first run: calling DllMain and #1

After that the interesting part begins as I analyze the function load_and_delete(), which

  • parses the DLL for modifications of the .reloc table.
  • sets up the new copy of the DLL (remaps sections, loads dependencies)
    • I reverse engineered the functions responsible for these tasks. As far as I saw there is nothing very fancy happening there... so I will ignore them for now
  • jumps to that new version to execute the following tasks.
  • removes the original version of itself from memory (executed through the copy).
  • overwrites the file on disk (executed through the copy).

first run: calling DllMain and #1

DllMain()

NoPetya exports two functions, one nameless with ordinal 1 and the default DllMain(). From reports I was aware that NoPetya calls itself like this:

C:\Windows\system32\rundll32.exe "C:\ProgramData\perfc.dat",#1 30

Even though NoPetya spreads itself by calling the exported function with ordinal #1, by default DllMain() is still called whenever the DLL is attached to or detached from a process or thread. NoPetya disables the calls to DllMain() for thread attach and detach. Besides that, DllMain() stores the malware's base address (parameter hinstDLL) in a global variable for later use; I named this variable baseAddress.

Besides that nothing more happens in DllMain().

#1

One of the first actions NoPetya executes in #1 is to obtain SeShutdownPrivilege, SeDebugPrivilege and SeTcbPrivilege. It then checks whether certain processes are running on the system by hashing the exe filename of every process and comparing the result against 3 fixed values. (Done through CreateToolhelp32Snapshot(), Process32First(), Process32Next() and the szExeFile field in a PROCESSENTRY32 structure.)

To me it is not obvious whether this is a well-known hashing function - I checked some which produce 32 bit hashes, but none matched. Here a user found some promising values for the hashes:

  • 2e214b44 = avp.exe
  • 6403527e = ccSvcHst.exe
  • 651b3005 = NS.exe

I will not go into further detail since I think the following parts are more interesting.

An additional step in this startup function is that NoPetya copies itself onto the heap for later use, for example for the lateral spreading part. (self_in_heap)

After that the malware calls load_and_delete(). #1() is called recursively from within this function so this call is skipped if the first recursion is happening (ebp+hThread == 0xFFFFFFFF).

NoPetya: check for running processes

load_and_delete()

load_and_delete(): load new copy of the dll into memory

load_and_delete() executes the following tasks:

  • allocate a new, hidden copy of the DLL in memory
  • make sure the new copy is properly loaded in memory (sections mapped with appropriate access rights, dependencies loaded correctly, the function pointers to the used library calls are correctly set up)
  • check for manipulations (especially the .reloc table)
  • transfer the subsequent execution to the new copy
  • overwrite and delete the original copy

As a first step NoPetya reads its own size from the PE header of the loaded DLL (1), allocates an appropriate amount of memory (2) and copies itself into that memory (3). This copy is used later to execute the attack vectors embedded in the malware. The originally loaded version gets freed from the process in a later step.
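Steps (1) to (3) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: I assume the size that is read is the SizeOfImage field of the optional header, and the helper names size_of_image() and copy_image() are made up for this sketch. It works on a byte buffer instead of process memory.

```python
import struct


def size_of_image(image: bytes) -> int:
    """Return the SizeOfImage field from the optional header of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # 4 byte signature + 20 byte COFF file header, then the optional header.
    optional_header = e_lfanew + 4 + 20
    # SizeOfImage sits at offset 56 of the optional header (PE32 and PE32+).
    (size,) = struct.unpack_from("<I", image, optional_header + 56)
    return size


def copy_image(image: bytes) -> bytearray:
    """Allocate SizeOfImage bytes (zero-filled) and copy the image into them."""
    size = size_of_image(image)
    buffer = bytearray(size)
    n = min(size, len(image))
    buffer[:n] = image[:n]
    return buffer
```

A real loader would allocate the memory with something like VirtualAlloc() and copy headers and sections separately; the sketch only shows where the size comes from.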

load_and_delete(): check for .reloc manipulations

load_and_delete():magic

First I assumed there are two sanity checks of the .reloc table:

  • get_real_reloc() (4)
  • parse_reloc_for_absolute() (5)

While writing this post I learned a lot about the loading process of PE/COFF files - for example I understood that both of these functions are part of the normal loading process and that parse_reloc_for_absolute() is enhanced with a sanity check of the .reloc table. Additionally I assume that the function get_real_reloc() contains a small error (I have no other idea ;)), which has no impact on the functioning of the code - for more information see the following section.

If the relocations (and the contained sanity check) are passed successfully the malware remaps the sections of the copy with the proper memory R/W/X permissions. (6)

get_real_reloc()

First I assumed get_real_reloc() is some kind of manipulation check, because the malware calculates the address of the .reloc section with a detour. Please note: in this part the screenshots might not fit the text, since I learned a lot of new things about the loading process of a PE file and the PE file structures while writing.

The detour I mentioned is marked with a 7 in the following screenshot. NoPetya takes the raw address of the .reloc section in self_in_heap, subtracts the RVA (of .reloc) taken from the section header table, and adds the RVA (the same value) taken from the optional header (red on yellow on the following screenshot). Since both values are the same, they cancel each other out, and I currently see no value in this calculation.

The get_real_reloc() function returns the offset of the .reloc section (RAW address), which subsequently is passed to parse_reloc_for_absolute() (see next section). get_real_reloc() works on the inactive copy of itself on the heap (self_in_heap) to calculate these values.

get_real_reloc()

get_real_reloc()

In the upper part of the image we see how the malware iterates over all sections in the section table. If the stored RVA is less than the start of the currently parsed section plus its size, the malware assumes that it found the real .reloc.

After NoPetya has calculated the raw address of the .reloc table it passes this value via ecx into parse_reloc_for_absolute().
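To make the lookup in get_real_reloc() more tangible, here is a minimal Python sketch of a generic RVA-to-file-offset conversion over a section table. The Section type, the function name rva_to_raw() and the section values in the usage example are hypothetical and not taken from the binary. One possible reading of the "detour" is that it is exactly this generic formula (raw pointer - section RVA + directory RVA), which collapses to the raw pointer whenever the relocation directory starts at the beginning of its section.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA of the section once it is loaded
    virtual_size: int     # size of the section in memory
    raw_pointer: int      # PointerToRawData: offset of the section in the file


def rva_to_raw(sections: Sequence[Section], rva: int) -> Optional[int]:
    """Translate an RVA into a file offset, or return None if no section matches."""
    for section in sections:
        start = section.virtual_address
        if start <= rva < start + section.virtual_size:
            # raw pointer - section RVA + requested RVA
            return section.raw_pointer - start + rva
    return None


# Usage with made-up section values:
sections = [
    Section(".text", 0x1000, 0x8000, 0x400),
    Section(".reloc", 0x1F000, 0x1000, 0x1A000),
]
reloc_offset = rva_to_raw(sections, 0x1F000)
```

If the requested RVA equals the section's own RVA, as in the usage example, the subtraction and addition cancel and the result is simply the section's raw pointer.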

parse_reloc_for_absolute()

In a later step I will show how I modified the binary to avoid self-deletion from memory and disk and to keep the execution in the original DLL instead of the copy. These manipulations also require a modification of the .reloc table, because otherwise the relocations would overwrite the patched parts of the binary, resulting in unpredictable opcodes in the .code section.

To make the .reloc table ignore the entries which otherwise would write into my patched parts of the .code section, I disabled these entries by setting their fixup type to 0. (Below is a picture of the PECOFF specification where I found the ABSOLUTE entry.)

fixup types in PECOFF specification

modified reloc table

After I patched the binary and reloaded it into the debugger I experienced a different behaviour - the binary quit before reaching the breakpoints where I wanted to continue my investigation. I ran into the already mentioned sanity check of the .reloc table: parse_reloc_for_absolute() (5) detects all entries in the .reloc table which are not IMAGE_REL_BASED_HIGHLOW, and stops its own execution:

modified reloc table

This function is built from two loops: the outer loop iterates over all blocks of the .reloc table and the inner loop iterates over all entries in the current block. 8 is (my modified version of) the check whether the current entry has IMAGE_REL_BASED_HIGHLOW set as fixup type. In my version the check is disabled by using a "jump above" instead of the original "jump if not zero".
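The two loops can be sketched in Python as follows. This is my own illustration of the block layout from the PE/COFF specification (an 8 byte block header followed by 16 bit entries, with the fixup type in the upper 4 bits and the page offset in the lower 12 bits), not a decompilation of the malware. The function names are made up.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0
IMAGE_REL_BASED_HIGHLOW = 3


def iter_relocs(reloc: bytes):
    """Yield (rva, fixup_type) for every entry of a base relocation table."""
    pos = 0
    # Outer loop: one iteration per relocation block.
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8 or pos + block_size > len(reloc):
            break  # malformed or terminating block
        # Inner loop: one iteration per 16 bit entry of the current block.
        for i in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", reloc, pos + 8 + 2 * i)
            yield page_rva + (entry & 0x0FFF), entry >> 12
        pos += block_size


def only_highlow(reloc: bytes) -> bool:
    """Mimic the sanity check described above: reject any non-HIGHLOW entry."""
    return all(kind == IMAGE_REL_BASED_HIGHLOW for _, kind in iter_relocs(reloc))
```

With such a check in place, an entry that was disabled by setting its type to 0 makes only_highlow() return False, which matches the behaviour I ran into with my patched binary.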

load_and_delete(): jump into copy

For now I will skip how exactly NoPetya remaps the sections of the DLL copied into memory. To properly execute the copy, NoPetya needs to set the memory permissions (R/W/X) of the sections (e.g. .code = RX). This is done in remap_sections().
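As an illustration of what such a remapping step usually boils down to, here is a small Python sketch that maps the section characteristics flags to a Windows page protection constant. The constants are the documented IMAGE_SCN_MEM_* and PAGE_* values; the lookup table follows what typical in-memory PE loaders do and is an assumption on my side, not something I verified inside remap_sections().

```python
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

PAGE_NOACCESS = 0x01
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_WRITECOPY = 0x08
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

# Key: (executable, readable, writable)
_PROTECTION = {
    (False, False, False): PAGE_NOACCESS,
    (False, True, False): PAGE_READONLY,
    (False, False, True): PAGE_WRITECOPY,
    (False, True, True): PAGE_READWRITE,
    (True, False, False): PAGE_EXECUTE,
    (True, True, False): PAGE_EXECUTE_READ,
    (True, False, True): PAGE_EXECUTE_WRITECOPY,
    (True, True, True): PAGE_EXECUTE_READWRITE,
}


def page_protection(characteristics: int) -> int:
    """Return the page protection matching a section's Characteristics field."""
    key = (
        bool(characteristics & IMAGE_SCN_MEM_EXECUTE),
        bool(characteristics & IMAGE_SCN_MEM_READ),
        bool(characteristics & IMAGE_SCN_MEM_WRITE),
    )
    return _PROTECTION[key]
```

A loader would pass the result to VirtualProtect() for each section; a code section with the execute and read flags set ends up as PAGE_EXECUTE_READ.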

After the setup of the DLL is completed, the execution is transferred to the copy. To ease my analysis I decided to patch the malware to stay in the original DLL. To achieve that I replaced some opcodes with NOPs, as you can see in the left part of the following image. 9 shows how the absolute address of the function self_delete_and_load() in the copy is calculated (ebx contains the pointer to the base address of the copy).
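The address calculation at 9 is plain rebasing arithmetic. A tiny Python sketch, with a made-up function name and made-up example addresses:

```python
def address_in_copy(func_va: int, original_base: int, copy_base: int) -> int:
    """Rebase a function address from the original image to the copy."""
    # Offset of the function inside the image, added to the base of the copy.
    return copy_base + (func_va - original_base)


# Hypothetical values: original image at 0x10000000, copy at 0x00A40000.
target = address_in_copy(0x10009590, 0x10000000, 0x00A40000)
```

The offset inside the image stays the same, only the base changes.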

Note: As I learned while writing, this is pretty much the same method a PE loader would use, see PE-Loader-Sample on Github

modified reloc table

Since the address of baseAddress gets relocated through an entry in the .reloc section, we have to manipulate the entry at RVA 963E so that it is ignored (see the subsection load_and_delete(): check for .reloc manipulations, parse_reloc_for_absolute()). Otherwise the relocation code would write arbitrary data into this part of the .code section, which then would be interpreted and executed as code.

The code jumps into the function self_delete_and_load(), which I will analyze in the following section.

self_delete_and_load()

From within the copy, self_delete_and_load() frees the originally executed binary from the process, allocates a zero-initialized heap buffer with the size of the malware's DLL, overwrites the DLL with this buffer (all zeros) and finally deletes the DLL. After successfully executing these tasks the initially called function #1 (see section first run: calling DllMain and #1) is called.

Within self_delete_and_load() there were three parts which I had to patch:

  • the freeing of the library (a call to FreeLibrary() in kernel32.dll)
  • the overwriting (WriteFile() in kernel32.dll)
  • the self-deletion (DeleteFile() in kernel32.dll)

Here are the screenshots of the patched and unpatched calls to FreeLibrary() and DeleteFile(). Note: both patches also require a change in the reloc table, as described in the previous sections! Since the malware checks that the return values indicate success, we need to move 1 into eax. After removing the calls to the kernel32 functions and replacing some bytes with mov eax,1 there are still some free bytes, which we pad with 0x90 (NOP) to keep the file size the same. (10 and 11)
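The byte-level idea of such a patch can be sketched in Python. I assume a 6 byte indirect call (opcode FF 15 followed by a 32 bit address) as the instruction to replace; the address in the usage example is made up, and the sketch does not deal with arguments that are pushed before the call.

```python
MOV_EAX_1 = bytes.fromhex("b801000000")  # mov eax, 1
NOP = b"\x90"


def success_stub(length: int) -> bytes:
    """Build 'mov eax, 1' padded with NOPs to exactly `length` bytes."""
    if length < len(MOV_EAX_1):
        raise ValueError("not enough room for mov eax, 1")
    return MOV_EAX_1 + NOP * (length - len(MOV_EAX_1))


def patch(data: bytes, offset: int, length: int) -> bytes:
    """Return a copy of `data` with `length` bytes at `offset` replaced."""
    if offset < 0 or offset + length > len(data):
        raise ValueError("patch range is outside of the data")
    patched = bytearray(data)
    patched[offset:offset + length] = success_stub(length)
    return bytes(patched)


# Usage: replace a hypothetical 'call dword ptr [0x1000D0A0]' (FF 15 A0 D0 00 10).
original = bytes.fromhex("ff15a0d00010")
patched = patch(original, 0, len(original))
```

Because the stub always has the same length as the bytes it replaces, all following offsets in the file stay untouched.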

modified reloc table modified reloc table

patching the binary

For the sake of completeness, a few words on patching the binary. If you have found some instructions you want to patch out of a binary, IDA is very helpful because it shows the offset of these bytes in the input file.

modified reloc table

From there you can get the hex editor of your choice and replace the instructions with NOPs or other instructions:

modified reloc table

The next thing you must check is whether there is an entry in the reloc table which might alter your patched regions. If you replace a call into a DLL's function with NOPs you have to deactivate the associated entry in the reloc table - as I showed in the sections above.

patching the .reloc table

PPEE (puppy) is the PE tool which I used to deactivate entries in the reloc table by setting them to type 0. However I did not find a tool which allows the deletion of entries.

The reloc table's entries list the relative virtual addresses (RVAs) of the locations which the loader should relocate. So you have to calculate the RVA by subtracting the base address of the image from the virtual address of the location that gets relocated. (See the picture above; IDA shows that address in the bar.)
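What I did by hand in PPEE can be sketched in Python: compute the RVA from the virtual address, find the matching entry in the raw .reloc data and clear its type bits so it becomes type 0. The function names and the image base 0x10000000 in the usage example are assumptions for this sketch; only the RVA 963E comes from my analysis above.

```python
import struct


def va_to_rva(va: int, image_base: int) -> int:
    """Convert a virtual address (as shown by the disassembler) into an RVA."""
    return va - image_base


def disable_reloc(reloc: bytearray, target_rva: int) -> bool:
    """Set the fixup type of the entry for target_rva to 0 (ABSOLUTE), in place."""
    pos = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8 or pos + block_size > len(reloc):
            break  # malformed or terminating block
        for i in range((block_size - 8) // 2):
            offset = pos + 8 + 2 * i
            (entry,) = struct.unpack_from("<H", reloc, offset)
            # Skip entries that are already type 0 (padding or disabled).
            if entry >> 12 != 0 and page_rva + (entry & 0x0FFF) == target_rva:
                # Keep the 12 bit page offset, clear the 4 bit type.
                struct.pack_into("<H", reloc, offset, entry & 0x0FFF)
                return True
        pos += block_size
    return False


# Usage on a tiny hand-made table: one block for page 0x9000 with two HIGHLOW entries.
table = bytearray(struct.pack("<IIHH", 0x9000, 12, 0x363E, 0x3700))
changed = disable_reloc(table, va_to_rva(0x1000963E, 0x10000000))
```

The entry is only neutralized and stays in the table, which mirrors the limitation I mentioned: the sketch does not delete entries either.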

the end

Altogether I spent quite some time reverse engineering and understanding a - pretty much standard - PE loading process. I certainly learned a whole lot about PE/COFF files, the PE loading process, PE file structures and reverse engineering in general. If my spare time permits there will be some follow-ups on this post, since I did not touch the interesting parts of the malware ;)