Extending LibVMI into User Space Applications

April 26, 2016

Recently, I've been working with LibVMI for a course project where we were able to subvert the Linux kernel and OpenSSL RNGs in a running VM. The general overview can be found in our class paper on the projects page.

Specifically, extending the attack from the kernel to user space was tricky and there really wasn't much documentation on the subject available online, so I wanted to document the process we went through. Maybe it will be helpful to someone someday.

If you are interested in playing with the code yourself, you can get it here: GitHub

As brief background, we started with the kernel RNG attack (rng-hook_kernel_fixed.c), which required creating a few helper constructs, such as a "breakpoint" struct and its helper functions.

Setting breakpoints just before and after the call to extract_buf.
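The post doesn't reproduce the breakpoint struct. Below is a minimal sketch of the usual software-breakpoint idea it relies on, built on my own assumptions. A breakpoint remembers a location and the original byte stored there. Arming it patches in an INT3 (0xCC) opcode, and disarming it restores the saved byte. A plain byte array stands in for guest memory. The names (struct breakpoint, bp_arm, bp_disarm) are hypothetical and are not the ones in rng-hook_kernel_fixed.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BP_OPCODE_INT3 0xCC   /* x86 one-byte breakpoint instruction */

/* Hypothetical breakpoint record: where the trap goes and what it replaced. */
struct breakpoint {
    size_t  offset;   /* location within the memory buffer */
    uint8_t saved;    /* original instruction byte at `offset` */
    bool    armed;
};

static void bp_init(struct breakpoint *bp, size_t offset) {
    bp->offset = offset;
    bp->saved  = 0;
    bp->armed  = false;
}

/* Save the original byte, then patch in INT3 so execution traps there. */
static void bp_arm(uint8_t *mem, struct breakpoint *bp) {
    assert(!bp->armed);
    bp->saved = mem[bp->offset];
    mem[bp->offset] = BP_OPCODE_INT3;
    bp->armed = true;
}

/* Put the original byte back so the real instruction can run. */
static void bp_disarm(uint8_t *mem, struct breakpoint *bp) {
    assert(bp->armed);
    mem[bp->offset] = bp->saved;
    bp->armed = false;
}
```

In a real hook, the reads and writes would go through LibVMI against the guest's memory. The hook would also typically need to disarm the breakpoint, single-step the original instruction, and then re-arm it. That part is left out here.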

To find a good breakpoint location, we dug through random.c, the kernel's random number generator, and found that random bytes destined for user space (e.g., reads from /dev/urandom) are extracted from the entropy pool in the extract_entropy_user function. We can then look up the kernel memory address of extract_entropy_user and jump a fixed, known offset from there to the instruction on which to break (line 1228 in random.c). Using GDB on random.o worked great for this.

Output of the GDB command `pdisas extract_entropy_user`.

Looks like the call to extract_buf is 155 bytes into extract_entropy_user, and the instruction immediately after that is at a 160-byte offset.
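To make the arithmetic concrete, here is a small sketch. It pulls a symbol's address out of a System.map-style line (`<hex address> <type> <name>`, the format of the sysmap file LibVMI is pointed at in its config) and then adds the 155- and 160-byte offsets found above. The 5-byte gap between the two offsets matches the length of an x86 `call rel32` instruction. The address in the test is made up, and the helper names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CALL_EXTRACT_BUF_OFF 155  /* call extract_buf, from the disassembly */
#define AFTER_CALL_OFF       160  /* first instruction after that call */

/* Parse one System.map line ("<hex addr> <type> <name>").
 * Returns 1 and stores the address if the line names `sym`, else 0. */
static int sysmap_lookup_line(const char *line, const char *sym, uint64_t *addr) {
    uint64_t a;
    char type;
    char name[128];
    if (sscanf(line, "%" SCNx64 " %c %127s", &a, &type, name) != 3)
        return 0;
    if (strcmp(name, sym) != 0)
        return 0;
    *addr = a;
    return 1;
}

struct bp_pair { uint64_t before_call, after_call; };

/* Breakpoint addresses around the extract_buf call, given extract_entropy_user's address. */
static struct bp_pair extract_buf_breakpoints(uint64_t fn_addr) {
    assert(AFTER_CALL_OFF - CALL_EXTRACT_BUF_OFF == 5);  /* size of a call rel32 */
    struct bp_pair p = { fn_addr + CALL_EXTRACT_BUF_OFF, fn_addr + AFTER_CALL_OFF };
    return p;
}
```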

Notice the argument order: tmp is the second argument to extract_buf, so on x86_64 (System V calling convention) it will be passed in RSI.
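For reference, here is a tiny sketch of the System V AMD64 rule being applied. The first six integer or pointer arguments go in RDI, RSI, RDX, RCX, R8, and R9, in that order. For extract_buf(r, tmp), r lands in RDI and tmp lands in RSI. The helper name is hypothetical. In a LibVMI hook you would read the register itself, e.g. with vmi_get_vcpureg.

```c
#include <assert.h>
#include <stddef.h>

/* System V AMD64 ABI: registers for the first six integer/pointer arguments. */
static const char *const sysv_int_arg_regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Register carrying zero-based argument `idx`, or NULL if it is passed on the stack. */
static const char *sysv_arg_register(size_t idx) {
    size_t n = sizeof sysv_int_arg_regs / sizeof sysv_int_arg_regs[0];
    return idx < n ? sysv_int_arg_regs[idx] : NULL;
}
```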

After that, we simply inspect the local variables ...

Unfortunately, RSI gets clobbered by the call to extract_buf (RSI is caller-saved, so extract_buf is free to overwrite it), so we have to record tmp's address at the breakpoint before the call.

... and overwrite the tmp buffer containing random bytes with whatever we want.

Overwriting the generated random bytes with our signal value, RNG_VALUE (all 0x66 bytes).
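Putting the two breakpoints together, the logic looks roughly like the sketch below. At the breakpoint before the call, it captures RSI (tmp's address). At the breakpoint after the call, it fills tmp with RNG_VALUE. This is a model built on assumptions rather than the project's code. A byte array stands in for guest memory, and the "addresses" are offsets into that array. A real hook would write through LibVMI (e.g. vmi_write_va) instead of calling memset. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RNG_VALUE 0x66

/* Hypothetical state carried from the first breakpoint to the second. */
struct hook_state {
    uint64_t tmp_addr;   /* RSI captured before the call */
    int      have_addr;
};

/* Breakpoint before the call: RSI still holds tmp, so remember it. */
static void on_before_call(struct hook_state *st, uint64_t rsi) {
    st->tmp_addr = rsi;
    st->have_addr = 1;
}

/* Breakpoint after the call: overwrite `len` bytes at tmp with RNG_VALUE.
 * `guest` stands in for guest memory; returns -1 if tmp is unknown or out of range. */
static int on_after_call(const struct hook_state *st, uint8_t *guest,
                         size_t guest_len, size_t len) {
    if (!st->have_addr || st->tmp_addr > guest_len || len > guest_len - st->tmp_addr)
        return -1;
    memset(guest + st->tmp_addr, RNG_VALUE, len);
    return 0;
}
```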

Relatively simple, at least once all the code for setting breakpoints was in place.

Reading random bytes from /dev/urandom on the VM before and after we attach with LibVMI and launch the attack.

However, extending this attack to Apache2 and OpenSSL was a bit of a nightmare and the real story of this post. Completely aside from the fact that OpenSSL is somewhat of a mess under the hood and therefore difficult to reverse engineer, simply moving from kernel space introspection to user space introspection with LibVMI is difficult in its own right.

To start with, we reverse engineered the Apache2 and OpenSSL source code to pinpoint a good code location for introspection, similar to the location in extract_entropy_user in the kernel attack above. We found it at the end of the bnrand function in OpenSSL. This is where new random BIGNUM structs are generated and then returned for use in a variety of crypto applications.

The moment a buffer of random bytes is converted to a BIGNUM.

So, let's start finding the proper offsets into the bnrand function for our breakpoints.

Finding Apache2's PID. We can use any of the three.
You can use GDB to attach to running processes. Pretty cool, huh?
Looks like that symbol doesn't exist.

Problem #1: Our function doesn't exist.

Well, bnrand is called from BN_rand, so let's look at that.

We discovered this jmp goes to a different region of memory each time Apache2 is restarted in a new process.

An unconditional jump. Goodie. Let's see what that's pointing at.

All those pushes look like the start of a function saving the caller's register values. Let's scroll down...

Found it! So, the compiler (sorta) optimized away the function call to bnrand: since BN_rand just returns bnrand's result, the call became a tail-call jmp. Well, that's going to make finding this memory offset dynamically from LibVMI a real pain. We'll need to start at the symbol for BN_rand, then read the address at BN_rand+13 bytes (the address stored in the jmp instruction), then jump down 572 bytes (0x7f99441e59dc - 0x7f99441e57a0 = 0x23c) from there to find our breakpoint location.
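Here's a sketch of that pointer arithmetic, with one assumption on my part. I'm treating the jmp as the common 5-byte `E9 rel32` near jump. Its operand is a displacement relative to the next instruction rather than an absolute address, so the "address stored in the jmp" has to be computed. If the actual instruction is encoded differently, the decode step changes, but the +572 step doesn't. The instruction location in the test is made up. The bnrand and breakpoint addresses are the ones from the GDB session above. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BNRAND_BP_OFF 572  /* 0x7f99441e59dc - 0x7f99441e57a0 = 0x23c */

/* ASSUMPTION: the jmp is the 5-byte "E9 rel32" form. Its target is the address
 * of the next instruction plus a sign-extended 32-bit displacement. */
static int decode_jmp_rel32(const uint8_t insn[5], uint64_t insn_addr, uint64_t *target) {
    int32_t rel;
    if (insn[0] != 0xE9)
        return -1;
    memcpy(&rel, insn + 1, sizeof rel);  /* x86 encodes little-endian; host must match */
    *target = insn_addr + 5 + (uint64_t)(int64_t)rel;
    return 0;
}

/* Breakpoint on the BN_bin2bn call, 572 bytes into bnrand. */
static uint64_t bnrand_breakpoint(uint64_t bnrand_addr) {
    return bnrand_addr + BNRAND_BP_OFF;
}
```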

Okay, so we'll just use LibVMI's handy user space symbol lookup function to find BN_rand, do the above steps, and then everything will work like the kernel approach above...

Linux is unimplemented at this time!?

Problem #2: Our function doesn't work.

So, as it turns out, because of the complexity that will shortly become apparent, LibVMI doesn't currently support looking up user space process symbols. Bummer! Well, what would it entail to do it manually? Is there something in the kernel's process list data structures that will let us go retrieve this information ourselves?

task_struct is a kernel data structure that stores information about a running process. It is used in LibVMI's process-list example code. Figure used with permission from Zhiqiang Lin.

When the process runs, it loads (mmaps) all the dynamic libraries (Linux shared objects) it needs into its virtual memory. These libraries are loaded at unpredictable addresses. Also, when these libraries are linked in at run time, every reference to a function in one of those libraries is resolved as essentially a fixed offset from the start of that library. Thus, it makes sense that the symbol we're looking for (BN_rand) isn't going to be in some global process symbol table, but rather in a symbol table for the particular library.
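From inside the VM, you can see those per-process mappings directly in /proc/<pid>/maps, which is a handy way to sanity-check what the introspection code should find. Below is a minimal sketch that returns the start of the first mapping whose path contains a given library name. The helper name is hypothetical, and the addresses in the test are made up. This is not how LibVMI sees the process. LibVMI has to dig the same information out of kernel structures instead.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Scan /proc/<pid>/maps-formatted lines and store in *start the start address
 * of the first mapping whose path contains `needle`. Returns 0 on success, -1 if none. */
static int maps_find_first(FILE *f, const char *needle, uint64_t *start) {
    char line[512];
    while (fgets(line, sizeof line, f)) {
        uint64_t lo, hi, off;
        char perms[5];
        int n = 0;
        /* Format: "start-end perms offset dev inode   path" */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*s %*s %n",
                   &lo, &hi, perms, &off, &n) != 4 || n == 0)
            continue;
        if (strstr(line + n, needle)) {
            *start = lo;
            return 0;
        }
    }
    return -1;
}
```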

So, I can't say I feel really comfortable with this yet, but from reading up about how symbols are stored in ELF files, it looks like symbols live in two tables: .dynsym and .symtab. Dynsym symbols are the global symbols required for dynamic linking and get loaded into memory when the binary runs. The .symtab table is the full symbol table (including local, non-exported symbols). It's useful for getting GDB to print out function names but unnecessary for runtime linking, so it isn't loaded into memory and is often stripped from distribution binaries.

This distinction between dynsym and symtab symbols looks promising because BN_rand is probably a dynsym symbol and therefore loaded into memory where we could dynamically grab it with LibVMI. This looks like something we can use to make the vmi_translate_sym2v function work! Let's come back to this later (as in, another later post).
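To see which symbol tables a given library actually ships with, here is a small sketch using <elf.h>. It walks a 64-bit ELF file's section headers and reports whether the file has an SHT_DYNSYM (.dynsym) section, an SHT_SYMTAB (.symtab) section, or both. `readelf -S` shows the same information. The sketch assumes a well-formed ELF64 file in the host's byte order, and the function name is my own. On a distribution-stripped libcrypto.so, I'd expect to find .dynsym but typically no .symtab.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Report whether an ELF64 file has .dynsym and/or .symtab sections.
 * Returns 0 on success, -1 if the file can't be read or isn't ELF64. */
static int elf_symtab_kinds(const char *path, int *has_dynsym, int *has_symtab) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    Elf64_Ehdr eh;
    int rc = -1;
    *has_dynsym = *has_symtab = 0;
    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0)
        goto out;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        if (fread(&sh, sizeof sh, 1, f) != 1)
            goto out;
        if (sh.sh_type == SHT_DYNSYM) *has_dynsym = 1;   /* loaded at run time */
        if (sh.sh_type == SHT_SYMTAB) *has_symtab = 1;   /* full table, not loaded */
    }
    rc = 0;
out:
    fclose(f);
    return rc;
}
```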

However, at this point, I made another observation: the library file containing the BN_rand function doesn't change between runs, so relative to the start of the region where the library gets mapped into memory, BN_rand will always start at a fixed offset. Let's find that offset.

Great, so we've found the library Apache2 uses. This library will be loaded into memory into one of those purple locations in the Process Virtual Memory diagram above.

Now let's find the offset.

Excellent. Since shared libraries use position-independent code, the BN_rand function will be located 0xd5a50 bytes from the start of wherever this library gets mapped into memory.
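Here is the arithmetic, sketched. With position-independent code, a symbol's run-time address is the library's mapping base plus the symbol's offset. Subtracting the base from an address seen in a running process recovers the offset, which is the same check the next step performs on the live process. This assumes the library's first loadable segment starts at virtual address 0, as is typical for shared objects. The base address in the test is made up, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BN_RAND_LIB_OFF 0xd5a50ULL  /* BN_rand's offset within libcrypto.so, found above */

/* Run-time address of a symbol, given where its library's first mapping starts. */
static uint64_t runtime_addr(uint64_t lib_base, uint64_t sym_off) {
    return lib_base + sym_off;
}

/* Inverse: recover a symbol's offset from an address observed in a live process. */
static uint64_t lib_offset(uint64_t runtime, uint64_t lib_base) {
    assert(runtime >= lib_base);
    return runtime - lib_base;
}
```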

Before we proceed, let's make sure this checks out on the running program.

Remember, the PID of our Apache2 process is 4560.
Same image as before. Where does BN_rand start?
Hey, that's the same offset that we got before!

Great. We now know the offset from the start of libcrypto.so at which we will find the BN_rand function, bypassing the entire symbol lookup.

To review, so far we've identified:

  1. we need to find the start of libcrypto.so in memory, dynamically
  2. 0xd5a50 bytes from there, we find the start of BN_rand
  3. 13 bytes into BN_rand, we find the address of bnrand
  4. 572 bytes into bnrand, we find the call to BN_bin2bn where we want to break
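The four steps can be chained roughly as in the sketch below. To keep it self-contained, step 3's memory read sits behind a hypothetical resolver callback. In the real tool, that read would be a guest-memory read through LibVMI. A fixed-table resolver is included so the chain can be exercised without a VM. The base address is made up. The 0x7f99441e57a0 and 0x7f99441e59dc values are the ones from the GDB session above.

```c
#include <assert.h>
#include <stdint.h>

#define BN_RAND_LIB_OFF 0xd5a50ULL  /* step 2: BN_rand within libcrypto.so */
#define BN_RAND_JMP_OFF 13ULL       /* step 3: where bnrand's address is found */
#define BNRAND_BP_OFF   572ULL      /* step 4: call to BN_bin2bn within bnrand */

/* Hypothetical hook for step 3: given a location, produce the bnrand address it
 * leads to. In the real tool this is a read of guest memory through LibVMI. */
typedef int (*resolve_fn)(void *ctx, uint64_t where, uint64_t *target);

/* Steps 2-4, starting from the libcrypto.so base found in step 1. */
static int find_bnrand_breakpoint(uint64_t libcrypto_base, resolve_fn resolve,
                                  void *ctx, uint64_t *bp) {
    uint64_t bn_rand, bnrand;
    if (libcrypto_base == 0)
        return -1;                                   /* step 1 found nothing */
    bn_rand = libcrypto_base + BN_RAND_LIB_OFF;      /* step 2 */
    if (resolve(ctx, bn_rand + BN_RAND_JMP_OFF, &bnrand) != 0)
        return -1;                                   /* step 3 */
    *bp = bnrand + BNRAND_BP_OFF;                    /* step 4 */
    return 0;
}

/* Test stand-in for `resolve`: knows exactly one location -> target pair. */
struct fixed_jmp { uint64_t where, target; };

static int fixed_resolve(void *ctx, uint64_t where, uint64_t *target) {
    const struct fixed_jmp *fj = ctx;
    if (where != fj->where)
        return -1;
    *target = fj->target;
    return 0;
}
```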

Guess it's time to see if we can find where the start of libcrypto.so is in the process's virtual memory.

Same as before, just saving you from having to scroll.

Just one more problem. Ya-know how all this code we're introspecting is running on the VM? Well, what if the VM kernel data structures are compiled differently than the kernel data structures on my host machine? Dang it. Now we need the offsets of those structures directly from the VM. I modified the linux-offset-finder tool that LibVMI provides to find all the offsets for those data structures.

Also, did'ja know that the default seven parameters you put in /etc/libvmi.conf when setting up LibVMI for the first time are the only acceptable parameters? No dynamic parameter support at all. So, after grabbing all the offsets from the VM, I just included them statically in the code and moved on.
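For the curious, the hardcoded approach looks something like this sketch. As I understand it, libvmi.conf's seven entries are ostype, sysmap, and five offsets (linux_name, linux_tasks, linux_mm, linux_pid, linux_pgd). Walking a process's memory map needs more than that, such as the VMA list fields. The struct below names the extra fields I'd expect on a pre-6.1 kernel, where mm->mmap is a list linked through vm_next. That kernel-version choice, the field names, and the zero placeholders are all assumptions. The real values have to come from running the offset finder inside the VM. The field reader copies 8 bytes out of a raw copy of a guest object, assuming host and guest are both little-endian x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets beyond libvmi.conf's five, needed to walk a process's VMAs.
 * ASSUMPTION: pre-6.1 layout (mm->mmap list linked via vm_next). The zeros are
 * placeholders; real values come from running the offset finder inside the VM. */
struct vma_offsets {
    size_t mm_mmap;        /* mm_struct.mmap */
    size_t vma_vm_start;   /* vm_area_struct.vm_start */
    size_t vma_vm_end;     /* vm_area_struct.vm_end */
    size_t vma_vm_next;    /* vm_area_struct.vm_next */
    size_t vma_vm_file;    /* vm_area_struct.vm_file */
};

static const struct vma_offsets VM_OFFSETS = { 0, 0, 0, 0, 0 };  /* PLACEHOLDER */

/* Read an 8-byte field `off` bytes into a raw copy of a guest kernel object.
 * Assumes host and guest share byte order (both little-endian x86_64). */
static uint64_t field_u64(const uint8_t *obj, size_t obj_len, size_t off) {
    uint64_t v;
    assert(off <= obj_len && obj_len - off >= sizeof v);
    memcpy(&v, obj + off, sizeof v);
    return v;
}
```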

Feels dirty writing code like this. Looked into making LibVMI dynamic, but coming up on a deadline, skipped it. I hope to come back and update LibVMI with dynamic configuration support and push it upstream.

So, I updated the code (using the process-list example as a guide) to find the Apache2 process (remember, we don't care which one, they're all the same) and then wrote a function, walk_vmmap_for_lib, to walk the kernel data structures and return the first memory address where libcrypto.so is loaded. (This could be extended to return the Nth memory address where it's loaded, but wasn't necessary in this case.)
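At its core, walk_vmmap_for_lib is a linked-list walk. The sketch below models it with ordinary C structs standing in for the kernel's vm_area_struct chain. On pre-6.1 kernels, mm->mmap is a list sorted by address and linked through vm_next. The real function instead reads vm_start, vm_next, and the file name out of guest memory at the offsets discussed above. The struct and function names are hypothetical, and the addresses in the test are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, in-process stand-in for one entry of the kernel's VMA list. */
struct fake_vma {
    uint64_t vm_start, vm_end;
    const char *file_name;          /* name of the mapped file, or NULL if anonymous */
    const struct fake_vma *vm_next;
};

/* Return the start of the first mapping whose file name contains `lib`, or 0 if none.
 * Because the list is address-ordered, this is the lowest such mapping. */
static uint64_t walk_vmmap_for_lib_model(const struct fake_vma *mmap, const char *lib) {
    for (const struct fake_vma *v = mmap; v != NULL; v = v->vm_next)
        if (v->file_name && strstr(v->file_name, lib))
            return v->vm_start;
    return 0;
}
```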

Following the chain of dereferences and offsets described above, we can find the proper place to set our breakpoint, and overwrite OpenSSL's RNG'd bytes with whatever we want.

Just a reminder of argument order.
Callback code for overwriting random bytes when our breakpoint trips.

In this case, what we want is a bunch of 0x66 bytes, so it's easy to detect. This could just as easily be a pseudorandom stream of bytes generated with my public key, only predictable with my private key (although asymmetric crypto may impose a noticeable performance hit; more research on that later).

Overwriting Apache2's Diffie-Hellman 256-byte private key.

Now, time to see if we can detect it!

Let's open up Wireshark and capture a Server Key Exchange.

Wireshark, inspecting publicly-visible Diffie-Hellman key exchange parameters.

Take a moment to refresh your memory of the Diffie-Hellman protocol. The server picks two public parameters, p and g. Each side of the communication generates a random private key, priv, and sends the other side its corresponding public key, pub, computed as g^priv % p. Finally, each side raises the other's pub to its own priv (mod p), so both arrive at the same shared secret, g^(priv_A * priv_B) % p, from which the keys that encrypt the rest of the TLS session are derived.
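Here is a toy run of the exchange with textbook-sized numbers (p = 23, g = 5), small enough to check by hand. Both sides end up with the same shared secret. Real TLS uses 2048-bit parameters and a bignum library. This sketch's modpow only supports moduli below 2^32, so that every intermediate product fits in 64 bits. The function names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* Square-and-multiply: base^exp mod m. Toy sizes only (m < 2^32 keeps products in 64 bits). */
static uint64_t modpow(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1;
    assert(m > 1 && m <= UINT32_MAX);
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

/* pub = g^priv mod p: what each side sends in the clear. */
static uint64_t dh_public(uint64_t g, uint64_t priv, uint64_t p) {
    return modpow(g, priv, p);
}

/* shared = other_pub^my_priv mod p = g^(priv_A * priv_B) mod p. */
static uint64_t dh_shared(uint64_t other_pub, uint64_t my_priv, uint64_t p) {
    return modpow(other_pub, my_priv, p);
}
```

With priv_A = 4 and priv_B = 3, the public values are 4 and 10, and both sides compute a shared secret of 18. An eavesdropper who can predict priv_A only needs pub_B from the wire: dh_shared(pub_B, priv_A, p) yields the same 18.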

If a third party can predict the private key of either side, then using the parameters passed in the clear during this key exchange, they can compute the session key and do all sorts of bad things to the connection.

(g^priv % p) == pub

If our attack worked, the random number OpenSSL generated for priv should have been 256 bytes of 0x66.
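Spelled out, the check works like this, with A as the Apache2 server and B as the client. BN_bin2bn reads its buffer as a big-endian integer. With every byte equal to 0x66, though, byte order doesn't matter, so the predicted private key is a fixed number. An eavesdropper can then also compute the shared secret from the client's public value:

```latex
\begin{aligned}
\mathit{priv}_A &= \sum_{i=0}^{255} \mathtt{0x66}\cdot 256^{i}
                 = \mathtt{0x66}\cdot\frac{256^{256}-1}{255}
                 = \mathtt{0x66}\cdot\frac{2^{2048}-1}{255} \\
g^{\mathit{priv}_A} \bmod p &\overset{?}{=} \mathit{pub}_A \\
\mathit{pub}_B^{\,\mathit{priv}_A} \bmod p &= \left(g^{\mathit{priv}_B}\right)^{\mathit{priv}_A} \bmod p
                 = g^{\mathit{priv}_A \cdot \mathit{priv}_B} \bmod p
\end{aligned}
```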

Since raising g to this predicted priv, modulo p, does in fact produce pub, we have successfully subverted the OpenSSL RNG used in Apache2!