Recently, I've been working with LibVMI for a course project where we were able to subvert the Linux kernel and OpenSSL RNGs in a running VM. The general overview can be found in our class paper on the projects page.
Specifically, extending the attack from the kernel to user space was tricky and there really wasn't much documentation on the subject available online, so I wanted to document the process we went through. Maybe it will be helpful to someone someday.
If you are interested in playing with the code yourself, you can get it here: GitHub
As brief background on the project, we started with the kernel RNG attack (rng-hook_kernel_fixed.c), which required creating a few helper constructs, such as a "breakpoint" struct (and helper functions).
To find a good breakpoint location, we reversed random.c, the kernel's random number generator, and found that random bytes were extracted from the entropy pool in the extract_entropy_user function. We can then look up the kernel memory address for the extract_entropy_user function and jump a fixed known offset from there to the instruction on which to break (line 1228 in random.c). Using GDB on random.o worked great for this.
Looks like the call to extract_buf is 155 bytes into extract_entropy_user, and the instruction immediately after that is at a 160-byte offset.
After that, we simply inspect the local variables ...
... and overwrite the tmp buffer containing random bytes with whatever we want.
Relatively simple, at least once all the code for setting breakpoints was in place.
However, extending this attack to Apache2 and OpenSSL was a bit of a nightmare, and it is the real story of this post. Quite apart from the fact that OpenSSL is somewhat of a mess under the hood and therefore difficult to reverse engineer, simply moving from kernel space introspection to user space introspection with LibVMI is difficult in its own right.
To start with, we reverse engineered the Apache2 and OpenSSL source code to pinpoint a good code location for introspection, similar to the location in extract_entropy_user in the kernel attack above. We found it at the end of the bnrand function in OpenSSL. This is where new random BIGNUM structs are generated and then returned for use in a variety of crypto applications.
So, let's start finding the proper offsets into the bnrand function for our breakpoints.
Problem #1: Our function doesn't exist.
Well, bnrand is called from BN_rand, so let's look at that.
An unconditional jump. Goodie. Let's see what that's pointing at.
All those pushes look like the start of a function saving the caller's register values. Let's scroll down...
Found it! So, the compiler optimized away the function call to bnrand (sorta). Well, that's going to make finding this memory offset dynamically from LibVMI a real pain. We'll need to start at the symbol for BN_rand, then read the address at BN_rand+13 bytes (the address stored in the jmp instruction), then jump down 572 bytes (0x7f99441e59dc - 0x7f99441e57a0) from there to find our breakpoint location.
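To make that chain concrete, here is a minimal sketch in Python (not the C/LibVMI code from the project). It assumes the jump inside BN_rand is an x86-64 near jump (opcode 0xE9) whose 4-byte displacement sits at BN_rand+13, so the opcode byte is at BN_rand+12. That encoding is an assumption, and the function names are made up for illustration. A relative jump stores a signed displacement from the end of the instruction rather than an absolute address, so the target has to be computed from it.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86-64 near jump with a signed 32-bit displacement


def resolve_jmp_rel32(code: bytes, code_addr: int, opcode_offset: int) -> int:
    """Return the absolute target of the `jmp rel32` found at code[opcode_offset].

    `code` holds the bytes read from the guest starting at `code_addr`.
    """
    if code[opcode_offset] != JMP_REL32_OPCODE:
        raise ValueError("expected an E9 jmp rel32 at this offset")
    # The displacement is little-endian, signed, and follows the opcode byte.
    (disp,) = struct.unpack_from("<i", code, opcode_offset + 1)
    # It is relative to the next instruction: opcode byte + 4 displacement bytes.
    next_insn = code_addr + opcode_offset + 5
    return (next_insn + disp) & 0xFFFFFFFFFFFFFFFF


def bnrand_breakpoint(code: bytes, bn_rand_addr: int,
                      opcode_offset: int = 12, bp_offset: int = 572) -> int:
    """Follow the jump out of BN_rand, then step 572 bytes into its target."""
    return resolve_jmp_rel32(code, bn_rand_addr, opcode_offset) + bp_offset
```

In the real tool, `code` would be the first couple of dozen bytes read from the guest at the BN_rand address. Both offsets (13 and 572) come from this particular build of libcrypto and would need re-deriving for any other build.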
Okay, so we'll just use LibVMI's handy user space symbol lookup function to find BN_rand, do the above steps, and then everything will work like the kernel approach above...
Problem #2: Our function doesn't work.
So, as it turns out, because of the complexity that will shortly become apparent, LibVMI doesn't currently support looking up user space process symbols. Bummer! Well, what would it entail to do it manually? Is there something in the kernel's process list data structures that will let us go retrieve this information ourselves?
When the process runs, it loads (mmaps) all the dynamic libraries (Linux shared objects) it needs into its virtual memory. These libraries are loaded at unpredictable offsets. Also, when these libraries are linked in at run time, all the symbol references to functions in those libraries must be looked up as essentially a fixed offset from the start of the library. Thus, it makes sense that the symbol we're looking for (BN_rand) isn't going to be in some global process symbol table, but rather in a symbol table for the particular library.
So, I can't say I feel really comfortable with this yet, but from reading up about how symbols are stored in ELF files it looks like symbols can be categorized into dynsym symbols and symtab symbols. Dynsym symbols are the global symbols required to run the program and get loaded into memory when the binary runs, whereas symtab symbols appear to be debugging symbols, useful for getting GDB to print out function names, but unnecessary for runtime linking.
This distinction between dynsym and symtab symbols looks promising because BN_rand is probably a dynsym symbol and therefore loaded into memory where we could dynamically grab it with LibVMI. This looks like something we can use to make the vmi_translate_sym2v function work! Let's come back to this later (as in, in a later post).
However, at this point, I made another observation: the library file containing the BN_rand function doesn't change between runs, so measured from the start of the region where the library gets mapped into memory, the BN_rand function will always start at a fixed offset. Let's find that offset.
Great, so we've found the library Apache2 uses. This library will be loaded into memory into one of those purple locations in the Process Virtual Memory diagram above.
Now let's find the offset.
Excellent. Since shared libraries use position-independent code, the BN_rand function will be located 0xd5a50 bytes from the start of wherever this library gets mapped into memory.
Before we proceed, let's make sure this checks out on the running program.
Great. We now know the offset from the start of libcrypto.so at which we will find the BN_rand function, bypassing the entire symbol lookup.
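The arithmetic here is just "library base + 0xd5a50". As a rough sketch of one way to sanity-check it from inside the guest, the Python below parses text in the /proc/<pid>/maps format, takes the lowest mapping backed by libcrypto.so as the base, and adds the offset. The helper names are hypothetical, and this is not how the LibVMI tool finds the base (that has to happen from outside the VM).

```python
BN_RAND_OFFSET = 0xD5A50  # offset of BN_rand within libcrypto.so, as found above


def library_base(maps_text: str, lib_name: str) -> int:
    """Return the lowest start address of any mapping backed by `lib_name`.

    `maps_text` is the content of /proc/<pid>/maps, whose lines look like:
    start-end perms offset dev inode pathname
    """
    starts = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # Anonymous mappings have no pathname field, so skip them.
        if len(fields) < 6 or lib_name not in fields[5]:
            continue
        start_hex, _end_hex = fields[0].split("-")
        starts.append(int(start_hex, 16))
    if not starts:
        raise LookupError(f"{lib_name} is not mapped in this process")
    return min(starts)


def bn_rand_address(maps_text: str, lib_name: str = "libcrypto.so") -> int:
    """Runtime address of BN_rand: library base plus the fixed file offset."""
    return library_base(maps_text, lib_name) + BN_RAND_OFFSET
```

Calling `bn_rand_address(open("/proc/<pid>/maps").read())` for an Apache2 worker should give the same address a debugger reports for BN_rand, as long as the library on disk is the same build the offset was taken from.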
So, to review, so far we've identified:
Guess it's time to see if we can find where the start of libcrypto.so is in the process's virtual memory.
Just one more problem. Ya-know how all this code we're introspecting is running on the VM? Well, what if the VM kernel data structures are compiled differently than the kernel data structures on my host machine? Dang it. Now we need the offsets of those structures directly from the VM. I modified the linux-offset-finder tool that LibVMI provides to find all the offsets for those data structures.
Also, did'ja know that the default seven parameters you put in /etc/libvmi.conf when setting up LibVMI for the first time are the only acceptable parameters? No dynamic parameter support at all. So, after grabbing all the offsets from the VM, I just included them statically in the code and moved on.
So, I updated the code (using the process-list example as a guide) to find the Apache2 process (remember, we don't care which one, they're all the same) and then wrote a function, walk_vmmap_for_lib, to walk the kernel data structures and return the first memory address where libcrypto.so is loaded. (This could be extended to return the Nth memory address where it's loaded, but wasn't necessary in this case.)
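For readers who want the shape of that walk without the LibVMI plumbing, here is a simplified Python model. It is not the project's walk_vmmap_for_lib. It assumes a kernel that still keeps a process's memory regions in a linked list (mm_struct.mmap, then vm_area_struct.vm_next) and that the backing file name is reached via vm_file, f_path.dentry, and d_name.name. Every offset in `VmaOffsets` is a placeholder that must be replaced with the values read from the guest, and `read_addr` / `read_str` stand in for guest-memory reads (in C these would be calls along the lines of vmi_read_addr_va and vmi_read_str_va).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class VmaOffsets:
    """Field offsets inside guest kernel structs. PLACEHOLDER values only."""
    mm_mmap: int = 0x0       # mm_struct.mmap: head of the VMA list
    vm_start: int = 0x0      # vm_area_struct.vm_start
    vm_next: int = 0x10      # vm_area_struct.vm_next
    vm_file: int = 0x98      # vm_area_struct.vm_file (NULL for anonymous maps)
    file_dentry: int = 0x18  # file.f_path.dentry
    dentry_name: int = 0x28  # dentry.d_name.name: pointer to the file name


def find_lib_base(read_addr: Callable[[int], int],
                  read_str: Callable[[int], str],
                  mm_addr: int,
                  lib_name: str,
                  off: VmaOffsets = VmaOffsets(),
                  max_vmas: int = 65536) -> Optional[int]:
    """Return vm_start of the first VMA whose backing file starts with lib_name."""
    vma = read_addr(mm_addr + off.mm_mmap)
    for _ in range(max_vmas):  # bound the walk in case the list is corrupt
        if vma == 0:
            break
        file_ptr = read_addr(vma + off.vm_file)
        if file_ptr:
            dentry = read_addr(file_ptr + off.file_dentry)
            name = read_str(read_addr(dentry + off.dentry_name))
            if name.startswith(lib_name):
                return read_addr(vma + off.vm_start)
        vma = read_addr(vma + off.vm_next)
    return None
```

Passing the readers in as plain callables keeps the walk testable against a fake memory map, which is handy when the offsets from the guest are still in doubt.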
Following the chain of dereferences and offsets described above, we can find the proper place to set our breakpoint, and overwrite OpenSSL's RNG'd bytes with whatever we want.
In this case, what we want is a bunch of 0x66 bytes so it's easy to detect. This could just as easily be a pseudorandom stream of bytes generated with my public key, only predictable with my private key (although asymmetric crypto may impose a noticeable performance hit; more research on that later).
Now, time to see if we can detect it!
Let's open up Wireshark and capture a Server Key Exchange.
Take a moment to refresh your memory of the Diffie-Hellman protocol. The server picks two public parameters, p and g. Each side of the communication generates a random private key, priv, and passes the other side its corresponding public key, pub, computed as g^priv % p. Finally, the session key that encrypts the rest of the TLS session is found on both sides by taking g^(priv_A * priv_B) % p.
If a third party can predict the private key of either side, then using parameters passed in the clear during this key exchange, they can compute the session key and do all sorts of bad things to the connection.
If our attack worked, the random number OpenSSL generated for priv should have been 256 bytes of 0x66.
Since raising g to this predicted priv, modulo p, does in fact yield pub, we have successfully subverted the OpenSSL RNG used in Apache2!
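The check itself is a single modular exponentiation. Here is a small Python sketch of it, with hypothetical function names. The p, g, and pub values would be the ones read out of the captured Server Key Exchange; none of the captured values are reproduced here. Because every byte of the predicted key is 0x66, byte order doesn't matter when turning it into an integer.

```python
def predicted_private_key(n_bytes: int = 256, fill: int = 0x66) -> int:
    """The private key we forced OpenSSL to use: n_bytes copies of `fill`."""
    return int.from_bytes(bytes([fill]) * n_bytes, "big")


def dh_public(g: int, priv: int, p: int) -> int:
    """Diffie-Hellman public value: g^priv mod p."""
    return pow(g, priv, p)


def attack_succeeded(g: int, p: int, observed_pub: int, priv=None) -> bool:
    """True if the public key seen on the wire matches the predicted private key."""
    if priv is None:
        priv = predicted_private_key()
    return dh_public(g, priv, p) == observed_pub


def session_key(peer_pub: int, priv: int, p: int) -> int:
    """Shared secret: peer_pub^priv mod p, i.e. g^(priv_A * priv_B) mod p."""
    return pow(peer_pub, priv, p)
```

Once `attack_succeeded` returns True for the server's pub, anyone holding the predicted key can call `session_key(client_pub, predicted_private_key(), p)` with the client's public value from the same capture and arrive at the same secret both endpoints derive.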