Dan Goodin of The Register recently published an article mentioning a forthcoming
presentation by Vincenzo Iozzo, which describes a method to load a binary at runtime,
directly from memory, on Mac OS X systems.
Here we like to stick to the technical side of things, so let's get started
explaining how this can be done, in case you aren't planning to attend Black Hat or
are just particularly curious about the topic!
The Mach-O Dynamic Loader: Dyld and runtime binary loading
When you execute a program, the operating system processes its main binary
("the executable") and resolves its dependencies before execution begins. Modern
operating systems allow programs to depend on other software dynamically: instead
of compiling every feature statically (that is, built into the main executable),
you can declare such dependencies and have them resolved at load time. When the
executable is loaded, a piece of software takes care of finding those dependencies,
placing them in memory, and updating the locations where our program will find the
functions it needs, et cetera. This saves space and produces less bulky binaries,
and it also eases updates, since a library can be upgraded independently as long
as it retains backwards compatibility.
The good fellows at Apple designed an even more efficient procedure for loading
common libraries: some of them stay in memory after the system boots, providing
faster loading times and better execution speed while lowering the stress on the disk
caused by repeatedly loading libraries whenever a new process starts. The place
where such common libraries are loaded is the shared region. It has also been used
to produce 100% reliable local privilege escalation exploits.
In Mac OS X, the dynamic linker is known as dyld. Leopard implements
a rudimentary form of ASLR. It is effective enough to deter the simplest threats but
ineffective against other classes of issues (heap overflows, memory leaks and so forth).
dyld happens to be loaded at a static address in every Leopard installation,
regardless of language, edition (Server included) or platform.
Across a couple thousand different Intel-based 32-bit Leopard installations, dyld
will live at 0x8fe00000 in every single one of them.
0x8fe00000 marks the spot
Apple provides an API that lets you load binaries from memory, at runtime, without
any hack whatsoever. That means an official procedure exists for this purpose: no
need for fancy hacks, complex position-independent Mach-O loaders in shellcode, or
similar trickery.
We can observe that dyld is loaded at the exact same location in every process.
For an up-to-date Intel installation of Leopard:
(edited output)
$ python examples/dump_self.py
Dumping maps for `Python` (pid=37578):
0x8fe00000-0x8fe2e000      184K [ rx/rwx] SM=01 /usr/lib/dyld
0x8fe2e000-0x8fe30000        8K [ rw/rwx] SM=01 /usr/lib/dyld
0x8fe30000-0x8fe67000      220K [ rw/rwx] SM=02
0x8fe67000-0x8fe75000       56K [ r/rwx] SM=01 /usr/lib/dyld
0x90000000-0x970e3000   115596K [ rx/ rx] SM=01
Done.
Loading binaries and bundles from memory
Apple provides the following API to perform binary and bundle loading operations:
- int _dyld_func_lookup(const char* dyld_func_name, void** address);
- void _dyld_lookup_and_bind(const char* symbol_name, void** address, void* module);
- NSObjectFileImageReturnCode NSCreateObjectFileImageFromMemory(const void* address, size_t size, NSObjectFileImage* objectFileImage);
The purpose of _dyld_func_lookup is to provide a reliable way to
resolve the addresses of internal dyld functions (those prefixed by _dyld_). This
is the first step towards resolving addresses in dynamically loaded libraries.
dyld itself also provides functions for memory allocation, string manipulation and
other functionality without requiring further dependencies. This eases shellcode
development, since we only need to know where to find _dyld_func_lookup. Because
dyld lives at a static location, that's not a problem.
_dyld_lookup_and_bind is the equivalent of the dlsym function, with one slight
difference: it doesn't require a handle to a library instance, and the module
parameter can be set to NULL. It resolves the address of the specified symbol and
stores it into the given pointer-to-a-pointer.
And finally there's NSCreateObjectFileImageFromMemory, whose name is self-explanatory.
It loads a Mach-O object, which must be a bundle, from a memory buffer allocated
through Mach (for example, with vm_allocate). It returns a ready-to-use
NSObjectFileImage object. Other functions such as NSAddImage exist for a similar
purpose, but they take a path to an on-disk file and therefore aren't suitable in
this scenario.
/*
 * Copyright (C) 2009 Subreption LLC. All rights reserved.
 */
#include <stdio.h>
#include <stdlib.h>

/* dyld entry points, declared by hand (Mac OS X only). */
extern int _dyld_func_lookup(
    const char* dyld_func_name,
    void** address);
extern void _dyld_lookup_and_bind(
    const char* symbol_name,
    void ** address,
    void* module);

int main(int argc, char **argv)
{
#pragma unused(argc)
#pragma unused(argv)
    int found = 0;
    unsigned char *buf = NULL;
    /* NSCreateObjectFileImageFromMemory(const void *, size_t, NSObjectFileImage *) */
    int (*seitnap) (const void *, size_t, void **) = NULL;
    void *(*xmalloc)(size_t) = NULL;
    void (*xfree)(void *) = NULL;
    void (*funcaddr) (const char *, void **, void *) = NULL;
    void *(*xmemset) (void *, int, size_t) = NULL;  /* memset(dst, value, len) */
    int (*xprintf) (const char *fmt, ...) = NULL;
    char *(*xstrcpy) (char *, const char *) = NULL;
    const char *funcstr = "__dyld_lookup_and_bind";

    /* Ask dyld for its own _dyld_lookup_and_bind; nonzero means found. */
    found = _dyld_func_lookup(funcstr, (void **) &funcaddr);
    if (!found) {
        printf("Failed.\n");
        exit(EXIT_FAILURE);
    }

    /* Resolve libc functions by symbol name (note the leading underscore). */
    funcaddr("_printf", (void **) &xprintf, NULL);
    funcaddr("_malloc", (void **) &xmalloc, NULL);
    funcaddr("_free", (void **) &xfree, NULL);
    funcaddr("_memset", (void **) &xmemset, NULL);
    funcaddr("_strcpy", (void **) &xstrcpy, NULL);
    if (!xprintf || !xmalloc || !xfree || !xmemset || !xstrcpy) {
        printf("Symbol resolution failed.\n");
        exit(EXIT_FAILURE);
    }

    xprintf("%s at %p\n", funcstr, funcaddr);
    xprintf("Resolved malloc at %p, free at %p.\n", xmalloc, xfree);
    xprintf("Resolved memset at %p, strcpy at %p.\n", xmemset, xstrcpy);

    /* Exercise the resolved pointers on a small heap buffer. */
    buf = xmalloc(64);
    if (buf == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    xmemset(buf, 0, 64);
    xstrcpy((char *) buf, "Hello from heap memory!");
    xprintf("Allocated some memory at %p! (%s)\n", buf, (char *) buf);
    xfree(buf);

    /* The bundle loading entry point resolves the same way. */
    funcaddr("_NSCreateObjectFileImageFromMemory", (void **) &seitnap, NULL);
    xprintf("Resolved NSCreateObjectFileImageFromMemory at %p.\n", seitnap);
    return 0;
}
/*
$ ./a.out
__dyld_lookup_and_bind at 0x8fe0a0e0
Resolved malloc at 0x96aebf75, free at 0x96af1263.
Resolved memset at 0x96aeb318, strcpy at 0x96b15790.
Allocated some memory at 0x100160! (Hello from heap memory!)
Resolved NSCreateObjectFileImageFromMemory at 0x96bf3ed4.
*/
Note where the _dyld_lookup_and_bind function lives. Its address, resolved via
_dyld_func_lookup, is 0x8fe0a0e0, which falls inside dyld's fixed mapping at
0x8fe00000. Resolving malloc and free is superfluous here; it's just an example.
dyld provides its own wrappers for malloc, free and a handful of other functions,
which can be resolved directly through _dyld_func_lookup. Using _dyld_lookup_and_bind
we can resolve virtually any function loaded within the current process address
space, and then proceed with the bundle loading API. This procedure is
straightforward and extremely reliable for shellcode. No magic is required: just
stick to the virtually unlimited possibilities offered by Apple's API and by every
loaded dynamic/shared library. OpenSSL and many other libraries are at hand, which
could help create smaller, symmetrically encrypted shellcode stages or complex
network communication functionality. The only boundary is your creativity and
technical skill set.
Leveraging dyld to load and execute your own binary
In order to load a standalone binary and execute it from memory, your shellcode
loader should use dyld facilities to resolve the binary's dependencies before
executing its real code. The Mach-O ABI is particularly attractive here because it
relies on offsets instead of absolute addressing, so relocating the binary's text is
not a big deal. This is trickier than loading a dynamic library or bundle, but
completely doable. In addition, forking the process and replacing its in-memory image
works in the same fashion as execve. Inheriting file descriptors and other
implementation details could be ignored altogether to keep the loader footprint
minimal. Certain Mach APIs might help with this.
Either way, developing a standalone binary loader seems overkill considering the
solid official API available for loading code dynamically. Using
constructors/destructors you can initialize whatever your payload needs and hook
or redirect APIs to trigger your functionality. You won't have to worry about
detecting process termination, since your library destructor will handle it. Far
simpler, more reliable and more flexible.
Why runtime binary loading might not deter forensics on Mac OS X
In order to deter forensics when loading binaries from memory, you must be able to
control or influence how that memory will be used afterwards. You must also be
aware of any existing caching mechanisms and sleep/hibernation issues.
Mac OS X stores a full memory image on disk when it goes into sleep mode. This
happens by default on all recent Apple laptops (first-generation MacBooks included)
when certain events occur, such as closing the lid, inactivity or screensaver
activation. Once your code has been saved to that image, it's already game over.
The image also holds the key for the AES-128/256 encrypted swap, so "secure VM"
won't do any good during forensics.
If your code somehow manages to stay in memory afterwards (either because you
forgot to wipe it or because it got loaded into the shared region), it's also game over.
Therefore, the impact of this technique against forensics could be negligible,
depending on the setup and the specifics of the case. Claiming that it deters
forensics outright, or that it makes attacks much stealthier, is exaggerating
things quite a bit. It gives an attack a lower profile, but there is still plenty
of room for improvement.
Hope you enjoyed reading, and thanks for your time! Thanks to Dan Goodin and Jared DeMott for proofreading and comments before publication.