Runtime binary loading via the dynamic loader on Apple Mac OS X

The Register recently published an article by Dan Goodin that mentions a forthcoming presentation by Vincenzo Iozzo. The presentation describes a method to load a binary at runtime, directly from memory, on Mac OS X systems.

Here we like to stick to the technical side of things... so let's get started on explaining how this can be done, in case you aren't planning to attend Black Hat or just feel particularly curious about the topic!

The Mach-O Dynamic Loader: dyld and runtime binary loading

When you execute a program, the operating system processes its main binary ("the executable") and resolves its dependencies before execution begins. Modern operating systems allow programs to depend on other software dynamically: instead of compiling every feature statically (that is, building it into the main executable), a program declares its dependencies and has them resolved at load time. When the executable is loaded, a piece of software takes care of finding those dependencies, placing them in memory, and updating the locations where our program will find the necessary functions, et cetera. This saves space and produces less bulky binaries, and it eases updates, since a library can be upgraded while retaining backwards compatibility.

The good fellows at Apple designed an even more efficient procedure for loading common libraries: some of them stay in memory after the system boots, providing faster loading times and better execution speed, while lowering the stress on the disk caused by repeatedly loading libraries whenever a new process is executed. The place where such common libraries are loaded is the shared region. It has been used to produce 100% reliable local privilege escalation exploits, too.

In Mac OS X, the dynamic linker is known as dyld. Leopard implements a rudimentary form of ASLR, consistent enough to deter the simplest threats but ineffective against some other issues (heap overflows, memory leaks and so forth). dyld happens to be loaded at a static address in every Leopard installation, independently of language, distribution (that includes Server) or platform.

Given a couple thousand different Intel-based 32-bit Leopard installations, dyld will live at 0x8fe00000, for all of them.

0x8fe00000 marks the spot

Apple provides an API that lets you load binaries from memory, at runtime, without any hack whatsoever. That means an official procedure exists for this purpose. No need for fancy hacks, complex Mach-O position-independent loaders in shellcode, or similar trickery.

We can observe that dyld is loaded at the exact same location for all processes. On an up-to-date Intel installation of Leopard:

(edited output)
$ python examples/dump_self.py
Dumping maps for `Python` pid=37578):
0x8fe00000-0x8fe2e000     184K [ rx/rwx] SM=01 /usr/lib/dyld
0x8fe2e000-0x8fe30000       8K [ rw/rwx] SM=01 /usr/lib/dyld
0x8fe30000-0x8fe67000     220K [ rw/rwx] SM=02
0x8fe67000-0x8fe75000      56K [  r/rwx] SM=01 /usr/lib/dyld
0x90000000-0x970e3000  115596K [ rx/ rx] SM=01
Done.
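
As a quick sanity check on the dump above, here is a minimal sketch that parses one of those lines and confirms that the size column agrees with the address range (0x8fe2e000 - 0x8fe00000 = 0x2e000 bytes, that is, 184K). The struct and function names are ours and purely illustrative, and the line layout is assumed to be exactly the one printed above.

```c
#include <assert.h>
#include <stdio.h>

/* One region from the dump: [start, end) plus the size column in KiB. */
struct map_region {
    unsigned long start;
    unsigned long end;
    unsigned long size_kib;
};

/*
 * Parse a line such as
 *   "0x8fe00000-0x8fe2e000     184K [ rx/rwx] SM=01 /usr/lib/dyld"
 * Returns 1 on success, 0 if the line does not follow that layout.
 */
static int parse_map_line(const char *line, struct map_region *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%lx-%lx %luK",
               &out->start, &out->end, &out->size_kib) != 3)
        return 0;
    return out->end > out->start;
}

/* True when the size column agrees with end - start. */
static int region_size_matches(const struct map_region *r)
{
    return (r->end - r->start) / 1024 == r->size_kib;
}
```

Feeding it the first line of the dump yields start = 0x8fe00000, which is the address we care about for the rest of this entry.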

Loading binaries and bundles from memory

Apple provides the following API to perform binary and bundle loading operations:

  • _dyld_func_lookup(const char* dyld_func_name, void** address);
  • _dyld_lookup_and_bind(const char* symbol_name, void ** address, void* module);
  • NSCreateObjectFileImageFromMemory

The purpose of _dyld_func_lookup is to provide a reliable way to resolve the addresses of internal dyld functions (those prefixed by _dyld_). This is the first step towards resolving addresses in dynamically loaded libraries, although dyld itself already provides functions for memory allocation, string manipulation and other functionality, without requiring further dependencies. This eases the work of developing shellcode, since we only need to know where to look for _dyld_func_lookup, and because dyld lives at a static location, that's not a problem.

_dyld_lookup_and_bind is an equivalent of the dlsym function from the standard library, with a slight difference: it doesn't require a handle to a library instance. The module parameter can be set to NULL. It resolves the address of the specified symbol and stores it in the given pointer-to-a-pointer.
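
For comparison, this is roughly what the dlsym side of that equivalence looks like. It is a minimal sketch, not part of the original example: resolve_with_dlsym is a name we made up, and it uses only the POSIX dlopen/dlsym/dlclose calls. Note the two differences: dlsym needs a handle (here obtained with dlopen(NULL, ...), which refers to the main program and its loaded dependencies), and it takes the plain C name ("strlen"), without the leading underscore that _dyld_lookup_and_bind expects.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Look up a C function by name with POSIX dlsym.
 * Returns the symbol address, or NULL if it cannot be resolved.
 */
static void *resolve_with_dlsym(const char *name)
{
    void *handle;
    void *addr;

    assert(name != NULL);

    /* A NULL path yields a handle for the main program itself. */
    handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return NULL;

    addr = dlsym(handle, name);
    dlclose(handle);
    return addr;
}
```

On some older Linux toolchains this needs -ldl at link time; on Mac OS X nothing extra is required.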

And finally there is NSCreateObjectFileImageFromMemory, with its self-explanatory name. Its purpose is to load a Mach-O object (required to be a bundle) from memory, stored in a Mach-allocated memory buffer, and to provide a ready-to-use NSObjectFileImage object. Other functions such as NSAddImage exist for the same purpose, but those take a path to an on-disk file and therefore aren't suitable in this scenario.
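
Since the object handed to NSCreateObjectFileImageFromMemory has to be a bundle, it can be handy to check the buffer first. The following is a minimal sketch under a few assumptions: the constants and the header layout mirror <mach-o/loader.h> (redefined here with an SK_ prefix of our own so the snippet builds anywhere), the buffer holds a thin Mach-O image in host byte order, and fat binaries or byte-swapped headers are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values mirrored from <mach-o/loader.h>; the SK_ names are ours. */
#define SK_MH_MAGIC    0xfeedfaceu  /* 32-bit Mach-O, host byte order */
#define SK_MH_MAGIC_64 0xfeedfacfu  /* 64-bit Mach-O, host byte order */
#define SK_MH_BUNDLE   0x8u         /* filetype of a loadable bundle */

/* Leading fields shared by the 32-bit and 64-bit Mach-O headers. */
struct sk_mach_header {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
};

/* Returns 1 if buf starts with a Mach-O header whose filetype is bundle. */
static int looks_like_bundle(const void *buf, size_t len)
{
    struct sk_mach_header hdr;

    assert(buf != NULL);
    if (len < sizeof hdr)
        return 0;

    /* Copy out the header to avoid unaligned reads from the buffer. */
    memcpy(&hdr, buf, sizeof hdr);

    if (hdr.magic != SK_MH_MAGIC && hdr.magic != SK_MH_MAGIC_64)
        return 0;
    return hdr.filetype == SK_MH_BUNDLE;
}
```

A main executable (filetype 0x2) or a plain dylib fails this check, which matches the bundle requirement described above.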

/*
 * Copyright (C) 2009 Subreption LLC. All rights reserved.
 */
#include <stdio.h>
#include <stdlib.h>

extern int _dyld_func_lookup(
   const char* dyld_func_name,
   void** address);

extern void _dyld_lookup_and_bind(
   const char* symbol_name,
   void ** address,
   void* module);

int main(int argc, char **argv)
{
#pragma unused(argc)
#pragma unused(argv)

        int err = 0;
        unsigned char *buf = NULL;
        void (*seitnap) (void *, size_t, void *) = NULL;
        void *(*xmalloc)(size_t) = NULL;
        void (*xfree)(void *) = NULL;
        void (*funcaddr) (const char *, void **, void *) = NULL;
        /* memset takes (dest, value, length), in that order */
        void *(*xmemset) (void *, int, size_t) = NULL;
        int (*xprintf) (const char *fmt, ...) = NULL;
        char *(*xstrcpy) (char *, const char *) = NULL;

        const char *funcstr = "__dyld_lookup_and_bind";

        /* _dyld_func_lookup returns non-zero on success, so zero means failure */
        err = _dyld_func_lookup(funcstr, (void *) &funcaddr);
        if (!err) {
                printf("Failed.\n");
                exit(EXIT_FAILURE);
        }

        funcaddr("_printf", (void *) &xprintf, NULL);
        funcaddr("_malloc", (void *) &xmalloc, NULL);
        funcaddr("_free", (void *) &xfree, NULL);
        funcaddr("_memset", (void *) &xmemset, NULL);
        funcaddr("_strcpy", (void *) &xstrcpy, NULL);

        xprintf("%s at %p\n", funcstr, funcaddr);
        xprintf("Resolved malloc at %p, free at %p.\n", xmalloc, xfree);
        xprintf("Resolved memset at %p, strcpy at %p.\n", xmemset, xstrcpy);

        buf = xmalloc(64);
        if (buf == NULL) {
                perror("malloc");
                exit(EXIT_FAILURE);
        }

        xmemset(buf, 0, 64);
        xstrcpy((char *) buf, "Hello from heap memory!");

        xprintf("Allocated some memory at %p! (%s)\n", buf, (char *) buf);

        xfree(buf);

        funcaddr("_NSCreateObjectFileImageFromMemory", (void *) &seitnap, NULL);
        xprintf("Resolved NSCreateObjectFileImageFromMemory at %p.\n", seitnap);

        return 0;
}

/*
$ ./a.out
__dyld_lookup_and_bind at 0x8fe0a0e0
Resolved malloc at 0x96aebf75, free at 0x96af1263.
Resolved memset at 0x96aeb318, strcpy at 0x96b15790.
Allocated some memory at 0x100160! (Hello from heap memory!)
Resolved NSCreateObjectFileImageFromMemory at 0x96bf3ed4.
*/

Please note the location of the _dyld_lookup_and_bind function, which is resolved via _dyld_func_lookup. Resolving the addresses of malloc and free is superfluous here and serves only as an example: dyld provides its own wrappers for malloc, free and a handful of other functions, which can be resolved through _dyld_func_lookup directly. Using _dyld_lookup_and_bind we can resolve virtually any function as long as it is loaded within the current process address space, and then proceed with the bundle loading API.

This is a straightforward procedure and extremely reliable for shellcode. No magic required, just stick to the possibilities offered by Apple's API and that of every loaded dynamic/shared library. You have OpenSSL and many other libraries which could help to create smaller symmetrically-encrypted shellcode stages, or complex network communication functionality. The only boundary is your creativity and technical skillset.
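
One detail that is easy to miss in the listing: every name passed to the lookup functions carries one extra leading underscore ("_printf" for printf, "__dyld_lookup_and_bind" for _dyld_lookup_and_bind). A minimal sketch of that convention follows; macho_symbol_name is a hypothetical helper of ours, not something dyld provides.

```c
#include <assert.h>
#include <string.h>

/*
 * Build the symbol name used in the listing from a C-level name by
 * prepending one underscore. Returns 1 on success, 0 if out is too small.
 */
static int macho_symbol_name(const char *c_name, char *out, size_t out_len)
{
    size_t n;

    assert(c_name != NULL && out != NULL);

    n = strlen(c_name);
    if (out_len < n + 2)          /* '_' + name + terminating NUL */
        return 0;

    out[0] = '_';
    memcpy(out + 1, c_name, n + 1);  /* copies the NUL as well */
    return 1;
}
```

So a name that already starts with an underscore at the C level, such as _dyld_lookup_and_bind, ends up with two of them.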

Leveraging dyld to load and execute your own binary

In order to load a standalone binary and execute it from memory, your shellcode loader should use dyld facilities to resolve its dependencies before executing its real code. The Mach-O ABI is particularly attractive because it relies on offsets instead of full addressing, so relocating the binary text is not a big deal. This will be trickier than loading a dynamic library or bundle, but completely doable. In addition, forking the process and replacing its image in memory can be done in the same fashion as the execve function operates. Inheriting file descriptors and other implementation issues could be ignored altogether to keep the loader footprint minimal. Certain Mach APIs might be of help for this purpose.
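
To make the "offsets instead of full addressing" point concrete, here is a small sketch of the arithmetic, using numbers from the outputs earlier in this entry: __dyld_lookup_and_bind was found at 0x8fe0a0e0 and dyld is mapped at 0x8fe00000, so the function sits at offset 0xa0e0 inside the image. The helper names, and the alternative base address in the usage note, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an address inside its image; it does not depend on
 * where the image happens to be mapped. */
static uintptr_t offset_in_image(uintptr_t addr, uintptr_t base)
{
    assert(addr >= base);
    return addr - base;
}

/* Address of the same location once the image is mapped at new_base. */
static uintptr_t address_at(uintptr_t new_base, uintptr_t offset)
{
    return new_base + offset;
}
```

If the same image were mapped at, say, 0x10000000, address_at(0x10000000, 0xa0e0) would give 0x1000a0e0: only the base changes, the offset stays put.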

Either way, developing a standalone binary loader seems overkill when taking into consideration the solid official API available for loading code dynamically. Using constructors/destructors you can initialize whatever is necessary in your payload and hook or redirect APIs to trigger your functionality. You won't have to worry about detecting process termination, since that will be handled by your library destructor. Far simpler, and more reliable and flexible.
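
The constructor/destructor mechanism mentioned above can be sketched as follows. This assumes a GCC- or Clang-compatible compiler (the __attribute__ syntax is a compiler extension, not standard C), and the names lib_ready, lib_init and lib_fini are ours.

```c
#include <assert.h>

/* Set by the constructor so the rest of the code can tell that
 * initialization already happened. */
static int lib_ready = 0;

/* Runs automatically when the image is loaded, before any of its
 * other code is called. */
__attribute__((constructor))
static void lib_init(void)
{
    lib_ready = 1;
}

/* Runs automatically when the image is unloaded or the process
 * exits normally; cleanup belongs here. */
__attribute__((destructor))
static void lib_fini(void)
{
    assert(lib_ready == 1);
    lib_ready = 0;
}
```

Nothing has to call lib_init or lib_fini explicitly: by the time any other function in the image runs, lib_ready is already 1.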

Why runtime binary loading might not deter forensics on Mac OS X

In order to deter forensics when loading binaries from memory, you must be able to control or influence how the memory will be used afterwards. In addition, you must be aware of any existing caching mechanisms and sleep/hibernation issues.

Mac OS X stores a full memory image on disk when it goes into sleep mode. This happens by default on all recent Apple laptops (including first-generation MacBooks) when certain things happen (closing the lid, inactivity, screensaver activation, etc.). Once your code is saved into that memory image (along with the key for the AES-128/256 encrypted swap image, which means "secure VM" won't do any good during forensics), it's already game over.

If your code somehow manages to stay in memory afterwards (either because you forgot to wipe it or because it got loaded into the shared region), it's also game over.

Therefore, the impact of this technique against forensics could be negligible, depending on the setup and specifics of the case. Claiming this deters forensics right away, or that it will make attacks much more stealthy, is exaggerating things quite a bit. It makes an attack more low profile, but it could use a lot of improvement.

Hope you enjoyed reading, and thanks for your time! Thanks to Dan Goodin and Jared DeMott for proofreading and comments before publication.

About this Entry

This page contains a single entry by Subreption LLC published on February 6, 2009 3:57 PM.
