Monday, June 01, 2009

Lambda Functions: Yep, C's got them too.

Right. It's been interesting taking CSE 341 and learning about lambda functions along with all the hoo-hah that goes along with it. You might be surprised to learn that C has lambda functions too. Yes. That's right. Except that you have to write them in machine code :)

But, fundamentally, this is how lambda functions are implemented anyway in those functional programming languages. The lambda functions you write in those high-level languages are ultimately translated into similar code and (usually) executed dynamically on the stack. A lot of these high-level languages compile down to C too (or C--).

The idea of executing machine code directly on the stack should be familiar to those who know how buffer overflow exploits work. So first, we create a reasonably sized buffer on the stack (in main()) to hold the code:

char lambda[5000];

Then, we declare some functions above main():

/**
* Definition of function #1:
*/
int add( int x, int y )
{
return x + y;
}
void dummyfunc1() {}
// We need to end with a dummy function so we can easily compute the size of
// the function by calculating the differences in addresses.

/**
* Definition of function #2:
*/
int sumsquares( int x, int y )
{
// Just for fun, we make this function bigger:
int part1 = x * x;
int part2 = y * y;
return part1 + part2;
}
void dummyfunc2() {}


Right. Now comes the power of the C programming language. The reason why I padded dummy functions after those declarations is just enhance clarity in this demonstration. They strictly aren't necessary and I could have simply computed the offsets from the next function. But function order must be strictly obeyed because of the way the code subsequently gets laid out in memory.

A function is simply an array of machine instructions. So we just need to compute the size of those two functions:

unsigned long function_add_size = (long)dummyfunc1 - (long)add;
unsigned long function_sumsquares_size = (long)dummyfunc2 - (long)sumsquares;

Now all have left to do is load the function into the stack buffer, by copying memory directly:

memcpy( &lambda, (char*)add, function_add_size );

Now we can simply execute the code anytime we wish by casting the stack buffer to the appropriate function pointer type and calling it:

a = ((int (*)(int, int)) lambda)(4, 5);

We do the same with the second function, then we print the result:

memcpy( &lambda, (char*)sumsquares, function_sumsquares_size );
b = ((int (*)(int, int)) lambda)(4, 5);


Lo and behold (this is what printf prints):
a = 9
b = 41


But we can go further than that. Let's dynamically load in the machine code at runtime. First, we write a function that does subtraction in C, then we disassemble the function:

(gdb) disassemble subtract
Dump of assembler code for function subtract:
0x0804845f <subtract+0>: push %ebp
0x08048460 <subtract+1>: mov %esp,%ebp
0x08048462 <subtract+3>: mov 0xc(%ebp),%edx
0x08048465 <subtract+6>: mov 0x8(%ebp),%eax
0x08048468 <subtract+9>: sub %edx,%eax
0x0804846a <subtract+11>: pop %ebp
0x0804846b <subtract+12>: ret
End of assembler dump.


GDB tells us that the function is 13 bytes in size. So let's directly read off the machine instructions:

(gdb) x/13xb subtract
0x804845f <subtract>: 0x55 0x89 0xe5 0x8b 0x55 0x0c 0x8b 0x45
0x8048467 <subtract+8>: 0x08 0x29 0xd0 0x5d 0xc3


Some fun in Ruby turns that into the following string (representing the machine code):

irb(main):026:0> puts "0x55 0x89 0xe5 0x8b 0x55 0x0c 0x8b 0x45 0x08 0x29 0xd0 0x5d 0xc3".split.map {|code| '\\'+code[1..3]}.join
\x55\x89\xe5\x8b\x55\x0c\x8b\x45\x08\x29\xd0\x5d\xc3


Which we directly load into the buffer...

strcpy( lambda, "\x55\x89\xe5\x8b\x55\x0c\x8b\x45\x08\x29\xd0\x5d\xc3" );

And execute...

c = ((int (*)(int, int)) lambda)(10, 2);

Which returns (the result of 10 - 2):
c = 8

Another reason why C is my favorite programming language. Source code is located here for the curious.

0 comments: