Linux Heap Exploitation Intro Series: The magicians cape – 1 Byte Overflow

Reading time ~21 min

Posted by Javier Jimenez on 20 September 2017

Categories: Heap, Heap linux, Heap overflow

Intro

Hello again! It’s been a while since the last blog post. This is due to not having as much time as we wanted but hopefully you all kept the pace with this heapy things as they are easy to forget due to the heavy amount of little details the heap involves.

On this post we are going to demonstrate how a single byte overflow, with a user controlled value, can cause chunks to disappear for the implementation like a magician puts a cape on top of objects (chunks) and makes them disappear.

The Vulnerability

Preface

For the second part of our series we are going for another rather common vulnerability happening out there. From my own experience, it sometimes confusing and hard to get the sizes right for elements (be it arrays, allocations, variables in general) because of the confusion between declaring “something” and actually accessing each element (i.e. byte) of that “something”. I know, sounds weird, but look at the following code:

int array_of_integers[10]; // (1)
int i;
for (i = 1; i <= 10; i++) // (2)
{
    array_of_integers[i] = 1; // (3)
}

In this code, we wanted to have an array of ten integers (1) and then iterate over the array (2) and set every element of the array to the number 1. As you might or might not know, the first element of an array is not the number one, in C, these start with zero (array_of_integers[0]). What will happen here is that when the variable i reaches 10, the code will try to write to array_of_integers[10] which is not allocated, effectively doing out-of-bounds write, exactly 4 bytes more than expected as 4 bytes is the size of an int.

Also, keeping the pace up with ptmalloc2 implementation and to keep these blog posts as “fresh” and “new” as possible, here we have a fairly new exploit against SAPCAR (CVE-2017-8852) by CoreSecurity that takes advantage of a buffer overflow happening in the heap. It’s an easy and cool read!

What

This kind of vulnerability falls into the category of buffer overflows or out-of-bounds. Specifically in this blog post we are going to see what could happen in the case of an out-of-bounds 1 byte write.

At first glance one could think there is not many possibilities on just writing one byte but, if we remember how an allocated chunk is placed into memory we begin to “believe”:

 +-+-+-+-+-+-+-+-+-+-+-+-+ 
 | PREV_SIZE OR USER DATA| <-- Previous Chunk Data (OVERFLOW HERE)
 +-----------------+-+-+-+ <-- Chunk start
 | CHUNK SIZE      |A|M|P| 
 +-----------------+-+-+-+ 
 |        USER DATA      | 
 |                       |
 | -  -  -  -  -  -  -  -| 
 | PREV_SIZE OR USER DATA| 
 +-----------------------+ <-- End of chunk

As we can see, 1 byte overwrite will mess up the Main Arena (A), Is MMaped (M) and Previous in-use (P) bits as well as overwriting the first byte of the CHUNK SIZE.

The implications of this type of vulnerability (cliché) are endless: changing the previous in-use bit leading to strange memory corruptions that when free()‘ing, maliciously crafted memory, would lead to arbitrary overwrite (overwriting pointers to functions), shrinking or growing adjacent chunks by tampering with the next chunk’s size thus leading to chunks overlapping or memory fragmentation which would lead to use-after-invalidation bugs (memory leaks, use-after-free, etc.).

When stars collide one byte

To prepare the scenario for this kind of vulnerability, I inspired myself heavily on the Forgotten Chunks^[1] paper by Context. I took my own way in the sense that I decided to build two proof of concepts for the techniques described there as well as a challenge for you to test out!

One byte write

In this scenario we count with the following: There are three allocated chunks and the first of them A, is the vulnerable one to our one byte overflow. To simplify things; we can write into the first two chunks A and B but not into the third chunk C. We can read from all of them. Let’s see the initial state of the heap of this in our beloved chunky-ascii-art.

 +---HEAP GROWS UPWARDS
 |                 
 | +-+-+-+-+-+-+
 | |  CHUNK A  | <-- Read/Write
 | +-----------+
 | |  CHUNK B  | <-- Read/Write
 | +-----------+
 | |  CHUNK C  | <-- Read
 | +-----------+
 | |    TOP    |
 | |           |
 V |           |
   +-----------+

Our goal is to write into chunk C. To do so we would need the following to happen.

1 – Previous to any overflows and to prevent memory corruptions because we are going to overflow into chunk B, the first thing to make happen is to free() chunk B.

 +---HEAP GROWS UPWARDS
 |                 
 | +-+-+-+-+-+-+
 | |  CHUNK A  | <-- Read/Write
 | +-----------+
 | |  FREE  B  | 
 | +-----------+
 | |  CHUNK C  | <-- Read
 | +-----------+
 | |    TOP    |
 | |           |
 V |           |
   +-----------+

2 – Now that chunk B is free, we are going to trigger the one byte overflow that is present in chunk A and overflow one byte into chunk B. This will change the first byte of chunk’s B size – in our case, we will make it grow. Let’s say that our chunks are of the following sizes:
A(0x100-WORD_SIZE), B(0x100-WORD_SIZE), C(0x40-WORD_SIZE)

Let me state that WORD_SIZE is a variable that will be 8 on 64bit systems and 4 on 32bit systems. This was already explained in the Painless Intro to ptmalloc2 blog post but again: This WORD_SIZE is subtracted from the size we actually want because it is the size that the chunk’s size header will take in memory so, to keep the chunks within the size we expect (0x100 or 0x40) we subtract the size header’s size (WORD_SIZE) to prevent padding to the next WORD_SIZE*2 size. I know this is too condensed but it is a must to be able to understand this to proceed.

By now we can see where all this goes, right? That’s it, we are about to overwrite chunk’s B size to make it be as big enough to have B+C size. So, we know that B is 0x100 and C is 0x40, then let me ask you a question: With which byte would we need to overflow A so that it puts it into B‘s size to completely overlap C on a 64bit system?

a) 0x40
b) 0x41
c) 0x48

That’s right, none of them. We shouldn’t be choosing values ending in zero or an even number because all of these, in binary, translate to leaving the last bit unset. This means that the PREV_INUSE bit will be set to zero creating a memory corruption in some cases as chunk A is not really free. That leaves 0x40 and 0x48 out. If you have chosen 0x48, you might know the reason why 0x41 is not valid to completely overlap C: Because we need to add 8 bytes due to C‘s size header length placed just after B. The right answer is 0x51 as it keeps the PREV_INUSE bit set and is the next 16 byte padded size to 0x40.

Enough chit-chat, but it was necessary. Remember that B is free()‘d, so B->fd and B->bk (pointers to next’s and previous free chunks if any) are set as well as the next’s chunk (C) PREV_SIZE:

-=Data gets populated from right to left and from top to bottom=-

 +-----------------+-+-+-+ 
 |CHUNK A SIZE = \x01\x01| <-- Size 0x100 + 0x1 (PREV_INUSE bit)
 +-----------------+-+-+-+ 
 |x51\x51\x51\x51\x51\x51| 
 |x51\x51\x51\x51\x51\x51| 
 |x51\x51\x51\x51\x51\x51| 
 |x51\x51\x51\x51\x51\x51| 
 +-----------------+-+-+-+ 
 |FREE  B SIZE = \x01\x51| <-- Size Overflown 0x151
 +-----------------+-+-+-+ 
 |B->fd = main_arena->TOP| 
 |B->bk = main_arena->TOP| 
 +-----------------------+ 
 |PREV_SIZE B  = \x01\x01| <-- PREV_SIZE now doesn't match B size
 +-----------------------+
 |CHUNK C SIZE =     \x41| <-- 0x100 + 0x1
 +-----------------+-+-+-+ 
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+
 |          TOP          | <-- B->fd and B->fk point here
 +-----------------------+ 
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+

3 – Ok! The buffer overflow is triggered now and chunk’s B size has been changed to 0x151. What should happen now to overlap into chunk C? We would need an allocation of a size near to: old B size + C size = 0x100 + 0x40. When the following allocation happens:

B = malloc(0x100+0x40);

Chunk B will be now overlapping chunk C in its entirety as we can see in the following figure.

-=Data gets populated from right to left and from top to bottom=-
  <--- 8 bytes wide  --->
 +-----------------+-+-+-+ 
 |CHUNK A SIZE = \x01\x01| 
 +-----------------+-+-+-+ 
 |x51\x51\x51\x51\x51\x51| 
 |x51\x51\x51\x51\x51\x51| 
 |x51\x51\x51\x51\x51\x51| 
 |x51\x51\x51\x51\x51\x51| 
 +-----------------+-+-+-+ 
 |CHUNK B SIZE = \x01\x51| <-- Size Overflown 0x151
 +-----------------+-+-+-+ 
 |B->fd = main_arena->TOP| <-- Even if allocated...
 |B->bk = main_arena->TOP| <-- ...memory isn't cleared and...
 +-----------------------+ 
 |PREV_SIZE B  = \x01\x01| <-- ...values are kept in memory.
 +-----------------------+
 |CHUNK C SIZE =     \x41| 
 +-----------------+-+-+-+ 
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+ <-- B now takes up to here (1)
 |          TOP          | 
 +-----------------------+ 
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+

(1) The “mathemata” out there must have spotted that chunk B actually goes 8 bytes into TOP but that is not a problem for us for now.

4 – As a last step, we would just need to write into chunk B at our desired position, effectively achieving our main goal: Writing into chunk C.

-=Data gets populated from right to left and from top to bottom=-
  <--- 8 bytes wide  --->
 +-----------------+-+-+-+ 
 |CHUNK A SIZE = \x01\x01| 
 +-----------------+-+-+-+ 
 |x51\x51\x51\x51\x51\x51| 
 |x51\x51\x51\x51\x51\x51| 
 |x51\x51\x51\x51\x51\x51| 
 |x51\x51\x51\x51\x51\x51| 
 +-----------------+-+-+-+ 
 |CHUNK B SIZE = \x01\x51| 
 +-----------------+-+-+-+ 
 |x42\x42\x42\x42\x42\x42| <-- Setting all B to hex('B') = '\x42'
 |x42\x42\x42\x42\x42\x42|
 +-----------------------+ 
 |x42\x42\x42\x42\x42\x42|
 +-----------------------+
 |x42\x42\x42\x42\x42\x42| 
 +-----------------+-+-+-+ 
 |x42\x42\x42\x42\x42\x42| <-- chunk C got overlapped and written
 |x42\x42\x42\x42\x42\x42| 
 |x42\x42\x42\x42\x42\x42| 
 |x42\x42\x42\x42\x42\x42|
 +-----------------------+
 |          TOP          | 
 +-----------------------+ 
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+

Cool. If previous to filling all of chunk B with \x42, the “magically disappeared” chunk C gets free()‘d we will be able to leak pointers to main_arena->top by reading the contents of B. Also if C is still used, we can control the data inside it by writing on high positions at B. There are many more possibilities!

One null byte write

This scenario is initially the same as the previous one except that the sizes are A(0x100), B(0x250) and C(0x100); also, we can only overflow with a null byte \x00.

As the first steps are the same as the one byte write, we are going straight into the null byte overflow. Note that this is more likely to happen because in C, all strings must be terminated with the “null byte”.

Our goal here is to make the implementation forget about an allocated chunk by overlapping it with free space.

1 – We overflow in the same way as before but what is going to happen is a pretty much different thing than before. We are shrinking the size of chunk B from 0x250 to 0x200.

-=Data gets populated from right to left and from top to bottom=-

 +-----------------+-+-+-+ 
 |CHUNK A SIZE = \x01\x01| <-- Size 0x100 + 0x1 (PREV_INUSE bit)
 +-----------------+-+-+-+ 
 | EREH OG NAC GNIRTS YNA| <-- ANY STRING CAN GO HERE
 |     EB LLIW TI ESUACEB| <-- BECAUSE IT WILL BE
 |        DETANIMRET LLUN| <-- NULL TERMINATED
 |AAAAAAAAAAAAAAAAAAAAAAA| 
 +-----------------+-+-+-+ 
 |FREE  B SIZE = \x02\x00| <-- Size byte overflown 0x250 --> 0x200
 +-----------------+-+-+-+ 
 |B->fd = main_arena->TOP| 
 |B->bk = main_arena->TOP|  
 |                       | 
 |                       | <-- Free B ends here now (1)
 |                       |  
 |                       | 
 +-----------------------+ <-- Free space still ends here (2)
 |PREV_SIZE B  = \x02\x50| <-- C PREV_SIZE doesn't match B size
 +-----------------------+
 |CHUNK C SIZE = \x01\x00| <-- PREV_INUSE is zero now.
 +-----------------+-+-+-+ 
 |                       | 
 |                       |
 +-----------------------+
 |          TOP          |
 +-----------------------+ 
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+

Chunk B has shrunk (1) and this will have its consequences. The first and easiest to see is that chunk’s C PREV_SIZE (remember that this value is used to merge free chunks) is different from the actual chunk’s B size. The second consequence is that for the implementation, the free space between chunk A and chunk C (2) is somewhat “divided” from the implementation’s perspective (it is corrupted already).

This consequences together have a third and final consequence. If we allocate two new chunks (J and K) of smaller size than 0x200/2 (0x100)…

2 – … the implementation is going to properly allocate these in the space of chunk B because there is space for two chunks of size, let’s say, 0x80.

-=Data gets populated from right to left and from top to bottom=-

 +-----------------+-+-+-+ 
 |CHUNK A SIZE = \x01\x01| <-- Size 0x100 + 0x1 (PREV_INUSE bit)
 +-----------------+-+-+-+ 
 | EREH OG NAC GNIRTS YNA| 
 |     EB LLIW TI ESUACEB| 
 |        DETANIMRET LLUN| 
 |AAAAAAAAAAAAAAAAAAAAAAA| 
 +-----------------+-+-+-+ 
 |CHUNK J SIZE = \x00\x81| 
 +-----------------+-+-+-+ 
 |                       | 
 |                       |   
 +-----------------------+
 |CHUNK K SIZE = \x00\x81|
 +-----------------------+
 |                       |  
 |                       | 
 +-----------------------+ 
 |PREV_SIZE B  = \x02\x50| <-- C PREV_SIZE doesn't match B size
 +-----------------------+
 |CHUNK C SIZE = \x01\x00|
 +-----------------+-+-+-+ 
 |                       | 
 |                       |
 +-----------------------+
 |          TOP          |
 +-----------------------+ 
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+

Not much to explain about two simple allocations. This leads us onto the final part. Can you see which chunk is about to disappear to the implementation (Forgotten Chunk)?

3 – Finally if we free() chunk J and chunk C in that order the unlink macro will kick in and merge both chunks into one free space. So, which chunk is between the cape formed by chunk’s J and C? That’s right, the newly allocated chunk K.

First chunk J is free()‘d – note that K‘s PREV_INUSE is zero now due to the previous chunk J being free.

-=Data gets populated from right to left and from top to bottom=-

 +-----------------+-+-+-+ 
 |CHUNK A SIZE = \x01\x01| <-- Size 0x100 + 0x1 (PREV_INUSE bit)
 +-----------------+-+-+-+ 
 | EREH OG NAC GNIRTS YNA| 
 |     EB LLIW TI ESUACEB| 
 |        DETANIMRET LLUN| 
 |AAAAAAAAAAAAAAAAAAAAAAA| 
 +-----------------+-+-+-+ 
 |FREE  J SIZE = \x00\x81| (3)
 +-----------------+-+-+-+ 
 |J->fd = main_arena->TOP| 
 |J->bk = main_arena->TOP|   
 +-----------------------+
 |CHUNK K SIZE = \x00\x80|
 +-----------------------+
 |                       |  
 |                       | 
 +-----------------------+ 
 |PREV_SIZE B  = \x02\x50| (2) 
 +-----------------------+
 |CHUNK C SIZE = \x01\x00| (1)
 +-----------------+-+-+-+ 
 |                       | 
 |                       |
 +-----------------------+
 |          TOP          |
 +-----------------------+ 
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+

4 – Then, we are free()‘ing C. Now pay close attention to the following:
– Chunk C PREV_INUSE bit is unset (1)
– Chunk C PREV_SIZE is still the old B size 0x250 (2)
– Chunk J is also free and at the position of old B (0x250 bytes back from chunk C) (3)

They are explained in reverse order because this is exactly how the unlink macro is going over all of them. Check’s that the current free()‘d chunk has its PREV_INUSE bit unset (1), then it proceeds to merge the other free()‘d chunk which is at C – PREV_SIZE = J. Now J and C are merged into one big free space at J. Poor K was in the middle of this merge situation and is now treated as free space – ptmalloc2 forgot about him :(

-=Data gets populated from right to left and from top to bottom=-

 +-----------------+-+-+-+ 
 |CHUNK A SIZE = \x01\x01| <-- Size 0x100 + 0x1 (PREV_INUSE bit)
 +-----------------+-+-+-+ 
 | EREH OG NAC GNIRTS YNA| 
 |     EB LLIW TI ESUACEB| 
 |        DETANIMRET LLUN| 
 |AAAAAAAAAAAAAAAAAAAAAAA| 
 +-----------------+-+-+-+ 
 |FREE  J SIZE = \x03\x51| <-- Free size starting here
 +-----------------+-+-+-+ 
 |J->fd = main_arena->TOP| 
 |J->bk = main_arena->TOP|   
 +  -  -  -  -  -  -  -  +
 | CHUNK K FORGOTTEN!    |
 +  -  -  -  -  -  -  -  +
 |                       |  
 |                       | 
 +  -  -  -  -  -  -  -  + 
 |                       |  
 |                       |
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+ <-- Free size ends here 
 |          TOP          |
 +-----------------------+ 
 |                       | 
 |                       | 
 |                       | 
 |                       |
 +-----------------------+

Now chunk K is in the new free()‘d area (in green) which again has nearly the same implications as the previous scenario: use-after-invalidation, leaks, arbitrary writes, etc.

The playground

Let’s go have some fun! As in the previous series I am trying to create a tradition by providing proof of concepts and a challenge! You can get the playground’s code here.

Download Playground Code

In order to not make this blog post longer than expected, the proof of concepts provided are directly related with the When stars align section. Worry not for I have heavily commented each PoC. But wait! Don’t think I would be that lazy! Here, have some videos and a little explanation.

One byte write

This PoC is based around on writing to chunk C. An X is written to a point in B after the vulnerability is triggered. This point in chunk B is exactly the previous to last byte in the overlapped C.

One null byte write

This one corresponds to the second scenario with the difference that instead of creating J and K chunk variables, I have reused the B variable and just added an X variable to hold the fourth chunk.

This chunk X is the one to be overlapped and overwritten as we can see in the following video.

Now you

This second challenge is on 178.62.74.135 on ports 10000, 10001. Your goal is to retrieve the flag by overlapping a chunk that you cannot read directly from – the vulnerable chunk is A!

nc 178.62.74.135 10000 -v
nc 178.62.74.135 10001 -v

The flag is in the form: SP{contents_of_the_flag}.
Note that you must send us your exploit source code to be eligible to win!

The winner will win a SensePost shirt and will be chosen regarding the following criteria:

Originality – Programming Language, code length, formatting, comments, etc.
Accurate – Did you do the maths or bruteforced it?

Hints

– There are 3 allocated chunks already: A(0x100), B(0x100) & C(0x80)
– Values with the last 3 bits set can cause weird behaviour
– There are hidden easter eggs for those trying to break things further
– You can calculate or either brute force it. Your call!
– No, you don’t need to solve the 2016 challenge

Our Blog

Linux Heap Exploitation Intro Series: The magicians cape – 1 Byte Overflow

Intro

The Vulnerability

Preface

What

When stars collide one byte

One byte write

One null byte write

The playground

One byte write

One null byte write

Now you

Hints

References