While doing some thinking on threat modelling I started examining what the usual drivers of security spend and controls are in an organisation. I've spent some time on multiple fronts, security management (been audited, had CIOs push for priorities), security auditing (followed workpapers and audit plans), pentesting (broke in however we could) and security consulting (tried to help people fix stuff) and even dabbled with trying to sell some security hardware. This has given me some insight (or at least an opinion) into how people have tried to justify security budgets, changes, and findings or how I tried to. This is a write up of what I believe these to be (caveat: this is my opinion). This is certainly not universalisable, i.e. it's possible to find unbiased highly experienced people, but they will still have to fight the tendencies their position puts on them. What I'd want you to take away from this is that we need to move away from using these drivers in isolation, and towards more holistic risk management techniques, of which I feel threat modelling is one (although this entry isn't about threat modelling).
Auditors
The tick box monkeys themselves, they provide a useful function, and are so universally legislated and embedded in best practise, that everyone has a few decades of experience being on the giving or receiving end of a financial audit. The priorities audit reports seem to drive are:
But security vendors prioritisation of controls are driven by:
Every year around Black Hat Vegas/Pwn2Own/AddYourConfHere time a flurry of media reports hit the public and some people go into panic mode. I remember The DNS bug, where all that was needed was for people to apply a patch, but which, due to the publicity around it, garnered a significant amount of interest from people who it usually wouldn't, and probably shouldn't have cared so much. But many pentesters trade on this publicity; and some pentesting companies use this instead of a marketing budget. That's not their only, or primary, motivation, and in the end things get fixed, new techniques shared and the world a better place. The cynical view then is that some of the motivations for vulnerability researchers, and what they end up prioritising are:
Unfortunately, as human beings, our decisions are coloured by a bunch of things, which cause us to make decisions either influenced or defined by factors other than the reality we are faced with. A couple of those lead us to prioritising different security motives if decision making rests solely with one person:
The result of all of this is that different companies and people push vastly different agendas. To figure out a strategic approach to security in your organisation, you need some objective risk based measurement that will help you secure stuff in an order that mirrors the actual risk to your environment. While it's still a black art, I believe that Threat Modelling helps a lot here, a sufficiently comprehensive methodology that takes into account all of your infrastructure (or at least admits the existence of risk contributed by systems outside of a “most critical” list) and includes valid perspectives from above tries to provide an objective version of reality that isn't as vulnerable to the single biases described above.
[I originally wrote this blog entry on the plane returning from BlackHat, Defcon & Metricon, but forgot to publish it. I think the content is still interesting, so, sorry for the late entry :)]
I've just returned after a 31hr transit from our annual US trip. Vegas, training, Blackhat & Defcon were great, it was good to see friends we only get to see a few times a year, and make new ones. But on the same trip, the event I most enjoyed was Metricon. It's a workshop held at the Usenix security conference in San Francisco, run by a group of volunteers from the security metrics mailing list and originally sparked by Andrew Jacquith's seminal book Security Metrics.
There were some great talks, and interactions, the kind you only get at small groupings around a specific set of topics. It was a nice break from the offensive sec of BH & DC to listen to a group of defenders. The talks I most enjoyed (they were all recorded bar a few private talks) were the following:
Wendy Nather — Quantifying the Unquantifiable, When Risk Gets Messy
Wendy looked at the bad metrics we often see, and provided some solid tactical advice on how to phrase (for input) and represent (for output) metrics. As part of that arc, she threw out more pithy phrases that even the people in the room tweeting could keep up with. From introducing a new phrase for measuring attacker skill, "Mitnicks", to practical experience such as how a performance metric phrase as 0-100 had sysadmins aiming for 80-90, but inverting it had them aiming for 0 (her hypothesis, is that school taught us that 100% was rarely achievable). Frankly, I could write a blog entry on her talk alone.
Josh Corman - "Shall we play a game?" and other questions from Joshua
Josh tried to answer the hard question of "why isn't security winning". He avoided the usual complaints and had some solid analysis that got me thinking. In particular the idea of how PCI is the "No Child Left Behind" act for security, which not only targeted those that had been negligent, but also encouraged those who hadn't to drop their standards. "We've huddled around the digital dozen, and haven't moved on." He went on to talk about how controls decay as attacks improve, but our best practice advice doesn't. "There's a half-life to our advice". He then provided a great setup for my talk "What we are doing, is very different from how people were exploited."
Jake Kouns - Cyber Liability Insurance
Jake has taken security to what we already knew it was, an insurance sale ;) Jokes aside, Jake is now a product manager for cyber-liability insurance at Merkel. He provided some solid justifications for such insurance, and opened my eyes to the fact that it is now here. The current pricing is pretty reasonable (e.g. $1500 for $1million in cover). Most of the thinking appeared to target small to medium organisations, that until now have only really had "use AV & pray" as their infosec strategy, and I'd love to hear some case-studies from large orgs that are using it & have claimed. He also spoke about how it could become a "moral hazard" where people choose to insure rather than implement controls, and the difficulties the industry could face, but that right now work as incentives for us (the cost of auditing a business will be more than the insurance is worth). His conclusion, which seemed solid, is why spend $x million on the "next big sec product" when you could spend less & get more on insurance. Lots of questions left, but it looks like it may be time to start investigating.
Allison Miller - Applied Risk Analytics
I really enjoyed Allison and Itai's talk. They looked at practical methodologies for developing risk metrics and coloured them with great examples. The process they presented was the following:
Conclusion
I found the conference refreshing, with a lot of great advice (more than the little listed above). Too often we get stuck in the hamster wheels of pain, and it's nice to think we may be able to slowly take a step off. Hopefully we'll be back next year.
Over the last few years there has been a popular meme talking about information centric security as a new paradigm over vulnerability centric security. I've long struggled with the idea of information-centricity being successful, and in replying to a post by Rob Bainbridge, quickly jotted some of those problems down.
In pre-summary, I'm still sceptical of information-classification approaches (or information-led control implementations) as I feel they target a theoretically sensible idea, but not a practically sensible one.
Information gets stored in information containers (to borrow a phrase from Octave) such as the databases or file servers. This will need to inherit a classification based on the information it stores. That's easy if it's a single purpose DB, but what about a SQL cluster (used to reduce processor licenses) or even end-user machines? These should be moved up the classification chain because they may store some sensitive info, even if they spend the majority of the time pushing not-very-sensitive info around. In the end, the hoped-for cost-saving-and-focus-inducing prioritisation doesn't occur and you end up having to deploy a significantly higher level of security to most systems. Potentially, you could radically re-engineer your business to segregate data into separate networks such as some PCI de-scoping approaches suggest, but, apart from being a difficult job, this tends to counter many of the business benefits of data and system integrations that lead to the cross-pollination in the first place.
Next up, I feel this fails to take cognisance of what we call "pivoting"; the escalation of privileges by moving from one system or part of a system to another. I've seen situations when the low criticality network monitoring box is what ends up handing out the domain administrator password. It had never been part of internal/external audits scope, none of the vulns showed up on your average scanner, it had no sensitive info etc. Rather, I think we need to look at physical, network and trust segregation between systems, and then data. It would be nice to go data-first, but DRM isn't mature (read simple & widespread) enough to provide us with those controls.
Lastly, I feel information-led approaches often end up missing the value of raw functionality. For example, a critical trade execution system at an investment bank could have very little sensitive data stored on it, but the functionality it provides (i.e. being able to execute trades using that bank's secret sauce) is hugely sensitive and needs to be considered in any prioritisation.
I'm not saying I have the answers, but we've spent a lot of time thinking about how to model how our analysts attack systems and whether we could "guess" the results of multiple pentests across the organisation systematically, based on the inherent design of your network, systems and authentication. The idea is to use that model to drive prioritisation, or at least a testing plan. This is probably closer aligned to the idea of a threat-centric approach to security, and suffers from a lack of data in this area (I've started some preliminary work on incorporating VERIS metrics).
In summary, I think information-centric security fails in three ways; by providing limited prioritiation due to the high number of shared information containers in IT environments, by not incorporating how attackers move through a networks and by ignoring business critical functionality.
A longish post, but this wasn't going to fit into 140 characters. This is an argument pertaining to security metrics, with a statement that using pure vulnerability count-based metrics to talk about an organisation's application (in)security is insufficient, and suggests an alternative approach. Comments welcome.
Apart from the two bookends (SOSS and DBIR), other metrics are also published.
From a testing perspective, WhiteHat releases perhaps the most well-known set of metrics for appsec bugs, and in years gone by, Corsaire released statistics covering their customers. Also in 2008, WASC undertook a project to provide metrics with data sourced from a number of companies, however this too has not seen recent activity (last edit on the site was over a year ago). WhiteHat's metrics measure the number of serious vulnerabilities in each site (High, Critical, Urgent) and then slice and dice this based on the vulnerability's classification, the organisation's size, and the vertical within which they lie. WhiteHat is also in the fairly unique position of being able to record remediation times with a higher granularity than appsec firms that engage with customers through projects rather than service contracts. Corsaire's approach was slightly different; they recorded metrics in terms of the classification of the vulnerability, its impact and the year within which the issue was found. Their report contained similar metrics to the WhiteHat report (e.g. % of apps with XSS), but the inclusion of data from multiple years permitted them to extract trends from their data. (No doubt WhiteHat have trending data, however in the last report it was absent). Lastly, WASC's approach is very similar to WhiteHat's, in that a point in time is selected and vulnerability counts according to impact and classification are provided for that point.
Essentially, each of these approaches uses a base metric of vulnerability tallies, which are then viewed from different angles (classification, time-series, impact). While the metrics are collected per-application, they are easily aggregated into organisations.
In the extreme edges of ideal metrics, the ability to factor in chains of vulnerabilities that individually present little risk, but combined is greater than the sum of the parts, would be fantastic. This aspect is ignored by most (including us), as a fruitful path isn't clear.
One could just as easily claim that absolute bug counts are irrelevant and that they need to be relative to some other scale; commonly the number of applications an organisation has. However in this case, if the metrics don't provide enough granularity to accurately position your organisation with respect to others that you actually care about, then they're worthless to you in decision making. What drives many of our customers is not where they stand in relation to every other organisation, but specifically their peers and competitors. It's slightly ironic that oftentimes the more metrics released, the less applicable they are to individual companies. As a bank, knowing you're in the top 10% of a sample of banking organisations means something; when you're in the highest 10% of a survey that includes WebGoat clones, the results are much less clear.
In Seven Myths About Information Security Metrics, Dr Hinson raises a number of interesting points about security metrics. They're mostly applicable to security awareness, however they also carry across into other security activities. At least two serve my selfish needs, so I'll quote them here:
Myth 1: Metrics must be “objective” and “tangible”
There is a subtle but important distinction between measuring subjective factors and measuring subjectively. It is relatively easy to measure “tangible” or objective things (the number of virus incidents, or the number of people trained). This normally gives a huge bias towards such metrics in most measurement systems, and a bias against measuring intangible things (such as level of security awareness). In fact, “intangible” or subjective things can be measured objectively, but we need to be reasonably smart about it (e.g., by using interviews,surveys and audits). Given the intangible nature of security awareness, it is definitely worth putting effort into the measurement of subjective factors, rather than relying entirely on easy-to-measure but largely irrelevant objective factors. [G Hinson]
and
Myth 3: We need absolute measurements
For some unfathomable reason, people often assume we need “absolute measures”—height in meters, weight in pounds, etc. This is nonsense!
If I line up the people in your department against a wall, I can easily tell who is tallest, with no rulers in sight. This yet again leads to an unnecessary bias in many measurement systems. In fact, relative values are often more useful than absolute scales, especially to drive improvement. Consider this for instance: “Tell me, on an (arbitrary) scale from one to ten, how security aware are the people in your department are? OK, I'll be back next month to ask you the same question!” We need not define the scale formally, as long as the person being asked (a) has his own mental model of the processes and (b) appreciates the need to improve them. We needn't even worry about minor variations in the scoring scale from month to month, as long as our objective of promoting improvement is met. Benchmarking and best practice transfer are good examples of this kind of thinking. “I don't expect us to be perfect, but I'd like us to be at least as good as standard X or company Y. [G Hinson]
While he writes from the view of an organisation trying to decide whether their security awareness program is yielding dividends, the core statements are applicable for organisations seeking to determine the efficacy of their software security program. I'm particularly drawn by two points: the first is that intangibles are as useful as concrete metrics, and the second is that absolute measurements aren't necessary, comparative ordering is sometimes enough.
Measuring effort, or attacker cost, is not new to security but it's mostly done indirectly through the sale of exploits (e.g. iDefence, ZDI). Even here, effort is not directly related to the purchase price, which is also influenced by other factors such as the number of deployed targets etc. In any case, for custom applications that testers are mostly presented with, such public sources should be of little help (if your testers are submitting findings to ZDI, you have bigger problems). Every now and then, an exploit dev team will mention how long it took them to write an exploit for some weird Windows bug; these are always interesting data points, but are not specific enough for customers and the sample size is low.
Ideally, any measure of an attacker's cost can take into account both time and their exclusivity (or experience), however in practice this will be tough to gather from your testers. One could base it on their hourly rate, if your testing company differentiates between resources. In cases where they don't, or you're seeking to keep the metric simple, then another estimate for effort is the number of days spent on testing.
Returning to our sample companies, if the 5 vulnerabilities exposed in the Visigoth's each required, on average, a single day to find, while the Ostrogoth's 20 bugs average 5 days each, then the effort required by an attacker is minimised by choosing to target the Visigoths. In other words, one might argue that the Visigoths are more at risk than the Ostrogoths.
With this base metric, it's then possible to capture historical assessment data and provide both internal-looking metrics for an organisation as well as comparative metrics, if the testing company is also employed by your competitors. Internal metrics are the usual kinds (impact, classification, time-series), but the comparison option is very interesting. We're in the fortunate position of working with many top companies locally, and are able to compare competitors using this metric as a base. The actual ranking formulae is largely unimportant here. Naturally, data must be anonymised so as to protect names; one could provide the customer with their rank only. In this way, the customer has an independent notion of how their security activities rate against their peers without embarrassing the peers.
Inverting the findings-per-day metric provide the average number of days to find a particular class of vulnerability, or impact level. That is, if a client averages 0.7 High or Critical findings per testing day, then on average it takes us 1.4 days of testing to find an issue of great concern, which is an easy way of expressing the base metric.
As mentioned above, a minimum number of assessments would be needed before the metric is reliable; this is a hint at the deeper problems that randomly selected project days are not independent. An analyst stuck on a 4 week project is focused on a very small part of the broader organisation's application landscape. We counter this bias by including as many projects of the same type as possible.
This metric would also be very useful to include in each subsequent report for the customer, with every report containing an evaluation against their longterm vulnerability averages.
As mentioned above, a key test for metrics is where they support decision making, and the feedback from the client was positive in this regard.
This idea is still being fleshed out. If you're aware of previous work in this regard or have suggestions on how to improve it (even abandon it) please get in contact.
Oh, and if you've read this far and are looking for training, we're at BH in August.
[Update: Disclosure and other points discussed in a little more detail here.]
We choose to look at memcached, a "Free & open source, high-performance, distributed memory object caching system" 1. It's not outwardly sexy from a security standpoint and it doesn't have a large and exposed codebase (total LOC is a smidge over 11k). However, what's of interest is the type of applications in which memcached is deployed. Memcached is most often used in web application to speed up page loads. Sites are almost2 always dynamic and either have many clients (i.e. require horizontal scaling) or process piles of data (look to reduce processing time), or oftentimes both. This implies that the sites that use memcached contain more interesting info than simple static sites, and are an indicator of a potentially interesting site. Prominent users of memcached include LiveJournal (memcached was originally written by Brad Fitzpatrick for LJ), Wikipedia, Flickr, YouTube and Twitter.
I won't go into how memcached works, suffice it to say that since data tends to be read more often than written in common use cases the idea is to pre-render and store the finalised content inside the in-memory cache. When future requests ask for the page or data, it doesn't need to be regenerated but can be simply regurgitated from the cache. Their Wiki contains more background.
insurrection:demo marco$ ruby go-derper.rb -f x.x.x.x
[i] Scanning x.x.x.x
x.x.x.x:11211
==============================
memcached 1.4.5 (1064) up 54:10:01:27, sys time Wed Aug 04 10:34:36 +0200 2010, utime=369388.17, stime=520925.98
Mem: Max 1024.00 MB, max item size = 1024.00 KB
Network: curr conn 18, bytes read 44.69 TB, bytes written 695.93 GB
Cache: get 514, set 93.41b, bytes stored 825.73 MB, curr item count 1.54m, total items 1.54m, total slabs 3
Stats capabilities: (stat) slabs settings items (set) (get)
44 terabytes read from the cache in 54 days with 1.5 million items stored? This cache is used quite frequently. There's an anomaly here in that the cache reports only 514 reads with 93 billion writes; however it's still worth exploring if only for the size.
We can run the same fingerprint scan against multiple hosts using
ruby go-derper.rb -f host1,host2,host3,...,hostn
or, if the hosts are in a file (one per line):
ruby go-derper.rb -F file_with_target_hosts
Output is either human-readable multiline (the default), or CSV. The latter helps for quickly rearranging and sorting the output to determine potential targets, and is enabled with the "-c" switch:
ruby go-derper.rb -c csv -f host1,host2,host3,...,hostn
Lastly, the monitor mode (-m) will loop forever while retrieving certain statistics and keep track of differences between iterations, in order to determine whether the cache appears to be in active use.
insurrection:demo marco$ ruby go-derper.rb -l -s x.x.x.x
[w] No output directory specified, defaulting to ./output
[w] No prefix supplied, using "run1"
This will extract data from the cache in the form of a key and its value, and save the value in a file under the "./output" directory by default (if this directory doesn't exist then the tool will exit so make sure it's present.) This means a separate file is created for every retrieved value. Output directories and file prefixes are adjustable with "-o" and "-r" respectively, however it's usually safe to leave these alone.
By default, go-derper fetches 10 keys per slab (see the memcached docs for a discussion on slabs; basically similar-sized entries are grouped together.) This default is intentionally low; on an actual assessment this could run into six figures. Use the "-K" switch to adjust:
ruby go-derper.rb -l -K 100 -s x.x.x.x
As mentioned, retrieved data is stored in the "./ouput" directory (or elsewhere if "-o" is used). Within this directory, each new run of the tool produces a set of files prefixed with "runN" in order to keep multiple runs separate. The files produced are:
At this point, there will (hopefully) be a large number of files in your output directory, which may contain useful info. Start grepping.
What we found with a bit of field experience was that mining large caches can take some time, and repeating grep gets quite boring. The tool permits you to supply your own set of regular expressions which will be applied to each retrieved value; matches are printed to the screen and this provides a scroll-by view of bits of data that may pique your interest (things like URLs, email addresses, session IDs, strings starting with "user", "pass" or "auth", cookies, IP addresses etc). The "-R" switch enables this feature and takes a file containing regexes as its sole argument:
ruby go-derper.rb -l -K 100 -R regexs.txt -s x.x.x.x
ruby go-derper.rb -w output/run1-e94aae85bd3469d929727bee5009dddd
This syntax is simple since go-derper will figure out the target server and key from the run's index file.
2 We're hedging here, but we've not come across a static memcached site.
3 If so, you may be as surprised as we were in finding this many open instances.