Saturday, 15 March 2008

How Many Burning Homes

I mentioned the idea of host integrity assessment in my post Controls Are Not the Solution to Our Problem. The idea is to sample live devices (laptops, desktops, servers, routers, switches -- anything that runs a network-enabled operating system) to see if they are trustworthy. (They may be trusted, but that does not make them trustworthy.)

I described how I might determine trustworthiness, or integrity, in Three Capabilities, Three Companies. I'd like to expand on these thoughts with five metrics. Before showing the security metrics, I'd like to introduce an analogy.

Imagine a city with an understaffed, under-resourced, and possibly unappreciated fire department. The FD would like to prevent fires, but it spends most of its time responding to fires. How should city leadership decide how to staff and resource the FD? (There is no way to eliminate fires, at least no way that could ever be financed using any foreseeable resources. Even if people lived in concrete cells with no furnishings, they would probably figure out a way to light each other or the ground on fire!)

In this situation, one might argue that one way to judge the peril of the situation is the ability of the FD to "manage the fires." In other words, perhaps there is some number of burning homes that can be sustained while the FD responds to, contains, and extinguishes fires. If the FD is large enough, the number of fires can be reduced rapidly, so the time to extinguish is very small. If the FD is too small, then eventually the whole city burns because the fires overwhelm the FD's ability to respond, contain, and extinguish.

The question becomes: what is the "right" number? You could think in terms of the following metrics.


  1. Number of burning homes at any sampled time. The higher this number, the more likely the fire will spread.

  2. Average length of time any home is burning. Again, the higher this number, the more likely the fire will spread.

  3. Average time from detection to response. This measures how fast the FD arrives on site.

  4. Average time from response to recovery. This measures how effective the FD is at fighting fires.

  5. Average property value of burning homes. One would be less concerned if the burning homes are abandoned or condemned, and more concerned if they are inhabited.


I do not consider the number of arsonists here. That is relevant, but it raises the separate question of the role of the police in deterring, investigating, apprehending, prosecuting, and incarcerating threats. The FD cannot fight arsonists directly.

Now let's turn to digital security. While it's easy to spot a fire, identifying a "burning" (i.e., compromised) computer can be more difficult. If we could do that via host integrity assessment, we could imagine the following metrics.

  1. Number of compromised computers at any sampled time. This should be measured on a statistically valid sample of devices.

  2. Average length of time any computer is compromised. Answering this question requires a forensic investigation to identify the point in time when the intrusion most likely happened.

  3. Average time from detection to response. This measures the effectiveness of the intrusion detection program.

  4. Average time from response to recovery. This measures the effectiveness of the IRT and provisioning personnel.

  5. Average asset value of compromised computers. Again, a lot of owned low-value assets might not be a big problem.
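
As a rough sketch of how these five numbers might be computed once host integrity assessment and incident response produce records, consider the following Python. The `Incident` record, its field names, and the `summarize` helper are all hypothetical, invented here for illustration; they do not come from any particular tool, and a real program would need to decide how to treat incidents that are still open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Incident:
    """Hypothetical record for one compromised computer (all fields are assumptions)."""
    host: str
    asset_value: float                       # business-assigned value, arbitrary units
    compromised_at: datetime                 # estimated by forensic investigation
    detected_at: datetime
    responded_at: datetime
    recovered_at: Optional[datetime] = None  # None means still compromised


def compromised_at_time(incidents, when):
    """Metric 1: incidents whose host was compromised at the sampled time."""
    return [i for i in incidents
            if i.compromised_at <= when
            and (i.recovered_at is None or i.recovered_at > when)]


def average_hours(deltas):
    """Average a list of timedeltas in hours; None if the list is empty."""
    if not deltas:
        return None
    return mean(d.total_seconds() for d in deltas) / 3600.0


def summarize(incidents, sample_time):
    """Compute the five metrics; duration metrics use recovered incidents only."""
    closed = [i for i in incidents if i.recovered_at is not None]
    burning = compromised_at_time(incidents, sample_time)
    return {
        "compromised_now": len(burning),
        "avg_hours_compromised": average_hours(
            [i.recovered_at - i.compromised_at for i in closed]),
        "avg_hours_detect_to_respond": average_hours(
            [i.responded_at - i.detected_at for i in incidents]),
        "avg_hours_respond_to_recover": average_hours(
            [i.recovered_at - i.responded_at for i in closed]),
        "avg_asset_value_compromised": (
            mean(i.asset_value for i in burning) if burning else None),
    }
```

Averaging only over recovered incidents is a simplification: computers that are still compromised have no end time yet, so they are counted in the first and fifth metrics but not in the duration averages.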


So what do you do with these numbers? First, I recommend just collecting them. Second, take them to business owners and ask if the situation is acceptable. For example:

  • Is it acceptable to have 25% of a business's computers compromised? 50%? 10%? 5%?

  • Is it ok for them to be owned for 6 months? 1 day? 2 years?

  • Is it ok for us to take 6 months to notice? 2 hours? 2 days?

  • Is it ok for us to take 1 week to recover? 1 day? 1 month?

  • Is it ok for us to be suffering compromise on development servers? Call center PCs? Human resources databases?
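
Once business owners answer questions like these, their answers can be written down as limits and compared with what was measured. The following is only a minimal sketch: the metric names and every number in it are made up for illustration, and the `unacceptable` helper is hypothetical.

```python
def unacceptable(measured, limits):
    """Return each metric whose measured value exceeds the owner's stated limit.

    Both arguments are dicts keyed by metric name; metrics with no stated
    limit are skipped. The result maps name -> (measured, limit).
    """
    return {name: (value, limits[name])
            for name, value in measured.items()
            if name in limits and value > limits[name]}


# Illustrative numbers only; none of these come from real data.
limits = {"pct_compromised": 5.0, "days_owned": 1.0,
          "days_to_notice": 2.0, "days_to_recover": 7.0}
measured = {"pct_compromised": 10.0, "days_owned": 180.0,
            "days_to_notice": 1.0, "days_to_recover": 7.0}

gaps = unacceptable(measured, limits)
for name, (value, limit) in gaps.items():
    print(f"{name}: measured {value}, acceptable limit {limit}")
```

A value exactly at the limit is treated as acceptable here; whether that is the right reading of an owner's answer is itself a question to ask them.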


Note on arsonists: you should be able to tell that "arsonists" are intruders. Since most companies can't reduce threats directly, IRTs are in exactly the same position as the FD.

Note on prevention: you can extend the fire analogy to other areas. Fire resistance is like the time required for a red team to penetrate a target. Applying fire retardants is like blue teams taking countermeasures upon discovering vulnerabilities.

Finally, with these answers we can make decisions to change the metrics. For example, a firefighter could say "increase my staff by two people per shift, and buy this new fire engine, and I can change the metrics this way." In the digital realm, a security analyst could say "increase my staff by two people per shift, and buy this new sensor grid, and I can change the metrics this way."

You could also try to influence the prevention side by saying "change all antivirus software from vendor A to vendor B, and change all local users from administrators to unprivileged users" and then see if the metrics change.

The manager is now in a position where spending influences metrics, and the failure to spend could result in an unacceptable answer to the question "How many burning homes?"
