Lazy SIEM #2 - Let's make tuning fun (actually)
Learn from my mistakes: tuning doesn't have to be awful
I used to hate tuning.
Bad tuning = whack-a-mole
Every day I’d go to my alerts, open the alert in a new tab, close the false positive, add the IP to a whitelist, and repeat…
100
freaking
times
Sound familiar? Who wouldn’t hate that?
It doesn’t have to be this way. Tuning can be efficient, it can be proactive, and most importantly, it can be fun.
Over the years I’ve learned how.
I’ll give you the steps, but with a disclaimer:
Each of these steps builds on the last, so skipping ahead won’t save you time.
Here we go…
Extract your entities and enrich your data
Tuning an incident without proper entities is like trying to round up magnetic marbles that repel each other.
No matter how hard you try, the marbles scramble because of physics, and your incidents scramble because they’re missing a common entity to tie them together.
Entities are items that can be common across a range of alerts and incidents: users, IPs, domains, URLs… items that you have in columns of your dataset.
Where are all the entities?
Entities are structured data. They come in standard forms like GUIDs, fully qualified domain names, user principal names, and so on.
If your entities are sitting in long strings of text that you search through, you need to do some work to get them out. This is where parsing comes in.
If you want an in-depth parsing breakdown of network messages, check out my parsing article here.
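The parsing article covers this properly for Sentinel; as a quick illustration of the idea, here’s a minimal Python sketch that pulls entities out of a raw log line with regular expressions. The sample message, field names, and patterns are made up, and real parsers need much stricter rules.

```python
import re

# Hypothetical raw firewall message; your log format will differ.
raw = ("Blocked outbound connection from jdoe@contoso.com at 10.1.2.3 "
       "to evil.example.com (https://evil.example.com/payload)")

# Simple, illustrative patterns -- production parsers need stricter rules.
ENTITY_PATTERNS = {
    "upn":  r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "url":  r"https?://[^\s)]+",
}

def extract_entities(message: str) -> dict:
    """Return every entity of each type found in a raw log line."""
    return {name: re.findall(pattern, message) for name, pattern in ENTITY_PATTERNS.items()}

print(extract_entities(raw))
# {'upn': ['jdoe@contoso.com'], 'ipv4': ['10.1.2.3'], 'url': ['https://evil.example.com/payload']}
```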
Now that you have entities parsed, you can associate them with alert creation rules… if you have the entities you need.
The right entity for the job
When I was stuck in the office laboring over tuning, I had all the IP entities I could dream of. It was great… except that I didn’t care about the IPs.
To tune those alerts, I actually needed the domain name.
What did I do? I hacked together a script that enriched the incidents with domain names.
That way, instead of looking up and adding hundreds of IP ranges daily, I could add a domain and save hours.
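The script itself isn’t in this post, but the idea is simple enough to sketch. Here’s a rough Python version that assumes a reverse DNS lookup is good enough; in practice you’d likely call a passive DNS or threat intel API instead, and the incident structure shown is hypothetical.

```python
import socket

def enrich_with_domain(ip: str) -> str | None:
    """Best-effort reverse DNS lookup for an IP entity."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        return None

# Hypothetical incident with IP entities already extracted.
incident = {"id": 1234, "entities": {"ipv4": ["8.8.8.8", "10.0.0.5"]}}

incident["entities"]["domain"] = [
    d for d in (enrich_with_domain(ip) for ip in incident["entities"]["ipv4"]) if d
]
print(incident["entities"]["domain"])  # e.g. ['dns.google'] -- internal IPs often won't resolve
```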
I looked at the automation and realized: I have entities, so what if I take this to the next level?
Correlate data and prioritize automatically
If you could focus on one incident daily, which would you choose? The one with the highest risk and impact, right?
So why don’t we do that all the time?
You might be thinking, “I already do that: I start with high severity, then go to medium…”
That’s great, but it’s just the beginning.
The need for context
Not all alerts bubbling up from the same rule are the same. If they were, we wouldn’t even need to look at them!
Ok, that’s obvious, but why do we usually only use the alert rule to determine severity?
What if, instead of the rule, we looked at all the context? Suspicious behavior by user entities, IPs showing up in watchlists, that email entity being seen in 100 different alerts today.
These are all context signals that we can use. How? Two parts.
The first part is extracting the entities… and you already have that down!
Part 2 is where the magic happens.
Automation
Part 2 is automated risk scoring.
You define the modules (user risk, IOC lookup, similar alerts), and let automation run them on every entity on every alert.
The modules output a risk score, and this becomes another data point you use (sometimes the only one) in your tuning decision.
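Here’s a minimal sketch of that module pattern in Python. The module names, thresholds, and lookups are all made-up stand-ins; a real version would query your identity provider, TI platform, and SIEM.

```python
# Hypothetical module pattern: each module scores one entity, totals roll up per alert.
WATCHLIST = {"198.51.100.7", "evil.example.com"}

def user_risk(entity: str) -> int:
    # Placeholder: in practice, query your identity provider's risk API.
    return 40 if entity.startswith("svc-") and entity.endswith("@contoso.com") else 0

def ioc_lookup(entity: str) -> int:
    # Flag entities that appear in a watchlist.
    return 50 if entity in WATCHLIST else 0

def similar_alerts(entity: str, recent_entity_counts: dict) -> int:
    # Entities seen across many alerts today score higher (capped at 30).
    return min(recent_entity_counts.get(entity, 0), 30)

def score_alert(entities: list[str], recent_entity_counts: dict) -> int:
    modules = (user_risk, ioc_lookup, lambda e: similar_alerts(e, recent_entity_counts))
    return sum(module(e) for e in entities for module in modules)

alert_entities = ["svc-backup@contoso.com", "198.51.100.7"]
print(score_alert(alert_entities, {"198.51.100.7": 12}))  # 40 + 50 + 12 = 102
```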
This is SOAR in action.
Want to see a real world example?
Check out STAT from Brian Delaney. He’s built a full SOAR solution using Sentinel and automation, and he’s made it available on GitHub, completely free!
The possibilities are endless with SOAR, but how do we know it’s actually helpful?
Use metrics and reports for the bigger picture
That new automation could actually be slowing your team down. You won’t know unless you’re tracking metrics.
Which metrics to track
I won’t dive deep into SOC metrics here, but I’ll give you some ideas.
Here’s a basic set of metrics you should be tracking:
Mean time to response, mean time to acknowledge, false positive and true positive rates, # of incidents by severity, and # of incidents by tool.
Here’s an example Sentinel workbook that helps you track all this: https://learn.microsoft.com/en-us/azure/sentinel/manage-soc-with-incident-metrics#security-operations-efficiency-workbook
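The workbook is the easy route. If you’d rather sanity-check a couple of these numbers yourself, here’s a rough Python sketch over an exported list of incidents (the field names, timestamps, and verdicts are hypothetical).

```python
from datetime import datetime
from statistics import mean

# Hypothetical export of closed incidents.
incidents = [
    {"created": "2024-05-01T08:00", "acknowledged": "2024-05-01T08:20",
     "closed": "2024-05-01T10:00", "verdict": "true_positive"},
    {"created": "2024-05-01T09:00", "acknowledged": "2024-05-01T09:10",
     "closed": "2024-05-01T09:40", "verdict": "false_positive"},
]

def minutes_between(start: str, end: str) -> float:
    fmt = "%Y-%m-%dT%H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 60

mtta = mean(minutes_between(i["created"], i["acknowledged"]) for i in incidents)
mttr = mean(minutes_between(i["created"], i["closed"]) for i in incidents)  # time to close as a proxy
fp_rate = sum(i["verdict"] == "false_positive" for i in incidents) / len(incidents)

print(f"MTTA: {mtta:.0f} min, MTTR: {mttr:.0f} min, FP rate: {fp_rate:.0%}")
# MTTA: 15 min, MTTR: 80 min, FP rate: 50%
```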
Having metrics isn’t enough. The way you use metrics is more important than how many you collect.
Don’t abuse metrics
Using metrics to punish teammates is counterproductive.
Instead, use metrics like this:
Set a hypothesis with quantifiable success criteria when possible
For example: this automation rule will reduce the mean time to response while not reducing the # of true positive incidents (a quick check of exactly this is sketched after the list)
Use metrics to prove or disprove the hypothesis
And iterate
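To make that loop concrete, here’s a tiny sketch of checking the example hypothesis above, with made-up before/after numbers and a 5% tolerance I chose arbitrarily.

```python
# Made-up metrics for the two weeks before and after enabling an automation rule.
before = {"mttr_minutes": 95.0, "true_positives": 41}
after  = {"mttr_minutes": 62.0, "true_positives": 40}

# Hypothesis: MTTR drops, and true positives don't drop meaningfully (here: by more than 5%).
mttr_improved = after["mttr_minutes"] < before["mttr_minutes"]
tp_held = after["true_positives"] >= before["true_positives"] * 0.95

print("Keep the automation" if mttr_improved and tp_held else "Roll back and iterate")
```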
But what if we could iterate automatically?
Advanced tuning: full automation
Labeled incidents are the most valuable dataset you have.
Why?
Unlabeled data is abundant; labeled data…? Not so much.
Most classification models (a malicious log classifier or a tuning model for example) are trained on labeled data.
Your incidents? They’re expert labeled data! Analysts are spending days labeling all those incidents.
So don’t throw away all that hard work.
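As a toy illustration of reusing those verdicts, here’s a sketch that trains a simple true-positive classifier with scikit-learn. The features, export format, and tiny dataset are all hypothetical, and a real model would need far more data and validation.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical features per incident: [risk_score, entity_count, seen_on_watchlist]
X = [
    [102, 3, 1],
    [15, 1, 0],
    [88, 4, 1],
    [10, 2, 0],
]
# Analyst verdicts: 1 = true positive, 0 = false positive (the "expert labels").
y = [1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Predict the probability that a brand-new incident is a true positive.
new_incident = [[95, 2, 1]]
print(f"P(true positive) = {model.predict_proba(new_incident)[0][1]:.2f}")
```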
And maybe one day, all that data will be used to train a model that does all your tuning for you…
When that day comes, you will look back and miss all the hours you spent tuning.
Especially now that you have what you need to make tuning work for you.
Happy tuning!
Don’t want to miss the article on advanced data tuning? Subscribe with the link below:
Enjoyed the article (even a little bit)? Follow me on LinkedIn to hear more of my rants: https://www.linkedin.com/in/nouraie/