Deep Dive into Microsoft Sentinel Summary Rules
Neurology, technology, and Sentinel Summary rules. Explaining the intuition and process of building summary rules from scratch.
We’re going to dive deep into Microsoft Sentinel summary rules.
But first, let’s try to find out why they’re so important.
To do that, we’ll take a small detour into the human brain.
Neurology inspires technology
…and not just by making infrastructure so complex that it takes centuries to understand.
The best example of this is artificial neural networks, originally modeled after real neural networks in the brain.
And yes, those are the same artificial neural networks that are the backbone of today’s AI wave.
So it’s only natural we look to the brain for more inspiration. Let’s start by looking at memory.
Here’s an interesting idea about memory:
Our brain doesn’t store memory as a continuous stream of film.
Instead, scientists found evidence that supports the “event segmentation” theory.
The theory says that our brain stores memory as distinct moments grouped into events. They even found “boundary cells” that activate when our memory switches from one event to the next.
Figure: Event Segmentation Theory of Memory
They theorize that memory works like photos on a computer. Photos taken in the same time and place are grouped together and a key photo is used to represent the group.
When we want to expand on an event, we pick the “key photo” from the group. Then we find photos similar to that key photo.
Ok, that’s kinda cool.. but so what? How do we translate that to SIEM design?
Well.. what if we implemented event segmentation for a SIEM?
Hear me out.
What if we had “key records” that each represented a bundle of events, and when we wanted to investigate more, we pivoted from the key record to the logs behind it?
That way we:
- Keep a cleaner SIEM
- Triage incidents much quicker
- Make it easier to surface anomalies
- Save on storage costs by tiering key events
Here’s how: summary rules.
What are Microsoft Sentinel Summary Rules?
Summary rules aggregate a group of logs into a smaller set of records.
Specifically, we’ll look at Microsoft Sentinel summary rules to give concrete examples of building summaries.
Here’s how they work:
1. Choose a source table
2. Craft a KQL query (the search that does the aggregating)
3. Choose the destination table
4. Set the frequency that the rule runs
Figure: Creating a Summary Rule in Microsoft Sentinel
It’s that simple.. or is it?
Creating the rule is the easy part. The key is choosing what goes inside.
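To make those four steps concrete, here’s a rough sketch of the kind of query a summary rule could run. I’m assuming SigninLogs as the source table, and the output column names (FirstSeen, LastSeen, SigninCount) are made up for illustration. There’s no time filter because I’m assuming the rule scopes each run to its own time window, so check how your rule is configured before relying on that.

```kql
// Sketch only: one summary row per user, IP, app, and result code per run.
// Assumption: SigninLogs is the source table.
// Assumption: FirstSeen, LastSeen, and SigninCount are illustrative column names.
SigninLogs
| summarize
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    SigninCount = count()
    by UserPrincipalName, IPAddress, AppDisplayName, ResultType
```

The destination would then be a custom table (a hypothetical SigninSummary_CL, for example) and the frequency something like hourly. Both of those are settings on the rule, not part of the KQL.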
How to build a Microsoft Sentinel summary rule
I think of summary rules as a form of compression.
Here’s why:
We want to make data smaller but keep security value.
But what is security value? How do we measure it? How do we maximize it?
Answering those questions leads us to my favorite method of designing summaries.
Always start with use cases, then work back to the data you need.
Use cases can be:
- Hunting for anomalies in sign-in activity
- Analytic rules that detect threats in network logs
- Workbooks that report on trends over time in web app access
The use case will give you a clear outcome to work towards.
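Here’s a sketch of what “working back” can look like for the sign-in hunting use case. Write the hunting query first, then see which columns it needs. I’m assuming a hypothetical summary table called SigninSummary_CL with UserPrincipalName, IPAddress, and SigninCount columns. None of those names come from a real schema, and the 14-day and 1-day windows are arbitrary.

```kql
// Sketch only: find user/IP pairs seen in the last day but not in the prior baseline.
// Assumption: SigninSummary_CL is a hypothetical summary table.
// Assumption: it has UserPrincipalName, IPAddress, and SigninCount columns.
let Lookback = 14d;
let Recent = 1d;
let KnownPairs =
    SigninSummary_CL
    | where TimeGenerated between (ago(Lookback) .. ago(Recent))
    | distinct UserPrincipalName, IPAddress;
SigninSummary_CL
| where TimeGenerated > ago(Recent)
// Keep only pairs that have no match in the baseline
| join kind=leftanti KnownPairs on UserPrincipalName, IPAddress
| summarize SigninCount = sum(SigninCount) by UserPrincipalName, IPAddress
```

The query only touches three columns plus TimeGenerated. That tells you the minimum the summary has to keep for this use case. Anything beyond that should earn its place through another use case.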
Here are a few other lessons I’ve learned building summaries for over a petabyte of logs…
Microsoft Sentinel summary rules best practices
Not future-proof, just future-resistant
When creating a summary rule, you select the destination table: either one that already exists or a new one.
Your table will have a schema. The schema defines the columns and their data types.
Figure: Sample table columns in Microsoft Sentinel
When writing your query or creating your table, consider whether your schema will hold up against the test of time.
Here are some questions to ask:
- Could we add additional columns that will cover a whole different set of use cases without creating a whole new summary table?
- Could we add additional columns that provide immediate hunting value, will save analysts time, and don’t incur too much storage (maybe by sampling some data)?
- What if more data sources are added that need to be included in your summary? (Hint: writing summaries on top of ASIM could be an easy solution here)
Answering these questions is the core of making summaries, and it comes down to balancing two things:
Cutting storage while keeping your security use cases covered.
We talk about this tradeoff below in The future of summaries.
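As a sketch of what those schema questions can turn into, here’s a network summary with one extra column for a different use case and two sampled columns for hunting context. I’m assuming the ASIM network session parser and its usual field names (DstBytes, DstPortNumber, SrcIpAddr). The output column names and the sample sizes of 10 and 5 are arbitrary choices, not recommendations.

```kql
// Sketch only: a destination-IP summary with a few "future-resistant" extras.
// Assumption: the ASIM fields used here are populated by your sources.
_ASim_NetworkSession()
| summarize
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    EventCount = sum(EventCount),
    // Extra column that opens up a different use case (volume reporting)
    TotalDstBytes = sum(DstBytes),
    // Bounded samples: some hunting context without storing every value
    SampleDstPorts = make_set(DstPortNumber, 10),
    SampleSrcIps = make_set(SrcIpAddr, 5)
    by DstIpAddr, NetworkDirection, DvcAction
```

Capping make_set keeps each row’s size predictable. The price is that the samples are only a hint of what was seen, not a complete list.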
It’s not just about the output
You’re probably not the only person that will see your code.
Other team members will need to understand, explain, and modify your KQL, so don’t make their lives harder.
Which one of these do you prefer to look at?
_ASim_NetworkSession() | summarize min = min(TimeGenerated), max = max(TimeGenerated), sum(EventCount) by IpAddr = DstIpAddr, NetworkDirection, DvcAction | extend type = "Dst"|
union (_ASim_NetworkSession() | summarize min = min(TimeGenerated), max = max(TimeGenerated), sum(EventCount) by IpAddr = SrcIpAddr, NetworkDirection, DvcAction | extend type = "Src")
Or:
let DestinationSummary = _ASim_NetworkSession()
    | summarize
        FirstSeen = min(TimeGenerated),
        LastSeen = max(TimeGenerated),
        // Cannot use count, each row != one event
        EventCount = sum(EventCount)
        by IpAddr = DstIpAddr, NetworkDirection, DvcAction
    // Need to identify if IP is src or dest
    | extend type = "Dst";
let SourceSummary = _ASim_NetworkSession()
    | summarize
        FirstSeen = min(TimeGenerated),
        LastSeen = max(TimeGenerated),
        EventCount = sum(EventCount)
        by IpAddr = SrcIpAddr, NetworkDirection, DvcAction
    | extend type = "Src";
SourceSummary
| union DestinationSummary
You probably said the second one…
Even though they achieve the same outcome, the second query is easier to read and understand.
Here’s why:
- Uses short but descriptive variable and column names
- Adds comments to explain “the why” behind the code
- Is well formatted (hint: use Alt + Shift + F in Sentinel to auto-format)
If you’re working as part of a team, push code that’s hard to write but easy to read, not the opposite.
Summaries don’t have to be the end
Just because you’re using summaries doesn’t mean you’re stuck with summary data.
Here’s an example:
1. You have logs coming into the auxiliary tier (think of this kinda like a cold tier) in Sentinel
2. You run summaries on these logs and store the summaries in the analytics tier (think of this as the hot tier)
3. An alert triggers on summary data
4. You start to investigate, but realize the summary data isn’t enough. The data you need lives in the auxiliary tier…
What can you do?
Well, if it’s your first time encountering this, you’ll probably go to the original table and search for those logs.
If you want to avoid that, here’s what you do:
Create automation that “rehydrates” the data as-needed.
That way, an incident that bubbles up from summary data is automatically enriched with more granular context!
Figure: Logic App with Sentinel Incident Trigger
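The query that such an automation runs could look something like the sketch below. Everything specific here is a placeholder: NetworkLogs_CL is a hypothetical auxiliary-tier table, the IP is a documentation address, and the time window is made up. In a real playbook those values would come from the incident’s entities and timestamps. Lower-cost tiers can also restrict which KQL operators are allowed, so verify that this shape works against your table.

```kql
// Sketch only: pull the granular rows behind a summary-based incident.
// Assumption: NetworkLogs_CL is a hypothetical auxiliary-tier table.
// Assumption: the IP and time window are placeholders supplied by the automation.
NetworkLogs_CL
| where TimeGenerated between (datetime(2025-01-01T00:00:00Z) .. datetime(2025-01-01T01:00:00Z))
| where SrcIpAddr == "203.0.113.10" or DstIpAddr == "203.0.113.10"
| project TimeGenerated, SrcIpAddr, DstIpAddr, DstPortNumber, DvcAction
```

Keeping the window tight around the alert is the main design choice. It limits how much of the cheaper tier gets scanned for each incident.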
Now that you’re becoming an expert on summary design, let’s get a bit more advanced.
The future of summaries
Let’s take the analogy between summaries and compression to the next level.
Compression has a north star: lossless compression.
In compression, an algorithm is lossless if the original data can be reconstructed exactly from the compressed output, with no information lost.
Summaries discard rows by design, so for them the information that matters is the set of use cases a table still covers (maybe measured by something like the number of TTPs covered).
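The storage side of that tradeoff is easier to put a number on than the use-case side. Here’s a rough sketch that compares a raw table against its summary over the last day. I’m assuming SigninLogs as the source and a hypothetical SigninSummary_CL as the destination, and I’m using the standard _BilledSize column as a stand-in for storage.

```kql
// Sketch only: compare row count and billed volume of a source table and its summary.
// Assumption: SigninSummary_CL is a hypothetical summary table built from SigninLogs.
union withsource=SourceTable SigninLogs, SigninSummary_CL
| where TimeGenerated > ago(1d)
| summarize Rows = count(), BilledBytes = sum(_BilledSize) by SourceTable
```

Dividing the two BilledBytes values gives a crude compression ratio. It says nothing about which use cases survived, so it’s only half of the picture.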
What if we could get to lossless compression for summaries?
Figure: Summary Rules Efficiency Curves
Maybe, one day we could.
And the answer might lie in neural networks themselves.
Wait, neural networks, for compression?
Yes, and here’s why:
One of the interpretations of neural networks is that they condense a lot of data into a model. Then we can use the model to get back information about the initial data.
That sounds almost like… compression!
Maybe the brain has more secrets than we expected…
This won’t be the last time I’ll talk about this. Subscribe with the link below so you don’t miss my article on advanced summary techniques.
Microsoft Sentinel Summary Rules Cheat Sheet
Want a quick reference for the material in this article?
Check out the cheat sheet below, it’s all yours!
Figure: Microsoft Sentinel Summary Rules Cheat Sheet
I’m writing an in-depth post about summary logic and techniques.
A post where I give you sample queries with in-depth KQL explanations.
Don’t want to miss it? Subscribe with the link below.
Enjoyed the article (even a little bit)? Follow me on LinkedIn to hear more of my rants: https://www.linkedin.com/in/nouraie/
Disclaimer: written by a Microsoft employee. All opinions are my own.