
Microsoft Sentinel Summary KQL deep dive (From Beginner to Advanced KQL)

How to write KQL Sentinel summary rules, with 3 real-world examples.

Microsoft Sentinel Summaries Fundamentals

Before we start with summaries, we need to remember two very important things.

First, the rule for rule building: use cases first.

At every step, use cases drive our rules.

Second: most of the time, a better rule = more detections with better data.

That’s the efficiency curve:

Summary Rules Efficiency Curves

Microsoft Sentinel Summaries Development Process

So where do we get started? Here’s exactly what I did to come up with these rules:

  • Picked a pack from the content hub

  • Looked at the logic in the rules

  • Determined the data I need

  • Wrote the summary

Now, which rules and which packs to deploy? That is a great question…

The answer is all about detection engineering.

Want to read more about that? I will dedicate a whole article to detection engineering in Sentinel. Subscribe so you don’t miss it.

Note: throughout this article we key our summaries on IPs. One limitation of that choice: IPs change, and we usually want more persistent indicators.

Using ASIM for Microsoft Sentinel Summaries

We want to use ASIM whenever possible; it makes things cleaner and easier to modify.

Now a question: should we build the summary rule on top of an ASIM table, or include the summary table in an ASIM parser?

You can do both!

Just be careful… because you might cause a loop. Here’s how:

Summary on ASIM > ASIM on the summary > Summarize the summary!

That can be avoided.

Every row has a column called Type that tells us the table name. Use a naming convention for your summary tables… then filter out rows where Type matches your summary prefix.
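A minimal sketch of that filter, assuming a hypothetical Summary_* naming convention for your summary tables (the prefix is an assumption, not a Sentinel default):

// In the summary rule: take the ASIM parser's output, but drop
// any rows that came from our own summary tables to avoid the loop
_ASim_NetworkSession()
| where Type !startswith "Summary_"
| summarize EventCount = sum(EventCount) by SrcIpAddr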

Now time to dive in. Let’s start at the network.

Network IOC Detection with Microsoft Sentinel Summaries

Let’s start with a straightforward use case:

IOC detection in network logs

Our first instinct might be to write a rule like this:

_ASim_NetworkSession()
| summarize by SrcIpAddr
| union (_ASim_NetworkSession() | summarize by DstIpAddr)

That’ll show us all the source and destination IPs we’ve seen.

It’s good… but not good enough. That rule would show us what IPs connected, but what if we want to see just a bit more?

You hear me whispering… “Remember the efficiency curves!”

So, let’s break down a wish-list of what we want:

When was the IP first and last seen?

We can get that with these two lines:

FirstSeen = min(TimeGenerated)
LastSeen = max(TimeGenerated)

How many times the IP reached out?

We can’t just count the rows here: a single row can represent multiple events.

So we sum the EventCount like this:

EventCount = sum(EventCount)

Let’s say we wanted to also detect attacks like port scans, data exfiltration, and DDoS

We would need a few things:

How many bytes were coming to and from this IP?

SrcByte = sum(SrcBytes)
DstByte = sum(DstBytes)

How many distinct IPs talked to this IP?

When our IP is the destination, we count unique sources (and when it’s the source, unique destinations):

UniqueSources = dcount(SrcIpAddr)
UniqueDestinations = dcount(DstIpAddr)

And finally…

How many unique source and dest ports did we spot from this IP?

SrcPorts = dcount(SrcPortNumber)
DstPorts = dcount(DstPortNumber)

To put it all together we take our summaries for IPs when the IP is the source, and union it with our summaries for when the IP is the destination.

We want to keep separate records for each firewall action (block, allow…) and each network direction: those are really important for differentiating attacks.

And there we have it! We’ve gotten at least 4 analytics rules out of a single summary.

Here’s the full rule:

let DestinationSummary = _ASim_NetworkSession()
    | summarize
        FirstSeen = min(TimeGenerated),
        LastSeen = max(TimeGenerated),
        UniqueSources = dcount(SrcIpAddr),
        SrcPorts = dcount(SrcPortNumber),
        DstPorts = dcount(DstPortNumber),
        SrcByte = sum(SrcBytes),
        DstByte = sum(DstBytes),
        // Cannot use count, each row != one event
        EventCount = sum(EventCount)
        by IpAddr = DstIpAddr, NetworkDirection, DvcAction 
    // Need to identify if IP is src or dest
    | extend type = "Dst";
let SourceSummary = _ASim_NetworkSession()
    | summarize
        FirstSeen = min(TimeGenerated),
        LastSeen = max(TimeGenerated),
        UniqueDestinations = dcount(DstIpAddr),
        SrcPorts = dcount(SrcPortNumber),
        DstPorts = dcount(DstPortNumber),
        SrcByte = sum(SrcBytes),
        DstByte = sum(DstBytes),
        EventCount = sum(EventCount)
        by IpAddr = SrcIpAddr, NetworkDirection, DvcAction
    | extend type = "Src";
SourceSummary
| union DestinationSummary
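To close the loop on the IOC use case, an analytics rule can then match the summarized IPs against threat intelligence. A sketch, assuming the summary rule writes to a custom table named Summary_NetworkSession_CL (the destination table name is an assumption):

// Match summarized IPs against TI indicators
ThreatIntelligenceIndicator
| where isnotempty(NetworkIP)
| join kind=inner (
    // Summary_NetworkSession_CL is a hypothetical destination table
    Summary_NetworkSession_CL
    | summarize arg_max(LastSeen, *) by IpAddr
) on $left.NetworkIP == $right.IpAddr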

Onto identity.

Password Spray Detection with Microsoft Sentinel Summaries

Let’s use 2 detections here: password sprays and brute-force attempts. What do we need to detect those?

  • Novelty of country

  • Time of sign in

  • Novelty of browser ID

  • Users signed in from source

  • Sign in attempts from users

Quick tangent:

A detected auth attack is better than a missed one, but still nightmare fuel.

What’s even better? A blocked one.

So while we talk about detection, I want to make sure you don’t miss the low hanging fruit:

Enable MFA, strong auth, PIM, conditional access, and RBA policies…

That’s my rant, now back to detection.

Now that you’ve already built your first rule, you can use a lot of the functions you’re already familiar with to get most of what we need:

FirstSeen = min(EventStartTime)
LastSeen = max(EventEndTime)
TargetUsers = dcount(TargetUserId)
TargetApps = dcount(TargetAppId)
UserAgents = dcount(HttpUserAgent)

But we need to add a few more.

First, we want a set of regions and user agents. Let’s go with 5 of each… that should be good enough to give us a good idea.

We use coalesce, which gives us the region whenever it exists and otherwise falls back to the country:

extend Region = coalesce(SrcGeoRegion, SrcGeoCountry)

We do a similar thing with source, again just in case one of the fields is missing:

Src = coalesce(Src, SrcIpAddr)

Now we can use the make_set function like this:

(We use variables at the top to make it easier to adjust how many items we want in each set.)

let UserAgentSample = 5;
let GeoSample = 5;
...
| summarize
    UserAgentSet = make_set(HttpUserAgent, UserAgentSample),
    GeoSet = make_set(Region, GeoSample)
    by ...

In this case, every row of the summary carries up to 5 user agents and up to 5 regions that the attacker used.

Now… let’s turn the detection up a notch using our knowledge of attackers

Password sprays can be programmatic, and that leaves behind patterns. We can find those patterns in auth activity, especially when it comes to timing.

If we look hard enough, we can see the time difference (delta) between each authentication attempt, and use that as a signal.

Human attempts will have a wide range of deltas… very randomly spread.

Automated attacks? Those are much less random

So, let’s stash some statistics in our summaries. Statistics to help us find the patterns.

First, we need to get the time deltas using the functions datetime_diff and next. Here’s how:

datetime_diff('day', next(timestamp), timestamp)

datetime_diff gives us the difference between two datetime values in our selected unit (in this case days). Note the argument order: it returns the first datetime minus the second.

next() gives us the value of that column for the next row.

Here’s a demo in action:

(note that we need to sort before we can use the window function)
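Since the original demo was a screenshot, here’s a small runnable stand-in (the timestamps are made up):

// Stand-in demo with hypothetical timestamps: compute the delta
// between each row and the next one
datatable(timestamp: datetime) [
    datetime(2024-01-01), datetime(2024-01-02), datetime(2024-01-05)
]
| sort by timestamp asc
| extend delta = datetime_diff('day', next(timestamp), timestamp)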

Look at that! A nice surprise: when the next value is empty, so is our delta, so it’s ignored in most of our aggregation functions!

Next we need to gather the statistics… here are some useful metrics

  • Standard deviation

  • Percentiles

  • Min

  • Max

  • Variance

(you probably won’t need all of these, so choose wisely - hint: you can put them in a JSON column, which you can change later without changing the schema)
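For instance, here’s a sketch of stashing the delta statistics in a single dynamic (JSON) column with bag_pack; the column and stat names are just illustrative:

// Compute the stats, then pack them into one dynamic column so the
// summary table schema stays stable if you change metrics later
_ASim_Authentication()
| sort by Src asc, EventStartTime asc
| extend TimeDelta = datetime_diff('second', next(EventStartTime), EventEndTime)
| summarize
    DeltaStdev = stdev(TimeDelta),
    DeltaMin = min(TimeDelta),
    DeltaMax = max(TimeDelta)
    by Src
| extend DeltaStats = bag_pack("stdev", DeltaStdev, "min", DeltaMin, "max", DeltaMax)
| project-away DeltaStdev, DeltaMin, DeltaMax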

Want to see how we will use these? Subscribe with the form below so you don’t miss my article on advanced KQL detection techniques.

And here it is, the full rule!

let LowerPercentile = 5;
let MiddlePercentile = 50;
let HigherPercentile = 95;
let UserAgentSample = 5;
let GeoSample = 5;
_ASim_Authentication()
| extend Src = coalesce(Src, SrcIpAddr)
// Sort per source so next() compares consecutive attempts
// from the same Src, not across different sources
| sort by Src asc, TimeGenerated asc
| extend
    EventDuration = datetime_diff('millisecond', EventEndTime, EventStartTime),
    TimeDelta = iif(next(Src) == Src, datetime_diff('second', next(EventStartTime), EventEndTime), long(null))
| extend Region = coalesce(SrcGeoRegion, SrcGeoCountry)
| summarize
    FirstSeen = min(EventStartTime),
    LastSeen = max(EventEndTime),
    TargetUsers = dcount(TargetUserId),
    TargetApps = dcount(TargetAppId),
    UserAgents = dcount(HttpUserAgent),
    UserAgentSet = make_set(HttpUserAgent, UserAgentSample),
    Geos = dcount(Region),
    GeoSet = make_set(Region, GeoSample),
    DeltaStdev = stdev(TimeDelta),
    DeltaPercentiles = percentiles_array(TimeDelta, LowerPercentile, MiddlePercentile, HigherPercentile),
    DeltaVariance = variance(TimeDelta),
    DeltaAvg = avg(TimeDelta),
    DeltaMin = min(TimeDelta),
    DeltaMax = max(TimeDelta)
    by Src, EventResultDetails, EventSubType

Web Crawler Detection with Microsoft Sentinel Summaries

Now that you’re stocked with all the summary fundamentals, let’s put it together.

We want to catch web crawlers. What are the patterns?

A single source going to multiple destinations, many of those destinations not valid, and a low average time per destination.

So here’s what we need:

  • # URLs seen by src

  • Counts of status

  • Time spent per URL

You already have everything you need from before!

let LowerPercentile = 5;
let MiddlePercentile = 50;
let HigherPercentile = 95;
_ASim_WebSession()
| extend EventDuration = datetime_diff('millisecond', EventEndTime, EventStartTime)
| summarize
    FirstSeen = min(EventStartTime),
    LastSeen = max(EventEndTime),
    UrlsVisited = dcount(Url),
    EventCount = sum(EventCount),
    AvgDuration = avgif(EventDuration, EventDuration != 0),
    PercentilesDuration = percentiles_array(EventDuration, LowerPercentile, MiddlePercentile, HigherPercentile)
    by SrcIpAddr, EventResultDetails, DvcAction

But let’s not stop there.

Let’s turn the heat up on the attackers again… cuz why not?

A lot of the time, web crawlers will walk through paths.

Something like this:

  • https://…/test

  • https://…/test1

The crawler leaves behind its scent, so let’s sniff it.

Those paths have common parts in their URLs. We need to detect that commonality.

The next url will be like the last, and this pattern continues.

So we need to use the next function again. This time mixed with two new helpers:

Similarity = jaccard_index(to_utf8(next(Url)), to_utf8(Url))

to_utf8 converts our string into an array of UTF-8 characters. We need that to use jaccard_index, which gives us the similarity between two arrays.

Here’s that logic in action:
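Since the original was a screenshot, here’s a runnable stand-in with hypothetical crawler-style URLs:

// Stand-in demo with made-up URLs: consecutive crawler paths
// score a high similarity
datatable(Url: string) [
    "https://example.com/test",
    "https://example.com/test1",
    "https://example.com/test2"
]
| sort by Url asc
| extend Similarity = jaccard_index(to_utf8(next(Url)), to_utf8(Url))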

One more thing…

What if we’re at the last row? For strings, next() gives us an empty string rather than a null, so we don’t get a free null value here; we have to set it ourselves:

Similarity = iif(isnotempty(next(Url)), Similarity, real(null))

Now that we have our similarity, we can gather statistics on it!

Just like we did for duration before.

This time, let’s just grab the min, max, and mean.

And there we have it, the full summary:

let LowerPercentile = 5;
let MiddlePercentile = 50;
let HigherPercentile = 95;
_ASim_WebSession()
// Sort per source so next() compares consecutive requests
// from the same SrcIpAddr
| sort by SrcIpAddr asc, EventStartTime asc
| extend
    EventDuration = datetime_diff('millisecond', EventEndTime, EventStartTime),
    Similarity = iif(next(SrcIpAddr) == SrcIpAddr and isnotempty(next(Url)), jaccard_index(to_utf8(next(Url)), to_utf8(Url)), real(null))
| summarize
    FirstSeen = min(EventStartTime),
    LastSeen = max(EventEndTime),
    UrlsVisited = dcount(Url),
    EventCount = sum(EventCount),
    AvgDuration = avgif(EventDuration, EventDuration != 0),
    PercentilesDuration = percentiles_array(EventDuration, LowerPercentile, MiddlePercentile, HigherPercentile),
    SimilarityMin = min(Similarity),
    SimilarityMax = max(Similarity),
    SimilarityAvg = avg(Similarity)
    by SrcIpAddr, EventResultDetails, DvcAction

Congratulations! You now have the tools to build advanced Sentinel summary rules. What will you build next?

