Legacy Log Forwarding: The Parasite that Almost Killed my SOC
How I beat legacy data pipelines to save a SOC.
An interviewer once asked me how I would build a SOC from scratch. The answer was obvious: set up a SIEM! What logs would I collect? Well, what a silly question.. the security ones!
And that’s how I built SOCs. Set up threat protection, endpoint security, and a SIEM. The SIEM cost too much? No problem, let’s start dropping logs based on security value.
This was great, until it stopped working.
One day I found myself staring 3 TB of daily logs in the face. I began by asking the customer a question that had become rhetorical by now: do you really need all these logs?
Yes, they replied.
I was surprised, and somewhat awed. I guess it was time to spin up a few extra log forwarders..
Time to get to work. I built out the traditional architecture, discussed how many forwarders they needed, and was immediately rejected when I brought up the price.
Did I lose the deal? This couldn’t be it..
No matter the angle, a SIEM-only approach was not going to work for this customer.
I needed help, and I found answers in a new group: data engineers. After some conversation, I was guided to an alternative to log forwarding and SIEMs: ETL.
Instead of just building log forwarders to SIEMs, I had to build a data pipeline with ETL (Extract, Transform, Load).
What’s the difference?
Smarter Routing
Not everything needs to go to the same destination. Some data is searched daily; some is only stored for compliance and the rare search.
ETL lets me send that rarely searched data to much cheaper storage, using rules that decide where each log goes.
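A minimal sketch of what rule-based routing can look like, not the pipeline I actually built. The destination names (`siem`, `data_lake`), the event fields (`source`, `severity`), and the rules themselves are all made up for illustration; a real pipeline would define these from its own log schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical destination names, for illustration only.
SIEM = "siem"
DATA_LAKE = "data_lake"


@dataclass(frozen=True)
class Rule:
    """Send an event to `destination` when `matches(event)` is true."""
    name: str
    matches: Callable[[dict], bool]
    destination: str


# Rules are checked in order; the first match wins.
# The field names ("source", "severity") are assumed, not a real schema.
RULES = [
    Rule("high-severity alerts", lambda e: e.get("severity", 0) >= 7, SIEM),
    Rule("endpoint detections", lambda e: e.get("source") == "edr", SIEM),
]


def route(event: dict, rules: list[Rule] = RULES, default: str = DATA_LAKE) -> str:
    """Return the destination for one event, defaulting to cheap storage."""
    for rule in rules:
        if rule.matches(event):
            return rule.destination
    return default


def route_batch(events: list[dict]) -> dict[str, list[dict]]:
    """Group a batch of events by destination."""
    batches: dict[str, list[dict]] = {SIEM: [], DATA_LAKE: []}
    for event in events:
        batches[route(event)].append(event)
    return batches
```

The design choice worth noting is the default: anything no rule claims falls through to cheap storage, so the SIEM only receives what someone deliberately decided belongs there.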
Better Data
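Same data, but smaller and quicker to search, and not just because of compression. Some call it magic, data engineers call it Parquet.
Apache Parquet is a columnar file format: it stores each field's values together, which lets it encode repeated values compactly and lets a query read only the columns it needs.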
Keep an eye out for an article dedicated to Parquet.
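Until then, here is a toy illustration of two of those tricks: dictionary encoding and run-length encoding, both of which Parquet applies to column values. This is plain Python, not Parquet's on-disk format or any Parquet library's API, and the `event_type` column is invented for the example.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with a small integer index into a lookup table."""
    table = []
    index_of = {}
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(table)
            table.append(value)
        indices.append(index_of[value])
    return table, indices


def run_length_encode(indices):
    """Collapse runs of repeated values into [value, run_length] pairs."""
    return [[value, len(list(run))] for value, run in groupby(indices)]


def decode(table, runs):
    """Reverse both encodings to recover the original column."""
    return [table[index] for index, count in runs for _ in range(count)]


# A made-up "event_type" column pulled out of row-oriented log events.
column = ["login", "login", "login", "dns_query", "dns_query", "login"]

table, indices = dictionary_encode(column)
runs = run_length_encode(indices)
```

Log fields such as event types, hostnames, and status codes repeat constantly, which is why storing them column by column gives these encodings so much to work with.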
Cleaner Tools
Analysts don’t need to search all their data all the time.. so why is it always in their face?
Instead, all the noisy data lands in a cheaper storage location (data lake, warehouse..) and the immediately actionable data goes into the SIEM.
Fewer Limits
No need for everyone to go through a single compute pipeline.
By sending data to a data warehouse, users can scale compute based on what they need.
This was not easy to build. Building pipelines is much more work than plugging in a log forwarder.
Months of troubleshooting, verifying log formats, and documenting the pipelines ensued.
But once it was built, it was beautiful to see.
Most importantly, the client was happy. They weren’t just given a SIEM; they were given an enterprise logging solution.
Enjoyed the article (even a little bit)? Follow me on LinkedIn to hear more of my rants: https://www.linkedin.com/in/nouraie/