Introducing Sift: Automated Threat Hunting

TLDR:

GreyNoise is exposing a new internally developed tool, Sift, to the public for the first time. Each day, Sift curates a report of new and interesting traffic observed by GreyNoise sensors, after doing much of the analysis and triage work itself.

Note that it is a new and experimental feature and will probably have some bugs and change without warning. We will soon be integrating direct marker.io feedback capability. For now, please direct all feedback to labs@greynoise.io. We really want to know what you think!

Threat Hunter Pain

There is a lot of traffic bouncing around the internet. Full stop. GreyNoise sees ~2 million HTTP requests (along with tens of millions of events from other protocols) a day. For our on-staff Detection Engineers and your engineers and analysts facing similar loads, analyzing millions of HTTP requests can be extremely tiresome and stressful.

It’s like looking for a needle in a haystack each day. Most of the requests are harmless, but some could be hiding malicious activity. It’s a tedious and time-consuming process of constantly sifting through payloads of data, and the fear of overlooking something dangerous adds a layer of stress. The task is mentally exhausting, and the perpetual strain can make it a painful experience, with the constant awareness that a single mistake could have serious consequences.

Introducing Sift

To help provide a painkiller, we’ve created Sift. Sift is a workflow that attempts to remove the noise of the background traffic and expose new and relevant traffic. Additionally, it describes the interesting traffic, tells you if it might be a threat, and prioritizes what payloads to look at first. Identification, explanation, and triage all in one tool.

To achieve this, we employ several advanced DS/ML/AI techniques, such as:

custom-built LLMs (Large Language Models)
nearest neighbor search and vector databases
unsupervised clustering, prompt engineering
RAG (Retrieval Augmented Generation), and
querying state-of-the-art generative models for additional analysis.

The result is a daily report of what GreyNoise sees across our vast sensor network, distilled down to only the new items and accompanied by built-in analysis. It gives every defender an immediate look into what is really happening on the internet, without relying on the luck of an analyst stumbling upon an attack in log traffic.

Currently, it is limited to HTTP traffic, but that won’t last long. It is an experimental feature on the bleeding edge of what is possible, so please bear with us as errors inevitably occur.

Directing Attention

As said earlier, GreyNoise sees millions of HTTP requests a day. After months of experimentation, we found several techniques to record, clean, dedupe, and convert this data into a numerical format for analysis. Applying this to our significant dataset of internet traffic, we’re able to automatically tell you what is new today vs. what we have seen in the last several weeks. This process effectively makes a noise filter for traffic.

In practice, our process takes ~2 million HTTP events down to ~50 per day that require an analyst to look at. Now, we can actually find the needles in our proverbial haystack scientifically and give our analysts a reasonable workload. This reduction in noise has dramatically increased the number of new Tags we can generate every week.
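To make the noise-filter idea concrete, here is a minimal sketch of one way to flag "new today" payloads. It is not how Sift is implemented; it only illustrates the general pattern of cleaning, deduping, converting requests to vectors, and comparing them against recent history with a nearest-neighbor search. The names normalize, embed, novel_payloads, the hashed character n-gram embedding, and the 0.8 similarity threshold are all assumptions made for illustration.

```python
import math
import re
import zlib

DIM = 256  # hypothetical vector width, chosen only for this sketch


def normalize(payload: str) -> str:
    """Clean a request: trim, lowercase, and mask digit runs so that
    trivially different requests collapse to the same text."""
    return re.sub(r"\d+", "0", payload.strip().lower())


def embed(payload: str, n: int = 3) -> list[float]:
    """Hash character n-grams into a fixed-size, L2-normalized vector.
    A stand-in for whatever learned embedding a real system would use."""
    text = normalize(payload)
    vec = [0.0] * DIM
    for i in range(max(len(text) - n + 1, 1)):
        gram = text[i:i + n]
        # crc32 is deterministic across runs, unlike the built-in hash()
        vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def novel_payloads(history: list[str], today: list[str],
                   threshold: float = 0.8) -> list[str]:
    """Return today's payloads whose nearest neighbor in the historical
    window is less similar than the (assumed) threshold."""
    seen_vectors = [embed(p) for p in {normalize(p) for p in history}]
    seen_today = set()
    results = []
    for payload in today:
        key = normalize(payload)
        if key in seen_today:  # dedupe within today's traffic
            continue
        seen_today.add(key)
        vec = embed(payload)
        # brute-force nearest neighbor; a vector database would replace this
        best = max((cosine(vec, h) for h in seen_vectors), default=0.0)
        if best < threshold:
            results.append(payload)
    return results
```

Given a history window of routine requests, only payloads unlike anything seen before come back from novel_payloads. The brute-force loop is fine for a sketch, but at millions of events per day an approximate nearest-neighbor index would be the more realistic choice.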

Explaining and Sorting

Once we’ve narrowed our focus, we can employ some of the more costly techniques of commercial large language models to help us answer specific questions about the payloads we’re considering. Without giving away all of our techniques, we can generate an analysis of the payload, the CVEs and CPEs potentially associated with it (drawn from data that is more up to date than any language model), a score of how big a threat it might be, what GreyNoise knows about the IPs (tags/riot/etc.), a score of how confident we are, Suricata queries that might detect similar payloads, and the keywords, techniques, and technologies affected.

In short, we’re trying to build an entire analyst report on the fly for only things you should look at. Additionally, we sort the reports, so you look at the most critical threats first.
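As a rough illustration of the sorting step, the sketch below models a report with a few of the fields described above and orders reports so the most critical come first. The Report class, its field names, the score ranges, and the tie-breaking on confidence are all hypothetical; the actual Sift report format and ranking logic are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical shape of a single report entry."""
    summary: str
    threat_score: float          # assumed scale: higher means more dangerous
    confidence: float            # assumed scale: 0.0 to 1.0
    cves: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def prioritize(reports: list[Report]) -> list[Report]:
    """Sort by threat score, breaking ties with confidence, highest first."""
    return sorted(
        reports,
        key=lambda r: (r.threat_score, r.confidence),
        reverse=True,
    )


if __name__ == "__main__":
    queue = prioritize([
        Report("Directory listing probe", threat_score=2.0, confidence=0.9),
        Report("Command injection attempt", threat_score=9.0, confidence=0.6),
        Report("Possible auth bypass", threat_score=9.0, confidence=0.8),
    ])
    for report in queue:
        print(f"{report.threat_score:>4} {report.confidence:>4} {report.summary}")
```

Sorting on a (threat_score, confidence) tuple keeps the ordering rule in one place, so a different weighting could be swapped in without touching the rest of the code.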

Future Possibilities

Sift is brand new and full of possibilities. You can help flesh those out. We’re currently only exposing daily reports from the last month (excluding the previous week).

Would you like to see more reports? (e.g., further back in time or up to the current date?)

Reports more tailored to your organization? (We are rolling out user-hosted sensors where you can get data that a Sift report could eventually filter. More info to come soon.)