Feature explainer: anomaly and trend detection

Defenders have a remarkably tough job. They must understand — to the largest extent possible — which event needs investigating right now. There are many triggers for such events, but a major one is knowing when their threat landscape has changed. More specifically, defenders need to know when traffic (and actor) behavior has changed sufficiently to warrant taking notice or action.

Change detection comes in many forms. One such form comes under the lofty heading of "anomaly detection," which may also be referred to as "trend detection." Most modern detection and response solutions have some sort of anomaly detection capability.

GreyNoise has recently introduced a new trend detection feature in our platform that will help inform both researchers and organization defenders about potentially new or dangerous traffic behavior changes as quickly as possible. This gives researchers and defenders the necessary context to decide if further action is warranted.

Background on trends and anomalies

Our new Anomaly/Trend Detection feature operates on Tags, GreyNoise’s automated event labeling system. Our processing takes signatures developed by our Research team and applies tags to the packet traffic from our sensor network. These tags are used to add context from actor attribution to behavior, including scanning, crawling, or exploitation events. Each one represents a particular kind of traffic: malicious, benign, or as yet unknown. Each can also encompass one or more protocols, vulnerabilities, and/or exploits.

Monitoring tag behavior over time is a large part of GreyNoise’s value. Because we see how many sensors are hit with traffic events labeled with a particular tag every hour, we can assess which tags are becoming more popular, which ones are experiencing a near-term anomaly in traffic (possibly a notable event in itself), and which are going quiet. At GreyNoise, we look for opportunities to filter out the noise. Traffic that trends upward or exhibits an anomalous "spike" is noteworthy because it is "noisy.”

How it works

Detecting trends and anomalies is about finding deviation from previous behavior, particularly in a positive direction. Both tasks start with finding the average over a long period, at least ten days. The Trends tab looks at slower increases in traffic for a specific tag, comparing the long-term average to a short-term, more recent average, and doing the classic percent-change formula. This produces a value we can use to rank which tags are seeing the largest increase in average traffic.

Immediately above is one example of “trending” behavior.

Detecting an anomaly (as seen immediately above) is somewhat trickier, as we’re not interested in alerting on every recent anomaly. Indeed, it’s not inconceivable that a “quiet” tag that’s gone unseen for a while could suddenly start appearing intermittently (see below). That tag could very well get a separate report for each anomaly, even if each anomaly is not all that tall compared to its neighbors. 

We have to find a single recent peak, and preferably the highest. We do so with two algorithms: first, we use a moving window to determine which sample(s) have the largest (positive) deviation from the average value for that window. Second, we check for a change in slope, as a peak signifies a change from increasing to decreasing behavior. Between the two algorithms, we can find peaks reliably, accurately, and precisely. The peak values are then subjected to the percent change formula, so we can compare anomalies and trends for each tag.

Ranking the output of these algorithms, we present the tags that have the largest recent increase in trend as well as the tallest recent peak, both relative to previous average behavior. 

What to do about it

Loading up our Tag Trends page, you’ll see trending tags and anomalies: lists of the tags exhibiting each of the behaviors the strongest out of our vast collection of known tags.

Trending tags are showing a marked increase in average traffic; that is, the average now is relatively greater than previously. This could continue over time, suggesting the possibility of the related vulnerabilities being more commonly exploited (or at the very least, more commonly seen in data from one’s inbound internet security perimeter). This raises the odds of seeing it on our customer’s own perimeter, and therefore increases the urgency for monitoring in more detail and patching, if possible.

Anomalies in a tag manifest sudden jumps in behavior–and could be more useful to see what already happened than for projection about future behavior. Anomalies, by our internal definition, have already started to ebb, so the real “peak” has passed. With the largest anomalies listed, though, we can note the timing of large events/incidents that may illuminate what happened and how. However, it is possible that repeated anomalies could signal a new behavior pattern of intermittent bursts of activity: then the timing between those anomalies could show coordination, how many groups are involved, or even in which time zones they likely operate. 

What is particularly powerful is the ability to raise the conversation around tags (and therefore particular exploit vectors) as soon as they’re seen in numbers “in the wild,” instead of waiting for a particular exploit to hit the news. Additionally, each tag notes a particular approach. So, if a product has multiple potential exploits one could patch for, the tag (or tags) that are seeing the most traffic should be patched soonest.

All this being said, of course, these behaviors may not be seen on your perimeter. You may load the Tag Trends page and find that none of the “trending” tags are increasing their traffic for you, and you may be seeing traffic on other vectors. This is perfectly normal, and likely indicates that your firewall is doing its job, or that your attack surface is minimized to the threats and activity on the web at large. You may also be investigating a sudden wave of traffic of a particular type, and see that it didn’t make our Anomalies list. This means that, likely as not, your wave of traffic is more targeted than it may appear at first, and perhaps some caution may be warranted.

A note on data over time

There’s been some talk in the last few paragraphs about possibilities because this is largely cutting-edge research in cybersecurity. Already, we have seen tags which saw a drop in behavior that appear to be trending (as said drop passes out of the recent time window). The movement of the drop drags down the previous baseline, which makes the resumption of normal activity look like it’s trending to the algorithm. Perhaps it’s not trending upward from where it started, but the whole sample still exhibits notable behavior; sometimes the resumption of normal behavior out-trends the weaker actual “trending” tags. Notable behavior is notable behavior. 

There are more exotic patterns lurking in the time series data that our researchers already know and recognize. For us, it’s a matter of building a statistical system around that recognition so we can further automate. No one knows all of what’s possible out there, but we’re listening, so we know better than many. As soon as GreyNoise finds something new, you can rest assured that our findings make their way into refining this feature and other features to come.

Get Started With GreyNoise for Free

This article is a summary of the full, in-depth version on the GreyNoise Labs blog.
GreyNoise Labs logo
Link to GreyNoise Twitter account
Link to GreyNoise Twitter account