Tags let users view GreyNoise data from a non-IP-centric perspective. The difference between the IP-centric and non-IP-centric perspectives can be seen by comparing the visualizer’s Today and Tags views.
At the time of writing, the Today view gives us a list of 348,125 IPs, all seen within the last day. Each of these IPs carries data points like Source Countries, Destination Countries, ASN, and Top Tags. Approaching the data this way shows IPs that contain tags. This lets users get a general overview of the characteristics 283,765 machines have shared over the past 24 hours. This is useful and fun information when trying to see the overarching landscape. However, when you need a blocklist or an alert about IPs that may be targeting a specific piece of software or hardware on your network, tags come in handy. In the visualizer’s Tags view, IPs are grouped under the specific things users may be interested in, rather than presented as just IPs. For example, if you’re using Cisco routers, you might go to the Tags view and query Cisco, resulting in a list of tracked events related to Cisco devices.
The hope is that this design results in an experience more like Googling and less like writing an SQL query to find things that may interest users.
To reiterate, the technical goal of tags is to sort the data from outside the scope of the IP, and this is where GreyNoise classifications come into play. Classifications are split into Intent and Category. Intent is divided into Unknown, Malicious, and Benign, and Category is divided into Activity, Tool, Actor, Worm, and Search Engine. These sets are not limited to CVE-based activity. They include behaviors, attribution, and unique traffic characteristics. This is what the bulk of GreyNoise tags boils down to: tracking either “behaviors,” as in the case of Malicious Activity, or “attribution,” as in the case of a Benign Actor. What’s particularly interesting about these two examples is that there shouldn’t be any overlap between them. When there is, that indicates the potential need to rewrite a tag, which leads to the topic you’ve probably all been waiting for: how a tag gets written.
A Malicious Activity tag and a Benign Actor tag work well as examples because of the difference in how they’re tracked. A Malicious Activity tag is heavily based on the contents of captured packets (PCAP). In contrast, a Benign Actor tag will ideally be based on an IP list, and when that can’t be gathered, a combination of rDNS and ASN will be used if these variables are consistent. This makes the Benign Actor tag the more straightforward of the two to write, though typically the more time-intensive.
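To make the classification scheme concrete, here is a minimal Python sketch of the Intent and Category sets, plus a check for the overlap described above. This is an illustration only, not GreyNoise's internal data model: the `Tag` class, the `conflicting_ips` helper, the tag names, and the IPs (taken from documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    UNKNOWN = "unknown"
    MALICIOUS = "malicious"
    BENIGN = "benign"


class Category(Enum):
    ACTIVITY = "activity"
    TOOL = "tool"
    ACTOR = "actor"
    WORM = "worm"
    SEARCH_ENGINE = "search_engine"


@dataclass(frozen=True)
class Tag:
    """Hypothetical tag record: a name plus its classification."""
    name: str
    intent: Intent
    category: Category


def conflicting_ips(tags_by_ip):
    """Return IPs that carry both a Malicious Activity and a Benign Actor tag."""
    flagged = []
    for ip, tags in tags_by_ip.items():
        kinds = {(t.intent, t.category) for t in tags}
        if ((Intent.MALICIOUS, Category.ACTIVITY) in kinds
                and (Intent.BENIGN, Category.ACTOR) in kinds):
            flagged.append(ip)
    return sorted(flagged)


# Hypothetical tags and documentation-range IPs, for illustration only.
exploit = Tag("Example Auth Bypass Attempt", Intent.MALICIOUS, Category.ACTIVITY)
crawler = Tag("ExampleBot", Intent.BENIGN, Category.ACTOR)
observed = {
    "192.0.2.10": [crawler],
    "192.0.2.20": [exploit],
    "192.0.2.30": [crawler, exploit],  # overlap: one of the tags may need a rewrite
}
print(conflicting_ips(observed))
```

An IP that shows up in the result is a hint that one of the two tags is matching too broadly.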
Writing a Benign Actor tag starts with finding an actor. A good hint is consistent rDNS with a word like bot, scanner, or crawl in its name. We can easily search for this in the viz with metadata.rdns:crawl. This returns a lot of GoogleBot hits. Refine the search some more with metadata.rdns:crawl -actor:"GoogleBot" and we can see some crawlers and actors that have not been tagged yet.
That first unknown IP in the list has crawl-149-56-150-195.dataproviderbot.com as its rDNS. The next question is: have we seen other IPs with dataproviderbot, or something like it, as their rDNS, and if so, are they also benign? A search for metadata.rdns:*dataprovider* will work.
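For readers who want to see what a wildcard rDNS search like this does, here is a rough offline equivalent in Python using glob-style matching from the standard library. It is a sketch of the idea, not how the visualizer evaluates queries. The `rdns_matches` helper is hypothetical, and the second and third hostnames are made-up examples.

```python
from fnmatch import fnmatchcase


def rdns_matches(rdns: str, pattern: str) -> bool:
    """Case-insensitive glob match of an rDNS name against a wildcard pattern."""
    return fnmatchcase(rdns.lower(), pattern.lower())


# The first hostname comes from the search above; the others are invented.
hostnames = [
    "crawl-149-56-150-195.dataproviderbot.com",
    "crawl-192-0-2-10.examplebot.example",
    "mail.example.org",
]

hits = [h for h in hostnames if rdns_matches(h, "*dataprovider*")]
print(hits)
```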
There is no malicious activity to speak of, so they seem benign. This can be verified by trying to find any information about the source. Following the rDNS trail is a good start. The initial crawl link times out, but the parent rDNS resolves to https://www.dataprovider.com/. They seem like a good candidate for a Benign Actor tag, but before reaching out, it’s always worth checking if a tag already exists, and to my surprise, it does:
However, we don’t seem to be getting any hits for it. After reviewing the query we’re running on the backend, it looks like we’re tagging based on rDNS, but it’s not a match for what we’re seeing. They must have changed their crawler’s name! This is going to need some further investigation. A Google search for dataproviderbot leads to a page about their crawler specifically. Looks like they identify themselves with a User-Agent:
This won’t do because User-Agents can be easily spoofed. Fortunately, they have a contact link just for questions about the bot! At this point, I would usually just reach out to them, verify that the traffic we’re seeing is theirs, and ask if they can share their IP list. If they cannot share it, we may still have enough information to write a tag for tracking them. The search for metadata.rdns:*dataprovider* done earlier had some promising results that I hadn’t mentioned.
This information and what we know about their User-Agent and rDNS could suffice.
When tagging this benign actor without an IP list, the primary points of interest are ASN, ORG, and rDNS. The only thing we might consider grabbing from the PCAP’s data field in this case would be the User-Agent. However, when tracking Malicious Activity, or Activity in general, we primarily focus on the data field of PCAP.
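The order of preference described for Benign Actor tags (IP list first, then consistent rDNS plus ASN) can be sketched as below. This is only an assumption about how such a rule could be expressed, not the query GreyNoise runs on the backend. `ActorRule`, `matches`, the hostname pattern, the documentation-range IPs, and the reserved ASN 64500 are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ActorRule:
    """Hypothetical Benign Actor rule."""
    name: str
    networks: tuple = ()          # IP ranges shared by the actor, if any
    rdns_pattern: str = ""        # fallback: glob pattern for rDNS
    asns: frozenset = frozenset() # fallback: ASNs the actor scans from


def matches(rule: ActorRule, ip: str, rdns: str, asn: int) -> bool:
    # Preferred: an authoritative IP list decides on its own.
    if rule.networks:
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in rule.networks)
    # Fallback: rDNS and ASN must both agree, since either alone is weaker.
    return (bool(rule.rdns_pattern)
            and fnmatchcase(rdns.lower(), rule.rdns_pattern.lower())
            and asn in rule.asns)


# With a shared IP list (documentation range used as a stand-in).
by_ip = ActorRule("ExampleBot", networks=(ipaddress.ip_network("192.0.2.0/24"),))
print(matches(by_ip, "192.0.2.7", "", 0))

# Without one, fall back to rDNS plus ASN.
by_rdns = ActorRule("ExampleBot", rdns_pattern="*.examplebot.example",
                    asns=frozenset({64500}))
print(matches(by_rdns, "198.51.100.7", "crawl-1.examplebot.example", 64500))
```

The User-Agent is deliberately left out of the rule, since it can be spoofed.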
Here’s a scrubbed packet that matches a tag I wrote:
What information can we gather from this? When discussing this event, we can translate it to: “This is an HTTP GET request for the device.rsp endpoint on the server. The GET request queries for opt=user&cmd=list, and its Cookie header contains uid=admin.” Experience and a basic understanding of standard protocols are a considerable help when recognizing anomalies or notable features of a web packet. We can’t all have experience, but a plain-text protocol like HTTP has plenty of resources to help us understand what’s going on in this packet. This basic understanding allows us to infer that the request for /device.rsp?opt=user&cmd=list and the uid=admin cookie are likely part of an authentication bypass, making them defining features of this packet. We can check this with a Google search for allintext:"/device.rsp?opt=user&cmd=list". And what do you know!? The first hit is for CVE-2018-9995; the description of this vulnerability makes it pretty clear we’ve found our match!
allow remote attackers to bypass authentication via a "Cookie: uid=admin" header, as demonstrated by a device.rsp?opt=user&cmd=list request that provides credentials within JSON data in a response.
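The same reading of the packet can be done in code. The sketch below parses a request rebuilt from the description above (the path, query string, and Cookie value). It is not the original scrubbed packet; the Host header and the `parse_request` helper are assumptions added for illustration.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Reconstructed from the description in the text; the Host value is a placeholder.
RAW_REQUEST = (
    "GET /device.rsp?opt=user&cmd=list HTTP/1.1\r\n"
    "Host: 192.0.2.1\r\n"
    "Cookie: uid=admin\r\n"
    "\r\n"
)


def parse_request(raw: str) -> dict:
    """Split a plain-text HTTP request into the parts a tag writer looks at."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    url = urlsplit(target)
    cookie = SimpleCookie()
    cookie.load(headers.get("cookie", ""))

    return {
        "method": method,
        "path": url.path,
        "params": parse_qs(url.query),
        "cookies": {key: morsel.value for key, morsel in cookie.items()},
    }


request = parse_request(RAW_REQUEST)
print(request["method"], request["path"])
print(request["params"])
print(request["cookies"])
```

The three pieces that stand out (the path, the query parameters, and the cookie) are the same ones named in the CVE description.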
This is an excellent time to pause and point out that the intention of tags is not only to generate data regarding a vulnerability but also to aid in the proliferation of related knowledge. This is achieved through the tag’s description and the resources found in the research process. Once we’ve added these, we can form a query to search GreyNoise for similar packets. This is one of the more nuanced aspects of writing tags: finding the balance between making the signature general enough to catch variations of the same event and specific enough that there are no false positives. An accurate query to the internal GN data set for this event would look something like this:
The query for this event is pretty straightforward, but in many cases, exploitation may work regardless of capitalization or order of request operands like opt and cmd. In these cases, the query has to be adjusted accordingly.
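As an illustration of that adjustment, here is a minimal Python sketch of a signature check that ignores capitalization and parameter order. It is not the internal GreyNoise query syntax, and whether a given device actually accepts these variations is an assumption that would need to be verified per vulnerability. The `matches_signature` helper is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def matches_signature(target: str, cookie_header: str) -> bool:
    """Check for /device.rsp?opt=user&cmd=list with Cookie uid=admin,
    ignoring case and the order of the query parameters."""
    url = urlsplit(target)
    if url.path.lower() != "/device.rsp":
        return False

    # A dict makes the comparison independent of parameter order.
    params = {key.lower(): value.lower() for key, value in parse_qsl(url.query)}

    # Compare the exact cookie value so uid=administrator is not a false positive.
    cookies = dict(
        part.strip().lower().split("=", 1)
        for part in cookie_header.split(";")
        if "=" in part
    )

    return (params.get("opt") == "user"
            and params.get("cmd") == "list"
            and cookies.get("uid") == "admin")


print(matches_signature("/device.rsp?opt=user&cmd=list", "uid=admin"))
print(matches_signature("/DEVICE.RSP?cmd=LIST&opt=User", "UID=admin"))
print(matches_signature("/device.rsp?opt=user&cmd=list", "uid=administrator"))
```

Loosening the match this way catches more variants, so each relaxation should be weighed against the false positives it might let in.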
If you have any more questions about detection engineering and how we write tags here at GreyNoise, please feel free to reach out to me via email or socials: h0wdy@greynoise.io @h0wdy @h0wdy.bsky.social! Also! Please don’t hesitate to message me if you’re new to hacking! I’m also just starting my journey, and I am always down to connect and share knowledge with my fellow n00bs! <3