GreyNoise is a system that collects, analyzes, and labels omnidirectional Internet scan and attack activity. But what does that even mean, and who cares? In this blog post I'll cover what it means to "scan the Internet", who's doing it, and the problems it introduces. Finally, I'll cover GreyNoise, how it works, the problems it solves, and how to access the data it collects.
"Scanning the Internet" means reaching out and collecting small pieces of information from every single computer directly connected to the Internet. At a technical level, mass scanning the Internet means sending a small probe (specifically a TCP SYN, a UDP or ICMP packet, or a banner grab) to all of the roughly 4.3 BILLION addresses in the IPv4 space, about 3.7 billion of which are publicly routable. And it turns out that tens of thousands of people are scanning the Internet constantly.
Good guys scan the Internet to measure the exposure of vulnerabilities, take inventory of software market share, and track botnets. Bad guys scan the Internet to find vulnerable devices that they can compromise and use for nefarious purposes. A lot of other groups scan the Internet for unknown or secret reasons. In fact, there are entire websites and companies that act as "search engines" devoted to mass scanning the Internet, such as Shodan, Censys, Rapid7's Project Sonar, and many others.
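For a sense of the scale involved in probing every IPv4 address, here is a rough back-of-the-envelope sketch in Python. The list of reserved blocks is an assumption for illustration (the IANA special-purpose registry is the authoritative source), and the probe rate passed to scan_duration_minutes is a hypothetical number, not a measurement of any real scanner.

```python
import ipaddress

# Every IPv4 address is a 32-bit number, so the full space has 2**32 entries.
TOTAL_IPV4 = 2 ** 32

# Blocks set aside for private, loopback, link-local, documentation,
# multicast, and other special uses. ASSUMPTION: this list is illustrative;
# check the IANA special-purpose address registry for the current set.
RESERVED_BLOCKS = [
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24",
    "192.88.99.0/24", "192.168.0.0/16", "198.18.0.0/15",
    "198.51.100.0/24", "203.0.113.0/24", "224.0.0.0/4", "240.0.0.0/4",
]

# The blocks above do not overlap, so their sizes can simply be summed.
RESERVED_COUNT = sum(
    ipaddress.ip_network(block).num_addresses for block in RESERVED_BLOCKS
)
ROUTABLE_COUNT = TOTAL_IPV4 - RESERVED_COUNT


def scan_duration_minutes(address_count: int, probes_per_second: int) -> float:
    """Minutes needed to send one probe per address at a constant rate."""
    return address_count / probes_per_second / 60


if __name__ == "__main__":
    print(f"total IPv4 addresses:    {TOTAL_IPV4:,}")
    print(f"reserved (assumed list): {RESERVED_COUNT:,}")
    print(f"publicly routable:       {ROUTABLE_COUNT:,}")
    # Hypothetical rate of one million probes per second.
    print(f"minutes per full sweep:  {scan_duration_minutes(TOTAL_IPV4, 1_000_000):.1f}")
```

Under these assumptions, a single machine sending a million probes per second would cover the whole space in a little over 70 minutes, which helps explain why so many people can afford to do it constantly.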
Now that people are mass-scanning the Internet constantly, every single device exposed to the open Internet receives traffic from thousands of unknown IP addresses per day. Some of these remote mass-scanner IP addresses will simply check whether a port is open and go away, but others will attempt to compromise your machine by brute-forcing your login credentials or launching a remote exploit. For network defenders who observe this traffic, this creates a dilemma:
"Is everyone else seeing this stuff, or is it just me?"
In other words:
"Is this just regular Internet background noise, or is this machine actually targeting and attacking ME specifically?"
We built GreyNoise to collect and maintain a baseline of expected omnidirectional Internet mass-scan traffic to make researchers' and network analysts' lives easier.
At GreyNoise, we deploy and manage hundreds of servers (in different data centers and different countries) across all corners of the Internet. Their only purpose is to sit back and soak up all the opportunistic scan traffic generated by anyone mass scanning the Internet. From there, we establish a baseline of expected scan traffic. How many people are scanning the Internet right now? What IP addresses is the traffic coming from? What are they scanning for?
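To make the baseline idea concrete, here is a minimal sketch in Python. It is not GreyNoise's actual pipeline: the event format, the build_baseline function, and the min_sensors threshold are all hypothetical. The assumption is simply that a source IP observed by several unrelated sensors is probably scanning everyone rather than targeting anyone.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# ASSUMPTION: each event is (sensor_id, source_ip, destination_port).
Event = Tuple[str, str, int]


def build_baseline(events: Iterable[Event], min_sensors: int = 3) -> Dict[str, Set[int]]:
    """Return {source_ip: ports probed} for IPs seen by >= min_sensors sensors."""
    sensors_by_ip: Dict[str, Set[str]] = defaultdict(set)
    ports_by_ip: Dict[str, Set[int]] = defaultdict(set)
    for sensor_id, source_ip, dest_port in events:
        sensors_by_ip[source_ip].add(sensor_id)
        ports_by_ip[source_ip].add(dest_port)
    # Keep only sources that hit enough distinct sensors to look omnidirectional.
    return {
        ip: ports_by_ip[ip]
        for ip, sensors in sensors_by_ip.items()
        if len(sensors) >= min_sensors
    }


# Made-up events using documentation-only IP ranges.
EXAMPLE_EVENTS = [
    ("sensor-a", "198.51.100.7", 22),
    ("sensor-b", "198.51.100.7", 22),
    ("sensor-c", "198.51.100.7", 23),
    ("sensor-a", "203.0.113.9", 443),  # seen by one sensor only
]

if __name__ == "__main__":
    for ip, ports in build_baseline(EXAMPLE_EVENTS).items():
        print(ip, sorted(ports))
```

The resulting mapping answers the three questions above in miniature: how many sources are scanning, which IP addresses they are, and which ports they are probing.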
For more information on the nitty-gritty of how GreyNoise actually works, check out the slides from our ShmooCon 2018 talk.
Why do you care?
The data GreyNoise collects can be used to ignore traffic from omnidirectional background scanners, allowing you to focus on targeted scan and attack traffic. You can use the data to track opportunistic botnets that are composed of compromised IoT devices. You can use the data to stay informed of what kinds of software people are scanning for. Or, if you're so inclined, you can opt out of many mass-scanners altogether by blocking them preemptively.
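As a rough illustration of the first use case, the Python sketch below splits the source IPs from your own alerts into "background noise" and "possibly targeted". The triage function and the known_scanners set are hypothetical; in practice the set would be populated from whatever scanner data you have access to, and this sketch makes no assumptions about the GreyNoise API.

```python
import ipaddress
from typing import Iterable, List, Set, Tuple


def normalize(ip: str) -> str:
    """Canonical string form of an IP address; raises ValueError if invalid."""
    return str(ipaddress.ip_address(ip.strip()))


def triage(alert_ips: Iterable[str], known_scanners: Set[str]) -> Tuple[List[str], List[str]]:
    """Split alert source IPs into (background, possibly_targeted) lists."""
    scanners = {normalize(ip) for ip in known_scanners}
    background: List[str] = []
    possibly_targeted: List[str] = []
    for raw_ip in alert_ips:
        ip = normalize(raw_ip)
        # Known mass scanners are hitting everyone, so they are likely noise.
        if ip in scanners:
            background.append(ip)
        else:
            possibly_targeted.append(ip)
    return background, possibly_targeted


if __name__ == "__main__":
    # Made-up data using documentation-only IP ranges.
    noise, targeted = triage(
        ["198.51.100.7", " 203.0.113.9 ", "192.0.2.44"],
        known_scanners={"198.51.100.7"},
    )
    print("background:", noise)
    print("worth a closer look:", targeted)
```

An IP that is absent from the scanner set is not necessarily hostile; it only means the traffic cannot be explained away as Internet-wide noise and deserves a closer look.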
Accessing the data
You can access the GreyNoise data right now with our experimental front end or through our alpha API. If you're a researcher who is interested in working with our data sets, we'd love to hear from you via email or on Twitter. If you need access to bulk data or are interested in a commercial license, contact us at firstname.lastname@example.org.
Thanks for checking out this debut blog post, and please don't hesitate to reach out to me directly via my personal Twitter if you have any questions or concerns.
Andrew Morris, Founder