Check out this session by Andrew Morris, founder and CEO of GreyNoise Intelligence, presented during the SANS CyberSolutions Fest (SOC/SOAR) in October 2021.
Abstract
Everyone in the SOC is too busy, and everything feels on fire all the time. With an insane amount of “bad” on the internet driving alert volumes in the SOC sky-high, it's hard to see which “bad” to focus on--because there’s simply too much noise. Automation is the obvious answer, but finding time to set up the right automation is difficult. Join GreyNoise founder and CEO Andrew Morris on this webinar to learn just how bad “internet noise” is today, and what you can do to reduce the racket and reclaim the value of your SIEM and SOAR.
Andrew will cover:
Andrew Morris:
Hey, I'm Andrew Morris. I'm the founder and CEO of GreyNoise. I'm really excited to be talking to everybody right now. A little bit about my background really quickly: I founded GreyNoise about four years ago. Before that, I worked on the R&D team over at Endgame, which was great. I was solving technical problems that the intelligence community was having, and that security teams were having. And before that, I actually worked in offense. But generally I've been in cybersecurity my entire career, and I've been doing it since I was just a wee lad. I've always been really interested in esoteric datasets, and in thinking about problems in a different way than other people have conventionally thought about them before.
And so some of that has manifested quite a bit in GreyNoise. I'm really, really excited to talk to you guys today. If anybody has any questions about anything as I'm talking, drop them into the chat or shoot them to me directly here over the Zoom client, and I'll do my very best to answer as many questions as I possibly can. I really tried to make this presentation with as much content and as little filler as humanly possible. So I'm going to try to stick to that and be efficient, and I encourage everybody to ask questions at any point.
So let's go ahead and keep moving. The first problem that I'm going to talk about is that everyone in the SOC is too busy. The SOC is slammed, right? Anybody here who works in the SOC knows this. I can't tell you how many times I have had a conversation with a SOC analyst, SOC manager, etc., and we've talked about really cool, interesting work that they could be doing to sharpen their detections and add more power behind all the different analysts in the SOC. And we keep coming back to that same conclusion: well, we just don't have time, because it's such a slog. The alarms are going off all the time, and there is a constant barrage of alerts, incidents, events, and things to investigate.
So I tried to capture that feeling of working in the SOC, depending on the SOC, as best I possibly could with this visual on the right of an iPhone alarm going off like a zillion times in the morning. It kind of feels like that a little bit. In the bottom left, we've got just a screenshot of a security product spitting out about 50,000 super-duper critical alerts, right? And the problem with this is that if everything's on fire all the time, and if everything is an emergency, then nothing is an emergency. It can be really, really hard to spend more time on the things that really matter, the things that are going to generate more returns to the organization and really let you work better. But it's hard to get out of that slog. Everyone knows that automation is, in certain ways, part of the answer, but finding the time to do that is really difficult, and setting up the right automation is really, really challenging.
Another problem that I've observed in my conversations with people who are working in the SOC is that it can be really difficult to figure out which bad to focus on. So what do I mean by that? There was a time many years ago, maybe 15–20 years ago, where there was some bad on the internet, and everybody kind of knew it. But now, for a multitude of different reasons, there is an absolutely insane amount of bad on the internet. There are tons and tons of bad guys that are conducting attacks at higher volume and at higher scale in an automated fashion, and that have automated more and more parts of their attack process. And so machine hours are cheap, right? Once a computer program has been written, or once a problem has been gamed out, running that at scale is cheap. But the things that require human analysis and human problem solving, those human hours, are really, really expensive.
So which bad should the SOC analyst be focusing on? Well, obviously, we would say the most severe bad, the worst bad, the bad that is the most dangerous to the organization, the most high-risk bad, right? But there's so much bad that it can be really, really hard to figure out which bad you should be looking at. Is it this attack over here, this alert that's been going off for a really long time? Is it this kind of unique new thing? Is it this one that's happening from an IP address in a foreign country? It's really, really difficult to figure that out.
Another problem that I'm briefly going to talk about is that new vulnerabilities are being weaponized much, much more quickly than historically. This year alone, we've seen a staggering number of vulnerabilities that have been disclosed by a vendor. And then very, very quickly, we've seen bad guys start using these vulnerabilities to vulnerability-scan and exploit as many machines across the internet as humanly possible. Now, this concept is not new, right? Worms, computer viruses, and other automated attack frameworks have been around for a long time. This is not a novel, unique thing. What is new is the breakneck speed between a vulnerability being announced or disclosed in a very major piece of software that is deployed across hundreds or thousands of organizations across the internet, and bad guys figuring out whether they can use that vulnerability to compromise systems and get access to their targets, then packaging it up and starting to vuln-check and exploit people on the internet at very high scale. There are now more of these than ever coming out, and it actually creates a noise problem on the internet. I'm going to talk about that today. There's always been this concept of a vulnerability being announced and everybody kind of asking that question: how much do I really care about this vulnerability? How much does this thing matter? Do I need to care about this a lot? And the time delay between the vulnerability coming out and people starting to be exploited on the internet, no matter who you are, is getting shorter and shorter. There's a bit of a race on the open internet between good guys, bad guys, and those that fall somewhere in between, on identifying who is vulnerable to what on the internet, right?
So once a new vulnerability comes out, the good guys, some of the cybersecurity companies, the Shodans, the Censyses, the attack surface management companies, want to find out whether or not their customers are vulnerable. They want to find as many servers that are vulnerable on the internet as they can so that they can let you know, right? The bad guys want to do exactly the opposite of that. They want to identify as many systems as possible before the good guys do, and before things get patched and remediated, and try to compromise as many of those as humanly possible. So there's somewhat of a race, right? And it means that now more than ever, there are tons and tons of people that are scanning and crawling the internet. I'm going to talk about what that means very briefly in this presentation.
I have one case study in here. In particular, it's the recent Apache CVE-2021-41773. This vulnerability was disclosed earlier this month. And it was a really interesting order of operations, where the patch was submitted, and you could see it in the source code revision control, so in Git. Then GreyNoise actually started seeing opportunistic vulnerability checks for it before the vulnerability was even announced or the CVE was created. And then a day later, Apache pushed their patch to general availability and disclosed the vulnerability as a CVE. At that point everybody knows about it and can start patching.
The whole point that I'm getting at here is that this timeline is really getting shorter and shorter. What that means for defenders is that we have to be able to respond much more quickly. We have to be able to sort and differentiate between all of the noise that's happening on the internet and the things that really matter to us, right? The things that we really need to have eyes on: the different alerts and the different threats that are occurring to our organization. As the amount of time gets shorter and shorter, it creates a noise problem on the internet.
So all of these things together sort of amalgamate into this noise problem on the internet. The internet is really, really noisy. What I mean by that is, if you spin up a server on the internet right now in AWS, or in DigitalOcean, or in Google Cloud, or in any hosting provider whatsoever, a host that is directly routable on the internet, like a VPS, a virtual private server, or an instance or something like that, and you run a packet sniffer on it, then the second it goes up you will see devices on the internet attempting to scan, crawl, probe, exploit, and brute force your host. And you're going to have no idea who they are. You're not going to know whether they're coming after you specifically, or whether they are just hitting everybody on the entire internet.
One of the ways that these technologies here on the left (Shodan, Censys, BinaryEdge, Google) work is that they scan and crawl the internet. Google, for example, has to scan and crawl the internet to index web content. That's an easy one. Shodan, Censys, etc. are finding open ports on all of the hosts around the internet, right? So between the good guys and the bad guys doing this kind of activity all the time, it's created a noise problem, and that problem is really painful for people who work in a security operations center. It means that it can be really, really tricky to work through that noise and find signal, to find the attacks or the behaviors or the activity that matter to your organization the most, because there's a lot of noise.
So I did an experiment. I installed Suricata, which is an IDS, on a number of different hosts on the internet. They had no business value whatsoever. There were no actual users using them, they were not advertised anywhere, and they had no regular business functionality. I just loaded up a bunch of rules, set them on the internet, and let them run for four days. And I have here the specific rules that fired, how many times they fired, and how many IPs they fired from. I only did this for a handful of hosts, maybe 10 hosts. We were seeing IDS alerts that were firing from thousands of IPs, some that were hitting hundreds of thousands of times, right? And all of this is what everybody on the internet is seeing all the time. It makes it really, really difficult to actually identify the threats that matter the most on your perimeter, or that are hitting your network, etc.
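The tally described here (which rules fired, how many times, and from how many IPs) can be sketched against Suricata's EVE JSON log, which writes one JSON object per line. This is a minimal sketch, not the analysis used in the talk: the sample events and signature names below are invented for illustration, and only the `event_type`, `src_ip`, and `alert.signature` fields are assumed to be present.

```python
import json
from collections import defaultdict

def tally_alerts(eve_lines):
    """Return (signature, times_fired, distinct_source_ips) rows, busiest first."""
    fires = defaultdict(int)
    sources = defaultdict(set)
    for line in eve_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # EVE logs mix alerts with flow, DNS, and other records; keep alerts only.
        if event.get("event_type") != "alert":
            continue
        signature = event["alert"]["signature"]
        fires[signature] += 1
        sources[signature].add(event["src_ip"])
    rows = [(sig, fires[sig], len(sources[sig])) for sig in fires]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

# Invented sample events (documentation IP ranges, made-up signature names).
SAMPLE_EVE = [
    '{"event_type": "alert", "src_ip": "198.51.100.7", "alert": {"signature": "EXAMPLE SSH brute force attempt"}}',
    '{"event_type": "alert", "src_ip": "198.51.100.7", "alert": {"signature": "EXAMPLE SSH brute force attempt"}}',
    '{"event_type": "alert", "src_ip": "203.0.113.20", "alert": {"signature": "EXAMPLE SSH brute force attempt"}}',
    '{"event_type": "flow", "src_ip": "203.0.113.20"}',
    '{"event_type": "alert", "src_ip": "192.0.2.55", "alert": {"signature": "EXAMPLE HTTP path probe"}}',
]

for signature, count, ips in tally_alerts(SAMPLE_EVE):
    print(f"{signature}: fired {count} times from {ips} IPs")
```

On a real sensor the same function could be fed an open `eve.json` file handle instead of the sample list.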
So I wanted to display that here visually in a way that would show exactly how noisy the internet is, and what I mean when I talk about internet background noise. On particularly large networks on the internet, the noise problem is really bad. One individual routable IPv4 address on the internet will see approximately 3,000 SYNs or probes from about 1,000 distinct IP addresses. And that's not the same 1,000 IP addresses across the board, by the way, though everyone's going to see some amount of overlap. Just from the SYN packets alone, a /24 on the internet is going to see about 50MB of unsolicited traffic every day. And that's just a /24. That's 256 IP addresses. If you're on a network that is thousands of IP addresses, or hundreds of thousands of IP addresses, or millions of IP addresses, then it's going to get really, really noisy. And it's going to be really tricky to differentiate between the things that really matter and the things that don't on your perimeter. When that happens, most people default to just saying, "I don't care about what happens on my perimeter," right? And that's because there hasn't really been a way around this problem for a long time.
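As a rough sanity check on these figures, the arithmetic can be worked through in a few lines. This sketch makes several assumptions that are not stated in the talk: the 3,000 probes per address are treated as a daily figure, "50MB" is read as decimal megabytes, and noise is assumed to scale linearly with the number of addresses. A /24 spans 256 addresses.

```python
import ipaddress

SYNS_PER_IP_PER_DAY = 3000      # figure quoted in the talk, assumed to be per day
SLASH24_MEGABYTES_PER_DAY = 50  # "about 50MB" per /24 per day, from the talk

def daily_syns(cidr):
    """Estimated unsolicited SYNs per day for a network in CIDR notation."""
    return ipaddress.ip_network(cidr).num_addresses * SYNS_PER_IP_PER_DAY

def daily_megabytes(cidr):
    """Scale the /24 traffic figure linearly by address count (an assumption)."""
    return ipaddress.ip_network(cidr).num_addresses / 256 * SLASH24_MEGABYTES_PER_DAY

slash24_syns = daily_syns("192.0.2.0/24")
print(slash24_syns)                                         # 768000 probes per day
print(SLASH24_MEGABYTES_PER_DAY * 10**6 / slash24_syns)     # about 65 bytes per probe
print(daily_megabytes("10.0.0.0/16"))                       # 12800.0 MB per day
```

The implied size of roughly 65 bytes per probe is in the range of a bare TCP SYN on the wire, so the two quoted figures are at least consistent with each other under these assumptions.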
Why is this happening? I mentioned this before, but the bad part of this is that bad guys want to do bad stuff to your network. Some of them are targeting you specifically and some of them aren't. The perimeter is a really obvious place to start. It's not the only place, but it's a place to start. Bad guys use credential stuffing. They're checking proxies, brute forcing services, slinging exploits, and checking for vulnerabilities. And good guys are doing things like web search, asset discovery, third-party risk, security research, etc. So the whole point that I'm getting at here is that the internet is really noisy, and it creates a lot of problems for good guys. It creates a lot of problems for the analysts that are sitting in the SOC that have to deal with all this stuff, right? The folks that are in this room right now. It's really painful.
So our proposal at GreyNoise is to enumerate and filter the everywhere and investigate the outliers: identify what's hitting everyone on the entire internet, and optionally subtract that out from what one specific network, your network, is seeing. You do that using the tools that you're already used to using, so that your SIEM or your SOAR platform is able to pick out the alerts that are generated by internet background noise, and you can focus on the ones that are hitting you specifically and that matter to your organization. And I'm going to talk about how we do that right now.
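The "subtract the everywhere" idea can be sketched as a simple partition of alerts by source IP. This is an illustrative sketch only: the `Alert` type, the `split_alerts` helper, and the hard-coded set of background IPs are all hypothetical, and in practice that set would come from a noise data source rather than a literal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    src_ip: str
    rule: str

def split_alerts(alerts, background_ips):
    """Partition alerts into (targeted, background) by source IP membership."""
    targeted, background = [], []
    for alert in alerts:
        if alert.src_ip in background_ips:
            background.append(alert)   # seen hitting everyone: deprioritize
        else:
            targeted.append(alert)     # not known background noise: investigate
    return targeted, background

# Hypothetical data using documentation IP ranges.
BACKGROUND_IPS = {"198.51.100.7", "192.0.2.55"}
ALERTS = [
    Alert("198.51.100.7", "ssh brute force"),
    Alert("203.0.113.9", "exploit attempt"),
    Alert("192.0.2.55", "port scan"),
]

targeted, background = split_alerts(ALERTS, BACKGROUND_IPS)
print([a.src_ip for a in targeted])   # ['203.0.113.9']
print(len(background))                # 2
```

Keeping the background alerts in a separate list, rather than discarding them, mirrors the "optionally subtract" wording: the analyst can still review them later.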
So the paradigm that I'm going to use here, grossly oversimplified, is that you can think about the problem that I'm discussing kind of like a rough quadrant. For the IP addresses that are hitting your organization, you've got basically good and bad on the left side, and then along the bottom you've got everywhere and only you. In the top left, we have the Mirais, bots, worms, etc. That's the everywhere bad, right? Now, the everywhere bad can hurt you, but the everywhere bad is not nearly as dangerous to your organization as the targeted bad guy, the APT, the sophisticated actor. It does not pose as much risk to your organization as the motivated attacker who's coming after you. The bottom left, the good everywhere, is going to be your Googles, your security companies, your universities, your security researchers, etc. And then the bottom right, the good that is only hitting you, is obviously your users, your employees, your customers. It's just regular people who are using your services, etc.
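The quadrant can be written down as a small lookup over two yes/no questions: is the source malicious, and is it hitting everyone or only you? The names and the priority ordering below are illustrative assumptions based on the description in the talk, not a GreyNoise data model.

```python
from enum import Enum

class Quadrant(Enum):
    EVERYWHERE_BAD = "bots, worms, mass exploitation"
    TARGETED_BAD = "motivated attacker coming after you"
    EVERYWHERE_GOOD = "search engines, security companies, researchers"
    ONLY_YOU_GOOD = "your users, employees, customers"

def quadrant(is_malicious, seen_everywhere):
    """Place a source IP in the good/bad by everywhere/only-you quadrant."""
    if is_malicious:
        return Quadrant.EVERYWHERE_BAD if seen_everywhere else Quadrant.TARGETED_BAD
    return Quadrant.EVERYWHERE_GOOD if seen_everywhere else Quadrant.ONLY_YOU_GOOD

# Assumed triage order: lower number means look at it sooner.
PRIORITY = {
    Quadrant.TARGETED_BAD: 0,
    Quadrant.EVERYWHERE_BAD: 1,
    Quadrant.EVERYWHERE_GOOD: 2,
    Quadrant.ONLY_YOU_GOOD: 2,
}

print(quadrant(True, False).name)                      # TARGETED_BAD
print(PRIORITY[quadrant(True, True)])                  # 1
```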
So with GreyNoise, what we're really trying to do is make your existing security products much more valuable. We're trying to bring value back to the alerts that you are looking at and investigating. You can think of us like a service. It's kind of like noise-cancelling headphones, right, or kind of like a spam filter for your SIEM or for your SOAR products. The way that email works right now, obviously, you've got your inbox, the things that you really need to read. But then you've also got that spam folder over here, the folder of all the stuff that your email client decided, "Ah, I don't really think that's relevant, I'm making a decision to put this over here." We're really trying to bring that to the security operations center by starting with alert fatigue on the perimeter and moving to other adjacent problems. So I'm going to talk about that here right now.
With GreyNoise, I have right here just a really quick slide. Like I said, I want to keep this with as much content as humanly possible. All of the stuff that I'm talking about right now, these are all things that you can do right now without having a single conversation with us. You can go to greynoise.io right now and create a free account. You can even do a couple of lookups without creating an account at all. You can hit our API, a free unauthenticated Community API, and you can dump logs into our analysis page and get the results right back.
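A single-IP lookup against the Community API might look like the sketch below. Treat every specific here as an assumption to verify against the current GreyNoise documentation: the endpoint path, whether it still accepts unauthenticated requests, and the `noise`, `classification`, and `name` response fields are recalled from the API as it existed around the time of the talk, not confirmed by the transcript. Error handling is omitted, and an IP the service has not observed may come back as an HTTP error rather than a JSON record.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the current GreyNoise API documentation.
COMMUNITY_URL = "https://api.greynoise.io/v3/community/{ip}"

def build_url(ip):
    """Build the lookup URL for one IP address."""
    return COMMUNITY_URL.format(ip=ip)

def fetch(ip, timeout=10):
    """Query the API for one IP and return the decoded JSON record."""
    with urllib.request.urlopen(build_url(ip), timeout=timeout) as response:
        return json.load(response)

def summarize(record):
    """Turn a response record into a one-line verdict (field names are assumed)."""
    ip = record.get("ip", "unknown")
    if record.get("noise"):
        label = record.get("classification", "unknown")
        name = record.get("name", "unknown")
        return f"{ip}: internet background noise ({label}, {name})"
    return f"{ip}: not observed as background noise"

# Hand-written sample record shaped like the assumed response.
sample = {"ip": "192.0.2.1", "noise": True, "classification": "benign", "name": "ExampleScanner"}
print(summarize(sample))
```

Calling `summarize(fetch("<some IP>"))` would perform a live lookup; the sample record above keeps the example runnable offline.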
The way that GreyNoise works is we operate a gigantic network of passive collector sensors (that's what we call them, passive collector sensors) in thousands of different places around the globe: in different countries, in different data centers, in different hosting providers, and in different ISPs. They look like different things, they act like different things, and they're sitting in a ton of different places. They sit back and listen. They don't scan the internet, they listen to the internet, more like a honeypot than like someone scanning the internet, right? So we're completely different from a Shodan or a Censys. It's completely passive. We listen to internet background noise. All of those different sensors stream all of their data into a central place, and our research team writes analytics on that data constantly. They're writing things like: this is an SSH brute force; this is Shodan; this is Google; this is someone exploiting this vulnerability, right?
The reason that we do that is that we want the data to be as explainable and as transparent as humanly possible. We want to provide as much context to the analysts as we possibly can, right? Because if we're telling people to ignore something, we need to give them a lot of really good reasons why. We need to give our users the ability to double-check that. So we have this web interface that you can go to right now. We have a free unauthenticated API, down in the bottom left, that you can hit right now. You don't have to talk to us. And we have our analysis page in the bottom right. If you have any questions about any of this stuff, check out the website, hit me up, hit up the team.
This is the paradigm that we're really going after with GreyNoise. We're really trying to help you waste less time on the everywhere bad and the everywhere good, so that you can focus much more of your time on the targeted attacks that matter to your organization. That's what we're hoping our users accomplish using GreyNoise, right? Make those alerts matter more, and get rid of the ones that are hitting everybody, or at least have a way to say, "Show me all the things that aren't hitting everyone, I just want to see the stuff that's hitting me," right? Less time wasted investigating alerts that provide way less analytical value to the organization, and more time looking at the things that really matter to you guys. Spend more time focusing on the threats that matter to you, using the tools that you're already using.
How much does it suck when you've been investigating something for 10 minutes, an hour, a day, and eventually you figure out that this IP address is Shodan, right? This is BitSight. This is not a big scary bad guy. Or even, "I thought this thing was really coming after us specifically, but it turns out all my friends are seeing this too," right? It's just a waste of time. So before GreyNoise, you would say things like, "Hey, let's just look at failed login attempts, brute forces, IPs that are attempting to exploit our perimeter, IPs that are conducting reconnaissance on my network. They're scanning us, right?" And it's overwhelming. It's not useful. You can't do anything with that.
But after GreyNoise, you can ask questions like, "I see failed login attempts and a brute force attack right here. It's only hitting us, it's not hitting anybody else. This is unique to us. This is special, right? I see an IP address over here. It's attempting to exploit my network, but it's not trying to exploit other people's networks. It's only coming after us, right? And I see people that are scanning my network specifically. They're not scanning the internet, they're not scanning a subset of it, they are coming after us. I need to pay attention to this. I need to know what they found. I need to scan that same thing to see what shows up and what comes back, right?", etc.
This is what we're hoping to help you accomplish with GreyNoise. We have a ton of free products that you can use right now without having to buy anything, and we have an Enterprise product that, if you're interested, you can talk to our team about, right? The users that get the most value out of GreyNoise are using us within the products that their analysts are already using, with workflows that they're already used to. You've got your SIEMs, you have your SOAR platforms. Those are the places that make the most sense with GreyNoise.
We also have a lot of people who use GreyNoise in their threat intelligence platform. And some of the more mature organizations actually have, more or less, a data pipeline that is moving telemetry from one place to another, where we can enrich and filter data en route as it's doing that, right? But I just wanted to demonstrate right here that these are some of the products that you can use GreyNoise with right now. You can go to our documentation and check it out. And if there's something that you want to use that I don't have on this list, reach out to us, because we'll help you build that integration so that we can get you guys using it as soon as possible.
And with that, that's really it. I just wanted to take a pause really quickly and see if anybody has any questions about GreyNoise, about our definition of security efficiency, how we do what we do, etc. Does anyone have any questions about anything whatsoever?
Unknown Interviewer:
So I think we don't have any in the queue. I'm curious, a little bit, about the kind of methodology that you're using in terms of the approach to the collection. You had said that you were doing passive-only sensors. That's a really nice idea. I'm wondering, how were you able to actually then do the sort of correlation and collection between those sensors, and end up with intelligence about that distinction and differentiation? I know that you have some labeling and things that you actually track, say on a webpage. But what's some of the stuff that you're doing specifically for that?
Andrew Morris:
Yeah, so there are a number of different axes along which you can look at the data. There's volume; pervasiveness, meaning how many of our sensors it has touched; what it did when it touched one of our sensors; the organization that owns the IP address; any rDNS or user agents they were using when they were crawling the internet; things like that. Those are all little data points that go into the denormalized data that we have in the backend. And our research team is always writing analytics on top of that data to add those tags, of which we have at this point, I think, almost 1,000. Tags like: What is it doing? Are they good or bad? Do they respect opt-out lists? How much of the internet are they scanning? How long have they been around? Things like that. So it's Shodan? How do you know it's Shodan? Those are a lot of the questions that we're asking of the data. And we're trying to provide access to our users so that they can make more confident decisions with the products that they're already using.
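The kind of tagging described here can be pictured as small rules evaluated over per-IP observations. The sketch below is a toy illustration under stated assumptions, not GreyNoise's analytics: the `Observation` fields, the sensor count, the thresholds, and the tag names are all hypothetical, and real rDNS and user-agent values can be spoofed, so production rules would need corroborating evidence.

```python
from dataclasses import dataclass

TOTAL_SENSORS = 1000  # hypothetical fleet size

@dataclass
class Observation:
    ip: str
    sensors_touched: int        # pervasiveness: how many sensors saw this IP
    rdns: str = ""              # reverse DNS name, if any
    user_agent: str = ""        # HTTP user agent, if any
    failed_ssh_logins: int = 0  # behavior seen on the sensors

def tag(obs):
    """Return the list of illustrative tags that apply to one observation."""
    tags = []
    if obs.sensors_touched / TOTAL_SENSORS >= 0.5:
        tags.append("internet-wide scanner")
    if obs.rdns.endswith(".shodan.io"):
        tags.append("Shodan")
    if "Googlebot" in obs.user_agent:
        tags.append("Googlebot")
    if obs.failed_ssh_logins >= 10:
        tags.append("SSH brute force")
    return tags

# Invented observations using documentation IP ranges.
print(tag(Observation("198.51.100.7", 900, rdns="census1.shodan.io")))
print(tag(Observation("203.0.113.9", 1, failed_ssh_logins=40)))
```

An observation that matches no rule gets an empty tag list, which in the framing of the talk is the interesting case: something seen by few sensors and not yet explained.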
Unknown Interviewer:
And so you're writing these contextualizations and enrichments, and then effectively ingesting threat intelligence or producing threat intelligence in order to track the information that you're collecting, and then sharing that back with people.
Andrew Morris:
Yeah, the only thing that I don't agree with completely is that we don't ship all of our data to our users. The data is huge. So we use APIs: our users and our customers will query us about an IP address and we'll tell them everything that we know about it, right? And so we're enriching the alerts. But the data on our side is many, many terabytes a day. Trust me, you don't want to deal with the data in bulk. It's a nightmare. It's really big, and it's just a bunch of annoying, stupid PCAPs. So it's not going to be something that's going to make much sense to you. But otherwise, you're exactly right.
Unknown Interviewer:
Right, I meant that you were doing that contextualization, enrichment, and intelligence work. But you're not delivering that. You're delivering a rating. We're simplifying slightly, but you're delivering a rating. Maybe a multi-dimensional rating is a better way to express that.
Andrew Morris:
So yeah, exactly. And so we're just helping basically, you know, the tools that you're already using have access to this data with as little work as possible from the analyst.
Unknown Interviewer:
There's a specific question in the Zoom about details related to context API, query limits, and so on. And maybe I'm going to ask you to actually have that conversation with the person that asked it offline in the Slack. Because it's very detailed...
Andrew Morris:
I was gonna say, probably most people that are listening to this right now don't care. So whoever you are, I'm looking forward to talking to you about that API.
Unknown Interviewer:
We're gonna put that in the thread to handle that. And let me ask you one other question, because we have about a minute left. Here's my thought: what can we do? I know that you're trying to basically turn down the noise. But what can we do at a larger scale to address the root problem? I mean, have you had any thoughts on that? If you were able to suggest a technical implementation? And I'm not talking about a social change, because I don't think that we have the leverage to do that. But what sort of technical implementation could actually make sense on the larger-scale internet that would make it so that the noise would no longer be there? Is there anything like that that you can think of? I know I'm asking an esoteric question and putting you on the spot for it. But just in case you have a thought for it.
Andrew Morris:
I am going to default back to what Dan Geer said in Cybersecurity as Realpolitik. After a period of time, devices that have reached end of life either have to be open sourced or they have to have the ability to patch themselves. IoT devices are a huge part of the problem. Beyond that, I would say I only have social answers. I know we only have a minute left, but the one thing that I just want to say, and I know that this is an unpopular thing to say, is that your logs aren't as special as you think they are. If you share as much as you possibly can with other organizations that are dealing with security problems similar to yours, you guys are going to be able to get to your answers much more quickly.
So from our perspective, people have to be willing to look up an IP address against us for us to give them the data that they need. Some people think that that's really sensitive, because it's revealing, like, "Hey, maybe this person knows what IP address I'm interested in; or maybe they think that I got hacked." You gotta forget about all that. I think that more important than anything is sharing, transparency, and giving other organizations that you trust the ability to check things or corroborate things against what you guys are seeing as well. "Hey, is anyone else seeing this thing? Is this unique to us? How special is this? Did you guys see this thing too?" I think that that's really, really powerful. And right now, we're not doing a good job of doing that systematically. I'm hoping that GreyNoise can help people do that better in the future.
Unknown Interviewer:
So effectively, large-scale data mining is needed. There's a trust and sharing issue associated with it. But the technical underpinning of it is an adequate engine to be able to ingest and assess enough information that the bad actors actually get culled out.
Andrew Morris:
So that's exactly right. We have to share more data.
Unknown Interviewer:
That's really cool. So awesome. Well, thank you very much, Andrew. I really appreciate the talk. I think it was a great one.