GreyNoise Python Scripting with Brad Chiappetta

Summary

At GreyNoise, we collect, analyze and label data on IPs that saturate security tools with noise. This unique perspective helps analysts waste less time on irrelevant or harmless activity, and spend more time focused on targeted and emerging threats.

The GreyNoise Python SDK is an abstract Python library built on top of the GreyNoise API. It is the preferred library for interacting with the GreyNoise API when implementing integrations and/or tools. The library includes a small client to interact with the API. It also provides a CLI to call the GreyNoise API directly from the command line.

Read the transcript

Hello, my name is Brad Chiappetta with GreyNoise. Today I'm going to be going over some basic Python scripting around interacting with the GreyNoise API. I'll go through each of the examples showing usage of the Python requests library, and then also leveraging the GreyNoise SDK, which in many cases can simplify the way you interact with the GreyNoise API. First up, I'm going to show a basic interaction with our ping, or connectivity-checking, endpoint. This will be the basic pattern that I use in most of the examples that I go through. Obviously, you'll be able to modify them to better suit the needs of what you're trying to do, but for the most part, these are the general patterns that you'll follow when you're interacting with the GreyNoise API.

So in this first example, I've defined the API endpoint that I want to interact with, which in this case is the ping endpoint. I've got my selection of headers that we're passing along. You'll notice that we use the header named key here to pass the API key for the GreyNoise API. I'm also including a user agent, which is helpful to identify the different types of scripting and integrations that you're building. We often use this to build out those identifying markers; they're not required, but it's suggested when it can be used. In this particular case, the ping endpoint is a GET endpoint, so I'm using requests.get here. Then I'm just going to print out both the status code and the response from this endpoint so we can see what this looks like. If we go ahead and run this, we can see that we get a 200 back, and we also get the payload that we would expect from this particular endpoint.
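The ping pattern described above might look roughly like the following sketch. The endpoint URL, the placeholder API key, and the user agent string are assumptions for illustration and are not taken from the recording; check the API reference for the current values.

```python
PING_URL = "https://api.greynoise.io/ping"  # assumed endpoint URL


def build_headers(api_key, user_agent="sample-python-script"):
    """Build request headers; the API key travels in a header named 'key'."""
    headers = {"Accept": "application/json", "User-Agent": user_agent}
    if api_key:
        headers["key"] = api_key
    return headers


def ping(api_key):
    """Call the ping endpoint and return (status_code, parsed JSON body)."""
    import requests  # third-party (pip install requests), imported on use

    response = requests.get(PING_URL, headers=build_headers(api_key), timeout=10)
    return response.status_code, response.json()


# Usage (needs network access and a real key):
# status, body = ping("YOUR_API_KEY")
# print(status, body)
```

The same header-building helper can be reused for the other GET endpoints, since only the URL changes.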

To show this same example using the GreyNoise SDK, let me go ahead and copy that in here. Alright, in this particular case, to use the SDK, we're going to import from the GreyNoise SDK in this fashion, and we're going to define a client, or session, using this particular pattern, again passing in that API key. If you're building an integration or want to do some tracking along with what you're building, you can pass along the integration name here as well. Then, in the SDK, we reference the ping endpoint with the test_connection function. I'm going to go ahead and execute this and return the results as well. You can see we get the same exact response back here. By returning the response here, it's assumed that there is a status code 200, and that's why you're getting this. If it was a different status code, it would give additional information around what the error was and what the potential problem was.

So next up, I'm going to move on to the GreyNoise Community API. Again, we'll first take a look at it in Python requests. We can see that we're defining the Community API endpoint here. This is a very simple IP lookup endpoint, the intent being that we just do a simple GET to this endpoint with the IP address appended at the end. So in my request here, I'm going to go ahead and build that URL-plus-IP pattern, still passing through the same headers. As a note for the Community API, the key here is optional. We do have an unauthenticated version of this, and it has different rate limits than the authenticated version. So we do encourage you to use an API key when you're able to, but for this endpoint it is not required. We'll go ahead and run this so we can see what the payload looks like. We get back the expected payload from that endpoint, giving us the IP, whether or not it's part of our internet scanning noise dataset, whether it's part of our common business service provider dataset, what it's classified as, the name of the provider or actor if we know it, a link back to that data on our Visualizer, and the last time that we saw this particular IP.
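The URL-plus-IP pattern with an optional key could be sketched as follows. The base URL and user agent are assumptions for illustration; the point is that the `key` header is only added when a key is available.

```python
COMMUNITY_URL = "https://api.greynoise.io/v3/community/"  # assumed base URL


def community_request(ip, api_key=None):
    """Return (url, headers) for a Community lookup; the key is optional."""
    headers = {"Accept": "application/json", "User-Agent": "sample-python-script"}
    if api_key:
        headers["key"] = api_key  # authenticated calls get different rate limits
    return COMMUNITY_URL + ip, headers


def community_lookup(ip, api_key=None):
    """Perform the GET and return (status_code, parsed JSON body)."""
    import requests  # third-party (pip install requests), imported on use

    url, headers = community_request(ip, api_key)
    response = requests.get(url, headers=headers, timeout=10)
    return response.status_code, response.json()


# Usage (needs network access):
# print(community_lookup("8.8.8.8"))
```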

So we'll go ahead and do this same exact action with the Python SDK. This is the only one that is slightly different, because when you're interacting with the Community API via the SDK, you need to include this additional offering equals community value here when you instantiate the session. What that does is tell the SDK that you want to use just the community endpoints. That will still allow you to use the IP command, but it will actually take that IP and request it through the Community API. Without that flag, this command is going to hit the IP context endpoint, which a community-level API key would not have access to. So I'm going to go ahead and execute this, and you see that we get that same response back as we did in our first example.
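A sketch of the community offering in the SDK, assuming the `greynoise` package is installed. The response field names used in the summary are assumptions about the Community payload and may differ from what the API actually returns.

```python
def make_community_client(api_key):
    """Create an SDK session that routes lookups to the Community API."""
    from greynoise import GreyNoise  # third-party (pip install greynoise)

    return GreyNoise(api_key=api_key, offering="community")


def community_summary(client, ip):
    """Look up one IP with the SDK's ip command and keep a few fields."""
    result = client.ip(ip)
    # Field names below are assumptions about the Community response.
    return {
        "ip": result.get("ip", ip),
        "noise": result.get("noise"),
        "riot": result.get("riot"),
        "classification": result.get("classification"),
    }


# Usage (needs network access and a real key):
# print(community_summary(make_community_client("YOUR_API_KEY"), "8.8.8.8"))
```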

OK, next, we're going to take a look at the GreyNoise context endpoint. This is the IP lookup endpoint to check whether an IP address belongs in our internet scanning database, that is, whether we've observed scanning activity from it across our sensor network. It's the same exact pattern as we had with the Community API. We've defined what our API endpoint is here, and we're going to append the IP address to the end of that request. We're still just doing a simple GET for this particular endpoint. So we go ahead and run that, and we get back our status 200 and the payload that is returned in JSON format. For this one, we're getting back pretty much everything that we know from a scanning perspective: the first time it hit our sensor network, the most recent time it hit our sensor network, the fact that it has been seen on that network and is, in fact, noise, any tagging information, if we're able to define what tags there are for that particular IP address, whether or not we think there is potential for the traffic to be spoofed, our classification, associated CVEs, whether there is common bot activity or a VPN association, and then a whole bunch of GeoIP data as well as the raw scanning data that we have observed for that IP address.

To do this same thing in the GreyNoise SDK, we're going to go ahead and paste this in here. In this case, we're going to remove the offering value so that we're using the enterprise API. Again, we've got to pass through an IP address. And here we go. This is actually showing us a different IP address, and it's giving us a negative response. This is what you would get back to say, hey, this IP address is not part of our internet scanning database, so there's no scanning activity associated with it that we have observed. If we copy in the IP address that we used in the previous example, we can bring that over, and I'm just going to replace this here. You'll see I'm just passing the IP address in here, and you can pass that as a variable or directly into the command. We run that, and we get that same JSON payload back as well.
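The negative and positive responses described above could be handled along these lines. This is a sketch only: `client` is assumed to be an SDK session created without the community offering, and the `seen`, `classification`, and `tags` field names are assumptions about the context payload.

```python
def context_summary(client, ip):
    """Look up an IP via the SDK's ip command and summarize the result."""
    result = client.ip(ip)
    if not result.get("seen"):
        # Negative response: the IP is not in the internet scanning dataset.
        return f"{ip}: no scanning activity observed"
    classification = result.get("classification")
    tags = ", ".join(result.get("tags") or []) or "none"
    return f"{ip}: classification={classification}, tags={tags}"


# Usage (needs network access, a real key, and the greynoise package):
# from greynoise import GreyNoise
# print(context_summary(GreyNoise(api_key="YOUR_API_KEY"), "8.8.8.8"))
```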

All right, now I'm going to show, very quickly, the GreyNoise multi context endpoint. At the moment, as we are recording this, the multi context endpoint is not part of the GreyNoise SDK, though it will be shortly. So I'm just going to show this example as it looks from a Python requests standpoint. The multi context endpoint gives you the ability to submit a list of IP addresses to the context lookup and return the details on all of the IP addresses in that lookup that we have information on. I'm going to append a second IP address here so we can see what this looks like.

So in this case, we have quad eight as a sample here, and then we'll include an additional IP as well. We're going to add this into our list and run through this, and we'll see that we actually get two results back. The first one is saying that quad eight is not seen, meaning we don't have scanning activity for it. But if we look at the second portion, we should also see that there is an IP response for this 178 address, in which case we do get that same payload back that we were getting from the individual context endpoint. So this gives us the ability to get all of that data for a list of IP addresses rather than having to do them one at a time. As I mentioned, this is not yet in the GreyNoise SDK, but it will be shortly, so by the time you are watching this you will probably be able to use an IP multi lookup command within the SDK.
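A rough sketch of a multi context request with the requests library. The endpoint URL, the use of POST, and the `ips` body key are assumptions for illustration, and the second address is a documentation placeholder rather than the one used in the recording.

```python
MULTI_CONTEXT_URL = "https://api.greynoise.io/v2/noise/multi/context"  # assumed


def multi_context_payload(ips):
    """Build the JSON body: IPs de-duplicated, original order preserved."""
    return {"ips": list(dict.fromkeys(ips))}


def multi_context(ips, api_key):
    """Submit a list of IPs and return (status_code, parsed JSON body)."""
    import requests  # third-party (pip install requests), imported on use

    headers = {"Accept": "application/json", "key": api_key}
    response = requests.post(
        MULTI_CONTEXT_URL, json=multi_context_payload(ips), headers=headers, timeout=30
    )
    return response.status_code, response.json()


# Usage (needs network access and a real key):
# print(multi_context(["8.8.8.8", "198.51.100.7"], "YOUR_API_KEY"))
```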

Next, I'm going to move on to the quick endpoint, or the multi quick endpoint, and what that looks like. In this particular example, here is the setup. Alright, so this is the multi quick endpoint, where we again pass in a list of IP addresses and do what we call a quick lookup, where we get back a few very simple fields for each address: the IP address, whether or not it's noise, whether or not it's a RIOT IP address, and a response code. These codes are in the API reference documentation. But this is one of those instances where using the SDK can actually provide a little bit of benefit, so I'm going to pull in an example here.

So the quick function that is part of the SDK uses the multi quick lookup here, and we can go ahead and run that. But I've got the community offering in here, so we want to make sure to pull that back out, as this endpoint is not supported with the Community API. We're going to run that again, and you see now we're using the actual enterprise endpoints. We can see that this successfully looked up that single IP address, but we also have the ability to paste in a full list of addresses. So I'm going to bring that in as well and put it in here as an input. If we run this again, what we should get is a list of responses that includes all of those, with that same amount of information. But one of the nice things is that we also get the code plus a translated code message. The SDK has that built in, so you don't have to go and do that work yourself; it will provide you what that message is for each one of the IP addresses passed in that list.
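The SDK quick lookup over a list might be used like this. It is a sketch: `client` is assumed to be an SDK session created without the community offering, and the `code_message` key is an assumption about how the translated message is named in each result.

```python
def quick_summary(client, ips):
    """Run a quick lookup and return one readable line per IP."""
    lines = []
    for item in client.quick(ips):
        message = item.get("code_message", "n/a")  # assumed key name
        lines.append(f"{item['ip']}: noise={item['noise']} ({message})")
    return lines


# Usage (needs network access, a real key, and the greynoise package):
# from greynoise import GreyNoise
# for line in quick_summary(GreyNoise(api_key="YOUR_API_KEY"), ["8.8.8.8"]):
#     print(line)
```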

Now, the multi quick and multi context endpoints have a limit of 500 IP addresses per payload. So if you're passing more than 500, you're going to want to chop the list up into segments and run through multiple times. But in general, you can put up to 500 IP addresses in a single call.
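Chopping a long list into segments of at most 500 addresses can be sketched with a small generator. The function name and the way results are combined are illustrative choices, not something shown in the recording.

```python
def chunked(ips, size=500):
    """Yield consecutive slices of ips, each holding at most `size` entries."""
    for start in range(0, len(ips), size):
        yield ips[start:start + size]


# Usage with any bulk lookup callable (for example an SDK session's quick
# command), collecting all results into one list:
# results = []
# for batch in chunked(all_ips):
#     results.extend(client.quick(batch))
```

Each batch is an independent request, so a failure in one batch does not have to discard the results already collected from earlier ones.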

Alright, so moving on, we're now going to take a look at the RIOT endpoint. We'll go back to our Python requests example, and we'll see here that we're using this just like we did with the community and context endpoints: we're just appending an IP address to the end of the API URL. When we run this, we do get a status back from our RIOT dataset. Basically, this IP address is in the RIOT dataset, which is why we get this riot true flag. Then we get the category information, the provider name, a basic description and explanation, the last time it was updated in the dataset, and any references or trust level information that is included as well.

To do this in the SDK, we can go ahead and paste this in here, and now we're going to use the riot command. Alright, so we run this, and we get back that same exact output as before, but again, just simplified and made easier to go through and look at that information.
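A sketch of the SDK riot command with a short summary of the fields mentioned above. The field names (`riot`, `name`, `category`, `trust_level`) are assumptions about the RIOT payload.

```python
def riot_summary(client, ip):
    """Look up an IP with the SDK's riot command and summarize the result."""
    result = client.riot(ip)
    if not result.get("riot"):
        return f"{ip}: not in RIOT"
    return (
        f"{ip}: {result.get('name')} ({result.get('category')}), "
        f"trust level {result.get('trust_level')}"
    )


# Usage (needs network access, a real key, and the greynoise package):
# from greynoise import GreyNoise
# print(riot_summary(GreyNoise(api_key="YOUR_API_KEY"), "8.8.8.8"))
```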

Now we're going to get into the query endpoints. First, we're going to take a look at this in Python requests format. The query endpoint is defined in this particular way, where we hit this experimental GNQL path. Even though it shows as experimental, it is fully in production, so you don't have to worry about the naming convention in this API. What we're doing is passing a couple of different parameters to this endpoint, the first being the actual query. Then you can optionally include a size, which is the number of IP addresses to return in each response payload; that can be anywhere between 1 and 1000 per request. And if the query returns more results than this size, or more than 1000 altogether, the response will also include a scroll token that can be used to get the next page of results.

So you can see in this case, I'm building out that slightly more complex lookup, where it's taking my base URL, appending the query and then the size and scroll parameters, and then passing along the headers. We'll go ahead and run this so you can see the example. Alright, in this particular case, I've asked for just a single record. I have a total count of 85, meaning that this query matched 85 different results, and I've only pulled in one. That's why complete here is false, because I've got a whole bunch more pages, and I've also got that scroll token down here. If I wanted to pass that along, I could scroll through this data. For each one of these 85, we return the full IP context record, so you can get all of that data back and pull through all that information.

If I wanted to do these five at a time, I could change this size to five here. Alright, that now gives me the first five, and because I've got a total of 85, that is going to require us to page through those results using that scroll token as well. If we take a look at this in the SDK, it's a very, very similar pattern. We're going to paste this in here, and we can see, again, I'm defining the query. Now I'm just defining variables for size and scroll, and then I can pass them to the query command within the SDK. So again, here are the query, the size, and the scroll values. We can run this, and same thing here, we get just that single record back. If I don't pass in that size at all, I can run this and it'll pull all 85 results back, because that's going to be less than the 1000 threshold.
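Paging through results with the scroll token could be sketched as a loop like the one below. The `data`, `complete`, and `scroll` response keys are assumptions based on the fields described above, and `fetch_page` is a hypothetical callable that wraps either a requests call or the SDK's query command.

```python
def run_query(fetch_page, query, size=5):
    """Collect every record for a query by following the scroll token.

    fetch_page(query, size, scroll) must return one parsed response page.
    """
    records, scroll = [], None
    while True:
        page = fetch_page(query, size, scroll)
        records.extend(page.get("data", []))
        scroll = page.get("scroll")
        # Stop when the API reports completion or no further token is given.
        if page.get("complete", True) or not scroll:
            return records


# Usage with an SDK session (needs network access and a real key):
# fetch = lambda q, s, sc: client.query(q, size=s, scroll=sc)
# all_records = run_query(fetch, "tags:Mirai", size=5)
```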

Okay, the next endpoint that we're going to take a look at is very similar in functionality, and this is the stats endpoint. The stats endpoint is just an overlay of the GNQL query endpoint, and what it does is pull back statistics. Rather than pulling back the actual IP records, it gives you a basic set of statistics for a query. In this case, you don't need to pass any size or scroll information, so we're just defining what our query is and appending that query to the end of our base URL. We'll go ahead and run this, and what we'll see is that we get the response back showing the query that we ran and the total count of IP addresses in the dataset that match that query. Then there's a statistics section here that gives you breakdowns by classification, by how many IP addresses are spoofable versus not spoofable, and by organization names, actors, tags, operating systems, categories, and ASNs, so you can see how each of those components breaks down.

This is great for monitoring particular trends, or particular CVEs or tags within the dataset, so you can see how they grow over time and how the landscape of those queries is changing as well. If we want to take a look at what this looks like in the SDK, there is, again, a command that is specifically for the stats endpoint. It's the same basic functionality as using the query command, except this time we're going to pass in just the query, with no need for the size or scroll tokens. All right, and then we get that same exact response back from the API endpoint, just again in a simplified format by using the SDK.
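For trend monitoring, the classification breakdown from a stats response could be reduced to a simple mapping, as in this sketch. The nested layout (`stats`, then `classifications`, then entries with `classification` and `count`) is an assumption about the response shape and should be checked against a real payload.

```python
def classification_breakdown(stats_response):
    """Map each classification to its count from a stats response."""
    entries = (stats_response.get("stats") or {}).get("classifications") or []
    return {entry["classification"]: entry["count"] for entry in entries}


# Usage with an SDK session (needs network access and a real key):
# print(classification_breakdown(client.stats("tags:Mirai")))
```

Recording this mapping on a schedule gives a simple time series for a query without pulling back every IP record.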

And finally, we have one last endpoint that we want to review here, and that is the tags, or metadata, endpoint. This endpoint is very simple: you just do a GET call to the API endpoint itself, and you don't need to pass anything along to it. It returns all of the tags that GreyNoise currently has in the internet scanning database, so you'll get all of those tag details. In addition to the tags, down at the very bottom, you'll also get a list of all the VPN services that we're currently tracking and have seen internet scanning behavior from, so you'll be able to reference those if you want to see what that full list is. The SDK does have a similar command, so you can run this through the SDK as well. In this particular case, we call just this metadata command, and again, we don't need to pass any parameters to it. It's going to return the same exact information that you got from the API endpoint directly in the other example.

This is useful for populating the additional information on a tag if you're trying to build out a comprehensive overview of what GreyNoise has observed a particular IP address doing. Now, I'm going to do a couple of simple examples of how you can use the metadata endpoint to do things like find a specific tag. Again, these are very simple patterns, but they're common use cases that we often see people wanting. Because we don't have the ability in the API endpoint itself to query for a specific tag, this is just a way for you to loop through the response and find a specific tag, so you can pull out just that particular metadata. So if we look in here, this is just taking in the name of the tag that you want details on and looking through the response that we get. If there's a matching tag, it pulls out all of that tag data, so you can see it for just an individual tag. As you can imagine, this pattern is almost identical for the SDK, so we'll go ahead and show exactly the same thing, again just using the metadata command and then parsing through the response looking for that particular tag match.

Another common one that people ask about is a way to determine whether GreyNoise has a particular CVE in its database and whether or not we're monitoring for it. In this particular case, you can define a CVE name and then use metadata to search for it and see if we have a hit. So even if the CVE isn't active, or we haven't had any hits on it within the dataset, this can at least confirm for you that the CVE is being monitored and is something that we're looking for. And again, to show it within the context of the SDK, we'll put in our SDK example here, again using that metadata endpoint and then sorting through the responses looking for that particular CVE. We get that same record back, so we can say, hey, this tag is associated with this particular CVE. Next, I want to go through another common use case.
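The tag and CVE searches over the metadata response can be sketched as two small loops. The response layout (a top-level `metadata` list whose entries carry `name` and `cves`) is an assumption, and the sample values in the usage comment are placeholders.

```python
def find_tag(metadata_response, tag_name):
    """Return the first tag whose name matches (case-insensitive), else None."""
    for tag in metadata_response.get("metadata", []):
        if tag.get("name", "").lower() == tag_name.lower():
            return tag
    return None


def tags_for_cve(metadata_response, cve_id):
    """Return every tag that lists the given CVE identifier."""
    wanted = cve_id.upper()
    return [
        tag
        for tag in metadata_response.get("metadata", [])
        if wanted in (cve.upper() for cve in tag.get("cves") or [])
    ]


# Usage with an SDK session (needs network access and a real key):
# meta = client.metadata()
# print(find_tag(meta, "Mirai"))
# print(tags_for_cve(meta, "CVE-2021-44228"))
```

An empty list from the CVE search means no tag in the response references that identifier.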

So if you're unable to use our SDK for some reason, and you're trying to do a basic lookup on a particular IP address, this example takes advantage of the code message that we have defined, which you wouldn't necessarily get outside of our SDK. What this example does is a two-phase lookup. First, it takes a look at the IP address in the quick endpoint to determine whether it is, in fact, in the noise dataset or not. If it is in that dataset, it then pulls in all of the additional context information as well. So we're going to run this and see what gets returned. Alright, we get the combined output of both the quick and context endpoints here. We can say this IP address is not noise, it is not RIOT, and here's the code message. We've added those to the sample script here as well, so you can see all of that data in a more comprehensive form. Let me grab an active IP address to give an example of what that looks like. I'm just going to update the IP address in the script here. Alright, and now we can see what that comprehensive response is.

Again, this is pulling in all of the data from the context endpoint. But it's also appending in the data from the quick endpoint, as well as including a pre-built URL back to the Visualizer, which you don't get back by default from the context endpoint. So if you were building an integration, and you wanted to give people the ability to navigate out to GreyNoise's Visualizer, this again gives you a simple pattern to use to go ahead and pull that information along with including the coded messages that were indicated by the quick endpoint.
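The two-phase lookup could be sketched as below. Everything specific here is an assumption for illustration: the two code messages are paraphrased and partial (the API reference holds the authoritative list), the Visualizer URL pattern is assumed, and `get_quick` and `get_context` are hypothetical callables that wrap the requests calls to each endpoint.

```python
# Partial, paraphrased code table for illustration only.
QUICK_CODES = {
    "0x00": "IP has never been observed scanning the internet",
    "0x01": "IP has been observed by the GreyNoise sensor network",
}
VIZ_URL = "https://viz.greynoise.io/ip/"  # assumed Visualizer URL pattern


def two_phase_lookup(ip, get_quick, get_context):
    """Check the quick endpoint first; fetch context only if the IP is noise."""
    quick = get_quick(ip)
    code = quick.get("code")
    result = {
        "ip": ip,
        "noise": bool(quick.get("noise")),
        "riot": bool(quick.get("riot")),
        "code": code,
        "code_message": QUICK_CODES.get(code, "see the API reference"),
    }
    if result["noise"]:
        result.update(get_context(ip))  # second phase: full context record
        result["visualizer_url"] = VIZ_URL + ip
    return result
```

Passing the two fetchers in as arguments keeps the merge logic separate from the HTTP details, so the same function works with requests, the SDK, or canned data.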

All right. In addition to that, we have a pattern defined in our documentation for what we call an advanced IP lookup, and I'm going to walk through what this is doing. This is actually using a combination of four of our endpoints. Again, it's using the quick endpoint first to check an IP address. Then, based on the response from quick, it's checking the context and RIOT endpoints. And if there is context, it's also expanding the tag information, so you get a more comprehensive tag list for that IP address. So we take in our sample IP address, and we have a function here to build out that more comprehensive tag explanation, so rather than just getting the tag name, you get all of the details of each tag as well. Then you can see we're doing our basic quick request, and based on what's returned there, if it's noise or if it's RIOT, we do those secondary calls and build out that full payload.
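The four-endpoint flow described above can be sketched as follows, written against SDK-style commands for brevity. `client` is assumed to be any object exposing `quick`, `ip`, `riot`, and `metadata` methods, such as an SDK session, and the response field names are assumptions. This is not the documented sample script, only an outline of the same steps.

```python
def advanced_lookup(client, ip):
    """Quick check first, then context and RIOT as needed, with tags expanded."""
    results = client.quick([ip])
    quick = results[0] if results else {}
    merged = {
        "ip": ip,
        "noise": bool(quick.get("noise")),
        "riot": bool(quick.get("riot")),
    }
    if merged["noise"]:
        context = dict(client.ip(ip))  # copy so the original is not mutated
        # Index tag details by name, then swap each tag name for its details.
        by_name = {t.get("name"): t for t in client.metadata().get("metadata", [])}
        context["tags"] = [
            by_name.get(name, {"name": name}) for name in context.get("tags") or []
        ]
        merged["context"] = context
    if merged["riot"]:
        merged["riot_details"] = client.riot(ip)
    return merged
```

If many IPs are processed, fetching the metadata once and reusing it would avoid repeating the same call for every address.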

So if we go ahead and run this, what you'll be able to see is that this is a sample of an IP address that is in both the noise and the RIOT datasets. Rather than having to look at this data independently, we're able to merge it into a single output so you can see all of it. We can see that this is a scanner that has hit the scanning network, but it's also in the RIOT project as part of the Akamai CDN. So again, you can pull that combination of all that detail together in one shot.

To show a very similar pattern using the SDK, we can simplify this by, again, using just the predefined commands. So we're using the quick command, then pulling in all the context data using the IP command, then pulling in all of the RIOT data, and merging them together into the same exact output. Alright, so we can see an example of this IP address here, which again has this more expanded section on the tags, so you get all of that tag metadata in addition to all of the scanning data that you would normally get from just the context endpoint.

And that wraps up the content that we wanted to cover today.

If there are any questions or anything that you would like us to clarify, you can certainly reach out to us on the GreyNoise Slack. Or you can email us at support@greynoise.io. Thanks again for attending, and take care.