In this piece, I will guide you through my thought process while conducting an investigation, aiming to identify and collect sources of intelligence. Specifically, we will focus on open-source intelligence (OSINT). Our primary goal is to monitor intelligence from two kinds of sources: ‘markets’ and ‘some shady activities.’ While I will present a case I worked on some time ago, I won’t provide details about the case itself. Instead, I will discuss the tools I used, how to conduct a hunt and collect intelligence, and the analysis techniques involved. As you join me on this journey, I hope you don’t get lost in my thought process.
We will focus on OSINT, analysis for intelligence, scouring forums and marketplaces, how to infiltrate these forums, and mining intel, as well as using search engines and blockchain analytics.
[Tools and Resources]
- TorBot/TorCrawl: For crawling Tor-based sites.
- Selenium: For automating interactions with marketplaces.
- Wallet Explorer/OTX: For blockchain forensics.
- Wayback Machine: For historical snapshots of websites.
Case Study
Alright, so the case was simple. The profile we were hunting and their operation had already been investigated by other researchers; our task was simply to collect, analyze, and disseminate actionable information (TTPs). When I received the case, there was already some internal research available, so it was easy to focus solely on the profiles and fingerprints of their activity across the web, and to uncover any hidden connections within the operation.
So, simple, right? However, as you proceed through the investigation and hunt, you uncover a lot more information and details that may have been ignored or missed, and you have to decide how to proceed.
Keep this in mind as we cover more. As you can see below, we have two targets: one is a marketplace, the other is more like a personal service. The data collected from these sites is analyzed to determine the most likely explanation, and the early signs suggest they are run by the same individual or entity, as they share a lot of characteristics and other small details. However, this is still just an assumption. We need facts and proof that they are indeed operated by the same actor, and that’s what we’re going to look for.
There are several methodologies for analyzing CTI once it has been collected.
- Evaluating multiple competing hypotheses to determine the most likely explanation for a given set of data. This has already been done.
- Identifying and challenging the assumptions underlying an intelligence assessment, ensuring that they are well-founded and not based on unfounded beliefs or biases.
+---------+--------------------------------------------------------------------+
| Site | Description |
+---------+--------------------------------------------------------------------+
| Site A | Operates as a marketplace, facilitating the sale of hacking tools |
| | and personal identifying information (PII). |
+---------+--------------------------------------------------------------------+
| Site B | Appears to be a platform offering hacking services. |
+---------+--------------------------------------------------------------------+
The challenge was to prove that these sites were run by the same individual or group. To do this, I needed to find points of similarity and gather concrete evidence.
The Mindset of a Researcher
Alright, time to dig into this. Some researchers try to gather as much information as possible so they can analyze and organize the data later. However, I like to look at this from an offensive standpoint: “Where did they mess up?” I look for a simple detail or vulnerability that I can use to de-anonymize the operation or launch an offensive attack on the target.
I prefer to double-check any research that’s already been done, even if someone hands me a full file. There’s something satisfying about digging through the layers yourself and validating (or sometimes disproving) the data. The goal here isn’t just to confirm what’s already known but to see what’s been overlooked because there’s always something.
When hunting, you have to think like your target. If I were them, where would I slip? What seemingly minor oversight would I write off as harmless? Maybe it’s outdated software or a misconfigured service left running. Perhaps they’ve reused usernames or a PGP key across platforms.
Tip: Always double-check existing research. Even if someone hands you a complete file, there’s value in validating the data yourself. You might uncover something new.
To conduct an investigation, you need a profile, most likely a cybercriminal one. You don’t have to do anything illegal; you just need to appear as one. For that, you’ll need credibility, such as being an existing member of forums to vouch for you, or you can pay for entry.
If we talk about forums, establishing connections and using social engineering to infiltrate forums and establish a reputation can be your foothold. Alternatively, all this can be bypassed by targeting accounts with reputation and credibility, and using them as puppets with no need for a profile.
There are many accounts out there, especially in these “dark web” forums, that can be easily taken over, since we’re not dealing with a multi-million-dollar company that has its own red team. But depending on who you ask, this may fall under “Legality and Ethics”: hacking laws differ across jurisdictions.
Be aware of what you’re doing and how you interact and move through these sites. Simply put, it boils down to one thing: control over information and actions, to prevent any attempt at turning them against you.
But how does this relate to the site itself? Since you’re not investigating a forum, how does this come into play? Well, this will come into play once you identify persons of interest who run or claim to run this site, and many of them have accounts across different TOR-based forums. But let’s not jump the gun. Let’s start by performing a simple information gathering on site A.
Info Gathering & Research
So, Site A. Your casual TOR-based Marketplace. However, the site was relatively new, not so popular, but had been running for some time with a low profile. It was the usual suspects: wildlife sales, phishing toolkits, stolen identification, and so on.
The first step is to crawl and collect data from the sites. This involves searching for specific keywords and returning metadata to discover new links and content, possible emails, and Bitcoin addresses. It also involves service fingerprinting on any discovered services. However, for now, let’s focus on data collection.
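To make the collection step a bit more concrete, here is a minimal sketch of pulling links, emails, Bitcoin addresses, and onion hostnames out of a page that has already been saved by a crawler. The function name `extract_indicators` and the regular expressions are my own illustration, not part of TorBot or TorCrawl, and the patterns are deliberately loose: they return candidates to review, not validated indicators.

```python
import re
from html.parser import HTMLParser

# Loose patterns: they yield candidates, not confirmed indicators.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Legacy (1... / 3...) and bech32 (bc1...) address shapes, no checksum validation.
BTC_RE = re.compile(
    r"\b(?:bc1[ac-hj-np-z02-9]{11,71}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b"
)
# v3 onion hostnames: 56 base32 characters followed by ".onion".
ONION_RE = re.compile(r"\b[a-z2-7]{56}\.onion\b")


class LinkAndTextParser(HTMLParser):
    """Collects href values and visible text from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def extract_indicators(html):
    """Return de-duplicated links, emails, BTC addresses and onion hosts."""
    parser = LinkAndTextParser()
    parser.feed(html)
    # Search both the visible text and the link targets.
    haystack = " ".join(parser.text_parts + parser.links)
    return {
        "links": sorted(set(parser.links)),
        "emails": sorted(set(EMAIL_RE.findall(haystack))),
        "btc": sorted(set(BTC_RE.findall(haystack))),
        "onions": sorted(set(ONION_RE.findall(haystack))),
    }
```

Running this over every saved page and merging the results gives you one indicator set per site, which is what you cross-reference later.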
Lucky for us, there are crawlers designed to collect data from sites on the Tor network. For example, TorBot is a tool that can crawl any URL hosted on the Tor network and save website data to an output JSON file. Another tool called TorCrawl can also crawl any URL on the Tor network. In addition to saving site data to an output HTML file, it can search for specific keywords and return metadata about the crawl itself.
There are other tools available as well. I usually leverage Selenium for its flexibility in handling the unique authentication requirements of various marketplaces and in gathering the exact data we need for our analysis of illicit digital goods. For example, I gather and crawl domains listed in search engines like Ahmia or posted in CTI-sharing researcher communities.
The list goes on; it can encompass thousands of different types of sites, and you run checks to see which are active and which are not. This way, you have a kind of phone book that gets you where you need to go. There’s so much you can pull, but let’s get back to our subject, Site A.
Once I’ve extracted the links and endpoints present at the time, I shift my focus to URLs with parameters, like foo.php?id=2. I also run a simple fuzz for files (pdf, png, xml, etc.) and intel (emails, BTC, ETH, etc.).
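As a rough illustration of the "URLs with parameters" step, the sketch below groups crawled links by path and parameter names, so that foo.php?id=2 and foo.php?id=7 collapse into one endpoint shape. The helper name `group_parameterized_urls` is hypothetical; it only organizes links you have already collected.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def group_parameterized_urls(urls):
    """Map (path, sorted parameter names) -> URLs sharing that shape."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        params = parse_qsl(parts.query, keep_blank_values=True)
        if not params:
            continue  # no query string, nothing to group
        names = tuple(sorted({name for name, _ in params}))
        groups[(parts.path, names)].append(url)
    return dict(groups)
```

The result is a short list of distinct endpoints instead of hundreds of near-identical links, which makes the site easier to document.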
I can’t really tell you how many of these sites run outdated packages or outdated software with known vulnerabilities, so usually you look for SQL injection, directory traversal, and so on; more on this later. Once the data is gathered and you have an idea of the overall site, you move to the other site, “B”, and do exactly the same.
Once you’ve gathered the data, you start to cross-reference intel and build a report. Usually, at this point, I start noticing the resemblance between the two sites: the same technology is used, along with some shared small details. All this data must be captured and reported as you conduct the investigation; it’s like writing a pentest report. You document what’s going on as you move, because .onion sites often go down for prolonged periods or disappear entirely.
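Cross-referencing can be as simple as set intersections over what you collected from each site. The sketch below assumes each site's findings are stored as a dict of category to values; the category names and the two helper functions are made up for illustration. The Jaccard score is just a convenient way to put a number on "these two look alike"; it is not proof of common ownership.

```python
def shared_indicators(site_a, site_b):
    """Return the values present in both sites, per shared category."""
    overlap = {}
    for category in site_a.keys() & site_b.keys():
        common = set(site_a[category]) & set(site_b[category])
        if common:
            overlap[category] = sorted(common)
    return overlap


def jaccard(a, b):
    """Similarity of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Anything that shows up in the overlap goes straight into the report with the date it was observed, since the site may not be there tomorrow.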
Alright, we’re trying to find a link between two different onion sites. Now, past the information gathering phase, we start by identifying the administrators and popular vendors of site “A”. So, we begin by creating an intelligence profile, taking note of the following:
- Username / Alias
- Date of account creation / Online, Offline (map out an activity pattern)
- PGP public key (Important! Reused keys indicate related accounts)
- Type of merchandise offered
- Methods of contact
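One way to keep these profiles consistent is a small record type plus two helpers: one that groups profiles by PGP fingerprint to surface reused keys, and one that turns observed online times into an hour-of-day histogram. This is a sketch under my own assumptions; the `ActorProfile` fields mirror the list above, and the names and timestamp format are illustrative.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ActorProfile:
    alias: str
    source: str  # site or forum where the profile was observed
    pgp_fingerprint: str = ""
    merchandise: list = field(default_factory=list)
    contacts: list = field(default_factory=list)
    seen_online: list = field(default_factory=list)  # ISO 8601 timestamps


def normalize_fingerprint(fingerprint):
    """Strip whitespace and upper-case so formatting differences don't matter."""
    return "".join(fingerprint.split()).upper()


def group_by_pgp(profiles):
    """Return fingerprint -> profiles for keys used by more than one profile."""
    by_key = defaultdict(list)
    for profile in profiles:
        fingerprint = normalize_fingerprint(profile.pgp_fingerprint)
        if fingerprint:
            by_key[fingerprint].append(profile)
    return {fp: ps for fp, ps in by_key.items() if len(ps) > 1}


def activity_by_hour(profile):
    """Count observations per hour of day to sketch an activity pattern."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in profile.seen_online)
    return dict(sorted(hours.items()))
```

A fingerprint shared by a marketplace admin and a forum account is a strong lead; the hourly histogram is weaker evidence but can hint at a time zone.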
Now, it’s time to translate the intel we collected into actors, events, and attributes. First things first, remember that thing about the offensive approach? Well, in the data, I noticed a couple of things.
Site “B” is running on the Apache web server, which is fairly standard. However, where it gets interesting is that the site operator seems to have forgotten to disable the Apache status module, also known as mod_status or server-status. This module provides information about the requests Apache is currently serving and has recently served, such as:
The time the server was last started or restarted, how long it has been running, and details about the current hosts and requests being processed.
There was more juicy intel in there, but we keep moving as per the playbook. So, two profiles of interest showed up in the data regarding site “A”: one labeled as Admin and another as a vendor. Both were engaged in multiple activities like leaking sensitive data and such. I decided to focus more on gathering information about these profiles.
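To show what can be derived from a status page, here is a small parser for the machine-readable variant that mod_status serves at server-status?auto, which is a list of "Key: value" lines. I'm assuming you have already saved that text; the sample values in the usage note are invented, and `summarize_status` is a hypothetical helper. The per-request host details come from the HTML view and are not covered here.

```python
from datetime import datetime, timedelta


def parse_server_status(text):
    """Parse 'Key: value' lines from a saved server-status?auto response."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize_status(fields, observed_at):
    """Estimate the last (re)start time from the reported uptime in seconds."""
    uptime = timedelta(seconds=int(fields.get("Uptime", 0)))
    return {
        "uptime": uptime,
        "started_at": observed_at - uptime,
        "total_accesses": int(fields.get("Total Accesses", 0)),
        "busy_workers": int(fields.get("BusyWorkers", 0)),
        "idle_workers": int(fields.get("IdleWorkers", 0)),
    }
```

For example, feeding it "Uptime: 86400" observed at 2021-01-02 00:00 yields a start time of 2021-01-01 00:00. Restart times that line up across two sites are one more data point for the same-operator hypothesis.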
My goal is to determine whether I can link them to any profiles or accounts on other sites or forums. Starting with the administration of Site A, it is now time to cross-reference that intelligence profile and search for any useful information. I was able to trace the username across various forums, one of which is xss[.]is. In case you’re unfamiliar, “xss is a Russian forum that hosts discussions on vulnerabilities, exploitation, malware, and various other cyber-related topics.” Great forum.
To thoroughly scope the profile, I ran a quick crawl, which retrieved a link. Upon checking the link, I received no response, indicating that the site is no longer operational. Plus, the profile associated with it has been inactive since its last recorded activity in 2021, which means the link is down and the profile was created before all this mess. So I ran the site through the Wayback Machine to see if any snapshots had been taken.
There were snapshots dating back to 2018, with more from 2019, 2020, and 2021.
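If you prefer to script the snapshot lookup rather than click through the calendar, the Wayback Machine exposes a CDX search endpoint that returns captures as JSON. The sketch below builds such a query and counts captures per year. The function names are mine, and `fetch_snapshots` needs network access, so treat it as untested against the live service.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain, first_year=None, last_year=None):
    """Build a CDX query covering the domain and its subdomains."""
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains
        "output": "json",
        "fl": "timestamp,original,statuscode",
    }
    if first_year:
        params["from"] = str(first_year)
    if last_year:
        params["to"] = str(last_year)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshots_per_year(rows):
    """Count captures per year; the first row of JSON output is the header."""
    if not rows:
        return {}
    header, *records = rows
    ts = header.index("timestamp")
    return dict(sorted(Counter(r[ts][:4] for r in records).items()))


def fetch_snapshots(domain, **years):
    """Query the live CDX endpoint (requires network access)."""
    with urlopen(build_cdx_url(domain, **years), timeout=30) as response:
        return json.load(response)
```

The `original` column also lists every captured URL, which is a quick way to spot subdomains that no longer resolve.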
As we go through the snapshots, we discover a subdomain labeled ‘Sub’ and ‘Services.’ Interestingly, it turns out that the site was offering pen-testing services before it went offline. What’s even more intriguing is that before it transformed into a service, it started as a personal blog in the 2018 snapshot. This blog featured articles on hacking, tools, and related topics.
After carefully examining all of this, we managed to collect the following information:
+-------------------------------------------+----------------------------------+
| Site A | Site B |
+-------------------------------------------+----------------------------------+
| Bitcoin Address | Bitcoin Address |
| A name linked to the administration | Email Address |
| alias | |
| Country of residence | |
| Email Address | |
+-------------------------------------------+----------------------------------+
Now that we’ve managed to collect and analyze some valuable information on the administration of Site A, and we know that the Admin has an interest in hacking services, we still need a key that links the site to it.
I did some research and hunted for the email address to see where they’re connected and what services they’ve been using, which all led me to a set of digital footprints of a guy across the web, from social networks like LinkedIn to a YouTube channel and so on. With a possible identification of the owner of Site A, it’s time to examine the evidence provided by the bitcoin addresses.
Of course, along the way, you piece together the TTPs (Tactics, Techniques, and Procedures) used by these actors to gain insight into their operational methods.
Blockchain Forensics
From a single Bitcoin address, various insights can be derived, including the total number of transactions, the sources and amounts of incoming funds, the destinations and amounts of outgoing funds, a historical timeline of transactions, and identification of other associated Bitcoin addresses within the same wallet. This is where Wallet Explorer and OTX come into play.
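As a sketch of what those insights look like in code, the snippet below summarizes one address from a list of transfers you have exported from an explorer. Note the simplification: real Bitcoin transactions have multiple inputs and outputs, and here each record is flattened to a single sender/receiver pair. The `Transfer` type and its field names are my own assumptions, not an explorer's export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    txid: str
    date: str  # ISO date, e.g. "2019-06-01"
    sender: str
    receiver: str
    amount_sat: int  # satoshis, to avoid floating point rounding


def summarize_address(address, transfers):
    """Totals, timeline bounds and counterparties for a single address."""
    incoming = [t for t in transfers if t.receiver == address]
    outgoing = [t for t in transfers if t.sender == address]
    touched = incoming + outgoing
    dates = sorted(t.date for t in touched)
    return {
        "tx_count": len({t.txid for t in touched}),
        "received_sat": sum(t.amount_sat for t in incoming),
        "sent_sat": sum(t.amount_sat for t in outgoing),
        "first_seen": dates[0] if dates else None,
        "last_seen": dates[-1] if dates else None,
        "sources": sorted({t.sender for t in incoming}),
        "destinations": sorted({t.receiver for t in outgoing}),
    }
```

The first_seen and last_seen values are what you line up against the timeline of the sites and forum profiles.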
Usually, the first approach is to start looking for patterns and correlations to link multiple addresses. We also map the flow of funds and relationships between addresses to uncover suspicious activities or money laundering schemes, and extract and analyze additional data associated with transactions, such as timestamps, to gain further insights.
With these tools, we are able to identify any other bitcoin addresses owned by the same wallet.
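Grouping addresses into a "wallet" is commonly done with the common-input-ownership heuristic: addresses spent together as inputs of the same transaction are assumed to share an owner. I'm assuming something along those lines sits behind the tooling; the sketch below is my own minimal version using union-find, with each transaction represented only by its list of input addresses.

```python
def cluster_by_common_inputs(transactions):
    """Cluster addresses that appear together as inputs of a transaction.

    `transactions` is an iterable of input-address lists, one per transaction.
    This is a heuristic: shared inputs suggest, but do not prove, one owner.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path compression
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        inputs = list(inputs)
        if not inputs:
            continue
        find(inputs[0])  # register single-input transactions too
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())
```

If the addresses from Site A and Site B end up in the same cluster, that is the link you were looking for, subject to the usual caveat that mixers and shared services break this heuristic.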
When we input the address into the explorer, the displayed data includes transaction records, each with specific information like dates and the amounts sent or received. Notably, one of the transactions received funds from an unfamiliar sender (address beginning with “06f”), allowing us to discern the shared ownership of these addresses and subsequently unveil the complete wallet.
With a transaction history dating back to 2019, we now have a time frame that matches our investigation. Let’s proceed to scrutinize the individual transactions associated with each of these Bitcoin addresses.
These two sites are related, since their bitcoin addresses belong to the same wallet, confirming that the same actor is behind both. The transaction history shows how funds have moved in and out of each address, potentially revealing patterns or connections to other addresses.
Most of the transactions paid into these accounts resemble normal transactions when viewed on the blockchain. However, upon further examination, some transactions involve multiple addresses, possibly indicating the use of a bitcoin mixing service. This is normal, as many actors use a mixing service, or cryptocurrency tumbler, to preserve anonymity by essentially scrambling the addresses and the payments made.
In that case, the counterparty address likely belongs to an exchange or a well-used bitcoin tumbling service, which would explain the large number of bitcoin addresses held in its wallet and its ability to scramble transactions.
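A crude way to flag candidates for manual review is to look for the shape that mixing-style transactions often have: several inputs and several outputs of exactly the same value. The thresholds and the function name below are assumptions I picked for illustration; an exchange batching payouts can look similar, so a hit means "look closer", nothing more.

```python
from collections import Counter


def looks_like_mixing(input_count, output_values, min_inputs=3, min_equal_outputs=3):
    """Flag transactions with many inputs and several equal-valued outputs.

    `output_values` are the output amounts in satoshis. Thresholds are
    illustrative defaults, not established cut-offs.
    """
    if input_count < min_inputs or not output_values:
        return False
    most_repeated = Counter(output_values).most_common(1)[0][1]
    return most_repeated >= min_equal_outputs
```

For example, five inputs paying five outputs of 100000 satoshis each plus one odd-sized change output would be flagged, while an ordinary payment with one input and two outputs would not.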
Lessons Learned
- Leverage Blockchain Forensics: Bitcoin addresses are a goldmine for uncovering connections.
- Exploit Misconfigurations: Small oversights, like an enabled mod_status module, can provide valuable intelligence.
- Think Like the Adversary
Final Thoughts
It’s important to note that using open-source information from the dark web and blockchain can only take you so far. We’ve pieced together information from various sources to uncover connections between seemingly unrelated elements, revealing hidden activities. Remember, the key to success is persistence: there’s always more to uncover, and every investigation is an opportunity to learn something new.
Until next time, happy hunting!
CTI Research