OSINT – theory and practice
Framework
The framework of open source intelligence (OSINT) consists of both the sources of the data being sought and the methods used to obtain and analyze it. The whole framework depends on the goals and capacities of the research in which OSINT is used. This means that two OSINT projects with different goals will most likely have completely different frameworks, and this can happen even for projects with the same goal. For example, this year has seen a surge in the use of OSINT techniques to track the latest developments in the war in Ukraine.
While sharing the same general goal of seeing as far as possible through the fog of war, different researchers have their own subgoals, e.g. tracking weaponry losses (the Oryx project) or tracking army movements (the Conflict Intelligence Team). Researchers also use a wide variety of methods, from analyzing social media posts, photos and videos to using plane- and ship-tracking services and even the traffic data of Google Maps to follow army movements.
Goal of research
In many cases OSINT research starts with a certain goal, and this goal shapes the whole framework: which data needs to be acquired, where to search for it and how to analyze it. However, there are cases where the framework is defined by the data. This can happen after leaks of documents, personal information or other data. Examples include the WikiLeaks project as a whole, where investigators worked with leaked secret documents, and the investigations that followed the leak of the client database of Yandex’s food delivery service, which, among other things, made it possible to uncover properties owned by Putin’s inner circle.
Sources of information
As mentioned above, OSINT can work with any data that is open to the public. Generally, the sources of information can be divided into a few categories:
- Internet
  - Social media
  - Blogs and forums
  - Maps and tracking services
  - Web analysis services like Google Analytics
  - Other online publications
- Media
  - Magazines and newspapers
  - TV
  - Radio
  - Online outlets
- Government data
  - Official declarations
  - Land registries
  - Government contracts
  - Other documents
  - Speeches of officials
- Academic publications
- Commercial data
  - Databases
  - Other services that can provide the necessary data (e.g. satellite imagery sources, company information, etc.)
All of these sources can overlap, since nowadays most government information, media, academic publications, etc. are available on the internet.
Tools to collect data
The tools for collecting the necessary data depend on the type of data. Generally, the main sources for open source intelligence are social media and, most importantly, Twitter. The reason for this is the website's focus on news and current events and its powerful advanced search capabilities. Using TweetDeck, a researcher can formulate a search query for what they are looking for and get real-time updates. The Twitter API also offers broad capabilities for collecting and structuring its data. The APIs of other social media can be used as well, but they are much more limited.
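As an illustration of formulating such a search request, here is a minimal sketch of a helper that assembles a query from structured options. The operators it emits (`from:`, `since:`, `until:`, `lang:`, `filter:media`, quoted phrases, `OR`, `-word`) are standard Twitter advanced-search operators; the function names, the example keywords and the use of the `https://twitter.com/search?q=` URL are assumptions for illustration, and the resulting string can equally be pasted into a TweetDeck search column.

```javascript
// Hypothetical helper: builds a Twitter advanced-search query from options.
function buildTwitterQuery({ phrase, anyOf = [], exclude = [], from, since, until, lang, mediaOnly = false } = {}) {
  const parts = [];
  if (phrase) parts.push(`"${phrase}"`);                   // exact phrase
  if (anyOf.length) parts.push(`(${anyOf.join(" OR ")})`); // any of these words
  for (const word of exclude) parts.push(`-${word}`);      // exclude words
  if (from) parts.push(`from:${from}`);                    // author account
  if (since) parts.push(`since:${since}`);                 // YYYY-MM-DD, from this date
  if (until) parts.push(`until:${until}`);                 // YYYY-MM-DD, before this date
  if (lang) parts.push(`lang:${lang}`);                    // tweet language
  if (mediaOnly) parts.push("filter:media");               // only tweets with photos/videos
  return parts.join(" ");
}

// Web search URL for the same query.
function twitterSearchUrl(query) {
  return "https://twitter.com/search?q=" + encodeURIComponent(query);
}

const q = buildTwitterQuery({
  phrase: "convoy",
  anyOf: ["tank", "BMP"],
  since: "2022-03-01",
  until: "2022-03-08",
  mediaOnly: true,
});
console.log(q); // "convoy" (tank OR BMP) since:2022-03-01 until:2022-03-08 filter:media
console.log(twitterSearchUrl(q));
```

Keeping the query as data rather than a hand-typed string makes it easy to rerun the same search with a shifted date window.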
OSINT also relies heavily on search engines, so it is a good idea to learn their advanced search operators. In addition, it can be useful to use more than one search engine, as some information may be removed from the results for legal reasons or due to terms of service.
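To run the same advanced query against several engines at once, a minimal sketch might combine common operators (`site:`, `filetype:`, `intitle:`, quoted phrases) and produce one search URL per engine. The base URLs are the engines' ordinary public search pages; the helper names and example values are hypothetical, and operator support differs between engines, so each engine's documentation should be checked.

```javascript
// Public search-page URLs (assumed current; verify before relying on them).
const ENGINES = {
  google: "https://www.google.com/search?q=",
  bing: "https://www.bing.com/search?q=",
  duckduckgo: "https://duckduckgo.com/?q=",
  yandex: "https://yandex.com/search/?text=",
};

// Hypothetical helper: combine common advanced-search operators into one query.
function buildDork({ terms = [], site, filetype, intitle } = {}) {
  const parts = terms.map(t => (t.includes(" ") ? `"${t}"` : t)); // quote multi-word phrases
  if (site) parts.push(`site:${site}`);
  if (filetype) parts.push(`filetype:${filetype}`);
  if (intitle) parts.push(`intitle:${intitle}`);
  return parts.join(" ");
}

// One URL per engine, so result sets can be compared for missing entries.
function searchUrls(query) {
  return Object.fromEntries(
    Object.entries(ENGINES).map(([name, base]) => [name, base + encodeURIComponent(query)])
  );
}

const query = buildDork({ terms: ["annual report", "2021"], site: "example.gov", filetype: "pdf" });
console.log(query); // "annual report" 2021 site:example.gov filetype:pdf
console.log(searchUrls(query));
```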
To make full use of traditional media in your research, it helps to have subscriptions to the biggest news agencies and outlets. As these subscriptions can be really expensive, especially when one needs all of them, it is also a good idea to learn how to get around paywalls: in many cases this can be done easily with the browser's incognito mode or a web archive service.
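For the web archive route, a minimal sketch could query the Wayback Machine's availability API (`https://archive.org/wayback/available?url=...`, with an optional `timestamp` in `YYYYMMDDhhmmss` form), which returns JSON whose `archived_snapshots.closest` entry points to the nearest stored copy. The function names are hypothetical, and the network call assumes Node 18+ with a global `fetch`. It is defined but not executed here.

```javascript
// Build an availability-API request URL for a page (timestamp is optional).
function availabilityUrl(pageUrl, timestamp) {
  const params = new URLSearchParams({ url: pageUrl });
  if (timestamp) params.set("timestamp", timestamp); // prefer the snapshot closest to this time
  return "https://archive.org/wayback/available?" + params.toString();
}

// Extract the closest snapshot URL from the JSON response, or null if none exists.
function closestSnapshot(response) {
  const closest = response?.archived_snapshots?.closest;
  return closest && closest.available ? closest.url : null;
}

// Network lookup (Node 18+ global fetch); not executed in this sketch.
async function findSnapshot(pageUrl, timestamp) {
  const res = await fetch(availabilityUrl(pageUrl, timestamp));
  return closestSnapshot(await res.json());
}

// A response shaped like the one the API documents:
const sample = {
  url: "example.com",
  archived_snapshots: {
    closest: {
      status: "200",
      available: true,
      url: "http://web.archive.org/web/20130919044612/http://example.com/",
      timestamp: "20130919044612",
    },
  },
};
console.log(closestSnapshot(sample));
```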
For official government information, in many cases it is possible to subscribe to RSS feeds or email updates about new documents and press releases. If this is not possible, one can write a script that fetches the page of interest and sends a notification whenever its content changes. Some services, such as government contract registers, may require extensive training to analyze. Others, such as land registries, charge a fee for their information in many countries, so they might not be the best starting point for collecting data.
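Such a change-detection script can be sketched in a few lines of Node.js: hash each fetched page, remember the last hash per URL and report when it differs. The class and function names are hypothetical; `notify` stands in for whatever channel is used (email, a chat bot, etc.), and the polling function assumes Node 18+ with a global `fetch` and is not executed here.

```javascript
const crypto = require("crypto");

// SHA-256 of a page's content; comparing hashes avoids storing whole pages.
function contentHash(text) {
  return crypto.createHash("sha256").update(text).digest("hex");
}

// Remembers the last hash seen per URL and reports whether the content changed.
class PageWatcher {
  constructor() {
    this.lastHashes = new Map(); // url -> hex hash of the last fetched content
  }

  // True if the content differs from the previous check; the first check is a baseline.
  check(url, text) {
    const hash = contentHash(text);
    const previous = this.lastHashes.get(url);
    this.lastHashes.set(url, hash);
    return previous !== undefined && previous !== hash;
  }
}

// Fetch-and-notify pass over a list of URLs (run it periodically, e.g. via cron).
async function poll(watcher, urls, notify) {
  for (const url of urls) {
    const res = await fetch(url);
    if (watcher.check(url, await res.text())) notify(url);
  }
}

const watcher = new PageWatcher();
console.log(watcher.check("https://example.gov/news", "<li>Doc 1</li>")); // false (baseline)
console.log(watcher.check("https://example.gov/news", "<li>Doc 1</li><li>Doc 2</li>")); // true
```

Real pages often contain dynamic fragments (timestamps, counters, ads), so in practice one would hash only the relevant part of the page, such as the list of documents, to avoid false alarms.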
Researchers might also need subscriptions to commercial services for their analysis. Examples of such services include Flightradar24, Similarweb, Himera Search and others. In addition, there are services, for example Telegram bots, that search known data breaches for a given entry.
A researcher should use a VPN, both for security and for access to information, and the ability to choose servers in different countries is also useful. Different countries have different information laws, and some services restrict access for foreign users, so not all the needed data may be accessible from the researcher's location.
Data organization
An OSINT enthusiast may be adept at data collection, but without the necessary data organization skills and tools, he or she will never become a true professional. There are numerous ways to store data, including plain text files or notes. However, text files become impractical and unmanageable once the amount of data grows large. Desirable features for OSINT data management include export and backup capabilities, as well as data visualization.
Examples of software for OSINT data organization and their disadvantages:
- Simple Notes Apps (unmanageable when dealing with a large amount of data)
- Evernote (really useful only on a paid plan)
- Notion (notes cannot be accessed offline)
- Joplin (inconvenient organization for large projects)
- Obsidian.md (a bit tricky to master)
Obsidian.md
Although Obsidian.md is more complex than a simple notes application, it has all the desirable features. It is a free, cross-platform application for organizing notes stored as Markdown (.md) files. Notes and files are stored on the user's computer. There is also a paid sync feature, but it is superfluous, since backups can be made with any online storage service, Syncthing or Git. Given that OSINT specialists often work in teams, it is recommended to store the data in a Git repository in order to keep a history of modifications and make collaboration easier.
Vaults
Obsidian.md keeps all data in so-called "vaults". A vault is a project folder that holds all of its associated notes and files.
Plugins
Obsidian.md supports the installation of community plugins that extend the app's initial functionality.
Recommended plugins
- Dataview – Allows us to treat a vault as a database, querying and visualizing information from notes and files.
- Breadcrumbs – Adds link types and a note hierarchy.
- Juggl – Creates mind maps based on your notes; their look can be customized with CSS and built-in styling features.
Plugin installation
- Open Settings – the button is in the bottom-left corner of the application.
- Choose 'Community Plugins' from the 'Options' section.
- Switch 'Safe Mode' to OFF and confirm it.
- Click 'Browse Community Plugins'.
- Find the plugin.
- Click 'Install'.
- Go back to the 'Community Plugins' submenu.
- In the bottom section turn on the newly installed plugin.
Folders vs Tagging and Linking
A simple folder structure is sufficient when data is organized into non-overlapping groups. For example, a couple of folders are enough for a photo gallery. In OSINT, however, a more sophisticated structure is needed.
Tagging
Tagging adds structure because a piece of data can have several tags, whereas with folders each file belongs to exactly one organizing unit.
Tag structure example:
- #people #processes #technology (target area)
- #primary #supportive #irrelevant (importance)
- #finished #unfinished (state of note/file)
- #web #registry #socialengineering (means of getting the information)
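To illustrate why tags handle overlapping groups better than folders, here is a minimal sketch that builds a tag index over a few notes and answers "which notes carry all of these tags". The tag syntax is simplified (an assumption): `#` followed by letters, digits, `_`, `-` or `/`, lower-cased because Obsidian treats tags case-insensitively. The note names and contents are hypothetical.

```javascript
// Extract #tags from a note's text (simplified tag syntax).
function extractTags(text) {
  const tags = new Set();
  for (const m of text.matchAll(/(?:^|\s)#([\p{L}\p{N}_\/-]+)/gu)) {
    tags.add(m[1].toLowerCase());
  }
  return tags;
}

// tag -> [note names]; a note appears under every tag it carries,
// unlike a folder tree where each file lives in exactly one place.
function buildTagIndex(notes) {
  const index = new Map();
  for (const [name, text] of Object.entries(notes)) {
    for (const tag of extractTags(text)) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(name);
    }
  }
  return index;
}

// Notes that carry all the requested tags (e.g. primary AND unfinished).
function notesWithAll(index, ...tags) {
  const [first = new Set(), ...rest] = tags.map(t => new Set(index.get(t) ?? []));
  return [...first].filter(name => rest.every(s => s.has(name)));
}

const notes = {
  "John": "Domain buyer. #people #primary #unfinished #web",
  "legit.com": "Domain record. #technology #primary #finished #registry",
  "Helpdesk process": "#processes #supportive #unfinished #socialengineering",
};
const index = buildTagIndex(notes);
console.log(notesWithAll(index, "primary", "unfinished")); // [ 'John' ]
```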
Linking
Linking enables the creation of relationships between notes and files. This way, one note can include connections to other notes and files, which makes the data easier to navigate. For example, if John purchased the domain name legit.com, John's note can be linked to legit.com's note, which contains information about the domain.
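Obsidian writes internal links as `[[Note]]`, optionally with a heading (`[[Note#Heading]]`) or display text (`[[Note|alias]]`). The following minimal sketch extracts those targets and builds a backlink index, so that from legit.com's note one can see every note pointing to it. The parser is deliberately simplified, and the note names other than John and legit.com are hypothetical.

```javascript
// Extract link targets from [[wikilinks]], ignoring "#heading" and "|alias" parts.
function extractLinks(text) {
  const targets = [];
  for (const m of text.matchAll(/\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]/g)) {
    targets.push(m[1].trim());
  }
  return targets;
}

// Forward links plus the reverse (backlink) index.
function buildLinkGraph(notes) {
  const links = new Map();
  const backlinks = new Map();
  for (const [name, text] of Object.entries(notes)) {
    const targets = [...new Set(extractLinks(text))]; // de-duplicate per note
    links.set(name, targets);
    for (const target of targets) {
      if (!backlinks.has(target)) backlinks.set(target, []);
      backlinks.get(target).push(name);
    }
  }
  return { links, backlinks };
}

const graph = buildLinkGraph({
  "John": "Purchased [[legit.com]].",
  "legit.com": "Registrar record, see [[John|owner]].",
  "Invoice 17": "Payment for [[legit.com#Hosting|hosting]].",
});
console.log(graph.backlinks.get("legit.com")); // [ 'John', 'Invoice 17' ]
```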
Link types
Using link types opens up even more possibilities. Link types are provided by the Breadcrumbs plugin for Obsidian.md. In the situation of John and legit.com described above, John is the domain's owner, and thus the domain is John's asset. These are called relation types. If it is later revealed that John purchased another domain name, fake.com, the new domain can also be connected back to John. The notes then display this structure, built from John's two ownership relations, as follows:
- John – owner of legit.com, fake.com
- legit.com – asset of John, relative of fake.com
- fake.com – asset of John, relative of legit.com
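The logic behind the three lines above can be sketched in plain JavaScript. This is not the Breadcrumbs API, only a hypothetical model of the same idea: only the "owner of" edges are entered by hand, the inverse "asset of" relation is derived, and assets that share an owner are derived as relatives of each other.

```javascript
const LABELS = { ownerOf: "owner of", assetOf: "asset of", relativeOf: "relative of" };

// ownerships: list of [owner, asset] pairs entered by the researcher.
function deriveRelations(ownerships) {
  const rel = new Map(); // note -> { ownerOf, assetOf, relativeOf }
  const entry = note => {
    if (!rel.has(note)) rel.set(note, { ownerOf: [], assetOf: [], relativeOf: [] });
    return rel.get(note);
  };
  for (const [owner, asset] of ownerships) {
    entry(owner).ownerOf.push(asset); // John -> owner of -> legit.com
    entry(asset).assetOf.push(owner); // legit.com -> asset of -> John (derived inverse)
  }
  // Assets sharing an owner become relatives of each other.
  for (const r of [...rel.values()]) {
    for (const a of r.ownerOf) {
      for (const b of r.ownerOf) {
        if (a !== b && !rel.get(a).relativeOf.includes(b)) rel.get(a).relativeOf.push(b);
      }
    }
  }
  return rel;
}

// One line per note, listing only the non-empty relation types.
function describe(rel) {
  return [...rel].map(([note, r]) => {
    const parts = Object.entries(r)
      .filter(([, targets]) => targets.length > 0)
      .map(([type, targets]) => `${LABELS[type]} ${targets.join(", ")}`);
    return `${note} - ${parts.join(", ")}`;
  });
}

const relations = deriveRelations([["John", "legit.com"], ["John", "fake.com"]]);
console.log(describe(relations).join("\n"));
// John - owner of legit.com, fake.com
// legit.com - asset of John, relative of fake.com
// fake.com - asset of John, relative of legit.com
```

Deriving inverses instead of typing them into both notes keeps the two sides of a relation from drifting apart as new facts are added.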
Dataview plugin
Dataview is, first and foremost, a data index, so it supports relatively rich ways of adding metadata to your knowledge base. Dataview tracks information at the level of Markdown pages and Markdown tasks, and each page or task can contain an arbitrary number of complex fields (numbers, objects, lists). Each field is a named value of a specific type (like "number" or "text").
Example of notes with arbitrary metadata and a tag:
jason_statham.md
---
name: Jason Statham
salary: 7500
department: Cyber Forensics
notes: [ "Potential phishing target", "Mother has stage T4 cancer" ]
---
#employee
bruce_lee.md
---
name: Bruce Lee
salary: 8000
department: Developer Operations
notes: []
---
#employee
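To show how such frontmatter turns into typed fields (number, text, list), here is a deliberately simplified parser in plain JavaScript. It is an illustrative sketch, not Dataview's implementation: it assumes flat `key: value` lines, bare numbers and JSON-style inline lists only, whereas real frontmatter is YAML and needs a proper YAML parser.

```javascript
// Split a note into frontmatter (between the leading "---" lines) and body,
// then parse each "key: value" line into a typed field.
function parseNote(text) {
  const match = text.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/);
  if (!match) return { fields: {}, body: text };
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue; // not a key: value line
    fields[line.slice(0, sep).trim()] = parseValue(line.slice(sep + 1).trim());
  }
  return { fields, body: match[2] };
}

// Numbers -> number, [ ... ] -> list (parsed as JSON), anything else -> text.
function parseValue(raw) {
  if (raw !== "" && !Number.isNaN(Number(raw))) return Number(raw);
  if (raw.startsWith("[")) {
    try { return JSON.parse(raw); } catch { /* not valid JSON: keep as text */ }
  }
  return raw;
}

const note = [
  "---",
  "name: Jason Statham",
  "salary: 7500",
  "department: Cyber Forensics",
  'notes: [ "Potential phishing target", "Mother has stage T4 cancer" ]',
  "---",
  "#employee",
].join("\n");

const { fields, body } = parseNote(note);
console.log(typeof fields.salary, Array.isArray(fields.notes), body); // number true #employee
```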
Querying dataview data
Options for querying data:
- Dataview query language
- Dataview JavaScript API
Both can be used, for example, to render a table from jason_statham.md and bruce_lee.md with four columns:
- File – contains a link to the file
- Name – metadata 'name'
- Salary – metadata 'salary'
- Department – metadata 'department'
The table can also be sorted by 'salary'.
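The result both methods are expected to produce can be sketched in plain JavaScript without Obsidian: keep the pages tagged #employee, sort them by salary and emit the four columns. The page objects below are hand-written stand-ins for the two parsed notes, assumed here to both carry the #employee tag; the function name is hypothetical.

```javascript
// Stand-ins for the two parsed notes (both assumed to be tagged #employee).
const pages = [
  { file: "bruce_lee.md", tags: ["employee"], name: "Bruce Lee", salary: 8000, department: "Developer Operations" },
  { file: "jason_statham.md", tags: ["employee"], name: "Jason Statham", salary: 7500, department: "Cyber Forensics" },
];

// Rough equivalent of: TABLE name, salary, department FROM #employee SORT salary ASC|DESC
function employeeTable(pages, direction = "asc") {
  const sign = direction === "asc" ? 1 : -1;
  const rows = pages
    .filter(p => p.tags.includes("employee"))            // FROM #employee
    .sort((a, b) => sign * (a.salary - b.salary))        // SORT salary
    .map(p => [p.file, p.name, p.salary, p.department]); // the four columns
  return [["File", "Name", "Salary", "Department"], ...rows];
}

console.table(employeeTable(pages));
```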
Dataview query language
The Dataview query language is a simple, structured custom query language for quickly creating views of your data. It supports:
- Retrieving pages by tags, folders, links and more.
- Filtering notes/data with simple operations on fields, such as comparisons and existence checks.
- Sorting results by their fields.
The query language can generate the following view types:
- TABLE: The standard view type; one row for each data point, with multiple columns of field data.
- LIST: A list of pages that match the query. Each page can be shown with a single associated value.
- TASK: An interactive list of tasks from the pages that match the query.
To query data with the Dataview query language, use a code block with the 'dataview' language specifier.
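Example data queries
The general query format comes first, followed by a query over jason_statham.md and bruce_lee.md that renders a table with File, Name, Salary and Department columns, sorted by salary in ascending order.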
The general format of queries:
```dataview
TABLE|LIST|TASK <field> [AS "Column Name"], <field>, ..., <field>
FROM <source> (like #tag or "folder")
WHERE <expression> (like 'field = value')
SORT <expression> [ASC/DESC] (like 'field ASC')
```
Example with jason_statham.md and bruce_lee.md
```dataview
TABLE name AS "Name", salary AS "Salary", department AS "Department"
FROM #employee
SORT salary ASC
```
Dataview JavaScript API
The Dataview JavaScript API allows arbitrary JavaScript to be executed with access to the Dataview indices and query engine, which is useful for complex views or interoperability with other plugins. To query data with the Dataview JavaScript API, use a code block with the 'dataviewjs' language specifier. The API is accessible via the implicitly provided dv (or dataview) variable, which allows you to query for data, render HTML and configure the view.
Example with jason_statham.md and bruce_lee.md
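```dataviewjs
// All pages tagged #employee, sorted by salary (lowest first),
// mapped to one row per page: [file link, name, salary, department]
let employees = dv.pages("#employee")
    .sort(emp => emp.salary, "asc")
    .map(emp => [emp.file.link, emp.name, emp.salary, emp.department])
// Render the rows as a four-column table
dv.table(["File", "Name", "Salary", "Department"], employees)
```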
Conclusion
There is no defined standard for OSINT data organization, because the data may come in many different forms, including, but not limited to, web pages, paper documents, online calendars, and video and audio recordings. Because of this, it is nearly impossible to create one convenient tool for all use cases. If an operation is big enough, it might be worth creating a dedicated web application that stores all the necessary data in a database. However, since OSINT is usually a highly confidential activity, publishing such an application on the clear web is a privacy and security risk.