OSINT – theory and practice
Revision as of 17:04, 24 April 2022
Framework
The framework for open source intelligence (OSINT) comprises both the sources of the data being sought and the methods used to obtain and analyze it. The framework depends on the goal and capacities of the research in which OSINT is used, so two OSINT projects with different goals will most likely have completely different frameworks. This can happen even for research projects with the same goal. For example, this year OSINT techniques have emerged as a way of tracking the latest developments in the war in Ukraine.
While sharing the same general goal, looking as deep as possible into the fog of war, different researchers have their own subgoals, e.g. tracking weaponry losses, like the Oryx project, or tracking army movements, like the Conflict Intelligence Team. In addition, researchers use a wide variety of methods, from analyzing social media posts, photos and videos to using plane- and ship-tracking services and even the traffic functions of Google Maps to track the movement of armies.
Goal of research
In many cases OSINT research starts with a certain goal, and this goal shapes the whole framework: which data needs to be acquired, where it is searched for and how it is analyzed. However, there are cases when the framework is defined by the data. This can happen after leaks of documents, personal information or any other data. Examples include the WikiLeaks project, where investigators worked with leaked secret documents, and the investigations that followed the leak of customer data from Yandex's food delivery service, which among other things helped uncover properties owned by Putin's close circle.
Sources of information
As mentioned above, OSINT can work with any data that is open to the public. Generally, the sources of information can be divided into a few categories:
- Internet
  - Social media
  - Blogs and forums
  - Maps and tracking services
  - Web analysis services like Google Analytics
  - Other online publications
- Media
  - Magazines and papers
  - TV
  - Radio
  - Online outlets
- Government data
  - Official declarations
  - Land registries
  - Government contracts
  - Other documents
  - Speeches of officials
- Academic publications
- Commercial data
  - Databases
  - Other services that can provide necessary data (e.g. satellite image sources, company information, etc.)
All of these sources can be interlinked, since nowadays most government information, media, academic publications, etc. are available on the internet.
Tools to collect data
Social media
The tools used to collect data depend on the type of data. Generally, the main sources for open source intelligence are social media and, most importantly, Twitter. The reasons for this are the site's orientation toward news and current events and its powerful advanced search capabilities. Using TweetDeck, a researcher can formulate a search query for what they seek and get real-time updates. The Twitter API also offers wide capabilities for retrieving and structuring its data. The APIs of other social media platforms can be used as well, but they are much more limited.
Search engines
OSINT also relies heavily on search engines, so it is a good idea to learn their advanced search tools. In addition, it can be useful to use more than one search engine, as some information may be removed from the results for legal reasons or under the terms of service.
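As an illustration of advanced search tools, the following Python sketch assembles a query from a few operators that major search engines commonly support (quoted phrase, `site:`, `filetype:` and `-` for exclusion). The function name `build_query` and the sample values are made up for this example, and operator support varies between engines.

```python
from urllib.parse import quote_plus


def build_query(phrase=None, site=None, filetype=None, exclude=()):
    """Combine common search operators into a single query string."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')          # exact phrase match
    if site:
        parts.append(f"site:{site}")         # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file type
    parts.extend(f"-{term}" for term in exclude)  # drop results containing a term
    return " ".join(parts)


# Hypothetical example values.
query = build_query(phrase="annual report", site="example.gov",
                    filetype="pdf", exclude=["draft"])
print(query)              # "annual report" site:example.gov filetype:pdf -draft
print(quote_plus(query))  # URL-encoded form, ready to append to a search URL
```

The same query string can be pasted into several search engines to compare which results each of them returns.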
Traditional media
To make full use of traditional media in research, it is useful to have subscriptions to the biggest agencies or outlets. As these subscriptions can be very expensive, especially when several are needed, it is also a good idea to learn how to bypass a paywall: in most cases this can be done easily with the browser's incognito mode or a web archive service. This should eliminate most of the costs and leave room for subscriptions to media with a hard paywall, such as Der Spiegel.
Government information
For official government information, it is often possible to subscribe to an RSS feed or email updates about new documents and press releases. If this is not possible, one can write a script that parses the page of interest and sends a notification whenever its content changes. Some services, such as government contract registers, may require extensive training to analyze. Other services, such as land registries, charge for their information in many countries, so they may not be the best starting point for collecting data.
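The page-monitoring script mentioned above could look roughly like the following Python sketch. It is only an illustration: the function names, the JSON state file and the example URL are assumptions, and real pages often need their dynamic parts (timestamps, session tokens) stripped before hashing, otherwise every visit looks like a change.

```python
import hashlib
import json
import urllib.request
from pathlib import Path


def fetch(url: str) -> bytes:
    """Download the raw page body (plain HTTP GET, no JavaScript rendering)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that changes whenever the content changes."""
    return hashlib.sha256(content).hexdigest()


def detect_change(url: str, content: bytes, state: dict) -> bool:
    """Compare the page fingerprint with the stored one and update `state`.

    Returns True when the page differs from the previously stored version.
    The first time a URL is seen it is recorded and reported as unchanged.
    """
    new_digest = fingerprint(content)
    old_digest = state.get(url)
    state[url] = new_digest
    return old_digest is not None and old_digest != new_digest


def load_state(path: Path) -> dict:
    """Read the stored fingerprints, or start empty if the file is missing."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_state(state: dict, path: Path) -> None:
    """Persist the fingerprints between runs."""
    path.write_text(json.dumps(state, indent=2))


# Typical use, run periodically (for example from cron):
#   state = load_state(Path("page_state.json"))
#   url = "https://example.org/press-releases"   # hypothetical URL
#   if detect_change(url, fetch(url), state):
#       print(f"Page updated: {url}")
#   save_state(state, Path("page_state.json"))
```

Hashing the whole page is the simplest possible check; comparing only the extracted list of documents would be more robust but depends on the structure of each site.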
Commercial subscriptions
A researcher might also need subscriptions to commercial services required for the analysis. Examples of such services include Flightradar24, Similarweb, Himera Search and others. In addition, there are services, for example Telegram bots, that search through known data breaches for a given entry.
VPN
A researcher should use a virtual private network (VPN) for reasons of both security and access to information. The ability to choose servers in different countries can also be useful: different countries have different information laws, and some services restrict access for foreign users, so not all the required data may be obtainable from the researcher's location.
Tools to analyze data
The analysis of the acquired data is in most cases the most challenging part of OSINT. It is easier when the information is mostly text, as text can be analyzed using parsing, programming, or a simple word search.
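A word search over collected text can be sketched in a few lines of Python. The function name `find_mentions` and the sample posts below are invented for illustration; the sketch only shows whole-word, case-insensitive matching with a short snippet of surrounding context.

```python
import re


def find_mentions(documents, keywords, context=30):
    """Return (document name, keyword, snippet) for every whole-word match."""
    hits = []
    for name, text in documents.items():
        for keyword in keywords:
            # \b limits matches to whole words; re.escape keeps the keyword literal.
            pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                start = max(match.start() - context, 0)
                end = min(match.end() + context, len(text))
                hits.append((name, keyword, text[start:end].strip()))
    return hits


# Made-up sample data for illustration only.
sample = {
    "post_1.txt": "A convoy was seen near the bridge at dawn.",
    "post_2.txt": "No traffic on the bridge today; the Convoy left yesterday.",
}
for name, keyword, snippet in find_mentions(sample, ["convoy", "bridge"]):
    print(f"{name}: [{keyword}] {snippet}")
```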
When dealing with photos or videos, it will most likely be necessary to analyze them manually, or at least to double-check the output of any algorithm used for the task.
A useful tool of analysis is visualization, especially when it comes to location-based research and big databases of structured information. The exact methodology of analysis, as well as the data collection, should be determined by the researcher at the start of the work.
Social media intelligence (SMI or SOCMINT)
SOCMINT (social media intelligence) is a collection of search methods and technologies. These forensic techniques are designed to keep records of social media platforms and their users. Many people link their social media accounts to their smartphone (cellphone number) and computer. Furthermore, social media platforms such as Facebook, Instagram, YouTube, Twitter, and Pinterest, as well as IM chat systems such as WhatsApp, Facebook Messenger, and WeChat, can map social networks (friends and contacts). In fact, their artificial intelligence algorithms are proficient at data collection and profiling. Social media intelligence is linked to intelligence gathering, open-source intelligence (OSINT), and other surveillance activities. SOCMINT can be carried out overtly or covertly. [1]
Social media content type
Data available on social media sites can be classified into two categories:
- The original content posted by the user – such as Facebook text content or an uploaded image
- The metadata associated with the original content – such as multimedia file metadata, the date/time and geolocation info associated with the posted content, the social media ID, and bookmarking (Pinterest)
People use social networking platforms for a variety of reasons. The following are some of the most common interactions seen on social networking sites:
- Post/comment: People use social media platforms to upload or write paragraphs of text that other users may see. Such posts can also include the user's location (this function is known as a "Check-in" on Facebook).
- Reply: A text message (or a picture, video, or URL) that responds to another user's post, status update, or remark.
- Multimedia content (images and videos): Multimedia is quite popular; a user may include a video or a photograph in their message. Many social media sites enable users to create albums by uploading photographs or videos. Many social media services, like Facebook, Twitter, and YouTube, offer live streaming. This feature allows users to broadcast live videos and save the recordings for subsequent viewing on their accounts.
- Social interactions: The foundation of social media sites is that individuals connect online by sending and responding to requests from other users.
- Metadata: The sum of a user's interactions with a social media network. Examples include the date and time a video/image was submitted, the date and time a friend request was accepted, the geolocation data of the uploaded multimedia file or post (if enabled), and the type of device used to upload the content (a mobile device or a standard computer).
SOCMINT is interested in collecting all these types of content, but its capacity to do so is limited by the privacy controls set by each user when posting online. For example, if someone restricts a post's visibility to friends or sets it to "Only me," it is impossible for others to see that update on Facebook. [2]
Classification of social media platforms
File:Social.png
The following are the main social media types classified according to function:
- Social networking: This allows people to connect with other people and businesses (brands) online to share information and ideas. Examples include Facebook and LinkedIn.
- Photo sharing: Such websites are dedicated to sharing photos between users online. Examples include Instagram and Flickr.
- Video sharing: Such websites are dedicated to sharing videos, including live video broadcasts. The most popular one is YouTube. Please note that Facebook and Twitter also offer live video broadcast services.
- Blogs: This is a type of informational website containing a set of posts on one topic or subject, ordered from newest to oldest by publication date. The most popular blogging platforms are WordPress and Blogger (the latter is owned by Google).
- Microblog: This allows users to publish a short text paragraph (which can be accompanied by an image or video) or a link (URL) to be shared with an online audience. Twitter is the most popular example.
- Forums (message boards): This is one of the oldest types of social media. Users exchange ideas and hold discussions in the form of posted messages and replies. Reddit is an example.
- Social gaming: This refers to playing games online with other players in different locations. It has gained more popularity recently. KamaGames and Zynga are examples of this type.
- Social bookmarking: These websites offer a function similar to a web browser's bookmarks. However, they allow you to do this online and share your bookmarks with your friends, as well as add annotations and tags to your saved bookmarks. Examples include Atavi and Pinterest.
- Product/service review: These websites allow their users to review (give feedback on) any product or service they have used. Yelp and Angie's List are examples of this type.
Data organization
While an OSINT enthusiast may be adept at data collection, he or she will never become a true professional without developing the necessary data organization skills and tools. There are numerous methods for storing data, including basic text files or notes. However, text files are impractical, because a large amount of data quickly becomes unmanageable. Desirable features for OSINT data management include the ability to export and back up data, as well as to visualize it.
Examples of software for OSINT data organization and their disadvantages:
- Simple Notes Apps (unmanageable when dealing with a large amount of data)
- Evernote (useful only when paid for)
- Notion (notes cannot be accessed offline)
- Joplin (inconvenient organization for large projects)
- Obsidian.md (a bit tricky to master)
Obsidian.md
Obsidian.md, although more complex than a simple notes application, has all the desirable features. It is a free, cross-platform application for organizing notes stored in Markdown (.md) files. Notes and files are stored on the user's computer. There is also a premium syncing feature, which is superfluous given that backups can be made using any online storage service, Syncthing, or Git. Given that OSINT specialists often work in teams, it is recommended to store the data in a Git repository in order to retain a history of modifications and make collaboration easier.
Vaults
Obsidian.md stores all data in what are referred to as "vaults". A vault is a project that houses all of its associated notes and information.
Plugins
Obsidian.md supports the installation of community plugins that extend the app's initial functionality.
Recommended plugins
- Dataview – Allows us to treat a vault as a database, querying and visualizing information from notes and files.
- Breadcrumbs – Adds link types and a note hierarchy.
- Juggl – Creates mind maps based on your notes and lets you customize their looks with CSS and internal styling features.
Plugin installation
- Open Settings – the button is in the bottom-left corner of the application.
- Choose 'Community Plugins' from the 'Options' section.
- Switch 'Safe Mode' to OFF and confirm it.
- Click 'Browse Community Plugins'.
- Find the plugin.
- Click 'Install'.
- Go back to the 'Community Plugins' submenu.
- In the bottom section, turn on the newly installed plugin.
Folders vs Tagging and Linking
A simple folder structure is sufficient when it comes to organizing data into non-overlapping groups. For example, a couple of folders is enough for a photo gallery. In OSINT, however, a more sophisticated structure is needed.
Tagging
Tagging adds structure because a piece of data can have several tags, whereas a file can belong to only one folder.
Tag structure example:
- #people #processes #technology (part targeted)
- #primary #supportive #irrelevant (importance)
- #finished #unfinished (state of note/file)
- #web #registry #socialengineering (means of getting the information)
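To show why several tags per note are more flexible than one folder per file, here is a minimal Python sketch that selects notes carrying a combination of tags from the groups above. The note names and the helper `select_notes` are hypothetical; inside Obsidian.md the same filtering is done with the built-in search or plugins.

```python
def select_notes(notes, required):
    """Return the names of notes that carry every tag in `required`, sorted by name."""
    return sorted(name for name, tags in notes.items() if required <= tags)


# Hypothetical notes, each with one tag from every group in the example above.
notes = {
    "john.md": {"people", "primary", "unfinished", "web"},
    "legit.com.md": {"technology", "primary", "finished", "registry"},
    "helpdesk_procedure.md": {"processes", "supportive", "unfinished", "socialengineering"},
}

# Notes that are both important and still in progress.
print(select_notes(notes, {"primary", "unfinished"}))  # ['john.md']
```

With folders, a note could live under only one of "primary" or "unfinished"; with tags it belongs to both groups at once.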
Linking
Linking enables the creation of relationships between notes and files. This way, one note can include connections to other notes and files, making the data easier to handle. For example, if John purchased the domain name legit.com, John's note can be linked to legit.com's note, which contains information about the domain.
Link types
Using link types opens up even more possibilities. Link types are provided by the Breadcrumbs plugin for Obsidian.md. In the aforementioned example of John and legit.com, John is the domain's owner, and thus the domain is John's asset. These are called relation types. If it is later revealed that John purchased another domain name, fake.com, the new domain can be connected back to John. This structure is reflected in the notes by two relations of John's ownership:
- John – owner of legit.com, fake.com
- legit.com – asset of John, relative of fake.com
- fake.com – asset of John, relative of legit.com
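The relations listed above can be modelled as a small graph with typed, two-way links. The Python sketch below is a plain illustration of the idea, not the Breadcrumbs plugin's API; the class `LinkGraph`, its methods and the `INVERSE` table are all made up for this example.

```python
from collections import defaultdict

# Each relation type has an inverse, so adding one direction also adds the other.
INVERSE = {"owner of": "asset of", "asset of": "owner of", "relative of": "relative of"}


class LinkGraph:
    def __init__(self):
        # note -> relation type -> set of linked notes
        self.links = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        """Store a typed link together with its inverse."""
        self.links[source][relation].add(target)
        self.links[target][INVERSE[relation]].add(source)

    def add_asset(self, owner, asset):
        """Link an asset to its owner and to the owner's other assets."""
        for sibling in self.links[owner]["owner of"] - {asset}:
            self.add(asset, "relative of", sibling)
        self.add(owner, "owner of", asset)

    def describe(self, note):
        """Format all relations of one note on a single line."""
        parts = [f"{relation} {', '.join(sorted(targets))}"
                 for relation, targets in sorted(self.links[note].items())]
        return f"{note}: " + "; ".join(parts)


graph = LinkGraph()
graph.add_asset("John", "legit.com")
graph.add_asset("John", "fake.com")
for note in ("John", "legit.com", "fake.com"):
    print(graph.describe(note))
```

Storing the inverse of every link means that the connection is visible from either note, which is what makes it possible to get from fake.com back to John and on to legit.com.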
Dataview plugin
Dataview is, first and foremost, a data index, so it supports relatively rich methods of adding metadata to your knowledge base. Dataview tracks information at the markdown page and markdown task levels, with each page/task able to contain an arbitrary number of complex (numbers, objects, lists) fields. Each field is a named value of a specific type (like "number" or "text").
Example of notes with arbitrary metadata and a tag:
jason_statham.md
---
name: Jason Statham
salary: 7500
department: Cyber Forensics
notes: [ "Potential phishing target", "Mother has stage T4 cancer" ]
---
#employee
bruce_lee.md
---
name: Bruce Lee
salary: 8000
department: Developer Operations
notes: []
---
#employee
Querying dataview data
Options for querying data:
- Dataview query language
- Dataview JavaScript API
Both can be used, for example, to render a table from jason_statham.md and bruce_lee.md with four columns:
- File – contains a link to the file
- Name – metadata 'name'
- Salary – metadata 'salary'
- Department – metadata 'department'
It can also be sorted by 'salary'.
Dataview query language
The Dataview query language is a simple, structured custom query language for quickly creating views from data. It supports the following:
- Retrieving pages associated with tags, folders, links, and so on.
- Filtering notes/data by simple operations on fields, such as comparisons and existence checks.
- Sorting results according to their fields.
The query language can generate the following view types:
- TABLE: The standard view type; one row for each data point, with multiple columns of field data.
- LIST: A list of pages that correspond to the query. Each page can have a single linked value.
- TASK: A collection of tasks whose pages correspond to the specified query.
To query data with the Dataview query language, use a code block with the 'dataview' language specifier.
Example result of a data query
The queries leading to this result are listed below.
The general format of queries:
```dataview
TABLE|LIST|TASK <field> [AS "Column Name"], <field>, ..., <field>
FROM <source> (like #tag or "folder")
WHERE <expression> (like 'field = value')
SORT <expression> [ASC/DESC] (like 'field ASC')
```
Example with jason_statham.md and bruce_lee.md
```dataview
TABLE name AS "Name", salary AS "Salary", department AS "Department"
FROM #employee
SORT salary ASC
```
Dataview JavaScript API
The Dataview JavaScript API allows arbitrary JavaScript to be executed with access to the Dataview indices and query engine, which is useful for complex views or interoperability with other plugins. To query data with the Dataview JavaScript API, use a code block with the 'dataviewjs' language specifier. The API is accessible via the implicitly provided dv (or dataview) variable, which allows you to query for data, render HTML, and configure the view.
Example with jason_statham.md and bruce_lee.md
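```dataviewjs
// Collect all pages tagged #employee, sort them by salary (ascending)
// and turn each page into one table row.
let employees = dv.pages("#employee")
    .sort(emp => emp.salary, "asc")
    .map(emp => [emp.file.link, emp.name, emp.salary, emp.department])

// Render the rows under the given column headers.
dv.table(["File", "Name", "Salary", "Department"], employees)
```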
Conclusion
There is no defined standard for OSINT data organization, because the data may come in different forms, including, but not limited to, web pages, paper documents, online calendars, and video and audio recordings. Because of this, it is nearly impossible to create a tool that is convenient for all use cases. If the operation is big enough, it might be feasible to create a dedicated web application that stores all necessary data in a database. However, since OSINT itself is usually a highly confidential activity, publishing the application on the clear web is a privacy and security risk.
References
- Pastor-Galindo, Javier, et al. "The not yet exploited goldmine of OSINT: Opportunities, open challenges and future trends." IEEE Access 8 (2020): 10282-10304.
- Richelson, Jeffrey T. The US intelligence community. Routledge, 2018.
- Williams, Heather J., and Ilana Blum. Defining second generation open source intelligence (OSINT) for the defense enterprise. Rand Corporation, 2018.