OSINT – theory and practice

From ICO wiki

Revision as of 18:56, 24 April 2022

Framework

Example of OSINT analysis

The framework of open source intelligence comprises both the sources of the data being sought and the ways of obtaining and analyzing it. The whole framework depends on the goal and capacities of the research in which the OSINT method is used. This means that two OSINT projects with different goals will most likely have completely different frameworks, and this can happen even for projects with the same goal. For example, in 2022 OSINT techniques became widely used to track the latest developments of the war in Ukraine.

While they share the same general goal of looking as deep as possible into the fog of war, different researchers have their own sub-goals: for example, the Oryx project tracks weaponry losses, while the Conflict Intelligence Team tracks the movements of armies. Researchers also use a wide variety of methods, from analyzing social media posts, photos and videos to using plane- and ship-tracking services and even the traffic function of Google Maps to track the movement of armies.

Goal of research

In many cases, OSINT research starts with a specific goal, and this goal shapes the whole framework: which data needs to be acquired, where it is searched for, and how it is analyzed. However, in some cases the framework is defined by the data. This can happen after leaks of documents, personal information or other data. Examples include the WikiLeaks project as a whole, in which investigators worked with leaked secret documents, and the investigations that followed the leak of client data from Yandex’s food delivery service, which, among other things, made it possible to uncover properties owned by Putin’s close circle.

Sources of information

As mentioned above, OSINT can work with any data that is open to the public. Generally, the sources of information can be divided into a few categories:

  • Internet
    • Social media
    • Blogs and forums
    • Maps and tracking services
    • Web analysis services like Google Analytics
    • Other online publications
  • Media
    • Magazines and papers
    • TV
    • Radio
    • Online outlets
  • Government data
    • Official declarations
    • Land registries
    • Government contracts
    • Other documents
    • Speeches of officials
  • Academic publications
  • Commercial data
    • Databases
    • Other services that can provide the necessary data (e.g. satellite imagery sources, company information, etc.)

All of these sources can be interlinked, as nowadays most government information, media and academic publications are available on the internet.

Tools to collect data

Example of TweetDeck request. Source: bellingcat.com

Social media

The tools used to collect data depend on the type of data. Generally, the main sources for open source intelligence are social media platforms, most importantly Twitter. This is because the platform is oriented toward news and current events and has powerful advanced search capabilities. Using TweetDeck, a researcher can formulate a search request for what they are looking for and get real-time updates. The Twitter API also offers extensive capabilities for collecting and structuring its data. The APIs of other social media platforms can be used as well, but they are much more limited.
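As a rough illustration of the API route, the JavaScript sketch below builds a request for the Twitter API v2 recent-search endpoint and flattens the JSON response into table rows. It makes several assumptions based on the API as documented around 2022: the endpoint URL, the 'query', 'max_results' and 'tweet.fields' parameters, and the '{ data: [...] }' response shape. Access rules have changed since then. The function names are hypothetical, and the request is only built, not sent.

```javascript
// Sketch only: endpoint, parameters and response shape as documented for the
// Twitter API v2 "recent search" around 2022 (assumption; access rules changed since).
const SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent";

// Builds (but does not send) the request: full URL plus the bearer-token header.
function buildRecentSearchRequest(query, bearerToken, maxResults = 100) {
  const params = new URLSearchParams({
    query,                           // the search query string
    max_results: String(maxResults), // the endpoint accepts 10 to 100
    "tweet.fields": "created_at,author_id",
  });
  return {
    url: `${SEARCH_URL}?${params}`,
    headers: { Authorization: `Bearer ${bearerToken}` },
  };
}

// Flattens a response body ({ data: [...] }) into [id, created_at, author_id, text] rows.
function toRows(body) {
  return (body.data || []).map((t) => [t.id, t.created_at, t.author_id, t.text]);
}
```

In recent Node.js versions the request could be sent with fetch(req.url, { headers: req.headers }), and the parsed JSON passed to toRows.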

Search engines

OSINT also relies heavily on search engines, so it is a good idea to learn their advanced search tools. In addition, it can be useful to use more than one search engine, as some information may be removed from the results for legal reasons or because of the terms of service.
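The following JavaScript sketch shows the multi-engine habit in practice. It assembles a query from common operators (exact phrase, site:, filetype:, exclusion) and produces the same query for several engines. The engine URLs use each site's public 'q' parameter. Which operators each engine actually honors varies, and the function names are hypothetical.

```javascript
// Base search URLs; each takes the query in the "q" parameter.
const ENGINES = {
  google: "https://www.google.com/search",
  bing: "https://www.bing.com/search",
  duckduckgo: "https://duckduckgo.com/",
};

// Combines a phrase with common operators into one query string.
function buildQuery({ phrase, site, filetype, exclude = [] }) {
  const parts = [];
  if (phrase) parts.push(`"${phrase}"`);              // exact-phrase match
  if (site) parts.push(`site:${site}`);                // restrict to one domain
  if (filetype) parts.push(`filetype:${filetype}`);    // e.g. pdf, xlsx
  for (const word of exclude) parts.push(`-${word}`);  // drop unwanted terms
  return parts.join(" ");
}

// Returns one search URL per engine for the same query, so results can be compared.
function searchUrls(query) {
  const urls = {};
  for (const [name, base] of Object.entries(ENGINES)) {
    urls[name] = `${base}?${new URLSearchParams({ q: query })}`;
  }
  return urls;
}
```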

Traditional media

To make full use of traditional media in your research, it is useful to have subscriptions to the biggest news agencies and outlets. As these subscriptions can be very expensive, especially when one needs all of them, it is also a good idea to learn how to get around paywalls: in most cases this can be done easily with the browser's incognito mode or a web archive service. This eliminates most of the costs and leaves room in the budget to subscribe to media with a hard paywall, such as Der Spiegel.

Government information

For official government information, in many cases it is possible to subscribe to RSS feeds or email updates about new documents and press releases. If this is not possible, one can write a script that parses the page of interest and sends a notification about any change to its content. Some services, such as government contract registers, may require extensive training before their data can be analyzed. Others, such as land registries in many countries, charge for their information, so they may not be the best starting point for collecting data.
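The page-monitoring script mentioned above could look roughly like the following JavaScript sketch. It fingerprints the page's text with SHA-256 and reports a change when the fingerprint differs from the last one seen. Several parts are left to the caller and are assumptions: how the page is fetched, where the state is persisted, and how the notification is sent. The tag-stripping regex is deliberately crude.

```javascript
const { createHash } = require("crypto");

// Roughly strip tags and collapse whitespace so purely cosmetic markup changes
// do not trigger alerts (crude: script/style contents are kept as text).
function normalize(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

function fingerprint(html) {
  return createHash("sha256").update(normalize(html)).digest("hex");
}

// state: a Map from URL to the last seen hash (persist it between runs in practice).
// Calls notify(url) and returns true when the content changed since the previous check.
function checkForUpdate(url, html, state, notify) {
  const hash = fingerprint(html);
  const previous = state.get(url);
  state.set(url, hash);
  if (previous !== undefined && previous !== hash) {
    notify(url);
    return true;
  }
  return false; // first visit or unchanged content
}
```

In practice such a check would run on a schedule (for example via cron), passing in the HTML fetched for each watched URL and a state map loaded from disk.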

Commercial subscriptions

A researcher might also need subscriptions to commercial services for the analysis. Examples of such services include Flightradar24, Similarweb, Himera Search and others. In addition, there are services, for example Telegram bots, that search known data breaches for a certain entry.

VPN

A researcher should use a virtual private network (VPN) both for security and for access to information; the ability to choose servers in different countries is also useful. Different countries have different information laws, and some services restrict access for foreign users, so not all the needed data may be accessible from the researcher's location.

Tools to analyze data

Analyzing the acquired data is in most cases the most challenging part of OSINT. It is easier when the information is mostly text, as text is much easier to analyze through parsing, programming, or simple word search.
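Even plain word search goes a long way with text. The sketch below assumes the collected documents are held as hypothetical { id, text } objects. For each keyword, it finds case-insensitive whole-word matches and returns a short snippet around every hit, so the context can be reviewed by hand.

```javascript
// Escape regex metacharacters so keywords are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// docs: [{ id, text }], keywords: strings; context: characters kept on each side of a hit.
function keywordHits(docs, keywords, context = 30) {
  const hits = {};
  for (const kw of keywords) {
    // Case-insensitive, whole-word match for plain-word keywords.
    const re = new RegExp(`\\b${escapeRegExp(kw)}\\b`, "gi");
    hits[kw] = [];
    for (const doc of docs) {
      for (const m of doc.text.matchAll(re)) {
        const start = Math.max(0, m.index - context);
        const end = Math.min(doc.text.length, m.index + m[0].length + context);
        hits[kw].push({ id: doc.id, snippet: doc.text.slice(start, end) });
      }
    }
  }
  return hits;
}
```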

Photos and videos will most likely have to be analyzed manually, or at least double-checked manually after an automated analysis algorithm, if one exists for the task.

A useful tool of analysis is visualization, especially when it comes to location-based research and big databases of structured information. The exact methodology of analysis, as well as the data collection, should be determined by the researcher at the start of the work.
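For location-based research, one simple route to visualization is exporting findings as GeoJSON (RFC 7946), a format that mapping tools such as QGIS can open. The sketch assumes a hypothetical record shape { lat, lon, label, date } and drops records without valid coordinates. Note that GeoJSON orders coordinates as longitude first, then latitude.

```javascript
// records: [{ lat, lon, label, date }] (hypothetical shape) -> GeoJSON FeatureCollection.
function toGeoJSON(records) {
  return {
    type: "FeatureCollection",
    features: records
      // Skip entries whose coordinates are missing or not numbers.
      .filter((r) => Number.isFinite(r.lat) && Number.isFinite(r.lon))
      .map((r) => ({
        type: "Feature",
        // GeoJSON positions are [longitude, latitude], not [lat, lon].
        geometry: { type: "Point", coordinates: [r.lon, r.lat] },
        properties: { label: r.label, date: r.date },
      })),
  };
}
```

Writing JSON.stringify(toGeoJSON(records)) to a .geojson file is enough to load the points in such a tool.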


Social media intelligence (SMI or SOCMINT)

SOCMINT (Social Media Intelligence) is a collection of search methods and technologies. These forensic techniques are designed to collect and keep records of information about social media platforms and their users. Many people link their smartphone (phone number) and computer to their social media accounts. Furthermore, social media platforms such as Facebook, Instagram, YouTube, Twitter and Pinterest, as well as instant messaging apps such as WhatsApp, Facebook Messenger and WeChat, can map social networks (friends and contacts). In fact, their artificial intelligence algorithms are highly proficient at data collection and profiling. Intelligence gathering, open-source intelligence (OSINT) and other surveillance activities are all linked to social media intelligence. SOCMINT can be carried out overtly or covertly. [1]

Social media content type

Data available on social media sites can be classified into two categories:

  1. The original content posted by the user – such as Facebook text posts or uploaded images
  2. The metadata associated with the original content – metadata of multimedia files, the date/time and geolocation info associated with the posted content, social media IDs and bookmarks (Pinterest)
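Geolocation metadata is often found in a photo's EXIF data. There, latitude and longitude are stored as degrees, minutes and seconds with a hemisphere reference (the GPSLatitudeRef/GPSLongitudeRef tags). Reading those tags requires an EXIF library, which is not shown here. The sketch below only illustrates the conversion to the decimal degrees that maps expect. Many platforms strip such metadata from uploaded images, so it is most useful on original files.

```javascript
// Converts an EXIF-style [degrees, minutes, seconds] triple plus its reference
// letter ("N"/"S" for latitude, "E"/"W" for longitude) to decimal degrees.
function dmsToDecimal([degrees, minutes, seconds], ref) {
  const value = degrees + minutes / 60 + seconds / 3600;
  // South and West are negative in decimal notation.
  return ref === "S" || ref === "W" ? -value : value;
}
```

For instance, dmsToDecimal([50, 27, 0], "N") gives approximately 50.45.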

People use social networking platforms for a variety of reasons. The following are some of the most common interactions seen on social networking sites:

  • Post/comment: People use social media platforms to upload or write paragraphs of text that other users can see. Such posts can also include the user's location (on Facebook this function is known as a "Check-in").
  • Reply: A text message (or a picture, video, or URL) that responds to another user's post, status update, or comment.
  • Multimedia content (images and videos): Multimedia is very popular; a user may include a video or a photograph in their post. Many social media sites let users create albums by uploading photographs or videos. Many social media services, such as Facebook, Twitter and YouTube, offer live streaming. This feature allows users to broadcast live videos and save the recordings on their accounts for later viewing.
  • Social interactions: The foundation of social media sites is that individuals connect online by sending and responding to requests from other users.
  • Metadata: The record of a user's interactions with a social media network. Examples include the date and time a video/image was submitted, the date and time a friend request was accepted, the geolocation data of the uploaded multimedia file or post (if enabled), and the type of device used to upload the content (mobile device or standard computer).

SOCMINT is interested in collecting all these types of content, but its capacity to do so is limited by the privacy settings each user applies when posting online. For example, if someone restricts a post's visibility to their friends or sets it to "Only me", outsiders cannot see that post on Facebook. [2]

Classification of social media platforms

Classification of different social media platforms

The following are the main social media types classified according to function:

  1. Social networking: This allows people to connect with other people and businesses (brands) online to share information and ideas. Examples include Facebook and LinkedIn.
  2. Photo sharing: Such websites are dedicated to sharing photos between users online. Examples include Instagram and Flickr.
  3. Video sharing: Such websites are dedicated to sharing videos, including live video broadcasts. The most popular one is YouTube. Note that Facebook and Twitter also offer live video broadcast services.
  4. Blogs: A type of informational website containing a set of posts on one topic or subject, arranged in reverse chronological order by publication date. The most popular blogging platforms are WordPress and Blogger (the latter owned by Google).
  5. Microblogs: These allow users to publish a short text paragraph (which can be accompanied by an image or video) or a link (URL) to be shared with an online audience. Twitter is the most popular example.
  6. Forums (message boards): This is one of the oldest types of social media. Users exchange ideas and hold discussions in the form of posted messages and replies. Reddit is an example.
  7. Social gaming: Refers to playing games online with other players in different locations. It has recently gained popularity. KamaGames and Zynga are examples of this type.
  8. Social bookmarking: These websites offer a function similar to your web browser’s bookmarks. However, they let you store bookmarks online and share them with friends, as well as add annotations and tags to them. Examples include Atavi and Pinterest.
  9. Product/service review: These websites allow users to review (give feedback on) any product or service they have used. Yelp and Angie’s List are examples of this type.


Search tools for social media

Facebook search tools and services

There are many online services that simplify the process of acquiring/analyzing information from Facebook accounts. The following are the most useful ones:

  • Lookup ID [1]: This site helps you find Facebook personal IDs. Such an ID is required by many online services that complement Facebook's standard keyword search.
  • Facebook Page Barometer [2]: This site gives statistics and insights about specific Facebook profiles or pages.
  • Information for Law Enforcement Authorities [3]: Offers information and legal guidelines for law enforcement/authorities when seeking information from Facebook and Instagram.
  • A directory of free tools and online services for searching within Facebook can be found at: [4]

Twitter advanced search operators

Twitter search operators

Like Google, Twitter supports specialized search operators for finding relevant tweets more precisely. The full list of operators is available on the Twitter developer site [5]. Operators can be combined with other criteria to build more advanced queries; the following are some advanced Twitter search queries to start with.

  • The negation operator (-) is used to exclude specific keywords or phrases from search results.
    virus -computer
  • To search for hashtags, use the (#) operator followed by the search keyword. For example:
    #OSINT
  • To search for tweets sent up to a specific date, use the (until) operator followed by the date.
    OSINT until:2019-11-30 (this will return all tweets containing OSINT and sent until November 30, 2019)
  • To search for tweets sent since a specific date, use the (since) operator followed by the date.
    OSINT since:2019-11-30 (this will return all tweets containing OSINT and sent since November 30, 2019)
  • To return tweets that contain an image, use the (images) filter.
    OSINT filter:images (this will return all tweets that contain the keyword OSINT and have an image embedded within them)
  • To return tweets with an embedded video, use the (videos) filter (similar to the images filter).
    OSINT filter:videos
  • To search for videos uploaded using the Twitter Periscope service, use the (periscope) filter.
    OSINT filter:periscope (this will return all tweets containing the OSINT keyword and a Periscope video URL)
  • To return tweets with either an image or a video, use the (media) filter.
    OSINT filter:media
  • To return tweets that contain a link (URL), use the (links) filter.
    OSINT filter:links
  • To return tweets that contain a link (URL) with a specific word in it, use the (url) operator.
    OSINT url:amazon (this will return all tweets that contain OSINT and a URL with the word "amazon" anywhere within it)
  • To return tweets from verified users only (verified accounts have a blue check mark near their names), use the (verified) filter.
    OSINT filter:verified
  • To return tweets retweeted at least a given number of times, use the (min_retweets) operator followed by the number.
    OSINT min_retweets:50 (this will return all tweets containing the OSINT search keyword that have been retweeted at least 50 times)
  • To return tweets with at least a given number of likes, use the (min_faves) operator followed by the number.
    OSINT min_faves:11 (this will return all tweets that contain the OSINT search keyword and have at least 11 likes)
  • To limit the results to a specific language, use the (lang) operator.
    OSINT lang:en (this will return all tweets containing OSINT in the English language only)
  • To search for tweets with a negative attitude, add the :( emoticon to the query.
    OSINT :( (this will return all tweets containing the keyword OSINT written in a negative attitude)
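The operators above can be combined programmatically. The following JavaScript sketch joins them into one query string and builds a link to Twitter's web search. The option names are hypothetical, and the operator syntax follows the list above. The 'f=live' parameter, which selects the "Latest" tab, is an assumption based on how the web interface behaved at the time of writing.

```javascript
// Joins the search operators into a single query string (option names are hypothetical).
function buildTwitterQuery({ keywords = [], exclude = [], hashtags = [], since, until,
                             filters = [], minRetweets, minFaves, lang }) {
  const parts = [...keywords];
  for (const word of exclude) parts.push(`-${word}`);   // negation
  for (const tag of hashtags) parts.push(`#${tag}`);
  if (since) parts.push(`since:${since}`);              // YYYY-MM-DD
  if (until) parts.push(`until:${until}`);
  for (const f of filters) parts.push(`filter:${f}`);   // images, videos, links, ...
  if (minRetweets !== undefined) parts.push(`min_retweets:${minRetweets}`);
  if (minFaves !== undefined) parts.push(`min_faves:${minFaves}`);
  if (lang) parts.push(`lang:${lang}`);
  return parts.join(" ");
}

// Link to the web search; f=live is assumed to open the "Latest" results tab.
function twitterSearchUrl(query) {
  return `https://twitter.com/search?${new URLSearchParams({ q: query, f: "live" })}`;
}
```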

Twitter analysis services

The following are online services to help you find information on Twitter:

  1. All My Tweets [6]: View all public tweets posted by any Twitter account on one page.
  2. Trendsmap [7]: This shows you the most popular trends, hashtags, and keywords on Twitter from anywhere around the world.
  3. First Tweet [8]: Find the first tweet of any search keyword or link.
  4. Social Bearing [9]: Analyze Twitter followers of any particular account (a maximum of 10,000 followers can be loaded).
  5. Spoonbill [10]: Monitor profile changes from the people you follow on Twitter [3]


Data organization

While an OSINT enthusiast may be adept at data collection, without the necessary data organization skills and tools he or she will never become a true professional. There are numerous ways to store data, including basic text files or notes. However, text files are impractical, since they become unmanageable once there is a large amount of data. Desirable features for OSINT data management include the ability to export, back up and visualize data.

Examples of software for OSINT data organization and their disadvantages:

  • Simple Notes Apps (unmanageable when dealing with a large amount of data)
  • Evernote (fully useful only in the paid version)
  • Notion (notes cannot be accessed offline)
  • Joplin (inconvenient organization for large projects)
  • Obsidian.md (a bit tricky to master)

Obsidian.md

Obsidian.md, while more complex than a simple notes application, has all the desirable features. It is a free, cross-platform application for organizing notes stored in Markdown (.md) files. Notes and files are stored on the user's computer. There is also a premium syncing feature, but it is superfluous, since the files can be backed up with any online storage service, the Syncthing software, or Git. Given that OSINT specialists often work in teams, it is recommended to store the data in a Git repository in order to retain a history of modifications and make collaboration easier.

Vaults

Obsidian.md keeps all data in what are referred to as "Vaults". A vault is a project that houses all of its associated notes and information.

Plugins

Obsidian.md supports the installation of community plugins that extend the app's initial functionality.

Recommended plugins

  1. Dataview – Allows us to treat a vault as a database, querying and visualizing information from notes and files.
  2. Breadcrumbs – Adds link types and a note hierarchy.
  3. Juggl – Create mindmaps based on your notes and customize their looks with CSS and internal styling features.

Plugin installation

  1. Open Settings – the button is in the bottom-left corner of the application.
  2. Choose 'Community Plugins' from the 'Options' section.
  3. Switch 'Safe Mode' to OFF and confirm.
  4. Click 'Browse Community Plugins'.
  5. Find the plugin.
  6. Click 'Install'.
  7. Go back to the 'Community Plugins' submenu.
  8. In the bottom section, turn on the newly installed plugin.

Folders vs Tagging and Linking

A simple folder structure is sufficient for organizing data into non-overlapping groups; for example, a couple of folders are enough for a photo gallery. In OSINT, however, a more sophisticated structure is needed, because a single piece of information often belongs to several groups at once.

Tagging

Tagging adds structure because a piece of data can have several tags, whereas a file can reside in only one folder.

Tag structure example:

  1. #people #processes #technology (part targeted)
  2. #primary #supportive #irrelevant (importance)
  3. #finished #unfinished (state of note/file)
  4. #web #registry #socialengineering (means of getting the information)
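The following sketch shows why tags scale better than folders. It extracts #tags from note text and builds an index from each tag to the notes carrying it, so a single note can appear under several tags at once. Two simplifications are assumptions of the sketch: the tag regex is cruder than Obsidian's actual tag rules, and tags are treated case-insensitively.

```javascript
// Collects "#tag" markers that start a line or follow whitespace (simplified rules).
function extractTags(text) {
  const tags = new Set();
  for (const m of text.matchAll(/(^|\s)#([A-Za-z][\w\/-]*)/g)) tags.add(m[2].toLowerCase());
  return [...tags];
}

// notes: { noteName: noteText } -> Map from tag to the names of notes carrying it.
function buildTagIndex(notes) {
  const index = new Map();
  for (const [name, text] of Object.entries(notes)) {
    for (const tag of extractTags(text)) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(name); // the same note can appear under many tags
    }
  }
  return index;
}
```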

Linking

Linking enables the creation of relationships between notes and files. This way, one note can contain links to other notes and files, which makes the data easier to navigate. For example, if John purchased the domain name legit.com, John's note can be linked to legit.com's note, which contains information about the domain.
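Obsidian writes internal links as [[Note name]], optionally with an alias ([[Note|alias]]) or a heading ([[Note#Heading]]). To illustrate what linking provides, the JavaScript sketch below parses those links and inverts them into backlinks, so that legit.com's note also points back to John. The parser is a simplification that ignores embeds and other edge cases.

```javascript
// Returns the target note names of [[...]] links, dropping any |alias or #heading part.
function extractLinks(text) {
  const links = [];
  for (const m of text.matchAll(/\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]/g)) links.push(m[1].trim());
  return links;
}

// notes: { noteName: noteText } -> outgoing links per note and the inverted backlinks.
function buildLinkGraph(notes) {
  const outgoing = new Map();
  const backlinks = new Map();
  for (const [name, text] of Object.entries(notes)) {
    outgoing.set(name, extractLinks(text));
    for (const target of outgoing.get(name)) {
      if (!backlinks.has(target)) backlinks.set(target, []);
      backlinks.get(target).push(name);
    }
  }
  return { outgoing, backlinks };
}
```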

Link types

Using link types opens up even more possibilities. Link types are provided by the Breadcrumbs plugin for Obsidian.md. In the aforementioned example of John and legit.com, John is the domain's owner, and thus the domain is John's asset. These are called relation types. If it is later revealed that John also purchased another domain name, fake.com, the new domain can be linked back to John as well. The notes will then show John's two ownership relations, plus a relation between the two domains:

  1. John – owner of legit.com, fake.com
  2. legit.com – asset of John, relative of fake.com
  3. fake.com – asset of John, relative of legit.com
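The relations above can be modeled as a small graph in which every typed link automatically records its inverse. This is not the Breadcrumbs plugin's API or storage format. The relation names and the INVERSE table are assumptions, chosen to reproduce the John example.

```javascript
// Hypothetical relation types and their inverses ("relative of" is symmetric).
const INVERSE = { "owner of": "asset of", "asset of": "owner of", "relative of": "relative of" };

// graph: Map<note, Map<relationType, Set<note>>>; adds a typed link and its inverse.
function addRelation(graph, from, type, to) {
  if (!(type in INVERSE)) throw new Error(`unknown relation type: ${type}`);
  const link = (a, t, b) => {
    if (!graph.has(a)) graph.set(a, new Map());
    const rels = graph.get(a);
    if (!rels.has(t)) rels.set(t, new Set());
    rels.get(t).add(b);
  };
  link(from, type, to);
  link(to, INVERSE[type], from); // record the reverse direction too
}

// Describes one note in the style of the list above.
function describe(graph, note) {
  const rels = graph.get(note) || new Map();
  const parts = [...rels].map(([type, targets]) => `${type} ${[...targets].join(", ")}`);
  return `${note} – ${parts.join(", ")}`;
}
```

Adding "John owner of legit.com", "John owner of fake.com" and "legit.com relative of fake.com" reproduces the three lines listed above.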

Dataview plugin

Dataview is, first and foremost, a data index, so it supports relatively rich methods of adding metadata to your knowledge base. Dataview tracks information at the markdown page and markdown task levels, with each page/task able to contain an arbitrary number of complex (numbers, objects, lists) fields. Each field is a named value of a specific type (like "number" or "text").

Example of notes with arbitrary metadata and a tag:

jason_statham.md

---
name: Jason Statham
salary: 7500
department: Cyber Forensics
notes: [
  "Potential phishing target",
  "Mother has stage T4 cancer"
]
---
#employee

bruce_lee.md

---
name: Bruce Lee
salary: 8000
department: Developer Operations
notes: []
---
#employee

Querying dataview data

Options for querying data:

  1. Dataview query language
  2. Dataview Javascript API

For example, both can be used to render a table from jason_statham.md and bruce_lee.md with four columns:

  1. File – contains a link to the file
  2. Name – metadata 'name'
  3. Salary – metadata 'salary'
  4. Department – metadata 'department'

It can also be sorted by 'salary'.

Dataview query language

The dataview query language is a straightforward, organized custom query language that enables you to quickly create views from data. It enables the following:

  • Retrieve pages associated with tags, folders and links, among other things.
  • Filter notes/data with simple operations on fields, such as comparisons and existence checks.
  • Sort results by their fields.

The query language can generate the following view types:

  • TABLE: The standard view type; one row for each data point, with multiple columns of field data.
  • LIST: A list of pages that correspond to the query. Each page can have a single linked value.
  • TASK: A collection of tasks whose pages correspond to the specified query.

To query data with the Dataview Query Language, a code block with the 'dataview' language specifier is used.

Example result of a data query

The queries leading to this result are listed below.

The general format of queries:
```dataview
TABLE|LIST|TASK <field> [AS "Column Name"], <field>, ..., <field> 
FROM <source> (like #tag or "folder")
WHERE <expression> (like 'field = value')
SORT <expression> [ASC/DESC] (like 'field ASC')
```

Example with jason_statham.md and bruce_lee.md

```dataview
TABLE name as "Name", salary as "Salary", department as "Department"
FROM #employee 
SORT salary ASC
```

Dataview Javascript API

The Dataview JavaScript API allows arbitrary JavaScript to be executed with access to the dataview indices and query engine, which is useful for complex views or interoperability with other plugins. To query data with Dataview Javascript API the 'dataviewjs' language specification for a codeblock is used. The API is accessible via the implicitly provided dv (or dataview) variable, which allows you to query for data, render HTML, and configure the view.

Example with jason_statham.md and bruce_lee.md

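```dataviewjs
// Collect every page tagged #employee
let employees = dv.pages("#employee")
	// Sort ascending by the 'salary' frontmatter field
	.sort(emp => emp.salary, "asc")
	// One row per page: link to the file, then the metadata fields
	.map(emp => [emp.file.link, emp.name, emp.salary, emp.department])
// Render the rows under the given column headers
dv.table(["File", "Name", "Salary", "Department"], employees)
```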

Conclusion

There is no defined standard for OSINT data organization, because the data may come in different forms, including, but not limited to, web pages, paper documents, online calendars, and video and audio recordings. Because of this, it is nearly impossible to create a convenient tool for all use cases. If the operation is big enough, it might be feasible to create a dedicated web application that stores all necessary data in a database. However, since OSINT is usually a highly confidential activity, publishing such an application on the clear web is a privacy and security risk.

References

  • Pastor-Galindo, Javier, et al. "The not yet exploited goldmine of OSINT: Opportunities, open challenges and future trends." IEEE Access 8 (2020): 10282-10304.
  • Richelson, Jeffrey T. The US intelligence community. Routledge, 2018.
  • Williams, Heather J., and Ilana Blum. Defining second generation open source intelligence (OSINT) for the defense enterprise. Rand Corporation, 2018.

APA

  • [11] First Steps to Getting Started in Open Source Research / Bellingcat
  • [12] The Most Comprehensive TweetDeck Research Guide In Existence / Bellingcat
  • [13] How to Use SOCMINT for Better Cause?
  1. [14] "SOCMINT – Social Media Intelligence Gathering"
  2. [15] "A Guide To Social Media Intelligence Gathering (SOCMINT)"
  3. [16] "How to Use SOCMINT for Better Cause?"