OSINT – theory and practice: Difference between revisions
==Tools to collect data==
[[File:Tweetdeck.png|thumb|Example of TweetDeck request. Source: bellingcat.com]]
===Social media===
The tools needed to collect data depend on the type of data. Generally, the main sources for open source intelligence are social media and, most importantly, Twitter, because the site is oriented toward news and current events and offers powerful advanced search capabilities. Using TweetDeck, a researcher can formulate a search request for what they seek and get real-time updates. The Twitter API also offers broad capabilities for parsing and structuring its data. The APIs of other social media platforms can be used as well, but they are much more limited.
===Search engines===
OSINT also relies heavily on search engines, so it is a good idea to learn their advanced search tools. In addition, it can be useful to use more than one search engine, as some information may be withheld from the results for legal reasons or under the terms of service.
===Traditional media===
To make full use of traditional media in your research, it is useful to have subscriptions to the biggest agencies and outlets. As these subscriptions can be expensive, especially when one needs all of them, it is also a good idea to learn how to bypass a paywall: in most cases this can be done with the browser's incognito mode or a web archive service. This should eliminate most of the costs and leave room to subscribe to media with a hard paywall, such as Der Spiegel.
===Government information===
For official government information, it is often possible to subscribe to RSS or email updates about new documents and press releases. If this is not possible, the researcher can write a script that parses a page of interest and sends a notification whenever its content changes. Some services, such as government contract registers, may require extensive training to use and analyze. Others, such as land registries, charge for their information in many countries, so they may not be the best starting point for collecting data.
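The page-monitoring script mentioned above could be sketched roughly as follows. This is only a minimal illustration, assuming that comparing a SHA-256 digest of the raw page bytes is good enough to detect a change; the function names <code>fetch_page</code> and <code>check_for_update</code> are hypothetical and not part of any existing tool.

```python
import hashlib
import urllib.request
from typing import Callable, Optional, Tuple


def fetch_page(url: str) -> bytes:
    """Download the raw bytes of a page (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def check_for_update(
    fetch: Callable[[], bytes], last_digest: Optional[str]
) -> Tuple[bool, str]:
    """Return (changed, new_digest) for the content produced by `fetch`.

    `last_digest` is the digest saved on the previous run, or None on the
    first run, in which case nothing is reported as changed.
    """
    digest = hashlib.sha256(fetch()).hexdigest()
    changed = last_digest is not None and digest != last_digest
    return changed, digest
```

In practice one would call something like <code>check_for_update(lambda: fetch_page(url), saved_digest)</code> on a schedule and store the returned digest between runs. Pages that embed timestamps or rotating banners would trigger false alarms with this approach, so hashing only the relevant part of the page may be necessary.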
===Commercial subscriptions===
A researcher might also need subscriptions to commercial services used in the analysis. Examples include Flightradar24, Similarweb, Himera Search and others. In addition, there are services, for example Telegram bots, that search known data breaches for a given entry.
===VPN===
A researcher should use a virtual private network (VPN) both for security and for access to information; the ability to choose servers in different countries can also be useful. Countries have different information laws, and some services restrict access for foreign users, so not all of the needed data may be obtainable from the researcher's location.
==Tools to analyze data==
In most cases, analyzing the acquired data is the most challenging part of OSINT. It is easier when the information is mostly text, as text can be analyzed using parsing, programming, or a simple word search.
Photos and videos will most likely need to be analyzed personally, or at least double-checked after an analysis algorithm has been applied, if one exists for the task.
Visualization is a useful analysis tool, especially for location-based research and large databases of structured information.
The exact methodology of analysis, as well as of data collection, should be determined by the researcher at the start of the work.
= Social media intelligence (SMI or SOCMINT) =
SOCMINT (Social Media Intelligence) is a collection of search methods and technologies. These forensic techniques are designed to keep records of social media platforms and their users. Many people link their social media accounts to their smartphone (cellphone number) and computer. Furthermore, social media platforms such as Facebook, Instagram, YouTube, Twitter, and Pinterest, as well as IM chat systems such as WhatsApp, Facebook Messenger, and WeChat, can map social networks (friends and contacts). In fact, their artificial intelligence algorithms are proficient at data collection and profiling. Social media intelligence is closely linked to intelligence gathering, open-source intelligence (OSINT), and other surveillance activities. SOCMINT can be carried out overtly or covertly. <ref>[https://www.arintell.com/cyber-security/socmint-social-media-intelligence-gathering/]"SOCMINT – Social Media Intelligence Gathering "</ref>
== Social media content type ==
Data available on social media sites can be classified into two categories:
# The original content posted by the user – such as Facebook text content or an uploaded image
# The metadata associated with original content – multimedia files metadata, the date/time and geo-location info associated with the posted content, social media ID and bookmarking (Pinterest)
People use social networking platforms for a variety of reasons. The following are some of the most common interactions seen on social networking sites:
* ''' Post/comment''': People use social media platforms to upload or write paragraphs of text that other users can see. Such posts can also include the user's location (this function is known as a "Check-in" on Facebook).
* ''' Reply ''': A text message (or a picture, video, or URL) that responds to another user's post, status update, or remark.
* ''' Multimedia content ''' (images and videos): Multimedia is quite popular; a user may include a video or a photograph in their message. Many social media sites let users create albums by uploading photographs or videos. Many social media services, like Facebook, Twitter, and YouTube, offer live streaming. This feature allows users to broadcast live videos and save the recordings for subsequent viewing on their accounts.
* ''' Social interactions ''': The foundation of social media sites is that individuals connect online by sending and responding to requests from other users.
* ''' Metadata ''': The sum of a user's interactions with a social media network. Examples include the date and time a video/image was submitted, the date and time a friend request was accepted, the geolocation data of the uploaded multimedia file or post (if enabled), and the type of device used to upload the content (mobile or a standard computer).
SOCMINT is interested in collecting all these types of content, but its capacity to do so is limited by the privacy controls each user applies when posting online. For example, if someone restricts a post's availability to friend circles or sets it to "Only me," it is impossible for others to see that update on Facebook. <ref>[https://www.secjuice.com/social-media-intelligence-socmint/] "A Guide To Social Media Intelligence Gathering (SOCMINT)"</ref>
== Classification of social media platforms ==
[[File: Social-media.png|thumb|Classification of different social media platforms]]
The following are the main social media types classified according to function:
# ''' Social networking ''': These sites allow people to connect with other people and businesses (brands) online to share information and ideas. Examples include Facebook and LinkedIn.
# ''' Photo sharing ''': Such websites are dedicated to sharing photos between users online. Examples include Instagram and Flickr.
# ''' Video sharing ''': Such websites are dedicated to sharing videos, including live video broadcasts. The most popular one is YouTube. Please note that Facebook and Twitter also offer live video broadcast services.
# ''' Blogs ''': This is a type of informational website containing a set of posts belonging to one topic or subject, organized in descending order according to the publish date. The most popular blogging platforms are WordPress and Blogger, the latter of which is powered by Google.
# ''' Microblog ''': This allows users to publish a short text paragraph (which can be associated with an image or video) or a link (URL) to be shared with an online audience. Twitter is the most popular example.
# ''' Forums (message board) ''': This is one of the oldest types of social media. Users exchange ideas and discussions in the form of posted messages and replies. Reddit is an example.
# ''' Social gaming ''': Refers to playing games online with other players in different locations. It has gained more popularity recently. KAMAGAMES and Zynga are examples of this type.
# ''' Social bookmarking ''': These websites offer a function similar to your web browser's typical bookmarks. However, they let you do this online, share your Internet bookmarks with your friends, and add annotations and tags to your saved bookmarks. Examples include Atavi and Pinterest.
# ''' Product/service review ''': These websites allow their users to review (give feedback on) any product or service they have used. Yelp and Angie's List are examples of this type.
==Search tools for social media ==
=== Facebook search tools and services ===
There are many online services that simplify the process of acquiring and analyzing information from Facebook accounts. The following are the most useful ones:
* ''' Lookup ID ''' https://lookup-id.com: This site helps you find Facebook personal IDs. This ID is necessary when using online services that complement Facebook's standard keyword search.
* ''' Facebook Page Barometer ''' http://barometer.agorapulse.com: This site gives statistics and insight about specific Facebook profiles or pages.
* ''' Information for Law Enforcement Authorities''' https://www.facebook.com/safety/groups/law/guidelines: Offers information and legal guidelines for law enforcement/authorities when seeking information from Facebook and Instagram.
* A directory of free tools and online services for searching within Facebook can be found at: https://osint.link/osint-part2/#facebook
=== Twitter advanced search operators ===
[[File:Search-operators.png|thumb| Twitter search operators]]
Like Google, Twitter supports specialized operators for finding related tweets more precisely. The operators are documented on the Twitter developer site at https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators.
Twitter search operators can be combined with other criteria to create more advanced search queries. The following are some advanced ''' Twitter search queries ''' to start the search with.
* The negation operator (-) is used to exclude specific keywords or phrases from search results. <pre>virus -computer</pre>
* To search for hashtags, use the (#) operator followed by the search keyword. For example: <pre>#OSINT</pre>
* To search for tweets sent up to a specific date, use the (until) operator. The following returns all tweets containing OSINT sent until November 30, 2019: <pre>OSINT until:2019-11-30</pre>
* To search for tweets sent since a specific date, use the (since) operator followed by the date. The following returns all tweets containing OSINT sent since November 30, 2019: <pre>OSINT since:2019-11-30</pre>
* Use the (images) filter to return tweets that contain an image. The following returns all tweets that contain the keyword OSINT and have an image embedded within them: <pre>OSINT filter:images</pre>
* To return tweets with an embedded video, use the (videos) filter (similar to the images filter). <pre>OSINT filter:videos</pre>
* To search for videos uploaded using the Twitter Periscope service, use the (periscope) filter. The following searches for all tweets containing the OSINT keyword with a Periscope video URL: <pre>OSINT filter:periscope</pre>
* To return tweets with either an image or a video, use the (media) filter. <pre>OSINT filter:media</pre>
* To return tweets that contain a link (URL), use the (links) filter. <pre>OSINT filter:links</pre>
* To return tweets that contain a link (URL) with a specific word in that URL, use the (url) operator. The following returns all tweets that contain OSINT and a URL with the word "amazon" anywhere within it: <pre>OSINT url:amazon</pre>
* To return tweets from verified users only (verified accounts have a blue check mark near their names), use the (verified) filter. <pre>OSINT filter:verified</pre>
* To return tweets with a minimum number of retweets, use the (min_retweets) operator followed by a number. The following returns all tweets containing the OSINT search keyword that have been retweeted at least 50 times: <pre>OSINT min_retweets:50</pre>
* Use (min_faves) followed by a number to return all tweets with that number of likes or more. The following returns all tweets that contain the OSINT search keyword and have at least 11 likes: <pre>OSINT min_faves:11</pre>
* To limit the returned results to a specific language, use the (lang) operator. The following returns all tweets containing OSINT in the English language only: <pre>OSINT lang:en</pre>
* To search for tweets with a negative attitude, add the :( emoticon. The following returns all tweets containing the keyword OSINT written in a negative attitude: <pre>OSINT :(</pre>
[[File:Spoonbill.png|thumbnail|Twitter analysis service: Spoonbill]]
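The operators listed above can be combined into a single query string. The following is a minimal sketch of a helper that assembles such a string; the function <code>build_query</code> and its parameter names are hypothetical and only cover a subset of the operators, and the resulting string would still have to be pasted into Twitter search or TweetDeck by hand.

```python
from datetime import date
from typing import Optional, Sequence


def build_query(
    keyword: str,
    *,
    exclude: Sequence[str] = (),
    since: Optional[date] = None,
    until: Optional[date] = None,
    media_filter: Optional[str] = None,  # e.g. "images", "videos", "links"
    lang: Optional[str] = None,
    min_retweets: Optional[int] = None,
    min_faves: Optional[int] = None,
) -> str:
    """Join a keyword and the selected operators with spaces."""
    parts = [keyword]
    parts += [f"-{word}" for word in exclude]  # negation operator
    if since is not None:
        parts.append(f"since:{since.isoformat()}")
    if until is not None:
        parts.append(f"until:{until.isoformat()}")
    if media_filter is not None:
        parts.append(f"filter:{media_filter}")
    if lang is not None:
        parts.append(f"lang:{lang}")
    if min_retweets is not None:
        parts.append(f"min_retweets:{min_retweets}")
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")
    return " ".join(parts)


print(build_query("OSINT", since=date(2019, 11, 30), lang="en", min_retweets=50))
# prints: OSINT since:2019-11-30 lang:en min_retweets:50
```

Keeping the operators as keyword arguments makes it easy to save and rerun the same query for different keywords during an investigation.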
===Twitter analysis services===
The following are online services to help you find information on Twitter:
# ''' All My Tweets ''' https://www.allmytweets.net : View all public tweets posted by any Twitter account on one page.
# ''' Trendsmap''' https://www.trendsmap.com : This shows you the most popular trends, hashtags, and keywords on Twitter from anywhere around the world.
# ''' First Tweet''' http://ctrlq.org/first : Find the first tweet of any search keyword or link.
# ''' Social Bearing''' https://socialbearing.com/search/followers : Analyze Twitter followers of any particular account (a maximum of 10,000 followers can be loaded).
# ''' Spoonbill''' https://spoonbill.io: Monitor profile changes from the people you follow on Twitter <ref>[https://socradar.io/how-to-use-socmint-for-better-cause/] "How to Use SOCMINT for Better Cause?"</ref>
= Data organization =
However, using text files is impractical, because a large amount of data quickly becomes unmanageable.
Features desirable for OSINT data management include the ability to export and back up data, as well as to visualize it.
== Examples of software for OSINT data organization and their disadvantages ==
* Simple Notes Apps (unmanageable when dealing with a large amount of data)
* Evernote (useful only when paid for)
* Notion (notes cannot be accessed offline)
* Joplin (inconvenient organization for large projects)
* Obsidian.md (a bit tricky to master)
== Obsidian.md ==
Although Obsidian.md is more complex than a simple notes application, it offers all of the desirable features. It is a free, cross-platform application for organizing notes stored in Markdown (.md) files.
Each field is a named value of a specific type (like "number" or "text").
=== Example of notes with arbitrary metadata and a tag ===
<i>jason_statham.md</i>
===== Example result of a data query =====
[[File:Dataview_query.png]]
The queries leading to this result are listed below.
===== The general format of queries: =====
</pre>
== Note ==
There is no defined standard for OSINT data organization, because the data may come in different forms, including, but not limited to, web pages, paper documents, online calendars, and video and audio recordings. Because of this, it is nearly impossible to create a tool that is convenient for all use cases. If the operation is big enough, it might be feasible to create a dedicated web application that stores all necessary data in a database. However, since OSINT is usually a highly confidential activity, publishing the application on the clear web is a privacy and security risk.
= OSINT Techniques =
This section is split into several parts, each of which covers a common type of target investigation and lists the resources and techniques found useful for it. The section is meant to serve as a reference when a specific need arises within an investigation.
== Email Addresses ==
[[File:GoogleSearch.jpg|thumb|Google Search by Domain and URL]]
Searching by a person's real name can be frustrating. If the target has a common name, it is easy to get lost in the results. Even a fairly unique name produces a large number of addresses, profiles, and telephone numbers. If your target is named John Smith, you have a problem. This is why it is always better to search by email address when one is available. If you have your target's email address, you will achieve much better results at a faster pace. There may be thousands of John Wilsons, but there would be only one john.wilson.77089@yahoo.com.
The first step is to search for this address within quotation marks on the major search engines. This should identify web pages that include the exact address in either their content or their source code. This is the "easy stuff", which may present false positives but should provide immediate evidence of the target account's exposure.
Next, search the username portion of the email address by itself, in case it is in use with other providers and services, such as Gmail, Hotmail, Twitter, LinkedIn, etc. As an example, assume that "john.wilson.77089@yahoo.com" is your target. You could place this email within quotes and run it through each service manually, or use the following direct search URLs to speed up the process.
* Google Email: https://google.com/search?q="john.wilson.77089@yahoo.com"
* Google Username: https://google.com/search?q="john.wilson.77089"
* Bing Email: https://bing.com/search?q="john.wilson.77089@yahoo.com"
* Bing Username: https://bing.com/search?q="john.wilson.77089"
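The direct search URLs above follow a simple pattern and can be generated for any address. The sketch below is illustrative only: the helper name <code>email_search_urls</code> is hypothetical, and it percent-encodes the quotation marks and the @ sign, which browsers normally do automatically when such a URL is pasted in.

```python
from typing import Dict
from urllib.parse import quote_plus

# Base URLs taken from the examples above.
SEARCH_ENGINES = {
    "Google": "https://google.com/search?q=",
    "Bing": "https://bing.com/search?q=",
}


def email_search_urls(email: str) -> Dict[str, str]:
    """Build exact-match search URLs for an email address and its username part."""
    username = email.split("@", 1)[0]
    urls = {}
    for engine, base in SEARCH_ENGINES.items():
        # Wrapping the term in double quotes requests an exact match.
        urls[f"{engine} Email"] = base + quote_plus(f'"{email}"')
        urls[f"{engine} Username"] = base + quote_plus(f'"{username}"')
    return urls


for label, url in email_search_urls("john.wilson.77089@yahoo.com").items():
    print(label, url)
```

Other search engines could be added to the dictionary as long as they accept the query as a URL parameter in the same way.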
== Usernames ==
[[File:Username.jpg|thumb|Name Checker search by username]]
Once you have identified a username for an online service, this information may lead to much more data. Active internet users often use the same username across many sites. For example, the user "amanda62002" on Instagram may be the same "amanda62002" on Twitter and an unknown number of other sites. When you identify an email address, you may also have the username of the target. If a subject uses mpulido007@gmail.com as an email address, there is a good chance that person uses mpulido007 as a screen name on a number of sites. If the target has been an internet user for several years, this Gmail account was probably not the first email address the target used. Searches for potential addresses such as mpulido007@yahoo.com, mpulido007@hotmail.com, and mpulido007@aol.com may uncover new information. Manual searching of this new username information is a good start, but keeping up with the hundreds of social websites available is impossible. Username search services let you search a username across many websites at once and report links to profiles that you may have missed.
To perform a search, enter the username in the single search field of the selected engine, which will immediately check for the presence of that username on the most popular social networks. A search for the username "intelTechnics" through one of these engines provides information about the availability of that username on the top 25 networks.
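The idea of trying the same username at other email providers can be automated in a few lines. This is a rough sketch; the function name <code>candidate_addresses</code> is hypothetical, and the default provider list simply repeats the providers named in the text above.

```python
from typing import List, Sequence


def candidate_addresses(
    email: str,
    providers: Sequence[str] = ("gmail.com", "yahoo.com", "hotmail.com", "aol.com"),
) -> List[str]:
    """Return the same username at other providers, skipping the known one."""
    username, _, domain = email.partition("@")
    return [f"{username}@{p}" for p in providers if p != domain.lower()]


print(candidate_addresses("mpulido007@gmail.com"))
# prints: ['mpulido007@yahoo.com', 'mpulido007@hotmail.com', 'mpulido007@aol.com']
```

Each candidate is only a guess that still has to be checked through search; an address at another provider may belong to a different person.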
== Telephone Numbers ==
[[File:PhoneNum.jpg|thumb|Search by phone number in That’s Them Engine]]
There are hundreds of websites that claim to be able to search for information on telephone numbers. They range from sites with amazingly accurate results to sites that contain only advertisements. If you have a target telephone number, your search has three phases. First, identify the type of number and the provider. The type could be landline, cellular, or internet, and the provider is the company supplying the service. Second, identify any subscriber information, such as the name and address associated with the account. Finally, locate any online content connected to the target number. All of this can lead to more intelligence and additional searches. The majority of cellular numbers can now be identified if they are registered in someone's name. If you have an address, you will want to identify the people associated with it and any telephone numbers they use.
=== Telephone Number Search Websites ===
The following are a handful of people-search websites that allow a telephone number to be queried. These sites all possess unique data sets, and each should be searched. Most of the results originate from sources such as property tax data, marketing leaks, phonebooks, and various data breaches. Many of these links avoid unnecessary loading screens and advertisements. This will help with the automated tool at the end. Overall, we cannot control the results, and telephone search is mostly "what you see is what you get". Replace the demo number (618-462-0000) with your target number.
* Fast People Search https://www.fastpeoplesearch.com/618-462-0000
* Sync.me https://sync.me/search/?number=6184620000
* That's Them https://thatsthem.com/phone/618-462-0000
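Replacing the demo number by hand can be avoided with a small helper. The sketch below assumes a 10-digit, US-style number like the demo one and reuses the URL patterns exactly as listed above; the function name <code>phone_search_urls</code> is hypothetical, and the sites may change their URL formats at any time.

```python
import re
from typing import Dict


def phone_search_urls(number: str) -> Dict[str, str]:
    """Build lookup URLs for a 10-digit number given in any common notation."""
    digits = re.sub(r"\D", "", number)  # keep digits only
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number, e.g. 618-462-0000")
    dashed = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return {
        "Fast People Search": f"https://www.fastpeoplesearch.com/{dashed}",
        "Sync.me": f"https://sync.me/search/?number={digits}",
        "That's Them": f"https://thatsthem.com/phone/{dashed}",
    }


for site, url in phone_search_urls("(618) 462-0000").items():
    print(site, url)
```

Numbers with a country code or a different national format would need their own normalization rule before the URLs are built.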
== Images ==
Thanks to the camera on every cellular phone, digital photograph uploads are extremely common among social network users. Digital images, logos, and icons can be of great value in OSINT investigations. Major search engines like Google, Yahoo, and Bing provide basic image search functionality. However, there are other, more specialized image search engines that can be used to get more precise results. This part identifies various photo-sharing websites as well as specific search techniques.
=== Basic Image Search ===
[[File:BasicImg.jpg|thumb|Google basic image search]]
The following sites offer image search services:
* Google Image Search: (https://images.google.com)
* Bing image search: (www.bing.com/images)
* Yahoo Images: (http://images.yahoo.com)
* Baidu: (http://image.baidu.com)
* Imgur: (https://imgur.com)
* SmugMug: (https://www.smugmug.com)
Google offers Advanced Image Search, where it is possible to set many search criteria, such as image color, image type (photo, face, clip art, line drawing, animated), region or country, site or domain name, image format type, and usage rights. Google Advanced Image Search can be found at (https://images.google.com/advanced_image_search).
=== Reverse Image Search ===
[[File:RevImg.jpg|thumb|Google reverse image search using an image URL]]
A reverse image search uses a sample image instead of a text query. You upload an image or insert its URL into a reverse image search engine, which then searches its index to find where else the image appears online and displays all the other locations. In this way, you can find the original source of photographs, memes, and profile pictures. The following is one of the most popular reverse image search engines:
* Google reverse search (https://www.google.com/imghp): Google has a dedicated search engine for reverse image searches; you can either paste the image URL in the search box or upload the image to Google.
Advancements in computer processing power and image analysis software have made reverse image searching possible on several sites. The results vary depending on the site used. Some sites identify identical images that appear on other websites. If you have a photo of a target from a social network, a reverse analysis of that photo may reveal other websites on which the target used the same image. These may be results that were not identified through a standard search engine. Occasionally, a target may create a website under an alias but use an actual photo of themselves. Unless you knew the alias, you would never find the site, so searching by the image may be the only way to locate the alias profile. Some reverse image sites go further and try to identify other photos of the target that are similar enough to be matched. Some will even try to determine the sex and age of the subject in the photo based on an analysis of the image. This type of analysis was once limited to expensive private solutions. Now, these services are free to the public.
=== Image Metadata ===
[[File:ImgMetadata.jpg|thumb|Getting metadata using URL in Jeffrey's Image Metadata Viewer]]
Every digital photograph captured with a digital camera carries metadata known as Exif data. This is a layer of data that provides information about the photo and the camera. All digital cameras write this data to each image, but the amount and type of data can vary. The data is embedded into each photo "behind the scenes" and is not visible when viewing the captured image. To read it you need an Exif reader, which can be found on websites and within applications. Keep in mind that some websites remove or "scrub" this data before storing images on their servers. Facebook, for example, removes the data, while Flickr does not. A digital photo located online will therefore not always contain this data. If you locate an image that appears full size and uncompressed, the data will likely still be intact. If the image has been compressed to a smaller file size, this data is often lost. Images taken directly from a digital camera card will always have the data. This is one of the reasons you will always want to identify the largest version of an image when searching online. The quickest way to see the information is through an online viewer.
Jeffrey's Image Metadata Viewer: http://exif.regex.info/exif.cgi
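Before uploading a file to a viewer, it can be useful to check locally whether a JPEG still carries an Exif block or whether it has been scrubbed. The following is a minimal sketch that walks the JPEG segment headers and looks for an APP1 segment starting with the Exif identifier; it only detects the presence of Exif data and does not decode any fields. The function name <code>has_exif</code> is hypothetical.

```python
import struct


def has_exif(jpeg_bytes: bytes) -> bool:
    """Return True if the JPEG data contains an APP1 segment holding Exif data."""
    if jpeg_bytes[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return False  # not at a segment boundary: malformed or unsupported
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # end of image or start of compressed data
            return False
        # The 2-byte big-endian length includes the length field itself.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2 : pos + 4])
        if length < 2:
            return False
        payload = jpeg_bytes[pos + 4 : pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

A typical use would be <code>has_exif(open("photo.jpg", "rb").read())</code>, where the file name is only an example. A result of False for an image downloaded from a social network is consistent with the scrubbing described above.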
= References =
*Pastor-Galindo, Javier, et al. "The not yet exploited goldmine of OSINT: Opportunities, open challenges and future trends." IEEE Access 8 (2020): 10282-10304.
*Williams, Heather J., and Ilana Blum. Defining second generation open source intelligence (OSINT) for the defense enterprise. Rand Corporation, 2018.
*Open Source Intelligence Methods and Tools: A Practical Guide to Online Intelligence, 1st ed., Nihad A. Hassan, Rami Hijazi
*Open Source Intelligence Techniques: Resources for Searching and Analyzing Online Information, 9th Edition, Michael Bazzell
*[https://www.bellingcat.com/resources/2021/11/09/first-steps-to-getting-started-in-open-source-research/] First Steps to Getting Started in Open Source Research / Bellingcat
*[https://www.bellingcat.com/resources/how-tos/2019/06/21/the-most-comprehensive-tweetdeck-research-guide-in-existence-probably/] The Most Comprehensive TweetDeck Research Guide In Existence / Bellingcat
Latest revision as of 09:52, 27 April 2022
Framework
The framework for open source intelligence covers both the sources of the data being sought and the ways to obtain and analyze it. The whole framework depends on the goal and capacities of the research in which the OSINT method is used. This means that two OSINT projects with different goals would most likely have completely different frameworks. This can happen even for research projects with the same goal. For example, this year has seen the emergence of OSINT techniques for tracking the latest developments in the war in Ukraine.
While they share the same general goal of looking as deep as possible into the fog of war, different researchers have their own subgoals, e.g. tracking weaponry losses, like the Oryx project, or tracking the movements of armies, like the Conflict Intelligence Team. In addition, the researchers use a wide variety of methods, from analyzing social media publications, photos, and videos to using plane- and ship-tracking services and even the traffic functions of Google Maps to track the movement of the armies.
Goal of research
In many cases OSINT research starts with a certain goal, and this goal shapes the whole framework: which data needs to be acquired, where it is searched for and how it is analyzed. However, there are cases when the framework is defined by the data. This can happen after leaks of documents, personal information or other data. Examples include the WikiLeaks project, where investigators worked with leaked secret documents, and the investigations that followed the leak of client data from Yandex’s food delivery service, which among other things helped uncover properties owned by Putin’s close circle.
Sources of information
As mentioned above, OSINT can work with any data that is open to the public. Generally, the sources of information can be divided into a few categories:
- Internet
  - Social media
  - Blogs and forums
  - Maps and tracking services
  - Web analysis services like Google Analytics
  - Other online publications
- Media
  - Magazines and papers
  - TV
  - Radio
  - Online outlets
- Government data
  - Official declarations
  - Land registries
  - Government contracts
  - Other documents
  - Speeches of officials
- Academic publications
- Commercial data
  - Databases
  - Other services that can provide necessary data (e.g. satellite image sources, company information, etc.)
All of those sources can be interlinked, as nowadays most government information, media, academic publications, etc. are available on the internet.
Tools to collect data
Social media
The tools needed to collect data depend on the type of data. Generally, the main sources for open source intelligence are social media and, most importantly, Twitter. The reason is the site's orientation toward news and current events and its powerful advanced search capabilities. Using TweetDeck, a researcher can formulate a search request for what they seek and get real-time updates. The Twitter API also offers wide capabilities for parsing and structuring Twitter data. Other social media platforms offer APIs as well, but these are much more limited.
Search engines
OSINT also relies heavily on search engines, so it’s a good idea to learn their advanced search features. In addition, it can be useful to use more than one search engine, as some information may be removed from the results for legal reasons or under terms of service.
Traditional media
To make full use of traditional media in your research, it is useful to have subscriptions to the biggest agencies or outlets. As these subscriptions can be really expensive, especially when one needs all of them, it’s also a good idea to learn how to get around a paywall; in most cases this can be done easily with the browser's incognito mode or a web archive service. This should eliminate most of the costs and leave room to subscribe to media with a hard paywall, like Der Spiegel.
Government information
For official government information, it is often possible to subscribe to an RSS feed or email updates about new documents and press releases. If this is not possible, one might write a script that parses a page of interest and sends a notification whenever its content changes. Some services, like government contract registers, may require extensive training to analyze. Others, like land registries, charge for their information in many countries, so they might not be the best starting point for collecting data.
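The page-monitoring script mentioned above could be sketched roughly as follows. This is a minimal illustration, not a complete monitor: the function names `fetch`, `fingerprint` and `has_changed` are hypothetical, the notification step is left out, and it assumes that hashing the whole page (after collapsing whitespace) is an acceptable way to detect updates.

```python
import hashlib
import re
import urllib.request


def fetch(url: str) -> str:
    """Download a page and return its body as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fingerprint(html: str) -> str:
    """Hash the page after collapsing whitespace, so that trivial
    reformatting does not register as a change."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, html: str) -> bool:
    """Return True when the page no longer matches the stored fingerprint."""
    return fingerprint(html) != previous_fingerprint
```

In practice the fingerprint from the previous run would be stored on disk, and `has_changed(stored, fetch(url))` would be called on a schedule. Pages with dynamic elements such as timestamps or session tokens would need the relevant fragment extracted before hashing.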
Commercial subscriptions
A researcher might also need subscriptions to commercial services required for the analysis. Examples of such services include Flightradar24, Similarweb, Himera Search and others. In addition, there are services, for example Telegram bots, that search known data breaches for a given entry.
VPN
A researcher should use a virtual private network (VPN) for reasons of both security and access to information. The ability to choose servers in different countries is also useful: countries have different information laws, and some services restrict access for foreign users, so not all of the needed data may be obtainable from the researcher’s own location.
Tools to analyze data
Analysis of the acquired data is in most cases the most challenging part of OSINT. It is easier when the information is mostly text, since text can be analyzed with parsing, programming, or a simple word search.
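As a rough illustration of the word-search approach for text, the sketch below counts whole-word, case-insensitive keyword matches across a set of collected documents. The function name `keyword_hits` and the in-memory dictionary of documents are assumptions made for the example.

```python
import re
from collections import Counter


def keyword_hits(documents: dict[str, str], keywords: list[str]) -> dict[str, Counter]:
    """Map each document name to the keywords found in it and their counts.
    Documents without any match are left out of the result."""
    hits = {}
    for name, text in documents.items():
        counts = Counter()
        for keyword in keywords:
            # \b keeps "tank" from matching inside "tanker"
            pattern = r"\b" + re.escape(keyword) + r"\b"
            found = len(re.findall(pattern, text, flags=re.IGNORECASE))
            if found:
                counts[keyword] = found
        if counts:
            hits[name] = counts
    return hits
```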
Photos and videos will most likely need to be analyzed by a person, or at least double-checked by a person after any analysis algorithm that exists for the task has been applied.
A useful tool of analysis is visualization, especially when it comes to location-based research and big databases of structured information. The exact methodology of analysis, as well as the data collection, should be determined by the researcher at the start of the work.
Social media intelligence (SMI or SOCMINT)
SOCMINT (Social Media Intelligence) is a collection of search methods and technologies. These forensic techniques are designed to keep records of social media platforms and users. Many people use social media to connect their smartphone (cellphone number) and computer system. Furthermore, social media platforms such as Facebook, Instagram, YouTube, Twitter, and Pinterest, as well as IM chat systems such as WhatsApp, Facebook Messenger, and WeChat, can map social networks (friends and contacts). In fact, their artificial intelligence algorithms are proficient at data collection and profiling. Intelligence gathering, open-source intelligence (OSINT), and other surveillance activities are all linked to social media intelligence. SOCMINT can be carried out overtly or covertly. [1]
Social media content type
Data available on social media sites can be classified into two categories:
- The original content posted by the user – such as Facebook text content or an uploaded image
- The metadata associated with original content – multimedia files metadata, the date/time and geo-location info associated with the posted content, social media ID and bookmarking (Pinterest)
People use social networking platforms for a variety of reasons. The following are some of the most common interactions seen on social networking sites:
- Post/comment: People utilize social media platforms to upload or write paragraphs of text that other users may see. Such posts can also include the user's location (this function is known as a "Check-in" on Facebook).
- Reply : A text message (or a picture, video, or URL) that responds to another user's post, status update, or remark.
- Multimedia content (images and videos): Multimedia is quite popular; a user may include a movie or a photograph in their message. Many social media sites enable users to create albums by uploading photographs or videos. Many social media services, like Facebook, Twitter, and YouTube, offer live streaming. This feature allows users to broadcast live videos and save the recordings for subsequent viewing on their accounts.
- Social interactions : The foundation of social media sites is that individuals connect online by sending and responding to requests from other users.
- Metadata : The total of a user's interactions with a social media network. Examples include the date and time a video/image was submitted, the date and time a friend request was accepted, the geolocation data of the uploaded multimedia file or post (if enabled), and the type of device used to upload the contents (mobile or a standard computer).
SOCMINT is interested in collecting all these types of content, but its capacity to do so is limited by the privacy controls each user applies when making posts/updates online. For example, if someone restricts a post's availability to friend circles or sets it to "Only me," it is impossible for outsiders to see that update on Facebook. [2]
Classification of social media platforms
The following are the main social media types classified according to function:
- Social networking : This allows people to connect with other people and businesses (brands) online to share information and ideas. Examples include Facebook and LinkedIn.
- Photo sharing : Such websites are dedicated to sharing photos between users online. Examples include Instagram and Flickr.
- Video sharing : Such websites are dedicated to sharing videos, including live video broadcasts. The most popular one is YouTube. Please note that Facebook and Twitter also offer live video broadcast services.
- Blogs : This is a type of informational website containing a set of posts belonging to one topic or subject organized in descending order according to the publish date. The most popular blogging platforms are WordPress and Blogger (the latter is owned by Google).
- Microblog : This allows users to publish a short text paragraph (which can be associated with an image or video) or a link (URL) to be shared with other users online. Twitter is the most popular example.
- Forums (message board) : This is one of the oldest types of social media. Users exchange ideas and discussions in a form of posted messages and replies. Reddit is an example.
- Social gaming : Refers to playing games online with other players in different locations. It has gained more popularity recently. KamaGames and Zynga are examples of this type.
- Social bookmarking : These websites offer a similar function to your web browser’s typical bookmarks. However, they let you save bookmarks online, share them with your friends, and add annotations and tags to them. Examples include Atavi and Pinterest.
- Product/service review : These websites allow their users to review—give feedback—about any product or service they have used. Yelp and Angie’s List are examples of this type.
Search tools for social media
Facebook search tools and services
There are many online services that simplify the process of acquiring/analyzing information from Facebook accounts. The following are the most useful ones:
- Lookup ID https://lookup-id.com: This site helps you to find Facebook personal IDs. This ID is required by many of the online services that complement Facebook's standard keyword search.
- Facebook Page Barometer http://barometer.agorapulse.com: This site gives statistics and insight about specific Facebook profiles or pages.
- Information for Law Enforcement Authorities https://www.facebook.com/safety/groups/law/guidelines: Offers information and legal guidelines for law enforcement/authorities when seeking information from Facebook and Instagram.
- A directory of free tools and online services for searching within Facebook can be found at: https://osint.link/osint-part2/#facebook
Twitter advanced search operators
Like Google, Twitter supports specialized search operators for finding tweets more precisely. The operators are documented on the Twitter developer site at https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators. They can be combined with each other and with keywords to build more advanced queries; the following are some operators to start with.
- The negation operator (-) is used to exclude specific keywords or phrases from search results.
virus -computer
- To search for hashtags, use the (#) operator followed by the search keyword. For example:
#OSINT
- To search for tweets sent up to a specific date, use the (until) operator.
OSINT until:2019-11-30 (this will return all tweets containing OSINT and sent until November 30, 2019)
- To search for tweets sent since a specific date, use the (since) operator followed by the date.
OSINT since:2019-11-30 (this will return all tweets containing OSINT and sent since November 30, 2019)
- Use the (images) keyword to return tweets that contain an image within it.
OSINT filter:images (this will return all tweets that contain the keyword OSINT and have an image embedded within them)
- To return tweets with video embedded in them, use the (videos) keyword (similar to the images filter).
OSINT filter:videos
- To search for videos uploaded using the Twitter Periscope service, use the (Periscope) filter.
OSINT filter:periscope (this will search for all tweets containing the OSINT keyword with a Periscope video URL)
- To return tweets with either image or video, use the (media) operator.
OSINT filter:media
- To return tweets that contain a link (URL) within them, use the (links) keyword.
OSINT filter:links
- To return tweets that contain a link (URL) and hold a specific word within that URL, use the URL keyword.
OSINT url:amazon (this will return all tweets that contain OSINT and a URL with the word “amazon” anywhere within it)
- To return tweets from verified users only (verified accounts have a blue check mark near their names), use the (verified) filter.
OSINT filter:verified
- To return tweets with at least a given number of retweets, use the (min_retweets) operator followed by a number.
OSINT min_retweets:50 (this will return all tweets containing the OSINT search keyword that have been retweeted at least 50 times)
- Use (min_faves) followed by a number to return all tweets with NUMBER or more likes.
OSINT min_faves:11 (this will return all tweets that have at least 11 or more likes and that contain the OSINT search keyword)
- To limit Twitter returned results to a specific language, use the (lang) operator.
OSINT lang:en (this will return all tweets containing OSINT in the English language only)
- To search for tweets with a negative attitude, add the sad-face emoticon :( to the query.
OSINT :( (this will return all tweets containing the keyword OSINT that are written with a negative attitude)
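Since the operators above are combined simply by placing them in the same query, a small helper can assemble the search string. The following is a sketch only: `build_query` and its parameter names are made up for this example, and the resulting string is meant to be pasted into Twitter search or TweetDeck rather than sent to any particular API.

```python
def build_query(keywords, exclude=(), since=None, until=None, lang=None,
                filters=(), min_retweets=None, min_faves=None):
    """Join keywords and search operators into one query string."""
    parts = list(keywords)
    parts += [f"-{word}" for word in exclude]        # negation operator
    if since:
        parts.append(f"since:{since}")               # date as YYYY-MM-DD
    if until:
        parts.append(f"until:{until}")
    if lang:
        parts.append(f"lang:{lang}")
    parts += [f"filter:{name}" for name in filters]  # images, videos, media, links
    if min_retweets is not None:
        parts.append(f"min_retweets:{min_retweets}")
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")
    return " ".join(parts)


print(build_query(["OSINT"], since="2019-11-30", lang="en", filters=["images"]))
```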
Twitter analysis services
The following are online services to help you find information on Twitter:
- All My Tweets https://www.allmytweets.net : View all public tweets posted by any Twitter account on one page.
- Trendsmap https://www.trendsmap.com : This shows you the most popular trends, hashtags, and keywords on Twitter from anywhere around the world.
- First Tweet http://ctrlq.org/first : Find the first tweet of any search keyword or link.
- Social Bearing https://socialbearing.com/search/followers : Analyze Twitter followers of any particular account (a maximum of 10,000 followers can be loaded).
- Spoonbill https://spoonbill.io: Monitor profile changes from the people you follow on Twitter [3]
Data organization
An OSINT enthusiast may be adept at data collection, but without the necessary data organization skills and tools he or she will not become a true professional. There are numerous methods for storing data, including basic text files or notes. However, plain text files are impractical, because a large amount of data quickly becomes unmanageable. Features desirable for OSINT data management include the ability to export and back up data, as well as to visualize it.
Examples of software for OSINT data organization and their disadvantages
- Simple Notes Apps (unmanageable when dealing with a large amount of data)
- Evernote (useful when paid for)
- Notion (notes cannot be accessed offline)
- Joplin (inconvenient organization for large projects)
- Obsidian.md (a bit tricky to master)
Obsidian.md
Obsidian.md, although more complex than a simple notes application, has all the desirable features. It is a free, cross-platform application for organizing notes stored in Markdown (.md) files. Notes and files are stored on the user's computer. There is also a premium syncing feature, but it is superfluous because backups can be made with any online storage service, Syncthing, or Git. Given that OSINT specialists often work in teams, it is recommended to store the data in a Git repository in order to retain a history of modifications and make collaboration easier.
Vaults
Obsidian.md keeps all data in what are referred to as "Vaults." A vault is a project that houses all of its associated notes and information.
Plugins
Obsidian.md supports the installation of community plugins that extend the app's initial functionality.
Recommended plugins
- Dataview – Allows us to treat a vault as a database, querying and visualizing information from notes and files.
- BreadCrumbs – Adds link types and notes hierarchy.
- Juggl – Create mindmaps based on your notes and customize their looks with CSS and internal styling features.
Plugin installation
- Open Settings – the button is in the bottom-left corner of the application.
- Choose 'Community Plugins' from the 'Options' section.
- Switch 'Safe Mode' to OFF and confirm it.
- Click 'Browse Community Plugins'.
- Find the plugin.
- Click 'Install'.
- Go back to 'Community Plugins' submenu.
- In the bottom section turn on the newly installed plugin.
Folders vs Tagging and Linking
A simple folder structure is sufficient for organizing data into non-overlapping groups. For example, a couple of folders are enough for a photo gallery. OSINT, however, calls for a more sophisticated structure.
Tagging
Tagging adds structure because a piece of data can have several tags, as opposed to folders, which can only have one organizing unit per file.
Tag structure example:
- #people #processes #technology (part targeted)
- #primary #supportive #irrelevant (importance)
- #finished #unfinished (state of note/file)
- #web #registry #socialengineering (means of getting the information)
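To show why several tags per note give more structure than one folder per file, the sketch below builds a tag index over a few notes. It is independent of Obsidian itself: `index_by_tag`, the regular expression, and the in-memory notes are assumptions for illustration, and the pattern only approximates how tags are recognized.

```python
import re
from collections import defaultdict

# A tag is "#" followed by a letter, at the start of the text or after
# whitespace, so Markdown headings ("# Title") are not counted as tags.
TAG_PATTERN = re.compile(r"(?<!\S)#([A-Za-z][\w/-]*)")


def index_by_tag(notes: dict[str, str]) -> dict[str, set[str]]:
    """Map every tag to the set of note names that carry it."""
    index = defaultdict(set)
    for name, text in notes.items():
        for tag in TAG_PATTERN.findall(text):
            index[tag.lower()].add(name)
    return dict(index)
```

A single note can then appear under several tags at once, for example under both `primary` and `web`, which a folder hierarchy cannot express without duplicating the file.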
Linking
Linking enables the creation of relationships between notes and files. This way, one note can include connections to other notes and files, making the data easier to handle. For example, if John purchased the domain name legit.com, John's note can be linked to legit.com's note, which contains information about the domain.
Link types
Using link types opens up even more possibilities. Link types are provided by the Breadcrumbs plugin for Obsidian.md. In the aforementioned situation of John and legit.com, John is the domain's owner, and thus the domain is John's asset. These are called types of relations. If it is later revealed that John purchased another domain name, fake.com, the new domain can be connected back to John. In the notes, this structure is expressed through the following relations:
- John – owner of legit.com, fake.com
- legit.com – asset of John, relative of fake.com
- fake.com – asset of John, relative of legit.com
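The relations above can be modeled as a small graph in which every typed link also records its inverse. This is only a conceptual sketch and not the Breadcrumbs plugin's API: the `LinkGraph` class and the `INVERSE` table are hypothetical names introduced for the example.

```python
from collections import defaultdict

# Each relation type and the relation it implies in the opposite direction.
INVERSE = {
    "owner of": "asset of",
    "asset of": "owner of",
    "relative of": "relative of",
}


class LinkGraph:
    def __init__(self):
        # note -> relation type -> set of linked notes
        self.links = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        """Store the link and its inverse so both notes show the relation."""
        self.links[source][relation].add(target)
        self.links[target][INVERSE[relation]].add(source)

    def related(self, note, relation):
        return sorted(self.links[note][relation])


graph = LinkGraph()
graph.add("John", "owner of", "legit.com")
graph.add("John", "owner of", "fake.com")
graph.add("legit.com", "relative of", "fake.com")
print(graph.related("fake.com", "asset of"))
```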
Dataview plugin
Dataview is, first and foremost, a data index, so it supports relatively rich methods of adding metadata to your knowledge base. Dataview tracks information at the markdown page and markdown task levels, with each page/task able to contain an arbitrary number of complex (numbers, objects, lists) fields. Each field is a named value of a specific type (like "number" or "text").
Example of notes with arbitrary metadata and a tag
jason_statham.md
---
name: Jason Statham
salary: 7500
department: Cyber Forensics
notes: [ "Potential phishing target", "Mother has stage T4 cancer" ]
---
#employee
bruce_lee.md
---
name: Bruce Lee
salary: 8000
department: Developer Operations
notes: []
---
#employee
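To make the idea of named metadata fields concrete outside of Obsidian, the sketch below reads the block between the `---` lines into a dictionary. It is a deliberately simplified stand-in for a real YAML parser and is not how Dataview itself works: `parse_front_matter` is a hypothetical helper, only flat `key: value` lines are handled, and list values such as `notes` are kept as raw strings.

```python
def parse_front_matter(note_text: str) -> dict:
    """Return the key/value pairs found between the leading '---' lines."""
    lines = note_text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":      # closing delimiter ends the metadata
            break
        key, separator, value = line.partition(":")
        if not separator:
            continue                   # skip lines that are not "key: value"
        value = value.strip()
        # whole numbers become int so that fields like salary sort numerically
        fields[key.strip()] = int(value) if value.isdigit() else value
    return fields
```

With the fields in dictionaries, sorting notes by a field is a plain `sorted(..., key=lambda fields: fields["salary"])`.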
Querying dataview data
Options for querying data:
- Dataview query language
- Dataview Javascript API
Both can be used to, as an example, render a table from jason_statham.md and bruce_lee.md with four columns:
- File – contains a link to the file
- Name – metadata 'name'
- Salary – metadata 'salary'
- Department – metadata 'department'
It can also be sorted by 'salary'.
Dataview query language
The dataview query language is a straightforward, organized custom query language that enables you to quickly create views from data. It enables the following:
- Retrieve pages related to tags, folders, and links, among other things.
- Simple actions on fields, such as comparison, existence checks, and so on, can be used to filter notes/data.
- Sorting results according to their fields.
The query language is capable of generating the view kinds, which are detailed below:
- TABLE: The standard view type; one row for each data point, with multiple columns of field data.
- LIST: A list of pages that correspond to the query. Each page can have a single linked value.
- TASK: A collection of tasks whose pages correspond to the specified query.
To query data with Dataview Query Language the 'dataview' language specification for a codeblock is used.
Example result of a data query
The queries leading to this result are listed below.
The general format of queries:
```dataview
TABLE|LIST|TASK <field> [AS "Column Name"], <field>, ..., <field>
FROM <source> (like #tag or "folder")
WHERE <expression> (like 'field = value')
SORT <expression> [ASC/DESC] (like 'field ASC')
```
Example with jason_statham.md and bruce_lee.md
```dataview
TABLE name as "Name", salary as "Salary", department as "Department"
FROM #employee
SORT salary ASC
```
Dataview Javascript API
The Dataview JavaScript API allows arbitrary JavaScript to be executed with access to the dataview indices and query engine, which is useful for complex views or interoperability with other plugins. To query data with Dataview Javascript API the 'dataviewjs' language specification for a codeblock is used. The API is accessible via the implicitly provided dv (or dataview) variable, which allows you to query for data, render HTML, and configure the view.
Example with jason_statham.md and bruce_lee.md
```dataviewjs
let employees = dv.pages("#employee")
    .sort(emp => emp.salary, "asc")
    .map(emp => [emp.file.link, emp.name, emp.salary, emp.department])

dv.table(["File", "Name", "Salary", "Department"], employees)
```
Note
There is no defined standard for OSINT data organization, because the data may come in different forms, including, but not limited to, web pages, paper documents, online calendars, and video and audio recordings. Because of this, it is nearly impossible to create a tool that is convenient for all use cases. If the operation is big enough, it might be feasible to create a dedicated web application that stores all necessary data in a database. However, since OSINT itself is usually a highly confidential activity, publishing the application on the clear web is a privacy and security risk.
OSINT Techniques
This section is split into several parts, each covering a common type of target investigation. Each part lists resources and techniques that have proven useful for that type of research. The section is meant to serve as a reference when a specific need arises within an investigation.
Email Addresses
Searching by a person's real name can be frustrating. If the target has a common name, it is easy to get lost in the results. Even a fairly unique name produces a large number of people's addresses, profiles, and telephone numbers. If your target is named John Smith, you have a problem. This is why it is always better to search by email address when available. If you have your target's email address, you will achieve much better results at a faster pace. There may be thousands of John Wilsons, but there would be only one john.wilson.77089@yahoo.com.
The first step is to search this address within quotation marks on the major search engines. This should identify web pages which include the exact address within either the content or the source code. This is the "easy stuff" which may present false positives, but should provide immediate evidence of the exposure of the target account.
Next, search the username portion of the email address by itself in case it is in use with other providers and services, such as Gmail, Hotmail, Twitter, LinkedIn, etc. Let's conduct an example assuming that "john.wilson.77089@yahoo.com" is your target. You could place this email within quotes and execute the search through each service manually, or use the following direct search URLs to expedite the process.
Google Email: https://google.com/search?q="john.wilson.77089@yahoo.com"
Google Username: https://google.com/search?q="john.wilson.77089"
Bing Email: https://bing.com/search?q="john.wilson.77089@yahoo.com"
Bing Username: https://bing.com/search?q="john.wilson.77089"
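The direct search URLs above follow a fixed pattern, so they can be generated for any address. The sketch below is one way to do that; the function name `email_search_urls` is an assumption, and the quotation marks and the `@` sign are percent-encoded, which search engines treat the same as the literal characters shown above.

```python
from urllib.parse import quote_plus

ENGINES = {
    "Google": "https://google.com/search?q=",
    "Bing": "https://bing.com/search?q=",
}


def email_search_urls(email: str) -> dict[str, str]:
    """Build exact-match search URLs for the address and its username part."""
    username = email.split("@", 1)[0]
    urls = {}
    for engine, base in ENGINES.items():
        # wrapping the term in double quotes requests an exact match
        urls[f"{engine} Email"] = base + quote_plus(f'"{email}"')
        urls[f"{engine} Username"] = base + quote_plus(f'"{username}"')
    return urls


for label, url in email_search_urls("john.wilson.77089@yahoo.com").items():
    print(f"{label}: {url}")
```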
Usernames
Once you have identified a username for an online service, this information may lead to much more data. Active internet users often use the same username across many sites. For example, the user "amanda62002" on Instagram may be the same "amanda62002" on Twitter and an unknown number of other sites. Likewise, when you identify an email address, you may now have the username of the target. If a subject uses mpulido007@gmail.com as an email address, there is a good chance that person uses mpulido007 as a screen name on a number of sites. If the target has been an internet user for several years, this Gmail account was probably not the first email address used by the target. Searches for potential addresses such as mpulido007@yahoo.com, mpulido007@hotmail.com, and mpulido007@aol.com may discover new information. Manual searching of this new username information is a good start, but keeping up with the hundreds of social websites available is impossible. Dedicated username search services allow you to search usernames across many websites at once and report links to profiles that you may have missed.
To perform a search, you need to enter the username in a single search field on the selected engine, which will immediately check for the presence of the provided username in the most popular social networks. A search for the username "intelTechnics" through one of these engines provides information about the availability of that username on the top 25 networks.
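The step of guessing a target's older addresses at other providers can be automated with a few lines. This is a sketch under the assumption that the provider list from the example above (Gmail, Yahoo, Hotmail, AOL) is the one of interest; `candidate_addresses` is a hypothetical helper name, and the generated addresses are only leads to search for, not confirmed accounts.

```python
PROVIDERS = ("gmail.com", "yahoo.com", "hotmail.com", "aol.com")


def candidate_addresses(email: str, providers=PROVIDERS) -> list[str]:
    """Combine the username part of a known address with other providers."""
    username, _, domain = email.partition("@")
    return [f"{username}@{provider}"
            for provider in providers
            if provider != domain.lower()]   # skip the address already known


print(candidate_addresses("mpulido007@gmail.com"))
```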
Telephone Numbers
There are hundreds of websites that claim the ability to search for information on telephone numbers. These vary from sites with amazingly accurate results to sites that only include advertisements. If you have a target telephone number, your search has three phases. First, you need to identify the type of number and the provider. The type could be landline, cellular or internet, and the provider is the company supplying the service. Second, you need to identify any subscriber information, such as the name and address associated with the account. Finally, you need to locate any online content with a connection to the target number. This can all lead to more intelligence and additional searches. The majority of cellular numbers can now be identified if they are registered in someone's name. If you have an address, you will want to identify the people associated with the address and any telephone numbers the subjects use.
Telephone Number Search Websites
Below are a handful of people-search websites which allow the query of a telephone number. These sites all possess unique data sets and each should be searched. Most of the results originate from sources such as property tax data, marketing leaks, phonebooks and various data breaches. Many of these links avoid unnecessary loading screens and advertisements. This will help with the automated tool at the end. Overall, we cannot control the results, and telephone search is mostly "what you see is what you get". Replace the demo number (618-462-0000) with your target number.
Fast People Search https://www.fastpeoplesearch.com/618-462-0000
Sync.me https://sync.me/search/?number=6184620000
That's Them https://thatsthem.com/phone/618-462-0000
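Replacing the demo number in each link by hand is error-prone, because the sites expect different formats (with or without dashes). The sketch below normalizes a ten-digit number and fills in the three URL patterns listed above. The function name `phone_search_urls` is an assumption, and the URL patterns are taken from this page as-is and may change on the sites themselves.

```python
import re


def phone_search_urls(number: str) -> dict[str, str]:
    """Build the search URLs above for a ten-digit telephone number."""
    digits = re.sub(r"\D", "", number)            # drop spaces, dashes, brackets
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                       # strip a leading country code 1
    if len(digits) != 10:
        raise ValueError(f"expected a 10-digit number, got {number!r}")
    dashed = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return {
        "Fast People Search": f"https://www.fastpeoplesearch.com/{dashed}",
        "Sync.me": f"https://sync.me/search/?number={digits}",
        "That's Them": f"https://thatsthem.com/phone/{dashed}",
    }
```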
Images
Thanks to cameras on every cellular phone, digital photograph uploads are extremely common among social network users. Digital images, logos, and icons can be of great value in OSINT investigations. Major search engines like Google, Yahoo, and Bing provide basic image search functionality. However, there are other, more specialized image search engines that can be used to get more precise results. This part will identify various photo sharing websites as well as specific search techniques.
Basic Image Search
The following sites offer image search services:
• Google Image Search: (https://images.google.com)
• Bing image search: (www.bing.com/images)
• Yahoo Images: (http://images.yahoo.com)
• Baidu: (http://image.baidu.com)
• Imgur: (https://imgur.com)
• SmugMug: (https://www.smugmug.com)
Google offers Advanced Image Search, where it is possible to set many criteria of search query such as image color, image type (photo, face, clip art, line drawing, animated), region or country, site or domain name, image format type, and usage rights. Google Advanced Image Search can be found at (https://images.google.com/advanced_image_search).
Reverse Image Search
A reverse image search uses a sample image instead of a search query. It works by uploading an image or inserting its URL into a reverse image search engine, which will in turn search its index to find where else this image appears online and display all the other locations. In this way, you can find the original source of photographs, memes, and profile pictures. One of the most popular reverse image search engines is Google reverse search (https://www.google.com/imghp): Google has a dedicated search engine for reverse image searches; you can either paste the image URL in the search box or upload the image to Google.
Advancements in computer processing power and image analysis software have made reverse image searching possible on several sites. While a standard search online involves entering text into a search engine for related results, a reverse image search provides an image to a search engine for analysis. The results will vary depending on the site used. Some will identify identical images that appear on other websites. If you have a photo of a target on a social network, a reverse analysis of that photo may reveal other websites on which the target used the same image. These may be results that were not identified through a standard search engine. Occasionally, a target may create a website under an alias, but use an actual photo of themselves. Unless you knew the alias name, you would never find the site. Searching for the site by the image may be the only way to locate the profile of the alias. Some reverse image sites go further and try to identify other photos of the target that are similar enough to be matched. Some will even try to determine the sex and age of the subject in the photo based on the analysis of the image. This type of analysis was once limited to expensive private solutions. Now, these services are free to the public.
Image Metadata
Every digital photograph captured with a digital camera possesses metadata known as Exif data. This is a layer of code that provides information about the photo and camera. All digital cameras write this data to each image, but the amount and type of data can vary. This data, which is embedded into each photo "behind the scenes", is not visible by viewing the captured image. You need an Exif reader, which can be found on websites and within applications. Keep in mind that some websites remove or "scrub" this data before storing the image on their servers. Facebook, for example, removes the data while Flickr does not. A digital photo located online will therefore not always contain this data. If you locate an image that appears full size and uncompressed, you will likely still have the data intact. If the image has been compressed to a smaller file size, this data is often lost. Any images copied directly from a digital camera card will always have the data. This is one of the reasons you will always want to identify the largest version of an image when searching online. The quickest way to see the information is through an online viewer.
Jeffrey's Image Metadata Viewer: http://exif.regex.info/exif.cgi
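Before uploading a photo to a viewer, it can be useful to check locally whether the file still carries an Exif block or has been scrubbed. The sketch below does only that check, using the standard JPEG layout in which Exif data sits in an APP1 segment (marker 0xFFE1) whose payload starts with `Exif` followed by two zero bytes. It does not decode the individual Exif fields, and the function name `has_exif` is made up for this example.

```python
import struct


def has_exif(data: bytes) -> bool:
    """Return True if the JPEG bytes contain an Exif APP1 segment."""
    if data[:2] != b"\xff\xd8":               # every JPEG starts with SOI
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:                 # not at a segment marker
            return False
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):            # image data or end of image
            return False
        # the two-byte big-endian length counts itself but not the marker
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length                     # jump to the next segment
    return False
```

For a file on disk, the bytes would be read with `open(path, "rb").read()` and passed to the function.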
References
- Pastor-Galindo, Javier, et al. "The not yet exploited goldmine of OSINT: Opportunities, open challenges and future trends." IEEE Access 8 (2020): 10282-10304.
- Richelson, Jeffrey T. The US intelligence community. Routledge, 2018.
- Williams, Heather J., and Ilana Blum. Defining second generation open source intelligence (OSINT) for the defense enterprise. Rand Corporation, 2018.
- Open Source Intelligence Methods and Tools: A Practical Guide to Online Intelligence, 1st Edition, Nihad A. Hassan, Rami Hijazi
- Open Source Intelligence Techniques: Resources for Searching and Analyzing Online Information, 9th Edition, Michael Bazzell
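- First Steps to Getting Started in Open Source Research / Bellingcat, https://www.bellingcat.com/resources/2021/11/09/first-steps-to-getting-started-in-open-source-research/
- The Most Comprehensive TweetDeck Research Guide In Existence / Bellingcat, https://www.bellingcat.com/resources/how-tos/2019/06/21/the-most-comprehensive-tweetdeck-research-guide-in-existence-probably/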