CWES Cheatsheet — Passive Recon

passive recon is all about gathering info without touching the target directly. no packets sent to them, no noise, just public sources and smart googling. the goal is to build a picture of the target before you ever send a single request.


Passive Reconnaissance

passive reconnaissance involves gathering information about the target without directly interacting with it. this relies on analysing publicly available information and resources.

| Technique | Description | Example | Tools | Risk of Detection |
|---|---|---|---|---|
| Search Engine Queries | using search engines to uncover info about the target, such as websites, social media profiles, and news articles | searching Google for "[Target Name] employees" | Google, DuckDuckGo, Bing, Shodan | Very Low |
| WHOIS Lookups | querying WHOIS databases for domain registration details | performing a WHOIS lookup to find the registrant name, contact info, and name servers | whois, online WHOIS services | Very Low |
| DNS | analysing DNS records to identify subdomains, mail servers, and other infrastructure | using dig to enumerate subdomains | dig, nslookup, host, dnsenum, fierce, dnsrecon | Very Low |
| Web Archive Analysis | examining historical snapshots to identify changes, vulnerabilities, or hidden info | using the Wayback Machine to view past versions of a target website | Wayback Machine | Very Low |
| Social Media Analysis | gathering info from LinkedIn, Twitter, Facebook | searching LinkedIn for employees to learn about roles and potential social engineering (SE) targets | LinkedIn, Twitter, Facebook, OSINT tools | Very Low |
| Code Repositories | analysing public repos for exposed credentials or vulnerabilities | searching GitHub for code related to the target that might contain secrets | GitHub, GitLab | Very Low |

WHOIS

  • assign the target to an environment variable: export TARGET="domain.tld"
  • WHOIS lookup for the target: whois $TARGET
whois inlanefreight.com

[...]
Domain Name: inlanefreight.com
Registry Domain ID: 2420436757_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.registrar.amazon
Registrar URL: https://registrar.amazon.com
Updated Date: 2023-07-03T01:11:15Z
Creation Date: 2019-08-05T22:43:09Z
[...]

each WHOIS record typically contains:

  • Domain Name — the domain name itself (e.g., example.com)
  • Registrar — the company where the domain was registered (e.g., GoDaddy, Namecheap)
  • Registrant Contact — the person or organization that registered the domain
  • Administrative Contact — the person responsible for managing the domain
  • Technical Contact — the person handling technical issues related to the domain
  • Creation and Expiration Dates — when it was registered and when it expires
  • Name Servers — servers that translate the domain name into an IP address

for historical WHOIS data (ownership changes over time), use WhoisFreaks.
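
for a quick triage of a long WHOIS dump, a small filter helps. the sketch below keeps only a handful of recon-relevant fields. it assumes the "Key: value" layout shown in the inlanefreight.com output above, and both the helper name whois_summary and its field list are my own choices (registries don't all use the same field names).

```shell
#!/usr/bin/env bash
# whois_summary (hypothetical helper): keep only recon-relevant WHOIS fields.
# Reads raw WHOIS output on stdin. Assumes "Key: value" lines; the field
# names below are common ones but vary between registries.
whois_summary() {
  grep -Ei '^[[:space:]]*(Registrar|Registrar URL|Registrant Organization|Creation Date|Updated Date|Registry Expiry Date|Name Server):' |
    sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' |  # trim indentation and trailing \r
    LC_ALL=C sort -u                                  # name servers are often listed twice
}
```

usage: whois "$TARGET" | whois_summary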


Passive Subdomain Enumeration

this relies on external sources to discover subdomains without directly querying the target’s DNS servers.

Certificate Transparency (CT) logs — public repositories of SSL/TLS certificates. these certificates often include a list of associated subdomains in their Subject Alternative Name (SAN) field.

search engines — using operators like site: to filter results and find subdomains.

online databases — various tools aggregate DNS data from multiple sources.

  • VirusTotal: https://www.virustotal.com/gui/home/url
  • Censys: https://censys.io/
  • crt.sh: https://crt.sh/
  • all subdomains for a given domain: curl -s https://sonar.omnisint.io/subdomains/{domain} | jq -r '.[]' | sort -u
  • all TLDs found for a given domain: curl -s https://sonar.omnisint.io/tlds/{domain} | jq -r '.[]' | sort -u
  • all results across all TLDs for a given domain: curl -s https://sonar.omnisint.io/all/{domain} | jq -r '.[]' | sort -u
  • reverse DNS lookup on an IP address: curl -s https://sonar.omnisint.io/reverse/{ip} | jq -r '.[]' | sort -u
  • reverse DNS lookup of a CIDR range: curl -s https://sonar.omnisint.io/reverse/{ip}/{mask} | jq -r '.[]' | sort -u
  • certificate transparency (names from crt.sh): curl -s "https://crt.sh/?q=${TARGET}&output=json" | jq -r '.[] | "\(.name_value)\n\(.common_name)"' | sort -u
  • search for subdomains on each source listed in sources.txt: while read -r source; do theHarvester -d "${TARGET}" -b "${source}" -f "${source}-${TARGET}"; done < sources.txt

sources.txt

baidu
bufferoverun
crtsh
hackertarget
otx
projectdiscovery
rapiddns
sublist3r
threatcrowd
trello
urlscan
vhost
virustotal
zoomeye
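
the sources above overlap heavily and return messy names: wildcard entries like *.example.com, mixed case, trailing dots, several SANs on one line, and sometimes hosts that aren't even in scope. below is a minimal sketch for merging everything into one clean list. normalize_subdomains is a hypothetical helper, and it assumes plain-text input with hostnames separated by newlines, commas, or spaces.

```shell
#!/usr/bin/env bash
# normalize_subdomains (hypothetical helper): merge raw hostnames from several
# passive sources into one sorted, de-duplicated, in-scope list.
# Usage: normalize_subdomains example.com < combined.txt
normalize_subdomains() {
  local domain
  domain="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  tr -d '\r' |
    tr ', ' '\n\n' |               # one name per line
    tr '[:upper:]' '[:lower:]' |   # DNS names are case-insensitive
    sed -E 's/^\*\.//; s/\.$//' |  # drop wildcard prefix and trailing dot
    # keep the domain itself or names ending in ".<domain>" (exact suffix match)
    awk -v d="$domain" '$0 == d || (length($0) > length(d) + 1 && substr($0, length($0) - length(d)) == "." d)' |
    LC_ALL=C sort -u
}
```

e.g. cat *.txt | normalize_subdomains "$TARGET" > subdomains.txt (file names are placeholders). matching the exact suffix ".domain" rather than a substring keeps look-alikes such as evil-example.com out of the list.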

Searching CT Logs

there are two popular options for searching CT logs:

| Tool | Key Features | Use Cases | Pros | Cons |
|---|---|---|---|---|
| crt.sh | user-friendly web interface, simple search by domain, displays certificate details and SAN entries | quick and easy searches, identifying subdomains, checking certificate issuance history | free, easy to use, no registration required | limited filtering and analysis options |
| Censys | powerful search engine for internet-connected devices, advanced filtering by domain, IP, and certificate attributes | in-depth analysis of certificates, identifying misconfigurations, finding related certificates and hosts | extensive data and filtering options, API access | requires registration (free tier available) |

# crt.sh — fetch JSON output and filter for dev subdomains
curl -s "https://crt.sh/?q=facebook.com&output=json" | jq -r '.[]
 | select(.name_value | contains("dev")) | .name_value' | sort -u
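
contains("dev") is a plain substring match, so it also catches names like devices.example.com. here is a sketch of a label-aware alternative that checks several keywords in one pass. the helper name flag_interesting and the default keyword list are assumptions, so adjust them to the target's naming scheme.

```shell
#!/usr/bin/env bash
# flag_interesting (hypothetical helper): print hostnames that contain a
# keyword as a whole label or hyphen-separated part (dev, api-dev, dev2),
# not as a substring of another word (devices, latest).
# Optional $1 overrides the illustrative default keyword list.
flag_interesting() {
  local keywords="${1:-dev|staging|stage|test|uat|qa|admin|internal|vpn}"
  grep -Ei "(^|[.-])(${keywords})[0-9]*([.-]|$)"
}
```

usage: curl -s "https://crt.sh/?q=${TARGET}&output=json" | jq -r '.[].name_value' | sort -u | flag_interesting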

Passive Infrastructure Identification

  • Netcraft: https://www.netcraft.com/
  • Wayback Machine: http://web.archive.org/
  • waybackurls: https://github.com/tomnomnom/waybackurls
  • fetch the archived URLs for a domain, with the capture date in the first column: waybackurls -dates $TARGET > waybackurls.txt
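
with -dates, waybackurls puts the capture date in the first column. counting captures per year is a cheap way to see when parts of the site were active. a sketch, assuming each line starts with a date whose first four characters are the year. urls_per_year is a hypothetical name.

```shell
#!/usr/bin/env bash
# urls_per_year (hypothetical helper): count archived URLs per capture year.
# Assumes lines like "2019-08-05T22:43:09Z https://example.com/" on stdin.
urls_per_year() {
  awk '$1 ~ /^[0-9][0-9][0-9][0-9]/ { n[substr($1, 1, 4)]++ }
       END { for (y in n) print y, n[y] }' |
    sort -n
}
```

usage: urls_per_year < waybackurls.txt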

Fingerprinting Techniques

there are several techniques used for web server and technology fingerprinting:

  • Banner Grabbing — examining banners returned by web servers or services to identify software names, version numbers, and service details
  • Analysing HTTP Headers — reviewing HTTP request and response headers for info disclosure. headers such as Server and X-Powered-By often expose web server software, frameworks, or scripting languages
  • Probing for Specific Responses — sending crafted or malformed requests to trigger distinctive responses or error messages characteristic of specific web servers
  • Analysing Page Content — inspecting the structure of web pages, source code, scripts, comments, and metadata for indicators like framework-specific files or copyright notices
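
for the header-analysis technique, a filter like the one below prints only the response headers that typically leak software or framework details. it's a sketch: the header list (including Set-Cookie, since session cookie names such as PHPSESSID hint at PHP) is an assumption and not exhaustive, and fingerprint_headers is a hypothetical name. note that fetching headers with curl sends a request to the target, so this step is no longer strictly passive.

```shell
#!/usr/bin/env bash
# fingerprint_headers (hypothetical helper): keep only response headers that
# commonly reveal server software, frameworks, or proxies.
# Reads raw HTTP response headers on stdin; strips CRLF line endings first.
fingerprint_headers() {
  tr -d '\r' |
    grep -Ei '^(server|x-powered-by|x-aspnet-version|x-aspnetmvc-version|x-generator|via|set-cookie):'
}
```

usage: curl -sI "https://${TARGET}" | fingerprint_headers
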

| Tool | Description | Features |
|---|---|---|
| Wappalyzer | browser extension and online service for website technology profiling | identifies a wide range of web technologies: CMSs, frameworks, analytics tools, and more |
| BuiltWith | web technology profiler that provides detailed reports on a website's tech stack | offers both free and paid plans with varying levels of detail |
| WhatWeb | command-line tool for website fingerprinting | uses a vast database of signatures to identify various web technologies |
| Nmap | versatile network scanner for various recon tasks, including service and OS fingerprinting | can be used with scripts (NSE) for more specialised fingerprinting |
| Netcraft | web security services including website fingerprinting and security reporting | detailed reports on technology, hosting provider, and security posture |
| wafw00f | command-line tool specifically designed for identifying Web Application Firewalls (WAFs) | helps determine if a WAF is present and, if so, its type and configuration |

# nikto — only running the fingerprinting modules
nikto -h inlanefreight.com -Tuning b
# -h specifies the target host
# -Tuning b tells Nikto to only run the Software Identification modules

Check Robots.txt

www.example.com/robots.txt

the robots.txt file lives in the root directory of a website. each set of instructions (“record”) is separated by a blank line:

  1. User-agent — which crawler or bot the rules apply to. a wildcard (*) means all bots
  2. Directives — specific instructions to the identified user-agent

| Directive | Description | Example |
|---|---|---|
| Disallow | paths the bot should not crawl | Disallow: /admin/ |
| Allow | explicitly permits crawling specific paths, even if they fall under a broader Disallow rule | Allow: /public/ |
| Crawl-delay | delay (in seconds) between successive requests from the bot | Crawl-delay: 10 |
| Sitemap | URL of an XML sitemap for more efficient crawling | Sitemap: https://www.example.com/sitemap.xml |
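
to pull the Allow and Disallow paths out of a robots.txt file across all records, a short awk filter does the job. a minimal sketch: robots_paths is a hypothetical name, it ignores which user-agent each rule belongs to, and it skips empty Disallow lines (an empty Disallow means nothing is blocked).

```shell
#!/usr/bin/env bash
# robots_paths (hypothetical helper): list "allow <path>" / "disallow <path>"
# pairs from a robots.txt read on stdin, de-duplicated across all records.
robots_paths() {
  tr -d '\r' |
    sed -E 's/#.*$//' |                          # strip comments
    awk -F':' '{
      key = tolower($1)
      gsub(/[ \t]/, "", key)                     # "Disallow " -> "disallow"
      if (key != "allow" && key != "disallow") next
      val = $0
      sub(/^[^:]*:[ \t]*/, "", val)              # everything after the first colon
      sub(/[ \t]+$/, "", val)
      if (val != "") print key, val             # skip empty Disallow lines
    }' |
    LC_ALL=C sort -u
}
```

usage: curl -s "https://${TARGET}/robots.txt" | robots_paths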

Google Dorking

for a large collection of ready-made dorks, refer to the Google Hacking Database (GHDB) on Exploit-DB

useful combos to try:

  • finding login pages: site:example.com inurl:login or site:example.com (inurl:login OR inurl:admin)
  • identifying exposed files: site:example.com filetype:pdf or site:example.com (filetype:xls OR filetype:docx)
  • uncovering config files: site:example.com inurl:config.php or site:example.com (ext:conf OR ext:cnf)
  • locating database backups: site:example.com inurl:backup or site:example.com filetype:sql
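
the combos above can be generated for any domain and turned into search URLs ready to open. a sketch: make_dorks and urlencode are hypothetical helpers, the query list just mirrors the combos above, and urlencode only handles ASCII input correctly.

```shell
#!/usr/bin/env bash
# make_dorks (hypothetical helper): print dork queries for a domain, one per line.
make_dorks() {
  local d="$1"
  printf '%s\n' \
    "site:${d} (inurl:login OR inurl:admin)" \
    "site:${d} filetype:pdf" \
    "site:${d} (filetype:xls OR filetype:docx)" \
    "site:${d} (ext:conf OR ext:cnf)" \
    "site:${d} inurl:config.php" \
    "site:${d} inurl:backup" \
    "site:${d} filetype:sql"
}

# urlencode (hypothetical helper): percent-encode everything except
# unreserved characters. ASCII only; multibyte characters are not handled.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;  # "'c" yields the char code
    esac
  done
  printf '%s\n' "$out"
}
```

usage: make_dorks "$TARGET" | while read -r q; do echo "https://www.google.com/search?q=$(urlencode "$q")"; done. building the queries is pure string work, so nothing touches the target until a URL is actually opened.
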

| Operator | Description | Example |
|---|---|---|
| site: | limits results to a specific website or domain | site:example.com |
| inurl: | finds pages with a specific term in the URL | inurl:login |
| filetype: | searches for files of a particular type | filetype:pdf |
| intitle: | finds pages with a specific term in the title | intitle:"confidential report" |
| intext: / inbody: | searches for a term within the body text of pages | intext:"password reset" |
| cache: | displays the cached version of a webpage (Google retired this operator in 2024) | cache:example.com |
| link: | finds pages that link to a specific webpage | link:example.com |
| related: | finds websites related to a specific webpage | related:example.com |
| info: | provides a summary of information about a webpage | info:example.com |
| define: | provides definitions of a word or phrase | define:phishing |
| numrange: | searches for numbers within a specific range | site:example.com numrange:1000-2000 |
| allintext: | finds pages containing all specified words in the body text | allintext:admin password reset |
| allinurl: | finds pages containing all specified words in the URL | allinurl:admin panel |
| allintitle: | finds pages containing all specified words in the title | allintitle:confidential report 2023 |
| AND | narrows results by requiring all terms to be present | site:example.com AND (inurl:admin OR inurl:login) |
| OR | broadens results by including pages with any of the terms | "linux" OR "ubuntu" OR "debian" |
| NOT | excludes results containing the specified term | site:bank.com NOT inurl:login |
| * (wildcard) | stands in for any word | site:socialnetwork.com filetype:pdf user* manual |
| .. (range search) | finds results within a specified numerical range | site:ecommerce.com "price" 100..500 |
| " " (quotation marks) | searches for exact phrases | "information security policy" |
| - (minus sign) | excludes terms from the search results | site:news.com -inurl:sports |

Wayback Machine

https://web.archive.org/

# waybackurls tool — find all archived URLs
# can reveal old endpoints, params, subdomains
waybackurls target.com
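
to see which parameters an app used to accept, the archived URLs can be reduced to a list of unique query-parameter names. a sketch, assuming one plain URL per line (i.e. waybackurls without -dates). param_names is a hypothetical name.

```shell
#!/usr/bin/env bash
# param_names (hypothetical helper): unique query-parameter names from URLs on stdin.
param_names() {
  grep -F '?' |                           # only URLs with a query string
    sed -E 's/#.*$//; s/^[^?]*\?//' |     # drop fragment, keep text after first "?"
    tr '&' '\n' |                         # one key=value pair per line
    sed -E 's/=.*$//' |                   # keep the key only
    grep -v '^$' |
    LC_ALL=C sort -u
}
```

usage: waybackurls target.com | param_names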


This post is licensed under CC BY 4.0 by the author.