Web content enumeration tools in 2021

Table of contents
  1. Web content enumeration tools
    1.1. Summary
    1.2. Dirb
      1.2.1. Pros
      1.2.2. Cons
    1.3. DirBuster
      1.3.1. Pros
      1.3.2. Cons
    1.4. Dirsearch
      1.4.1. Pros
      1.4.2. Cons
    1.5. FFUF
      1.5.1. Pros
      1.5.2. Cons
    1.6. Gobuster
      1.6.1. Pros
      1.6.2. Cons
    1.7. Wfuzz
      1.7.1. Pros
      1.7.2. Cons
    1.8. Bonus - BFAC
      1.8.1. Pros
      1.8.2. Cons
  2. Use-cases
    2.1. Simple discovery on PHP applications
    2.2. Webserver with a custom page for error 40X
    2.3. Fuzzing on Rest API
    2.4. POST IDOR with incremental ID
  3. Comparative table
  4. About the author

Web content enumeration tools#

Summary#

Perimeter discovery is an important step during a web pentest and can, in some cases, lead to a website compromise. To carry out this reconnaissance, several tools are available, including web content enumeration tools:

| Name         | Version* | First release | Last release | Language |
|--------------|----------|---------------|--------------|----------|
| Dirb         | 2.22     | 2005/04/27    | 2014/11/19   | C        |
| DirBuster    | 1.0-RC1  | 2007          | 2013/05/01   | Java     |
| Dirsearch    | 0.4.1    | 2014/07/07    | 2020/12/08   | Python3  |
| FFUF         | 1.2.1    | 2018/11/08    | 2021/01/24   | Go       |
| Gobuster     | 3.1.0    | 2015/07/21    | 2020/10/19   | Go       |
| Wfuzz        | 3.1.0    | 2014/10/23    | 2020/11/06   | Python3  |
| BFAC (Bonus) | 1.0      | 2017/11/08    | 2017/11/08   | Python3  |

* versions as of February 2021, when this post was written

Other tools such as Rustbuster, FinalRecon or Monsoon exist and won't be fully described here, since they're less widely known and used. They are, however, included in the comparative table at the end of this post.

Dirb#

Dirb is a web content scanner written in C and provided by The Dark Raver since 2005.

DIRB is a Web Content Scanner. It looks for existing (and/or hidden) Web Objects. It basically works by launching a dictionary based attack against a web server and analyzing the response.

The last release of this tool dates back to 2014, with version 2.22. The package is provided by most pentesting Linux distributions, such as BlackArch and Kali Linux.

The tool ships with many wordlists, including big.txt and common.txt (its default wordlist). Dirb also comes with two utilities: html2dic, an equivalent of cewl, and gendict, an equivalent of crunch; both are used for wordlist generation.
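
As a quick sketch of how these two utilities can be used (syntax taken from dirb's bundled help; the file names are hypothetical):

html2dic downloaded_page.html > words.txt   # build a wordlist from a locally saved HTML page
gendict -n backup_X.tar.gz > dict.txt       # expands X numerically: backup_0.tar.gz ... backup_9.tar.gz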

Although dirb is one of the oldest web discovery tools, it offers most of the advanced options, such as custom headers, custom extensions, authenticated proxy support and even interactive recursion. Unfortunately, it is one of the rare tools that lacks multithreading.
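
A sketch combining several of these options (the proxy address and credentials are placeholders; check dirb's help output for exact flag syntax):

dirb http://localhost/ /usr/share/dirb/wordlists/common.txt -X .php,.bak -H "X-Forwarded-For: 127.0.0.1" -p proxy.local:8080 -P proxyuser:proxypass -R -z 100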

Pros#

  • Specify multiple wordlists (comma separated)
  • Recursive mode (by default, or using -R option for interactive mode)

Cons#

  • No multithreaded option
  • Only GET method
  • No fancy filters
  • Only one output format (plain text)

DirBuster#

DirBuster is a web content scanner written in Java and provided by the OWASP Foundation since 2007. The project is no longer maintained by OWASP and its features are now part of the OWASP ZAP Proxy. The last release of the tool was version 1.0-RC1, in 2008. DirBuster is notable for providing a GUI:

(Screenshot: the DirBuster GUI)

Even though the project is no longer distributed by OWASP, the source of the tool can still be found on SourceForge. The tool is also provided by most pentesting Linux distributions.

The tool is packaged with 8 wordlists including directory-list-1.0.txt and apache-user-enum-2.0.txt.
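
Although DirBuster is mainly used through its GUI, the jar can reportedly also be run headless; a minimal sketch, assuming the jar name from the SourceForge distribution and its -H (headless), -u (URL) and -l (wordlist) flags:

java -jar DirBuster-1.0-RC1.jar -H -u http://localhost/ -l /usr/share/dirbuster/wordlists/directory-list-1.0.txt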

Pros#

  • Website scraping (extracts folders from src and href attributes)
  • Supports digest access authentication
  • Ability to specify the fuzzing point in the URL
  • Reports in XML, CSV or TXT

Cons#

  • Only GET/HEAD methods
  • Java GUI

Dirsearch#

Dirsearch is a command-line tool designed to brute-force directories and files on web servers. The tool has been written in Python3 since 2015, having originally been designed in 2014 in Python2. Dirsearch is still maintained, and its last release was in December 2020.

As a feature-rich tool, dirsearch gives users the opportunity to perform a complex web content discovering, with many vectors for the wordlist, high accuracy, impressive performance, advanced connection/request settings, modern brute-force techniques and nice output.

Dirsearch provides many options to perform wordlist transformations, such as extension exclusion, suffixes and extension removal. Dirsearch even provides 429 - Too Many Requests error handling, raw request handling and regex checks. Dirsearch ships with a default wordlist named dicc.txt, which contains %EXT% tags that are replaced with user-defined extensions.
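
As a minimal illustration of the %EXT% mechanism (hypothetical one-line wordlist):

echo "index.%EXT%" > custom.txt
dirsearch -u http://localhost/ -w custom.txt -e php,html   # requests /index.php and /index.html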

Finally, dirsearch provides multiple report formats, including text, JSON, XML, Markdown and CSV.

Pros#

  • Multiple URLs and CIDR support
  • Multiple extensions check
  • Support multiple wordlists with wordlist manipulation
  • Support raw requests with --raw option, and any HTTP method with -m.
  • Colorful output with many export formats and regex filters

Cons#

  • Lots of options; a custom scan may take time to configure
  • No quick way to fuzz a specific part of a URL

FFUF#

FFUF (Fuzz Faster U Fool) is a web fuzzer written in Go. The tool is quite recent (first released in 2018) and is actively updated. Unlike the previous tools, FFUF aims to be an HTTP fuzzing tool that can be used not only for content discovery but also for parameter fuzzing. Thanks to its design, FFUF can also fuzz headers, which enables VHOST discovery.
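
For instance, VHOST discovery can be sketched by placing the FUZZ keyword inside the Host header (the target, wordlist path and filtered response size are placeholders to adapt):

ffuf -u http://10.0.0.1/ -H "Host: FUZZ.example.com" -w /usr/share/wordlists/subdomains.txt -fs 4242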

Like Dirsearch, FFUF provides filter and "matcher" options (including regex) to sort results, and many output formats (including JSON and XML). FFUF is the only tool to provide multi-wordlist operation modes, similar to the attack types of Burp Suite's Intruder. These modes can be used for brute-force attacks or complex fuzzing discovery.
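
A sketch of such a multi-wordlist run, using the clusterbomb mode with named keywords (the wordlist paths are hypothetical):

ffuf -mode clusterbomb -u http://localhost/FUZZDIR/FUZZFILE -w dirs.txt:FUZZDIR -w files.txt:FUZZFILE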

Finally, we can note that the -D option allows us to reuse Dirsearch-specific wordlists such as dicc.txt.

Pros#

  • "Replay-proxy" option which can be associated with other tools such as BurpSuite
  • Multi-wordlist operation modes
  • Colorized output
  • Custom / Auto filtering calibration

Cons#

  • Lots of options; a custom scan may take time to configure

Gobuster#

As indicated by its name, Gobuster is a tool written in Go. The first release of gobuster was in 2015 and the latest one in October 2020. Gobuster is a powerful tool with multiple purposes:

Gobuster is a tool used to brute-force:

  • URIs (directories and files) in websites
  • DNS subdomains (with wildcard support)
  • Virtual host names on target web servers
  • Open Amazon S3 buckets

As mentioned in the project description, Gobuster was originally created to avoid DirBuster's Java GUI while still supporting content discovery with multiple extensions at once.

As stated in the tool's description, Gobuster aims to be a simple tool without any fancy options. Note that Gobuster ships without any wordlist.
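
A sketch of the four modes (domains, bucket names and wordlist paths are placeholders):

gobuster dir -u http://localhost/ -w /usr/share/wordlists/raft-large-words.txt
gobuster dns -d example.com -w subdomains.txt
gobuster vhost -u http://example.com -w vhosts.txt
gobuster s3 -w bucket-names.txt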

Pros#

  • Multiple extensions
  • -d option to discover backup files
  • DNS, VHOST and S3 options

Cons#

  • No recursion
  • Single Wordlist
  • No regex match
  • Only one output format (TXT)

Wfuzz#

Wfuzz is a web fuzzer written in Python3 and provided by Xavi Mendez since 2014.

Wfuzz has been created to facilitate the task in web applications assessments and it is based on a simple concept: it replaces any reference to the FUZZ keyword by the value of a given payload.

The tool is still maintained, with a recent release in November 2020. The package is provided by most pentesting Linux distributions.

The tool is provided with a lot of wordlists: General (big.txt, common.txt, medium.txt...), Webservices (ws-dirs.txt and ws-files.txt), Injections (SQL.txt, XSS.txt, Traversal.txt...), Stress (alphanum_case.txt, char.txt...), Vulns (cgis.txt, coldfusion.txt, iis.txt...) and others.

Like FFUF, Wfuzz replaces the FUZZ keyword with a payload from a given wordlist. Wfuzz provides multiple filters, including regex filters (--ss/--hs), and supports multiple output formats (JSON, CSV, ...). Also, Wfuzz is one of the rare tools to support basic, NTLM and digest authentication.
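
As a sketch, an encoder can be attached to a payload directly in the -z definition, and NTLM credentials can be passed with a dedicated flag (credentials and wordlist paths are placeholders):

wfuzz -z file,/usr/share/wordlists/big.txt,base64 --hc 404 http://localhost/FUZZ
wfuzz --ntlm user:password --hc 404 -w /usr/share/wordlists/big.txt http://localhost/FUZZ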

Pros#

  • Encoders (urlencode, base64, uri_double_hex...) and scripts
  • Encoding chaining
  • Basic/NTLM/Digest authentication
  • Colorized output

Cons#

  • Only one wordlist per fuzzing point (combining several wordlists requires multiple FUZZ keywords)

Bonus - BFAC#

BFAC (Backup File Artifacts Checker) is not a tool designed to search for new folders, files or routes, but one designed to search for backup files.

BFAC (Backup File Artifacts Checker) is an automated tool that checks for backup artifacts that may disclose the web application's source code. The artifacts can also lead to leakage of sensitive information, such as passwords, directory structure, etc. The goal of BFAC is to be an all-in-one tool for backup-file artifacts black box testing.

Given a list of file URIs, BFAC will attempt to recover the associated backup files using a hardcoded list of tests. For example, for the file /index.php, BFAC will not only attempt to recover /index.php.swp and /index.php.tmp, but also run tests such as /Copy_(2)_of_index.php, /index.bak1 or /index.csproj.
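
A quick sketch of both invocation styles, assuming the flags documented in the project's README:

bfac --url http://localhost/index.php   # check backup artifacts for a single file
bfac --list discovered_files.txt        # check every URI from a previous discovery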

As you can imagine, BFAC should be used as a complement to the previous tools. It supports most of the expected features, such as proxy support, custom headers and different output formats.

Pros#

  • Complementary Tool
  • Efficient with fewer requests than a common web discovery tool

Cons#

  • Even though the tool is still maintained, the repository only provides one release

Use-cases#

Simple discovery on PHP applications#

The main use of these tools is file discovery on a common web server, such as a PHP website running on Apache2. Searching for files on this kind of web server often leads to HTTP errors such as 404 - File not found and 403 - Forbidden, or to HTTP successes such as 200 - OK. Other HTTP status codes may be encountered, like 302 - Found, 429 - Too Many Requests, 500 - Internal Server Error...

Depending on the server configuration, an auditor may or may not want to include specific HTTP status codes during file discovery. The default configuration of most tools is to hide 404 - File not found results. Displayed status codes may vary between tools, but 200 - OK is the most commonly displayed result.

For example, by default, Dirsearch will print not only 200 status codes but also 301, 302, etc.

dirsearch -u http://localhost/
dirsearch -u http://localhost/ -e php
dirsearch -u http://localhost/ -e php,php5,sql -w /usr/share/wordlists/raft-large-words.txt -f

Note: by default, dirsearch only replaces the %EXT% keyword with extensions. The -f flag forces dirsearch to append extensions to every word of a given wordlist. This option is useless if your wordlist already contains file extensions.

The same task can be accomplished with the other tools:

dirb http://localhost/ /usr/share/wordlists/raft-large-words.txt -X php,php5,sql
gobuster dir -u http://localhost/ -w /usr/share/wordlists/raft-large-words.txt -x php,php5,sql
ffuf -u http://localhost/FUZZ -w /usr/share/wordlists/raft-large-words.txt -e php,php5,sql
wfuzz --hc 404 -w /usr/share/wordlists/raft-large-words.txt -w exts.txt http://localhost/FUZZFUZ2Z

Webserver with a custom page for error 40X#

Sometimes, a server won't reply as your tools expect and will return a 403 error instead of a 404 error or, worse, a 200 status code with a custom error page.

In this case, the auditor must configure the tool to match the server's behavior. For the 403 case, the first solution is to exclude 403 results:

dirb http://localhost/ /usr/share/wordlists/raft-large-words.txt -X php,php5,sql -N 403
dirsearch -u http://localhost/ -e php,php5,sql -w /usr/share/wordlists/raft-large-words.txt -f -x 403
gobuster dir -u http://localhost/ -w /usr/share/wordlists/raft-large-words.txt -x php,php5,sql -b 403,404
ffuf -u http://localhost/FUZZ -w /usr/share/wordlists/raft-large-words.txt -e php,php5,sql -fc 403
wfuzz --hc 404,403 -w /usr/share/wordlists/raft-large-words.txt -w exts.txt http://localhost/FUZZFUZ2Z

With this solution, the auditor may miss interesting 403 errors. The second option is to filter more precisely on the content you're not looking for.

If the 403 error is a custom page, or if you get a 200 status code with an error message, you may filter web pages by their content instead of their status code. Tools provide multiple ways to do this: you can either filter by page size (assuming the error page is always the same size), or filter by words or regexes present in the page.

(Screenshot: error page from https://wordpress.com/)

For example, if a website returns a 200 HTTP status code with an HTML page containing the sentence Page not found, you may filter with the following:

dirsearch -u http://localhost/ --exclude-texts="Page not found" -e php,php5,sql -w /usr/share/wordlists/raft-large-words.txt -f
ffuf -u http://localhost/FUZZ -fr "Page not found" -w /usr/share/wordlists/raft-large-words.txt -e php,php5,sql
wfuzz --hs "Page not found" --hc 404 -w /usr/share/wordlists/raft-large-words.txt -w exts.txt http://localhost/FUZZFUZ2Z
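
If the error page has a constant size, filtering on the response size is an alternative; a sketch where 1337 stands for the hypothetical byte size of the error page:

dirsearch -u http://localhost/ --exclude-sizes=1337B -e php,php5,sql -w /usr/share/wordlists/raft-large-words.txt -f
ffuf -u http://localhost/FUZZ -fs 1337 -w /usr/share/wordlists/raft-large-words.txt -e php,php5,sql
wfuzz --hh 1337 --hc 404 -w /usr/share/wordlists/raft-large-words.txt -w exts.txt http://localhost/FUZZFUZ2Z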

Note that this method is not available in every tool.

Fuzzing on Rest API#

With the evolution of web development standards, auditors encounter more and more varied web routing techniques. Therefore, it's not rare that resources are accessible through dynamic routes. That's the case for RESTful web APIs, where certain resources must be fuzzed in the middle of a URI.

(Screenshot: OVH API - https://api.ovh.com)

Let's take the example of a REST API where the route /vps/{serviceName}/ips is available through GET requests (and where the route /vps/{serviceName} doesn't exist). To enumerate this parameter, you have three possibilities:

  • Reuse the previous examples and set /ips as an extension 🧐;
  • Use the suffix option of tools that provide one;
  • Use a dedicated fuzzing tool such as ffuf or wfuzz to perform precise parameter fuzzing (recommended).
[deprecated] dirsearch -u http://localhost/vps/ --suffixes /ips -w /usr/share/wordlists/raft-large-words.txt
ffuf -u http://localhost/vps/FUZZ/ips -w /usr/share/wordlists/raft-large-words.txt
wfuzz --hc 404 -w /usr/share/wordlists/raft-large-words.txt http://localhost/vps/FUZZ/ips

POST IDOR with incremental ID#

Sometimes, the location of a resource is based on a more complex parameter, such as an Accept-Language header, an HTTP POST parameter or even an IP address.

During a pentest, SEC-IT auditors encountered a vulnerability allowing users to download PDFs from the page /files/pdf with the POST parameter {"objectId": "X"}, where X is an integer. The vulnerability itself was an IDOR (Insecure Direct Object Reference): a user could download any PDF without privilege restriction. The problem is that even though the vulnerable parameter was a pseudo-incremental ID, there was a random step between IDs, which made the exfiltration harder without tooling.

To perform this PDF exfiltration, web fuzzers like ffuf and wfuzz can be used to fuzz the objectId POST parameter:

ffuf -u http://localhost/files/pdf -X POST -d '{"objectId" : "FUZZ"}' -w /usr/share/wordlists/ints.txt
wfuzz -z file,/usr/share/wordlists/ints.txt -d '{"objectId" : "FUZZ"}' http://localhost/files/pdf

Comparative table#

Without further ado, here is a comparative table of the different tools discussed in this post:


| Feature | Dirb | DirBuster | Dirsearch | FFUF | Gobuster | Wfuzz | Rustbuster | FinalRecon | Monsoon | BFAC |
|---|---|---|---|---|---|---|---|---|---|---|
| Language | C | Java | Python3 | Go | Go | Python3 | Rust | Python3 | Go | Python3 |
| First release | 27/04/2005 | 2007 | 07/07/2014 | 08/11/2018 | 21/07/2015 | 23/10/2014 | 20/05/2019 | 05/05/2019 | 12/11/2017 | 08/11/2017 |
| Last release | 19/11/2014 | 01/05/2013 | 08/12/2020 | 24/01/2021 | 19/10/2020 | 06/11/2020 | 24/05/2019 | 23/11/2020 | 28/10/2020 | 08/11/2017 |
| Current version | 2.22 | 1.0-RC1 | 0.4.1 | 1.2.1 | 3.1.0 | 3.1.0 | 1.1.0 | no versioning | 0.6.0 | 1.0 |
| License | GPLv2 | LGPL-2 | GPLv2 | MIT | Apache License 2.0 | GPLv2 | GPLv3 | MIT | MIT | GPLv3 |
| Maintained | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| GUI/CLI | CLI | GUI (Java) | CLI (colorized by default) | CLI (colorize option) | CLI | CLI (colorize option) | CLI | CLI (colorized by default) | CLI (colorized by default) | CLI (colorized by default) |
| Profile options file | No | No, but ability to modify default threads, WL and extensions | Yes (default.conf) | Yes (-config) | No | Yes (--recipe) | No | No | Yes (-f) | No |
| Output | No (-o, text only) | Yes (XML, CSV, TXT) | Yes (JSON, XML, MD, CSV, TXT) | Yes (JSON, EJSON, HTML, MD, CSV, ECSV) | No (-o, text only) | Yes (-o, JSON, CSV, HTML, Raw) | No (-o, text only) | Yes (-o, XML, CSV, TXT) | No (--logfile, text only) | Yes (JSON, CSV, TXT) |
| Multithread | No | Up to 500 | Yes (-t) | Yes (-t) | Yes (-t) | Yes (-t) | Yes (-t) | Yes (-t) | Yes (-t) | Yes (--threads) |
| Delay | Yes (-z) | Yes (rate limit) | Yes (-s) | Yes (-p), accepts range | Yes (--delay) | Yes (-s) | No | No | Yes (--requests-per-second) | Yes (rate limit) |
| Custom timeout | No | Yes | Yes (--timeout) | Yes (-timeout) | Yes (--timeout) | Yes (--req-delay) | No | Yes (-T) | No | Yes (--timeout) |
| Proxy | Yes (-p/-P, socks5) | Yes (not specified, authenticated) | Yes (--proxy, http/socks5) | Yes (-x, http, see issue 50) | Yes (--proxy, http(s)) | Yes (-p, socks4/socks5/http, unauthenticated) | No | No | Yes (SOCKS5/HTTP(s), authenticated) | Yes (--proxy, http(s)/socks5, authenticated) |
| Auth | Basic | Basic / Digest / NTLM | Basic with headers | Basic with headers | Basic (-U/-P) | Basic / Digest / NTLM | Basic with headers | No | Basic (-u) | Basic with headers |
| Default WL | common.txt (4614) | No | dicc.txt (9000) | No | No | No | No | dirb_common.txt (4614) | No | N/A |
| WL provided | Yes (more than 30) | Yes (8) | Yes (5) | No | No | Yes (more than 30) | No | Yes (3) | No | N/A |
| Recursion | By default, switch available | Yes | Yes (-r) | Yes (-recursion) | No | Yes (-R) | No | No | No | N/A |
| Recursion depth | No, but interactive mode available | No | Yes (-R) + interactive | Yes (-recursion-depth) | N/A | Yes (-R) | N/A | N/A | N/A | N/A |
| Multiple URLs | No | No | Yes (-l) / CIDR | Yes (using wordlist of hosts) | No | Yes (using wordlist of hosts) | No | No | No | Yes (-L) |
| Multiple WL | Yes (comma separated) | No | Yes (comma separated) | Yes (repeat -w) | No | Yes (repeat -w) | Yes (for multiple fuzzing points) | No | No | N/A |
| WL manipulation | No | No | Yes (lots of transformations) | No | No | Yes (using encoders and scripts) | No | No | No | N/A |
| Encoders | No | No | No | No | No | Yes | No | No | No | N/A |
| Single extension | Yes (-X/-x) | Yes | Yes (-e) | Yes (-e) | Yes (-x) | Yes | Yes (-e) | Yes (-e) | Yes | N/A |
| Multiple extensions | Yes (-X/-x) | Yes (comma separated) | Yes (-e, comma separated) | Yes (-e, comma separated) | Yes (-x, comma separated) | Yes (with given wordlist) | Yes (-e, comma separated) | Yes (-e, comma separated) | No | N/A |
| Custom User-Agent | Yes (-a/-H) | Yes | Yes (--user-agent) + random | Yes (with header -H) | Yes (-a) + random | Yes (with header -H) | Yes (-a) | No | Yes (with header -H) | Yes (-ua) |
| Custom cookie | Yes (-c/-H) | Yes (through headers) | Yes (--cookie) | Yes (with header -H) | Yes (-c) | Yes (-b) | Yes (with header -H) | No | Yes (with header -H) | Yes (--cookie) |
| Custom header | Yes (-H) | Yes | Yes (-H) + headers file | Yes (-H) | Yes (-H) | Yes (-H) | Yes (-H) | No | Yes (-H) | Yes (--headers) |
| Custom method | No | No | Yes (-m) | Yes (-X) | Yes (-m) | Yes (-X) | Yes (-X) | No | Yes (-X) | No |
| URL fuzzing (at any point) | No | Yes | Not by design, but can be bypassed using --suffixes | Yes | No | Yes | Yes (fuzz mode) | No | Yes (fuzz mode) | N/A |
| POST data fuzzing | No | No | No | Yes (-d) | No | Yes (-d) | Yes (fuzz mode) | No | Yes (fuzz mode) | N/A |
| Header fuzzing | No | No | No | Yes (-H) | No | Yes | Yes (fuzz mode) | No | Yes (fuzz mode) | N/A |
| Method fuzzing | No | No | No | Yes (-X FUZZ) | No | Yes (-X FUZZ) | Yes (fuzz mode) | No | Yes (fuzz mode) | N/A |
| Raw file ingest | No | No | Yes (--raw) | Yes (-request) | No | No | No | No | Yes (--template-file) | No |
| Follow redirect (302) | Yes + switch (-N) | Yes + switch | Yes (-F) | Yes (-r) | Yes (-r) | Yes (-L) | No | No | Yes (--follow-redirect) | No |
| Custom filters | No | No | Yes (--exclude-*, based on text, size, regex) | Yes (-m*, -f*, based on code, size, regex) | Limited (status code, -s/-b) | Yes (based on code, words, regex) | Yes (based on code, string) | No | Yes (size, code, regex) | Yes (code, size or both) |
| Backup files option | No | No | No | No | Yes (-d) | No | No | No | No | Yes |
| Replay proxy | No | No | Yes (--replay-proxy) | Yes (-replay-proxy) | No | No | No | No | No | No |
| Ignore certificate errors | By default? | By default? | By default | By default (switch with -k) | Yes (-k) | By default | Yes (-k) | Yes (-s) | Yes (-k) | By default |
| Specify IP to connect to | No | No | Yes (--ip) | No | No | Yes (--ip) | No | No | No | No |
| VHOST enumeration | No | No | No | Yes | Yes | Yes | Yes | No | Yes | N/A |
| Subdomain enumeration | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes | N/A |
| S3 enumeration | No | No | No | No | Yes | No | No | No | No | N/A |

About the author#

This piece was written by Alex GARRIDO, a.k.a. zeecka. Alex is a pentester at SEC-IT.

Website: zeecka.fr