PaGoDo: a powerful passive Google Dork


Tool introduction
The main purpose of this project is to provide a passive Google dork script that collects web pages and applications on the Internet with potential security vulnerabilities. The project consists of two parts. The first is ghdb_scraper.py, which retrieves Google dorks. The second is pagodo.py, which runs searches using the dorks collected by ghdb_scraper.py.

What is a Google dork?
The Google Hacking Database (GHDB) is currently maintained by Offensive Security. It is a collection of Google search queries, called dorks, that researchers can use to find applications with security issues. These queries surface information that Google's search bot has already indexed.

Tool installation
All of the tool's scripts are written for Python 3.6+. Use the following commands to clone the project source code locally and install the dependencies:

git clone https://github.com/opsdisk/pagodo.git

cd pagodo

virtualenv -p python3 .venv  # If using a virtual environment.

source .venv/bin/activate  # If using a virtual environment.

pip install -r requirements.txt

What if it is blocked by Google?
If you receive an HTTP 503 error while using the tool, Google has detected you as a bot and will block your IP address for a period of time. One workaround is to route the requests through proxychains. proxychains4 can be installed as follows:

apt install proxychains4 -y

By editing the /etc/proxychains4.conf configuration file, you can configure several proxy servers and round-robin the lookups among them. The following example uses two dynamic SOCKS proxies listening on local ports 9050 and 9051.

vim /etc/proxychains4.conf

round_robin

chain_len = 1

proxy_dns

remote_dns_subnet 224

tcp_read_time_out 15000

tcp_connect_time_out 8000

[ProxyList]

socks4 127.0.0.1 9050

socks4 127.0.0.1 9051

When the Python script is run through proxychains4, its queries go out through different IP addresses. You can also use the -e parameter to set the minimum delay between queries:

proxychains4 python3 pagodo.py -g ALL_dorks.txt -s -e 17.0 -l 700 -j 1.1
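With round_robin and chain_len = 1, each new connection is expected to leave through the next entry in [ProxyList], wrapping around at the end of the list. The following is only an illustration of that rotation, not proxychains code; `pick_proxies` is a hypothetical helper introduced here.

```python
from itertools import cycle, islice

# Proxies as listed under [ProxyList] in the configuration above.
PROXIES = [("socks4", "127.0.0.1", 9050), ("socks4", "127.0.0.1", 9051)]


def pick_proxies(proxies, connections):
    """Return the proxy used for each of `connections` consecutive connections.

    Hypothetical helper: it rotates through the list in order, one proxy per
    connection, which mirrors a chain length of 1.
    """
    return list(islice(cycle(proxies), connections))


if __name__ == "__main__":
    for number, (kind, host, port) in enumerate(pick_proxies(PROXIES, 4), start=1):
        print(f"connection {number}: {kind} {host}:{port}")
```

With two proxies, consecutive searches alternate between ports 9050 and 9051, so each exit address sees only half of the queries.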

ghdb_scraper.py
First of all, pagodo.py needs a list of all the current Google dorks, which ghdb_scraper.py provides. Timestamped files containing all Google dorks, as well as the dorks for each category, are already in the code base. Fortunately, the entire database can also be pulled locally with a single GET request using ghdb_scraper.py, and all dorks can be exported to a file.

Get all dorks:

python3 ghdb_scraper.py -j -s

Get all dorks and write them to separate files by category:

python3 ghdb_scraper.py -i

The dork categories are as follows:

categories = {

    1: "Footholds",

    2: "File Containing Usernames",

    3: "Sensitives Directories",

    4: "Web Server Detection",

    5: "Vulnerable Files",

    6: "Vulnerable Servers",

    7: "Error Messages",

    8: "File Containing Juicy Info",

    9: "File Containing Passwords",

    10: "Sensitive Online Shopping Info",

    11: "Network or Vulnerability Data",

    12: "Pages Containing Login Portals",

    13: "Various Online devices",

    14: "Advisories and Vulnerabilities",

}
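To make the idea of splitting dorks by category concrete, here is a minimal sketch. It is not the ghdb_scraper.py implementation: the `(category_id, dork)` record shape, the `group_dorks_by_category` helper, the `.dorks` file names and the placeholder dork strings are all assumptions made for illustration.

```python
from collections import defaultdict

# Two entries copied from the category table above; the rest follow the same shape.
categories = {1: "Footholds", 9: "File Containing Passwords"}


def group_dorks_by_category(records, categories):
    """Group dork strings under a file name derived from the category name.

    `records` is a hypothetical list of (category_id, dork) pairs. Category ids
    missing from `categories` are collected under "unknown".
    """
    grouped = defaultdict(list)
    for category_id, dork in records:
        name = categories.get(category_id, "unknown")
        filename = name.lower().replace(" ", "_") + ".dorks"
        grouped[filename].append(dork)
    return dict(grouped)


if __name__ == "__main__":
    # Placeholder strings stand in for real dorks.
    sample = [(1, "dork-a"), (9, "dork-b"), (1, "dork-c")]
    for filename, dorks in group_dorks_by_category(sample, categories).items():
        print(filename, dorks)
```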

pagodo.py
If a file of Google dorks already exists, pass it to pagodo.py with the -g parameter to start searching for potentially vulnerable applications. The pagodo.py script uses the google Python library to search Google for sites matching each dork, for example:

intitle:"ListMail Login" admin -demo

The -d parameter restricts the search to a target domain name by adding a site: operator:

site:example.com
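One plausible way to combine the two pieces is to prefix each dork with the site: restriction. The sketch below assumes that ordering; `build_query` is a hypothetical helper, not a function taken from pagodo.py.

```python
def build_query(dork, domain=None):
    """Return the search string for one dork.

    Hypothetical helper: when a domain is given, a site: operator is
    placed in front of the dork; otherwise the dork is used unchanged.
    """
    dork = dork.strip()
    if domain:
        return f"site:{domain} {dork}"
    return dork


if __name__ == "__main__":
    print(build_query('intitle:"ListMail Login" admin -demo', "example.com"))
```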

If too many requests are sent to Google in a short period of time, Google will treat us as a bot and block our IP address for a period of time. To make the search queries look more like human traffic, the tool supports user agent randomization. This feature is available as of v1.9.3 and selects a different user agent at random for each search, simulating the variety of browsers found in a large enterprise environment.
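User agent randomization can be sketched in a few lines. The user agent strings and the `random_user_agent` helper below are illustrative assumptions, not the list or code shipped with the tool.

```python
import random

# Illustrative user agent strings; a real list would be much longer.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
]


def random_user_agent(agents=USER_AGENTS, rng=random):
    """Pick one user agent at random; called once per search query."""
    return rng.choice(agents)


if __name__ == "__main__":
    for _ in range(3):
        print(random_user_agent())
```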

The second improvement randomizes the time between search queries. The -e option sets the minimum delay, and a jitter factor is used to add a random amount of time on top of it:

# Create an array of jitter values to add to delay, favoring longer search times.

self.jitter = numpy.random.uniform(low=self.delay, high=jitter * self.delay, size=(50,))

Later in the script, a random value is selected from the jitter array and added to the delay:

pause_time = self.delay + random.choice(self.jitter)

This makes the request timing less regular and reduces the chance that Google blocks our IP address.
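To see what range of pauses the two lines above produce, the sketch below reproduces them with the standard library only. `random.uniform` stands in for `numpy.random.uniform`, the helper names are hypothetical, and it is assumed that -e supplies the delay and -j the jitter factor.

```python
import random


def build_jitter(delay, jitter, size=50, rng=random):
    """Stdlib stand-in for numpy.random.uniform(low=delay, high=jitter * delay, size=(50,))."""
    return [rng.uniform(delay, jitter * delay) for _ in range(size)]


def pause_time(delay, jitter_values, rng=random):
    """Same arithmetic as the snippet above: delay plus one randomly chosen jitter value."""
    return delay + rng.choice(jitter_values)


if __name__ == "__main__":
    delay, jitter = 35.0, 1.1  # assumed to correspond to -e 35.0 -j 1.1
    values = build_jitter(delay, jitter)
    for _ in range(5):
        print(f"sleep for {pause_time(delay, values):.2f} seconds")
```

Because the jitter values are themselves drawn from between delay and jitter * delay, the resulting pause is at least twice the delay as the code is written: with a delay of 35.0 and a jitter factor of 1.1, each pause falls between roughly 70 and 73.5 seconds.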

Example

python3 pagodo.py -d example.com -g dorks.txt -l 50 -s -e 35.0 -j 1.1
