The Burp Suite Spider crawls every page of a website that falls within the configured target scope, mapping out the entire structure of the site as it goes.
- Go to Spider -> Control and make sure "Spider Scope" is set to use the suite scope.
- The "Options" tab lists the Spider's configuration settings; let us take a look at each of them.
Crawler Settings:
- Check robots.txt – the robots.txt file tells crawlers which parts of a site they may and may not visit. By default, the Burp Suite crawler ignores these restrictions.
The picture above shows what a typical robots.txt file looks like. Anything listed under "Disallow" will not be crawled by well-behaved search-engine bots, but it will still be crawled by Burp Suite.
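To see how an obedient crawler interprets those "Disallow" rules, here is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and URLs below are made-up examples. Burp Suite's spider simply skips this check.

```python
# Sketch: how a polite crawler evaluates robots.txt rules.
# Burp Suite does not perform this check; the file below is illustrative.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved search-engine bot would honour these answers:
print(parser.can_fetch("*", "https://example.com/index.html"))  # True: allowed
print(parser.can_fetch("*", "https://example.com/admin/panel")) # False: disallowed
```

A disallowed path simply means the rule asks bots not to visit it; nothing stops a tool like Burp Suite from requesting it anyway.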
- Detect custom “not found” responses – many sites return their own branded error page instead of a standard 404. With this option enabled, Burp Suite learns what the site's custom "not found" page looks like, so it can tell valid pages from invalid ones and crawl only the real content.
- Ignore links to non-text content – links that point to non-text resources (for example, an image referenced via `src`) are ignored rather than requested.
- Request the root of all directories – if a discovered item sits in a subdirectory, Burp Suite will also request its parent directories. For example, when crawling “img” under “root/img”, Burp Suite will also crawl “root” if access is granted.
- Maximum link depth – specifies how many links the spider can follow away from the starting page. With the length set to 5, it can traverse up to five links deep.
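The maximum-link-depth setting can be sketched as a breadth-first crawl that stops following links past a depth cut-off. The in-memory site map below is hypothetical; a real spider fetches pages over HTTP and parses links out of the HTML.

```python
# Sketch of a depth-limited crawl over a made-up in-memory site map.
from collections import deque

# Hypothetical site: page -> links found on that page.
SITE = {
    "/":         ["/about", "/products"],
    "/about":    ["/team"],
    "/products": ["/products/item1"],
    "/team":     ["/deep"],
    "/deep":     ["/deeper"],
}

def crawl(start, max_depth):
    """Visit pages breadth-first, never following more than
    max_depth links away from the starting page."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # depth limit reached: do not follow links any deeper
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

print(crawl("/", 2))  # reaches /team but never /deep or /deeper
```

Raising `max_depth` lets the spider reach pages buried further from the start page, at the cost of many more requests.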
Form Submission settings:
If a web page contains forms, Burp Suite needs to know what to do when it reaches one. The choices are:
- Don’t submit – forms are left alone; nothing is submitted.
- Prompt for guidance – a prompt box opens in the browser asking for guidance on how to fill out the form.
- Auto – the form is filled in automatically. The default field values can be added to, edited, or removed as needed.
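The "Auto" mode can be pictured as a lookup table of field-name patterns mapped to default values. The patterns and values below are illustrative, not Burp Suite's actual built-in table.

```python
# Sketch of "auto" form filling: pick a guessed default value for each
# field based on its name. The table below is hypothetical, not Burp's.
DEFAULTS = {
    "email":    "test@example.com",
    "username": "burpuser",
    "password": "Passw0rd!",
    "phone":    "5551234567",
}

def auto_fill(field_names, fallback="555"):
    """Return a name -> value mapping for a form's text fields."""
    filled = {}
    for name in field_names:
        # Use the first default whose key appears in the field name,
        # otherwise fall back to a generic value.
        value = next((v for k, v in DEFAULTS.items() if k in name.lower()),
                     fallback)
        filled[name] = value
    return filled

print(auto_fill(["user_email", "comment"]))
# {'user_email': 'test@example.com', 'comment': '555'}
```

Editing the defaults in Burp Suite corresponds to changing entries in a table like `DEFAULTS` above.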
If the website has a login page, or we must be authenticated before we can crawl it, we can use the “Application Login” section.
The Application Login section works the same way as the Form Submission section.
The Spider Engine settings control the spider's speed. “Pause before retry” sets the interval between requests, so the server does not get overwhelmed.
Throttle adds a random interval between requests so the traffic looks like it was generated by a human rather than a bot.
The Request Headers section sets the headers sent with each spider request.
Go to the Control tab and click “Spider is paused” to start crawling.
The spider will then start to crawl the target.