Simple GET/POST Crawler (Python)

2011-06-30

Some weeks ago I wrote a little tool to support me when analysing webpages. The Python tool recursively crawls all links from a page, collects the GET parameters and extracts the form fields. Simple! The actual crawling and parsing logic lives in LinkCrawler.py, which uses the BeautifulSoup library. You can easily include it in your own scripts:

import LinkCrawler
links = LinkCrawler.LinkCrawler(url, layer=1, quiet=False)
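
LinkCrawler.py itself is not reproduced in this post. To give a rough idea of the extraction step for a single page, here is a minimal sketch that collects links, GET parameter names and form field names. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and it does not recurse. PageScanner and its attributes are hypothetical names for illustration, not part of LinkCrawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, parse_qs


class PageScanner(HTMLParser):
    """Hypothetical helper: collects links and form fields from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()        # absolute URLs found in <a href="...">
        self.get_params = {}      # path -> set of GET parameter names
        self.forms = {}           # form action path -> list of field names
        self._current_form = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            parsed = urlparse(url)
            self.links.add(url)
            # Only the parameter names are kept, not their values.
            names = parse_qs(parsed.query, keep_blank_values=True)
            if names:
                self.get_params.setdefault(parsed.path, set()).update(names)
        elif tag == "form":
            action = urljoin(self.base_url, attrs.get("action") or "")
            self._current_form = urlparse(action).path
            self.forms.setdefault(self._current_form, [])
        elif tag in ("input", "textarea", "select") and self._current_form is not None:
            if attrs.get("name"):
                self.forms[self._current_form].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current_form = None


# Small demo on an inline HTML snippet (assumed sample data).
html = """
<a href="/blog/?page=2&amp;sort=date">next</a>
<form action="/imprint" method="post">
  <input name="csrf"><input name="name"><input type="submit" name="submit">
</form>
"""
scanner = PageScanner("http://example.com/")
scanner.feed(html)
print(scanner.links, scanner.get_params, scanner.forms)
```

A recursive crawler would fetch each URL in scanner.links and repeat this step until the requested number of layers is reached.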

I also added a little script to use it as a command line tool:

python rAnlyzr.py -u <url> [-l layer] [-q]

-u <url>        Specifies the URL. Format: http://example.com/
-l <layer>      Layers to crawl. Default = 1
-q              Quiet. Do not print crawled URLs
-h              Print this help

Example: python rAnlyzr.py -u http://robinverton.de/
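
The option handling of rAnlyzr.py is not shown here, and the original may well do it differently (for example with getopt). As a sketch of how the options listed above could be parsed, assuming argparse and the hypothetical attribute names url, layer and quiet:

```python
import argparse


def parse_args(argv=None):
    """Parse the -u / -l / -q options; argparse adds -h automatically."""
    parser = argparse.ArgumentParser(
        prog="rAnlyzr.py",
        description="Simple GET/POST filter",
    )
    parser.add_argument("-u", dest="url", required=True, metavar="<url>",
                        help="Specifies the URL. Format: http://example.com/")
    parser.add_argument("-l", dest="layer", type=int, default=1, metavar="<layer>",
                        help="Layers to crawl. Default = 1")
    parser.add_argument("-q", dest="quiet", action="store_true",
                        help="Quiet. Do not print crawled URLs")
    return parser.parse_args(argv)


# Passing an explicit list makes the function easy to try without a shell.
args = parse_args(["-u", "http://example.com/", "-l", "2", "-q"])
defaults = parse_args(["-u", "http://example.com/"])
print(args.url, args.layer, args.quiet)
```

The parsed values map directly onto the url, layer and quiet arguments of the LinkCrawler call shown earlier.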

Example Output:

$ python rAnlyzr.py -u robinverton.de -q
#########################################
# rAnzlyr v0.1 - Simple GET/POST Filter #
# Jun. 2011 - Robin Verton              #
# http://robinverton.de                 #
#########################################

- /blog/
- /blog/hello-blog
- /hireme
- /blog/advanced-insert-into-injection-by-taking-advantage-of-the-primary-key
- /blog/recent.atom
- /blog/category/it-security
+ P /imprint
+   csrf
+   name
+   email
+   submit
- /whatido
- /
- /blog/category/other
- /blog/category/penetrationtesting
+ P /blog/addcomment/2
+   csrf
+   author_name
+   author_email
+   author_web
+   submit
+ P /blog/addcomment/3
+   csrf
+   author_name
+   author_email
+   author_web
+   submit
- /blog/mybloggie-2-1-6-sql-injection-persistent-xss
- /blog/category/advisories-publications
+ P /blog/addcomment/1
+   csrf
+   author_name
+   author_email
+   author_web
+   submit
- /imprint
- /blog/category/python-tools

Note: P indicates a form; the indented + lines below it are the form's field names. Lines starting with - are plain links.
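
The report is regular enough to post-process. Here is a small sketch that splits it into plain links and forms. The format rules are inferred from the example output only, not from the tool's source: lines starting with "- " are links, "+ P " opens a form, and the following "+" lines are its fields. parse_report is a hypothetical helper name.

```python
def parse_report(text):
    """Hypothetical helper: split report lines into plain links and forms."""
    links, forms = [], {}
    current = None  # path of the form whose fields are being read
    for line in text.splitlines():
        if line.startswith("- "):
            links.append(line[2:].strip())
            current = None
        elif line.startswith("+ P "):
            current = line[4:].strip()
            forms[current] = []
        elif line.startswith("+ ") and current is not None:
            forms[current].append(line[2:].strip())
        # Anything else (banner lines, blank lines) is ignored.
    return links, forms


# Sample taken from the example output above.
sample = """\
- /blog/
+ P /imprint
+   csrf
+   name
+   email
+   submit
- /whatido
"""
links, forms = parse_report(sample)
print(links)
print(forms)
```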

Download here