Some weeks ago I wrote a little tool to support me when analysing webpages. The Python tool recursively crawls all links from a page, collects the GET parameters and extracts the FORM data. Simple! The actual crawling and parsing logic lives in LinkCrawler.py, which makes use of the BeautifulSoup library. You can easily include it in your own scripts:
links = LinkCrawler.LinkCrawler(url, layer=1, quiet=False)
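To give an idea of what happens behind that call, here is a rough sketch of the underlying approach (fetch a page, follow its links up to a given depth, collect the GET parameters of every URL). It is not the code from LinkCrawler.py; the function name, return format and the Python 3 / bs4 imports are just picked for the example:

import urllib.request
from urllib.parse import urljoin, urlparse, parse_qs
from bs4 import BeautifulSoup

def crawl(url, layer=1, seen=None):
    # Collect the GET parameters of every URL reachable within `layer` hops.
    seen = seen if seen is not None else set()
    if layer < 0 or url in seen:
        return {}
    seen.add(url)

    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')

    # Parameters of the current page, taken straight from its query string
    params = {url: parse_qs(urlparse(url).query)}

    # Follow every link on the page, but stay on the same host
    for a in soup.find_all('a', href=True):
        link = urljoin(url, a['href'])
        if urlparse(link).netloc == urlparse(url).netloc:
            params.update(crawl(link, layer - 1, seen))
    return params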
I also added a little script to use it as a command line tool:
python rAnlyzr.py -u <url> [-l layer] [-q]
-u <url> Specifies the URL. Format: http://example.com/
-l <layer> Layers to crawl. Default = 1
-q Quiet. Do not print crawled URLs
-h Print this help
Example: python rAnlyzr.py -u http://robinverton.de/
Example Output:
$ python rAnlyzr.py -u robinverton.de -q
#########################################
# rAnzlyr v0.1 - Simple GET/POST Filter #
# Jun. 2011 - Robin Verton #
# http://robinverton.de #
#########################################
- /blog/
- /blog/hello-blog
- /hireme
- /blog/advanced-insert-into-injection-by-taking-advantage-of-the-primary-key
- /blog/recent.atom
- /blog/category/it-security
+ P /imprint
+ csrf
+ name
+ email
+ submit
- /whatido
- /
- /blog/category/other
- /blog/category/penetrationtesting
+ P /blog/addcomment/2
+ csrf
+ author_name
+ author_email
+ author_web
+ submit
+ P /blog/addcomment/3
+ csrf
+ author_name
+ author_email
+ author_web
+ submit
- /blog/mybloggie-2-1-6-sql-injection-persistent-xss
- /blog/category/advisories-publications
+ P /blog/addcomment/1
+ csrf
+ author_name
+ author_email
+ author_web
+ submit
- /imprint
- /blog/category/python-tools
Note: a line marked with P indicates a FORM; the indented entries below it are the form's input fields.
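Just to illustrate how such FORM entries can be pulled out of a page with BeautifulSoup, a short sketch (again, not the actual code from LinkCrawler.py):

from bs4 import BeautifulSoup

def extract_forms(html, base_url=''):
    # Return (action, [field names]) pairs for every form on the page.
    soup = BeautifulSoup(html, 'html.parser')
    forms = []
    for form in soup.find_all('form'):
        action = form.get('action') or base_url
        fields = [field.get('name')
                  for field in form.find_all(['input', 'textarea', 'select'])
                  if field.get('name')]
        forms.append((action, fields))
    return forms

Run against the /imprint page above, this would return something like ('/imprint', ['csrf', 'name', 'email', 'submit']).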
Download here