Simple GET/POST Crawler (Python)
Jun 30, 2011
2 minutes read

Some weeks ago I wrote a little tool to support me when analysing webpages. The Python tool recursively crawls all links from a page, collects the GET parameters and filters out the FORM data. Simple! The actual crawling and parsing method is located in LinkCrawler.py, which makes use of the BeautifulSoup library. You can easily include it in your own scripts:
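The idea behind LinkCrawler.py can be sketched in a few lines. This is not the tool's actual code (which uses BeautifulSoup); it is a minimal stand-in built only on the standard library's html.parser and urllib.parse, showing how links can be extracted from a page and their GET parameters pulled out:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, parse_qsl

class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def get_params(url):
    """Return the GET parameters of a URL as (name, value) pairs."""
    return parse_qsl(urlsplit(url).query)

# Example: parse a small HTML snippet instead of fetching a live page
parser = LinkExtractor("http://example.com/")
parser.feed('<a href="/blog/?page=2&tag=python">next</a>')
# parser.links      -> ['http://example.com/blog/?page=2&tag=python']
# get_params(...)   -> [('page', '2'), ('tag', 'python')]
```

Recursion then simply means feeding each collected link back into the extractor until the configured layer depth is reached.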

links = LinkCrawler.LinkCrawler(url, layer=1, quiet=False)

I also added a little script to use it as a command line tool:

python rAnlyzr.py -u <url> [-l layer] [-q]

-u <url>		Specifies the URL. Format: http://example.com/
-l <layer>		Layers to crawl. Default = 1
-q				Quiet. Do not print crawled URLs
-h				Print this help

Example: python rAnlyzr.py -u http://robinverton.de/

Example Output:

$ python rAnlyzr.py -u robinverton.de -q
#########################################
# rAnlyzr v0.1 - Simple GET/POST Filter #
# Jun. 2011 - Robin Verton              #
# http://robinverton.de                 #
#########################################

- /blog/
- /blog/hello-blog
- /hireme
- /blog/advanced-insert-into-injection-by-taking-advantage-of-the-primary-key
- /blog/recent.atom
- /blog/category/it-security
+ P /imprint
+	csrf
+	name
+	email
+	submit
- /whatido
- /
- /blog/category/other
- /blog/category/penetrationtesting
+ P /blog/addcomment/2
+	csrf
+	author_name
+	author_email
+	author_web
+	submit
+ P /blog/addcomment/3
+	csrf
+	author_name
+	author_email
+	author_web
+	submit
- /blog/mybloggie-2-1-6-sql-injection-persistent-xss
- /blog/category/advisories-publications
+ P /blog/addcomment/1
+	csrf
+	author_name
+	author_email
+	author_web
+	submit
- /imprint
- /blog/category/python-tools

Note: P indicates a FORM
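The FORM entries in the output above (the field names indented under each P line) come from walking the form elements of a page. Again, this is only an illustrative sketch using the standard library's html.parser rather than the BeautifulSoup-based code the tool actually uses:

```python
from html.parser import HTMLParser

class FormExtractor(HTMLParser):
    """Collect the input-field names of each <form> on a page."""
    def __init__(self):
        super().__init__()
        self.forms = []        # one list of field names per form
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append([])
        elif self._in_form and tag in ("input", "textarea", "select"):
            name = attrs.get("name")
            if name:
                self.forms[-1].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False

# Example: a form like the /imprint one from the output above
html = ('<form action="/imprint" method="post">'
        '<input type="hidden" name="csrf">'
        '<input name="name"><input name="email">'
        '<input type="submit" name="submit"></form>')
parser = FormExtractor()
parser.feed(html)
# parser.forms -> [['csrf', 'name', 'email', 'submit']]
```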

Download here
