Sazz approached me this evening with an interesting little challenge. She volunteers with a group called WIRE and has been charged with gathering a list of all GPs in rural Victoria in order to distribute information. Fortunately this information is available from the government's Human Services Directory search page, but the prospect of manually entering each suburb/town in Victoria and copying across the results was, clearly, less than thrilling.
To the rescue comes a really neat piece of Python technology named Mechanize.
The result was a 96 line script (including docstrings and proper comments!) that was able to query the website, parse the output, and spit out a list of 1290 GP clinics in the state, along with addresses and contact details. I've uploaded it if anyone is interested in seeing the Mechanize API in action: gp_lister.py.
Kudos to the Mechanize developers (developer?) for providing exellent docstrings on all the Python modules, making working with the API pretty much painless. My only complaint would be that the API docs should also be online to save having to read them from within the Python interpreter.
I understand that this library was inspired by a Perl library of similar name, so I'm not going to proclaim Python the ultimate platform for web-scraping technology. As much as I'd like to. But if you dont feel like wrapping your head around the $< /!@/< of Perl to get something like this done, I can heartily recommend Python and Mechanize.
Update [01/05/05, 0200]: Fixed some very obviously broken doubling-up of content. Remember kids, dont blog in your sleep.