I don’t know when this happened, but it seems you can’t scrape (parse) google.com as easily as before. At least with Python, a simple
    import urllib

    s = urllib.urlopen('http://www.google.com/search?q=define%3Aesoteric')
    r = s.read()
    print r
will give you this notice instead:
/snip/ Your client does not have permission to get URL /search?q=define%3Aesoteric from this server /snip/
And here’s me not understanding why my regexes are not matching. :) Apparently, you should at least send some kind of User-Agent header. In Python you can do this easily by subclassing URLopener and setting its version attribute to something, like:
    class MyOpener(urllib.URLopener):
        version = "InternetExploiter/666"

    urllib._urlopener = MyOpener()
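For anyone on Python 3, where the old urllib module was split up, roughly the same trick can be sketched with urllib.request by passing the header on a Request object. This is only a sketch, not the one true way: the helper names build_request and fetch are made up for illustration, and the User-Agent string is just the joke value from above.

```python
import urllib.request

# Illustrative values only: any descriptive string can serve as a User-Agent.
USER_AGENT = "InternetExploiter/666"
URL = "http://www.google.com/search?q=define%3Aesoteric"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT):
    """Fetch url with the custom User-Agent and return the body as bytes."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read()
```

Calling fetch(URL) should then hand back the raw page as bytes. Whether the server is happy with that particular User-Agent is, of course, up to the server.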
And I’m really starting to love TextMate. Just type your little Python script and hit Cmd-R. Brilliant.