Enjoy the newly minted printable schedule available for UTOSC 2009

Web scraping for fun and profit with Ruby
Start: Oct 10, 1:00 p.m.
End: Oct 10, 2:00 p.m.
Location: 123
If you're looking for data on or for your business, you can likely find sources online that will blow your mind. How can you effectively tap into the wellspring of data flowing through the tubes into your business or home?
In this presentation we'll take a quick look at the structured and semi-structured nature of the web, and then at how to access its information with little effort using the Ruby scripting language and some of its libraries, including Nokogiri/Hpricot, Mechanize, scrubyt, scrapi, webrat-scraper and the Watir family.
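To give a flavor of what "not much effort" looks like, here is a minimal extraction sketch. It is an illustration, not material from the talk. It assumes a small, well-formed XHTML snippet (hypothetical data) and uses REXML from Ruby's standard distribution so it runs without extra gems. XPath locates the table cells and a regex pulls the numeric amount out of each price. REXML only accepts well-formed markup and has no CSS selector support, so for messy real-world HTML you would reach for a forgiving parser such as Nokogiri, whose documents offer both `css` and `xpath` methods.

```ruby
require "rexml/document"

# Hypothetical, well-formed page standing in for fetched HTML.
PAGE = <<~HTML
  <html>
    <body>
      <table id="prices">
        <tr><td class="item">Widget</td><td class="price">$4.99</td></tr>
        <tr><td class="item">Gadget</td><td class="price">$12.50</td></tr>
      </table>
    </body>
  </html>
HTML

# Returns [item, amount] pairs: XPath selects the rows and cells,
# a regex extracts the number from the price text (e.g. "$4.99" -> 4.99).
def extract_prices(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//table[@id='prices']/tr").map do |row|
    item  = REXML::XPath.first(row, "td[@class='item']").text
    price = REXML::XPath.first(row, "td[@class='price']").text
    [item, price[/\d+(?:\.\d+)?/].to_f]
  end
end

extract_prices(PAGE).each { |item, amount| puts "#{item}: #{amount}" }
```

The same split (a structural selector to find the element, a regex to clean up its text) carries over directly when swapping REXML for Nokogiri or Hpricot.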
We'll discuss best practices from both a business and a technical perspective, including:
* legal and ethical issues
* cost of data acquisition
* asynchronous distributed scraping
* browser impersonation
* getting info from DHTML & Ajax
* IP address obfuscation through onion routing
* throttling
* data extraction with CSS/XPath/regex
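Of the topics above, throttling is the easiest to show in isolation. The sketch below assumes the policy you want is a fixed minimum delay between requests. The `Throttle` class and the 0.2-second interval are hypothetical and illustrative only; they do not come from the talk or any library it names. The block passed to `call` is where a real scraper would do its fetch, for example with `Net::HTTP.get(URI(url))`. Here it just records timestamps so the spacing can be checked.

```ruby
# Hypothetical helper: guarantees at least `interval` seconds between
# the starts of successive blocks passed to #call.
class Throttle
  def initialize(interval)
    @interval = interval
    @last = nil
  end

  # Sleeps only as long as needed to honor the interval, then yields.
  def call
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    if @last
      wait = @interval - (now - @last)
      sleep(wait) if wait > 0
    end
    @last = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  end
end

# Stand-in for three page fetches: each block just returns the time it ran.
throttle = Throttle.new(0.2)
stamps = Array.new(3) do
  throttle.call { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
end
gaps = stamps.each_cons(2).map { |a, b| b - a }
puts gaps.map { |g| g.round(2) }.inspect
```

A monotonic clock is used rather than `Time.now` so that system clock adjustments cannot shorten or lengthen the enforced gap. Randomized or per-host delays are common refinements of the same idea.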
We'll have time for Q&A at the end.