Utah Open Source

UTOSC 2009 Printable Schedule

Enjoy the newly minted printable schedule available for UTOSC 2009

Hosted by:

Miller Free Enterprise Center (MFEC)
Sandy, Utah


Web scraping for fun and profit with Ruby
Start: Oct 10, 1:00 p.m.
End: Oct 10, 2:00 p.m.
Location: 123

If you're looking for data on or for your business, you can likely find sources online that will blow your mind. How can you effectively tap into the wellspring of data flowing through the tubes and bring it into your business or home?

In this presentation we'll take a quick look at the structured and semi-structured nature of the web, and then at how to access its information with little effort using the Ruby scripting language and some of its libraries, including Nokogiri/Hpricot, Mechanize, scrubyt, scrapi, webrat-scraper and the Watir family.
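
To give a feel for how little code the extraction step can take, here is a minimal sketch. It is not taken from the presentation: it deliberately uses only REXML (bundled with Ruby) and a regular expression instead of the libraries named above, so that it runs without installing any gems. The XHTML snippet, the method names and the constant PAGE are all made up for illustration, and REXML only copes with well-formed markup, which real pages often are not.

```ruby
require "rexml/document"

# Hypothetical, well-formed XHTML standing in for a page that was already fetched.
PAGE = <<~HTML
  <html>
    <body>
      <ul id="talks">
        <li class="talk"><a href="/talks/1">Web scraping with Ruby</a> <span>1:00 p.m.</span></li>
        <li class="talk"><a href="/talks/2">Intro to Git</a> <span>2:00 p.m.</span></li>
      </ul>
    </body>
  </html>
HTML

# XPath extraction: one hash per link found inside an <li class="talk">.
def extract_talks(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, "//li[@class='talk']/a").map do |a|
    { title: a.text, href: a.attributes["href"] }
  end
end

# Regex extraction: clock times such as "1:00 p.m." found in the raw markup.
def extract_times(html)
  html.scan(/\d{1,2}:\d{2} [ap]\.m\./)
end

extract_talks(PAGE).each { |talk| puts "#{talk[:title]} -> #{talk[:href]}" }
puts extract_times(PAGE).join(", ")
```

With a forgiving HTML parser the same XPath expression would typically carry over, and a CSS selector such as `li.talk a` would be the usual alternative; that part is left out here because it needs a third-party gem.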

We'll discuss best practices from both a business and a technical perspective, including:
* legal and ethical issues
* cost of data acquisition
* asynchronous distributed scraping
* browser impersonation
* getting info from DHTML & Ajax
* IP address obfuscation through onion routing
* throttling
* data extraction with CSS/XPath/regex
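
Two of the topics above, throttling and browser impersonation, can be sketched with Ruby's standard library alone. This is an illustration under stated assumptions, not the presenter's code: the Throttle class, the build_request method, the two-second interval, the User-Agent string and the example.com URL are all hypothetical. The sketch only builds a request and never sends one.

```ruby
require "net/http"
require "uri"

# Throttle: makes sure at least `min_interval` seconds pass between calls to #wait.
# The clock and sleeper are injectable so the behavior can be checked without real delays.
class Throttle
  def initialize(min_interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @last = nil
  end

  def wait
    if @last
      remaining = @min_interval - (@clock.call - @last)
      @sleeper.call(remaining) if remaining > 0
    end
    @last = @clock.call
  end
end

# Hypothetical User-Agent value; any string a site expects from a browser would go here.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

# Builds (but does not send) a GET request that presents a browser-like User-Agent.
def build_request(url, user_agent = BROWSER_UA)
  request = Net::HTTP::Get.new(URI(url))
  request["User-Agent"] = user_agent
  request
end

throttle = Throttle.new(2.0)
throttle.wait # first call returns immediately
request = build_request("http://example.com/listing?page=1")
puts request["User-Agent"]
```

In a real scraper, `throttle.wait` would be called before every request to the same host, which keeps the load on the target site predictable and ties directly into the legal and ethical questions listed above.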

We'll have time for Q&A at the end.

About the presenter

JT Zemp (Thrive Information Solutions)