SafariSucker

Overview

SafariSucker automates the download of books from O'Reilly's Safari service. SafariSucker is written in Java and runs on any platform where a JVM is available.

O'Reilly's Safari service is very useful, but there needs to be an easy way to download books for offline reading and printing. O'Reilly's terms of service allow for download and printing, so we believe it is the responsibility of our users to use SafariSucker responsibly and not infringe on the copyrights of the authors and publishers who have content on Safari. We plan to display a prominent startup message expressing this view every time SafariSucker starts.

SafariSucker Features

Simple interface:
- Give SafariSucker a URL to the first page of a book and SafariSucker will do the rest
Respectful of O'Reilly:
- SafariSucker should have a throttle that keeps it from overloading O'Reilly's servers. It may be a good idea to set the throttle to a gentle setting by default and warn users if they try to override it to go faster.
Puts respect of copyright in the user's hands:
- SafariSucker can easily be used to facilitate copyright infringement, but so can an iPod, a CD burner, or your eyes and hands. It is up to the users to respect O'Reilly and its partners.
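
The throttle described above could look roughly like the following. This is only a minimal sketch: the class name RequestThrottle, its method names, and the 5-second default are all assumptions made for illustration, not part of any existing SafariSucker code.

```java
/** Hypothetical helper (not from the SafariSucker code base): spaces out requests. */
final class RequestThrottle {
    /** Assumed "gentle" default delay; the real value would need tuning. */
    static final long DEFAULT_DELAY_MILLIS = 5_000;

    private final long minDelayMillis;
    private boolean hasRequested = false;
    private long lastRequestMillis;

    RequestThrottle(long minDelayMillis) {
        if (minDelayMillis < 0) {
            throw new IllegalArgumentException("delay must be >= 0");
        }
        this.minDelayMillis = minDelayMillis;
    }

    /** True when the user chose a faster setting than the default; the caller should warn. */
    boolean isFasterThanDefault() {
        return minDelayMillis < DEFAULT_DELAY_MILLIS;
    }

    /** How long a request made at nowMillis still has to wait. */
    synchronized long millisToWait(long nowMillis) {
        if (!hasRequested) {
            return 0; // the first request never waits
        }
        long elapsed = nowMillis - lastRequestMillis;
        return Math.max(0, minDelayMillis - elapsed);
    }

    /** Records that a request was issued at nowMillis. */
    synchronized void recordRequest(long nowMillis) {
        hasRequested = true;
        lastRequestMillis = nowMillis;
    }

    /** Blocks until the minimum delay since the previous request has passed. */
    synchronized void awaitTurn() throws InterruptedException {
        long wait = millisToWait(now());
        if (wait > 0) {
            Thread.sleep(wait);
        }
        recordRequest(now());
    }

    /** Monotonic clock in milliseconds, so wall-clock changes do not affect the delay. */
    private static long now() {
        return System.nanoTime() / 1_000_000;
    }
}
```

A caller would invoke awaitTurn() before each page fetch and print a warning at startup whenever isFasterThanDefault() returns true.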

Implementation Notes

SafariSucker will be written in Java and will use at least the Apache Xalan and Jakarta HttpClient libraries. We plan to write a native GUI using the Cocoa frameworks on Mac OS X, and hopefully someone will contribute a Swing GUI (or maybe SWT) that can be used on other platforms. If no GUI is available for a particular platform, there will be a command-line mode.

The implementation of SafariSucker should be fairly straightforward. We already have some code that can extract the link to the printable version of a page in Safari and find the link to the next section. What needs to be added is code to crawl Safari and maintain the user's session. Code also needs to be written to pull down diagrams and change image URLs to work offline. At that point the command-line version of SafariSucker will probably be pretty much done. Beyond that, we plan to write a GUI for Mac OS X. A Swing GUI would be nice too, but hopefully someone will contribute it so we don't have to deal with Swing ;-)
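
As an illustration of the "change image URLs to work offline" step, here is a rough sketch that uses only java.util.regex. The class name ImageLinkRewriter, the images/ directory, and the naming scheme are assumptions for this example; a real implementation might well use Xalan or an HTML parser instead of a regular expression.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: points img tags at local copies and lists what to download. */
final class ImageLinkRewriter {
    // Captures: (1) everything up to "src=", (2) the quote character, (3) the URL.
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\bsrc\\s*=\\s*)([\"'])(.*?)\\2",
            Pattern.CASE_INSENSITIVE);

    /** Assumed naming scheme: keep the file name, drop any query string. */
    static String localNameFor(String url) {
        String path = url;
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query);
        }
        int slash = path.lastIndexOf('/');
        String name = slash >= 0 ? path.substring(slash + 1) : path;
        return "images/" + name;
    }

    /**
     * Returns the HTML with every img src replaced by a local path.
     * Each remote URL and its local path is recorded in toDownload.
     */
    static String rewrite(String html, Map<String, String> toDownload) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String remote = m.group(3);
            String local = localNameFor(remote);
            toDownload.put(remote, local);
            String replacement = m.group(1) + m.group(2) + local + m.group(2);
            // quoteReplacement keeps '$' and '\' in the text from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The caller would then fetch each key in toDownload and save it under the mapped path. Note that this naming scheme lets two different URLs with the same file name collide, so a real version would need to make local names unique.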

Protocols used

HTTP
HTML
Probably going to be some XML involved :)
May output PDF if we get really ambitious.

Current Status