ScraperScope is an experimental web scraper intended for use in systems that automatically analyze or process web pages. It seeks to locate and present in plain text the heart of the web page without any of the surrounding material and boilerplate. The module contains no knowledge of any particular web site but attempts to figure out what the page itself is talking about.
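To illustrate what site-independent extraction can look like, here is a minimal Python sketch. It is not ScraperScope's actual logic, which is not described here. It assumes a simple heuristic: split the page into text blocks, skip obvious boilerplate containers, and keep only blocks that have enough words and are not mostly link text. The names BlockCollector and extract_main_text, the tag lists, and the thresholds are all hypothetical choices made for this example.

```python
from html.parser import HTMLParser

# Assumed tag lists for this sketch; a real extractor would tune these.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section",
              "h1", "h2", "h3", "blockquote"}


class BlockCollector(HTMLParser):
    """Collects (text, link_chars) pairs, one per block-level element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._text = []
        self._link_chars = 0
        self._skip_depth = 0
        self._in_link = 0

    def flush(self):
        # Close the current block and normalize its whitespace.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside a boilerplate container
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def extract_main_text(html, min_words=8, max_link_density=0.3):
    """Return the blocks that look like body text, separated by blank lines."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    kept = []
    for text, link_chars in parser.blocks:
        if len(text.split()) < min_words:
            continue  # too short to be a body paragraph
        if link_chars / len(text) > max_link_density:
            continue  # mostly link text, likely navigation
        kept.append(text)
    return "\n\n".join(kept)
```

Nothing in this sketch refers to a particular site. It relies only on the page's own structure, which is the general idea behind the module, though the real heuristics may differ considerably.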
The alpha version is meant to test the basic logic and to gather feedback from those of you who try out the system. A feedback form appears at the end of each page of results.
No effort has been made in this version to optimize the module for speed. If there is interest in faster performance, we would be happy to discuss it.
The next version will add the ability to fetch follow-up pages. It will also improve paragraph detection on certain web sites where the HTML markup takes shortcuts.
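As an illustration of the paragraphing problem, the Python sketch below assumes one common markup shortcut: pages that separate paragraphs with two or more consecutive <br> tags instead of wrapping each paragraph in <p>. Whether this is the shortcut the next version targets is an assumption, and split_paragraphs is a hypothetical helper, not part of the module.

```python
import re


def split_paragraphs(fragment):
    """Split an HTML fragment into paragraphs, treating runs of <br> as breaks."""
    # Collapse source whitespace so only <br> runs can create paragraph breaks.
    text = " ".join(fragment.split())
    # Two or more consecutive <br> tags mark a paragraph boundary.
    text = re.sub(r"(?:<br\s*/?>\s*){2,}", "\n\n", text, flags=re.IGNORECASE)
    # A single remaining <br> is a line break inside a paragraph.
    text = re.sub(r"<br\s*/?>", " ", text, flags=re.IGNORECASE)
    # Drop any other inline tags (a crude regex, acceptable for a sketch).
    text = re.sub(r"<[^>]+>", "", text)
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]
```

For example, split_paragraphs("One.<br><br>Two.") yields two paragraphs, while a single <br> keeps the text in one paragraph.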
Try the scraper out.
We look forward to your feedback.