Wednesday, 21 December 2011

Web scraping

Web scraping is a computer software technique for extracting information from websites.
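
As a rough illustration of the idea, the sketch below pulls link targets out of an HTML string using only the Java standard library. The class name LinkScraper, the method extractLinks, and the example.com URLs are made up for this note. The regular expression is a deliberate simplification that only handles double-quoted href attributes; real pages are usually better handled by a proper HTML parser or a tool such as HtmlUnit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any library.
final class LinkScraper {

    // Matches href="..." inside an <a ...> tag.
    // Simplification: double-quoted attribute values only, no real HTML parsing.
    private static final Pattern HREF =
        Pattern.compile("<a\\s[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every link target found in the given HTML, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        // In practice the HTML would be downloaded from a website first.
        String html = "<html><body>"
            + "<a href=\"http://example.com/a\">A</a>"
            + "<a class=\"x\" href=\"http://example.com/b\">B</a>"
            + "</body></html>";
        System.out.println(extractLinks(html));
        // prints [http://example.com/a, http://example.com/b]
    }
}
```

Fetching the page is left out on purpose so the example stays self-contained; the extraction step is the part that makes it scraping.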

HtmlUnit is a headless web browser written in Java.
Selenium is a portable software testing framework for web applications.
JWebUnit is a Java-based testing framework for web applications.

JWebUnit and HtmlUnit both simulate a browser in Java without launching one, whereas Selenium runs inside a real browser.

CAPTCHA

CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart".
In other words, a computer administers a test to determine whether the subject is human.