Automated web screen shots with Perl
Posted in imagemagick, ie, perl Fri, 21 Jul 2006 15:02:00 GMT
I've been looking for a program that will take full screen shots of web pages even when the web page is larger than the window size on my physical screen, requiring scrolling. This morning I found such a program in Petr Šmejkal's Win32::CaptureIE when it was mentioned by Displeaser on DevShed Forums in the "Screenshot of webpage" thread. It uses ImageMagick for image manipulation.
From reading the Win32::CaptureIE POD, the CapturePage function does exactly what I want:
CapturePage ( ) Captures whole page currently loaded in the Internet Explorer window. Only the page content will be captured - no window, no scrollbars. If the page is smaller than the window only the occupied part of the window will be captured. If the page is longer (scrollbars are active) the function will capture the whole page step by step by scrolling the window content (in all directions) and will return a complete image of the page.
After installing ImageMagick, Image::Magick and Win32::CaptureIE on my Windows / ActiveState Perl system, I generated this screen shot with the following short program using no additional processing:
#!perl
use strict;
use warnings;
use Win32::CaptureIE;

StartIE( width => 900 );
Navigate( 'http://www.dev411.com/blog/' );
my $img = CapturePage();
$img->Write( 'capture.png' );
QuitIE;
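The same four calls should extend to a batch of pages. What follows is only a sketch under a few assumptions: the URL list is illustrative, url_to_filename is a hypothetical helper of my own (not part of Win32::CaptureIE), and the module is loaded at run time so the script still compiles on non-Windows systems, where it simply skips the capture step.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not part of Win32::CaptureIE):
# turn a URL into a file name that is safe to write to disk.
sub url_to_filename {
    my ($url) = @_;
    ( my $name = $url ) =~ s{^\w+://}{};    # drop the scheme
    $name =~ s{[^A-Za-z0-9.-]+}{_}g;        # replace unsafe runs with "_"
    $name =~ s{^_+|_+$}{}g;                 # trim leading/trailing "_"
    return "$name.png";
}

# Illustrative list of pages; replace with your own.
my @urls = ( 'http://www.dev411.com/blog/' );

# Win32::CaptureIE only installs on Windows, so load it at run time.
if ( $^O eq 'MSWin32' ) {
    require Win32::CaptureIE;
    Win32::CaptureIE->import;

    StartIE( width => 900 );
    for my $url (@urls) {
        Navigate($url);
        my $img  = CapturePage();
        my $file = url_to_filename($url);
        my $err  = $img->Write($file);      # PerlMagick returns an empty value on success
        warn "Could not write $file: $err\n" if "$err";
    }
    QuitIE();
}
```

Naming each file after its URL keeps one capture from overwriting another, and checking the return value of Write surfaces ImageMagick errors that would otherwise pass silently.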
Perl and CPAN continue to amaze me with their treasure trove of functionality. Are there similar tools for using Firefox, Linux, other image libraries or languages?
UPDATE: ishnid has found two programs with CLIs (posted to the same thread):
- khtml2png on SourceForge. This is a command-line program that looks like it can be run without a browser. It uses libkhtml (used by Konqueror) and ImageMagick's convert.
- Pearl Crescent Page Saver, a commercial app but available in a free version. This is a Firefox extension and requires the browser.
UPDATE 2: I recently tried Win32::CaptureIE with ImageMagick 6.3.0 and it doesn't work. Apparently there used to be a link to "PerlMagick" in older versions of ImageMagick that may not exist anymore. Unfortunately Win32::CaptureIE relies on PerlMagick.
UPDATE 3: I just tried the free version of Pearl Crescent Page Saver with Firefox 1.5.0.7, which it says it supports, but I get a "Download error" with pageserverbasic-1.3.xpi.
I’ve looked around for this before without much success.
I’m amazed that someone hasn’t found a way to hook into the Gecko engine and, instead of rendering to a browser window, render to a PNG/JPG file.
I wouldn’t know where to start looking, but it seems that all the hard work has been done already, and some clever b**tard just needs to redirect the output.
I’m not keen on solutions that require you to be running in/under X/Windows to render the page.
One day …
Great idea. I’d be excited about a solution that could run without a windowing system as well. Definitely something to keep an eye out for.
Version 1.4 of Pearl Crescent Page Saver has been released and works well with Firefox 2.0.0.2.
Is there a utility (or combination of utilities) I can use to automate reading a series of web pages and copying their text? Would the ‘Mechanize’ utility help? Thank you, BJ.