Picture Extraction

kevin994
Posts: 3
Joined: Wed Nov 23, 2011 3:15 pm


Post by kevin994 » Tue Nov 29, 2011 9:17 am

Hi,
I'm loving your software.
I want to extract pictures from this site:

http://www.endless.com/s/ref=dp_msr/?ke ... geCode%3Dd

Helium Scraper's simulated navigation does not click through on the image, which shows an image grid when you hover over it. I need to click through so that I can scrape the images.

I'm not conversant with JavaScript, so I'm not sure what to do.

Kind regards
Kevin.

webmaster
Site Admin
Posts: 501
Joined: Mon Dec 06, 2010 8:39 am

Re: Picture Extraction

Post by webmaster » Thu Dec 01, 2011 7:19 pm

Hi Kevin,

Here is a sample project that extracts those pictures.

The jQuery Fire Event project contains an actions tree that uses jQuery (since this website uses jQuery) to fire whatever event you choose on any given kind. In the sample project (Shoes.hsp), which uses jQuery Fire Event, I added a kind that selects the icons, and the mousemove event is fired on each of them. I also added a kind called Big IMG that selects the big image that appears on the right when you hover your mouse over the middle image. That big image is the one we want to download.
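To give a rough idea of what firing an event through jQuery looks like, here is a minimal sketch. It is not the code inside the jQuery Fire Event project; the function name fireEventOnKind and the selector in the usage note are made up for illustration, and a CSS selector stands in for a Helium Scraper kind. It only relies on jQuery's .trigger() method and the .length property of a jQuery result.

```javascript
// Illustrative sketch only, not the actual contents of "jQuery Fire Event.hsp".
// `$` is the page's jQuery function, passed in explicitly so nothing global is assumed.
// `selector` stands in for a Helium Scraper kind (an assumption for this example).
function fireEventOnKind($, selector, eventName) {
  // Select every element that matches the selector.
  const matched = $(selector);
  // .trigger() runs the handlers bound through jQuery for this event on each matched element.
  matched.trigger(eventName);
  // Report how many elements were targeted, so the caller can detect an empty match.
  return matched.length;
}
```

On a page that already loads jQuery, a call might look like fireEventOnKind(jQuery, 'img.icon', 'mousemove'), where 'img.icon' is a hypothetical selector. A return value of 0 would mean that nothing matched.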

This kind was pretty tricky to create. First I had to mouse over the middle image so that the picture on the right appeared, then move my mouse out of the browser from the bottom and turn on selection mode, then go back in without touching any scroll bar and select the picture in order to create the kind. But then there was another problem. For some reason there was a second image that was exactly the same as this one, except that its src attribute (which contains the URL of the image) was different and that it was hidden. So when the image on the right wasn't visible, the two images were pretty much identical apart from the src attribute, which caused the Big IMG kind to select both.

So I had to write a JS Gatherer called JS_ParentDisplay that returns the display attribute (which basically tells you whether it is visible or not) of the DIV element that contains the image. Then I manually added the property JS_ParentDisplay = block (which basically means it is visible) to my Big IMG kind so that it only selects the visible image. Finally, I used jQuery Fire Event again, this time to simulate a mouseover event on the middle image, so that the big image on the right becomes visible and can be extracted.
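For readers who want to see the idea behind JS_ParentDisplay in plain JavaScript, here is a minimal sketch. It is not the gatherer from Shoes.hsp: how Helium Scraper hands the element to a JS Gatherer is not shown in this thread, so the function names parentDisplay and visibleOnly and their parameters are assumptions. The sketch reads the computed display value of the nearest enclosing DIV and keeps only images whose container reports block.

```javascript
// Illustrative sketch only; the real JS_ParentDisplay gatherer may be written differently.
// `img` is the image element; `view` is the window object (defaults to the global object).
function parentDisplay(img, view = globalThis) {
  // Walk up the tree until the nearest enclosing DIV is found.
  let node = img.parentNode;
  while (node && node.tagName !== 'DIV') {
    node = node.parentNode;
  }
  // No enclosing DIV: nothing to report.
  if (!node) return null;
  // getComputedStyle gives the effective value, including rules from stylesheets.
  return view.getComputedStyle(node).display;
}

// Keep only the images whose enclosing DIV is displayed as a block,
// mirroring the "JS_ParentDisplay = block" property described above.
function visibleOnly(images, view = globalThis) {
  return images.filter((img) => parentDisplay(img, view) === 'block');
}
```

With two otherwise identical images, one inside a DIV with display none and one inside a DIV with display block, visibleOnly would return just the second.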

I also added two Select Kind actions that select the images used by the two Fire Event actions, just so that you get an error message if they're not found in the page. If that happens, you can try to fix these kinds yourself by selecting the corresponding elements in the page and adding them to the corresponding kind, or you can let me know which URL it fails on and I'll fix it.

Also, there are two Fire Event trees because actions trees cannot be nested (you'd get an error if one Fire Event were placed inside the same Fire Event). Note that you can double-click each of these actions in the Sample actions tree to see how they are configured.
Attachments
jQuery Fire Event.hsp
(293.67 KiB) Downloaded 186 times
Shoes.hsp
(332 KiB) Downloaded 186 times
Juan Soldi
The Helium Scraper Team
