Scrape text & associated large images behind thumbnail

ecollectica
Posts: 3
Joined: Fri Apr 01, 2011 4:28 am

Post by ecollectica » Fri Apr 01, 2011 4:40 am

Your solution looks great, but I need help. I would like to scrape a page like http://tinyurl.com/3kpc5j8 . I have looked at the tutorials, but it still seems beyond my understanding (sorry!). How can I extract:

(a) The name of the item (e.g. BRITISH GRANDFATHER CLOCK; Mahogany case with painted dial and 8 day movement with sweep second hand and calendar, 19th c.; 84")

(b) The additional description, e.g.
Auction Date: 04/08/11
Starting Bid: $500.00 USD
Estimate: $1000.00 - $1500.00 USD
Description: BRITISH GRANDFATHER CLOCK; Mahogany case with painted dial and 8 day movement with sweep second hand and calendar, 19th c.; 84"

(c) The images associated with each item. I would like to download the larger image (or sometimes several images) for each item, which is available when clicking on the thumbnail image.

Can you help? Thanks!

webmaster
Site Admin
Posts: 494
Joined: Mon Dec 06, 2010 8:39 am

Re: Scrape text & associated large images behind thumbnail

Post by webmaster » Fri Apr 01, 2011 9:25 pm

Hi,

The attached project does just what you need, though it required a little JavaScript. Go to the Actions panel and press play on the "Navigate Through Each" actions tree. It must be started on a page that contains a list of links, such as the one that opens when you open the project. The Url column in Table1 and the Other Images column in Table3 (excuse the non-descriptive table names, but I'm sure you can tell what they are by looking at their content :) ) both hold the URL of the page where the big images are, so you can use them as a common ID, as in this query (you can run it from the database panel by clicking the SQL button):


select [Name], [Big Image], [Description] from [Table3], [Table1] where [Table3].[Other Images] = [Table1].[Url]
The JavaScript code is in the "Extract Images" tree. It can probably give you a better idea of how to use JavaScript in Helium Scraper. I found that the page has an array called imgurls that contains the list of big image URLs. There is also a function called swap that is called when you click on any thumbnail, and this is all it does:


document['fullsize'].src = imgurls[index];
Here index is a parameter passed to that function, and it is different for each thumbnail. So my code does this manually for each index, as long as there is a URL at that index and the URL doesn't end with the string "null". (I also found that some of the URLs ended with "null" and returned a 404 error when I tried to download them.)
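
For anyone who doesn't want to open the project, the filtering described above can be sketched roughly like this. This is not the actual code from the "Extract Images" tree, just a minimal illustration: the function name collectBigImageUrls and the sample URLs are made up, and it assumes imgurls is a plain array of strings that may contain empty slots.

```javascript
// Hypothetical helper (not part of Helium Scraper or of the scraped page):
// returns the usable full-size image URLs from an array shaped like the
// page's imgurls array.
function collectBigImageUrls(imgurls) {
  const urls = [];
  for (let index = 0; index < imgurls.length; index++) {
    const url = imgurls[index];
    // Skip indexes that hold no URL at all.
    if (typeof url !== "string" || url.length === 0) continue;
    // Skip URLs ending in "null"; those returned a 404 when downloaded.
    if (url.endsWith("null")) continue;
    urls.push(url);
  }
  return urls;
}

// Made-up sample data standing in for the page's imgurls array.
const sampleImgurls = [
  "http://example.com/images/clock_1.jpg",
  "",
  "http://example.com/images/null",
  "http://example.com/images/clock_2.jpg",
];
console.log(collectBigImageUrls(sampleImgurls));
```

On the real page you would pass the page's own imgurls array instead of the sample data, and each returned URL is what swap would have assigned to document['fullsize'].src for that index.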

Let me know if you have any other questions.
Attachments
Clocks.hsp
(1.06 MiB) Downloaded 219 times
Juan Soldi
The Helium Scraper Team

ecollectica
Posts: 3
Joined: Fri Apr 01, 2011 4:28 am

Re: Scrape text & associated large images behind thumbnail

Post by ecollectica » Fri Apr 01, 2011 11:35 pm

Juan,

Thanks very much for your help. My partner has good JavaScript knowledge, so he'll know what to do. I'll let you know how it turns out.

Again, looks like a great product at a very reasonable price compared to anything else out there. Good work!

Jeff
