URGENT: In a Fix! Not able to extract certain data

saahilgoel
Posts: 8
Joined: Wed May 09, 2012 9:04 am

Post by saahilgoel » Sun Jun 03, 2012 1:19 pm

Hi,

Please see attached project. I am trying to scrape data on a certain page in tabs. Even though the tabs navigate fine (I viewed the page during execution), data isn't being extracted at all from these tabs.

Please let me know what I am doing wrong. The URL to scrape is http://www.theitdepot.com

Thanks,
Saahil
Attachments
itdepot.zip
(35.84 KiB) Downloaded 565 times

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: URGENT: In a Fix! Not able to extract certain data

Post by webmaster » Tue Jun 05, 2012 2:51 am

Hi Saahil,

All you need to do is make sure your kinds are selecting the appropriate elements. To do this, add requirements to all your Extract actions according to what you know each of them should extract. If, for instance, you know every product has a product name, add a requirement of exactly one item in your Extract action that extracts the product name. This will show you a message box if the product name is not found and let you pause the extraction to fix the problem.
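Helium Scraper sets requirements through its own interface, not through code. Purely as an illustration of the idea, here is a minimal Python sketch of an "exactly one item" check. The names `require_exactly` and `RequirementError` and the sample data are hypothetical and are not part of Helium Scraper.

```python
class RequirementError(Exception):
    """Raised when an extraction step finds an unexpected number of items."""


def require_exactly(items, expected, label):
    # Fail loudly instead of silently writing an empty or ambiguous value.
    if len(items) != expected:
        raise RequirementError(
            f"{label}: expected {expected} item(s), found {len(items)}"
        )
    return items


# Hypothetical matches returned by a "ProductName" selector on two pages.
page_ok = ["Acme Wireless Mouse"]
page_bad = []

print(require_exactly(page_ok, 1, "ProductName")[0])

try:
    require_exactly(page_bad, 1, "ProductName")
except RequirementError as err:
    # This is the point where you would pause and fix the selector.
    print("Requirement failed:", err)
```

The point of the check is that a page where the selector matches nothing stops the run immediately, so the mismatch is noticed on that page and not later as missing rows.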

In most cases, the problem will be that Helium Scraper is not finding the product name (to continue with that example) on a particular page, because one or more of its properties differ from those of the titles you used as samples when you created your "ProductName" kind. All you need to do is pause the extraction, select the product name, and add it to your "ProductName" kind by clicking the "Add selection to this kind" button. Take a look at this video to see a real-life example of how this is done.

Finally, since your extraction tree goes three levels deep, I recommend breaking it into sections: extract the URLs of the categories to a table, then use a Navigate URLs action to navigate through the extracted URLs. Similarly, extract the product link URLs to a table, then use a Navigate URLs action to navigate to each product page and extract its details. This way, if any of these steps has a problem, you'll only need to deal with one of them at a time.
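The staged approach can be pictured as three separate passes, each saving its results before the next one starts. The following Python sketch only illustrates that structure; it is not how Helium Scraper works internally. The `SITE` and `DETAILS` dictionaries and all function names are hypothetical stand-ins for real pages, and no network access is involved.

```python
# Hypothetical site map standing in for real pages.
SITE = {
    "/": ["/cat/mice", "/cat/keyboards"],
    "/cat/mice": ["/p/1", "/p/2"],
    "/cat/keyboards": ["/p/3"],
}
DETAILS = {"/p/1": "Mouse A", "/p/2": "Mouse B", "/p/3": "Keyboard C"}


def extract_links(url):
    # Stand-in for "extract the link URLs found on this page".
    return SITE.get(url, [])


def stage_categories(start_url):
    # Stage 1: collect category URLs into a table (here, a list).
    return extract_links(start_url)


def stage_products(category_urls):
    # Stage 2: visit each saved category URL and collect product URLs.
    product_urls = []
    for url in category_urls:
        product_urls.extend(extract_links(url))
    return product_urls


def stage_details(product_urls):
    # Stage 3: visit each saved product URL and extract its details.
    return {url: DETAILS[url] for url in product_urls}


categories = stage_categories("/")
products = stage_products(categories)
details = stage_details(products)
print(len(categories), len(products), details["/p/3"])
```

Because each stage works from a saved list, a stage that returns too few rows can be rerun and inspected by itself without repeating the others.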
Juan Soldi
The Helium Scraper Team

saahilgoel
Posts: 8
Joined: Wed May 09, 2012 9:04 am

Re: URGENT: In a Fix! Not able to extract certain data

Post by saahilgoel » Tue Jun 26, 2012 10:49 am

Hi Jodi,

I don't think it's a problem with the "kind" attribute. I have tried several pages, and selecting the features, specifications and manufacturer info "kind" does select the right data on the page. The JavaScript links that toggle the various tabs also work fine.

During the execution, I can even see the tabs getting changed. However, the data isn't being scraped. Please see screenshots.

Thanks,
Saahil
Attachments
itd.zip
(981.48 KiB) Downloaded 579 times

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: URGENT: In a Fix! Not able to extract certain data

Post by webmaster » Wed Jun 27, 2012 6:27 am

Try following the steps described in the "My project is not extracting any / enough data." section of this post. That should let you identify the cause of the problem. Make sure you add requirements to every action that takes requirements (Extract, Navigate, Navigate Each...).

Let me know what happens.
Juan Soldi
The Helium Scraper Team
