OnlineStore Test Project

Questions and answers about anything related to Helium Scraper
Maxkuba
Posts: 5
Joined: Wed May 11, 2011 4:31 pm

OnlineStore Test Project

Post by Maxkuba » Wed May 11, 2011 4:42 pm

Dear Team,
As a foreword: I'm still in the trial period, but I'll definitely purchase your great software.
I tried a few others, including Visual Web Ripper, which is not half as easy or as good as your app...^^
One suggestion: it should be possible (in a very easy way) to create a relational database, like in Visual Web Ripper.

Now to my problem:
I'm trying to extract data from an online shop for Indian products (a very cheap and good shop, by the way):
http://www.get-grocery.com/
So far your software works fine, and I was able to extract a lot of data, but not all of it.
When I click on the select items, everything is selected and works fine, but while scraping it doesn't take all the data.
It is not processing every link correctly: on around 70% of the next-page steps it works fine, but not on all of them.
What am I doing wrong?
Please give me some support.
Regards
GetGrocery_com_prjct.rar
(135.51 KiB) Downloaded 738 times

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: OnlineStore Test Project

Post by webmaster » Wed May 11, 2011 6:08 pm

Hi,

I've made a few changes to your project. First, your Navigate Next inside the Navigate Each won't do anything other than navigate to the next page and then move on to the next category, so only the first page of each category is being extracted. There are several ways to achieve what you are trying to do, but the easiest one is to import a premade project I've posted right here. There is more information about that premade in the linked post. You can see how it is being used in the attached project.
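Outside of Helium Scraper, the idea can be sketched in plain Python. This is only an illustration, not how Helium Scraper or the premade works internally: the SITE dictionary is a made-up stand-in for real pages, and scrape_category / scrape_all are hypothetical names. The point is that the "next page" step has to be repeated inside each category until there is no next page, and every page has to be extracted before moving on.

```python
# Hypothetical in-memory "site": each page URL maps to (products on that page,
# URL of the next page or None). It stands in for real HTTP requests.
SITE = {
    "/cat/sweets?p=1": (["Ladoo", "Barfi"], "/cat/sweets?p=2"),
    "/cat/sweets?p=2": (["Jalebi"], None),
    "/cat/spices?p=1": (["Turmeric"], None),
}


def scrape_category(start_url):
    """Extract every page of one category by following next-page links until none remain."""
    items = []
    url = start_url
    while url is not None:
        products, next_url = SITE[url]
        items.extend(products)  # extract the current page before moving on
        url = next_url          # repeat the "next page" step, not just once
    return items


def scrape_all(category_urls):
    """Outer loop over categories (the role of "Navigate Each")."""
    return {url: scrape_category(url) for url in category_urls}


print(scrape_all(["/cat/sweets?p=1", "/cat/spices?p=1"]))
```

A single, non-repeated next-page step would only ever reach page 2 of "sweets" without extracting it, which is the same symptom described above.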

Then, your "Navigate Each: Categories" uses "Simulate click". I've tested it without simulating clicks and it worked fine. I wouldn't use it unless it is necessary, such as with form buttons or AJAX stuff. Also, your "ProductLink" kind wasn't actually selecting every product. I found a few categories (such as "Confectionery") where not all of the products were selected.

I made a little SQL query that you can find in the database panel under the Queries tab. All you need to do is change the "Confectionery" text to any other category and you will see how many items from that category have been extracted. You can use this to easily make sure all products are being extracted. But beware of tricky items! A few of them have a different category than the one under which they are being extracted. For instance, the count for the "Confectionery" category is 10 even though the page says there are 11. This is because the category of one of them is actually "Sweets".
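The same kind of check can be sketched with Python's built-in sqlite3 module. The table name Products and the columns Name and Category are assumptions for illustration; the actual query and table in the attached project may look different. Using a query parameter means you change the argument instead of editing the SQL text, and a GROUP BY version gives the count for every category at once.

```python
import sqlite3

# Hypothetical table layout; the real project's table and column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Name TEXT, Category TEXT)")
conn.executemany(
    "INSERT INTO Products (Name, Category) VALUES (?, ?)",
    [("Toffee", "Confectionery"), ("Lollipop", "Confectionery"), ("Ladoo", "Sweets")],
)


def count_in_category(category):
    """Count extracted items whose own Category value equals `category`."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Products WHERE Category = ?", (category,)
    ).fetchone()
    return count


def counts_by_category():
    """Count extracted items for every category in one query."""
    return conn.execute(
        "SELECT Category, COUNT(*) FROM Products GROUP BY Category ORDER BY Category"
    ).fetchall()


print(count_in_category("Confectionery"))  # → 2
print(counts_by_category())  # → [('Confectionery', 2), ('Sweets', 1)]
```

Because the count is based on each item's own Category value, an item listed on the Confectionery page but labelled "Sweets" is counted under "Sweets", which is exactly the kind of mismatch described above.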

Also, and this is just me being picky or perhaps trying to show off my skills :D , I found that the "New Items" category is full of items that belong to a different category, so I created a kind called "Categories No New Items" that selects every category except for the "New Items" one. To get an idea of how I made this kind, take a look at the "JS_IsNewItems" JavaScript gatherer in Project -> JavaScript Gatherers. Also, note the "JS_IsNewItems" property of the kind.
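The filtering idea behind a kind like "Categories No New Items" can be sketched in Python. This is not the actual JS_IsNewItems gatherer (its code isn't shown here); the category list and the is_new_items helper are hypothetical. It only shows the general approach: compute a true/false property for each category link and keep the ones where it is false.

```python
# Hypothetical category links as (link text, URL) pairs.
categories = [
    ("New Items", "/cat/new"),
    ("Confectionery", "/cat/confectionery"),
    ("Sweets", "/cat/sweets"),
]


def is_new_items(name):
    """Hypothetical stand-in for a JS_IsNewItems-style check (case/whitespace-insensitive)."""
    return name.strip().lower() == "new items"


# Keep every category except "New Items".
categories_no_new_items = [(name, url) for name, url in categories if not is_new_items(name)]
print(categories_no_new_items)
# → [('Confectionery', '/cat/confectionery'), ('Sweets', '/cat/sweets')]
```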

One last thing: when you run into this kind of problem, it is usually helpful to break the project apart. You could, for instance, first extract all of the categories' URLs and then use a "Navigate URLs" action to visit each of them. The "Navigate URLs" action comes in very handy when breaking projects apart this way.
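Here is a rough Python sketch of that two-step approach, assuming a made-up PAGES dictionary in place of real pages and an in-memory CSV in place of a database table (all names are hypothetical). Step 1 only collects the category URLs and saves them; step 2 reads the saved list back and visits each URL, much like "Navigate URLs" would. Since the URL list is saved in between, you can inspect it or re-run step 2 on its own.

```python
import csv
import io

# Hypothetical fake pages standing in for real HTTP fetches.
PAGES = {
    "/menu": ["/cat/confectionery", "/cat/sweets"],
    "/cat/confectionery": ["Toffee", "Lollipop"],
    "/cat/sweets": ["Ladoo"],
}


def stage1_collect_category_urls(store):
    """Step 1: only extract the category URLs and save them."""
    writer = csv.writer(store)
    for url in PAGES["/menu"]:
        writer.writerow([url])


def stage2_navigate_urls(store):
    """Step 2: read the saved URLs back and visit each one."""
    store.seek(0)
    products = []
    for (url,) in csv.reader(store):
        products.extend(PAGES[url])
    return products


store = io.StringIO()  # stands in for a file or database table
stage1_collect_category_urls(store)
print(stage2_navigate_urls(store))  # → ['Toffee', 'Lollipop', 'Ladoo']
```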

So if you download the project and press play on the "GetGrocery2" actions tree, everything should work fine. You can change "Navigate Each: Categories No New Items" back to "Categories" if you wish.

I don't think I'm forgetting anything, but let me know if I am or if you need any further help.
Attachments
GetGrocery2.hsp
(2.27 MiB) Downloaded 762 times
Juan Soldi
The Helium Scraper Team

Maxkuba

Re: OnlineStore Test Project

Post by Maxkuba » Wed May 11, 2011 7:43 pm

Wow, awesome!!
Thank you so much :-)

Maxkuba

Re: OnlineStore Test Project

Post by Maxkuba » Thu May 12, 2011 6:39 pm

Hi,
can you give me a hint on how to extract the ProductLink into the table without messing my table up?
In the Extract node I selected ProductLink with the "Link" option (and also tried "Url"), but sadly it doesn't extract the link to the corresponding product details page.
Do you have any idea?
Regards,

webmaster

Re: OnlineStore Test Project

Post by webmaster » Fri May 13, 2011 7:57 am

If you extract the "Url" property of any element inside a page, you will get the URL of the page that element is in. You need to make sure your kind selects at least one element inside that page. You can use an "At Least" 1 item requirement in your "Extract" action for this purpose.

If you use the "Link" property, it will extract the "href" attribute, which usually contains the destination page, unless the page is loaded with AJAX or JavaScript.
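To illustrate the difference, here is a small Python sketch using the standard html.parser and urllib.parse modules. The page URL and markup are made up (example.com, not the real shop), and resolving relative links with urljoin is an extra step added here for readability; it isn't a claim about what Helium Scraper does with relative hrefs. A "Url"-style value is simply the address of the page you're on. A "Link"-style value is read from the href of the anchor element.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Hypothetical category page; URL and markup are invented for illustration.
page_url = "http://www.example.com/category/sweets"
html = '<div class="product"><a href="/product/ladoo">Ladoo</a></div>'

parser = LinkCollector()
parser.feed(html)

# "Url"-style value: the address of the page the element is on.
print(page_url)  # → http://www.example.com/category/sweets
# "Link"-style value: the href, resolved against the page address.
print([urljoin(page_url, h) for h in parser.hrefs])  # → ['http://www.example.com/product/ladoo']
```

If a product link is driven by JavaScript (for example an href of "#" or "javascript:..."), the href won't contain the details page, which matches the AJAX/JavaScript exception mentioned above.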

If you are still having this problem I'll be glad to take a look at your project and help you out.
Juan Soldi
The Helium Scraper Team
