Extraction not working

Post by **Eco** » Mon Oct 06, 2014 8:52 am

Hi Guys,

This is my first project, so there is a good chance I'm doing something wrong.
I am following your videos (they are very fast; you should add a voice-over to explain what you are doing and make them a bit slower), and I am trying to extract listings from this site:
naturaltherapypages.com.au /natural_medicine/sa/Naturopath?limitno=10&pageno=10 (I took the http out on purpose and put a space in there so that it will not link).

Anyway, I set up the kinds, and it seems to pick them up properly.
I set up a process tree, but the data is not extracted properly.
The top-level info is extracted, but inside a listing the description is not extracted, even though the full address is.

I find it weird, because if I manually go into a listing and click the description kind, the text lights up.
Also, I did a run and got the out-of-resources issue, and the video was too fast to follow.

Any help will be appreciated.
Thanks,
Eco

Post by **Eco** » Tue Oct 07, 2014 9:20 am

Hi Guys,

I am uploading the project I made here. Can you tell me what I'm doing wrong?
The system is running out of resources and is not going to the next page.

Waiting to hear from you asap.

Thanks a lot,
Eco

Post by **webmaster** » Tue Oct 07, 2014 4:57 pm

Hi Eco,

Here is how you're meant to set up your actions:

[Image: main1.png]

Note that it uses a Navigate action instead of a Navigate Each one to navigate through the "Next" button, and the Repeat action is set to repeat 10 times. Now, this method will only let you navigate through a fixed number of pages. Here is how you can make it navigate through as many pages as it finds:

[Image: main2.png]
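
For readers who find the difference between the two setups easier to see in code, here is a minimal Python sketch of the same idea. It is not Helium Scraper's own scripting and does not reproduce the actions in the screenshots; `walk_pages`, `next_url`, and `FAKE_SITE` are hypothetical names made up for illustration. Passing `max_pages` plays the role of a Repeat action with a fixed count, while leaving it out keeps following "Next" until there is no further page.

```python
from typing import Callable, Iterator, Optional


def walk_pages(first_url: str,
               next_url: Callable[[str], Optional[str]],
               max_pages: Optional[int] = None) -> Iterator[str]:
    """Yield page URLs starting at first_url.

    max_pages=N stops after N pages (like a fixed Repeat count);
    max_pages=None continues until next_url returns None.
    """
    url: Optional[str] = first_url
    seen = set()   # guards against a "Next" link that loops back
    count = 0
    while url is not None and url not in seen:
        if max_pages is not None and count >= max_pages:
            break
        seen.add(url)
        yield url
        count += 1
        url = next_url(url)


# Hypothetical stand-in for clicking "Next": a three-page site held in a dict.
FAKE_SITE = {"page1": "page2", "page2": "page3", "page3": None}

fixed = list(walk_pages("page1", FAKE_SITE.get, max_pages=2))
everything = list(walk_pages("page1", FAKE_SITE.get))
```

In a real scraper, `next_url` would load the page and return the target of its "Next" link, or `None` when that link is missing.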

Regarding the memory leak, first try upgrading Internet Explorer to the latest version. If this doesn't help, try setting up your actions this way (to start with):

[Image: main3.png]

where the Extract to table: 'Table1' action is set up this way:

[Image: extract.png]

Note that I've added a line to it.

Then add another actions tree (called main4 here):

[Image: main4.png]

What you'd do is first run main3 and then, once you've extracted all the links to Table1, run main4, which will navigate through the links you've extracted to Table1. Note that if you do this, you'd be taking the first step toward a multiple-process extraction, which would solve the memory leak (if upgrading IE doesn't). Here's how you'd implement multi-processes.
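
The two-pass idea (collect the links first, visit them later) can be sketched outside the tool as well. The Python below is only an illustration under stated assumptions, not how Helium Scraper stores or processes its tables: the table name `Table1` comes from the post, but the `done` column, the SQLite storage, and the helpers `collect_links` and `process_pending` are hypothetical. Because progress is recorded in the table, each batch could in principle be handled by a fresh process, which is what keeps memory use bounded.

```python
import sqlite3
from typing import Callable, Iterable, List


def collect_links(conn: sqlite3.Connection, links: Iterable[str]) -> None:
    """Pass 1: store every listing link once (duplicates are ignored)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Table1 "
        "(link TEXT PRIMARY KEY, done INTEGER DEFAULT 0)"  # 'done' is an assumed column
    )
    conn.executemany(
        "INSERT OR IGNORE INTO Table1 (link) VALUES (?)",
        [(link,) for link in links],
    )
    conn.commit()


def process_pending(conn: sqlite3.Connection,
                    visit: Callable[[str], object],
                    batch_size: int) -> List[str]:
    """Pass 2: visit up to batch_size unvisited links and mark them done."""
    rows = conn.execute(
        "SELECT link FROM Table1 WHERE done = 0 ORDER BY rowid LIMIT ?",
        (batch_size,),
    ).fetchall()
    handled = []
    for (link,) in rows:
        visit(link)  # in a real run: navigate to the listing and extract it
        conn.execute("UPDATE Table1 SET done = 1 WHERE link = ?", (link,))
        handled.append(link)
    conn.commit()
    return handled


conn = sqlite3.connect(":memory:")
collect_links(conn, ["listing/a", "listing/b", "listing/a", "listing/c"])

visited: List[str] = []
first_batch = process_pending(conn, visited.append, batch_size=2)
second_batch = process_pending(conn, visited.append, batch_size=2)
```

With a file-backed database instead of `:memory:`, a run that stops partway can resume from the links still marked as not done.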

Also, you may want to take a look at this video since you're extracting related data to different tables.
