I'm attempting to scrape Manta, but I can't work around the way Manta presents its data.
I specify the data I want at Manta and it returns a page of links. Each link leads to a page with the data I want, and I can scrape it, but none of these pages has a Next link. That is not a problem in itself, since I'm working off the original page of links. However, when I get to the last link, there is nowhere to go. If I select Kind: Next and specify at least one, Helium stops because the Next link is not there. If I go back to the starting page and select kind Next, I simply collect the second page of data over and over. (At least I think that's what's going on...)
I tried to use the PageNumberProperties, but I haven't figured out how to incorporate it. I got it imported, but I have not found documentation on how to use it yet. If I manually select page 2, page 3 and so on and then run the actions (without the step that goes to the starting page), it works, but that's a bit tedious. Help would be appreciated!
Thanks.
Manta
- Attachments
- DubScrape.hsp
- (462.74 KiB) Downloaded 567 times
Re: Manta
Hi,
See the attached project. First of all, those two Execute JS actions shouldn't need to be there under normal circumstances. They are there because of a bug in the Navigate Each action (which will be fixed in the next update): it fails to go back to the original page, as it should, when scraping heavily loaded sites such as Manta.
Notice the Navigate: Next action (instead of your Select Kind action). Also, I'm using the Force Select action because you are using a navigation timeout of 5 seconds, which makes sense for a site like Manta, which can take minutes to finish loading a page. Force Select prevents the project from trying to navigate through the Next button if it's not there yet.
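For anyone who wants to see the logic outside of Helium, here is a rough Python sketch of the same idea: scrape every link on the current listing page, wait a bounded time for a Next link to appear, and stop when none shows up. This is not Helium Scraper's API. The `PAGES` dictionary, `wait_for` and `scrape_all` are made-up names for illustration, and the in-memory pages stand in for a real site.

```python
import time

# Hypothetical in-memory stand-in for a paginated listing site.
PAGES = {
    "page1": {"links": ["a", "b"], "next": "page2"},
    "page2": {"links": ["c"], "next": "page3"},
    "page3": {"links": ["d", "e"], "next": None},  # last page: no Next link
}


def wait_for(probe, timeout=5.0, interval=0.1):
    """Poll probe() until it returns a non-None value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        value = probe()
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            return None  # gave up: treat the element as absent
        time.sleep(interval)


def scrape_all(start, pages=PAGES, timeout=0.2):
    """Collect the links from every listing page, following Next until it is gone."""
    collected = []
    current = start
    while current is not None:
        page = pages[current]
        # Stand-in for visiting each link on the listing page and scraping it.
        collected.extend(page["links"])
        # Wait for the Next link instead of assuming it has already loaded.
        current = wait_for(lambda: page["next"], timeout=timeout, interval=0.05)
    return collected


print(scrape_all("page1"))
```

The loop ends cleanly on the last page because a missing Next link is treated as the stop condition rather than as an error.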
- Attachments
- DubScrape2.hsp
- (366.3 KiB) Downloaded 621 times
Juan Soldi
The Helium Scraper Team