Hello,
I managed to extract a table of 167 links using hierarchical scraping, as presented in the video. That works great. The next step is to scrape the content of the pages behind these links.
That's where things get tricky: these pages rely heavily on JavaScript and dynamic content, and to fully load a page I need to click the refresh button of the internal browser.
I can create kinds and extract data from a single loaded page without issue.
But when using the "Navigate URLs at" action, I cannot reload each page. As a result, Helium Scraper does not load the complete page and is unable to scrape the data.
Is there any trick to force a reload of the URL (with some JavaScript) before the extraction?
Regards,
Stephane
strange issue, reload required, how to do it?
Re: strange issue, reload required, how to do it?
Hello,
I managed to get the page reloaded with a few JavaScript commands I found on the web,
but that does not solve the issue: the page is still not loaded properly.
There may be something in the complex web page I am loading that the internal browser does not like.
I tried saving the page locally on the PC with Chrome (saved after all the dynamic content had loaded), and Helium Scraper works perfectly on that local page.
I will look for workarounds. In the meantime, if anybody has an idea,
I can send the link to you by email.
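The exact commands used for the reload are not shown in this post. As a rough sketch only, a forced reload in JavaScript usually comes down to calling location.reload(), with a guard so the page does not reload forever. The function name reloadOnce and the "reloaded:" marker below are hypothetical, and how a script is run inside Helium Scraper is not covered here.

```javascript
// Sketch: reload the current page at most once per URL.
// `win` is a window-like object; in a real page you would pass `window`.
// The guard is stored in window.name, which survives a reload in the same
// browser window and is available in old browsers as well.
function reloadOnce(win) {
  var marker = "reloaded:" + win.location.href; // hypothetical marker format
  if (win.name === marker) {
    return false; // this URL was already reloaded once: leave the page alone
  }
  win.name = marker;
  win.location.reload(); // same effect as pressing the refresh button
  return true;
}

// In a browser: reloadOnce(window);
```

Without the guard, the script would run again after every reload and the page would never settle. Because the marker includes the URL, each newly navigated page still gets its own single reload.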
Regards,
Stephane
Re: strange issue, reload required, how to do it?
Hi,
What happens if you take one of these URLs and paste it into the address bar? Does it take you to the page it's supposed to take you to?
Juan Soldi
The Helium Scraper Team
Re: strange issue, reload required, how to do it?
Hello,
When I copy and paste the link into the address bar of IE or Chrome, it works well (screen capture attached).
When I paste it into the browser of Helium Scraper, the dynamic content is not loaded the first time; after a refresh everything is fine.
Regards,
Stephane
Attachment: 2013-04-27 08_27_40_ Resource Selector.png (133.34 KiB)
Re: strange issue, reload required, how to do it?
It works fine when I do it. Which version of HS do you have installed? Also, what happens when you go to some other site (like google.com) and then paste the address in the address bar?
Juan Soldi
The Helium Scraper Team
Re: strange issue, reload required, how to do it?
Hi,
I just purchased Helium Scraper, so I should have the latest version.
I made the following modifications:
- upgraded to IE8 => this alone does not change anything
- enabled the "use native XMLHttp support" option in the advanced settings => that seems to solve the issue
The web page I want to scrape makes heavy use of Ajax. Could the "use native XMLHttp support" option explain the previous strange behavior?
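For context, and not as a confirmed diagnosis of this particular site: in Internet Explorer 7 and later, that option controls whether the native XMLHttpRequest object is exposed to pages. Ajax code of that era often probed for it and fell back to ActiveX when it was missing, roughly as in the sketch below. The function name createRequest and the env parameter are hypothetical and stand in for whatever the page actually does.

```javascript
// Sketch: how a page might obtain an Ajax request object.
// `env` stands in for the browser's global object (`window` in a real page).
function createRequest(env) {
  if (typeof env.XMLHttpRequest !== "undefined") {
    // Native object, available when native XMLHTTP support is enabled.
    return new env.XMLHttpRequest();
  }
  if (typeof env.ActiveXObject !== "undefined") {
    // Fallback for older Internet Explorer setups.
    return new env.ActiveXObject("Microsoft.XMLHTTP");
  }
  return null; // no Ajax transport available
}

// In a browser: var xhr = createRequest(window);
```

A page that only uses the native object, without such a fallback, may fail to issue its Ajax requests while the option is disabled, which could plausibly leave dynamic content unloaded.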
Regards,
Stephane