Helium Scraper

Posted: **Thu Apr 25, 2013 2:42 pm**

Hello,

I managed to extract a table of 167 links using a hierarchical scrape, as presented in the video. That works great; the next step is to scrape the content of the pages behind these links.

That's where things get tricky: these pages use heavy JavaScript and dynamic content, and to fully load a page I need to click the refresh button of the internal browser.
Once a page is loaded that way, I can create kinds and extract data from it without issue.

But when using the "Navigate URLs at" action, I cannot reload each page; as a result, Helium Scraper does not load the complete page and is unable to scrape the data.
Is there any trick to force a reload of the URL (with some JavaScript) before the extraction?

Regards,
Stephane

Posted: **Fri Apr 26, 2013 12:30 am**

Hello,

I managed to get the page reloaded with a few JavaScript commands I found on the web,
but that does not solve the issue: the page is still not loaded properly.
There may be something in the complex web page I am loading that the internal browser does not like.
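
The exact commands are not quoted in this thread. As a minimal sketch of what a forced reload could look like, assuming the internal browser exposes the standard `location.reload` method and a `sessionStorage`-like store (available in IE8 and later, not IE7), a guard such as the hypothetical `reloadOnce` below reloads each URL only once so the page does not loop:

```javascript
// Hypothetical helper: reload the current page at most once per URL.
// `win` is a window-like object and `store` a sessionStorage-like object;
// both are passed in so the logic can be tried outside a browser.
function reloadOnce(win, store) {
  var key = "reloaded:" + win.location.href;
  if (store.getItem(key) === "1") {
    return false; // already reloaded once, let the extraction continue
  }
  store.setItem(key, "1");
  // The `true` argument asks older browsers to bypass the cache.
  win.location.reload(true);
  return true;
}
```

In a page script this would presumably be called as `reloadOnce(window, window.sessionStorage)`; how such a script is injected from Helium Scraper is not covered here.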

I tried saving the page locally on the PC with Chrome (saved after all the dynamic content had loaded), and Helium Scraper works perfectly on that local page.

I will look for workarounds. In the meantime, if anybody has an idea,
I can send the link by email.

Regards,
Stephane

Posted: **Fri Apr 26, 2013 2:54 pm**

Hi,

What happens if you take one of these URLs and paste it into the address bar? Does it take you to the page it's supposed to take you to?

Posted: **Sat Apr 27, 2013 12:41 am**

Hello,

When I copy and paste the link into the address bar of IE or Chrome, it works well (screen capture attached).
When I paste it into the Helium Scraper browser, the dynamic content is not loaded the first time; after a refresh everything is fine.

Regards,
Stephane

Posted: **Mon Apr 29, 2013 2:12 am**

It works fine when I do it. Which version of HS do you have installed? Also, what happens when you go to some other site (like google.com) and then paste the address in the address bar?

Posted: **Mon Apr 29, 2013 5:49 am**

Hi,

I just purchased Helium Scraper, so I should have the latest version.

I made the following modifications:
- Upgraded to IE8 => this alone does not change anything.
- Enabled the "use native XMLHttp support" option in the advanced settings => that seems to solve the issue.

The web page I want to scrape makes heavy use of Ajax. Could the "use native XMLHttp support" option explain the previous strange behavior?
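
For context, here is a sketch of the pattern many Ajax-heavy pages of that era use to obtain a request object. In IE7 and later, the native `XMLHttpRequest` object is, as far as I know, only exposed when the native XMLHTTP option is enabled; a page that does not fall back to the ActiveX object would then have no way to make Ajax calls, which could plausibly explain content not loading. The `createXhr` helper and its `env` parameter are hypothetical names used for illustration:

```javascript
// Sketch of the classic request-object factory.
// `env` stands in for the browser's global object (hypothetical parameter)
// so the branching logic can be run outside a browser.
function createXhr(env) {
  if (typeof env.XMLHttpRequest !== "undefined") {
    // Native object, present in IE only when the option is enabled.
    return new env.XMLHttpRequest();
  }
  if (typeof env.ActiveXObject !== "undefined") {
    // Legacy IE fallback through the ActiveX ProgID.
    return new env.ActiveXObject("Microsoft.XMLHTTP");
  }
  return null; // no Ajax transport available
}
```

In a browser, `env` would be `window`. Whether the page in question actually uses this pattern is an assumption.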

Regards,
Stephane

strange issue, reload required, how to do it?