Memory Issue

aigoodaigoo
Posts: 10
Joined: Wed May 18, 2011 6:18 pm

Memory Issue

Post by aigoodaigoo » Thu May 19, 2011 4:22 am

While my scraping seems to work fine, I get a memory error. After about 1000 records with multiple columns in one table plus another 3000 records or so in another table (around 4000 records in total), private memory fills up to about 1.5GB, at which point I'm unable to continue without deleting tables (the program crashes, for example). I've used Export to Database to keep much of the data and then restarted after removing tables, but I'm wondering if there's cache-clearing JavaScript or some other workaround. Can you advise?

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Memory Issue

Post by webmaster » Thu May 19, 2011 5:23 am

Try updating Internet Explorer to the latest version. Also, are you by any chance using the IE option to not show pictures? I once tried that to see if navigation would be faster, but memory just started building up like crazy. Most likely some IE bug.

How many columns do you have in your tables?
Juan Soldi
The Helium Scraper Team

aigoodaigoo
Posts: 10
Joined: Wed May 18, 2011 6:18 pm

Re: Memory Issue

Post by aigoodaigoo » Thu May 19, 2011 5:36 am

I'm actually using Google Chrome. The memory I'm talking about is for the Helium Scraper process (Helium Scraper.exe). I have 8GB, so there is plenty of space there, but the Helium Scraper process keeps building up private memory as more data is scraped and put into tables. What I don't understand is that Helium Scraper seems to be caching all the pages and table records in memory without clearing them out. Is there some function to clear the cache or buffer?

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Memory Issue

Post by webmaster » Thu May 19, 2011 5:42 am

Helium Scraper uses Internet Explorer. The browser you use inside Helium Scraper is actually Internet Explorer. That's why updating IE might help.
Juan Soldi
The Helium Scraper Team

aigoodaigoo
Posts: 10
Joined: Wed May 18, 2011 6:18 pm

Re: Memory Issue

Post by aigoodaigoo » Thu Jun 23, 2011 10:39 am

Follow-up question regarding memory management and the IE issue. It looks like memory management is worse when I use Navigate URL with nested-loop extraction. I ended up breaking the queries into multiple parts as a workaround for the memory issue. It's not a great solution and somewhat manual, but eventually it gets the job done. More importantly, I have IE8, and I'm very reluctant to upgrade to IE9, but what I just realized, to my surprise, is that the IE built into Helium Scraper is not IE8. I was using Helium Scraper to visit another site, and I got a warning that my browser (inside HS) is outdated! Can you explain how to "upgrade" the HS browser so that it at least recognizes that I have IE8?

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Memory Issue

Post by webmaster » Thu Jun 23, 2011 7:10 pm

Hi,

To change the IE version used by Helium Scraper you will need to add a value to the registry. To do this, follow these steps, but please, be very careful not to modify anything else when doing it:
  1. Click Start -> Run.
  2. Type regedit and press OK.
  3. Navigate to the "HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION" folder from the left panel.
  4. Make sure the "FEATURE_BROWSER_EMULATION" folder is selected, and from the main menu, go to Edit -> New -> DWORD (32-bit) Value.
  5. Rename the newly added value to "Helium Scraper.exe".
  6. Double click Helium Scraper.exe.
  7. Select the Decimal base.
  8. Set the Value data to 8000 and press OK.
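
For anyone who would rather not click through regedit, the steps above can be sketched in Python. This is only an illustration, not something Helium Scraper ships: the function names are made up here, and the key path, value name, and the decimal 8000 are taken from the steps. The first function builds the text of a .reg file that could be saved and imported; the second writes the value directly with the standard-library winreg module, which exists only on Windows.

```python
# Values taken from the manual steps above.
SUBKEY = (r"Software\Microsoft\Internet Explorer\Main"
          r"\FeatureControl\FEATURE_BROWSER_EMULATION")
VALUE_NAME = "Helium Scraper.exe"
IE8_MODE = 8000  # decimal, as entered in step 8


def reg_file_text(value_name=VALUE_NAME, mode=IE8_MODE):
    """Return the text of a .reg file equivalent to steps 3-8."""
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[HKEY_CURRENT_USER\\{SUBKEY}]\n"
        # .reg files store DWORDs as 8 hex digits: 8000 -> 00001f40
        f'"{value_name}"=dword:{mode:08x}\n'
    )


def set_emulation_mode(value_name=VALUE_NAME, mode=IE8_MODE):
    """Write the DWORD under HKEY_CURRENT_USER (Windows only)."""
    import winreg  # standard library, available only on Windows

    # CreateKeyEx opens the key and creates it first if it is missing.
    with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, SUBKEY, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, value_name, 0, winreg.REG_DWORD, mode)


if __name__ == "__main__":
    # Printing is harmless; call set_emulation_mode() explicitly to write.
    print(reg_file_text())
```

Because CreateKeyEx creates a missing key, this sketch also covers the case where the FEATURE_BROWSER_EMULATION folder does not yet exist under HKEY_CURRENT_USER.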
You can see your IE version by going to this page and looking at the number after "MSIE" in the "Browser (User-Agent)" section in your browser details. If you would like to reverse the changes made, just delete the "Helium Scraper.exe" value you just created.
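
As a rough illustration of what "the number after MSIE" means, the snippet below pulls that number out of a User-Agent string. The two sample strings follow the usual Internet Explorer format and are examples only; they were not captured from Helium Scraper, and the function name is made up for this sketch.

```python
import re


def msie_version(user_agent):
    """Return the version that follows 'MSIE' (e.g. '8.0'), or None."""
    match = re.search(r"MSIE (\d+(?:\.\d+)?)", user_agent)
    return match.group(1) if match else None


# Illustrative User-Agent strings in the usual IE format.
ie7_ua = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0)"
ie8_ua = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"

print(msie_version(ie7_ua))  # 7.0
print(msie_version(ie8_ua))  # 8.0
```

If the number reported inside Helium Scraper changes from 7.0 to 8.0 after the registry edit, the emulation setting has taken effect.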

Let me know how this affects your memory problem. Also, is it possible for you to send us your project, or at least the URLs that are giving you problems to run a deeper investigation on this issue?
Juan Soldi
The Helium Scraper Team

aigoodaigoo
Posts: 10
Joined: Wed May 18, 2011 6:18 pm

Re: Memory Issue

Post by aigoodaigoo » Mon Jun 27, 2011 5:35 am

My registry did not have the entry under HKEY_CURRENT_USER, but it did under HKEY_LOCAL_MACHINE, so I assumed your suggestion would work better there. However, after I updated the registry, it does not seem to work. Do I need to restart the machine, or should I create the FEATURE_BROWSER_EMULATION folder under HKEY_CURRENT_USER?

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Memory Issue

Post by webmaster » Mon Jun 27, 2011 5:46 am

Hi,

Try restarting the machine after adding the key and let me know how it goes.

Are you by any chance scraping from Google? If so, is Google Instant turned on? This will cause a memory leak in the Internet Explorer component. Turn it off and your problem should be solved.
Juan Soldi
The Helium Scraper Team

aigoodaigoo
Posts: 10
Joined: Wed May 18, 2011 6:18 pm

Re: Memory Issue

Post by aigoodaigoo » Thu Jun 30, 2011 2:34 am

I've restarted the machine w/ registry mods, and still the same issue: the site I'm trying to access is:

nwf.smxthrive.com

I get an immediate pop-up that says to upgrade my browser to IE8, but I already have IE8. Do I need to set it as the default browser? I'm also wondering whether the registry mods have to be added under HKEY_CURRENT_USER w/ new folders created.

I've also attached the project file w/ the memory issue. I think the problem is that navigating to URLs from tables with nested loops is causing the memory overflow. Essentially, if the list is within 100 entries, it works okay, but a list of 1000 or more seems to trigger the memory exception error. Please take a look and see if there's a way to clean up the cache or avoid the nested loops in the scraping procedure.
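
The manual workaround of splitting a long list into parts of about 100 entries can be sketched as follows. This is a generic illustration only: Helium Scraper is not scripted from Python, the URLs are placeholders, and each resulting part would still have to be loaded into the project by hand.

```python
def batches(items, size=100):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder list standing in for 1000 URLs stored in a table.
urls = [f"http://example.com/item/{n}" for n in range(1000)]
parts = list(batches(urls))

print(len(parts), len(parts[0]))  # 10 100
```

The batch size of 100 comes from the observation above that lists of that length run without the memory error; a different size may suit other sites.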

*Note: Tried adding file but says file size limit is 512KB and second time says 256KB? I tried zipping but can't get it down to that size.

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am
Contact:

Re: Memory Issue

Post by webmaster » Thu Jun 30, 2011 3:39 am

Hi,

I've changed the forum configuration. You should now be able to attach files of up to 2 MB. If this is not enough, I've also sent you a PM with my email address so you can send it there.

Regarding the registry problem, please try adding the required key under HKEY_CURRENT_USER and then perform the changes I described above. I tried changing it under HKEY_LOCAL_MACHINE myself and then restarting, and it didn't work either. But it does work when I change it under HKEY_CURRENT_USER.

Let me know how that goes.
Juan Soldi
The Helium Scraper Team
