Incredible memory leak when not showing pictures

Let us know if anything goes wrong with our baby :)
vodkasushi
Posts: 11
Joined: Wed Apr 18, 2012 2:57 pm

Incredible memory leak when not showing pictures

Post by vodkasushi » Thu May 03, 2012 1:46 pm

Well, this took me by surprise. In an attempt to save bandwidth, I disabled 'Show Pictures' in Internet Explorer (since Helium Scraper uses IE as the web browser). After 20 minutes, Helium came back with a "Low on Resources!" error, which was interesting. Repeated tests with browser emulation set to 7, 8, and 9 all cause RAM usage to climb constantly and consistently. I'm running the Scraper now with images on and it has only reached 120MB, whereas with images off it would have been at 230MB by now.

I don't know which component is at fault (Internet Explorer is very, very bad though, so I like to point fingers at it first), but have you considered a 'Low Bandwidth' usage mode? I'm assuming your program was written in a .NET language (only assuming, from the installer). Would something like HTMLAgilityPack work for running the actual scraping? I don't really need to view the pages once I've completed testing.

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Incredible memory leak when not showing pictures

Post by webmaster » Fri May 04, 2012 3:41 am

Hi,

That's just how the WebBrowser (Internet Explorer, basically) component works. I believe there is another post somewhere where I recommend not using the "don't download pictures" option for this very reason. This is one of the mysteries of the Trident layout engine.

Regarding HTMLAgilityPack, we have considered it, but to start with you wouldn't have JavaScript support, which is needed by many features. You'd then end up having to write a bunch of code, and for that matter you'd be better off coding the whole project in .NET from scratch.
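
To illustrate the JavaScript point, here is a rough sketch of what a parser-only approach sees. It is not HTMLAgilityPack and not Helium Scraper code: it uses Python's standard `html.parser` as a stand-in for any static HTML parser, and the page, the `DivTextCollector` class and the `scrape_divs` helper are all made up for the example. The page fills one of its `div` elements from a script, and since a static parser never runs that script, the element comes back empty.

```python
from html.parser import HTMLParser

# Hypothetical page: the "dynamic" div is only filled in when a browser
# actually executes the script.
PAGE = """
<html><body>
  <div id="static">Price: 10</div>
  <div id="dynamic"></div>
  <script>
    document.getElementById("dynamic").textContent = "Price: 20";
  </script>
</body></html>
"""


class DivTextCollector(HTMLParser):
    """Collects the text found inside each <div>, keyed by the div's id."""

    def __init__(self):
        super().__init__()
        self.current_id = None
        self.text_by_id = {}

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.current_id = dict(attrs).get("id")
            self.text_by_id[self.current_id] = ""

    def handle_endtag(self, tag):
        if tag == "div":
            self.current_id = None

    def handle_data(self, data):
        # Script source also arrives here as plain text, but it sits
        # outside any div, so it is ignored rather than executed.
        if self.current_id is not None:
            self.text_by_id[self.current_id] += data.strip()


def scrape_divs(html):
    """Parse the markup as-is, without running any scripts."""
    parser = DivTextCollector()
    parser.feed(html)
    parser.close()
    return parser.text_by_id


result = scrape_divs(PAGE)
print(result)  # {'static': 'Price: 10', 'dynamic': ''}
```

A browser component would show "Price: 20" in the second element after running the script, which is why anything that depends on script-generated content needs a real browser engine behind it.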
Juan Soldi
The Helium Scraper Team
