Extract data while using "Run in Browser" mode

Questions & Answers about Helium Scraper 3
durlecs
Posts: 11
Joined: Sat Sep 28, 2019 2:11 am


Post by durlecs » Sat Sep 28, 2019 4:38 pm

I would like to run the scraper in the browser window so that I can solve a Captcha manually. However, when I use "Run in Browser", it does not extract the data. When I run it in "Extract" mode, it does extract the data, but I can't find any way to solve the Captcha. How can I manually solve the Captcha and still extract the data?

Also, how can I pause the scraper, solve the Captcha, and then resume the scraper? Right now I am using Browser.Wait. However, I end up waiting on the timer even after I am done solving the Captcha. In the previous version of Helium, I was able to call Global.Pause() from a JavaScript script, but I don't see that function anymore. How can I pause the scraper and then resume when I am ready?

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Extract data while using "Run in Browser" mode

Post by webmaster » Sun Sep 29, 2019 6:09 pm

If this captcha is at the beginning of the extraction, what you can do is set Project -> Settings -> Use Main Browser to True. This will cause extractions to always run on the on-screen browser and start from whatever page and state the browser is in. If you use parallel extraction, the off-screen browsers will also be used.

If this captcha is in the middle of an extraction, the only option right now is to use a solving service (we have a few premade ones here). In principle, it should also be possible to use JavaScript to solve the captcha manually. The problem with the Pause solution is that it'd only work with the on-screen browser, and using a single browser is typically very slow. Perhaps a better alternative would be to have a solving service for the kind of captcha you're trying to solve. If you send me a URL to the captcha, maybe I can look at whether it can be solved automatically.

That said, if you're going to be using only the on-screen browser anyway, you could pause from JavaScript by running a script with this code:

Code:

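// Set below to the promise's resolve callback; called when the page unloads.
var resolve = null;
window.addEventListener('unload', () => resolve());
// Blocks until dismissed, as a reminder to solve the CAPTCHA by hand.
alert("Need CAPTCHA");
// Returning a promise pauses the extraction until the promise is resolved.
return new Promise(callback => resolve = callback);
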
Just put that in a script at Project Explorer -> Scripting -> Scripts and then use it like this:

Code:

Browser.RunScript
   ·  Script.MyScript
   ·  0
When you return a promise from a script, the extraction will pause until the promise is resolved. In this case, the promise will resolve (and the extraction continue) as soon as you navigate away from the current page.
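
For illustration only, the same promise pattern can be wrapped in a small helper that also gives up after a timeout, so the extraction does not hang forever if you never navigate away. This is a rough sketch, not something built into Helium Scraper: the name waitForEvent, the timeoutMs parameter and the 'event' / 'timeout' result strings are all made up here, and it assumes the script environment supports standard DOM events and setTimeout.

```javascript
// Hypothetical helper (not part of Helium Scraper): returns a promise that
// resolves with 'event' when `target` fires `eventName`, or with 'timeout'
// if that has not happened within `timeoutMs` milliseconds.
function waitForEvent(target, eventName, timeoutMs) {
  return new Promise(resolve => {
    const onEvent = () => {
      clearTimeout(timer);   // the event won, cancel the fallback timer
      resolve('event');
    };
    const timer = setTimeout(() => {
      target.removeEventListener(eventName, onEvent);
      resolve('timeout');
    }, timeoutMs);
    // { once: true } removes the listener automatically after it fires.
    target.addEventListener(eventName, onEvent, { once: true });
  });
}
```

Inside a script, after the alert, you would then end with something like return waitForEvent(window, 'unload', 300000); to wait up to five minutes for the page to be left. Whether the extraction makes any use of the resolved value is not covered above, so treat the result strings as informational only.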

Let me know if you need any more help with this.
Juan Soldi
The Helium Scraper Team
