Navigate URL and Extract set on a count

crookedleaf
Posts: 38
Joined: Tue Dec 11, 2012 6:44 pm

Post by crookedleaf » Wed Dec 12, 2012 10:11 pm

The subject is a bit complicated, so I'll try to explain this as best I can.

I have a scrape that is set to navigate to a list of roughly 500 URLs and extract data from each one. What I need is for it to visit only 6 of the URLs, scrape the data from each, then log out, log back in, and continue the scrape. This is to get past a captcha that comes up every 6 searches. I am also able to send the logout and login commands via a URL, in case that helps. Here is an example of the URLs it is searching and scraping:

https://www.website.com/secure/member/r ... F01%2F2013
https://www.website.com/secure/member/r ... F10%2F2013
https://www.website.com/secure/member/r ... F19%2F2013
https://www.website.com/secure/member/r ... F28%2F2013


https://www.website.com/logout_post
https://www.website.com/secure/login_po ... ord=abc123

So I would need it to go through 6 of the provided URLs one by one, scraping each, then process the next 2 URLs (or click 'logout', then click 'login', then fill in the username and password) WITHOUT scraping, and repeat.

Is this at all possible?
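
The order of visits described above can be pictured with a minimal sketch in plain JavaScript. It does not use any Helium Scraper API; `buildSchedule` is a hypothetical helper, and the page and login URLs are placeholders rather than the real ones from this post.

```javascript
// Hypothetical helper: after every `groupSize` scraped URLs, insert the
// re-login URLs as visits that should NOT be scraped.
function buildSchedule(urls, groupSize, reloginUrls) {
  const schedule = [];
  urls.forEach((url, i) => {
    schedule.push({ url, scrape: true });
    const endOfGroup = (i + 1) % groupSize === 0;
    const moreToCome = i + 1 < urls.length;
    if (endOfGroup && moreToCome) {
      for (const relogin of reloginUrls) {
        schedule.push({ url: relogin, scrape: false });
      }
    }
  });
  return schedule;
}

// Placeholder URLs for illustration only.
const urls = Array.from(
  { length: 8 },
  (_, i) => `https://www.website.com/page/${i + 1}`
);
const schedule = buildSchedule(urls, 6, [
  "https://www.website.com/logout_post",
  "https://www.website.com/login-placeholder",
]);

for (const step of schedule) {
  console.log(step.scrape ? "scrape" : "visit ", step.url);
}
```

With 8 page URLs and a group size of 6, the logout and login visits land between the sixth and seventh pages.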

crookedleaf
Posts: 38
Joined: Tue Dec 11, 2012 6:44 pm

Re: Navigate URL and Extract set on a count

Post by crookedleaf » Wed Dec 12, 2012 11:45 pm

I've created an "Execute JS" action based on something I read in another post, but it is only processing the login link and not the logout one, so I must be doing something wrong :/

Here is what I have:

var pauseEvery = 6;
var waitSeconds = 5;

if(Tree.UserData == undefined) Tree.UserData = 0;

Tree.UserData++;

if((Tree.UserData % pauseEvery) == 0) 
{
	Global.Wait(waitSeconds * 1000);
	window.location.href = "https://www.website.com/logout_post";
	window.location.href = "https://www.website.com/secure/login_post?j_username=johnsmith&j_password=abc123";
}

crookedleaf
Posts: 38
Joined: Tue Dec 11, 2012 6:44 pm

Re: Navigate URL and Extract set on a count

Post by crookedleaf » Thu Dec 13, 2012 1:30 am

I have temporarily solved the issue by having the Execute JS action load the logout URL every 6 URLs, and then set up an If/While tree to run the login URL if the page contains certain text that is unique to the logout page.

But if it's possible to log out and log back in within the Execute JS command itself, I would prefer that.

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Navigate URL and Extract set on a count

Post by webmaster » Mon Dec 17, 2012 3:15 am

Hi,

You were almost there, but you cannot do multiple navigations in a single Execute JavaScript action. Instead of the code you are using, use this:

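// Log out and back in once every this many URLs.
var relogEvery = 6;

if(Node.Counter) return false;

// Tree.UserData is used here as a running count of processed URLs.
if(Tree.UserData == undefined) Tree.UserData = 0;
Tree.UserData++;

// Returning true runs this action's child nodes; returning false skips them.
if((Tree.UserData % relogEvery) == 0) return true;
else return false;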

Place this code in an Execute JavaScript action as the last child of your Navigate URLs action. Then select this Execute JavaScript action and add two Go To URL actions as its children: one that goes to the logout page and one that goes to the login page. It should look like this:
[Attachment: Untitled.png, screenshot of the action tree]
What's happening here is that the code above returns true if you're at a multiple of 6 (or whatever the value of relogEvery is) and false otherwise. Returning true from an Execute JavaScript action instructs Helium Scraper to run its child nodes.
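
To see when the check fires, here is a minimal standalone simulation of the counter logic. It runs in plain JavaScript outside Helium Scraper: `makeRelogCheck` is a hypothetical name, and the local `state.count` only stands in for `Tree.UserData` as an assumption about how the counter persists between pages.

```javascript
// Standalone simulation; no Helium Scraper objects are used.
function makeRelogCheck(relogEvery) {
  const state = { count: undefined }; // stands in for Tree.UserData
  return function shouldRelog() {
    if (state.count === undefined) state.count = 0;
    state.count++;
    // true only when the count is a multiple of relogEvery
    return state.count % relogEvery === 0;
  };
}

const shouldRelog = makeRelogCheck(6);
const results = [];
for (let page = 1; page <= 13; page++) {
  results.push(shouldRelog());
}

// Collect the 1-based page numbers on which the check returned true.
const firedOn = results
  .map((fired, i) => (fired ? i + 1 : null))
  .filter((page) => page !== null);

console.log(firedOn); // [ 6, 12 ]
```

Over 13 simulated pages the check returns true on pages 6 and 12, which is where the logout and login child actions would run.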
Juan Soldi
The Helium Scraper Team

crookedleaf
Posts: 38
Joined: Tue Dec 11, 2012 6:44 pm

Re: Navigate URL and Extract set on a count

Post by crookedleaf » Mon Dec 17, 2012 10:25 pm

Worked perfectly! Thank you.
