The subject is a bit complicated, so I'll try to explain this as best I can.
I have a scrape that is set to navigate to a list of roughly 500 URLs and extract data from each one. What I need is for it to visit only 6 of the URLs, scrape the data from each, then log out, log back in, and continue the scrape. This is to get past a captcha that comes up every 6 searches. I am also able to send the logout and login commands via a URL, in case that helps. Here is an example of the URLs it is searching and scraping:
https://www.website.com/secure/member/r ... F01%2F2013
https://www.website.com/secure/member/r ... F10%2F2013
https://www.website.com/secure/member/r ... F19%2F2013
https://www.website.com/secure/member/r ... F28%2F2013
https://www.website.com/logout_post
https://www.website.com/secure/login_po ... ord=abc123
So I would need it to go through 6 of the URLs provided, one by one, scraping each, then process the next 2 URLs (or click 'logout', then click 'login', then fill in the username and password) WITHOUT scraping, and repeat.
Is this at all possible?
Navigate URL and Extract set on a count
-
- Posts: 38
- Joined: Tue Dec 11, 2012 6:44 pm
Re: Navigate URL and Extract set on a count
I've created an "Execute JS" action based on something I read in another post, but it is only processing the Login link and not the Logout, so I must be doing something wrong :/
Here is what I have:
Code: Select all
var pauseEvery = 6;
var waitSeconds = 5;
if(Tree.UserData == undefined) Tree.UserData = 0;
Tree.UserData++;
if((Tree.UserData % pauseEvery) == 0)
{
    Global.Wait(waitSeconds * 1000);
    window.location.href = "https://www.website.com/logout_post";
    window.location.href = "https://www.website.com/secure/login_post?j_username=johnsmith&j_password=abc123";
}
Re: Navigate URL and Extract set on a count
I have temporarily solved the issue by having the Execute JS process the Logout URL every 6 URLs, and then setting up an If/While tree to run the Login URL if the page contains certain text that is unique to the Logout page.
But if it's possible to just log out and log back in in the Execute JS command, I would prefer that.
Re: Navigate URL and Extract set on a count
Hi,
You were almost there, but you cannot do multiple navigations in a single Execute JavaScript action. Instead of the code you are using, use this:
Code: Select all
// Re-login after every this many URLs
var relogEvery = 6;
if(Node.Counter) return false;
// Initialize the shared counter on the first run, then count this URL
if(Tree.UserData == undefined) Tree.UserData = 0;
Tree.UserData++;
// Returning true tells Helium Scraper to run this action's child nodes
if((Tree.UserData % relogEvery) == 0) return true;
else return false;
Place this code in an Execute JavaScript action as the last child of your Navigate URLs action. Then select this Execute JavaScript action and add two Go To URL actions as its children: one that goes to the logout page and one that goes to the login page.
What's happening here is that the code above returns true if you're at a multiple of 6 (or whatever the value of relogEvery is) and false otherwise. Returning true from an Execute JavaScript action instructs Helium Scraper to run its child nodes.
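If it helps to see the counting in isolation, here is a rough sketch in plain JavaScript that can be run outside Helium Scraper. It only mimics the counter logic above: the Tree object is a plain stand-in for the one Helium Scraper provides, shouldRelog is a made-up helper name, and the Node.Counter check is left out because it depends on the scraper itself.

```javascript
// Stand-in for Helium Scraper's Tree object (assumption: only UserData is modeled).
var Tree = { UserData: undefined };

// Hypothetical helper wrapping the same counter logic as the snippet above.
function shouldRelog(relogEvery) {
    if (Tree.UserData == undefined) Tree.UserData = 0;
    Tree.UserData++;
    return (Tree.UserData % relogEvery) == 0;
}

// Simulate 13 scraped URLs and record after which ones the child nodes would run.
var relogAfter = [];
for (var url = 1; url <= 13; url++) {
    if (shouldRelog(6)) relogAfter.push(url);
}
console.log(relogAfter); // logs [ 6, 12 ]
```

With relogEvery set to 6, the check comes back true only after the 6th and 12th URLs, which is when the logout and login Go To URL children would be run.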
Juan Soldi
The Helium Scraper Team
Re: Navigate URL and Extract set on a count
Worked perfectly! Thank you.