Frame Reset Issues?

Here you can ask anything related to JavaScript as applied to Helium Scraper.
aigoodaigoo
Posts: 10
Joined: Wed May 18, 2011 6:18 pm

Post by aigoodaigoo » Thu May 03, 2012 9:29 pm

Please see attached file for screenshot. I would love to give you access to the problem site, but it's sign-on protected.

I'm trying to scrape a third-party tool's web-based CRM interface. The tool pulls in the results of a Twitter search (e.g. hashtags), and I want to scrape the resulting screen of tweets and user info. The display shows only 30 records per page, with a "Next" navigation button to move to the next page. Clicking a user handle opens a new window with detailed user info (Following, Followers, etc.). I have two kinds: #1 for the handle results, and #2 for the additional user info, which I reach by using "Navigate Each" over the 30 handles from #1. Both work great on the first page. However, when I use "Navigate: Next Page" to go to the next page and re-scrape #1 and #2, for some strange reason the page resets to the first page right after grabbing the first instance of #2.

I am guessing there is some JavaScript running in the background that resets to the first search page. Is there a workaround so that "Navigate Each" behaves more like simulated clicks? I have already selected "Simulated Clicks" under Navigate Each, but something else may be causing the reset. Thank you for your help.

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Frame Reset Issues?

Post by webmaster » Fri May 04, 2012 3:48 am

I didn't get the attachment.

Anyway, try one of these: (1) extract all the links (using the Link property of the link) instead of using a Navigate Each, and then use a Navigate URLs action; or (2) navigate through all the pages and extract the URL of each page (using the URL property of any kind that selects one element on each page, such as the default BODY kind), and then use a Navigate Each inside a Navigate URLs action that navigates through each of the pages' URLs. Before using option 2, make sure each page has a different URL, and that pasting that URL into your browser takes you back to the page you copied it from.

I'd try option 1 first, since once you have the URL of every link, all you need is a Navigate URLs action.
Juan Soldi
The Helium Scraper Team

aigoodaigoo
Posts: 10
Joined: Wed May 18, 2011 6:18 pm

Re: Frame Reset Issues?

Post by aigoodaigoo » Fri May 04, 2012 11:03 am

It looks like I can't attach any files (.doc/.docx, .pdf). What file types can I attach? Anyway, the "handle" kind does not contain a URL; it works more like a submit driven by a jQuery attribute. Even if I could somehow figure out the Link property, following it would mean moving out of the CRM interface, which I can't do...

Also, I tried approach #2, but the URL of the next page is the same as the original one. The "Find" (search) functionality seems to be embedded either in an iframe(?) or in some other code behind the HTML that I cannot see.

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Frame Reset Issues?

Post by webmaster » Sat May 05, 2012 4:35 am

Hi,

Can you send the code for one or two of the links and for the Next button? Just select the link, click View code in the selection panel, and post that code here.
Juan Soldi
The Helium Scraper Team

aigoodaigoo
Posts: 10
Joined: Wed May 18, 2011 6:18 pm

Re: Frame Reset Issues?

Post by aigoodaigoo » Sun May 06, 2012 10:59 pm

I selected the "handle" link and selected view code:

<STRONG style="CURSOR: pointer" class="twitter username" jQuery1336344828802="69">Profitina</STRONG>

The "Next" button has the following view code:

<A class=next_page href="/incoming_messages?messagable_ids%5B%5D=2955&message_type=keyword&page=2" rel=next jQuery1336345055396="302">Next »</A>

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Frame Reset Issues?

Post by webmaster » Tue May 08, 2012 3:38 am

Hi,

Seems to me like you can save the URL of each page, since the URL contains the page number. Try copying the second (not the first) page's URL into Notepad, replacing "page=2" with "page=3", and pasting that URL into your browser to see if it takes you to page 3. If so, you could use the URL Variations premade to generate URLs for every page you need, and then use a Navigate URLs action to navigate through all of them without having to go back to the original page in order to reach the next one.
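
To illustrate the "page=2" to "page=3" substitution in plain JavaScript, here is a minimal sketch. It is not the URL Variations premade itself and does not use any Helium Scraper API; `buildPageUrls` is a hypothetical helper, the page range is an assumption, and the relative URL is the one from the "Next" link posted earlier in the thread.

```javascript
// Sketch only: buildPageUrls is a hypothetical helper, not a Helium Scraper function.
// It rewrites the number after "page=" and leaves every other parameter untouched.
function buildPageUrls(templateUrl, firstPage, lastPage) {
  const urls = [];
  for (let page = firstPage; page <= lastPage; page++) {
    // A replacer function avoids any ambiguity between "$1" and the digits that follow it.
    urls.push(templateUrl.replace(/([?&]page=)\d+/, (match, prefix) => prefix + page));
  }
  return urls;
}

// The href of the "Next" link as posted in the thread (relative to the CRM site).
const nextHref = "/incoming_messages?messagable_ids%5B%5D=2955&message_type=keyword&page=2";

// Assumed range for illustration: pages 2 through 4.
const pageUrls = buildPageUrls(nextHref, 2, 4);
console.log(pageUrls.join("\n"));
```

Whether the site actually serves page 3 for such a URL still has to be checked by hand in the browser, as described above.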

I still don't know if your "handle" link can be used this way, because the code you sent is not the link itself but probably a child of it. Try selecting one of these links and clicking the Select parent button until the Outer HTML (in the selection panel) starts with "<A...".
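
The "Select parent until you reach an <A>" step can be pictured as a walk up the element's ancestors. The sketch below is only an illustration in plain JavaScript: `findEnclosingLink` is a hypothetical helper, and plain objects stand in for DOM nodes so the snippet runs outside a browser. If the handle has no enclosing link at all, which is possible when the click is handled by script, the walk ends with `null`.

```javascript
// Hypothetical helper: climbs parentNode references until it reaches an <A> element.
// Returns the link node, or null if no ancestor is a link.
function findEnclosingLink(element) {
  let node = element;
  while (node && String(node.tagName).toUpperCase() !== "A") {
    node = node.parentNode;
  }
  return node || null;
}

// Stand-ins for DOM nodes (assumed structure, for illustration only).
const link = { tagName: "A", parentNode: null };
const handle = { tagName: "STRONG", parentNode: link };
const orphan = { tagName: "STRONG", parentNode: { tagName: "DIV", parentNode: null } };

console.log(findEnclosingLink(handle) === link); // true
console.log(findEnclosingLink(orphan));          // null
```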
Juan Soldi
The Helium Scraper Team
