Looping through web tables

Questions and answers about anything related to Helium Scraper
webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Looping through web tables

Post by webmaster » Mon May 16, 2011 5:10 pm

Yes, there are two ways to do that. One is to use a kind together with a JavaScript Gatherer that checks for some custom property you are looking for (I really recommend this post). This way you could create a kind that selects every possible error, as long as you have found a way to identify them (perhaps the red text color?).

Also, you can modify the "Execute JS (Check For Errors)" action to find a particular HTML element. Every "Execute JS" action has full access to the HTML of the page, because its code is injected as a function into your local copy of the page.
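As a rough illustration of that second approach, the sketch below scans the page for bold (STRONG) elements whose text is red, mirroring what the kind further down matches on. It is not the code of the existing "Check For Errors" action: the function name findRedBoldElements is hypothetical, and it assumes you pass in a color getter, which in a real page would typically be window.getComputedStyle(element).color (browsers report that value in the "rgb(r, g, b)" form).

```javascript
// Hypothetical sketch: collects <strong> elements whose text color is pure red.
// getColor(element) must return a CSS color string such as "rgb(255, 0, 0)".
function findRedBoldElements(doc, getColor) {
  var result = [];
  var candidates = doc.getElementsByTagName("strong");
  for (var i = 0; i < candidates.length; i++) {
    // Strip spaces so "rgb(255, 0, 0)" and "rgb(255,0,0)" compare equal.
    var color = getColor(candidates[i]).replace(/\s+/g, "");
    if (color === "rgb(255,0,0)") {
      result.push(candidates[i]);
    }
  }
  return result;
}
```

Inside the page you might call findRedBoldElements(document, function (el) { return window.getComputedStyle(el).color; }) and treat a non-empty result as "an error was found". Passing the getter in keeps the color logic separate from the DOM lookup, so it can be swapped for a different test (another color, or a class name) without touching the loop.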

Actually, here is the XML for a kind that I think is just what you need without any JavaScript. It selects any element in the page if the text is bold and red:

<?xml version="1.0" encoding="utf-16"?>
<EditableKind xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Name>Every Error</Name>
  <Items>
    <Item>
      <Property>TextColor</Property>
      <Value xsi:type="xsd:unsignedInt">16711680</Value>
    </Item>
    <Item>
      <Property>TagName</Property>
      <Value xsi:type="xsd:string">STRONG</Value>
    </Item>
  </Items>
</EditableKind>
Just create any random kind, press the "Edit kind" button, replace the displayed code with this code and press Save at the bottom right. The kind will be renamed "Every Error". The "TextColor" property will most likely appear crossed out, because this property is not active by default. If it does, activate it at Project -> Options -> Select Active Properties.
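The TextColor value 16711680 in the kind above is easier to read in hexadecimal: it is 0xFF0000, which is pure red if the color is packed as 0xRRGGBB. The sketch below assumes that packing (it is consistent with 16711680 being described as red here, but I have not confirmed it for other colors) and shows how you might compute the value for a different shade, for example if the errors use a darker red. Both helper names are hypothetical.

```javascript
// Hypothetical helper: packs red, green and blue components (0-255) into one
// unsigned integer laid out as 0xRRGGBB. The layout is an assumption based on
// 16711680 (0xFF0000) being used for red in the kind above.
function rgbToTextColor(r, g, b) {
  return (((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)) >>> 0;
}

// Reverse direction: splits a packed value back into its components.
function textColorToRgb(value) {
  return { r: (value >>> 16) & 0xFF, g: (value >>> 8) & 0xFF, b: value & 0xFF };
}

console.log(rgbToTextColor(255, 0, 0)); // 16711680
console.log(rgbToTextColor(204, 0, 0)); // 13369344
console.log(textColorToRgb(16711680));  // { r: 255, g: 0, b: 0 }
```

The final `>>> 0` keeps the result unsigned, matching the xsd:unsignedInt type used in the kind's XML.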
Juan Soldi
The Helium Scraper Team

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Looping through web tables

Post by webmaster » Mon May 16, 2011 7:27 pm

Actually, there is an even easier way:

Activate the "TextColor" property from Project -> Options -> Select Active Properties, then create a kind that selects one error and add another error to the same kind. The kind should then select every other error as well.
Juan Soldi
The Helium Scraper Team

massradius
Posts: 20
Joined: Thu May 12, 2011 4:05 pm

Re: Looping through web tables

Post by massradius » Tue May 17, 2011 9:20 pm

Working great.

One other question, and feel free to move this to its own thread. There's a "repeat X times" option in Helium Scraper. This works if you just want to grab X pages, but in a case where the browser tells you how many pages it has brought back, I'd rather use that info.

I'm attaching a screenshot of an internal website I'm trying to grab data from. I have it putting in the PO, hitting enter, and extracting the results. The only issue is that by default only 10 records are displayed (with a max of 50) per page. At the bottom it will say page 1 of X.

I added a new section called "JOSH_TEST" that grabs the number of pages and iterates through based on a counter. I have three concerns/issues:

1) I'm using the same global variables (TreeData) because I wasn't sure what else I could use for variables that are available across jscript sections. I'm still a little foggy on the syntax here. As part of that function, Current and Data are defined. Later we assign Tree.UserData = new TreeData(). Could I have used Tree.UserData2 here?

2) I'm worried it's going to error out when it's only 1-9 pages as I'm taking two characters in the substr.

3) Last issue: in the section you helped me with (Search Each Row), you click the search button and the program waits until the page fully loads. In the bottom section (JOSH_TEST), where I'm hitting the "NextPage" button, the program is jumping ahead before the page has loaded.

Latest code attached
Attachments
Qtrade_MultiPage.zip (112.23 KiB)
FireShot capture #4 - 'QTrade_ Search PO History' - trade_converge_com_Trade_secure_purchasing_pohistory_SearchPoHistory_faces.png (41.83 KiB)

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Looping through web tables

Post by webmaster » Wed May 18, 2011 5:50 pm

Hi,

First of all, there is a project I posted in this post, although I'm not sure it applies to your case. It navigates through the "next" button in a set of result pages until the "next" button is no longer found, so it's basically an easy way to go through every page in a set of result pages. The problem is that your screenshot shows the last page still has a "next" button. What you can do is create a kind that selects the "next" button on the first page and on some other pages, but NOT on the last page, and then check whether the kind selects the "next" button on the last page. If it doesn't, then you can use that project.

Here is some sample code that should help you with the string manipulation you are trying to do. Paste it into an "Execute JavaScript" action so it is syntax-highlighted and you can see how it works.

// Only define trim() if the browser doesn't already provide it
if (!String.prototype.trim) {
	String.prototype.trim = function() {
		return this.replace(/^\s+|\s+$/g, "");
	};
}

// Find text between two strings. This function is case sensitive ("A" is not the same as "a").
// strLeft is searched from the start, strRight from the end; an empty strRight means "up to the end".
String.prototype.textBetween = function(strLeft, strRight)
	{
		var indexOfStrLeft = this.indexOf(strLeft);
		var indexOfStrRight = this.lastIndexOf(strRight);
		if (indexOfStrLeft != -1 && indexOfStrRight != -1)
			return this.substring(indexOfStrLeft + strLeft.length, indexOfStrRight);
		else return null;
	};

// Loaded it with whitespace on the sides just to make sure the trim() function is working OK
var text = "    Page 2 Of 16       ";

text = text.trim();

// Note the space after "Page" and before "Of". This way I get the number with no spaces.
// Also note I'm using " Of" instead of " of". " of" wouldn't work.
var currentPage = text.textBetween("Page ", " Of");
// Same idea here: the space after "Of" keeps it out of the result, so this gives "16"
var lastPage = text.textBetween(" Of ", "");

alert("currentPage: " + currentPage);
alert("lastPage: " + lastPage);

// currentPage is actually bigger than lastPage AS STRINGS (compared character by character, "2" > "1")
alert("Is currentPage less than lastPage? " + (currentPage < lastPage));		// No!!!

// Turn these strings into numbers (always pass radix 10)
currentPage = parseInt(currentPage, 10);
lastPage = parseInt(lastPage, 10);

alert("Is currentPage less than lastPage now? " + (currentPage < lastPage));	// Yes
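If you'd rather not extend String.prototype, a regular expression can pull both numbers out of the pager text in one step, whatever number of digits they have, which also sidesteps taking a fixed two characters with substr. This is a swapped-in technique, not the code above; the function name parsePager is hypothetical, and it assumes the pager text always follows the "Page N Of M" pattern (matched case-insensitively here).

```javascript
// Hypothetical helper: extracts the current and last page numbers from pager
// text such as "Page 2 Of 16". The /i flag makes "Of" and "of" both match.
function parsePager(text) {
  var match = /Page\s+(\d+)\s+Of\s+(\d+)/i.exec(text);
  if (match === null) {
    return null; // pager text missing or not in the expected format
  }
  return {
    currentPage: parseInt(match[1], 10),
    lastPage: parseInt(match[2], 10)
  };
}

// Works the same for one-digit and multi-digit page counts.
var single = parsePager("   Page 1 Of 9   ");
var multi = parsePager("Page 2 Of 16");
console.log(single.currentPage, single.lastPage); // 1 9
console.log(multi.currentPage < multi.lastPage);  // true (numeric comparison)
```

Returning null when the text doesn't match gives the calling action a clear signal to stop instead of looping on a NaN page count.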
Now straight to your questions:

1) No, there is no UserData2. However, the following code (which is somewhere in your project):

function TreeData()
{
	this.Current = 0;
	this.Data = null;
}
defines the object that is assigned to Tree.UserData, so to add extra information you would do this:

function TreeData()
{
	this.Current = 0;
	this.Data = null;
	this.ExtraData = null;
	this.MoreExtraData = null;
}
Then, after assigning a new TreeData to Tree.UserData, you can access these properties like this: Tree.UserData.ExtraData = "whatever";

2) Not if you use the code above.

3) That should be fixed if you add a "Wait" action right after whatever action navigates (or INSIDE it, as its first child node, if it is a "Navigate each" action). Start with a 2-second wait just to make sure this fixes it; then you can lower it.
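To make answer 1 concrete, here is a minimal sketch of the extended TreeData being assigned and then used from a later action. Inside Helium Scraper the Tree object already exists; the plain Tree object declared below is only a stand-in so the snippet runs on its own, and the PageCount and NextCounter fields are hypothetical examples of the "extra data" idea.

```javascript
// Stand-in for the Tree object that Helium Scraper provides to
// "Execute JavaScript" actions. Don't declare this in the real project.
var Tree = { UserData: null };

// Same constructor as above, with hypothetical extra fields.
function TreeData()
{
	this.Current = 0;
	this.Data = null;
	this.PageCount = 0;   // e.g. total pages read from "Page N Of M"
	this.NextCounter = 1; // e.g. which results page is currently shown
}

// Assign once, for example in the first "Execute JavaScript" of the tree.
Tree.UserData = new TreeData();

// A later "Execute JavaScript" in the same tree can read or update the fields.
Tree.UserData.PageCount = 16;
Tree.UserData.NextCounter++;
console.log(Tree.UserData.NextCounter + " of " + Tree.UserData.PageCount); // 2 of 16
```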

Let me know how it goes.
Juan Soldi
The Helium Scraper Team

massradius
Posts: 20
Joined: Thu May 12, 2011 4:05 pm

Re: Looping through web tables

Post by massradius » Wed May 18, 2011 6:01 pm

Works great. Thanks for the explanation. I'm using a "next_counter" and comparing it to the total page count. Once my "next_counter" exceeds the total page count, the loop returns false and I move on to the next PO in the sourceTbl list.
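The stopping rule described here can be sketched as a small loop condition. This is an illustration only: shouldContinue and countProcessedPages are hypothetical names, and the sketch assumes the counter starts at 1 for the first results page already on screen, so the loop stops as soon as the counter passes the total page count.

```javascript
// Hypothetical loop condition: keep going while nextCounter has not
// exceeded the total number of result pages.
function shouldContinue(nextCounter, totalPages) {
  return nextCounter <= totalPages;
}

// Simulation only: counts how many result pages the loop would process.
function countProcessedPages(totalPages) {
  var processed = 0;
  for (var nextCounter = 1; shouldContinue(nextCounter, totalPages); nextCounter++) {
    // In the real project: extract this page and, unless it is the last
    // page, click "Next" before the counter is incremented.
    processed++;
  }
  return processed;
}

console.log(countProcessedPages(1));  // 1
console.log(countProcessedPages(16)); // 16
```

Because the loop is driven by the counter rather than by the presence of a "next" button, it still stops on a last page that shows a "next" button, as in the screenshot.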

So the TreeData is basically a way to store global variables (or project-level ones at least). I thought you could only store a count and then data, but it looks like you can name a bunch of variables and then assign whatever you want to them (a single data point or a whole table's worth).

Thanks again. I will be buying this soon; I'm just waiting for a PO from my IT dept.

massradius
Posts: 20
Joined: Thu May 12, 2011 4:05 pm

Re: Looping through web tables

Post by massradius » Wed May 18, 2011 6:46 pm

OK. I just realized this page only loads the first 200 records (I noticed this after a lot of my POs stopped at 200). I tried to create a kind on the "More" button. The HTML looks like this when it's disabled:

</td><td><input id="pohistFrm:searchMoreKits" name="pohistFrm:searchMoreKits" type="submit" value="More" onclick="if(typeof window.getScrolling!='undefined'){oamSetHiddenInput('pohistFrm','autoScroll',getScrolling());}" class="button" disabled="disabled" /></td><td><input id="pohistFrm:clear" name="pohistFrm:clear" type="submit" value="Clear" onclick="if(typeof window.getScrolling!='undefined'){oamSetHiddenInput('pohistFrm','autoScroll',getScrolling());}" class="button" /></td><td></td></tr>

When it's enabled (i.e. there are more than 200 records, and to see the next 200 you click "More"):

</td><td><input id="pohistFrm:searchMoreKits" name="pohistFrm:searchMoreKits" type="submit" value="More" onclick="if(typeof window.getScrolling!='undefined'){oamSetHiddenInput('pohistFrm','autoScroll',getScrolling());}" class="button" /></td><td><input id="pohistFrm:clear" name="pohistFrm:clear" type="submit" value="Clear" onclick="if(typeof window.getScrolling!='undefined'){oamSetHiddenInput('pohistFrm','autoScroll',getScrolling());}" class="button" /></td><td></td></tr>
</tbody></table>


It looks like the disabled="disabled" attribute is missing when the button is enabled (class="button" is there in both cases). I tried to create a kind, but it's not getting to this level of detail. I manually went in and added:

<Item>
<Property>DisabledAttribute</Property>
<Value xsi:type="xsd:string">disabled</Value>
</Item>

but I'm guessing the syntax is wrong as Helium Scraper shows it with a line through it. Do I need to precede it with a parent index? How do I go about counting or figuring out what it is?

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Looping through web tables

Post by webmaster » Wed May 18, 2011 7:27 pm

Hi,

First thing: the Tree.UserData property can be used to store data that is accessible only from the current actions tree. To store data that is accessible from the whole project, you would use Global.UserData. There is more info about this in the documentation at Actions -> Actions List -> Execute JavaScript, especially under the Class List section.

To do what you are trying to do, you need to create a JavaScript gatherer that gathers the "disabled" attribute. This would be the code:

return element.getAttribute("disabled");
Then any kinds you create will take into consideration whether the button is disabled or not. So if you create a kind with buttons that are all disabled, this kind will only select disabled buttons. Your item appears crossed out because Helium Scraper cannot find a gatherer (either a built-in gatherer or a custom JavaScript gatherer) named "DisabledAttribute". Remember that when you create a JavaScript gatherer, Helium Scraper adds the "JS_" prefix to its name.
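Besides the gatherer, the same attribute can be checked directly from an "Execute JavaScript" action, for example to decide whether "More" needs to be clicked again. A minimal sketch, assuming the button keeps the id pohistFrm:searchMoreKits shown in the HTML above and that the action can reach the page's document; the function name isMoreButtonEnabled and the fake document used to try it out are hypothetical. Since getElementById takes the id as a plain string, the colon in the id needs no escaping (it would in a CSS selector).

```javascript
// Hypothetical helper: true when the "More" button exists and has no
// disabled attribute. getAttribute returns null when the attribute is absent
// and its value ("disabled" in the HTML above) when it is present.
function isMoreButtonEnabled(doc) {
  var button = doc.getElementById("pohistFrm:searchMoreKits");
  if (button === null) {
    return false; // no "More" button on this page
  }
  return button.getAttribute("disabled") === null;
}

// Stand-in document so the sketch can be tried outside a browser;
// in the page itself you would call isMoreButtonEnabled(document).
function fakeDocument(disabledValue) {
  return {
    getElementById: function (id) {
      if (id !== "pohistFrm:searchMoreKits") return null;
      return {
        getAttribute: function (name) {
          return name === "disabled" ? disabledValue : null;
        }
      };
    }
  };
}

console.log(isMoreButtonEnabled(fakeDocument("disabled"))); // false
console.log(isMoreButtonEnabled(fakeDocument(null)));       // true
```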
Juan Soldi
The Helium Scraper Team

massradius
Posts: 20
Joined: Thu May 12, 2011 4:05 pm

Re: Looping through web tables

Post by massradius » Wed May 18, 2011 7:33 pm

webmaster wrote: Remember that when you create a JavaScript gatherer, Helium Scraper adds the "JS_" prefix to its name.
For Juan Soldi :)

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Looping through web tables

Post by webmaster » Thu May 19, 2011 3:21 am

Hey, good catch! It's actually for JavaScript, but perhaps there is some kind of unconscious wish involved :roll:
Juan Soldi
The Helium Scraper Team
