Looping through web tables

massradius
Posts: 20
Joined: Thu May 12, 2011 4:05 pm

Post by massradius » Thu May 12, 2011 4:11 pm

I haven't installed Helium Scraper yet, as I don't have the admin privileges to install the .NET Framework. Can you confirm that Helium Scraper can read data in from a file (CSV or text), use it to conduct searches, then export the results back out?

I was using Automation Anywhere to scrape data from this site:

http://www-947.ibm.com/support/entry/portal/parts

I had the Type (sample type is 2904) and Serial (sample serial is R83D12G) in my file. I would enter them, click submit, loop through the web table, write that data out to a CSV, and add the original serial to each row. I just need to know if this is possible so I can work with IT to get this installed.

thanks

webmaster
Site Admin
Posts: 494
Joined: Mon Dec 06, 2010 8:39 am

Post by webmaster » Thu May 12, 2011 5:38 pm

Yes, Helium Scraper can do that.

The way to "read" data from a CSV or text file is simply to copy and paste your data into Helium Scraper's data table editor and save it. You can also import a whole MDB database.

If you need to fill in the fields automatically, you will need a little JavaScript. I'll be glad to help you out with that if necessary.
Juan Soldi
The Helium Scraper Team

Post by massradius » Thu May 12, 2011 6:41 pm

Excellent. I downloaded the software and identified the 3 Kinds I'm looking for. I've uploaded my test file. At this point it will just extract data when I fill in the Type and Serial manually. Can you help me with the JavaScript to use the data from the source table, grab the results, and append them to the partsexport table?
Attachments
LenovoScraper.zip (30.8 KiB)
Last edited by massradius on Thu May 12, 2011 8:38 pm, edited 1 time in total.

Post by massradius » Thu May 12, 2011 7:25 pm

webmaster wrote:
If you need to fill in the fields automatically, you will need a little JavaScript. I'll be glad to help you out with that if necessary.
Can you recommend a good introductory JavaScript book?

Post by webmaster » Fri May 13, 2011 8:45 am

Hi,

I'm attaching a project that should fit your needs. If you fill in the "SourceTbl" table and press play, it will extract your data to the "PartsExport" table.

I figure the easiest way to explain how the JavaScript code works in this project is to add comments throughout the code (comments are the green lines of text that start with "//"). You can see all the JavaScript code by double-clicking each "Execute JS" action. There is actually not much code, as you can probably tell.

There are plenty of JavaScript tutorials online. The only problem with them is that they usually focus on JavaScript as it relates to HTML, which we don't really need in Helium Scraper. Here is one I found that is very short and doesn't involve HTML other than at the very beginning. It should get you started. As for JavaScript applied to Helium Scraper, you can find everything you need to know in the documentation at Actions -> Actions List -> Execute JavaScript.
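The overall data flow can be sketched in plain JavaScript, outside Helium Scraper. This is only an illustration of the idea, not the code in the attached project: `scrapeParts` is a hypothetical stand-in for the "fill in the form, submit, read the web table" step, and the field names `PartNumber` and `Description` and their values are made up. Only the table names SourceTbl and PartsExport and the sample Type and Serial come from this thread.

```javascript
// Rows as they might be entered in the "SourceTbl" table (sample values from this thread).
const sourceTbl = [
  { Type: "2904", Serial: "R83D12G" },
];

// Hypothetical stand-in for "fill in the form, submit, read the web table".
// It returns made-up rows so that the sketch runs on its own.
function scrapeParts(type, serial) {
  return [
    { PartNumber: "EXAMPLE-001", Description: "Example part A" },
    { PartNumber: "EXAMPLE-002", Description: "Example part B" },
  ];
}

// For every source row, tag each scraped row with the Type and Serial that
// produced it, so a part can always be traced back to its machine.
function buildPartsExport(rows, scrape) {
  const partsExport = [];
  for (const { Type, Serial } of rows) {
    for (const part of scrape(Type, Serial)) {
      partsExport.push({ Type, Serial, ...part });
    }
  }
  return partsExport;
}

const partsExport = buildPartsExport(sourceTbl, scrapeParts);
console.log(partsExport);
```

Carrying the original Serial on every exported row is what lets the results be matched back to the input file afterwards.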
Attachments
LenovoScraper2.hsp
(736.42 KiB)
Juan Soldi
The Helium Scraper Team

Post by massradius » Fri May 13, 2011 1:09 pm

Nearly perfect. On occasion I put in a serial longer than 7 characters, for instance:

Type: 4051
Serial: ABU0196460

The page leaves the data from the last search up and shows "Serial must be 7 characters.". Would I just create a kind for this error? Or can I add this error handling as part of the JavaScript?

Also, if the serial is 7 characters but not in the database, it returns "Error message: MachineTypeSerialNotFound No information was found matching the search criteria. Please try again." If you can help me with the first one, I can probably add this check myself. I'm thinking I could compare the serial I entered with the serial on the page (defined as a kind) and, if they don't match, skip the export.
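Both ideas can be sketched in plain JavaScript. The sketch assumes only what the error message states, that a serial must be exactly 7 characters; the function names are hypothetical and not part of Helium Scraper. The first function checks the length before a search is submitted, and the second compares the serial that was entered with the serial shown on the results page.

```javascript
// The site reports "Serial must be 7 characters.", so check the length up front.
const SERIAL_LENGTH = 7;

// Returns true when the trimmed serial has the length the form accepts.
function isValidSerial(serial) {
  return String(serial).trim().length === SERIAL_LENGTH;
}

// Returns true when the serial displayed on the results page matches the one
// that was searched for (ignoring surrounding spaces and letter case).
function serialMatches(enteredSerial, pageSerial) {
  const normalize = (s) => String(s).trim().toUpperCase();
  return normalize(enteredSerial) === normalize(pageSerial);
}

console.log(isValidSerial("R83D12G"));    // true  (7 characters)
console.log(isValidSerial("ABU0196460")); // false (10 characters)
console.log(serialMatches(" r83d12g ", "R83D12G")); // true
```

If the page still shows the previous search, the comparison fails, which is exactly the case where the export should be skipped.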

Lastly, what will happen if there are no records for a part? The second table will have no data in it. Will the program just export nothing since there are no matching kinds? Sorry for all the questions. Attached is the database with a few parts to illustrate the examples.

Thanks again for the help (I found a trim function).
Attachments
LenovoScraper2.zip
(47.77 KiB)

Post by webmaster » Sat May 14, 2011 5:49 pm

Hi,

The easiest thing to do there is to just let Helium Scraper extract the duplicate data and then filter the duplicates out with SQL. If your table is called, say, MyTable, this query would return unique results:

Code:

SELECT DISTINCT * FROM [MyTable]

And this other query will create a table called NewTable containing only the unique rows taken from the MyTable table:

Code:

SELECT DISTINCT * INTO [NewTable] FROM [MyTable]

If you really don't want to extract duplicate results, you could create a kind that selects the error message, add a "Select Kind" action and, underneath this action, add an "Execute JS" action with this code:

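Code:

// Node.Counter > 0 presumably means this action has already run once: stop here.
if(Node.Counter > 0) return false;
// The error kind matched something on the page, so skip the child actions.
if(Global.Browser.Selection.Count > 0) return false;
// No error was selected: run the child actions.
else return true;

This will execute the child nodes of the "Execute JS" action if and only if no error is found. So you would place your "Extract" action inside this "Execute JS" action.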

Let me know if you need any further help. BTW, I've posted a couple of tutorials to our blog that you might find useful.
Juan Soldi
The Helium Scraper Team

Post by massradius » Sat May 14, 2011 6:47 pm

Thanks, Juan. My big issue isn't the dupes. The problem is that I don't want to associate the wrong parts with the wrong serial (it would pick up the parts from the previous serial, and I would then think those components were there). I just need a way to search for error messages and skip the extract step in those cases.

Post by webmaster » Sat May 14, 2011 7:39 pm

Hi,

In that case, the second solution will do. I've attached a project that implements it. The "test" actions tree will execute its child nodes if no error is found. I've added 3 kinds that the project will try to select. If any of them is found, the child nodes won't be executed. Look at the code in the "Execute JS (Check For Errors)" action. If you would like to check for another error, just create a kind called, say, "SomeErrorKind" and add this code right before the "return true;" line at the bottom:

Code:

Global.Browser.SelectKind("SomeErrorKind");
if(Global.Browser.Selection.Count > 0) return false;

Inside the "Execute JS (No error)" action, there is a commented-out line of code. If you uncomment it (remove the "//" at the beginning), you will get a message box every time no error is found. You can use this for testing purposes.
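The same check can also be written as a loop over a list of kind names. The sketch below uses only the two members that appear in this thread, `Global.Browser.SelectKind` and `Global.Browser.Selection.Count`; everything else is an assumption. In particular, `makeFakeBrowser` is a hypothetical stub that imitates those two members so the snippet can be tried outside Helium Scraper, and the kind names are placeholders.

```javascript
// Hypothetical stub imitating the two Global.Browser members used in this
// thread. `kindsOnPage` maps a kind name to how many elements it would select.
function makeFakeBrowser(kindsOnPage) {
  const browser = {
    Selection: { Count: 0 },
    SelectKind(name) {
      browser.Selection.Count = kindsOnPage[name] || 0;
    },
  };
  return browser;
}

// Returns true when none of the error kinds selects anything on the page,
// i.e. when it is safe to run the "Extract" action.
function noErrorFound(browser, errorKinds) {
  for (const kind of errorKinds) {
    browser.SelectKind(kind);
    if (browser.Selection.Count > 0) return false;
  }
  return true;
}

// Placeholder kind names; use the names of the kinds defined in the project.
const errorKinds = ["SerialLengthError", "NotFoundError", "SomeErrorKind"];

const cleanPage = makeFakeBrowser({});
const errorPage = makeFakeBrowser({ NotFoundError: 1 });

console.log(noErrorFound(cleanPage, errorKinds)); // true
console.log(noErrorFound(errorPage, errorKinds)); // false
```

With a list, checking for one more error only means adding one more kind name instead of repeating the two lines of code.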

Let me know how it goes.
Attachments
LenovoScraper3.hsp
(1.18 MiB)
Juan Soldi
The Helium Scraper Team

Post by massradius » Mon May 16, 2011 1:06 pm

That's working great. Is there a way to check the HTML itself instead of creating a kind when searching for errors? That way I could look for the text "must be" rather than creating two different kinds. This should be my last question. Thanks a lot for the help.
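If the page text is available as a string, a phrase check along these lines might replace the separate kinds. This is a plain JavaScript sketch: how to obtain the page's HTML or text inside an "Execute JS" action is not shown in this thread, so the text is simply passed in as a parameter, and `containsErrorPhrase` is a hypothetical helper name.

```javascript
// Phrases taken from the two error messages quoted earlier in this thread.
const ERROR_PHRASES = ["must be", "No information was found"];

// Returns true when any of the phrases occurs in the text (case-insensitive).
function containsErrorPhrase(pageText, phrases = ERROR_PHRASES) {
  const haystack = String(pageText).toLowerCase();
  return phrases.some((phrase) => haystack.includes(phrase.toLowerCase()));
}

console.log(containsErrorPhrase("Serial must be 7 characters.")); // true
console.log(containsErrorPhrase("Results for serial R83D12G"));   // false
```

A short phrase such as "must be" could also appear in ordinary page text, so matching the full error message is the safer choice.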
