Data from different pages in the same table/row?

Questions and answers about anything related to Helium Scraper
btm
Posts: 2
Joined: Fri Apr 22, 2011 12:59 pm

Post by btm » Fri Apr 22, 2011 4:05 pm

Basically, I want to create a single table, but each row within that table needs to contain information from three different HTML pages. Before I get into too much detail about it, is there a straightforward way to do this with Helium Scraper?

If it helps for finding the right row, there is one unique piece of data (call it 'idNumber') that could be used as a primary key for each row. I can get the relevant idNumber directly from each of the three HTML pages for cross-referencing, although I will have to define three different kinds for it because the number is in a different place in the HTML structure on each of the three pages.

webmaster
Site Admin
Posts: 494
Joined: Mon Dec 06, 2010 8:39 am

Re: Data from different pages in the same table/row?

Post by webmaster » Fri Apr 22, 2011 5:08 pm

Hi,

When using an "Extract" action, each row can only be extracted from a single page at a time. What you can do is create a separate "Extract" action for each page, have each one extract to a different table, and extract the "idNumber" into every table. You can then use a SQL JOIN to join them together into a single table. If the "idNumber" is not a number you would need to use a SQL WHERE, such as "SELECT * FROM [Table1], [Table2] WHERE [Table1].[idNumber] = [Table2].[idNumber]".

If you have never used SQL, I recommend taking a look at this tutorial. It is actually a very straightforward language, as you can probably tell by looking at my example above.
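To make the join idea concrete, here is a minimal sketch that can be run outside Helium Scraper. It uses Python's built-in sqlite3 module as a stand-in database, so the exact SQL syntax that Helium Scraper's own database accepts may differ. The table names (Table1 to Table3), the columns other than idNumber, and the sample rows are all made up for illustration; the three tables stand for the output of three separate "Extract" actions.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in.
# Table names, non-key columns and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table per page type, each carrying the shared idNumber key.
cur.execute("CREATE TABLE Table1 (idNumber TEXT, title TEXT)")
cur.execute("CREATE TABLE Table2 (idNumber TEXT, price TEXT)")
cur.execute("CREATE TABLE Table3 (idNumber TEXT, seller TEXT)")

cur.executemany("INSERT INTO Table1 VALUES (?, ?)",
                [("A1", "Widget"), ("B2", "Gadget")])
cur.executemany("INSERT INTO Table2 VALUES (?, ?)",
                [("B2", "7.50"), ("A1", "3.00")])
cur.executemany("INSERT INTO Table3 VALUES (?, ?)",
                [("A1", "Acme"), ("B2", "Bolt Co"), ("C3", "Orphan")])

# Join the three tables on idNumber so each result row combines
# data that originally came from three different pages.
rows = cur.execute(
    """
    SELECT t1.idNumber, t1.title, t2.price, t3.seller
    FROM Table1 AS t1
    INNER JOIN Table2 AS t2 ON t1.idNumber = t2.idNumber
    INNER JOIN Table3 AS t3 ON t1.idNumber = t3.idNumber
    ORDER BY t1.idNumber
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Note that an inner join only keeps idNumber values present in all three tables, so the "C3" row, which exists only in Table3, does not appear in the result. If some pages might be missing for a given idNumber, a LEFT JOIN starting from the most complete table would keep those rows with empty values instead.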

Another option would be to extract the rows from JavaScript code to a single table, but this might be more complicated.

Let me know if you need further help.
Juan Soldi
The Helium Scraper Team

btm
Posts: 2
Joined: Fri Apr 22, 2011 12:59 pm

Re: Data from different pages in the same table/row?

Post by btm » Fri Apr 22, 2011 6:05 pm

Thanks! The SQL JOIN will work.
