
Frustrated User

Posted: Thu Dec 08, 2011 7:48 pm
by tipud
Hi,
I am an occasional user of helium script and I am still learning how to use it.

I know HS can do what I want, but I am still trying to understand the basics.

Please advise on the following:
1. If I want to scrape a site where the link code is, for example, <a href="http://www.wholesalers4u.co.uk">Wholesale Suppliers</a>, I want to extract only the URL (www.wholesalers4u.co.uk). Sometimes the link sits in the middle of a paragraph of text that I do not want.

2. If the address is spread over a few lines (1 title, 2 street, 3 city, 4 state, 5 zip code), I want to extract each part separately. When I select it in HS, it only selects the whole section.

3. Many of the sites I want to scrape are structured as category => list => detail. How do I do this? I can get HS to go to the list, then the detail page, and scrape it, but not when there is another layer.

Any help would be greatly appreciated.

Re: Frustrated User

Posted: Fri Dec 09, 2011 6:58 pm
by webmaster
Hi,
tipud wrote: 1. If I want to scrape a site where the link code is, for example, <a href="http://www.wholesalers4u.co.uk">Wholesale Suppliers</a>, I want to extract only the URL (http://www.wholesalers4u.co.uk). Sometimes the link sits in the middle of a paragraph of text that I do not want.
You should still be able to create a kind that selects only the link and then extract its Link property, which contains the target URL.
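For anyone who wants to see the same idea outside HS, here is a minimal Python sketch that pulls only the link target out of a paragraph and ignores the surrounding words. It uses the standard library's html.parser module; the class name LinkCollector, the function extract_links, and the sample sentence are made up for illustration and have nothing to do with how HS works internally.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag and ignores all other text."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return the target URLs of all links found in an HTML fragment."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical paragraph with the link buried in unwanted text
sample = (
    'Some words I do not want '
    '<a href="http://www.wholesalers4u.co.uk">Wholesale Suppliers</a> '
    'and some more words.'
)
print(extract_links(sample))  # ['http://www.wholesalers4u.co.uk']
```

The point is the same as selecting the link and reading its Link property: the target URL is taken from the link element itself, so the text around it does not matter.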
tipud wrote: 2. If the address is spread over a few lines (1 title, 2 street, 3 city, 4 state, 5 zip code), I want to extract each part separately. When I select it in HS, it only selects the whole section.
This can be done with Text gatherers at Project -> Text Gatherers. Here is some more info on how to use them.
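As a rough illustration of what splitting such a block involves, here is a small Python sketch that breaks a five-line address into named fields. It is not how text gatherers are configured in HS; the field names, the function split_address, and the sample address are all assumptions made up for the example, and it assumes each part sits on its own line in the order given in the question.

```python
# Field order assumed from the question: title, street, city, state, zip code
FIELDS = ("title", "street", "city", "state", "zip_code")


def split_address(block):
    """Split a multi-line address block into one named field per line."""
    # Drop blank lines and surrounding whitespace
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    if len(lines) != len(FIELDS):
        raise ValueError(
            "expected %d lines, got %d" % (len(FIELDS), len(lines))
        )
    return dict(zip(FIELDS, lines))


# Hypothetical address, selected as one block of text
sample = """Acme Wholesale
12 Market Street
Springfield
IL
62701"""

address = split_address(sample)
print(address["city"])      # Springfield
print(address["zip_code"])  # 62701
```

Real pages are rarely this regular, so a block with a missing or extra line raises an error here instead of silently putting values in the wrong field.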
tipud wrote: 3. Many of the sites I want to scrape are structured as category => list => detail. How do I do this? I can get HS to go to the list, then the detail page, and scrape it, but not when there is another layer.
You can go as many levels deep as you need by putting one Navigate Each action as a child of another. I'm not sure what you mean by having another layer. Perhaps a URL will help me figure it out.
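The nesting idea can be sketched in plain Python as a loop inside a loop, written here as a recursive function so the number of levels is just a parameter. This is only an analogy for putting one Navigate Each inside another, not HS code; the PAGES dictionary stands in for a real site, and the page names and the functions child_links and scrape are hypothetical.

```python
# Stand-in for a real site: each page maps to the pages it links to
PAGES = {
    "category": ["list-1", "list-2"],
    "list-1": ["detail-1a", "detail-1b"],
    "list-2": ["detail-2a"],
}


def child_links(page):
    """Return the links found on a page (empty for detail pages)."""
    return PAGES.get(page, [])


def scrape(page, depth):
    """Follow links `depth` levels down and return the pages reached."""
    if depth == 0:
        # Deepest level reached: this is where the data would be extracted
        return [page]
    results = []
    for child in child_links(page):
        # Each level of recursion plays the role of one nested navigation step
        results.extend(scrape(child, depth - 1))
    return results


# category => list => detail is two levels of navigation
print(scrape("category", 2))  # ['detail-1a', 'detail-1b', 'detail-2a']
```

Adding another layer on top only means starting one page higher and raising depth by one; the inner steps stay the same.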