Easy deep Yellow Pages extraction

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Post by webmaster » Fri Apr 08, 2011 9:17 pm

This project showcases some of Helium Scraper's new features. It takes a list of search terms and cities, then opens each business link in the results so you can extract whatever you want from the business pages.

All you have to do is fill in the "Input" table with the search terms and cities you want to use, then create an actions tree (a sample one called "Sample Tree" is already included) and add an "Execute Actions Tree" action that executes the "Search in YP" tree. You will be asked to enter the number of pages to go through. Remember that Yellow Pages shows 30 results per page. Then add an "Extract" action as a child node of the "Execute Actions Tree" action, and extract whatever you want.

Remember that the "Extract" action will be executed inside each of the business pages and not on the results pages. This gives you access to more detailed information about each business.
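
As a rough illustration of the traversal described above (this is not Helium Scraper code; the function names and the sample input row are made up for the sketch), the nesting of the actions and the resulting number of extractions can be modeled in Python, assuming every results page is full:

```python
RESULTS_PER_PAGE = 30  # from the post: Yellow Pages shows 30 results per page


def plan_visits(input_rows, pages):
    """Yield (term, city, page, position) for each business page visited.

    Hypothetical model: one search per "Input" row, `pages` results pages
    per search, and one Extract run per business link on each page.
    """
    for term, city in input_rows:
        for page in range(1, pages + 1):
            for position in range(1, RESULTS_PER_PAGE + 1):
                yield term, city, page, position


def expected_extractions(input_rows, pages):
    """Upper bound on Extract runs, assuming every results page is full."""
    return len(input_rows) * pages * RESULTS_PER_PAGE


rows = [("plumbers", "Austin")]  # made-up sample row
visits = list(plan_visits(rows, pages=4))
print(len(visits), expected_extractions(rows, pages=4))
```

With one input row and 4 pages, this comes to 4 x 30 = 120 business pages.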

Give it a try and let me know how it goes.
Attachments
YellowPages.hsp
(550.54 KiB) Downloaded 1189 times
Juan Soldi
The Helium Scraper Team

rmbraaten
Posts: 10
Joined: Wed Jun 01, 2011 12:00 am

Re: Easy deep Yellow Pages extraction

Post by rmbraaten » Wed Jun 01, 2011 12:11 am

Love this custom script. For some reason, though, it stops on the fourth page of listings, so I only get 120 listings (with their accompanying details). The repeat value on the "Search in YP" action is set to 2. I can set it to 3 or 4, but it doesn't make a difference, so I'm not sure how that number affects anything. I don't get an error after the 4th page is loaded and saved, just an "action completed" message (or something to that effect), when I'd actually like it to load a couple more pages. I can't figure out what's causing it to stop. Any suggestions?

Thanks!

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Easy deep Yellow Pages extraction

Post by webmaster » Wed Jun 01, 2011 2:43 am

Hi,

You should enter the number of pages in the "Execute Tree: Search in YP" action by double-clicking it. Do not change it in the "Repeat" action; otherwise you'll just get repeated results.
Juan Soldi
The Helium Scraper Team

rmbraaten
Posts: 10
Joined: Wed Jun 01, 2011 12:00 am

Re: Easy deep Yellow Pages extraction

Post by rmbraaten » Wed Jun 01, 2011 4:49 pm

Yes! That's it. Thanks for a quick response.

rmbraaten
Posts: 10
Joined: Wed Jun 01, 2011 12:00 am

Re: Easy deep Yellow Pages extraction

Post by rmbraaten » Fri Jun 03, 2011 3:04 pm

Would it be possible to add the number of pages as a column in the input table? For example, if my input table includes Schools and Restaurants, the schools may only require 1 page of search results while the restaurants may need 5 or 6. If I could define the pages per input record (i.e., "Input.Business", "Input.City", "Input.Pages"), I could put all my YP search terms in the input table and let it cycle through all of them without returning the "Can't find NextButton" error. Unless there's another, better way to solve this. Thanks.
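
To make the idea concrete, here is a small Python sketch of what per-record page counts would look like. It only models the proposed table: the "Pages" column is the suggestion above, not an existing feature, and the sample rows are invented.

```python
# Hypothetical "Input" table with the proposed per-record "Pages" column.
input_table = [
    {"Business": "Schools", "City": "Springfield", "Pages": 1},
    {"Business": "Restaurants", "City": "Springfield", "Pages": 5},
]


def searches_to_run(table):
    """Return (business, city, page_number) for every results page to load."""
    plan = []
    for row in table:
        for page_number in range(1, row["Pages"] + 1):
            plan.append((row["Business"], row["City"], page_number))
    return plan


plan = searches_to_run(input_table)
print(len(plan))  # 1 + 5 results pages
```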

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Easy deep Yellow Pages extraction

Post by webmaster » Sat Jun 04, 2011 7:47 am

Hi,

I'm attaching a project that is pretty much the same as the original, but instead of entering a number of pages, you enter a maximum number of pages. This way, if fewer pages are available than the given maximum, you won't get an error message, and Helium Scraper will simply extract as many results as are available.

I think this should solve your problem.
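
The difference between a fixed and a maximum number of pages can be sketched in Python as follows. This is only a model of the behaviour described here, not the project's actual actions tree; `pages_loaded` and its parameters are names invented for the example.

```python
def pages_loaded(available_pages, max_pages):
    """Count the results pages loaded when max_pages is only an upper limit.

    Assumes a search always has at least one results page and that
    max_pages is at least 1. The loop stops quietly when there is no
    next page, instead of raising an error.
    """
    loaded = 1  # the first results page is always loaded
    while loaded < max_pages:
        has_next_button = loaded < available_pages
        if not has_next_button:
            break  # fewer pages than the maximum: stop without an error
        loaded += 1
    return loaded


print(pages_loaded(available_pages=1, max_pages=5))  # schools: 1 page
print(pages_loaded(available_pages=6, max_pages=5))  # restaurants: capped at 5
```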
Attachments
YellowPages.hsp
(550.54 KiB) Downloaded 1068 times
Juan Soldi
The Helium Scraper Team

rmbraaten
Posts: 10
Joined: Wed Jun 01, 2011 12:00 am

Re: Easy deep Yellow Pages extraction

Post by rmbraaten » Tue Jun 07, 2011 4:50 pm

Excellent! Thank you. That's a more elegant solution.

liquidcherry
Posts: 1
Joined: Tue Apr 26, 2011 7:35 pm

Re: Easy deep Yellow Pages extraction

Post by liquidcherry » Fri Dec 23, 2011 11:29 pm

Hello Juan,

I played with HS a while ago and was able to extract the info I needed with your original YP file. I just installed HS again and tried your modified file, but it only extracts the name, not the phone number or address. I spent some time trying to find out why, but it's beyond me. (I also tried to adapt things so that I could scrape the vendor listings from getmarried.com, to no avail.) Do you have any idea what I'm doing wrong, or is it because YP changed their site again since the last time I tried?

Any help would be highly appreciated

Frank

Happy Holidays!!

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Easy deep Yellow Pages extraction

Post by webmaster » Mon Dec 26, 2011 7:21 pm

Hi,

Here is an updated version. All I needed to do was select, on a sample page, the items that weren't being extracted and add them to their corresponding kinds (phone and address) with the "Add selection to this kind" button.
Attachments
YellowPages.hsp
(550.54 KiB) Downloaded 1124 times
Juan Soldi
The Helium Scraper Team

Guest

Re: Easy deep Yellow Pages extraction

Post by Guest » Wed Mar 12, 2014 5:30 pm

Hi Juan,
Thanks, will play with it... and sorry for the late response, have been busy with some stuff :)

kudos to helium, it is an awesome product!

cheers

liquid
