Need help

Questions and Answers about programming Helium Scraper.
latifbawany
Posts: 5
Joined: Thu Jul 19, 2012 7:02 am

Need help

Post by latifbawany » Sat Jul 21, 2012 9:13 pm

Hi,

How can I get separate columns for first, middle, and last name, and separate columns for address, city, state, zip/postal code, and country?
I am not good at Java or programming, so I need help.

webmaster
Site Admin
Posts: 491
Joined: Mon Dec 06, 2010 8:39 am

Re: Need help

Post by webmaster » Mon Jul 23, 2012 7:40 pm

Have you tried using text gatherers?
Juan Soldi
The Helium Scraper Team

latifbawany
Posts: 5
Joined: Thu Jul 19, 2012 7:02 am

Re: Need help

Post by latifbawany » Mon Jul 23, 2012 10:10 pm

Thank you for your early reply.

I'll rephrase the question.

I get the full name in one column, but I need it in 3 columns: first, middle, and last.
I get the complete address in one column, but I want the address, city, state, and zip in separate columns.

I tried using delimiters in Excel, but the result was not up to the mark.

I have attached a file as a sample. As I said, I am not a programmer, and Java and all that is French to me (no offense meant).

Please show me the easiest possible solution. The link you gave me is toooooo fast. lol

Regards.

webmaster
Site Admin
Posts: 491
Joined: Mon Dec 06, 2010 8:39 am

Re: Need help

Post by webmaster » Tue Jul 24, 2012 1:12 am

Hi,

I didn't get your attachment. Try using this service. Also, can you provide a sample URL?
Juan Soldi
The Helium Scraper Team

latifbawany
Posts: 5
Joined: Thu Jul 19, 2012 7:02 am

Re: Need help

Post by latifbawany » Tue Jul 24, 2012 3:56 am

Check this jpg out. I am trying to extract data from Yellowpages UK.
Attachments
sampledatalist-350.jpg (22.12 KiB) Viewed 9860 times

webmaster
Site Admin
Posts: 491
Joined: Mon Dec 06, 2010 8:39 am

Re: Need help

Post by webmaster » Thu Jul 26, 2012 8:17 pm

Hi,

There is also an explanation on how to use text gatherers above the video on the post I linked. These text gatherers will do the job and they're really not hard to use at all. They basically work the same as Excel delimiters, except the work is done before the data is extracted.

If you need a specific example, send me a URL and tell me which data you need to extract into separate columns, and I'll give you instructions on how to extract it.
Juan Soldi
The Helium Scraper Team

latifbawany
Posts: 5
Joined: Thu Jul 19, 2012 7:02 am

Re: Need help

Post by latifbawany » Fri Jul 27, 2012 1:30 am

Thank you for your quick response. Attached are the images, with remarks in each. I hope there will be no confusion, as I tried my best to show where the issue is. I have even tried to do the breakup with delimiters, but I have not been successful, even though I have used Excel for quite a while.
Attachments
Untitled2.png (208.5 KiB) Viewed 9840 times
Untitled3.png (207.97 KiB) Viewed 9840 times
Untitled1.png (142.17 KiB) Viewed 9840 times

latifbawany
Posts: 5
Joined: Thu Jul 19, 2012 7:02 am

Re: Need help

Post by latifbawany » Fri Jul 27, 2012 1:32 am

I could not attach more than 3 images in one post, so I am sending you 2 more so you can get a near-complete picture of the issue I am facing.
Attachments
Untitled5.png (94.75 KiB) Viewed 9840 times
Untitled4.png (98.51 KiB) Viewed 9840 times

webmaster
Site Admin
Posts: 491
Joined: Mon Dec 06, 2010 8:39 am

Re: Need help

Post by webmaster » Sun Jul 29, 2012 1:56 pm

Hi,

You need to use some kind of rule for those addresses. The text gatherers will let you break the text into words, but if you do this, each gatherer will extract only one word. If you need to extract a section, try using the attached project. Import it into your current project, then go to Project -> JavaScript Gatherers and open the JS_Section gatherer.

Note the first 3 lines of code. The separator is the character used to break the text into sections, start is the index of the first section to extract, and end is the index where the extraction stops, with the first index being zero. In the examples below, the section at the end index itself is not included. If you use a negative number for start or end, it counts backwards from the last section. Finally, if you want to go from any section through the last section, set the end variable to undefined (the code would look like this: var end = undefined;).

Here is an example. Say you want to extract the last two words from this text: "4900 Bathurst Toronto ON M2R 1X6" (which would give you "M2R 1X6"). To do this, you'd set start to -2 and end to undefined, like this:

Code:

var separator = " ";
var start = -2;
var end = undefined;

If you want to extract the word "ON" from the text above, you could set start to -3 and end to -2. You'd get the same result on this particular text by setting start to 3 and end to 4, but then you'd be counting from the beginning instead of from the end. Street addresses can have different numbers of words, while the postal codes here seem to always have two words, so counting from the end is more reliable. To extract everything before the province (ON), you'd set start to 0 and end to -3.
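
For anyone curious how a section gatherer like this might work, here is a minimal sketch in plain JavaScript. It is not the code from the attached project: extractSection is a hypothetical helper name, and the sketch assumes the behaviour can be reproduced with the standard split, slice, and join methods, which treat negative indexes and an undefined end in the way described above.

```javascript
// Hypothetical helper for illustration only; not the actual JS_Section source.
function extractSection(text, separator, start, end) {
  // Break the text into sections on the separator.
  var parts = text.split(separator);
  // slice() counts negative indexes from the end, stops just before `end`,
  // and runs through the last section when `end` is undefined.
  return parts.slice(start, end).join(separator);
}

var address = "4900 Bathurst Toronto ON M2R 1X6";

var postalCode = extractSection(address, " ", -2, undefined); // "M2R 1X6"
var province = extractSection(address, " ", -3, -2);          // "ON"
var street = extractSection(address, " ", 0, -3);             // "4900 Bathurst Toronto"
```

Because slice stops just before the end index, start = -3 with end = -2 returns only "ON" and leaves the postal code out.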

Remember that you can preview what any gatherer will extract. Click the Choose visible properties button in the selection panel at the bottom, check the gatherer you want to preview under the Selection Preview panel (in this case, JS_Sections), and then, with selection mode on, select the elements you want to preview it for. Also, note that you can copy this whole code and paste it into a new JavaScript gatherer to use more than one section in your project.
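
The same idea could cover the name columns from the original question. The sketch below is only an illustration under some assumptions: name parts are separated by spaces, the first word is the first name, the last word is the last name, and anything in between is treated as the middle name. splitName and extractSection are hypothetical helpers, and the sample names are made up. In Helium Scraper itself, each column would presumably be its own copy of the gatherer with its own start and end values.

```javascript
// Hypothetical helpers for illustration only; not part of Helium Scraper.
function extractSection(text, separator, start, end) {
  return text.split(separator).slice(start, end).join(separator);
}

function splitName(fullName) {
  // Collapse repeated spaces so empty sections don't shift the indexes.
  var name = fullName.trim().replace(/\s+/g, " ");
  return {
    first: extractSection(name, " ", 0, 1),          // first word only
    middle: extractSection(name, " ", 1, -1),        // everything in between
    last: extractSection(name, " ", -1, undefined)   // last word only
  };
}

var threePart = splitName("Mary Ann Smith"); // first "Mary", middle "Ann", last "Smith"
var twoPart = splitName("John Smith");       // middle is an empty string
```

A name with only one word would end up in both the first and last columns, so data like that would need a separate rule.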
Attachments
Sections.hsp
(488.68 KiB) Downloaded 419 times
Juan Soldi
The Helium Scraper Team

YADDY
Posts: 2
Joined: Tue Dec 04, 2012 3:35 pm

Re: Need help

Post by YADDY » Wed Dec 05, 2012 5:06 pm

Does anyone know how to make Helium recognize that 20 is the last page, so that it does not start over at 1?
<< < 1 2 3 4 5 6 7 8 9 10 > >>
<< < 10 11 12 13 14 15 16 17 18 19 20 > >>
