Could someone show me how to extract the URL of the page I'm extracting data from? I believe it may involve Java coding. If so, could someone show me the code I need to use and where to put it?
Thanks
Extracting a Page URL
Re: Extracting a Page URL
Just change the property being extracted to Url as in this picture:
You do need to create a kind that selects some element on the page. Whichever element it selects, the Url will always be the same for every element on a given page. So normally, you would reuse a kind from which you are already extracting the inner text or something else.
If you want to extract only the Url, you should create a kind that selects exactly one element per page. Otherwise, Helium Scraper would extract repeated URLs, since it extracts the Url of each element the kind selects. What I usually do is create a kind that selects the BODY element, because the BODY is always present and appears exactly once per page. I'm attaching a project that contains a kind that always selects the BODY element, no matter which page you are on. You can import it into your current project from File -> Import.
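To illustrate why a kind that matches many elements produces repeated URLs, here is a rough sketch in plain Python. It is not Helium Scraper code and does not use its API. The page URL, the sample HTML, and the names `TagCounter` and `extract_urls` are all made up for the example. It only mimics the idea of "one extracted row per selected element".

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many start tags with a given name appear in a document."""

    def __init__(self, target):
        super().__init__()
        self.target = target.lower()  # HTMLParser reports tag names in lowercase
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.target:
            self.matches += 1


def extract_urls(page_url, html, tag):
    """Return one copy of the page URL per matched element (hypothetical helper)."""
    counter = TagCounter(tag)
    counter.feed(html)
    return [page_url] * counter.matches


# Hypothetical page, used only for illustration.
PAGE_URL = "http://example.com/list"
HTML = "<html><body><p>a</p><p>b</p><p>c</p></body></html>"

rows_p = extract_urls(PAGE_URL, HTML, "p")        # one row per <p> element
rows_body = extract_urls(PAGE_URL, HTML, "body")  # one row per <body> element

print(len(rows_p), len(rows_body))
```

Selecting the P elements yields the same URL once per paragraph, while selecting BODY yields it once, which is the behavior you want when the URL is the only thing being extracted.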
Hope it helped.
- Attachments
- Body.hsp
- (289.46 KiB) Downloaded 687 times
Juan Soldi
The Helium Scraper Team
Re: Extracting a Page URL
Thanks for your quick response.