extract more photos than i want, and duplicate lines

Bookseller
Posts: 6
Joined: Fri Mar 23, 2012 1:07 pm

extract more photos than i want, and duplicate lines

Post by Bookseller » Mon Apr 09, 2012 12:04 am

Hello,
I'm really amazed by this program; it helps me a lot!

I would like to download not all the photos from a page, but only the one type I registered in Kinds (_F.jpg in my case).
How can I tweak the project to get only one photo per page? I don't want the duplicate lines that the program automatically creates for each photo downloaded.

Any advice, please?

Best regards
Attachments
Win [En fonction].jpg (164.81 KiB)

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: extract more photos than i want, and duplicate lines

Post by webmaster » Mon Apr 09, 2012 2:35 am

Hi,

Well, the kind is definitely selecting all the photos. Does the "_F" photo differ in any particular way from the rest of them? Perhaps you could try going to Project -> Options -> Select Property Gatherers, selecting all under the Kind Defining tab, and then creating your photo kind again.
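
If the kind still ends up matching every photo, one fallback is to filter the extracted picture URLs by filename afterwards, outside Helium Scraper. The snippet below is only a rough sketch of that idea in Python; the function name `keep_matching_photos` and the example URLs are made up for illustration, and only the "_F.jpg" suffix comes from the question.

```python
from urllib.parse import urlsplit


def keep_matching_photos(urls, suffix="_F.jpg"):
    """Return only the URLs whose path ends with `suffix` (case-insensitive).

    Hypothetical helper: it post-processes a list of already extracted
    URLs and is not part of Helium Scraper.
    """
    suffix = suffix.lower()
    # Compare against the URL path only, so a query string such as
    # "?size=large" does not hide the filename ending.
    return [u for u in urls if urlsplit(u).path.lower().endswith(suffix)]


# Made-up example URLs.
photos = [
    "http://example.com/img/1234_F.jpg",
    "http://example.com/img/1234_B.jpg",
    "http://example.com/img/1234_f.jpg?size=large",
]
matching = keep_matching_photos(photos)
```

Here `matching` keeps the first and third URLs and drops the "_B" picture.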

If the only reason you don't want to download the other pictures is to prevent duplicated lines, then you can also extract them to another table and relate the two tables by an ID. Assuming your main table (the one you are not downloading the pictures to) is called "Table1", you'd add a column to your "Pictures" table (the one you'd extract the photos to) that extracts the ID_Table1 property from the BODY kind. Then you'd place the Extract action that extracts to the "Pictures" table right underneath the one that extracts to the "Table1" table. This will keep both tables related and prevent duplicated lines while letting you download more than one picture per page.
Juan Soldi
The Helium Scraper Team
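
To make the two-table idea above more concrete, here is a small sketch using Python's built-in sqlite3 module. It does not claim to reflect how Helium Scraper stores its data; the table names "Table1" and "Pictures" and the ID_Table1 column come from the reply, while the `title` and `picture` columns and all sample values are assumptions for illustration.

```python
import sqlite3

# In-memory database, used only to illustrate the relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID_Table1 INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Pictures (
    ID_Table1 INTEGER REFERENCES Table1(ID_Table1),
    picture   TEXT
);
""")

# One row per scraped page in the main table (hypothetical data).
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "First book"), (2, "Second book")],
)
# Any number of pictures per page, each carrying the page's ID.
conn.executemany(
    "INSERT INTO Pictures VALUES (?, ?)",
    [(1, "1_F.jpg"), (1, "1_B.jpg"), (2, "2_F.jpg")],
)

# The main table keeps one line per page ...
main_rows = conn.execute("SELECT COUNT(*) FROM Table1").fetchone()[0]

# ... and the pictures can still be matched back to their page.
pictures_per_page = conn.execute("""
    SELECT t.title, COUNT(p.picture)
    FROM Table1 AS t
    LEFT JOIN Pictures AS p ON p.ID_Table1 = t.ID_Table1
    GROUP BY t.ID_Table1
    ORDER BY t.ID_Table1
""").fetchall()
```

The point of the design is that the main table never gains extra rows, however many pictures a page has; the duplication only appears if the two tables are joined on purpose.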

Bookseller
Posts: 6
Joined: Fri Mar 23, 2012 1:07 pm

Re: extract more photos than i want, and duplicate lines

Post by Bookseller » Mon Apr 09, 2012 1:04 pm

Great, thank you!

I'll go for double scraping: one pass for text, the other for images.

But another thing bothers me.

When scraping text from the central part of the page, I usually have to select several different parts of it. In the example above I have: "citation", "résumé", "notes" and "bio".
I would like to have the entire text wrapped together in one column instead of 4 different columns. Since I'm scraping thousands of pages, this would be a great help.
Is there a way to do this, please?

Best regards

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: extract more photos than i want, and duplicate lines

Post by webmaster » Wed Apr 11, 2012 1:43 am

Do you need the text to show up in 4 columns or all in one column? If you want one column, try selecting any of these items and clicking the Select Parent button at the bottom until the whole text is selected, and then create your kind.
Juan Soldi
The Helium Scraper Team
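
For data that has already been extracted into four separate columns, an alternative to re-creating the kind is to merge the columns afterwards. The following is a minimal Python sketch of that post-processing step, not a Helium Scraper feature; the function `merge_sections`, the dictionary-per-row layout, and the sample text are assumptions, and only the four field names come from the question ("resume" is written without the accent here).

```python
def merge_sections(row, fields=("citation", "resume", "notes", "bio"), sep="\n\n"):
    """Join the named text fields of one row into a single string.

    Missing or empty fields are skipped so they do not leave stray
    separators in the merged text.
    """
    parts = [(row.get(name) or "").strip() for name in fields]
    return sep.join(p for p in parts if p)


# Hypothetical row as it might look after export.
row = {
    "citation": "A quote.",
    "resume": " Summary text. ",
    "notes": "",
    "bio": "Author bio.",
}
merged = merge_sections(row)
```

With the sample row above, the empty "notes" field is dropped and the other three sections are joined with a blank line between them.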
