Avoiding duplicate entries from 'navigate all pages'

tkeimer
Posts: 1
Joined: Mon Jun 24, 2013 4:01 pm


Post by tkeimer » Mon Jun 24, 2013 6:01 pm

On the following page I am trying to scrape all donation amounts/timings that can be found in the 'Funders' tab:

http://www.indiegogo.com/projects/help- ... mile-again

To see all donations, one has to click the 'show more funders' button/link so the page expands downward to show more entries (on the same page). I selected this button as my 'next' kind as described in the tutorial.

My action tree is as follows:

Navigate each 'FundersTab'
    Go through all pages 'Show more funders button'
        Extract dollar amounts/timing to table

The problem is that, when I run this, 'Go through all pages' returns many duplicates: the list merely expands downward each time 'show more funders' is clicked, so every iteration reports the earlier results again in addition to the new ones. Is there any way to fix this? I plan to run this extraction logic daily on about 500 pages over the course of 4 weeks to track donation progress, so it would be very helpful if I could avoid having to clean up duplicates every time :) I apologize if this question has been answered before; I went through the forums and didn't find a solution.
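To make the duplication concrete, here is a minimal sketch in Python of what seems to be happening and one way it could be avoided in principle. It is not Helium Scraper's own mechanism: the function name `new_rows_only` and the sample donation rows are hypothetical, made up only for illustration. It assumes the list only ever grows at the bottom, so rows already extracted can be skipped by position. Skipping by position rather than by value matters because two different donations can have the same amount and timing.

```python
def new_rows_only(snapshots):
    """Yield only the rows added since the previous snapshot of an expanding list."""
    seen_count = 0
    for rows in snapshots:
        # Assumption: the list only grows at the bottom, so everything
        # before seen_count was already extracted on an earlier iteration.
        for row in rows[seen_count:]:
            yield row
        seen_count = max(seen_count, len(rows))


# Hypothetical data: what the page shows after each 'show more funders' click.
first = [("$25", "3 days ago"), ("$10", "3 days ago")]
second = first + [("$25", "3 days ago")]   # a genuinely separate $25 donation
third = second + [("$50", "5 days ago")]
snapshots = [first, second, third]

# Extracting the whole list on every iteration repeats earlier rows.
naive = [row for rows in snapshots for row in rows]

# Skipping the rows already seen keeps each donation exactly once.
deduped = list(new_rows_only(snapshots))

print(len(naive), "rows extracted naively")
print(len(deduped), "rows after skipping already-seen rows")
```

In this sketch the naive extraction collects the first two rows three times each, while the positional version ends up with the same rows as the fully expanded list.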

Unfortunately I have no programming background, so please keep any explanation as simple as possible.

Thank you, your help is very much appreciated!
tkeimer
