HS skipping/duplicating data

Let us know if anything goes wrong with our baby :)


Post by doug » Tue Sep 06, 2011 4:17 pm

My project scrapes 20 rows from each page in a sequence of pages. When I finish and use one of the kinds as a key in a database, I'm told there are duplicates, even though the data is guaranteed to be unique. Examining the data shows that some extracts were duplicated a page at a time, and that a page of data is missing: HS extracted a page using the data from the previous load. I've tried force-select, but that doesn't help (if the forced kind was already in the old page, it seems HS is happy?). I've also tried waiting up to 9 seconds, but that is tedious and still fails sometimes. What can be used as a trigger to tell that new data is ready to extract?

Site Admin

Re: HS skipping/duplicating data

Post by webmaster » Wed Sep 07, 2011 4:26 am


I'd need to look at your project to see exactly what's causing duplicated content. But I've made a small variation to the Force Select premade that might help with your problem. This one will force-select-new-content. It simply stores the last selected content and keeps trying to select until different content is found or until the timeout is reached.

Note that it stores this data in the Global.UserData variable, so if you use this variable anywhere else, the code would need to be modified a little bit. Let me know if you need help with this. Also, you can only use the Force Select New Content actions tree at one point in your extraction tree. I think this should be good enough for your particular case.

You should place the Force Select New Content right before your Extract action.
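
For readers without the attachment, the idea described above (remember the last selected content, then keep re-selecting until it differs or a timeout is reached) can be sketched generically. This is not Helium Scraper's API or the premade's actual code: `wait_for_new_content`, `read_content`, `timeout` and `poll_interval` are hypothetical names chosen for illustration, and the previous content is passed in as an argument rather than stored in Global.UserData.

```python
import time
from typing import Callable, Optional


def wait_for_new_content(
    read_content: Callable[[], Optional[str]],
    last_content: Optional[str],
    timeout: float = 10.0,
    poll_interval: float = 0.25,
) -> Optional[str]:
    """Poll read_content() until it differs from last_content.

    read_content is a hypothetical callback that returns the currently
    selected content (for example, the text of the first row on the page),
    or None if nothing can be selected yet.
    Returns the new content, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = read_content()
        # Accept only content that exists and differs from the previous page.
        if current is not None and current != last_content:
            return current
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


# Simulated page: the first two reads still show the old rows.
reads = iter(["page 1 rows", "page 1 rows", "page 2 rows"])
result = wait_for_new_content(
    lambda: next(reads), "page 1 rows", timeout=2.0, poll_interval=0.01
)
print(result)  # prints "page 2 rows"
```

Compared with a fixed 9-second wait, this returns as soon as the content changes and only pays the full timeout when nothing changes. One caveat of this approach: if two consecutive pages legitimately show identical content, the check cannot tell them apart and will run until the timeout.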
Juan Soldi
The Helium Scraper Team
