While Loop Function in recurring actions
Posted: Sat Sep 04, 2021 7:08 pm
Hi.
I'm trying to scrape threads from a target forum. It uses IDs as thread indexes.
Let's say today I successfully scraped from thread 100 down to thread 50 (threads are listed in descending order).
I want to run the Helium job again tomorrow. By then my SQLite db already holds thread IDs, starting from threadId 50.
The forum page shows all threads, and they go from thread 200 down to 0.
I'm trying to write a function with a while loop that does, more or less:
- Navigate the forum and find the most recent thread. Save its ID to a variable.
- Query the local database for the first record of threadId.
- If they are not equal, execute the Scrape global; otherwise return.
Instead of using an if function to do the comparison, I'd like to use a while loop that iterates over each thread ID in the forum. If the ID is different from my first database record, proceed to scrape; otherwise return.
What would be the best way of achieving this?
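To make the loop I have in mind concrete, here is a rough sketch in plain Python rather than in Helium's own actions. The function name `ids_to_scrape` and both parameters are made up for illustration, and it assumes the "first database record" is the newest thread ID already stored:

```python
def ids_to_scrape(forum_ids, first_db_id):
    """Collect forum thread IDs, in listing order, until the stored ID is reached.

    forum_ids:   thread IDs as listed on the forum page (newest first).
    first_db_id: the threadId read from the local database (hypothetical),
                 or None if the database is still empty.
    """
    pending = []
    i = 0
    # Walk the listing; stop at the end or at the first ID already stored.
    while i < len(forum_ids) and forum_ids[i] != first_db_id:
        pending.append(forum_ids[i])
        i += 1
    return pending
```

With this, each ID in the returned list would be handed to the scrape step, and an empty list means there is nothing new.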
Also: Is there a way to avoid duplicates in the database while scraping?
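One idea I have for the duplicates, sketched in plain Python with the standard `sqlite3` module and not with Helium's built-in database handling: make threadId a primary key and insert with `INSERT OR IGNORE`, so a second insert of the same ID is skipped. The table name `threads`, the `title` column, and the helper names are assumptions for illustration only:

```python
import sqlite3

def open_db(path):
    """Open the database and create a (hypothetical) threads table if missing."""
    conn = sqlite3.connect(path)
    # PRIMARY KEY makes threadId unique, so duplicates cannot be stored.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS threads ("
        "threadId INTEGER PRIMARY KEY, title TEXT)"
    )
    return conn

def save_thread(conn, thread_id, title):
    """Insert a thread; an already stored threadId is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO threads (threadId, title) VALUES (?, ?)",
        (thread_id, title),
    )
    conn.commit()
```

Calling `save_thread` twice with the same ID should leave a single row, keeping the values from the first insert.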
Maybe related: Is there a way, from inside the software, to do a sort of scheduled run?
Thanks! Helium is really a wonderful piece of software!