Clear database

Questions & Answers about Helium Scraper 3
jonpaulin
Posts: 4
Joined: Tue Oct 23, 2018 7:44 pm

Clear database

Post by jonpaulin » Wed Oct 24, 2018 3:46 pm

I don't understand how to clear the database at the beginning of the project.

webmaster
Site Admin
Posts: 495
Joined: Mon Dec 06, 2010 8:39 am

Re: Clear database

Post by webmaster » Wed Oct 31, 2018 9:05 pm

To clear the whole database, right-click the Database icon and select Clear Database. To clear only one table set, right-click that table set inside the database and select Clear All Data.
Juan Soldi
The Helium Scraper Team

jonpaulin
Posts: 4
Joined: Tue Oct 23, 2018 7:44 pm

Re: Clear database

Post by jonpaulin » Tue Jan 15, 2019 8:17 pm

Sorry for the slow response.

What I meant is: how do I clear the table from within the Globals at the beginning of my project, so that it always starts with a clean table?

webmaster
Site Admin
Posts: 495
Joined: Mon Dec 06, 2010 8:39 am

Re: Clear database

Post by webmaster » Wed Jan 16, 2019 5:06 am

You can use Action.ClearData to do this. Note, however, that actions cannot be mixed with sequences (the steps you use to extract data and interact with the browser) in the same global, so this needs to go into a separate global. Suppose your global is called Main, which would also be the name of the output table(s). In that case, create another global with the code below and run it instead of Main:

Code:

Action.ClearData
   ·  "Main"
Action.Extract
   ·  Main
   ·  "Main"
This global will clear the Main table and then extract the Main global to the Main table.
Juan Soldi
The Helium Scraper Team

ABkeeper
Posts: 3
Joined: Fri Jan 04, 2019 1:57 pm

Re: Clear database

Post by ABkeeper » Sat Jan 19, 2019 4:18 pm

The "Clear database" command does not reset the ID counter, so IDs become inconsistent across the databases ... This is very inconvenient.

webmaster
Site Admin
Posts: 495
Joined: Mon Dec 06, 2010 8:39 am

Re: Clear database

Post by webmaster » Mon Jan 21, 2019 5:44 pm

I wouldn't recommend using the auto-generated ID to identify data across separate extractions, because if any elements are added to or removed from the site, the IDs won't match anymore. Instead, I would use the text or link of the element, provided you know it is unique within the site.
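To illustrate the point about IDs drifting, here is a minimal Python sketch. It is not Helium Scraper code: the rows, the links, and the helper names `by_auto_id` and `by_link` are all made up for the example, and the auto-generated ID is simply imitated by numbering rows in extraction order.

```python
# Hypothetical data: two extraction runs of the same page, where one
# element ("Banana") was removed from the site between the runs.
first_run = [
    {"link": "/items/apple", "text": "Apple"},
    {"link": "/items/banana", "text": "Banana"},
    {"link": "/items/cherry", "text": "Cherry"},
]
second_run = [
    {"link": "/items/apple", "text": "Apple"},
    {"link": "/items/cherry", "text": "Cherry"},
]


def by_auto_id(rows):
    """Imitate an auto-generated ID: number the rows 1, 2, 3... in order."""
    return {i: row["text"] for i, row in enumerate(rows, start=1)}


def by_link(rows):
    """Key each row by its link, assumed to be unique within the site."""
    return {row["link"]: row["text"] for row in rows}


ids_first, ids_second = by_auto_id(first_run), by_auto_id(second_run)
links_first, links_second = by_link(first_run), by_link(second_run)

# ID 2 points at a different element in each run...
print(ids_first[2], "vs", ids_second[2])
# ...while the link still identifies the same element in both runs.
print(links_first["/items/cherry"] == links_second["/items/cherry"])
```

Keying on the link only works if the link really is unique and stable, which is the same condition stated above.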

But if you know the site's elements won't change, you can add an Action.ExecuteNonQuery action that resets the IDs right below your Action.ClearData action. Helium uses SQLite, so you can use the query given in this answer.

If you had a single table called Main, the whole Extract global would look like this:

Code:

Action.ClearData
   ·  "Main"
Action.ExecuteNonQuery
   ·  "delete from sqlite_sequence where name='Main';"
Action.Extract
   ·  Main
   ·  "Main"
Note that if your Main table set has more than one table, you'll need to reset the IDs for every table. If your Main table set had two tables, one called Main and another called Main.children, you'd use this instead:

Code:

Action.ClearData
   ·  "Main"
Action.ExecuteNonQuery
   ·  "delete from sqlite_sequence where name='Main' or name='Main.children';"
Action.Extract
   ·  Main
   ·  "Main"
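If you want to see what the sqlite_sequence query does outside Helium, the following Python sketch reproduces it with the standard sqlite3 module. It assumes the table has an AUTOINCREMENT id column (SQLite only tracks such tables in sqlite_sequence); the in-memory database, the `value` column, and the `insert` helper are made up for the example and are not Helium's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: an AUTOINCREMENT id column, which is what makes SQLite
# record the table's last used ID in sqlite_sequence.
conn.execute(
    'CREATE TABLE "Main" (id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT)'
)


def insert(value):
    """Insert one row and return the ID that SQLite assigned to it."""
    cur = conn.execute('INSERT INTO "Main" (value) VALUES (?)', (value,))
    return cur.lastrowid


first_ids = [insert("a"), insert("b")]

# Clearing the rows alone does not reset the counter.
conn.execute('DELETE FROM "Main"')
id_without_reset = insert("c")

# Clearing the rows and the sqlite_sequence entry restarts the IDs.
conn.execute('DELETE FROM "Main"')
conn.execute("DELETE FROM sqlite_sequence WHERE name='Main'")
id_with_reset = insert("d")

print(first_ids, id_without_reset, id_with_reset)
```

The row inserted after only clearing the table continues from the old counter, while the row inserted after also deleting the sqlite_sequence entry starts again at 1.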
Juan Soldi
The Helium Scraper Team
