Actions -> Actions List -> Extract

This is the simplest way to extract data to the database. To create an Extract action, first click on the Extract menu item, then select the Kinds you want to extract, and click OK. If more than one Kind is chosen, Helium Scraper will analyze the HTML structure to determine how to organize the extracted data into rows. Normally, elements that share the same HTML parent node will share the same row in the data table.
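As a rough illustration of the row-grouping rule, the sketch below groups selected elements by a parent identifier so that elements with the same parent end up in the same row. It is not Helium Scraper's actual algorithm: the `group_into_rows` function, the tuple layout, and the sample Kind names are all hypothetical and chosen only for this example.

```python
def group_into_rows(elements):
    """Group (parent_id, kind, value) tuples into one row per parent.

    Hypothetical helper: parent_id stands in for the shared HTML parent node.
    """
    rows = {}  # dicts preserve insertion order, so rows keep page order
    for parent_id, kind, value in elements:
        # Elements with the same parent contribute columns to the same row.
        rows.setdefault(parent_id, {})[kind] = value
    return list(rows.values())


# Made-up sample data: two product blocks, each with a Title and a Price.
elements = [
    ("div#1", "Title", "Red Lamp"),
    ("div#1", "Price", "19.99"),
    ("div#2", "Title", "Blue Chair"),
    ("div#2", "Price", "45.00"),
]
rows = group_into_rows(elements)
```

Here `rows` holds two dictionaries, one per parent, each with a `Title` and a `Price` entry.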

On the next window you can modify various extraction parameters and the structure of the output table. Here is the list of parameters and their functions:

Id Column Name

If specified, adds an Id Column with the given name to the generated table. This Id can then be gathered with the ID_{Table Name} property gatherer to keep rows in different tables related to each other in a coherent way.

Column Name

The name of the column in the data table where the data will be extracted.

Kind Name

The Kind to be extracted.

Property

The Property to be extracted from the selected Kind. Only Extraction Properties will be listed. These properties can be set from the Project -> Options menu item.

Req. Mode

Requirement Mode. Determines whether the required amount must be at least or exactly the amount given in the Req. Amount column.

Req. Amount

Required Amount. The number of elements that must be selected. If the requirement is not met, a Message Box will appear.
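The interplay between Req. Mode and Req. Amount can be sketched as a simple check on the number of selected elements. This is only an illustration under assumptions: the function name `requirement_met` and the mode strings `"at_least"` and `"exactly"` are invented here and are not identifiers used by Helium Scraper.

```python
def requirement_met(selected_count, mode, amount):
    """Return True when the number of selected elements satisfies the requirement.

    mode is "at_least" or "exactly"; both strings are assumed names
    used only for this illustration.
    """
    if mode == "at_least":
        return selected_count >= amount
    if mode == "exactly":
        return selected_count == amount
    raise ValueError(f"unknown requirement mode: {mode!r}")


# Three elements selected, one required.
ok_at_least = requirement_met(3, "at_least", 1)
ok_exactly = requirement_met(3, "exactly", 1)
```

With three selected elements and a Req. Amount of 1, the "at least" check passes while the "exactly" check fails, which is the case where the Message Box would appear.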

Unique

Marks this column as the column, or one of the columns, that the Extract action uses to uniquely identify a row. If more than one Unique column is used, every distinct combination of values across all Unique columns serves as the row identifier.

This means that a row where Column1's value is A and Column2's value is B will be considered different from a row where Column1's value is A and Column2's value is C, despite the fact that Column1's value is A in both rows.

Whenever a row is about to be extracted, the target table will be queried for a row that matches the Unique columns to be extracted. If such a row is found, it will be updated with the new non-Unique values if these differ from the stored ones; otherwise the extracted row is ignored.
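The matching rule for Unique columns can be sketched with an in-memory table. This is a minimal illustration, not Helium Scraper's implementation: `extract_row` and the `Price` column are hypothetical, and the sketch assumes that a row with no match in the table is inserted as a new row, which the text above implies but does not state.

```python
def extract_row(table, new_row, unique_columns):
    """Insert, update, or ignore new_row in table (a list of dicts).

    Returns the action taken as a string. Hypothetical helper for illustration.
    """
    key = tuple(new_row[c] for c in unique_columns)
    for row in table:
        if tuple(row[c] for c in unique_columns) == key:
            # Matching row found: compare only the non-Unique values.
            changed = {
                c: v
                for c, v in new_row.items()
                if c not in unique_columns and row.get(c) != v
            }
            if changed:
                row.update(changed)
                return "updated"
            return "ignored"
    # Assumption: a row with no match is added to the table.
    table.append(dict(new_row))
    return "inserted"


table = []
unique = ["Column1", "Column2"]
actions = [
    extract_row(table, {"Column1": "A", "Column2": "B", "Price": "10"}, unique),
    extract_row(table, {"Column1": "A", "Column2": "C", "Price": "10"}, unique),
    extract_row(table, {"Column1": "A", "Column2": "B", "Price": "12"}, unique),
    extract_row(table, {"Column1": "A", "Column2": "B", "Price": "12"}, unique),
]
```

The rows (A, B) and (A, C) are treated as distinct, the third call updates the stored Price of (A, B), and the fourth call changes nothing.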

Download

If checked, Helium Scraper will try to download the resource at the location given by the value of the selected Property. For example, setting the Property column to SrcAttribute will download the images if the selected Kind selects image elements. The value in the data table will be the name of the downloaded file instead of the value of the selected Property. The Downloads Folder can be set at Project -> Options in the main menu.


Advanced Parameters

Data Type

Sets the Data Type of the column to be created. These are the available options:

  • Auto: Uses a 16-character Text as the starting data type and expands it automatically as needed, up to a Memo data type.
  • Custom: Lets you use any Jet SQL (MS Access) data type in the Custom DT field. When used, the Max Length field is ignored. Any parameter must be set in the Custom DT field itself.
  • Currency: Holds up to 15 digits of whole dollars, plus 4 decimal places.
  • Date: Use for dates and times.
  • Float: Double precision floating-point. Will handle most decimals.
  • Integer: Allows whole numbers between -2,147,483,648 and 2,147,483,647.
  • Memo: Stores up to 65,536 characters. Note that Memo fields cannot be used in JOIN queries.
  • Numeric: Stores numbers that have fixed precision and scale. The precision and scale can be set in the Max Length field separated by a comma.
    • Precision: Sets the maximum total number of digits that can be stored. Must be a number between 1 and 38. The default value is 18.
    • Scale: Can only be specified if Precision is specified. Sets the maximum number of digits that can be stored to the right of the decimal point. Must be a number from 0 through Precision. The default value is 0.
  • Text: Stores up to 255 characters if no Max Length is given or up to Max Length characters if given.
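To make the Numeric precision and scale rules concrete, the sketch below parses a Max Length value such as "5,2" and checks whether a number fits the resulting fixed-point definition. Both helpers, `parse_max_length` and `fits_numeric`, are hypothetical and written only for this example. The sketch treats a value with too many digits as not fitting; how the database actually handles such a value (rejecting or rounding it) is not covered here.

```python
from decimal import Decimal


def parse_max_length(text):
    """Parse "precision,scale" into a tuple, using the documented defaults (18, 0)."""
    parts = [p.strip() for p in text.split(",")]
    precision = int(parts[0]) if parts[0] else 18
    scale = int(parts[1]) if len(parts) > 1 and parts[1] else 0
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if not 0 <= scale <= precision:
        raise ValueError("scale must be from 0 through precision")
    return precision, scale


def fits_numeric(value, precision=18, scale=0):
    """Return True if the decimal string value fits the given precision and scale."""
    number = Decimal(value)
    if not number.is_finite():
        return False
    # normalize() drops trailing zeros so "1.50" counts as one fractional digit.
    _, digits, exponent = number.normalize().as_tuple()
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fraction_digits <= scale and integer_digits <= precision - scale


precision, scale = parse_max_length("5,2")
fits = fits_numeric("123.45", precision, scale)
too_wide = fits_numeric("1234.5", precision, scale)
```

With a precision of 5 and a scale of 2, at most three digits are available to the left of the decimal point, so "123.45" fits and "1234.5" does not.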

Custom DT

Used when the Data Type field is set to Custom. Lets you use any Jet SQL (MS Access) data type. When used, the Max Length field is ignored. Any parameter must be set in the Custom DT field itself.

Max Length

Used with the Text and Numeric data types. See the Data Type field for details.

If the Simulate option is checked, no extraction will be performed, but information will be stored in the Virtual Tree as usual, except that elements "inserted" in the data table will have a NULL ID.

This action cannot contain child nodes.