Globals

Globals are the main components of Helium Scraper. Each global consists of a single do-block, which contains a list of one or more statements, and which may in turn contain other do-blocks in the form of arguments. When a statement produces a sequence of many values, as a selector or a query does, the statements below it run once for each of these values. Consequently, if a statement produces an empty sequence, the statements below it do not run.

The value produced by a do-block is the concatenation of the values produced by the last or only statement. When a global runs and the last statement is an extract action, a set of one or more output tables is created. This table set is named after the global, and its structure mirrors the structure of the extract action.
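The evaluation rule described above can be modelled outside Helium Scraper. The following is only a minimal Python sketch of that rule, not Helium Scraper's implementation: `run_block`, the `current` variable, and the sample statements are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

Env = Dict[str, object]
# Hypothetical model: a statement maps the visible variables to a sequence of values.
Statement = Callable[[Env], List[object]]


def run_block(statements: List[Statement], env: Env) -> List[object]:
    """Run a do-block: the statements below run once per value of the statement above."""
    first, rest = statements[0], statements[1:]
    values = first(env)
    if not rest:
        # Last or only statement: its values are what the block produces.
        return list(values)
    result: List[object] = []
    for value in values:
        # "current" is an assumed stand-in for the context each value provides.
        result.extend(run_block(rest, {**env, "current": value}))
    return result


pages = lambda env: ["page1", "page2"]
links = lambda env: [f"{env['current']}/a", f"{env['current']}/b"]
empty = lambda env: []

print(run_block([pages, links], {}))  # ['page1/a', 'page1/b', 'page2/a', 'page2/b']
print(run_block([empty, links], {}))  # []
```

The second call shows the empty-sequence case: `links` never runs, so the block produces nothing.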

Syntax

Statements in a do-block are written as a list of items that all have the same indentation:

   [statement1]
   [statement2]
   [statement3]
   …

Every do-block contains a placeholder as its only statement by default. Additional statements can be added using the editor by right-clicking a statement and selecting Add Sibling. Also, the value produced by a statement can be stored in one or more variables using the as keyword, which can be added by right-clicking a statement and selecting Output Result.

Gather.Link
as url

In the example above, the url variable will be accessible to every statement below and will contain the value produced by Gather.Link.

If the statement produces a composite value, such as when the statement is an extract action, a list of variable names in round brackets can be used, and these variables will be populated with the values of the struct members. In the example below, the value of the one variable will be the number 1, and the value of the two variable will be the string "two":

extract
   one
      1
   two
      "two"
as (one two)

Note that the last or only statement in a do-block cannot have output variables.
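As a rough illustration of how output variables behave, the Python sketch below models both forms of the as keyword. `bind_output` is a hypothetical helper, and matching struct members to names by position is an assumption; the text does not say whether matching is by position or by name.

```python
from typing import Dict, List


def bind_output(env: Dict[str, object], names: List[str], value: object) -> Dict[str, object]:
    """Return a copy of env with a statement's value stored under the given names."""
    if len(names) == 1:
        # Single variable, as in `as url`: store the value as a whole.
        return {**env, names[0]: value}
    if not isinstance(value, dict):
        raise TypeError("several output variables need a composite value")
    # Assumption: members are assigned to the names in order.
    members = list(value.values())
    if len(members) != len(names):
        raise ValueError("number of variables does not match number of members")
    return {**env, **dict(zip(names, members))}


env = bind_output({}, ["url"], "https://example.com/item")
env = bind_output(env, ["one", "two"], {"one": 1, "two": "two"})
print(env)  # {'url': 'https://example.com/item', 'one': 1, 'two': 'two'}
```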

Arguments

Most keywords, operators, and functions can take one or more arguments. Arguments are identified by bullet points (although periods are also accepted):

+
   ·  Argument1
   ·  Argument2
   ·  Argument3

The example above passes 3 separate arguments to the + operator, which will produce the sum of all 3 arguments, or their concatenation if they are sequences or strings. Note that the above example would produce a different result than this (notice the missing bullet):

+
   ·  Argument1
       Argument2
   ·  Argument3

In this example, assuming all arguments are sequences, the result of the + operator will be the concatenation of only two arguments. The first argument is a do-block containing both Argument1 and Argument2, which run one after another, and the second argument is just Argument3.
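The difference between the two layouts can be made concrete with a small Python model. It is a sketch under stated assumptions rather than Helium Scraper code: `plus` and `do_block` are hypothetical helpers, the arguments are plain lists, and Argument2 is assumed not to depend on the values of Argument1.

```python
from typing import List


def plus(*arguments: List[object]) -> List[object]:
    """Model of + applied to sequences: concatenate the arguments in order."""
    result: List[object] = []
    for argument in arguments:
        result.extend(argument)
    return result


def do_block(first: List[object], last: List[object]) -> List[object]:
    """Model of a two-statement do-block: `last` runs once per value of `first`,
    and the block produces the concatenation of those runs."""
    result: List[object] = []
    for _ in first:
        result.extend(last)
    return result


argument1, argument2, argument3 = [1, 2], [3], [4]

three_arguments = plus(argument1, argument2, argument3)
two_arguments = plus(do_block(argument1, argument2), argument3)

print(three_arguments)  # [1, 2, 3, 4]
print(two_arguments)    # [3, 3, 4]
```

In the second call the values of Argument1 never reach the result; they only determine how many times Argument2 runs.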

Some operators, such as the + operator, can take an arbitrary number of arguments. To add an argument to an operator, place the cursor over a bullet character, such that both the bullet and the argument are highlighted, and then right-click it and select Add Sibling.

Whitespace

Whitespace and line breaks are taken into consideration when code blocks are parsed. For instance, the following two pieces of code are not equivalent:

Browser.TurnPages
   ·  Select.NextButton
Select.Link

Browser.TurnPages
   ·  Select.NextButton
       Select.Link

In the first example, Select.NextButton is the only argument passed to Browser.TurnPages, and Select.Link will be selected inside each page.

In the second example, the argument passed to Browser.TurnPages is a do-block that contains both Select.NextButton and Select.Link. Note that this example would only work if the HTML element selected by Select.Link is the actual link to the next page, and is contained inside the element selected by Select.NextButton.
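To show why indentation changes the meaning, the Python sketch below parses both layouts into nested do-blocks. It is not Helium Scraper's parser: `parse`, `arguments_of`, and the dictionary structure are hypothetical, and the sketch assumes that a sibling statement inside an argument is aligned exactly with the text that follows the bullet.

```python
from typing import Dict, List

BULLETS = ("·", ".")  # the text says periods are accepted in place of bullets


def parse(text: str) -> List[Dict]:
    """Parse indented statements into nested do-blocks (hypothetical structure)."""
    root: List[Dict] = []
    # Stack of open do-blocks as (column of their statements, statement list).
    stack = [(0, root)]
    for raw in text.splitlines():
        stripped = raw.lstrip()
        if not stripped:
            continue
        column = len(raw) - len(stripped)
        if stripped[0] in BULLETS:
            # A bullet opens a new argument of the nearest statement to its left.
            while stack[-1][0] >= column and len(stack) > 1:
                stack.pop()
            owner_block = stack[-1][1]
            if not owner_block or stack[-1][0] >= column:
                raise ValueError(f"argument without a statement: {raw!r}")
            name = stripped[1:].lstrip()
            block: List[Dict] = []
            owner_block[-1]["args"].append(block)
            stack.append((len(raw) - len(name), block))
        else:
            # A plain line is a sibling in the do-block that starts at its column.
            while stack[-1][0] > column:
                stack.pop()
            if stack[-1][0] != column:
                raise ValueError(f"unexpected indentation: {raw!r}")
            name, block = stripped, stack[-1][1]
        block.append({"name": name.rstrip(), "args": []})
    return root


def arguments_of(statement: Dict) -> List[List[str]]:
    """List the statement names inside each argument of a statement."""
    return [[s["name"] for s in block] for block in statement["args"]]


first = parse(
    "Browser.TurnPages\n"
    "   ·  Select.NextButton\n"
    "Select.Link\n"
)
second = parse(
    "Browser.TurnPages\n"
    "   ·  Select.NextButton\n"
    + " " * 6 + "Select.Link\n"
)

print([s["name"] for s in first])   # ['Browser.TurnPages', 'Select.Link']
print(arguments_of(first[0]))       # [['Select.NextButton']]
print([s["name"] for s in second])  # ['Browser.TurnPages']
print(arguments_of(second[0]))      # [['Select.NextButton', 'Select.Link']]
```

In the first layout Select.Link is a sibling of Browser.TurnPages in the outer do-block; in the second it becomes the last statement of the do-block passed as the argument.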