Data Flow

Provides data input and output functionality.

Ajax

Creates functions that receive a URL and perform a web request. The response is parsed as JSON, and the function returns a structure that matches the provided JSON schema.

These functions are created with the 'Ajax.' prefix.

JSON Parsers

Creates functions that parse a JSON string and produce a structure matching the provided JSON schema.

These functions are created with the 'Parse.' prefix.

JSON Schema

Helium Scraper reads JSON data in a non-strict way, for both Ajax and JSON Parsers. Here are a few points to take into consideration:

  • If the JSON Schema represents a string, and the JSON is any valid JSON value other than a string, no error will be thrown and a string representation of the value will be returned.
  • If the JSON Schema represents an object with a single property called value, and the JSON is an atomic value, this value will be used as the value of the value property. This is useful when extracting arrays of atomic values directly into the database: arrays of atomic values are not extracted into a separate table, so only the first value would be extracted unless the JSON Schema represents an array of objects.
  • If the JSON Schema represents an array, and the JSON is a value of a non-array type, then a single item list containing this value will be returned.
  • If the JSON Schema represents an array, and the JSON represents null, an empty list will be returned.
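
The four rules above can be illustrated with a small Python sketch. This is not Helium Scraper's implementation: the coerce function is hypothetical, the schema is a simplified dictionary with only "type", "properties" and "items", and the use of json.dumps for the string representation is an assumption (the exact representation Helium Scraper produces is not specified here). Cases the rules do not cover are passed through unchanged.

```python
import json


def coerce(value, schema):
    """Leniently fit a decoded JSON value to a simplified schema (illustrative only)."""
    kind = schema.get("type")

    if kind == "array":
        if value is None:
            return []              # null becomes an empty list
        if not isinstance(value, list):
            value = [value]        # a non-array value becomes a single item list
        return [coerce(item, schema.get("items", {})) for item in value]

    if kind == "object":
        props = schema.get("properties", {})
        is_atomic = not isinstance(value, (dict, list))
        if is_atomic and list(props) == ["value"]:
            # An atomic value is wrapped into the single "value" property.
            return {"value": coerce(value, props["value"])}
        if isinstance(value, dict):
            return {name: coerce(value[name], sub)
                    for name, sub in props.items() if name in value}
        return value               # not described by the rules: left unchanged

    if kind == "string" and not isinstance(value, str):
        return json.dumps(value)   # assumed string representation

    return value


value_object = {"type": "object", "properties": {"value": {"type": "number"}}}

print(coerce(42, {"type": "string"}))                        # string schema, number input
print(coerce(7, value_object))                               # atomic value, "value" object
print(coerce("x", {"type": "array", "items": {"type": "string"}}))
print(coerce(None, {"type": "array", "items": {"type": "string"}}))
print(coerce([1, 2], {"type": "array", "items": value_object}))
```

The last call shows why the value rule matters for database extraction: an array of atomic values becomes an array of objects, each with a single value property.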

Since there is no null value in Helium Scraper, nullable types can be represented as arrays containing either no items or a single item. To do this, add the "optionals": "list" property to the top level object of the JSON Schema. This property causes non-array object properties that are not marked as required to be converted to lists, which prevents errors when their value is null or undefined.
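
As a rough illustration of the effect of "optionals": "list", the following Python sketch converts the optional, non-array properties of an object into lists. The optionals_to_lists function, the sample schema and the property names are all hypothetical, and the handling of a missing array property (an empty list) is an assumption based on the null rule for arrays.

```python
def optionals_to_lists(obj, schema):
    """Represent optional non-array properties as lists of zero or one item."""
    required = set(schema.get("required", []))
    result = {}
    for name, sub in schema.get("properties", {}).items():
        value = obj.get(name)              # None when the property is null or missing
        if name in required:
            result[name] = obj[name]       # required properties are left as they are
        elif sub.get("type") == "array":
            result[name] = [] if value is None else value
        else:
            result[name] = [] if value is None else [value]
    return result


# Hypothetical schema: "id" is required, "nickname" and "tags" are optional.
schema = {
    "type": "object",
    "optionals": "list",
    "required": ["id"],
    "properties": {
        "id": {"type": "number"},
        "nickname": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

print(optionals_to_lists({"id": 1, "nickname": None}, schema))
print(optionals_to_lists({"id": 2, "nickname": "Al", "tags": ["a"]}, schema))
```

In the first call the null nickname becomes an empty list instead of causing an error; in the second it becomes a single item list.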

A JSON Schema can be inferred from any sample JSON. To do this, click the JSON Schema Inference button on the Ajax or JSON Parser editor to show the inference panel. Then either paste the JSON into the JSON editor on the left, or enter a URL and press the Download JSON button to download the JSON into the JSON editor. Finally, press the Infer JSON Schema button and the schema editor will be filled with the inferred schema.

When the JSON Schema is inferred, the "optionals": "list" property is always included.

Queries

Creates values or functions that output the results of a given SQL query. If the query has no parameters, the result is a value. If it does, the result is a function that takes the selected parameters. To add parameters, click the Parameters button in the query editor and enter one or more parameters. To reference them in the query, prefix their names with the '@' symbol.
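
The distinction between a value and a function can be sketched in Python. This example uses Python's sqlite3 module only as a stand-in for the query engine; the make_query helper, the Products table and the MaxPrice parameter are hypothetical and are not part of Helium Scraper.

```python
import sqlite3

# In-memory sample data standing in for an extracted table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Name TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("Pen", 1.5), ("Book", 12.0), ("Lamp", 30.0)],
)


def make_query(connection, sql, parameters=()):
    """Return the rows directly if there are no parameters, else a function."""
    def run(*args):
        # Named parameters are bound by name; the SQL refers to them as @Name.
        return connection.execute(sql, dict(zip(parameters, args))).fetchall()
    return run if parameters else run()


# No parameters: the result is a value (a list of rows).
all_names = make_query(conn, "SELECT Name FROM Products ORDER BY Name")

# One parameter: the result is a function that takes that parameter.
cheaper_than = make_query(
    conn,
    "SELECT Name FROM Products WHERE Price < @MaxPrice ORDER BY Name",
    ["MaxPrice"],
)

print(all_names)
print(cheaper_than(15))
```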

Queries on table sets can be quickly created by right-clicking a table set and selecting Create Query.

These functions are created with the 'Query.' prefix.

Text

This is an easy way to quickly create a function that extracts and transforms text from HTML elements.

To get started, right-click the Text category and select Create. If any elements were selected in the main browser, their text will be shown in a table. Otherwise a default text will be shown. New sample text can be added to this table by typing in the last row. Then, click the Add Step button and select one or more steps until the desired output is produced. Each of the following three kinds of steps transforms the input text in a different way:

  • Slice: Outputs a section of the input text. The source text is split into sections by a given delimiter, and the section at the zero-based Slice Position is used as the output.
  • Replace: Replaces every occurrence of a string with another string. If regular expressions are used, the replacement text can include the $N placeholder to output regular expression matches, where N is the zero-based index of the capturing group.
  • Regular Expression: Runs a regular expression on the input text and outputs the match at Match Position.
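
To make the three kinds of steps concrete, here is a minimal Python sketch of what each one does to a piece of text. The function names are hypothetical and this is not how Helium Scraper implements them. Two assumptions are made: $0 is taken to mean the whole match (with $1 being the first capturing group), and Match Position is taken to be zero-based like Slice Position.

```python
import re


def slice_step(text, delimiter, position):
    """Split the text by the delimiter and keep the section at a zero-based position."""
    return text.split(delimiter)[position]


def replace_step(text, find, replacement, use_regex=False):
    """Replace every occurrence of find; with use_regex, expand $N placeholders."""
    if not use_regex:
        return text.replace(find, replacement)

    def expand(match):
        # Substitute each $N with capturing group N of the current match
        # (assumption: group 0 is the whole match).
        return re.sub(r"\$(\d+)",
                      lambda p: match.group(int(p.group(1))) or "",
                      replacement)

    return re.sub(find, expand, text)


def regex_step(text, pattern, position):
    """Run the pattern on the text and keep the match at the given position."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    return matches[position]


# Chain the steps the way they would be stacked in the editor.
sample = "Price: $12.50 USD"
step1 = slice_step(sample, ": ", 1)          # keep what follows "Price: "
step2 = replace_step(step1, "$", "")         # drop the currency symbol
step3 = regex_step(step2, r"\d+\.\d+", 0)    # keep the first decimal number

print(step1, "|", step2, "|", step3)
print(replace_step("2024-05-17", r"(\d+)-(\d+)-(\d+)", "$3/$2/$1", use_regex=True))
```

Each step receives the output of the previous one, so the order in which steps are added determines the final result.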

After the desired output has been produced, press the Save button and the function will be accessible from any global using the 'Text.' prefix. If the function is added below a selector, just like with any gathering action, the function will be applied to the elements selected by the selector above. The following example uses a text transformation function called MyTextFunction to extract text from the elements selected by the MySelector selector:

   Select.MySelector
   Text.MyTextFunction