Data Flow

Provides data input and output functionality.

Ajax

Creates functions that take a URL, or a raw HTTP request without a body, and perform a web request. The response is parsed as JSON, and the function returns a structure that matches a given JSON schema.

These functions are created with the 'Ajax.' prefix.

JSON Parsers

Creates functions that parse a JSON string and produce a structure matching the provided JSON schema.

These functions are created with the 'Parse.' prefix.

JSON Schema

Helium Scraper reads JSON data in a non-strict way, for both Ajax and JSON Parsers. Here are a few points to take into consideration:

  • If the JSON Schema represents a string, and the JSON has any other type, no error will be thrown and a string representation of the object will be returned.
  • If the JSON Schema represents an object with a single property called value, and the JSON has an atomic type, the atomic value is assigned to the value property. This is useful when extracting arrays of atomic values directly into the database: arrays of atomic values are not extracted into a separate table, so only the first value would be extracted unless the Schema represents an array of objects.
  • If the JSON Schema represents an array, and the JSON has a non-array type, then a single-item list containing this value will be returned.
  • If the JSON Schema represents an array, and the JSON represents null, an empty list will be returned.
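The four rules above can be illustrated with a minimal Python sketch. This is not Helium Scraper's implementation: the function name read_lenient, the simplified schema layout, and the use of json.dumps for the "string representation" are all assumptions made for illustration.

```python
import json

def read_lenient(value, schema):
    """Coerce a decoded JSON value to a simplified schema (hypothetical helper)."""
    kind = schema.get("type")
    if kind == "array":
        if value is None:
            return []                       # null becomes an empty list
        if not isinstance(value, list):
            value = [value]                 # non-array becomes a single-item list
        return [read_lenient(item, schema["items"]) for item in value]
    if kind == "object":
        props = schema.get("properties", {})
        if not isinstance(value, dict) and list(props) == ["value"]:
            value = {"value": value}        # atomic value fills the "value" property
        if not isinstance(value, dict):
            return value                    # case not covered by this sketch
        return {name: read_lenient(value[name], sub)
                for name, sub in props.items() if name in value}
    if kind == "string" and not isinstance(value, str):
        return json.dumps(value)            # assumed string representation
    return value

# Schema describing an array of objects with a single "value" property.
values_schema = {
    "type": "array",
    "items": {"type": "object", "properties": {"value": {"type": "string"}}},
}
wrapped = read_lenient([1, 2], values_schema)
```

With this schema, each atomic item of the array ends up wrapped in its own object, which is the shape the second rule describes for extracting arrays of atomic values.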

Since there is no null value in Helium Scraper, nullable types can be represented as arrays containing either no items or a single item. To do this, add the "optionals": "list" property to the top level object of the JSON Schema. This property causes non-array object properties that are not marked as required to be converted to lists, which prevents errors when their value is null or undefined.
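As a rough illustration of the effect of "optionals": "list", the Python sketch below converts non-required, non-array properties into lists of zero or one item. The function name apply_optionals, the sample schema, and the property names id and nickname are hypothetical, and a missing key is treated here as "undefined".

```python
def apply_optionals(obj, schema):
    """Wrap optional, non-array properties in lists (hypothetical helper)."""
    if schema.get("optionals") != "list":
        return obj                          # setting absent: leave the object as-is
    required = set(schema.get("required", []))
    result = {}
    for name, sub in schema.get("properties", {}).items():
        value = obj.get(name)               # a missing key is treated like null
        if name in required or sub.get("type") == "array":
            result[name] = value            # required and array properties are untouched
        else:
            result[name] = [] if value is None else [value]
    return result

# Hypothetical schema: "id" is required, "nickname" is optional.
person_schema = {
    "type": "object",
    "optionals": "list",
    "required": ["id"],
    "properties": {"id": {"type": "integer"}, "nickname": {"type": "string"}},
}
with_null = apply_optionals({"id": 1, "nickname": None}, person_schema)
with_value = apply_optionals({"id": 2, "nickname": "Al"}, person_schema)
```

Here a null or missing nickname yields an empty list and a present one yields a single-item list, so downstream code never has to handle a null.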

JSON Schema Inference

A JSON Schema can be inferred from any sample JSON. To do this, click the JSON Schema Inference button in the Ajax or JSON Parser editor to show the inference panel. Then either paste the JSON into the JSON editor on the left, or enter a URL or a raw HTTP request without a body into the URL text box and press the Download JSON button to download the JSON and populate the JSON editor. Finally, press the Infer JSON Schema button, and the schema editor will be populated with the inferred schema. The following Inference Settings are available:

  • Add Optionals: Whether to include the "optionals": "list" property so that non-required, non-array object properties are converted to lists. This prevents errors when the property value is null or undefined. The default value is True.
  • Force Array: Whether the top-level type should always be an array, so that the result can always be added directly to a global. The default value is True.
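To give a feel for what the two settings do, here is a minimal Python sketch of schema inference. It is not the algorithm Helium Scraper uses: the function names infer_type and infer_schema, the choice of the first array item as representative, and the exact shape of the emitted schema are assumptions.

```python
def infer_type(sample):
    """Infer a simplified schema fragment from a decoded JSON value."""
    if isinstance(sample, bool):            # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(sample, (int, float)):
        return {"type": "number"}
    if isinstance(sample, str):
        return {"type": "string"}
    if isinstance(sample, list):
        # Assumption: the first item represents all items; empty lists default to strings.
        items = infer_type(sample[0]) if sample else {"type": "string"}
        return {"type": "array", "items": items}
    if isinstance(sample, dict):
        return {"type": "object",
                "properties": {key: infer_type(val) for key, val in sample.items()}}
    return {"type": "null"}

def infer_schema(sample, add_optionals=True, force_array=True):
    """Apply the two settings on top of the inferred schema (hypothetical helper)."""
    schema = infer_type(sample)
    if force_array and schema["type"] != "array":
        schema = {"type": "array", "items": schema}   # Force Array: wrap the top type
    if add_optionals:
        schema["optionals"] = "list"                  # Add Optionals: mark the top level
    return schema

inferred = infer_schema({"name": "Pen"})
```

With both settings left at their defaults, a single sample object produces an array schema whose top level carries the "optionals": "list" property.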

Queries

Creates values or functions that output the results of a given SQL query. If the query has no parameters, the result is a value. If it does, the result is a function that takes the selected parameters. To add parameters, select the Parameters button in the query editor and enter one or more parameters. To reference a parameter in the query, prefix its name with the '@' symbol.
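The idea of a parameterized query becoming a function can be sketched in Python. The example uses Python's sqlite3 module purely as a stand-in for illustration and says nothing about Helium Scraper's own query engine; the Products table, its columns, and the MinPrice parameter are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table, standing in for extracted data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Name TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("Pen", 1.5), ("Lamp", 20.0), ("Desk", 150.0)],
)

# The parameter is referenced in the SQL text by prefixing its name with '@'.
QUERY = "SELECT Name FROM Products WHERE Price >= @MinPrice ORDER BY Price"

def products_above(min_price):
    """A query with one parameter behaves like a function of that parameter."""
    rows = conn.execute(QUERY, {"MinPrice": min_price}).fetchall()
    return [name for (name,) in rows]

expensive = products_above(10)
```

A query without parameters would correspond to a plain value instead, since there is nothing left to supply at call time.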

Queries on table sets can be quickly created by right-clicking a table set and selecting Create Query.

These functions are created with the 'Query.' prefix.

Text

Provides a quick way to create functions that extract and transform text from HTML elements.

To get started, right-click the Text category and select Create. If any elements were selected in the main browser, their text will be shown in a table. Otherwise, a default text will be shown. New sample text can be added to this table by typing in the last row. Then, click the Add Step button and select one or more steps until the desired output is produced. Each of the following three kinds of steps transforms the input text in a different way:

  • Slice: Outputs a section of the input text. The source text is split into sections by a given delimiter, and the section at the zero-based Slice Position is used as the output.
  • Replace: Replaces every occurrence of a string with another string. If regular expressions are used, the replacement text can include the $N placeholder to output regular expression matches, where N is the zero-based index of the capturing group.
  • Regular Expression: Runs a regular expression on the input text and outputs the match at Match Position.
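The three kinds of steps can be approximated in Python as follows. This is only a sketch of the behavior described above: the function names are hypothetical, Match Position is assumed to be a zero-based index over successive matches, and Python writes group references as \g<N> rather than $N.

```python
import re

def slice_step(text, delimiter, position):
    """Split by the delimiter and keep the section at the zero-based position."""
    return text.split(delimiter)[position]

def replace_step(text, old, new, use_regex=False):
    """Replace every occurrence; with use_regex, 'new' may reference groups."""
    if use_regex:
        return re.sub(old, new, text)       # Python syntax for a group is \g<N>
    return text.replace(old, new)

def regex_step(text, pattern, match_position=0):
    """Return the match at the given position (assumed zero-based)."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    return matches[match_position]

# Chain the steps on a sample text, as a saved text function would.
sample = "Price: $1,299.00 USD"
step1 = slice_step(sample, ": ", 1)         # "$1,299.00 USD"
step2 = replace_step(step1, ",", "")        # "$1299.00 USD"
step3 = regex_step(step2, r"\d+\.\d+")      # "1299.00"
```

Each step takes the previous step's output as its input, which mirrors adding steps one after another until the sample rows show the desired output.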

After the desired output has been produced, press the Save button and the function will be accessible from any global using the 'Text.' prefix. As with any gathering action, if the function is added below a selector, it is applied to the elements selected by that selector. The following example uses a text transformation function called MyTextFunction to extract text from the elements selected by the MySelector selector:

   Select.MySelector
   Text.MyTextFunction