Extraction Property Gatherers

Most property gatherers are used internally by Helium Scraper to define Kinds and you will never need to worry about them. Some of them, though, can give useful information, such as the Text and SrcAttribute property gatherers. The property gatherers that are available for extraction can be set at Project -> Options -> Select Property Gatherers under the Extraction tab.

Below is the list of property gatherers that are available for extraction by default. If you would like to know what a property gatherer that is not listed does, you can test it by following these steps:

  1. Click the Choose visible properties button in the Selection panel at the bottom of the screen.
  2. Deselect all items by clicking Select all twice, then select the one you would like to test and press OK.
  3. Now select some elements on the web page and the chosen property of the selected elements will be shown.


ID_{Table Name}

Finds the closest element that has been extracted or updated in {Table Name} and that has a one-to-many relationship to the currently extracted element and gets the Id of the row to which it has been extracted or updated.

If the current document is one of the visited documents of a Navigate Each action and the previous element has been extracted from the parent document that contains the links, the matching element will be either the visited link or the element that is closest to the link.

This is possible because Helium Scraper keeps a Virtual Tree, which exists for the whole of the current extraction process. The Virtual Tree stores the extracted elements together with information about the updates the Extract action has made to the database.
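The lookup described above can be pictured with a small sketch. It is not Helium Scraper's implementation: the Node class, its row_id field, and the closest_row_id function are hypothetical names, and "closest" is assumed here to mean the nearest ancestor that has already been extracted to the parent table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a page element (not a Helium Scraper type)."""
    name: str
    parent: Optional["Node"] = None
    # Assumed bookkeeping: the Id of the row this element was extracted to, if any.
    row_id: Optional[int] = None


def closest_row_id(node: Node) -> Optional[int]:
    """Walk up the ancestors and return the row Id of the first extracted one."""
    current = node.parent
    while current is not None:
        if current.row_id is not None:
            return current.row_id
        current = current.parent
    return None


# One product (the "one" side) containing two reviews (the "many" side).
product = Node("product", row_id=7)
wrapper = Node("div", parent=product)
review_a = Node("review", parent=wrapper)
review_b = Node("review", parent=wrapper)

print(closest_row_id(review_a), closest_row_id(review_b))
```

In this sketch both reviews resolve to the same parent row, which is the one-to-many relationship the ID_{Table Name} gatherer records.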

JS_MailTo

Gets the email address from an A element whose href attribute starts with mailto. These are the links that, when clicked, open the default email application with a blank message ready to be sent.
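As a rough illustration of what extracting an address from a mailto link involves, here is a minimal sketch. The function name email_from_mailto is hypothetical, and dropping the query part (such as ?subject=...) is an assumption; the gatherer's exact behavior may differ.

```python
from typing import Optional
from urllib.parse import unquote


def email_from_mailto(href: str) -> Optional[str]:
    """Return the address part of a mailto link, or None for any other link."""
    href = href.strip()
    prefix = "mailto:"
    if not href.lower().startswith(prefix):
        return None
    # Assumption: anything after "?" (subject, body, ...) is not part of the address.
    address = href[len(prefix):].split("?", 1)[0]
    # Decode percent-escapes such as %40; treat an empty address as no result.
    return unquote(address) or None


print(email_from_mailto("mailto:jane@example.com?subject=Hello"))
print(email_from_mailto("https://example.com/contact"))
```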

Link

Gets the href attribute of an HTML element, which contains the destination URL of a link.
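To show what reading the href attribute looks like outside Helium Scraper, the sketch below uses Python's standard html.parser module. The HrefCollector class and the sample markup are made up for illustration; the values are returned exactly as written in the HTML, with no URL resolution.

```python
from html.parser import HTMLParser
from typing import List


class HrefCollector(HTMLParser):
    """Collects the href value of every element that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        href = dict(attrs).get("href")
        if href is not None:
            self.hrefs.append(href)


collector = HrefCollector()
collector.feed(
    '<p><a href="/products/1">One</a> '
    '<a name="top">no link</a> '
    '<a href="https://example.com/2">Two</a></p>'
)
print(collector.hrefs)
```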

OuterHTML

Gets the HTML code of the element.

SingleLineInnerText

Gets the inner text of an element, but with new lines replaced by spaces.
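The effect can be approximated as follows. This is a sketch only: the function name single_line is hypothetical, and replacing each line break with exactly one space is an assumption, since the description does not say how consecutive line breaks are handled.

```python
import re


def single_line(text: str) -> str:
    """Replace every line break (\\r\\n, \\r or \\n) with a single space."""
    return re.sub(r"\r\n|\r|\n", " ", text)


print(single_line("Line one\nLine two\r\nLine three"))
```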

SrcAttribute

Gets the src attribute of an HTML element. In an image element, the src attribute contains the URL of the image. This is the property you would usually use when downloading pictures.
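If a src value turns out to be relative, it has to be combined with the address of the page before the picture can be downloaded. The sketch below shows one way to do that with Python's urllib.parse.urljoin. The function name absolute_image_url and the example URLs are hypothetical, and whether Helium Scraper already returns absolute URLs is not stated here, so treat this as an optional post-processing step.

```python
from urllib.parse import urljoin


def absolute_image_url(page_url: str, src: str) -> str:
    """Resolve an image's src value against the URL of the page it came from."""
    return urljoin(page_url, src)


page = "https://example.com/shop/item.html"  # e.g. the value of the Url gatherer
print(absolute_image_url(page, "images/a.png"))
print(absolute_image_url(page, "/img/b.png"))
print(absolute_image_url(page, "https://cdn.example.com/c.png"))
```

Values that are already absolute pass through unchanged, so the function can be applied to every extracted src.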

Text

Gets the text of the element after removing any leading and trailing whitespace characters. To get the non-trimmed text use the InnerText gatherer (note that InnerText is not an extraction gatherer by default).

Url

Gets the URL of the web page in which the element is located. This means the Url will be the same for every element in that web page.

Value

Gets the current value of an INPUT element, such as a text box.