Property Gatherers

Property Gatherers are the way Helium Scraper looks at elements in a web page. They all share the same basic principle: take a web page element as input and output a value gathered from that element. Gatherers serve two purposes: they define kinds as a set of properties gathered from elements, and they extract those properties from the elements into the database. You can select which gatherers are used to define kinds and which can be extracted at Project -> Options -> Select Property Gatherers. Additionally, you can select which gatherers are shown in the Selection panel when elements are selected in the main browser while in selection mode.

While there is a long list of built-in gatherers, additional gatherers may be needed to fulfill a job's needs. When defining kinds, the built-in gatherers may not be enough to distinguish one element from another, causing a kind to select more elements than it should. When extracting, additional properties may need to be extracted, or they may need to be pre-processed before being extracted. Helium Scraper provides a set of tools to create your own property gatherers. When created, any of these gatherers will have the "JS_" prefix prepended to its name. Advanced users can create JavaScript Gatherers. Each of them runs a JavaScript function that receives an HTML element as its input parameter and, after processing it, returns a value gathered from that element. Text Gatherers let you easily take text from an element and transform it into other text, and Boolean Text Gatherers return True or False depending on whether a string is found inside the element's text.
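The three kinds of custom gatherers described above can be sketched in plain JavaScript. This is only an illustration of the "element in, value out" principle. The function names (gatherPrice, gatherCleanText, containsText) and the stand-in element object are hypothetical, and the exact way Helium Scraper expects a gatherer function to be declared and invoked is an assumption not covered here. Only the standard textContent property is relied on.

```javascript
// Hypothetical JavaScript gatherer: receives an element and returns a
// value gathered from it. Here the value is the first number found in
// the element's text, with thousands separators removed.
function gatherPrice(element) {
  // textContent is a standard DOM property; fall back to an empty string.
  var text = (element && element.textContent) || "";
  // Match a number such as "1,299.50".
  var match = text.match(/\d[\d,]*(?:\.\d+)?/);
  if (!match) {
    return "";
  }
  return match[0].replace(/,/g, "");
}

// Rough analogue of a Text Gatherer: takes the element's text and
// transforms it into another text (collapsed whitespace, trimmed).
function gatherCleanText(element) {
  var text = (element && element.textContent) || "";
  return text.replace(/\s+/g, " ").trim();
}

// Rough analogue of a Boolean Text Gatherer: reports whether a given
// string is found inside the element's text.
function containsText(element, needle) {
  var text = (element && element.textContent) || "";
  return text.indexOf(needle) !== -1;
}

// Stand-in for a real HTML element (an assumption made so the sketch
// runs outside a browser); inside a page this would be a DOM node.
var fakeElement = { textContent: "  Price:\n  $1,299.50 (In stock) " };

console.log(gatherPrice(fakeElement));
console.log(gatherCleanText(fakeElement));
console.log(containsText(fakeElement, "In stock"));
```

A gatherer along the lines of containsText could help a kind tell apart elements that the built-in gatherers see as identical, while one like gatherPrice pre-processes a value before it is extracted into the database.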