The Plus Operator

Helium Scraper has many operators, but the plus (+) operator deserves its own tutorial because of how many uses it has. Besides addition, it also represents the concatenation of strings and of sequences.

Simple Cases

Helium Scraper will treat the operator differently, depending on the type of data that is given. For instance, if given numbers, it’ll behave as a simple addition operator:

+
   ·  10.1
   ·  20.2
   ·  30.3

The above code will produce the number 60.6.

When the operator is given strings, their concatenation is produced:

+
   ·  "Hello "
   ·  "World"

This code will produce “Hello World”. Note that any empty strings are ignored, so the code above is equivalent to this:

+
   ·  "Hello "
   ·  ""
   ·  "World"
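For readers who think more easily in a general-purpose language, the behavior described above can be approximated in Python. This is only an analogy, not Helium Scraper code: the plus function below is a hypothetical stand-in for the + operator, and the floating-point sum is compared with a tolerance because Python's binary floats may not print exactly 60.6.

```python
def plus(*operands):
    """Hypothetical model of the + operator for numbers and strings."""
    # Strings are concatenated in order; an empty string adds nothing.
    if all(isinstance(x, str) for x in operands):
        return "".join(operands)
    # Anything else is treated as numbers and added up.
    return sum(operands)


total = plus(10.1, 20.2, 30.3)
greeting = plus("Hello ", "World")
same_greeting = plus("Hello ", "", "World")

print(round(total, 1))            # the sum of the three numbers
print(greeting)                   # the two strings joined together
print(greeting == same_greeting)  # the empty string made no difference
```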

Selectors: Extracting Prices

Things become interesting when the operator is used with sequences. Suppose you’re working on a project that extracts from an eCommerce site where some products have a discount. When they do, the price is shown in a different place and with a completely different style than when there’s no discount. So it’s not possible to create a single selector that selects both types of prices, although we’d like to have either price extracted to the same column. The solution here is to create two selectors, one for each type of price, called NormalPrice and DiscountedPrice, and then use the following to extract whichever price is present:

extract
    price
       +
          ·  Select.NormalPrice
          ·  Select.DiscountedPrice

This works because, just like empty strings, empty sequences are ignored. When there’s no normal price, the concatenation only contains the discounted price, and when there’s no discounted price, it only contains the normal price. Also, in this particular case, if a page happens to contain both prices, only the normal price is extracted: it comes first in the concatenation, and only the first element is extracted when the column doesn’t contain a nested extract action. Finally, if no price is found, the concatenation is empty, so nothing is extracted.
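The four cases just described can be sketched in Python, again purely as an analogy. Lists stand in for the sequences returned by the two selectors, the price strings are made-up sample data, and extract_first is a hypothetical model of a column that has no nested extract action, with None standing in for "nothing extracted".

```python
def plus(*sequences):
    """Concatenate sequences in order; empty ones contribute nothing."""
    result = []
    for seq in sequences:
        result.extend(seq)
    return result


def extract_first(sequence):
    """Model of a column without a nested extract: keep only the first element."""
    return sequence[0] if sequence else None


# Each call passes (normal price matches, discounted price matches).
only_normal = extract_first(plus(["$20.00"], []))
only_discount = extract_first(plus([], ["$15.00"]))
both = extract_first(plus(["$20.00"], ["$15.00"]))
neither = extract_first(plus([], []))

print(only_normal, only_discount, both, neither)
```

Because the normal price is the first operand, it wins whenever both are present. Swapping the operands would make the discounted price win instead.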

Sequences: Extracting Events

Selectors are not the only thing that can be concatenated. Any sequences can, even when they contain complex data. Imagine you have a list of URLs for events from which you’d like to extract information, such as event name, date and address. But some of these URLs are for events that span multiple dates, and instead of taking you directly to the event details page, they take you to a list of dates, each of which needs to be clicked in order to see the individual event details.

To solve this, first create a selector called EventDate that selects the event dates. Also, suppose you have a global called EventDetails that extracts from each individual event details page. Given these, the following code extracts from both kinds of URLs, and includes every individual date when the event spans multiple dates:

Query.EventURLs
as (url)
Browser.Load
   ·  url
+
   ·  Sequence.IfAny
         ·  Select.EventDate
         ·  Sequence.Empty
         ·  EventDetails
   ·  Select.EventDate
      Browser.Click
      EventDetails

Here is how this works. The top element of the + operator (Sequence.IfAny and its three arguments) is empty when there are any event dates, and contains the result of EventDetails when there are none. The bottom element contains one EventDetails for each event date, which implies that it is empty when there are no event dates. So the result of the operation is a concatenation of either no items on top and one or more at the bottom (when there are event dates), or a single item on top and no items at the bottom (when there are no event dates).
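The same either/or logic can be modeled in Python as an analogy. Nothing here is Helium Scraper's API: pages are plain dictionaries of made-up data, if_any is a hypothetical stand-in for Sequence.IfAny (it takes callables so that only the chosen branch runs), and looking up a nested dictionary stands in for clicking a date.

```python
def if_any(sequence, when_any, when_none):
    """Model of Sequence.IfAny: choose a branch by whether sequence has items."""
    return when_any() if sequence else when_none()


def event_details(page):
    """Model of the EventDetails global: one record per details page."""
    return {"name": page["name"], "date": page["date"]}


def extract_events(page):
    dates = page.get("dates", [])  # model of Select.EventDate
    # Top element: empty if there are dates, else the page's own details.
    top = if_any(dates, lambda: [], lambda: [event_details(page)])
    # Bottom element: one record per date, so empty when there are none.
    bottom = [event_details(date_page) for date_page in dates]
    return top + bottom


single = {"name": "Expo", "date": "2024-05-01"}
multi = {"dates": [{"name": "Fair", "date": "2024-06-01"},
                   {"name": "Fair", "date": "2024-06-02"}]}

print(extract_events(single))  # one record, taken from the top element
print(extract_events(multi))   # two records, taken from the bottom element
```

Exactly one of the two operands is non-empty for any given page, so the concatenation never duplicates an event.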

It is important to note that both elements must contain data of the same type. It is not possible to apply the operator to different kinds of data, such as a string and a sequence, or sequences containing different types of data.

For more information on this and other operators, see the plus operator documentation.
