Low-Code Web Scraping

Developers like myself love dark-mode code editors, terminals and cryptic-looking command lines, but certain tasks, such as web development, spreadsheet calculations and, of course, web scraping, don’t require this level of technical expertise.

Low-code is an approach to software development that lets users of any level of technical knowledge take part in the development process by providing a visual interface that requires no prior programming knowledge to master. The popularity of low-code development has increased rapidly in recent years, and here at Helium Scraper we embrace this trend because it reflects one of our core values. The beauty of low-code development is that it works both ways: it grants non-technical users computing power that only technical users once had, and it also boosts developers’ productivity by abstracting away repetitive and mundane tasks without sacrificing flexibility.

Helium Scraper achieves this through a visual editor where explicitly named actions can be inserted. These actions run, as one would naturally expect, from top to bottom:

Helium Scraper running actions

But there is a whole lot more going on here. What happens, for instance, when you plug in a page-turning action? Here, actions still run top to bottom, but some of them run more than once. Every action below Browser.TurnPages will run once for each page, including the first page:

Helium Scraper turning the pages

This is not an exception to the rule, but an inherent part of Helium Scraper’s design. In a typical imperative programming language, you would need a while loop just to go through each page, which would look something like this:

Imperative while loop
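As a rough illustration of that imperative style, here is a minimal Python sketch. It is not taken from Helium Scraper: the `PAGES` dictionary and the `scrape_all_pages` function are hypothetical stand-ins for a real site and a real browser, used only to show the bookkeeping a hand-written loop needs.

```python
# Hypothetical in-memory "site": each page lists its items and the next page, if any.
PAGES = {
    1: {"items": ["a", "b"], "next": 2},
    2: {"items": ["c"], "next": 3},
    3: {"items": ["d", "e"], "next": None},
}


def scrape_all_pages(pages, first_page):
    """Visit every page, starting with the first one, and collect its items."""
    results = []
    current = first_page
    while current is not None:           # loop until there is no next page
        page = pages[current]
        for item in page["items"]:       # work done inside each page
            results.append((current, item))
        current = page["next"]           # "turn the page"
    return results


all_items = scrape_all_pages(PAGES, 1)
print(all_items)
```

Even in this toy version, the loop variable, the stop condition and the page-turning step all have to be written and kept in sync by hand.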

But Helium Scraper takes the functional approach. In short, every action in Helium Scraper behaves like a loop whose body is every action below it. If every action produces a single value, the whole set of actions runs from top to bottom only once, but when an action outputs more than one value, the actions below it run once for each of these values. This works with any number of actions: if one action outputs 2 values and another action below it outputs 3 values each time it runs, any action below these two will run 6 times. This captures the basic notion in web scraping of drilling down into pages, categories, sub-categories and so on, without the need for long nested sets of for/while loops.
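To make the counting concrete, here is a small Python sketch of that idea. It is only a model of the behavior described above, not Helium Scraper's actual implementation: `run_actions` and the three toy actions are hypothetical names, and each "action" is simply a function that returns a list of values.

```python
def run_actions(actions, context=()):
    """Run actions top to bottom; everything below an action runs once per value it outputs."""
    if not actions:
        return [context]                  # bottom of the list: one finished row
    first, rest = actions[0], actions[1:]
    rows = []
    for value in first(context):          # the implicit loop around the actions below
        rows.extend(run_actions(rest, context + (value,)))
    return rows


def two_values(context):
    return ["A", "B"]                     # outputs 2 values


def three_values(context):
    return [1, 2, 3]                      # outputs 3 values each time it runs


def extract(context):
    return ["".join(str(v) for v in context)]   # outputs a single value


rows = run_actions([two_values, three_values, extract])
print(len(rows))   # 6: the last action ran 2 * 3 times
```

No nested for/while loops appear in the action list itself. The nesting is implied by the order of the actions.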

These lists of actions can also be used to define functions that take arguments and can in turn invoke other functions, including themselves! This opens the door to Turing-complete computation, which means you can perform the same complex calculations as in any general-purpose programming language, without the need to memorize and understand a myriad of keywords and operators. Helium Scraper does this by keeping the technical aspects under the hood and prioritizing the drilling-down aspect of web scraping.
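For readers who think in code, the self-invoking case corresponds to ordinary recursion. The Python sketch below is an analogy under assumed data, not Helium Scraper syntax: `CATEGORY_TREE` and `collect_products` are hypothetical, and the tree stands in for a site whose categories nest to an unknown depth.

```python
# Hypothetical category tree standing in for a site with nested sub-categories.
CATEGORY_TREE = {
    "name": "root",
    "products": [],
    "children": [
        {
            "name": "books",
            "products": ["novel"],
            "children": [
                {"name": "comics", "products": ["manga", "graphic novel"], "children": []},
            ],
        },
        {"name": "music", "products": ["album"], "children": []},
    ],
}


def collect_products(category, path=()):
    """Drill down through every sub-category by having the function call itself."""
    path = path + (category["name"],)
    found = [("/".join(path), product) for product in category["products"]]
    for child in category["children"]:
        found.extend(collect_products(child, path))   # recursive call
    return found


products = collect_products(CATEGORY_TREE)
for location, product in products:
    print(location, product)
```

Because the function calls itself, the same definition handles two levels of categories or twenty.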
