TypeScript + Helium Scraper

Since its inception, Helium Scraper has supported injecting and running JavaScript code. This makes it possible to perform complex calculations and access information that is not directly accessible to Helium Scraper, such as JSON data stored in elements and variables.

But mapping between JavaScript and Helium Scraper types has always been a hassle. Because JavaScript values carry no static type information, Helium Scraper has no way of knowing beforehand what type of data a call to Browser.EvalScript or Browser.RunScript will return. One way to solve this is to create a JSON Parser under Data Flow, but this can lead to run-time errors if our JavaScript data doesn't conform to our JSON schema, and maintaining these schemas can be tedious.
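To make the problem concrete, here is a minimal sketch of what hand-written validation looks like when a script result comes back untyped. It assumes the result arrives as a JSON string; the `Product` shape and the `isProduct` and `parseProduct` functions are hypothetical. Every field added on the page side means another check to keep in sync here, which is exactly the maintenance burden described above.

```typescript
// Hypothetical shape we expect the page-side script to return.
interface Product {
  name: string;
  price: number;
}

// Hand-written type guard: the "schema" lives in this function and must be
// updated by hand whenever the page-side data changes.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.name === "string" && typeof v.price === "number";
}

// Parse untyped JSON and fail at run time if it does not match the expected shape.
function parseProduct(json: string): Product {
  const data: unknown = JSON.parse(json);
  if (!isProduct(data)) {
    throw new Error("Script result does not match the expected shape");
  }
  return data;
}

console.log(parseProduct('{"name":"Roomba","price":199.99}').price); // 199.99
```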

For the last few months, we’ve been working with a new feature that, although still experimental, we’ve already used extensively and with great results, saving hours of development time. The idea is to have Helium Scraper compile TypeScript code into JavaScript, which is then injected and run in web pages by internally calling Browser.EvalScript or Browser.RunScript.

But compiling TypeScript is not the main point of this feature. The main point is to seamlessly convert TypeScript functions into Helium Scraper functions and map TypeScript types into Helium Scraper types, taking advantage of TypeScript’s type inference so that the whole mapping happens behind the scenes. Without further ado, let’s see how you can use this feature in the current version of Helium Scraper.

Setting up the environment

This feature requires Node.js and npm to be installed so your machine can compile TypeScript. If you already have these tools and a code editor installed, feel free to skip to the next section. Otherwise, head over to https://nodejs.org/ and install the latest version of Node.js. Usually, you’ll want to leave all installation options at their default values. This also installs npm, so all you need now is a code editor.

As an editor, I like to use VS Code because it requires zero configuration to start writing TypeScript code. If you do install VS Code, I recommend selecting the installer option that adds “Open with Code” to the Windows Explorer directory context menu, so you can quickly open Helium Scraper project directories in VS Code.

make-helium

Helium Scraper needs the make-helium npm package in order to import TypeScript code into projects. To install the package, run this command in a terminal or the Windows Command Prompt (if you get an error saying npm is not a recognized command, re-launch VS Code or CMD):

npm install -g make-helium

Once that’s installed, open Helium Scraper and open or create a project. Since the TypeScript import feature is still experimental, it can currently only be accessed through a keyboard shortcut. With a project loaded, press Ctrl+Alt+I (I as in import) in the Helium Scraper window. You should now see the message Source directory created in the log. Since there was no source directory yet, make-helium (which was run by Helium Scraper) created the src directory inside the project folder and opened a File Explorer window at the folder’s location.

If you installed VS Code and selected the appropriate option while installing it, you should be able to right-click the folder and select Open with Code. The folder contains a file called index.ts, which declares a constant called helium with two properties, eval and run. Functions you put in the eval property will be imported into Helium Scraper as globals with the Eval prefix and will internally use Browser.EvalScript. Functions in the run property will get the Run prefix and use Browser.RunScript internally.
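The naming rule can be illustrated with a small sketch. This is not make-helium’s code; it only reproduces the convention visible in this post, where a function named test under eval is imported as Eval.Test. The `heliumNames` helper is hypothetical, and the exact `Run.` form for run functions is an assumption made by analogy with the Eval examples.

```typescript
// Hypothetical helper: lists the Helium Scraper names that functions in a
// `helium`-style object would be imported under (Eval./Run. + capitalized name).
type AnyFunction = (...args: never[]) => unknown;

type HeliumExports = {
  eval: Record<string, AnyFunction>;
  run: Record<string, AnyFunction>;
};

const capitalize = (s: string) => s.charAt(0).toUpperCase() + s.slice(1);

function heliumNames(exports: HeliumExports): string[] {
  return [
    ...Object.keys(exports.eval).map(name => `Eval.${capitalize(name)}`),
    ...Object.keys(exports.run).map(name => `Run.${capitalize(name)}`), // assumed form
  ];
}

function test() {
  return { one: "hello world!", two: 12.34 };
}

const helium = {
  eval: { test },
  run: {},
};

console.log(heliumNames(helium)); // [ 'Eval.Test' ]
```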

Hello World

For a quick test, let’s add some eval code. Replace the contents of index.ts with this:

function test() {
    return { one: "hello world!", two: 12.34 }
}

const helium = { 
    eval: { 
        test
    },
    run: { }
}

Then save the file, go back to Helium Scraper and press Ctrl+Alt+I again. This time you should see Import Succeeded in the log, which means the test function was successfully imported. In a blank global, type:

Eval.Test

Then run the global, and a new table will be created, which will contain two columns called one and two, as well as the data returned by the test TypeScript function. Note how the table structure mirrors the output type of the test function.
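Nothing in test is annotated, yet TypeScript already knows its return type, and that inferred type is what the import works from. A quick way to see it for yourself is TypeScript’s built-in `ReturnType` utility type; the `TestRow` alias below is just for illustration.

```typescript
function test() {
  return { one: "hello world!", two: 12.34 };
}

// TypeScript infers { one: string; two: number } without any annotations.
type TestRow = ReturnType<typeof test>;

// The compiler enforces the inferred shape: `one` must be a string and
// `two` must be a number, matching the table's two columns.
const row: TestRow = test();
const columns = Object.keys(row); // the property names, in declaration order

console.log(columns); // [ 'one', 'two' ]
```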

An Actual Use Case

But the fun part begins when we run TypeScript in pages we want to scrape. As an example, let’s extract keyword suggestions from Amazon. Note that the following extraction could also be achieved without any TypeScript or JavaScript, but those of you who are new to Helium Scraper may find it easier to use a language you’re already familiar with.

To begin, head over to amazon.com using Helium Scraper’s browser. Also, let’s configure Helium Scraper to run on the main browser so everything just runs in the current page. Go to Project > Settings and set Use Main Browser to True. Then replace the previous code in index.ts with this:

// Utility function to add async delays between actions
const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

async function suggestions(keyword: string) {
    let input = 
        document.querySelector<HTMLInputElement>('#twotabsearchtextbox')!

    input.blur() // This is needed for some reason
    await delay(100) // Just in case

    input.value = keyword // Set our keyword
    await delay(100) // Just in case

    input.focus() // This loads the suggestion list
    await delay(2000) // Increase if suggestions take longer to load

    let divs = 
        document.querySelectorAll<HTMLDivElement>('#suggestions > div')

    return [...divs].map(e => ({
        keyword, 
        result: e.innerText,
    }))
}

const helium = { 
    eval: { 
        suggestions
    },
    run: { }
}
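The fixed delay(2000) is the simplest approach, but it either wastes time when suggestions load quickly or fails when they load slowly. An alternative is to poll until a condition holds or a timeout expires. The `waitFor` helper below is a hypothetical sketch, not part of make-helium; inside suggestions you could, for example, replace the 2-second delay with `await waitFor(() => document.querySelectorAll('#suggestions > div').length > 0)`. The demo at the end uses a timer-driven flag so it runs outside a browser.

```typescript
const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Poll `condition` every `intervalMs` until it returns true or `timeoutMs` elapses.
// Resolves to true if the condition was met, false on timeout.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (condition()) return true;
    await delay(intervalMs);
  }
  return condition(); // one last check at the deadline
}

// Demo outside a browser: a flag that flips after 300 ms.
let ready = false;
setTimeout(() => { ready = true; }, 300);

waitFor(() => ready, 2000).then(ok => console.log(ok)); // true
```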

Note that the suggestions function is async; functions like this are supported out of the box. Also, this time the function receives a string parameter. Setting the input box to this keyword causes the list of suggestions to load, and the function then returns those suggestions together with the original keyword. It is important to note that a returned list is only exported as a table if it’s a list of objects, since Helium Scraper uses the structure of these objects to generate the output table.
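For example, returning the raw suggestion strings would give a list of strings rather than a list of objects, so it would not be exported as a table. A minimal fix is to wrap each string in an object before returning it, as suggestions already does. The `toRows` helper below is hypothetical, and the made-up strings stand in for the innerText values so the example runs outside a browser.

```typescript
// Shape of each row: the property names become the output table's columns.
interface SuggestionRow {
  keyword: string;
  result: string;
}

// A plain string[] would not become a table; wrapping each string in an
// object gives Helium Scraper a structure to derive columns from.
function toRows(keyword: string, texts: string[]): SuggestionRow[] {
  return texts.map(result => ({ keyword, result }));
}

const rows = toRows("roomba", ["first suggestion", "second suggestion"]);
console.log(rows);
```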

Before importing the code, be sure to delete the previous reference to Eval.Test in your global; otherwise, you’ll get an error saying that this function is not defined. Then press Ctrl+Alt+I again, and a new Eval.Suggestions function will be available. You can now run something like this in a global while amazon.com is loaded in Helium Scraper’s browser, and it will produce a table with the keyword and result columns, containing the input keyword and the extracted suggestions:

Eval.Suggestions
   ·  "roomba"

The results can also be put into variables to be used somewhere else:

Eval.Suggestions
   ·  "roomba"
as (keyword result)
[…]

Note that TypeScript lists are turned into sequences, which means that any actions below will run for each keyword/result pair on the list.
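If it helps to think of this in TypeScript terms, the as (keyword result) binding behaves roughly like a `for...of` loop that destructures each row, with the actions below it as the loop body. This is only an analogy, not how Helium Scraper executes sequences, and the sample rows are placeholders.

```typescript
interface SuggestionRow {
  keyword: string;
  result: string;
}

// Placeholder rows standing in for the sequence returned by Eval.Suggestions.
const rows: SuggestionRow[] = [
  { keyword: "roomba", result: "first suggestion" },
  { keyword: "roomba", result: "second suggestion" },
];

// Roughly analogous to `as (keyword result)`: the body runs once per row,
// with `keyword` and `result` bound to that row's values.
const visited: string[] = [];
for (const { keyword, result } of rows) {
  visited.push(`${keyword} -> ${result}`);
}

console.log(visited.length); // 2
```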

Wrapping-up

The feature does its best to convert TypeScript types into Helium Scraper types: lists become sequences, objects become structures, and strings, numbers, and booleans are matched to their Helium Scraper equivalents. Any type Helium Scraper doesn’t recognize is turned into a string (because everything can be turned into a string). To see the resulting type of a function, hover over it. For instance, the function above has the type (SEQUENCE { STRING STRING }), which is a sequence of structures with two string fields. For now, union and product types are not supported.
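The conversion rules can be approximated by a small function that walks a sample value and prints a Helium-style type description. This is an illustration of the rules above, not make-helium’s implementation: make-helium works from TypeScript’s static types, whereas this hypothetical `describe` function inspects a runtime value. Only STRING, SEQUENCE, and the { ... } structure notation appear in this post; the NUMBER and BOOLEAN names are placeholders.

```typescript
// Hypothetical sketch: describe a sample value using the mapping rules above.
// NUMBER and BOOLEAN are placeholder names; STRING and SEQUENCE come from the post.
function describe(value: unknown): string {
  if (typeof value === "string") return "STRING";
  if (typeof value === "number") return "NUMBER";
  if (typeof value === "boolean") return "BOOLEAN";
  if (Array.isArray(value)) {
    // Lists become sequences. An empty sample has no element to inspect,
    // so this sketch falls back to STRING for the element type.
    const element = value.length > 0 ? describe(value[0]) : "STRING";
    return `(SEQUENCE ${element})`;
  }
  if (typeof value === "object" && value !== null) {
    // Objects become structures; each property contributes one field.
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj).map(key => describe(obj[key]));
    return `{ ${fields.join(" ")} }`;
  }
  // Anything unrecognized (null, undefined, functions, ...) becomes a string.
  return "STRING";
}

console.log(describe([{ keyword: "roomba", result: "suggestion" }]));
// (SEQUENCE { STRING STRING })
```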

There are a few functions declared in the types.ts file that you may find useful, in particular webRequest, which runs synchronous requests with no cross-domain restrictions, and the getSelector and getElementsBySelector pair, which lets you select elements by name using Helium Scraper selectors.

As I mentioned before, this feature has already saved us countless hours of development time, so I hope anyone already using JavaScript in Helium Scraper can benefit from it too. For those who are familiar with JavaScript but not TypeScript, the TypeScript handbook has a quick introduction called TypeScript for JavaScript Programmers. As you can see in the samples above, TypeScript is basically just JavaScript with type annotations and type inference!
