Saving full HTML of URLs

Questions & Answers about Helium Scraper 3
sawal86
Posts: 4
Joined: Tue Jun 19, 2018 2:16 pm

Post by sawal86 » Sun May 03, 2020 11:51 pm

Hello!
I have a list of URLs, and I just want each page saved as an HTML file on my PC.

I want to use a selector's value as the name of each HTML file, for example, Select.CompanyName.

Is this possible in Helium Scraper 3?

If so, could you please help with the template?

Regards.
Aleks.

webmaster
Site Admin
Posts: 501
Joined: Mon Dec 06, 2010 8:39 am

Re: Saving full HTML of URLs

Post by webmaster » Tue May 19, 2020 10:27 pm

You can use Gather.HTML to get the HTML of the current page (or of a particular element, when that element is selected). Since version 3.2.4.8, you can also use Sequence.WriteFile to write files with arbitrary text content. In your case, assuming every page you visit has a title element matched by a selector called Title, you could do something like this:

Query.URLs
as (url)
Browser.Load
   ·  url
extract
   html
      Gather.HTML
      as html
      Select.Title
      as title
      Sequence.WriteFile
         ·  html
         ·  +
               ·  title
               ·  ".html"
         ·  false

Juan Soldi
The Helium Scraper Team
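
For readers working outside Helium Scraper, the same idea (save each page's HTML under a file name taken from a value on the page) can be sketched in Python using only the standard library. This is not Helium Scraper code and does not reproduce how Sequence.WriteFile behaves; the helpers TitleParser, safe_filename, save_page and save_urls are hypothetical names for illustration. The sketch assumes the <title> element holds the name you want (swap in whatever field you need, such as a company name) and that a plain HTTP request returns the page. urllib does not run JavaScript, so the saved HTML may differ from what a browser-based scraper captures. It also replaces characters like / : ? that Windows does not allow in file names, which the template above does not handle.

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def safe_filename(name, fallback="page"):
    # Replace characters Windows forbids in file names, then drop
    # leading/trailing spaces and dots; fall back if nothing is left.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]+', "_", name).strip(" .")
    return cleaned or fallback


def save_page(html, out_dir):
    """Write html to out_dir/<title>.html and return the Path written."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    path = Path(out_dir) / (safe_filename(parser.title.strip()) + ".html")
    path.write_text(html, encoding="utf-8")
    return path


def save_urls(urls, out_dir):
    """Fetch each URL (no JavaScript is run) and save its raw HTML."""
    paths = []
    for url in urls:
        with urlopen(url) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            html = resp.read().decode(charset, errors="replace")
        paths.append(save_page(html, out_dir))
    return paths
```

For example, save_urls(["https://example.com/"], ".") would fetch that page and write it to the current folder. Pages that share a title overwrite each other's files, so you may want to append a counter or part of the URL when names repeat.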