Infinite Scroll

Premades & Resources to be used with Helium Scraper 3

Post by webmaster » Fri Jan 04, 2019 7:56 am

This function handles pages with infinite scroll, even when the scrollable area is not the document itself but an element inside the page. For a quick start, check out this video tutorial on how to import and use this premade from within Helium Scraper.

This function takes the following arguments:
  • itemSelector: A selector that selects the items in the scrollable area, typically each of the result items. After scrolling down, the previously extracted elements are deleted to minimize memory consumption (see removeOldElements). These elements can usually be selected automatically using the Detect List button on the page.
  • scrollDelay: The time, in milliseconds, to wait for new data to load after scrolling down.
  • removeOldElements: Whether to remove old elements after they've been extracted. Should be true unless removing elements breaks the page's scrolling mechanism.
When the action runs, it selects each of the list items in the browser. So when extracting from a list, the list items should not be selected again anywhere below this action or above the extract action.
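
To make the role of the three arguments more concrete, here is a rough Python simulation of the kind of loop described above. It is only a sketch and not how Helium Scraper is implemented: FakePage, infinite_scroll and the stop condition (stop when a scroll loads nothing new) are all assumptions made up for illustration.

```python
import time
from typing import Iterator, List


class FakePage:
    """Hypothetical stand-in for a page with an infinitely scrolling list."""

    def __init__(self, batch_size: int, total: int) -> None:
        self.batch_size = batch_size
        self.total = total
        self.loaded = 0            # how many items have been loaded so far
        self.dom: List[str] = []   # items currently present in the page
        self.scroll_to_bottom()    # the first batch is present on page load

    def scroll_to_bottom(self) -> None:
        """Load the next batch, as a real page would after scrolling."""
        end = min(self.loaded + self.batch_size, self.total)
        self.dom.extend(f"item-{i}" for i in range(self.loaded, end))
        self.loaded = end


def infinite_scroll(
    page: FakePage, scroll_delay_ms: int, remove_old_elements: bool
) -> Iterator[str]:
    """Yield every list item once, scrolling whenever the page runs out."""
    seen = 0  # index of the first item in page.dom not yet yielded
    while True:
        fresh = page.dom[seen:]
        if not fresh:
            return  # assumed stop condition: scrolling loaded nothing new
        for item in fresh:
            yield item
        if remove_old_elements:
            page.dom.clear()  # drop extracted items to keep memory flat
            seen = 0
        else:
            seen = len(page.dom)  # keep old items, remember where we stopped
        page.scroll_to_bottom()
        time.sleep(scroll_delay_ms / 1000)  # give new data time to load
```

In this sketch, passing True for remove_old_elements leaves the simulated page empty once everything has been yielded, while passing False leaves every item in the page, which is the memory trade-off the removeOldElements argument is about.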

To use this in your project, follow these instructions:

function (itemSelector scrollDelay removeOldElements)
   Browser.ScrollLoop
      ·  itemSelector
      ·  Sequence.First
            ·  itemSelector
         Browser.ScrollToBottom
         Browser.Wait
            ·  scrollDelay
      ·  removeOldElements

The maximum number of elements to extract can be limited with the Sequence.Take function, as in the following example. The example assumes that the project already contains a selector called ListItem that selects the list items, and that the code above was pasted under a global called InfiniteScroll:

Sequence.Take
   ·  500
   ·  InfiniteScroll
         ·  Select.ListItem
         ·  100
         ·  true

Note that the example above stops after 500 list items, not after 500 pages. A single page usually contains many list items.
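
The difference between limiting items and limiting pages can be shown with a small Python sketch. It assumes that Sequence.Take behaves like taking the first N items of a lazily produced sequence. The names ScrollCounter, scrolled_items and take are made up for illustration and are not part of Helium Scraper.

```python
from itertools import islice
from typing import Iterator, List


class ScrollCounter:
    """Hypothetical helper that counts how many batches were loaded."""

    def __init__(self) -> None:
        self.scrolls = 0


def scrolled_items(items_per_scroll: int, counter: ScrollCounter) -> Iterator[str]:
    """Endless lazy sequence: each simulated scroll loads one batch of items."""
    index = 0
    while True:
        counter.scrolls += 1  # one scroll loads one batch
        for _ in range(items_per_scroll):
            yield f"item-{index}"
            index += 1


def take(limit: int, sequence: Iterator[str]) -> List[str]:
    """Return at most `limit` items without consuming the rest of the sequence."""
    return list(islice(sequence, limit))


counter = ScrollCounter()
items = take(500, scrolled_items(20, counter))
# 500 items at 20 items per scroll need 25 scrolls, not 500.
```

With 20 items loaded per scroll, the limit of 500 is reached after 25 scrolls, so the number passed to the take step counts list items and says nothing about how many times the page scrolls.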
Juan Soldi
The Helium Scraper Team