Auto-Distribute URLs among processes

Here we will be posting premade Helium Scraper projects and helpful stuff.
webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Post by webmaster » Fri Jun 08, 2012 4:17 am

This project takes a list of URLs and lets you perform any extract action on each of them. It is particularly suited to long lists because, on the one hand, it distributes the URLs among multiple processes and, on the other, it keeps track of which URLs have already been extracted. If the extraction stops for any reason, URLs that were already extracted will not be visited again the next time the project is run.
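
The resume behaviour described above can be pictured with a minimal Python sketch. This is not how Helium Scraper implements it internally; `pending_urls`, `run`, the `extract` callback, and the in-memory `dones` set are hypothetical stand-ins for the _URLs table, the Extract actions tree, and the Dones table.

```python
def pending_urls(urls, dones):
    """Return the URLs not yet recorded as done, preserving list order."""
    return [u for u in urls if u not in dones]


def run(urls, dones, extract):
    """Visit each pending URL and record it as done once extracted."""
    for url in pending_urls(urls, dones):
        extract(url)    # hypothetical stand-in for the Extract actions tree
        dones.add(url)  # hypothetical stand-in for a row in the Dones table


# Illustrative data only: one URL was extracted in an earlier, interrupted run.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
dones = {"http://example.com/a"}
visited = []
run(urls, dones, visited.append)
```

In this sketch, a second call to `run` with the same `dones` set would visit nothing, which mirrors the effect of not revisiting already extracted URLs.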

To use it follow these steps:
  • Paste your URLs into the URL column of the _URLs table and save the table.
  • Export and connect to the database using the Export Database -> Export and Connect command in the database panel.
  • Place your extraction logic inside the Extract actions tree (this tree will run at each URL in your list).
  • Save your project.
  • Run the Run actions tree.
Extra options:
  • To forget which URLs have been extracted (and extract them again), run the Reset Dones actions tree, which clears the Dones data table that keeps track of extracted URLs.
  • To extract the current URL's ID, extract the VAR_URL_ID property from the BODY kind in your Extract action.
  • To change the number of simultaneous processes, expand the Run actions tree, double-click the Start Processes action, and change the Max. Simultaneous Processes property.
  • To change the number of URLs extracted per process, change the value of the groupSize variable in the Execute JS (Make Groups) action inside the Run actions tree.
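
To illustrate how a group size and a cap on simultaneous workers interact, here is a rough Python sketch. It is an assumption-laden analogy, not the project's actual Make Groups script: `make_groups`, `process_group`, and `run_groups` are hypothetical names, `group_size` plays the role of groupSize, `max_workers` plays the role of Max. Simultaneous Processes, and threads are used in place of separate Helium Scraper processes.

```python
from concurrent.futures import ThreadPoolExecutor


def make_groups(urls, group_size):
    """Split the URL list into consecutive groups of at most group_size items."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [urls[i:i + group_size] for i in range(0, len(urls), group_size)]


def process_group(group):
    """Placeholder for the per-URL extraction; here it just upper-cases each URL."""
    return [url.upper() for url in group]


def run_groups(groups, max_workers):
    """Process the groups with at most max_workers running at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input groups.
        return list(pool.map(process_group, groups))


# Illustrative data only.
urls = [f"http://example.com/{n}" for n in range(5)]
groups = make_groups(urls, group_size=2)
results = run_groups(groups, max_workers=2)
```

With five URLs and a group size of two, the sketch produces three groups, the last holding a single URL. A larger group size means fewer, longer-running workers; a smaller one means more, shorter-lived ones.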
Attachments
AutoDistributeURLs.hsp
(515.4 KiB) Downloaded 1055 times
Juan Soldi
The Helium Scraper Team

leonardocunha
Posts: 1
Joined: Mon Oct 17, 2016 3:06 pm

Re: Auto-Distribute URLs among processes

Post by leonardocunha » Mon Oct 17, 2016 6:24 pm

Hello!
I've tried running this, but Helium Scraper reports something like "The system cannot find the file specified".
I also tried the online premade project "Multi-process Navigate URLs", but got the same message.

Could you please tell me what might be wrong? I am using Windows 7 Ultimate 64-bit (Brazilian Portuguese).

Error details:
<?xml version="1.0" encoding="utf-16"?>
<ErrorReport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<InnerError>
<Message>O sistema não pode encontrar o arquivo especificado</Message>
<TypeName>System.ComponentModel.Win32Exception</TypeName>
</InnerError>
<Message>O sistema não pode encontrar o arquivo especificado</Message>
<TypeName>Player.Actions.StopException</TypeName>
<StackTrace> em Player.Actions.Executor.‭‏‌‫​‏​‌‏‌‏‎‭‮‪‏‌‍‍‍‍‪‍‬‮()</StackTrace>
<TimeStamp>2016-10-17T16:22:43.4659935-02:00</TimeStamp>
<Version>2.4.3.2</Version>
<WorkingSet>246300672</WorkingSet>
<Is64Bit>true</Is64Bit>
<WinVersion>Microsoft Windows NT 6.1.7601 Service Pack 1</WinVersion>
<BrowserVersion>11.0.9600.18500</BrowserVersion>
</ErrorReport>


Thanks!
