Auto-Distribute URLs among processes

Posted: Fri Jun 08, 2012 4:17 am
by webmaster
This project takes a list of URLs and lets you perform any extract action on each of them. It is particularly suited to long lists because, on the one hand, it distributes the URLs among multiple processes and, on the other, it keeps track of which URLs have already been extracted. If the extraction stops for any reason, the next time the project is run, URLs that were already extracted will not be visited again.
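
The resume behavior described above amounts to filtering the URL list against the set of IDs already recorded as done. Here is a minimal sketch of that idea in plain JavaScript. The function name pendingUrls and the row shape { id, url } are assumptions made for illustration; they are not part of Helium Scraper or of this project's actual code.

```javascript
// Hypothetical helper, not part of Helium Scraper's API.
// urls: rows shaped like { id, url } (assumed shape, standing in for the _URLs table).
// doneIds: IDs already recorded as extracted (the role the Dones table plays).
function pendingUrls(urls, doneIds) {
  const done = new Set(doneIds); // Set gives constant-time membership checks
  return urls.filter((row) => !done.has(row.id));
}

const urls = [
  { id: 1, url: "http://example.com/a" },
  { id: 2, url: "http://example.com/b" },
  { id: 3, url: "http://example.com/c" },
];

// Suppose URLs 1 and 3 were extracted before the previous run stopped.
const remaining = pendingUrls(urls, [1, 3]);
console.log(remaining.map((row) => row.url)); // [ 'http://example.com/b' ]
```

Clearing the list of done IDs makes every URL pending again.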

To use it, follow these steps:
  • Paste your URLs into the URL column of the _URLs table and save the table.
  • Export and connect to the database using the Export Database -> Export and Connect command in the database panel.
  • Place your extraction logic inside the Extract actions tree (this tree will run at each URL in your list).
  • Save your project.
  • Run the Run actions tree.
Extra options:
  • To forget which URLs have been extracted (and extract them again), run the Reset Dones actions tree, which clears the Dones data table that keeps track of extracted URLs.
  • To extract the current URL's ID, extract the VAR_URL_ID property from the BODY kind in your Extract action.
  • To change the number of simultaneous processes, expand the Run actions tree, double-click the Start Processes action and change the Max. Simultaneous Processes property.
  • To change the number of URLs to be extracted per process, change the value of the groupSize variable in the Execute JS (Make Groups) action inside the Run actions tree.
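
To illustrate what a group size means for the distribution, here is a minimal sketch that splits a list into consecutive groups of at most groupSize items, one group per process. Only the variable name groupSize comes from the project. The function makeGroups and the sample IDs are assumptions, and the real Execute JS (Make Groups) action may be implemented differently.

```javascript
// Hypothetical sketch, not the project's actual Make Groups script.
// Splits items into consecutive groups of at most groupSize elements.
function makeGroups(items, groupSize) {
  if (!Number.isInteger(groupSize) || groupSize < 1) {
    throw new RangeError("groupSize must be a positive integer");
  }
  const groups = [];
  for (let i = 0; i < items.length; i += groupSize) {
    groups.push(items.slice(i, i + groupSize)); // the last group may be shorter
  }
  return groups;
}

const urlIds = [1, 2, 3, 4, 5, 6, 7]; // sample URL IDs, made up for the example
const groups = makeGroups(urlIds, 3);
console.log(groups); // [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7 ] ]
```

With this scheme, a larger groupSize means fewer, longer-running processes, and a smaller one means more, shorter-lived processes.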

Re: Auto-Distribute URLs among processes

Posted: Mon Oct 17, 2016 6:24 pm
by leonardocunha
Hello!
I've tried running this, but Helium Scraper says something like "System cannot find specified file".
I also tried the Online Premade "Multi-process Navigate URLs", but got the same message.

Can you please tell me what could be wrong? I use Windows 7 64-bit.
(Windows 7 Ultimate 64-bit, Brazilian Portuguese)

Error details:
<?xml version="1.0" encoding="utf-16"?>
<ErrorReport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<InnerError>
<Message>O sistema não pode encontrar o arquivo especificado</Message>
<TypeName>System.ComponentModel.Win32Exception</TypeName>
</InnerError>
<Message>O sistema não pode encontrar o arquivo especificado</Message>
<TypeName>Player.Actions.StopException</TypeName>
<StackTrace> em Player.Actions.Executor.‭‏‌‫​‏​‌‏‌‏‎‭‮‪‏‌‍‍‍‍‪‍‬‮()</StackTrace>
<TimeStamp>2016-10-17T16:22:43.4659935-02:00</TimeStamp>
<Version>2.4.3.2</Version>
<WorkingSet>246300672</WorkingSet>
<Is64Bit>true</Is64Bit>
<WinVersion>Microsoft Windows NT 6.1.7601 Service Pack 1</WinVersion>
<BrowserVersion>11.0.9600.18500</BrowserVersion>
</ErrorReport>


Thanks!