PDF download behind multiple "javascript calls"

Posted: Thu Aug 08, 2013 2:45 pm
by PaterIvlia
I am trying to batch download several public records that are freely available on a government site (for academic doctoral research). The sequence for retrieving each individual record is as follows.
(1) Go to the site's query page, enter the query string and hit "search" (URL: http://www.stj.jus.br/SCON/).
(2) Results are displayed in groups of ten, and the researcher must click on each link (example syntax: <a class="opcoes_jurisprudencia" href="javascript:inteiro_teor('/SCON/servlet/BuscaAcordaos?action=mostrar&num_registro=201102589950&dt_publicacao=02/08/2013')">Íntegra do<br>Acórdão</a>).
(3) In a new browser window (thus outside Helium Scraper), the ensuing page prompts the user to select the document format (HTML or PDF, with PDF preselected). The page then returns several links, one to the document as a whole and others to parts of it (example: https://ww2.stj.jus.br/revistaeletronic ... 02/08/2013).
(4) Clicking the link for the whole document (example: javascript:AbreDocumento('Abre_Documento.asp?sSeq=1246946&sReg=201102589950&sData=20130802')) serves the PDF file (example: https://ww2.stj.jus.br/revistaeletronic ... ormato=PDF).
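
To illustrate what I mean by "javascript calls": the address each link points to appears to sit inside the href as a quoted argument. Below is a minimal Python sketch (plain Python, not Helium Scraper) of pulling that argument out and resolving it against the page URL. It assumes, without my having confirmed it, that functions such as inteiro_teor simply open the quoted path in a new window. The names JS_CALL and extract_js_target are made up for this example.

```python
import re
from urllib.parse import urljoin

# Matches href values of the form javascript:some_function('some/path?query').
# Assumption: the function takes exactly one single-quoted string argument.
JS_CALL = re.compile(r"^javascript:\s*(\w+)\(\s*'([^']*)'\s*\)\s*;?\s*$")


def extract_js_target(href, base_url):
    """Return (function_name, absolute_url) for a javascript: href, else None."""
    match = JS_CALL.match(href.strip())
    if match is None:
        return None  # an ordinary link, or a call shape this sketch does not handle
    func_name, relative = match.groups()
    # Resolve the quoted path against the page it was found on.
    return func_name, urljoin(base_url, relative)


if __name__ == "__main__":
    href = ("javascript:inteiro_teor('/SCON/servlet/BuscaAcordaos?action=mostrar"
            "&num_registro=201102589950&dt_publicacao=02/08/2013')")
    print(extract_js_target(href, "http://www.stj.jus.br/SCON/"))
```

Whether the resolved address can be fetched directly, without the intermediate format-selection page in step (3), is something I have not verified.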

Is Helium Scraper capable of automating the download of the PDF files pertaining to the search, in a way a doctoral researcher would be able to program? Any ideas on how?

Cheers.