Download PDF

Questions & Answers about Helium Scraper 3
durlecs
Posts: 11
Joined: Sat Sep 28, 2019 2:11 am

Post by durlecs » Wed Oct 02, 2019 2:35 pm

I am wondering how to download a PDF file when there is no direct link to it. I can view the PDF, and there is a download button at the top of the page, but I cannot create a Selector for that button: when I click on it, nothing is selected. I have attached a screenshot of what I am talking about, along with the "Inspect Element" result from Firefox. I can't give you a direct link to the PDF. To see the actual page, you would need to go through the whole navigation process, which includes entering information and solving a Captcha. How do I download this PDF?
Attachments: Screen Shot 2019-10-02 at 10.44.45 AM.png (37.63 KiB), Screen Shot 2019-10-02 at 10.37.37 AM.png (146.8 KiB)

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Download PDF

Post by webmaster » Thu Oct 03, 2019 8:04 am

Is the URL on the address bar the PDF address? If so, you can get it with Gather.URL without having to select anything.

Or you should be able to select the button like:

SelectBy.Css
   ·  "button#download"

You can then click it with a Browser.Click action, although I'm not sure that will work.

There must be a way to get the actual URL, perhaps without even having to load the PDF in the browser, but I'd have to look at the actual page.

Juan Soldi
The Helium Scraper Team

durlecs
Posts: 11
Joined: Sat Sep 28, 2019 2:11 am

Re: Download PDF

Post by durlecs » Thu Oct 03, 2019 12:49 pm

I am not able to download the PDF or find the link to the actual PDF where I can download it. Could you please look at the page and see if there is anything that can be done to download the PDF? Here is a sample case number and how to get to the PDF.

1. Navigate to https://myeclerk.myorangeclerk.com/Cases/Search

2. Enter the case number, 2019-CA-011283-O, into the Case Number field, solve the Captcha, then click Search.

3. On the search results page, click on the case number.

4. In Docket Events, click on "Motion for Mediation" and it will take you to the PDF file that I would like to download.

I was able to download this in Helium 2 but am unable to do it in 3. Any help is greatly appreciated. Thanks

Edit: Just tried in Helium 2 and can no longer do it. When I follow the link it takes me to a login page rather than downloading the PDF. I do not have a login account for this site but shouldn't need it for public access and PDF downloading. When normally browsing, I can download the PDF just fine.

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Download PDF

Post by webmaster » Thu Oct 03, 2019 4:15 pm

I just tested this and it worked. Just download the URL behind the link:

Select.MotionForMediation
Gather.Link
as url
extract
   pdf
      Browser.Download
         ·  url

Make sure the MotionForMediation selector selects the actual link. I tested this by manually following your steps up to, but not including, step 4, and then running the code above in the main browser. The downloaded file has no extension, but you can add one by using Browser.DownloadAs instead.

I don't think you will be able to extract all the URLs first and download them afterwards, since the download may need cookies that are created when you visit the actual page. Going to the page (manually or automatically) and then running the code above does seem to work, though.
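
For anyone who wants to see the cookie point outside Helium Scraper, here is a minimal Python sketch of the same idea: visit the page first so its cookies are stored, then download the file through the same session. It is not Helium code, and the function names (make_session, download_after_visit) are made up for illustration. It assumes the site only needs ordinary HTTP cookies, which I have not confirmed for this site.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Build an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def download_after_visit(opener, page_url, file_url, dest_path):
    """Open page_url first (collecting any cookies), then save file_url to dest_path."""
    # Step 1: load the page that contains the link. Any Set-Cookie headers
    # in the response are kept in the opener's cookie jar.
    with opener.open(page_url) as page:
        page.read()
    # Step 2: request the file with the same opener, so the stored cookies
    # are sent along with the request.
    with opener.open(file_url) as response, open(dest_path, "wb") as out:
        out.write(response.read())
    return dest_path
```

A call would look like download_after_visit(make_session(), case_page_url, pdf_link_url, "motion.pdf"), where both URLs are placeholders for the case page and the link found on it. Fetching the link with a fresh session instead would skip step 1, which may be why the saved URL led to a login page.
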
Juan Soldi
The Helium Scraper Team

durlecs
Posts: 11
Joined: Sat Sep 28, 2019 2:11 am

Re: Download PDF

Post by durlecs » Fri Oct 04, 2019 3:23 pm

That works to download the PDF. Thank you. Now I am trying to use Browser.DownloadAs to create a unique name for the file and add the extension. I do an extract on the case number with Output Result to get a variable called caseNum which has the case number as a string. I would like to name the PDF file as (caseNum + ".pdf") but I am not getting it to work. What would be the correct way to do this?

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Download PDF

Post by webmaster » Fri Oct 04, 2019 4:10 pm

Helium uses prefix notation for operators:

Browser.DownloadAs
   ·  url
   ·  +
         ·  caseNum
         ·  ".pdf"
   ·  false
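
If prefix notation is unfamiliar, the following Python sketch may help. It is only an illustration of the idea, not how Helium evaluates anything: the evaluate function and the tuple format are hypothetical. The operator comes first and its arguments follow, so + caseNum ".pdf" means the same as caseNum + ".pdf" in infix notation.

```python
import operator

# Hypothetical mini-evaluator: an expression is either a plain value or a
# tuple of the form (operator_symbol, arg1, arg2, ...), operator first.
OPS = {"+": operator.add}


def evaluate(expr):
    """Evaluate a prefix expression; nested tuples are evaluated first."""
    if isinstance(expr, tuple):
        symbol, *args = expr
        values = [evaluate(arg) for arg in args]
        result = values[0]
        for value in values[1:]:
            result = OPS[symbol](result, value)
        return result
    return expr


case_num = "2019-CA-011283-O"  # sample case number from earlier in the thread
file_name = evaluate(("+", case_num, ".pdf"))
print(file_name)  # prints 2019-CA-011283-O.pdf
```
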
Juan Soldi
The Helium Scraper Team

durlecs
Posts: 11
Joined: Sat Sep 28, 2019 2:11 am

Re: Download PDF

Post by durlecs » Sat Oct 05, 2019 2:28 am

Works great now. Thanks

Stevensek
Posts: 1
Joined: Mon May 11, 2020 4:54 pm
Location: Sweden

Answer

Post by Stevensek » Tue May 12, 2020 5:46 pm

Thanks for the guide :)
Stupid is as stupid does
