{"id":544,"date":"2019-12-11T05:27:34","date_gmt":"2019-12-11T05:27:34","guid":{"rendered":"https:\/\/www.heliumscraper.com\/blog\/?p=544"},"modified":"2021-07-31T10:07:43","modified_gmt":"2021-07-31T10:07:43","slug":"scraping-from-linkedin","status":"publish","type":"post","link":"https:\/\/www.heliumscraper.com\/blog\/scraping-from-linkedin\/","title":{"rendered":"Scraping from LinkedIn"},"content":{"rendered":"\n<p>We&#8217;ve created a ready-made template that can be used to extract people and company information from LinkedIn. A LinkedIn account is required for the extraction to work. Check with support to see how many profiles\/companies you&#8217;re allowed to view per day; otherwise, your account could get banned.<\/p>\n\n\n\n<h4>Getting started<\/h4>\n\n\n\n<p>To get started, <a href=\"https:\/\/www.heliumscraper.com\/forum\/download\/file.php?id=432\">download the template<\/a>, place it in an empty folder, and then open it with Helium Scraper. This project can extract both people and companies. For each, it can extract either the top-level information or the details. 
The top-level information is what&#8217;s directly available in search results, such as these: <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"837\" height=\"475\" src=\"https:\/\/www.heliumscraper.com\/blog\/wp-content\/uploads\/2019\/12\/searchresults.png\" alt=\"\" class=\"wp-image-545\" srcset=\"https:\/\/www.heliumscraper.com\/blog\/wp-content\/uploads\/2019\/12\/searchresults.png 837w, https:\/\/www.heliumscraper.com\/blog\/wp-content\/uploads\/2019\/12\/searchresults-300x170.png 300w, https:\/\/www.heliumscraper.com\/blog\/wp-content\/uploads\/2019\/12\/searchresults-768x436.png 768w\" sizes=\"(max-width: 837px) 100vw, 837px\" \/><\/figure>\n\n\n\n<p>The detail information is what&#8217;s available on individual profile and company pages, such as this: <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" width=\"837\" height=\"482\" src=\"https:\/\/www.heliumscraper.com\/blog\/wp-content\/uploads\/2019\/12\/billgates.png\" alt=\"\" class=\"wp-image-546\" srcset=\"https:\/\/www.heliumscraper.com\/blog\/wp-content\/uploads\/2019\/12\/billgates.png 837w, https:\/\/www.heliumscraper.com\/blog\/wp-content\/uploads\/2019\/12\/billgates-300x173.png 300w, https:\/\/www.heliumscraper.com\/blog\/wp-content\/uploads\/2019\/12\/billgates-768x442.png 768w\" sizes=\"(max-width: 837px) 100vw, 837px\" \/><\/figure>\n\n\n\n<p>After loading the project file, open the&nbsp;<strong>Settings<\/strong>&nbsp;global to configure the project.<\/p>\n\n\n\n<h4><strong>Top-level extraction settings<\/strong><\/h4>\n\n\n\n<ul><li><strong>pageTurnDelayAvg<\/strong>: The average delay in seconds between page turns when extracting from search results. The actual delay is a random number between N &#8211; (N\/4) and N + (N\/4), where N is the value you enter. For instance, if you enter 40, the actual delay will be between 30 and 50 seconds. <\/li><li><strong>maximumPages<\/strong>: The maximum number of pages to visit when extracting from search results. 
<\/li><\/ul>\n\n\n\n<h4><strong>Details extraction settings<\/strong><\/h4>\n\n\n\n<ul><li><strong>maximumProfilesPerExtraction<\/strong>: The maximum number of profile\/company pages to visit when running a details extraction. <\/li><li><strong>profileVisitDelayAvg<\/strong>: The average delay in seconds between profile\/company page visits. The actual delay is calculated the same way as for&nbsp;<strong>pageTurnDelayAvg<\/strong>. <\/li><\/ul>\n\n\n\n<h4>Top-level extraction<\/h4>\n\n\n\n<p>After logging into LinkedIn, run a filtered search in the main browser, such as the one shown in the first screenshot above, and then run either the&nbsp;<strong>ProfileLinks<\/strong>&nbsp;or&nbsp;<strong>CompanyLinks<\/strong>&nbsp;global, depending on the kind of search. Note that the extraction runs in the main browser, so it&#8217;s best to avoid interacting with it while the extraction is in progress. Once the extraction completes, the corresponding&nbsp;<strong>ProfileLinks<\/strong>&nbsp;or&nbsp;<strong>CompanyLinks<\/strong>&nbsp;table will be populated. Both tables have a column called&nbsp;<strong>url<\/strong>, which will be used in the next step. Alternatively, if you already have a list of profile or company URLs, you can manually paste them into this column and skip the top-level extraction. <\/p>\n\n\n\n<h4>Details extraction<\/h4>\n\n\n\n<p>To extract details, run&nbsp;<strong>ProfileDetails<\/strong>&nbsp;or&nbsp;<strong>CompanyDetails<\/strong>. These globals take the URLs from the corresponding links table and visit up to the number of pages specified by the&nbsp;<strong>maximumProfilesPerExtraction<\/strong>&nbsp;setting. The next time a details extraction runs, profiles or companies that have already been extracted won&#8217;t be visited again, as long as the details table has not been cleared. This allows you to run daily extractions in smaller chunks, with the chunk size determined by the&nbsp;<strong>maximumProfilesPerExtraction<\/strong>&nbsp;setting. 
In general, the only globals you&#8217;ll need to deal with are the ones whose names don&#8217;t start with an <em>@<\/em> symbol.<\/p>\n\n\n\n<h4>Final remarks<\/h4>\n\n\n\n<p>Since the&nbsp;<strong>ProfileDetails<\/strong>&nbsp;table set contains many tables, you can right-click the table set and select&nbsp;<strong>Join Tables<\/strong>&nbsp;to see all tables as one. Note that if you do this, you&#8217;ll see many rows per profile. Alternatively, use the query at&nbsp;<strong>Data Flow<\/strong> \u2192 <strong>Queries<\/strong> \u2192 <strong>Profile Contact<\/strong>, which shows one row per profile, with contact details organized into separate columns.<\/p>\n\n\n\n<p>You&#8217;ll likely want to use proxies when extracting from LinkedIn, and you should make sure they work with LinkedIn. Luminati offers a type of proxy called <a href=\"https:\/\/luminati.io\/faq?affiliate=ref_5dd2d878c7669177abc66180&amp;cam=L_haimt#overview-gip\">gIP<\/a>, which can be specifically configured to work with LinkedIn. You can then <a href=\"https:\/\/luminati.io\/integration\/heliumscraper?affiliate=ref_5dd2d878c7669177abc66180&amp;cam=L_haimt\">follow these instructions<\/a> to set them up in Helium Scraper.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;ve created a ready-made template that can be used to extract people and company information from LinkedIn. An account is required for the extraction to work. Check with support to see how many profiles\/companies you&#8217;re allowed to view per day, otherwise, your account could get banned. 
Getting started To get started, download the template and [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":570,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/posts\/544"}],"collection":[{"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/comments?post=544"}],"version-history":[{"count":27,"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/posts\/544\/revisions"}],"predecessor-version":[{"id":620,"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/posts\/544\/revisions\/620"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/media\/570"}],"wp:attachment":[{"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/media?parent=544"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/categories?post=544"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.heliumscraper.com\/blog\/wp-json\/wp\/v2\/tags?post=544"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}