Pattern recognition?
Does Helium Scraper have a pattern recognition capability? For example, can it recognize an email address by its @ and . characters, a web address such as xx.xx.xx, or a phone number?
Re: Pattern recognition?
Hi,
We do have a premade project called Phones and Emails that you can import from File -> Online Premades. After you import this project you can create a kind that selects a container where the phone or email is, and then extract the JS_Email or JS_Phone property of this kind to get a phone or an email. You can look at how these JavaScript gatherers are written at Project -> JavaScript Gatherers after you've imported the project.
If you're familiar with regular expressions you can use your own patterns by copying, say, the JS_Email gatherer's code into a new gatherer and then replacing the pattern in the line that starts with "var regex = ".
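To illustrate the kind of pattern swap described above, here is a minimal standalone sketch. It does not reproduce the premade's actual gatherer code: the helper name firstMatch and both patterns are assumptions made for illustration, and only the "var regex = " line mirrors what the post mentions.

```javascript
// Stand-in for the line that starts with "var regex = " in a copied gatherer.
// Assumption: a simplified email pattern, not the premade's actual regex.
var regex = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Hypothetical helper: returns the first match of the pattern in the text,
// or an empty string when nothing matches.
function firstMatch(text, pattern) {
  var found = pattern.exec(text);
  return found ? found[0] : "";
}

// Replacing the pattern changes what is recognized. This loose pattern
// (also an assumption) matches dotted web addresses of the form xx.xx.xx.
var urlRegex = /\b[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+){2,}\b/;

var email = firstMatch("Contact: jane.doe@example.com or call", regex);
var site = firstMatch("Visit www.example.com today", urlRegex);
```

Both patterns are deliberately loose; a real project would likely need to tighten them for the pages being scraped.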
Juan Soldi
The Helium Scraper Team
Re: Pattern recognition?
Thanks. I will try that. Is there a way to extract multiple occurrences of something like an email address into separate fields, such as email1, email2, email3?
Re: Pattern recognition?
I've updated the Phones and Emails premade to let you easily do this. All you need to do is import it again (you'll need to delete your existing JavaScript gatherers if you already imported it into your current project), then open the JS_Email gatherer (assuming you need emails) from Project -> JavaScript Gatherers, copy the code and paste it into a new gatherer (say, JS_Email2), and change this line of code:
var index = 0;
to this:
var index = 1;
This would extract the second email. Setting it to 2 would extract the third email, and so on. You can repeat this as many times as you need.
Juan Soldi
The Helium Scraper Team
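For readers following along, the zero-based index idea from the last reply can be sketched outside Helium Scraper as follows. This is not the premade's code: the helper name matchAt, the sample text, and the simplified email pattern are all assumptions for illustration.

```javascript
// Assumption: a simplified email pattern; the "g" flag makes match() return
// every occurrence rather than only the first.
var regex = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: returns the match at the given zero-based index,
// or an empty string when there are not that many matches.
function matchAt(text, pattern, index) {
  var matches = text.match(pattern) || [];
  return index < matches.length ? matches[index] : "";
}

var text = "a@example.com, b@example.org; c@example.net";

// One index per output field, mirroring one gatherer per field.
var email1 = matchAt(text, regex, 0); // index 0: first email
var email2 = matchAt(text, regex, 1); // index 1: second email
var email3 = matchAt(text, regex, 2); // index 2: third email
```

Returning an empty string for a missing occurrence keeps the fields aligned when a page has fewer emails than there are gatherers.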