How to extract just a section of text



Post by webmaster » Mon Feb 14, 2011 9:56 pm

The attached file serves both as an example and as a template for extracting a section of text from within a larger text. This is necessary when the text we want is not an individual element in the web page; in that case Helium Scraper can only select the whole text, not the particular section we want. For this technique to work, the text we want to extract must appear between, before, or after some other text that is always the same. For example, we might want to strip the brackets from a text that always appears between brackets before extracting it.
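The core idea can be sketched in a few lines of JavaScript, the language the JavaScript Gatherers are written in. This is a minimal, hypothetical sketch, not the code inside TextInBetween.hsp: `textInBetween` is an illustrative name, and it assumes that an empty left text means "everything before the right text" and an empty right text means "everything after the left text".

```javascript
// Hypothetical helper illustrating the "text in between" idea; it is NOT the
// actual JS_TextInBetween gatherer shipped in TextInBetween.hsp.
// Assumption: an empty left marker means "everything before right",
// and an empty right marker means "everything after left".
function textInBetween(text, left, right) {
  // Start just after the first occurrence of the left marker (or at 0).
  let start = 0;
  if (left !== "") {
    const i = text.indexOf(left);
    if (i === -1) return null; // left marker not found
    start = i + left.length;
  }
  // Stop at the first occurrence of the right marker after start (or at the end).
  let end = text.length;
  if (right !== "") {
    const j = text.indexOf(right, start);
    if (j === -1) return null; // right marker not found
    end = j;
  }
  return text.substring(start, end);
}

console.log(textInBetween("(2000)", "(", ")"));             // 2000
console.log(textInBetween("Price: 15 USD", "Price: ", "")); // 15 USD
console.log(textInBetween("15 USD", "", " USD"));           // 15
```

Searching for the right text only from the end of the left text onward keeps a right marker that also appears earlier in the string from being matched by mistake.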

As a quick example of how to use it, let's use this post as our source. Open the file and navigate to this post (you can copy and paste its address into the address bar). Then, in Selection Mode, select a couple of items from the following list (hold down the CTRL key while clicking them):
  • (2000)
  • (2001)
  • (2002)
  • (2003)
  • (2004)
  • (2005)
  • (2006)
  • (2007)
  • (2008)
Next, create a kind called "items" by clicking "Create kind from selection". Then go to Project -> JavaScript Gatherers, replace "Left side text goes here" and "Right side text goes here" with "(" and ")" respectively, then save and close. Now go to the Actions panel, right-click the "Repeat 1 times" action (expand "Actions tree 1" if necessary) and select Add Child -> Extract. Select the kind "items" and press OK; then, in the first row of the Property column, change "InnerText" to "JS_TextInBetween" and press OK. Finally, press Play and check the results in the Table1 table.
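As a cross-check of what the JS_TextInBetween column should contain for this list, the same bracket stripping can also be done with a regular expression. This is a swapped-in technique, not how the Helium Scraper template works; `betweenBrackets` and the `items` array are hypothetical, with `items` holding the list entries above as plain strings.

```javascript
// Hypothetical regex alternative (not part of Helium Scraper): captures
// whatever sits between the first "(" and the next ")".
// Returns null when the text contains no bracketed part.
function betweenBrackets(text) {
  const match = /\(([^)]*)\)/.exec(text);
  return match ? match[1] : null;
}

// The items from the list in this post, as plain strings.
const items = ["(2000)", "(2001)", "(2002)", "(2003)", "(2004)",
               "(2005)", "(2006)", "(2007)", "(2008)"];

const years = items.map(betweenBrackets);
console.log(years.join(", ")); // 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008
```

A regular expression is compact for a fixed pair like "(" and ")", while the left/right text approach used by the template lets you change the delimiters without editing a pattern.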

Hope this has been useful. If you have any questions, feel free to post them here and we will respond as soon as possible.
Attachments
TextInBetween.hsp
(290.4 KiB)
Juan Soldi
The Helium Scraper Team
