Unable to capture dynamic text

Questions and answers about anything related to Helium Scraper
jkok12
Posts: 3
Joined: Wed Jan 15, 2014 7:38 am

Unable to capture dynamic text

Post by jkok12 » Sun Feb 08, 2015 11:11 am

Hello, I am having a problem extracting text from a series of pages.

http://kithara.to/ss.php?id=exuGGXkyAkf ... w0AfKa-19v

The text appears to be nested inside a <div id="text">, like the one below, and its content is of course different on each page.
What can I do?

<div id="text">
<script language="JavaScript1.2" type="text/javascript">
<!--
function ll(s){var t='';for(var i=0;i<s.length;i+=3){t=t+s.substring(i+2,i+3)+s.substring(i,i+1)}location.href=t;}
//-->
</script><div class="ch">&nbsp;&nbsp;Gm&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Dm7</div>
<div class="te">Αν&nbsp;μ'&nbsp;αγαπάς,&nbsp;μείνε&nbsp;κοντά&nbsp;μου</div>
<div class="ch">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Cm&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Dm7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Gm</div>
<div class="te">όσα&nbsp;κι&nbsp;αν&nbsp;έρθουν,&nbsp;να&nbsp;είσαι&nbsp;πάντα&nbsp;εδώ</div>
<div class="te">Αν&nbsp;μ'&nbsp;αγαπάς,&nbsp;γίνε&nbsp;ο&nbsp;στίχος</div>
<div class="te">σ'&nbsp;ένα&nbsp;τραγούδι&nbsp;που&nbsp;γράψαμε&nbsp;κι&nbsp;δυο</div>
<div class="no">&nbsp;</div>
<div class="ch">Gm&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Eb&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;D</div>
<div class="te">Έλα&nbsp;πάρε&nbsp;με&nbsp;καρδιά&nbsp;μου&nbsp;ως&nbsp;εκεί&nbsp;που&nbsp;πας</div>
<div class="ch">Cm&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Gm&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Eb&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Dm7&nbsp;&nbsp;&nbsp;Gm</div>
<div class="te">να&nbsp;σου&nbsp;στρώνω&nbsp;χίλια&nbsp;αστέρια&nbsp;για&nbsp;να&nbsp;περπατάς</div>
<div class="te">χάρισε&nbsp;μου&nbsp;ένα&nbsp;κόσμο&nbsp;μοναχά&nbsp;για&nbsp;μας</div>
<div class="te">και&nbsp;εγώ&nbsp;ότι&nbsp;θέλεις&nbsp;δίνω&nbsp;για&nbsp;να&nbsp;μ'&nbsp;αγαπάς</div>
<div class="no">&nbsp;</div>
<div class="te">Αν&nbsp;μ'&nbsp;αγαπάς,&nbsp;δως&nbsp;μου&nbsp;ένα&nbsp;δρόμο</div>
<div class="te">να&nbsp;τον&nbsp;βαδίζω&nbsp;με&nbsp;σένα&nbsp;αγκαλιά</div>
<div class="te">Αν&nbsp;μ'&nbsp;αγαπάς,&nbsp;γίνε&nbsp;ζωή&nbsp;μου</div>
<div class="te">όσο&nbsp;θα&nbsp;ρίχνει&nbsp;το&nbsp;σώμα&nbsp;μου&nbsp;σκιά</div>
<div class="no">&nbsp;</div>
<div class="te">Έλα&nbsp;πάρε&nbsp;με&nbsp;καρδιά&nbsp;μου&nbsp;ως&nbsp;εκεί&nbsp;που&nbsp;πας</div>
<div class="te">να&nbsp;σου&nbsp;στρώνω&nbsp;χίλια&nbsp;αστέρια&nbsp;για&nbsp;να&nbsp;περπατάς</div>
<div class="te">χάρισε&nbsp;μου&nbsp;ένα&nbsp;κόσμο&nbsp;μοναχά&nbsp;για&nbsp;μας</div>
<div class="te">και&nbsp;εγώ&nbsp;ότι&nbsp;θέλεις&nbsp;δίνω&nbsp;για&nbsp;να&nbsp;μ'&nbsp;αγαπάς...</div>
<div class="no">&nbsp;</div>
</div>
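Here is a minimal sketch of a gatherer-style function. It assumes the gatherer can get hold of the <div id="text"> element; how Helium Scraper hands that element to a JavaScript gatherer is not shown here and is an assumption. collectLines is a hypothetical name. The function reads only the child <div>s that carry the classes seen above (ch for chord lines, te for lyric lines, no for blank separators), so the inline <script> is never read. It also turns the &nbsp; characters, which show up as U+00A0 in textContent, into ordinary spaces.

```javascript
// Sketch: collect chord/lyric lines from the <div id="text"> container.
// Assumption: `container` is that element; wiring this into a Helium Scraper
// JavaScript gatherer is not shown. `collectLines` is a hypothetical name.
function collectLines(container, classes = ["ch", "te", "no"]) {
  const lines = [];
  // Only direct child <div>s with a wanted class are read,
  // so the inline <script> and its text are skipped entirely.
  for (const child of Array.from(container.children)) {
    if (child.tagName !== "DIV") continue;
    if (!classes.some((c) => child.classList.contains(c))) continue;
    // &nbsp; arrives as U+00A0; swap it for a normal space so chord
    // columns still line up, then drop trailing whitespace.
    lines.push(child.textContent.replace(/\u00a0/g, " ").trimEnd());
  }
  return lines.join("\n");
}
```

If you pass ["te"] as the second argument, the function keeps just the lyrics and drops the chord lines.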

Re: Unable to capture dynamic text

Post by jkok12 » Tue Feb 10, 2015 8:47 am

Well, I implemented a JavaScript gatherer, and it works quite efficiently.
The problem now is that it extracts not only the text I want, which is spread across multiple DIVs, but also the following:
<!--
function ll(s){var t='';for(var i=0;i<s.length;i+=3){t=t+s.substring(i+2,i+3)+s.substring(i,i+1)}location.href=t;}
//-->
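As for what the fragment is: it is the body of the inline <script> that defines a function ll(s). The <!-- and //--> lines are an old convention for hiding script code from browsers that did not understand <script>, and they are harmless in modern browsers. The function reads its argument three characters at a time. For each triple it keeps the third character and then the first, discards the second, and assigns the result to location.href, so calling it navigates to the decoded address. It is presumably called from links elsewhere on the page (those calls are not in the snippet), which would keep the real link targets out of the plain page source. The sketch below reproduces only the decoding step, without navigating. decodeLl is a hypothetical name.

```javascript
// Sketch of the decoding done by the page's ll(s), minus the redirect.
// For each 3-character group "abc" it appends "c" then "a" and drops "b".
// The original assigns the result to location.href; this just returns it.
function decodeLl(s) {
  let t = "";
  for (let i = 0; i < s.length; i += 3) {
    t += s.substring(i + 2, i + 3) + s.substring(i, i + 1);
  }
  return t;
}
```

For example, decodeLl("tXhpXt") returns "http", because the triples "tXh" and "pXt" decode to "ht" and "tp".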

Anyway, how can I get rid of it, and what on earth does it mean?
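If you would rather keep the gatherer as it is, here is a minimal sketch that post-processes its output string instead. It removes everything from <!-- to //--> and then any blank lines left at the top. This assumes the unwanted fragment always appears in exactly that commented form, as in the output quoted above. stripScriptBlock is a hypothetical name. The fragment most likely gets in because the gatherer reads the container's textContent. In a browser, textContent (unlike innerText) includes the source of descendant <script> elements.

```javascript
// Sketch: clean the string a gatherer returned. `stripScriptBlock` is a
// hypothetical name; assumes the script text is wrapped in <!-- ... //-->.
function stripScriptBlock(text) {
  return text
    .replace(/<!--[\s\S]*?\/\/-->/g, "")       // drop the commented script body
    .replace(/^(?:[ \t\u00a0]*\n)+/, "");      // drop blank lines left at the top
}
```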
