xpath - Scrapy doesn't get all data


I'm trying to scrape this page:

http://binpar.caicyt.gov.ar/cgi-bin/koha/opac-detail.pl?biblionumber=98723

with this code:

    def parse_web9(self, response):  # conicet!!
        for publication in response.css('div#wrap > div.main > div.container-fluid > div.row-fluid > div.span9 > div#catalogue_detail_biblio > div.record'):
            pubtitle = publication.xpath('./h1[@class="title"]/text()').extract_first()
            author = publication.xpath('./span[@class="results_summary publisher"]/span/span/a/text()').extract()
            isxn = publication.xpath('./span[@class="results_summary issn"]/span/text()').re(r'\d+-\d+')
            yield {
                'titulo_publicacion': pubtitle,
                'anio_publicacion': None,
                'isbn': isxn,
                'nombre_autor': author,
                'url_link': None,
            }

but I'm only getting the title of the publication, and I'm not sure why.

Cheers!

You should select the inner fields by their property attributes:

    $ scrapy shell http://binpar.caicyt.gov.ar/cgi-bin/koha/opac-detail.pl?biblionumber=98723
    >>> for publication in response.css('div#wrap > div.main > div.container-fluid > div.row-fluid > div.span9 > div#catalogue_detail_biblio > div.record'):
    ...     author = publication.css("span[property=contributor] span[property=name]::text").extract_first()
    ...     title = publication.css("h1[property=name]::text").extract_first()
    ...     issn = publication.css("span[property=issn]::text").extract_first()
    ...     print(author, title, issn)
    ...
    (u'asociaci\xf3n filat\xe9lica de la rep\xfablica argentina', u'afra, bolet\xedn informativo. ', u'0001-1193.')
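Here is a rough sketch of how those property selectors might be folded back into the callback from the question. It is written as a plain function so it stands alone; inside a spider it would take `self` as its first argument. The helper names `clean` and `first_issn` and the selector constants are my own, not part of Scrapy, and the ISSN pattern (four digits, a hyphen, three digits and a check character) is an assumption about the data. Note that `isbn` becomes a single string here instead of the list that `.re()` returned in the question.

```python
import re

# Selectors taken from the question and the shell session above.
RECORD_CSS = ('div#wrap > div.main > div.container-fluid > div.row-fluid > '
              'div.span9 > div#catalogue_detail_biblio > div.record')
TITLE_CSS = 'h1[property=name]::text'
AUTHOR_CSS = 'span[property=contributor] span[property=name]::text'
ISSN_CSS = 'span[property=issn]::text'


def clean(value):
    """Collapse runs of whitespace and trim the ends; pass None through."""
    if value is None:
        return None
    return ' '.join(value.split())


def first_issn(value):
    """Return the first ISSN-looking token in value, or None (hypothetical helper)."""
    match = re.search(r'\d{4}-\d{3}[\dXx]', value or '')
    return match.group(0) if match else None


def parse_web9(response):
    """Yield one item per record, selecting fields by their property attribute."""
    for publication in response.css(RECORD_CSS):
        title = publication.css(TITLE_CSS).extract_first()
        author = publication.css(AUTHOR_CSS).extract_first()
        issn = publication.css(ISSN_CSS).extract_first()
        yield {
            'titulo_publicacion': clean(title),
            'anio_publicacion': None,
            'isbn': first_issn(issn),
            'nombre_autor': clean(author),
            'url_link': None,
        }
```

The cleaning step is there because the raw values in the shell output carry a trailing space and a trailing period; whether you want them removed depends on what you store downstream.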
