xpath - Scrapy doesn't get all data
I'm trying to scrape this page:
http://binpar.caicyt.gov.ar/cgi-bin/koha/opac-detail.pl?biblionumber=98723
with this code:
    def parse_web9(self, response):  # conicet!!
        for publication in response.css('div#wrap > div.main > div.container-fluid > div.row-fluid > div.span9 > div#catalogue_detail_biblio > div.record'):
            pubtitle = publication.xpath('./h1[@class="title"]/text()').extract_first()
            author = publication.xpath('./span[@class="results_summary publisher"]/span/span/a/text()').extract()
            isxn = publication.xpath('./span[@class="results_summary issn"]/span/text()').re(r'\d+-\d+')
            yield {
                'titulo_publicacion': pubtitle,
                'anio_publicacion': None,
                'isbn': isxn,
                'nombre_autor': author,
                'url_link': None,
            }

But I'm only getting the title of the publication, and I'm not sure why.
Cheers!
You should select the inner fields by their property attributes:
    $ scrapy shell http://binpar.caicyt.gov.ar/cgi-bin/koha/opac-detail.pl?biblionumber=98723
    >>> for publication in response.css('div#wrap > div.main > div.container-fluid > div.row-fluid > div.span9 > div#catalogue_detail_biblio > div.record'):
    ...     author = publication.css("span[property=contributor] span[property=name]::text").extract_first()
    ...     title = publication.css("h1[property=name]::text").extract_first()
    ...     issn = publication.css("span[property=issn]::text").extract_first()
    ...     print(author, title, issn)
    ...
    (u'asociaci\xf3n filat\xe9lica de la rep\xfablica argentina', u'afra, bolet\xedn informativo. ', u'0001-1193.')
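Since the question started from XPath, the same property-based selection can also be written as XPath predicates. The sketch below is a minimal illustration, assuming a simplified, hypothetical record snippet (RECORD_HTML is NOT the real Koha page markup; it only carries the property attributes the selectors above rely on, with the values from the shell output). It uses the standard library's xml.etree.ElementTree, which implements only a subset of XPath, so it runs without Scrapy; the helpers text_of and parse_record are hypothetical names. In a Scrapy callback the equivalent would be, for example, publication.xpath('.//h1[@property="name"]/text()').extract_first(). Note that .// matches at any depth below the record, whereas the question's ./span[...] paths only match direct children of div.record.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT the real Koha page. It only mimics the
# property attributes that the answer's CSS selectors depend on.
RECORD_HTML = """
<div class="record">
  <h1 class="title" property="name">afra, boletín informativo. </h1>
  <span class="results_summary publisher">
    <span property="contributor">
      <span property="name">asociación filatélica de la república argentina</span>
    </span>
  </span>
  <span class="results_summary issn">
    <span property="issn">0001-1193.</span>
  </span>
</div>
"""


def text_of(element):
    """Return the element's concatenated text, or None if nothing matched."""
    if element is None:
        return None
    return "".join(element.itertext())


def parse_record(record):
    """Extract fields from one record via XPath predicates on @property.

    ElementTree supports only a subset of XPath (no text() step), so the
    text is read with itertext() instead of a /text() path.
    """
    # h1 is matched by tag, so the contributor's inner property="name" span is not picked up here
    title = text_of(record.find(".//h1[@property='name']"))
    # Scope property="name" under the contributor span, as the CSS selector does
    author = text_of(record.find(".//span[@property='contributor']//span[@property='name']"))
    issn = text_of(record.find(".//span[@property='issn']"))
    return {"titulo_publicacion": title, "nombre_autor": author, "isbn": issn}


record = ET.fromstring(RECORD_HTML.strip())
print(parse_record(record))
```

Both the h1 and the contributor's inner span carry property="name" in this snippet, which is why the author path is scoped under the contributor element rather than matching property="name" anywhere.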