parsing - How to parse web pages that don't return results initially in Python?
I want to load the list of images on this page in Python. When I opened the page in a browser (Chrome or Safari) and opened the dev tools, the inspector returned a list of images of the form <img class="grid-item--image">...
However, when I tried to parse the page in Python, the result seemed different. Specifically, I got a list of images of the form <img class="carousel--image"...>, whereas soup.find_all("img", "grid-item--image") returned an empty list. I also tried saving images using the srcset attribute, but the saved images included ones that are not listed on the web page.
I think the web page uses some sort of technique when rendering. How can I parse such web pages successfully?
I used BeautifulSoup 4 on Python 3.5. I loaded the page as follows:
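    import requests
    from bs4 import BeautifulSoup

    def load_page(url):  # function name is illustrative
        # Fetch the raw HTML; scripts on the page are not executed here.
        html = requests.get(url).text
        # Note: from_encoding is ignored (with a warning) when the markup is already a str.
        soup = BeautifulSoup(html, "html.parser", from_encoding="utf-8")
        return soup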
You are better off using selenium, as follows:
    from bs4 import BeautifulSoup
    from selenium import webdriver

    # Let a real browser load the page so its scripts can build the image grid.
    browser = webdriver.Firefox()
    browser.get("http://www.vogue.com/fashion-shows/fall-2016-menswear/fendi#collection")
    html_source = browser.page_source
    browser.quit()

    # Parse the rendered HTML rather than the raw server response.
    soup = BeautifulSoup(html_source, "html.parser")

    for item in soup.find_all("img", {"class": "grid-item--image"}):
        print(item.get('srcset'))
This will display the following kind of output:
    http://assets.vogue.com/photos/569d37e434324c316bd70f04/master/w_195/_fen0016.jpg
    http://assets.vogue.com/photos/569d37e5d928983d20a78e4f/master/w_195/_fen0027.jpg
    http://assets.vogue.com/photos/569d37e834324c316bd70f0a/master/w_195/_fen0041.jpg
    http://assets.vogue.com/photos/569d37e334324c316bd70efe/master/w_195/_fen0049.jpg
    http://assets.vogue.com/photos/569d37e702e08d8957a11e32/master/w_195/_fen0059.jpg
    ...
    ...
    ...
    http://assets.vogue.com/photos/569d3836486d6d3e20ae9625/master/w_195/_fen0616.jpg
    http://assets.vogue.com/photos/569d381834324c316bd70f3b/master/w_195/_fen0634.jpg
    http://assets.vogue.com/photos/569d3829fa6d6c9057f91d2a/master/w_195/_fen0649.jpg
    http://assets.vogue.com/photos/569d382234324c316bd70f41/master/w_195/_fen0663.jpg
    http://assets.vogue.com/photos/569d382b7dcd2a8a57748d05/master/w_195/_fen0678.jpg
    http://assets.vogue.com/photos/569d381334324c316bd70f2f/master/w_195/_fen0690.jpg
    http://assets.vogue.com/photos/569d382dd928983d20a78eb1/master/w_195/_fen0846.jpg
This allows the full rendering of the page to take place inside the browser, and the resulting HTML can then be obtained.
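To illustrate why the rendered HTML makes the difference, here is a minimal sketch that uses only the standard library. The two HTML snippets, the `ImgCollector` class and the `collect` helper are made up for illustration and are not taken from the actual page. They only mimic the situation described: the server response contains the carousel image, and the grid images exist only after scripts have run.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the srcset values of <img> tags that carry a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if self.wanted_class in classes:
            self.found.append(attr_map.get("srcset"))


def collect(html, wanted_class):
    """Return srcset values of all matching <img> tags in the given HTML."""
    parser = ImgCollector(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical markup: what the server sends before any script runs.
INITIAL_HTML = '<div><img class="carousel--image" srcset="a.jpg"></div>'
# Hypothetical markup: the document after scripts have added the grid.
RENDERED_HTML = (
    INITIAL_HTML
    + '<ul><li><img class="grid-item--image" srcset="b.jpg"></li></ul>'
)

print(collect(INITIAL_HTML, "grid-item--image"))   # []
print(collect(RENDERED_HTML, "grid-item--image"))  # ['b.jpg']
```

The same search gives an empty list on the initial markup and a match on the rendered markup, which is the behaviour seen with requests versus selenium.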