parsing - How to parse web pages that don't return results initially in Python?


I want to load the list of images on this page in Python. When I opened the page in a browser (Chrome or Safari) and opened the dev tools, the inspector returned a list of images of the form <img class="grid-item--image">....

However, when I tried to parse the page in Python, the result was different. Specifically, I got a list of images of the form <img class="carousel--image"...>, whereas soup.find_all("img", "grid-item--image") returned an empty list. I also tried saving images using the srcset attribute, and the saved images were not the ones listed on the web page.

I think the web page uses some sort of technique when rendering, so the images are not in the HTML that is first returned. How can I parse such web pages successfully?

I used BeautifulSoup 4 on Python 3.5. I loaded the page as follows:

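    import requests
    from bs4 import BeautifulSoup

    def load_page(url):
        # Fetch the raw HTML; scripts on the page are not executed here.
        html = requests.get(url).text
        # from_encoding is ignored when the markup is already a str, so it is omitted.
        soup = BeautifulSoup(html, "html.parser")
        return soup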

You would be better off using Selenium, as follows:

    from bs4 import BeautifulSoup
    from selenium import webdriver

    # Drive a real browser so the page's scripts run before the HTML is read.
    browser = webdriver.Firefox()
    browser.get("http://www.vogue.com/fashion-shows/fall-2016-menswear/fendi#collection")
    html_source = browser.page_source
    browser.quit()

    soup = BeautifulSoup(html_source, "html.parser")

    # Print the srcset attribute of every image in the rendered grid.
    for item in soup.find_all("img", {"class": "grid-item--image"}):
        print(item.get('srcset'))

This displays the following kind of output:

    http://assets.vogue.com/photos/569d37e434324c316bd70f04/master/w_195/_fen0016.jpg
    http://assets.vogue.com/photos/569d37e5d928983d20a78e4f/master/w_195/_fen0027.jpg
    http://assets.vogue.com/photos/569d37e834324c316bd70f0a/master/w_195/_fen0041.jpg
    http://assets.vogue.com/photos/569d37e334324c316bd70efe/master/w_195/_fen0049.jpg
    http://assets.vogue.com/photos/569d37e702e08d8957a11e32/master/w_195/_fen0059.jpg
    ...
    ...
    ...
    http://assets.vogue.com/photos/569d3836486d6d3e20ae9625/master/w_195/_fen0616.jpg
    http://assets.vogue.com/photos/569d381834324c316bd70f3b/master/w_195/_fen0634.jpg
    http://assets.vogue.com/photos/569d3829fa6d6c9057f91d2a/master/w_195/_fen0649.jpg
    http://assets.vogue.com/photos/569d382234324c316bd70f41/master/w_195/_fen0663.jpg
    http://assets.vogue.com/photos/569d382b7dcd2a8a57748d05/master/w_195/_fen0678.jpg
    http://assets.vogue.com/photos/569d381334324c316bd70f2f/master/w_195/_fen0690.jpg
    http://assets.vogue.com/photos/569d382dd928983d20a78eb1/master/w_195/_fen0846.jpg
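Each value printed above is a raw srcset attribute. Here every srcset holds a single URL, but in general the attribute can list several candidates, each followed by a width or density descriptor. If you want to save the images, a small helper along these lines could pull out just the URLs. This is only a sketch: srcset_urls is a hypothetical name, and splitting on a comma followed by whitespace is a simplification of the HTML parsing rules, so it will miss candidates written with no space after the comma.

```python
import re

def srcset_urls(srcset):
    """Return the URLs in a srcset string, dropping width/density descriptors."""
    urls = []
    # Assumption: candidates are separated by a comma followed by whitespace.
    # Splitting this way leaves commas that occur inside a URL untouched.
    for candidate in re.split(r",\s+", srcset.strip()):
        parts = candidate.split()
        if parts:
            # The first token is the URL; anything after it is a descriptor.
            urls.append(parts[0].rstrip(","))
    return urls

# Illustrative input, not taken from the page above.
print(srcset_urls("small.jpg 195w, large.jpg 390w"))
```

In the loop from the answer, this would be called as srcset_urls(item.get('srcset') or "").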

Using Selenium allows the full rendering of the page to take place inside the browser, after which the resulting HTML can be obtained and parsed.
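Before reaching for a browser, it can help to confirm that the class you are after really is missing from the raw HTML. The following is a minimal sketch using only the standard library. ImgClassCollector and img_classes are hypothetical names, and raw_html is a made-up stand-in for what requests.get(url).text returns.

```python
from html.parser import HTMLParser

class ImgClassCollector(HTMLParser):
    """Collect the class names used on every <img> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # attrs is a list of (name, value) pairs; value may be None.
            class_value = dict(attrs).get("class") or ""
            self.classes.update(class_value.split())

def img_classes(html):
    """Return the set of class names found on <img> tags in html."""
    collector = ImgClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes

# Hypothetical stand-in for the unrendered HTML fetched with requests.
raw_html = '<div><img class="carousel--image" src="a.jpg"></div>'

found = img_classes(raw_html)
# If the target class is absent here but visible in the browser's inspector,
# the elements are most likely added by scripts after the page loads.
print("grid-item--image" in found)
```

If the check fails on the raw HTML but passes on browser.page_source, that suggests the images are inserted during rendering, as described in the question.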

