python - Detecting forms (and filling them in) with Scrapy
I'm struggling to find a generic approach for detecting a form in HTML and submitting it. When the page structure is known in advance for a given page, there are of course several options:
-- Selenium/WebDriver (filling in the fields and 'clicking' the button)
-- determining the shape of the POST query manually and reconstructing it with urllib2 directly:
import urllib2
import urllib
import lxml.html as lh

url = "http://apply.ovoenergycareers.co.uk/vacancies/#results"
# field names and values found by inspecting the page's own POST request
params = urllib.urlencode([('field_36[]', 73), ('field_37[]', 76), ('field_32[]', 82)])
response = urllib2.urlopen(url, params)
or with requests:
import requests

r = requests.post("http://apply.ovoenergycareers.co.uk/vacancies/#results", data='manager')
r.text
But although forms all boil down to a POST request, some input fields and a submit button, they vary in implementation under the hood. Once the number of pages to be scraped gets into the hundreds, it isn't feasible to define a custom form-filling approach for each one.
My understanding is that Scrapy's main added value is its ability to follow links, and I presume that includes links arrived at via form submission. Can that ability be used to build a generic approach to "following" a form submission?
Clarification: for a form with several dropdown menus, I typically leave those at their default values and only fill in the search-term input field. Locating that field and "filling it in" is the main challenge here.
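For what it's worth, a generic "take the first form, keep the defaults, fill the first text input" heuristic can be sketched with lxml and requests. This is only a sketch under those assumptions; the function name submit_search, the first-text-input heuristic and the example search term are illustrative, not taken from the question or confirmed against the target site:

import lxml.html
import requests
from requests.compat import urljoin

def submit_search(page_url, search_term):
    # fetch the page and take its first form (naive assumption: one relevant form)
    doc = lxml.html.fromstring(requests.get(page_url).text)
    form = doc.forms[0]
    # lxml pre-populates form.fields with the form's default values,
    # so dropdowns and hidden inputs keep whatever the page set
    data = dict(form.fields)
    # heuristic: fill the first plain text input with the search term
    for inp in form.inputs:
        if getattr(inp, 'type', None) == 'text':
            data[inp.name] = search_term
            break
    action = urljoin(page_url, form.action or '')
    if (form.method or 'GET').upper() == 'POST':
        return requests.post(action, data=data)
    return requests.get(action, params=data)

# hypothetical usage:
# r = submit_search("http://apply.ovoenergycareers.co.uk/vacancies/", "manager")
# print(r.text)

This obviously breaks on pages with multiple forms or JavaScript-driven submission, but it illustrates the kind of generic form handling being asked about.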
Link extractors cannot follow form submissions in Scrapy. There is, however, a mechanism called FormRequest that is designed to ease submitting forms.
Note that FormRequests cannot handle forms when JavaScript is involved in the submission.
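As a rough illustration of how FormRequest.from_response is typically used inside a spider (the spider name, the 'field_36[]' form field and the h3 title selector are assumptions for illustration, not details confirmed for the target site):

import scrapy
from scrapy.http import FormRequest

class VacanciesSpider(scrapy.Spider):
    name = 'vacancies'
    start_urls = ['http://apply.ovoenergycareers.co.uk/vacancies/']

    def parse(self, response):
        # from_response() reads the first <form> on the page, keeps its
        # pre-filled/default values, and overrides only what is passed in formdata
        yield FormRequest.from_response(
            response,
            formdata={'field_36[]': '73'},  # assumed field name/value
            callback=self.parse_results,
        )

    def parse_results(self, response):
        # this response is whatever the site returns after the form submission
        for title in response.css('h3::text').getall():
            yield {'title': title}

Because from_response keeps the form's defaults and only overrides the fields given in formdata, it is the closest built-in equivalent to "leave the dropdowns alone and just fill in the search term".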