python - Detecting forms (and filling them in) with Scrapy
I'm struggling to find a generic approach for detecting a form in HTML and submitting it. When the page structure is known in advance for a given page, there are of course several options:
-- Selenium/WebDriver (filling in the fields and 'clicking' the button)
-- determining the shape of the POST query manually and reconstructing it with urllib2 directly:
import urllib2
import urllib
import lxml.html as lh

url = "http://apply.ovoenergycareers.co.uk/vacancies/#results"
# field names and values found by inspecting the page's own POST request
params = urllib.urlencode([('field_36[]', 73), ('field_37[]', 76), ('field_32[]', 82)])
response = urllib2.urlopen(url, params)
or with requests:
import requests

r = requests.post("http://apply.ovoenergycareers.co.uk/vacancies/#results", data='manager')
r.text
But although forms all boil down to a POST request, some input fields and a submit button, they vary in implementation under the hood. Once the number of pages to be scraped gets into the hundreds, it isn't feasible to define a custom form-filling approach for each one.
My understanding is that Scrapy's main added value is its ability to follow links, and I presume that includes links arrived at via form submission. Can that ability be used to build a generic approach to "following" a form submission?
Clarification: for a form with several dropdown menus, I typically leave those at their default values and only fill in the search-term input field. Locating that field and "filling it in" is the main challenge here.
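For what it's worth, a generic "take the first form, keep the defaults, fill the first text input" heuristic can be sketched with lxml and requests. This is only a sketch under those assumptions; the function name submit_search, the first-text-input heuristic and the example search term are illustrative, not taken from the question or confirmed against the target site:

import lxml.html
import requests
from requests.compat import urljoin

def submit_search(page_url, search_term):
    # fetch the page and take its first form (naive assumption: one relevant form)
    doc = lxml.html.fromstring(requests.get(page_url).text)
    form = doc.forms[0]
    # lxml pre-populates form.fields with the form's default values,
    # so dropdowns and hidden inputs keep whatever the page set
    data = dict(form.fields)
    # heuristic: fill the first plain text input with the search term
    for inp in form.inputs:
        if getattr(inp, 'type', None) == 'text':
            data[inp.name] = search_term
            break
    action = urljoin(page_url, form.action or '')
    if (form.method or 'GET').upper() == 'POST':
        return requests.post(action, data=data)
    return requests.get(action, params=data)

# hypothetical usage:
# r = submit_search("http://apply.ovoenergycareers.co.uk/vacancies/", "manager")
# print(r.text)

This obviously breaks on pages with multiple forms or JavaScript-driven submission, but it illustrates the kind of generic form handling being asked about.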
Link extractors cannot follow form submissions in Scrapy. There is, however, a mechanism called FormRequest that is designed to ease submitting forms.
Note that FormRequests cannot handle forms when JavaScript is involved in the submission.
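As a rough illustration of how FormRequest.from_response is typically used inside a spider (the spider name, the 'field_36[]' form field and the h3 title selector are assumptions for illustration, not details confirmed for the target site):

import scrapy
from scrapy.http import FormRequest

class VacanciesSpider(scrapy.Spider):
    name = 'vacancies'
    start_urls = ['http://apply.ovoenergycareers.co.uk/vacancies/']

    def parse(self, response):
        # from_response() reads the first <form> on the page, keeps its
        # pre-filled/default values, and overrides only what is passed in formdata
        yield FormRequest.from_response(
            response,
            formdata={'field_36[]': '73'},  # assumed field name/value
            callback=self.parse_results,
        )

    def parse_results(self, response):
        # this response is whatever the site returns after the form submission
        for title in response.css('h3::text').getall():
            yield {'title': title}

Because from_response keeps the form's defaults and only overrides the fields given in formdata, it is the closest built-in equivalent to "leave the dropdowns alone and just fill in the search term".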