ruby - How to scrape the text of <li> and its children
I am trying to scrape the content of <li> tags, along with the <li> tags nested within them.
The HTML looks like:
<div class="insurancesaccepted">
  <h4>what insurance accept?*</h4>
  <ul class="nobottommargin">
    <li class="first"><span>aetna</span></li>
    <li>
      <a title="see accepted plans" class="insuranceplantoggle arrowup">avmed</a>
      <ul style="display: block;" class="insuranceplanlist">
        <li class="last first">open access</li>
      </ul>
    </li>
    <li>
      <a title="see accepted plans" class="insuranceplantoggle arrowup">blue cross blue shield</a>
      <ul style="display: block;" class="insuranceplanlist">
        <li class="last first">blue card ppo</li>
      </ul>
    </li>
    <li>
      <a title="see accepted plans" class="insuranceplantoggle arrowup">cigna</a>
      <ul style="display: block;" class="insuranceplanlist">
        <li class="first">cigna hmo</li>
        <li>cigna ppo</li>
        <li class="last">great west healthcare-cigna ppo</li>
      </ul>
    </li>
    <li class="last">
      <a title="see accepted plans" class="insuranceplantoggle arrowup">empire blue cross blue shield</a>
      <ul style="display: block;" class="insuranceplanlist">
        <li class="last first">empire blue cross blue shield hmo</li>
      </ul>
    </li>
  </ul>
</div>
The main issue is that when I try to get the content with:
doc.css('.insurancesaccepted li').text.strip
it displays the text of every <li> at once. I want "avmed" and "open access" scraped together, with a parameter that records their relationship, so that I can insert them into a MySQL table with a reference between them.
The problem is that doc.css('.insurancesaccepted li') matches all nested list items, not only direct descendants. To match only direct descendants, use the parent > child CSS rule. To accomplish your task, you need to assemble the result while iterating:
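require 'nokogiri'

doc = Nokogiri::HTML(html)
# Iterate over the top-level list items only (direct children of the outer <ul>).
doc.css('div.insurancesaccepted > ul > li').each do |li|
  chapter = li.css('span').text.strip   # insurer without a plan list, e.g. "aetna"
  section = li.css('a').text.strip      # insurer with a plan list, e.g. "avmed"
  # Plans nested under this insurer.
  subsections = li.css('ul > li').map(&:text).map(&:strip)
  puts "#{chapter} ⇒ [ #{section} ⇒ [ #{subsections.join(', ')} ] ]"
  puts '=' * 40
end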
This results in:
# aetna ⇒ [  ⇒ [  ] ]
# ========================================
#  ⇒ [ avmed ⇒ [ open access ] ]
# ========================================
#  ⇒ [ blue cross blue shield ⇒ [ blue card ppo ] ]
# ========================================
#  ⇒ [ cigna ⇒ [ cigna hmo, cigna ppo, great west healthcare-cigna ppo ] ]
# ========================================
#  ⇒ [ empire blue cross blue shield ⇒ [ empire blue cross blue shield hmo ] ]
# ========================================