ruby - How to scrape the text of <li> and its children


I'm trying to scrape the content of <li> tags and the elements within them.

The HTML looks like this:

    <div class="insurancesaccepted">
      <h4>what insurance accept?*</h4>
      <ul class="nobottommargin">
        <li class="first"><span>aetna</span></li>
        <li>
          <a title="see accepted plans" class="insuranceplantoggle arrowup">avmed</a>
          <ul style="display: block;" class="insuranceplanlist">
            <li class="last first">open access</li>
          </ul>
        </li>
        <li>
          <a title="see accepted plans" class="insuranceplantoggle arrowup">blue cross blue shield</a>
          <ul style="display: block;" class="insuranceplanlist">
            <li class="last first">blue card ppo</li>
          </ul>
        </li>
        <li>
          <a title="see accepted plans" class="insuranceplantoggle arrowup">cigna</a>
          <ul style="display: block;" class="insuranceplanlist">
            <li class="first">cigna hmo</li>
            <li>cigna ppo</li>
            <li class="last">great west healthcare-cigna ppo</li>
          </ul>
        </li>
        <li class="last">
          <a title="see accepted plans" class="insuranceplantoggle arrowup">empire blue cross blue shield</a>
          <ul style="display: block;" class="insuranceplanlist">
            <li class="last first">empire blue cross blue shield hmo</li>
          </ul>
        </li>
      </ul>
    </div>

The main issue is that when I try to get the content with:

    doc.css('.insurancesaccepted li').text.strip

it returns the text of all the <li> elements at once. I want "avmed" and "open access" to be scraped together, so that I keep the relationship between them and can insert it into a MySQL table as a reference.

The problem is that doc.css('.insurancesaccepted li') matches all nested list items, not only direct descendants. To match only direct descendants, use the parent > child CSS combinator. To accomplish your task, you then need to assemble the result while iterating:

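    require 'nokogiri'

    doc = Nokogiri::HTML(html)
    # '>' restricts each match to the direct children of the outer <ul>
    doc.css('div.insurancesaccepted > ul > li').each do |li|
      chapter     = li.css('span').text.strip                  # insurer without plans, e.g. "aetna"
      section     = li.css('a').text.strip                     # insurer with plans, e.g. "avmed"
      subsections = li.css('ul > li').map(&:text).map(&:strip) # nested plan names
      puts "#{chapter} ⇒ [ #{section} ⇒ [ #{subsections.join(', ')} ] ]"
      puts '=' * 40
    end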

This results in:

    # aetna ⇒ [  ⇒ [  ] ]
    # ========================================
    #  ⇒ [ avmed ⇒ [ open access ] ]
    # ========================================
    #  ⇒ [ blue cross blue shield ⇒ [ blue card ppo ] ]
    # ========================================
    #  ⇒ [ cigna ⇒ [ cigna hmo, cigna ppo, great west healthcare-cigna ppo ] ]
    # ========================================
    #  ⇒ [ empire blue cross blue shield ⇒ [ empire blue cross blue shield hmo ] ]
    # ========================================
