Python script to convert an Adblock list into a forbidden file for the Polipo proxy (regex)


On my system I'm running the Polipo proxy, mainly for adblock purposes. Searching the internet, I've found many scripts that convert Adblock Plus lists into Polipo's forbidden file format; most of these scripts rely on sed, Ruby or Python. None of them is able to generate a valid forbidden file: when I restart Polipo with the newly generated forbidden file, I see this message in Polipo's log file: "couldn't compile regex: unmatched ( or \("

The following Python script, which I'm attempting to use, is intended to convert an EasyList file into Polipo's forbidden file format:

    #!/bin/python

    # convert adblock ruleset into polipo-forbidden format

    if __name__ == "__main__":

        import os
        import sys
        import re

        if len(sys.argv) == 1:
            sys.exit("usage: %s <adblockrules>" % os.path.basename(sys.argv[0]))

        if not os.path.exists(sys.argv[1]):
            sys.exit("the rules file (%s) doesn't exist" % sys.argv[1])

        fhandle = file(sys.argv[1])
        lines = fhandle.readlines()
        fhandle.close()

        dollar_re = re.compile("(.*?)\$.*")

        for line in lines:
            if line:
                if (line[0] in ("[", "!", "~", "#", "@") or
                    line.startswith("/adverti") or
                    "##" in line):
                    continue
                line = dollar_re.sub(r"\1", line)
    #           line = line.replace("|http://", "")
                line = line.replace("|", "")
                line = line.replace("||", "")
                line = line.replace(".", r"\.")
                line = line.replace("*", ".*")
                line = line.replace("?", r"\?")
                line = line.replace("^", r"[\/:\.=&\?\+\-\ ]+")
    #           line = line.replace("&", r"\&")
    #           line = line.replace("+", r"\+")
    #           line = line.replace("-", r"\-")
    #           line = line.replace(";", r"\;")
    #           line = line.replace("=", r"\=")
    #           line = line.replace("/", r"\/")
                print(line.strip())
        print("")

But as I've said, when I update the forbidden file, Polipo complains: "couldn't compile regex: unmatched ( or \("

This is one forbidden file generated by the script: http://wikisend.com/download/494664/forbidden.conf

As I've said, there are many scripts online similar to the one I use, some of them relying on sed, but none seems able to generate a valid forbidden file (Polipo complains "couldn't compile regex"). It's not Polipo's fault, because if I make a clean forbidden file with a few web URLs inside, Polipo blocks those connections.

Can anyone help me and explain how to modify or write a proper script to convert adblock lists into a valid regex forbidden file for Polipo?

Many thanks.

You can convert an adblock rule to a Python regex using the https://github.com/scrapinghub/adblockparser library:

    >>> from adblockparser import AdblockRule
    >>> rule = AdblockRule("/ad/loaders/*")
    >>> print(rule.regex)
    /ad/loaders/.*

I'm not sure Polipo supports the same regex format though; the regexes can get pretty hairy:

    >>> print(AdblockRule("||example.com$").regex)
    ^(?:[^:/?#]+:)?(?://(?:[^/?#]*\.)?)?example\.com
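The `(?:...)` non-capturing groups in that output are Python/Perl syntax. If Polipo compiles its forbidden file as POSIX extended regular expressions (an assumption on my part, not something I have checked), those groups would need to become plain `(...)` groups. A minimal sketch of such a rewrite follows; `to_plain_groups` is a hypothetical helper name, and it does not handle a literal `(?:` inside a bracket expression:

```python
def to_plain_groups(pattern):
    """Rewrite Python-style "(?:" groups as plain "(" groups.

    Escaped characters (a backslash plus the next character) are copied
    through untouched, so a literal "\\(" is never mistaken for a group.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep the escaped pair as-is
            i += 2
        elif pattern.startswith("(?:", i):
            out.append("(")               # drop the "?:" marker
            i += 3
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # The regex printed above for the rule "||example.com$"
    print(to_plain_groups(r"^(?:[^:/?#]+:)?(?://(?:[^/?#]*\.)?)?example\.com"))
```

Matching behaviour should be unchanged, since the only difference is that the groups now capture, which does not matter for a block list.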

Also take care of rules with options; it may be better to remove them because their semantics are different.

Hope this helps.
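If you would rather not depend on a library, here is a rough standard-library sketch of the same idea. The likely cause of "unmatched (" is that the script in the question leaves literal `(`, `)`, `[`, `+` and similar characters unescaped, so this version escapes every regex metacharacter and drops everything after `$`. The names `rule_to_ere` and `SEPARATOR` are my own, the separator class is only an approximation of Adblock's `^`, and I am assuming Polipo matches each line against the full URL:

```python
import re

# Characters that are special in extended regular expressions and must be
# escaped when they occur literally in an Adblock rule.
SPECIAL = set(".?+()[]{}$\\|")

# Rough stand-in for Adblock's "^" separator placeholder (an approximation).
SEPARATOR = "[/:?=&]"


def rule_to_ere(rule):
    """Translate one Adblock URL rule to a regex, or return None to skip it."""
    rule = rule.strip()
    # Skip blanks, comments, the list header, exceptions and element hiding.
    if (not rule or rule[0] in "![" or rule.startswith("@@")
            or "##" in rule or "#@#" in rule):
        return None
    # Skip raw regex rules such as /banner\d+/ instead of guessing at them.
    if len(rule) > 1 and rule.startswith("/") and rule.endswith("/"):
        return None
    rule = rule.split("$", 1)[0]          # drop "$options"
    prefix = suffix = ""
    if rule.startswith("||"):             # domain anchor
        prefix, rule = r"^[a-z]+://([^/]*\.)?", rule[2:]
    elif rule.startswith("|"):            # start-of-address anchor
        prefix, rule = "^", rule[1:]
    if rule.endswith("|"):                # end-of-address anchor
        suffix, rule = "$", rule[:-1]
    if not rule:
        return None
    out = []
    for ch in rule:
        if ch == "*":
            out.append(".*")              # Adblock wildcard
        elif ch == "^":
            out.append(SEPARATOR)
        elif ch in SPECIAL:
            out.append("\\" + ch)         # literal metacharacter
        else:
            out.append(ch)
    return prefix + "".join(out) + suffix


if __name__ == "__main__":
    for sample in ("! a comment", "/ad/loaders/*",
                   "||example.com^$third-party", "/banner(1).gif"):
        regex = rule_to_ere(sample)
        if regex is not None:
            re.compile(regex)             # sanity check with Python's engine
            print(regex)
```

Compiling with Python's `re` is only a sanity check for unbalanced parentheses; it does not prove Polipo will accept the line, so test the generated file against your Polipo build.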

