Python script to convert an Adblock list into a forbidden file for the Polipo proxy (regex)
I am running the Polipo proxy on my system for ad-blocking purposes. Searching the Internet, I found many shell scripts that convert Adblock Plus lists into Polipo's forbidden file format; most of these scripts rely on sed, Ruby, or Python. However, none of them is able to generate a valid forbidden file: when I restart Polipo with the newly generated forbidden file, I see this message in Polipo's log file: "Couldn't compile regex: Unmatched ( or \("
The following is the Python script I am attempting to use, which is intended to convert an EasyList file into Polipo's forbidden file format:
#!/bin/python
# Convert an adblock ruleset into polipo-forbidden format

if __name__ == "__main__":
    import os
    import sys
    import re

    if len(sys.argv) == 1:
        sys.exit("usage: %s <adblockrules>" % os.path.basename(sys.argv[0]))
    if not os.path.exists(sys.argv[1]):
        sys.exit("the rules file (%s) doesn't exist" % sys.argv[1])

    fhandle = file(sys.argv[1])
    lines = fhandle.readlines()
    fhandle.close()

    dollar_re = re.compile("(.*?)\$.*")

    for line in lines:
        if line:
            if (line[0] in ("[", "!", "~", "#", "@") or
                    line.startswith("/adverti") or
                    "##" in line):
                continue
            line = dollar_re.sub(r"\1", line)
            # line = line.replace("|http://", "")
            line = line.replace("|", "")
            line = line.replace("||", "")
            line = line.replace(".", r"\.")
            line = line.replace("*", ".*")
            line = line.replace("?", r"\?")
            line = line.replace("^", r"[\/:\.=&\?\+\-\ ]+")
            # line = line.replace("&", r"\&")
            # line = line.replace("+", r"\+")
            # line = line.replace("-", r"\-")
            # line = line.replace(";", r"\;")
            # line = line.replace("=", r"\=")
            # line = line.replace("/", r"\/")
            print(line.strip())
    print("")
But as I've said, when I update the forbidden file, Polipo complains: "Couldn't compile regex: Unmatched ( or \("
This is one forbidden file generated by the script: http://wikisend.com/download/494664/forbidden.conf
As I've said, there are many scripts online that one can use, and most of them rely on sed, but none seems able to generate a valid forbidden file (Polipo complains "Couldn't compile regex"). This is not Polipo's fault, because if I make a clean forbidden file with a few web URLs inside, Polipo blocks these connections.
Can anyone help me and explain how to modify or write a proper script that converts Adblock lists into a valid regex forbidden file for Polipo?
Many thanks.
You can convert an Adblock rule to a Python regex using the https://github.com/scrapinghub/adblockparser library:
>>> from adblockparser import AdblockRule
>>> rule = AdblockRule("/ad/loaders/*")
>>> print(rule.regex)
/ad/loaders/.*
I'm not sure Polipo supports the same regex format, though; the regexes can get pretty hairy:
>>> print(AdblockRule("||example.com$").regex)
^(?:[^:/?#]+:)?(?://(?:[^/?#]*\.)?)?example\.com
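To illustrate the format concern: the regex above uses Perl-style non-capturing groups, `(?:...)`, which POSIX extended regexes do not have. Assuming Polipo compiles its forbidden patterns as POSIX extended regexes (an assumption, not something confirmed here), a minimal sketch of a rewrite step might look like this. The helper name `to_posix_ere` is hypothetical, and the scan does not treat bracket expressions specially.

```python
def to_posix_ere(pattern):
    """Rewrite Perl-style non-capturing groups "(?:" as plain groups "(".

    Assumption: the target engine accepts POSIX extended syntax only.
    Escaped characters are copied through untouched; any other "(?"
    construct (lookahead, flags, named groups) is rejected.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            # Keep an escaped character together with its backslash.
            out.append(pattern[i:i + 2])
            i += 2
        elif pattern.startswith("(?:", i):
            # For pure matching, a capturing group behaves the same.
            out.append("(")
            i += 3
        elif pattern.startswith("(?", i):
            raise ValueError("unsupported construct in %r" % pattern)
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_posix_ere(r"^(?:[^:/?#]+:)?(?://(?:[^/?#]*\.)?)?example\.com"))
```

Because a forbidden file only needs a yes/no match, turning non-capturing groups into capturing ones should not change which URLs match.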
Also take care of rules with options; it may be better to remove them because their semantics are different.
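As a rough sketch of that advice using only the standard library: the code below drops comments, exception rules, element-hiding rules, and any rule with options, then escapes every regex metacharacter before translating the Adblock wildcards. Literal parentheses left unescaped are one plausible source of an "Unmatched (" error. The names `rule_to_regex` and `convert` are hypothetical, the anchor and separator patterns are my own approximation of Adblock semantics, and Python's `re.compile` is only a stand-in check, assuming Polipo expects one POSIX extended regex per line.

```python
import re

# Characters treated as special in extended regexes; escaped when literal.
META_CHARS = set(".^$*+?()[]{}|\\")

# Adblock "^" separator: anything except letters, digits, "_", "-", ".", "%",
# or the end of the address (approximation).
SEPARATOR = "([^A-Za-z0-9_.%-]|$)"

# Adblock "||" anchor: any scheme, then an optional subdomain prefix.
DOMAIN_ANCHOR = r"^[^:/?#]+://([^/?#]*\.)?"


def rule_to_regex(rule):
    """Return a regex for a simple blocking rule, or None if it is skipped."""
    rule = rule.strip()
    if not rule or rule[0] in "![":
        return None  # blank line, comment, or list header
    if rule.startswith("@@") or "##" in rule or "#@#" in rule:
        return None  # exception rule or element-hiding rule
    if "$" in rule:
        return None  # rule with options: dropped rather than mistranslated
    if len(rule) > 1 and rule.startswith("/") and rule.endswith("/"):
        return None  # rule that is itself a regex: not handled here

    prefix = suffix = ""
    if rule.startswith("||"):
        prefix, rule = DOMAIN_ANCHOR, rule[2:]
    elif rule.startswith("|"):
        prefix, rule = "^", rule[1:]
    if rule.endswith("|"):
        suffix, rule = "$", rule[:-1]

    parts = []
    for ch in rule:
        if ch == "*":
            parts.append(".*")
        elif ch == "^":
            parts.append(SEPARATOR)
        elif ch in META_CHARS:
            parts.append("\\" + ch)  # includes "(" and ")"
        else:
            parts.append(ch)
    return prefix + "".join(parts) + suffix


def convert(lines):
    """Convert rule lines, keeping only patterns that compile."""
    patterns = []
    for line in lines:
        regex = rule_to_regex(line)
        if regex is None:
            continue
        try:
            re.compile(regex)  # approximate validity check only
        except re.error:
            continue
        patterns.append(regex)
    return patterns


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        print("\n".join(convert(handle)))
```

Run as a script with the rules file as its only argument, it prints one pattern per line; whether Polipo accepts every pattern would still need to be checked against Polipo itself.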
Hope this helps.