Filtering web pages with Privoxy
Introduction
Privoxy is one of several web proxy applications (alternatives include
Proxomitron and BFilter) that let you rewrite web pages on the fly, for
example to remove ad banners or to avoid downloading useless images.
Trimming Privoxy to the bare minimum
If you want to drop all the default settings shipped with the stock Privoxy,
the only files you need are config.txt, user.action and user.filter.
Here's what config.txt should contain:
confdir .
logdir .
actionsfile user.action
filterfile user.filter
logfile privoxy.log
debug 1 # Log the destination for each request Privoxy let through.
# Change 127.0.0.1 to this machine's LAN IP address to share Privoxy with other hosts
listen-address 127.0.0.1:8118
toggle 1
enable-remote-toggle 0
enable-remote-http-toggle 0
enable-edit-actions 0
enforce-blocks 0
buffer-limit 4096
accept-intercepted-requests 0
split-large-forms 0
keep-alive-timeout 5
socket-timeout 300
handle-as-empty-doc-returns-ok 1
Then define your filters (the regexes) in user.filter, and activate them for the relevant sites in user.action.
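A minimal sketch of that split, assuming a hypothetical filter named no-ad-divs and a hypothetical site www.example.com whose ads sit in <div class="ad"> elements. The filter file defines the regex job; the actions file switches it on for the site. Only the two file names come from the config above; everything else is illustrative.

```
# --- user.filter: define the filter (name and description, then its jobs) ---
FILTER: no-ad-divs Remove hypothetical <div class="ad"> blocks
# '@' as delimiter avoids escaping the '/' in </div>; .*? stops at the first </div>
s@<div class="ad">.*?</div>@@sig

# --- user.action: activate the filter for the listed URL patterns ---
{ +filter{no-ad-divs} }
www.example.com
```

Because the regex stops at the first </div>, an ad block containing nested div elements would leave stray markup behind; a real filter should be tuned against the actual page source.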
Customizing Privoxy
Config.txt
The main configuration file, named config.txt on Windows, tells Privoxy
which actions files and filter files to load.
user.action
Privoxy provides a few dozen "actions", i.e. tasks it can perform when the
user requests a URL matching a given pattern, such as "{ allow-all-cookies }".
One of the actions is "filter{}", which refers by name to a filter (a set of
regexes) defined in a filter file such as user.filter; the filter is run every
time the user visits one of the URLs listed under that action in user.action.
Filters predefined in the supplied default.filter include "js-annoyances" and
"content-cookies".
There are four types of lines in actions files: comments, actions, aliases, and
patterns (e.g. "www.example.com/index.html").
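As a sketch, a small actions file can show all four line types together. The alias name no-js-junk and the .example.org pattern are hypothetical. js-annoyances and content-cookies are the default.filter filters named above, so this only works if default.filter is also loaded with a filterfile line in config (the minimal config.txt above loads only user.filter).

```
# Comment: lines starting with # are ignored

# Aliases: short names for combinations of actions
{{alias}}
no-js-junk = +filter{js-annoyances} +filter{content-cookies}

# Action: the actions in braces apply to every pattern listed below them
{ no-js-junk }
# Patterns: one URL pattern per line
www.example.com/index.html
.example.org
```

A leading dot, as in .example.org, matches example.org itself and any of its subdomains.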
user.filter
Each filter in user.filter starts with a FILTER: line giving its name, followed
by one or more Perl-style substitution jobs (s/pattern/replacement/options).
You are free to choose the delimiter as you see fit. In addition to the Perl
options gimsx, the following nonstandard options are supported:
- 'U' turns the default to ungreedy matching. Add ? to quantifiers
to switch back to greedy.
- 'T' (trivial) disables parsing of backreferences in the substitute.
Use it if you want to include text like '$&' in your substitute without
escaping it.
- 'D' (dynamic) allows the use of variables. Supported variables are
$host, $origin (the IP address the request came from), $path and $url. Note
that '$' is a bad choice of delimiter for dynamic filters, as you might end
up with unintended variables if you use a variable name directly after the
delimiter. Variables are resolved without escaping anything, so you also
have to be careful not to choose delimiters that could appear in the expanded
replacement text. For example, '<' should be safe, while '?' will sooner or
later cause conflicts with $url.
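A sketch of the U and D options in user.filter. Both filter names, the class="ad" markup and the "Copyright" marker text are hypothetical, and each filter still has to be activated with +filter{...} in an actions file.

```
FILTER: ad-divs-ungreedy Hypothetical: strip <div class="ad"> blocks using U
# U makes .* ungreedy, so each match ends at the first </div> instead of the last
s@<div class="ad">.*</div>@@sigU

FILTER: show-host Hypothetical: name the host next to a marker word
# D expands $host; '<' is the delimiter because host names cannot contain it
s<Copyright<Copyright (served by $host)<D
```

Without U, the first job would be greedy and could delete everything between the first ad block and the last </div> on the page.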
Most of the default behavior is triggered in default.action. To strip it, I
would start at the line that says {{alias}} and delete everything from there
down. standard.action is automatically invoked by Privoxy and sets some
defaults (possibly including some filters). The default actions can be edited
manually or via Privoxy's web-based (CGI) editor.
- The main configuration file is named config on Linux, Unix, BSD, OS/2,
and AmigaOS and config.txt on Windows. This is a required file.
- default.action (the main actions file) is used to define which "actions"
relating to banner-blocking, images, pop-ups, content modification, cookie
handling, etc. should be applied by default. It also defines many exceptions
(both positive and negative) from this default set of actions that enable
Privoxy to selectively eliminate the junk, and only the junk, on as many
websites as possible.
- Multiple actions files may be defined in config. These are processed
in the order they are defined. Local customizations and locally preferred
exceptions to the default policies as defined in default.action (which you
will most probably want to define sooner or later) are probably best applied
in user.action, where you can preserve them across upgrades. standard.action
is only for Privoxy's internal use.
- There is also a web-based editor for the various actions files, accessible
from http://config.privoxy.org/show-status (shortcut: http://p.p/show-status).
Editing through it only works if enable-edit-actions is set to 1 in config.
- Filter files can be used to rewrite the raw page content, including
viewable text as well as embedded HTML and JavaScript, and whatever else lurks
on any given web page. The filtering jobs are only pre-defined there; whether
to apply them or not is up to the actions files.
default.filter includes various filters made available by the developers.
Some are much more intrusive than others, and all should be used with caution.
You may define additional filter files in config as you can with actions
files. We suggest user.filter for any locally defined filters or customizations.
Config.txt
All options in the config file except for confdir and logdir are optional.
Before leaving an option unset, check what Privoxy does when it is absent.
confdir .
logdir .
listen-address 127.0.0.1:8120
actionsfile standard.action # Internal purpose, recommended
actionsfile default.action # Main actions file
actionsfile user.action # User customizations
filterfile default.filter
filterfile user.filter # User customizations
user.action
This is where user-specific actions should be put, as default.action may
be overwritten when updating Privoxy.
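A sketch of how a local exception overrides a default, assuming config lists default.action before user.action (as in the listing above). The default.action section shown is a simplified, hypothetical stand-in, and .example.com is a placeholder site. Because actions files are processed in the order they are defined, the -filter in user.action wins, and the exception survives an upgrade that replaces default.action.

```
# --- default.action (hypothetical shipped default; may be overwritten on upgrade) ---
{ +filter{js-annoyances} }
.example.com

# --- user.action (loaded later, so its settings take precedence) ---
{ -filter{js-annoyances} }
.example.com
```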
Q&A
How to apply a filter to all sites?
In user.action, apply the filter to the pattern /, which matches every URL:
{ +filter{common-filter} }
/
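The same mechanism can carve out exceptions: within an actions file, a later section overrides an earlier one for URLs they both match. A sketch, assuming common-filter is defined with a FILTER: common-filter line in a loaded filter file and using the hypothetical site .example.com:

```
# Apply common-filter to every URL ...
{ +filter{common-filter} }
/

# ... except on this hypothetical site
{ -filter{common-filter} }
.example.com
```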
Resources