Command line basic usage
As usual, help is obtained this way:
./speed_shoot --help
It prints:
usage: speed_shoot [-h] [-c FILE] [-g GEOIP] [-q SILENT] [-cs CACHE_SIZE]
[-d DIAGNOSE [DIAGNOSE ...]] [-in INCLUDE] [--off OFF]
[-x EXCLUDE] [-f OUTPUT_FORMAT] [-lf LOG_FORMAT]
[-lp LOG_PATTERN] [-lpn LOG_PATTERN_NAME]
[-dp DATE_PATTERN] [-o OUTPUT_FILE]
...
Utility for parsing logs in the apache/nginx combined log format
and output a json of various aggregatted metrics of frequentation :
* by Geolocation (quite fuzzy but still);
* by user agent;
* by hour;
* by day;
* by browser;
* by status code
* of url by ip;
* by ip;
* by url;
* and bandwidth by ip;
Example :
=========
from stdin (useful for using zcat)
**********************************
zcat /var/log/apache.log.1.gz | parse_log.py > dat1.json
excluding IPs 192.168/16 and user agent containing Mozilla
**********************************************************
use::
parse_log -o dat2.json -x '{ "ip" : "^192.168", "agent": "Mozill" }' /var/log/apache*.log
Since archery is cool here is a tip for aggregating data::
>>> from archery.barrack import bowyer
>>> from archery.bow import Hankyu
>>> from json import load, dumps
>>> dumps(
bowyer(Hankyu,load(file("dat1.json"))) +
bowyer(Hankyu,load(file("dat2.json")))
)
Hence a usefull trick to merge your old stats with your new one
positional arguments:
files
optional arguments:
-h, --help show this help message and exit
-c FILE, --config FILE
specify a config file in json format for the command
line arguments any command line arguments will disable
values in the config
-g GEOIP, --geoip GEOIP
specify a path to a geoip.dat file
-q SILENT, --silent SILENT
quietly discard errors
-cs CACHE_SIZE, --cache-size CACHE_SIZE
in conjonction with cp=fixed chooses dict size
-d DIAGNOSE [DIAGNOSE ...], --diagnose DIAGNOSE [DIAGNOSE ...]
diagnose list of space separated arguments :
**rejected** : will print on STDERR rejected parsed
line, **match** : will print on stderr data filtered
out
-in INCLUDE, --include INCLUDE
include from extracted data with a json (string or
filename) in the form { "field" : "pattern" }
--off OFF turn off plugins : geo_ip to skip geoip, user_agent to
turn httpagentparser off
-x EXCLUDE, --exclude EXCLUDE
exclude from extracted data with a json (string or
filename) in the form { "field" : "pattern" }
-f OUTPUT_FORMAT, --output-format OUTPUT_FORMAT
decide if output is in a specified formater amongst :
csv, json
-lf LOG_FORMAT, --log-format LOG_FORMAT
log format amongst apache_log_combined, lighttpd
-lp LOG_PATTERN, --log-pattern LOG_PATTERN
add a custom named regexp for parsing log lines
-lpn LOG_PATTERN_NAME, --log-pattern-name LOG_PATTERN_NAME
the name with witch you want to register the pattern
-dp DATE_PATTERN, --date-pattern DATE_PATTERN
add a custom date format, usefull if and only if using
a custom log_pattern and date pattern differs from
apache.
-o OUTPUT_FILE, --output-file OUTPUT_FILE
output file
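The merging tip in the help text relies on archery and on the Python 2 builtin `file()`. As a rough alternative, here is a minimal Python 3 sketch using only the standard library. It assumes the JSON output is made of nested dictionaries whose leaves are numeric counters; the helper names `merge_counts` and `merge_files` and the keys `by_hour` and `by_status` are hypothetical and only serve the illustration.

```python
import json


def merge_counts(left, right):
    """Recursively add two nested dicts whose leaves are numbers."""
    merged = dict(left)
    for key, value in right.items():
        if key not in merged:
            merged[key] = value
        elif isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = merge_counts(merged[key], value)
        else:
            # Assumption: both leaves are numeric counters.
            merged[key] = merged[key] + value
    return merged


def merge_files(path_a, path_b):
    """Load two JSON stat files and return their merged content."""
    with open(path_a) as file_a, open(path_b) as file_b:
        return merge_counts(json.load(file_a), json.load(file_b))


# Hypothetical data standing in for the content of dat1.json and dat2.json.
old = {"by_hour": {"10": 3, "11": 1}, "by_status": {"200": 4}}
new = {"by_hour": {"11": 2}, "by_status": {"200": 1, "404": 5}}
print(json.dumps(merge_counts(old, new), sort_keys=True))
```

With real files, `merge_files("dat1.json", "dat2.json")` would return the merged dictionary, provided the output really has the shape assumed here.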
A commented jumbo command line example
The following command line:
./speed_shoot -g data/GeoIP.dat -lf lighttpd -x '{ "datetime" : "^01/May", "uri" : "(.*munin|.*(png|jpg))$"}' -d rejected -d match -i '{ "_country" : "(DE|GB)" }' *log yahi/test/biggersample.log
does the following:
- locates the GeoIP file (-g) in data/GeoIP.dat;
- sets the log format (-lf) to lighttpd;
- excludes (-x) any record matching either
  - a URI ending in munin, png or jpg, or
  - a date of May the first;
- includes (-i) only the records whose _country field matches DE or GB, that is, IPs geolocated in Germany or Great Britain;
- diagnoses (-d), that is, prints on stderr, any line that does not match the log format regexp and any line rejected by -x or -i;
and it does so for all the given log files.
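To make the effect of the -x and -i patterns in the command above more concrete, here is a small Python sketch of such a filter. It is not the tool's actual code. It assumes that patterns are applied with `re.search`, that a record is rejected as soon as one exclude pattern matches, and that it is kept only if every include pattern matches. The function name `keep` and the sample records are hypothetical; the field names and patterns come from the command line.

```python
import re


def keep(record, include=None, exclude=None):
    """Return True if the record passes both filters.

    Assumptions (not confirmed by the source): re.search semantics,
    any exclude match rejects, all include patterns must match.
    """
    for field, pattern in (exclude or {}).items():
        if re.search(pattern, str(record.get(field, ""))):
            return False
    for field, pattern in (include or {}).items():
        if not re.search(pattern, str(record.get(field, ""))):
            return False
    return True


# Patterns taken from the command line above.
exclude = {"datetime": "^01/May", "uri": "(.*munin|.*(png|jpg))$"}
include = {"_country": "(DE|GB)"}

# Hypothetical parsed records.
records = [
    {"datetime": "02/May/2012:10:00:00", "uri": "/index.html", "_country": "DE"},
    {"datetime": "01/May/2012:10:00:00", "uri": "/index.html", "_country": "DE"},
    {"datetime": "02/May/2012:10:00:00", "uri": "/logo.png", "_country": "GB"},
    {"datetime": "02/May/2012:10:00:00", "uri": "/index.html", "_country": "FR"},
]
kept = [record for record in records if keep(record, include, exclude)]
print(len(kept))
```

Only the first record survives: the second is dated May the first, the third is a png, and the fourth is geolocated outside DE and GB.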
Using a config file
Well, not impressive:
./speed_shoot -c config.json
If an option is specified in the config file, it overrides the value set on the command line.
Here is a sample of a config file:
{
"exclude" : {
"uri" : ".*munin.*",
"referer" : ".*(munin|php).*"
},
"include" : { "datetime" : "^04" },
"silent" : "False",
"files" : [ "yahi/test/biggersample.log" ]
}
Easter eggs or bad idea
The options -x, -i and -c each accept either a JSON string or a filename, which makes debugging badly formatted JSON a pain.
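One way to ease that pain before calling the tool is to check the argument yourself. The sketch below is a hypothetical helper, not part of speed_shoot: it assumes that a value naming an existing file should be read as a file and that anything else is an inline JSON string, and it reports which of the two failed to parse.

```python
import json
import os


def load_json_arg(value):
    """Parse value as a JSON file if such a file exists, else as a JSON string."""
    if os.path.isfile(value):
        source = "file %r" % value
        with open(value) as handle:
            text = handle.read()
    else:
        source = "inline string"
        text = value
    try:
        return json.loads(text)
    except ValueError as error:
        # json.JSONDecodeError is a subclass of ValueError.
        raise ValueError("invalid JSON in %s: %s" % (source, error))


print(load_json_arg('{ "ip" : "^192.168", "agent": "Mozill" }'))
```

A mistyped filename ends up reported as an invalid inline string, which at least shows that the value was not read as a file.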