Command line basic usage

As usual, help is obtained this way:

./speed_shoot --help

It prints:

usage: speed_shoot [-h] [-c FILE] [-g GEOIP] [-q SILENT] [-cs CACHE_SIZE]
                   [-d DIAGNOSE [DIAGNOSE ...]] [-in INCLUDE] [--off OFF]
                   [-x EXCLUDE] [-f OUTPUT_FORMAT] [-lf LOG_FORMAT]
                   [-lp LOG_PATTERN] [-lpn LOG_PATTERN_NAME]
                   [-dp DATE_PATTERN] [-o OUTPUT_FILE]
                   ...

Utility for parsing logs in the apache/nginx combined log format
and output a json of various aggregatted metrics of frequentation :
     * by Geolocation (quite fuzzy but still);
     * by user agent;
     * by hour;
     * by day;
     * by browser;
     * by status code
     * of url by ip;
     * by ip;
     * by url;
     * and bandwidth by ip;

Example :
=========

from stdin (useful for using zcat)
**********************************
zcat /var/log/apache.log.1.gz | parse_log.py  > dat1.json

excluding IPs 192.168/16 and user agent containing Mozilla
**********************************************************
use::
    parse_log -o dat2.json -x '{ "ip" : "^192.168", "agent": "Mozill" }'  /var/log/apache*.log 

Since archery is cool here is a tip for aggregating data::
    >>> from archery.barrack import bowyer
    >>> from archery.bow import Hankyu
    >>> from json import load, dumps
    >>> dumps(
            bowyer(Hankyu,load(file("dat1.json"))) + 
            bowyer(Hankyu,load(file("dat2.json")))
        )

Hence a usefull trick to merge your old stats with your new one
        

positional arguments:
  files

optional arguments:
  -h, --help            show this help message and exit
  -c FILE, --config FILE
                        specify a config file in json format for the command
                        line arguments any command line arguments will disable
                        values in the config
  -g GEOIP, --geoip GEOIP
                        specify a path to a geoip.dat file
  -q SILENT, --silent SILENT
                        quietly discard errors
  -cs CACHE_SIZE, --cache-size CACHE_SIZE
                        in conjonction with cp=fixed chooses dict size
  -d DIAGNOSE [DIAGNOSE ...], --diagnose DIAGNOSE [DIAGNOSE ...]
                        diagnose list of space separated arguments :
                        **rejected** : will print on STDERR rejected parsed
                        line, **match** : will print on stderr data filtered
                        out
  -in INCLUDE, --include INCLUDE
                        include from extracted data with a json (string or
                        filename) in the form { "field" : "pattern" }
  --off OFF             turn off plugins : geo_ip to skip geoip, user_agent to
                        turn httpagentparser off
  -x EXCLUDE, --exclude EXCLUDE
                        exclude from extracted data with a json (string or
                        filename) in the form { "field" : "pattern" }
  -f OUTPUT_FORMAT, --output-format OUTPUT_FORMAT
                        decide if output is in a specified formater amongst :
                        csv, json
  -lf LOG_FORMAT, --log-format LOG_FORMAT
                        log format amongst apache_log_combined, lighttpd
  -lp LOG_PATTERN, --log-pattern LOG_PATTERN
                        add a custom named regexp for parsing log lines
  -lpn LOG_PATTERN_NAME, --log-pattern-name LOG_PATTERN_NAME
                        the name with witch you want to register the pattern
  -dp DATE_PATTERN, --date-pattern DATE_PATTERN
                        add a custom date format, usefull if and only if using
                        a custom log_pattern and date pattern differs from
                        apache.
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        output file
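
The archery tip in the help text above relies on Python 2's file() builtin. As a rough, dependency-free sketch of the same merge in Python 3, assuming the JSON output is a nesting of dicts whose leaves are numbers, something like the following could be used. merge_stats and merge_files are hypothetical helpers, not part of yahi or archery, and the metric names in the demo are made up for illustration.

```python
import json


def merge_stats(left, right):
    """Recursively add two stats mappings (hypothetical helper).

    Assumes every leaf is a number and that a given key holds the same
    kind of value (dict or number) on both sides.
    """
    merged = dict(left)
    for key, value in right.items():
        if key not in merged:
            merged[key] = value
        elif isinstance(value, dict):
            merged[key] = merge_stats(merged[key], value)
        else:
            merged[key] = merged[key] + value
    return merged


def merge_files(path_a, path_b):
    """Load two JSON result files and return their merged content."""
    with open(path_a) as file_a, open(path_b) as file_b:
        return merge_stats(json.load(file_a), json.load(file_b))


if __name__ == "__main__":
    # Made-up sample data; real output keys may differ.
    old = {"by_ip": {"10.0.0.1": 3}, "by_status": {"200": 5}}
    new = {"by_ip": {"10.0.0.1": 2, "10.0.0.2": 1}, "by_status": {"404": 1}}
    print(json.dumps(merge_stats(old, new), sort_keys=True))
```

With real files, merge_files("dat1.json", "dat2.json") would play the role of the archery expression, under the same assumptions about the data layout.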

A commented jumbo command line example

The following command line:

./speed_shoot -g data/GeoIP.dat -lf lighttpd -x '{ "datetime" : "^01/May", "uri" : "(.*munin|.*(png|jpg))$"}' -d rejected -d match -i '{ "_country" : "(DE|GB)"  }' *log  yahi/test/biggersample.log

does the following for all the given log files:

  • locates the GeoIP file (g) in data/GeoIP.dat;
  • sets the log format (lf) to lighttpd;
  • excludes (x) any record matching either
    • a URI ending with munin, jpg or png, or
    • a date on the first of May;
  • includes (i) only records whose _country field matches DE or GB;
  • diagnoses (d), that is prints on stderr, any line that does not match the log format regexp and any line rejected by -x or -i.
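
To make the -x and -i patterns of the command above more concrete, here is a minimal sketch of how such filters could be evaluated on parsed records. It is not yahi's implementation: the keep function is hypothetical, and it assumes that patterns are applied with re.search, that a record is dropped as soon as one exclude pattern matches, and that it is kept only if every include pattern matches. The sample records are invented.

```python
import re


def keep(record, include=None, exclude=None):
    """Return True if a parsed log record passes both filters.

    Assumptions (not confirmed by the source): patterns are applied with
    re.search, ONE matching exclude pattern rejects the record, and ALL
    include patterns must match for the record to be kept.
    """
    for field, pattern in (exclude or {}).items():
        if re.search(pattern, record.get(field, "")):
            return False
    for field, pattern in (include or {}).items():
        if not re.search(pattern, record.get(field, "")):
            return False
    return True


# Patterns copied from the command line above.
EXCLUDE = {"datetime": "^01/May", "uri": "(.*munin|.*(png|jpg))$"}
INCLUDE = {"_country": "(DE|GB)"}

if __name__ == "__main__":
    # Invented records using the field names from the command line.
    samples = [
        {"datetime": "01/May/2012:10:00:00", "uri": "/index.html", "_country": "DE"},
        {"datetime": "02/May/2012:10:00:00", "uri": "/logo.png", "_country": "GB"},
        {"datetime": "02/May/2012:10:00:00", "uri": "/index.html", "_country": "DE"},
        {"datetime": "02/May/2012:10:00:00", "uri": "/index.html", "_country": "FR"},
    ]
    for sample in samples:
        print(sample["uri"], sample["_country"], keep(sample, INCLUDE, EXCLUDE))
```

Because the uri pattern is anchored with $, a path such as /munin/graph.html is not excluded under these assumptions, while /munin is.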

Using a config file

Well, not impressive:

./speed_shoot -c config.json

If an option is specified in the config file, it overrides the value set on the command line.

Here is a sample of a config file:

{
    "exclude" : {
        "uri"  : ".*munin.*",
        "referer" : ".*(munin|php).*"
    },
    "include" : { "datetime" : "^04" },
    "silent" : "False",
    "files" : [ "yahi/test/biggersample.log" ]
}
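
Since the config file is plain JSON, one way to avoid syntax slips is to generate it from a Python dict. The sketch below reproduces the sample above. write_config is a hypothetical helper, and the values (including "silent" kept as the string "False") are copied from the sample without claiming they are the only accepted forms.

```python
import json

# Same settings as the sample config file, expressed as a Python dict.
config = {
    "exclude": {"uri": ".*munin.*", "referer": ".*(munin|php).*"},
    "include": {"datetime": "^04"},
    "silent": "False",  # kept as a string, exactly as in the sample
    "files": ["yahi/test/biggersample.log"],
}


def write_config(path, settings):
    """Serialize settings to path as indented JSON (hypothetical helper)."""
    with open(path, "w") as handle:
        json.dump(settings, handle, indent=4)
        handle.write("\n")


if __name__ == "__main__":
    write_config("config.json", config)
```

The resulting config.json can then be passed with -c as shown earlier.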

Easter eggs or bad idea

The options -x, -i and -c accept either a JSON string or a filename, which makes debugging badly formatted JSON a pain.
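
One possible workaround is to check the JSON yourself before handing it to speed_shoot. The sketch below is a hypothetical pre-flight helper, not part of yahi. It assumes that a value naming an existing file should be read from disk and anything else treated as a JSON string, which may differ from how the tool itself decides.

```python
import json
import os


def load_json_argument(value):
    """Parse value as JSON, reading it from disk first if it names a file.

    Raises ValueError carrying the line and column of the first syntax
    error, so a malformed filter is easier to locate.
    """
    source = "string"
    text = value
    if os.path.isfile(value):
        source = "file " + value
        with open(value) as handle:
            text = handle.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError as error:
        raise ValueError(
            "bad JSON in %s at line %d, column %d: %s"
            % (source, error.lineno, error.colno, error.msg)
        ) from error


if __name__ == "__main__":
    print(load_json_argument('{ "ip" : "^192.168", "agent": "Mozill" }'))
```

A trailing comma or a missing quote then fails early with a position, instead of surfacing later as an opaque parsing problem.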