Notch and shoot by example¶

For this exercise I prefer bpython, since its Ctrl+S shortcut lets you save any «experiments» to a file.

Notch and shoot are pretty much a query language in disguise.

Initially I did not plan to use them in a console or as a standalone module, so the API is not satisfying.

Notch: choose your input¶

So let’s take an example::

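>>> from yahi import notch, shoot
>>> # include.json contains: { "_country" : "GB", "user" : "-" }
>>> # the exclude rule is a raw string so the backslashes reach the regexp untouched
>>> context = notch(
...     'yahi/test/biggersample.log', 'another_log',
...     include="yahi/test/include.json",
...     exclude=r'{ "ip" : "^(192\.168|10\.)"}',
...     output_format="csv"
... )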

Here you parse two files, and you want:

only GB hits,
unauthenticated users,
private IP addresses filtered out,
and a CSV formatter for the output.

Since no output file is set, output goes to stdout and errors go to stderr.
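The filtering described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not yahi's implementation: the helper `keep` is hypothetical, and it assumes that every `include` pattern must match a record while any matching `exclude` pattern rejects it. The sample records are made up.

```python
import json
import re


def keep(record, include, exclude):
    """Hypothetical helper: True if the record passes both rule sets.

    Assumption: all include patterns must match, any exclude match rejects.
    """
    included = all(
        re.search(rx, record.get(field, "")) for field, rx in include.items()
    )
    excluded = any(
        re.search(rx, record.get(field, "")) for field, rx in exclude.items()
    )
    return included and not excluded


# Same rules as in the notch call above, written as Python dicts.
include = json.loads('{"_country": "GB", "user": "-"}')
exclude = {"ip": r"^(192\.168|10\.)"}

# Made-up parsed log records.
records = [
    {"ip": "81.2.69.160", "_country": "GB", "user": "-"},
    {"ip": "192.168.0.12", "_country": "GB", "user": "-"},
    {"ip": "81.2.69.161", "_country": "FR", "user": "-"},
]

kept = [r["ip"] for r in records if keep(r, include, exclude)]
print(kept)
```

Only the first record survives: the second has a private IP and the third is not a GB hit.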

Shoot: choose and aggregate your data¶

Shoot takes two inputs:

a context (set up by notch);
an extractor.

An extractor is a function that extracts and transforms data and, since I love short circuits, it may also do some on-the-fly filtering :)
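To make the idea concrete, here is a minimal sketch of what "extract, then aggregate" amounts to. It is not yahi's code: `fold` is a hypothetical stand-in for shoot, `collections.Counter` stands in for the addable dict, and the records are made up.

```python
from collections import Counter
from functools import reduce


def fold(records, extractor):
    """Hypothetical stand-in for shoot: add up one small dict per record."""
    return reduce(lambda acc, rec: acc + extractor(rec), records, Counter())


# Made-up parsed records.
records = [{"_country": "GB"}, {"_country": "FR"}, {"_country": "GB"}]

# One counter per record, summed over the whole input.
total = fold(records, lambda data: Counter(total_lines=1))
by_country = fold(records, lambda data: Counter({data["_country"]: 1}))

# On-the-fly filtering: return an empty counter for records to skip.
gb_only = fold(
    records,
    lambda data: Counter({data["_country"]: 1})
    if data["_country"] == "GB"
    else Counter(),
)
```

The extractor only describes what one record contributes; the addition does the aggregation.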

Total hits in a log matching the conditions from notch¶

Example::

>>> from archery import Hankyu as _dict
>>> shoot(
... context,
... lambda data: _dict({ 'total_lines' : 1 })
... )

Gross total hits during and outside business hours¶

Business hours are Monday to Friday, between 8 am and 5 pm.
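The definition above can be checked in isolation with plain `datetime` objects. A small sketch, assuming that 5 pm itself already counts as outside business hours; `is_business_hour` is a name chosen for illustration.

```python
from datetime import datetime


def is_business_hour(dt):
    """Weekday (Monday=0 .. Friday=4) with 8 <= hour < 17."""
    return dt.weekday() < 5 and 8 <= dt.hour < 17


monday_morning = datetime(2024, 1, 1, 9, 30)  # 2024-01-01 is a Monday
monday_evening = datetime(2024, 1, 1, 17, 0)
saturday_noon = datetime(2024, 1, 6, 12, 0)

labels = [
    "business_hour" if is_business_hour(dt) else "other_hour"
    for dt in (monday_morning, monday_evening, saturday_noon)
]
```

Note that a chained comparison must read `8 <= hour < 17`; written the other way round it can never be true.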

Example::

>>> from archery import Hankyu as _dict
>>> shoot(
... context,
... lambda data: _dict({ (
...        8 <= data["_datetime"].hour < 17 and
...        data["_datetime"].weekday() < 5
...    ) and "business_hour" or "other_hour" :  1 })
... )

Hankyu is a dict supporting addition.
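What "a dict supporting addition" means can be sketched in a few lines. `AddDict` below is a hypothetical stand-in written for illustration, not the archery implementation; it assumes that adding two dicts merges their keys and adds the values of shared keys, recursively for nested dicts.

```python
class AddDict(dict):
    """Hypothetical stand-in for Hankyu: `+` merges keys and adds values."""

    def __add__(self, other):
        result = AddDict(self)
        for key, value in other.items():
            # Shared keys are added, new keys are copied over.
            result[key] = (result[key] + value) if key in result else value
        return result


merged = AddDict(business_hour=1) + AddDict(business_hour=1, other_hour=1)

# Nested values are added too, because they support `+` themselves.
nested = AddDict(by_day=AddDict(mon=1)) + AddDict(by_day=AddDict(mon=2, tue=1))
```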

Grouping hits per country code¶

Example::

>>> from archery import Hankyu as _dict
>>> shoot(
... context,
... lambda data: _dict({ data["_country"]: 1 })
... )

Distinct IP¶

Example::

>>> from archery import Hankyu as _dict
>>> from yahi import ToxicSet
>>> shoot(
... context,
... lambda data: _dict(distinct_ip = ToxicSet({ data["ip"]}))
... )

ToxicSet is a set that maps add to union.
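The behaviour described here can be imitated with a tiny `set` subclass. `UnionSet` is a hypothetical stand-in for illustration, not yahi's ToxicSet; it only assumes that `+` behaves like set union.

```python
class UnionSet(set):
    """Hypothetical stand-in for ToxicSet: `+` behaves like set union."""

    def __add__(self, other):
        return UnionSet(self | other)


# Adding one-element sets keeps each IP only once.
seen = (
    UnionSet({"81.2.69.160"})
    + UnionSet({"81.2.69.161"})
    + UnionSet({"81.2.69.160"})
)
distinct_count = len(seen)
```

Because the set supports `+`, it can sit inside an addable dict like any counter, and the result holds every distinct value rather than a count.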

Hits per day¶

Example::

>>> date_formater= lambda dt :"%s-%s-%s" % ( dt.year, dt.month, dt.day)
>>> from archery import Hankyu as _dict
>>> shoot(
... context,
... lambda data: _dict({
...     date_formater(data["_datetime"]) : 1
... }))
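One detail worth knowing about the key format: `date_formater` above does not zero-pad months and days. The sketch below compares it with `date.isoformat()` as a possible alternative, using `collections.Counter` as a stand-in for the addable dict and made-up timestamps.

```python
from collections import Counter
from datetime import datetime

date_formater = lambda dt: "%s-%s-%s" % (dt.year, dt.month, dt.day)

# Made-up hit timestamps: two in February, one in October.
hits = [
    datetime(2024, 2, 1, 9),
    datetime(2024, 2, 1, 18),
    datetime(2024, 10, 2, 7),
]

unpadded = Counter(date_formater(dt) for dt in hits)
padded = Counter(dt.date().isoformat() for dt in hits)

# Sorting the keys as strings only matches date order when they are padded.
unpadded_order = sorted(unpadded)
padded_order = sorted(padded)
```

Both give the same counts; the padded keys simply sort chronologically when compared as strings.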

Parallelizing requests¶

You can run several requests in a single pass over the logs by giving each one its own key in the aggregator dict.

Just beware of memory consumption.
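As a sketch of the idea, the extractor below answers two questions in one pass by using one key per request. `AddDict` is again a hypothetical stand-in for the addable dict, the `reduce` call stands in for shoot, and the records are made up.

```python
from functools import reduce


class AddDict(dict):
    """Hypothetical stand-in for Hankyu: `+` merges keys and adds values."""

    def __add__(self, other):
        result = AddDict(self)
        for key, value in other.items():
            result[key] = (result[key] + value) if key in result else value
        return result


# One key per request: a grand total and a per-country breakdown.
extractor = lambda data: AddDict(
    total_lines=1,
    by_country=AddDict({data["_country"]: 1}),
)

records = [{"_country": "GB"}, {"_country": "FR"}, {"_country": "GB"}]
result = reduce(lambda acc, rec: acc + extractor(rec), records, AddDict())
```

Every key keeps its own aggregate in memory for the whole run, which is where the memory warning comes from.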

Custom filtering¶

Sometimes regexps are not enough. Imagine you have a function that checks whether a user is an employee, and you want to find all the workaholics in your company who reach an authenticated realm outside working hours:

>>> context.data_filter= lambda data: (
...     is_employee(data["user"]) and not working_hours(data["_datetime"])
... )
>>> shoot(context, lambda data: _dict(workaholicness=_dict({data["user"]: 1})))

Warning

Setting data_filter overrides any include/exclude rules given to notch.
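The custom filter can be tried out without any log file. In this sketch `is_employee` and `working_hours` are hypothetical implementations of the two functions imagined above, the staff list and records are made up, and `collections.Counter` stands in for the addable dict.

```python
from collections import Counter
from datetime import datetime

EMPLOYEES = {"alice", "bob"}  # hypothetical staff list


def is_employee(user):
    return user in EMPLOYEES


def working_hours(dt):
    # Assumption: Monday to Friday, 8 am (included) to 5 pm (excluded).
    return dt.weekday() < 5 and 8 <= dt.hour < 17


data_filter = lambda data: (
    is_employee(data["user"]) and not working_hours(data["_datetime"])
)

records = [
    {"user": "alice", "_datetime": datetime(2024, 1, 1, 22, 0)},  # Monday night
    {"user": "alice", "_datetime": datetime(2024, 1, 2, 10, 0)},  # Tuesday morning
    {"user": "bob", "_datetime": datetime(2024, 1, 6, 11, 0)},  # Saturday
    {"user": "-", "_datetime": datetime(2024, 1, 6, 11, 0)},  # not logged in
]

workaholicness = Counter(r["user"] for r in records if data_filter(r))
```

Only the Monday-night and Saturday hits from known employees are counted.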
