Notch and shoot by the example
******************************

For this exercise I prefer *bpython*, since it has the ctrl+S shortcut,
which lets you save any «experiments» to a file.

It is pretty much a querying language in disguise. Initially I did not plan
to use it in a console or as a standalone module, so the API is not satisfying.

Notch: choose your input
========================

So let's take an example::

    >>> context=notch(
    ...     'yahi/test/biggersample.log',
    ...     'another_log',
    ...     include="yahi/test/include.json",
    ...     exclude='{ "ip" : "^(192\.168|10\.)"}',
    ...     output_format="csv"
    ... )
    # include.json contains : { "_country" : "GB","user" : "-" }

Here you parse two files, and you want:

- only GB hits,
- non-authenticated users,
- to filter out private IPs,
- and a CSV formatter for the output.

Since no output file is set, output goes to stdout (errors go to stderr).

Shoot: choose and aggregate your data
=====================================

Shoot has 2 inputs:

- a context (set up by notch);
- an extractor.

An extractor is a function that extracts and transforms data and, since I love
short circuits, it may also do some on-the-fly filtering :)

Total hits in a log matching the conditions from notch
------------------------------------------------------

Example::

    >>> from archery import Hankyu as _dict
    >>> shoot(
    ...     context,
    ...     lambda data: _dict({ 'total_lines' : 1 })
    ... )

Gross total hits in business hours and off business hour
--------------------------------------------------------

Business hours are each weekday, from Monday to Friday, between 8 am and 5 pm.

Example::

    >>> from archery import Hankyu as _dict
    >>> shoot(
    ...     context,
    ...     lambda data: _dict({ (
    ...         8 <= data["_datetime"].hour < 17 and
    ...         data["_datetime"].weekday() < 5
    ...     ) and "business_hour" or "other_hour" : 1 })
    ... )

Hankyu is a dict supporting addition.

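
To make "a dict supporting addition" concrete, here is a minimal sketch of the
idea. ``AddableDict`` is a hypothetical stand-in written for this illustration;
it is not archery's ``Hankyu`` and only assumes that adding two dicts sums the
values of their shared keys. Summing the small dicts an extractor returns for
each line is then enough to get the aggregate:

```python
from functools import reduce


class AddableDict(dict):
    """Hypothetical stand-in for a dict supporting addition (not archery's Hankyu)."""

    def __add__(self, other):
        # Start from a copy of the left operand, then merge the right one,
        # adding values when a key is present on both sides.
        result = AddableDict(self)
        for key, value in other.items():
            result[key] = result[key] + value if key in result else value
        return result


# One small dict per log line, as an extractor would return them.
per_line = [
    AddableDict({"business_hour": 1}),
    AddableDict({"other_hour": 1}),
    AddableDict({"business_hour": 1}),
]

# Aggregating is just summing all the per-line dicts.
totals = reduce(lambda acc, item: acc + item, per_line, AddableDict())
```

Because values are merged with ``+``, the same sketch also works when the
values are themselves addable dicts, which is what nested aggregates rely on.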

Grouping hits per country code
------------------------------

Example::

    >>> from archery import Hankyu as _dict
    >>> shoot(
    ...     context,
    ...     lambda data: _dict({ data["_country"]: 1 })
    ... )

Distinct IP
-----------

Example::

    >>> from archery import Hankyu as _dict
    >>> from yahi import ToxicSet
    >>> shoot(
    ...     context,
    ...     lambda data: _dict(distinct_ip = ToxicSet({ data["ip"]}))
    ... )

ToxicSet is a set that maps add to union.

Hits per day
------------

Example::

    >>> date_formater= lambda dt :"%s-%s-%s" % ( dt.year, dt.month, dt.day)
    >>> from archery import Hankyu as _dict
    >>> shoot(
    ...     context,
    ...     lambda data: _dict({
    ...         date_formater(data["_datetime"]) : 1
    ...     }))

Parallelizing request
---------------------

You can now parallelize all your requests by adding one key per request to the
aggregator dict. Just beware of the memory consumption.

Custom filtering
================

Sometimes regexps are not enough. Imagine you have a function that checks
whether a user is an employee, and you want to find all the workaholics in your
company reaching an authenticated realm outside working hours::

    >>> context.data_filter= lambda data: (
    ...     is_employee(data["user"]) and not working_hours(data["_datetime"])
    ... )
    >>> shoot(
    ...     context,
    ...     lambda data: _dict(workaholicness = _dict({data["user"] : 1}))
    ... )

.. warning:: data_filter will override any include/exclude rules given in notch.
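
The ``is_employee`` and ``working_hours`` helpers used in the custom filter are
not defined in this document. A minimal sketch of what they might look like,
assuming a hard-coded set of logins (``EMPLOYEES`` is hypothetical) and the
business hours defined earlier (Monday to Friday, 8 am to 5 pm):

```python
from datetime import datetime

# Hypothetical employee logins; a real check would query a directory.
EMPLOYEES = {"alice", "bob"}


def is_employee(user):
    """Return True when the login belongs to a known employee."""
    return user in EMPLOYEES


def working_hours(dt):
    """Return True on weekdays (Monday to Friday) between 8 am and 5 pm."""
    return dt.weekday() < 5 and 8 <= dt.hour < 17


def data_filter(data):
    """Keep only the hits made by employees outside working hours."""
    return is_employee(data["user"]) and not working_hours(data["_datetime"])


# A record shaped like the ones the extractors above receive.
saturday_night = {"user": "alice", "_datetime": datetime(2024, 1, 6, 23, 30)}
keep = data_filter(saturday_night)
```

A plain function like ``data_filter`` can be assigned to
``context.data_filter`` in place of the lambda, which is handy once the
condition grows beyond one line.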