#############################
Various ways of misusing yahi
#############################
One feature I like about yahi is that you can obtain versatile results by:

- combining :ref:`notch` and :ref:`shoot` programmatically,
- abusing :ref:`yahi_all_in_one_maker`.

================
parsing auth.log
================
Imagine you are a sysadmin, your boss wants a graph of all the requests
you get, and you don't like using Excel.
.. literalinclude:: ../../examples/plot_attack
.. image:: ../../examples/attack.png
**********************************
Histograms or time series from CSV
**********************************
================================
CSV that can be parsed as regexp
================================
There are simple cases where a CSV has no embedded strings and is literally made of comma-separated integers/floats.
In this case, the CSV can be parsed with a regexp, which is all the more convenient when the CSV has no title line.
Here is an example using the CSV generated by `trollometre `_
A line is made of a timestamp followed by various (int) counters.
.. tip:: For ease of use, I hacked the date_pattern format to accept "%s" as a timestamp (normally only valid strptime formats are accepted).
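
To make the idea concrete outside of yahi, here is a minimal standard-library sketch of parsing such a line with a regexp. The pattern, the ``parse_line`` helper and the sample line are assumptions made for illustration; they are not taken from yahi or from the trollometre output.

```python
import re
from datetime import datetime, timezone

# Assumed layout: an epoch timestamp followed by one or more integer counters.
LINE = re.compile(r"^(?P<ts>\d+),(?P<counters>\d+(?:,\d+)*)$")


def parse_line(line):
    """Return (datetime, [counters]) or None when the line does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    # Seconds since the epoch are converted by hand, since this is not
    # something portable strptime formats cover.
    when = datetime.fromtimestamp(int(match.group("ts")), tz=timezone.utc)
    counters = [int(value) for value in match.group("counters").split(",")]
    return when, counters


# Made-up sample line, for illustration only.
print(parse_line("1757500000,12,3,45"))
# A line with an embedded quoted string is rejected instead of being misparsed.
print(parse_line('1757500000,"a,b",3'))
```

The strict pattern is a deliberate choice: anything that is not purely numeric is refused, which is the situation where a real CSV parser becomes the safer tool.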
.. literalinclude:: test.py
or alternatively:
.. literalinclude:: test1a.py
Then, all that remains to do is ::
yahi_all_in_one_maker
firefox aio.html
Click on *time series* and you can see either the chronological time series
.. image:: csv_1.png
Or the profile by hour
.. image:: csv_2.png
================================
Raw approach with csv.DictReader
================================
Let's take the use case where my unemployment insurance agency sent me the data of all the 10000 jobless persons
in my vicinity, each line consisting of:

opaque id, civility, firstname, lastname, email, email of the counselor following the jobless person

This CSV has the titles as its first line and has strings that may contain ",", hence the regexp approach
is strongly ill-advised.
What we want here is 2 histograms:

- the frequency of each firstname (which does not violate the GDPR, so I can share it),
- how many persons each counselor is following.

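
Before looking at the yahi version, the counting step alone can be sketched with the standard library. The header names and the three sample rows below are invented for illustration; the real file and its column titles may differ.

```python
import csv
import io
from collections import Counter

# Invented sample following the column layout described above.
SAMPLE = '''id,civility,firstname,lastname,email,counselor_email
a1,Mr,Jean,"Dupont, Jr",jean@example.com,c1@example.com
a2,Ms,Marie,Martin,marie@example.com,c1@example.com
a3,Mr,Jean,Durand,jd@example.com,c2@example.com
'''

# DictReader uses the first line as field names and honours quoted commas.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

firstnames = Counter(row["firstname"] for row in rows)
per_counselor = Counter(row["counselor_email"] for row in rows)

print(firstnames.most_common())
print(per_counselor.most_common())
```

Note that the quoted ``"Dupont, Jr"`` stays a single field, which is exactly where a comma-splitting regexp would go wrong.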
Here is the code
.. literalinclude:: test2.py
Then, all that remains to do is ::
yahi_all_in_one_maker && firefox aio.html
And here we can see that each counselor is following on average ~250 jobless persons.
.. image:: csv_3.png
And the frequency of each firstname:
.. image:: csv_4.png
Correlated with the demographics of each firstname, included below, this tends to show
that the older you are, the less likely you are to be jobless.
I am not saying *ageism*, the data are saying it for me.
.. image:: csv_5.png
.. image:: csv_6.png
.. image:: csv_7.png
=============================
Graphing data from a database
=============================
Thanks to `trollometre `_ I also have
real-life data coming from a Bluesky bot that I may want to graph, with
the following database structure::
CREATE TABLE posts (
uri TEXT PRIMARY KEY,
url TEXT NOT NULL,
post JSON NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
is_spam BOOL,
maybe_spam BOOL,
score INTEGER not NULL
);
The interesting columns here are:

- *created_at*, which is the datetime at which a post is inserted into the database;
- *maybe_spam*, which is the result of the spam detection (99% reliable);
- *score*, which is the sum of likes, answers and reposts a Bluesky post got for being reposted.

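
As a rough idea of the aggregations involved, here is a sketch that uses an in-memory SQLite database as a stand-in for the real one. The reduced table, the three rows and the queries are assumptions for illustration only; the actual script may connect and query differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the table above, keeping only the interesting columns.
conn.execute(
    "CREATE TABLE posts ("
    " uri TEXT PRIMARY KEY,"
    " created_at TEXT,"
    " maybe_spam INTEGER,"
    " score INTEGER NOT NULL)"
)
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        ("at://1", "2025-01-01 08:15:00", 0, 120),
        ("at://2", "2025-01-01 21:40:00", 1, 3),
        ("at://3", "2025-01-02 12:05:00", 0, 300),
    ],
)

# Cumulated score per day: the date is the first 10 characters of created_at.
score_per_day = dict(
    conn.execute(
        "SELECT substr(created_at, 1, 10) AS day, SUM(score)"
        " FROM posts GROUP BY day"
    )
)
# Number of posts flagged as possible spam (1) versus ham (0).
spam_counts = dict(
    conn.execute("SELECT maybe_spam, COUNT(*) FROM posts GROUP BY maybe_spam")
)

print(score_per_day)
print(spam_counts)
```

Both results are plain ``{label: value}`` dictionaries, one keyed by day and one keyed by category.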
.. literalinclude:: test4.py
*****************************
Smaller granularity than hour
*****************************
Here I simply showcase that the *hour_* category can be used for sub-hour slicing,
as long as you use something that is lexicographically sortable.
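
What "lexicographically sortable" means in practice can be sketched as follows. The ``quarter_hour_key`` helper is hypothetical and only shows one way to build such labels; how they are fed to yahi is not covered here.

```python
from datetime import datetime


def quarter_hour_key(dt):
    """Return a zero-padded "HH:MM" label for the 15-minute slot containing dt."""
    return f"{dt.hour:02d}:{dt.minute // 15 * 15:02d}"


# Made-up moments, deliberately out of order.
moments = [
    datetime(2025, 1, 1, 9, 47),
    datetime(2025, 1, 1, 10, 5),
    datetime(2025, 1, 1, 9, 2),
]

# Sorting the labels as plain strings gives the chronological order.
keys = sorted(quarter_hour_key(moment) for moment in moments)
print(keys)
```

The zero padding is what makes this work: without it, ``"9:45"`` would sort after ``"10:00"`` as a string.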
.. image:: sql1.png
****************
Simple histogram
****************
Ratio of spam vs ham detected in the database
.. image:: sql2.png
***********
Date series
***********
With the cumulated score per day as a time series, you can notice that in France
the 10th and 17th of September 2025 had quite an echo.
.. image:: sql3.png
=====================================
Making connection graph from web logs
=====================================
A connection graph tells the journey of visitors between web pages.
Here I made a minimal web site with 5 web pages, *a*, *b* ..., that link
to one another.
.. literalinclude:: test3.py
This example illustrates how to use the library of log regexps on its own.
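
For readers who want to see the principle without yahi, here is a standard-library sketch that turns referer/path pairs into a Graphviz digraph. The regexp is hand-written for this example and is not the one shipped with yahi; the log lines are invented.

```python
import re
from collections import Counter

# Hand-written pattern for illustration; it only captures path and referer.
LOG = re.compile(
    r'"GET (?P<path>\S+) HTTP/[\d.]+" \d+ \d+ "(?P<referer>[^"]*)"'
)

# Invented log lines in a combined-log-like layout.
SAMPLE = [
    '127.0.0.1 - - [01/Jan/2025:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla"',
    '127.0.0.1 - - [01/Jan/2025:10:00:05 +0000] "GET /b HTTP/1.1" 200 512 "http://localhost/a" "Mozilla"',
    '127.0.0.2 - - [01/Jan/2025:10:01:00 +0000] "GET /b HTTP/1.1" 200 512 "http://localhost/a" "Mozilla"',
    '127.0.0.1 - - [01/Jan/2025:10:02:00 +0000] "GET /c HTTP/1.1" 200 512 "http://localhost/b" "Mozilla"',
]


def page(url):
    """Keep only the last path component, e.g. 'http://localhost/a' gives 'a'."""
    return url.rstrip("/").rsplit("/", 1)[-1]


# Count each (source page, destination page) transition.
edges = Counter()
for line in SAMPLE:
    match = LOG.search(line)
    if match and match.group("referer") not in ("", "-"):
        edges[(page(match.group("referer")), page(match.group("path")))] += 1

# Emit the graph in DOT syntax, one weighted edge per transition.
dot = "digraph visits {\n"
for (source, target), hits in sorted(edges.items()):
    dot += f'  "{source}" -> "{target}" [label="{hits}"];\n'
dot += "}"
print(dot)
```

The printed text can be piped into ``dot``, in the same way as the output of the real script.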
******
Result
******
After executing ::
python docs/source/test3.py | dot -T png > docs/source/dot.png
we get the following result:
.. image:: dot.png