#############################
Various ways of misusing yahi
#############################

One feature I like about yahi is that you can:

- combine :ref:`notch` and :ref:`shoot` programmatically;
- abuse :ref:`yahi_all_in_one_maker` to obtain versatile results.

================
Parsing auth.log
================

Imagine you are a sysadmin, your boss wants a graph of all the requests you
handle, and you don't like using Excel.

.. literalinclude:: ../../examples/plot_attack

.. image:: ../../examples/attack.png

**********************************
Histograms or time series from CSV
**********************************

================================
CSV that can be parsed as regexp
================================

There are simple cases where a CSV has no embedded strings and is literally
made of comma separated integers/floats.

In this case, the CSV can be parsed with a regexp, which is all the more
convenient when the CSV has no title line.

Here is an example using the CSV generated by `trollometre `_

A line is made of a timestamp followed by various (int) counters.

.. tip::

    For ease of use I hacked the date_pattern format to accept "%s" as a
    timestamp (normally only valid strptime formatters are accepted).

.. literalinclude:: test.py

or alternatively:

.. literalinclude:: test1a.py

Then, all that remains to do is ::

    yahi_all_in_one_maker && firefox aio.html

Click on *time series* and you can see either the chronological time series

.. image:: csv_1.png

or the profile by hour

.. image:: csv_2.png

================================
Raw approach with csv.DictReader
================================

Let's take the use case where my unemployment insurance sent me the data of
all the 10000 jobless persons in my vicinity, each line consisting of:
opaque id, civility, first name, last name, email, and the email of the
counselor following the jobless person.

For this CSV, the titles are on the first line and the strings may contain
",", hence the regexp approach is strongly ill-advised.
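
The comma problem can be made concrete with a minimal sketch that uses only
the standard library. The sample row, its column titles and its values are
made up for illustration and are not the real data:

.. code-block:: python

    import csv
    import io

    # Hypothetical sample: the titles mirror the columns described above,
    # the data row is invented for illustration.
    SAMPLE = (
        "id,civility,firstname,lastname,email,counselor_email\n"
        'a1f3,Mr,Jean,"Dupont, Jr",jean@example.com,counselor1@example.com\n'
    )

    header, row = SAMPLE.splitlines()

    # Naive comma splitting cuts the quoted last name in two,
    # so the row ends up with one field more than the header.
    naive = row.split(",")

    # csv.DictReader honours the quoting and maps each value to its title.
    record = next(csv.DictReader(io.StringIO(SAMPLE)))

    print(len(header.split(",")), len(naive))
    print(record["lastname"])

A regexp that only splits on commas would suffer from the same field shift as
the naive split, which is why ``csv.DictReader`` is the safer choice here.
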

What we want here is two histograms:

- the frequency of each first name (which does not violate the GDPR, known
  as RGPD in French, and which I can therefore share);
- how many persons each counselor is following.

Here is the code

.. literalinclude:: test2.py

Then, all that remains to do is ::

    yahi_all_in_one_maker && firefox aio.html

And here we can see that each counselor is following on average ~250 jobless
persons.

.. image:: csv_3.png

And the frequency of the first names

.. image:: csv_4.png

Correlated with the demographics of each first name, included below, this
tends to suggest that the older you are, the less likely you are to be
jobless. I am not saying *ageism*, the data are doing it for me.

.. image:: csv_5.png

.. image:: csv_6.png

.. image:: csv_7.png

=============================
Graphing data from a database
=============================

Thanks to `trollometre `_ I also have real life data coming from a Bluesky
bot that I may want to graph, with the following database structure::

    CREATE TABLE posts (
        uri TEXT PRIMARY KEY,
        url TEXT NOT NULL,
        post JSON NOT NULL,
        created_at TIMESTAMP DEFAULT NOW(),
        is_spam BOOL,
        maybe_spam BOOL,
        score INTEGER NOT NULL
    );

The interesting columns here are:

- *created_at*, which is the datetime at which a post was inserted into the
  database;
- *maybe_spam*, which is the result of the spam detection (99% reliable);
- *score*, which is the sum of likes, answers and reposts a Bluesky post got
  for being reposted.

.. literalinclude:: test4.py

*****************************
Smaller granularity than hour
*****************************

Here I simply showcase that the *hour_* category can be used for sub-hour
slicing, as long as you use something that is lexicographically sortable.

.. image:: sql1.png

****************
Simple histogram
****************

Ratio of spam vs ham detected in the database

.. image:: sql2.png

***********
Date series
***********

With the cumulative score per day as a time series, you can notice that in
France the 10th and 17th of September 2025 had quite an echo.

.. image:: sql3.png

=====================================
Making connection graph from web logs
=====================================

A connection graph tells the journey of visitors between web pages.

Here I made a minimal web site with 5 web pages, *a*, *b* ... that can be
clicked to visit one another.

.. literalinclude:: test3.py

This example illustrates how to simply use the library of log regexps.

******
Result
******

After executing ::

    python docs/source/test3.py | dot -T png > docs/source/dot.png

we get the following result:

.. image:: dot.png
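
For readers curious about what such a script boils down to, here is a minimal
sketch that does not use yahi at all: it counts (referer, page) transitions
and prints a Graphviz digraph. The ``HITS`` data and the ``to_dot`` helper
are assumptions made up for illustration; they are not the content of
``test3.py``.

.. code-block:: python

    from collections import Counter

    # Hypothetical (referer, requested page) pairs, as they could be
    # extracted from the referer and path fields of web log lines.
    HITS = [
        ("/a", "/b"),
        ("/b", "/c"),
        ("/a", "/b"),
        ("/c", "/a"),
    ]


    def to_dot(hits):
        """Return a Graphviz digraph with one weighted edge per transition."""
        edges = Counter(hits)
        lines = ["digraph visits {"]
        for (src, dst), count in sorted(edges.items()):
            lines.append(f'    "{src}" -> "{dst}" [label="{count}"];')
        lines.append("}")
        return "\n".join(lines)


    dot_source = to_dot(HITS)
    print(dot_source)

The printed text can be piped to ``dot -T png`` in the same way as the
command above.
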