Recently, I parsed the logs of several applications to generate custom weekly reports. It was a fascinating exercise.
I created two shell scripts that illustrate the whole idea by parsing HAProxy log files.
Display top 404 pages
The following shell script displays the top 404 pages for the last three weeks for the web frontend and the blog and statistics backends.
#!/bin/bash
# Display weekly top 404 requests for n previous weeks

# number of previous weeks
number_of_weeks="3"

# directory to keep aggregated data
aggregated_logs_directory="/tmp/aggregated"

# application name
application="haproxy"

# application log files
log_filename="/var/log/haproxy.log*"

# date format to search for: [15/Mar/2018:
file_log_date_format="\[%d/%b/%Y:"

# file types to search for: [a-zA-Z0-9]\+\.\(php\|html\|txt\|png\)
file_types="php html txt png"

# frontends to filter
limit_frontends="^web$"
#limit_frontends=".*"

# backends to filter
limit_backends="blog\|statistics"
#limit_backends=".*"

# print current date
echo "Current date: $(date)"
echo

# create aggregated log directory if it is missing
if [ ! -d "${aggregated_logs_directory}" ]; then
  echo "Creating aggregated log directory \"${aggregated_logs_directory}\""
  mkdir "${aggregated_logs_directory}"
else
  echo "Using aggregated log directory \"${aggregated_logs_directory}\""
fi

# loop over previous weeks
for n_weeks_ago in $(seq 1 ${number_of_weeks}); do
  # define pretty date from/to
  loop_pretty_date_from=$(date +%d/%b/%Y --date "last monday - ${n_weeks_ago} week + 0 day")
  loop_pretty_date_to=$(date +%d/%b/%Y --date "last monday - ${n_weeks_ago} week + 6 day")

  # define machine date from/to
  loop_txt_date_from=$(date +%Y%m%d --date "last monday - ${n_weeks_ago} week + 0 day")
  loop_txt_date_to=$(date +%Y%m%d --date "last monday - ${n_weeks_ago} week + 6 day")

  # define log filename
  aggregated_log_filename="${application}_${loop_txt_date_from}-${loop_txt_date_to}.log"

  # aggregate data
  if [ ! -f "${aggregated_logs_directory}/${aggregated_log_filename}" ]; then
    echo "Creating ${aggregated_log_filename} log file to store data from ${loop_pretty_date_from} to ${loop_pretty_date_to}"
    for weekday in $(seq 0 6); do
      zgrep $(date +${file_log_date_format} --date "last monday - ${n_weeks_ago} weeks + ${weekday} days") ${log_filename} | tee -a ${aggregated_logs_directory}/${aggregated_log_filename} >/dev/null
    done
  else
    echo "Using existing ${aggregated_log_filename} log file that contains data from ${loop_pretty_date_from} to ${loop_pretty_date_to}"
  fi

  # parse data
  if [ -f "${aggregated_logs_directory}/${aggregated_log_filename}" ]; then
    echo "Parsing data from ${loop_pretty_date_from} to ${loop_pretty_date_to} (${n_weeks_ago} week/weeks ago)"

    # filter frontends
    frontends=$(awk '{if ($8 !~ ":" && $8 !~ "~" && !seen_arr[$8]++) print $8}' ${aggregated_logs_directory}/${aggregated_log_filename} | grep "${limit_frontends}")

    # filter backends and highlight NOSRV
    backends=$(awk '{split($9,backend,"/");if ($8 !~ ":" && !seen_arr[backend[1]]++) {if (backend[2] !~ "NOSRV") print backend[1]; else print "NOSRV";}}' ${aggregated_logs_directory}/${aggregated_log_filename} | grep "${limit_backends}" | sort)

    # parse each log file for top 404 pages
    for frontend in ${frontends}; do
      echo "${frontend} frontend"
      for backend in ${backends}; do
        echo "->${backend}"
        if [ "${backend}" = "NOSRV" ]; then
          not_found_list=$(grep "${frontend}\([~]\)\? ${frontend}/" ${aggregated_logs_directory}/${aggregated_log_filename} | awk '$11 == "404" {query=substr($0,index($0,$18)); print query}' | sort | uniq -c | sort -hr | head)
        else
          not_found_list=$(grep "${frontend}\([~]\)\? ${backend}/" ${aggregated_logs_directory}/${aggregated_log_filename} | awk '$11 == "404" {query=substr($0,index($0,$18)); print query}' | sort | uniq -c | sort -hr | head)
        fi
        if [ -z "$not_found_list" ]; then
          echo "  --- none ---"
        else
          echo "$not_found_list"
        fi
      done
    done
    echo
  fi
done
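The week windows rely on relative --date expressions, which are a GNU extension to date. A minimal standalone sketch (assuming GNU coreutils date) to verify the boundaries the script would use:

```shell
#!/bin/bash
# Print the Monday-to-Sunday window used for a given number of weeks back.
# Assumes GNU date (relative --date expressions are a GNU extension).
n_weeks_ago="1"
window_from=$(date +%Y%m%d --date "last monday - ${n_weeks_ago} week + 0 day")
window_to=$(date +%Y%m%d --date "last monday - ${n_weeks_ago} week + 6 day")
echo "window: ${window_from} - ${window_to}"
```

Printing %u (day of week, with Monday as 1) for both boundaries confirms that the window always starts on a Monday and ends on a Sunday.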
Sample output.
Current date: Fri Mar 16 19:06:41 CET 2018

Creating aggregated log directory "/tmp/aggregated"
Creating haproxy_20180305-20180311.log log file to store data from 05/Mar/2018 to 11/Mar/2018
Parsing data from 05/Mar/2018 to 11/Mar/2018 (1 week/weeks ago)
web frontend
->web-blog-production
    892 "GET /wp-login.php HTTP/1.1"
    596 "GET /apple-touch-icon.png HTTP/1.1"
    560 "GET /apple-touch-icon-precomposed.png HTTP/1.1"
    470 "GET /xfavicon.png.pagespeed.ic.ITJELUENXe.png HTTP/1.1"
     74 "GET /assets/images/blog_sleeplessbeastie_eu_image.png HTTP/1.1"
     72 "GET /tags/index.php HTTP/1.0"
     72 "GET /index.php HTTP/1.0"
     66 "GET /2013/01/21/how-to-automate-mouse-and-keyboard/index.php HTTP/1.0"
     66 "GET /01/21/how-to-automate-mouse-and-keyboard/index.php HTTP/1.0"
     40 "GET /favicon.png.pagespeed.ce.I9KrGowxSl.png HTTP/1.1"
->web-statistics-production
  --- none ---

Creating haproxy_20180226-20180304.log log file to store data from 26/Feb/2018 to 04/Mar/2018
Parsing data from 26/Feb/2018 to 04/Mar/2018 (2 week/weeks ago)
web frontend
->web-blog-production
   1012 "GET /wp-login.php HTTP/1.1"
    568 "GET /apple-touch-icon.png HTTP/1.1"
    554 "GET /apple-touch-icon-precomposed.png HTTP/1.1"
    502 "GET /xfavicon.png.pagespeed.ic.ITJELUENXe.png HTTP/1.1"
     72 "GET /tags/index.php HTTP/1.0"
     72 "GET /index.php HTTP/1.0"
     72 "GET /assets/images/blog_sleeplessbeastie_eu_image.png HTTP/1.1"
     44 "GET /favicon.png.pagespeed.ce.I9KrGowxSl.png HTTP/1.1"
     26 "HEAD /apple-touch-icon-precomposed.png HTTP/1.1"
     26 "HEAD /apple-touch-icon.png HTTP/1.1"
->web-statistics-production
  --- none ---

Creating haproxy_20180219-20180225.log log file to store data from 19/Feb/2018 to 25/Feb/2018
Parsing data from 19/Feb/2018 to 25/Feb/2018 (3 week/weeks ago)
web frontend
->web-blog-production
   1068 "GET /wp-login.php HTTP/1.1"
    846 "GET /apple-touch-icon.png HTTP/1.1"
    816 "GET /apple-touch-icon-precomposed.png HTTP/1.1"
    134 "GET /xfavicon.png.pagespeed.ic.ITJELUENXe.png HTTP/1.1"
     66 "GET /tags/index.php HTTP/1.0"
     66 "GET /index.php HTTP/1.0"
     44 "GET /2013/01/21/how-to-automate-mouse-and-keyboard/index.php HTTP/1.0"
     42 "GET /01/21/how-to-automate-mouse-and-keyboard/index.php HTTP/1.0"
     40 "GET /assets/images/blog_sleeplessbeastie_eu_image.png HTTP/1.1"
     32 "HEAD /apple-touch-icon-precomposed.png HTTP/1.1"
->web-statistics-production
      4 "HEAD /https://statistics.sleeplessbeastie.eu/ HTTP/1.1"
      4 "GET /rules.abe HTTP/1.1"
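For reference, the awk field numbers used by both scripts map onto the default HAProxy HTTP log format as it arrives via syslog. A hypothetical log line (not taken from the output above) makes the mapping explicit:

```shell
#!/bin/bash
# Hypothetical HAProxy HTTP log line: with the syslog prefix included,
# $8 is the frontend, $9 the backend/server pair, $11 the status code,
# and $18 the first word of the quoted HTTP request.
line='Mar 15 10:00:00 lb haproxy[123]: 10.0.0.1:5000 [15/Mar/2018:10:00:00.000] web web-blog-production/srv1 0/0/0/0/0 404 1234 - - ---- 1/1/0/0/0 0/0 "GET /missing.png HTTP/1.1"'
echo "${line}" | awk '{print "frontend:", $8; print "backend/server:", $9; print "status:", $11; print "request:", substr($0,index($0,$18))}'
```

The substr/index trick used for the 404 lists recovers the whole quoted request, because the request may itself contain spaces and therefore span several awk fields.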
Display occurrence of specified file types
The following shell script displays weekly occurrence statistics for the specified file types.
#!/bin/bash
# Display weekly statistics for several file types for n previous weeks

# display mode
# 1 - pretty
# 2 - regular
display_mode="1"

# number of previous weeks
number_of_weeks="3"

# directory to keep aggregated data
aggregated_logs_directory="/tmp/aggregated"

# application name
application="haproxy"

# application log files
log_filename="/var/log/haproxy.log*"

# date format to search for: [15/Mar/2018:
file_log_date_format="\[%d/%b/%Y:"

# file types to search for: [a-zA-Z0-9]\+\.\(php\|html\|txt\|png\)
file_types="php html txt png"

# frontends to filter
limit_frontends="^web$"
#limit_frontends=".*"

# backends to filter
limit_backends="NOSRV\|blog\|statistics"
#limit_backends=".*"

# print current date
echo "Current date: $(date)"
echo

# create aggregated log directory if it is missing
if [ ! -d "${aggregated_logs_directory}" ]; then
  if [ "${display_mode}" -eq "1" ]; then
    echo "Creating aggregated log directory \"${aggregated_logs_directory}\""
  fi
  mkdir "${aggregated_logs_directory}"
else
  if [ "${display_mode}" -eq "1" ]; then
    echo "Using aggregated log directory \"${aggregated_logs_directory}\""
  fi
fi

# loop over previous weeks
for n_weeks_ago in $(seq 1 ${number_of_weeks}); do
  # define pretty date from/to
  loop_pretty_date_from=$(date +%d/%b/%Y --date "last monday - ${n_weeks_ago} week + 0 day")
  loop_pretty_date_to=$(date +%d/%b/%Y --date "last monday - ${n_weeks_ago} week + 6 day")

  # define machine date from/to
  loop_txt_date_from=$(date +%Y%m%d --date "last monday - ${n_weeks_ago} week + 0 day")
  loop_txt_date_to=$(date +%Y%m%d --date "last monday - ${n_weeks_ago} week + 6 day")

  # define log filename
  aggregated_log_filename="${application}_${loop_txt_date_from}-${loop_txt_date_to}.log"

  # aggregate data
  if [ ! -f "${aggregated_logs_directory}/${aggregated_log_filename}" ]; then
    if [ "${display_mode}" -eq "1" ]; then
      echo "Creating ${aggregated_log_filename} log file to store data from ${loop_pretty_date_from} to ${loop_pretty_date_to}"
    fi
    for weekday in $(seq 0 6); do
      zgrep $(date +${file_log_date_format} --date "last monday - ${n_weeks_ago} weeks + ${weekday} days") ${log_filename} | tee -a ${aggregated_logs_directory}/${aggregated_log_filename} >/dev/null
    done
  else
    if [ "${display_mode}" -eq "1" ]; then
      echo "Using existing ${aggregated_log_filename} log file that contains data from ${loop_pretty_date_from} to ${loop_pretty_date_to}"
    fi
  fi

  # parse data
  if [ -f "${aggregated_logs_directory}/${aggregated_log_filename}" ]; then
    if [ "${display_mode}" -eq "1" ]; then
      echo "Parsing data from ${loop_pretty_date_from} to ${loop_pretty_date_to} (${n_weeks_ago} week/weeks ago)"
    fi

    # filter frontends
    frontends=$(awk '{if ($8 !~ ":" && $8 !~ "~" && !seen_arr[$8]++) print $8}' ${aggregated_logs_directory}/${aggregated_log_filename} | grep "${limit_frontends}")

    # filter backends
    #backends=$(awk '{split($9,backend,"/");if ($8 !~ ":" && !seen_arr[backend[1]]++) print backend[1]}' ${aggregated_logs_directory}/${aggregated_log_filename} | grep "${limit_backends}")

    # filter backends and highlight NOSRV
    backends=$(awk '{split($9,backend,"/");if ($8 !~ ":" && !seen_arr[backend[1]]++) {if (backend[2] !~ "NOSRV") print backend[1]; else print "NOSRV";}}' ${aggregated_logs_directory}/${aggregated_log_filename} | grep "${limit_backends}" | sort)

    # parse each file type/element
    for frontend in ${frontends}; do
      if [ "${display_mode}" -eq "1" ]; then
        echo "${frontend} frontend"
      fi
      for backend in ${backends}; do
        if [ "${display_mode}" -eq "1" ]; then
          echo "->${backend}"
        fi
        for element in ${file_types}; do
          if [ "${backend}" = "NOSRV" ]; then
            count=$(grep "${frontend}\([~]\)\? ${frontend}/<NOSRV>" ${aggregated_logs_directory}/${aggregated_log_filename} | grep -c "[a-zA-Z0-9]\+\.${element}")
          else
            # grep for frontend and frontend~ (ssl)
            count=$(grep "${frontend}\([~]\)\? ${backend}/" ${aggregated_logs_directory}/${aggregated_log_filename} | grep -c "[a-zA-Z0-9]\+\.${element}")
          fi
          if [ "${display_mode}" -eq "2" ]; then
            echo "${loop_pretty_date_from} - ${loop_pretty_date_to} (${n_weeks_ago} week/weeks ago) ${frontend}->${backend}: ${element} file found ${count} times"
          elif [ "${display_mode}" -eq "1" ]; then
            if [ "${count}" -gt "0" ]; then
              echo "  ${element} file found ${count} times"
            fi
          fi
        done
      done
    done
    echo
  fi
done
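The per-type counting reduces to two greps: the first keeps lines for one frontend/backend pair (matching both the plain frontend name and its SSL variant with a trailing ~), the second counts requests whose path contains a filename with the given extension. A standalone sketch against hypothetical, abbreviated log lines:

```shell
#!/bin/bash
# Count one file extension in hypothetical, abbreviated log lines.
# "web\([~]\)\?" matches both "web" and "web~", as in the script above.
element="png"
printf '%s\n' \
  'x web web-blog-production/srv1 "GET /logo.png HTTP/1.1"' \
  'x web~ web-blog-production/srv1 "GET /style.css HTTP/1.1"' \
  'x web web-blog-production/srv1 "GET /icon.png HTTP/1.1"' |
  grep "web\([~]\)\? web-blog-production/" |
  grep -c "[a-zA-Z0-9]\+\.${element}"
```

Two of the three lines reference a .png file, so the pipeline prints 2.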
Sample output.
Current date: Fri Mar 16 19:27:59 CET 2018

Creating aggregated log directory "/tmp/aggregated"
Creating haproxy_20180305-20180311.log log file to store data from 05/Mar/2018 to 11/Mar/2018
Parsing data from 05/Mar/2018 to 11/Mar/2018 (1 week/weeks ago)
web frontend
->NOSRV
  php file found 2030 times
  html file found 8272 times
  txt file found 2622 times
  png file found 1044 times
->web-blog-production
  php file found 1184 times
  html file found 206 times
  txt file found 2602 times
  png file found 160770 times
->web-statistics-production
  php file found 360992 times
  html file found 608 times
  txt file found 50 times
  png file found 836 times

Creating haproxy_20180226-20180304.log log file to store data from 26/Feb/2018 to 04/Mar/2018
Parsing data from 26/Feb/2018 to 04/Mar/2018 (2 week/weeks ago)
web frontend
->NOSRV
  php file found 1822 times
  html file found 9682 times
  txt file found 2722 times
  png file found 950 times
->web-blog-production
  php file found 1276 times
  html file found 216 times
  txt file found 2604 times
  png file found 159288 times
->web-statistics-production
  php file found 269462 times
  html file found 822 times
  txt file found 52 times
  png file found 1108 times

Creating haproxy_20180219-20180225.log log file to store data from 19/Feb/2018 to 25/Feb/2018
Parsing data from 19/Feb/2018 to 25/Feb/2018 (3 week/weeks ago)
web frontend
->NOSRV
  php file found 2028 times
  html file found 10712 times
  txt file found 2956 times
  png file found 796 times
->web-blog-production
  php file found 1376 times
  html file found 360 times
  txt file found 2816 times
  png file found 166808 times
->web-statistics-production
  php file found 352380 times
  html file found 1218 times
  txt file found 98 times
  png file found 1278 times
These shell scripts merely illustrate the whole idea of generating weekly reports from existing log files, so feel free to improve them further.
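To turn them into an actual weekly report, the scripts can be scheduled from cron. A hypothetical crontab fragment (the script paths and mail recipient are placeholders) could mail both reports every Monday morning:

```
# m h dom mon dow  command
0 6 * * 1  /usr/local/bin/haproxy-top-404.sh    | mail -s "Weekly top 404 report" admin@example.org
5 6 * * 1  /usr/local/bin/haproxy-file-types.sh | mail -s "Weekly file types report" admin@example.org
```

Since the scripts cache each week's data under /tmp/aggregated, repeated runs only re-parse the already aggregated files.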