Use Filebeat to deal with many small log files.
For example, I have a directory that contains several thousand small log files, created daily by short-running cron jobs.
Filebeat configuration
I do not want to harvest everything as most of these files are quite old. Besides, I do not want to put a burden on the operating system.
So I will harvest only files that were modified within the last 3 hours (ignore_older), which reduces the set to dozens of files, and remove each file's state after 4 hours of inactivity (clean_inactive), as there is no reason to keep track of it after that. I will use at most five harvesters at a time (harvester_limit). The initial pass will take a while, but five harvesters should be more than enough to keep up after that. I will also scan for new files every two minutes (scan_frequency) and close files after one minute of inactivity (close_inactive) to keep the number of open files low.
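Before settling on these values, it can help to check how many files actually fall inside the three-hour window. The following is only a minimal sketch in Python: the helper name recently_modified is made up for illustration, and it compares modification times the same way the description above does, which is an approximation of what Filebeat does rather than its actual logic.

```python
import glob
import os
import time


def recently_modified(pattern, max_age_seconds, now=None):
    """Return paths matching `pattern` whose mtime is at most `max_age_seconds` old."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path in glob.glob(pattern)
        if now - os.path.getmtime(path) <= max_age_seconds
    )


if __name__ == "__main__":
    # Path pattern taken from the article; 3 hours mirrors ignore_older.
    pattern = "/var/log/application/cron/search_*.log"
    recent = recently_modified(pattern, 3 * 60 * 60)
    total = len(glob.glob(pattern))
    print(f"{len(recent)} of {total} files were modified in the last 3 hours")
```

If the count is much larger than expected, a shorter window or a higher harvester limit may be worth considering.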
filebeat.inputs:
- type: log
  fields.application: application-cron # additional field
  tags: ["application-cron", "application-logs"] # define tags
  paths: # logs path
    - /var/log/application/cron/search_*.log
  ignore_older: 3h # ignore files last modified more than 3 hours ago
  clean_inactive: 4h # remove the state of a file after 4h (it must be > ignore_older + scan_frequency)
  close_inactive: 1m # close file handle after 1 minute of inactivity (file can be reopened after scan_frequency)
  scan_frequency: 2m # check for new files every 2 minutes
  harvester_limit: 5 # limit the number of harvesters to 5 at a time

setup.template.enabled: false

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 2
  permissions: 0644

output.elasticsearch:
  hosts: ["192.0.2.110:9200"]
  index: "%{[fields.application]}-%{+yyyy.MM.dd}"
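The comment on clean_inactive states a constraint that is easy to break when tuning the values later: clean_inactive must be greater than ignore_older + scan_frequency. A small Python sketch can check it before deploying. The helper names below are hypothetical, and the parser is a simplification that accepts only single-unit durations such as 3h or 2m, not every duration format Filebeat understands.

```python
import re

# Assumption: only seconds, minutes, and hours with a single unit are supported.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def seconds(duration):
    """Convert a simple duration such as '3h', '2m', or '90s' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def clean_inactive_is_safe(ignore_older, scan_frequency, clean_inactive):
    """True when clean_inactive > ignore_older + scan_frequency."""
    return seconds(clean_inactive) > seconds(ignore_older) + seconds(scan_frequency)


print(clean_inactive_is_safe("3h", "2m", "4h"))  # values from the configuration: True
print(clean_inactive_is_safe("3h", "2m", "3h"))  # clean_inactive too short: False
```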
Filebeat behavior
Filebeat starts harvesting logs from five files, in no particular order.
2020-04-19T13:25:39.961Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-11.log
2020-04-19T13:25:40.161Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-17.log
2020-04-19T13:25:41.332Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-12.log
2020-04-19T13:25:41.447Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-15.log
2020-04-19T13:25:41.649Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-19.log
Filebeat logs an error for every file that has to wait for a free harvester, but you can ignore it because the limit was set on purpose.
2020-04-19T13:25:39.924Z ERROR log/input.go:460 Harvester could not be started on new file: /var/log/application/cron/search_2020-04-14.log, Err: Harvester limit reached
2020-04-19T13:25:39.927Z ERROR log/input.go:460 Harvester could not be started on new file: /var/log/application/cron/search_2020-04-18.log, Err: Harvester limit reached
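Ignoring the error is safe only if every deferred file eventually gets a harvester. One way to confirm that is to scan the Filebeat log for files that hit the limit and were never started afterwards. The Python sketch below is an illustration, not part of Filebeat: the function name still_waiting is made up, and the regular expressions assume the exact message wording shown in the log lines of this article.

```python
import re

# Assumption: message wording matches the Filebeat log lines quoted in this article.
LIMIT_REACHED = re.compile(
    r"Harvester could not be started on new file: (?P<path>\S+), Err: Harvester limit reached"
)
STARTED = re.compile(r"Harvester started for file: (?P<path>\S+)")


def still_waiting(log_lines):
    """Return files that hit the harvester limit and have not been started since."""
    waiting = set()
    for line in log_lines:
        deferred = LIMIT_REACHED.search(line)
        if deferred:
            waiting.add(deferred.group("path"))
            continue
        started = STARTED.search(line)
        if started:
            waiting.discard(started.group("path"))
    return sorted(waiting)


sample = [
    "2020-04-19T13:25:39.924Z ERROR log/input.go:460 Harvester could not be started on new file: /var/log/application/cron/search_2020-04-14.log, Err: Harvester limit reached",
    "2020-04-19T13:25:39.927Z ERROR log/input.go:460 Harvester could not be started on new file: /var/log/application/cron/search_2020-04-18.log, Err: Harvester limit reached",
    "2020-04-19T13:26:42.171Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-14.log",
]
print(still_waiting(sample))  # ['/var/log/application/cron/search_2020-04-18.log']
```

An empty result after the initial pass means every file that was turned away has since been picked up.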
Harvesters are freed as files are closed after one minute of inactivity. Filebeat will pick up any of these files again if it is modified within the three-hour window.
2020-04-19T13:26:39.896Z INFO log/harvester.go:253 File is inactive: /var/log/application/cron/search_2020-04-11.log. Closing because close_inactive of 1m0s reached.
2020-04-19T13:26:40.991Z INFO log/harvester.go:253 File is inactive: /var/log/application/cron/search_2020-04-17.log. Closing because close_inactive of 1m0s reached.
2020-04-19T13:26:41.566Z INFO log/harvester.go:253 File is inactive: /var/log/application/cron/search_2020-04-12.log. Closing because close_inactive of 1m0s reached.
2020-04-19T13:26:41.748Z INFO log/harvester.go:253 File is inactive: /var/log/application/cron/search_2020-04-15.log. Closing because close_inactive of 1m0s reached.
2020-04-19T13:26:41.818Z INFO log/harvester.go:253 File is inactive: /var/log/application/cron/search_2020-04-19.log. Closing because close_inactive of 1m0s reached.
Filebeat then starts harvesting logs from the next five files, again in no particular order, and so on.
2020-04-19T13:26:42.171Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-14.log
2020-04-19T13:26:24.334Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-06.log
2020-04-19T13:26:42.465Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-18.log
2020-04-19T13:26:42.627Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-09.log
2020-04-19T13:26:42.913Z INFO log/harvester.go:228 Harvester started for file: /var/log/application/cron/search_2020-04-16.log
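The log excerpts show roughly one minute between batches of five files, which allows a back-of-the-envelope estimate of how long the initial pass takes. The Python sketch below is only that, a rough estimate. It assumes every file is small and idle, so each batch holds its harvesters for about close_inactive (one minute here), and the file counts are hypothetical examples, not measurements.

```python
import math


def initial_pass_minutes(file_count, harvester_limit=5, minutes_per_batch=1.0):
    """Rough time to open every file once when harvesters run in batches.

    Assumption: each batch keeps its harvesters busy for about
    `minutes_per_batch` (close_inactive in the configuration above).
    """
    batches = math.ceil(file_count / harvester_limit)
    return batches * minutes_per_batch


# Hypothetical file counts, chosen only to illustrate the arithmetic.
for count in (12, 36, 60):
    print(count, "files ->", initial_pass_minutes(count), "minutes")
```

With a few dozen files inside the three-hour window, the estimate stays within several minutes, which is consistent with the expectation that five harvesters can keep up once the initial pass is done.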