Fix disconnecting Graylog node after upgrading from version 3.0 to 3.1.
I upgraded Graylog from version 3.0 to 3.1 and immediately ran into issues with my master node: it was disconnecting every few seconds.
The message "Notification condition [NO_MASTER] has been fixed." was logged in System messages, and "[NodePingThread] Did not find meta info of this node. Re-registering." was logged in the Graylog log file on the master node.
[milosz@graylog-server-5 ~]$ tail -f /var/log/graylog-server/server.log
[...]
2020-04-12T12:20:22.777Z WARN [NodePingThread] Did not find meta info of this node. Re-registering.
2020-04-12T12:20:33.761Z WARN [NodePingThread] Did not find meta info of this node. Re-registering.
2020-04-12T12:22:13.708Z WARN [NodePingThread] Did not find meta info of this node. Re-registering.
2020-04-12T12:22:57.770Z WARN [NodePingThread] Did not find meta info of this node. Re-registering.
[...]
The solution is to increase stale_master_timeout from its default of 2 seconds (2000 milliseconds) in server.conf, the Graylog server configuration file.
# Time in milliseconds after which a detected stale master node is being rechecked on startup.
#stale_master_timeout = 2000
Inspect the current stale_master_timeout value.
[milosz@graylog-server-5 ~]$ grep stale_master_timeout /etc/graylog/server/server.conf
#stale_master_timeout = 2000
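Because the setting ships commented out, grep alone does not say which value is actually in effect. The following is a minimal sketch of a helper that prints the effective value. The function name is hypothetical, the fallback of 2000 is the default quoted in the configuration comment, and it assumes that the last uncommented occurrence in the file is the one that applies.

```shell
#!/bin/sh
# Hypothetical helper: print the effective stale_master_timeout (milliseconds).
# Assumption: the last uncommented occurrence in the file is the one that applies.
effective_stale_master_timeout() {
    conf_file="$1"
    # Extract the numeric value from uncommented lines only; keep the last match.
    value=$(sed -n -E 's/^[[:space:]]*stale_master_timeout[[:space:]]*=[[:space:]]*([0-9]+).*/\1/p' "$conf_file" | tail -n 1)
    # Fall back to the default of 2000 ms when the setting is absent or commented out.
    echo "${value:-2000}"
}
```

It could be called as effective_stale_master_timeout /etc/graylog/server/server.conf.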
Increase stale_master_timeout to 10 seconds (10000 milliseconds).
[milosz@graylog-server-5 ~]$ sudo sed -i -e "s/#stale_master_timeout = 2000/stale_master_timeout = 10000/" /etc/graylog/server/server.conf
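The sed command above only matches the exact commented-out default line, so it silently does nothing if the value was already changed. As a hedged alternative, here is a sketch of a helper that sets the value whether the line is commented out, already set, or missing. The function name is hypothetical and it has to run as a user allowed to write the configuration file.

```shell
#!/bin/sh
# Hypothetical helper: set stale_master_timeout (milliseconds) in a config file.
# Usage: set_stale_master_timeout <config file> <timeout in ms>
set_stale_master_timeout() {
    conf_file="$1"
    timeout_ms="$2"
    pattern='^[[:space:]]*#?[[:space:]]*stale_master_timeout[[:space:]]*=.*'
    if grep -Eq "$pattern" "$conf_file"; then
        # Rewrite the existing line, commented out or not, via a temporary file.
        tmp_file=$(mktemp) || return 1
        sed -E "s/${pattern}/stale_master_timeout = ${timeout_ms}/" "$conf_file" > "$tmp_file" \
            && cat "$tmp_file" > "$conf_file"
        status=$?
        rm -f "$tmp_file"
        return "$status"
    else
        # The setting is absent, so append it.
        printf 'stale_master_timeout = %s\n' "$timeout_ms" >> "$conf_file"
    fi
}
```

From a root shell it could be called as set_stale_master_timeout /etc/graylog/server/server.conf 10000. Writing back with cat keeps the original file's ownership and permissions.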
Verify the new stale_master_timeout value.
[milosz@graylog-server-5 ~]$ grep stale_master_timeout /etc/graylog/server/server.conf
stale_master_timeout = 10000
Restart the graylog-server service.
[milosz@graylog-server-5 ~]$ sudo systemctl restart graylog-server
Inspect the graylog-server service status.
[milosz@graylog-server-5 ~]$ systemctl status graylog-server
● graylog-server.service - Graylog server
   Loaded: loaded (/usr/lib/systemd/system/graylog-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2020-04-12 12:28:25 UTC; 6s ago
     Docs: http://docs.graylog.org/
 Main PID: 7493 (graylog-server)
   CGroup: /system.slice/graylog-server.service
           ├─7493 /bin/sh /usr/share/graylog-server/bin/graylog-server
           └─7507 /usr/bin/java -Xms1g -Xmx1g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:-OmitStackTraceInFastThrow -XX:+UseParNewGC -jar -Dlog4j.configurationFile=file:///etc/graylog/server/...

Apr 12 12:28:25 graylog-server-5 systemd[1]: Started Graylog server.
After the restart, the unexpected behaviour is gone and the warning messages no longer appear.
[milosz@graylog-server-5 ~]$ tail -f /var/log/graylog-server/server.log
2020-04-12T12:29:48.540Z INFO [NetworkListener] Started listener bound to [192.0.2.11:9000]
2020-04-12T12:29:48.542Z INFO [HttpServer] [HttpServer] Started.
2020-04-12T12:29:48.542Z INFO [JerseyService] Started REST API at <192.0.2.11:9000>
2020-04-12T12:29:48.543Z INFO [ServiceManagerListener] Services are healthy
2020-04-12T12:29:48.544Z INFO [ServerBootstrap] Services started, startup times in ms: {InputSetupService [RUNNING]=3, EtagService [RUNNING]=73, ConfigurationEtagService [RUNNING]=73, OutputSetupService [RUNNING]=73, JobSchedulerService [RUNNING]=73, GracefulShutdownService [RUNNING]=94, JournalReader [RUNNING]=95, UrlWhitelistService [RUNNING]=127, KafkaJournal [RUNNING]=143, MongoDBProcessingStatusRecorderService [RUNNING]=147, PeriodicalsService [RUNNING]=147, BufferSynchronizerService [RUNNING]=160, LookupTableService [RUNNING]=184, StreamCacheService [RUNNING]=420, JerseyService [RUNNING]=27175}
2020-04-12T09:29:48.546Z INFO [InputSetupService] Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2020-04-12T12:29:48.548Z INFO [ServerBootstrap] Graylog server up and running.
2020-04-12T12:29:48.567Z INFO [InputStateListener] Input [Beats/5cfe0aeaec88901911304649] is now STARTING
2020-04-12T12:29:48.864Z INFO [InputStateListener] Input [Beats/5cfe0aeaec88901911304649] is now RUNNING
[...]
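Watching tail output works, but the log file still contains the warnings from before the restart. A rough sketch of a helper that counts only the re-registration warnings logged at or after a given timestamp could look like this. The function name is hypothetical, and it assumes every log line starts with an ISO 8601 UTC timestamp as in the excerpts above, so timestamps can be compared as plain strings.

```shell
#!/bin/sh
# Hypothetical helper: count NodePingThread re-registration warnings logged
# at or after a given timestamp.
# Usage: count_warnings_since <log file> <timestamp, e.g. 2020-04-12T12:28:25>
count_warnings_since() {
    log_file="$1"
    since="$2"
    # Timestamps in the same ISO 8601 format sort lexicographically,
    # so a string comparison on the first field is enough.
    awk -v since="$since" '
        $1 >= since && /Did not find meta info of this node/ { n++ }
        END { print n + 0 }
    ' "$log_file"
}
```

For example, count_warnings_since /var/log/graylog-server/server.log 2020-04-12T12:28:25 counts the warnings logged since the service start time reported by systemctl status.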
Use Ansible or any other configuration management tool to apply the update to the other Graylog servers and restart the service on each of them.
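Without a configuration management tool at hand, the same change can be pushed with a plain SSH loop. This is only a sketch under several assumptions: the function name and the SSH_CMD variable are hypothetical, the hosts are reachable over SSH, and the remote user can run sudo without a password prompt. It reuses the exact sed and systemctl commands from the steps above and handles one node at a time.

```shell
#!/bin/sh
# Hypothetical rollout loop: apply the change and restart Graylog on each host.
# SSH_CMD is an illustrative override; set it to "echo" for a dry run.
SSH_CMD="${SSH_CMD:-ssh}"

apply_to_nodes() {
    remote_cmd="sudo sed -i -e 's/#stale_master_timeout = 2000/stale_master_timeout = 10000/' /etc/graylog/server/server.conf && sudo systemctl restart graylog-server"
    for host in "$@"; do
        echo "Updating ${host}"
        # Stop at the first host that fails so the remaining nodes stay untouched.
        $SSH_CMD "$host" "$remote_cmd" || return 1
    done
}
```

It could be called as apply_to_nodes followed by the list of hostnames of the remaining Graylog servers.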