Restart Elasticsearch node.
Default behavior
Inspect cluster health.
$ curl -u elastic:************ https://192.168.8.153:9200/_cluster/health?pretty
{ "cluster_name" : "elasticsearch-cluster", "status" : "green", "timed_out" : false, "number_of_nodes" : 5, "number_of_data_nodes" : 5, "active_primary_shards" : 25, "active_shards" : 51, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }
Restart the data node server and inspect cluster health.
$ curl -u elastic:************ https://192.168.8.153:9200/_cluster/health?pretty
{ "cluster_name" : "elasticsearch-cluster", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 4, "number_of_data_nodes" : 4, "active_primary_shards" : 25, "active_shards" : 34, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 17, "delayed_unassigned_shards" : 17, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 66.66666666666666 }
By default, the allocation process is delayed for one minute; after that time, shard reallocation starts. This can put a lot of strain on the cluster.
$ curl -u elastic:************ https://192.168.8.153:9200/_cluster/health?pretty
{ "cluster_name" : "elasticsearch-cluster", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 4, "number_of_data_nodes" : 4, "active_primary_shards" : 25, "active_shards" : 50, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 1, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 98.0392156862745 }
Cluster health status will return to green after the restarted node comes back up.
$ curl -u elastic:************ https://192.168.8.153:9200/_cluster/health?pretty
{ "cluster_name" : "elasticsearch-cluster", "status" : "green", "timed_out" : false, "number_of_nodes" : 5, "number_of_data_nodes" : 5, "active_primary_shards" : 25, "active_shards" : 51, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }
You can increase the delay applied after a node leaves to give it more time to come back before shards are reallocated.
$ curl -X PUT -u elastic:************ "https://192.168.8.153:9200/_all/_settings?pretty" \
  -H 'Content-Type: application/json' -d'{"settings": {"index.unassigned.node_left.delayed_timeout": "5m"}}'
{ "acknowledged" : true }
Inspect the node-left delay timeout on individual indices.
$ curl -s -u elastic:************ "https://192.168.8.153:9200/_all/_settings?include_defaults=true" | \
  jq -c --raw-output 'keys[] as $k | "Name: \($k)\nindex.unassigned.node_left.delayed_timeout: \(.[$k].defaults.index.unassigned.node_left.delayed_timeout //"")\(.[$k].settings.index.unassigned.node_left.delayed_timeout //"")\n"'
Name: .ds-.monitoring-es-8-mb-2023.07.23-000001
index.unassigned.node_left.delayed_timeout: 5m

Name: .ds-metricbeat-8.8.2-2023.07.23-000001
index.unassigned.node_left.delayed_timeout: 5m

Name: .fleet-file-data-agent-000001
index.unassigned.node_left.delayed_timeout: 5m

Name: .fleet-files-agent-000001
index.unassigned.node_left.delayed_timeout: 1m

Name: .internal.alerts-observability.apm.alerts-default-000001
index.unassigned.node_left.delayed_timeout: 5m

Name: .internal.alerts-observability.logs.alerts-default-000001
index.unassigned.node_left.delayed_timeout: 5m

Name: .internal.alerts-observability.metrics.alerts-default-000001
index.unassigned.node_left.delayed_timeout: 5m

Name: .internal.alerts-observability.slo.alerts-default-000001
index.unassigned.node_left.delayed_timeout: 5m

Name: .internal.alerts-observability.uptime.alerts-default-000001
index.unassigned.node_left.delayed_timeout: 5m

Name: .internal.alerts-security.alerts-default-000001
index.unassigned.node_left.delayed_timeout: 5m
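Alternatively, you could ask the get settings API for just this setting and flatten the output, which avoids the jq post-processing; a sketch assuming the same host and credentials.

$ curl -s -u elastic:************ "https://192.168.8.153:9200/_all/_settings/index.unassigned.node_left.delayed_timeout?include_defaults=true&flat_settings=true&pretty"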
Temporarily alter the default behavior
Inspect cluster health.
$ curl -u elastic:************ https://192.168.8.153:9200/_cluster/health?pretty
{ "cluster_name" : "elasticsearch-cluster", "status" : "green", "timed_out" : false, "number_of_nodes" : 5, "number_of_data_nodes" : 5, "active_primary_shards" : 25, "active_shards" : 51, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }
List shards for a specific index.
$ curl -u elastic:************ https://192.168.8.153:9200/_cat/shards/.apm-agent-configuration
.apm-agent-configuration 0 p STARTED 0 247b 192.168.8.159 elastic-2
.apm-agent-configuration 0 r STARTED 0 247b 192.168.8.165 elastic-4
Ensure that shard allocation is enabled only for primary shards.
$ curl -X PUT -u elastic:************ "https://192.168.8.153:9200/_cluster/settings?pretty" \
  -H 'Content-Type: application/json' -d'{"persistent": {"cluster.routing.allocation.enable": "primaries"}}'
{ "acknowledged" : true, "persistent" : { "cluster" : { "routing" : { "allocation" : { "enable" : "primaries" } } } }, "transient" : { } }
You can always check persistent settings.
$ curl -s -u elastic:************ "https://192.168.8.153:9200/_cluster/settings?include_defaults=true" | jq .persistent
{ "cluster": { "routing": { "allocation": { "enable": "primaries" } } } }
Flush indices so that the data in the transaction log is also permanently stored in the Lucene index.
$ curl -X POST -u elastic:************ "https://192.168.8.153:9200/_flush?pretty"
{ "_shards" : { "total" : 20, "successful" : 20, "failed" : 0 } }
Restart the data node, elastic-2 in this case.
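On the node itself the restart is usually performed through the service manager; the command below assumes Elasticsearch runs as a systemd service, which may differ in your environment.

$ sudo systemctl restart elasticsearch.service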
New primary shards will be allocated immediately, as the surviving replicas are promoted to primaries.
$ curl -u elastic:************ https://192.168.8.153:9200/_cat/shards/.apm-agent-configuration
.apm-agent-configuration 0 p STARTED 0 247b 192.168.8.165 elastic-4
.apm-agent-configuration 0 r UNASSIGNED
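To see why the replica stays unassigned, you could query the cluster allocation explain API; the request below targets shard 0 of the index from the example above.

$ curl -X GET -u elastic:************ "https://192.168.8.153:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' -d'{"index": ".apm-agent-configuration", "shard": 0, "primary": false}'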
Other shards will wait until the allocation restriction is lifted.
$ curl -u elastic:************ https://192.168.8.153:9200/_cluster/health?pretty
{ "cluster_name" : "elasticsearch-cluster", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 4, "number_of_data_nodes" : 4, "active_primary_shards" : 25, "active_shards" : 34, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 17, "delayed_unassigned_shards" : 17, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 66.66666666666666 }
Allow shard allocation for all types of shards after the restarted data node is back up.
$ curl -X PUT -u elastic:************ "https://192.168.8.153:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'{"persistent": {"cluster.routing.allocation.enable":null}}'
{ "acknowledged" : true, "persistent" : { }, "transient" : { } }
Inspect cluster status.
$ curl -u elastic:************ https://192.168.8.153:9200/_cluster/health?pretty
{ "cluster_name" : "elasticsearch-cluster", "status" : "green", "timed_out" : false, "number_of_nodes" : 5, "number_of_data_nodes" : 5, "active_primary_shards" : 25, "active_shards" : 51, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }
List shards for a specific index.
$ curl -u elastic:************ https://192.168.8.153:9200/_cat/shards/.apm-agent-configuration
.apm-agent-configuration 0 r STARTED 0 247b 192.168.8.159 elastic-2
.apm-agent-configuration 0 p STARTED 0 247b 192.168.8.165 elastic-4
Additional notes
Cluster-level shard allocation settings