Retry Elasticsearch shard allocation that was blocked due to too many consecutive allocation failures.
Display unassigned shards.
$ curl -s "http://127.0.0.1:9200/_cat/shards?v" | awk 'NR==1 {print}; $4 == "UNASSIGNED" {print}'
index            shard prirep state      docs store ip node
books_2018-07-26 1     r      UNASSIGNED
bookmarks_index  4     r      UNASSIGNED
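The same check can be wrapped in a small shell function. This is a minimal sketch: filter_unassigned, list_unassigned and the ES_URL variable are hypothetical names, not part of Elasticsearch. It asks _cat/shards for an explicit column list through the h parameter, including the unassigned.reason column, so the shard state is always in the fourth column.

```shell
#!/bin/sh
# Hypothetical helpers. ES_URL is an assumed variable that defaults to the
# node address used throughout this article.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Read _cat/shards output on stdin, keep the header row (first line) and
# every row whose fourth column (state) is UNASSIGNED.
filter_unassigned() {
  awk 'NR == 1 { print; next } $4 == "UNASSIGNED" { print }'
}

# Request a fixed column order so that column 4 is always "state".
list_unassigned() {
  curl --silent "${ES_URL}/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" |
    filter_unassigned
}
```

After loading these functions into the shell, running list_unassigned prints the header and the unassigned shards together with their unassigned.reason value.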
Display unassigned shards together with the reason they are unassigned.
$ curl http://127.0.0.1:9200/_cluster/state | jq '.routing_table.indices | .[].shards[][] | select(.state=="UNASSIGNED") | {index: .index, shard: .shard, primary: .primary, unassigned_info: .unassigned_info}'
{
  "index": "books_2020-01-05",
  "shard": 1,
  "primary": false,
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2020-01-06T09:06:38.931Z",
    "failed_attempts": 10,
    "delayed": false,
    "details": "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[books_2020-01-05][1]: obtaining shard lock timed out after 5000ms]; ",
    "allocation_status": "no_attempt"
  }
}
{
  "index": "bookmarks_index",
  "shard": 4,
  "primary": false,
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2020-01-06T07:31:46.449Z",
    "failed_attempts": 5,
    "delayed": false,
    "details": "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[bookmarks_index][4]: obtaining shard lock timed out after 5000ms]; ",
    "allocation_status": "no_attempt"
  }
}
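To look at one shard from this list in more detail, the cluster allocation explain API accepts the index name, shard number and primary flag in its request body. The sketch below uses hypothetical helper names (explain_body, explain_shard) and an assumed ES_URL variable. It does no JSON escaping, relying on the fact that Elasticsearch index names cannot contain double quotes.

```shell
#!/bin/sh
# Hypothetical helpers around the cluster allocation explain API.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the request body that selects a single shard copy.
# $1 = index name, $2 = shard number, $3 = "true" for the primary, "false" for a replica
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":%s}\n' "$1" "$2" "$3"
}

# Ask the cluster why this particular shard copy is (or is not) allocated.
explain_shard() {
  explain_body "$1" "$2" "$3" |
    curl --silent \
         --request GET \
         --header 'Content-Type: application/json' \
         --data-binary @- \
         "${ES_URL}/_cluster/allocation/explain?pretty=true"
}
```

For example, explain_shard bookmarks_index 4 false asks about the unassigned replica of shard 4 of bookmarks_index.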
Retry Elasticsearch shard allocation that was blocked due to too many consecutive allocation failures.
$ curl -X POST http://127.0.0.1:9200/_cluster/reroute?retry_failed=true
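A rough way to check whether the retry helped is to count unassigned shards before and after it. In this sketch, count_unassigned, unassigned_now, retry_failed_allocations and ES_URL are hypothetical names; the wait uses the wait_for_no_initializing_shards parameter of the cluster health API with an assumed 60-second timeout.

```shell
#!/bin/sh
# Hypothetical helpers. ES_URL is an assumed variable.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Count rows whose fourth column (state) is UNASSIGNED in _cat/shards output
# read from stdin (default column order: index shard prirep state ...).
count_unassigned() {
  awk '$4 == "UNASSIGNED" { n++ } END { print n + 0 }'
}

# Current number of unassigned shards (no header row without ?v).
unassigned_now() {
  curl --silent "${ES_URL}/_cat/shards" | count_unassigned
}

# Retry failed allocations, wait up to 60s for initializing shards to settle,
# then report how many shards were unassigned before and after the retry.
retry_failed_allocations() {
  before=$(unassigned_now)
  curl --silent --request POST "${ES_URL}/_cluster/reroute?retry_failed=true" > /dev/null
  curl --silent "${ES_URL}/_cluster/health?wait_for_no_initializing_shards=true&timeout=60s" > /dev/null
  after=$(unassigned_now)
  echo "unassigned shards: before=${before} after=${after}"
}
```

If the "after" count does not drop, the cause is probably not temporary and needs a closer look.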
Alternatively, in simple cases, you can raise the maximum number of allocation retries (index.allocation.max_retries, 5 by default) for the affected index.
$ curl --silent \
       --request PUT \
       --header 'Content-Type: application/json' \
       http://127.0.0.1:9200/bookmarks_index/_settings?pretty=true \
       --data-ascii \
       '{ "index": { "allocation": { "max_retries": 15 } } }'
{
  "acknowledged" : true
}
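The same settings update can be scripted. In this sketch the helper names and ES_URL are hypothetical; passing null instead of a number removes the per-index override so the default applies again, and show_max_retries reads the setting back through the get index settings API.

```shell
#!/bin/sh
# Hypothetical helpers. ES_URL is an assumed variable.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the index settings body. Pass a number, or null to reset the
# setting to its default value.
max_retries_body() {
  printf '{ "index": { "allocation": { "max_retries": %s } } }\n' "$1"
}

# $1 = index name, $2 = new retry limit (or null)
set_max_retries() {
  max_retries_body "$2" |
    curl --silent --request PUT \
         --header 'Content-Type: application/json' \
         --data-binary @- \
         "${ES_URL}/$1/_settings?pretty=true"
}

# Show the explicitly set value for the index, if there is one.
show_max_retries() {
  curl --silent "${ES_URL}/$1/_settings/index.allocation.max_retries?pretty=true"
}
```

For example, set_max_retries bookmarks_index 15 repeats the request shown above, and set_max_retries bookmarks_index null undoes it.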
A retry will allocate shards that failed because of a temporary problem, but it does not help when the underlying cause persists. In the following example, one shard is still unassigned after the retry.
$ curl http://127.0.0.1:9200/_cluster/state | jq '.routing_table.indices | .[].shards[][] | select(.state=="UNASSIGNED") | {index: .index, shard: .shard, primary: .primary, unassigned_info: .unassigned_info}'
{
  "index": "books_2020-01-05",
  "shard": 1,
  "primary": false,
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2020-01-06T09:06:38.931Z",
    "failed_attempts": 10,
    "delayed": false,
    "details": "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[books_2020-01-05][1]: obtaining shard lock timed out after 5000ms]; ",
    "allocation_status": "no_attempt"
  }
}
In such cases, use the cluster allocation explain API to get a detailed explanation. When called without a request body, it explains the first unassigned shard it finds.
$ curl http://127.0.0.1:9200/_cluster/allocation/explain?pretty=true
{
  "index": "books_2020-01-05",
  "shard": 1,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2020-01-06T09:06:38.931Z",
    "failed_allocation_attempts": 10,
    "details": "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[books_2020-01-05][1]: obtaining shard lock timed out after 5000ms]; ",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "6D0ib0mOSlueLqKDaGzlPw",
      "node_name": "elastic-node-b",
      "transport_address": "10.0.25.11:9300",
      "node_decision": "no",
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [10] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-01-06T09:06:38.931Z], failed_attempts[10], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[books_2020-01-05][1]: obtaining shard lock timed out after 5000ms]; ], allocation_status[no_attempt]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "node matches cluster setting [cluster.routing.allocation.exclude] filters [_ip:\"10.0.25.11 OR 10.0.25.12\"]"
        }
      ]
    },
    {
      "node_id": "PMmTS-8WTLCd5S69DwUVTg",
      "node_name": "elastic-node-c",
      "transport_address": "10.0.25.12:9300",
      "node_decision": "no",
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [10] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-01-06T09:06:38.931Z], failed_attempts[10], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[books_2020-01-05][1]: obtaining shard lock timed out after 5000ms]; ], allocation_status[no_attempt]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "node matches cluster setting [cluster.routing.allocation.exclude] filters [_ip:\"10.0.25.11 OR 10.0.25.12\"]"
        }
      ]
    }
  ]
}
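In this case, two deciders reject every node: the max_retry decider, because the shard has reached its retry limit, and the filter decider, because the cluster.routing.allocation.exclude setting matches both listed nodes (10.0.25.11 and 10.0.25.12). A retry alone cannot help here; the exclude filter has to be changed (or eligible nodes added) before the shard can be allocated, and the failed allocation then has to be retried.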
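One possible way to handle it, assuming the exclusion is no longer wanted (it is often set on purpose, for example to drain nodes before removing them), is to inspect the exclude filters, remove the IP-based one and then retry, since the max_retry decider still blocks the shard. The helper names and ES_URL are hypothetical, and whether the filter lives in persistent or transient cluster settings is an assumption you have to verify first.

```shell
#!/bin/sh
# Hypothetical helpers. ES_URL is an assumed variable.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Print the allocation exclude filters currently set through the cluster
# settings API (persistent and transient), one flat setting per line.
show_exclude_filters() {
  curl --silent "${ES_URL}/_cluster/settings?flat_settings=true&pretty=true" |
    grep 'cluster.routing.allocation.exclude'
}

# Body that removes the IP-based exclude filter.
# $1 = "persistent" or "transient", depending on where the filter was set.
clear_exclude_ip_body() {
  printf '{ "%s": { "cluster.routing.allocation.exclude._ip": null } }\n' "$1"
}

# Remove the filter, then retry allocations that hit the retry limit.
clear_exclude_ip() {
  clear_exclude_ip_body "$1" |
    curl --silent --request PUT \
         --header 'Content-Type: application/json' \
         --data-binary @- \
         "${ES_URL}/_cluster/settings?pretty=true"
  curl --silent --request POST "${ES_URL}/_cluster/reroute?retry_failed=true" > /dev/null
}
```

Run show_exclude_filters first; only if the filter appears under the expected section and should really be lifted, call clear_exclude_ip persistent (or transient).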