Get the cluster health status
curl -XGET http://localhost:9200/_cluster/health?pretty
Get shards details
curl -XGET http://localhost:9200/_cat/shards
Force reroute an identified shard to a specific node
Usually required to force-assign unassigned shards to a node
# Loop over the shard numbers of all UNASSIGNED shards (column 2 of _cat/shards)
# and force-allocate each one to the target node. Replace
# "index_taken_from_step_2" and "datanode" with the index and node names
# identified in step 2 above.
for shard in $(curl -s -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $2}'); do
  curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [ {
      "allocate" : {
        "index" : "index_taken_from_step_2",
        "shard" : '"$shard"',
        "node" : "datanode",
        "allow_primary" : true
      }
    } ]
  }'
  sleep 5
done
All shards on a node are stuck in the INITIALIZING state
Restarting the box may be the only solution.
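Before restarting, a quick check to confirm which shards are stuck (a minimal sketch, assuming the default _cat API on localhost:9200):
curl -s 'localhost:9200/_cat/shards?v' | grep INITIALIZING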
Troubleshooting UnavailableShardsException
a. The index folder was deleted
Recreate the index (see the sketch after this list)
b. The node hosting the primary shard left the cluster
Reboot the node
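A minimal sketch of recreating a deleted index; "index_name" is a placeholder, and the shard/replica counts are examples that should be adjusted to match the original index settings and mappings:
curl -XPUT 'localhost:9200/index_name' -d '{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1
  }
}'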
Fixing UNASSIGNED shard error
Look up the faulty shard:
curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason | grep UNASSIGNED
For ES 5+:
curl -XGET localhost:9200/_cluster/allocation/explain?pretty
If it looks like the unassigned shards belong to an index you thought you deleted already,
or an outdated index that you don’t need anymore,
then you can delete the index to restore your cluster status to green:
curl -XDELETE 'localhost:9200/index_name/'
Possible reasons for the unassigned shard error
Shard allocation is purposefully delayed
Too many shards, not enough nodes
You need to re-enable shard allocation
Shard data no longer exists in the cluster
Low disk watermark
Multiple Elasticsearch versions
Shard allocation is purposefully delayed
When a node leaves the cluster, the master node temporarily delays shard reallocation to avoid needlessly wasting resources on rebalancing shards, in the event the original node is able to recover within a certain period of time (one minute, by default). If this is the case, your logs should look something like this:
[TIMESTAMP][INFO][cluster.routing] [MASTER NODE NAME] delaying allocation for [54] unassigned shards, next check in [1m]
You can dynamically modify the delay period like so:
curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{
"settings": {
"index.unassigned.node_left.delayed_timeout": "30s"
}
}'
Replacing <INDEX_NAME> with _all will update the threshold for all indices in your cluster.
After the delay period is over, you should start seeing the master assigning those shards. If not, keep reading to explore solutions to other potential causes.
Too many shards, not enough nodes
The number of nodes must be at least the replica count + 1
Add new nodes or decrease the replica count (see the sketch below)
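A minimal sketch of lowering the replica count on one index; <INDEX_NAME> is a placeholder (use _all to apply the change to every index):
curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{
  "index.number_of_replicas": 1
}'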
You need to re-enable shard allocation
curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient":
{ "cluster.routing.allocation.enable" : "all"
}
}'
Low disk watermark
Check disk usage per node:
curl -s 'localhost:9200/_cat/allocation?v'
Raise the low disk watermark threshold:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "90%"
}
}'
Change "transient" to "persistent" if the setting needs to survive a cluster restart.
Shard data no longer exists in the cluster
In this case, primary shard 0 of the constant-updates index is unassigned. It may have been created on a node without any replicas (a technique used to speed up the initial indexing process), and the node left the cluster before the data could be replicated. The master detects the shard in its global cluster state file, but can’t locate the shard’s data in the cluster.
Another possibility is that a node may have encountered an issue while rebooting. Normally, when a node resumes its connection to the cluster, it relays information about its on-disk shards to the master, which then transitions those shards from “unassigned” to “assigned/started”. When this process fails for some reason (e.g. the node’s storage has been damaged in some way), the shards may remain unassigned.
In this scenario, you have to decide how to proceed: try to get the original node to recover and rejoin the cluster (and do not force allocate the primary shard), or force allocate the shard using the Reroute API and reindex the missing data using the original data source, or from a backup.
If you decide to allocate an unassigned primary shard, make sure to add the "allow_primary": "true" flag to the request:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" :
[ { "allocate" :
{ "index" : "constant-updates", "shard" : 0, "node": "<NODE_NAME>", "allow_primary": "true" }
}]
}'
Without the "allow_primary": "true" flag, we would have encountered the following error:
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[NODE_NAME][127.0.0.1:9301][cluster:admin/reroute]"}],"type":"illegal_argument_exception","reason":"[allocate] trying to allocate a primary shard [constant-updates][0], which is disabled"},"status":400}
The caveat with forcing allocation of a primary shard is that you will be assigning an “empty” shard. If the node that contained the original primary shard data were to rejoin the cluster later, its data would be overwritten by the newly created (empty) primary shard, because it would be considered a “newer” version of the data.
You will now need to reindex the missing data, or restore as much as you can from a backup snapshot using the Snapshot and Restore API.
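A minimal sketch of restoring a single index with the Snapshot and Restore API; "my_backup" and "snapshot_1" are placeholder repository and snapshot names, and this assumes a snapshot repository was registered beforehand:
curl -XPOST 'localhost:9200/_snapshot/my_backup/snapshot_1/_restore' -d '{
  "indices": "constant-updates"
}'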