Sunday 16 December 2018

Elasticsearch Monitoring and Troubleshooting Notes



 Get the cluster health status

curl -XGET http://localhost:9200/_cluster/health?pretty

Get shard details

curl -XGET http://localhost:9200/_cat/shards

Force reroute an identified shard to a specific node

         Usually required to force-assign unassigned_shards to a node
for shard in $(curl -s -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $2}'); do
  # column 2 of _cat/shards is the shard number; the index name below must match the index it belongs to
  curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [ {
      "allocate" : {
        "index" : "index_taken_from_step_2",
        "shard" : '"$shard"',
        "node" : "datanode",
        "allow_primary" : true
      }
    } ]
  }'
  sleep 5
done

All shards on a node are stuck in the INITIALIZING state

        Restarting the box may be the only solution.
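
One way to confirm that shards on the node are stuck (a minimal sketch; the systemd service name is an assumption that depends on how Elasticsearch was installed):

# list shards stuck in INITIALIZING together with the node they sit on
curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,node' | grep INITIALIZING

# restart Elasticsearch on the affected box (systemd-based install assumed)
sudo systemctl restart elasticsearch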


Troubleshooting UnavailableShardsException

a. The index folder was deleted
     Recreate the index (see the example after this list)

b. The node hosting the primary shard left the cluster
     Reboot the node
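
For case (a), a minimal sketch of recreating the index; the index name, shard count and replica count are placeholders and should match the original index (mappings omitted):

curl -XPUT 'localhost:9200/index_name' -d '{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1
  }
}'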


Fixing the UNASSIGNED shard error

Look up the faulty shard

 curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

  For ES 5+:
  curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'

If it looks like the unassigned shards belong to an index you thought you deleted already,
or an outdated index that you don’t need anymore,
then you can delete the index to restore your cluster status to green:

curl -XDELETE 'localhost:9200/index_name/'


Possible reasons for the unassigned shard error

Shard allocation is purposefully delayed
Too many shards, not enough nodes
You need to re-enable shard allocation
Shard data no longer exists in the cluster
Low disk watermark
Multiple Elasticsearch versions

Shard allocation is purposefully delayed


When a node leaves the cluster, the master node temporarily delays shard reallocation to avoid needlessly wasting resources on rebalancing shards, in the event the original node is able to recover within a certain period of time (one minute, by default). If this is the case, your logs should look something like this:

[TIMESTAMP][INFO][cluster.routing] [MASTER NODE NAME] delaying allocation for [54] unassigned shards, next check in [1m]
      You can dynamically modify the delay period like so:

curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d
'{
"settings": {
   "index.unassigned.node_left.delayed_timeout": "30s"
}
}'
       Replacing <INDEX_NAME> with _all will update the threshold for all indices in your cluster.

After the delay period is over, you should start seeing the master assigning those shards. If not, keep reading to explore solutions to other potential causes.
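
To confirm that the master is picking those shards up again once the delay expires, you can re-check the unassigned_shards counter from the cluster health API used at the top of these notes:

curl -s 'localhost:9200/_cluster/health?pretty' | grep unassigned_shards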

  Too many shards, not enough nodes

        The number of nodes must be at least the number of replicas + 1
Add new nodes or decrease the replica count (see the example below)
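
A sketch of lowering the replica count for a single index; <INDEX_NAME> and the count are placeholders, and this assumes you can tolerate fewer copies of the data:

curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{
  "index" : {
    "number_of_replicas" : 1
  }
}'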

You need to re-enable shard allocation

curl -XPUT 'localhost:9200/_cluster/settings' -d
'{ "transient":
{ "cluster.routing.allocation.enable" : "all"
}
}'
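
To double-check that allocation is enabled again, read the cluster settings back:

curl -XGET 'localhost:9200/_cluster/settings?pretty'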

Low disk watermark

Query:  curl -s 'localhost:9200/_cat/allocation?v'
Raise the low disk watermark (default is 85%) so shards can be allocated on fuller disks:
curl -XPUT 'localhost:9200/_cluster/settings' -d
'{
"transient": {
  "cluster.routing.allocation.disk.watermark.low": "90%"
}
}'
Change "transient" to "persistent" if the setting needs to survive a cluster restart.

   Shard data no longer exists in the cluster

For example, suppose primary shard 0 of the constant-updates index is unassigned. It may have been created on a node without any replicas (a technique used to speed up the initial indexing process), and the node left the cluster before the data could be replicated. The master detects the shard in its global cluster state file, but can’t locate the shard’s data in the cluster.

Another possibility is that a node may have encountered an issue while rebooting. Normally, when a node resumes its connection to the cluster, it relays information about its on-disk shards to the master, which then transitions those shards from “unassigned” to “assigned/started”. When this process fails for some reason (e.g. the node’s storage has been damaged in some way), the shards may remain unassigned.

In this scenario, you have to decide how to proceed: try to get the original node to recover and rejoin the cluster (and do not force allocate the primary shard), or force allocate the shard using the Reroute API and reindex the missing data using the original data source, or from a backup.

If you decide to allocate an unassigned primary shard, make sure to add the "allow_primary": "true" flag to the request:

curl -XPOST 'localhost:9200/_cluster/reroute' -d
'{ "commands" :
  [ { "allocate" :
  { "index" : "constant-updates", "shard" : 0, "node": "<NODE_NAME>", "allow_primary": "true" }
  }]
}'
Without the "allow_primary": "true" flag, we would have encountered the following error:

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[NODE_NAME][127.0.0.1:9301][cluster:admin/reroute]"}],"type":"illegal_argument_exception","reason":"[allocate] trying to allocate a primary shard [constant-updates][0], which is disabled"},"status":400}
The caveat with forcing allocation of a primary shard is that you will be assigning an “empty” shard. If the node that contained the original primary shard data were to rejoin the cluster later, its data would be overwritten by the newly created (empty) primary shard, because it would be considered a “newer” version of the data.

You will now need to reindex the missing data, or restore as much as you can from a backup snapshot using the Snapshot and Restore API.
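
A minimal sketch of restoring the index from a snapshot; the repository name (my_backup) and snapshot name (snapshot_1) are assumptions, and the snapshot repository must already be registered:

curl -XPOST 'localhost:9200/_snapshot/my_backup/snapshot_1/_restore' -d '{
  "indices" : "constant-updates"
}'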


