SolrCloud Autoscaling Trigger Actions

2020-01-12

TriggerAction implementations process events generated by triggers in order to ensure the cluster’s health and good use of resources.

Currently two implementations are provided: ComputePlanAction and ExecutePlanAction.

Compute Plan Action

The ComputePlanAction uses the policy and preferences to calculate the optimal set of Collection API commands which can re-balance the cluster in response to trigger events.

The following parameters are configurable:

collections

A comma-separated list of collection names. If this list is not empty, the computed plan will include only collection operations that affect the listed collections; operations for collections not listed here are ignored. Note that non-collection operations are not affected by this.

Example configuration:

{
 "set-trigger" : {
  "name" : "node_added_trigger",
  "event" : "nodeAdded",
  "waitFor" : "1s",
  "enabled" : true,
  "actions" : [
   {
    "name" : "compute_plan",
    "class" : "solr.ComputePlanAction",
    "collections" : "test1,test2"
   },
   {
    "name" : "execute_plan",
    "class" : "solr.ExecutePlanAction"
   }
  ]
 }
}

In this example, only collections test1 and test2 can potentially be replicated or moved to an added node. Other collections are ignored even if they cause policy violations.

Execute Plan Action

The ExecutePlanAction executes the Collection API commands emitted by the ComputePlanAction against the cluster using SolrJ. It executes the commands serially, waiting for each of them to succeed before continuing with the next one.

Currently, it has the following configurable parameters:

taskTimeoutSeconds

The default value is 120 seconds. This value defines how long the action waits for a command to complete its execution. If the timeout is reached while the command is still running, the command status is provisionally considered a success and a warning is logged, unless taskTimeoutFail is set to true.

taskTimeoutFail

A boolean with a default value of false. If this value is true, a timeout in command processing is marked as a failure and an exception is thrown.
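
As a sketch, both parameters could be set on the execute_plan action of the trigger shown in the earlier example. The timeout of 300 seconds and the choice to fail on timeout are illustrative assumptions, not recommended values; the trigger and action names are reused from the example above.

```json
{
 "set-trigger" : {
  "name" : "node_added_trigger",
  "event" : "nodeAdded",
  "waitFor" : "1s",
  "enabled" : true,
  "actions" : [
   {
    "name" : "compute_plan",
    "class" : "solr.ComputePlanAction"
   },
   {
    "name" : "execute_plan",
    "class" : "solr.ExecutePlanAction",
    "taskTimeoutSeconds" : 300,
    "taskTimeoutFail" : true
   }
  ]
 }
}
```

With a configuration like this, a command still running after 300 seconds would be treated as a failure rather than a provisional success.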

If the Overseer node fails while ExecutePlanAction is running, then the new Overseer node will run the chain of actions for the same event again after waiting for any running Collection API operations belonging to the event to complete.

Please see SolrCloud Autoscaling Fault Tolerance for more details on fault tolerance within the autoscaling framework.