Colocating Collections

2020-01-12

Solr provides a way to colocate a collection with another so that cross-collection joins are always possible.

The colocation guarantee applies to all future Collection operations made either via Collections API or by Autoscaling actions.

A collection may only be colocated with exactly one withCollection. However, arbitrarily many collections may be linked to the same withCollection.

Create a Colocated Collection

The Create Collection API supports a parameter named withCollection which can be used to specify a collection with which the replicas of the newly created collection should be colocated. See Create Collection API.

/admin/collections?action=CREATE&name=techproducts&numShards=1&replicationFactor=2&withCollection=tech_categories

In the above example, all replicas of the techproducts collection will be colocated on a node with at least one replica of the tech_categories collection.

Colocating Existing Collections

When collections already exist beforehand, the Modify Collection API can be used to set the withCollection parameter so that the two collections can be linked. This will not trigger changes to the cluster automatically because moving a large number of replicas immediately might de-stabilize the system. Instead, it is recommended that the Suggestions UI page should be consulted on the operations that can be performed to change the cluster manually.

Example: /admin/collections?action=MODIFYCOLLECTION&collection=techproducts&withCollection=tech_categories

Deleting Colocated Collections

Deleting a collection which has been linked to another will fail unless the link itself is deleted first by using the Modify Collection API to un-set the withCollection attribute.

Example: /admin/collections?action=MODIFYCOLLECTION&collection=techproducts&withCollection=

Limitations and Caveats

The collection being used as the withCollection must have one shard only and that shard should be named shard1. Note that when using the default router, the shard name is always set to shard1 but special care must be taken to name the shard as shard1 when using the implicit router.

In case new replicas of the withCollection have to be added to maintain the colocation guarantees then the new replicas will be of type NRT only. Automatically creating replicas of TLOG or PULL types is not supported.

In case, replicas have to be moved from one node to another, perhaps in response to a node lost trigger, then the target nodes will be chosen by preferring nodes that already have a replica of the withCollection so that the number of moves is minimized. However, this also means that unless there are Autoscaling policy violations, Solr will continue to move such replicas to already loaded nodes instead of preferring empty nodes. Therefore, it is advised to have policy rules which can prevent such overloading by e.g., setting the maximum number of cores per node to a fixed value.

Example: {'cores' : '<8', 'node' : '#ANY'}

The colocation guarantee is one-way only i.e., a collection 'X' colocated with 'Y' will always have one or more replicas of 'Y' on any node that has a replica of 'X' but the reverse is not true. There may be nodes which have one or more replicas of 'Y' but no replicas of 'X'. Such replicas of 'Y' will not be considered a violation of colocation rules and will not be cleaned up automatically.