Cross Data Center Replication (CDCR)

2020-01-12

Cross Data Center Replication (CDCR) allows you to create multiple SolrCloud data centers and keep them in sync.

What is CDCR?

The SolrCloud architecture is designed to support Near Real Time (NRT) searches on a Solr collection that usually consists of multiple nodes in a single data center. CDCR augments this model by forwarding updates from a Solr collection in one data center to a parallel Solr collection in another data center, where the network latencies are greater than the SolrCloud model was designed to accommodate.

CDCR Glossary

For the purposes of discussing CDCR, the following terminology is used. If you already know SolrCloud, many of these terms will be familiar to you.

Node

A JVM instance running Solr; a server.

Cluster

A set of Solr nodes managed as a single unit by a ZooKeeper ensemble hosting one or more Collections.

Data Center

A group of networked servers hosting a Solr cluster. For CDCR, the terms Cluster and Data Center are interchangeable, because we assume that each Solr cluster is hosted in a different group of networked servers.

Shard

A sub-index of a single logical collection. A shard may be spread across multiple nodes of the cluster. Each shard can have one or more replicas.

Leader

Each shard has one replica identified as its leader. All the writes for documents belonging to a shard are routed through the leader.

Replica

A copy of a shard for use in failover or load balancing. Replicas comprising a shard can either be leaders or non-leaders.

Follower

A convenience term for a replica that is not the leader of a shard.

Collection

A logical index, consisting of one or more shards. A cluster can have multiple collections.

Update

An operation that changes the collection’s index in any way. This could be adding a new document, deleting documents or changing a document.

Update Log(s)

An append-only log of write operations maintained by each node.
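
To make the relationships between these terms concrete, here is a minimal sketch in Python. It is a toy model, not Solr code: the class names, the `update` method, and the CRC32-based routing are all assumptions made for illustration, and the update log is attached to each replica as a simplification of the per-node log described above. It only shows that a collection is made of shards, that each shard has exactly one leader among its replicas, and that every write goes through the leader and is appended to a log.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Replica:
    """A copy of a shard hosted on a node (hypothetical toy class)."""
    node: str
    is_leader: bool = False
    # Append-only list of (operation, document id) write operations.
    update_log: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Shard:
    """A sub-index of a collection with one leader and zero or more followers."""
    name: str
    replicas: List[Replica]

    def __post_init__(self) -> None:
        leaders = [r for r in self.replicas if r.is_leader]
        if len(leaders) != 1:
            raise ValueError(f"{self.name} must have exactly one leader")

    @property
    def leader(self) -> Replica:
        return next(r for r in self.replicas if r.is_leader)

    @property
    def followers(self) -> List[Replica]:
        return [r for r in self.replicas if not r.is_leader]

    def apply(self, op: str, doc_id: str) -> None:
        # The write is recorded by the leader first, then by each follower.
        entry = (op, doc_id)
        self.leader.update_log.append(entry)
        for follower in self.followers:
            follower.update_log.append(entry)


@dataclass
class Collection:
    """A logical index consisting of one or more shards."""
    name: str
    shards: List[Shard]

    def update(self, op: str, doc_id: str) -> Shard:
        # Illustrative routing only: pick a shard from a hash of the id.
        index = zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)
        shard = self.shards[index]
        shard.apply(op, doc_id)
        return shard
```

For example, building a collection with two shards of two replicas each and calling `update("add", "doc-1")` returns the shard that owns the document, with the same entry appended to the log of its leader and of its follower, while the other shard's logs stay empty.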