Before you begin
Make sure you have already completed Under-Replication Troubleshooting and have a cluster of 3 nodes running.
Step 1. Simulate the problem
- In the terminal where node 2 is running, press CTRL-C.
- In the terminal where node 3 is running, press CTRL-C. You may need to press CTRL-C a second time to force this node to stop.
Step 2. Troubleshoot the problem
- Go back to the Admin UI.
You'll notice that an error is shown and that time-series metrics are no longer being reported.
- In a new terminal, try to query the one node that was not stopped:
$ ./cockroach sql \
--insecure \
--host=localhost:26257 \
--execute="SHOW DATABASES;" \
--logtostderr=WARNING
Because every range in the cluster, most importantly the system ranges, has lost a majority of its replicas, the cluster as a whole cannot make progress, so the query hangs indefinitely.
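If you script a check like this, you may want to bound how long it waits rather than letting it hang. The following is a minimal sketch, not part of the original exercise: `run_with_deadline` is a hypothetical helper name, and it assumes the GNU coreutils `timeout` utility is installed (it is not available by default on macOS).

```shell
# Run a command, but give up after a fixed number of seconds.
# Assumes the GNU coreutils `timeout` utility is on the PATH.
run_with_deadline() {
  local seconds="$1"
  shift
  local status=0
  timeout "$seconds" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    # GNU timeout exits with status 124 when it had to stop the command.
    echo "no response within ${seconds}s" >&2
  fi
  return "$status"
}

# Hypothetical usage against the surviving node (not executed here):
# run_with_deadline 10 ./cockroach sql --insecure --host=localhost:26257 --execute="SHOW DATABASES;"
```

For this exercise, still run the query without a deadline, because Step 3 relies on the hung query completing once the stopped nodes are back.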
Step 3. Resolve the problem
- In the terminal where node 2 was running, restart the node:
$ ./cockroach start \
--insecure \
--store=node2 \
--listen-addr=localhost:26258 \
--http-addr=localhost:8081 \
--join=localhost:26257,localhost:26258,localhost:26259
- In the terminal where node 3 was running, restart the node:
$ ./cockroach start \
--insecure \
--store=node3 \
--listen-addr=localhost:26259 \
--http-addr=localhost:8082 \
--join=localhost:26257,localhost:26258,localhost:26259
- Go back to the terminal where you issued the query.
All ranges have a majority of their replicas again, and so the query executes and succeeds:
  database_name
+---------------+
  defaultdb
  postgres
  system
(3 rows)
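The majority rule behind both the hang and the recovery can be sketched with a little shell arithmetic. This is an illustration only: `quorum_size` and `has_quorum` are hypothetical helper names, and the sketch assumes each range has three replicas, one per node in this 3-node cluster.

```shell
# Smallest number of live replicas that forms a majority of `total` replicas.
quorum_size() {
  local total="$1"
  echo $(( total / 2 + 1 ))
}

# Succeeds (exit status 0) when `live` replicas out of `total` still form a majority.
has_quorum() {
  local live="$1" total="$2"
  [ "$live" -ge "$(quorum_size "$total")" ]
}

# Assumed: three replicas per range, with one, two, or three nodes running.
for live in 1 2 3; do
  if has_quorum "$live" 3; then
    echo "live=$live of 3: available"
  else
    echo "live=$live of 3: unavailable"
  fi
done
```

With one node running, a range has only one of its three replicas and cannot reach the majority of two, which matches the hung query in Step 2. Restarting the stopped nodes restores the majority, which matches the query completing in Step 3.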
Clean up
In the next module, you'll start a new cluster from scratch, so take a moment to clean things up.
- Stop all CockroachDB nodes:
$ pkill -9 cockroach
- Remove the nodes' data directories:
$ rm -rf node1 node2 node3