
Upgrading from standalone SOLR to SOLR Cloud

SolrCloud is designed to provide a highly available, fault-tolerant environment for distributing your indexed content and query requests across multiple servers.

So what if we already have a standalone SOLR setup running on a single server and fulfilling our search needs? As data grows or search traffic increases, vertically scaling that single server is neither efficient nor cost-effective.

The answer is SOLR Cloud, and this document aims to provide an effective path for migrating from standalone SOLR to a cluster with minimal effort.

Step 1: Identify and set up SOLR and Zookeeper on the required set of servers.
Step 2: Start SOLR in cloud mode with the available Zookeeper configuration.
Step 3: Rebuild the cores.


  1. An efficient way to build the cores is to reuse our existing standalone SOLR setup. We can simply copy an existing SOLR core into the {solr.installation}/server/solr directory to start with. Copies of this core can be placed on every SOLR installation (node) on which the core is expected to be persisted. It might be a good idea to copy our cores to all nodes, as we can rebalance the installation later. To start with, we can copy each core as is, with all of its configuration and data.
  2. Once the cores are in place, start Zookeeper while keeping SOLR stopped for now. Each of our cores on standalone SOLR has a conf directory inside it. Upload that conf to Zookeeper using the zkcli script, available in the {solr.installation}/server/scripts/cloud-scripts directory. For each of our cores we can upload the configuration to Zookeeper as: zkcli.sh -cmd upconfig -confdir {confdirpath} -z <<zookeeper host ip addresses>> -confname <<corename>>. It is important to keep the name of the configuration the same as the name of the core for which it is intended. Doing this ensures that the standalone SOLR configurations get exported correctly to the cloud.
  3. Start SOLR in cloud mode on all the nodes as: solr start -cloud -z <<zookeeper host ip address>> -p <<solr port>>
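The upload-and-start sequence in items 2 and 3 above can be sketched as a small script. The Zookeeper hosts, install path and core names below are illustrative assumptions, not values from this post; DRY_RUN defaults to on, so the script only prints the commands until it is pointed at a real cluster.

```shell
# Sketch of steps 2 and 3. Hosts, paths and core names are assumptions --
# substitute your own. With DRY_RUN=1 (the default) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"
ZK_HOSTS="zk1:2181,zk2:2181,zk3:2181"   # assumed Zookeeper ensemble
SOLR_HOME="/opt/solr"                   # assumed {solr.installation}
CORES="products orders"                 # assumed core names

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Step 2: upload each core's conf, keeping confname equal to the core name
for core in $CORES; do
  run "$SOLR_HOME/server/scripts/cloud-scripts/zkcli.sh" -cmd upconfig \
      -confdir "$SOLR_HOME/server/solr/$core/conf" -z "$ZK_HOSTS" -confname "$core"
done

# Step 3: start SOLR in cloud mode on this node
run "$SOLR_HOME/bin/solr" start -cloud -z "$ZK_HOSTS" -p 8983
```

Set DRY_RUN=0 and run the script on each node once the printed commands look right for your environment.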
Step 4: The above steps bring SOLR up in cloud mode, and we should be able to see our cores and collections in the SOLR admin console. All that remains is balancing our SOLR Cloud cores with the desired replication and sharding. This is achieved using the COLLECTIONS API, invoking the MODIFYCOLLECTION operation with the required values for maxShardsPerNode, replicationFactor and autoAddReplicas.
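That MODIFYCOLLECTION call is a single HTTP request to the Collections API. As a sketch, with an assumed host, collection name and attribute values:

```shell
# Sketch of a MODIFYCOLLECTION request. Host, collection and values are
# illustrative assumptions -- adjust to your cluster before running.
SOLR_URL="http://localhost:8983/solr"
COLLECTION="products"                      # assumed collection name
URL="$SOLR_URL/admin/collections?action=MODIFYCOLLECTION&collection=$COLLECTION"
URL="$URL&maxShardsPerNode=2&replicationFactor=2&autoAddReplicas=true"

echo "$URL"       # review the request first
# curl "$URL"     # uncomment to apply against a live SOLR Cloud
```

Note that MODIFYCOLLECTION updates the collection's attributes; depending on the SOLR version, physically creating the extra replicas may additionally require Collections API operations such as ADDREPLICA.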

Once set up, restart the SOLR Cloud nodes and Zookeeper, and we are all set.

Consideration 1: Application migration from standalone SOLR to SOLR Cloud with solrj
Whenever we write new applications using solrj against standalone SOLR, it is a good idea to write a central connection-handling module that returns either an HttpSolrClient or a CloudSolrClient behind the SolrClient interface for each core, based on a switch.
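A minimal sketch of such a module, assuming solrj 7.x/8.x; the system property name, Zookeeper hosts and the standalone URL below are illustrative placeholders, not values from this post:

```java
import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

/** Central factory: callers only ever see the SolrClient interface. */
public final class SolrClients {

    // Assumed switch; could equally come from application configuration.
    private static final boolean CLOUD = Boolean.getBoolean("solr.cloud.enabled");

    private SolrClients() {}

    public static SolrClient forCore(String core) {
        if (CLOUD) {
            CloudSolrClient client = new CloudSolrClient.Builder(
                    Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), // assumed ZK hosts
                    Optional.empty())
                .build();
            client.setDefaultCollection(core); // collection named after the core
            return client;
        }
        // Standalone: point directly at the core's URL (assumed host and port).
        return new HttpSolrClient.Builder("http://localhost:8983/solr/" + core).build();
    }
}
```

Because calling code depends only on SolrClient, flipping the switch migrates the application to the cloud without touching query or indexing logic.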

Consideration 2: Data import handler in SOLR Cloud mode
If there are any data import handlers defined for our cores, those get configured in SOLR Cloud through the above process. Data import can be scheduled on any one of the nodes of the SOLR Cloud. The scheduling process can be written with failure handling that invokes the data import on an alternate node if one fails.
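The failure handling described above can be sketched as a loop over candidate nodes. The node addresses and collection name are assumptions; the endpoint is the standard data import handler path with command=full-import.

```shell
# Sketch: start a DIH full-import, falling back to another node on failure.
NODES="solr1:8983 solr2:8983"      # assumed SOLR node addresses
COLLECTION="products"              # assumed collection name

import_url() {
  echo "http://$1/solr/$COLLECTION/dataimport?command=full-import"
}

start_import() {
  for node in $NODES; do
    if curl -sf --max-time 5 "$(import_url "$node")" >/dev/null; then
      echo "full-import started on $node"
      return 0
    fi
    echo "$node unreachable, trying next node" >&2
  done
  echo "no node accepted the import request" >&2
  return 1
}

start_import || true   # do not abort the scheduler if every node is down
```

A cron entry or scheduler task wrapping this script gives the alternate-node failover the post describes.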

A good idea may be to keep two SOLR nodes in the SOLR Cloud devoid of any cores. All housekeeping activities, such as data import handler invocations or COLLECTIONS API calls, can be performed through these nodes.
