
Upgrading from standalone SOLR to SOLR Cloud

SolrCloud is designed to provide a highly available, fault-tolerant environment for distributing your indexed content and query requests across multiple servers.

So what if we already have a standalone SOLR setup running on a single server and fulfilling our search needs? As data grows or search traffic increases, vertically scaling that single server is neither efficient nor cost-effective.

The answer is SOLR Cloud, and this document aims to provide an effective path for migrating from standalone SOLR to a cluster with minimal effort.

Step 1: Identify and set up SOLR and Zookeeper on the required set of servers.
Step 2: Start SOLR in cloud mode with the available Zookeeper configuration.
Step 3: Rebuild the cores.


  1. An efficient way to build the cores is to reuse our existing standalone SOLR setup. We can simply copy an existing SOLR core into the {solr.installation}/server/solr directory to start with. Copies of this core can be placed on every SOLR installation (node) on which the core is expected to be persisted. It might be a good idea to copy our cores to all nodes, as we can rebalance the installation later. To start with, we can copy each core as is, with all of its configuration and data.
  2. Once the cores are in place, start Zookeeper while keeping SOLR stopped for now. Each of our cores on standalone SOLR has a conf directory inside it. Upload that conf to Zookeeper using the zkcli script, available in the {solr.installation}/server/scripts/cloud-scripts directory. For each of our cores we can upload the configuration to Zookeeper as: zkcli.sh -cmd upconfig -confdir {confdirpath} -z <<zookeeper host ip addresses>> -confname <<corename>>. It is important to keep the name of the configuration the same as the name of the core for which it is intended. Doing this ensures that the standalone SOLR configurations get exported correctly to the cloud.
  3. Start SOLR in cloud mode on all the nodes as: solr start -cloud -z <<zookeeper host ip address>> -p <<solr port>>
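The upload-and-start sequence in items 2 and 3 above can be sketched as a small script. The Zookeeper hosts, install path and core names below are illustrative assumptions, not values from this post; DRY_RUN defaults to on, so the script only prints the commands until it is pointed at a real cluster.

```shell
# Sketch of steps 2 and 3. Hosts, paths and core names are assumptions --
# substitute your own. With DRY_RUN=1 (the default) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"
ZK_HOSTS="zk1:2181,zk2:2181,zk3:2181"   # assumed Zookeeper ensemble
SOLR_HOME="/opt/solr"                   # assumed {solr.installation}
CORES="products orders"                 # assumed core names

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Step 2: upload each core's conf, keeping confname equal to the core name
for core in $CORES; do
  run "$SOLR_HOME/server/scripts/cloud-scripts/zkcli.sh" -cmd upconfig \
      -confdir "$SOLR_HOME/server/solr/$core/conf" -z "$ZK_HOSTS" -confname "$core"
done

# Step 3: start SOLR in cloud mode on this node
run "$SOLR_HOME/bin/solr" start -cloud -z "$ZK_HOSTS" -p 8983
```

Set DRY_RUN=0 and run the script on each node once the printed commands look right for your environment.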
Step 4: The above steps bring SOLR up in cloud mode, and we should be able to see our cores and collections in the SOLR admin console. All that remains is balancing our SOLR Cloud cores with the desired replication and sharding. This is achieved using the COLLECTIONS API, invoking the MODIFYCOLLECTION operation with the required values for maxShardsPerNode, replicationFactor and autoAddReplicas.
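That MODIFYCOLLECTION call is a single HTTP request to the Collections API. As a sketch, with an assumed host, collection name and attribute values:

```shell
# Sketch of a MODIFYCOLLECTION request. Host, collection and values are
# illustrative assumptions -- adjust to your cluster before running.
SOLR_URL="http://localhost:8983/solr"
COLLECTION="products"                      # assumed collection name
URL="$SOLR_URL/admin/collections?action=MODIFYCOLLECTION&collection=$COLLECTION"
URL="$URL&maxShardsPerNode=2&replicationFactor=2&autoAddReplicas=true"

echo "$URL"       # review the request first
# curl "$URL"     # uncomment to apply against a live SOLR Cloud
```

Note that MODIFYCOLLECTION updates the collection's attributes; depending on the SOLR version, physically creating the extra replicas may additionally require Collections API operations such as ADDREPLICA.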

Once set up, restart the SOLR Cloud nodes and Zookeeper, and we are all set.

Consideration 1: Application migration from standalone SOLR to SOLR Cloud with solrj
Whenever we write new applications using solrj against standalone SOLR, it is a good idea to write a central connection-handling module that returns either an HttpSolrClient or a CloudSolrClient behind the SolrClient interface for each core, based on a switch.
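A minimal sketch of such a module, assuming solrj 7.x/8.x; the system property name, Zookeeper hosts and the standalone URL below are illustrative placeholders, not values from this post:

```java
import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

/** Central factory: callers only ever see the SolrClient interface. */
public final class SolrClients {

    // Assumed switch; could equally come from application configuration.
    private static final boolean CLOUD = Boolean.getBoolean("solr.cloud.enabled");

    private SolrClients() {}

    public static SolrClient forCore(String core) {
        if (CLOUD) {
            CloudSolrClient client = new CloudSolrClient.Builder(
                    Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), // assumed ZK hosts
                    Optional.empty())
                .build();
            client.setDefaultCollection(core); // collection named after the core
            return client;
        }
        // Standalone: point directly at the core's URL (assumed host and port).
        return new HttpSolrClient.Builder("http://localhost:8983/solr/" + core).build();
    }
}
```

Because calling code depends only on SolrClient, flipping the switch migrates the application to the cloud without touching query or indexing logic.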

Consideration 2: Data import handler in SOLR Cloud mode
If there are any data import handlers defined for our cores, those get configured in SOLR Cloud through the above process. Data import can be scheduled on any one of the nodes of the SOLR Cloud. The scheduling process can be written with failure handling that invokes the data import on an alternate node if one fails.
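The failure handling described above can be sketched as a loop over candidate nodes. The node addresses and collection name are assumptions; the endpoint is the standard data import handler path with command=full-import.

```shell
# Sketch: start a DIH full-import, falling back to another node on failure.
NODES="solr1:8983 solr2:8983"      # assumed SOLR node addresses
COLLECTION="products"              # assumed collection name

import_url() {
  echo "http://$1/solr/$COLLECTION/dataimport?command=full-import"
}

start_import() {
  for node in $NODES; do
    if curl -sf --max-time 5 "$(import_url "$node")" >/dev/null; then
      echo "full-import started on $node"
      return 0
    fi
    echo "$node unreachable, trying next node" >&2
  done
  echo "no node accepted the import request" >&2
  return 1
}

start_import || true   # do not abort the scheduler if every node is down
```

A cron entry or scheduler task wrapping this script gives the alternate-node failover the post describes.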

A good idea may be to keep two SOLR nodes in the SOLR Cloud devoid of any cores. All housekeeping activities, such as data import handler invocations or COLLECTIONS API calls, can be performed through these nodes.
