
Deploying and Optimizing CouchDB

This article is part of our Academy Course titled CouchDB – Database for the Web.

This is a hands-on course on CouchDB. You will learn how to install and configure CouchDB and how to perform common operations with it. Additionally, you will build an example application from scratch and then finish the course with more advanced topics like scaling, replication and load balancing. Check it out here!

1. Scaling

Scaling, or scalability, doesn’t refer to a specific technique or technology, but rather is an attribute of a specific architecture.

In this lesson we shall cover the scaling of CouchDB, a popular NoSQL database. For CouchDB, we can scale three general properties:

  1. Read requests
  2. Write requests
  3. Data

1.1. Scaling Read Requests

A read request retrieves a piece of information from the database. It passes through these stages within CouchDB: First, the HTTP server module needs to accept the request. For that, it opens a socket to send over the data. The next stage is the HTTP request handler module, which analyzes the request and directs it to the appropriate submodule inside CouchDB. For single documents, the request then gets passed to the database module, where the data for the document is looked up on the filesystem and returned all the way back up again.

All this takes processing time, and additionally there must be enough sockets (or file descriptors) available. The storage backend of the server must be able to fulfill all these read requests. There are a few more things that can limit a system's ability to accept more read requests; the basic point here is that a single server can process only so many concurrent requests.

The nice thing about read requests is that they can be cached. Often-used items can be held in memory and returned at a much higher speed. Requests that can be served from this cache never hit the database and are thus far less I/O intensive.
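
As a quick illustration of cacheability, CouchDB itself exposes an ETag on document reads (the ETag is the document's current revision). The following curl sketch assumes a local database named ourdb containing a document U001; the revision in the If-None-Match header is just a placeholder:

curl -i http://127.0.0.1:5984/ourdb/U001
# A repeated read that presents the ETag from the first response; if the
# document has not changed, CouchDB answers 304 Not Modified and the body
# does not need to be read and transferred again.
curl -i -H 'If-None-Match: "1-7f570a3bb28cc04b130c3fbb95c7a513"' http://127.0.0.1:5984/ourdb/U001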

1.2. Scaling Write Requests

A write request is similar to a read request, except that it also writes the (modified) data back to disk. Remember, the nice thing about reads is that they're cacheable. A cache must be notified when a write changes the underlying data, or the clients must be notified not to use the cache. If we have multiple servers for scaling reads, a write must occur on all servers.

1.3. Scaling Data

The third way of scaling is scaling data. Today's hard drives are cheap and provide a lot of capacity, and they will only get better in the future, but there is only so much data a single server can make sensible use of. It must also maintain one or more indexes to the data, thus using even more disk space.

The solution is to chop the data into manageable chunks and put each chunk on a separate server. In this way, all servers with a chunk will form a cluster that holds all your data.

While we are taking separate looks at scaling reads, writes, and data, these rarely occur in isolation. Decisions to scale one will affect the others.

2. Replication

A replicator simply connects to two databases as a client, then reads from one and writes to the other. Push replication reads the local data and updates the remote database; pull replication is the reverse. A quick curl sketch of both follows the list below.

  1. The replicator is actually an independent Erlang application, running on its own process. It connects to both CouchDBs, then reads records from one and writes them to the other.
  2. CouchDB has no way of knowing who is a normal client and who is a replicator (let alone whether the replication is push or pull). It all looks like client connections. Some of them read records, some of them write records.
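
To make the difference concrete, here is a minimal curl sketch of both directions, assuming a local database named ourdb and a remote instance at http://example.org:5984 (both names are placeholders):

# Push replication: the local database is the source, the remote one is the target.
curl -X POST http://127.0.0.1:5984/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source":"ourdb", "target":"http://example.org:5984/ourdb"}'

# Pull replication: the remote database is the source, the local one is the target.
curl -X POST http://127.0.0.1:5984/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source":"http://example.org:5984/ourdb", "target":"ourdb"}'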

The CouchDB replication protocol is a protocol for synchronizing documents between two peers over HTTP/1.1.

2.1. Algorithm

The replication algorithm can be explained as follows:

  1. Assign a unique identifier to the Source database. Most of the time it will be its URI.
  2. Save this identifier in a special document named _local/<uniqueid> on the Target database. This document isn't replicated. It stores the last Source sequence ID, the Checkpoint, from the previous replication process.
  3. Get the Source changes feed by calling the /<source>/_changes URL and passing it the Checkpoint via the since parameter. The changes feed only returns a list of current revisions.

Note: This step can be performed continuously using the feed=longpoll or feed=continuous parameters. Then the feed will continuously get the changes.

  4. Collect a group of document/revision ID pairs from the changes feed and send them to the Target database's /<target>/_revs_diff URL. The result will contain the list of revisions that are NOT in the Target database.
  5. GET each missing revision from the Source database by calling the URL /<source>/<docid>?revs=true&rev=<revision>. This retrieves the document together with its parent revisions. Also don't forget to fetch attachments that aren't already stored on the Target. As an optimization, we can use the HTTP multipart API to retrieve them all.
  6. Collect a group of revisions fetched in the previous step and store them in the Target database using the Bulk Docs API with the new_edits: false JSON property to preserve their revision IDs.
  7. After the group of revisions is stored on the Target database, save the new Checkpoint on the Source database.

Note: Even if some revisions have been ignored, the sequence should be taken into consideration for the Checkpoint.

To compare non-numeric sequence ordering, we have to keep an ordered list of the sequence IDs as they appear in the _changes feed and compare their indices.
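
The individual steps can also be exercised by hand with curl. This is only a sketch, assuming databases named source_db and target_db on a local CouchDB and a placeholder revision ID:

# Step 3: read the changes feed, starting from a previously saved checkpoint (here 0).
curl 'http://127.0.0.1:5984/source_db/_changes?since=0'

# Step 4: ask the target which of these revisions it is missing.
curl -X POST http://127.0.0.1:5984/target_db/_revs_diff \
  -H 'Content-Type: application/json' \
  -d '{"U001": ["2-abc123"]}'

# Step 5: fetch a missing revision, with its revision history, from the source.
curl 'http://127.0.0.1:5984/source_db/U001?revs=true&rev=2-abc123'

# Step 6: write it to the target with new_edits set to false so the revision ID is preserved.
curl -X POST http://127.0.0.1:5984/target_db/_bulk_docs \
  -H 'Content-Type: application/json' \
  -d '{"new_edits": false, "docs": [{"_id":"U001", "_rev":"2-abc123", "name":"John"}]}'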

One thing to keep in mind is that the _users database, the design documents, and the security attributes of the databases are not replicated by default.

For the _users database and the design documents there is a solution: we just need to run the replication process as an administrator in order to replicate them.

Only server and database admins can create design docs and access views:

curl -H 'Content-Type: application/json' -X POST http://localhost:5984/_replicate -d ' {"source": "http://admin:admin_password@production:5984/foo", "target": "http://admin:admin_password@stage:5984/foo", "create_target": true, "continuous": true} '

This POST request will also work with the _users database.

Replication is a one-off operation: we send an HTTP request to CouchDB that includes a source and a target database, and CouchDB will send the changes from the source to the target. That is all. Granted, calling something world-class and then only needing one sentence to explain it does seem odd. But part of the reason why CouchDB’s replication is so powerful lies in its simplicity.

Let’s see what replication looks like:

POST /_replicate HTTP/1.1
Content-Type: application/json
{"source":"database","target":"http://example.org/database"}

This call sends all the documents in the local database database to the remote database http://example.org/database. A database is considered “local” when it is on the same CouchDB instance you send the POST /_replicate HTTP request to. All other instances of CouchDB are “remote.”

To send changes from the target to the source database, just make the same HTTP request, only with the source and target databases swapped.

POST /_replicate HTTP/1.1
Content-Type: application/json
{"source":"http://example.org/database","target":"database"}

A remote database is identified by the same URL we use to talk to it. CouchDB replication works over HTTP using the same mechanisms that are available to us. This example shows that replication is a unidirectional process. Documents are copied from one database to another and not automatically vice versa. If we want bidirectional replication, we trigger two replications with source and target swapped.

When we ask CouchDB to replicate one database to another, it will go and compare the two databases to find out which documents on the source differ from the target and then submit a batch of the changed documents to the target until all changes are transferred. Changes include new documents, changed documents, and deleted documents. Documents that already exist on the target in the same revision are not transferred; only newer revisions are.

Databases in CouchDB have a sequence number that gets incremented every time the database is changed. CouchDB remembers what changes came with which sequence number. That way, CouchDB can answer questions like, “What changed in database A between sequence number 53 and now?” by returning a list of new and changed documents. Finding the differences between databases this way is an efficient operation. It also adds to the robustness of replication.
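
That question maps directly onto the _changes API. A minimal sketch, assuming a local database named ourdb:

curl 'http://127.0.0.1:5984/ourdb/_changes?since=53'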

We can use replication on a single CouchDB instance to create snapshots of our databases to be able to test code changes without risking data loss or to be able to refer back to older states of our database. But replication gets really fun if we use two or more different computers, potentially geographically spread out.
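
A snapshot like this is just a local-to-local replication. A minimal sketch, assuming a local database named ourdb (the snapshot database name is made up):

curl -X POST http://127.0.0.1:5984/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source":"ourdb", "target":"ourdb-snapshot", "create_target":true}'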

With different servers, potentially hundreds or thousands of miles apart, problems are bound to happen. Servers crash, network connections break off, things go wrong. When a replication process is interrupted, it leaves two replicating CouchDBs in an inconsistent state. Then, when the problems are gone and we trigger replication again, it continues where it left off.

2.2. Simple Replication with the Admin Interface

We can run replication from our web browser using Futon, CouchDB’s built-in administration interface. Start CouchDB and open http://127.0.0.1:5984/_utils/ in the browser. On the righthand side, there is a list of things to visit in Futon. Click on “Replication.”

Futon will show an interface to start replication. We can specify a source and a target by either picking a database from the list of local databases or filling in the URL of a remote database.

Click on the Replicate button, wait a bit, and have a look at the lower half of the screen where CouchDB gives us some statistics about the replication run or, if an error occurred, an explanatory message.

2.3. Replication in Detail

So far, we’ve skipped over the result from a replication request. Here’s an example:

{
  "ok": true,
  "source_last_seq": 10,
  "session_id": "c7a2bbbf9e4af774de3049eb86eaa447",
  "history": [
    {
      "session_id": "c7a2bbbf9e4af774de3049eb86eaa447",
      "start_time": "Mon, 24 Aug 2009 09:36:46 GMT",
      "end_time": "Mon, 24 Aug 2009 09:36:47 GMT",
      "start_last_seq": 0,
      "end_last_seq": 1,
      "recorded_seq": 1,
      "missing_checked": 0,
      "missing_found": 1,
      "docs_read": 1,
      "docs_written": 1,
      "doc_write_failures": 0
    }
  ]
}

The “ok”: true part, similar to other responses, tells us everything went well. source_last_seq includes the source’s update_seq value that was considered by this replication. Each replication request is assigned a session_id, which is just a UUID.

The next bit is the replication history. CouchDB maintains a list of history sessions for future reference. The history array is currently capped at 50 entries. Each unique replication trigger object (the JSON string that includes the source and target databases as well as potential options) gets its own history.

The session_id is recorded here again for convenience. The start and end time for the replication session are also recorded. The _last_seq denotes the update_seqs that were valid at the beginning and the end of the session. recorded_seq is the update_seq of the target again. It’s different from end_last_seq if a replication process dies in the middle and is restarted. missing_checked is the number of docs on the target that are already there and don’t need to be replicated. missing_found is the number of missing documents on the source.

The last three—docs_read, docs_written, and doc_write_failures—show how many documents we read from the source, wrote to the target, and how many failed. If all is well, _read and _written are identical and doc_write_failures is 0. If not, something went wrong during replication. Possible failures are a server crash on either side, a lost network connection, or a validate_doc_update function rejecting a document write.

One common scenario is triggering replication on nodes that have admin accounts enabled. Creating design documents is restricted to admins, and if the replication is triggered without admin credentials, writing the design documents during replication will fail and be recorded as doc_write_failures. If we have admins, we need to include the credentials in the replication request:

> curl -X POST http://127.0.0.1:5984/_replicate  -d '{"source":"http://example.org/database", "target":"http://admin:password@127.0.0.1:5984/database"}' -H "Content-Type: application/json"

2.3.1 Continuous Replication

When we add “continuous”: true to the replication trigger object, CouchDB will not stop after replicating all missing documents from the source to the target. It will listen on CouchDB’s _changes API and automatically replicate any new docs to the target as they arrive in the source. In fact, they are not replicated right away; there’s a complex algorithm determining the ideal moment to replicate for maximum performance.

> curl -X POST http://127.0.0.1:5984/_replicate -d '{"source":"db", "target":"db-replica", "continuous":true}' -H "Content-Type: application/json"

CouchDB doesn’t remember continuous replications over a server restart. For the time being, we need to trigger them again when we restart CouchDB. In the future, CouchDB will allow us to define permanent continuous replications that survive a server restart without us having to do anything.

3. Conflict management

CouchDB has a mechanism to maintain continuous replication, so one can keep a whole set of computers in sync with the same data, whenever a network connection is available.

When we replicate two databases in CouchDB and we face conflicting changes, CouchDB will detect this and will flag the affected document with the special attribute “_conflicts”:true. Next, CouchDB determines which of the changes will be stored as the latest revision (remember, documents in CouchDB are versioned). The version that gets picked to be the latest revision is the winning revision. The losing revision gets stored as the previous revision.

CouchDB does not attempt to merge the conflicting revisions. Our application dictates how the merging should be done. The choice of winning revision is arbitrary.

Replication guarantees that conflicts are detected and that each instance of CouchDB makes the same choice regarding winners and losers, independent of all the other instances. Here a deterministic algorithm determines the order of the conflicting revisions. After replication, all instances taking part have the same data. The data set is said to be in a consistent state. If we ask any instance for a document, we will get the same answer regardless of which one we ask.

Whether or not CouchDB picked the version that our application needs, we need to go and resolve the conflict, just as we would resolve a conflict in a version control system like Subversion: by merging the changes and saving the result as the new latest revision. After replicating again, our resolution will propagate to all other instances of CouchDB. Our conflict resolution on one node could lead to further conflicts, all of which will need to be addressed, but eventually we will end up with a conflict-free database on all nodes.
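
In practice, resolving a conflict boils down to a handful of HTTP calls. The following is only a sketch, assuming a database named ourdb, a conflicted document U001 and placeholder revision IDs:

# List the conflicting revisions of the document.
curl 'http://127.0.0.1:5984/ourdb/U001?conflicts=true'
# -> {"_id":"U001","_rev":"2-aaa111","name":"John","_conflicts":["2-bbb222"]}

# Write the merged content as a new revision on top of the winning revision...
curl -X PUT http://127.0.0.1:5984/ourdb/U001 \
  -H 'Content-Type: application/json' \
  -d '{"_rev":"2-aaa111", "name":"John Smith"}'

# ...and delete the losing revision so it no longer shows up as a conflict.
curl -X DELETE 'http://127.0.0.1:5984/ourdb/U001?rev=2-bbb222'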

4. Load Balancing

4.1. Having a Backup

Whatever the cause is, we want to make sure that the service we are providing is resilient against failure. The road to resilience is a road of finding and removing single points of failure. A server’s power supply can fail. To keep the server from turning off during such an event, most come with at least two power supplies. To take this further, we could get a server where everything is duplicated (or more), but it is much cheaper to get two similar servers where one can take over if the other has a problem. However, we need to make sure both servers have the same set of data in order to switch between them without a user noticing.

Removing all single points of failure will give us a highly available or fault-tolerant system. The degree of tolerance is limited only by our budget. If we can’t afford to lose a customer’s shopping cart under any circumstances, we need to store it on at least two servers in at least two geographically distant locations.

Before we dive into setting up a highly available CouchDB system, let’s look at another situation. Suppose that an online shopping site suddenly faces a lot more traffic than usual and that customers are complaining that the site is “slow”. A probable solution for such a scenario would be to set up a second server that takes some load from the first server when the load exceeds a certain threshold.

The solution to the outlined problem looks a lot like the earlier one for providing a fault-tolerant setup: install a second server and synchronize all data. The difference is that with fault tolerance, the second server just sits there and waits for the first one to fail. In the server-overload case, a second server helps answer all incoming requests. This case is not fault-tolerant: if one server crashes, the other will get all the requests and will likely break down, or provide a very slow service, neither of which is acceptable.

Keep in mind that although the solutions look similar, high availability and fault tolerance are not the same. We’ll get back to the second scenario later on, but first we will take a look at how to set up a fault-tolerant CouchDB system.

5. Clustering

In this chapter we’ll deal with putting together a partitioned, or sharded, cluster that will have to grow at an increasing rate over time from day one.

We’ll look at request and response dispatch in a CouchDB cluster with stable nodes. Then we’ll cover how to add redundant hot-failover twin nodes, so there is no worry about losing machines. In a large cluster, we should plan for 5–10% of our machines to experience some sort of failure or reduced performance, so cluster design must prevent node failures from affecting reliability. Finally, we’ll look at adjusting cluster layout dynamically by splitting or merging nodes using replication.

5.1. Introducing CouchDB Lounge

CouchDB Lounge is a proxy-based partitioning and clustering application, originally developed for Meebo, a web-based instant messaging service. Lounge comes with two major components: one that handles simple GET and PUT requests for documents, and another that distributes view requests.

The dumbproxy handles simple requests for anything that isn’t a CouchDB view. This comes as a module for nginx, a high-performance reverse HTTP proxy. Because of the way reverse HTTP proxies work, this automatically allows configurable security, encryption, load distribution, compression, and, of course, aggressive caching of our database resources.

The smartproxy handles only CouchDB view requests, and dispatches them to all the other nodes in the cluster so as to distribute the work, making view performance a function of the cluster’s cumulative processing power. This comes as a daemon for Twisted, a popular and high-performance event-driven network programming framework for Python.

5.2. Consistent Hashing

CouchDB’s storage model uses unique IDs to save and retrieve documents. Sitting at the core of Lounge is a simple method of hashing the document IDs. Lounge then uses the first few characters of this hash to determine which shard to dispatch the request to. We can configure this behavior by writing a shard map for Lounge, which is just a simple text configuration file.

Because Lounge allocates a portion of the hash (known as a keyspace) to each node, we can add as many nodes as we like. Because the hash function produces hexadecimal strings that bear no apparent relation to our DocIDs, and because we dispatch requests based on the first few characters, we ensure that all nodes see roughly equal load. And because the hash function is consistent, Lounge will take any arbitrary DocID from an HTTP request URI and point it to the same node each time.

This idea of splitting a collection of shards based on a keyspace is commonly illustrated as a ring, with the hash wrapped around the outside. Each tic mark designates the boundaries in the keyspace between two partitions. The hash function maps from document IDs to positions on the ring. The ring is continuous so that we can always add more nodes by splitting a single partition into pieces. With four physical servers, we can allocate the keyspace into 16 independent partitions by distributing them across the servers like so:

Node A: shards 0, 1, 2, 3
Node B: shards 4, 5, 6, 7
Node C: shards 8, 9, a, b
Node D: shards c, d, e, f

Table 1

If the hash of the DocID starts with 0, 1, 2, or 3, it is dispatched to node A; if it starts with c, d, e, or f, it is dispatched to node D. As a full example, the hash 71db329b58378c8fa8876f0ec04c72e5 is mapped to node B, database 7 in the table just shown. This could map to http://B.couches.local/db-7/ on our backend cluster. In this way, the hash table is just a mapping from hashes to backend database URIs. Don’t worry if this all sounds very complex; all we have to do is provide a mapping of shards to nodes and Lounge will build the hash ring appropriately, so there is no need to get our hands dirty if we don’t want to.

To frame the same concept with web architecture, because CouchDB uses HTTP, the proxy can partition documents according to the request URL, without inspecting the body. This is a core principle behind REST and is one of the many benefits using HTTP affords us. In practice, this is accomplished by running the hash function against the request URI and comparing the result to find the portion of the keyspace allocated. Lounge then looks up the associated shard for the hash in a configuration table, forwarding the HTTP request to the backend CouchDB server.

Consistent hashing is a simple way to ensure that we can always find the documents we saved, while balancing storage load evenly across partitions. Because the hash function is simple (it is based on CRC32), we are free to implement our own HTTP intermediaries or clients that can similarly resolve requests to the correct physical location of our data.
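
To make the dispatch rule tangible, here is a hypothetical shell sketch that mimics the lookup against the shard map from Table 1. It uses a CRC32-style hash computed via a python3 one-liner; the exact hash function and keyspace split that Lounge uses may differ:

DOCID="U001"
# Compute an 8-character hexadecimal hash of the DocID.
HASH=$(python3 -c 'import sys, zlib; print(format(zlib.crc32(sys.argv[1].encode()) & 0xffffffff, "08x"))' "$DOCID")
# Pick the node from the first hex character, following Table 1.
case "${HASH:0:1}" in
  [0-3]) NODE=A ;;
  [4-7]) NODE=B ;;
  [89ab]) NODE=C ;;
  [c-f]) NODE=D ;;
esac
echo "DocID $DOCID -> hash $HASH -> shard ${HASH:0:1} -> node $NODE"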

6. Distributed load testing

There are many tools available that allow us to create tests for an application. In order to stress test a distributed system, we need a distributed environment and a distributed load testing tool. Tsung is a distributed load and stress testing tool, and we will use it for the example in this chapter.

Here our steps will be:

  1. Create a master-slave replication in CouchDB.
  2. Install and configure Tsung.
  3. Test both the master and the slave CouchDB with Tsung.
  4. Evaluate the test results.
  5. Create a proxy server for CouchDB.
  6. Run Tsung against the proxy server environment.
  7. Identify the performance parameters.

Let’s see those in detail.

  1. First we create a CouchDB database on both servers in our network. For example, our two CouchDB server machines are on the IPs 192.168.19.216 and 192.168.19.155.

    For Ubuntu, the CouchDB installation command is:

    sudo apt-get install couchdb
    

    To check whether the CouchDB server is running, we issue an HTTP request:

    curl localhost:5984
    

    After installing CouchDB on both machines, we need to bind the IP address by editing the local.ini file.

    The command to edit the couchdb configuration is the following:

    sudo gedit /etc/couchdb/local.ini
    

    We need to edit the file as follows on the first machine:

    [httpd]
    port = 5984
    bind_address = 192.168.19.216
    

    and on the second machine:

    [httpd]
    port = 5984
    bind_address = 192.168.19.155
    

    and save those.

    Restart CouchDB on both machines with the command:

    sudo /etc/init.d/couchdb restart
    

    Now we will create the CouchDB database from the Ubuntu terminal on each machine:

    curl -X PUT http://192.168.19.216:5984/ourdb
    

    Response:

    {"ok":true}
    

    Similarly:

    curl -X PUT http://192.168.19.155:5984/ourdb
    

    Response:

    {"ok":true}
    

    At this point, our databases have been created on both servers.

    To create the pull replication from one machine to another, the command is:

    curl -X POST http://192.168.19.155:5984/_replicate \
      -H "Content-Type: application/json" \
      -d '{
        "source":"http://192.168.19.216:5984/ourdb",
        "target":"ourdb",
        "continuous":true
      }'
    

    Couchdb response:

    {"ok":true,"_local_id":"e6118c2930eabde4ab06df8873a0994b+continuous"} (In every machine this response will be different)
    

    The above command creates a continuous pull replication from 192.168.19.216 to the 192.168.19.155 machine.

    Now we will insert a JSON document on 192.168.19.216, and it will be replicated automatically to 192.168.19.155.

    curl -X POST http://192.168.19.216:5984/ourdb -H "Content-Type: application/json" -d '{"_id":"U001","name":"John"}'
    

    Couchdb response:

    {"ok":true,"id":"U001","rev":"1-7f570a3bb28cc04b130c3fbb95c7a513"}
    

    (Please note that in every machine this response will be different)

    To see the results, the command is:

    curl -X GET http://192.168.19.155:5984/ourdb/U001
    

    Couchdb response:

    {"_id":"U001","_rev":"1-7f570a3bb28cc04b130c3fbb95c7a513","name":"John"}
    

    (In every machine this response will be different)

    curl -X GET http://192.168.19.216:5984/ourdb/U001
    
    {"_id":"U001","_rev":"1-7f570a3bb28cc04b130c3fbb95c7a513","name":"John"}
    

    (In every machine this response will be different)
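
    We can also confirm that the continuous pull replication is still running by querying the _active_tasks endpoint on the pulling machine (a quick check, assuming the setup described above):

    curl http://192.168.19.155:5984/_active_tasks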

  2. We will install Tsung as our distributed testing tool.

    To install Tsung on Ubuntu, we first need to install Erlang and some helper libraries. This is done with the following command:

    sudo apt-get install build-essential debhelper erlang-nox erlang-dev python-matplotlib gnuplot libtemplate-perl
    

    Next we have to download tsung:

    wget https://github.com/processone/tsung/archive/v1.5.0.tar.gz
    

    Next, we have to configure and build Tsung using the following commands:

    tar -xvzf v1.5.0.tar.gz
    cd tsung-1.5.0
    ./configure
    make
    make deb
    cd ..
    sudo dpkg -i tsung_1.5.0-1_all.deb
    

    Run tsung -v to check that tsung is installed.

  3. To test the individual CouchDB servers, we need to create an XML file named tsung_load_test.xml with the following configuration:
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd" []>
    <tsung loglevel="warning">
      <clients>
        <client host="localhost" cpu="2" maxusers="30000000"/>
      </clients>
      <servers>
        <server host="192.168.19.155" port="5984" type="tcp"/>
      </servers>
      <load>
        <arrivalphase phase="1" duration="1" unit="minute">
          <users arrivalrate="5" unit="second" maxnumber="1000"/>
        </arrivalphase>
      </load>
      <sessions>
        <session name="es_load" weight="1" type="ts_http">
          <for from="1" to="1000" incr="1" var="counter">
            <request><http url="/ourdb/U001" version="1.1" content_type='application/json' method='GET'></http></request>
          </for>
        </session>
      </sessions>
    </tsung>
    

    In the above configuration, the client element is the Tsung test client, which runs on localhost. The server element is the actual CouchDB server against which the load test will be run.

    In the arrivalphase element, we configure the test to run for 1 minute, with 5 new concurrent users arriving every second.

    In the session element, we define the actual request, which will run 1000 times.

    For more information about tsung test element configuration, please refer to the User Manual.

    To run the test, we will need to run the following:

    tsung -f tsung_load_test.xml start
    

    It will print the directory and the log file where the Tsung log will be generated.

    The same process will be followed for the 192.168.19.216 machine.

  4. To view the Tsung test results, we can use Tsung’s built-in reporting. We generate the report and then open graph.html in a browser (here, Chromium):
    /usr/lib/tsung/bin/tsung_stats.pl --stats /home/piyas/.tsung/log/20150413-1835/tsung.log
    chromium graph.html
    

    Here the log was generated in the /home/piyas/.tsung/log/20150413-1835/ folder.

    The graphical report will be found in graph.html.

    We have included two sample results, for the two machines indicated above, as a ZIP file.

  5. Before creating the proxy server, we set up a CouchDB server on another machine (192.168.19.122) and tested it with Tsung as described above.

    To make the proxy server, we install and configure Apache on 192.168.19.216.

    First we install Apache on the machine:

    sudo apt-get install apache2 
    

    To enable proxy, proxy_http and proxy_balancer, we have to use the following commands:

    sudo apt-get install libapache2-mod-proxy-html  
    sudo a2enmod proxy 
    sudo a2enmod proxy_http 
    sudo a2enmod proxy_balancer 
    sudo a2enmod headers 
    sudo a2enmod rewrite
    

    We have installed Apache 2.4.7 on our Ubuntu machine; here the configuration is a little different.

    First we have to edit the 000-default.conf file in the sites-enabled folder:

    sudo gedit /etc/apache2/sites-enabled/000-default.conf 
    

    We need to put the following configuration inside the VirtualHost element, i.e., at the end, just before the closing tag:

    Header append Vary Accept
    Header add Set-Cookie "NODE=%{BALANCER_WORKER_ROUTE}e; path=/ourdb" \
    env=BALANCER_ROUTE_CHANGED

    <Proxy balancer://couch-slave>
        BalancerMember http://192.168.19.155:5984/ourdb route=a max=10
        BalancerMember http://192.168.19.122:5984/ourdb route=b max=10
        ProxySet stickysession=NODE
        ProxySet timeout=5
    </Proxy>

    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} ^(POST|PUT|DELETE|MOVE|COPY)$
    RewriteRule ^/ourdb(.*)$ http://192.168.19.216:5984/ourdb$1 [P]
    RewriteCond %{REQUEST_METHOD} ^(GET|HEAD|OPTIONS)$
    RewriteRule ^/ourdb(.*)$ balancer://couch-slave$1 [P]
    ProxyPassReverse /ourdb http://192.168.19.216:5984/ourdb
    ProxyPassReverse /ourdb balancer://couch-slave
    RewriteOptions inherit
    

    Here we have configured the 192.168.19.155 and 192.168.19.122 machines as our couch-slaves and made 192.168.19.216 the couch-master.

    So when an HTTP write request comes in through the proxy server, it is sent to 192.168.19.216 and then replicated from there to 192.168.19.155 and 192.168.19.122.

    We can see that the POST, PUT, DELETE, MOVE and COPY methods are proxied to the master, while the GET, HEAD and OPTIONS methods are sent to the couch-slave balancer.

    Now we restart apache:

    sudo service apache2 restart
    

    Troubleshooting –

    If we get the following error in

    /var/log/apache2/error.log 
    

    Cannot find LB Method: byrequests

    then we should enable the load-balancing method modules in Apache (the symlinks below are created inside the /etc/apache2/mods-enabled directory):

    sudo ln -s ../mods-available/lbmethod_byrequests.load lbmethod_byrequests.load 
    sudo ln -s ../mods-available/lbmethod_bytraffic.load lbmethod_bytraffic.load 
    sudo ln -s ../mods-available/lbmethod_heartbeat.load lbmethod_heartbeat.load 
    sudo ln -s ../mods-available/lbmethod_bybusyness.load lbmethod_bybusyness.load 
    

    Then restart Apache again:

    sudo service apache2 restart
    

    Now we can test our proxy server:

    curl -X GET http://192.168.19.216/ourdb
    

    We should get the CouchDB response:

    {"db_name":"ourdb","doc_count":3,"doc_del_count":0,"update_seq":3,"purge_seq":0, 
    "compact_running":false,"disk_size":24678,"data_size":1002,"instance_start_time":"1428993076631193", 
    "disk_format_version":6,"committed_update_seq":3}
    

    This means the proxy server is running successfully.

    Now we can perform some test using the proxy server and POST requests:

    curl -X POST http://192.168.19.216/ourdb \
    -H "Content-Type: application/json" \
    -d '{
       "_id":"U003",
       "name":"Steve"
    }'

    curl -X POST http://192.168.19.216/ourdb \
    -H "Content-Type: application/json" \
    -d '{
       "_id":"U004",
       "name":"Mark"
    }'
    
  6. For the load testing through the proxy server, we will use the same tsung command as the one we described above. The configuration file for the testing will be:
    <?xml version="1.0" encoding="utf-8"?> 
    <!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd" []> 
    <tsung loglevel="warning"> 
     
      <clients> 
        <client host="localhost" cpu="2" maxusers="30000000"/> 
      </clients> 
     
      <servers> 
        <server host="192.168.19.216" port="80" type="tcp"/> 
      </servers> 
     
      <load> 
        <arrivalphase phase="1" duration="1" unit="minute"> 
          <users arrivalrate="5" unit="second" maxnumber="1000"/> 
        </arrivalphase> 
      </load> 
     
      <sessions> 
        <session name="es_load" weight="1" type="ts_http">
          <for from="1" to="1000" incr="1" var="counter">
            <request><http url="/ourdb/U001" version="1.1" content_type='application/json' method='GET'></http></request>
          </for>
        </session>
      </sessions> 
    </tsung>
    

    The results of the proxy server testing are also attached in the ZIP file.

  7. Identifying the performance parameters:
    1. If the machines have the same hardware configuration, the CouchDB servers handle the given Tsung load-testing parameters properly (no downtime).
    2. The proxy server environment test also runs fine. Each slave machine serves up to 10 requests in a round-robin manner, as per the proxy server scheme, so using more slave machines to serve requests keeps the request queue shorter.
  8. You may download the stress test result files here.

Piyas De

Piyas is a Sun Microsystems certified Enterprise Architect with 10+ years of professional IT experience in various areas such as architecture definition, enterprise application design, and client-server/e-business solutions. Currently he is engaged in providing solutions for digital asset management in media companies. He is also the founder and main author of "Technical Blogs (Blog about small technical Know hows)".