Benchmarking LDAP

 

Benchmarking an LDAP server can be more difficult than it may seem at first sight. Benchmarking several different LDAP server products for comparison purposes can be even more complex.

The basic problem is that unless care is taken, a benchmark test can end up measuring something other than the LDAP server performance characteristics; typically a bottleneck in the supporting infrastructure (Server cache, Hardware, OS, File-system, TCP/IP stack, network infrastructure or a combination of these),  or the performance of the LDAP client(s) creating the load.

Even when care is taken to avoid or at least minimize these problems there is often a temptation to load the server to the maximum to see what its extreme performance is like. This is usually done by sending nose-to-tail requests over multiple connections.

Unfortunately, this often yields some very unhelpful results.

In a real production environment, care will be taken not to run servers at their limits. In fact, careful system design will try to ensure that any predictable traffic spikes will be somewhat less than the maximum capacity of the system.

In this article we examine the effect that the number of connections to an LDAP server in a benchmark can have for different types of traffic.

The systems used in the following set of tests are 2 CPU, 4 core 2.53GHz machines with 24GB of memory running Centos 6.2. The LDAP server is  configured with a 16GB cache and loaded with one million entries. All the entries and indexes fit into memory. Beyond configuring the cache no tuning was performed as would typically be the case for initial benchmarking  runs. Similar characteristics can be expected with virtually any modern LDAP server.

Searches

A typical benchmark will consist of using multiple clients, each running some number of threads, and sending requests as fast as possible over each connection to the LDAP server. The results obtained this way can be deceiving. A typical curve of number of connections vs. request rate (throughput) looks like this:

What stands out is that with nose-to-tail requests on each connection, throughput maximum is reached with ~30 connections. In fact, as the number of connections increases, throughput actually drops slightly. Looking at the request response times is instructive.

Once maximum throughput (around 30 connections) is reached, traffic is being queued somewhere (most likely in a combination of the work queue within the LDAP server, awaiting worker threads, and possibly within the TCP/IP stack(s) of the client and/or server machines.

Without taking care of what was being measured, a simple interpretation of a benchmark run with 600 connections would conclude that this server is capable of around 74,000 searches with a response time of around 8.5 ms.

In reality, if too many connections are not used, it is capable of 75,500 searches with a response time of 0.5ms. Not a big difference in number of requests handled, but a very big difference in response time (roughly 16x).

The decrease in number of requests handled and increase in response times as connections are added beyond the maximum capacity point is almost entirely due to the overhead of handling the additional connections, which contribute nothing to throughput, but do contribute to overhead and request queuing time.

 Authentication

If we look at timings of a typical authentication sequence consisting of searching for an entry based upon an attribute value (uid) then performing a bind against the DN of the entry located by the search, we see a similar curve (response time is for the entire search/bind sequence).

Again, the “sweet spot” for this particular HW/OS/Server combination is ~30 connections carrying nose-to-tail traffic.

There is a gradual degradation in the throughput as the number of connections is increased. This would lead us to suppose that there may well be a fairly dramatic increase in response times as for search operation.

 

 

 

As indeed we do see in this graph.

For this sort of benchmark to be meaningful, there need to be several runs to determine the response characteristics as above. Even then, it is still not a really useful test since in production no system would be designed to be carrying maximum supportable traffic on each LDAP server instance.

In reality, there would be multiple instances, probably behind a load balancer to ensure that under normal conditions each received an amount of traffic well within its capabilities.

But what if we can’t have that much control over the number of connections? In that case we may want to look at how the throughput and response time varies if we limit the authentication rate.

 

It is perfectly feasible to limit traffic rates with decent load balancers and/or proxy servers, so this is not an unrealistic test. Picking some reasonable value, in this case 5,000 authentications per second, we vary the number of connections.

There is no perceptible degradation in throughput, as we would expect, since we know from the previous tests that the server is capable of much higher throughput than this.

 

 

 

Response times remain acceptable, although this curve does clearly illustrate that many managing many connections does have a measurable (but probably insignificant) impact.

 

 

 

 

 

Modifications

MOD requests, particularly on a system with relatively slow file-store as on this system (single internal disk) are typically limited more by disk IO bandwidth and than anything else. So we would expect to see different response curves.

In fact, they turn out to be quite similar, with maximum throughput being reached with a relatively low number of connections:

 

MOD operations are inherently slower, so the lower maximum request rate is not a surprise.

 

 

 

 

 

 

 

 

 

 

 

Response times are also heavily influenced by the number of concurrent connections to the server.

 

 

 

 

 

Other Factors

When pushing servers to their limits, where they (hopefully) will not be operating in a production environment, it is worth noting that there are other factors which can make a noticeable difference to performance.

For example, in the search test above, three attributes were returned (sn, cn mail).

What happens if we only return one attribute (mail)?

 

 

 

 

Overall, the effect is marginal, but quite measurable.

 

 

 

 

Normal logging operations become noticeable at the limits. For example, the same authentication test as previously performed with the access log turned off:

 

Note that this is for authentications, search and bind operations only, no write activity. The effect would almost certainly be more pronounced if the same (slow) disk was used for both database and logs.

Other factors related to logging which can have a significant impact on performance are the type of logging performed (write to file, vs write to a RDBMS vs write to syslog), the level of logging and the number of logs being maintained.

 

Benchmarks – How To

The most useful benchmarks are based upon production traffic patterns, with the same mix/rate of all types of requests that will be used in practice.

It is not always possible to determine this, but best estimates are much better than measuring individual request types, or some trivial mixture.

If the test is to determine the suitability of some product to replace an existing system, using the same request/rate mix gives a base to compare the existing system to a proposed replacement.

Once the system is characterized for the expected traffic, rates and number of connections can be increased, but always try to change these independently, determining the best number of connections to achieve the maximum throughput.

Next, determine the expected maximum throughput, which hopefully will be significantly less than the server limit. Some experimentation with numbers of connections will soon determine if there is a maximum that you do not want to exceed, and careful tuning of connection pools can ensure that this is not exceeded in practice.

On load generation

In order to be certain that what is measured is the LDAP server characteristics, and not those of the LDAP client(s) some care needs to be taken in understanding the client. For example, using SLAMD it is tempting to use the “Mixed” client to measure a mix of MOD and SEARCH traffic. This will often produce somewhat disappointing results due not so much to the LDAP server as to limitations in the SLAMD client. Much better results are typically achieved by running two SLAMD jobs in parallel, one performing SEARCH operations and on MOD operations.

When testing a large, load-balanced system, several machines should be used to host clients, and care taken to ensure that CPU and/or network bandwidth is not exceeded, both on the LDAP server, LDAP clients and all intermediate network segments and network devices.

To achieve maximum throughput, LDAP client threads should be restricted to a small multiple of the number of CPUs on the machine on which they run.