Connection pool traps in containerization

Venkateswaran.N
10 min read · Jan 22, 2023

Connection pools are common across programming languages and are a great way to improve application performance: they reuse database connections and avoid the costly exercise of opening a new connection for every database call.

Nostalgic…..!!!

A couple of years ago, I wrote an article arguing that connection pooling should not be used in microservices the way we tend to use it in monolithic applications, and that we should look for alternative approaches. Some people still follow the same old path in cloud and containerized deployments without considering the hidden traps it brings, especially on cloud-native platforms. Whether the application is containerized or runs on a managed compute service, the consequences are similar, differing only in degree.

In this article, let us look at the pitfalls this brings to containerization projects, specifically on Kubernetes or OpenShift, and the cost it adds.

Containerization

Here, the term refers to an existing monolithic application or microservice packaged inside a Docker container (or any other container) and deployed on an orchestrator such as Kubernetes, Mesos, or OpenShift.

Containerized applications are portable across clouds and are often referred to as cloud-native applications. They can be deployed on an on-premises Kubernetes cluster or on cloud-based services such as Amazon EKS, Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), Amazon ECS, Google Cloud Run, etc.

How a connection pool works

I will not dive deep into how a connection pool works, but for a quick recap, refer to the picture here, since images speak a thousand words. (paxdiablo gave an awesome description.)

Image source. Represents a single connection pool configured in an application server

Connection pool size management

A connection pool is implemented with minimum and maximum limits; it automatically creates new connections to replace the ones that were closed and grows toward its maximum under load, which increases CPU and memory usage.

As the number of connections in a pool grows, CPU and memory usage grow with it, because each new connection requires additional resources to maintain and manage: CPU to handle the communication between the application and the database, and memory to hold the connection and any related data. If the number of connections in the pool keeps growing, resource usage climbs and performance can suffer.

When the number of connections decreases, CPU and memory usage drop correspondingly: the CPU has fewer connections to manage, and less memory is held for connection state.

Connection pooling in containerized applications

After dockerizing a monolithic application and addressing cross-cutting concerns with ConfigMaps, Secrets, and so on, the need for external configuration files such as standalone.xml (in JBoss) is eliminated, and as a result the connection pool configuration is often missed or overlooked.

Usually this missing pool configuration is discovered during performance tuning, and most teams rush to bring back the standalone.xml file and store the database credentials (username/password) in it as well. The benefits of ConfigMaps, Secrets, and mechanisms such as IRSA offered by some cloud providers are lost at this point; a minimal sketch of what that externalized setup could look like follows.
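For contrast, here is a minimal, hypothetical sketch (the names and values are illustrative, not from this article) of keeping the database credentials in a Kubernetes Secret instead of hard-coding them back into standalone.xml:

```yaml
# Hypothetical sketch: DB credentials kept in a Kubernetes Secret
# rather than baked into standalone.xml inside the image
apiVersion: v1
kind: Secret
metadata:
  name: orders-db-credentials   # illustrative name
type: Opaque
stringData:
  DB_USER: app_user             # placeholder values
  DB_PASSWORD: change-me
# The JBoss container would consume these via envFrom/secretRef (or a mounted file),
# keeping the image itself free of credentials.
```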

The only advantage of retaining the connection pool at the application server level while containerizing is a quick win. The downside is a deferred crisis, because the consequences were never considered.

As far as I know, even well-experienced architects take this approach without looking into the underlying trap, and I have seen it used in many applications. That is what prompted me to share my thoughts on this crux.

Fundamental difference

Fundamentally, a connection pool behaves differently in a containerized application than in a standalone web/application server, because of the deployment model. Following the same pooling model used in a monolithic deployment brings additional trouble and significant cost when the application is deployed as containers or in a cloud-native environment.

For our understanding, let us consider a single deployment of a monolithic application in a data center.

Picture 1 created by the author

Take an example monolithic Java application deployed on a JBoss application server, with a connection pool of 10 connections to a PostgreSQL database. The pool is configured in JBoss's standalone.xml file, which defines the minimum and maximum connections. This JBoss application server runs on a single node with 4 GB of RAM and a single dual-core CPU.

Here the JBoss application server uses the connection pool configured in it with a minimum of 10 and a maximum of 50 connections (sizing a connection pool is a topic of its own).

Let us assume 4 GB of total memory on the application server. Discounting 1 GB for JBoss, the OS, and other resources, plus about 150 MB for the minimum pool (10 connections × ~15 MB each), roughly 2.85 GB remains.

All of this remaining memory is available for expanding the connection pool if heavy load requires it.

Administrators monitor the connection pool regularly and periodically adjust it to ensure optimal performance. Increasing or decreasing the pool size is done by changing the configuration file at the application server (JBoss) level; no code rebuild or redeployment of the application is needed.

Now let us see how the same application is deployed after containerization. Consider a Kubernetes environment (Amazon EKS) or an OpenShift environment on AWS (Red Hat OpenShift Service on AWS, ROSA); both provide the same orchestration functionality. Unless there is a reason not to, pods are deployed with a minimum of three replicas as a de facto standard, so we assume that as well while examining our connection pool issues.

Picture 2 a created by the author

So in the containerized JBoss deployment, three pods have three connection pools, each with a minimum of 10 and a maximum of 50, so the database already starts with 30 connections, compared to 10 with the monolithic application. A minimal deployment sketch illustrating this multiplication follows.
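As an illustration (the names and image are hypothetical, not from this article), here is a minimal Deployment sketch: every replica runs its own JBoss instance and therefore its own pool, so the database sees replicas × min-pool-size connections from the moment the application starts.

```yaml
# Hypothetical sketch: three replicas => three independent JBoss connection pools
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-app              # illustrative name
spec:
  replicas: 3                   # de facto minimum; each replica opens its own pool
  selector:
    matchLabels:
      app: orders-app
  template:
    metadata:
      labels:
        app: orders-app
    spec:
      containers:
        - name: jboss
          image: registry.example.com/orders-app:1.0   # placeholder image
          # The pool (min 10 / max 50) lives in this image's standalone.xml,
          # so 3 replicas x 10 = 30 connections are held at the database from startup.
```

Scale this to ten replicas and the database holds 100 connections at minimum and up to 500 at maximum, whether or not they are doing useful work.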

Simply carrying the JBoss connection pool over as we do in the monolith invites serious problems. Even if only one or two clients are active, there is a cost involved.

Pod limits refer to the maximum amount of resources (such as CPU and memory) that a Kubernetes pod is allowed to consume. These limits can be configured to ensure that pods do not consume more resources than are available, preventing resource contention and ensuring that all pods have the resources they need to function properly.

Pod limits are not set by default in Kubernetes or OpenShift; they have to be defined explicitly.

Problem 1

Baking the DB connection pool configuration inside container images — what does this mean?

Images should be “generic” in the sense that they can run in any environment. This is good practice even for non-containerized applications and is part of the twelve-factor app methodology. Containerized applications should be built once and then promoted from one environment to another; no environment-specific configuration should live in the image itself.

In our example application, since the connection pool configuration is embedded inside the container, any change to the pool sizes defined in standalone.xml means the image has to be rebuilt and redeployed every time. One way to avoid this, sketched below, is to externalize those values.
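A minimal, hypothetical sketch (the names and keys are illustrative, not from this article), assuming the image's standalone.xml can resolve these values from the environment (newer WildFly versions support environment-variable expressions):

```yaml
# Hypothetical sketch: pool sizes kept outside the image in a ConfigMap,
# so changing them does not require an image rebuild
apiVersion: v1
kind: ConfigMap
metadata:
  name: orders-db-pool-config        # illustrative name
data:
  DB_POOL_MIN: "10"
  DB_POOL_MAX: "50"
---
apiVersion: v1
kind: Pod
metadata:
  name: orders-app-sample            # illustrative; in practice part of a Deployment
spec:
  containers:
    - name: jboss
      image: registry.example.com/orders-app:1.0   # placeholder image
      envFrom:
        - configMapRef:
            name: orders-db-pool-config
      # standalone.xml inside the image would need to reference these values,
      # e.g. via ${env.DB_POOL_MIN}-style expressions, for the change to take effect
```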

Problem 2

In our Kubernetes deployment, if we have not defined limits explicitly, any pod (say pod 1) can consume all of the resources available on the node it is running on (for example, on a node with 4 CPU cores and 8 GB of memory, the pod can by default use all 4 cores and all 8 GB).

At this point, if pod 2 tries to expand its connection pool, it runs into resource starvation, since it is running on the same node. This leads to poor performance, crashes of the pod, or even of the node.

Such a crash can have a significant impact on the performance and stability of the system.

Pain point 1 — Loss of connections: the connection pool manages the connections between the JBoss server in pod 2 and the database server. If the pool crashes, those connections are lost and pod 2 can no longer communicate with the database.

Pain point 2 — Data integrity issues: if pod 2 is in the middle of a transaction when the connection is lost, data may be lost or the transaction rolled back.

Pain point 3 — Improper connection closure: a crash does not shut JBoss down cleanly, so connections are never returned or closed on the database side, leaving stale connections behind.

Problem 3

In our Kubernetes deployment, if we have defined limits explicitly, a pod (say pod 1) cannot consume all of the resources on the node; it can only consume up to the maximum limit defined for it. In our case, let us define 1 GiB of memory and 900 millicores (900m) of CPU per pod, as sketched below.
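As a minimal sketch (the container name, image, and request values are assumptions, not from this article), the limits in this example could be expressed like this in the pod spec:

```yaml
# Hypothetical fragment: the example's per-pod limits of 1 GiB memory and 900m CPU
apiVersion: v1
kind: Pod
metadata:
  name: orders-app-pod-1             # illustrative name
spec:
  containers:
    - name: jboss
      image: registry.example.com/orders-app:1.0   # placeholder image
      resources:
        requests:
          memory: "512Mi"            # assumed request values for this sketch
          cpu: "500m"
        limits:
          memory: "1Gi"              # the 1 GiB memory limit from the example
          cpu: "900m"                # 900 millicores
```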

Let us assume that pod 1 has already consumed 850 MiB of memory, including the minimum pool of 10 connections. Additional incoming requests occupy the remaining ~150 MiB, and when a new connection is requested to grow the pool, there is no memory left, because the allocated 1 GiB is already consumed by pod 1. Two things can happen here:

  1. The Linux OOM (Out of Memory) killer on the node reclaims memory by terminating the process using the most memory in the container that breached its limit (typically the JBoss process itself), and the container restarts with an OOMKilled status. If that process was in the middle of work or waiting on a dependency, the results can be disastrous.
  2. If the node itself comes under memory pressure (for example because other pods keep growing as well), the kubelet can additionally evict pods to reclaim memory. The evicted pod is terminated and rescheduled, and its status reflects the out-of-memory condition.

It is important to note that the OOM killer may not always reclaim enough memory to bring the pod back within its limits, and in some cases the pod is terminated despite the OOM killer's efforts.

If a pod running a JBoss container is killed or evicted due to an out-of-memory (OOM) condition, the container is terminated without a chance to cleanly close or return its database connections; the connections are cut off abruptly, potentially causing data loss or incomplete processing.

Incoming requests are spread across the three pods by the Kubernetes Service (roughly evenly, random or round-robin depending on the kube-proxy mode). So if pod 1 is in trouble and going through either OOM scenario, connections may not be available for newly received requests, and those requests pile up.

Problem 4

In our Kubernetes deployment, when additional nodes become available, pod rebalancing may also be attempted.

Pod rebalancing in a Kubernetes cluster can impact the connection pool of a JBoss container running in the pod. When a pod is moved to another node, the JBoss container's connections to the database and other resources are lost and need to be re-established.

This can cause a temporary interruption in the availability of the application and increase the load not only on the database but also on any other resources being connected to.

Pod rebalancing can also have cost implications on some cloud-provided cluster services (Red Hat OpenShift Service on AWS (ROSA), Amazon EKS), depending on the size of the cluster and the number of pods being moved.

When a pod is moved to another node, data in the pod's local storage has to be transferred or rebuilt on the new node, which adds network traffic and can increase data transfer costs.

Additionally, if the pod is moved to a different availability zone, there are further data transfer costs, because traffic between availability zones or regions is charged separately from traffic within a zone or region.

Also, when a pod is moved to another node, the pod will experience a brief period of downtime as it is being rescheduled. This can impact the availability of the application and may result in lost revenue or other costs.

Impact on Database

So far we have been discussing the impact on the containers caused by implementing the connection pool in the old way.

The database is another victim, often unnoticed and majorly affected by this containerized connection pool problem. Depending on the specific configuration and implementation of the pool, connections may be terminated abruptly, causing in-progress transactions to be rolled back and resulting in potential data loss. Some key problems include:

  1. Increased load on the database: the containerized application makes the database allocate at least three times the connections at application bootstrap (one minimum pool per replica). Scaling pods up creates additional connection pools, which costs the database extra CPU and memory regardless of whether the pooled connections are actually used. The database may even become unavailable if the total number of connections exceeds its configured limit (Amazon RDS for PostgreSQL, for instance, derives its default max_connections from the instance's memory, so smaller instances allow relatively few connections).
  2. Reduced database performance: as a result of the increased load on the database server, the overall performance of the system may be reduced, leading to longer response times and a decrease in the number of requests the system can handle.
  3. Stale connections: these occur when JBoss pod 1 crashes or is being evicted, since the connections in its pool are never closed on the database side.
  4. Connection starvation: stale connections are also created when JBoss pods are scaled down without a graceful shutdown, because the default termination grace period expires before the pool is drained (see the sketch after this list). As a result, new connections become scarce and are harder to provide to the pods created during scale-up, recreation, or pod shifting.
  5. Database lockouts: if a pod with an embedded connection pool crashes, the database server can end up refusing connections. This happens when the database has reached the maximum number of connections it can handle at a time while the application server keeps trying to establish new ones.
  6. Amazon RDS for PostgreSQL / Amazon Aurora PostgreSQL will not, by default, kill or time out stale connections. Because of this, the database service itself can be affected, and new connections may not be possible until the stale connections are cleaned up or the pool is restored or replaced.
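For reference on item 4, here is a minimal, hypothetical sketch of the termination settings involved: Kubernetes gives a terminating pod a grace period (30 seconds by default) before sending SIGKILL, and if JBoss cannot drain and close its pool within that window, the connections are orphaned on the database side. The preStop command shown is purely illustrative.

```yaml
# Hypothetical fragment: pod termination settings relevant to connection cleanup
apiVersion: v1
kind: Pod
metadata:
  name: orders-app-pod               # illustrative name
spec:
  terminationGracePeriodSeconds: 30  # Kubernetes default; after this the pod gets SIGKILL
  containers:
    - name: jboss
      image: registry.example.com/orders-app:1.0   # placeholder image
      lifecycle:
        preStop:
          exec:
            # illustrative hook: pause briefly so in-flight work can finish
            # before shutdown begins; a real hook would depend on the image
            command: ["sh", "-c", "sleep 5"]
```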

Enough problems so far😔

Summary

In a nutshell, the connection pooling approach used in monolithic applications should not be carried over to containerized applications or cloud-native environments (even in a lift and shift), as it leads to increased CPU and memory usage, potential bottlenecks, additional trouble, and higher cost.

We will discuss various options for solving this problem through well-architected design in another article.
