Untangle connection pool knots in cloud container applications
Containers are ephemeral. They can be created or destroyed at any time, which causes problems both for the connection pools inside pods and for the databases they connect to. A dead connection can linger and degrade availability for other pods and applications. Using a traditional connection pool in containerized applications poses many challenges and has significant implications for performance and stability, so it’s vital to address these potential issues properly.
Traditional connection pool approach
A Stack Overflow user named “Paxdiablo” gave an excellent description of how a connection pool works in general, as illustrated in the following image.

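The same lifecycle can also be sketched in a few lines of code. The example below uses psycopg2’s built-in pool purely for illustration; the hostname, database name, and credentials are placeholders, and any pooling library follows the same borrow-and-return pattern.

```python
# Minimal sketch of a traditional in-process connection pool (psycopg2 shown;
# any pooling library follows the same borrow/return lifecycle).
from psycopg2 import pool

# The pool opens up to `maxconn` physical connections and keeps them alive
# for the lifetime of the process.
db_pool = pool.SimpleConnectionPool(
    minconn=2,
    maxconn=10,
    host="db.example.internal",  # placeholder hostname
    dbname="orders",             # placeholder database
    user="app",
    password="change-me",
)

def fetch_order(order_id):
    conn = db_pool.getconn()  # borrow a connection from the pool
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM orders WHERE id = %s", (order_id,))
            return cur.fetchone()
    finally:
        db_pool.putconn(conn)  # return it so other callers can reuse it
```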
However, using the traditional approach for containers brings hidden problems, including:
- More frequent image rebuilds whenever connection settings change
- Resource starvation of pods
- Stale connection issues due to pod crashes and improper connection closure
- Data integrity issues and lost connections
- “Out of Memory Killer” issues
- Pod rebalancing causing connection and performance bottlenecks
In addition, a traditional container approach can cause database issues, including:
- Non-availability of the database service
- Database lockouts
- Reduced database performance
- Connection starvation
- Stale database connections
- Increased database load
I discussed these connection pool problems in cloud containerized applications in more detail in my previous Medium article on connection pool traps in containerization.
Cloud deployment models for connection pooling
There are two main models for connection pooling: containerization and Kubernetes.
Containerization model
In this model, the applications are packaged into containers built from a base image with Docker, Podman, or any other container technology. Configuration should be externalized from the image through ConfigMaps, Secrets, and similar mechanisms, and the containers can use any approach to load this information at runtime.
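For example, a minimal sketch of loading externalized pool settings might read environment variables that the platform injects from a ConfigMap and a Secret at startup (the variable names below are assumptions, not a standard):

```python
import os
from psycopg2 import pool

# Pool sizing comes from a ConfigMap, credentials from a Secret; both are
# surfaced to the container as environment variables at runtime, so changing
# them does not require rebuilding the image.
db_pool = pool.SimpleConnectionPool(
    minconn=int(os.environ.get("DB_POOL_MIN", "2")),
    maxconn=int(os.environ.get("DB_POOL_MAX", "10")),
    host=os.environ["DB_HOST"],          # injected from a ConfigMap
    dbname=os.environ["DB_NAME"],        # injected from a ConfigMap
    user=os.environ["DB_USER"],          # injected from a Secret
    password=os.environ["DB_PASSWORD"],  # injected from a Secret
)
```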
Typically, the containers are deployed on containerized cloud services such as AWS Elastic Container Service (ECS) or AWS Fargate. In AWS ECS, containers are deployed directly and orchestrated as tasks that scale up or down.
Kubernetes model
The Kubernetes model for connection pooling can run on self-managed Kubernetes on virtual machines such as Amazon Elastic Compute Cloud (EC2) instances, or on the managed Amazon Elastic Kubernetes Service (EKS). The following strategies can be applied to Kubernetes deployed on AWS.
Strategy 1: Offload connection pool as a StatefulSet workload
Create a connection pool and deploy it as a workload inside Kubernetes. All other services can call this pool and get a connection. The connections will be returned and reused by other pods.
It’s easier to manage your connection pools because they are already offloaded to a separate workload, with minimum and maximum connections configured externally through ConfigMaps, and credentials through secrets. Problems with pod rescheduling, replication, or crashing will not impact the connection pool.
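As a rough sketch (assuming the offloaded pool is PgBouncer exposed behind a Service named pgbouncer, which is an illustrative choice rather than part of the strategy itself), a client pod would connect to the pooler’s stable in-cluster DNS name instead of pooling locally:

```python
import os
import psycopg2

# The client pod no longer keeps its own pool; it opens a lightweight
# connection to the offloaded pooler, which multiplexes requests onto a
# shared set of real database connections.
POOLER_HOST = os.environ.get(
    "POOLER_HOST", "pgbouncer.default.svc.cluster.local"  # assumed Service name
)

def fetch_order(order_id):
    conn = psycopg2.connect(
        host=POOLER_HOST,
        port=6432,  # PgBouncer's default listen port
        dbname="orders",                     # placeholder database
        user="app",                          # placeholder user
        password=os.environ["DB_PASSWORD"],  # injected from a Secret
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM orders WHERE id = %s", (order_id,))
            return cur.fetchone()
    finally:
        conn.close()  # hands the server-side connection back to the pooler
```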

However, there are still multiple challenges:
- The connection pool pod must be created as a StatefulSet. Unlike a Deployment, which replaces pods when they fail or are updated, StatefulSets maintain the identity of each pod across updates and failures. Each pod in the StatefulSet can maintain a pool of connections to a shared database with stable network identities maintained by Kubernetes, which improves reliability and performance. Persistent storage preserves the connection pool’s state across restarts and node failures.
- The maximum and minimum connection counts must stay in sync with the ConfigMap, and whenever they change you need to carefully coordinate the connection pool’s graceful shutdown with the StatefulSet to keep client pods from crashing (see the shutdown sketch after this list).
- Pod rescheduling issues are not completely eliminated. It’s important to configure the StatefulSet and persistent storage correctly and to monitor the cluster for performance and availability. Additionally, you might need to implement load balancing or network partitioning solutions to ensure high availability and connection pool performance. You will also need to consider issues around multinode deployment, as described below.
- Network partitioning: In the event of a network partition, some pods in the StatefulSet could become unavailable, causing connection pool disruptions that can result in increased latency and lost connections for all client pods.
- Persistent storage: If the persistent storage used by a StatefulSet is not configured correctly, the connection pool state could be lost when pods are rescheduled to another node.
- Load balancing: The connection pool might not be evenly distributed across nodes in the cluster, leading to imbalanced loads and potential performance degradation.
- Network latency: Network latency between nodes in the cluster can affect the performance of the connection pool and lead to increased latency.
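The graceful-shutdown point deserves a concrete illustration. One possible approach, sketched below with psycopg2 and placeholder connection details, is to handle the SIGTERM that Kubernetes sends before terminating a pod and drain the pool within the termination grace period:

```python
import signal
import sys
from psycopg2 import pool

# Placeholder connection details; in practice these come from ConfigMaps/Secrets.
db_pool = pool.ThreadedConnectionPool(
    minconn=2,
    maxconn=50,
    dsn="dbname=orders user=app host=db.example.internal",
)

def drain_and_exit(signum, frame):
    # Close every pooled connection cleanly so the database is not left
    # holding half-open sessions when the pod goes away.
    db_pool.closeall()
    sys.exit(0)

# Kubernetes sends SIGTERM before killing the pod; handling it lets the pool
# shut down within terminationGracePeriodSeconds instead of being cut off.
signal.signal(signal.SIGTERM, drain_and_exit)
```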
Strategy 2: Offload connection pool as a DaemonSet
When you implement a database connection pool using a DaemonSet, you need to consider issues such as connection management, thread safety, and resource utilization. This strategy has only one advantage over the StatefulSet approach: it ensures even distribution of the connection pool across nodes during scale-up and scale-down activities in the cluster.
During node autoscaling, the DaemonSet pods will be automatically created or deleted based on the changes to the number of nodes in the cluster, ensuring that the DaemonSet continues to run a single instance of the connection pool pod on every node in the cluster.
To create a DaemonSet for a connection pool, you create a stateless pod dedicated to the connection pool (for example, using any open source database connection pool, such as PgBouncer). You then define a DaemonSet that specifies the connection pool pod and deploy it so that a single instance of the pod runs on every node in the cluster.
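A client pod would then reach the pooler running on its own node. The sketch below assumes the DaemonSet pod exposes PgBouncer on a hostPort and that the node address is injected through the downward API; these are illustrative choices rather than requirements:

```python
import os
import psycopg2

# The pod spec is assumed to inject the node's address via the downward API:
#   env:
#   - name: HOST_IP
#     valueFrom:
#       fieldRef:
#         fieldPath: status.hostIP
node_pooler = os.environ["HOST_IP"]

# Connect to the node-local PgBouncer instance run by the DaemonSet
# (6432 is PgBouncer's default port; the hostPort mapping is an assumption).
conn = psycopg2.connect(
    host=node_pooler,
    port=6432,
    dbname="orders",                     # placeholder database
    user="app",                          # placeholder user
    password=os.environ["DB_PASSWORD"],  # injected from a Secret
)
```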
Note: We do not recommend strategy 2 because there is still the potential for stale connections.

Strategy 3: Use a cloud-native connection pool
AWS RDS Proxy lets you avoid the problems discussed above with containerized connection pools. It is a fully managed, highly available database proxy service that provides connection pooling for Amazon Relational Database Service (RDS) instances, enabling multiple applications to reuse existing database connections and improving performance.
When you use RDS Proxy, applications connect to the proxy and borrow connections from its pool of database connections (see the sketch after the following list). RDS Proxy provides several features that help improve the performance and security of database connections, including:
- Connection pooling: Automatic connection management in the pool for optimal and efficient reuse.
- Connection tracking: Tracking of connections between the application and the RDS instance, providing a detailed view of the connections and helping to identify issues or performance bottlenecks.
- Read replica promotion: Automatic promotion of read replicas to primary instances in the event of a failover, allowing applications to continue reading from the database during an outage with no change to the endpoint name.
- Authentication: Support for authentication mechanisms, such as AWS Identity and Access Management authentication, to help secure database connections.
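As a minimal sketch of the connection path (the proxy endpoint, region, database, and user names below are placeholders), an application can request a short-lived IAM authentication token and connect through the proxy over TLS:

```python
import boto3
import psycopg2

# Sketch of an application connecting through RDS Proxy with IAM authentication.
# The proxy endpoint and database details are placeholders for illustration.
PROXY_ENDPOINT = "my-proxy.proxy-abcdefgh.us-east-1.rds.amazonaws.com"

rds = boto3.client("rds", region_name="us-east-1")
token = rds.generate_db_auth_token(
    DBHostname=PROXY_ENDPOINT,
    Port=5432,
    DBUsername="app",
    Region="us-east-1",
)

# The token is used as a short-lived password; TLS is required for IAM auth.
conn = psycopg2.connect(
    host=PROXY_ENDPOINT,
    port=5432,
    dbname="orders",
    user="app",
    password=token,
    sslmode="require",
)
```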

Conclusion
Using AWS RDS Proxy avoids the problems that come with traditional connection pooling. AWS RDS Proxy improves performance, reliability, and security, and it reduces management overhead for containerized applications that access RDS. This strategy is the simplest of the three we’ve presented, and we recommend it for anyone who needs to handle database connection pool issues in a container implementation.
This article was originally published as a blog post on developer.ibm.com.