Investigate why the number of connections to the prod sandbox database keeps growing and reaches the max threshold of 600

Description

None

Activity

Gopal Menon
August 11, 2020, 11:06 PM
Edited

org.hspconsortium.platform.api.multitenant.db.DataSourceRepository stores associations of tenant id to connection pool in a concurrent map. Each connection pool has a limit of 10 connections, and the map has an intended limit of 15 entries. This should theoretically limit the number of connections to 10 x 15 = 150. However, there is a bug in the part that returns a connection pool object. It checks whether the number of entries in the map is equal to the cache size limit of 15, and if it is, the oldest connection pool object is removed after its connections are closed. The problem is that this check is not thread safe: concurrent requests can each pass the check and then add an entry, which lets the cache grow past the limit of 15. Once it crosses the limit, since the check is for the size being equal to 15 and not greater than or equal to 15, the removal never runs again and the cache keeps growing. This causes the number of connections to keep increasing.

The attachment above shows the cache size staying at the limit of 15 and then crossing over to 16.

As shown in the second attachment the cache keeps growing once it crosses the limit. It is currently at 29.
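A minimal sketch of the failure mode described in this comment, for illustration only. The class and method names (`EqualityCheckCache`, `getPool`, `racedInsert`) are hypothetical and are not taken from DataSourceRepository; the map value is a last-used counter standing in for a real connection pool. `racedInsert` simulates a second thread that passed the size check at the same moment as the first.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the tenant-to-pool cache (not the real class).
class EqualityCheckCache {
    static final int CACHE_LIMIT = 15;

    // tenant id -> last-used tick (a real cache would hold a connection pool here)
    final ConcurrentMap<String, Long> pools = new ConcurrentHashMap<>();
    private final AtomicLong tick = new AtomicLong();

    // Check-then-act: size() and put() are each thread safe, but the pair is not.
    void getPool(String tenantId) {
        if (!pools.containsKey(tenantId) && pools.size() == CACHE_LIMIT) {
            evictOldest();
        }
        pools.put(tenantId, tick.incrementAndGet());
    }

    // Simulates a thread whose insert lands after another thread's size check.
    void racedInsert(String tenantId) {
        pools.put(tenantId, tick.incrementAndGet());
    }

    // Removes the entry with the smallest last-used tick.
    private void evictOldest() {
        pools.entrySet().stream()
             .min(Map.Entry.comparingByValue())
             .ifPresent(e -> pools.remove(e.getKey()));
    }

    public static void main(String[] args) {
        EqualityCheckCache cache = new EqualityCheckCache();
        for (int i = 0; i < 20; i++) cache.getPool("tenant-" + i);
        System.out.println("before race: " + cache.pools.size());
        cache.racedInsert("raced-tenant");           // size is now past the limit
        for (int i = 100; i < 110; i++) cache.getPool("tenant-" + i);
        System.out.println("after race:  " + cache.pools.size());
    }
}
```

In this sketch the size holds at the limit until the simulated race adds one extra entry. After that the equality check is never true again, so every new tenant adds an entry and nothing is removed.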

Gopal Menon
August 11, 2020, 11:11 PM
Edited

To fix this, we would need to change the problematic code below so that it removes the oldest connection pool if the cache size is greater than or equal to the limit. Currently it only checks for the equal-to condition. If we want to ensure that the cache size does not cross 15 under any circumstance, we would need to keep removing the oldest connection pool until the size is back at the limit of 15. However, that could also result in the size going below 15 and degrading performance.
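One possible shape for such a fix, sketched under assumptions and not the code that was actually deployed: the lookup, the size check, and the insert all run under a single lock, and the check uses greater-than-or-equal in a loop. `BoundedPoolCache` and `ClosablePool` are hypothetical names introduced for this example, and an access-ordered `LinkedHashMap` is swapped in here for the concurrent map so that the least recently used entry is easy to find.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical pool abstraction; a real pool would close its connections here.
interface ClosablePool {
    void close();
}

// Hypothetical bounded cache; names are illustrative only.
class BoundedPoolCache {
    private final int limit;
    // accessOrder=true keeps the least recently used entry first in iteration order.
    private final Map<String, ClosablePool> pools = new LinkedHashMap<>(16, 0.75f, true);

    BoundedPoolCache(int limit) {
        this.limit = limit;
    }

    // Everything happens under one lock, so no other thread can add an entry
    // between the size check and the put.
    synchronized ClosablePool getPool(String tenantId, Function<String, ClosablePool> factory) {
        ClosablePool existing = pools.get(tenantId);
        if (existing != null) {
            return existing;
        }
        // ">=" in a loop: recovers even if the cache is already over the limit.
        Iterator<Map.Entry<String, ClosablePool>> it = pools.entrySet().iterator();
        while (pools.size() >= limit && it.hasNext()) {
            ClosablePool oldest = it.next().getValue();
            it.remove();
            oldest.close();
        }
        ClosablePool created = factory.apply(tenantId);
        pools.put(tenantId, created);
        return created;
    }

    synchronized int size() {
        return pools.size();
    }
}
```

Because the loop runs only when a new entry is about to be added, and stops as soon as there is room for exactly one, the size in this sketch settles at the limit and does not drop below it. The trade-off is that all callers serialize on one lock, which may or may not be acceptable for the real workload.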

Gopal Menon
August 12, 2020, 4:34 PM

Talked about it in standup. We will wait till the next maintenance window to put in the fix.

Gopal Menon
August 15, 2020, 12:15 AM

Shilpy and I pushed the fix to the test and prod environments. The data source cache is staying at 15 and is not increasing further. Database connections are at 130. The limit for the number of connections is 15 x 10 = 150. Will monitor it and move it to Done status later.

Gopal Menon
August 17, 2020, 3:09 PM

The number of connections is currently 184, which is higher than the theoretical maximum of 150. However, the data source cache size is still at 15 and the fix looks to be working.

Assignee

Gopal Menon

Reporter

Gopal Menon

Labels

None

Priority

Major