Recently we had been bitten by a Systemd limitation at the “Tasks” created per-unit ie., process. This includes both the kernel threads and user-space threads, with each thread counting individually.
Am writing this blog as a reference for someone who might come across this limitation.
We have been actively working on migration DB instances, from one DC to the newly built DC .The instances on the newer DC were provisioned with the latest hardware and latest Debian OS. Below is the detailed spec of the system.
RAM : 244G
Core : 44Core
HardDisk. : SSD
IOPS : 120K
OS : Debian GNU/Linux 9 (stretch)
Kernel. : 4.9.0-8-amd64
MySQL Version : 5.6.43-84.3-log Percona Server (GPL)
We had setup MySQL on this machine, with all optimal configurations to perform the NFR(Non-Functional Requirement) Testing. The test was performed using Jmeter and we monitored the performance of the db server with PMM.
The connection limit (max_connections) was set in the DB servers as 15k . We were doing a series of test by increasing the no.of app boxes, which increases the number of threads connecting to MySQL Server. When the connection reach 6.5k to 7k mark we started to observer a few errors in the application log , mostly related to connections failures.
Previously we had a limitation with open files limit (ulimit) which affects our connection and MySQL performance and blogged it too.
As you can see in the below graph there is some discontinuity at 7K connection, this clearly shows that we are hitting a resource limit.
We started to debug the issue. At first place we had checked the mysql error log, we could see the below error.
2019-06-06 17:48:42 14704 [ERROR] Error log throttle: 19866 'Can't create thread to handle new connection' error(s) suppressed 2019-06-06 17:48:42 14704 [ERROR] Can't create thread to handle request (errno= 11) 2019-06-06 17:49:42 14704 [ERROR] Error log throttle: 17886 'Can't create thread to handle new connection' error(s) suppressed 2019-06-06 17:49:42 14704 [ERROR] Can't create thread to handle request (errno= 11)
The “errno=11” gives us the clue that its a system (kernel) related issue at the first sight.
# perror 11 OS error code 11: Resource temporarily unavailable
The task limit hit can be observed in Systemctl status for MySQL service.
# systemctl status mysql ● mysql.service - LSB: Start and stop the mysql (Percona Server) daemon Loaded: loaded (/etc/init.d/mysql; generated; vendor preset: enabled) Active: active (running) since Wed 2019-05-22 15:31:04 IST; 2 weeks 1 days ago Docs: man:systemd-sysv-generator(8) Process: 38489 ExecStop=/etc/init.d/mysql stop (code=exited, status=0/SUCCESS) Process: 38515 ExecStart=/etc/init.d/mysql start (code=exited, status=0/SUCCESS) Tasks: 6758 (limit: 6758) CGroup: /system.slice/mysql.service ├─14704 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log- └─38542 /bin/sh /usr/bin/mysqld_safe
What is TaskMax?
Below is the definition from man page:
Specify the maximum number of tasks that may be created in the unit. This ensures that the number of tasks accounted for the unit stays below a specific limit. This either takes an absolute number of tasks or a percentage value that is taken relative to the configured maximum number of tasks on the system. If assigned the special value “infinity”, no tasks limit is applied.
The system default for this setting may be controlled with DefaultTasksMax=N
To reproduce the issue we had restarted the DB servers, in order to flush the connections. As expected we had the same limitation at the same level of connections(6.5-7k)
After a few analysis we found that this limitation was imposed by “systemd”, with respect to its release version “v228”, the task limit for all services would be limited to a value of “512”
We checked our systemd version as below and found the version to be greater than V228
/var/lib/mysql# systemd --version systemd 232 +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN
Now let’s proceed to see how to increase this limit, since this would be a blocker for high concurrent environment.
The Task limit can be increased both at the system level as well as at the service level.
Increasing the limit at system level, affects all the process running.
Let us view the Task limit on process at OS level
# systemctl show --property=DefaultTasksMax DefaultTasksMax=6758
Increasing the limit at the OS level (systemd) applies to all the process running, setting at system level is online, does not require a restart and applied immediately
Edit the systemd conf file as below, with the required value of Taskmax
#vi /etc/systemd/system.conf DefaultTasksMax=10000
Then proceed to reload the system config as below
Now you can see the increased limit with the process of mysql as below
# systemctl show mysql | grep -i task TasksCurrent=25 TasksAccounting=yes TasksMax=10000
The downside of setting in the system-level is that, it might cause a race situation with other multi-threaded process if its on the same machine as the DB. If its a dedicated DB machine this would be the quickest method to fix the issue online.
Now lets see to set the limit at the process level, edit the service script of a process say mysql.service and add the below lines under the service tag as below
# systemctl edit mysql (or)
# vi /etc/systemd/system/mysql.service.d/override.conf [Service] TasksMax=10000 # systemctl daemon-reload (to reload)
Note by increasing the limit at process-level(service) needs a restart/reload of service.
General Guide lines:
- Ensure needed OS limits are set in MySQL system. Do not trust Kernel defaults, they are prone to change.
- Compute your max_connections, based on the no.of.connections per app boxes and also set the TaskMax accordingly.
- Set the time-out values accordingly in DB to close inactive client connections.
- If you don’t have a control over the connections made by application, I would recommend to have proxysql which does connection pooling & multiplexing.