Optimization and Problem Solving of Pandora FMS
Pandora FMS server can monitor about 2000 devices (between 5 and 80 thousands modules, depending on available hardware); but this also requires fine-tuning the database configuration.
This article also explains some techniques to detect and solve problems of your Pandora FMS installation.
Percona optimization
All tests and validations are performed with Percona Server for MySQL® 8 (recommended option). Due to the similarities between Percona Server for MySQL 8 and MySQL® 8 there is great compatibility between both solutions. For the practical purposes of this topic the term MySQL is used, always bearing in mind that tests are performed on Percona Server for MySQL.
To learn more about “Data Backup and Recovery in Pandora FMS”, follow this link.
General Advice
- Unless otherwise specified, this entire topic refers to MySQL version 8 .
- See also “Upgrading from MySQL 5.7 to MySQL 8”.
To work with tables larger than 2 GB, it is necessary to follow some guidelines:
- MySQL recommends using a 64-bit system. 32-bit systems may have serious problems from year 2038 onwards.
- The more RAM and the more CPU, the better the performance. In our experience, RAM is more important than CPU. The minimum for a system will be 4 GB. A good choice for a large system is 16 GB. Remember that more RAM can speed up key updates by keeping the most used key pages in RAM.
- It is a good idea to be able to remove the system in case of failure. For systems where the database is on another dedicated server for the database, use Gigabit Ethernet, preferably with fiber optics rather than copper. Latency is as important as performance.
- Disk optimization is very important for very large databases: databases and tables will have to be split on different disks. In MySQL you may use symbolic links for this. Use different disks for the system, the database and, if necessary, binary replication logs.
The use of SSD disks is recommended due to their speed and improved system latency.
- If you start the client and MySQL server are in the same machine, use sockets instead of TCP/IP connexions when connecting with MySQL (this could result in an improvement of a 7.5%). Do it without specifying the host name or the localhost when connecting with MySQL. Disable the start of the binary session and the replication if it only launches a MySQL host server.
- Pandora FMS works on MySQL and it is recommended to use the modified version of MySQL (Percona Server for MySQL), which offers better performance. The plugins programmed are for Percona®.
Please note that the following items greatly affect performance:
- Only use binary logs if you use a MySQL configuration with replication.
- Do not use logs tracing queries or slow query logs. This is only recommended in specific cases.
Disable binary replication
If you configured a Pandora FMS HA system, the binary replication is necessary. This recommendation is only valid if you have a single Pandora FMS server.
It is enabled by default on most GNU/Linux distros. To disable it, edit the my.cnf
file, usually in /etc/my.cnf
and comment the following lines:
# log-bin=mysql-bin # binlog_format=mixed
Comment both lines, and then restart MySQL Server.
Disk IO Performance
To learn more about “Data Backup and Recovery in Pandora FMS”, go to this link.
There are three very important configuration tokens, directly related to disk IO, and should be considered, because improper IO access is usually the most important bottleneck in MySQL.
innodb_log_file_size
innodb_log_file_size = 64M
This value is set by default, which can be higher (even 512M) without any risk, except for recovery in case of any problem or higher disk occupation. The default value of MySQL is 5M
, which is very low for production environments with high transaction volume. To change this value with an already running system:
- First make a complete DUMP in the databases.
- Delete the Innodb binary index files (usually in
/var/lib/mysql/ib*
). - Change file
my.cnf
with the chosen value. - Restart MySQL.
- Load the SQL DUMP.
Since the process is the same as the one to activate the innodb_file_per_table
token (described below), it is recommended to do the whole process simultaneously.
innodb_io_capacity
innodb_io_capacity = 300
This parameter has value 200 by default, but you have to know the IOPS of the system disk. You may find out exactly by looking for IOPS and the exact hard disk model, where the recommended values are: 7500RPM → 100 IOPS, 15000 RPM → 190 IOPS, SSD → 1500 IOPS. More information in this link.
innodb_file_per_table
Use a tablespace for each table:
In Percona, it is possible to store each InnoDB table and its index in its own file. This feature is called “multiple tablespaces” because each table has its own table space.
The use of multiple tablespaces can be useful for users that want to move specific tables to separate physical disks or the ones who want to restore table backups without interrupting the use of the rest of the InnoDB tables.
Multiple tablespaces can be enabled by adding this line to the mysqld
section of the my.cnf
file:
[mysqld] innodb_file_per_table
After restarting the server, InnoDB will store each new created table in its own name_table.ibd
file in the database directory to which the table belongs to. This is similar to what the MyISAM store motor does, but MyISAM divides the table in a tbl_name.MYD
data file and a tbl_name.MYI
index file.
For InnoDB, data and index are kept together in the .ibd
file. The tbl_name.frm
file should be created as usual.
innodb_file_per_table parameter affects only table creation. If you start the server with this option, then the new tables will be created using .ibd
files, but you could still have access to the existing tables in the shared tablespace. If you remove the option, then the new tables will be created in the shared space, but it will be still possible to have access to the tables created in multiple tablespaces
Avoiding Disk Flush in Every Transaction
MySQL establishes autocommit = 1
for each connection by default. This is not bad for MyISAM, since what one person writes in the disk is not guaranteed, but for InnoDB, it means that any insert / update / delete in an InnoDB table will be registered on the disk (flush).
So, would it be bad if it always writes on the disk? Not at all. It ensures that when there is any compromising event, the data will be there when the database is restored after an incident. The problem is that the DB performance is limited by the physical speed of the disk.
Given that the disk has to write the data in a disk before the writing is confirmed, this will take some time. Even when considering a searching average time of 9 ms for disk writing, it is limited to approximately 67 commits per second, which is very slow. And while the disk is busy trying for the sector to be written, it cannot read.
InnoDB can avoid some of these limitations by associating some writing together, but, even with this, this restriction still exists. You may prevent if from writing at the end of each transaction, ensuring that it uses an “automatic” writing system, which writes approximately every second. In case of failure, the data from the last second could be lost, but this is something more bearable considering that it achieves greater efficiency. To do it, use the following configuration token innodb_flush_log_at_trx_commit = 0
. It has this value in the configuration by default.
Bigger KeyBuffer size
Depending on the system total RAM, it is a very important global parameter that speeds up DELETE and INSERT instructions.
key_buffer_size = 4M
This is the default value in the configuration.
Other important buffers
There are several buffers that are empty by default in some distributions. Modifying these parameters can improve performance significantly compared to the default one. It is important to make sure that these tokens exist in MySQL configuration file.
key_buffer_size=4M read_buffer_size=128K read_rnd_buffer_size=128K sort_buffer_size=1M join_buffer_size=4M
For MySQL version 8, and later versions, MySQL development team has withdrawn support for query cache, used in previous versions of Pandora FMS; for more information, please visit the following web link:
https://dev.mysql.com/blog-archive/mysql-8-0-retiring-support-for-the-query-cache/ .
Improving InnoDB Concurrency
There is a parameter that may affect MySQL server performance with Pandora FMS. This parameter is innodb_thread_concurrency
. This parameter is used to specify how many “concurrent threads” MySQL can execute.
This is an advanced parameter and should only be modified manually if performance tuning is required on high concurrency systems.
A wrong setting of this parameter may cause it to run slower than default, so it is especially important to pay attention to several aspects:
- MySQL version: In different MySQL versions this parameter behaves very differently.
- Number of actual (physical) processors: In this regard, you may access the official MySQL documentation.
The recommended value is the number of (physical) CPUs multiplied by 2 plus the number of disks where InnoDB is located.
The value of innodb_thread_concurrency
changed in several MySQL versions, currently the default value is 0
. A value of 0
means “open as many threads as possible”. Therefore, if in doubt, it can be used:
innodb_thread_concurrency = 0
MySQL Fragmentation
Like filesystems, databases also will fragment themselves, slowing the whole system down. In a high performance system such as Pandora FMS, it is key for the database state to not affect system performance. In overloaded systems, the database could block and force the monitoring system to fall down.
Setting up MySQL server could make Pandora FMS become faster, so if you have performance problems, the reason might be a problem in MySQL Setup or problems related with the database.
Check my.cnf file
First check my.cnf
file and its basic configuration for MySQL. This configuration file is written in INI format and its location can be determined with the following command:
mysqld --help --verbose | more
my.cnf
setup should be similar to this one (4 GB RAM Server and using an average server hardware). Make sure that you have all these parameters correctly inside section [mysqld]
:
[mysqld] datadir=/var/lib/mysql socket=/var/lib/mysql/mysql.sock user=mysql character-set-server=utf8mb4 skip-character-set-client-handshake max_allowed_packet = 64M innodb_buffer_pool_size = 800M innodb_lock_wait_timeout = 90 innodb_file_per_table innodb_flush_log_at_trx_commit = 0 innodb_flush_method = O_DIRECT innodb_log_file_size = 64M innodb_log_buffer_size = 16M innodb_io_capacity = 100 thread_cache_size = 8 thread_stack = 256K max_connections = 100 key_buffer_size=4M read_buffer_size=128K read_rnd_buffer_size=128K sort_buffer_size=1M join_buffer_size=4M sql_mode="" [mysqld_safe] log-error=/var/log/mysqld.log pid-file=/var/run/mysqld/mysqld.pid
If you are using MySQL 8 and do not have an HA environment, disable the binary logs with the following command in section [mysqld]
:
skip-log-bin
If there is any change in my.cnf
file, restart MySQL service.
- Check service status with
systemctl status mysqld.service
. - Take a look at the end of the
/var/log/mysqld.log
for any errors. - For more information, check the following link at MySQL website.
Restoring databases
To learn more about “Server Management and Administration”, follow this link.
When making certain modifications to my.cnf
(for example, adding the innodb_file_per_table
parameter), it is likely that the database will not work when restarting the service. If you get the following error, you should restore the previous configuration (you should use root user credentials or root user) and back up the database:
InnoDB: Error: log file ./ib_logfile0 is of different size 0 5242880 bytes InnoDB: than specified in the .cnf file 0 67108864 bytes!
1. Back up the database:
mysqldump -uroot -p pandora > /home/pandora/pandora.sql
2. Stop MySQL server and move the data to a backup folder:
systemctl stop mysql mv /var/lib/mysql /var/lib/mysql.bak
3. Create a new folder for MySQL data (the relevant permissions will be assigned in step five):
mkdir /var/lib/mysql
4. Initialize MySQL server by specifying the target folder in the parameter --datadir
. This process will generate a temporary password which must be noted down (it will be displayed by standard output or stored in /var/lib/mysqld.log
):
mysqld --initialize --datadir /var/lib/mysql
5. Assign the appropriate permissions to the new folder:
chown -R mysql:mysql /var/lib/mysql chcon -R system_u:object_r:mysqld_db_t:s0 /var/lib/mysql
6. Start MySQL service and log in with MySQL client, using the password from step four:
systemctl start mysql mysql -uroot -p
MySQL/Percona systems often do not load correctly the configuration parameters of the my.cnf
file, usually because these values have been written outside [mysqld]
section.
7. Change the user password root to the one of your choosing (here we use Pandor4!
):
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Pandor4!';
8. Exit MySQL client and check that you can log in again using the new password.
9. Access MySQL client again and create the database:
CREATE DATABASE pandora; USE pandora;
10. Load the backup from the database:
SOURCE /home/pandora/pandora.sql
11. Create the access users for Pandora FMS using the same credentials as in the previous installation (here Pandor4!
is used) and then give them permissions over the database:
CREATE USER 'pandora'@'localhost' IDENTIFIED WITH mysql_native_password BY 'Pandor4!'; CREATE USER 'pandora'@'127.0.0.1' IDENTIFIED WITH mysql_native_password BY 'Pandor4!'; GRANT ALL PRIVILEGES ON pandora.* TO 'pandora'@'localhost'; GRANT ALL PRIVILEGES ON pandora.* TO 'pandora'@'127.0.0.1';
12. After having configured my.cnf
file and restarted MySQL service, you should check that these changes have been correctly applied. To that end, check the variables one by one:
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_io_capacity';
SHOW VARIABLES LIKE 'innodb_file_per_table';
Or with a general inquiry such as:
SHOW VARIABLES LIKE "innodb%";
Once you check that Pandora FMS (both Web Console and server) can connect correctly to the database and everything works properly, you may delete mysql.bak
directory:
rm -rf /var/lib/mysql.bak
Check if isolated datafile for each table is ACTIVE
ls -lah /var/lib/mysql/pandora/*.ibd | wc -l
There should be more than 100 files there (depending on the version of Pandora FMS), each .ibd
is the data file of each table, when innodb_file_per_table
parameter is enabled in file my.cnf
. If you do not have any of these files, .ibd
means it uses a single file to store all information. The previous one means table fragmentation is also present on all tables and performance will worsen each week.
If you have your database running in a single database, first you will need to recreate the database after correctly configuring file my.cnf
and restarting MySQL.
Optimizing Specific tables
Another less “drastic” solution to solve the fragmentation issue is the use of MySQL OPTIMIZE tool to optimize certain Pandora FMS tables. To benefit from it, run right away from MySQL the following:
OPTIMIZE TABLE tagente_datos; OPTIMIZE TABLE tagente; OPTIMIZE TABLE tagente_datos_string; OPTIMIZE TABLE tagent_access; OPTIMIZE TABLE tagente_modulo; OPTIMIZE TABLE tagente_estado;
This prevents performance from degrading over time as the system works.
In large environments, the OPTIMIZE option could be “blocked”. In this case, the best option is to rebuild the DB.
After doing these operations, run:
FLUSH TABLES;
From MySQL manual:
“For InnoDB tables, OPTIMIZE TABLE is mapped to ALTER TABLE, which rebuilds the table to update index statistics and free unused space in the clustered index”.
MySQL special tokens
There are some very “special” tokens in MySQL, which can improve or worsen performance:
innodb_flush_method
:
innodb_flush_method = O_DIRECT
This important parameter has an effect on how information is written on the disk.
innodb_lock_wait_timeout
:
innodb_lock_wait_timeout = 90
This helps when there is a bottleneck, so that MySQL does not fail and stop. If it lasts more than 90 lock, there is a real problem.
Check fragmentation table by table
Using MySQL CLI, run this query:
SELECT ENGINE, TABLE_NAME,Round( DATA_LENGTH/1024/1024) AS data_length , round(INDEX_LENGTH/1024/1024) AS index_length, round(DATA_FREE/ 1024/1024) AS data_free, (data_free/(index_length+data_length)) AS frag_ratio FROM information_schema.tables WHERE TABLE_TYPE = 'BASE TABLE' AND DATA_FREE> 0 ORDER BY frag_ratio DESC;
You should get only the tables with some fragmentation index, for example:
+--------+-------------------------+-------------+--------------+-----------+------------+ | ENGINE | TABLE_NAME | data_length | index_length | data_free | frag_ratio | +--------+-------------------------+-------------+--------------+-----------+------------+ | InnoDB | tserver_export_data | 0 | 0 | 5 | 320.0000 | | InnoDB | tagent_module_inventory | 0 | 0 | 6 | 25.6000 | | InnoDB | tagente_datos_inventory | 4 | 0 | 40 | 9.8842 | | InnoDB | tsesion_extended | 1 | 0 | 4 | 3.3684 | | InnoDB | tagent_access | 2 | 7 | 27 | 2.9845 | | InnoDB | tpending_mail | 2 | 0 | 4 | 2.6392 | | InnoDB | tagente_modulo | 2 | 0 | 4 | 2.1333 | | InnoDB | tgis_data_history | 24 | 11 | 67 | 1.9075 | | InnoDB | tsesion | 2 | 0 | 4 | 1.7778 | | InnoDB | tupdate | 3 | 0 | 3 | 1.1852 | | InnoDB | tagente_datos | 186 | 194 | 399 | 1.0525 | | InnoDB | tagente_datos_string | 15 | 9 | 24 | 0.9981 | | InnoDB | tevento | 149 | 62 | 46 | 0.2183 | | InnoDB | tagente_datos | 2810 | 2509 | 65 | 0.0122 | | InnoDB | tagente_datos_string | 317 | 122 | 5 | 0.0114 | +--------+-------------------------+-------------+--------------+-----------+------------+
This query works only on tables with more than 10 % of fragmentation. Too big tables (like tagent_data
) can take a lot of time to get optimized if they are very fragmented. This may affect the production system.
Caution is advised when optimizing such large tables. A normal environment could be optimized once a year and larger environments every six months.
To optimize the tagent_module_inventory
table (in this case the database is called pandora
):
OPTIMIZE TABLE pandora.tagent_module_inventory;
A warning message will appear:
"Table does not support optimize, doing recreate + analyze instead".
If you check again, you should see fragmentation is gone:
+--------+-------------------------+-------------+--------------+-----------+------------+ | ENGINE | TABLE_NAME | data_length | index_length | data_free | frag_ratio | +--------+-------------------------+-------------+--------------+-----------+------------+ | InnoDB | tserver_export_data | 0 | 0 | 5 | 320.0000 | | InnoDB | tagente_datos_inventory | 4 | 0 | 40 | 9.8842 | | InnoDB | tsesion_extended | 1 | 0 | 4 | 3.3684 | | InnoDB | tagent_access | 2 | 7 | 27 | 2.9845 | | InnoDB | tpending_mail | 2 | 0 | 4 | 2.6392 | | InnoDB | tagente_modulo | 2 | 0 | 4 | 2.1333 | | InnoDB | tgis_data_history | 24 | 11 | 67 | 1.9075 | | InnoDB | tsesion | 2 | 0 | 4 | 1.7778 | | InnoDB | tupdate | 3 | 0 | 3 | 1.1852 | | InnoDB | tagente_datos | 186 | 194 | 399 | 1.0525 | | InnoDB | tagente_datos_string | 15 | 9 | 24 | 0.9981 | | InnoDB | tevento | 149 | 62 | 46 | 0.2183 | | InnoDB | tagente_datos | 2810 | 2509 | 65 | 0.0122 | | InnoDB | tagente_datos_string | 317 | 122 | 5 | 0.0114 | +--------+-------------------------+-------------+--------------+-----------+------------+
To be able to perform this optimization, there must be enough space on the hard disk to perform the operation. Otherwise an error will appear and the operation will not be performed.
System Load
This is more general, but you need to make sure the system IO is not a bottleneck (disk). Run the following command to collect the system information:
vmstat 1 10
Look at the last columns (CPU WA), a value higher than 10 means that there is a disk I/O problem that should be solved.
Having high CPU-US is normal, but CPU-SY should not be over 10~15.
Usually, SWAP-SI and SWAP-SO should have value zero, if not, it means the system is using SWAP memory, which degrades performance. Increase RAM or decrease RAM usage in your applications (Pandora FMS server threads, Buffers in MySQL, etc.).
Sample output of a “normal” system:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 0 0 46248 78664 154644 576800 0 0 2 147 0 9 7 10 83 0 0 0 0 46248 78656 154644 576808 0 0 0 0 49 37 0 0 100 0 0 2 0 46248 78904 154648 576740 0 0 0 184 728 2484 63 6 31 0 0 0 0 46248 79028 154648 576736 0 0 16 616 363 979 21 0 79 0 0 1 0 46248 79028 154648 576736 0 0 0 20 35 37 0 1 98 1 0 0 0 46248 79028 154648 576736 0 0 0 0 28 22 0 0 100 0 0 1 0 46248 79028 154648 576736 0 0 0 3852 141 303 0 0 98 2 0 2 0 46248 78904 154660 576660 0 0 0 188 642 2354 56 4 40 0 0 1 0 46248 78904 154660 576680 0 0 0 88 190 634 13 0 86 1 0 1 0 46248 78904 154660 576680 0 0 0 16 35 40 0 0 100 0 0 1 0 46248 78904 154660 576680 0 0 0 0 26 21 0 0 100 0 0 0 0 46248 78904 154660 576680 0 0 0 0 27 27 0 0 100 0 0 1 0 46248 78904 154724 576616 0 0 112 192 608 2214 52 4 44 0 0 0 0 46248 78904 154724 576616 0 0 0 76 236 771 16 0 84 0 0 0 0 46248 78904 154724 576616 0 0 0 20 38 38 0 0 100 0 0 0 0 46248 78904 154724 576616 0 0 0 0 31 21 0 0 100 0 0 0 0 46248 78904 154740 576608 0 0 0 3192 187 322 1 0 96 3 0 1 0 46248 79028 154756 576544 0 0 16 192 632 2087 53 5 42 0 0 0 0 46248 79028 154760 576568 0 0 0 56 255 927 19 2 79 0 0 0 0 46248 79028 154768 576564 0 0 0 20 33 44 0 0 100 0 0
MySQL Table Partitioning
To use MySQL table partitioning, you should also use the multiples tablespaces system described above (innodb_file_per_table
).
MySQL supports table partitioning, which allows very large tables to be distributed into smaller chunks, such as logical subdivisions (see MySQL manual for details).
If you have large amounts of data in Pandora FMS database (main and history) and it is estimated that many Console operations that refer to that data (for example drawing graph) are slow, then it will improve its performance using table partitioning.
Automatic partitioning
Pandora FMS automatically performs a monthly partitioning in the history database if it is configured from the Web Console, section History database, of the general configuration. For more details check this link.
Manual partitioning
After having installed Pandora FMS and enabled its history database, this will be the one that holds the largest amount of data and it is the recommended one to perform a table partitioning. Precisely the tables indicated for that are tagent_data
, tagent_data_string
, tagente_datos_inc
and tevento
. The manual practical process for the table tagent_data
is described below (see also “Automatic partitioning”).
You should check at first that the directory /var/lib/mysql/pandora_history/*.ibd
has many files (one per table). If this is not the case, you will need to dump the database, change the configuration of the my.cnf
file, restart MySQL, delete the current database and recreate it from the dump.
Once you ensure that innodb_file_per_table
is enabled, separate the two main databases into different partitions.
- This operation will need enough disk space to be completed. You will need to check how big the
tagent_data.ibd
file is. If for example it occupies 10 gigabytes, you will need 15 GB of free space to start the operation. - This operation may take a long time, depending on the size of the table. For example, it would take an hour and a half to split a table with approximately 7500 modules data for 100 days (more than 50,000,000 rows).
This is an example for partitioning all of 2023 so far and for future months. To start the process, you will need to execute the following query in MySQL CLI:
ALTER TABLE tagente_datos PARTITION BY RANGE (utimestamp) ( PARTITION Jan23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-02-01 00:00:00')), PARTITION Feb23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-03-01 00:00:00')), PARTITION Mar23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-04-01 00:00:00')), PARTITION Apr23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-05-01 00:00:00')), PARTITION May23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-06-01 00:00:00')), PARTITION Jun23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-07-01 00:00:00')), PARTITION Jul23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-08-01 00:00:00')), PARTITION Aug23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-09-01 00:00:00')), PARTITION Sep23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-10-01 00:00:00')), PARTITION Oct23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-11-01 00:00:00')), PARTITION Nov23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-12-01 00:00:00')), PARTITION Dec23 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 00:00:00')), PARTITION pActual VALUES LESS THAN (MAXVALUE) );
Then the following query would have to be run every month to reorganize the partitioning:
ALTER TABLE tagente_datos REORGANIZE PARTITION pActual INTO ( PARTITION Jan24 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01 00:00:00')), PARTITION pActual VALUES LESS THAN MAXVALUE);
Changing “Jan24” to the month you are in.
Remember again that this operation could take hours, depending on how big the tagent_data
table is. You may check the process by watching the size of the partitioning files by running:
[root@histdb pandora_history]# ls -lah | grep "#sql" -rw-rw---- 1 mysql mysql 424M feb 24 05:58 #sql-76b4_3f7c#P#Jan23.ibd -rw-rw---- 1 mysql mysql 420M feb 24 05:51 #sql-76b4_3f7c#P#Feb23.ibd -rw-rw---- 1 mysql mysql 128K feb 24 05:40 #sql-76b4_3f7c#P#Mar23.ibd -rw-rw---- 1 mysql mysql 840M feb 24 05:44 #sql-76b4_3f7c#P#Apr23.ibd -rw-rw---- 1 mysql mysql 440M feb 24 05:47 #sql-76b4_3f7c#P#May23.ibd -rw-rw---- 1 mysql mysql 10M feb 24 05:42 #sql-76b4_3f7c#P#Jun23.ibd -rw-rw---- 1 mysql mysql 404M feb 24 05:56 #sql-76b4_3f7c#P#Jul23.ibd -rw-rw---- 1 mysql mysql 436M feb 24 05:54 #sql-76b4_3f7c#P#Aug23.ibd -rw-rw---- 1 mysql mysql 400M feb 24 05:49 #sql-76b4_3f7c#P#Sep23.ibd -rw-rw---- 1 mysql mysql 408M feb 24 05:52 #sql-76b4_3f7c#P#Oct23.ibd -rw-rw---- 1 mysql mysql 72M feb 24 06:03 #sql-76b4_3f7c#P#Nov23.ibd -rw-rw---- 1 mysql mysql 404M feb 24 06:03 #sql-76b4_3f7c#P#Dec23.ibd -rw-rw---- 1 mysql mysql 416M feb 24 06:00 #sql-76b4_3f7c#P#jan23.ibd
DB Rebuilding
To find out more abour Pandora FMS backup and data recovery, go to this link.
Partial Rebuilding
MySQL database management system, same as other SQL engines, such as Oracle®, is degraded with time due to causes such as data fragmentation produced by deleting and continuous insertion in large tables. In large environments, with a lot traffic volume, there is a very easy way to improve performance and prevent performance from degrading. This is rebuilding the DB from time to time.
To that end, schedule a service stop, which could last approximately 1 hour.
In this service stop, stop the Pandora FMS WEB console and the server too. Attention: you may leave the Tentacle server, so that it can still receive data and these will be processed as soon as the server works again.
Once they have been stopped, dump the DB (Export); in this example, the database is called pandora3
and the user must be root:
mysqldump -u root -p pandora3> /tmp/pandora3.sql
Delete the DB:
mysql -u root -p
DROP DATABASE pandora3; Query OK, 87 ROWS affected (1 MIN 34.37 sec)
Create the DB and import the previous data export from the dump you did at first:
CREATE DATABASE pandora3; USE pandora3; SOURCE /tmp/pandora3.sql
This may take a few seconds or several minutes, depending on the size of the database and the resources available on the machine.
It is possible to automatize this process, but, because it is very delicate, the best option is to make it manually.
Total Rebuilding
This section affects only InnoDB databases. Pandora FMS is built on Innodb databases.
Unfortunately, MySQL is degraded with time, and this affects the global system performance.There is no other solution that does not involve rebuilding all the database outlines from scratch, rebuilding the data binary file that MySQL uses to store all the information and the files used to rebuild the transactions.
If you take a look at the /var/lib/mysql
directory, you may see that there are three files, that have always the same name, and that are, depending on the severity of the case, huge. In this example:
-rw-rw---- 1 mysql mysql 4.8G 2012-01-12 14:00 ibdata1 -rw-rw---- 1 mysql mysql 5.0M 2012-01-12 14:00 ib_logfile0 -rw-rw---- 1 mysql mysql 5.0M 2012-01-12 14:00 ib_logfile1
The ibdata1
file is the one that stores all the system Innobd data. In a very fragmented system that has not been “rebuilt” or “installed” for a long time, this system will be big but little efficient. The innodb_file_per_table
parameter, that was mentioned before, regulates part of this performance.
Similarly, each database has in the /var/lib/mysql
directory, a directory to define its structure. Delete them too.
The process is quite easy:
- Dump (via mysqldump) all the schema to the disk:
mysqldump -u root -p -A> alldata.sql
- Stop MySQL.
- Delete
ibdata1
,ib_logfile0
,ib_logfile1
and the InnoDB database directories - Start MySQL.
- Create
pandora
database again (create database pandora;
) - Import the backup file (
alldata.sql
)
mysql -u root -p
CREATE DATABASE pandora; USE pandora; SOURCE alldata.sql;
The system should work faster now.
Optional Indexes
There are some situations when you can optimize MySQL performance, but giving up other system resources.
This index optimizes speed on graph rendering (a lot), but it uses more disk storage space, and could entail a slightly decrease on INSERT/DELETE operation, due to the Index overhead:
ALTER TABLE `pandora`.`tagente_datos` ADD INDEX `id_agente_modulo_utimestamp` (`id_agente_modulo`,`utimestamp`);
At the moment, in the heaviest tables of Pandora FMS in MySQL, this optimization is there by default. It is advisable to have expert help to optimize MySQL tables.
Slow queries study
In some systems, depending on the type of information you have, you may find some “slow queries” that make the system work worse. You may enable logs of this type of queries over a short period of time (since it harms the system performance) in order to consider trying to optimize queries to tables with indexes. To enable these settings, do the following:
- Edit
my.cnf
and add the following lines:
slow_query_log=1 long_query_time=2 slow_query_log_file=/var/log/mysql_slow.log
- To be able to use it and set the admin rules:
touch /var/log/mysql_slow.log chown mysql:mysql /var/log/mysql_slow.log chmod 640 /var/log/mysql_slow.log
- Restart MySQL.
- When finishing analyzing which ones are the slow queries, remember to reset the file
my.cnf
commenting the aggregated lines and restarting again MySQL service.
External references
Measuring Pandora FMS for High Capacity
This section describes different methods to configure Pandora FMS in a high capacity environment. It also describes different tools to make load tests, which are useful to adjust the environment to the highest possible process capacity.
Pandora FMS has been configured to support a load of around 2500 agents in systems where database, console and server are in the same machine. The maximum recommended number is around 2500 agents (about 60,000 modules) per system, but this number varies greatly depending on whether they are XML agents, remote modules, with high or low intervals, or with systems with high capacity or low memory.
All factors greatly alter the number of agents that a system can manage efficiently. In laboratory tests, 10,000 agents have been executed in a single server with basic hardware, but strongly optimized.
Example of High Capacity Server Configuration
Assuming a RHEL 8 machine with 16 GB of RAM and 8 CPU to be optimized for the maximum processing capacity of the Dataserver (XML).
my.cnf
Only the most important parameters are shown.
[mysqld] datadir=/var/lib/mysql socket=/var/lib/mysql/mysql.sock user=mysql character-set-server=utf8mb4 skip-character-set-client-handshake # Disabling symbolic-links is recommended to prevent assorted security risks symbolic-links=0 # MySQL optimizations for Pandora FMS # Please check the documentation in http://pandorafms.com for better results max_allowed_packet = 64M innodb_buffer_pool_size = 6400M innodb_lock_wait_timeout = 90 innodb_file_per_table innodb_flush_log_at_trx_commit = 0 innodb_flush_method = O_DIRECT innodb_log_file_size = 64M innodb_log_buffer_size = 16M innodb_io_capacity = 300 thread_cache_size = 8 thread_stack = 256K max_connections = 100 key_buffer_size=4M read_buffer_size=128K read_rnd_buffer_size=128K sort_buffer_size=1M join_buffer_size=4M sql_mode="" [mysqld_safe] log-error=/var/log/mysqld.log pid-file=/var/run/mysqld/mysqld.pid
pandora_server.conf
Only the most relevant parameters are shown.
verbose 3 server_threshold 5 xxxxserver_threads 8 max_queue_files 5000
Aspects to take into account:
- The number established in the
verbose
parameter refers to the amount of information that is written in the logs, being advisable not to exceed 3. The higher the number, the lower the performance of Pandora FMS due to the large amount of information to write in the logs.
- A high value (15) of the
server_threshold
parameter causes the DB to suffer less, while the increase in the maximum number of files processed causes the server to seek files and fill the buffers every time. These two elements of the configuration are closely linked. In the case of optimizing the network server, it may be interesting to lower theserver_threshold
to 5 or 10.
- The very high number of threads (more than 5) set in
xxxxserver_threads
' only benefits processes with long I/O waits, such as the network server or plugin server. In the case of the Dataserver, which is in process all the time, setting too many threads can even hurt performance. On systems with a slow DB: try different combinations between 1 and 10; with faster disks and multi-core CPUs, it could be increased. In the case of optimizing the system for the Networkserver, the number can be much higher, between 10 and 30.
Capacity analysis Tools (Capacity)
Pandora FMS has several tools that may help you to measure properly its hardware and software for the amount of data it expects to obtain. One of them is useful to “attack” directly the database with fictitious data (dbstress) and the other generates fictitious XML files (xml_stress).
Pandora FMS XML Stress
This is an small script that generates XML data files like the ones sent by Pandora FMS agents. By default it is placed on:
/usr/share/pandora_server/util/pandora_xml_stress.pl
The scripts read agent names from a text file and generate XML data files for each agent according to a configuration file, where modules are defined as templates.
Modules are filled with random data. An initial value and the probability of the module data changing may be specified.
Run the script like this:
./pandora_xml_stress.pl <configuration file>
Sample < configuration file >
called pandora_xml_stress.conf
:
# Maximum number of threads, by default 10. max_threads 10 # File containing a list of agent names (one per line). agent_file agent_names.txt # Directory where XML data files will be placed, by default /tmp. temporal /var/spool/pandora/data_in # Pandora FMS XML Stress log file, logs to stdout by default. log_file pandora_xml_stress.log # XML version, by default 1.0. xml_version 1.0 # XML encoding, by default ISO-8859-1. encoding ISO-8859-1 # Operating system (shared by all agents), by default Linux. os_name Linux # Operating system version (shared by all agents), by default 2.6. os_version 2.6 # Agent interval, by default 300. agent_interval 300 # Data file generation start date, by default now. time_from 2009-06-01 00:00:00 # Data file generation end date, by default now. time_to 2009-06-05 00:00:00 # Delay after generating the first data file for each agent to avoid # race conditions when auto-creating the agent, by default 2. startup_delay 2 # Address of the Tentacle server where XML files will be sent (optional). # server_ip 192.168.50.1 # Port of the Tentacle server, by default 41121. # server_port 41121 # Module definitions. Similar to pandora_agent.conf. module_begin module_name Module 1 module_type generic_data module_description A long description. module_max 100 module_min 10 module_exec type=RANDOM;variation=60;min=20;max=80 module_end module_begin module_name Module 2 module_type generic_data module_description A long description. module_max 80 module_min 20 module_exec type=SCATTER;prob=1;avg=40;min=0;max=80 module_end module_begin module_name Module 3 module_type generic_data module_description A long description. module_max 80 module_min 20 module_exec type=CURVE;min=20;max=80;time_wave_length=3600;time_offset=0 module_end module_begin module_name Module 4 module_type generic_data_string module_description A long description. module_max 100 module_min 10 module_exec type=RANDOM;variation=60;min=20;max=80 module_end module_begin module_name Module_5 module_type generic_proc module_descripcion Module 3 description. # Initial data. module_data 1 module_end
Send and Receive Agent Local Configuration
If you activate in your pandora_xml_stress.conf
the get_and_send_agent_conf
configuration value to 1, you may make the test load agents work as normal agents, so that they send their configuration file and also the MD5 hash. From PFMS Web Console you may modify the remote configuration for subsequent runs of pandora_xml_stress
.
Besides this, you may configure where to store locally the .conf
files of your testing agents with the directory_confs
configuration token in pandora_xml_stress.conf
file.
Configuration File
- max_threads: Number of threads where the script will be executed. This improves the E/S.
- agent_file: Path of the name list file path, separated by a new line.
- temporal: Path of the directory where made-up XML data files are generated.
- log_file: Path of the log where it will report about its execution script.
- xml_version: Version of the XML data file (by default 1.0).
- encoding: XML data file encoding (by default ISO-8859-1).
- os_name: Name of the made-up agent Operative System (Linux by default).
- os_version: Version of the made-up agents Operative System (2.6 by default)
- agent_interval: Interval of the made-up agents in seconds (300 by default).
- time_from: Time from which made-up XML data files are generated, in format
yyyy-MM-dd HH:mm:ss
. - time_to: Time until which made-up XML data files are generated, in format
yyyy-MM-dd HH:mm:ss
. - get_and_send_agent_conf: Boolean value 0 or 1. When it is active, the made-up agents will try to download by remote configuration a more updated version of the standard configuration file of an agent. And they can be edited through Pandora FMS Web console.
- startup_delay: Time numeric value in seconds before each agent starts to generate files. It is used to avoid race conditions.
- timezone_offset: Numeric value of the time zone offset.
- timezone_offset_range: Numeric value that is useful to generate the timezone in this range randomly.
- latitude_base: Numeric value. It is the latitude geographic area to be used to define fictitious agents.
- longitude_base: Numeric value. It is the longitude geographic area to be used to define fictitious agents.
- altitude_base: Numeric value. It is the altitude geographic area to be used to define fictitious agents.
- position_radius: Numeric value. It defines the radius of the circumference in which the geographic position of the fictitious agent will be set (randomly and based on parameters latitude_base and longitude_base).
The definition of one module in the script configuration file. If remote configuration was enabled, it will also be the same. It is:
module_begin module_name <module_name> module_type <module_type_data> module_description <description> module_exec type =<xml_stress_type_generation>;<another_option>;<another_option> … module_unit <units> module_min_critical <value> module_max_critical <value> module_min_warning <value> module_max_warning <value> module_end
Each one can be configured as:
- module_attenuation <value>: The generated module value is multiplied by the specified value, usually between 0.1 and 0.9.
- module_attenuation_wdays <value> <value> … <value>: The module value is only attenuated during the given days, ranging from Sunday (0) to Saturday (6). For example, the following module simulates a 50 % drop in network traffic on Saturdays and Sundays:
module_begin module_name Network Traffic module_type generic_data module_description Incoming network traffic (Kbit/s) module_exec type=RANDOM;variation=50;min=0;max=1000000 module_unit Kbit/s module_min_critical 900000 module_attenuation 0.5 module_attenuation_wdays 0 6 module_end
- module_incremental < value >: If set to 1, the module's previous value is always added to a new value, resulting in an increasing function.
- Others: See below which options are available, depending on the execution type.
RANDOM
These have the following options:
- variation. Probability percentage of change regarding the previous value.
- min. Minimum value that the value could have.
- max. Maximum value that the value could have.
Numeric
It generates random numeric values between the range values min and max
Booleans
It generates values between 0 and 1.
String
It generates a length string between values min and max. The characters are random between A and Z and capital, lower case letters and also numeric ciphers are included.
External data source (SOURCE)
It allows to use a plain text file as a data source. Options:
- src: Source data file.
The file contains one data per line, there is no limit for lines. For example:
4 5 6 10
Both numbers and strings are allowed as values. These kinds of modules will use file data to generate module data in Pandora FMS. Data are retrieved sequentially. For example data above will be shown as follows:
4 5 6 10 4 5 6 10 4 5 6 10 4 5 6 10 4 5 6 10 4 5 6 10
SCATTER
It is only useful for numeric data, and the generated graphics are similar to the ones of a heartbeat, that is, a normal value, and from time to time a “beat”.
It has the following options:
- min. Minimum value that the value could have.
- max. Maximum value that the value could have.
- prob. Probability percentage that it generates a “beat”.
- avg. Average value that should be shown by default if there is no “beat”.
CURVE
It generates module data following a trigonometric curve. They have the following options:
- min: Minimum value that the value could have.
- max: Maximum value that the value could have.
- time_wave_length: Numeric value in seconds of the duration of the “crest” of the wave.
- time_offset: Numeric value in seconds from the starting point of the wave from time zero with module value zero (similar to the sine graph).
Note: This tool is preconfigured to look for, in all agents, “random”, “curve” or boolean name modules that use an interval between 300 seconds and 30 days.
How to measure Data server Processing Capacity
There is a small script called pandora_count.sh
that is found in the /usr/share/pandora_server/util/
directory in Pandora FMS server directory. This script is used to measure the processing rate of XML files by the data server, and it uses as reference all the files yet to be processed at /var/spool/pandora/data_in
, so to be able to use it, thousands of packages yet to be processed are needed (or they must be generated with the tool mentioned before). Once running, you may stop it pressing the keys CTRL+C.
This script simply calculates the file processing rate of the server. It is a somewhat crude measure, but it is useful to get an idea of the effectiveness of the server configuration settings.
Pandora FMS DB Stress
This is a small tool to test database performance. It could also be used to generate periodical or random data (using trigonometric functions) and fill in made-up modules.
Create an agent and allocate the modules to that agent for automatic data injection with this tool. The names should be these ones:
- random: To generate random data.
- curve: To generate a matching curve using trigonometric functions. It is useful to use the interpolating work with different intervals, etc.
- boolean: To generate random boolean data. That way, it is possible to use any name that contains the words: «random», «curve» and/or «boolean». For example:
- random_1
- curve_other
Only the “data_server” module type can be chosen.
Pandora FMS DB Stress Fine Adjustment
This tool is preconfigured in order to search, in all agents, the module names “random”, “curve” or “boolean”, that use an interval between 300 seconds and 30 days.
If you wish to modify this performance, edit pandora_dbstress
script and change some variables at the beginning of the file:
# Configure here target (AGENT_ID for Stress) my $target_module = -1; # -1 for all modules of that agent my $target_agent = -1; my $target_interval = 300; my $target_days = 30;
- The first line of the variable corresponding with
target_module
should be fixed for a fix module or-1
to process all the matching targets. - The second line of variable must match
target_agent
, for a specific agent. - The third line must match
target_interval
, defined in seconds and which represents the module predefined periodical interval. - The fourth line is
target_days
and represents the number of days in the past since the date, in the current timestamp.
Diagnostic tools in Pandora FMS
Sometimes, there are problems for which direct help from Pandora FMS support is needed. To fmake communication with the Support team easier, Pandora FMS servers have some tools for this.
Diagnostic Info
This tool is located in the Management → Admin tools → Diagnostic Info section, and is conceived to provide the most important information about Pandora FMS configuration and its database.