Optimization and Problem Solving of Pandora FMS

Pandora FMS server can monitor about 2000 devices (between 5 and 80 thousands modules, depending on available hardware); but this also requires fine-tuning the database configuration.

This article also explains some techniques to detect and solve problems of your Pandora FMS installation.

Percona optimization

All tests and validations are performed with Percona Server for MySQL® 8 (recommended option). Due to the similarities between Percona Server for MySQL 8 and MySQL® 8 there is great compatibility between both solutions. For the practical purposes of this topic the term MySQL is used, always bearing in mind that tests are performed on Percona Server for MySQL.

To learn more about “Data Backup and Recovery in Pandora FMS”, follow this link.

General Advice

To work with tables larger than 2 GB, it is necessary to follow some guidelines:

  • MySQL recommends using a 64-bit system. 32-bit systems may have serious problems from year 2038 onwards.
  • The more RAM and the more CPU, the better the performance. In our experience, RAM is more important than CPU. The minimum for a system will be 4 GB. A good choice for a large system is 16 GB. Remember that more RAM can speed up key updates by keeping the most used key pages in RAM.
  • It is a good idea to be able to remove the system in case of failure. For systems where the database is on another dedicated server for the database, use Gigabit Ethernet, preferably with fiber optics rather than copper. Latency is as important as performance.
  • Disk optimization is very important for very large databases: databases and tables will have to be split on different disks. In MySQL you may use symbolic links for this. Use different disks for the system, the database and, if necessary, binary replication logs.

The use of SSD disks is recommended due to their speed and improved system latency.

  • If you start the client and MySQL server are in the same machine, use sockets instead of TCP/IP connexions when connecting with MySQL (this could result in an improvement of a 7.5%). Do it without specifying the host name or the localhost when connecting with MySQL. Disable the start of the binary session and the replication if it only launches a MySQL host server.
  • Pandora FMS works on MySQL and it is recommended to use the modified version of MySQL (Percona Server for MySQL), which offers better performance. The plugins programmed are for Percona®.

Please note that the following items greatly affect performance:

  • Only use binary logs if you use a MySQL configuration with replication.
  • Do not use logs tracing queries or slow query logs. This is only recommended in specific cases.

Disable binary replication

If you configured a Pandora FMS HA system, the binary replication is necessary. This recommendation is only valid if you have a single Pandora FMS server.

It is enabled by default on most GNU/Linux distros. To disable it, edit the my.cnf file, usually in /etc/my.cnf and comment the following lines:

# log-bin=mysql-bin
# binlog_format=mixed

Comment both lines, and then restart MySQL Server.

Disk IO Performance

To learn more about “Data Backup and Recovery in Pandora FMS”, go to this link.

There are three very important configuration tokens, directly related to disk IO, and should be considered, because improper IO access is usually the most important bottleneck in MySQL.

innodb_log_file_size

innodb_log_file_size = 64M

This value is set by default, which can be higher (even 512M) without any risk, except for recovery in case of any problem or higher disk occupation. The default value of MySQL is 5M, which is very low for production environments with high transaction volume. To change this value with an already running system:

  1. First make a complete DUMP in the databases.
  2. Delete the Innodb binary index files (usually in /var/lib/mysql/ib*).
  3. Change file my.cnf with the chosen value.
  4. Restart MySQL.
  5. Load the SQL DUMP.

Since the process is the same as the one to activate the innodb_file_per_table token (described below), it is recommended to do the whole process simultaneously.

innodb_io_capacity

innodb_io_capacity = 300

This parameter has value 200 by default, but you have to know the IOPS of the system disk. You may find out exactly by looking for IOPS and the exact hard disk model, where the recommended values are: 7500RPM → 100 IOPS, 15000 RPM → 190 IOPS, SSD → 1500 IOPS. More information in this link.

innodb_file_per_table

Use a tablespace for each table:

In Percona, it is possible to store each InnoDB table and its index in its own file. This feature is called “multiple tablespaces” because each table has its own table space.

The use of multiple tablespaces can be useful for users that want to move specific tables to separate physical disks or the ones who want to restore table backups without interrupting the use of the rest of the InnoDB tables.

Multiple tablespaces can be enabled by adding this line to the mysqld section of the my.cnf file:

[mysqld]
innodb_file_per_table

After restarting the server, InnoDB will store each new created table in its own name_table.ibd file in the database directory to which the table belongs to. This is similar to what the MyISAM store motor does, but MyISAM divides the table in a tbl_name.MYD data file and a tbl_name.MYI index file.

For InnoDB, data and index are kept together in the .ibd file. The tbl_name.frm file should be created as usual.

innodb_file_per_table parameter affects only table creation. If you start the server with this option, then the new tables will be created using .ibd files, but you could still have access to the existing tables in the shared tablespace. If you remove the option, then the new tables will be created in the shared space, but it will be still possible to have access to the tables created in multiple tablespaces

Avoiding Disk Flush in Every Transaction

MySQL establishes autocommit = 1 for each connection by default. This is not bad for MyISAM, since what one person writes in the disk is not guaranteed, but for InnoDB, it means that any insert / update / delete in an InnoDB table will be registered on the disk (flush).

So, would it be bad if it always writes on the disk? Not at all. It ensures that when there is any compromising event, the data will be there when the database is restored after an incident. The problem is that the DB performance is limited by the physical speed of the disk.

Given that the disk has to write the data in a disk before the writing is confirmed, this will take some time. Even when considering a searching average time of 9 ms for disk writing, it is limited to approximately 67 commits per second, which is very slow. And while the disk is busy trying for the sector to be written, it cannot read.

InnoDB can avoid some of these limitations by associating some writing together, but, even with this, this restriction still exists. You may prevent if from writing at the end of each transaction, ensuring that it uses an “automatic” writing system, which writes approximately every second. In case of failure, the data from the last second could be lost, but this is something more bearable considering that it achieves greater efficiency. To do it, use the following configuration token innodb_flush_log_at_trx_commit = 0. It has this value in the configuration by default.

Bigger KeyBuffer size

Depending on the system total RAM, it is a very important global parameter that speeds up DELETE and INSERT instructions.

key_buffer_size = 4M

This is the default value in the configuration.

Other important buffers

There are several buffers that are empty by default in some distributions. Modifying these parameters can improve performance significantly compared to the default one. It is important to make sure that these tokens exist in MySQL configuration file.

key_buffer_size=4M
read_buffer_size=128K
read_rnd_buffer_size=128K
sort_buffer_size=1M
join_buffer_size=4M

For MySQL version 8, and later versions, MySQL development team has withdrawn support for query cache, used in previous versions of Pandora FMS; for more information, please visit the following web link:

https://dev.mysql.com/blog-archive/mysql-8-0-retiring-support-for-the-query-cache/ .

Improving InnoDB Concurrency

There is a parameter that may affect MySQL server performance with Pandora FMS. This parameter is innodb_thread_concurrency. This parameter is used to specify how many “concurrent threads” MySQL can execute.

This is an advanced parameter and should only be modified manually if performance tuning is required on high concurrency systems.

A wrong setting of this parameter may cause it to run slower than default, so it is especially important to pay attention to several aspects:

  • MySQL version: In different MySQL versions this parameter behaves very differently.
  • Number of actual (physical) processors: In this regard, you may access the official MySQL documentation.

The recommended value is the number of (physical) CPUs multiplied by 2 plus the number of disks where InnoDB is located.

The value of innodb_thread_concurrency changed in several MySQL versions, currently the default value is 0. A value of 0 means “open as many threads as possible”. Therefore, if in doubt, it can be used:

innodb_thread_concurrency = 0

MySQL Fragmentation

Like filesystems, databases also will fragment themselves, slowing the whole system down. In a high performance system such as Pandora FMS, it is key for the database state to not affect system performance. In overloaded systems, the database could block and force the monitoring system to fall down.

Setting up MySQL server could make Pandora FMS become faster, so if you have performance problems, the reason might be a problem in MySQL Setup or problems related with the database.

Check my.cnf file

First check my.cnf file and its basic configuration for MySQL. This configuration file is written in INI format and its location can be determined with the following command:

mysqld --help --verbose | more

my.cnf setup should be similar to this one (4 GB RAM Server and using an average server hardware). Make sure that you have all these parameters correctly inside section [mysqld]:

[mysqld]
 datadir=/var/lib/mysql
 socket=/var/lib/mysql/mysql.sock
 user=mysql
 character-set-server=utf8mb4
 skip-character-set-client-handshake

 max_allowed_packet = 64M
 innodb_buffer_pool_size = 800M
 innodb_lock_wait_timeout = 90
 innodb_file_per_table
 innodb_flush_log_at_trx_commit = 0
 innodb_flush_method = O_DIRECT
 innodb_log_file_size = 64M
 innodb_log_buffer_size = 16M
 innodb_io_capacity = 100
 thread_cache_size = 8
 thread_stack    = 256K
 max_connections = 100

 key_buffer_size=4M
 read_buffer_size=128K
 read_rnd_buffer_size=128K
 sort_buffer_size=1M
 join_buffer_size=4M

 sql_mode=""

 [mysqld_safe]
 log-error=/var/log/mysqld.log
 pid-file=/var/run/mysqld/mysqld.pid

If you are using MySQL 8 and do not have an HA environment, disable the binary logs with the following command in section [mysqld]:

skip-log-bin

If there is any change in my.cnf file, restart MySQL service.

  • Check service status with systemctl status mysqld.service.
  • Take a look at the end of the /var/log/mysqld.log for any errors.
  • For more information, check the following link at MySQL website.

Restoring databases

To learn more about “Server Management and Administration”, follow this link.

When making certain modifications to my.cnf (for example, adding the innodb_file_per_table parameter), it is likely that the database will not work when restarting the service. If you get the following error, you should restore the previous configuration (you should use root user credentials or root user) and back up the database:

InnoDB: Error: log file ./ib_logfile0 is of different size 0 5242880 bytes
InnoDB: than specified in the .cnf file 0 67108864 bytes!

1. Back up the database:

mysqldump -uroot -p pandora > /home/pandora/pandora.sql

2. Stop MySQL server and move the data to a backup folder:

systemctl stop mysql
mv /var/lib/mysql /var/lib/mysql.bak

3. Create a new folder for MySQL data (the relevant permissions will be assigned in step five):

mkdir /var/lib/mysql

4. Initialize MySQL server by specifying the target folder in the parameter --datadir. This process will generate a temporary password which must be noted down (it will be displayed by standard output or stored in /var/lib/mysqld.log):

mysqld --initialize --datadir /var/lib/mysql

5. Assign the appropriate permissions to the new folder:

chown -R mysql:mysql /var/lib/mysql
chcon -R system_u:object_r:mysqld_db_t:s0 /var/lib/mysql

6. Start MySQL service and log in with MySQL client, using the password from step four:

systemctl start mysql
mysql -uroot -p

MySQL/Percona systems often do not load correctly the configuration parameters of the my.cnf file, usually because these values have been written outside [mysqld] section.

7. Change the user password root to the one of your choosing (here we use Pandor4!):

ALTER USER 'root'@'localhost' IDENTIFIED BY 'Pandor4!';

8. Exit MySQL client and check that you can log in again using the new password.

9. Access MySQL client again and create the database:

CREATE DATABASE pandora;
USE pandora;

10. Load the backup from the database:

SOURCE /home/pandora/pandora.sql

11. Create the access users for Pandora FMS using the same credentials as in the previous installation (here Pandor4! is used) and then give them permissions over the database:

CREATE USER 'pandora'@'localhost' IDENTIFIED WITH mysql_native_password BY 'Pandor4!';
CREATE USER 'pandora'@'127.0.0.1' IDENTIFIED WITH mysql_native_password BY 'Pandor4!';
GRANT ALL PRIVILEGES ON pandora.* TO 'pandora'@'localhost';
GRANT ALL PRIVILEGES ON pandora.* TO 'pandora'@'127.0.0.1';

12. After having configured my.cnf file and restarted MySQL service, you should check that these changes have been correctly applied. To that end, check the variables one by one:

SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_io_capacity';
SHOW VARIABLES LIKE 'innodb_file_per_table';

Or with a general inquiry such as:

SHOW VARIABLES LIKE "innodb%";

Once you check that Pandora FMS (both Web Console and server) can connect correctly to the database and everything works properly, you may delete mysql.bak directory:

rm -rf /var/lib/mysql.bak

Check if isolated datafile for each table is ACTIVE

ls -lah /var/lib/mysql/pandora/*.ibd | wc -l

There should be more than 100 files there (depending on the version of Pandora FMS), each .ibd is the data file of each table, when innodb_file_per_table parameter is enabled in file my.cnf. If you do not have any of these files, .ibd means it uses a single file to store all information. The previous one means table fragmentation is also present on all tables and performance will worsen each week.

If you have your database running in a single database, first you will need to recreate the database after correctly configuring file my.cnf and restarting MySQL.

Optimizing Specific tables

Another less “drastic” solution to solve the fragmentation issue is the use of MySQL OPTIMIZE tool to optimize certain Pandora FMS tables. To benefit from it, run right away from MySQL the following:

OPTIMIZE TABLE tagente_datos;
OPTIMIZE TABLE tagente;
OPTIMIZE TABLE tagente_datos_string;
OPTIMIZE TABLE tagent_access;
OPTIMIZE TABLE tagente_modulo;
OPTIMIZE TABLE tagente_estado;

This prevents performance from degrading over time as the system works.

In large environments, the OPTIMIZE option could be “blocked”. In this case, the best option is to rebuild the DB.

After doing these operations, run:

FLUSH TABLES;

From MySQL manual:

For InnoDB tables, OPTIMIZE TABLE is mapped to ALTER TABLE, which rebuilds the table to update index statistics and free unused space in the clustered index”.

MySQL special tokens

There are some very “special” tokens in MySQL, which can improve or worsen performance:

  • innodb_flush_method:
innodb_flush_method = O_DIRECT

This important parameter has an effect on how information is written on the disk.

  • innodb_lock_wait_timeout:
innodb_lock_wait_timeout = 90

This helps when there is a bottleneck, so that MySQL does not fail and stop. If it lasts more than 90 lock, there is a real problem.

Check fragmentation table by table

Using MySQL CLI, run this query:

SELECT ENGINE, TABLE_NAME,Round( DATA_LENGTH/1024/1024) AS data_length , round(INDEX_LENGTH/1024/1024)
 AS index_length, round(DATA_FREE/ 1024/1024) AS data_free, (data_free/(index_length+data_length))
 AS frag_ratio FROM information_schema.tables
 WHERE TABLE_TYPE = 'BASE TABLE' AND DATA_FREE> 0 ORDER BY frag_ratio DESC;

You should get only the tables with some fragmentation index, for example:

+--------+-------------------------+-------------+--------------+-----------+------------+
| ENGINE | TABLE_NAME              | data_length | index_length | data_free | frag_ratio |
+--------+-------------------------+-------------+--------------+-----------+------------+
| InnoDB | tserver_export_data     |           0 |            0 |         5 |   320.0000 |
| InnoDB | tagent_module_inventory |           0 |            0 |         6 |    25.6000 |
| InnoDB | tagente_datos_inventory |           4 |            0 |        40 |     9.8842 |
| InnoDB | tsesion_extended        |           1 |            0 |         4 |     3.3684 |
| InnoDB | tagent_access           |           2 |            7 |        27 |     2.9845 |
| InnoDB | tpending_mail           |           2 |            0 |         4 |     2.6392 |
| InnoDB | tagente_modulo          |           2 |            0 |         4 |     2.1333 |
| InnoDB | tgis_data_history       |          24 |           11 |        67 |     1.9075 |
| InnoDB | tsesion                 |           2 |            0 |         4 |     1.7778 |
| InnoDB | tupdate                 |           3 |            0 |         3 |     1.1852 |
| InnoDB | tagente_datos           |         186 |          194 |       399 |     1.0525 |
| InnoDB | tagente_datos_string    |          15 |            9 |        24 |     0.9981 |
| InnoDB | tevento                 |         149 |           62 |        46 |     0.2183 |
| InnoDB | tagente_datos           |        2810 |         2509 |        65 |     0.0122 |
| InnoDB | tagente_datos_string    |         317 |          122 |         5 |     0.0114 |
+--------+-------------------------+-------------+--------------+-----------+------------+

This query works only on tables with more than 10 % of fragmentation. Too big tables (like tagent_data) can take a lot of time to get optimized if they are very fragmented. This may affect the production system.

Caution is advised when optimizing such large tables. A normal environment could be optimized once a year and larger environments every six months.

To optimize the tagent_module_inventory table (in this case the database is called pandora):

OPTIMIZE TABLE pandora.tagent_module_inventory;

A warning message will appear:

"Table does not support optimize, doing recreate + analyze instead".

If you check again, you should see fragmentation is gone:

+--------+-------------------------+-------------+--------------+-----------+------------+
| ENGINE | TABLE_NAME              | data_length | index_length | data_free | frag_ratio |
+--------+-------------------------+-------------+--------------+-----------+------------+
| InnoDB | tserver_export_data     |           0 |            0 |         5 |   320.0000 |
| InnoDB | tagente_datos_inventory |           4 |            0 |        40 |     9.8842 |
| InnoDB | tsesion_extended        |           1 |            0 |         4 |     3.3684 |
| InnoDB | tagent_access           |           2 |            7 |        27 |     2.9845 |
| InnoDB | tpending_mail           |           2 |            0 |         4 |     2.6392 |
| InnoDB | tagente_modulo          |           2 |            0 |         4 |     2.1333 |
| InnoDB | tgis_data_history       |          24 |           11 |        67 |     1.9075 |
| InnoDB | tsesion                 |           2 |            0 |         4 |     1.7778 |
| InnoDB | tupdate                 |           3 |            0 |         3 |     1.1852 |
| InnoDB | tagente_datos           |         186 |          194 |       399 |     1.0525 |
| InnoDB | tagente_datos_string    |          15 |            9 |        24 |     0.9981 |
| InnoDB | tevento                 |         149 |           62 |        46 |     0.2183 |
| InnoDB | tagente_datos           |        2810 |         2509 |        65 |     0.0122 |
| InnoDB | tagente_datos_string    |         317 |          122 |         5 |     0.0114 |
+--------+-------------------------+-------------+--------------+-----------+------------+

To be able to perform this optimization, there must be enough space on the hard disk to perform the operation. Otherwise an error will appear and the operation will not be performed.

System Load

This is more general, but you need to make sure the system IO is not a bottleneck (disk). Run the following command to collect the system information:

vmstat 1 10

Look at the last columns (CPU WA), a value higher than 10 means that there is a disk I/O problem that should be solved.

Having high CPU-US is normal, but CPU-SY should not be over 10~15.

Usually, SWAP-SI and SWAP-SO should have value zero, if not, it means the system is using SWAP memory, which degrades performance. Increase RAM or decrease RAM usage in your applications (Pandora FMS server threads, Buffers in MySQL, etc.).

Sample output of a “normal” system:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id  wa st
 0  0  46248  78664 154644 576800    0    0     2   147    0    9  7 10 83  0  0
 0  0  46248  78656 154644 576808    0    0     0     0   49   37  0  0 100 0  0
 2  0  46248  78904 154648 576740    0    0     0   184  728 2484 63  6 31  0  0
 0  0  46248  79028 154648 576736    0    0    16   616  363  979 21  0 79  0  0
 1  0  46248  79028 154648 576736    0    0     0    20   35   37  0  1 98  1  0
 0  0  46248  79028 154648 576736    0    0     0     0   28   22  0  0 100 0  0
 1  0  46248  79028 154648 576736    0    0     0  3852  141  303  0  0 98  2  0
 2  0  46248  78904 154660 576660    0    0     0   188  642 2354 56  4 40  0  0
 1  0  46248  78904 154660 576680    0    0     0    88  190  634 13  0 86  1  0
 1  0  46248  78904 154660 576680    0    0     0    16   35   40  0  0 100 0  0
 1  0  46248  78904 154660 576680    0    0     0     0   26   21  0  0 100 0  0
 0  0  46248  78904 154660 576680    0    0     0     0   27   27  0  0 100 0  0
 1  0  46248  78904 154724 576616    0    0   112   192  608 2214 52  4 44  0  0
 0  0  46248  78904 154724 576616    0    0     0    76  236  771 16  0 84  0  0
 0  0  46248  78904 154724 576616    0    0     0    20   38   38  0  0 100 0  0
 0  0  46248  78904 154724 576616    0    0     0     0   31   21  0  0 100 0  0
 0  0  46248  78904 154740 576608    0    0     0  3192  187  322  1  0 96  3  0
 1  0  46248  79028 154756 576544    0    0    16   192  632 2087 53  5 42  0  0
 0  0  46248  79028 154760 576568    0    0     0    56  255  927 19  2 79  0  0
 0  0  46248  79028 154768 576564    0    0     0    20   33   44  0  0 100 0  0

MySQL Table Partitioning

To use MySQL table partitioning, you should also use the multiples tablespaces system described above (innodb_file_per_table).

MySQL supports table partitioning, which allows very large tables to be distributed into smaller chunks, such as logical subdivisions (see MySQL manual for details).

If you have large amounts of data in Pandora FMS database (main and history) and it is estimated that many Console operations that refer to that data (for example drawing graph) are slow, then it will improve its performance using table partitioning.

Automatic partitioning

Pandora FMS automatically performs a monthly partitioning in the history database if it is configured from the Web Console, section History database, of the general configuration. For more details check this link.

Manual partitioning

After having installed Pandora FMS and enabled its history database, this will be the one that holds the largest amount of data and it is the recommended one to perform a table partitioning. Precisely the tables indicated for that are tagent_data, tagent_data_string, tagente_datos_inc and tevento. The manual practical process for the table tagent_data is described below (see also “Automatic partitioning”).

You should check at first that the directory /var/lib/mysql/pandora_history/*.ibd has many files (one per table). If this is not the case, you will need to dump the database, change the configuration of the my.cnf file, restart MySQL, delete the current database and recreate it from the dump.

Once you ensure that innodb_file_per_table is enabled, separate the two main databases into different partitions.

  • This operation will need enough disk space to be completed. You will need to check how big the tagent_data.ibd file is. If for example it occupies 10 gigabytes, you will need 15 GB of free space to start the operation.
  • This operation may take a long time, depending on the size of the table. For example, it would take an hour and a half to split a table with approximately 7500 modules data for 100 days (more than 50,000,000 rows).

This is an example for partitioning all of 2023 so far and for future months. To start the process, you will need to execute the following query in MySQL CLI:

ALTER TABLE tagente_datos PARTITION BY RANGE (utimestamp) (
PARTITION Jan23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-02-01 00:00:00')),
PARTITION Feb23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-03-01 00:00:00')),
PARTITION Mar23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-04-01 00:00:00')),
PARTITION Apr23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-05-01 00:00:00')),
PARTITION May23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-06-01 00:00:00')),
PARTITION Jun23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-07-01 00:00:00')),
PARTITION Jul23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-08-01 00:00:00')),
PARTITION Aug23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-09-01 00:00:00')),
PARTITION Sep23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-10-01 00:00:00')),
PARTITION Oct23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-11-01 00:00:00')),
PARTITION Nov23 VALUES LESS THAN (UNIX_TIMESTAMP('2023-12-01 00:00:00')),
PARTITION Dec23 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 00:00:00')),
PARTITION pActual VALUES LESS THAN (MAXVALUE)
);

Then the following query would have to be run every month to reorganize the partitioning:

ALTER TABLE tagente_datos REORGANIZE PARTITION pActual INTO (
PARTITION Jan24 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01 00:00:00')),
PARTITION pActual VALUES LESS THAN MAXVALUE);

Changing “Jan24” to the month you are in.

Remember again that this operation could take hours, depending on how big the tagent_data table is. You may check the process by watching the size of the partitioning files by running:

[root@histdb pandora_history]# ls -lah | grep "#sql"

 -rw-rw----  1 mysql mysql 424M feb 24 05:58 #sql-76b4_3f7c#P#Jan23.ibd
 -rw-rw----  1 mysql mysql 420M feb 24 05:51 #sql-76b4_3f7c#P#Feb23.ibd
 -rw-rw----  1 mysql mysql 128K feb 24 05:40 #sql-76b4_3f7c#P#Mar23.ibd
 -rw-rw----  1 mysql mysql 840M feb 24 05:44 #sql-76b4_3f7c#P#Apr23.ibd
 -rw-rw----  1 mysql mysql 440M feb 24 05:47 #sql-76b4_3f7c#P#May23.ibd
 -rw-rw----  1 mysql mysql  10M feb 24 05:42 #sql-76b4_3f7c#P#Jun23.ibd
 -rw-rw----  1 mysql mysql 404M feb 24 05:56 #sql-76b4_3f7c#P#Jul23.ibd
 -rw-rw----  1 mysql mysql 436M feb 24 05:54 #sql-76b4_3f7c#P#Aug23.ibd
 -rw-rw----  1 mysql mysql 400M feb 24 05:49 #sql-76b4_3f7c#P#Sep23.ibd
 -rw-rw----  1 mysql mysql 408M feb 24 05:52 #sql-76b4_3f7c#P#Oct23.ibd
 -rw-rw----  1 mysql mysql  72M feb 24 06:03 #sql-76b4_3f7c#P#Nov23.ibd
 -rw-rw----  1 mysql mysql 404M feb 24 06:03 #sql-76b4_3f7c#P#Dec23.ibd
 -rw-rw----  1 mysql mysql 416M feb 24 06:00 #sql-76b4_3f7c#P#jan23.ibd

DB Rebuilding

To find out more abour Pandora FMS backup and data recovery, go to this link.

Partial Rebuilding

MySQL database management system, same as other SQL engines, such as Oracle®, is degraded with time due to causes such as data fragmentation produced by deleting and continuous insertion in large tables. In large environments, with a lot traffic volume, there is a very easy way to improve performance and prevent performance from degrading. This is rebuilding the DB from time to time.

To that end, schedule a service stop, which could last approximately 1 hour.

In this service stop, stop the Pandora FMS WEB console and the server too. Attention: you may leave the Tentacle server, so that it can still receive data and these will be processed as soon as the server works again.

Once they have been stopped, dump the DB (Export); in this example, the database is called pandora3 and the user must be root:

mysqldump -u root -p pandora3> /tmp/pandora3.sql

Delete the DB:

mysql -u root -p
DROP DATABASE pandora3;
 Query OK, 87 ROWS affected (1 MIN 34.37 sec)

Create the DB and import the previous data export from the dump you did at first:

CREATE DATABASE pandora3;
USE pandora3;
SOURCE /tmp/pandora3.sql

This may take a few seconds or several minutes, depending on the size of the database and the resources available on the machine.

It is possible to automatize this process, but, because it is very delicate, the best option is to make it manually.

Total rebuilding

The PFMS server must first be stopped:

/etc/init.d/pandora_server stop

Or, if you have systemd installed:

systemctl stop pandora_ha

Secondly, an export of all schemas and data must be done. Choose a directory of the machine, in a partition that has enough space to store the backup of the database in case it is a large database. The following command is executed:

mysqldump -uroot -p pandora --single-transaction > backup.sql

After finishing the process we will have in the file backup.sql the data, in the directory where we are.

Once this is done, MySQL is stopped with:

systemctl stop mysqld.service

Access the MySQL installation directory in /var/lib/mysql and delete the files named binlog.000001, binlog.000002 … and binlog.index. In addition, we will delete the directory with the same name as the database pandora.

rm -rf /var/lib/mysql/pandora
rm -rf /var/lib/mysql/binlog.0*
rm -rf /var/lib/mysql/binlog.index

Once these steps are done, we will start MySQL again with:

systemctl start mysqld.service

Access the MySQL terminal with the following command:

mysql -uroot -p pandora

The pandora database will be selected, but it will be empty. The following command is used to import in the database the backup that we did from the Pandora FMS database:

SOURCE backup.sql;

Finally the PFMS server must be started:

/etc/init.d/pandora_server start

Or if systemd is installed:

systemctl start pandora_ha

Optional Indexes

There are some situations when you can optimize MySQL performance, but giving up other system resources.

This index optimizes speed on graph rendering (a lot), but it uses more disk storage space, and could entail a slightly decrease on INSERT/DELETE operation, due to the Index overhead:

ALTER TABLE `pandora`.`tagente_datos` ADD INDEX `id_agente_modulo_utimestamp` (`id_agente_modulo`,`utimestamp`);

At the moment, in the heaviest tables of Pandora FMS in MySQL, this optimization is there by default. It is advisable to have expert help to optimize MySQL tables.

Slow queries study

In some systems, depending on the type of information you have, you may find some “slow queries” that make the system work worse. You may enable logs of this type of queries over a short period of time (since it harms the system performance) in order to consider trying to optimize queries to tables with indexes. To enable these settings, do the following:

  • Edit my.cnf and add the following lines:
slow_query_log=1
long_query_time=2
slow_query_log_file=/var/log/mysql_slow.log
  • To be able to use it and set the admin rules:
touch /var/log/mysql_slow.log
chown mysql:mysql /var/log/mysql_slow.log
chmod 640 /var/log/mysql_slow.log
  • Restart MySQL.
  • When finishing analyzing which ones are the slow queries, remember to reset the file my.cnf commenting the aggregated lines and restarting again MySQL service.

External references

Measuring Pandora FMS for High Capacity

This section describes different methods to configure Pandora FMS in a high capacity environment. It also describes different tools to make load tests, which are useful to adjust the environment to the highest possible process capacity.

Pandora FMS has been configured to support a load of around 2500 agents in systems where database, console and server are in the same machine. The maximum recommended number is around 2500 agents (about 60,000 modules) per system, but this number varies greatly depending on whether they are XML agents, remote modules, with high or low intervals, or with systems with high capacity or low memory.

All factors greatly alter the number of agents that a system can manage efficiently. In laboratory tests, 10,000 agents have been executed in a single server with basic hardware, but strongly optimized.

Example of High Capacity Server Configuration

Assuming a RHEL 8 machine with 16 GB of RAM and 8 CPU to be optimized for the maximum processing capacity of the Dataserver (XML).

my.cnf

Only the most important parameters are shown.

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
character-set-server=utf8mb4
skip-character-set-client-handshake
# Disabling symbolic-links is recommended to prevent assorted    security risks
symbolic-links=0
# MySQL optimizations for Pandora FMS
# Please check the documentation in http://pandorafms.com for  better results
max_allowed_packet = 64M
innodb_buffer_pool_size = 6400M
innodb_lock_wait_timeout = 90
innodb_file_per_table
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
innodb_log_file_size = 64M
innodb_log_buffer_size = 16M
innodb_io_capacity = 300
thread_cache_size = 8
thread_stack    = 256K
max_connections = 100
key_buffer_size=4M
read_buffer_size=128K
read_rnd_buffer_size=128K
sort_buffer_size=1M
join_buffer_size=4M
sql_mode=""

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

pandora_server.conf

Only the most relevant parameters are shown.

verbose 3
server_threshold 5
xxxxserver_threads 8
max_queue_files 5000

Aspects to take into account:

  • The number established in the verbose parameter refers to the amount of information that is written in the logs, being advisable not to exceed 3. The higher the number, the lower the performance of Pandora FMS due to the large amount of information to write in the logs.
  • A high value (15) of the server_threshold parameter causes the DB to suffer less, while the increase in the maximum number of files processed causes the server to seek files and fill the buffers every time. These two elements of the configuration are closely linked. In the case of optimizing the network server, it may be interesting to lower the server_threshold to 5 or 10.
  • The very high number of threads (more than 5) set in xxxxserver_threads ' only benefits processes with long I/O waits, such as the network server or plugin server. In the case of the Dataserver, which is in process all the time, setting too many threads can even hurt performance. On systems with a slow DB: try different combinations between 1 and 10; with faster disks and multi-core CPUs, it could be increased. In the case of optimizing the system for the Networkserver, the number can be much higher, between 10 and 30.

Capacity analysis Tools (Capacity)

Pandora FMS has several tools that may help you to measure properly its hardware and software for the amount of data it expects to obtain. One of them is useful to “attack” directly the database with fictitious data (dbstress) and the other generates fictitious XML files (xml_stress).

Pandora FMS XML Stress

This is an small script that generates XML data files like the ones sent by Pandora FMS agents. By default it is placed on:

/usr/share/pandora_server/util/pandora_xml_stress.pl

The scripts read agent names from a text file and generate XML data files for each agent according to a configuration file, where modules are defined as templates.

Modules are filled with random data. An initial value and the probability of the module data changing may be specified.

Run the script like this:

./pandora_xml_stress.pl <configuration file>

Sample < configuration file > called pandora_xml_stress.conf:

# Maximum number of threads, by default 10.
max_threads 10

# File containing a list of agent names (one per line).
agent_file agent_names.txt

# Directory where XML data files will be placed, by default /tmp.
temporal /var/spool/pandora/data_in

# Pandora FMS XML Stress log file, logs to stdout by default.
log_file pandora_xml_stress.log

# XML version, by default 1.0.
xml_version 1.0

# XML encoding, by default ISO-8859-1.
encoding ISO-8859-1

# Operating system (shared by all agents), by default Linux.
os_name Linux

# Operating system version (shared by all agents), by default 2.6.
os_version 2.6

# Agent interval, by default 300.
agent_interval 300

# Data file generation start date, by default now.
time_from 2009-06-01 00:00:00

# Data file generation end date, by default now.
time_to 2009-06-05 00:00:00

# Delay after generating the first data file for each agent to avoid
# race conditions when auto-creating the agent, by default 2.
startup_delay 2

# Address of the Tentacle server where XML files will be sent (optional).
# server_ip 192.168.50.1

# Port of the Tentacle server, by default 41121.
# server_port 41121

# Module definitions. Similar to pandora_agent.conf.

module_begin
module_name Module 1
module_type generic_data
module_description A long description.
module_max 100
module_min 10
module_exec type=RANDOM;variation=60;min=20;max=80
module_end

module_begin
module_name Module 2
module_type generic_data
module_description A long description.
module_max 80
module_min 20
module_exec type=SCATTER;prob=1;avg=40;min=0;max=80
module_end

module_begin
module_name Module 3
module_type generic_data
module_description A long description.
module_max 80
module_min 20
module_exec type=CURVE;min=20;max=80;time_wave_length=3600;time_offset=0
module_end

module_begin
module_name Module 4
module_type generic_data_string
module_description A long description.
module_max 100
module_min 10
module_exec type=RANDOM;variation=60;min=20;max=80
module_end

module_begin
module_name Module_5
module_type generic_proc
module_descripcion Module 3 description.
# Initial data.
module_data 1
module_end

Send and Receive Agent Local Configuration

If you activate in your pandora_xml_stress.conf the get_and_send_agent_conf configuration value to 1, you may make the test load agents work as normal agents, so that they send their configuration file and also the MD5 hash. From PFMS Web Console you may modify the remote configuration for subsequent runs of pandora_xml_stress.

Besides this, you may configure where to store locally the .conf files of your testing agents with the directory_confs configuration token in pandora_xml_stress.conf file.

Configuration File

  • max_threads: Number of threads where the script will be executed. This improves the E/S.
  • agent_file: Path of the name list file path, separated by a new line.
  • temporal: Path of the directory where made-up XML data files are generated.
  • log_file: Path of the log where it will report about its execution script.
  • xml_version: Version of the XML data file (by default 1.0).
  • encoding: XML data file encoding (by default ISO-8859-1).
  • os_name: Name of the made-up agent Operative System (Linux by default).
  • os_version: Version of the made-up agents Operative System (2.6 by default)
  • agent_interval: Interval of the made-up agents in seconds (300 by default).
  • time_from: Time from which made-up XML data files are generated, in format yyyy-MM-dd HH:mm:ss.
  • time_to: Time until which made-up XML data files are generated, in format yyyy-MM-dd HH:mm:ss.
  • get_and_send_agent_conf: Boolean value 0 or 1. When it is active, the made-up agents will try to download by remote configuration a more updated version of the standard configuration file of an agent. And they can be edited through Pandora FMS Web console.
  • startup_delay: Time numeric value in seconds before each agent starts to generate files. It is used to avoid race conditions.
  • timezone_offset: Numeric value of the time zone offset.
  • timezone_offset_range: Numeric value that is useful to generate the timezone in this range randomly.
  • latitude_base: Numeric value. It is the latitude geographic area to be used to define fictitious agents.
  • longitude_base: Numeric value. It is the longitude geographic area to be used to define fictitious agents.
  • altitude_base: Numeric value. It is the altitude geographic area to be used to define fictitious agents.
  • position_radius: Numeric value. It defines the radius of the circumference in which the geographic position of the fictitious agent will be set (randomly and based on parameters latitude_base and longitude_base).

The definition of one module in the script configuration file. If remote configuration was enabled, it will also be the same. It is:

module_begin
module_name <module_name>
module_type <module_type_data>
module_description <description>
module_exec type =<xml_stress_type_generation>;<another_option>;<another_option> …
module_unit <units>
module_min_critical <value>
module_max_critical <value>
module_min_warning <value>
module_max_warning <value>
module_end

Each one can be configured as:

  • type_generation_xml_stress: It can have values RANDOM, SCATTER, CURVE.
  • module_attenuation <value>: The generated module value is multiplied by the specified value, usually between 0.1 and 0.9.
  • module_attenuation_wdays <value> <value> … <value>: The module value is only attenuated during the given days, ranging from Sunday (0) to Saturday (6). For example, the following module simulates a 50 % drop in network traffic on Saturdays and Sundays:
module_begin
module_name Network Traffic
module_type generic_data
module_description Incoming network traffic (Kbit/s)
module_exec type=RANDOM;variation=50;min=0;max=1000000
module_unit Kbit/s
module_min_critical 900000
module_attenuation 0.5
module_attenuation_wdays 0 6
module_end
  • module_incremental < value >: If set to 1, the module's previous value is always added to a new value, resulting in an increasing function.
  • Others: See below which options are available, depending on the execution type.

RANDOM

These have the following options:

  • variation. Probability percentage of change regarding the previous value.
  • min. Minimum value that the value could have.
  • max. Maximum value that the value could have.

Numeric

It generates random numeric values between the range values min and max

Booleans

It generates values between 0 and 1.

String

It generates a length string between values min and max. The characters are random between A and Z and capital, lower case letters and also numeric ciphers are included.

External data source (SOURCE)

It allows to use a plain text file as a data source. Options:

  • src: Source data file.

The file contains one data per line, there is no limit for lines. For example:

4
5
6
10

Both numbers and strings are allowed as values. These kinds of modules will use file data to generate module data in Pandora FMS. Data are retrieved sequentially. For example data above will be shown as follows:

4 5 6 10 4 5 6 10 4 5 6 10 4 5 6 10 4 5 6 10 4 5 6 10

SCATTER

It is only useful for numeric data, and the generated graphics are similar to the ones of a heartbeat, that is, a normal value, and from time to time a “beat”.

It has the following options:

  • min. Minimum value that the value could have.
  • max. Maximum value that the value could have.
  • prob. Probability percentage that it generates a “beat”.
  • avg. Average value that should be shown by default if there is no “beat”.

CURVE

It generates module data following a trigonometric curve. They have the following options:

  • min: Minimum value that the value could have.
  • max: Maximum value that the value could have.
  • time_wave_length: Numeric value in seconds of the duration of the “crest” of the wave.
  • time_offset: Numeric value in seconds from the starting point of the wave from time zero with module value zero (similar to the sine graph).

Note: This tool is preconfigured to look for, in all agents, “random”, “curve” or boolean name modules that use an interval between 300 seconds and 30 days.

How to measure Data server Processing Capacity

There is a small script called pandora_count.sh that is found in the /usr/share/pandora_server/util/ directory in Pandora FMS server directory. This script is used to measure the processing rate of XML files by the data server, and it uses as reference all the files yet to be processed at /var/spool/pandora/data_in, so to be able to use it, thousands of packages yet to be processed are needed (or they must be generated with the tool mentioned before). Once running, you may stop it pressing the keys CTRL+C.

This script simply calculates the file processing rate of the server. It is a somewhat crude measure, but it is useful to get an idea of the effectiveness of the server configuration settings.

Pandora FMS DB Stress

This is a small tool to test database performance. It could also be used to generate periodical or random data (using trigonometric functions) and fill in made-up modules.

Create an agent and allocate the modules to that agent for automatic data injection with this tool. The names should be these ones:

  • random: To generate random data.
  • curve: To generate a matching curve using trigonometric functions. It is useful to use the interpolating work with different intervals, etc.
  • boolean: To generate random boolean data. That way, it is possible to use any name that contains the words: «random», «curve» and/or «boolean». For example:
  • random_1
  • curve_other

Only the “data_server” module type can be chosen.

Pandora FMS DB Stress Fine Adjustment

This tool is preconfigured in order to search, in all agents, the module names “random”, “curve” or “boolean”, that use an interval between 300 seconds and 30 days.

If you wish to modify this performance, edit pandora_dbstress script and change some variables at the beginning of the file:

# Configure here target (AGENT_ID for Stress)
my $target_module = -1; # -1 for all modules of that agent
my $target_agent = -1;
my $target_interval = 300;
my $target_days = 30;
  1. The first line of the variable corresponding with target_module should be fixed for a fix module or -1 to process all the matching targets.
  2. The second line of variable must match target_agent, for a specific agent.
  3. The third line must match target_interval, defined in seconds and which represents the module predefined periodical interval.
  4. The fourth line is target_days and represents the number of days in the past since the date, in the current timestamp.

Diagnostic tools in Pandora FMS

Sometimes, there are problems for which direct help from Pandora FMS support is needed. To fmake communication with the Support team easier, Pandora FMS servers have some tools for this.

Diagnostic Info

This tool is located in the Management → Admin tools → Diagnostic Info section, and is conceived to provide the most important information about Pandora FMS configuration and its database.

Go back to Pandora FMS documentation index