CloudWatch discovery

This document describes the Pandora FMS CloudWatch functionality, which consists of the discovery plugin.

Introduction

This plugin monitors AWS environments through CloudWatch, collecting performance metrics and the status of cloud resources. It extracts data from multiple services and regions, including CPU, memory, disk usage, latencies, and any other indicators that can be retrieved with custom queries. Both historical and real-time metrics can be tracked, as aggregate statistics or as individual values, depending on the configuration.

The script manages the connection to AWS with the credentials supplied to it, supports proxies and SSL configuration, and runs queries in parallel to improve performance in environments with many metrics and regions. All collected data is converted into Pandora FMS modules: agents and modules are generated automatically and sent to the platform as XML, either through Tentacle or by local transfer.

This centralizes monitoring of the AWS infrastructure in Pandora FMS, making it possible to detect anomalies, load spikes, or potential problems before they affect operations.

Prerequisites

Connectivity to the AWS API.

The plugin is a binary with its dependencies packaged, so nothing else needs to be installed in order to use it.

Permissions

To make requests through the API, the account used for authentication during execution requires certain permissions, which depend on the queries used and the resources accessed:
- Billing (read)
- CloudWatch (list, read)
- Cost Explorer Service (full access)
- EC2 (full read, limited: list)

Permission assignment

Create a JSON policy like the following:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeInstances",
            "ec2:DescribeVolumes",
            "aws-portal:ViewBilling",
            "cloudwatch:GetMetricStatistics",
            "ec2:DescribeAddresses",
            "ce:GetCostAndUsage",
            "ce:GetTags"
          ],
          "Resource": "*"
        }
      ]
    }

The above policy must be assigned to a new user. Users with these permissions will be able to access AWS EC2 monitoring through this plugin.

To obtain advanced statistics for EC2 instances, detailed monitoring must be enabled for each instance concerned. This is done from the instance menu in AWS:

Actions -> Monitoring and troubleshooting -> Manage detailed monitoring -> Enable

Parameters and configuration options

Parameters

--creds
Base64 encoding of the credentials. Before being converted to Base64, the credentials have the following format:

    {
      "access_key_id": "KEY",
      "secret_access_key": "PASSWORD"
    }

--query_conf
Path to the AWS custom query configuration file. In the query file, you can specify one or more Metrics Insights queries.
These follow the structure documented at the following link: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricData.html

The structure of a query is as follows:

    {
      "time_frame": "< Time window in which to collect data, possible options: 5m, 10m, 15m, 30m, 1h >",
      "region_name": "< Region in which to execute the query >",
      "MetricDataQueries": [
        < Metrics Insights operations >
      ]
    }

Queries must be defined within a list and separated by commas, so the structure of two queries would be as follows:

    [
      {
        "time_frame": "< Time window in which to collect data, possible options: 5m, 10m, 15m, 30m, 1h >",
        "region_name": "< Region in which to execute the query >",
        "MetricDataQueries": [
          < Metrics Insights operations >
        ]
      },
      {
        "time_frame": "< Time window in which to collect data, possible options: 5m, 10m, 15m, 30m, 1h >",
        "region_name": "< Region in which to execute the query >",
        "MetricDataQueries": [
          < Metrics Insights operations >
        ]
      }
    ]

An example of a query configuration would be the following. Note that the double quotes inside each Expression must be escaped for the file to be valid JSON:

    [
      {
        "time_frame": "5m",
        "region_name": "us-east-1",
        "MetricDataQueries": [
          {
            "Expression": "SELECT AVG(CPUUtilization) FROM SCHEMA(\"AWS/EC2\", InstanceId) WHERE InstanceId = 'i-01e5fbcd406596f84' GROUP BY InstanceId",
            "Id": "q1",
            "Period": 300,
            "Label": "avg_cpu"
          },
          {
            "Id": "q2",
            "Expression": "(100 - q1)",
            "Label": "free space"
          }
        ]
      },
      {
        "time_frame": "5m",
        "region_name": "us-east-1",
        "MetricDataQueries": [
          {
            "Id": "ebs_write_time",
            "Expression": "SELECT AVG(VolumeTotalWriteTime) FROM SCHEMA(\"AWS/EBS\", VolumeId)",
            "Label": "Average EBS Volume Total Write Time",
            "Period": 300,
            "ReturnData": true
          }
        ]
      },
      {
        "time_frame": "5m",
        "region_name": "us-east-1",
        "MetricDataQueries": [
          {
            "Id": "ec2_cpu",
            "Expression": "SELECT MAX(CPUUtilization) FROM SCHEMA(\"AWS/EC2\", InstanceId) GROUP BY InstanceId ORDER BY MAX() DESC LIMIT 10",
            "Label": "top_10",
            "Period": 300,
            "ReturnData": true
          }
        ]
      }
    ]

The Id and Label fields of the
operations within each query are what link each operation to the Pandora FMS module configuration applied to the data it returns. The "Module configuration" section further down explains how these fields are linked.

--conf
Path to the configuration file.

Configuration file (--conf)

    [CONF]
    agents_group = < Group for agents created by the plugin >
    agent_name = < Default agent name >
    threads = < Number of threads used by the plugin >
    interval = < Interval in seconds for agents >
    transfer_mode = < Transfer mode, tentacle or local >
    tentacle_ip = < IP of the destination machine for the agents created >
    tentacle_port = < Tentacle port, default: 41121 >
    tentacle_opts = < Additional Tentacle client options >
    use_proxy = < Enable to use proxy mode >
    proxy_url = < URL of the proxy address to use >
    ssl_check = < Enable to activate SSL check >

Example

    [CONF]
    agents_group = Applications
    agent_name = AWS
    threads = 4
    interval = 300
    transfer_mode = tentacle
    tentacle_ip = 127.0.0.1
    tentacle_port = 41121
    tentacle_opts =
    use_proxy = 0
    proxy_url =
    ssl_check = 1

Module configuration

The module configuration must be defined in the same configuration file, below the [CONF] variables. The structure for configuring the modules that receive the data from each operation of each query is as follows:

    [< ID defined in the Metrics Insights operation whose data this module receives >]
    name = < Name of the module that will be generated. The macro __label__ can be used, in which case the module name is taken from the Label field of the operation with the ID defined above. This is useful for operations that return more than one value, for example a query that returns the 10 instances with the highest CPU usage. If the operation defines the label as "cpu", the names returned will be "cpu <instance ID>", "cpu <instance ID>", "cpu <instance ID>" ...
    and so on, and these will be the names of the generated modules >
    type = < Module type >
    description = < Module description >
    value = < Value to take from the data returned by the operation, possible options: last, average, sum, max, min >
    agent = < Agent in which the module will be included; if it does not exist, it will be created; if not defined, the modules will be placed in the default agent >

Example of module configuration, according to the metrics defined above:

    [q1]
    name = __label__
    type = generic_data
    description = Monitors CPU utilization
    unit = percent
    value = average
    min_warning = 75
    min_critical = 85

    [q2]
    name = __label__
    type = generic_data
    description = Monitors CPU free space
    unit = percent
    value = last
    min_warning = 11
    max_warning = 20
    min_critical = 0
    max_critical = 10

    [ebs_write_time]
    name = __label__
    type = generic_data
    description = Average EBS Volume Total Write Time
    value = last
    agent = AWS 2 Cloudwatch

    [ec2_cpu]
    name = __label__
    type = generic_data
    description = EC2
    value = last

Manual execution

The format for executing the plugin is as follows:

    ./pandora_aws_cloudwatch \
      --creds "" \
      --conf "" \
      --query_conf ""

For example:

    ./pandora_aws_cloudwatch \
      --creds "ededNFAFNAFNAFNAPOFBIUFASBFOABCOAFNOIA==" \
      --conf "" \
      --query_conf " Discovery > Cloud section.

The following minimum data will be requested for each task:

- AWS Credentials: AWS API connection credentials to be used by the task. They must be registered beforehand in the Pandora FMS credential store.
- Default agent name: Name of the default agent (modules are created here when no agent is specified).
- Tentacle IP: IP address of the Tentacle server to which agent data should be sent. Normally, this parameter will not be modified.
- Tentacle port: Port of the Tentacle server to which the data should be sent. Normally, this parameter will not be modified.
- Tentacle extra options: Additional options for sending data using the Tentacle client.
  Normally, this parameter will not be modified.

In the following window, the configuration options are as follows:

- Max threads: Number of threads used by the plugin.
- Use proxy: Enable to use proxy mode.
- Proxy url: URL of the proxy address to use.
- Verify proxy SSL: Check to enable SSL checking.
- Configure Cloudwatch queries: Window to configure queries.
- Configure module data: Window to configure modules.

Successfully completed tasks will have an execution summary with the following information:

- Total agents: Total number of agents generated by the task.
- Total modules: Total number of modules generated by the task.

Tasks that are not successfully completed will have an execution summary recording the errors that occurred.

Agents and modules generated by the plugin

The agents and modules generated by the plugin depend entirely on the configuration settings. For example, take the following general and metric settings.

Metrics Insights configuration

    [
      {
        "time_frame": "5m",
        "region_name": "us-east-1",
        "MetricDataQueries": [
          {
            "Expression": "SELECT AVG(CPUUtilization) FROM SCHEMA(\"AWS/EC2\", InstanceId) WHERE InstanceId = 'i-01e5fbcd406596f84' GROUP BY InstanceId",
            "Id": "q1",
            "Period": 300,
            "Label": "avg_cpu"
          },
          {
            "Id": "q2",
            "Expression": "(100 - q1)",
            "Label": "free space"
          }
        ]
      },
      {
        "time_frame": "5m",
        "region_name": "us-east-1",
        "MetricDataQueries": [
          {
            "Id": "ebs_write_time",
            "Expression": "SELECT AVG(VolumeTotalWriteTime) FROM SCHEMA(\"AWS/EBS\", VolumeId)",
            "Label": "Average EBS Volume Total Write Time",
            "Period": 300,
            "ReturnData": true
          }
        ]
      },
      {
        "time_frame": "5m",
        "region_name": "us-east-1",
        "MetricDataQueries": [
          {
            "Id": "ec2_cpu",
            "Expression": "SELECT MAX(CPUUtilization) FROM SCHEMA(\"AWS/EC2\", InstanceId) GROUP BY InstanceId ORDER BY MAX() DESC LIMIT 10",
            "Label": "cpu_usage",
            "Period": 300,
            "ReturnData": true
          }
        ]
      }
    ]

General and module settings

    [CONF]
    agents_group = 1
    agent_name = AWS General
    interval = 300
    threads = 3
    transfer_mode = tentacle
    tentacle_ip = 172.42.42.101
    tentacle_port = 41121

    [q1]
    name = __label__
    type = generic_data
    description = Monitors CPU utilization
    unit = percent
    value = average
    agent = AWS EC2
    min_warning = 75
    min_critical = 85

    [q2]
    name = __label__
    type = generic_data
    description = Monitors CPU free space
    unit = percent
    value = last
    min_warning = 11
    max_warning = 20
    min_critical = 0
    max_critical = 10

    [ebs_write_time]
    name = __label__
    type = generic_data
    description = Average EBS Volume Total Write Time
    value = last
    agent = AWS 2 Cloudwatch

    [ec2_cpu]
    name = __label__
    type = generic_data
    description = Cpu usage
    agent = AWS EC2
    value = last

Following the agent rule described in the module configuration (an explicit agent is used when set, the default agent otherwise), these settings would create the following:

- An agent called AWS EC2 (as defined explicitly in the two modules that execute EC2 queries above, q1 and ec2_cpu). This agent would contain the following modules:
  - One module for each of the 10 instances returned by the operation with ID ec2_cpu, with the following name: cpu_usage <instance ID>
  - A module for operation q1, with the following name: avg_cpu i-01e5fbcd406596f84
- An agent called AWS 2 Cloudwatch (as defined explicitly in the ebs_write_time module). This agent would contain the following modules:
  - A module for the ebs_write_time operation, with the following name: Average EBS Volume Total Write Time
- An agent called AWS General (the default agent). This agent would contain the following modules:
  - A module for operation q2, which defines no agent, with the following name: free space i-01e5fbcd406596f84
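
The --creds value and the query file described earlier can be prepared and checked before the plugin is run. The following is a minimal Python sketch and is not part of the plugin: the function names encode_creds and check_query_conf are hypothetical, and the checks only mirror the structure described in this document (a list of queries, each with time_frame, region_name and MetricDataQueries). What the plugin itself accepts or rejects is an assumption here.

```python
import base64
import json

# Time windows this document lists as valid for "time_frame".
ALLOWED_TIME_FRAMES = {"5m", "10m", "15m", "30m", "1h"}


def encode_creds(access_key_id, secret_access_key):
    """Build the Base64 string for --creds from the JSON format shown above."""
    payload = json.dumps(
        {"access_key_id": access_key_id, "secret_access_key": secret_access_key}
    )
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def check_query_conf(text):
    """Return a list of problems found in a query file (empty if none found).

    Hypothetical helper: it checks only the fields named in this document.
    """
    try:
        queries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(queries, list):
        return ["top level must be a list of queries"]

    problems = []
    for n, query in enumerate(queries):
        if not isinstance(query, dict):
            problems.append(f"query {n}: must be a JSON object")
            continue
        if query.get("time_frame") not in ALLOWED_TIME_FRAMES:
            problems.append(f"query {n}: time_frame must be one of 5m, 10m, 15m, 30m, 1h")
        region = query.get("region_name")
        if not isinstance(region, str) or not region.strip():
            problems.append(f"query {n}: region_name is missing")
        operations = query.get("MetricDataQueries")
        if not isinstance(operations, list) or not operations:
            problems.append(f"query {n}: MetricDataQueries must be a non-empty list")
            continue
        for op in operations:
            # The Id is what a [section] in the --conf file refers to.
            if not isinstance(op, dict) or "Id" not in op:
                problems.append(f"query {n}: every operation needs an Id")
    return problems


if __name__ == "__main__":
    # Placeholder credentials, as in the format example.
    print(encode_creds("KEY", "PASSWORD"))

    # json.dumps takes care of escaping the quotes inside the expression.
    sample = json.dumps([
        {
            "time_frame": "5m",
            "region_name": "us-east-1",
            "MetricDataQueries": [
                {
                    "Id": "q1",
                    "Expression": 'SELECT AVG(CPUUtilization) FROM SCHEMA("AWS/EC2", InstanceId)',
                    "Period": 300,
                    "Label": "avg_cpu",
                }
            ],
        }
    ])
    print(check_query_conf(sample))
```

Generating the query file with a JSON library rather than by hand avoids unescaped double quotes inside Expression, which would otherwise make the file invalid JSON.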