Apache Spark plugin

Plugin to grab metrics from all executors of all running or finished apps on your Spark server.

Introduction

Ver. 03-09-2021

Type: Server plug-in 

Compatibility matrix

Systems where it has been tested

CentOS 7, Fedora

Systems where it should work

Any Linux system

Prerequisites

Required:

Configuration

The plugin uses several Spark REST API endpoints. To access them from the plugin, the following ports must be active and not blocked by the firewall:

firewall-cmd --permanent --zone=public --add-port=6066/tcp
firewall-cmd --permanent --zone=public --add-port=7077/tcp
firewall-cmd --permanent --zone=public --add-port=8080-8081/tcp
firewall-cmd --permanent --zone=public --add-port=4040/tcp
firewall-cmd --permanent --zone=public --add-port=18080/tcp
firewall-cmd --reload

6066: REST URL (cluster mode).

7077: Master server.

8080-8081: Web UI (master and workers).

4040: For running applications.

18080: For the history server.
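To verify that these ports are reachable once the firewall is reloaded, a small check can be scripted. A sketch in Python (the host name in the commented example is a placeholder):

```python
import socket

# Ports the plugin needs to reach (matching the firewall rules above).
SPARK_PORTS = [6066, 7077, 8080, 8081, 4040, 18080]

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder host name):
# for port in SPARK_PORTS:
#     print(port, "open" if is_port_open("spark-master.example", port) else "closed")
```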

To use the history server, we will have to enable spark.eventLog.enabled and set spark.eventLog.dir and spark.history.fs.logDirectory in spark-defaults.conf.

You can find a conf template in /conf

1.png

We will create the file in that path with:

vi spark-defaults.conf

And we will set it up as shown; you can choose the path where you want to save the events.
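For reference, a minimal spark-defaults.conf could look like the following; the event log directory is just an example path, replace it with your own:

```
spark.eventLog.enabled           true
spark.eventLog.dir               file:///tmp/spark-events
spark.history.fs.logDirectory    file:///tmp/spark-events
```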

2.png

Now we can start the history server from /sbin, the same path from which the master, the workers, etc. are started.

3.png

We will start it with :

./start-history-server.sh

4.png

If we open the log file indicated in the output, we will see that it has started correctly, along with its URL.

5.png

If we access the url we will see the history server.

6.png

Note:

This may seem obvious, but for the plugin to work you will need the master server to be active, as well as applications that are running or have already finished, since the metrics are taken from the applications, specifically from their executors.
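As a sketch of what the plugin does internally, executor metrics can be read from Spark's monitoring REST API and turned into per-executor module name/value pairs. The code below is illustrative, not the plugin's actual implementation; the sample record and exact module naming are assumptions:

```python
import json
from urllib.request import urlopen

def executor_modules(executors):
    """Turn the executor list returned by the REST API into
    (module_name, value) pairs: executor id plus the metric it reports."""
    metrics = ["rddBlocks", "memoryUsed", "diskUsed", "activeTasks",
               "failedTasks", "completedTasks", "totalTasks", "totalDuration",
               "totalInputBytes", "totalShuffleRead", "totalShuffleWrite",
               "maxMemory"]
    return [(f"{ex['id']} {m}", ex[m]) for ex in executors for m in metrics]

def fetch_executors(base_url, app_id):
    """Query Spark's monitoring REST API: port 4040 for running apps,
    18080 for the history server."""
    with urlopen(f"{base_url}/api/v1/applications/{app_id}/executors") as resp:
        return json.load(resp)

# Offline example with a record shaped like Spark's response:
sample = [{"id": "driver", "rddBlocks": 0, "memoryUsed": 1024, "diskUsed": 0,
           "activeTasks": 1, "failedTasks": 0, "completedTasks": 5,
           "totalTasks": 6, "totalDuration": 1200, "totalInputBytes": 2048,
           "totalShuffleRead": 0, "totalShuffleWrite": 0, "maxMemory": 434031820}]
print(executor_modules(sample)[0])  # ('driver rddBlocks', 0)
```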

 

Plugin general parameters

 

./pandora_spark -i <ip-with-port> [ -g <group> ] [ --data_dir <data dir> ]

If the execution is correct we will see a 1.

7.png

If you want to get data from running applications, enter the IP with port 4040; if you want data from finished applications, enter port 18080.

Plugin specific parameters

The plugin has the following parameters:

Parameter Description
-i <ip-with-port>, --ip <ip-with-port>: IP with port (mandatory).
-g GROUP, --group GROUP: Pandora FMS target group (optional).
--data_dir DATA_DIR: Pandora FMS data directory. By default it is /var/spool/pandora/data_in/ (optional).

Help example:

8.png

Configuration in Pandora

Installation from the console

To register the plugin, from the console, go to the "register plugin" section.

register_plugin.png

Click on select file.

register_plugin2.png

Select the .pspz2 file containing the plugin.

register1_spark.png


Once uploaded, a message will be displayed indicating that it has been successfully uploaded.

register2_spark.png

Once the plugin is registered, we will see it in the plugins section.

serversingles.png

In the parameters section, all the parameters contained in the plugin will be displayed, although only the IP parameter is mandatory.

register3_spark.png

Below we can assign the required value to each macro.

register4_plugin.png

 

The best way to manage server plugins in Pandora is from "/usr/share/pandora_server/util/plugin", so we will move the plugin to that path.

Remember: you have to install the dependencies that the Requests module needs on your system; this is explained in the configuration section.

We change into that directory with:

cd /usr/share/pandora_server/util/plugin/

We run the plugin to see that it works:

./pandora_spark -i <ip-with-port> [ -g <group> ] [ --data_dir <data dir> ]

7.png

 

As a server plugin

Go to servers > plugins:

image-1629974405286.png

Click in add:

image-1629974430627.png

We enter a name and description of our choice:

9.png

We enter as command the path to the plugin and, as parameters, the same ones used when running it from the terminal; the "_field_" fields are macros defined below.

10.png

For each macro we enter the description we prefer and, as value, our IP data.

11.png

Once this is done, if we execute the plugin from the terminal, the agent with the modules will have been created.

 

Modules generated

An agent will be created for each application on our server, with data for all of its executors; the name of each module is composed of the executor id plus the metric it reports.

Executor modules

Module name
id
hostPort
rddBlocks
memoryUsed
diskUsed
activeTasks
failedTasks
completedTasks
totalTasks
totalDuration
totalInputBytes
totalShuffleRead
totalShuffleWrite
maxMemory

sparkmodulos.png
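As an illustration of how executor metrics become Pandora FMS modules, here is a sketch of building an agent data XML in Python. The agent name and values are hypothetical, and this assumes the standard data XML format that server plugins drop into the data directory:

```python
import xml.etree.ElementTree as ET

def build_agent_xml(agent_name, modules):
    """Build a Pandora FMS agent data XML with one <module> per metric."""
    root = ET.Element("agent_data", {"agent_name": agent_name, "version": "1.0"})
    for name, value in modules:
        mod = ET.SubElement(root, "module")
        ET.SubElement(mod, "name").text = name
        ET.SubElement(mod, "type").text = "generic_data"
        ET.SubElement(mod, "data").text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical agent and module, named after the scheme described above:
xml_doc = build_agent_xml("spark-app-001", [("driver memoryUsed", 1024)])
print(xml_doc)
```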