Let’s talk about the BGP protocol and its monitoring
BGP Protocol: Relevance, Basic Concepts and Monitoring
The BGP protocol has become a fundamental part of the operation and performance of the Internet.
It follows that BGP monitoring deserves a place whenever we monitor applications or services that rely on the Internet for their communication.
However, when it comes to BGP, many of those responsible for technology feel that it is a complex subject, that it does not really need to be addressed, that there is little they can do about it, or that it is the responsibility of the contracted ISP.
Now, if we step back from the protocol for a moment, we will see that we have always been concerned with monitoring the WAN, whatever communications solution is in place.
MPLS links, VPN-based connections, protocols such as OSPF or EIGRP, broadband network connections: all of these have posed challenges for a monitoring platform.
In fact, in this same blog we recently published an article on WAN Monitoring where we analysed the challenges associated with the implementation of the Internet-based model.
So, since the BGP protocol is a piece of our WAN environment, we should at least consider whether a BGP monitoring project is warranted.
The reluctance described above rests on some false beliefs, among them:
–ISP problem: The general view of the BGP protocol is that it is used by ISPs to resolve the issue of the routes that packets will follow. Therefore, if there is a problem in BGP it is the problem of the ISPs and there is little we can do about it.
This view is changing because WAN architecture is changing. Designs have evolved with the adoption of cloud technologies, the use of more than one ISP to balance traffic, and the fact that more and more companies' core businesses depend on the Internet.
–BGP is a very good protocol and does not fail: This is certainly a very controversial point. The BGP protocol has proven to be very efficient, capable of adapting well to the growth of the Internet and providing a good response to security challenges, but it does fail.
Many analysts claim, of course, that the erroneous behaviour that we can observe in the implementations of the BGP protocol is due to the not always healthy relationship between ISPs rather than to the nature of the protocol itself.
Check this article published by Ivan Pepelnjak.
Having clarified these two points, we understand that BGP monitoring is an issue that we cannot ignore and we invite you to think about the challenges it implies.
As far as BGP monitoring is concerned, we believe there are three main challenges.
The first is the need to develop a certain level of internal expertise in the BGP protocol.
This knowledge will allow us to determine whether or not a problem is related to BGP, and to develop technical support procedures for following up on such problems.
The second challenge is visibility of the WAN platform. Here we are talking not only about the visibility of our internal platform, but also about achieving visibility over the functioning of the ISPs' BGP nodes.
In fact, to take on this visibility challenge, tools such as BGPlay have emerged that show how the routes announced for a given prefix change over time between autonomous systems on the Internet.
These applications are useful; however, BGP monitoring should establish an information-gathering process that is also integrated into the monitoring of our entire platform, especially the monitoring of our applications.
The third challenge is to integrate BGP monitoring into the monitoring scheme of our entire platform so that it does not become a merely informative exercise.
Having said that, we now propose to review the problems that may arise in the application of the BGP protocol, to identify where and how the monitoring platforms could meet certain needs.
But first, we must look at some concepts associated with BGP.
How does BGP work?
Here we will try to sum up the functioning of BGP, but first we must say that BGP is a broad and super interesting subject that we recommend you to study in depth.
Let's start by remembering that BGP (Border Gateway Protocol) is a routing protocol documented in RFC 4271. Its messages travel over TCP (layer 4 in the OSI model) on port 179, so BGP itself operates as an application on top of the transport layer.
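To make the "BGP over TCP" idea more concrete, here is a minimal sketch in Python of how the fixed 19-byte header that RFC 4271 puts in front of every BGP message could be decoded. The function and constant names are our own, and the sketch only covers the four message types defined in RFC 4271; it is an illustration, not a BGP implementation.

```python
import struct

# Every BGP message starts with a 16-byte marker (all ones),
# a 2-byte length and a 1-byte type (RFC 4271, section 4.1).
BGP_MARKER = b"\xff" * 16
BGP_MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_bgp_header(data: bytes) -> tuple[int, str]:
    """Return (message length, message type name) for the header in data."""
    if len(data) < 19:
        raise ValueError("a BGP header is 19 bytes long")
    marker, length, msg_type = struct.unpack("!16sHB", data[:19])
    if marker != BGP_MARKER:
        raise ValueError("marker field is not all ones")
    if not 19 <= length <= 4096:
        raise ValueError(f"invalid message length {length}")
    return length, BGP_MESSAGE_TYPES.get(msg_type, f"UNKNOWN({msg_type})")


# A KEEPALIVE is just the header: length 19, type 4.
keepalive = BGP_MARKER + struct.pack("!HB", 19, 4)
print(parse_bgp_header(keepalive))
```

A KEEPALIVE is the simplest case because it carries nothing after the header; OPEN, UPDATE and NOTIFICATION messages add their own fields after these 19 bytes.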
What does this routing do and between which entities?
Well, BGP generates routes for IP packets between Autonomous Systems.
An autonomous system (AS) is a group of networks that rely on the same entity or network operator (usually an ISP) to send and receive Internet traffic.
These operators receive from a governing body an autonomous system number (ASN), which is 16 or 32 bits long. Let's see an example of the structure of an ISP in the following image:
Description: Example of Autonomous System Structure
In the picture, our network would be one of the user networks of the AS 100 autonomous system.
The ISP will then have one or more BGP routers with which it connects to other BGP routers from other AS (AS 200 and AS 300), as well as a structure composed of routers that do not necessarily apply the BGP protocol.
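As a small aside on AS numbers, the following Python sketch classifies an ASN as 16-bit or 32-bit, shows it in the "asdot" notation often used for 32-bit values, and flags the ranges reserved for private use. The function name and the returned dictionary layout are assumptions made for illustration.

```python
def describe_asn(asn: int) -> dict:
    """Summarise a few basic properties of an AS number."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("an ASN must fit in 32 bits")
    fits_16_bits = asn <= 0xFFFF
    # Private-use ranges: 64512-65534 (16-bit) and 4200000000-4294967294 (32-bit).
    private = 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294
    # asdot notation writes a 32-bit ASN as <high 16 bits>.<low 16 bits>.
    asdot = str(asn) if fits_16_bits else f"{asn >> 16}.{asn & 0xFFFF}"
    return {
        "asn": asn,
        "size": "16-bit" if fits_16_bits else "32-bit",
        "asdot": asdot,
        "private": private,
    }


for example in (100, 64512, 65536):
    print(describe_asn(example))
```

In the picture above, AS 100, AS 200 and AS 300 would all be ordinary 16-bit numbers; a check like this is mainly useful when an inventory mixes old and new style ASNs.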
The basic functions of a BGP router are:
- To advertise its subscribers' networks.
- To propagate information on possible routes.
- Based on this information, to choose the most convenient route for each particular traffic.
It is important to note that BGP routers have information about their users’ networks in routing tables.
By default, a router must share or announce the information contained in its routing table with its neighbouring nodes. This is done based on sessions that are defined between the BGP nodes. Nodes connected by a session are called neighbouring nodes.
However, we must clarify that filters are usually applied to the information sent and received by a BGP router. These filters are defined according to the routing and security policies that each ISP chooses to apply.
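To give an idea of what such a filter does, here is a minimal Python sketch of an inbound prefix filter: an announcement is accepted only if it falls inside a network we expect from that neighbour and is not more specific than a maximum prefix length. The function name, the allowed list and the /24 limit are illustrative assumptions, not the policy of any particular ISP or router vendor.

```python
import ipaddress


def accept_prefix(prefix: str, allowed: list[str], max_length: int = 24) -> bool:
    """Return True if the announced prefix passes the (hypothetical) filter."""
    announced = ipaddress.ip_network(prefix)
    if announced.prefixlen > max_length:
        return False  # too specific, rejected regardless of the allowed list
    for entry in allowed:
        permitted = ipaddress.ip_network(entry)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if announced.version == permitted.version and announced.subnet_of(permitted):
            return True
    return False


# Documentation prefixes stand in for a customer's networks.
ALLOWED_FROM_NEIGHBOUR = ["198.51.100.0/24", "203.0.113.0/24"]

for candidate in ("203.0.113.0/24", "203.0.113.128/25", "192.0.2.0/24"):
    print(candidate, accept_prefix(candidate, ALLOWED_FROM_NEIGHBOUR))
```

Real routers express this with prefix lists or route maps; the point of the sketch is only that a route missing from a neighbour's table may have been filtered on purpose rather than lost.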
Here it is important to point out the radical difference between BGP and other routing protocols.
While other protocols are usually driven by fairly simple routing policies that only consider the need to find the optimal route, BGP, as a result of the relationship between ISPs and the large volume of traffic, tends to work based on routing policies that can be very complex.
In fact, these policies can take a considerable set of parameters into account. To mention just a few: the weight and length of the route, the origin of the route, the preferred neighbouring router, and so on.
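The way these parameters combine can be sketched as an ordered comparison. The Python example below is a deliberately simplified version of BGP route selection: it prefers the highest weight (a local, vendor-specific attribute), then the highest local preference, then the shortest AS path, then the lowest origin code, then the lowest MED. Real implementations apply more steps and only compare MED under certain conditions, so treat the class and function names, and the ordering itself, as illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    neighbor: str                 # label for the neighbour that sent the route
    as_path: tuple[int, ...]      # ASNs the announcement has crossed
    local_pref: int = 100         # higher is preferred
    weight: int = 0               # vendor-specific, higher is preferred
    origin: int = 0               # 0 = IGP, 1 = EGP, 2 = INCOMPLETE; lower is preferred
    med: int = 0                  # lower is preferred


def best_route(routes: list[Route]) -> Route:
    """Pick one route using a simplified, ordered set of tie-breakers."""
    return min(
        routes,
        key=lambda r: (-r.weight, -r.local_pref, len(r.as_path), r.origin, r.med),
    )


via_as200 = Route("AS 200", (200, 400))
via_as300 = Route("AS 300", (300, 500, 400))
print(best_route([via_as200, via_as300]).neighbor)

# A policy that raises local preference overrides the shorter AS path.
via_as300_preferred = Route("AS 300", (300, 500, 400), local_pref=200)
print(best_route([via_as200, via_as300_preferred]).neighbor)
```

The second call shows why BGP behaviour can surprise us: the longer path wins simply because policy says so, which is exactly the kind of decision a monitoring process needs to be aware of.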
Now, let’s move on to the problems we may encounter.
The basic problems can be evaluated according to the fundamental processes of the protocol.
In the process of creating a session between two BGP nodes, several things can go wrong: the configured IP addresses of the nodes may not match, the autonomous system numbers (ASNs) may not match, and TCP port 179, which BGP uses, may even be blocked by one of the devices along the way.
Adjacency problems can be monitored with a systematic check of active sessions between BGP nodes. In fact, Pandora FMS has a plugin that verifies BGP sessions via SNMP.
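These session problems can be sketched as a simple consistency check between what each side has configured. The Python example below is hypothetical: the PeerConfig class, its field names and the port_179_reachable flag are our own, and in practice the values would have to come from the router configurations and from a separate reachability test.

```python
from dataclasses import dataclass


@dataclass
class PeerConfig:
    local_ip: str       # address this router uses for the session
    local_asn: int      # ASN this router belongs to
    neighbor_ip: str    # address it expects the neighbour to use
    neighbor_asn: int   # ASN it expects the neighbour to belong to


def diagnose_session(a: PeerConfig, b: PeerConfig, port_179_reachable: bool = True) -> list[str]:
    """List the reasons why a BGP session between a and b would not come up."""
    problems = []
    if a.neighbor_ip != b.local_ip:
        problems.append(f"A expects neighbour {a.neighbor_ip}, but B uses {b.local_ip}")
    if b.neighbor_ip != a.local_ip:
        problems.append(f"B expects neighbour {b.neighbor_ip}, but A uses {a.local_ip}")
    if a.neighbor_asn != b.local_asn:
        problems.append(f"A expects AS {a.neighbor_asn}, but B is in AS {b.local_asn}")
    if b.neighbor_asn != a.local_asn:
        problems.append(f"B expects AS {b.neighbor_asn}, but A is in AS {a.local_asn}")
    if not port_179_reachable:
        problems.append("TCP port 179 is blocked between the two nodes")
    return problems


router_a = PeerConfig("192.0.2.1", 100, "192.0.2.2", 200)
router_b = PeerConfig("192.0.2.2", 200, "192.0.2.1", 101)  # wrong ASN configured for A
for problem in diagnose_session(router_a, router_b):
    print(problem)
```

An empty list means the two configurations agree; it does not prove the session is up, which is why checking the live session state is still needed.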
Check the Pandora FMS BGP plugin through the following link
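Independently of how that plugin is implemented, the idea of checking sessions through SNMP can be sketched in a few lines of Python. The standard BGP4-MIB exposes the state of each peer in the bgpPeerState column (OID 1.3.6.1.2.1.15.3.1.2), where the value 6 means "established". The example below parses text in the style printed by a numeric snmpwalk; the sample output, the accepted line format and the function names are assumptions made for illustration.

```python
import re

# bgpPeerState column of the BGP4-MIB, indexed by the peer's IPv4 address.
BGP_PEER_STATE_OID = "1.3.6.1.2.1.15.3.1.2"
PEER_STATES = {1: "idle", 2: "connect", 3: "active",
               4: "opensent", 5: "openconfirm", 6: "established"}

# Accepts both "INTEGER: 6" and "INTEGER: established(6)" value styles.
LINE_RE = re.compile(
    r"^\.?" + re.escape(BGP_PEER_STATE_OID)
    + r"\.(\d+\.\d+\.\d+\.\d+) = INTEGER: (?:\w+\()?(\d+)\)?\s*$"
)


def parse_peer_states(walk_output: str) -> dict[str, str]:
    """Map each peer address found in the walk output to its state name."""
    states = {}
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            peer, code = match.group(1), int(match.group(2))
            states[peer] = PEER_STATES.get(code, f"unknown({code})")
    return states


def sessions_down(states: dict[str, str]) -> list[str]:
    """Return the peers whose session is not in the established state."""
    return sorted(peer for peer, state in states.items() if state != "established")


# Hypothetical output for two peers, one of them stuck in the active state.
SAMPLE_WALK = """\
.1.3.6.1.2.1.15.3.1.2.192.0.2.1 = INTEGER: 6
.1.3.6.1.2.1.15.3.1.2.198.51.100.7 = INTEGER: active(3)
"""
states = parse_peer_states(SAMPLE_WALK)
print(states)
print(sessions_down(states))
```

Any peer returned by sessions_down is a candidate for an alert, since a session that stays in idle, connect or active is not exchanging routes.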
If the adjacencies are working correctly, the next thing that may fail and should therefore be monitored is the propagation of the information.
In this sense, an error can be generated when the BGP node does not include the network prefix in the BGP routing table.
It should be noted that this is possible because unlike other routing protocols, a BGP router will not automatically include in its routing table the information of the networks that are directly connected, but will depend on how it is configured.
Therefore, it is useful to have a process that verifies that a network prefix is or is not in the BGP routing table of the corresponding node.
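Such a verification can be sketched as follows, assuming we have already obtained the list of prefixes in the node's BGP table by some other means (for example from the router's command line or an API). The Python function below, whose name and return values are our own, distinguishes between a prefix that is present as announced, one that is only covered by a larger aggregate, and one that is missing.

```python
import ipaddress


def prefix_status(prefix: str, table_prefixes: list[str]) -> str:
    """Return 'present', 'covered' or 'missing' for prefix in a BGP table dump."""
    wanted = ipaddress.ip_network(prefix)
    table = [ipaddress.ip_network(entry) for entry in table_prefixes]
    if wanted in table:
        return "present"  # exact match on network and prefix length
    for entry in table:
        # Only a less specific route of the same IP version contains the prefix.
        if entry.version == wanted.version and wanted.subnet_of(entry):
            return "covered"
    return "missing"


# Hypothetical table contents using documentation prefixes.
BGP_TABLE = ["198.51.100.0/24", "203.0.113.0/24", "2001:db8::/32"]

print(prefix_status("203.0.113.0/24", BGP_TABLE))
print(prefix_status("203.0.113.128/25", BGP_TABLE))
print(prefix_status("192.0.2.0/24", BGP_TABLE))
```

Telling "covered" apart from "present" matters because traffic may still flow through the aggregate while the more specific route we meant to announce is absent.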
Then, even if the prefix is included in the routing table, the propagation process can fail at any point on the Internet, so it may be worth checking the propagation periodically. Several applications exist that, given a network prefix, verify its propagation node by node. Of course, the results obtained here should be integrated with the monitoring of our entire platform.
On the other hand, we know that propagation can also present problems internally, since our router can stop receiving route updates or can receive more information than its physical capabilities allow it to handle.
Here the monitoring platform must include the router that connects us to the contracted Internet service, treated as a piece of equipment: its memory, its CPU usage percentage, and so on.
SNMP can be a good ally at this point; for each router model we need to check whether the object identifiers (OIDs) in the associated MIB provide the right metrics to monitor these situations.
We hope that this article will encourage you to think of the BGP protocol as an integral part of your web applications and services, and to evaluate the real need for a possible monitoring scheme.
In any case, it is evident that the key piece to undertake a BGP monitoring project, whatever its scope, is to have a monitoring tool for general purposes such as Pandora FMS.