What is a Data Lake? Learn all about it and Big Data

Data! More data! What is a Data Lake?

Big Data is not only a more or less fashionable marketing term; it also stands for a quite clear concept: the accumulation and processing of enormous amounts of data in order to take advantage of the knowledge they may contain. So far so good: it’s easy to describe (not so easy to do, though).

Now, there are several ways to store and take advantage of that enormous accumulation of data. Traditionally, companies have stored their data in a so-called Data Warehouse; however, a newer approach, closer to the concept of Big Data, has been gaining followers in recent years: the Data Lake.

But what is a Data Lake?

A Data Lake is a data repository in which data is stored “raw”, with hardly any processing, so that it can be used later, whenever it is considered appropriate. If we compare data to oil, we could say that in a Data Lake the data is stored just as it “comes out of the ground”, without “refining”.

Data Lakes are fed by all types of data, from heterogeneous sources and with different structures (unstructured data, but also structured data). The key concept is “storage”: the idea is to save the data so that it can be processed and used when necessary.

Now, it is not as simple as throwing the data into a container. Each element of the Data Lake receives an identifier and extended metadata tags, so that it can be easily identified and retrieved. However, as we will see below, this treatment is much more basic than the one applied to the data in a Data Warehouse.
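To make the idea of “an identifier plus metadata tags” more concrete, here is a minimal sketch in Python. It is only an illustration: the TinyDataLake class, its method names and the sample tags are all hypothetical, and a real Data Lake would sit on top of large-scale storage rather than an in-memory dictionary.

```python
import uuid
from datetime import datetime, timezone


class TinyDataLake:
    """Toy in-memory stand-in for a Data Lake (hypothetical, for illustration only)."""

    def __init__(self):
        # identifier -> (raw payload, metadata tags)
        self._objects = {}

    def put(self, raw: bytes, **tags) -> str:
        """Store the payload untouched and return its newly assigned identifier."""
        object_id = str(uuid.uuid4())
        metadata = {"ingested_at": datetime.now(timezone.utc).isoformat(), **tags}
        self._objects[object_id] = (raw, metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Return the raw payload exactly as it was stored."""
        return self._objects[object_id][0]

    def find(self, **tags) -> list:
        """Return the identifiers of all objects whose metadata matches every given tag."""
        return [
            object_id
            for object_id, (_, metadata) in self._objects.items()
            if all(metadata.get(key) == value for key, value in tags.items())
        ]


lake = TinyDataLake()

# Two very different payloads go in as they are; only the tags describe them.
log_id = lake.put(b"2024-01-01 12:00:00 ERROR disk full", source="web-server", format="text")
json_id = lake.put(b'{"user": 42, "action": "login"}', source="mobile-app", format="json")

# Later on, the tags are what make the raw data findable again.
web_ids = lake.find(source="web-server")
```

Note that nothing here parses or validates the payloads: the only “treatment” each element receives is an identifier and a handful of tags.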

How is a Data Lake different from a Data Warehouse?

We could say that the main difference lies both in the quantity of the data and in how much it is “refined”.

In a Data Warehouse, the data is structured and selected according to its usefulness; only the data that will actually be used for the specific objectives being pursued has a place. In addition, it is processed beforehand so that the system can use it and extract useful information. As we said, in a Data Warehouse the “refining” is much more exhaustive than in a Data Lake.

The Data Warehouse is fed with data depending on its usefulness for a specific purpose and, in addition, gives that data a specific format so that it can be analysed. The objective is usually to answer a specific question or series of questions, and the answers are delivered in the form of reports.

For example, a Data Warehouse can help a company understand customer demographics and identify buying patterns, with the goal of directing marketing efforts in one direction or another. Or it can be used to identify the users who are most likely to leave for a competitor, with the aim of offering them incentives to remain customers.
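The warehouse approach described above (select only the useful data, give it a fixed format, then answer a specific question with a report) can be sketched in a few lines of Python using the standard sqlite3 module. The sample events, the sales table and the to_row helper are invented for this illustration; a real Data Warehouse would use a dedicated platform and far richer loading pipelines.

```python
import sqlite3

# Hypothetical raw sales events; only some of the fields matter for the report.
raw_events = [
    {"customer": "ana", "amount": "19.90", "country": "ES", "user_agent": "Mozilla/5.0"},
    {"customer": "bob", "amount": "5.00", "country": "FR", "user_agent": "curl/8.0"},
    {"customer": "ana", "amount": "30.10", "country": "ES", "user_agent": "Mozilla/5.0"},
]

conn = sqlite3.connect(":memory:")

# The structure is decided up front, before any data is loaded.
conn.execute(
    "CREATE TABLE sales ("
    "customer TEXT NOT NULL, country TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
)


def to_row(event):
    """Keep only the fields the report needs and convert the amount to integer cents."""
    return (event["customer"], event["country"], round(float(event["amount"]) * 100))


# "Refining" happens before storage: the user_agent field is simply discarded.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [to_row(e) for e in raw_events])

# The report answers one specific question: how much was sold per country?
report = conn.execute(
    "SELECT country, SUM(amount_cents) FROM sales GROUP BY country ORDER BY country"
).fetchall()
```

The trade-off is visible even at this scale: the report is quick and simple to produce, but any field that was dropped during loading is no longer available for future questions.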

A Data Lake works in a different way. It is a huge “lake” in which, as we said, the data is stored with only very basic pre-treatment, with the sole aim of being able to retrieve it when processing and analysis become necessary. Thus, a Data Lake can accommodate many different types of data, from different sources and in different formats. This requires, of course, enormous storage capacity, often more than in the case of a Data Warehouse (this is one of the main reasons why the Data Lake is usually considered closer to the concept of Big Data than the Data Warehouse).
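As a rough illustration of this “store now, process when needed” idea (sometimes called schema-on-read), the following Python sketch keeps payloads in their original formats and only parses them at the moment an analysis asks for them. The raw_store list, the sample payloads and both helper functions are assumptions made up for this example.

```python
import csv
import io
import json

# Raw payloads kept exactly as they arrived, each with a minimal format tag
# (hypothetical samples).
raw_store = [
    {"format": "json", "payload": '{"user": "ana", "amount": 19.9}'},
    {"format": "csv", "payload": "user,amount\nbob,5.0\ncarla,12.5\n"},
]


def read_records(entry):
    """Parse one raw entry into a list of dict records, only at read time."""
    if entry["format"] == "json":
        return [json.loads(entry["payload"])]
    if entry["format"] == "csv":
        return list(csv.DictReader(io.StringIO(entry["payload"])))
    raise ValueError(f"unsupported format: {entry['format']}")


def total_amount(store):
    """Interpret the raw data for one particular question: the sum of all amounts."""
    return sum(
        float(record["amount"])
        for entry in store
        for record in read_records(entry)
    )


total = total_amount(raw_store)
```

Because nothing was discarded or reshaped at storage time, a different question tomorrow only needs a different reading function, not a new loading process.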

The different structures of a Data Lake and a Data Warehouse mean that each option offers different advantages and disadvantages. Data Lakes are often said to be more flexible and agile (but also more extensive) than Data Warehouses, which are more structured and more efficient (but also more rigid and less adaptable).

Both are different ways of storing and organizing large amounts of data and, therefore, each option may be more or less suitable depending on the objectives to be achieved. In addition, the two options are not mutually exclusive.

And now that we know what a Data Lake is, how about spending a few minutes discovering Pandora FMS?

Pandora FMS is not a Data Lake, nor is it a Data Warehouse. However, it is another type of tool that can also offer great benefits to a company or organization. Pandora FMS is flexible monitoring software, capable of monitoring devices, infrastructures, applications, services and business processes.

Want to get to know it a lot better? Click here: https://pandorafms.com/

You can also send us any questions you may have about Pandora FMS, using the contact form at the following address: https://pandorafms.com/contact/

Our Pandora FMS team will be happy to assist you!

Pandora FMS team

Pandora FMS’s editorial team is made up of a group of writers and IT professionals with one thing in common: their passion for computer system monitoring.