In Star Trek: TNG, when someone needs a piece of data, they do not open an SQL console or spin up a MongoDB cluster. They talk to LCARS, the Federation system with the gentle voice of Nurse Chapel, which instantly returns technical manuals, diplomatic archives, sensor readings and even Captain Picard’s Earl Grey recipe. In 2026, LCARS is still science fiction, but its core idea, storing and retrieving information of radically different kinds with the right tool, defines the challenge in front of us: there are as many types of databases as there are types of questions we want to ask that data.
And choosing the wrong database is like storing secret CIA files in a spreadsheet. The system will let you do it, but you will pay the bill later in performance, scalability, maintenance, cloud bills and/or tearful early mornings in front of an unresponsive server.
That is why this complete guide will cover the main types of databases, their real-world examples and where they shine or fall apart.
The goal is not to sell you the latest trend, but to make sure that, until LCARS becomes real, you leave here with clear knowledge and criteria to choose the right solution for the challenge in front of you.

What a database is

I hate clichés, but it is true that every journey begins with a first step, and this one is necessary to build a truly complete guide.
A database is, essentially, an organised set of persistent information, designed to store, query and modify the data it contains reliably.
So far, that is the textbook definition, but the interesting part is not to confuse three elements that are often mixed together in far too many conversations:

  • The database itself. In other words, the data organised according to a set of rules.
  • The DBMS or database management system. This is the software that does the “dirty work” required for it to function, such as managing physical storage, queries, concurrency, security or backups. PostgreSQL, MongoDB or Oracle Database are DBMSs.
  • The data model. Or the theory behind how information is specifically organised, which may be relational, document-based, graph-based, vector-based… This is the logic implemented by the DBMS.

Or, more simply and from a bird’s-eye view:

  • The model is the architect’s blueprint.
  • The DBMS is the builder with the tools who constructs and maintains it.
  • The database is the house where information lives.

Understanding this means choosing better and avoiding sterile debates such as: “Is SQL better than MongoDB?”.
That question mixes a language, a model and a product in just five words, so before comparing anything or making a decision, each concept needs to be put in its place.
If we want an academic view of the term without builders or Star Trek, IBM has a good reference in case we need to go deeper into the theoretical foundation. But for what matters to us, let’s keep painting the bigger picture.

Main types of databases

Here we find a common challenge: there is no single taxonomy. Depending on the angle we are interested in, there are several ways to group databases, such as:

  • By data model: relational, document-based, graph…
  • By architecture: monolithic (centralised), distributed, federated, etc.
  • By deployment: on-premise, cloud, managed…
  • By use case: transactional, analytical, cache and more.

This is the right thing to point out if you want to address the topic accurately, but of course, it does not help much when making technical decisions, which is what interests us here.
For this purpose, the most useful classification is by data model, complemented by modern families that do not fit into the old “SQL vs NoSQL” dichotomy.
A good reference is the general classification by model maintained by IBM, which is fairly aligned with what we are going to cover, but here we will go deeper than Big Blue so that you have a truly definitive guide.
This will be the map for the journey: relational databases, NoSQL in its different forms (document, key-value, column, graph), time series, vector databases, NewSQL, cloud or DBaaS and, finally, analytical stores and data warehouses.
So let’s start at square one by examining…

1. Relational databases

This is what first comes to mind for any technician when they hear about databases, and for good reason.
The relational model was proposed by E. F. Codd at IBM in 1970, and relational databases have been with us ever since. Today, they remain the default first choice for most applications.
They organise information in tables with rows (records) and columns (attributes). Each table usually has a primary key that identifies its records and may also have foreign keys that link it to other tables.
On top of that fundamental structure, and until we can talk to LCARS, queries are made with SQL (Structured Query Language), a declarative language where we tell the engine what we want and it takes care of how to get it.
Its greatest virtue is that it provides the so-called ACID guarantees (Atomicity, Consistency, Isolation, Durability). Or, to put it in plain terms, if a bank transfer subtracts money from one account and adds it to another, either both things happen, or neither does.
This ensures that you are never left with only half the process executed.
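To make the bank transfer concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the amounts and the transfer helper are assumptions made up for illustration; the point is only that a failed transfer leaves no half-applied change behind.

```python
import sqlite3

def transfer(conn, source, target, amount):
    """Move `amount` between two accounts: both updates happen, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so that a failing debit has something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was rolled back.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # account 1 has enough funds
failed = transfer(conn, 1, 2, 500)  # account 1 would go negative
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After both calls, the second transfer has left no trace: account 2 was credited inside the transaction, but the rollback removed that credit when the debit failed.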
Typical use cases? There are many, surrounding us in our daily office life and instant coffee routine: ERP, CRM, financial systems, inventories, billing, transactional applications and, in general, practically any project where data is well structured and consistency is the main directive to consider.
There are both Open Source and proprietary options, and examples of this type of database include MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server or MariaDB.

2. NoSQL databases

NoSQL does not mean saying no to SQL, paradoxical as that may sound (in fact, many NoSQL engines offer similar query languages). The term, popularised in 2009, is better understood as Not Only SQL.
In other words, a family of non-relational databases designed for cases where the tabular model falls short.
And when does that happen?

  • When data does not have a fixed schema.
  • When it is necessary to scale horizontally by distributing load across many nodes.
  • When availability and latency are prioritised over immediate consistency, according to the famous CAP theorem.

This theorem, as if it were a kind of Heisenberg principle, states that in the event of a network partition, a distributed system cannot simultaneously guarantee consistency and availability.
To make it a little clearer, we use NoSQL when there is a lot of information, it constantly changes shape, arrives at high speed and fits better when distributed across several nodes.
For the constant chaos of the Internet, with millions of people liking a video on TikTok (or whatever it may have), or with everyone watching series on Netflix at the same time, NoSQL can instantly store those likes or, in Netflix’s case, distribute data across thousands of servers.
Using a traditional relational database without replicas, partitioning, cache or sharding, the application could become blocked by the traffic surge and the conflict between incoming data.
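As a rough idea of what sharding means, the sketch below spreads keys across several nodes with a stable hash. The function names and the "video:N" keys are hypothetical, and the plain modulo is a simplification: real engines tend to use schemes such as consistent hashing so that adding a node does not reshuffle almost every key.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def distribute(keys, shard_count):
    """Group keys by the shard that would store them."""
    shards = {i: [] for i in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards

# Hypothetical workload: likes on a thousand different videos, four nodes.
likes = [f"video:{n}" for n in range(1000)]
shards = distribute(likes, 4)
```

Each node only receives the writes for its own share of the keys, which is what lets the load grow by adding machines rather than by buying a bigger one.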
With these premises, NoSQL is a broad field that groups together very different models, so saying “we use NoSQL” without specifying anything else is like saying “I work with vehicles” without clarifying whether we fly helicopters or pilot submarines.
To avoid falling into that confusing generalisation, it is worth clearly understanding the difference between each subfamily, as well as the differences between SQL and NoSQL when it comes to choosing.

3. Document databases

The name already gives the game away: they store documents, usually in structured formats such as JSON, BSON or XML.
Each document is self-contained, so it contains its own fields and its own structure (instead of all of that being determined by the database) and does not need to look like the one next to it.
This is extremely useful when data has a variable schema or changes frequently.
In real life, this translates into product catalogues, user profiles, content management systems, customer-specific configurations or any application where adding a new field does not mean rewriting half the database.
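A small sketch in plain Python can illustrate the idea without any particular engine. The product documents and the find helper are invented for the example and do not correspond to the API of MongoDB or any other product.

```python
import json

# Three documents in the same "collection", each with its own structure.
products = [
    {"_id": 1, "name": "Laptop", "price": 999, "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "T-shirt", "price": 15, "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "E-book", "price": 9, "format": "epub"},
]

def find(collection, **criteria):
    """Return documents whose top-level fields match every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

# Adding a field to one document requires no change to the others.
products[1]["colour"] = "red"

red_products = find(products, colour="red")
serialised = json.dumps(products[0])  # documents travel naturally as JSON
```

Documents that lack the colour field are simply skipped by the query, which is the flexibility (and the modelling responsibility) this family hands over to the application.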
Examples of this type of database include MongoDB, Couchbase and RavenDB. If you have to run the latter in production, by the way, here is how to monitor RavenDB without losing sleep.

4. Key-value databases

Conceptually, these are the simplest databases because each piece of data is stored as a key-value pair, that is, like a giant dictionary. Like this:

  • The key identifies the data.
  • The value can range from a text string to a binary blob (an image, a PDF…).

Here we are taking the Ferrari out, because the key factor is raw speed.
There are no schemas, no relationships, no complex queries. We ask for a key and the database returns its value at warp 9 speed.
But as nothing in life comes for free, the price is that we should not expect to perform a join or an analytical query, for example. Other families are already there for that; here, we live at lightning speed.
Use cases? Many where the data and its usage demand speed above everything else, such as application caches, user sessions, real-time counters, lightweight queues…
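The "giant dictionary" can be sketched in a few lines. TinyKV below is a hypothetical in-memory class written for this example, with an expiry time per key as a cache or session store would need; it is not how Redis or DynamoDB are implemented.

```python
import time

class TinyKV:
    """In-memory key-value store with optional expiry, in the spirit of a cache."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so that expiry can be tested

    def set(self, key, value, ttl=None):
        """Store a value, optionally expiring after `ttl` seconds."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the value for `key`, or `default` if missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return default
        return value

    def incr(self, key, amount=1):
        """Increment a counter, creating it at zero if missing (clears any expiry)."""
        value = self.get(key, 0) + amount
        self.set(key, value)
        return value
```

Usage is as plain as the model itself: `kv.set("session:42", {"user": "picard"}, ttl=30)` followed by `kv.get("session:42")`. There is no way to ask "which sessions belong to this user" without knowing the keys, and that is exactly the trade-off described above.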
Examples of this type of database include Redis, Riak or Amazon DynamoDB.

5. Column-oriented databases

This label usually groups together both analytical columnar databases and wide-column or column-family stores, although internally they do not solve exactly the same problem.
When you look at them from a distance, they seem like relational databases, but if we put on our close-up glasses, we notice something interesting: they store information by columns instead of by rows.
This difference may seem minor, but it radically changes what the engine can do quickly.
So, when a query only needs a few columns from millions of records, reading by columns avoids loading everything irrelevant from disk.
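The difference between the two layouts can be pictured with ordinary Python structures. This is only a conceptual sketch with invented sample data, and it assumes every row has the same fields; real engines add compression, encoding and on-disk formats on top.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "country": "ES", "amount": 120, "notes": "first order"},
    {"id": 2, "country": "FR", "amount": 80, "notes": "gift wrapped"},
    {"id": 3, "country": "ES", "amount": 200, "notes": "express delivery"},
]

def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Row layout: every record is visited in full, even though only "amount" matters.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: the aggregate reads one list and never touches the others.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; what changes is how much unrelated data has to be read to compute them, and with millions of records that is the whole difference.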
What does this mean in practice and for our ultimate goal of making the best decision? These databases are highly efficient for analytics, aggregations and large volumes of distributed writes.
As a result, they are optimal for large-scale analytics, storing and querying logs, events, telemetry, IoT data or scenarios with intensive writes distributed across nodes.
Examples of this type of database include, among analytical columnar databases, ClickHouse, and among wide-column stores, Apache Cassandra, HBase or ScyllaDB.

6. Graph databases

Relational databases are good at representing relationships, but only as long as those relationships remain relatively “monogamous” or involve a not-too-extensive kind of polygamy. Things get worse when there are many relationships, when they are deep or when they are constantly changing.
In those cases, graph databases step forward.
They represent information as nodes (entities) and edges (relationships between those entities), with properties in both.
The key point is that when we query this type of database, queries travel through the graph by following relationships, rather than scanning tables.
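Following relationships instead of scanning tables can be sketched as a breadth-first traversal over an adjacency list. The "follows" graph and the within_hops function are assumptions for illustration; graph engines expose this through their own query languages rather than through code like this.

```python
from collections import deque

# Hypothetical "who follows whom" graph: node -> outgoing edges.
follows = {
    "picard": ["riker", "data"],
    "riker": ["troi"],
    "data": ["geordi"],
    "troi": [],
    "geordi": ["data"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from `start` in at most `max_hops` edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]  # the starting node is not part of the answer
    return distances

friends_of_friends = within_hops(follows, "picard", 2)
```

The cost of the query depends on how many edges are followed from the starting node, not on the total number of entities stored, which is why this model holds up when relationships get deep.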
What does this mean in the real world? That they are ideal for recommendation engines, dependency analysis, social networks, knowledge modelling, identity management…
Examples of this type of database include Neo4j, Amazon Neptune or ArangoDB.

7. Time series databases

No one escapes the domain of time, and neither does IT management. That is why, in IT, we find data whose nature consists of a sequence of values associated with time. For example, the temperature of a room every minute, the hundreds of coffees per second consumed in the department, stock market prices every millisecond…
On paper, we could store all of that in a relational table, but in practice, we will soon have billions of data points in most cases.
At that point, the relational engine quickly raises the white flag.
For this reason, time series databases are optimised for massive writes, time-based compression and window queries (such as the last minute, the average of the last three hours, last year…).
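Window queries are easier to picture with a tiny example. The sketch below uses invented temperature readings and two hypothetical helpers, one that averages per fixed time bucket and one that averages a trailing window; a real time series engine would do this over compressed storage and at a very different scale.

```python
from collections import defaultdict
from statistics import mean

# (timestamp in seconds, temperature) pairs, made up for the example.
points = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 23.0), (150, 25.0)]

def bucket_average(points, bucket_seconds):
    """Average the values that fall into each fixed-size time bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}

def window_average(points, now, window_seconds):
    """Average the values recorded in the last `window_seconds` up to `now`."""
    recent = [v for t, v in points if now - window_seconds <= t <= now]
    return mean(recent) if recent else None

per_minute = bucket_average(points, 60)
last_minute = window_average(points, now=150, window_seconds=60)
```

Here per_minute groups the readings into the buckets starting at 0, 60 and 120 seconds, and last_minute only looks at the readings taken between second 90 and second 150.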
Because of this, this type of database has become the backbone that supports modern monitoring.
In day-to-day operations, they are used for infrastructure monitoring, IoT, industrial telemetry, capacity planning, financial metrics, etc.
Examples of this type of database include InfluxDB and TimescaleDB.

8. Vector databases

If you have not touched databases for a couple of years, this is the novelty that has changed the landscape the most, mainly because of the omnipresent AI we can no longer escape and which has multiplied its importance and use.
Vector databases store embeddings: multidimensional numerical representations of text, images, audio or any other content that an AI model has “understood” (note the quotation marks around the word). Think of them as the numerical translation that artificial intelligence makes of the world.
The key here is that we can perform semantic similarity searches on those vectors. This way, instead of searching for an exact word, such as “car”, they find results about “automobile”, “vehicle” or “red convertible”, because their vectors are close in space.
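What "close in space" means can be shown with cosine similarity over toy vectors. The three-dimensional embeddings below are invented by hand (real ones come from a model and have hundreds or thousands of dimensions), and the exhaustive search is a simplification: production engines usually rely on approximate nearest-neighbour indexes instead of comparing against every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", made up for illustration only.
index = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "red convertible": [0.8, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.95],
}

def nearest(query_vector, index, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

matches = nearest(index["car"], index, k=3)
```

Searching with the vector for "car" returns the car-related entries and leaves "banana" out, even though none of the stored texts share a word with the query.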
Real-world use cases include semantic recommendations, Retrieval Augmented Generation (RAG) for applications with LLMs, multimedia search, content classification…
And examples of this type of database include Pinecone, Milvus and Weaviate.

9. NewSQL databases

As usually happens in IT, databases present us with a classic dilemma.
Traditional relational engines give us solid transactional guarantees, but they are difficult to scale horizontally, as we have seen. Meanwhile, NoSQL scales well, but again, nothing comes for free, and the tribute it demands is that we give up some of those guarantees.
NewSQL tries to solve the dilemma and let us make the omelette without breaking the eggs, so it keeps SQL and the aforementioned ACID, but takes them to a distributed architecture.
These databases try to reconcile the best of both worlds…

  • Distributing that data across nodes.
  • Managing consensus with algorithms such as Raft or Paxos.
  • Maintaining the illusion of a single relational database.
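Consensus algorithms such as Raft or Paxos rest on a simple piece of arithmetic: a change only counts once a majority of nodes has accepted it. The sketch below illustrates that quorum rule and nothing else; it is not an implementation of either algorithm (no leader election, no log replication), and the function names are hypothetical.

```python
def majority(node_count: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return node_count // 2 + 1

def commit_write(acks: dict) -> bool:
    """A write is committed only if a majority of replicas acknowledged it."""
    acknowledged = sum(1 for ok in acks.values() if ok)
    return acknowledged >= majority(len(acks))

# Three replicas, one of them unreachable: the write still commits.
one_node_down = commit_write({"eu": True, "us": True, "asia": False})

# Two replicas unreachable: no majority, so the write is rejected.
two_nodes_down = commit_write({"eu": True, "us": False, "asia": False})
```

Because any two majorities of the same cluster share at least one node, two conflicting writes cannot both be accepted, which is what allows the cluster to behave like a single database.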

Their real-world use cases include distributed transactional systems, fintech platforms with global presence, multi-region SaaS…
Examples of these databases include CockroachDB (my favourite name), Google Spanner and YugabyteDB.

10. Cloud databases and DBaaS

Yes, I know, more than a data model, this is actually a deployment model. Here, it is worth distinguishing three levels:

  • Self-managed in the cloud. In other words, you install PostgreSQL on an AWS or GCP VM. The cloud gives us the machine, and we take care of everything else ourselves.
  • Managed. Here, the provider manages the engine (patches, backups, providing high availability…), but we still choose the product, such as Amazon RDS, Azure SQL Database…
  • Database as a Service (DBaaS). Here, both the machine and the engine are the provider’s responsibility, offering a database as an API. DynamoDB (which, as we have already seen, is also a key-value database, but AWS offers it in this format), Firestore or MongoDB Atlas work this way.

The use cases are clear and appear when we need rapid scalability, when we are a small team without specialised database administrators, when we carry out global deployments, when we test prototypes where we do not want to worry about hardware…
But in exchange for this convenience, we obviously take on risk.
In this case, provider dependency, variable costs such as pay-per-use and, in some cases, less control over the fine-tuning of the infrastructure we work on.

11. Data warehouses and analytical databases

Let’s imagine the following case. We need to aggregate millions of rows of data because we want to extract trends (OLAP, Online Analytical Processing).
Here, a transactional database, optimised to read and write individual records quickly (OLTP, Online Transaction Processing), is not the best option.
That is what data warehouses and analytical databases are for: engines designed to answer complex queries over long historical periods. Usually, with column-based storage (as we saw above) and aggressive compression.
When our need is Business Intelligence, reports, executive dashboards or historical analysis, this option steps onto the field.
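The two access patterns look like this side by side. The sketch uses Python's built-in sqlite3 module and an invented orders table purely to show the shape of each query; SQLite is not a data warehouse, and at real volumes the second query is the one that calls for an analytical engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY, country TEXT, year INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (country, year, amount) VALUES (?, ?, ?)",
    [("ES", 2024, 100), ("ES", 2025, 150), ("FR", 2024, 80),
     ("FR", 2025, 120), ("ES", 2025, 50)],
)
conn.commit()

# OLTP-style access: fetch one record by its primary key.
one_order = conn.execute(
    "SELECT country, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style access: scan every row and aggregate to extract a trend.
revenue_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```

The first query touches a single row through an index; the second has to read the amount of every order, which is where columnar storage and compression pay off.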
Examples include Snowflake, Google BigQuery and Amazon Redshift.

Comparison of database types

The journey through the different realms of databases has been long, so it is worth stopping and recapping the key points of each one in this table (by the way, as a curiosity, you can see the popularity of each model here).

| Type | Data model | Typical use cases | Advantages | Limitations | Examples |
|---|---|---|---|---|---|
| Relational | Tables with relationships | ERP, CRM, transactions… | ACID, SQL, mature ecosystem | More complex horizontal scaling in traditional models, rigid schema | PostgreSQL, MySQL, Oracle |
| Document | JSON/BSON documents | Catalogues, CMS, profiles… | Flexible schema | More expensive complex queries | MongoDB, Couchbase |
| Key-value | Key-value pairs | Cache, sessions… | Minimal latency | No complex queries | Redis, DynamoDB |
| Columns | Column-based tables | Analytics, logs | Fast selective reads | Write cost | Cassandra, HBase |
| Graphs | Nodes and edges | Fraud, recommendations… | Traversing relationships | More demanding modelling | Neo4j, Neptune |
| Time series | Time-based points | Monitoring, IoT… | Compression, windows | Limited model outside time-based data | InfluxDB, TimescaleDB |
| Vector | Embedding vectors | RAG, semantic search… | Similarity, AI | New model, evolving ecosystem | Pinecone, Milvus, pgvector |
| NewSQL | Distributed relational | Global SaaS, fintech | SQL + horizontal scale | Operational complexity | CockroachDB, Spanner |
| Cloud / DBaaS | Variable | Any cloud-native workload… | Less administration | Vendor lock-in, variable cost | RDS, BigQuery, Atlas |
| Data warehouse | Analytical columnar | BI, reporting | Efficient massive queries | Not for OLTP | Snowflake, Redshift |

How to choose the right type of database

Since this content aims to be practical for our day-to-day work, we have already seen real examples and common use cases for each type of database, which already provides quite a lot of guidance when choosing.
Now we are going to go deeper into how to make that choice, and the first step is to confirm the intuition that many will already have had:
In most situations and organisations, the decision will consist of a mix of database types, because we will have very different needs within the same organisation (such as monitoring, reporting and user management), and each of them calls for the type of database suited to it.
And to move from each of those specific needs to the right family, here is the shortcut. Always start with the question that decides the most (what is the workload for?) and work down from there.

Step 1. What is the dominant purpose of this data workload?

  • Serving an application (operational, day-to-day work, what is known as OLTP) → go to Step 2.
  • Extracting trends from long historical periods (reports, Business Intelligence, what is called OLAP) → data warehouse or analytical columnar database (Snowflake, BigQuery, Redshift, ClickHouse).
  • What matters is the relationships between entities (recommendations, fraud, networks, dependencies) → graphs (Neo4j, Neptune).
  • Data that is a sequence over time (metrics, telemetry, monitoring) → time series (InfluxDB, TimescaleDB).
  • Search by meaning, usually for AI (RAG, semantic similarity, embeddings) → vector (Pinecone, Milvus, Weaviate, pgvector).

Step 2. If it is operational, what shape does the data have?

  • Tabular and well defined, with consistency as the sacred priority:
    • Is scaling on a single server or region enough? → relational (PostgreSQL, MySQL).
    • Do you need SQL and ACID guarantees, but at global scale? → NewSQL (CockroachDB, Spanner).
  • Variable schema that often changes shape → document (MongoDB, Couchbase).
  • You only request the data by its key and want minimal latency (cache, sessions, counters) → key-value (Redis, DynamoDB).
  • Massive writes distributed across many nodes (logs, events, IoT at scale) → wide-column (Cassandra, ScyllaDB).
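The two steps above can be written down as a small lookup function. This is only a sketch of the tree as described here: the function name and the string labels used for purposes and shapes are assumptions, and the result is a starting point for discussion, not a verdict.

```python
def recommend_family(purpose, shape=None, global_scale=False):
    """Walk the two-step tree and return the suggested database family."""
    # Step 1: the dominant purpose of the workload.
    by_purpose = {
        "analytics": "data warehouse or analytical columnar",
        "relationships": "graph",
        "time_series": "time series",
        "semantic_search": "vector",
    }
    if purpose in by_purpose:
        return by_purpose[purpose]
    if purpose != "operational":
        raise ValueError(f"unknown purpose: {purpose!r}")

    # Step 2: for operational workloads, the shape of the data decides.
    if shape == "tabular":
        return "NewSQL" if global_scale else "relational"
    by_shape = {
        "variable_schema": "document",
        "key_lookup": "key-value",
        "massive_distributed_writes": "wide-column",
    }
    if shape in by_shape:
        return by_shape[shape]
    raise ValueError(f"unknown data shape: {shape!r}")
```

For example, `recommend_family("operational", shape="tabular")` returns "relational", while the same call with `global_scale=True` returns "NewSQL". An organisation with several workloads would call it once per workload and, as noted above, usually end up with more than one answer.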

And one more decision, which stands on its own: who operates it?
Whichever family the tree has led you to (relational, document-based or any other), this question is independent of the data model.
In technical jargon, this is called an “orthogonal” decision, which is just a technical way of saying that these are two decisions that do not overlap. Whatever you answer to one, the other remains open.
And it has three levels:

  • You operate it on a virtual machine → self-managed in the cloud (maximum control, maximum work).
  • The provider operates the engine (patches, backups, high availability) → managed (Amazon RDS, Azure SQL).
  • The provider operates everything and serves it to you as a service → DBaaS (DynamoDB, Firestore, MongoDB Atlas): less administration, but greater dependency on that provider.

And before continuing, here is the golden rule that closes the tree.
If in doubt and your data is reasonably structured, start with a relational database.
It is the default option and the one that usually causes the fewest regrets.
That said, and although we will also have deduced it from the above, we need to issue the usual irritating and honest warning.
There is no “best database” or model superior to all the others, only those that are preferable depending on our specific use case.
That is why we can provide a high-level guide tree and the following recommendations, but not a definitive checklist; that would not be professional.

The question of constraints and additional questions during the database decision

In the real world of limited budgets and constant compromise, choosing a database means choosing what is optimal for our case, but above all, choosing according to the constraints we have.
That is how things are in this material world. The tree has given us the technically ideal family, but before marrying it, it is worth putting it through two filters. These are questions that will save us headaches if we ask them before the choice, not after it.

First filter: can the chosen family handle your technical reality?

The tree has guided us mainly by the shape of our data (whether it is tabular and stable or a shapeshifter that changes form every week), so these additional questions will confirm that this branch does not break under the rest of your case:

  • Volume: are we talking about gigabytes, terabytes or petabytes? Is growth linear, seasonal or exponential?
  • Read and write pattern: do we perform many small writes, many reads, massive analytical reads, a mix…?
  • Consistency: do we need every read to always see the latest data, or can we tolerate consistency that is only eventual?
  • Scalability: will vertical scaling be enough for us (that is, adding more CPU or RAM at kidney-selling prices), or will we hit a wall and need horizontal scaling (more machines or nodes)?
  • Latency: do we live and die by responding in microseconds, or can we accept seconds?

Second filter: do our constraints let us choose?

This is where the technically ideal option often falls apart. And unfortunately, it is usually the filter that has the most authority in the real world. This filter takes into account:

  • Costs: naturally, because in life, money rules. This covers licences, hardware or the cloud bill, as well as maintenance; in other words, the real Total Cost of Ownership (TCO), not the marketing version.
  • Technical team: does anyone know how to operate Cassandra at three in the morning (no one will raise their hand even if we ask), or do we only have PostgreSQL admins?
  • Availability: can we tolerate minutes of downtime without losing business, or does the SLA say 99.99% and Troy burns every minute?
  • Monitoring: is the operations tool we have capable of talking to the database engine we are going to choose?
  • Legislation: are there sensitive data that we cannot, for example, externalise in solutions outside Europe due to the GDPR or similar laws that affect us?

Obviously, depending on the nature of our organisation and project, each question will carry a different weight, but ignoring them usually ends with the team carrying out later database migrations in production… Every IT manager’s dream (nightmare).

Common mistakes when choosing a database

At Pandora, we are no longer kids. More than 20 years in the trenches are watching over us (in case our age was not already clear from the pop culture references in these articles). And during that time, we have often come across the same mistakes in the databases chosen for many IT infrastructures we have worked with.
The main ones are:

  • Choosing by trend rather than by use case. A far too widespread evil in technology, where the shine of the new casts a powerful spell. But the fact that MongoDB was successful on Hacker News for a while does not mean it is what our industrial ERP needs.
  • Using NoSQL for everything. Entire teams have discovered, through pain and time, that many challenges with well-structured data are better solved with a well-designed PostgreSQL.
  • Ignoring the operational cost. Because the “free” engine that requires three full-time engineers is obviously not free.
  • Not anticipating growth. Realising too late that what works with one hundred thousand records may not work with one hundred million.
  • Forgetting backups, security, high availability and monitoring from the beginning. These are not optional pieces, but a fundamental part of the product.
  • Not considering the team. Because the “best” database is very often the one our people can operate well.

Databases and monitoring

An organisation’s databases are its Library of Alexandria, and that is why there is nothing more crucial in today’s information and knowledge economy.
That is also why it is important that this “library” does not end up in flames like the one from history.
So, every database that supports critical processes needs to be monitored just like the rest of the assets. And we must treat it as a vital organ, not as a black box.
Active connections, slow queries, latency, disk usage, locks, errors or growth are metrics that should be kept under control long before the user warns us with that typical and very clarifying message: “This is slow”.
Each database model we have seen also requires a different approach.

  • In a relational database, you monitor deadlocks (blocks caused by operations in the database because, for example, there are several concurrent requests and one is waiting for another to release a piece of data) and indexes (in case they are missing, unused…).
  • In a document database, the size of collections and query times.
  • In a key-value database, cache hits (requesting data and finding it quickly) and evictions (what is removed from fast memory to keep that cache tuned with the important data we need to find at full speed).
  • In a time series database, cardinality, the number of unique combinations generated by the data, because if it becomes huge, performance will suffer.
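Two of the metrics in this list reduce to simple calculations, sketched below. The function names and the sample data are hypothetical; in practice the raw counters would come from whatever statistics the chosen engine exposes.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Share of lookups that were answered from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

def series_cardinality(samples) -> int:
    """Count the unique (metric, labels) combinations seen in the samples."""
    return len({
        (metric, tuple(sorted(labels.items())))
        for metric, labels in samples
    })

# Hypothetical samples: each one is a metric name plus its labels.
samples = [
    ("cpu", {"host": "a"}),
    ("cpu", {"host": "b"}),
    ("cpu", {"host": "a"}),  # same series as the first sample
    ("mem", {"host": "a"}),
]

ratio = cache_hit_ratio(hits=90, misses=10)
cardinality = series_cardinality(samples)
```

The cardinality example also shows why it grows so easily: every new label value (a new host, a new container identifier) creates a new series, even if the metric name stays the same.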

Without monitoring, we may have chosen the best database, but then it can deliver terrible performance because of one of the events mentioned above. If that happens, it may look as if we made a mistake during implementation, when in reality the mistake was in management.
In this guide, we have travelled through the extensive territory of this topic, providing criteria for choosing successfully and marking on the map the minefields and essential stages of the journey.
And it has been long, like the journey of the Fellowship of the Ring, but it was necessary for it to be truly complete and useful in day-to-day work.
So I think it is important to reinforce some of the main key points and clarify the most common doubts.
Let’s get to it.

Frequently asked questions about choosing databases

What are the main types of databases?

Relational, document, key-value, column-oriented, graph, time series, vector, NewSQL, cloud/DBaaS and analytical databases (data warehouses).
The first families and time series databases are the most established, while vector and NewSQL databases are the most recent to consolidate, mainly due to the evolution of technology (such as AI) and the need to adapt management to the data being generated.

What is the difference between SQL and NoSQL?

SQL is a query language associated with the relational model (tables with a fixed schema and ACID guarantees).
NoSQL, on the other hand, groups together non-relational models (documents, key-value, columns or graphs) that prioritise schema flexibility and horizontal scalability, usually at the cost of some consistency, because life is always a compromise.
If we want speed and scale, the price real life usually asks for is some degree of consistency.

Which database is best for large volumes of data?

It depends on the type of workload.

  • For massive analytics, data warehouses such as Snowflake, BigQuery, Redshift or ClickHouse.
  • For distributed writes and large operational volumes, wide-column engines such as Cassandra or ScyllaDB.
  • For global transactional workloads, NewSQL such as CockroachDB or Spanner.

What type of database is used for AI?

For artificial intelligence, which is impossible to escape, vector databases (Pinecone, Milvus, Weaviate…) are the best fit for RAG flows, semantic search and applications with LLMs.
This is due to their ability to store and compare embeddings efficiently, as we saw earlier.
These vector databases usually coexist with relational or document databases that store the raw data.

Which database is suitable for monitoring?

Here, the key is time series databases (InfluxDB, TimescaleDB…), as they are designed for massive time-based writes and window queries over that time.
Other types can also store metrics of this kind, but the reality is that they pay too high a price in storage and performance.

What is the difference between a database and a DBMS?

A database is the set of organised data or information. Meanwhile, the DBMS (Database Management System) is the software that manages it.
For example, PostgreSQL is a DBMS, while the dozens of databases you create inside PostgreSQL are the databases.
In the end, the conclusion to the question we asked at the beginning has that answer nobody likes, but it is the only true one: it depends.
It depends on what we need, on our organisation, on how its knowledge is structured, on what constraints we have in terms of money, technology, legislation when dealing with sensitive data that we may not be able to store off-site…
That is why what LCARS solves in fiction with a single friendly voice, in the real world we have to solve with a heterogeneous stack.
And knowing the available types of databases well is the difference between building that stack with judgement or through painful later migrations and watching the sunrise from the data centre window.
