# Is the set theory relevant for databases

## Relational databases

### 1. Set theory

As the name suggests, relations form the basic structure of relational databases. In order to be able to speak formally precisely about the properties of relational databases, it is therefore important to understand the concept of a relation in a mathematical sense, which is why we begin with a brief repetition of the most important terms in set theory. We describe in general how sets can be formed and we define some operations on sets. Relations are then special sets on which further operations can be defined and carried out. We will do this in a few places, e.g. B. in the definition of the language or in the Cartesian product, deviate from the classical mathematical set theory. This makes the theory a little simpler and the mathematical definitions more closely match the actual concepts of relational databases. The terms that we introduce here form the basis for later describing the relational model exactly. This chapter also serves to define the notation.

### 2. The relational model

The basic idea of ​​the relation model is to save data in the form of relations. In this chapter we introduce the essential terms of the relation model, such as attribute, domain, schema and instance. We also discuss the concept of the primary key of a schema. In the last section we cover constraints on a database schema. In particular, we define the meaning of foreign keys, unique constraints and not null constraints.

### 3. Diagrams and modeling

Data modeling is an important step in the development of a computer science system. The point is to find a DB schema so that
1.
all required data can be saved in the DB schema,
2.
the data can be accessed efficiently and
3.
data consistency is guaranteed.
In this chapter we study the basic ways in which entities of different types (e.g. cars and their drivers) can relate to one another. We then examine how the table structure of a DB schema can map these types of relationships. In doing so, we will make substantial use of the constraints that we introduced in the previous chapter.
A DB schema is usually made up of a large number of tables. In a purely textual description of such a DB schema, the relationships between the tables can then no longer be clearly represented. We are therefore introducing a diagram notation for DB schemas with which we can graphically display the table structure of a relational database. Our notation is based on the so-called crow's foot notation.

### 4. The relational algebra

Relational algebra can be understood as the query language of the relational model. It is based on some basic operations (union, set difference, Cartesian product, projection, selection and renaming), which calculate a new relation from one or two given relations. Complex expressions can be formed from these basic operations, which describe complex calculations of relations. These expressions correspond to complex queries in a database and specify (on an abstract level) how the result of the query can be calculated.
In this chapter we formally introduce relational algebra. We then show how various join operations and relational division can be defined in the language of relational algebra. With a series of concrete examples we illustrate how database queries can be formulated as expressions of relational algebra. Finally, we extend relational algebra with grouping operations and aggregate functions.

### 5. SQL queries

SQL (Structured Query Language) is the most important language in the field of relational database systems. Although it is known as Query Language, SQL offers much more than just querying data. For example, you can define the structure of data, modify data in a database and set up security conditions. After a brief introduction to SQL, we will show in this chapter in detail how database queries can be formulated with SQL. We also consider the different types of join operations, grouping and aggregation, as well as recursive queries.

### 6. SQL for data definition and data manipulation

In this chapter we show how data can be inserted, changed and deleted with SQL. We also study so-called sequences. These are database objects with which unique values ​​can be generated. These are important for correctly filling primary key attributes. We also consider the possibilities of SQL for defining tables and constraints. We will introduce the most important data types of PostgreSQL and show how default values ​​for attributes can be defined. In addition, we examine the functionality of unique, not null, primary key, foreign key and check constraints and give examples of how changes to attributes can result in a cascade of changes in other tables. Finally, we introduce dynamic and materialized views.

### 7. Query Optimization

This chapter is about how a database system can query efficiently, i. H. quickly, can answer. To do this, we consider so-called indices in the first part. These are auxiliary objects that support the search for specific data. In particular, we are introducing indices that point to B+-Trees or hash functions based.
We then study methods for logical and physical query optimization. Logical optimization is about reformulating a given query in such a way that it delivers the same result but can be calculated more efficiently, for example because smaller intermediate results are generated. This means that less storage space is required and the calculation can be carried out more quickly.
Physical optimization deals with the selection of algorithms that implement relational algebra operations. We introduce three algorithms to compute joins: nested loop join, merge join, and hash join. We also show how PostgreSQL evaluation plans represent the various algorithms.

### 8. Multi-user operation

Certain sequences of database statements must be executed as a unit. We call such a unit a transaction. In this chapter we introduce the ACID transaction model and show how transactions are handled in SQL. In theory, parallel transactions should not influence each other. H. they are carried out in isolation. In practice, however, this condition is relaxed for performance reasons. We study the different degrees of isolation that SQL offers. To this end, we consider three phenomena that can occur in parallel transaction processing: dirty reads, non-repeatable reads and phantom reads. We also show how PostgreSQL implements the various degrees of isolation using a multi-version concurrency control (MVCC) architecture. With this approach, several versions are stored for each tuple and a visibility condition is defined as to which transaction sees which version of a tuple.

### 9. Normal forms

In a poorly chosen database schema, insert, change and delete operations can lead to certain anomalies. For example, it may be that several change operations have to be carried out for a change or that certain data cannot be saved in the given schema. Simply put, these anomalies always occur when a schema models several concepts. To make this more precise, we introduce the concept of functional dependency. This helps us to break down relation schemes in such a way that a scheme only models one concept and thus anomalies are avoided. We then say the scheme is in normal form. In this chapter we particularly study the third normal form (3NF) and the Boyce – Codd normal form (BCNF). We are interested in whether a decomposition into a normal form is lossless and dependency-preserving. We also give the relationship between 3NF and transitive dependencies.

### 10. Calculation of normal forms

This chapter deals with algorithms that decompose a given schema into normal form. First we introduce the Armstrong calculus, with which the envelope of a set of functional dependencies can be calculated. Then we consider an algorithm to compute the envelope of a set of attributes under a set of functional dependencies. This means that we calculate all those attributes which are functionally dependent on a given set of attributes. We also give an algorithm to calculate a minimal coverage of a set of functional dependencies. Finally, we investigate a decomposition algorithm to decompose a schema into BCNF without loss, as well as a synthesis algorithm to decompose a schema into the third normal form in a lossless and dependency-preserving manner.