Fundamentals of Database Engineering
- Descrição
- Currículo
- FAQ
- Revisões
Database Engineering is a very interesting sector in software engineering. If you are interested in learning about database engineering you have come to the right place. I have curated this course carefully to discuss the Fundamental concepts of database engineering.
This course will not teach you SQL or programming languages, however, it will teach you skillsets and patterns that you can apply in database engineering. A few of the things that you will learn are Indexing, Partitioning, Sharding, Replication, b-trees in-depth indexing, Concurrency control, database engines and security, and much more.
I believe that learning the fundamentals of database engineering will equip you with the necessary means to tackle difficult and challenging problems yourself. I always compare engineering to math, you never memorize specific formulas and equations, you know the basic proves and derive and solve any equation one throws at you. Database engineering is similar, you can’t possibly say MongoDB is better than MySQL or Postgres is better than Oracle. Instead, you learn your use case and by understanding how each database platform does its own trade-offs you will be able to make optimal decisions.
One other thing you will learn in this course is the lowest database interface that talks to the OS which is the database engine. Database engines or storage engines or sometimes even called embedded databases is a software library that a database management software uses to store data on disk and do CRUD (create update delete) Embedded means move everything in one software no network client-server. In this video course, I want to go through the few popular database engines, explain the differences between them, and finally, I want to spin up a database and change its engine and show the different features of each engine.
Enjoy the course.
-
7Introduction to ACIDVídeo Aula
ACID which stands for Atomicity, consistency, isolation, and durability are four critical properties of relational database. I think any engineer working with a relational database like postgres, mysql, sqlserver oracle, should understand these properties.
In this course, we will go through the four properties and explain why each is critical to build and use a relational database successfully.
-
8What is a Transaction?Vídeo Aula
-
9AtomicityVídeo Aula
-
10IsolationVídeo Aula
-
11ConsistencyVídeo Aula
-
12DurabilityVídeo Aula
-
13ACID by Practical ExamplesVídeo Aula
In this video we will demonstrate Atomicity, Isolation, Consistency and Durability on Postgres, fully practical example.
-
14Phantom ReadsVídeo Aula
-
15Serializable vs Repeatable ReadVídeo Aula
-
16Eventual ConsistencyVídeo Aula
-
17ACID QuizQuestionário
Answer the following questions about ACID properties in databases
-
18How tables and indexes are stored on disk (MUST WATCH before continue)Vídeo Aula
This lecture details the inner working of database systems with regards to storage. It is a must watch to understand the difference between tables, pages, IO, rows, indexes and data files.
-
19Row-Based vs Column-Based DatabasesVídeo Aula
-
20Primary Key vs Secondary Key - What you probably didn't knowVídeo Aula
In this lecture I will discuss the difference between Primary Key and a Secondary Key and how it can affect your performance.
-
21Databases Pages (Article)Texto
-
22Create Postgres Table with a million Rows (from scratch)Vídeo Aula
Lots of you asked me how to create a table with millions of rows in postgres, here are the details
-
23Getting Started with IndexingVídeo Aula
-
24Understanding The SQL Query Planner and Optimizer with ExplainVídeo Aula
-
25Bitmap Index Scan vs Index Scan vs Table ScanVídeo Aula
In this video, I explain the benefits of Bitmap Index Scan and how it differs from Index scan and table sequential scan.
-
26Key vs Non-Key Column Database IndexingVídeo Aula
-
27Index Scan vs Index Only ScanVídeo Aula
-
28Combining Database Indexes for Better PerformanceVídeo Aula
-
29How Database Optimizers Decide to Use IndexesVídeo Aula
-
30Create Index Concurrently - Avoid Blocking Production Database WritesVídeo Aula
If you create an index on a large production table in postgres, the operations blocks writes in order to make sure to pull all the field entries to the index. However most of the time you can't afford to block writes on an active production database table. Postgres new feature which allows create index concurrently allows writes and reads to go in the expense of cpu/memory, time and chance for the index to be invalid. A small price to pay for fast production writes! https://www.postgresql.org/docs/9.1/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
-
31Bloom FiltersVídeo Aula
-
32Working with Billion-Row TableVídeo Aula
-
33Article - The Cost of Long running TransactionsTexto
-
34Article - Microsoft SQL Server Clustered Index DesignTexto
-
35Indexing QuizQuestionário
-
36B-Tree Section's Introduction & AgendaVídeo Aula
B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. However, most contents explain this data structure from a theoretical point of view, in this lecture I’d like to shed some light on the practical considerations of B-Tree and B+Trees in real production database systems such as Postgres and MySQL.
Link to the original paper https://infolab.usc.edu/csci585/Spring2010/den_ar/indexing.pdf
-
37Full Table ScansVídeo Aula
-
38Original B-TreeVídeo Aula
-
39How the Original B-Tree Helps PerformanceVídeo Aula
-
40Original B-Tree LimitationsVídeo Aula
B-Tree limitation
-
41B+TreeVídeo Aula
-
42B+Tree DBMS ConsiderationsVídeo Aula
-
43B+Tree Storage Cost in MySQL vs PostgresVídeo Aula
-
44B-Tree Section's SummaryVídeo Aula
-
45Introduction to Database PartitioningVídeo Aula
-
46What is Partitioning?Vídeo Aula
-
47Vertical vs Horizontal PartitioningVídeo Aula
-
48Partitioning TypesVídeo Aula
-
49The Difference Between Partitioning and ShardingVídeo Aula
-
50Preparing: Postgres, Database, Table, IndexesVídeo Aula
-
51Execute Multiple Queries on the TableVídeo Aula
-
52Create and Attach Partitioned TablesVídeo Aula
-
53Populate the Partitions and Create IndexesVídeo Aula
-
54Class Project - Querying and Checking the Size of PartitionsVídeo Aula
-
55The Advantages of PartitioningVídeo Aula
-
56The Disadvantages of PartitioningVídeo Aula
-
57Section Summary - PartitioningVídeo Aula
-
58How to Automate Partitioning in PostgresVídeo Aula
Assume you have a table that is partitioned on the customer_id field serial 32bit, and you want to partition by range, how do you create all the necessary partitions? this is what I discuss in this video
Source Code
https://github.com/hnasr/javascript_playground/tree/master/automate_partitions
-
59Introduction to Database ShardingVídeo Aula
-
60What is Database Sharding?Vídeo Aula
-
61Consistent HashingVídeo Aula
-
62Horizontal partitioning vs ShardingVídeo Aula
-
63Sharding with PostgresVídeo Aula
-
64Spin up Docker Postgres ShardsVídeo Aula
-
65Writing to a ShardVídeo Aula
-
66Reading from a ShardVídeo Aula
-
67Advantages of Database ShardingVídeo Aula
-
68Disadvantages of Database ShardingVídeo Aula
-
69Database Sharding Section SummaryVídeo Aula
-
70When Should you consider Sharding your Database?Vídeo Aula
-
71Shared vs Exclusive LocksVídeo Aula
In this lecture we explain the difference between exclusive (write locks) and shared locks (read locks)
-
72Dead LocksVídeo Aula
-
73Two-phase LockingVídeo Aula
-
74Solving the Double Booking Problem (Code Example)Vídeo Aula
In this video, I demonstrate how is it possible to get double booking in database-backed web applications and how to prevent double booking and race conditions with row-level locks.
Source Code https://github.com/hnasr/javascript_playground/tree/master/booking-system
-
75Double Booking Problem Part 2 ( Alternative Solution and explination)Vídeo Aula
-
76SQL Pagination With Offset is Very SlowVídeo Aula
In this video I’ll explain why you should avoid using SQL offset when implementing any kind of paging. I’ll explain what offset does, why is it slow and what is the alternative for better performance This video is inspired by Use the index luke, i’ll have a link to the blog and slides to learn more. Let say you have a web application with an API that supports paging, you user want to request 10 news articles in page 10, this is performed via a simple GET request as shown here The API server receives the GET request and builds the SQL in order to send it to the database hopefully a pool of connections exist here. Page 10 translates to offset 100 assuming each page has 10 records and now the database is ready to execute the query against the table. Offset by design means fetch and drop the first x number of rows, so in this case the database will fetch the first 110 rows and physically drop the first 100 leaving the limit of 10 which the user will get. As the offset increase, the database is doing more work which makes this operation extremely expensive. Furthermore, the problem with offset is you might accidentally read duplicate records. consider the user now want to read page 11 and meanwhile someone inserted a new row in the table, row 111 will be read twice Let us jump and test this against postgres
Use the Index Luke Blog https://use-the-index-luke.com/no-offset
Slides in this video https://payhip.com/b/B6o1
-
77Database Connection PoolingVídeo Aula
Connection pooling is a pattern of creating a pool of available connections (usually TCP) and allow multiple clients to share the same pool of connections. This pattern is usually used when connection establishment and tearing down is costly, and the server has a limited number of connections. In this video we will learn how to use connection pooling in NodeJs when working with a Postgres Database, we will learn how to spin up a pool of database connections and use stateless pool queries and transactional queries begin/end, and finally, we will
Node JS Source Code used in this lecture here https://github.com/hnasr/javascript_playground/tree/master/postgresnode-pool
-
78Introduction to Database ReplicationVídeo Aula
Scripts and commands
docker run --name pgmaster -v /Users/HusseinNasser/postgres/v/master_data:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d postgres
docker run --name pgstandby -v /Users/HusseinNasser/postgres/v/standby_data:/var/lib/postgresql/data -p 5433:5432 -e POSTGRES_PASSWORD=postgres -d postgres
In standby node update postgresql.conf
primary_conninfo = 'application_name=standby host=husseinmac port=5432 user=postgres password=postgres’
add file standby.signal
touch standby.signal
In master update postgresql.conf
first 1 (standby1)
select * from pg_stat_replication
-
79Master/Standby ReplicationVídeo Aula
-
80Multi-master ReplicationVídeo Aula
-
81Synchronous vs Asynchronous ReplicationVídeo Aula
-
82Replication Demo with Postgres 13Vídeo Aula
Scripts and commands
docker run --name pgmaster -v /Users/HusseinNasser/postgres/v/master_data:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d postgres
docker run --name pgstandby -v /Users/HusseinNasser/postgres/v/standby_data:/var/lib/postgresql/data -p 5433:5432 -e POSTGRES_PASSWORD=postgres -d postgres
In standby node update postgresql.conf
primary_conninfo = 'application_name=standby host=husseinmac port=5432 user=postgres password=postgres’
add file standby.signal
touch standby.signal
In master update postgresql.conf
first 1 (standby1)
select * from pg_stat_replication
If using the hostname doesn't work, use the IP address of the container itself. You can get the local IP address of the container by running docker inspect container name
-
83Pros and Cons of ReplicationVídeo Aula
