Transactional Replication on SQL Server: How it Works

One day at work, when I was still learning about replication, we set up a game:

I would create a transactional replication on a dummy database and make a job that inserts and updates entries into it.
The DBA and my manager would at random intervals break the replication setup in a different manner of ways, based on real life incidents they have experienced and damage controlled.
I would then deal with it, or find out the root cause and fix it, only for it to all happen again.

This was a great exercise to demonstrate how finnicky and multi-layered the architecture of SQL Server replication can become and through repetition gave me a thorough understanding of the bits and pieces that make up a SQL Server transactional replication system.

Terminology

Let's take a look:

The terminology for replication is pretty simple:

Publisher

This is the database that you are replicating, some people may call it the "master" database, though this is confusing when there is literally a master database in your server.

Subscriber

This is the database that is being replicated into, it is the end of the journey for the replicated objects.

🤩 Our Amazing Sponsors 👇

View Website

DigitalOcean offers a simple and reliable cloud hosting solution that enables developers to get their website or application up and running quickly.

View Website

Laravel News keeps you up to date with everything Laravel. Everything from framework news to new community packages, Laravel tutorials, and more.

View Website

A Laravel Starter Kit that includes Authentication, User Dashboard, Edit Profile, and a set of UI Components. Learn more about the DevDojo sponsorship program and see your logo here to get your brand in front of thousands of developers.

Article

An article is one unit of replication, in more detail, it is an object that is replicated (so, a table with a primary key, a stored procedure, a view, an index and so on)

Distributor

The distributor is a database that stores what's going to be replicated as a middle man for the publisher and subscriber, it holds what's been replicated and what is in the queue to BE replicated.

Each publisher has a distribution server, i.e a server configured to BE a distributor, so it has a distribution database (system databases in the object explorer). Then, each publisher can have many subscribers, as you can replicate one source to many databases if you feel like splitting up the articles that you're replicating (though that seems weird).

Alongside these are three main "agents" involved with SQL Replication, these all are actually executable files stored in various places that you specify and perform the tasks between the publisher, distributor and subscribers. It does this by using a SQL Login with sysadmin privileges that you specify in the replication wizards, you can pick different logins for different agents if you care to:

Snapshot Agent

This is an executable file that takes the articles you specify from the publisher and builds a cookbook for how the subscriber should be made, by that I mean, it makes the instructions for how the subscriber database should be formed and how to make it behave exactly like the publisher database. These are called snapshot files. The name makes sense, it is literally a "snapshot" of the publisher database that lets the subscriber have a starting point. From there, while it's alive and kicking, the snapshot agent records any synchronization between your subscriber and the distributor, basically saying "yeah, it's replicated this across, confirmed.".

Log Agent

Inside your publisher database is something called a transactional log. This records transactions that have occurred in your database (deletes, updates, inserts) on tables and stored procedures and so on. If one of those tables is a listed article for replication, that transaction is also "marked" for replication. The log agent, (which lives on your distribution server) takes these marked transactions and passes them to the distribution database, which it then treats as commands to be performed on your subscriber database (publisher did this, so you should do this too subscriber). The way it specifies what changed and how is through a row pointer, which is why for transactional replication you NEED a primary key on the tables, or it's not ordered and the log agent can't specify the row to be acted upon in the command queue.

Distribution Agent

This fella does two big things:

It initially sends the snapshot files to the subscriber to build the database that is going to be copied, so it cooks the database up from the recipe the snapshot agent provided.
It reads the queued commands the log reader agent has pushed to the distribution database and starts acting on them when it detects a change. The difference between a push replication and a pull replication is where this agent lives. If it's a push, it lives on the server the distribution database lives (so it pushes the commands to the subscriber). If it's a pull, it lives on the server the subscriber database lives, (so it requests the commands from the distribution database and acts on differences it sees in the subscriber).

The distribution agent takes the commands that are in the distribution database in either case, and converts the contents into parameters for a stored procedure dependant on the type of the transaction. You can see these in your subscriber databases stored procedures, they will have a name like: sp_MSdel<table_name> which specifies the article and the method it uses.

It then executes these stored procedures to imitate the publisher database, when it does so successfully, it logs to the distribution server that it's performed the task, and the snapshot agent logs a synchronization job (updating the snapshot files so the cookbook has more information in it than before).

Conclusion

Hopefully, this gives you a more human feel to how SQL Transactional Replication works under the hood, as well as the pipes that are constructed in the background. This becomes marginally easier to debug once you know which part is responsible for which thing. Example: The distribution agent is unhappy because of a primary key mismatch, this means that the primary key it collects for the stored procedure doesn't exist, or already exists on the replicated database (if it's a delete vs an update, for example).

Tails

Static

MarkdownX

Pines

Wave

Designer

Help & Support

Sponsorships

Terms of Service

Transactional Replication on SQL Server: How it Works

Terminology

Publisher

Subscriber

Article

Distributor

Snapshot Agent

Log Agent

Distribution Agent

Conclusion

Comments (0)

Tails

Static

MarkdownX

Pines

Wave

Designer

Join 129,851 other developers as we learn, build, and grow together.

Community

Rewards

Resources

Meet the Team

Help & Support

Sponsorships

Terms of Service

Transactional Replication on SQL Server: How it Works

Terminology

Publisher

Subscriber

Article

Distributor

Snapshot Agent

Log Agent

Distribution Agent

Conclusion

Comments (0)