SQL Server Replication Monitoring Scripts

In this post I’m sharing scripts and tips that will help you monitor, identify and troubleshoot SQL Server Replication latency and issues.

The following is included in this article:
1. Check Latency in Log Reader Agent.
2. Check Latency in Distribution Agent.
3. Check Undistributed Commands.
4. List All Publications and Subscriptions.
5. Troubleshooting Replication Issues.

Having SQL scripts to hand for checking SQL Server Replication issues is key for your tool-belt as a DBA. It’s our job to ensure Replication is flowing without errors, and the data is accurate/up-to-date on the Replication Subscribers.

if you need more information and understanding on SQL Server Replication, I recommend you read this recent Microsoft article:
Effectively troubleshoot latency in SQL Server Transactional replication
Or watch the YouTube Video.

1. Check Latency in Log Reader Agent

This script monitors the Log Reader Agent activity in SQL Server replication, providing details on its status, duration, errors, and performance metrics such as:

Status: Current status of the agent (e.g., Start, Succeed, In Progress).
Latency: Delivery latency (time delay in ms between transaction generation and delivery).
Transaction Stats: Delivered transactions, commands, and average delivery rate.
Error Information: Details of any replication errors encountered.

It helps identify issues early in the replication process, such as delays in reading the transaction log, which could eventually affect the Distribution Agent and overall replication latency.

-- Query to monitor log reader agent activity and its status, duration, and errors in SQL Server replication.
USE distribution;
GO
SELECT a.name AS agent_name, 
       CASE [runstatus]  
              WHEN 1 THEN 'Start' 
              WHEN 2 THEN 'Succeed' 
              WHEN 3 THEN 'In progress' 
              WHEN 4 THEN 'Idle' 
              WHEN 5 THEN 'Retry' 
              WHEN 6 THEN 'Fail' 
       END AS Status,
      [start_time], 
      h.[time], -- Time the message is logged. 
      [duration], -- Duration in seconds. 
      [comments], 
      h.[xact_seqno], -- Last processed transaction sequence number. 
      [delivery_time], -- Time first transaction delivered. 
      [delivered_transactions], -- Total transactions delivered. 
      [delivered_commands], -- Total commands delivered. 
      [average_commands], -- Avg. commands delivered. 
      [delivery_rate], -- Avg. commands per second. 
      [delivery_latency], -- Latency in ms. 
      [error_id], -- ID of the error in MSrepl_error table. 
      e.error_text -- Error text. 
  FROM [distribution].[dbo].[MSlogreader_history] h 
  JOIN MSlogreader_agents a 
  ON a.id = h.agent_id 
  LEFT JOIN MSrepl_errors e 
  ON e.id = h.error_id 
ORDER BY h.time DESC;

2. Check Latency in Distribution Agent

This script monitors the Distribution Agent activity in SQL Server transactional replication, focusing on latency and delivery metrics. It provides the following details:

Status: Current status of the Distribution Agent (e.g., Start, Succeed, In Progress).
Latency: Delivery latency in milliseconds, including current delivery latency since the last entry and overall delivery latency.
Transaction Stats: Delivered transactions, delivered commands, and average commands delivered.
Delivery Rate: Average commands per second delivered by the agent.
Error Information: Error ID and text from the MSrepl_errors table for any issues encountered.

This script helps track the Distribution Agent’s performance, measure latency, and identify any issues or delays in delivering transactions to Subscribers.

-- Query to monitor distribution agent activity, status, delivery metrics, and errors in SQL Server replication.
USE distribution;
go
SELECT a.name AS agent_name, 
       CASE [runstatus]  
            WHEN 1 THEN 'Start' 
            WHEN 2 THEN 'Succeed' 
            WHEN 3 THEN 'In progress' 
            WHEN 4 THEN 'Idle' 
            WHEN 5 THEN 'Retry' 
            WHEN 6 THEN 'Fail' 
       END AS Status,
      [start_time], 
      h.[time], -- Time the message is logged. 
      [duration], -- Duration in seconds. 
      [comments], 
      h.[xact_seqno], -- Last processed transaction sequence number. 
      [current_delivery_rate], -- Avg. commands/sec since last entry. 
      [current_delivery_latency], -- Latency in ms since last entry. 
      [delivered_transactions], -- Total transactions delivered. 
      [delivered_commands], -- Total commands delivered. 
      [average_commands], -- Avg. commands delivered. 
      [delivery_rate], -- Avg. commands/sec. 
      [delivery_latency], -- Latency in ms. 
      [total_delivered_commands], -- Total commands delivered since creation. 
      [error_id], -- Error ID in MSrepl_error table. 
      e.error_text -- Error text. 
  FROM MSdistribution_history h 
  JOIN MSdistribution_agents a 
  ON a.id = h.agent_id 
  LEFT JOIN MSrepl_errors e 
  ON e.id = h.error_id 
  ORDER BY h.time DESC;

3. Check Undistributed Commands

This script checks for undistributed commands in SQL Server replication by querying the MSdistribution_status table.

It provides the following details:
Undistributed Commands: The count of commands that have not yet been distributed.
Publication Name: The name of the publication.
Subscriber Database: The database on the Subscriber.

USE Distribution;
GO

SELECT 
    publication AS PublicationName,
    subscriber_db AS SubscriberDatabase,
    COUNT(*) AS UndistributedCommands
FROM 
    dbo.MSdistribution_status
GROUP BY 
    publication, subscriber_db
ORDER BY 
    UndistributedCommands DESC;

A high number of undistributed commands may indicate issues with the distribution agent or network performance.

4. List All Publications and Subscriptions

If you want to see all active publications and subscriptions in your environment, this script helps track the replication topology by listing all publications and their subscribers.

-- List Publications
USE [YourPublicationDatabase];
GO

SELECT 
    name AS PublicationName,
    publication_type AS PublicationType,
    allow_push AS PushEnabled,
    allow_pull AS PullEnabled,
    allow_anonymous AS AllowAnonymous
FROM 
    sys.publications;
GO

-- List Subscriptions
USE Distribution;
GO

SELECT 
    publication AS PublicationName,
    subscriber_db AS SubscriberDatabase,
    subscriber AS SubscriberServer,
    subscription_type AS SubscriptionType,
    status AS SubscriptionStatus
FROM 
    dbo.MSsubscriptions;
GO

Alternatively, you can use this script that helps display your Replication Topology with much more info, including Subscription configuration options and Articles with size info. You run it against your Distribution Database (and, if applicable, the Publisher database when the Distributor also acts as the Publisher) to gather information about the replication topology. It helps us understand the replication structure and displays important details needed to diagnose and troubleshoot complex replication issues.

5. Troubleshooting Replication Issues

To resolve replication issues, follow these steps:

1. Use Replication Monitor: Start by checking Replication Monitor. It provides a real-time view of your replication environment, showing the status of agents, latency, and undistributed commands. This tool is invaluable for quickly identifying where issues may lie.

2. Check Agent Status: If the Log Reader Agent or Distribution Agent is failing or showing issues, you can stop and restart the agent jobs. Sometimes, simply retrying the jobs can resolve transient connection issues or allow replication to go through successfully.

3. Undistributed Commands: High numbers of undistributed commands may indicate a distribution issue. Check the MSdistribution_status table to see if commands are backing up. If this is the case, investigate the Distribution Agent’s performance and check for network or resource issues.

4. Tailor Replication Options: For some issues, you may need to adjust replication options, such as using the SkipErrors option to bypass specific errors temporarily. However, be cautious with these settings as they might mask underlying issues.

5. Review System Resources: If replication is lagging, check system resources like CPU, memory, and disk usage. Resource contention on the Publisher, Distributor, or Subscriber can slow down replication and cause delays.

For more detailed troubleshooting, including specific steps for different error scenarios, refer to the Microsoft Troubleshooting Replication Tutorial I recommended above in this post. If you’ve made it this far, the tips in that article and video will equip you with all the knowledge you need to resolve this. Also, we should consider contacting Microsoft Support during this process.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *