Emote Usage Tracker: The Data Class Library
Introduction
The Emote Usage Tracker project is composed of multiple .NET applications, including the Twitch Bot that parses emote usage from chat messages, and the API used by the website to fetch usage data.
The Role of the Data Class Library
To keep things DRY and maintain consistency across applications, the Data class library exists to define and provide access to the data project-wide.
Core Components
The data layer consists of several core components, including APIs, Models, DTOs, Repositories, and Services.
Application Programming Interfaces (APIs)
The data layer provides its own internal API classes for each third-party API used by this project. Their methods return plain JSON, and it’s the responsibility of the API consumer to process that further into data transfer objects, models, etc.
Each API class uses an HttpClient to make requests to the endpoints of the third-party API, handling authentication (if necessary) and parsing the responses using Newtonsoft’s JSON library.
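As a rough sketch of this pattern, a hypothetical API class might look like the following (the class name, endpoint URL, and lack of authentication here are illustrative assumptions, not the project’s actual code):

```csharp
using Newtonsoft.Json.Linq;

namespace Data.Apis;

// Hypothetical example of the pattern described above: fetch from a
// third-party endpoint and return the raw JSON for the caller to process.
public class ExampleEmoteApi(HttpClient httpClient)
{
    private readonly HttpClient _httpClient = httpClient;

    public async Task<JToken> GetGlobalEmotes()
    {
        var response = await _httpClient.GetAsync("https://api.example.com/v1/emotes/global");
        response.EnsureSuccessStatusCode();
        var json = await response.Content.ReadAsStringAsync();
        return JToken.Parse(json);
    }
}
```

A consumer would then call ToObject&lt;T&gt;() on the returned JToken (or one of its child tokens) to produce a DTO.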
Models
Models are used to represent the database entities and their relationships, and are closely aligned with the database schema itself.
For example, here is the schema for the users table:
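A sketch of that schema, reconstructed from the column names used in the repository queries later in this post (the exact types and constraints are my assumptions):

```sql
-- Approximate sketch of the users table; column names match the repository
-- queries in this post, but the types and constraints are assumptions.
CREATE TABLE users (
    user_id           TEXT PRIMARY KEY,
    user_name         TEXT NOT NULL,
    user_display_name TEXT,
    user_color        TEXT
);
```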
…and here is the corresponding model in the data class library:
namespace Data.Models;

public class User
{
    public required string Id { get; set; }
    public required string Name { get; set; }
    public string? DisplayName { get; set; }
    public string? Color { get; set; }
}
Note that, while the schema uses field names like user_id, the corresponding property in the model class is simply Id. While it’s been helpful in this project to have detailed table field names when writing complex database queries, such verbosity is less useful for the purpose of the models, and opting for simplified names is a valid design choice.
Data Transfer Objects (DTOs)
Commonly abbreviated as “DTO”, a Data Transfer Object is a simple object used to transfer data between layers of an application. In the case of this project, DTOs are used to transfer data from third-party APIs to services that may then create instances of models. They’re also used in this project’s API to send data to the frontend application.
Unlike Models, DTOs should not have any business logic themselves. They are simple structures meant purely for data transfer.
Here, for example, is the DTO used for transferring user data from the Twitch API:
using Newtonsoft.Json;

namespace Data.Dtos;

public class TwitchApiUserDto
{
    [JsonProperty("id")]
    public required string Id { get; set; }

    [JsonProperty("login")]
    public required string Name { get; set; }

    [JsonProperty("display_name")]
    public string? DisplayName { get; set; }

    [JsonProperty("description")]
    public string? Description { get; set; }

    [JsonProperty("profile_image_url")]
    public string? ProfileImageUrl { get; set; }
}
The attributes like [JsonProperty("login")] are used by Newtonsoft’s JSON library to map the C# properties to JSON properties (in this case, the C# property Name to the JSON property login).
Here is an example of how this TwitchApiUserDto might be used in practice to parse a JSON response from the Twitch API:
JToken twitchUserJson = await _twitchApi.GetUser(userId);
var userDto = twitchUserJson.ToObject<TwitchApiUserDto>();
Repositories
Repositories act as an access layer between the models and the database entities they represent. By exposing methods that handle database operations, a repository provides the rest of the application a way to interact with the data persisted in the database without being concerned with the specifics of how such operations are implemented.
The public methods defined by a repository’s interface tend to describe CRUD operations (create, read, update, delete), though it’s not necessary to implement all of these, nor is it required that the interface be strictly limited to such operations.
For example, here is the interface for the Channel repository:
public interface IChannelRepository
{
    Task<bool> ExistsByIdAsync(string channelId, NpgsqlTransaction? transaction = null);
    Task<Channel?> GetByIdAsync(string channelId, NpgsqlTransaction? transaction = null);
    Task<Channel?> GetByNameAsync(string channelName, NpgsqlTransaction? transaction = null);
    Task CreateOrUpdateAsync(Channel channel, NpgsqlTransaction? transaction = null);
}
Note that there is no method for deleting a channel, since there isn’t currently a use for that functionality in any application using the repository. There is also a method to check whether a channel exists, which traditionally would not be considered one of the core CRUD operations.
Each repository extends a BaseRepository class. This base class holds a reference to the NpgsqlDataSource and provides higher-order functions used by the derived classes to execute database operations:
using Npgsql;

namespace Data.Repositories;

public abstract class BaseRepository(NpgsqlDataSource dataSource)
{
    private readonly NpgsqlDataSource _dataSource = dataSource;

    protected async ValueTask<NpgsqlConnection> GetConnectionAsync()
    {
        return await _dataSource.OpenConnectionAsync();
    }

    // For operations that return a value
    protected async Task<T> ExecuteDbOperationAsync<T>(
        Func<NpgsqlConnection, NpgsqlTransaction?, Task<T>> dbOperation,
        NpgsqlTransaction? transaction = null)
    {
        if (transaction != null)
        {
            if (transaction.Connection == null || transaction.Connection.State != System.Data.ConnectionState.Open)
            {
                throw new InvalidOperationException("The connection associated with the provided transaction is either null or not open.");
            }
            return await dbOperation(transaction.Connection, transaction);
        }
        else
        {
            using var conn = await GetConnectionAsync();
            return await dbOperation(conn, null);
        }
    }

    // For operations that do not return a value
    protected async Task ExecuteDbOperationAsync(
        Func<NpgsqlConnection, NpgsqlTransaction?, Task> dbOperation,
        NpgsqlTransaction? transaction = null)
    {
        if (transaction != null)
        {
            if (transaction.Connection == null || transaction.Connection.State != System.Data.ConnectionState.Open)
            {
                throw new InvalidOperationException("The connection associated with the provided transaction is either null or not open.");
            }
            await dbOperation(transaction.Connection, transaction);
        }
        else
        {
            using var conn = await GetConnectionAsync();
            await dbOperation(conn, null);
        }
    }
}
The value of having the ExecuteDbOperationAsync method is in not having to conditionally check whether the transaction argument is null in each repository method.
Here is an example of the ExecuteDbOperationAsync method in use:
public class UserRepository(NpgsqlDataSource dataSource)
    : BaseRepository(dataSource), IUserRepository
{
    public async Task<User?> GetByIdAsync(string userId, NpgsqlTransaction? transaction = null)
    {
        return await ExecuteDbOperationAsync(async (conn, tran) =>
        {
            User? user = null;
            var query = @"
                SELECT *
                FROM users
                WHERE user_id = @user_id
                LIMIT 1
            ";
            using var cmd = new NpgsqlCommand(query, conn, tran);
            cmd.Parameters.AddWithValue("user_id", userId);
            using var reader = await cmd.ExecuteReaderAsync();
            if (await reader.ReadAsync())
            {
                user = new User
                {
                    Id = reader.GetString(reader.GetOrdinal("user_id")),
                    Name = reader.GetString(reader.GetOrdinal("user_name")),
                    DisplayName = reader.IsDBNull(reader.GetOrdinal("user_display_name")) ? null : reader.GetString(reader.GetOrdinal("user_display_name")),
                    Color = reader.IsDBNull(reader.GetOrdinal("user_color")) ? null : reader.GetString(reader.GetOrdinal("user_color"))
                };
            }
            return user;
        }, transaction);
    }

    // ...other methods
}
Repository methods need to accept an NpgsqlTransaction argument so that integration tests can use them: each test must roll back its transaction, effectively isolating the test. Note that it may very well be that non-test code never takes advantage of the ability to pass a transaction.
Services
The purpose of services in the data layer is to encapsulate the business logic of the project. Services are ideal for transforming data, orchestrating calls to multiple repositories, and other complex processing.
For example, when the application needs to create an instance of the Channel model from a TwitchApiUserDto, it can use this mapper class:
using Data.Dtos;
using Data.Models;

namespace Data.Services.Mappers;

public static class ChannelMapper
{
    public static Channel MapToChannel(TwitchApiUserDto userDto)
    {
        return new Channel
        {
            Id = userDto.Id,
            Name = userDto.Name,
            DisplayName = userDto.DisplayName,
            Description = userDto.Description,
            ProfileImageUrl = userDto.ProfileImageUrl
        };
    }
}
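For instance, fetching a Twitch user and mapping it to a Channel might be wired together like this (a sketch; _twitchApi and channelId stand in for whatever API class instance and ID are in scope):

```csharp
// Sketch: fetch raw JSON, deserialize it to the DTO, then map it to the model.
JToken twitchUserJson = await _twitchApi.GetUser(channelId);
var userDto = twitchUserJson.ToObject<TwitchApiUserDto>()
    ?? throw new InvalidOperationException("Failed to deserialize the Twitch user response.");
Channel channel = ChannelMapper.MapToChannel(userDto);
```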
There are also services to abstract away emote fetching logic for each emote provider. These provider emote services implement the following interface:
using Data.Models;
using Npgsql;

namespace Data.Services.ProviderEmoteServices;

public interface IProviderEmoteService
{
    Task<IEnumerable<ChannelEmote>?> GetChannelEmotes(string channelId, NpgsqlTransaction? transaction = null);
    Task<IEnumerable<GlobalEmote>?> GetGlobalEmotes(NpgsqlTransaction? transaction = null);
}
Such services can then be used by yet other services, such as the BotEmoteService when it needs to get the active global emotes from all supported providers.
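A sketch of what that aggregation might look like (this is an illustrative class, not the project’s actual BotEmoteService):

```csharp
using Data.Models;
using Data.Services.ProviderEmoteServices;

namespace Data.Services;

// Hypothetical aggregator: collects global emotes from every registered provider.
public class ExampleEmoteAggregator(IEnumerable<IProviderEmoteService> providers)
{
    private readonly IEnumerable<IProviderEmoteService> _providers = providers;

    public async Task<List<GlobalEmote>> GetAllGlobalEmotes()
    {
        var emotes = new List<GlobalEmote>();
        foreach (var provider in _providers)
        {
            // A provider may return null (e.g. a failed fetch); skip it.
            var providerEmotes = await provider.GetGlobalEmotes();
            if (providerEmotes != null)
            {
                emotes.AddRange(providerEmotes);
            }
        }
        return emotes;
    }
}
```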
Here is a utility service, DatabaseHealthService, that provides a way to check whether the application can connect to the database:
using Microsoft.Extensions.Logging;
using Npgsql;

namespace Data.Services;

public interface IDatabaseHealthService : IDisposable
{
    Task CheckDatabaseConnectionAsync();
}

public class DatabaseHealthService(NpgsqlDataSource dataSource) : IDatabaseHealthService
{
    private readonly NpgsqlDataSource _dataSource = dataSource;

    protected async ValueTask<NpgsqlConnection> GetConnectionAsync()
    {
        return await _dataSource.OpenConnectionAsync();
    }

    public async Task CheckDatabaseConnectionAsync()
    {
        using var conn = await GetConnectionAsync();
        using var cmd = conn.CreateCommand();
        cmd.CommandText = "SELECT 1";
        await cmd.ExecuteScalarAsync();
    }

    public void Dispose()
    {
        // Nothing owned by this class to dispose; the injected NpgsqlDataSource
        // is managed by the DI container.
        GC.SuppressFinalize(this);
    }
}
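An application could use this at startup to fail fast when the database is unreachable. A sketch, assuming the service has been registered in the DI container:

```csharp
// Sketch: verify database connectivity before the application starts its work.
var healthService = host.Services.GetRequiredService<IDatabaseHealthService>();
try
{
    await healthService.CheckDatabaseConnectionAsync();
}
catch (NpgsqlException ex)
{
    Console.Error.WriteLine($"Cannot reach the database: {ex.Message}");
    return;
}
```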
Integration with Npgsql & PostgreSQL
I opted to use Npgsql to act as the data provider to access the PostgreSQL database server.
Applications that need to access the database via the Data class library must provide an NpgsqlDataSource. This is done by creating one from a connection string defined in the application’s configuration.
In the Bot application, the NpgsqlDataSource is provided using dependency injection like this:
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Npgsql;

namespace Bot;

class Program
{
    static async Task Main(string[] args)
    {
        var hostBuilder = Host.CreateDefaultBuilder(args)
            .ConfigureServices((hostContext, services) =>
            {
                var configuration = hostContext.Configuration;
                var connectionString = configuration.GetConnectionString("postgresql")
                    ?? throw new Exception("Missing ConnectionStrings__postgresql in configuration!");
                services.AddSingleton(provider => NpgsqlDataSource.Create(connectionString));
                services.AddSingleton<IUserRepository, UserRepository>();
                // ...other services
            });

        var host = hostBuilder.Build();
        await host.RunAsync();
    }
}
The example above is a simplified version of the Bot’s Program class, but shows what’s fundamentally necessary for the application to use the Data class library’s UserRepository, which itself depends on having an instance of NpgsqlDataSource injected during construction.
Testing
For testing, I opted to use XUnit and Moq. Both are popular and easy to use.
In order to perform integration tests, it was necessary to create a testing database. I created another Dockerized instance of PostgreSQL locally, separate from the one used in the developer environment.
I then created a separate project in my .NET solution that runs the same initial setup and migrations that were performed on the production database.
Since each test should be as isolated as possible, each integration test executes its database commands as part of a transaction, which is then rolled back after each test. The database itself effectively persists no data generated by the tests.
In order to provide the necessary NpgsqlDataSource to the test classes, a DatabaseFixture was created:
using Npgsql;

namespace Data.Tests.Fixtures;

public class DatabaseFixture : IDisposable
{
    public NpgsqlDataSource DataSource { get; private set; }

    public DatabaseFixture()
    {
        EnumMapping.Initialize();
        var connectionString = "Host=localhost;Port=5433;Username=testing;Password=testing;Database=test;Include Error Detail=True";
        DataSource = NpgsqlDataSource.Create(connectionString);
    }

    public void Dispose()
    {
        DataSource.Dispose();
        GC.SuppressFinalize(this);
    }
}
Note the inclusion in the testing database connection string of "Include Error Detail=True". Enabling this parameter provides error details that may contain sensitive data, which can be useful during testing but should be used with caution. Refer to the Npgsql documentation on connection string security.
The DatabaseFixture is used in the test class like so:
using Data.Repositories;
using Data.Tests.Fixtures;
using Npgsql;
using Xunit;

namespace Data.Tests.Repositories;

public class UserRepositoryIntegrationTests : IClassFixture<DatabaseFixture>, IDisposable
{
    private readonly DatabaseFixture _fixture;
    private readonly NpgsqlTransaction _transaction;
    private readonly NpgsqlConnection _connection;

    public UserRepositoryIntegrationTests(DatabaseFixture fixture)
    {
        _fixture = fixture;
        _connection = _fixture.DataSource.OpenConnection();
        _transaction = _connection.BeginTransaction();
    }

    [Fact]
    public async Task GetByNameAsync_ReturnsNull_WhenUserDoesNotExist()
    {
        // Arrange
        var repository = new UserRepository(_fixture.DataSource);

        // Act
        var user = await repository.GetByNameAsync("user", _transaction);

        // Assert
        Assert.Null(user);
    }

    // ...other tests

    public void Dispose()
    {
        // Cleanup
        _transaction.Rollback();
        _connection.Close();
        GC.SuppressFinalize(this);
    }
}
Best Practices & Lessons Learned
I encountered many challenges while putting together the Data layer for the Emote Usage Tracker project. What follows are just a few of them.
Enum Mapping
Included in my database schema is a custom enum type ProviderName:
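The schema definition isn’t shown here, but given the provider_name type name used in the mapping code and the PgName labels on the .NET enum, it presumably resembles:

```sql
-- Approximate reconstruction; the labels match the PgName attributes
-- on the corresponding .NET enum.
CREATE TYPE provider_name AS ENUM ('Twitch', 'FFZ', 'BTTV', 'SevenTV');
```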
Mapping that field to its corresponding .NET model only worked when I realized the PgNameAttribute from NpgsqlTypes had to be used:
using NpgsqlTypes;

namespace Data.Models;

public enum ProviderName
{
    [PgName("Twitch")]
    Twitch,
    [PgName("FFZ")]
    FFZ,
    [PgName("BTTV")]
    BTTV,
    [PgName("SevenTV")]
    SevenTV
}
Furthermore, for quite some time I had resorted to an obsolete approach to applying the enum mapping in my project. When I first wrote the repository code, I was creating database connections within each method, using the connection string passed as an argument to the repository constructor. Because of this, I had to call the Initialize method from the following class as early as possible in my program’s Main method:
using Data.Models;
using Npgsql;

namespace Data
{
    public class EnumMapping
    {
        public static void Initialize()
        {
#pragma warning disable CS0618 // Type or member is obsolete
            NpgsqlConnection.GlobalTypeMapper.MapEnum<ProviderName>("provider_name");
#pragma warning restore CS0618 // Type or member is obsolete
        }
    }
}
While this worked, the fact that I was suppressing obsolescence warnings was indication enough that there had to be a better approach, and indeed there is as of Npgsql 7.0.
Once I had started injecting the NpgsqlDataSource as a dependency to my repositories (replacing the connection string argument), I no longer had to use the obsolete enum mapping approach. Instead, I can do this when registering services:
services.AddSingleton(provider =>
{
    var dataSourceBuilder = new NpgsqlDataSourceBuilder(connectionString);
    dataSourceBuilder.MapEnum<ProviderName>("provider_name");
    return dataSourceBuilder.Build();
});
…and when using an NpgsqlDataReader in the repository methods, a field of type ProviderName can be read like so:
reader.GetFieldValue<ProviderName>(reader.GetOrdinal("provider_name"))
NpgsqlTransaction: Null or Not?
When I started writing integration tests involving the repositories, I realized the need for them to accept an NpgsqlTransaction argument so that the operations performed could be rolled back after each test. The problem with this was that, if the transaction was null, I’d then need to create a connection rather than get it from the transaction itself.
What followed was a lot of private methods with “Internal” appended to their names that would take an NpgsqlConnection as an argument. The public method would then be responsible for checking whether or not the transaction argument was null, and getting a connection accordingly before passing it to the “Internal” method.
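For illustration, that old pattern looked roughly like this (reconstructed from the description above, not actual project code):

```csharp
// Hypothetical sketch of the redundant pattern described above.
public async Task<User?> GetByIdAsync(string userId, NpgsqlTransaction? transaction = null)
{
    if (transaction != null)
    {
        if (transaction.Connection == null)
        {
            throw new InvalidOperationException("The transaction has no open connection.");
        }
        return await GetByIdInternalAsync(userId, transaction.Connection, transaction);
    }
    using var conn = await GetConnectionAsync();
    return await GetByIdInternalAsync(userId, conn, null);
}

private async Task<User?> GetByIdInternalAsync(string userId, NpgsqlConnection conn, NpgsqlTransaction? tran)
{
    using var cmd = new NpgsqlCommand("SELECT * FROM users WHERE user_id = @user_id LIMIT 1", conn, tran);
    cmd.Parameters.AddWithValue("user_id", userId);
    // ...execute the reader and map the row, as in the UserRepository example earlier.
    return null;
}
```

Every public repository method needed that same null check and connection handling, which is exactly the duplication ExecuteDbOperationAsync removed.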
After an embarrassing amount of redundant code had been written, I eventually came to my senses and added the higher-order function ExecuteDbOperationAsync to the base repository (the code for which is included earlier in this post).
Cleaning up that mess of “Internal” methods afterwards felt exceptionally satisfying.
Where Do the Services Go?
This is actually a challenge I’m currently deliberating over.
I have in the Data class library a set of services that only the Bot application uses. I’ve written integration tests for these services, and it’s comfortable in that respect to have them located in Data.Tests.
The problem is that there is tight coupling between these services and Bot, so shouldn’t they be located in Bot, and their tests in Bot.Tests?
On the contrary, these services are mostly responsible for interactions between the Bot and the database, so maybe they do belong in the Data layer.
As far as I’m aware, the performance differences of taking one approach over the other are negligible.
As I continue to develop this project and come to a decision on this point, I’ll update this post accordingly.
Conclusion
The development of the Emote Usage Tracker’s Data Class Library has been a practical exercise in applying .NET best practices to a real-world project.
The integration with Npgsql and PostgreSQL, alongside a structured testing environment, has led to a robust and reliable data layer.
I’ve learned a lot about project architecture, writing tests, and database operations.
There was so much to write about for this post that I didn’t even get into the database queries themselves, so all that SQL will have to wait for the next post.