Emote Usage Tracker: Designing the Database
Introduction
The Emote Usage Tracker project came about as a sort-of resurrection of a similar project from several years ago. The goal in both cases was to provide Twitch streamers with usage metrics for emotes they chose to make available to their audience, whether provided by Twitch itself or by third-party providers whose emotes are made visible to those with the appropriate browser extensions or apps installed.
In the previous iteration of the project back in 2021, everything was written in TypeScript, and I used the document database MongoDB. It worked well enough in practice, but this time around I wanted to use a database more appropriate to the relational nature of the data involved.
Over the years since that first attempt at this project, I had expanded my skills beyond developing exclusively in JavaScript/TypeScript. I picked up Java, and had lately been spending more time with C#/.NET. Eager to strengthen my .NET abilities, I saw this as an ideal project in which to use it.
I opted to use PostgreSQL as it’s a reliable, open-source database. In combination with the .NET library Npgsql for database access, it promised to provide both comfort in development and solid performance in production.
Database Schema Overview
At the heart of the database schema are the tables for emotes, channels, and users. The channel_emotes and global_emotes tables are used to keep track of the active emotes in each channel. The emote_usages table effectively represents the overall usage of an emote by a user in a channel, and the daily_emote_usage_snapshots table provides usage-over-time metrics.
There are also a few tables that store row counts from other tables in order to improve query performance: channel_emote_counts, global_emote_counts, and table_row_counts.
Finally, a custom enum type, provider_name, is defined in the database.
Detailed Table Descriptions
Let’s take a closer look at the individual table definitions and their purpose.
Emotes
The emotes table uses a SERIAL primary key, with a compound unique constraint on the fields provider_name and provider_emote_id.
The provider_name field uses a custom enum type of the same name, defining the emote providers supported by the application: Twitch, FFZ, BTTV, and SevenTV.
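As a concrete point of reference, here is a minimal sketch of what these definitions might look like. The emote_name column and the exact types are assumptions for illustration, not the application’s actual DDL:
CREATE TYPE provider_name AS ENUM ('Twitch', 'FFZ', 'BTTV', 'SevenTV');

CREATE TABLE emotes (
    emote_id          SERIAL PRIMARY KEY,
    provider_name     provider_name NOT NULL,
    provider_emote_id TEXT NOT NULL,  -- the provider's own ID for the emote; type assumed
    emote_name        TEXT NOT NULL,  -- assumed column: the name typed in chat
    UNIQUE (provider_name, provider_emote_id)
);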
Global and Channel Emotes
The global_emotes and channel_emotes tables are used as an extension of the emotes table to determine which emotes are being actively tracked in a channel.
The global_emotes table has a relation to the emotes table by referencing the emote_id, which is unique in both tables.
The channel_emotes table likewise has a relation to the emotes table, and a relation to the channels table by referencing the channel_id. There is a compound unique constraint on emote_id and channel_id.
Both tables have an is_active field, which is a flag whose value depends not only on the emote still being available from its provider, but also on whether it takes priority over other emotes with the same name.
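Sketched in the same illustrative spirit, with types and foreign key details assumed, the two tables might look like this:
CREATE TABLE global_emotes (
    emote_id  INTEGER NOT NULL UNIQUE REFERENCES emotes (emote_id),
    is_active BOOLEAN NOT NULL DEFAULT TRUE
);

CREATE TABLE channel_emotes (
    emote_id   INTEGER NOT NULL REFERENCES emotes (emote_id),
    channel_id TEXT NOT NULL REFERENCES channels (channel_id),  -- channels table described below
    is_active  BOOLEAN NOT NULL DEFAULT TRUE,
    UNIQUE (emote_id, channel_id)
);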
The emote prioritization approach taken mimics the emote rendering decision-making of the chat client Chatterino. Twitch global and channel emotes take the utmost priority. Twitch globals are always PascalCase, while channel emotes are always camelCase, so naming conflicts are not a concern in that respect. Any third-party provider’s channel emote takes priority over any third-party global emote, and third-party channel emotes are prioritized like so: BTTV → FFZ → SevenTV.
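As a sketch of how this ordering could be expressed, the query below resolves which emote each name maps to in a channel; rows that win this resolution would be the ones marked active. It assumes the illustrative schema from the earlier sketches (including the hypothetical emote_name column) and is not the application’s actual implementation:
WITH candidates AS (
    SELECT e.emote_id, e.emote_name, e.provider_name, TRUE AS is_channel_emote
    FROM channel_emotes ce
    JOIN emotes e ON e.emote_id = ce.emote_id
    WHERE ce.channel_id = $1
    UNION ALL
    SELECT e.emote_id, e.emote_name, e.provider_name, FALSE AS is_channel_emote
    FROM global_emotes ge
    JOIN emotes e ON e.emote_id = ge.emote_id
)
SELECT DISTINCT ON (emote_name) emote_id, emote_name, provider_name
FROM candidates
ORDER BY
    emote_name,
    (provider_name = 'Twitch') DESC,  -- Twitch emotes always win
    is_channel_emote DESC,            -- third-party channel emotes beat third-party globals
    CASE provider_name                -- BTTV → FFZ → SevenTV among third-party channel emotes
        WHEN 'BTTV' THEN 1
        WHEN 'FFZ' THEN 2
        WHEN 'SevenTV' THEN 3
        ELSE 4
    END;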
The is_active field of global and channel emotes is maintained by the Bot during its periodic syncing of emote data between the providers’ APIs and the database.
Channels and Users
The channels table holds the details of each Twitch channel. Each row corresponds to a channel, identified by its channel_id. Other columns include channel_name, channel_display_name, channel_description, and profile_image_url. These columns primarily serve to enrich frontend views of the data, though the channel_name is also used by the Bot’s IRC client when joining channels.
A channel’s non-key column values are kept up to date by the Bot, which periodically retrieves channel metadata from the Twitch API.
The users table stores information about each user. Each row corresponds to a user, identified by its user_id. Other columns include user_name, user_display_name, and user_color. As with the channels table, these columns are used in views.
A user’s non-key column values are updated by the Bot as part of its regular emote usage tracking. This is made possible by requesting the Twitch IRC tags capability, which enriches the chat messages delivered to the client with detailed information about the sending user.
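In the same sketch form as before, with column types assumed:
CREATE TABLE channels (
    channel_id           TEXT PRIMARY KEY,  -- Twitch channel ID; type assumed
    channel_name         TEXT NOT NULL,     -- login name used by the IRC client
    channel_display_name TEXT NOT NULL,
    channel_description  TEXT,
    profile_image_url    TEXT
);

CREATE TABLE users (
    user_id           TEXT PRIMARY KEY,     -- Twitch user ID; type assumed
    user_name         TEXT NOT NULL,
    user_display_name TEXT NOT NULL,
    user_color        TEXT                  -- chat name color, e.g. '#1E90FF'
);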
Emote Usages and Daily Emote Usage Snapshots
The emote_usages table relates to the emotes, channels, and users tables by their respective IDs, and there is a compound unique constraint on these fields. This effectively means that emote usage is tracked on a per-emote, per-channel, per-user basis, and that usage counts persist across emote name, channel name, and user name changes.
The decision to track usage by emote_id was made early in development. In discussions with Twitch partner broadcasters, a preference was shown that emote usage be tracked by the specific emote, rather than by an emote’s name in a channel. In practice, this means that emotes with the same name, no matter how similar in appearance, will be tracked separately unless they have the same provider_name and provider_emote_id.
The emote_usages table also has the fields usage_count, holding the all-time usage total of an emote in a channel by a user, and last_modified, a timestamp of its most recent usage.
The closely related daily_emote_usage_snapshots table exists to provide usage-over-time metrics. It has its own snapshot_id as a primary key, and a snapshot_date field, which is simply the date without a time component.
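A sketch of the pair, consistent with the columns referenced by the snapshot job in the next section (types and defaults are assumptions):
CREATE TABLE emote_usages (
    emote_id      INTEGER NOT NULL REFERENCES emotes (emote_id),
    channel_id    TEXT NOT NULL REFERENCES channels (channel_id),
    user_id       TEXT NOT NULL REFERENCES users (user_id),
    usage_count   BIGINT NOT NULL DEFAULT 0,           -- all-time total
    last_modified TIMESTAMPTZ NOT NULL DEFAULT now(),  -- most recent usage
    UNIQUE (emote_id, channel_id, user_id)
);

CREATE TABLE daily_emote_usage_snapshots (
    snapshot_id   SERIAL PRIMARY KEY,
    emote_id      INTEGER NOT NULL,
    channel_id    TEXT NOT NULL,
    user_id       TEXT NOT NULL,
    snapshot_date DATE NOT NULL,  -- date only, no time component
    usage_count   BIGINT NOT NULL
);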
Daily Snapshot Mechanism
The granularity of the usage-over-time metrics was again determined by consulting the project’s expected users: Twitch partner broadcasters who place a high value on viewer engagement via emotes, and who desired greater insight into their audience’s usage of their emotes.
It was agreed that day-level granularity was sufficient to provide useful data, as streams generally happen on a daily basis, and decisions around adding or removing emotes tend to follow the same cadence.
To generate the rows in daily_emote_usage_snapshots, the job scheduling tool pgAgent is used. Every day at midnight, pgAgent runs the following:
INSERT INTO daily_emote_usage_snapshots (channel_id, user_id, emote_id, snapshot_date, usage_count)
SELECT
e.channel_id,
e.user_id,
e.emote_id,
CURRENT_DATE,
e.usage_count
FROM
emote_usages e
LEFT JOIN
(SELECT
channel_id,
user_id,
emote_id,
MAX(snapshot_date) as max_snapshot_date
FROM
daily_emote_usage_snapshots
GROUP BY
channel_id,
user_id,
emote_id
) s
ON
s.channel_id = e.channel_id
AND s.user_id = e.user_id
AND s.emote_id = e.emote_id
WHERE
s.max_snapshot_date IS NULL
OR e.last_modified > s.max_snapshot_date;
This query finds the most recent snapshot related to each record in the emote_usages table and, if the usage_count has changed in the meantime, or if no related snapshot exists yet, inserts a new snapshot with the current usage and the current date.
Given that this job runs at midnight, an emote usage snapshot therefore records the usage_count as it was at the start of its snapshot_date.
With these snapshots, it becomes relatively easy to query for usage-over-time in a performant way.
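For instance, a query along these lines (a sketch, not taken from the application) computes usage deltas per snapshot for a channel with a window function. Since a snapshot is only written when the count has changed, each delta covers the span since the previous snapshot:
SELECT
    emote_id,
    user_id,
    snapshot_date,
    usage_count - LAG(usage_count, 1, 0) OVER (
        PARTITION BY emote_id, user_id
        ORDER BY snapshot_date
    ) AS usage_gained  -- usage accumulated since the previous snapshot
FROM daily_emote_usage_snapshots
WHERE channel_id = $1
ORDER BY emote_id, user_id, snapshot_date;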
Challenges and Solutions
Evaluating an Event-Driven Approach
Providing usage-over-time data was a feature that had been lacking in the previous iteration of this project years before, and one that I very much wanted to implement this time around.
I thought about attempting an event-driven design, but eventually decided against that.
One consideration was the need for relational integrity. Enforcing data integrity through constraints, and with it consistency and accuracy, is easily managed in PostgreSQL.
Event-based databases excel at handling real-time data, but present challenges for analytical queries that span time. In this project, it’s not enough to simply collect the emote usage data; the application must also frequently perform complex queries to analyze usage trends over time.
An event-based system geared for the speed and scalability of write operations becomes cumbersome when there’s a need to compare usage between different channels, users, and time periods. PostgreSQL’s support for complex joins and window functions was very useful when creating queries for such data.
As a robust relational database with a mature ecosystem and extensive documentation, PostgreSQL was the ideal solution to reach for when designing this project’s database.
Entity Framework: Yes or No?
I had used .NET’s Entity Framework in the past on smaller projects. It’s comfortable to use, but comes with a performance cost.
The decision against using it in favor of Npgsql likely resulted in some small performance gains, but my primary motivation was to build experience writing the SQL myself. In that respect I feel that this choice has paid off, and I’m much more comfortable in that domain than I had been prior to this project’s development.
Avoiding Costly Table Scans
There are times, typically when presenting overviews, when a count of a table’s rows is necessary. On a channel card component, for example, the number of active channel emotes from each provider is a nice thing to be able to show the user.
Providing these sorts of row counts typically involves a full table scan, which is sub-optimal. In cases where it’s not necessary to have perfectly accurate counts, as in displaying the number of active emotes on a channel summary card, it’s preferable to store these counts in a separate table, which is updated periodically.
The global_emote_counts and channel_emote_counts tables keep track of the number of active emotes. They are updated as part of the Bot’s syncing of provider emotes with the database, which generally occurs daily for global emotes, and hourly for each channel’s emotes.
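As a hypothetical illustration of the kind of statement such a sync might issue (the real update lives in the Bot’s code; the table is assumed here to hold one row per channel and provider, with a unique constraint on that pair and an emote_count column):
INSERT INTO channel_emote_counts (channel_id, provider_name, emote_count)
SELECT ce.channel_id, e.provider_name, COUNT(*)
FROM channel_emotes ce
JOIN emotes e ON e.emote_id = ce.emote_id
WHERE ce.channel_id = $1 AND ce.is_active
GROUP BY ce.channel_id, e.provider_name
ON CONFLICT (channel_id, provider_name)
    DO UPDATE SET emote_count = EXCLUDED.emote_count;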
The table_row_counts table maintains a count of the total number of emotes, channels, and users tracked by the application. It’s updated using the following pgAgent job, scheduled to run hourly:
UPDATE table_row_counts SET row_count = (SELECT COUNT(*) FROM users) WHERE table_name = 'users';
UPDATE table_row_counts SET row_count = (SELECT COUNT(*) FROM channels) WHERE table_name = 'channels';
UPDATE table_row_counts SET row_count = (SELECT COUNT(*) FROM emotes) WHERE table_name = 'emotes';
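Reading a count back is then a cheap single-row lookup rather than a scan:
SELECT row_count FROM table_row_counts WHERE table_name = 'emotes';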
Conclusion
With the project now having been in production for over a month, I can say that I’m pleased with the choices I made when designing the database.
PostgreSQL has been perfectly capable of doing everything this project demands, and it’s performant enough that someone without an advanced knowledge of query writing can build something reliable in a production environment.