Real-time Gaming Leaderboard

Questions

What

What is the scoring system for the leaderboard? (Users get a point for each match they win)
What are the functional requirements?
- Display the top 10 players on the leaderboard
- Show a user’s specific rank
- Display players who are four places above and below a specific user (bonus)

How

How are ranks determined if two players have the same score? (Their ranks are the same; tie-breaking methods can be discussed if time allows)
How often are the leaderboard rankings updated? (Real-time or as close to real-time as possible)

Who

Who is included in the leaderboard? (All players)

When

When is the leaderboard reset? (A new leaderboard starts each month with a new tournament)

Why

Why do we need to display specific ranks and adjacent players? (To provide detailed user insights and engagement)

Introduction

Client Communication with the Leaderboard Service

Should the Client Talk Directly to the Leaderboard Service?

No, the client should not set scores directly.
If the client talks to the leaderboard service directly, a player can intercept the request (for example with a man-in-the-middle proxy) and alter the score. Scores should instead be set server-side, where they cannot be tampered with.

Alternative Design for Server-Authoritative Games:

Server-Authoritative Approach: In games like online poker, the game server handles all game logic and updates scores internally once the game finishes, without client intervention.

Implementing Server-Side Score Updates

Client-Server Communication:
- Clients interact with the game server to report game outcomes.
- The game server then updates the score based on its authoritative logic.
Game Server Responsibilities:
- Validate game results.
- Update player scores.
- Communicate with the leaderboard service to update rankings.

Message Queue Between Game Service and Leaderboard Service

Do We Need a Message Queue?

Yes, if score updates are consumed by more than one service or feature.

Benefits of Using a Message Queue (e.g., Kafka):

Decoupling: Allows the game service and leaderboard service to operate independently.
Scalability: Multiple services (leaderboard, analytics, notifications) can consume score updates.
Flexibility: Easy to add new consumers for different functionalities without changing the existing architecture.

Scenario Considerations:

Turn-Based or Multi-Player Games: Notifications to other players about score updates can be handled efficiently.
Analytics and Reporting: Real-time data analysis and reporting can be streamlined.
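
The fan-out idea above can be sketched in a few lines. This is a minimal in-memory stand-in, not Kafka itself: the `ScoreTopic` class, the consumer functions, and the event fields (`user_id`, `points`) are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ScoreTopic:
    """Hypothetical in-memory stand-in for a message-queue topic."""

    def __init__(self) -> None:
        self._consumers: List[Callable[[dict], None]] = []

    def subscribe(self, consumer: Callable[[dict], None]) -> None:
        self._consumers.append(consumer)

    def publish(self, event: dict) -> None:
        # Every subscribed consumer receives every event.
        for consumer in self._consumers:
            consumer(event)


leaderboard: Dict[str, int] = defaultdict(int)
analytics_log: List[dict] = []


def leaderboard_consumer(event: dict) -> None:
    leaderboard[event["user_id"]] += event["points"]


def analytics_consumer(event: dict) -> None:
    analytics_log.append(event)


topic = ScoreTopic()
topic.subscribe(leaderboard_consumer)
topic.subscribe(analytics_consumer)

# The game service publishes once; it does not know who consumes the event.
topic.publish({"user_id": "mary1934", "points": 1})
topic.publish({"user_id": "mary1934", "points": 1})
```

Adding a notification consumer would be one more `subscribe` call, with no change to the publisher.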

Data Models for Leaderboard Store

Relational Database Solution

A relational database management system (RDBMS) can be used for a simple leaderboard solution when the scale is small and there are only a few users.
Each monthly leaderboard can be represented as a database table containing user_id and score columns. When a user wins a match, their score is either inserted (if they are new) or updated.

Schema:


CREATE TABLE leaderboard (
  user_id VARCHAR(255) PRIMARY KEY,
  score INT
);

Operations:

Insert a New User Score:


INSERT INTO leaderboard (user_id, score) VALUES ('mary1934', 1);

Update an Existing User Score:


UPDATE leaderboard SET score = score + 1 WHERE user_id = 'mary1934';

Fetch User Rank:


SET @rownum := 0;
-- Alias is user_rank because RANK is a reserved word in MySQL 8.0+.
SELECT (@rownum := @rownum + 1) AS user_rank, user_id, score
FROM leaderboard
ORDER BY score DESC;
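
The query above numbers every row, so finding one user's rank still means reading the whole result. A minimal sketch of a per-user lookup follows, using Python's built-in `sqlite3` as a stand-in for MySQL (an assumption made so the example runs anywhere). The helpers `record_win` and `user_rank` are hypothetical. Ties share a rank here, matching the requirement stated in the questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leaderboard (user_id TEXT PRIMARY KEY, score INTEGER)")


def record_win(user_id: str) -> None:
    """Increment the user's score, inserting the row on a first win."""
    cur = conn.execute(
        "UPDATE leaderboard SET score = score + 1 WHERE user_id = ?", (user_id,)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO leaderboard (user_id, score) VALUES (?, 1)", (user_id,)
        )


def user_rank(user_id: str):
    """Return 1 + the number of players with a strictly higher score."""
    row = conn.execute(
        "SELECT score FROM leaderboard WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None  # the user has not won a match yet
    (higher,) = conn.execute(
        "SELECT COUNT(*) FROM leaderboard WHERE score > ?", (row[0],)
    ).fetchone()
    return higher + 1


for winner in ["mary1934", "mary1934", "bob", "alice", "alice"]:
    record_win(winner)
```

Even with an index on score, the COUNT query touches every row above the user, which is why this approach degrades as the table grows.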

Optimization: Adding an index on the score column and using the LIMIT clause speeds up the top-10 query, but it does not help find an arbitrary user's rank and does not scale well for large datasets.

Optimized Query:


SET @rownum := 0;
-- Alias is user_rank because RANK is a reserved word in MySQL 8.0+.
SELECT (@rownum := @rownum + 1) AS user_rank, user_id, score
FROM leaderboard
ORDER BY score DESC
LIMIT 10;

Challenges:

Performance: Sorting and ranking millions of rows takes a long time.
Real-Time Queries: Relational databases struggle with the high load of real-time read queries.
Table Scans: Finding a user’s rank requires a table scan, which is not efficient.

Redis Solution for Leaderboard System

To handle a high-scale leaderboard system efficiently, Redis offers an in-memory data store with a data type called sorted sets, which is ideal for leaderboard operations.

What are Sorted Sets?

Sorted Sets: In Redis, a sorted set is a collection of unique elements, each associated with a score. Elements are ordered by their score. The main data structures used internally are a hash table and a skip list. The hash table maps users to scores, and the skip list allows fast search operations.
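
As a rough illustration of the two-structure idea, here is a toy model in Python. It is not how Redis is implemented: a sorted Python list is swapped in for the skip list, so inserts are O(n) rather than O(log n). The class and method names are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple


class MiniSortedSet:
    """Toy sorted set: a dict (member -> score) plus an ordered list."""

    def __init__(self) -> None:
        self._scores: Dict[str, int] = {}
        self._ordered: List[Tuple[int, str]] = []  # (score, member), ascending

    def incr_by(self, member: str, delta: int) -> int:
        old = self._scores.get(member)
        if old is not None:
            # Find and remove the member's old (score, member) entry.
            idx = bisect.bisect_left(self._ordered, (old, member))
            self._ordered.pop(idx)
        new = (old or 0) + delta
        self._scores[member] = new
        bisect.insort(self._ordered, (new, member))
        return new

    def rev_rank(self, member: str):
        """0-based position counted from the highest score, or None."""
        score = self._scores.get(member)
        if score is None:
            return None
        idx = bisect.bisect_left(self._ordered, (score, member))
        return len(self._ordered) - 1 - idx

    def rev_range(self, start: int, stop: int) -> List[Tuple[str, int]]:
        """Entries from position start to stop (inclusive), highest first."""
        descending = self._ordered[::-1]
        return [(member, score) for score, member in descending[start:stop + 1]]


board = MiniSortedSet()
for winner in ["mary1934", "bob", "mary1934", "alice"]:
    board.incr_by(winner, 1)
```

The dict answers "what is this user's score?" in constant time, and the ordered structure answers "where does this score sit?", which is the division of labour described above.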

Implementation with Redis Sorted Sets

Redis Commands and Their Use Cases

ZADD: Add a new user or update the score of an existing user.
```
ZADD leaderboard_feb_2021 <score> <user>
```
- Time Complexity: O(log(n))
ZINCRBY: Increment the score of a user by a specified amount.
```
ZINCRBY leaderboard_feb_2021 1 'mary1934'
```
- Time Complexity: O(log(n))
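ZRANGE / ZREVRANGE: Fetch a range of users sorted by score. (ZREVRANGE is deprecated since Redis 6.2 in favor of ZRANGE with the REV option, but it still works.)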
```
ZREVRANGE leaderboard_feb_2021 0 9 WITHSCORES
```
- Time Complexity: O(log(n) + m), where m is the number of entries to fetch.
ZRANK / ZREVRANK: Fetch the rank of a user.
```
ZREVRANK leaderboard_feb_2021 'mary1934'
```
- Time Complexity: O(log(n))

Workflow with Sorted Sets

User Scores a Point
- When a user wins a match, increment their score by 1 in the monthly leaderboard.
```
ZINCRBY leaderboard_feb_2021 1 'mary1934'
```
Fetch Top 10 Global Leaderboard
- Retrieve the top 10 users with the highest scores.
```
ZREVRANGE leaderboard_feb_2021 0 9 WITHSCORES
```
- Output: Returns a list of users and their scores in descending order.
```
[(user2, score2), (user1, score1), (user5, score5), ...]
```
Fetch User’s Rank
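- Retrieve the rank of a specific user. ZREVRANK returns a 0-based position, so add 1 for display. Players with equal scores still get distinct positions (Redis breaks ties by member name), so shared ranks need extra logic.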
```
ZREVRANK leaderboard_feb_2021 'mary1934'
```
Fetch Relative Position
- To get a user's position along with the four players above and below, first fetch the user's rank with ZREVRANK, then call ZREVRANGE from rank - 4 to rank + 4. For a user at rank 361:
```
ZREVRANGE leaderboard_feb_2021 357 365
```
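
The bounds in the command above are the user's rank minus and plus four. A small sketch of that calculation, with a hypothetical helper `neighbor_window`; the clamp matters because Redis treats negative indexes as offsets from the end of the set.

```python
def neighbor_window(rank: int, span: int = 4):
    """Return (start, stop) indexes for ZREVRANGE around a 0-based rank."""
    start = max(0, rank - span)  # clamp so top players do not produce negatives
    stop = rank + span           # running past the end is safe in Redis
    return start, stop


start, stop = neighbor_window(361)
command = f"ZREVRANGE leaderboard_feb_2021 {start} {stop}"
```

For a player at rank 2, the window becomes 0 to 6, so fewer than four players are shown above them.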

Advantages of Using Redis Sorted Sets

Performance: Redis sorted sets provide logarithmic time complexity for insertion, deletion, and search operations.
Real-Time Updates: Fast read and write operations due to in-memory storage.
Automatic Ordering: Elements are automatically ordered by score, eliminating the need for complex sorting queries.

Example Use Cases

Incrementing User Scores:
- A user wins a match:
```
ZINCRBY leaderboard_feb_2021 1 'mary1934'
```

Fetching Top 10 Users:

Retrieve the top 10 users:


ZREVRANGE leaderboard_feb_2021 0 9 WITHSCORES

Getting User Rank:
- Find the rank of ‘mary1934’:
```
ZREVRANK leaderboard_feb_2021 'mary1934'
```
Getting Relative Position:
- Find 4 users above and below ‘mallow007’ if their rank is 361:
```
ZREVRANGE leaderboard_feb_2021 357 365
```

Summary

Relational Database: Suitable for small datasets but faces performance issues with large, real-time datasets.
Redis: Ideal for high-performance, real-time leaderboard operations, handling large query volumes efficiently.
NoSQL: Provides scalability and flexibility but requires additional logic for ranking and real-time updates.

Storage Requirement for Redis Leaderboard

User ID and Score Storage Calculation:

User ID: 24-character string (24 bytes, assuming one byte per character)
Score: 16-bit integer (2 bytes)
Storage per entry: 26 bytes

Worst-Case Scenario:

Monthly Active Users (MAU): 25 million
Total storage required: 26 bytes * 25 million = 650 million bytes (~650 MB)

Overhead Consideration:

Accounting for the overhead of the skip list and hash table used by Redis sorted sets, we can double the memory usage estimate.
Total estimated memory: ~1300 MB (~1.3 GB)
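
The estimate can be reproduced with a few lines of arithmetic. The constant names are ours, and the factor of 2 is the rough overhead allowance described above, not a measured value.

```python
USER_ID_BYTES = 24      # 24-character ID, assuming one byte per character
SCORE_BYTES = 2         # 16-bit integer
MAU = 25_000_000        # worst case: every monthly active user has an entry
OVERHEAD_FACTOR = 2     # rough allowance for skip list and hash table

entry_bytes = USER_ID_BYTES + SCORE_BYTES
raw_bytes = entry_bytes * MAU
total_bytes = raw_bytes * OVERHEAD_FACTOR

print(f"{raw_bytes / 1e6:.0f} MB raw, {total_bytes / 1e9:.1f} GB with overhead")
# prints "650 MB raw, 1.3 GB with overhead"
```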

CPU and I/O Usage

Peak QPS: 2500 updates/sec
Redis can handle this load comfortably within the performance envelope of a single server.

Redis Persistence

Persistence Concern: Redis nodes might fail, leading to data loss.
Solution: Redis supports persistence, but restarting a large instance from disk can be slow.
High Availability: Configure Redis with a read replica. If the main instance fails, promote the read replica and attach a new read replica.

Supporting Tables in Relational Database (MySQL)

User Table: Stores user ID and user’s display name.
Point Table: Stores user ID, score, and timestamp of the game win.

Schema Example:


CREATE TABLE user (
  user_id VARCHAR(255) PRIMARY KEY,
  display_name VARCHAR(255)
);
 
CREATE TABLE point (
  user_id VARCHAR(255),
  score INT,
  timestamp DATETIME,
  PRIMARY KEY (user_id, timestamp),
  FOREIGN KEY (user_id) REFERENCES user(user_id)
);
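
One use of the point table is rebuilding a month's leaderboard if the Redis data is lost. A minimal sketch follows, again using `sqlite3` as a stand-in for MySQL; the sample rows and the helper `rebuild_month` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE point (user_id TEXT, score INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO point VALUES (?, ?, ?)",
    [
        ("mary1934", 1, "2021-02-03 10:00:00"),
        ("mary1934", 1, "2021-02-07 18:30:00"),
        ("bob", 1, "2021-02-11 09:15:00"),
        ("bob", 1, "2021-03-01 00:00:05"),  # next month, must be excluded
    ],
)


def rebuild_month(start: str, end: str) -> dict:
    """Sum each user's points where start <= timestamp < end."""
    rows = conn.execute(
        "SELECT user_id, SUM(score) FROM point "
        "WHERE timestamp >= ? AND timestamp < ? GROUP BY user_id",
        (start, end),
    )
    return dict(rows.fetchall())


feb_scores = rebuild_month("2021-02-01 00:00:00", "2021-03-01 00:00:00")
```

Each resulting pair would then be written back to the monthly sorted set with ZADD.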

Performance Optimization

Top 10 Players Cache: Create an additional cache for user details of the top 10 players.
Memory Usage: This optimization involves a small amount of data and is primarily for performance enhancement.

Summary

Redis Storage: The worst-case storage requirement for Redis is ~1.3 GB, which can be handled by a modern Redis server.
High Availability: Use Redis persistence and read replicas to ensure data durability and quick failover.
Supporting RDBMS: Use a relational database to store detailed user and game history, enabling the recreation of the Redis leaderboard if necessary.
Performance Optimization: Maintain a small cache for frequently accessed data, such as the top 10 players.

Options for Deploying the Leaderboard System

There are two primary ways to deploy the leaderboard system: managing our own services or building on cloud infrastructure. Each has its advantages and challenges.

Option 1: Manage Our Own Services

In this approach, we handle all aspects of the leaderboard infrastructure, including Redis and MySQL databases.

Setup:

Leaderboard Data: Create a sorted set in Redis each month to store leaderboard data (user IDs and scores).
User Details: Store user details (names, profile images) in MySQL.

Workflow:

Scoring a Point: Update the sorted set in Redis.
Fetching the Leaderboard: Retrieve leaderboard data from Redis, then query MySQL for user details.

Optimizations:

User Profile Cache: Implement a cache for top users’ profiles to reduce the load on MySQL during leaderboard queries.

Pros:

Full control over infrastructure.
Can be fine-tuned for specific requirements.

Cons:

Requires significant operational overhead (setup, scaling, maintenance).
Managing high availability and fault tolerance adds complexity.

Option 2: Build on the Cloud

Leveraging cloud infrastructure simplifies many operational challenges. Here, we assume the use of AWS.

Components:

Amazon API Gateway: Manages HTTP endpoints for the RESTful API.
AWS Lambda: Serverless functions to handle logic without managing servers.
AWS ElastiCache: Managed Redis service for leaderboard storage.
Amazon RDS: Managed MySQL database for user details.

Workflow:

Scoring a Point:
1. The game client calls the API Gateway.
2. API Gateway invokes a Lambda function.
3. Lambda updates the sorted set in ElastiCache (Redis).
4. Lambda updates MySQL with the new score entry.
Fetching the Leaderboard:
1. The game client calls the API Gateway.
2. API Gateway invokes a Lambda function.
3. Lambda retrieves the top scores from ElastiCache.
4. Lambda queries MySQL for corresponding user details.
5. API Gateway returns the aggregated results to the client.
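
Steps 3 to 5 of the fetch flow amount to joining two result sets. A sketch of that join follows, with the Redis and MySQL calls replaced by plain Python values so it runs on its own. The function name and the response fields are hypothetical.

```python
from typing import Dict, List, Tuple


def build_leaderboard_response(
    top_scores: List[Tuple[str, int]],   # what ZREVRANGE ... WITHSCORES returns
    profiles: Dict[str, str],            # user_id -> display_name from MySQL
) -> List[dict]:
    response = []
    for position, (user_id, score) in enumerate(top_scores, start=1):
        response.append({
            "rank": position,  # positional rank; ties are not merged here
            "user_id": user_id,
            "display_name": profiles.get(user_id, user_id),  # fall back to ID
            "score": score,
        })
    return response


result = build_leaderboard_response(
    [("user2", 12), ("user1", 9)],
    {"user2": "Mary"},
)
```

Fetching all profiles in one query (for example with an IN clause) keeps the MySQL round trips at one per leaderboard request.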

Pros:

Serverless: No need to manage infrastructure; auto-scaling is handled by AWS.
High Availability: Built-in fault tolerance and automatic backups.
Scalability: Scales automatically with user growth.
Cost-Efficient: Pay only for what you use with AWS Lambda.

Cons:

Cost: Potentially higher ongoing costs compared to self-managed infrastructure for large-scale operations.
Vendor Lock-In: Dependency on AWS services.

Design Diagrams

Use Case 1: Scoring a Point

Client calls API Gateway endpoint to score a point.
API Gateway invokes the scoring Lambda function.
Lambda function updates Redis sorted set (ElastiCache).
Lambda function updates user score in MySQL (RDS).

Use Case 2: Retrieving the Leaderboard

Client calls API Gateway endpoint to retrieve the leaderboard.
API Gateway invokes the leaderboard Lambda function.
Lambda function retrieves the top scores from Redis (ElastiCache).
Lambda function queries MySQL (RDS) for user details.
Aggregated results are returned to the client via API Gateway.

Recommendation

For a system expected to scale significantly, such as a large gaming platform, leveraging a serverless approach on a cloud provider like AWS is highly recommended. This approach benefits from:

Ease of Scaling: Automatic scaling to handle high traffic loads.
Reduced Operational Overhead: Managed services reduce the need for manual maintenance.
High Availability and Fault Tolerance: Built-in features of AWS services ensure data durability and uptime.

Given these advantages, using AWS Lambda, Amazon API Gateway, ElastiCache, and RDS is a robust and scalable solution for the leaderboard system.

Scaling Redis for High DAU

With the anticipated growth to 500 million Daily Active Users (DAU), we need to ensure that our Redis-based leaderboard system can handle the increased load and storage requirements.

Data Sharding

We consider two primary methods for sharding data in Redis: fixed partitions and hash partitions.

Fixed Partition

Concept:

Divide the range of scores into fixed intervals.
Each interval (or range) is stored in a separate Redis shard.

Implementation:

Assume the monthly scores range from 1 to 1000.
Divide into 10 shards, each covering a range of 100 scores (e.g., 1-100, 101-200, etc.).

Workflow:

Insert/Update:
- Determine the user’s shard based on their score.
- Update the score in the appropriate shard.
- Use a secondary cache to track the user’s current score and corresponding shard.
- If the user’s score crosses into a new range, move the user to the new shard.
Fetch Top 10 Players:
- Query the top 10 players from the shard with the highest score range.
Fetch User Rank:
- Calculate the user’s rank within their shard (the local rank).
- Add the total number of users in every shard that covers a higher score range.
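
The workflow above can be sketched with plain dictionaries standing in for the Redis shards. The shard layout follows the 1 to 1000 example; the function names are hypothetical, and scores are assumed to be at least 1.

```python
from typing import Dict, List

SHARD_WIDTH = 100   # each shard covers 100 scores: 1-100, 101-200, ...
NUM_SHARDS = 10

# shards[i] maps user_id -> score for scores in the i-th range.
shards: List[Dict[str, int]] = [{} for _ in range(NUM_SHARDS)]
current_score: Dict[str, int] = {}  # secondary cache: user_id -> score


def shard_index(score: int) -> int:
    return min((score - 1) // SHARD_WIDTH, NUM_SHARDS - 1)


def set_score(user_id: str, score: int) -> None:
    old = current_score.get(user_id)
    if old is not None:
        # Remove from the old shard; it may differ from the new one.
        del shards[shard_index(old)][user_id]
    shards[shard_index(score)][user_id] = score
    current_score[user_id] = score


def rank(user_id: str) -> int:
    score = current_score[user_id]
    idx = shard_index(score)
    local = sum(1 for s in shards[idx].values() if s > score)
    higher_shards = sum(len(shard) for shard in shards[idx + 1:])
    return local + higher_shards + 1


set_score("a", 950)
set_score("b", 120)
set_score("c", 180)
set_score("d", 50)
set_score("d", 150)  # crosses from shard 0 into shard 1
```

With real Redis shards, the local part would come from ZREVRANK on the user's shard and the higher-shard counts from ZCARD on each shard above it.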

Pros:

Predictable distribution of data.
Simplified shard management.

Cons:

Need to adjust score ranges if the distribution is uneven.
Complexity in moving users between shards when their scores change.

Hash Partition

Concept:

Use Redis Cluster to automatically shard data across multiple nodes based on hash slots.
Each key is assigned to a hash slot using the formula CRC16(key) % 16384.
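
A sketch of the slot formula in Python follows. It relies on `binascii.crc_hqx` with an initial value of 0, which computes the CRC-CCITT (XMODEM) checksum that the Redis Cluster specification names as its CRC16. Hash tags (the `{...}` syntax) are ignored here, and the key name is a made-up example.

```python
import binascii

NUM_SLOTS = 16384


def key_slot(key: str) -> int:
    """Slot for a key, ignoring Redis hash tags for simplicity."""
    checksum = binascii.crc_hqx(key.encode("utf-8"), 0)
    return checksum % NUM_SLOTS


# Hypothetical key name for one shard of the monthly leaderboard.
slot = key_slot("leaderboard_feb_2021:p0")
```

Note that the slot is computed per key, so a single sorted set always lives on one node. Spreading a leaderboard over the cluster therefore means splitting it into several sorted-set keys.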

Implementation:

Redis Cluster manages data distribution across nodes.
Nodes contain specific ranges of hash slots.

Workflow:

Insert/Update:
- Use the CRC16 function to determine the hash slot and corresponding node.
- Update the score in the appropriate shard.
Fetch Top 10 Players:
- Collect top players from each shard and sort them at the application level.
Fetch User Rank:
- More complex due to distributed nature; requires additional aggregation logic.
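
The top-10 step is a scatter-gather: ask every shard for its own top K, then merge. A minimal sketch of the merge follows, with the per-shard results passed in as plain lists; `merge_top_k` is a hypothetical helper.

```python
import heapq
from typing import List, Tuple


def merge_top_k(per_shard_top: List[List[Tuple[str, int]]], k: int = 10):
    """Merge each shard's local top-k list into a global top-k."""
    candidates = [entry for shard in per_shard_top for entry in shard]
    # Highest scores first; only the k best candidates are kept.
    return heapq.nlargest(k, candidates, key=lambda entry: entry[1])


per_shard = [
    [("a", 90), ("b", 70)],
    [("c", 95), ("d", 60)],
    [("e", 80)],
]
top3 = merge_top_k(per_shard, k=3)
```

Each shard must return at least K entries (or all it has), since the global top K could in principle all sit on one shard.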

Pros:

Automated sharding and rebalancing.
Easy to add/remove nodes.

Cons:

High latency for retrieving large sets of data (e.g., top K players with a large K), because every shard must be queried and the results merged.
Complex rank calculation.

Conclusion:

Fixed partitions are preferred due to their predictability and easier management of shard boundaries.

Sizing a Redis Node

Memory Considerations:

Allocate twice the amount of memory needed for write-heavy applications to accommodate snapshots and potential failures.
With 500 million DAU, estimated storage is 65 GB for leaderboard data (100 times the ~650 MB estimate above).
Factor in overheads, replication, and snapshots: ~130 GB per node.

Performance Benchmarking:

Use redis-benchmark to simulate load and test performance.
Ensure the setup can handle the peak QPS of 250,000 updates/second.

Setup:

Configure Redis with sufficient memory to handle data and replication.
Implement read replicas to ensure high availability and quick failover.

Alternative Solution: NoSQL

To ensure scalability and performance, especially with high Daily Active Users (DAU), we can consider using NoSQL databases such as DynamoDB, Cassandra, or MongoDB. Here, we use DynamoDB as an example due to its managed nature and scalability features.

Key Properties of the Ideal NoSQL Solution

Optimized for Writes: Handle high write throughput efficiently.
Efficient Sorting: Sort items within the same partition by score.
Scalability: Automatically handle increased load.

DynamoDB Implementation

Initial Design

Leaderboard Table: Contains denormalized leaderboard data, including user IDs, scores, and any other relevant information for rendering the leaderboard.
Issues: Scanning the entire table to find top scores is inefficient as the number of users increases.

Adding Indexes

To improve efficiency, we can add indexes:

Partition Key: game_name#{year-month}
Sort Key: score

This setup works initially but creates a hot partition problem as all data for a month is stored in a single partition, leading to performance bottlenecks.

Write Sharding

To distribute the data more evenly:

Partition Key: game_name#{year-month}#p{partition_number}
Sort Key: score

This setup splits data into multiple partitions, mitigating the hot partition issue.

Global Secondary Index (GSI)

A global secondary index on the following keys lets each partition be queried in score order:

Partition Key: game_name#{year-month}#p{partition_number}
Sort Key: score

Example Workflow

User Scores a Point

Write Sharding: Determine the partition number using hash(user_id) % number_of_partitions. User IDs are strings, so they must be hashed to an integer first.
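
A sketch of building the sharded partition key follows. The partition count of 4 is an arbitrary assumption, and `zlib.crc32` is one possible stable hash (Python's built-in `hash()` varies between processes for strings, so it is unsuitable here).

```python
import zlib

NUM_PARTITIONS = 4  # assumption; the right value comes from benchmarking


def partition_key(game: str, year_month: str, user_id: str) -> str:
    """Build game_name#{year-month}#p{partition_number} for a user."""
    partition = zlib.crc32(user_id.encode("utf-8")) % NUM_PARTITIONS
    return f"{game}#{year_month}#p{partition}"


key = partition_key("chess", "2021-04", "user123")
```

Because the hash depends only on the user ID, every write for the same user in a given month lands in the same partition.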

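DynamoDB Command (pseudocode; PutItem replaces any existing item with the same primary key, so score here is the user's new total):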


PutItem {
  TableName: "Leaderboard",
  Item: {
    "partition_key": "chess#2021-04#p1",
    "user_id": "user123",
    "score": 10
  }
}

Fetch Top 10 Global Leaderboard

Scatter-Gather Pattern:
- Fetch top 10 scores from each partition.
- Merge results and sort them.

DynamoDB Command:


Query {
  TableName: "Leaderboard",
  IndexName: "GameScoreIndex",
  KeyConditionExpression: "partition_key = :partition_key",
  ExpressionAttributeValues: {
    ":partition_key": "chess#2021-04#p1"
  },
  Limit: 10,
  ScanIndexForward: false
}

Determining the Number of Partitions

Benchmarking: Determine the optimal number of partitions based on DAU and write volume.
Trade-offs: More partitions reduce the load on each but increase read complexity.

Handling Relative User Ranking

While exact user ranking is complex, estimating percentiles can be practical:

Cron Job Analysis: Periodically analyze score distribution.

Caching Percentiles:

Example Distribution:


10th percentile = score < 100
20th percentile = score < 500
...
90th percentile = score < 6500
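
Once the thresholds are cached, looking up a user's bucket is a binary search. A sketch follows using only the three thresholds shown above; the elided rows are deliberately left out rather than guessed, and `percentile_bucket` is a hypothetical helper.

```python
import bisect

# Only the rows given in the example distribution; the "..." rows are omitted.
THRESHOLDS = [100, 500, 6500]   # exclusive upper score bounds
PERCENTILES = [10, 20, 90]


def percentile_bucket(score: int):
    """Return the smallest cached percentile whose bound exceeds the score."""
    idx = bisect.bisect_right(THRESHOLDS, score)
    if idx == len(PERCENTILES):
        return None  # score is at or above the highest cached bound
    return PERCENTILES[idx]
```

A score of 50 falls in the 10th-percentile bucket, while a score of 7000 is above every cached bound and returns None, which a client could display as "top 10%".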