News Feed
Questions
What
- What platforms does the app support? (Both mobile and web)
- What are the important features? (User can publish a post and see friends’ posts on the news feed)
- What content can the feed contain? (Images, videos, and text)
How
- How is the news feed sorted? (Reverse chronological order)
Who
- Who are the users and their connections? (Each user can have up to 5000 friends)
When
- When is the feed content updated? (Not explicitly mentioned but implied to be in real-time or near real-time)
Overview
Design Overview: Feed Publishing and News Feed Building
Feed Publishing:
- When a user publishes a post:
  - Data is written into the cache and database.
  - The post is added to the news feeds of the user’s friends.
News Feed Building:
- News feed is built by aggregating friends’ posts in reverse chronological order.
Newsfeed APIs:
- Primary methods for clients to interact with servers.
- HTTP-based APIs for actions such as posting a status, retrieving the news feed, adding friends, etc.
Key APIs:
- Feed Publishing API:
  - Method: POST
  - Endpoint: /v1/me/feed
  - Parameters:
    - content: Text of the post.
    - auth_token: Used to authenticate API requests.
  - Example:
      POST /v1/me/feed
      Content-Type: application/json

      { "content": "This is my new post!", "auth_token": "your_auth_token_here" }
- Newsfeed Retrieval API:
  - Method: GET
  - Endpoint: /v1/me/feed
  - Parameters:
    - auth_token: Used to authenticate API requests.
  - Example:
      GET /v1/me/feed
      Authorization: Bearer your_auth_token_here
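As a rough illustration of the two endpoints, the sketch below builds both requests with Python's standard library. The host `api.example.com` and the helper names `build_publish_request` and `build_feed_request` are assumptions made for the example; the token placement (JSON body for publishing, `Authorization` header for retrieval) simply mirrors the examples in these notes. The requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, not given in the notes


def build_publish_request(content: str, auth_token: str) -> urllib.request.Request:
    """Build the POST /v1/me/feed request that publishes a post."""
    body = json.dumps({"content": content, "auth_token": auth_token}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/me/feed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_feed_request(auth_token: str) -> urllib.request.Request:
    """Build the GET /v1/me/feed request that retrieves the news feed."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/me/feed",
        headers={"Authorization": f"Bearer {auth_token}"},
        method="GET",
    )
```

Passing either object to `urllib.request.urlopen` would send it, assuming a server exists at that address.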
News feed publisher
Flow Summary:
- Feed Publishing:
  - User publishes a post.
  - Data written to cache and database.
  - Post is added to friends’ news feeds.
- News Feed Retrieval:
  - User requests their news feed.
  - Server aggregates friends’ posts in reverse chronological order.
  - News feed is returned to the user.
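The aggregation step in the retrieval flow can be sketched as a k-way merge. This example assumes each friend's posts are already stored newest-first as `(timestamp, post_id)` pairs; `build_feed`, `posts_by_user`, and the default `limit` are illustrative names and values, not part of the design above.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

Post = Tuple[int, int]  # (timestamp, post_id)


def build_feed(friend_ids: List[int],
               posts_by_user: Dict[int, List[Post]],
               limit: int = 20) -> List[int]:
    """Merge each friend's newest-first post list into one newest-first feed."""
    streams = [posts_by_user.get(friend_id, []) for friend_id in friend_ids]
    # heapq.merge with reverse=True expects every input sorted largest-first.
    merged = heapq.merge(*streams, key=lambda post: post[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

The merge is lazy, so only the first `limit` entries are pulled from the friends' lists.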
Components
Web Servers
- Functions:
  - Communicate with clients.
  - Enforce authentication and rate-limiting.
    - Only allow users with a valid auth_token to make posts.
    - Limit the number of posts a user can make within a certain period to prevent spam and abuse.
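One way the two checks might be combined is a sliding-window limiter keyed by user. This is a minimal single-process sketch: the `PostGate` class, the in-memory token table, and the window parameters are all assumptions for illustration, and a production web tier would more likely keep this state in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PostGate:
    """Rejects unknown auth tokens and caps posts per user in a sliding window."""

    def __init__(self, valid_tokens: Dict[str, int], max_posts: int,
                 window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.valid_tokens = valid_tokens  # auth_token -> user_id (hypothetical token store)
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self.clock = clock
        self._recent: Dict[int, Deque[float]] = defaultdict(deque)

    def allow_post(self, auth_token: str) -> bool:
        user_id = self.valid_tokens.get(auth_token)
        if user_id is None:
            return False  # authentication failed
        now = self.clock()
        recent = self._recent[user_id]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_posts:
            return False  # rate limit reached
        recent.append(now)
        return True
```

The injectable `clock` exists only to make the window easy to exercise in tests.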
Fanout Service
- Definition: The process of delivering a post to all friends.
- Models:
  - Fanout on Write (Push Model):
    - Workflow: News feed is pre-computed during write time, delivering new posts to friends’ cache immediately after publishing.
    - Pros:
      - Real-time news feed generation.
      - Fast fetching of news feed due to pre-computation.
    - Cons:
      - Slow and time-consuming for users with many friends (hotkey problem).
      - Wastes computing resources on inactive users.
  - Fanout on Read (Pull Model):
    - Workflow: News feed is generated during read time, pulling recent posts when a user loads their home page.
    - Pros:
      - Efficient for inactive users.
      - Avoids hotkey problem.
    - Cons:
      - Slower news feed fetching due to on-demand generation.
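To make the trade-off concrete, here is a small in-memory sketch of both models side by side. The `FeedStore` class and its method names are hypothetical; real systems would back these dictionaries with a cache and a database.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, int]  # (timestamp, post_id)


class FeedStore:
    def __init__(self, friends: Dict[int, Set[int]]) -> None:
        self.friends = friends  # user_id -> friend IDs
        self.posts_by_author: Dict[int, List[Entry]] = defaultdict(list)
        self.feed_cache: Dict[int, List[Entry]] = defaultdict(list)

    def publish_push(self, author: int, ts: int, post_id: int) -> None:
        """Fanout on write: one cache write per friend at publish time."""
        self.posts_by_author[author].append((ts, post_id))
        for friend in self.friends.get(author, ()):
            self.feed_cache[friend].append((ts, post_id))

    def read_push(self, user: int) -> List[int]:
        """Reading is a single lookup of the pre-computed feed."""
        return [pid for _, pid in sorted(self.feed_cache.get(user, []), reverse=True)]

    def publish_pull(self, author: int, ts: int, post_id: int) -> None:
        """Fanout on read: publishing only records the post."""
        self.posts_by_author[author].append((ts, post_id))

    def read_pull(self, user: int) -> List[int]:
        """Reading gathers every friend's posts on demand."""
        posts = [p for friend in self.friends.get(user, ())
                 for p in self.posts_by_author.get(friend, [])]
        return [pid for _, pid in sorted(posts, reverse=True)]
```

The cost moves between the two paths: `publish_push` does work proportional to the author's friend count, while `read_pull` does work proportional to the reader's friend count.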
- Hybrid Approach:
  - Strategy: Combine benefits of both models and mitigate pitfalls.
  - Implementation:
    - Use push model for most users to ensure fast news feed fetching.
    - Use pull model for celebrities or users with many friends/followers to avoid system overload.
    - Use consistent hashing to distribute requests/data evenly and reduce hotkey problem.
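The hybrid strategy could be sketched as a routing decision at publish time. The `HybridFeed` class and the `push_limit` threshold are assumptions for illustration; the notes do not say how a "celebrity" is identified, so audience size is used here as a stand-in. Consistent hashing is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, int]  # (timestamp, post_id)


class HybridFeed:
    def __init__(self, followers: Dict[int, Set[int]], push_limit: int) -> None:
        self.followers = followers    # author_id -> users who see the author's posts
        self.push_limit = push_limit  # hypothetical cut-off between push and pull
        self.feed_cache: Dict[int, List[Entry]] = defaultdict(list)
        self.celebrity_posts: Dict[int, List[Entry]] = defaultdict(list)

    def publish(self, author: int, ts: int, post_id: int) -> None:
        audience = self.followers.get(author, set())
        if len(audience) > self.push_limit:
            # Large audience: store once and let readers pull it.
            self.celebrity_posts[author].append((ts, post_id))
        else:
            # Ordinary user: push into every recipient's feed now.
            for user in audience:
                self.feed_cache[user].append((ts, post_id))

    def read(self, user: int) -> List[int]:
        entries = list(self.feed_cache.get(user, []))
        # Simplification: scan all celebrity authors; a real service would
        # look up only the celebrities this user follows.
        for author, posts in self.celebrity_posts.items():
            if user in self.followers.get(author, set()):
                entries.extend(posts)
        return [pid for _, pid in sorted(entries, reverse=True)]
```

Reads stay cheap for most users because only the small set of high-fanout authors is merged in at read time.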
Fanout Service Workflow
- Fetch Friend IDs:
  - Retrieve from the graph database, which manages friend relationships and recommendations.
- Get Friends Info:
  - Fetch from the user cache.
  - Filter friends based on user settings (e.g., muted friends, selective sharing).
- Message Queue:
  - Send friends list and new post ID to the message queue.
- Fanout Workers:
  - Fetch data from the message queue.
  - Store news feed data in the news feed cache as <post_id, user_id> mappings.
- Cache Management:
  - Store only IDs in the cache to minimize memory consumption.
  - Set a configurable limit to keep the memory size manageable.
  - Focus on storing latest content due to low likelihood of users scrolling through thousands of posts.
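The workflow above can be sketched with an in-process queue standing in for the message queue. Everything named here is an assumption for illustration: `enqueue_fanout`, `run_fanout_worker`, the `muted_by` settings table, and the reading of `<post_id, user_id>` as "this post belongs in this user's feed".

```python
import queue
from typing import Dict, List, Set


def enqueue_fanout(post_id: int, author_id: int,
                   friend_graph: Dict[int, Set[int]],
                   muted_by: Dict[int, Set[int]],
                   mq: "queue.Queue") -> None:
    """Look up friends, drop those who muted the author, enqueue one job."""
    friend_ids = friend_graph.get(author_id, set())
    recipients = sorted(f for f in friend_ids
                        if author_id not in muted_by.get(f, set()))
    mq.put((post_id, recipients))  # friends list plus the new post ID


def run_fanout_worker(mq: "queue.Queue", feed_cache: Dict[int, List[int]]) -> None:
    """Drain the queue and record the post ID in each recipient's feed."""
    while True:
        try:
            post_id, recipients = mq.get_nowait()
        except queue.Empty:
            return
        for user_id in recipients:
            # Only IDs are stored, matching the cache-management notes.
            feed_cache.setdefault(user_id, []).append(post_id)
```

In a deployed system the worker would run continuously against a durable queue rather than returning when the queue is empty.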
Example Structure of News Feed Cache:
- News Feed Table:
  - Format: <post_id, user_id>
  - Only IDs stored to reduce memory usage.
  - Configurable limit to maintain manageable memory size and low cache miss rate for recent content.
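A bounded, ID-only feed cache might look like the sketch below. The `NewsFeedCache` class and `max_entries` parameter are hypothetical names; the cap is enforced with a fixed-length deque, so the oldest post ID is evicted once a user's feed is full.

```python
from collections import deque
from typing import Deque, Dict, List


class NewsFeedCache:
    """Per-user list of post IDs, newest first, capped at max_entries."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries  # the configurable limit
        self._feeds: Dict[int, Deque[int]] = {}

    def add(self, user_id: int, post_id: int) -> None:
        feed = self._feeds.setdefault(user_id, deque(maxlen=self.max_entries))
        # With maxlen set, appendleft on a full deque drops the oldest ID
        # from the right end.
        feed.appendleft(post_id)

    def latest(self, user_id: int, count: int) -> List[int]:
        return list(self._feeds.get(user_id, ()))[:count]
```

Keeping only recent IDs trades completeness for memory, which fits the observation that users rarely scroll through thousands of posts.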
News feed retriever/building
Workflow Summary:
- User sends a request to /v1/me/feed.
- Load balancer directs the request to an available web server.
- Web server requests the news feed from the news feed service.
- News feed service retrieves post IDs from the news feed cache.
- Service fetches additional data (user info, post content, media links) from user and post caches.
  - The news feed includes more than just post IDs; it also includes:
    - Username
    - Profile picture
    - Post content
    - Post images/videos, etc.
  - Fetches complete user and post objects from user cache and post cache to build the fully hydrated news feed.
  - Media content (images, videos, etc.) should be served from a CDN.
- Constructs the complete news feed with all necessary details.
- Returns the JSON-formatted news feed to the client.
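The hydration step can be illustrated with plain dictionaries standing in for the three caches. The function name `build_feed_response`, the field names in the cached objects, and the `cdn_base` URL are assumptions for the example, not a schema from the notes.

```python
import json
from typing import Dict, List


def build_feed_response(user_id: int,
                        feed_cache: Dict[int, List[int]],
                        post_cache: Dict[int, dict],
                        user_cache: Dict[int, dict],
                        cdn_base: str) -> str:
    """Turn cached post IDs into a fully hydrated JSON news feed."""
    items = []
    for post_id in feed_cache.get(user_id, []):
        post = post_cache.get(post_id)
        if post is None:
            continue  # cache miss; a real service would fall back to the database
        author = user_cache.get(post["author_id"], {})
        items.append({
            "post_id": post_id,
            "username": author.get("username"),
            "profile_picture": author.get("profile_picture"),
            "content": post["content"],
            # Media is referenced by CDN URL rather than embedded.
            "media": [f"{cdn_base}/{path}" for path in post.get("media", [])],
        })
    return json.dumps({"feed": items})
```

The feed cache supplies ordering, while the post and user caches supply the details shown to the client.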
Potential Cache Layers in a News Feed System
By dividing the cache tier into these specific layers, the news feed system can efficiently handle a high volume of requests, maintain fast access to critical data, and ensure that the user experience remains smooth and responsive.
- News Feed Layer:
  - Purpose: Stores IDs of news feeds.
  - Description: This layer holds the identifiers of posts that make up a user’s news feed.
- Content Layer:
  - Purpose: Stores every post’s data.
  - Description: Includes the full content of each post.
  - Hot Cache: Popular content is stored here for faster access.
- Social Graph Layer:
  - Purpose: Stores user relationship data.
  - Description: Manages and caches the relationships between users (e.g., friends, followers).
- Action Layer:
  - Purpose: Stores information about user interactions with posts.
  - Description: Includes data on whether a user liked a post, replied to it, or took other actions.
- Counters Layer:
  - Purpose: Stores counters for various metrics.
  - Description: Includes counts for likes, replies, followers, following, and other interactions.
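As a loose illustration of how the layers relate, the sketch below models each one as a separate in-memory mapping and shows a "like" touching both the action and counters layers. The `CacheTier` class, its field shapes, and the key formats are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class CacheTier:
    # News feed layer: user_id -> post IDs in that user's feed
    news_feed: Dict[int, List[int]] = field(default_factory=lambda: defaultdict(list))
    # Content layer: post_id -> full post data
    content: Dict[int, dict] = field(default_factory=dict)
    # Social graph layer: user_id -> related user IDs
    social_graph: Dict[int, Set[int]] = field(default_factory=lambda: defaultdict(set))
    # Action layer: (user_id, post_id, action) records
    action: Set[Tuple[int, int, str]] = field(default_factory=set)
    # Counters layer: (post_id, metric) -> count
    counters: Dict[Tuple[int, str], int] = field(default_factory=lambda: defaultdict(int))

    def like(self, user_id: int, post_id: int) -> bool:
        """Record a like once per user and bump the post's like counter."""
        key = (user_id, post_id, "like")
        if key in self.action:
            return False  # already liked; counter unchanged
        self.action.add(key)
        self.counters[(post_id, "likes")] += 1
        return True
```

Keeping the action record separate from the counter lets the service answer "did this user like it?" and "how many likes?" with one lookup each.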