LinkedIn has introduced a semantic capability in its content search engine to improve search results for complex queries. This enhancement addresses the limitations of the previous system, which struggled with queries that used natural language or included complex concepts. The new system aims to provide high-quality, engaging posts by optimizing two key metrics: on-topic rate and long-dwells.
Objectives
- On-topic rate: The percentage of returned posts that are well written and answer the query.
- Long-dwells: The time a searcher spends on each returned post, a signal of engagement.
High-Level Design
The content search engine consists of two layers:
- Retrieval Layer: Selects a few thousand candidate posts from billions of posts.
- Multi-Stage Ranking Layer: Scores these candidate posts in two stages and returns a ranked list.
Retrieval Layer
The retrieval layer includes two retrievers:
- Token-Based Retriever (TBR): Selects posts containing the exact keywords from the query.
- Embedding-Based Retriever (EBR): Uses a two-tower AI model to select posts based on semantic matching. This model pre-computes post embeddings and stores them for efficient retrieval.
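As a rough illustration of the embedding-based retrieval idea, the sketch below precomputes post embeddings and scores a query against them by cosine similarity. The `embed` function and all names are assumptions standing in for the query and post towers of the real two-tower model:

```python
import numpy as np

# Hypothetical embedding function standing in for the query/post towers
# of the two-tower model; the hashing trick and dimension are assumptions.
def embed(text: str, dim: int = 8) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine

# Post embeddings are precomputed offline and stored for retrieval.
posts = ["how to negotiate salary", "python tips", "career growth advice"]
post_embeddings = np.stack([embed(p) for p in posts])

def ebr_retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    scores = post_embeddings @ q          # cosine similarity (unit vectors)
    top = np.argsort(-scores)[:k]         # indices of the k best posts
    return [posts[i] for i in top]
```

In production the post tower runs offline over billions of posts, so only the query tower and a nearest-neighbor lookup run at query time.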
Multi-Stage Ranking Layer
This layer scores the much smaller candidate set in real time with models that capture interactions between query and post features. Ranking proceeds in two stages:
- L1 Ranking Stage: Uses a simple model to score and filter posts.
- L2 Ranking Stage: Uses a complex model to score the filtered posts and prepare the final search results.
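A minimal sketch of the funnel above, assuming illustrative stand-in scoring functions (the feature names, weights, and cutoff are hypothetical, not LinkedIn's actual models):

```python
# Hypothetical two-stage ranking: a cheap L1 model prunes candidates,
# then a heavier L2 model scores the survivors.
def l1_score(post: dict) -> float:
    # Cheap signal, e.g. a keyword-overlap proxy (illustrative only).
    return post["keyword_overlap"]

def l2_score(post: dict) -> float:
    # More expensive blend of on-topicness and long-dwell predictions;
    # the 0.6/0.4 weights are arbitrary for this sketch.
    return 0.6 * post["on_topic"] + 0.4 * post["long_dwell"]

def rank(candidates: list[dict], l1_keep: int = 2) -> list[dict]:
    survivors = sorted(candidates, key=l1_score, reverse=True)[:l1_keep]
    return sorted(survivors, key=l2_score, reverse=True)
```

The point of the split is cost control: the simple L1 model can afford to score thousands of posts, while the complex L2 model only sees the few that survive.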
Models and Features
- On-topicness Prediction Model: Uses query and post text embeddings to produce an on-topicness score.
- Long-Dwell Prediction Model: Uses a variety of features, including query text, post text, searcher and author features, to produce a long-dwell score.
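Since the on-topicness model consumes query and post text embeddings, a simple stand-in for its output is the cosine similarity of those two embeddings (the real model is learned; this function is only an assumption for illustration):

```python
import numpy as np

def on_topicness(query_emb: np.ndarray, post_emb: np.ndarray) -> float:
    # Cosine similarity of query and post text embeddings, used here as
    # a hypothetical stand-in for the learned on-topicness score.
    return float(
        query_emb @ post_emb
        / (np.linalg.norm(query_emb) * np.linalg.norm(post_emb))
    )
```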
Efficient Serving
To ensure low latency, several optimizations are made:
- Limiting the number of posts scanned during the approximate nearest neighbor search.
- Precomputing text embeddings of all posts and storing them in a key-value store.
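Both optimizations can be sketched together: a key-value map of precomputed embeddings, and a hard cap on how many posts the nearest-neighbor search scans. All names and the scan strategy are assumptions; a real ANN index (e.g. HNSW or IVF) would choose which posts to visit far more cleverly:

```python
import numpy as np

# Precomputed post embeddings in a key-value map (post_id -> vector),
# a stand-in for the production key-value store.
embedding_store: dict[str, np.ndarray] = {
    f"post_{i}": np.random.default_rng(i).normal(size=4) for i in range(1000)
}

def approx_nn(query_vec: np.ndarray, max_scan: int = 100, k: int = 5):
    # Cap the number of posts scanned to bound latency; this naive sketch
    # simply truncates the scan rather than using a proper ANN index.
    scanned = list(embedding_store.items())[:max_scan]
    scored = [(pid, float(vec @ query_vec)) for pid, vec in scanned]
    return sorted(scored, key=lambda x: -x[1])[:k]
```

Bounding the scan trades a little recall for predictable latency, which is the usual deal in approximate nearest neighbor serving.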
Results and Future Work
The new content search engine has improved both the on-topic rate and long-dwells by more than 10%, leading to increased engagement on LinkedIn.
LinkedIn plans to evolve the on-topic rate metric to better capture quality expectations across different types of queries, in part by leveraging large language models (LLMs) in the ranking layer.