Bing Enhances Search with LLM and SLM Models Optimized by TensorRT-LLM

December 18, 2024 at 5:34:30 AM

TL;DR Bing is enhancing its search technology by transitioning to Large Language Models (LLMs) and Small Language Models (SLMs). LLMs can be costly and slow, while SLMs provide approximately 100x higher throughput. NVIDIA TensorRT-LLM optimizes SLM inference, cutting 95th percentile latency from 4.76 to 3.03 seconds per batch and reducing operational costs by 57%. This integration results in faster, more accurate search results without sacrificing quality. Bing remains committed to advancing search technology and improving user experience.

Bing is advancing its search technology by integrating Large Language Models (LLMs) and Small Language Models (SLMs). Increasingly complex search queries call for models that are both capable and efficient to serve. While LLMs are powerful, they can be costly and slow. In contrast, SLMs provide approximately 100x the throughput of LLMs, which lets Bing process and interpret search queries more precisely at scale.

Optimizing with TensorRT-LLM

To tackle the latency and cost challenges associated with larger models, Bing has incorporated NVIDIA TensorRT-LLM into its workflow to optimize SLM inference performance. The optimization is particularly evident in the Deep Search product, which uses SLMs to deliver the best available web results. The process involves understanding user intent and ensuring the relevance of results, with a focus on balancing speed and quality. TensorRT-LLM reduces model inference time, improving the user experience without compromising result quality.

Before optimization, the original Transformer model had a 95th percentile latency of 4.76 seconds per batch and a throughput of 4.2 queries per second. After implementing TensorRT-LLM, latency improved to 3.03 seconds per batch, and throughput increased to 6.6 queries per second, resulting in a 57% reduction in operational costs.
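To put the quoted figures in relative terms, the short Python sketch below computes the percentage change in latency and throughput. Only the four input numbers come from the article; the function and variable names are illustrative, and the article does not say how the 57% cost figure itself was derived.

```python
def relative_change(before: float, after: float) -> float:
    """Return the fractional change going from `before` to `after`."""
    return (after - before) / before


# Figures quoted in the article.
p95_latency_before_s = 4.76   # seconds per batch, before TensorRT-LLM
p95_latency_after_s = 3.03    # seconds per batch, after TensorRT-LLM
throughput_before_qps = 4.2   # queries per second, before
throughput_after_qps = 6.6    # queries per second, after

latency_change = relative_change(p95_latency_before_s, p95_latency_after_s)
throughput_change = relative_change(throughput_before_qps, throughput_after_qps)

print(f"p95 latency change: {latency_change:+.1%}")     # -36.3%
print(f"throughput change:  {throughput_change:+.1%}")  # +57.1%
```

In other words, the reported numbers correspond to roughly 36% lower batch latency and roughly 57% higher throughput.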

Optimization Technique

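The SmoothQuant technique, introduced in the paper "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models" (Xiao et al.), enables INT8 inference for both activations and weights while maintaining accuracy. It works by shifting part of the quantization difficulty from activations, which tend to contain outliers, onto weights, which are easier to quantize. TensorRT-LLM includes scripts for preprocessing model weights so that this method can be applied effectively.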
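The core smoothing step of SmoothQuant can be illustrated with a minimal pure-Python sketch. This is not the TensorRT-LLM preprocessing script and does not perform the INT8 conversion itself; it only shows how per-channel scales shrink activation outliers while leaving the layer output unchanged. The toy matrices, the function names, and the choice of alpha = 0.5 are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smoothing_scales(acts, weights, alpha=0.5):
    """Per-input-channel scale s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha)."""
    act_max = [max(abs(v) for v in col) for col in zip(*acts)]
    weight_max = [max(abs(v) for v in row) for row in weights]
    return [a ** alpha / w ** (1 - alpha) for a, w in zip(act_max, weight_max)]


def smooth(acts, weights, scales):
    """Return (X / s, s * W); their product equals X @ W up to rounding."""
    smoothed_acts = [[v / s for v, s in zip(row, scales)] for row in acts]
    smoothed_weights = [[v * s for v in row] for row, s in zip(weights, scales)]
    return smoothed_acts, smoothed_weights


def max_abs(matrix):
    """Largest absolute value in the matrix, i.e. the range a quantizer must cover."""
    return max(abs(v) for row in matrix for v in row)


# Toy data (made up): channel 1 of the activations carries an outlier.
acts = [[0.5, 40.0], [-1.0, -60.0]]   # shape (tokens, in_channels)
weights = [[0.2, -0.4], [0.1, 0.3]]   # shape (in_channels, out_channels)

scales = smoothing_scales(acts, weights)
smoothed_acts, smoothed_weights = smooth(acts, weights, scales)

print("activation range before:", max_abs(acts))
print("activation range after: ", max_abs(smoothed_acts))
```

Because the activations are divided by the same per-channel factors that the weights are multiplied by, the matrix product is mathematically identical, but the smoothed activations span a much narrower range and therefore lose less precision when mapped onto 8-bit integers.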

Benefits for Users

The transition to SLM models and TensorRT-LLM integration offers several advantages:

  • Faster Search Results: Users experience quicker response times.
  • Improved Accuracy: Enhanced SLM capabilities provide more accurate and contextualized results.
  • Cost Efficiency: Reduced operational costs enable continued investment in innovations.

Looking Ahead

Bing is committed to refining its search technology and enhancing user experience through the ongoing development of LLM and SLM models, along with continued TensorRT-LLM integration. Further improvements are expected as this work progresses.
