OpenAI's Batch API allows users to send grouped requests at half the cost, ideal for tasks that can wait up to 24 hours. It's useful when immediate responses aren't necessary or when rate limits hinder executing many queries quickly.
The Batch API is well suited to scenarios like clustering SEO keywords, product feed optimization, and document summarization. It works with most of OpenAI's models, including GPT-3.5 Turbo and GPT-4 Turbo.
To use the Batch API, prepare a batch file in JSONL format. For example, clustering over 150,000 keywords requires splitting the list into multiple requests to stay within GPT-4 Turbo's 128k-token context window. Create a list of requests, each a dictionary describing one API call over a chunk of your SEO keywords.
prompt = "Cluster the following SEO keywords into topical groups:"

requests = [
    {
        "custom_id": "keyword_cluster_1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4-turbo",
            "messages": [
                {"role": "user", "content": f"{prompt} ['SEO', 'optimization', 'Google ranking']"},
            ],
        },
    },
    {
        "custom_id": "keyword_cluster_2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4-turbo",
            "messages": [
                {"role": "user", "content": f"{prompt} ['backlinks', 'page authority', 'domain score']"},
            ],
        },
    },
]
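For a full 150,000-keyword list, writing each request by hand is impractical. Building the request dicts programmatically can be sketched as follows; `chunk_keywords` and `build_request` are illustrative helpers, and a fixed `chunk_size` is a simplifying assumption (a real job would size chunks by token count, not keyword count):

```python
def chunk_keywords(keywords, chunk_size):
    """Split a keyword list into fixed-size chunks, one per batch request."""
    return [keywords[i:i + chunk_size] for i in range(0, len(keywords), chunk_size)]

def build_request(custom_id, prompt, keywords, model="gpt-4-turbo"):
    """Build one Batch API request dict for a chunk of keywords."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": f"{prompt} {keywords}"}],
        },
    }

keywords = ["SEO", "optimization", "Google ranking", "backlinks", "page authority"]
chunks = chunk_keywords(keywords, chunk_size=3)
requests = [
    build_request(f"keyword_cluster_{i + 1}", "Cluster these SEO keywords:", chunk)
    for i, chunk in enumerate(chunks)
]
```

The `custom_id` on each request is what lets you match results back to their input chunk after the batch completes.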
Convert your data to a JSONL file and upload it with the Files API to obtain a file ID.
import json

from openai import OpenAI

client = OpenAI()

# Write one JSON object per line (JSONL).
with open('seo_keywords.jsonl', 'w') as file:
    for request in requests:
        file.write(json.dumps(request) + '\n')

# Upload the file for batch processing.
with open('seo_keywords.jsonl', 'rb') as file:
    batch_input_file = client.files.create(file=file, purpose='batch')
Submit the batch, then check its status and retrieve the results once it completes; in practice, batches often finish within 12 to 18 hours.
batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Check the batch; once it completes, download the output file.
status = client.batches.retrieve(batch.id)
if status.status == 'completed':
    output = client.files.content(status.output_file_id)
    print(output.text)
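The output file is itself JSONL, one result per line. Mapping each `custom_id` back to the model's reply can be sketched like this; the nested field path follows the Batch API output format, and the sample line below is made up for illustration:

```python
import json

def parse_batch_output(jsonl_text):
    """Map each custom_id to the model's reply text from a Batch API output file."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        body = record["response"]["body"]
        results[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return results

# Minimal, made-up example of one output line:
sample = json.dumps({
    "custom_id": "keyword_cluster_1",
    "response": {"body": {"choices": [{"message": {"content": "Cluster A: SEO, optimization"}}]}},
})
clusters = parse_batch_output(sample)
```

In a real run you would pass `output.text` from the retrieval step above instead of the sample string.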
The Batch API is a cost-effective tool for managing large-scale SEO tasks like keyword clustering, saving time and reducing costs.