OpenAI's Batch API allows users to send grouped requests at half the cost, ideal for tasks that can wait up to 24 hours. It's useful when immediate responses aren't necessary or when rate limits hinder executing many queries quickly.
The Batch API is well suited to scenarios like clustering SEO keywords, product feed optimization, and document summarization. It works with most of OpenAI's models, including GPT-3.5 Turbo and GPT-4 Turbo.
To use the Batch API, prepare a batch file in JSONL format. For example, clustering over 150,000 keywords requires splitting the list into multiple requests to stay within GPT-4 Turbo's 128k-token context window. Create a list of requests, each a dictionary describing one API call over a chunk of your SEO keywords.
prompt = "Cluster the following SEO keywords into topical groups:"

requests = [
    {
        "custom_id": "keyword_cluster_1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4-turbo",
            "messages": [
                {"role": "user", "content": f"{prompt} ['SEO', 'optimization', 'Google ranking']"},
            ],
        },
    },
    {
        "custom_id": "keyword_cluster_2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4-turbo",
            "messages": [
                {"role": "user", "content": f"{prompt} ['backlinks', 'page authority', 'domain score']"},
            ],
        },
    },
]
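For a full 150,000-keyword list, writing each request by hand is impractical. Building the request dicts programmatically can be sketched as follows; `chunk_keywords` and `build_request` are illustrative helpers, and a fixed `chunk_size` is a simplifying assumption (a real job would size chunks by token count, not keyword count):

```python
def chunk_keywords(keywords, chunk_size):
    """Split a keyword list into fixed-size chunks, one per batch request."""
    return [keywords[i:i + chunk_size] for i in range(0, len(keywords), chunk_size)]

def build_request(custom_id, prompt, keywords, model="gpt-4-turbo"):
    """Build one Batch API request dict for a chunk of keywords."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": f"{prompt} {keywords}"}],
        },
    }

keywords = ["SEO", "optimization", "Google ranking", "backlinks", "page authority"]
chunks = chunk_keywords(keywords, chunk_size=3)
requests = [
    build_request(f"keyword_cluster_{i + 1}", "Cluster these SEO keywords:", chunk)
    for i, chunk in enumerate(chunks)
]
```

The `custom_id` on each request is what lets you match results back to their input chunk after the batch completes.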
Convert your data to a JSONL file and upload it with the Files API to obtain a file ID.
import json

from openai import OpenAI

client = OpenAI()

# Write one JSON object per line (JSONL).
with open('seo_keywords.jsonl', 'w') as file:
    for request in requests:
        file.write(json.dumps(request) + '\n')

# Upload the file for batch processing.
with open('seo_keywords.jsonl', 'rb') as file:
    batch_input_file = client.files.create(file=file, purpose='batch')
Submit the batch, then check its status and retrieve the results once it completes; in practice, batches often finish within 12 to 18 hours.
batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Check the batch; once it completes, download the output file.
status = client.batches.retrieve(batch.id)
if status.status == 'completed':
    output = client.files.content(status.output_file_id)
    print(output.text)
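The output file is itself JSONL, one result per line. Mapping each `custom_id` back to the model's reply can be sketched like this; the nested field path follows the Batch API output format, and the sample line below is made up for illustration:

```python
import json

def parse_batch_output(jsonl_text):
    """Map each custom_id to the model's reply text from a Batch API output file."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        body = record["response"]["body"]
        results[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return results

# Minimal, made-up example of one output line:
sample = json.dumps({
    "custom_id": "keyword_cluster_1",
    "response": {"body": {"choices": [{"message": {"content": "Cluster A: SEO, optimization"}}]}},
})
clusters = parse_batch_output(sample)
```

In a real run you would pass `output.text` from the retrieval step above instead of the sample string.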
The Batch API is a cost-effective tool for managing large-scale SEO tasks like keyword clustering, saving time and reducing costs.