Apple has released new documentation regarding the ability to block Applebot-Extended, allowing web publishers to opt out of having their website content used to train Apple’s foundation models for generative AI features. Apple emphasizes that it does not use private user data or interactions for training, relying instead on licensed materials and publicly available data.
Customizing Indexing Rules for Applebot
Applebot supports various robots meta tags in HTML documents to control indexing:
- noindex: Prevents the page from being indexed.
- nosnippet: Prevents generating a description or web answer for the page.
- nofollow: Prevents following any links on the page.
- none: Combines noindex, nosnippet, and nofollow.
- all: Allows indexing, snippet generation, and link following.
Multiple directives can be combined either as a comma-separated list in a single meta tag or by using several meta tags.
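To make the directive list above concrete, here is a minimal Python sketch that collects the robots directives from a page and expands the none and all shorthands. The helper names (RobotsMetaParser, robots_directives) are hypothetical. The sketch assumes the directives live in meta tags with name="robots" and that a restrictive directive wins when tags conflict; it illustrates how the directives combine and is not Apple's implementation.

```python
from html.parser import HTMLParser

# Shorthand expansion taken from the directive list above.
# "all" adds no restrictions, so it expands to an empty set.
SHORTHAND = {
    "none": {"noindex", "nosnippet", "nofollow"},
    "all": set(),
}


class RobotsMetaParser(HTMLParser):
    """Collects restrictive directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Assumption: only name="robots" is inspected here.
        if (attr.get("name") or "").lower() != "robots":
            return
        # Directives may be comma-separated inside one tag.
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives |= SHORTHAND.get(token, {token})


def robots_directives(html):
    """Return the set of restrictive directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = (
        '<meta name="robots" content="noindex, nofollow">'
        '<meta name="robots" content="nosnippet">'
    )
    print(sorted(robots_directives(page)))
    # prints ['nofollow', 'noindex', 'nosnippet']
```

The comma-separated tag and the separate tag end up in the same set, which matches the statement that both forms can be combined.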
Controlling Data Usage
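Apple provides an additional user agent, Applebot-Extended, which gives web publishers more control over how their content is used. To opt out, add a rule for it in robots.txt. The example below covers only paths under /private/; under standard robots.txt semantics, Disallow: / would cover the whole site: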
User-agent: Applebot-Extended
Disallow: /private/
Applebot-Extended does not crawl webpages but determines how the data crawled by Applebot is used. Allowing Applebot-Extended can help improve Apple’s generative AI models.
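One way to sanity-check which paths a rule like the one above covers is Python's standard urllib.robotparser module. This is only a sketch: the is_allowed helper and the example.com URLs are hypothetical, and because Applebot-Extended does not crawl, the result is best read as "may content at this URL be used for training", not as "will this URL be fetched".

```python
from urllib.robotparser import RobotFileParser

# The rule from the text above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /private/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Evaluate robots.txt rules for a user agent and URL (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(is_allowed("Applebot-Extended", "https://example.com/private/report.html"))
    # prints False
    print(is_allowed("Applebot-Extended", "https://example.com/blog/post.html"))
    # prints True
    print(is_allowed("Applebot", "https://example.com/private/report.html"))
    # prints True
```

The last call shows that this rule leaves Applebot itself unaffected, so the page can still be crawled for search even though it is opted out of training use.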
About Search Rankings
Apple Search considers several factors for ranking web search results:
- Aggregated user engagement
- Relevancy and matching of search terms
- Number and quality of links
- User location-based signals
- Webpage design characteristics
For more details, see Apple's documentation.