Apple has updated its Applebot documentation to clarify the distinction between the standard Applebot crawler and the Applebot-Extended crawler. Although Applebot-Extended was introduced a year ago, the update emphasizes how blocking it affects Applebot's functionality. Applebot collects data that aids in training Apple foundation models for generative AI features across various Apple products.
To prevent content from being used for training generative models, web publishers can block Applebot-Extended by adding a directive in the robots.txt file. Allowing Applebot in robots.txt ensures that website content is discoverable through Apple services like Spotlight, Siri, and Safari.
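As an illustration, a robots.txt along these lines (a sketch, not a verbatim recommendation from Apple) opts a site out of generative model training while leaving search-related crawling by Applebot untouched:

```
# Opt out of content being used to train Apple's generative models
User-agent: Applebot-Extended
Disallow: /

# No restrictions for Applebot, so content stays discoverable in
# Spotlight, Siri, and Safari
User-agent: Applebot
Disallow:
```

An empty Disallow line is the standard robots.txt way of stating that nothing is disallowed for that user agent.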
Identifying Applebot
Traffic from Applebot can be identified using reverse DNS in the *.applebot.apple.com domain or by matching IP addresses against the CIDR prefixes in the Applebot IP CIDRs JSON file. The host command can verify whether an IP address belongs to Applebot.
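A quick sketch of that verification, using a placeholder address (203.0.113.10 is a documentation IP, not a real Applebot address):

```
# Reverse DNS lookup for an address seen in server logs
host 203.0.113.10

# Genuine Applebot traffic resolves to a hostname under *.applebot.apple.com;
# a forward lookup of that hostname should return the same IP address.
```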
User Agents
Applebot utilizes various user agents, including:
- Search: Identified by a user-agent string containing "Applebot" (a sample string appears after this list).
- Apple Podcasts: Identified by the user-agent "iTMS," which does not follow robots.txt because it only crawls registered content.
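For reference, a search user-agent string from Applebot looks roughly like the following; the browser and version tokens vary, so matching should key on the "Applebot" token rather than the full string:

```
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
```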
Customizing robots.txt Rules
Applebot respects standard robots.txt directives. For example, it will not crawl documents under /private/ or /not-allowed/ if those rules are specified. If robots.txt does not mention Applebot but includes Googlebot, Applebot will adhere to Googlebot's instructions.
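A minimal robots.txt group reflecting that example (the paths are the same illustrative ones used above):

```
User-agent: Applebot
Disallow: /private/
Disallow: /not-allowed/
```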
Rendering and Indexing
Applebot may render website content, so blocking resources like JavaScript and CSS in robots.txt can hinder proper rendering. To ensure optimal indexing, all necessary resources should be accessible to Applebot.
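For example, an overly broad rule like the following (directory names are purely illustrative) would keep Applebot from fetching stylesheets and scripts and could result in incomplete rendering and indexing:

```
User-agent: Applebot
Disallow: /static/css/
Disallow: /static/js/
```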
Customizing Indexing Rules
Applebot supports robots meta tags in HTML documents. Key directives include:
- noindex: Prevents indexing and visibility in Spotlight or Siri.
- nosnippet: Disallows generation of descriptions.
- nofollow: Prevents following links.
- none: Combines all of the above restrictions.
- all: Allows indexing and snippet generation.
Multiple directives can be combined in a single meta tag.
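A minimal example combining two directives in one robots meta tag, which Applebot reads like any standard crawler:

```
<meta name="robots" content="noindex, nofollow">
```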
Applebot-Extended and Data Usage Control
Applebot-Extended offers web publishers additional control over their content's usage in training AI models. By disallowing Applebot-Extended in robots.txt, publishers can opt out of their content being used for this purpose. However, disallowing it does not prevent the content from appearing in search results.
Search Rankings
Apple Search rankings may consider factors such as user engagement, relevancy of search terms, link quality, user location signals, and webpage design characteristics, with no predetermined importance assigned to any single factor. Users are subject to the privacy policy governing Siri Suggestions and Search.