How To Audit AMP Using The SEO Spider
This tutorial shows how to use the Screaming Frog SEO Spider to audit Accelerated Mobile Pages (AMP) quickly and efficiently. The SEO Spider uses the official AMP validator to allow bulk validation of URLs.
To get started, you’ll need to download the SEO Spider, which is free in lite form for up to 500 URLs. You can download it via the buttons in the right hand side bar. Crawling AMP URLs via the rel="amphtml" link tag requires paid access. However, you can upload a list of AMP URLs in the free version to analyse and validate them as well.
The SEO Spider will find AMP URLs, report on common SEO issues and validate them by checking for required HTML mark-up, HTML elements prohibited by the specifications, and more.
There are two ways to analyse and validate AMP, each covered in its own section below: crawl a site to discover its AMP URLs, or upload a list of URLs and audit them in list mode.
Crawl A Site To Audit AMP
This section of the guide shows how to set up a crawl to discover AMP URLs, then audit and validate them.
1) Enable ‘Crawl’ and ‘Store’ AMP under ‘Config > Spider > Crawl’
2) Crawl The Website
Open up the SEO Spider, type or copy in the website you wish to crawl in the ‘enter url to spider’ box and hit ‘Start’.
The website will be crawled and AMP URLs will be discovered via any rel="amphtml" link tags within the HTML. Wait until the crawl finishes and reaches 100%.
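To make the discovery step concrete, here is a minimal Python sketch of how an AMP URL can be read from a page's rel="amphtml" link tag. It is an illustration only, not how the SEO Spider itself works internally; the class name, function name and example.com URLs are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AmpLinkFinder(HTMLParser):
    """Collects href values from <link rel="amphtml"> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.amp_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, so split before matching.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "amphtml" in rel_tokens and attr.get("href"):
            # Resolve relative hrefs against the URL of the page being parsed.
            self.amp_urls.append(urljoin(self.base_url, attr["href"]))


def find_amp_urls(html, base_url):
    finder = AmpLinkFinder(base_url)
    finder.feed(html)
    return finder.amp_urls


page = '<html><head><link rel="amphtml" href="/news/story/amp/"></head><body></body></html>'
print(find_amp_urls(page, "https://example.com/news/story/"))
```

A page without a rel="amphtml" link tag simply returns an empty list, which is why AMP URLs that are not referenced from their non-AMP equivalents will not be found by a crawl.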
3) View The AMP Tab
The AMP tab will show any AMP URLs discovered. It has 17 filters (as shown in the image below) that help you identify common SEO or validation issues.
Fifteen of the filters are available to view immediately, during or at the end of a crawl. However, two of the filters require calculation at the end of the crawl via post ‘Crawl Analysis’ before they are populated with data (more on this in just a moment).
The right hand ‘overview’ pane displays a ‘(Crawl Analysis Required)’ message against filters that require post crawl analysis to be populated with data.
4) Click ‘Crawl Analysis > Start’ To Populate AMP Filters
To populate these two AMP filters, you simply need to click a button to start crawl analysis.
However, if you have configured ‘Crawl Analysis’ previously, you may wish to double check under ‘Crawl Analysis > Configure’ that ‘AMP’ is ticked.
You can also untick other items that also require post crawl analysis to make this step quicker.
When crawl analysis has completed, the ‘analysis’ progress bar will be at 100% and the filters will no longer show the ‘(Crawl Analysis Required)’ message.
5) Click ‘AMP’ & View Populated Filters
After performing post crawl analysis, all AMP filters will be populated with data where applicable. In the example below, some of the AMP URLs are ‘non-200 responses’, which in this case are 404 errors.
You’re able to filter by the following SEO-related items –
- Non-200 Response – The AMP URLs do not respond with a 200 ‘OK’ status code. These will include URLs blocked by robots.txt, no responses, redirects, client and server errors.
- Missing Non-AMP Return Link – The canonical non-AMP version of the URL does not contain a rel="amphtml" link back to the AMP URL. This could simply be missing from the non-AMP version, or there might be a configuration issue with the AMP canonical.
- Missing Canonical to Non-AMP – The AMP URL’s canonical does not go to a non-AMP version, but to another AMP URL.
- Non-Indexable Canonical – The AMP canonical URL is a non-indexable page. Generally the non-AMP equivalent should be an indexable page.
- Indexable – The AMP URL is indexable. AMP URLs with a non-AMP equivalent should be non-indexable (as they should have a canonical to the non-AMP equivalent). Standalone AMP URLs (without an equivalent) should be indexable.
- Non-Indexable – The AMP URL is non-indexable. This is usually because they are correctly canonicalised to the non-AMP equivalent.
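The two relationship filters above can be easier to reason about as a small decision function. The Python sketch below is a simplified reading of those descriptions, not the SEO Spider's actual logic. The function name, its parameters and the example.com URLs are all hypothetical, and it assumes the canonical and rel="amphtml" values have already been extracted from each page.

```python
def relationship_issues(amp_url, amp_canonical, return_link, amp_urls):
    """Name the AMP/non-AMP relationship issues for one AMP URL (hypothetical helper).

    amp_canonical: canonical URL declared on the AMP page, or None
    return_link:   rel="amphtml" URL declared on the canonical page, or None
    amp_urls:      set of all URLs known to be AMP pages
    """
    issues = []
    if amp_canonical is None or amp_canonical == amp_url:
        # No canonical, or a standalone AMP page: there is no pair to compare.
        return issues
    if amp_canonical in amp_urls:
        # The canonical points at another AMP URL instead of a non-AMP page.
        issues.append("Missing Canonical to Non-AMP")
    elif return_link != amp_url:
        # The non-AMP page does not link back to this AMP URL.
        issues.append("Missing Non-AMP Return Link")
    return issues


amp_pages = {"https://example.com/a/amp/", "https://example.com/b/amp/"}
print(relationship_issues("https://example.com/a/amp/", "https://example.com/a/", None, amp_pages))
```

In this example the non-AMP page has no rel="amphtml" link, so the AMP URL would be reported under the return link filter.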
The following filters help identify common issues relating to AMP specifications. The SEO Spider uses the official AMP Validator for validation of AMP URLs.
- Missing HTML AMP Tag – AMP HTML documents must contain a top-level <html ⚡> or <html amp> tag.
- Missing/Invalid Doctype HTML Tag – AMP HTML documents must start with the doctype, <!doctype html>.
- Missing Head Tag – AMP HTML documents must contain head tags (they are optional in HTML).
- Missing Body Tag – AMP HTML documents must contain body tags (they are optional in HTML).
- Missing Canonical – AMP URLs must contain a canonical tag inside their head that points to the regular HTML version of the AMP HTML document, or to itself if no such HTML version exists.
- Missing/Invalid Meta Charset Tag – AMP HTML documents must contain a <meta charset="utf-8"> tag as the first child of their head tag.
- Missing/Invalid Meta Viewport Tag – AMP HTML documents must contain a <meta name="viewport" content="width=device-width,minimum-scale=1"> tag inside their head tag. It’s also recommended to include initial-scale=1.
- Missing/Invalid AMP Script – AMP HTML documents must contain a <script async src="https://cdn.ampproject.org/v0.js"></script> tag inside their head tag.
- Missing/Invalid AMP Boilerplate – AMP HTML documents must contain the AMP boilerplate code in their head tag.
- Contains Disallowed HTML – This flags any AMP URLs with disallowed HTML for AMP. If you want to know the exact disallowed HTML, right click on the URL and then select ‘Validation > AMP Validator’. This will open the URL in the official AMP Validator (https://validator.ampproject.org/), where you can view the specific issues.
- Other Validation Errors – This flags any AMP URLs with other validation errors not already covered by the above filters.
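To illustrate the kind of required mark-up checks listed above, here is a rough Python sketch that looks for several of these elements in a single HTML document. It is far less thorough than the official AMP Validator and is not what the SEO Spider runs. The class and function names are invented for this example, the AMP boilerplate and disallowed HTML are not checked at all, and the viewport check only looks for width=device-width.

```python
from html.parser import HTMLParser

AMP_RUNTIME = "https://cdn.ampproject.org/v0.js"
REQUIRED = ["doctype", "html amp", "head", "body", "canonical",
            "meta charset", "meta viewport", "amp script"]


class AmpMarkupCheck(HTMLParser):
    """Records which required pieces of AMP mark-up were seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = set()
        self.expect_first_head_child = False

    def handle_decl(self, decl):
        if decl.strip().lower() == "doctype html":
            self.found.add("doctype")

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if self.expect_first_head_child:
            # The charset meta tag only counts if it is the first child of <head>.
            self.expect_first_head_child = False
            if tag == "meta" and attr.get("charset", "").lower() == "utf-8":
                self.found.add("meta charset")
        if tag == "html" and any(n == "amp" or n.startswith("⚡") for n in attr):
            self.found.add("html amp")
        elif tag == "head":
            self.found.add("head")
            self.expect_first_head_child = True
        elif tag == "body":
            self.found.add("body")
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            if attr.get("href"):
                self.found.add("canonical")
        elif tag == "meta" and attr.get("name", "").lower() == "viewport":
            parts = [p.strip() for p in attr.get("content", "").split(",")]
            if "width=device-width" in parts:
                self.found.add("meta viewport")
        elif tag == "script" and "async" in attr and attr.get("src") == AMP_RUNTIME:
            self.found.add("amp script")


def missing_markup(html):
    checker = AmpMarkupCheck()
    checker.feed(html)
    return [name for name in REQUIRED if name not in checker.found]


sample = """<!doctype html>
<html amp>
<head>
<meta charset="utf-8">
<link rel="canonical" href="https://example.com/story/">
<script async src="https://cdn.ampproject.org/v0.js"></script>
</head>
<body>Hello</body>
</html>"""
print(missing_markup(sample))
```

The sample document deliberately omits the viewport meta tag, so that is the only item the sketch reports as missing. For real audits, rely on the filters above and the official validator rather than a hand-rolled check like this.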
6) View The AMP URL Source By Clicking ‘Inlinks’
If an AMP URL errors, you’ll want to know the source of those errors. To do this, simply click on a URL in the top window pane and then click on the ‘Inlinks’ tab at the bottom to populate the lower window pane.
Inlinks of the ‘amphtml’ type are references to a URL from rel="amphtml" link tags within the head of the HTML.
Here’s a close up view of the ‘inlinks’ lower window tab –
This shows that the desktop URL (https://www.telegraph.co.uk/business/essential-insights/cyber-resilience/) has a rel="amphtml" link tag to the AMP version (https://www.telegraph.co.uk/business/essential-insights/cyber-resilience/amp/), which is a 404 error.
7) Use The ‘Bulk Export > AMP > X Inlinks’ Exports
To bulk export AMP inlink data, use the ‘bulk export > AMP’ top level menu.
In the screenshot above, this would export all AMP URLs that don’t respond with a ‘200’ response code, and the respective inlinks (the source pages that link to the 404s).
Upload & Audit AMP URLs Separately
Alternatively, you can crawl and audit just the AMP URLs separately, by uploading them directly in list mode.
However, if both exist we generally recommend auditing both the desktop and AMP equivalents together, which is possible by uploading the desktop versions and tweaking the configuration. This process is outlined below.
1) Click ‘Mode > List’
Select this via the top level menu. It enables you to upload a list of desktop URLs.
2) Disable The Crawl Depth Limit under ‘Config > Spider > Limits’
By default the crawl depth is set to ‘0’ in list mode, so only the URLs you upload are crawled. However, this should be removed, as the AMP versions (at crawl depth ‘1’) also need to be crawled.
3) Enable ‘Crawl’ and ‘Store’ AMP under ‘Config > Spider > Crawl’ & Disable All Other Resource and Page Links
In list mode with the crawl depth removed, the SEO Spider will crawl all URLs uploaded and any URLs they link to, just like in regular ‘Spider’ mode. Therefore, to crawl only the AMP equivalents and not other internal links, all resource and page links should be disabled, apart from AMP links.
With ‘Internal hyperlinks’ and other link types disabled, only the desktop URLs uploaded and their AMP links will be crawled.
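The effect of the crawl depth and link type settings described in steps 2 and 3 can be sketched as a small breadth-first crawl over an in-memory link graph. This Python example is a toy model for illustration, not the SEO Spider's crawler. The function name, the link type labels and the example.com URLs are assumptions made for this sketch.

```python
from collections import deque


def crawl(seeds, links, max_depth=None, follow_types=("hyperlink", "amphtml")):
    """Breadth-first crawl returning {url: crawl depth} (hypothetical helper).

    links maps a URL to a list of (link_type, target_url) tuples.
    max_depth=None means no depth limit; 0 crawls only the uploaded URLs.
    """
    depth_of = {url: 0 for url in seeds}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if max_depth is not None and depth_of[url] >= max_depth:
            continue  # at the depth limit, so do not follow this page's links
        for link_type, target in links.get(url, []):
            if link_type in follow_types and target not in depth_of:
                depth_of[target] = depth_of[url] + 1
                queue.append(target)
    return depth_of


links = {
    "https://example.com/a/": [
        ("amphtml", "https://example.com/a/amp/"),
        ("hyperlink", "https://example.com/b/"),
    ],
    "https://example.com/a/amp/": [("hyperlink", "https://example.com/c/")],
}
uploaded = ["https://example.com/a/"]

# Depth limit of 0: only the uploaded URL is crawled.
print(crawl(uploaded, links, max_depth=0))
# No depth limit, AMP links only: the uploaded URL plus its AMP equivalent.
print(crawl(uploaded, links, max_depth=None, follow_types=("amphtml",)))
```

With the depth limit at 0 the AMP version at depth 1 is never reached, and with the limit removed but hyperlinks still followed the crawl would spread to the other pages as well, which is why both settings are changed together.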
4) Copy Desktop URLs, Then Click ‘Upload > Paste’
This uploads them into the SEO Spider so they can be crawled.
Click ‘OK’ twice, and crawl the desktop and AMP URLs until the crawl finishes.
5) Follow The Process Outlined from Point 3 In the Guide Above
Now you can follow the same process outlined from point 3 in the ‘Crawl A Site To Audit AMP’ section above. This includes running a crawl analysis at the end of a crawl to populate filters within the AMP tab.
While a list mode crawl is obviously not as comprehensive as a full website crawl, by uploading the desktop URLs and crawling their AMP equivalents, the SEO Spider will analyse the source relationships. Thus, this is a great way to quickly spot check AMP.
The guide above should help illustrate the simple steps required to bulk audit and validate Accelerated Mobile Pages (AMP) across a website.