Posted 23 November, 2020 in Screaming Frog SEO Spider

Screaming Frog SEO Spider Update – Version 14.0

We are pleased to launch Screaming Frog SEO Spider version 14.0, codenamed internally as ‘megalomaniac’.

Since the release of version 13 in July, we’ve been busy working on the next round of features for version 14, based upon user feedback and as always, a little internal steer.

Let’s talk about what’s new in this release.


1) Dark Mode

While arguably not the most significant feature in this release, it is used throughout the screenshots – so it makes sense to talk about first. You can now switch to dark mode, via ‘Config > User Interface > Theme > Dark’.

Dark Mode

Not only will this help reduce eye strain for those that work in low light (everyone living in the UK right now), it also looks super cool – and is speculated (by me now) to increase your technical SEO skills significantly.

The non-eye-strained among you may notice we’ve also tweaked some other styling elements and graphs, such as those in the right-hand overview and site structure tabs.


2) Google Sheets Export

You’re now able to export directly to Google Sheets.

Google Sheets Exports

You can add multiple Google accounts and quickly connect to any of them to save your crawl data, which will appear in Google Drive within a ‘Screaming Frog SEO Spider’ folder and be accessible via Sheets.

Exports in Google Sheets

Many of you will already be aware that Google Sheets isn’t really built for scale and has a 5m cell limit. This sounds like a lot, but when you have 55 columns by default in the Internal tab (which can easily triple depending on your config), it means you can only export around 90k rows (55 x 90,000 = 4,950,000 cells).

If you need to export more, use a different export format that’s built for the size (or reduce your number of columns). We had started work on writing to multiple sheets, but really, Sheets shouldn’t be used in that way.
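The row limit above is simple integer division, so it can be worked out for any column count. This is a minimal sketch of that arithmetic, not anything built into the SEO Spider. The function name and the constant are illustrative, and the result counts every row in the sheet, including the header row.

```python
SHEETS_CELL_LIMIT = 5_000_000  # the 5m cell limit mentioned above


def max_exportable_rows(columns: int, cell_limit: int = SHEETS_CELL_LIMIT) -> int:
    """Return the largest whole number of rows that fits within the cell limit."""
    if columns <= 0:
        raise ValueError("columns must be a positive integer")
    return cell_limit // columns


print(max_exportable_rows(55))   # 90909 rows for the default 55 columns
print(max_exportable_rows(165))  # 30303 rows if the column count triples
```

So the 'around 90k rows' figure holds for the default Internal tab, and drops to roughly 30k rows if your configuration triples the number of columns.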

This has also been integrated into scheduling and the command line. This means you can schedule a crawl, which automatically exports any tabs, filters, exports or reports to a Sheet within Google Drive.

You’re able to choose to create a timestamped folder in Google Drive, or overwrite an existing file.

Google Sheets Exporting in Scheduling

This should be helpful when sharing data in teams, with clients, or for Google Data Studio reporting.


3) HTTP Headers

You can now store, view and query full HTTP headers. This can be useful when analysing various scenarios which are not covered by the default headers extracted, such as details of caching status, set-cookie, content-language, feature policies, security headers etc.

You can choose to extract them via ‘Config > Spider > Extraction’ and selecting ‘HTTP Headers’. The request and response headers will then be shown in full in the lower window ‘HTTP Headers’ tab.

HTTP Headers

The HTTP response headers also get appended as columns in the Internal tab, so they can be viewed, queried and exported alongside all the usual crawl data.

Headers can also be exported in bulk via ‘Bulk Export > Web > All HTTP Headers’.
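Once you have the response headers for a URL, a typical follow-up is checking which security headers are absent. The sketch below is one hedged way to do that outside the SEO Spider. The checklist of headers and the `missing_headers` function are hypothetical, and the input is assumed to be a simple mapping of header name to value rather than any particular export format.

```python
from typing import Dict, Iterable, List

# Hypothetical checklist for illustration; adjust to the headers you care about.
EXPECTED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "Referrer-Policy",
)


def missing_headers(
    response_headers: Dict[str, str],
    expected: Iterable[str] = EXPECTED_HEADERS,
) -> List[str]:
    """Return the expected header names that are absent from the response.

    Header names are compared case-insensitively, as HTTP header names
    are not case sensitive.
    """
    present = {name.lower() for name in response_headers}
    return [name for name in expected if name.lower() not in present]


example = {
    "content-type": "text/html; charset=utf-8",
    "strict-transport-security": "max-age=31536000",
    "X-Content-Type-Options": "nosniff",
}
print(missing_headers(example))
```

For the example response this reports that Content-Security-Policy and Referrer-Policy are missing.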


4) Cookies

You can now also store cookies from across a crawl. You can choose to extract them via ‘Config > Spider > Extraction’ and selecting ‘Cookies’. These will then be shown in full in the lower window Cookies tab.

Cookies

You’ll need to use JavaScript rendering mode to get an accurate view of cookies that are loaded on the page using JavaScript or pixel image tags.

The SEO Spider will collect cookie name, value, domain (first or third party), expiry as well as attributes such as secure and HttpOnly.
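To illustrate what those fields look like, here is a rough sketch that pulls the same details out of a raw Set-Cookie value using Python's standard library. It is not how the SEO Spider works internally. The `describe_cookies` function is hypothetical, and the first or third party check is a simplifying assumption that compares the cookie domain with the page host.

```python
from http.cookies import SimpleCookie
from typing import Dict, List


def describe_cookies(set_cookie_header: str, page_host: str) -> List[Dict[str, object]]:
    """Summarise each cookie in a Set-Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    rows = []
    for name, morsel in cookie.items():
        # A cookie without a Domain attribute belongs to the host that set it.
        domain = morsel["domain"].lstrip(".") or page_host
        # Assumption: first party means the page host equals, or is a
        # subdomain of, the cookie domain.
        first_party = page_host == domain or page_host.endswith("." + domain)
        rows.append({
            "name": name,
            "value": morsel.value,
            "domain": domain,
            "party": "first" if first_party else "third",
            "expires": morsel["expires"],
            "secure": bool(morsel["secure"]),
            "httponly": bool(morsel["httponly"]),
        })
    return rows


header = "sid=abc123; Domain=example.com; Path=/; Secure; HttpOnly"
for row in describe_cookies(header, "www.example.com"):
    print(row)
```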

This data can then be analysed in aggregate to help with cookie audits, such as those for GDPR via ‘Reports > Cookies > Cookie Summary’.

Cookie Summary report

You can also highlight multiple URLs at a time to analyse in bulk, or export via the ‘Bulk Export > Web > All Cookies’.

Please note – When you choose to store cookies, the auto exclusion performed by the SEO Spider for Google Analytics tracking tags is disabled to provide an accurate view of all cookies issued.

This means it will affect your analytics reporting, unless you choose to exclude any tracking scripts from firing by using the Exclude configuration (‘Config > Exclude’), or filter out the ‘Screaming Frog SEO Spider’ user-agent, similar to excluding PSI in this FAQ.


5) Aggregated Site Structure

The SEO Spider now displays the number of URLs discovered in each directory when in directory tree view (which you can access via the tree icon next to ‘Export’ in the top tabs).

This helps you better understand the size and architecture of a website, and some users find it more logical to use than the traditional list view.
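If you want to reproduce a similar count from a list of URLs, the idea can be sketched as below. This is an illustration only and the SEO Spider may count differently. The `directory_counts` function is hypothetical, and it assumes each URL is counted once for its own directory and once for every parent directory above it.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def directory_counts(urls: Iterable[str]) -> Counter:
    """Count the URLs that sit beneath each directory, including ancestors."""
    counts: Counter = Counter()
    for url in urls:
        path = urlsplit(url).path or "/"
        # Everything up to and including the last slash is the directory.
        directory = path[: path.rfind("/") + 1]
        parts = [part for part in directory.split("/") if part]
        # Credit the root and every directory level down to the URL's own.
        for depth in range(len(parts) + 1):
            prefix = "/" + "/".join(parts[:depth])
            counts[prefix if prefix.endswith("/") else prefix + "/"] += 1
    return counts


urls = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog/",
    "https://example.com/blog/seo-tips",
    "https://example.com/blog/2020/update",
]
for directory, total in sorted(directory_counts(urls).items()):
    print(directory, total)
```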

Directory Tree Children Count

Alongside this update, we’ve improved the right-hand ‘Site Structure’ tab to show an aggregated directory tree view of the website. This helps quickly visualise the structure of a website, and identify where issues are at a glance, such as indexability of different paths.

Site Structure Tab

If you’ve found areas of a site with non-indexable URLs, you can switch the ‘view’ to analyse the ‘indexability status’ of those different path segments to see the reasons why they are considered as non-indexable.

Indexability Status

You can also toggle the view to crawl depth across directories to help identify any internal linking issues to areas of the site, and more.

Aggregated Crawl Depth

This wider aggregated view of a website should help you visualise the architecture, and make better decisions for different sections and segments.


6) New Configuration Options

We’ve introduced two new significant configuration options – ‘Ignore Non-Indexable URLs for On-Page Filters’ and ‘Ignore Paginated URLs for Duplicate Filters’.

These are both enabled by default via ‘Config > Spider > Advanced’, and will mean non-indexable pages won’t be flagged in appropriate on-page filters for page titles, meta descriptions, or headings.

Advanced configuration

This means URLs won’t be considered as ‘Duplicate’, or ‘Over X Characters’ or ‘Below X Characters’ if for example they are noindex, and hence non-indexable. Paginated pages won’t be flagged for duplicates either.

If you’re crawling a staging website which has noindex across all pages, remember to disable these options.

These options are a little different to the ‘respect’ configuration options, which remove non-indexable URLs from appearing at all. Non-indexable URLs will still appear in the interface; they just won’t be flagged for relevant issues.
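The effect of the first option on a filter such as duplicate page titles can be sketched like this. It is a simplified illustration with a hypothetical `Page` record and `duplicate_titles` function, not the SEO Spider's own logic: pages stay in the data set either way, they are just skipped when grouping titles.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Page:
    url: str
    title: str
    indexable: bool


def duplicate_titles(
    pages: Iterable[Page], ignore_non_indexable: bool = True
) -> Dict[str, List[str]]:
    """Group URLs by page title and return only titles used more than once."""
    by_title: Dict[str, List[str]] = defaultdict(list)
    for page in pages:
        if ignore_non_indexable and not page.indexable:
            continue  # still crawled and visible, just not flagged
        by_title[page.title].append(page.url)
    return {title: found for title, found in by_title.items() if len(found) > 1}


pages = [
    Page("/a", "Home", indexable=True),
    Page("/b", "Home", indexable=False),  # e.g. a noindex page
    Page("/c", "Shop", indexable=True),
]
print(duplicate_titles(pages))
print(duplicate_titles(pages, ignore_non_indexable=False))
```

With the option on, nothing is flagged. With it off, ‘Home’ is reported as a duplicate across /a and /b, which is why it is worth disabling on a staging site that is entirely noindex.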


Other Updates

Version 14.0 also includes a number of smaller updates and bug fixes, outlined below.

  • There’s now a new filter for ‘Missing Alt Attribute’ under the ‘Images’ tab. Previously, missing and empty alt attributes would appear under the singular ‘Missing Alt Text’ filter. However, it can be useful to separate these, as decorative images should have empty alt text (alt=""), rather than leaving out the alt attribute, which can cause issues in screen readers. Please see our How To Find Missing Image Alt Text & Attributes tutorial.
  • Headless Chrome used in JavaScript rendering has been updated to keep up with evergreen Googlebot.
  • ‘Accept Cookies’ has been adjusted to ‘Cookie Storage’, with three options – Session Only, Persistent and Do Not Store. The default is ‘Session Only’, which mimics Googlebot’s stateless behaviour.
  • The ‘URL’ tab has new filters available around common issues including Multiple Slashes (//), Repetitive Path, Contains Space and URLs that might be part of an Internal Search.
  • The ‘Security’ tab now has a filter for ‘Missing Secure Referrer-Policy Header’.
  • There’s now a ‘HTTP Version’ column in the Internal and Security tabs, which shows which version the crawl was completed under. This is in preparation for supporting HTTP/2 crawling in line with Googlebot.
  • You’re now able to right click and ‘close’ or drag and move the order of lower window tabs, in a similar way to the top tabs.
  • Non-Indexable URLs are now not included in the ‘URLs not in Sitemap’ filter, as we presume they are correctly non-indexable and therefore shouldn’t be flagged. Please see our tutorial on ‘How To Audit XML Sitemaps’ for more.
  • Google rich result feature validation has been updated in line with the ever-changing documentation.
  • The ‘Google Rich Result Feature Summary’ report, available via ‘Reports’ in the top-level menu, has been updated to include a ‘% eligible’ for rich results, based upon errors discovered. This report also includes the total and unique number of errors and warnings discovered for each Rich Result Feature as an overview.
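
For the ‘Missing Alt Attribute’ filter in the list above, the difference between a missing attribute and empty alt text can be shown with a short sketch. This uses Python's built-in HTML parser and a hypothetical `AltAudit` class. It only illustrates the distinction and is not the SEO Spider's implementation.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Separate images with no alt attribute from those with empty alt text."""

    def __init__(self) -> None:
        super().__init__()
        self.missing_attribute = []  # <img src="..."> with no alt at all
        self.empty_text = []         # <img src="..." alt=""> (fine if decorative)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        if "alt" not in attributes:
            self.missing_attribute.append(src)
        elif not (attributes["alt"] or "").strip():
            self.empty_text.append(src)


html = (
    '<img src="logo.png">'
    '<img src="spacer.gif" alt="">'
    '<img src="team.jpg" alt="Our team">'
)
audit = AltAudit()
audit.feed(html)
print("Missing alt attribute:", audit.missing_attribute)
print("Empty alt text:", audit.empty_text)
```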

That’s everything for now, and we’ve already started work on features for version 15. If you experience any issues, please let us know via support and we’ll help.

Thank you to everyone for all their feature requests, feedback, and continued support.

Now, go and download version 14.0 of the Screaming Frog SEO Spider and let us know what you think!


Small Update – Version 14.1 Released 7th December 2020

We have just released a small update to version 14.1 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • Fix ‘Application Not Responding’ issue which affected a small number of users on Windows.
  • Maintain Google Sheets ID when overwriting.
  • Improved messaging in Force-Directed Crawl Diagram scaling configuration, when scaling on items that are not enabled (GA etc).
  • Removed .xml URLs from appearing in the ‘Non-Indexable URLs in Sitemap’ filter.
  • Increase the size of Custom Extraction text pop-out.
  • Allow file name based on browse selection in location chooser.
  • Add AMP HTML column to internal tab.
  • Fix crash in JavaScript crawling.
  • Fix crash when selecting ‘View in Internal Tab Tree View’ in the Site Structure tab.
  • Fix crash in image preview details window.

Small Update – Version 14.2 Released 16th February 2021

We have just released a small update to version 14.2 of the SEO Spider. This release includes a couple of cool new features, alongside lots of small bug fixes.

Core Web Vitals Assessment

We’ve introduced a ‘Core Web Vitals Assessment’ column in the PageSpeed tab with a ‘Pass’ or ‘Fail’ using field data collected via the PageSpeed Insights API for Largest Contentful Paint, First Input Delay and Cumulative Layout Shift.

Core Web Vitals Assessment

For a page to ‘pass’ the Core Web Vitals Assessment it must be considered ‘Good’ in all three metrics, based upon Google’s various thresholds for each. If there’s no data for the URL, then this will be left blank.

This should help identify problematic sections and URLs more efficiently. Please see our tutorial on How To Audit Core Web Vitals.
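
As a rough sketch, the pass or fail logic described above can be written as follows. The thresholds are Google's published upper bounds for a ‘Good’ rating (LCP of 2.5 seconds, FID of 100 milliseconds and CLS of 0.1). The function name is hypothetical, and treating a URL with any missing metric as blank is an assumption made for this example.

```python
from typing import Optional

# Upper bounds for a 'Good' rating in each metric.
GOOD_LCP_MS = 2500  # Largest Contentful Paint, milliseconds
GOOD_FID_MS = 100   # First Input Delay, milliseconds
GOOD_CLS = 0.1      # Cumulative Layout Shift, unitless


def core_web_vitals_assessment(
    lcp_ms: Optional[float],
    fid_ms: Optional[float],
    cls: Optional[float],
) -> str:
    """Return 'Pass', 'Fail', or '' when field data is not available."""
    # Assumption: if any metric has no field data, the result is left blank.
    if lcp_ms is None or fid_ms is None or cls is None:
        return ""
    good = lcp_ms <= GOOD_LCP_MS and fid_ms <= GOOD_FID_MS and cls <= GOOD_CLS
    return "Pass" if good else "Fail"


print(core_web_vitals_assessment(2100, 80, 0.05))   # Pass
print(core_web_vitals_assessment(3200, 80, 0.05))   # Fail, LCP is too slow
print(repr(core_web_vitals_assessment(None, None, None)))  # '' (no data)
```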

Broken Bookmarks (or ‘Jump Links’)

Bookmarks are a useful way to link users to a specific part of a webpage using named anchors on a link, also referred to as ‘jump links’ or ‘anchor links’. However, they frequently become broken over time – even for Googlers.

https://twitter.com/JohnMu/status/1351146583828672514

To help with this problem, there’s now a check in the SEO Spider which crawls URLs with fragment identifiers and verifies that an accurate ID exists within the HTML of the page for the bookmark.

You can enable ‘Crawl Fragment Identifiers’ under ‘Config > Spider > Advanced’, and then view any broken bookmarks under the URL tab and new ‘Broken Bookmark’ filter.
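
The underlying check is straightforward to sketch: take the fragment from the URL and look for a matching ID in the page's HTML. The example below uses Python's standard library with a hypothetical `is_broken_bookmark` function. Also accepting legacy named anchors (`<a name="...">`) as valid targets is an assumption of this sketch, not something confirmed about the SEO Spider.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag


class AnchorTargets(HTMLParser):
    """Collect every value a fragment identifier could point at."""

    def __init__(self) -> None:
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Any element's id counts; <a name="..."> is a legacy target
            # included here as an assumption.
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.targets.add(value)


def is_broken_bookmark(url: str, html: str) -> bool:
    """Return True if the URL has a fragment with no matching target in the HTML."""
    fragment = unquote(urldefrag(url).fragment)
    if not fragment:
        return False  # no bookmark to verify
    parser = AnchorTargets()
    parser.feed(html)
    return fragment not in parser.targets


page = '<h2 id="pricing">Pricing</h2><a name="faq"></a>'
print(is_broken_bookmark("https://example.com/page#pricing", page))  # False
print(is_broken_bookmark("https://example.com/page#contact", page))  # True
```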

Broken Bookmarks

You can view the source pages these are on by using the ‘inlinks’ tab, and export in bulk via a right click ‘Export > Inlinks’. Please see our tutorial on How To Find Broken Bookmark & Jump Links.

14.2 also includes the following smaller updates and bug fixes.

  • Improve labeling in all HTTP headers report.
  • Update some column names to make them more consistent. For those with scripts that rely on column naming, the changes include using capital case for ‘Length’ in h1 and h2 columns, and pluralising the ‘Meta Keywords’ columns to match the tab.
  • Update link score graph calculations to exclude self-referencing links via canonicals and redirects.
  • Make srcset attributes parsing more robust.
  • Update misleading message in visualisations around respecting canonicals.
  • Treat HTTP response headers as case insensitive.
  • Relax Job Posting value property type checking.
  • Fix issue where right click ‘Export > Inlinks’ sometimes doesn’t export all the links.
  • Fix freeze on M1 Mac during crawl.
  • Fix issue with Burmese text not displayed correctly.
  • Fix issue where Hebrew text can’t be input into text fields.
  • Fix issue with ‘Visualisations > Inlink Anchor Text Word Cloud’ opening two windows.
  • Fix issue with Forms Based Auth unlock icon not displaying.
  • Fix issue with Forms Based Auth failing for sites with invalid certificates.
  • Fix issue with Overview Report showing incorrect site URL in some situations.
  • Fix issue with Chromium asking for webcam access.
  • Fix issue on macOS where launching via a .seospider/.dbseospider file doesn’t always load the file.
  • Fix issue with Image Preview incorrectly showing 404.
  • Fix issue with PSI CrUX data being duplicated in Origin.
  • Fix various crashes in JavaScript crawling.
  • Fix crash parsing some formats of HTML.
  • Fix crash when re-spidering.
  • Fix crash performing JavaScript crawl with empty user agent.
  • Fix crash selecting URL in master view when all tabs in details view are disabled/hidden.
  • Fix crash in JavaScript crawling when web server sends invalid UTF-8 characters.
  • Fix crash in Overview tab.

Small Update – Version 14.3 Released 17th March 2021

We have just released a small update to version 14.3 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • Add Anchor Text, Alt Text & Link Path to ‘Redirect’ reports.
  • Show the display URL for duplicate content reports rather than the URL encoded URL.
  • Update right click ‘History >’ checks to be HTTPS.
  • Fix issue with Image Details tab failing to show images if page links to itself as an image.
  • Fix issue with some text labels being truncated.
  • Fix issue where API settings can’t be viewed whilst crawling.
  • Fix issue with GA E-commerce metrics not showing when reloading a DB crawl.
  • Fix issue with ‘Crawls’ UI not sorting on modified date.
  • Fix PageSpeed CrUX discrepancies between master and details view.
  • Fix crash showing authentication Browser.
  • Fix crash in visualisations.
  • Fix crash removing URLs.
  • Fix crash after editing SERP panel.
  • Fix odd colouring on fonts on macOS.
  • Fix crash during JavaScript crawling.
  • Fix crash viewing PageSpeed details tab.
  • Fix crash using Wacom tablet on Windows.