Posted 22 September, 2021 in Screaming Frog SEO Spider

Screaming Frog SEO Spider Update – Version 16.0

We’re excited to announce Screaming Frog SEO Spider version 16.0, codenamed internally as ‘marshmallow’.

Since the launch of crawl comparison in version 15, we’ve been busy working on the next round of prioritised features and enhancements.

Here’s what’s new in our latest update.

1) Improved JavaScript Crawling

Five years ago we launched JavaScript rendering and became the first crawler in the industry to render web pages, using Chromium (before headless Chrome existed) to crawl content and links populated client-side using JavaScript.

As Google, technology and our understanding as an industry have evolved, we’ve updated our integration with headless Chrome to improve efficiency, mimic Google’s crawl behaviour more closely, and alert users to more common JavaScript-related issues.

JavaScript Tab & Filters

The old ‘AJAX’ tab has been renamed to ‘JavaScript’, and it now contains a comprehensive list of filters around common issues related to auditing websites using client-side JavaScript.


This will only populate in JavaScript rendering mode, which can be enabled via ‘Config > Spider > Rendering’.

Crawl Original & Rendered HTML

One of the fundamental changes in this update is that the SEO Spider will now crawl both the original and rendered HTML to identify pages that have content or links only available client-side and report other key differences.

Crawl raw and rendered HTML

This is more in line with how Google crawls and can help identify JavaScript dependencies, as well as other issues that can occur with this two-phase approach.

Identify JavaScript Content & Links

You’re able to clearly see which pages have JavaScript content only available in the rendered HTML post JavaScript execution.

For example, our homepage apparently has 4 additional words in the rendered HTML, which was new to us.

Screaming Frog word count diff

By storing the HTML and using the lower window ‘View Source’ tab, you can also switch the filter to ‘Visible Text’ and tick ‘Show Differences’, to highlight which text is being populated by JavaScript in the rendered HTML.

Visible Content Diff

Aha! There are the 4 words. Thanks, Highcharts.
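To make the idea concrete, here is a minimal sketch of how a visible-text difference between original and rendered HTML could be computed. It is not how the SEO Spider does it internally. The class and function names (`VisibleTextParser`, `words_added_by_rendering`) and the sample markup are our own assumptions for illustration, using only Python’s standard library.

```python
from collections import Counter
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects words from text nodes, skipping non-visible containers."""

    # Assumption: treating these elements as "not visible" is a simplification.
    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.split())


def visible_words(html):
    """Return the list of visible words in an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.words


def words_added_by_rendering(original_html, rendered_html):
    """Return words present in the rendered HTML but not in the original."""
    extra = Counter(visible_words(rendered_html)) - Counter(visible_words(original_html))
    return sorted(extra.elements())


# Hypothetical sample pages: the rendered version gains a text node from a script.
original = "<body><h1>Hello world</h1><script>var a = 1;</script></body>"
rendered = "<body><h1>Hello world</h1><div>Created with Highcharts</div></body>"
print(words_added_by_rendering(original, rendered))
```

A multiset difference (rather than a plain set difference) is used so that repeated words still count towards the word count difference.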

Pages that have JavaScript links are reported and the counts are shown in columns within the tab.

Identify JavaScript Links

There’s a new ‘link origin’ column and filter in the lower window ‘Outlinks’ (and inlinks) tab to help you find exactly which links are only in the rendered HTML of a page due to JavaScript. For example, products loaded on a category page using JavaScript will only be in the ‘rendered HTML’.

View JavaScript links

You can bulk export all links that rely on JavaScript via ‘Bulk Export > JavaScript > Contains JavaScript Links’.
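As a rough illustration of the ‘link origin’ idea, the sketch below extracts anchor links from both versions of a page and labels where each one was found. Only the ‘rendered HTML’ origin is taken from the description above. The other label strings, the function names and the example URLs are assumptions for the sake of the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects absolute URLs from <a href> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def link_origins(original_html, rendered_html, base_url):
    """Map each link to where it was found (label strings are hypothetical)."""
    original = extract_links(original_html, base_url)
    rendered = extract_links(rendered_html, base_url)
    origins = {}
    for link in original | rendered:
        if link in original and link in rendered:
            origins[link] = "HTML & Rendered HTML"
        elif link in rendered:
            origins[link] = "Rendered HTML"
        else:
            origins[link] = "HTML"
    return origins


# Hypothetical category page where a product link is injected by JavaScript.
page = "https://example.com/category"
raw = '<a href="/about">About</a>'
dom = '<a href="/about">About</a><a href="/product-1">Product 1</a>'
for url, origin in sorted(link_origins(raw, dom, page).items()):
    print(url, "->", origin)
```

Links labelled as only appearing in the rendered HTML are the ones that depend on JavaScript.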

Compare HTML Vs Rendered HTML

The updated tab will tell you if page titles, descriptions, headings, meta robots or canonicals depend upon or have been updated by JavaScript. Both the original and rendered HTML versions can be viewed simultaneously.

JavaScript updating titles and descriptions

This can be useful when determining whether all elements are only in the rendered HTML, or if JavaScript is used on selective elements.

The two-phase approach of crawling the raw and rendered HTML can help pick up on easy to miss problematic scenarios, such as the original HTML having a noindex meta tag, but the rendered HTML not having one.

Previously, by just crawling the rendered HTML, the page would be deemed indexable, when in reality Google will see the noindex in the original HTML first and subsequently skip rendering, meaning the removal of the noindex won’t be seen and the page won’t be indexed.
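The noindex scenario can be sketched as a simple two-phase check. This is an illustrative simplification under our own assumptions, not the SEO Spider’s actual logic: it only looks at the `robots` meta tag (ignoring X-Robots-Tag headers, bot-specific meta names and the `none` directive), and the function names and return strings are hypothetical.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def meta_robots(html):
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def indexability(original_html, rendered_html):
    """Check the original HTML first, mirroring the two-phase approach."""
    if "noindex" in meta_robots(original_html):
        # Rendering would be skipped, so any later removal is never seen.
        return "Non-Indexable (noindex in original HTML)"
    if "noindex" in meta_robots(rendered_html):
        return "Non-Indexable (noindex in rendered HTML)"
    return "Indexable"


# Hypothetical page where JavaScript replaces noindex with index.
raw = '<head><meta name="robots" content="noindex, follow"></head>'
dom = '<head><meta name="robots" content="index, follow"></head>'
print(indexability(raw, dom))
```

Checking only the rendered HTML here would wrongly report the page as indexable, which is exactly the case the two-phase crawl is designed to catch.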

Shadow DOM & iFrames

Another enhancement we’ve wanted to make is to improve our rendering to better match Google’s own behaviour. Giacomo Zecchini’s recent ‘Challenges of building a search engine like web rendering service’ talk at SMX Advanced provides an excellent summary of some of the challenges and edge cases.

Google is able to flatten and index Shadow DOM content, and will inline iframes into a div in the rendered HTML of a parent page, under specific conditions (some of which I shared in a tweet).

After research and testing, both of these are now supported in the SEO Spider, as we try to mimic Google’s web rendering service as closely as possible.

Flatten Shadow DOM & iframes

They are enabled by default, but can be disabled when required via ‘Config > Spider > Rendering’. There are further improvements we’d like to make in this area, and if you spot any interesting edge cases then drop us an email.

2) Automated Crawl Reports For Data Studio

Data Studio is commonly the tool of choice for SEO reporting today, whether that’s for your own reports, clients or the boss. To help automate this process to include crawl report data, we’ve introduced a new Data Studio friendly custom crawl overview export available in scheduling.

Data Studio Crawl Export

This has been purpose-built to allow users to select crawl overview data to be exported as a single summary row to Google Sheets. It will automatically append new scheduled exports to a new row in the same sheet in a time series.

Custom Crawl Summary Report In Google Sheets
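The ‘one summary row per scheduled crawl’ pattern is easy to picture with a small sketch. The example below uses a local CSV file as a stand-in for Google Sheets, so it makes no claims about the Sheets API or the SEO Spider’s export code. The function name `append_crawl_summary`, the ‘Date’ column and the metric values are assumptions, and it assumes the same metrics are selected for every export.

```python
import csv
from datetime import date
from pathlib import Path


def append_crawl_summary(path, summary, crawl_date=None):
    """Append one crawl's overview metrics as a single row, forming a time series."""
    path = Path(path)
    row = {"Date": (crawl_date or date.today()).isoformat(), **summary}
    # Write the header only when the file is first created.
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical metrics from two scheduled crawls a week apart.
    append_crawl_summary(
        "crawl_summary.csv",
        {"Total Internal Indexable URLs": 120, "Total Internal Non-Indexable URLs": 8},
        date(2021, 9, 22),
    )
    append_crawl_summary(
        "crawl_summary.csv",
        {"Total Internal Indexable URLs": 125, "Total Internal Non-Indexable URLs": 6},
        date(2021, 9, 29),
    )
```

Because every crawl adds exactly one dated row, a reporting tool can chart each column over time without any further reshaping.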

The new crawl overview summary in Google Sheets can then be connected to Data Studio to be used for a fully automated Google Data Studio crawl report. You’re able to copy our very own Screaming Frog Data Studio crawl report template, or create your own better versions!

Screaming Frog Data Studio Crawl Report

This allows you or a team to monitor site health and be alerted to issues without having to even open the app. It also allows you to share progress with non-technical stakeholders visually.

Please read our tutorial on ‘How To Automate Crawl Reports In Data Studio’ to set this up.

We’re excited to see alternative Screaming Frog Data Studio report templates, so if you’re a Data Studio whizz and have one you’d like to share with the community, let us know and we will include it in our tutorial.

3) Advanced Search & Filtering

The inbuilt search function has been improved. It defaults to regular text search, but allows you to switch to regex, choose from a variety of predefined filters (including a ‘does not match regex’) and combine rules (and/or).

Advanced search and filtering in the GUI

The search bar displays the syntax used by the search and filter system, so power users can write common searches and filters quickly, without having to click the buttons to run searches.

Advanced search box

The syntax can just be pasted or written directly into the search box to run searches.
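For anyone curious how combining rules with and/or works in principle, here is a small sketch. It does not reproduce the SEO Spider’s search syntax (which isn’t shown here). The helper names (`contains`, `matches_regex`, `does_not_match_regex`, `all_of`, `any_of`) and the example URLs are purely hypothetical.

```python
import re


def contains(text):
    """Plain, case-insensitive text search."""
    return lambda value: text.lower() in value.lower()


def matches_regex(pattern):
    compiled = re.compile(pattern)
    return lambda value: compiled.search(value) is not None


def does_not_match_regex(pattern):
    compiled = re.compile(pattern)
    return lambda value: compiled.search(value) is None


def all_of(*rules):
    """Combine rules with AND."""
    return lambda value: all(rule(value) for rule in rules)


def any_of(*rules):
    """Combine rules with OR."""
    return lambda value: any(rule(value) for rule in rules)


# Hypothetical crawl data.
urls = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/?page=2",
    "https://example.com/about",
]

# Blog URLs that are not paginated.
rule = all_of(contains("/blog/"), does_not_match_regex(r"\?page=\d+$"))
print([url for url in urls if rule(url)])
```

Each rule is just a function from a value to true or false, so and/or combinations can be nested as deeply as needed.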

4) Translated UI

Alongside English, the GUI is now available in Spanish, German, French and Italian to further support our global users. It will detect the language used on your machine on startup, and default to using it.

Translated GUI

Language can also be set within the tool via ‘Config > System > Language’.

A big shoutout and thank you to the awesome MJ Cachón, Riccardo Mares, Jens Umland and Benjamin Thiers at Digimood for their time and amazing help with the translations. We truly appreciate it. You all rock.

Technical SEO jargon, alongside the complexity and subtleties of language, makes translation difficult. While we’ve worked hard to get this right with amazing native speaking SEOs, you’re welcome to drop us an email if you have any suggestions for further improvements.

We may support additional languages in the future as well.

Other Updates

Version 16.0 also includes a number of smaller updates and bug fixes, outlined below.

  • The PageSpeed Insights integration has been updated to include ‘Image Elements Do Not Have Explicit Width & Height’ and ‘Avoid Large Layout Shifts’ diagnostics, which can both improve CLS. ‘Avoid Serving Legacy JavaScript’ opportunity has also been included.
  • ‘Total Internal Indexable URLs’ and ‘Total Internal Non-Indexable URLs’ have been added to the ‘Overview’ tab and report.
  • You’re now able to open saved crawls via the command line and export any data and reports.
  • The include and exclude configurations have both been changed to partial regex matching by default. This means you can just type in ‘blog’ rather than, say, .*blog.*
  • The HTTP refresh header is now supported and reported!
  • Scheduling now includes a ‘Duplicate’ option to improve efficiency. This is super useful for custom Data Studio exports, where it saves time selecting the same metrics for each scheduled crawl.
  • Alternative images in the picture element are now supported when the ‘Extract Images from srcset Attribute’ config is enabled. A bug where alternative images could be flagged with missing alt text has been fixed.
  • The Google Analytics integration now has a search function to help find properties.
  • The ‘Max Links per URL to Crawl’ limit has been increased to 50k.
  • The default ‘Max Redirects to Follow’ limit has been adjusted to 10, in line with Googlebot before it shows a redirect error.
  • PSI requests are now 5 times faster, as we realised Google increased their quotas!
  • Updated a tonne of Google rich result feature changes for structured data validation.
  • Improved forms based authentication further to work in more scenarios.
  • Fix macOS launcher to trigger Rosetta install automatically when required.
  • Ate plenty of bugs.
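The include and exclude change above is easiest to see side by side. This is a minimal sketch of full versus partial regex matching using Python’s `re` module. The SEO Spider’s own implementation isn’t shown here, and the function names and example URL are assumptions.

```python
import re


def matches_fully(pattern, url):
    """Full matching: the pattern must cover the whole URL."""
    return re.fullmatch(pattern, url) is not None


def matches_partially(pattern, url):
    """Partial matching: the pattern may match anywhere in the URL."""
    return re.search(pattern, url) is not None


url = "https://example.com/blog/post-1"  # hypothetical URL

print(matches_fully("blog", url))        # False: 'blog' alone doesn't cover the URL
print(matches_fully(".*blog.*", url))    # True: wildcards were needed before
print(matches_partially("blog", url))    # True: the new default behaviour
```

With partial matching, anchors such as ^ and $ can still be added when a stricter match is wanted.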

That’s everything! As always, thanks to everyone for their continued feedback, suggestions and support. If you have any problems with the latest version, do just let us know via support and we will help.

Now, download version 16.0 of the Screaming Frog SEO Spider and let us know what you think in the comments.

Small Update – Version 16.1 Released 27th September 2021

We have just released a small update to version 16.1 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • Updated some Spanish translations based on feedback.
  • Updated SERP Snippet preview to be more in sync with current SERPs.
  • Fix issue preventing the Custom Crawl Overview report for Data Studio working in languages other than English.
  • Fix crash resuming crawls with saved Internal URL configuration.
  • Fix crash caused by highlighting a selection then clicking another cell in both list and tree views.
  • Fix crash duplicating a scheduled crawl.
  • Fix crash during JavaScript crawl.

Small Update – Version 16.2 Released 18th October 2021

We have just released a small update to version 16.2 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • Fix issue with corrupt fonts for some users.
  • Fix bug in the UI that allowed you to schedule a crawl without a crawl seed in Spider Mode.
  • Fix stall opening saved crawls.
  • Fix issues with upgrades of database crawls using excessive disk space.
  • Fix issue with exported HTML visualisations missing pop up help.
  • Fix issue with PSI going too fast.
  • Fix issue with Chromium requesting webcam access.
  • Fix crash when cancelling an export.
  • Fix crash during JavaScript crawling.
  • Fix crash accessing visualisations configuration using languages other than English.

Small Update – Version 16.3 Released 4th November 2021

We have just released a small update to version 16.3 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • The Google Search Console integration now has new filters for search type (Discover, Google News, Web etc) and supports regex as per the recent Search Analytics API update.
  • Fix issue with Shopify and CloudFront sites loading in Forms Based authentication browser.
  • Fix issue with cookies not being displayed in some cases.
  • Give unique names to Google Rich Features and Google Rich Features Summary report file names.
  • Set timestamp on URLs loaded as part of JavaScript rendering.
  • Fix crash running on macOS Monterey.
  • Fix right click focus in visualisations.
  • Fix crash in Spelling and Grammar UI.
  • Fix crash when exporting invalid custom extraction tabs on the CLI.
  • Fix crash when flattening shadow DOM.
  • Fix crash generating a crawl diff.
  • Fix crash when Chromium can’t be initialised.

Small Update – Version 16.4 Released 14th December 2021

We have just released a small update to version 16.4 of the SEO Spider. This release includes a security patch, as well as bug fixes and small improvements –

  • Update to Apache log4j 2.15.0 to fix CVE-2021-44228 vulnerability.
  • Added scheduling history feature under ‘File > Scheduling’.
  • Added validation of scheduled tasks to list view to catch issues like removing config files after setting up crawls.
  • Allow double click to edit scheduled crawls.
  • Rate limit Google Sheets exports to prevent export failures.
  • Renaming a custom search/extraction no longer clears the filter.
  • Update the ‘failed to find GA account details’ error to list account names and IDs.
  • Add Crawl Timestamp to URL Details tab.
  • Fix crash changing custom search mid crawl.
  • Fix JavaScript crawling bug with pages that send POST/HEAD requests.
  • Fix memory leak during JavaScript Crawling.
  • Fix crash on startup with corrupt tab config file.
  • Fix issue with scheduled crawls hanging if APIs don’t connect.
  • Fix command line crawl issue where Google Sheets limits cause subsequent exports to fail randomly.
  • Fix bug with HTTP Canonicals not being spotted when deriving indexability.
  • Fix crash extracting Chrome on start up.
  • Fix bug parsing robots.txt for User-Agents that already have rules.
  • Fix bug in hreflang filters around sitemap hreflangs and crawl order.
  • Fix crash doing hreflang validation when a sitemap is removed.
  • Fix duplicated cookies stored against a URL.
  • Fix various issues with Forms Based authentication.
  • Fix crash in GSC.
  • Fix crash selecting items in overview table.

Small Update – Version 16.5 Released 21st December 2021

We have just released a small update to version 16.5 of the SEO Spider. This release includes a security patch, as well as bug fixes and small improvements –

  • Update to Apache log4j 2.17.0 to fix CVE-2021-45046 and CVE-2021-45105.
  • Show more detailed crawl analysis progress in the bottom status bar when active.
  • Fix JavaScript rendering issues with POST data.
  • Improve Google Sheets exporting when Google responds with 403s and 502s.
  • Be more tolerant of leading/trailing spaces for all tab and filter names when using the CLI.
  • Add auto naming for GSC accounts, to avoid tasks clashing.
  • Fix crash running link score on crawls with URLs that have a status of “Rendering Failed”.

Small Update – Version 16.6 Released 3rd February 2022

We have just released a small update to version 16.6 of the SEO Spider, which includes URL Inspection API integration. Please read our version 16.6 release notes.

Small Update – Version 16.7 Released 2nd March 2022

We have just released a small update to version 16.7 of the SEO Spider. Please read our version 16.7 release notes.