Here are the trends we uncovered after analyzing over 4 billion on-page SEO (search engine optimization) issues among 200 million page crawls.
Marketers have been using Raven’s Site Auditor tool since early 2013 to find on-page or onsite SEO issues with websites. For the first time, we’re releasing anonymous crawl data to the marketing industry.
Site Auditor has crawled and recrawled over 200 million webpages and found over 4 billion instances of SEO issues and counting…
Countless websites are in need of optimization. We discovered that, on average, marketers uncover more than 4,500 issues per website crawl that could be affecting search visibility.
This study outlines opportunities for you and provides benchmarks you can use when reporting SEO issues to your clients and prospects.
Cheers to data!
Co-Founder of Raven
A. The Most Widespread On-Page SEO Issue Today
Images with missing alt and title attributes accounted for about 78% of all on-page SEO issues found, making image issues the most common type of SEO issue found.
Google image search receives over 1 billion page views every single day. It’s safe to say, image search is part of many searchers’ daily Internet activity. The higher your images rank in image search, the more likely people will visit your pages through image search.
Despite the opportunity, image optimization is still an untapped resource for many websites today. We found that the average website crawl had:
- 2,487 instances of images with missing title attributes.
- 1,153 instances of images with missing alt attributes.
- 32 instances of broken images.
The alt and title text attributes help search engines decipher the subject of an image and therefore when to include it in search results. Broken images provide a poor user experience for readers and therefore can cause websites to rank less prominently.
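As a rough illustration of the check described above, here is a minimal sketch that counts images lacking alt or title attributes in a snippet of HTML. It uses only Python’s standard library. The class and function names are hypothetical, and treating an empty or whitespace-only value as “missing” is an assumption; this is not how Site Auditor itself is implemented.

```python
from html.parser import HTMLParser


class ImageAttributeAudit(HTMLParser):
    """Counts <img> tags that lack a non-empty alt or title attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = 0
        self.missing_title = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Assumption: an absent, empty, or whitespace-only value counts as missing.
        if not (attributes.get("alt") or "").strip():
            self.missing_alt += 1
        if not (attributes.get("title") or "").strip():
            self.missing_title += 1


def audit_images(html):
    """Return counts of images with missing alt and title attributes."""
    parser = ImageAttributeAudit()
    parser.feed(html)
    return {"missing_alt": parser.missing_alt, "missing_title": parser.missing_title}


sample = (
    '<img src="a.png" alt="Red bike">'
    '<img src="b.png" title="Logo">'
    '<img src="c.png">'
)
print(audit_images(sample))
```

Note that an intentionally empty alt attribute is a common way to mark purely decorative images, so a real audit may want to treat that case separately.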
If images are a cornerstone of your website, optimizing them could be important. If not, optimizing images shouldn’t necessarily be your top priority. Optimizing images that don’t add much to your content may not make sense if you have a list of more severe SEO issues that could be tackled first. Here is how to prioritize SEO issues after an audit.
B. How Google’s Analytics Tracking Monopoly Affects You
Our study found that 83.13% of the pages crawled had Google Analytics (GA) installed on them.
Google created this dominant position by making its service free for any website and by creating an easy-to-use site analytics platform. GA allows a webmaster to add their visitor tracking code to an unlimited number of sites. Webmasters can then use that data to better understand where visitors are coming from and to study their behavior.
While this seems like a great deal for webmasters, Google’s “free” usually carries a price.
Google is in the business of collecting as much data as possible in order to learn and profit from it. They use the data they collect to enhance their services, like AdWords and Search. That has some webmasters and SEOs concerned, because they believe Google may be using the data they’re collecting against them.
For example, they believe that sites using GA tracking code may help Google more easily record bounces and dwell time from its search results. If your site runs GA tracking code and experiences a high volume of bounces from Google’s search results (visitors immediately clicking the Back button), you may be helping Google measure visitor behavior more comprehensively, which could lead Google to lower your ranking in its organic (non-paid) search results.
However, the opposite argument can also be made. Having GA on your site may communicate a more positive picture to Google, resulting in either no change in ranking or an improvement. Additionally, Google has denied that Google Analytics data specifically affects rankings.
Ultimately, the biggest concern has to do with the perception of unfair competition. If Google has detailed data on people’s behaviors and interests – thanks to the significant adoption of GA coupled with their other services (searches, email messages, etc…) – then they can, and most likely do, leverage that to easily outcompete their competitors.
A Summary of Past Google Analytics Research
As previously stated, we found that 83.13% of pages had Google Analytics installed. This is based on averaging data from the over 200 million pages that Site Auditor crawled and recrawled.
Our study contributes evidence that Google Analytics remains the most dominant web analytics software currently available, certainly among marketers. As of the writing of this study, W3Techs estimates that 52% of all websites use Google Analytics and BuiltWith puts it at 59%.
Our percentage is likely higher than estimates sampling the Internet at large since marketers checking for onsite SEO issues are the primary demographic for Site Auditor. Marketers have an interest in using analytics tracking software like Google Analytics so they can measure the success of marketing campaigns.
Additionally, our study averages page data, whereas the other sources cited average website data by checking the homepages of websites listed in indexes of popular websites.
C. Duplicate Content Remains at Large
Nearly 49% of pages had either duplicate content or low word count issues. Additionally, Site Auditor consistently found duplicate content issues among page titles and meta descriptions.
- 29% of pages had duplicate content
- 22% of page titles were duplicate
- 20% of pages had low word counts
- 17% of meta descriptions were duplicate
Removing duplicate and light content or replacing it with unique, high quality content can increase how high your website ranks in search results.
Google has shown its commitment to favoring high quality content in search results. For instance, in 2011 Google rolled out a major update to their search algorithm called Google Panda that ranked websites with low quality and light content lower in results. This update affected 12% of all search results.
Content quality continues to be one of the biggest factors affecting how high you’ll rank in search results. Google’s Webmaster Guidelines are a great starting place for learning how Google defines quality content.
The most common content issue found was duplicate content. The fact that 29% of pages had duplicate content sheds light on a large opportunity for marketers. The average website crawl found 71 pages with duplicate content errors.
Duplicate content is blocks of text within your website that either completely match other content or that are very similar. A blog category page showing snippets of content already on the website can be an example of duplicate content.
The average website crawl also found 48 pages with low word count errors. These pages had fewer than 250 words. Pages with more words give search engine algorithms more context to understand the content, and word count tends to correlate with quality. On average, the top 10 results search engines return have over 2,000 words.
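To make the two checks above concrete, here is a minimal sketch of one common way to flag thin and near-duplicate text. The 250-word threshold comes from this study. The similarity measure (Jaccard overlap of three-word shingles) is a swapped-in textbook technique chosen for illustration; Site Auditor’s own duplicate detection is proprietary and may work quite differently. All function names are hypothetical.

```python
import re

LOW_WORD_COUNT = 250  # threshold quoted in the study


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def has_low_word_count(text, minimum=LOW_WORD_COUNT):
    """True when the text has fewer words than the minimum."""
    return len(words(text)) < minimum


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    tokens = words(text)
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of word shingles; 1.0 means identical shingle sets."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


post = "Our new trail bike ships with wider tires and a lighter frame"
category_snippet = "Our new trail bike ships with wider tires and a lighter frame"
print(similarity(post, category_snippet))
print(has_low_word_count(post))
```

A score close to 1.0 suggests the two blocks of text are nearly identical. Any cutoff used to call something a duplicate would be a judgment call, not a value taken from this study.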
D. Over a Third of Pages Crawled had Missing Meta Descriptions
Over a third of all pages had no meta description at all, which shows that the average website could see a lift in results with more meta description optimization.
The following was true of the average website crawl:
- 61% of meta descriptions were too long or too short.
- 34% of pages had missing meta descriptions.
- 25% of page titles were too short or too long.
- 22% of page titles were duplicate titles.
- 17% of meta descriptions were duplicate or too similar to others.
- 16% of pages did not have Google Analytics installed.
Meta descriptions are short, helpful summaries of your page’s content that you see in search results under each result title. Search engines generally will use your description in search results.
This control over how content is displayed in search results is an opportunity for marketers to write engaging, actionable descriptions that draw readers in. Here’s how to write meta descriptions for maximum clicks.
Site Auditor considers it an error when page titles are not within 10-70 characters and when meta descriptions are not within 50-156 characters.
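The length rule above can be sketched as a small check. The character ranges are the ones stated in this study. Treating both ends of each range as allowed is an assumption, since the study does not say whether the bounds are inclusive, and the function name is hypothetical.

```python
TITLE_RANGE = (10, 70)         # characters, per the study
DESCRIPTION_RANGE = (50, 156)  # characters, per the study


def length_issue(text, allowed):
    """Return 'missing', 'too short', 'too long', or None when the length is fine."""
    low, high = allowed
    length = len((text or "").strip())
    if length == 0:
        return "missing"
    if length < low:  # assumption: the bounds themselves are acceptable
        return "too short"
    if length > high:
        return "too long"
    return None


print(length_issue("Home", TITLE_RANGE))
print(length_issue("Affordable trail bikes and repairs in Nashville", TITLE_RANGE))
print(length_issue(None, DESCRIPTION_RANGE))
```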
E. 80% of Pages Crawled had no Schema.org Microdata
Only 20% of pages used schema.org microdata, meaning 80% of pages did not.
Schema.org microdata is a structured data vocabulary standard endorsed by major search engines. It allows search engines to make sense of your content so they know how to display data during search experiences.
You can add schema.org microdata to pages about a person, recipe, product, movie or any of the other schema types. When data is “marked up” using schema.org microdata, it helps search engines filter and display your data. Since content is easier to interpret, it can rank higher in search results.
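For readers who have not seen microdata before, the sketch below embeds a small marked-up product snippet and detects it. The `itemscope`, `itemtype`, and `itemprop` attributes are standard HTML microdata; the product itself, the class name, and the detection rule (an element carrying `itemscope` plus a schema.org `itemtype`) are illustrative assumptions rather than Site Auditor’s actual logic.

```python
from html.parser import HTMLParser


class MicrodataFinder(HTMLParser):
    """Collects the schema.org item types declared on a page."""

    def __init__(self):
        super().__init__()
        self.item_types = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        item_type = attributes.get("itemtype") or ""
        # Assumption: a schema.org item is an element with itemscope and a schema.org itemtype.
        if "itemscope" in attributes and "schema.org/" in item_type:
            self.item_types.append(item_type)


def schema_types(html):
    """Return the list of schema.org item types found in the HTML."""
    finder = MicrodataFinder()
    finder.feed(html)
    return finder.item_types


page = (
    '<div itemscope itemtype="https://schema.org/Product">'
    '<span itemprop="name">Trail Bike</span>'
    "</div>"
)
print(schema_types(page))
```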
Schema.org was only launched in 2011, but our findings show that marketers are adopting this newer standard. In fact, over 10 million websites use Schema.org markup data according to schema.org.
Though our findings show adoption among marketers, only 20% of pages were found to be using schema.org microdata. It’s likely that more pages could benefit from using schema.org markup, so there is still a large opportunity to use this newer technology to your advantage.
Search engines might use microdata to do things like:
- Include your business hours in search results
- Filter only recipes relevant to a specific search
- Include user ratings next to a product
Some marketers are concerned that providing search engines with additional information about their content could cause searchers to get their answers within search results and not click through to websites.
Despite this concern, many marketers are adopting this new standard as a strategy to increase brand awareness and traffic to their websites. Learn how to easily add schema data to core pages of your website.
F. Over 250 Link Issues Hold Back the Average Website
The average website crawl had nearly 300 instances of link issues, including:
- 181 links with no anchor or alt text
- 91 nofollow links
- 23 broken links
Google built its search technology on the importance of links. Google sees links as a vote for a website’s quality. Links to your website matter, but links within your website matter as well. Using the nofollow attribute and anchor text properly, as well as cleaning up broken links, can help search engines know how to find and rank your content.
Anchor text is text that is blue and underlined in the traditional styling of a link. It’s the text that helps search engines better understand the context of the page you’re linking to, many times your own pages. The largest link issue found was missing anchor or alt text.
The average website had 181 links with missing anchor or alt text. Among these, 64% were internal links and 36% were external links. A link wrapped around an image that doesn’t have alt text is similar to a link with no anchor text. This is why missing anchor text and alt text are counted as one error.
The nofollow attribute is tied to a link in order to tell search engines that your link isn’t a vote for the quality of another website. This is important if you don’t want to vouch for the quality of a link. The average website had 91 links using the nofollow attribute. Among these, 66% were internal links and 34% were external links.
Two thirds of nofollow links found were internal. You shouldn’t use the nofollow attribute for internal links. It’s not a best practice since you should always trust your own website.
The presence of external nofollow links on the other hand is not necessarily an issue. It’s a potential issue, depending on if it was added in error.
Broken links create a poor user experience and therefore can be a signal to search engines of low quality. The average website had 23 broken links. Among these, 52% were internal links and 48% were external links.
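The link checks described in this section can be sketched with the standard library. This example flags links with no anchor or image alt text, notes whether each link uses nofollow, and labels it internal or external. Treating “internal” as “same host as the page” is an assumption, the class and function names are hypothetical, and broken-link detection is left out because it requires network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAudit(HTMLParser):
    """Records each link's target, nofollow status, and whether it lacks text."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            target = urljoin(self.page_url, attributes["href"])
            rel = (attributes.get("rel") or "").lower().split()
            self._current = {
                "url": target,
                # Assumption: a link is internal when its host matches the page's host.
                "internal": urlparse(target).netloc == urlparse(self.page_url).netloc,
                "nofollow": "nofollow" in rel,
                "text": "",
            }
        elif tag == "img" and self._current is not None:
            # Alt text on a linked image stands in for anchor text.
            self._current["text"] += attributes.get("alt") or ""

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            link = self._current
            link["missing_text"] = not link.pop("text").strip()
            self.links.append(link)
            self._current = None


def audit_links(page_url, html):
    """Return one dictionary per link found in the HTML."""
    parser = LinkAudit(page_url)
    parser.feed(html)
    return parser.links


page = (
    '<a href="/pricing">Pricing</a>'
    '<a href="/home"><img src="logo.png"></a>'
    '<a href="https://other.example/post" rel="nofollow">Source</a>'
)
for link in audit_links("https://example.com/blog", page):
    print(link)
```

In this sample, the second link would be flagged for missing text because the image inside it has no alt attribute, and the third is an external nofollow link, which the study treats as only a potential issue.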
A Plan for Fixing On-Page SEO Issues
All this data is great for providing a benchmark for your website or showing prospects the opportunities missed by most website owners.
But how do you fix on-page SEO (search engine optimization) issues that affect your website or your client’s website? This is the type of work that can translate into money in your pocket.
The higher a website ranks in search results, the more traffic it brings. Traffic can mean more sales and leads. Ranking high in results is important since 71.33% of searchers click results on the first page, rather than interacting with other pages of results.
Here is the 3-step process we recommend for fixing on-page SEO issues:
- Crawl: Use site auditing software such as Site Auditor to discover which on-page SEO issues are holding you back.
- Fix: Figure out what to tackle first so you can make the most impact with your time.
- Report: Keep the conversation with clients going by using a reporting tool like Report Builder to show your progress.
Building a long term strategy for earning more SEO traffic takes time and expertise. It also likely means a focus on creating and promoting quality content. Make sure to review Google’s Webmaster Guidelines and don’t take any shortcuts that could penalize you.
Gaining more traffic from search takes time. However, some SEO issues can be fixed quickly and have dramatic, lasting effects. That’s why it makes sense to make sure your foundation is solid before building upon it.
Data was collected directly from Raven databases, which was the only data source for this study. We pulled and analyzed data anonymously to protect the privacy of our users. Data in this study was collected by Raven’s SEO Site Auditor tool between 11 February 2013 – 15 June 2015.
Over 4 billion instances of individual on-page SEO issues were retrieved. Each item was tied to a broad category based on Raven’s Site Auditor’s classification of SEO issues. The categories were image, link, meta, page, visibility, and semantic.
Our sample includes data from 96,488 unique websites. These represent websites crawled by both trial and paid users. There were 888,710 entries of unique crawls, so the average website was recrawled about 9.2 times.
We derived the number of pages within the average website crawl (243.15) by dividing the total number of pages discovered on all crawls and recrawls by the total number of crawls and recrawls conducted.
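The averages quoted above can be sanity-checked with a few lines of arithmetic. The inputs below are figures published in this study. The two values labeled as estimates are back-calculated here for illustration and are not figures the study reports directly.

```python
# Figures quoted in the study.
unique_websites = 96_488
total_crawls = 888_710
pages_per_crawl = 243.15
missing_title_per_crawl = 2_487
missing_title_share = 0.5304  # 53.04% of all issues

# Crawls divided by unique websites gives the average number of crawls per site.
crawls_per_website = total_crawls / unique_websites
print(f"Crawls per website: {crawls_per_website:.1f}")  # about 9.2

# Back-calculated estimates (assumptions, not published figures).
estimated_total_pages = pages_per_crawl * total_crawls
estimated_issues_per_crawl = missing_title_per_crawl / missing_title_share
print(f"Estimated pages crawled: {estimated_total_pages:,.0f}")
print(f"Estimated issues per crawl: {estimated_issues_per_crawl:,.0f}")
```

Both estimates line up with the headline claims of over 200 million page crawls and more than 4,500 issues per website crawl.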
Issues such as broken links or broken images that occurred on multiple pages were counted as multiple issues.
Statements such as “the average website crawl had this many issues” were calculated by dividing the total instances of an individual issue by the number of pages within the average website crawl.
Statements such as “The average website crawl had this percent of pages with this issue” were only made about issues that are linked to one page. For instance, we were able to calculate the percent of pages that had 404 Page errors or pages with duplicate content since these issues are tied to one page. We could not make statements about the percent of links or images that were broken per average website crawl because of the way we collected data.
Some calculations used derived data to derive further data, rather than deriving data from original data. This means rounding could have affected the accuracy of some calculations.
This study should be read as a case study for further research, rather than a definitive analysis of the Internet. The sample size of this study is minuscule compared to the enormity of the Internet, and there are unique demographics and limitations inherent in our data.
Assumptions and Limitations
Our data is skewed in favor of smaller sites because data from large sites is sampled. Raven’s Site Auditor crawls up to 1,000 pages per website, so it doesn’t collect all possible issues for every website. The average website crawl had 243 pages.
The typical trial and paid user of Raven is interested in marketing. Additionally, Raven users may be interested in different industries than the industry you are in. Therefore, the results in this study will not be representative of every website.
Google’s search algorithm is constantly evolving, as is Raven’s software. The criteria for what constitutes an SEO issue are also subjective and evolving. Things like the ideal length of a meta description or title have changed over time. Site Auditor uses a proprietary process to detect duplicate content. It’s designed to closely mimic how Google’s spiders crawl and judge websites.
Varying Issue Severity
Some on-page SEO issues are more severe than others. All else being equal, a website that has ten images missing title attributes on each page is better off than an image-optimized website that has a line in its robots.txt file asking Google not to crawl any pages.
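To show how severe that robots.txt case is, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The rules and URLs are made-up examples, and the helper name `is_crawlable` is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True when the robots.txt rules allow the user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A single Disallow rule covering "/" blocks every page for every crawler.
blocking_rules = "User-agent: *\nDisallow: /"
# An empty Disallow value blocks nothing.
open_rules = "User-agent: *\nDisallow:"

print(is_crawlable(blocking_rules, "https://example.com/products"))
print(is_crawlable(open_rules, "https://example.com/products"))
```

With the blocking rules in place, no amount of image or meta description work can help, because crawlers are asked not to visit the pages at all.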
95.5% of issues found represented bad SEO practices. 3.5% of issues found were issues that could contribute to poor SEO if they represented mistakes. 1% of issues found were good indicators rather than bad indicators.
The following are potential issues that Site Auditor found. The four potential issues below account for about 3.5% of total issues found.
- Pages blocked by robots.txt
- Pages detected to not be tracked by Google Analytics
- External links using the nofollow attribute
- Redirects
As an example, redirects are a potential issue since only the webmaster of a website will know if the redirects in place were added on purpose, or if they were added in error.
Additionally, a page detected to be using schema.org microdata is a good indicator rather than an issue, but it is still tracked by Site Auditor so it can be measured by marketers. These indicators were still counted towards the total issues found in this study, though they only represented about 1% of total issues.
About the Contributors
Jon Henshaw is the Co-Founder of Raven and has been involved with website development and Internet strategy since 1995. He has spoken at PubCon, MozCon, SMX, SearchFest, SES and others. Find Jon on Twitter as @RavenJon.
Nathan T. Baker is the content marketing specialist at Raven. Nathan is published in NPR, The Tennessean, the SEMrush blog, and elsewhere. At night you can find him playing board games. Find Nathan on Twitter as @RavenNate.
Tamara Scott is a customer education specialist at Raven. Tamara has a teaching degree and master’s degree in English. In her “free” time, she meditates, practices yoga, and spends time with her husband and whiskey.
Jason West is Director of Development at Raven and has 20 years of software development experience. West oversees Raven’s day-to-day development operations, including leading a team of 10 developers.
Issues By Category
Here are the categories of on-page SEO issues discovered during the average website crawl.
|Issue Category||Average Instances per Crawl||% of Total|
|Page Error Issues||119||2.54%|
Raven’s Site Auditor conducted over 800 thousand website crawls and recrawls from February 2013 to June 2015.
Most Common On-Page SEO Issues
Here are the most common individual on-page SEO issues found during the average website crawl.
|Issue||Average Instances per Crawl||% of Total|
|Images with Missing Title Attributes||2,487||53.04%|
|Images with Missing Alt Attributes||1,153||24.59%|
|Meta Descriptions that are Too Short or Too Long||150||3.20%|
|Internal Links with No Anchor Text||116||2.47%|
|Pages with Missing Meta Descriptions||84||1.79%|
|Pages with Duplicate Content||71||1.51%|
|External Links with No Anchor Text||65||1.39%|
|Page Titles That are Too Short or Too Long||63||1.34%|
|Internal Links with NoFollow||60||1.28%|
|Duplicate Page Titles||54||1.15%|
|Pages with Low Word Counts||48||1.02%|
|Pages Detected to Be Using Schema.org Microdata||47||1.00%|
|Duplicate Meta Descriptions||42||0.90%|
|Pages Detected to Not be Tracked by Google Analytics||41||0.87%|
|Pages Blocked by Robots.txt||33||0.70%|
|External Links with NoFollow||31||0.66%|
|Pages With No Headers in Content||15||0.32%|
|Broken Internal Links||12||0.26%|
|Broken External Links||11||0.23%|
|Page Connectivity Errors||8||0.17%|
|Pages with Missing Titles||5||0.11%|
Raven’s Site Auditor conducted over 800 thousand website crawls and recrawls from February 2013 to June 2015.
Download Data and Charts
Use the charts and tables in this study as a benchmark in your next report for clients or prospects. Download the results data (CSV) and charts used in this study.
X-Ray Your Website For SEO Issues
Run a free SEO audit using Raven’s free trial to see what is holding back your website from ranking higher. Then report on the results.