On Reddit, a user raised a concern regarding their “crawl budget,” questioning whether a significant number of 301 redirects leading to 410 error responses were causing Googlebot to deplete their crawl budget. John Mueller from Google provided insight into why the user might be observing a suboptimal crawl pattern and clarified aspects of crawl budgets in general.
The concept of a crawl budget, often invoked by SEO professionals to rationalize limited crawling of certain websites, presupposes that each site is allocated a predetermined number of crawls, effectively capping the amount of crawling activity it receives.
Understanding the genesis of the crawl budget concept is important for grasping its true nature. While Google has consistently maintained that there isn’t a singular entity dubbed a “crawl budget” within its infrastructure, the crawling behavior of Google can give the impression of such a limit.
This perspective on the crawl budget concept was subtly hinted at by Matt Cutts, a prominent Google engineer at the time, during a 2010 interview.
Matt addressed a query regarding Google’s crawl budget by initially clarifying that the concept didn’t align with how SEOs typically understand it:
“The first thing is that there isn’t really such thing as an indexation cap. A lot of people were thinking that a domain would only get a certain number of pages indexed, and that’s not really the way that it works.
There is also not a hard limit on our crawl.”
In 2017, Google published a comprehensive explanation of crawl budget, consolidating the various crawling-related factors that the SEO community had traditionally grouped under that term. This documentation provides greater precision than the previously ambiguous phrase “crawl budget.”
Key points about the crawl budget include:
- The crawl rate corresponds to the number of URLs Google can crawl, dependent on the server’s capability to provide the requested URLs.
- Duplicate pages (e.g., faceted navigation) and low-value content can deplete server resources, limiting the pages available for Googlebot to crawl.
- Shared servers, hosting several websites, may contain hundreds of thousands, if not millions, of URLs. Thus, Google prioritizes crawling based on servers’ ability to fulfill page requests.
- Lightweight pages are easier for Google to crawl in greater numbers.
- Inbound and internal linking patterns play a role in determining which pages receive priority for crawling.
- Soft 404 pages can redirect Google’s attention towards low-value content instead of relevant pages.
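Server access logs are the most direct way to see which of the factors above are actually consuming Googlebot’s requests on a given site. The following is a minimal sketch, not an official tool: it assumes logs in the common combined format and a file named access.log (both assumptions), and it tallies Googlebot hits by status code and by path so that redirect chains, removed URLs, and low-value pages attracting heavy crawling stand out.

```python
# Minimal sketch: tally Googlebot requests from an access log (combined log format)
# to see which status codes and paths consume the most crawl requests.
# "access.log" and the log format are assumptions; adjust to your server's setup.
import re
from collections import Counter

# Combined log format: host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]+" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawl_summary(log_path: str):
    """Return (status_counts, top_paths) for requests whose user agent mentions Googlebot."""
    status_counts = Counter()
    path_counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            match = LOG_LINE.match(line)
            if not match or "Googlebot" not in match["agent"]:
                continue
            status_counts[match["status"]] += 1
            path_counts[match["path"]] += 1
    return status_counts, path_counts.most_common(20)

if __name__ == "__main__":
    statuses, top_paths = crawl_summary("access.log")  # hypothetical file name
    print("Googlebot requests by status code:", dict(statuses))
    print("Most-crawled paths:")
    for path, hits in top_paths:
        print(f"  {hits:6d}  {path}")
```

Run against a day or a week of logs, a summary like this makes it easy to compare how many requests go to pages that matter versus redirects, 404s, and removed URLs.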
Reddit Inquiry Regarding Crawl Rate
The person who asked the question on Reddit wanted to know whether pages that Google might perceive as low value were affecting their crawl budget. Specifically, they asked what happens when a request for the non-secure (HTTP) URL of a removed page is 301-redirected to the HTTPS version, which then returns a 410 Gone response (indicating the page has been permanently removed).
It’s a valid inquiry.
Here’s their question:
“I’m trying to make Googlebot forget to crawl some very-old non-HTTPS URLs, that are still being crawled after 6 years. And I placed a 410 response, in the HTTPS side, in such very-old URLs.
So Googlebot is finding a 301 redirect (from HTTP to HTTPS), and then a 410.
http://example.com/old-url.php?id=xxxx -301-> https://example.com/old-url.php?id=xxxx (410 response)
Two questions. Is G**** happy with this 301+410?
I’m suffering ‘crawl budget’ issues, and I do not know if this two responses are exhausting Googlebot
Is the 410 effective? I mean, should I return the 410 directly, without a first 301?”
Google’s John Mueller answered:
“G*?
301’s are fine, a 301/410 mix is fine.
Crawl budget is really just a problem for massive sites ( https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget ). If you’re seeing issues there, and your site isn’t actually massive, then probably Google just doesn’t see much value in crawling more. That’s not a technical issue.”
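For illustration only, here is how the 301-to-410 chain from the question might look in application code. This is a hypothetical sketch assuming a Flask app; the route name mirrors the URL pattern in the question, and RETIRED_IDS is an invented placeholder for the IDs of permanently removed pages.

```python
# Hypothetical Flask sketch of the 301 -> 410 chain described in the question.
# Flask, the route name, and RETIRED_IDS are illustrative assumptions only.
from flask import Flask, request, redirect

app = Flask(__name__)

RETIRED_IDS = {"1234", "5678"}  # hypothetical IDs of permanently removed pages

@app.route("/old-url.php")
def old_url():
    page_id = request.args.get("id", "")
    # Step 1: the existing behaviour, a 301 from HTTP to the HTTPS version of the URL.
    if not request.is_secure:
        return redirect(request.url.replace("http://", "https://", 1), code=301)
    # Step 2: on the HTTPS side, permanently removed pages answer 410 Gone.
    if page_id in RETIRED_IDS:
        return "Gone", 410
    return "Page content", 200
```

Mueller’s answer indicates either pattern is acceptable; moving the RETIRED_IDS check above the redirect would address the user’s second question by returning the 410 directly and saving Googlebot one round trip per URL.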
Reasons for Limited Crawling
Mueller suggested that Google likely perceives a lack of value in crawling additional webpages. This implies that a review of these webpages might be necessary to pinpoint why Google deems them unworthy of crawling.
Certain common SEO strategies often result in the creation of low-value, unoriginal webpages. For instance, it’s common practice in SEO to analyze top-ranked webpages to discern the factors contributing to their ranking success, then replicate those elements to enhance one’s pages.
While this approach may seem logical, it fails to generate genuinely valuable content. If we think of it in binary terms, where zero represents content that already exists in the search results and one represents something original, merely replicating what already ranks produces yet another zero: a website that offers nothing beyond what is already available in the SERPs.
Certainly, technical issues like server health and other factors can impact the crawl rate.
However, regarding the concept of a crawl budget, Google has consistently maintained that it’s primarily a concern for large-scale websites, rather than smaller to medium-sized ones.