Google's Webmaster Tools warns you when it finds a lot of 404 Not Found pages on your website. If you've just deleted a lot of spam from a site, Google might keep recrawling those URLs as long as they send 404 headers.
- A 404 header means Not Found.
- A 410 header means Gone.
If you want to stop Google from adding those removed pages back to Google Webmaster Tools, you can try sending 410 headers for the removed URLs.
Does Google treat 410 status codes differently than 404 status codes?
Search Engine Journal (SEJ) has an article from 2018 that quotes Google's John Mueller:
"From our point of view, in the mid term/long term, a 404 is the same as a 410 for us. So in both of these cases, we drop those URLs from our index.
We generally reduce crawling a little bit of those URLs so that we don’t spend too much time crawling things that we know don’t exist.
The subtle difference here is that a 410 will sometimes fall out a little bit faster than a 404. But usually, we’re talking on the order of a couple days or so.
So if you’re just removing content naturally, then that’s perfectly fine to use either one. If you’ve already removed this content long ago, then it’s already not indexed so it doesn’t matter for us if you use a 404 or 410."
At the time I originally published this post, I was having trouble with Google's crawling of 404 pages on my site. I'm not sure what was going on, but 410 is the correct header to send for removed pages, and I think it's best to be predictable when dealing with bots.
It took me a while to figure out how to send bulk 410 headers with NGINX, but I finally came up with this solution:
Create a new file called /etc/nginx/header-include.conf. In that file, map each removed URL to a value, in this case the number 1 (chosen arbitrarily), which NGINX stores in the $gone_var variable.
    map $request_uri $gone_var {
        /path-one 1;
        /path-two 1;
        /path-three 1;
        /path-four 1;
    }
Then in your main conf file for your domain, like /etc/nginx/sites-available/example.com (replacing example.com with your domain):
    include header-include.conf;
    server {
        # other things here...
        # $gone_var is only set for URLs listed in the map
        if ($gone_var) {
            return 410 $gone_var;
        }
        # other things here...
    }
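One caveat with the map above: $request_uri holds the full original request URI, query string included, so an exact key like /path-one will not match /path-one?ref=spam. Here is a minimal sketch of one way to handle that, assuming you want query-string variants treated as gone too. It uses the regular-expression keys that map supports (prefixed with ~). The /old-spam/ prefix is a hypothetical example, not a path from my setup.

```nginx
# header-include.conf (sketch; paths are illustrative)
map $request_uri $gone_var {
    # Anything not listed maps to 0, which `if ($gone_var)` treats as false.
    default 0;

    # Exact match: only "/path-one" with no query string.
    /path-one 1;

    # Regex match: "/path-two" with or without a query string.
    ~^/path-two(\?.*)?$ 1;

    # Regex match: everything under a hypothetical /old-spam/ prefix.
    ~^/old-spam/ 1;
}
```

Exact keys are tried first, and regex keys are only checked, in order of appearance, when no exact key matches, so it is probably best to keep the regex entries to a handful.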
Bulk 301 redirects in NGINX
You can also set up bulk 301 redirects in the same file, for example:
    # header-include.conf
    # These are for the 410 headers.
    map $request_uri $gone_var {
        /path-one 1;
        /path-two 1;
        /path-three 1;
        /path-four 1;
    }
    # These are for the 301 redirects.
    map $request_uri $redirect_uri {
        /path-five /path-five.html;
        /path-six /path-six.html;
        /path-seven /path-seven.html;
    }
Here's the NGINX conf file that tells NGINX which headers to send:
    # Your domain's conf file
    include header-include.conf;
    server {
        # other things here...
        # if there is a mapped $gone_var set for the current path
        if ($gone_var) {
            return 410 $gone_var;
        }
        # if there is a $redirect_uri set for the current URI:
        if ($redirect_uri) {
            return 301 $redirect_uri;
        }
        # other things here...
    }
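If your redirects follow a pattern, as the .html examples above do, a regex key with a named capture might save you from listing every path. This is only a sketch that I haven't run against the setup described here: the /docs/ prefix and the slug capture name are made up for illustration, and combining a variable with text in a map value needs NGINX 1.11.0 or later.

```nginx
# header-include.conf (sketch; the /docs/ pattern is hypothetical)
map $request_uri $redirect_uri {
    # Explicit entries still work and take priority over regex entries.
    /path-five /path-five.html;

    # Hypothetical pattern: /docs/<slug> -> /docs/<slug>.html
    # The character class stops at "/", "." and "?", so nested paths,
    # URLs that already end in .html, and URLs with query strings
    # are left alone.
    ~^/docs/(?<slug>[^/.?]+)$ /docs/$slug.html;
}
```

The server block doesn't need to change for this, because it only checks whether $redirect_uri is set.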
map_hash_bucket_size Errors
If you get an error about map_hash_bucket_size being too small, open the /etc/nginx/nginx.conf file (or wherever the http block lives) and increase it to something like this: map_hash_bucket_size 256;
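For reference, here is roughly where that directive sits. This is a sketch that assumes a Debian/Ubuntu-style layout where nginx.conf pulls in the site files from sites-enabled. Your http block will have other settings, and 256 is just an example value.

```nginx
# /etc/nginx/nginx.conf (excerpt; layout is an assumption)
http {
    # Bucket size for the hash tables that back map blocks.
    # Long URL keys need larger buckets; 256 is an example value.
    map_hash_bucket_size 256;

    # ... existing settings ...

    # Site files, and the map file they include, are loaded inside http.
    include /etc/nginx/sites-enabled/*;
}
```

After editing, running nginx -t checks the configuration for errors before you reload NGINX.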
Read More
Here are some pages that might be useful for sending bulk status codes with NGINX:
- Module ngx_http_map_module
- Module ngx_http_rewrite_module
- NGINX redirects
- How to Use NGINX's map Module on Ubuntu 16.04
I don't know if this technique will affect performance on a huge site, but it worked fine with hundreds of URLs for a site with thousands of visits per day on a relatively small server.
If it doesn't work for you or if you have any suggestions for improving this idea, please comment below.