The X-Robots-Tag header is an important HTTP header for SEO. If you put copies of sites online so your clients or testers can view them, you do not want Google to index those copies and treat them as duplicate content. One way to prevent this is to use the X-Robots-Tag HTTP header to tell Googlebot not to index the site.
In this tutorial you will learn how to set a custom HTTP header with nginx based on the host domain. The custom header will be X-Robots-Tag for staging sites or staging domains. I am using WordPress for this example but you can apply the same logic to other CMSes or sites.
Setting X-Robots-Tag Based on nginx Host
In the first section we will set the X-Robots-Tag header based on the host containing the text ‘staging’.
In the second section we will set the X-Robots-Tag header based on the host containing a domain you use for staging sites.
Host Containing Staging String
This goes in the http { block of your nginx configuration. Instead of modifying /etc/nginx/nginx.conf directly, you can add this above the server { block for your virtual host.
map $http_host $robots {
    default "";
    # if $http_host contains the word staging
    "~*staging" "noindex, nofollow, nosnippet, noarchive";
}
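The pattern above matches the word staging anywhere in the host, so a production host such as upstaging.example.com would also get the header. If that is a concern, a tighter pattern is one option. The fragment below is only a sketch: the example hostnames in the comments are assumptions, and it deliberately stops matching hosts like example-staging.com, so check it against your own naming scheme.

```nginx
# Sketch only: match "staging" as a whole hostname label, not as any substring.
map $http_host $robots {
    default "";
    # matches staging.example.com and www.staging.example.com,
    # but not upstaging.example.com
    "~*(^|\.)staging\." "noindex, nofollow, nosnippet, noarchive";
}
```

Because the default value is an empty string, nginx does not send the header at all for hosts that do not match.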
In your nginx virtual host, find the PHP location and add the add_header X-Robots-Tag $robots; line.
location ~ \.php$ {
    try_files $uri =404;
    # add cache status
    add_header WP-Bullet-Fastcgi-Cache $upstream_cache_status;
    # add the cache skip reason if relevant
    add_header WP-Bullet-Skip $skip_reason;
    # add exception
    add_header X-Exception $exception;
    add_header X-Robots-Tag $robots;
    include fastcgi_params;
    fastcgi_pass unix:/run/php/php7.3-fpm.sock;
    fastcgi_split_path_info ^(.+\.php)(.*)$;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_cache_bypass $http_secret_header $skip_cache $bypass;
    fastcgi_no_cache $skip_cache;
    fastcgi_cache WORDPRESS;
    fastcgi_cache_valid 404 1m;
    fastcgi_cache_valid 1h;
}
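Two caveats may be worth covering here. By default nginx only attaches add_header headers to a fixed set of successful and redirect responses, and the PHP location does not handle static files such as images or PDFs. The fragment below is a sketch under those assumptions, not part of the original setup: the always parameter and the static-file location, including its extension list, are illustrative additions.

```nginx
# Sketch only: both locations live inside your existing server { } block.

location ~ \.php$ {
    # "always" also sends the header on error responses such as 404 or 500
    add_header X-Robots-Tag $robots always;
    # ... the rest of the PHP location stays as it is ...
}

# Hypothetical location for static files; adjust the extension list to your site
location ~* \.(pdf|jpe?g|png|gif)$ {
    add_header X-Robots-Tag $robots always;
}
```

Keep in mind that a location inherits add_header directives from the server level only when it defines none of its own, which is why the header is repeated in each location here.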
Test that the conditional header works using curl.
curl -I "https://staging.wp-bullet.com?wpbullet"
The output should show the X-Robots-Tag header, as it does below.
HTTP/2 200
server: nginx
date: Wed, 10 Jul 2019 12:18:19 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
cache-control: max-age=1000
x-ua-compatible: IE=edge
link: <https://guides.wp-bullet.com/wp-json/>; rel="https://api.w.org/"
wp-bullet-fastcgi-cache: BYPASS
wp-bullet-skip: QueryString-QueryString
x-robots-tag: noindex, nofollow, nosnippet, noarchive
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
Now we can do the same thing with a staging domain.
Host Containing Custom Staging Domain
Let’s say you use a specific root domain like devderp.me for staging sites for your clients. We can use the same trick as above but apply it to the whole domain. This way you just use your template and the correct header will always be applied.
The snippet below goes in the http { block of your nginx configuration. Instead of modifying /etc/nginx/nginx.conf directly, you can add this above the server { block for your virtual host.
map $http_host $robots {
    default "";
    # if the $http_host contains devderp.me (the dot is escaped so it matches a literal ".")
    "~*devderp\.me" "noindex, nofollow, nosnippet, noarchive";
}
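If you want both rules from this tutorial active at the same time, one option is to keep them in a single map rather than two separate ones. This is a minimal sketch that assumes the same $robots variable and the devderp.me example domain used here; regular expressions in a map are checked in the order they appear.

```nginx
# Sketch only: one map covering both the "staging" substring and the staging domain.
map $http_host $robots {
    default "";
    # any host containing the word staging
    "~*staging" "noindex, nofollow, nosnippet, noarchive";
    # any host containing the staging root domain
    "~*devderp\.me" "noindex, nofollow, nosnippet, noarchive";
}
```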
In your nginx virtual host, find the PHP location and add the add_header X-Robots-Tag $robots; line.
location ~ \.php$ {
    try_files $uri =404;
    # add cache status
    add_header WP-Bullet-Fastcgi-Cache $upstream_cache_status;
    # add the cache skip reason if relevant
    add_header WP-Bullet-Skip $skip_reason;
    # add exception
    add_header X-Exception $exception;
    add_header X-Robots-Tag $robots;
    include fastcgi_params;
    fastcgi_pass unix:/run/php/php7.3-fpm.sock;
    fastcgi_split_path_info ^(.+\.php)(.*)$;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_cache_bypass $http_secret_header $skip_cache $bypass;
    fastcgi_no_cache $skip_cache;
    fastcgi_cache WORDPRESS;
    fastcgi_cache_valid 404 1m;
    fastcgi_cache_valid 1h;
}
You can test this with curl as well.
curl -I https://client.devderp.me
The output should show the X-Robots-Tag header for devderp.me, as it does below.
HTTP/2 200
server: nginx
date: Wed, 11 Jul 2019 14:12:11 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
cache-control: max-age=1000
x-ua-compatible: IE=edge
link: <https://client.devderp.me/wp-json/>; rel="https://api.w.org/"
wp-bullet-fastcgi-cache: BYPASS
x-robots-tag: noindex, nofollow, nosnippet, noarchive
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
x-xss-protection: 1; mode=block