Set Custom Conditional HTTP Headers in nginx Based on Host Domain

The X-Robots-Tag header is a very important HTTP Header for SEO. If you make copies of sites and put them online so your clients or testers can view them then you do not want to be penalized by Google for having duplicate content. One way to prevent this is to use the X-Robots-Tag HTTP header to tell Googlebot not to crawl the site.

In this tutorial you will learn how to set a custom HTTP header with nginx based on the host domain. The custom header will be X-Robots-Tag for staging sites or staging domains. I am using WordPress for this example but you can apply the same logic to other CMSes or sites.

Setting X-Robots-Tag Based on nginx Host

In the first section we will set the X-Robots-Tag header based on the host containing the text ‘staging’.

In the second section we will set the X-Robots-Tag header based on the host containing a domain you use for staging sites.

Host Containing Staging String

This goes in the http { block for your nginx configuration.

Instead of modifying /etc/nginx/nginx.conf directly, you can add this above the server { block for your virtual host.

map $http_host $robots {
    default "";
    # if $http_host contains the word staging
    "~*staging" "noindex, nofollow, nosnippet, noarchive";
}

In your nginx virtual host find the PHP location and add the add_header X-Robots-Tag $robots; line.

location ~ \.php$ {
    try_files $uri =404;
    # add cache status
    add_header WP-Bullet-Fastcgi-Cache $upstream_cache_status;
    # add the cache skip reason if relevant
    add_header WP-Bullet-Skip $skip_reason;
    # add exception
    add_header X-Exception $exception;
    add_header X-Robots-Tag $robots;
    include fastcgi_params;
    fastcgi_pass unix:/run/php/php7.3-fpm.sock;
    fastcgi_split_path_info ^(.+\.php)(.*)$;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_cache_bypass $http_secret_header $skip_cache $bypass;
    fastcgi_no_cache $skip_cache;
    fastcgi_cache WORDPRESS;
    fastcgi_cache_valid 404 1m;
    fastcgi_cache_valid 1h;
}

Testing our conditional header works using curl.

curl -I https://staging.wp-bullet.com?wpbullet

Output should show the X-Robots-Tag header as it does below.

HTTP/2 200
server: nginx
date: Wed, 10 Jul 2019 12:18:19 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
cache-control: max-age=1000
x-ua-compatible: IE=edge
link: <https://guides.wp-bullet.com/wp-json/>; rel="https://api.w.org/"
wp-bullet-fastcgi-cache: BYPASS
wp-bullet-skip: QueryString-QueryString
x-robots-tag: noindex, nofollow, nosnippet, noarchive
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
x-xss-protection: 1; mode=block

Now we can do the same thing with a staging domain.

Host Containing Custom Staging Domain

Let’s say you use a specific root domain like devderp.me for staging sites for your clients. We can use the same trick above but apply it to the whole domain, this way you just use your template and the correct header will always be applied.

The snippet below goes in the http { block for your nginx configuration.

Instead of modifying /etc/nginx/nginx.conf directly, you can add this above the server { block for your virtual host.

map $http_host $robots {
    default "";
    # if the $http_host contains devderp.me
    "~*devderp.me" "noindex, nofollow, nosnippet, noarchive";
}

In your nginx virtual host find the PHP location and add the add_header X-Robots-Tag $robots; line.

location ~ \.php$ {
    try_files $uri =404;
    # add cache status
    add_header WP-Bullet-Fastcgi-Cache $upstream_cache_status;
    # add the cache skip reason if relevant
    add_header WP-Bullet-Skip $skip_reason;
    # add exception
    add_header X-Exception $exception;
    add_header X-Robots-Tag $robots;
    include fastcgi_params;
    fastcgi_pass unix:/run/php/php7.3-fpm.sock;
    fastcgi_split_path_info ^(.+\.php)(.*)$;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_cache_bypass $http_secret_header $skip_cache $bypass;
    fastcgi_no_cache $skip_cache;
    fastcgi_cache WORDPRESS;
    fastcgi_cache_valid 404 1m;
    fastcgi_cache_valid 1h;
}

You can test this as well

curl -I https://client.devderp.me

Output should show the X-Robots-Tag header as it does below for devderp.me.

HTTP/2 200
server: nginx
date: Wed, 11 Jul 2019 14:12:11 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
cache-control: max-age=1000
x-ua-compatible: IE=edge
link: <https://client.devderp.me/wp-json/>; rel="https://api.w.org/"
wp-bullet-fastcgi-cache: BYPASS
x-robots-tag: noindex, nofollow, nosnippet, noarchive
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
x-xss-protection: 1; mode=block

Sources

nginx map module
X-Robots-Tag Play