How to Add a robots.txt File to a NextJS App
In this article, I will explain the significance of robots.txt in a NextJS project and how you can add both a static and a dynamic robots.txt file to your web app.
NextJS is a popular front-end JavaScript framework built on top of ReactJS. NextJS enables the development of fast, responsive web apps with features such as static rendering, image optimisation, page routing, code splitting/bundling and API routes for full-stack applications. It is a highly recommended framework for building any consumer web application intended for public access.
One of the common requirements for an internet-facing web application is a robots.txt file. It is a simple text file hosted at the root of the application domain and is intended for the various web crawlers that index your site for search. The robots.txt file specifies which parts of your web application the crawlers are allowed to crawl. Search engines look at robots.txt before crawling and indexing your site. Not all web crawlers honour robots.txt, but popular and important ones such as Google's Googlebot do. So if you mark a set of files or folders as disallowed, the Google web crawler will not crawl their content. You can also optionally specify the location of the sitemap file for your website in robots.txt (see the example below).
It is highly recommended to add a robots.txt file to your NextJS web application. There will be pages or folders in your NextJS application that you don't want search engines such as Google to crawl. For example, API routes are usually intended for the private use of your own pages, and you don't want search engines or people to access them directly. You can disallow access to such URLs by adding them to the robots.txt file.
How to Add a Static robots.txt File to a NextJS Project
By default, any files added to the public sub folder of a NextJS project are accessible from the root of the application. Hence, if your NextJS application is running on www.example.com, you just need to add robots.txt to the public folder of the project and it will be available at the URL www.example.com/robots.txt. This is the location where web crawlers look for the robots.txt file.
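For reference, here is a minimal sketch of where the file sits in a typical NextJS project layout (the surrounding file and folder names are only illustrative):

my-nextjs-app/
  pages/
    index.js
  public/
    favicon.ico
    robots.txt    <-- served at www.example.com/robots.txt
  next.config.js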
Here is a sample robots.txt I have in my NextJS project. Note that this file tells crawlers to avoid the /api folder since it contains all my API endpoints, which are intended only for the NextJS app itself. Also note that I have specified the location of the sitemap in this file. This is optional; you can also add the path to the sitemap file directly in Google Search Console (the obvious disadvantage being that it then applies only to Google).
User-agent: *
Disallow: /api/
Sitemap: https://www.quickprogrammingtips.com/sitemap.xml
How to Create a Dynamic robots.txt in a NextJS App
If you are working with a large project where the robots.txt content needs to be dynamic, you can create a custom rewrite for the robots.txt path in NextJS and then serve dynamic content. For example, add the following to next.config.js. This tells NextJS to return the content from /api/dynamicrobot whenever a web browser or crawler tries to access /robots.txt.
// next.config.js
module.exports = {
  async rewrites() {
    return [
      {
        source: '/robots.txt',
        destination: '/api/dynamicrobot'
      }
    ];
  }
};
Then create an API endpoint file (/pages/api/dynamicrobot.js) that dynamically generates the robots.txt content.
// Content of /pages/api/dynamicrobot.js file
export default function handler(req, res) {
  // Logic here to generate dynamic robots.txt file content
  res.send('The full robots.txt file content dynamically created is here.');
}
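To make this more concrete, here is a slightly fuller sketch of what such a handler could look like, assuming you want non-production environments to block all crawlers; the environment check and the sitemap URL are illustrative assumptions, not part of the original example.

// /pages/api/dynamicrobot.js (illustrative sketch)
export default function handler(req, res) {
  // Assumption: only the production deployment should be crawled.
  const isProduction = process.env.NODE_ENV === 'production';

  // Build the robots.txt content dynamically.
  const lines = isProduction
    ? ['User-agent: *', 'Disallow: /api/', 'Sitemap: https://www.example.com/sitemap.xml']
    : ['User-agent: *', 'Disallow: /']; // Block everything outside production

  // Serve the result as plain text, which is what crawlers expect.
  res.setHeader('Content-Type', 'text/plain');
  res.status(200).send(lines.join('\n'));
}

With the rewrite from next.config.js in place, a request to www.example.com/robots.txt returns whichever set of rules the handler builds.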