The robots.txt file is one of the most critical files on your website. It is so crucial that an incorrectly written robots.txt file can even cut you off from search engines.
So, in this guide, I will show you how to create a custom robots.txt file for your WordPress website.
SEO consists of many factors, but nothing will work if search engines aren’t able to crawl your website. So, before you move on to robots.txt, you must understand what crawling is.
What is a Search Engine Crawler
Search engines have programs called crawlers, also known as “bots” or “web-spiders.”
These crawlers visit, scan, and read all of the web pages within their reach to build a search engine index of all known links, including information about each page’s content.
They work their way through the web to discover new posts, websites, and other updates on the internet.
Crawlers have a fixed budget: a limit on how much of a website they can crawl and how much time they spend doing it. This is known as the crawl rate limit or crawl budget.
It also depends on crawl demand: the number of URLs and pages a crawler wants and needs to crawl on your website.
If you let the bot crawl unnecessary parts of your website and the crawl rate limit is reached, or the crawl demand is fulfilled, it will leave your site and might not crawl the essential pages you want to rank on Google.
What is Robots.txt
Crawlers keep following links to every other page on a website until all pages have been read. Robots.txt is used to instruct the crawler to stop or to control this.
Robots.txt tells crawlers to skip a single page or a specific group of pages and links. If bots don’t crawl them, they will most probably not appear in search engine result pages.
Whether this works depends on the crawler: robots.txt only has an effect if the crawler chooses to obey it, and you can’t force it to.
Robots.txt is a text file located in your server’s root folder, for example at yourdomain.com/robots.txt.
It is also known as the “robots exclusion protocol” or the “robots exclusion standard.” Crawlers speak and understand this specific language.
When a search engine crawler visits your website, robots.txt is the first thing it crawls. It will either follow the instructions given in your file or ignore them.
Search engine crawlers are unlikely to ignore your robots.txt instructions. It is the malware or “bad” bots that will ignore them every time, and you cannot stop them with robots.txt.
Basics Of Robots.txt
There are a few instruction commands you need to know, namely:
- User-agent: *
- Allow: /
- Disallow: /
These three basic commands are the building blocks of every robots.txt file.
First, you type a user-agent:
User-agent: *
An asterisk after “user-agent” means that the robots.txt instruction will apply to every bot that visits the website.
When you want to allow a bot to crawl a page of your website, you use:
Allow: /
When you don’t want any of your pages to be crawled, use:
Disallow: /
Usually, you would only want to disallow specific pages, so you must provide a URL path after the slash. If you simply use the disallow command with a bare slash, it will stop crawlers from crawling any of your web pages.
You won’t want that to happen in most cases, so after the disallow command, you put the URL path you don’t want crawlers to crawl.
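To see how a well-behaved crawler reads these commands, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "ExampleBot" and the /private/ folder are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# A bare "Disallow: /" blocks the whole site for every bot.
block_all = "User-agent: *\nDisallow: /\n"

# Disallowing one (hypothetical) folder leaves the rest crawlable.
block_one = "User-agent: *\nDisallow: /private/\n"

print(is_allowed(block_all, "/any-page/"))           # False
print(is_allowed(block_one, "/private/notes.html"))  # False
print(is_allowed(block_one, "/blog/my-post/"))       # True
```

This only models a crawler that chooses to obey the file; a bot that ignores robots.txt is not affected by any of these rules.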
The Basic Robots.txt Instructions for a WordPress blog:
Below is a basic set of robots.txt commands for WordPress that you can copy and paste into your blog’s file:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
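These commands are the minimum your robots.txt should contain: they keep crawlers out of the admin area while still allowing admin-ajax.php, which themes and plugins may call from the public side of your site.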
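The Allow line works because major search engines apply the most specific (longest) matching rule, so the longer admin-ajax.php path overrides the shorter /wp-admin/ rule. The sketch below illustrates that precedence under simplifying assumptions: plain prefix matching only, no wildcards, and a hypothetical helper name. Note that Python's own urllib.robotparser uses first-match order instead, so it is not used here.

```python
def longest_match_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to (command, path_prefix) rules.

    Simplified sketch: plain prefix matching only, no wildcards.
    """
    best_len = -1
    allowed = True  # a path with no matching rule is crawlable
    for command, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = command.lower() == "allow"
        # A longer prefix wins; on a tie, Allow wins over Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


wordpress_rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

print(longest_match_allowed(wordpress_rules, "/wp-admin/options.php"))     # False
print(longest_match_allowed(wordpress_rules, "/wp-admin/admin-ajax.php"))  # True
print(longest_match_allowed(wordpress_rules, "/hello-world/"))             # True
```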
SEO Optimized Robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /?*
Disallow: /index.php
Disallow: /xmlrpc.php
Similarly, “/index.php” is a page you won’t want to show up in search engines because it consists of files located on your server, so you block it too.
Then “/xmlrpc.php” is used for pingbacks and trackbacks in WordPress, so it is blocked as well. You should also add “/?*” to the disallow commands. It blocks bots from crawling your internal search results (URLs such as /?s=keyword) and helps prevent duplicate and excess pages.
This helps to save a lot of crawl budget.
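To check what a pattern such as “/?*” actually covers, the sketch below translates a robots.txt path pattern into a regular expression. It assumes the wildcard conventions used by the major search engines ("*" for any run of characters, a trailing "$" for end of URL); the function names and sample paths are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression.

    "*" matches any run of characters and a trailing "$" pins the end of
    the URL; everything else, including "?", is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, url_path: str) -> bool:
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return pattern_to_regex(pattern).match(url_path) is not None


print(matches("/?*", "/?s=robots"))       # True
print(matches("/?*", "/blog/?s=robots"))  # False
print(matches("/?*", "/sample-post/"))    # False
```

As the second call suggests, this pattern only covers query strings directly on the site root, so search URLs under a subfolder would need their own rule.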
How To Create A Robots.txt File
First, you should check if you already have a robots.txt file or not.
Go to “yourdomain.com/robots.txt” to check.
*Replace “yourdomain.com” with your real domain.
If there is a robots.txt file already, you can simply edit it. If there’s not a robots.txt file on your server, that means you’ll have to create it.
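If you prefer to run this check from a script instead of the browser, a minimal sketch with Python's standard library could look like this. It assumes the site is served over HTTPS, and "yourdomain.com" remains a placeholder for your real domain.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def robots_url(domain: str) -> str:
    """Build the robots.txt URL for a bare domain such as "yourdomain.com"."""
    return urlunsplit(("https", domain, "/robots.txt", "", ""))


def fetch_robots(domain: str, timeout: float = 10.0):
    """Return the robots.txt body as text, or None if it cannot be fetched."""
    try:
        with urlopen(robots_url(domain), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:  # covers HTTP errors (such as 404), DNS failures, timeouts
        return None


print(robots_url("yourdomain.com"))
```

Calling fetch_robots("yourdomain.com") returns the file's text, or None when the request fails. Keep in mind that WordPress can serve a virtual robots.txt, so a successful response alone does not prove that a physical file exists on the server.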
An effortless way to create a robots.txt file for your WordPress website is by installing the Squirrly plugin.
Squirrly is an SEO plugin that will automatically create a robots.txt file for your website upon activation. You can edit it in the plugin’s advanced settings.
However, it does not create a physical robots.txt file, so if you later delete this plugin and its data, your robots.txt rules will be lost too. WordPress creates a virtual robots.txt file if there isn’t a physical one on your server. To edit it, you can use either Squirrly or Yoast.
I recommend that you create a physical robots.txt file and upload it to your server.
1. Access your server over FTP.
You should first install an FTP client on your PC to access your web hosting server. I prefer FileZilla for this.
Install FileZilla and connect to your server using your FTP/SFTP username and password. If you don’t know them, ask your hosting provider.
2. Find the robots.txt file in public_html
Click on the public_html folder in the remote server area. You’ll see the robots.txt file listed below it. If you don’t have a robots.txt file yet, you won’t see it. In this case, you’ll have to create one.
3. Open a text editor
Notepad would work. Open Notepad on your PC and copy/paste this robots.txt instruction:
User-Agent: *
Disallow: /wp-admin
Disallow: /xmlrpc
Disallow: /index.php
You can also add your sitemap here, but it’s not required.
Save this text file as robots.txt. If your editor adds the “.txt” extension automatically, type only “robots” in the name field so the file doesn’t end up as robots.txt.txt.
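If you would rather generate the file with a script than with Notepad, the following sketch writes the same rules and optionally appends a Sitemap line. The sitemap URL is a hypothetical placeholder; use whatever address your own sitemap lives at.

```python
from pathlib import Path

RULES = [
    "User-agent: *",
    "Disallow: /wp-admin",
    "Disallow: /xmlrpc",
    "Disallow: /index.php",
]


def build_robots_txt(sitemap_url=None) -> str:
    """Join the rules one per line, optionally appending a Sitemap line."""
    lines = list(RULES)
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def write_robots_txt(folder: Path, sitemap_url=None) -> Path:
    """Write robots.txt into `folder` and return the path of the new file."""
    target = folder / "robots.txt"
    target.write_text(build_robots_txt(sitemap_url), encoding="utf-8")
    return target


# Hypothetical sitemap address, shown only to illustrate the optional line.
print(build_robots_txt("https://yourdomain.com/sitemap.xml"))
```

Calling write_robots_txt(Path(".")) saves the file in the current folder, ready to be uploaded in the next step.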
4. Upload it to your server’s root
Go back to FileZilla and click on the public_html folder.
Drag your robots.txt file from the local files pane on the left and drop it into the blank space of the remote folder on the right.
That’s it. Your robots.txt file is now live.
Update 2019: Google announced that the nofollow attribute (rel="nofollow") will be treated as a hint. This means Google may or may not honor it. Google has also introduced two new attribute values: “ugc” (user-generated content) and “sponsored.”
rel="ugc" can be used for user-generated content such as blog comments and forum links, and rel="sponsored" can be used for affiliate links and partner links. You can read the official announcement here.
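As a rough illustration of what such links look like in your page markup, the sketch below builds anchor tags with the rel values mentioned above. The helper name, the example.com URLs, and the link texts are placeholders.

```python
from html import escape


def build_link(url: str, text: str, rel: str) -> str:
    """Return an HTML anchor whose rel attribute carries the given hint."""
    return (
        f'<a href="{escape(url, quote=True)}" '
        f'rel="{escape(rel, quote=True)}">{escape(text)}</a>'
    )


# example.com and the link texts are placeholders for illustration.
print(build_link("https://example.com/profile", "Commenter site", "ugc"))
print(build_link("https://example.com/product", "Partner offer", "sponsored"))
```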
I hope you now know how to prepare an SEO-optimized robots.txt file for your WordPress website. Now search engine crawlers won’t crawl useless pages of your website and blog, saving crawl budget.
This will let them crawl all of the pages you want to rank.
Let me know your thoughts in the comments section below.