The robots.txt file is one of the most critical files you have on your website. It is so crucial that an incorrectly written robots.txt file can even cut you off from search engines.
So, in this guide, I will show you how to create a custom robots.txt file for your WordPress website.
SEO consists of lots of factors, but it all starts with a crawl. So, before you move on to robots.txt, you must understand what crawling is in a search engine.
- What is a Search Engine Crawler
- What is Robots.txt
- Basics Of Robots.txt
- The Basic Robots.txt Instructions for a WordPress blog:
- How To Create A Robots.txt File
What is a Search Engine Crawler
Search engines have programs called crawlers, also known as “bots” or “web spiders.”
These crawlers visit, scan, and read all of the web pages within their reach to build a search engine index of all known links, including information about each page’s content.
It is the job of these crawlers to go through the whole web and discover new posts, websites, and other updates that happen on the internet.
Crawlers have a fixed budget: a limit on how much of a website they can crawl and how much time they spend doing it. This is known as the crawl rate limit or crawl budget.
How much gets crawled also depends on crawl demand, which is the number of URLs and pages a crawler wants and needs to crawl on your website.
If you let the bot crawl unnecessary parts of your website and the crawl rate limit is reached, or demand gets fulfilled, it will leave your site and might not crawl the essential pages that you want to rank on Google.
What is Robots.txt
Crawlers keep following links to every other page on a website until all pages have been read. Robots.txt is used to instruct the crawler to stop or to control this.
Robots.txt tells crawlers to skip a single page or a specific group of pages and links. If bots don’t crawl them, they will most probably not appear in search engine result pages.
Whether a crawler obeys the robots.txt instructions is up to the crawler; you can’t force it.
Robots.txt is a text file located in your server’s root folder.
It is also known as the “robots exclusion protocol” or “robots exclusion standard.” Crawlers speak and understand this specific language.
When a search engine crawler visits your website, robots.txt is the first thing it checks. It will either follow the instructions given in your file or ignore them.
Search engine crawlers are unlikely to ignore your robots.txt instructions. It is malware and bad bots that will ignore them every time, and robots.txt cannot do anything to stop them.
Basics Of Robots.txt
There are a few instruction commands you need to know, namely:
- User-agent: *
- Allow: /
- Disallow: /
These three basic commands form the core of a robots.txt file.
First, you type a user agent. The asterisk means the rules apply to every crawler:
User-agent: *
When you want to allow a bot to crawl the pages of your website, you use:
Allow: /
When you don’t want any of your pages to be crawled, use:
Disallow: /
Usually, you would only want to disallow specific pages, so after the slash you must provide a URL path. If you use the disallow command exactly as shown above, it will stop crawlers from crawling any of your web pages.
In most cases you won’t want that to happen, so after the disallow command you put the URL you don’t want crawlers to crawl.
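If you want to see how these commands behave before touching your live site, here is a minimal Python sketch that uses the standard library's urllib.robotparser. The bot name ExampleBot, the example.com URLs, and the /thank-you/ page are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Check `url` against robots.txt `rules` for the given user agent."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nAllow: /\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_ONE = "User-agent: *\nDisallow: /thank-you/\n"  # hypothetical page

home = "https://example.com/"
page = "https://example.com/thank-you/"

print(can_crawl(ALLOW_ALL, page))  # True: everything may be crawled
print(can_crawl(BLOCK_ALL, home))  # False: a bare slash blocks the whole site
print(can_crawl(BLOCK_ONE, page))  # False: only this path is blocked
print(can_crawl(BLOCK_ONE, home))  # True: the rest of the site stays open
```

This only tells you what a well-behaved crawler would do with the rules. It says nothing about bots that ignore robots.txt.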
The Basic Robots.txt Instructions for a WordPress blog:
Below are the basic robots.txt commands for WordPress. You can copy and paste them to use on your blog:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
You should include at least these commands in your robots.txt.
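The Allow line carves a single file out of the blocked /wp-admin/ folder. Under the current robots exclusion standard (RFC 9309), the longest matching rule wins, and Allow wins a tie. The sketch below is a simplified matcher written for illustration, assuming plain path prefixes with no wildcards; it is not how any particular search engine is implemented. Python's built-in urllib.robotparser checks rules in file order instead, so it is not used here. The test paths are hypothetical.

```python
# Rules from the basic WordPress example, as (directive, path prefix) pairs.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(path: str, rules=RULES) -> bool:
    """Apply longest-match precedence; plain prefixes only, no wildcards."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        # A longer prefix wins; on a tie, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


print(is_allowed("/wp-admin/options.php"))     # False
print(is_allowed("/wp-admin/admin-ajax.php"))  # True
print(is_allowed("/hello-world/"))             # True
```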
SEO Optimized Robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /?*
Disallow: /index.php
Disallow: /xmlrpc.php
Similarly, “/index.php” is a page you won’t want to show up in search engines because it consists of files located on your server, so you block it too.
Then there is “/xmlrpc.php,” which WordPress uses for pingbacks and trackbacks. You should also add “/?*” to the disallow commands. It stops bots from scanning your internal search results and helps prevent duplicate pages and an excess of crawlable pages.
This helps to save a lot of crawl budget.
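Major crawlers such as Googlebot treat * in a path as a wildcard and a trailing $ as an end-of-URL marker. As a rough sketch of how such a pattern can be matched, the Python below translates a robots.txt path pattern into a regular expression. The function names and the sample URLs are assumptions for illustration, not part of any crawler's real code.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    `*` matches any run of characters; a trailing `$` anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if `path` (with its query string) matches from its start."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/?*", "/?s=crawl+budget"))       # True: a search URL at the site root
print(matches("/?*", "/blog/?s=crawl+budget"))  # False: the path does not start with /?
print(matches("/xmlrpc.php", "/xmlrpc.php"))    # True
```

Because rules already match by prefix, the trailing * in “/?*” does not change what gets matched; it only makes the intent easier to read.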
How To Create A Robots.txt File
First, you should check whether you already have a robots.txt file.
Go to “yourdomain.com/robots.txt” to check.
*Replace “yourdomain.com” with your website’s domain.
If there is a robots.txt file already, you can simply edit it. If there isn’t one on your server, you’ll have to create it.
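The same check can be scripted. Here is a small Python sketch, using only the standard library, that builds the robots.txt address for any page of a site and downloads the file. The function names are my own, and yourdomain.com is a placeholder you would replace with your domain.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site_url: str) -> str:
    """Return the robots.txt URL for the site that `site_url` belongs to."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site_url: str, timeout: float = 10.0) -> Optional[str]:
    """Download robots.txt; return None if the server answers 404."""
    try:
        with urlopen(robots_url(site_url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as error:
        if error.code == 404:
            return None
        raise


# Placeholder domain; no network request is made by this line.
print(robots_url("https://yourdomain.com/blog/my-post/"))
# https://yourdomain.com/robots.txt
```

Calling fetch_robots("https://yourdomain.com") performs the actual download, so run it only against a site you are allowed to query.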
There’s an effortless way to create a robots.txt file for your WordPress website: installing the Squirrly plugin.
Squirrly is an SEO plugin that automatically creates a robots.txt file for your website upon activation. You can edit it by going to the advanced settings.
However, it does not create a physical robots.txt file, so if you delete this plugin and its data in the future, your robots.txt rules will be lost too. WordPress serves a virtual robots.txt file if there isn’t a physical one on your server. To edit it, you can use either Squirrly or Yoast.
I recommend that you create a robots.txt file and upload it to your server.
1. Access your server over FTP
You should first install an FTP client on your PC to access your web hosting server. I prefer FileZilla for this.
Install FileZilla and connect to your server using your FTP/SFTP username and password. If you don’t know them, ask your hosting provider.
2. Find the robots.txt file in public_html
Click on the public_html folder in your FTP server area. Below you’ll see the robots.txt file. If you don’t have a robots.txt file already, you won’t see it. In this case, you’ll have to create one.
3. Open a text editor
Notepad would work. Open Notepad on your PC and copy/paste this robots.txt instruction:
User-Agent: *
Disallow: /wp-admin
Disallow: /xmlrpc
Disallow: /index.php
You can also add your sitemap here, but it’s not necessary to do that.
Save this text file and name it “robots.” Make sure the file type is set to “.txt”; since the editor adds the extension for you, do not also type “.txt” in the name field.
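If you would rather generate the file than type it by hand, a few lines of Python can write it for you. This is only a sketch: the function name write_robots is made up, and the Sitemap line is added only when you pass in a sitemap URL of your own.

```python
from pathlib import Path
from typing import Optional

# The same rules as in the Notepad step.
RULES = [
    "User-Agent: *",
    "Disallow: /wp-admin",
    "Disallow: /xmlrpc",
    "Disallow: /index.php",
]


def write_robots(folder: Path, sitemap_url: Optional[str] = None) -> Path:
    """Write robots.txt into `folder` and return the file's path."""
    lines = list(RULES)
    if sitemap_url:
        # Optional: point crawlers to the sitemap with its full URL.
        lines += ["", f"Sitemap: {sitemap_url}"]
    target = folder / "robots.txt"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

For example, write_robots(Path("."), "https://yourdomain.com/sitemap.xml") creates robots.txt in the current folder; the sitemap address here is a placeholder.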
4. Upload it to your server’s root
Go back to FileZilla and click on the public_html folder.
Drag your robots.txt file from your computer and drop it into the blank space of that folder’s file list in the remote site pane (on the right side in FileZilla’s default layout).
That’s it. Your robots.txt file is now live.
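The upload can also be scripted with Python's standard ftplib module instead of dragging the file in an FTP client. Treat this as a sketch under assumptions: it covers FTP over TLS only (ftplib does not speak SFTP), the function names are my own, and the host, login details, and folder name are values you would get from your hosting provider.

```python
from ftplib import FTP_TLS
from pathlib import Path


def upload_robots(ftp, local_file: Path, remote_dir: str) -> None:
    """Upload `local_file` as robots.txt into `remote_dir` on an open FTP session."""
    ftp.cwd(remote_dir)
    with local_file.open("rb") as handle:
        ftp.storbinary("STOR robots.txt", handle)


def connect_and_upload(host: str, user: str, password: str,
                       local_file: Path, remote_dir: str) -> None:
    """Open an FTP-over-TLS session and upload the file."""
    with FTP_TLS(host) as ftp:
        ftp.login(user, password)
        ftp.prot_p()  # encrypt the data channel as well as the login
        upload_robots(ftp, local_file, remote_dir)
```

Keeping the upload step separate from the connection step makes it possible to try the logic against a stand-in object before pointing it at a real server.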
Update 2019: Google recently announced that the nofollow attribute (rel="nofollow") will be treated as a hint, which means Google may or may not honor it. Google has also introduced two new attribute values: “ugc” (user-generated content) and “sponsored.”
rel="ugc" can be used for user-generated content such as blog comments and forum links, and rel="sponsored" can be used for affiliate links and partner links. You can read the details in Google’s official announcement.
I hope you now know how to prepare an SEO-optimized robots.txt file for your WordPress website. Search engine crawlers will no longer crawl the useless pages of your website and blog, which saves crawl budget.
This will let them crawl all of the pages you want to rank.
Let me know your thoughts in the comments section below.