A website is not just a group of files stored on a remote disk; it works as a fully fledged system of its own. Reputable search engines respect the wishes of webmasters and limit their crawling to what the site permits, although following these rules is voluntary on the crawler's part. The rules are stored in the Robots.txt file (saved as robots.txt, all lowercase), which must be located at the root of the website in order to work.
Robots.txt does not contain any special code, though you will have to write the rules in a certain format. It is an ordinary plain text file with one directive per line.
To a beginner, writing rules in Robots.txt can seem hard, but once you get the hang of it, you can easily write the rules on your own.
A Robots.txt file is just a group of rule sets. In each set, the webmaster first states which search engines the rules apply to and then defines the rules themselves. The text snippet below is an example of this.
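A rule set matching the description that follows might look like this. It is a sketch that assumes the “images” and “js” directories sit directly under the site root:

```text
User-agent: *
Disallow: /images/
Disallow: /js/
# Assumes both directories sit directly under the site root.
```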
Here the first line addresses the search engine(s); ‘*’ means all search engines. The second and third lines tell the search engines not to index the “images” and “js” directories of the website.
Note: Paths in Robots.txt are case sensitive, so “/Images/” and “/images/” are treated as two different folders. Be sure to enter the syntax and folder names exactly.
Some examples of Robots.txt files:
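The most permissive file is a good starting point. A minimal sketch:

```text
User-agent: *
Disallow:
# An empty Disallow value blocks nothing.
```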
(A blank Disallow command means search engines can index all files and folders of the respective website)
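To block a single file, give its full path from the site root. In this sketch the path /private/report.html is a made-up placeholder, not a name taken from any real site:

```text
User-agent: *
Disallow: /private/report.html
# The path above is hypothetical; substitute the file you want hidden.
```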
(This rule is telling all search engines to not index a particular file on the server)
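A whole folder is blocked by ending the path with a slash. A sketch, assuming the folder is called “css” and sits at the site root:

```text
User-agent: *
Disallow: /css/
# Assumes the folder sits directly under the site root.
```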
(Here the rule is telling search engines to not index the whole “css” folder)
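A single slash matches every path on the site, so a sketch like this one shuts all compliant crawlers out:

```text
User-agent: *
Disallow: /
# "/" is the site root, so every URL on the site matches.
```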
(Here the rule is telling search engines to not index the whole website)
Example 4 (grouping different rules together)
User-agent: Opera 9
Disallow: /css/

User-agent: *
Disallow:
(Here the Robots.txt file is telling Opera 9 bot to not index the ‘css’ folder. At the same time the second rule in the same file allows all other search engines to index the whole website)
What To Allow For Indexing:
- All images
- All JavaScript files
- All CSS files
- All HTML files
- Anything else that is linked from or embedded in your website
What Not To Allow For Indexing:
- Personal files that you do not want to be displayed in search results
- Admin folder or admin pages
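Put together, the advice above could translate into something like the following. The folder names /admin/ and /personal/ are assumptions for illustration; use whatever folders your site actually has:

```text
User-agent: *
Disallow: /admin/
Disallow: /personal/
# Hypothetical folder names. Everything not listed stays open to crawlers.
```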
Hint: In Robots.txt, you do not need to list the files you want indexed. The file only tells search engines what not to index; anything it does not mention is treated as allowed.