You may have noticed that some pages on your site are showing up in search engines, and that your images are now appearing in image searches. Learn how to stop web crawlers from showing your pages in search results.
What are Web Crawlers?
Search engines use a robot (also known as a web crawler or a spider) that visits your website. It scans your pages and stores the data on the search engine's servers, so people can then find your site in search results. The images on your site can also appear in an image search such as Google Images. If you don't want robots showing your site's pages or its images in their search results, you can create a file called robots.txt to stop this.
When a robot visits your site, it checks whether there is a file called robots.txt in the root directory of your site. In this file you can put instructions telling the robot what it may scan. If the robot does not find a robots.txt file, it will scan every page on your site. In this article I will show you some snippets you can place in your robots.txt to prevent certain files, folders, etc. from being indexed by robots.
** Note: a robots.txt file does not guarantee that robots will obey it. Badly behaved robots may ignore what your robots.txt file says. **
To use any of the snippets below, create a plain-text file called robots.txt, paste in the ones you want to use, then upload the file to the root directory of your site.
Stop Robots From Scanning Your Site
User-agent: *
Disallow: /
If you don't want robots to scan any files on your site, this code will stop them.
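Before uploading, it can help to check how a well-behaved robot would read a rule set like this one. The sketch below uses Python's standard-library `urllib.robotparser`, which is not part of the original article. The robot name `ExampleBot` and the path `/about.html` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The block-everything rule set from this section.
RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules: str, user_agent: str, path: str) -> bool:
    """Return True if a compliant robot may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


# "ExampleBot" and "/about.html" are hypothetical names.
print(is_allowed(RULES, "ExampleBot", "/"))
print(is_allowed(RULES, "ExampleBot", "/about.html"))

# An empty rule set places no restrictions on the robot.
print(is_allowed("", "ExampleBot", "/about.html"))
```

With the rule set above, both checks should report that the path is blocked, while the empty rule set leaves everything open. This only models robots that choose to follow robots.txt.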
Stop Robots From Scanning A Certain Directory
User-agent: *
Disallow: /directory here/
Use this code to stop robots from scanning a certain directory on your site. Replace "directory here" with the name of your directory.
Only Allow Scanning to a Certain File
User-agent: *
Disallow: /
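Allow: /myfile.html
This code blocks scanning of all files on your site except the files you list. You can allow more files by adding one Allow line per file. Major robots such as Googlebot follow the most specific matching rule, so the order of the lines does not matter to them. Some older robots stop at the first rule that matches, so placing the Allow lines above the Disallow line is the safer order.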
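To see why the Allow line can override the broader Disallow line, here is a minimal sketch of the "most specific rule wins" behavior, where the longest matching path decides and Allow wins a tie. It is a simplified illustration written for this article, not any search engine's actual code. It ignores wildcards such as * and $, and the function name `longest_match_allows` is made up.

```python
def longest_match_allows(rules, path):
    """Apply the longest matching rule; allow the path if no rule matches.

    `rules` is a list of (kind, prefix) pairs for one User-agent group,
    where kind is "allow" or "disallow". Wildcards are not handled.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be fetched
    for kind, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = kind == "allow"
            # A longer prefix is more specific; on a tie, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


# The rule set from this section, in the same order.
RULES = [("disallow", "/"), ("allow", "/myfile.html")]

print(longest_match_allows(RULES, "/myfile.html"))  # True
print(longest_match_allows(RULES, "/other.html"))   # False
```

A robot that stops at the first matching rule would instead block /myfile.html with the lines in this order.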
Stop Robots From Indexing Your Images
User-agent: *
Disallow: /images/
For this code to work, all your images must be saved in a folder called images at the root of your site.
Block Google From Indexing Your Images
User-agent: Googlebot-Image
Disallow: /
This blocks only Google's image robot (Googlebot-Image) from indexing your images. Other robots are not affected.
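As a rough check that a rule aimed at one robot leaves the others alone, the sketch below feeds this rule set to Python's standard-library `urllib.robotparser`. The image path and the robot name `ExampleBot` are hypothetical, and real robots may match User-agent names differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-Image rule set from this section.
RULES = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "/images/photo.jpg" is a made-up path for illustration.
for agent in ("Googlebot-Image", "Googlebot", "ExampleBot"):
    print(agent, parser.can_fetch(agent, "/images/photo.jpg"))
```

Only the Googlebot-Image line should come back as blocked, because no other group in the file applies to the other two names.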
Thanks for reading. If you have any questions or comments please post them below.
