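A robots.txt file can tell search-engine crawlers to stay out of important directories, but because the file itself is public, it can also leak those directories.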

Challenge:

In this little training challenge, you are going to learn about the Robots_exclusion_standard.
The robots.txt file is used by web crawlers to check if they are allowed to crawl and index your website or only parts of it.
Sometimes these files reveal the directory structure instead protecting the content from being crawled.

Enjoy!

This challenge has no submit button.

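The last sentence of the description is the hint, though: sometimes a robots.txt file reveals the directory structure instead of protecting the content from being crawled.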

So start by visiting http://www.wechall.net/robots.txt and looking at its contents:

User-agent: *
Disallow: /challenge/training/www/robots/T0PS3CR3T


User-agent: Yandex
Disallow: *

/challenge/training/www/robots/T0PS3CR3T is disallowed for every crawler, which makes it the obvious place to look.
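
For a longer robots.txt, the Disallow entries can be pulled out with a few lines of Python instead of being read by eye. This is only a minimal sketch: the helper name disallowed_paths is made up for illustration, and the sample text is the file content quoted above, pasted in as a string rather than fetched over the network.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the value of every Disallow line, in order of appearance."""
    paths = []
    for line in robots_txt.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow value restricts nothing
                paths.append(value)
    return paths


# The robots.txt content shown above, pasted as a string.
sample = """User-agent: *
Disallow: /challenge/training/www/robots/T0PS3CR3T

User-agent: Yandex
Disallow: *
"""

for path in disallowed_paths(sample):
    print(path)
```

The helper ignores which User-agent group a rule belongs to, since for this purpose every disallowed path is a candidate worth checking by hand.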

Append this path to the site root to get the URL: http://www.wechall.net/challenge/training/www/robots/T0PS3CR3T/
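
The same step can be done with the standard library's urljoin, which avoids doubled or missing slashes when gluing a path onto a host. Again a sketch under assumptions: candidate_url is a hypothetical helper, and adding the trailing slash is a choice made here to match the URL above, not something robots.txt requires.

```python
from urllib.parse import urljoin


def candidate_url(base: str, path: str) -> str:
    """Join a disallowed path onto the site root, ending with one slash."""
    # Normalise the path so it ends with exactly one trailing slash.
    return urljoin(base, path.rstrip("/") + "/")


url = candidate_url(
    "http://www.wechall.net/",
    "/challenge/training/www/robots/T0PS3CR3T",
)
print(url)
```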

Visit it, and the challenge is marked as solved.