Scrapy Tutorial: Crawling


To execute your spider, run the following command within your first_scrapy directory −

scrapy crawl first

Here, first is the name you gave the spider when you created it.

Once the spider finishes crawling, you will see output similar to the following −

2016-08-09 18:13:07-0400 [scrapy] INFO: Scrapy started (bot: tutorial)
2016-08-09 18:13:07-0400 [scrapy] INFO: Optional features available: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Overridden settings: {}
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled extensions: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled downloader middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled spider middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled item pipelines: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Spider opened
2016-08-09 18:13:08-0400 [scrapy] DEBUG: Crawled (200)
<GET http://www.dmoz.org/computers/programming/languages/python/resources/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] DEBUG: Crawled (200)
<GET http://www.dmoz.org/computers/programming/languages/python/books/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] INFO: Closing spider (finished)

As you can see in the output, each URL has a log line ending in (referer: None), which indicates that these are start URLs and therefore have no referrers. You should also see two new files, books.html and resources.html, created in your first_scrapy directory.
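
Those files are written by the spider's parse() callback defined in the earlier step of this tutorial. For reference, a minimal sketch of a spider that would produce the crawl above is shown below; the class name FirstSpider and the exact body of parse() are assumptions here, and your own spider file may differ slightly.

import scrapy

class FirstSpider(scrapy.Spider):
    name = "first"   # the name used with the "scrapy crawl first" command
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computers/programming/languages/python/resources/",
    ]

    def parse(self, response):
        # Use the last path segment ("books" or "resources") as the file name
        filename = response.url.split("/")[-2] + ".html"
        with open(filename, "wb") as f:
            f.write(response.body)

Scrapy schedules one request for each entry in start_urls and calls parse() with every downloaded response, which is why the log shows exactly two Crawled (200) lines and the directory ends up with exactly two saved files.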