Description
To execute your spider, run the following command within your first_scrapy directory −
scrapy crawl first
where first is the name of the spider specified while creating it.
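Besides the command line, you can also start the same crawl from a Python script using Scrapy's CrawlerProcess API. The following is a minimal sketch, assuming it is run from inside the first_scrapy project so that get_project_settings() can locate the spider by its name −

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Load the project settings so the spider named "first" can be looked up.
process = CrawlerProcess(get_project_settings())
process.crawl("first")   # same spider name used by `scrapy crawl first`
process.start()          # blocks here until the crawl is finished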
Once the spider crawls, you can see the following output −
2016-08-09 18:13:07-0400 [scrapy] INFO: Scrapy started (bot: tutorial)
2016-08-09 18:13:07-0400 [scrapy] INFO: Optional features available: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Overridden settings: {}
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled extensions: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled downloader middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled spider middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled item pipelines: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Spider opened
2016-08-09 18:13:08-0400 [scrapy] DEBUG: Crawled (200)
<GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] DEBUG: Crawled (200)
<GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] INFO: Closing spider (finished)
As you can see in the output, each URL has a log line ending in (referer: None), which indicates that these are start URLs and have no referrers. Next, you should see two new files named Books.html and Resources.html created in your first_scrapy directory.
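These files are written by the spider's parse() callback. The spider code itself is not repeated in this section; the sketch below shows roughly what it might look like, assuming the callback saves each response body to disk (the class name FirstSpider and the exact parse logic are assumptions, while the name attribute and start URLs match the crawl shown above) −

import scrapy

class FirstSpider(scrapy.Spider):
    name = "first"
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    ]

    def parse(self, response):
        # "Books" or "Resources" is the second-to-last URL segment,
        # so the files saved are Books.html and Resources.html.
        filename = response.url.split("/")[-2] + ".html"
        with open(filename, "wb") as f:
            f.write(response.body)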