Scrapy: Using an Item

Description

Item objects behave like regular Python dicts. Use the usual dictionary syntax to set and read the fields declared on the Item class:

>>> item = DmozItem()
>>> item['title'] = 'sample title'
>>> item['title']
'sample title'

Use the same syntax inside a spider's parse() method, as in the following example:

import scrapy

from tutorial.items import DmozItem


class MyprojectSpider(scrapy.Spider):
    name = "project"
    allowed_domains = ["dmoz.org"]

    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computers/programming/languages/python/resources/"
    ]

    def parse(self, response):
        # Treat every <li> inside a <ul> as one directory entry
        for sel in response.xpath('//ul/li'):
            item = DmozItem()
            # extract() returns a list containing every matching string
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
            yield item

When the spider runs, it logs each scraped item. The output will look like this:

[scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/computers/programming/languages/python/books/>
   {'desc': [u' - by David Mertz; Addison Wesley. Book in progress, full text, ASCII format. Asks for feedback. [author website, Gnosis Software, Inc.\n'],
    'link': [u'http://gnosis.cx/tpip/'],
    'title': [u'Text Processing in Python']}
[scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/computers/programming/languages/python/books/>
   {'desc': [u' - by Sean McGrath; Prentice Hall PTR, 2000, ISBN 0130211192, has CD-ROM. Methods to build XML applications fast, Python tutorial, DOM and SAX, new Pyxie open source XML processing library. [Prentice Hall PTR]\n'],
    'link': [u'http://www.informit.com/store/product.aspx?isbn=0130211192'],
    'title': [u'XML Processing with Python']}
