Scrapy - Using an Item

Description

Item objects provide a dict-like API, so the usual Python dictionary syntax can be used to set and read their fields −

>>> item = DmozItem()
>>> item['title'] = 'sample title'
>>> item['title']
'sample title'

The same item syntax can be used inside a spider, as in the following example −

import scrapy

from tutorial.items import DmozItem

class MyprojectSpider(scrapy.Spider):
   name = "project"
   allowed_domains = ["dmoz.org"]
   
   start_urls = [
      "http://www.dmoz.org/computers/programming/languages/python/books/",
      "http://www.dmoz.org/computers/programming/languages/python/resources/"
   ]

   def parse(self, response):
      # Each <li> directly inside a <ul> is treated as one entry
      for sel in response.xpath('//ul/li'):
         item = DmozItem()
         # extract() returns a list of all matching strings
         item['title'] = sel.xpath('a/text()').extract()
         item['link'] = sel.xpath('a/@href').extract()
         item['desc'] = sel.xpath('text()').extract()
         yield item

Running the spider will produce output similar to the following −

[scrapy] DEBUG: Scraped from <200 
http://www.dmoz.org/computers/programming/languages/python/books/>
   {'desc': [u' - by david mertz; addison wesley. book in progress, full text, 
      ascii format. asks for feedback. [author website, gnosis software, inc.]\n'],
   'link': [u'http://gnosis.cx/tpip/'],
   'title': [u'text processing in python']}
[scrapy] DEBUG: Scraped from <200 
http://www.dmoz.org/computers/programming/languages/python/books/>
   {'desc': [u' - by sean mcgrath; prentice hall ptr, 2000, isbn 0130211192, 
      has cd-rom. methods to build xml applications fast, python tutorial, dom and 
      sax, new pyxie open source xml processing library. [prentice hall ptr]\n'],
   'link': [u'http://www.informit.com/store/product.aspx?isbn=0130211192'],
   'title': [u'xml processing with python']}
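
Each field in the output above is a list of strings, and the desc values carry leading spaces and a trailing newline. If single, trimmed strings are preferred, a small helper along these lines could be applied before the data is stored. The sketch uses plain Python only; flatten_item is a hypothetical name, and the sample dict is a shortened copy of the second entry above.

```python
def flatten_item(raw):
    """Join each list of extracted strings into one stripped string."""
    return {
        key: ' '.join(part.strip() for part in parts if part.strip())
        for key, parts in raw.items()
    }


# Shortened sample in the same shape as the scraped output.
scraped = {
    'desc': [' - by sean mcgrath; prentice hall ptr, 2000, isbn 0130211192\n'],
    'link': ['http://www.informit.com/store/product.aspx?isbn=0130211192'],
    'title': ['xml processing with python'],
}

clean = flatten_item(scraped)
print(clean['title'])
print(clean['desc'])
```

Joining with a space keeps the helper safe for fields where the selector matched more than one text node.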