Description
Feed exports are a method of storing the data scraped from sites, that is, generating an "export file".
Serialization Formats
Feed exports use item exporters to generate a feed with the scraped items, and they support multiple serialization formats and storage backends.
The following table shows the supported formats:
| Sr.No | Format & Description |
|---|---|
| 1 | JSON: FEED_FORMAT is json; the exporter used is scrapy.exporters.JsonItemExporter. |
| 2 | JSON lines: FEED_FORMAT is jsonlines; the exporter used is scrapy.exporters.JsonLinesItemExporter. |
| 3 | CSV: FEED_FORMAT is csv; the exporter used is scrapy.exporters.CsvItemExporter. |
| 4 | XML: FEED_FORMAT is xml; the exporter used is scrapy.exporters.XmlItemExporter. |
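
The difference between the JSON and JSON lines formats is easiest to see side by side. The following is a minimal sketch that uses only Python's standard json module, not Scrapy's exporters, so the exact whitespace of a real feed may differ. The items and their field names are made up for illustration.

```python
import json

# Hypothetical scraped items; the field names are made up for illustration.
items = [
    {"title": "First post", "views": 10},
    {"title": "Second post", "views": 25},
]

# JSON: the whole feed is a single array holding every item.
json_feed = json.dumps(items)

# JSON lines: one standalone JSON object per line, so the feed can be
# written and read back one item at a time.
jsonlines_feed = "\n".join(json.dumps(item) for item in items)

print(json_feed)
print(jsonlines_feed)
```

Because each line of a JSON lines feed is a complete document, that format tends to suit large feeds that should not be loaded into memory all at once.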
The supported formats can also be extended through the FEED_EXPORTERS setting. The following formats are available as well:
| Sr.No | Format & Description |
|---|---|
| 1 | Pickle: FEED_FORMAT is pickle; the exporter used is scrapy.exporters.PickleItemExporter. |
| 2 | Marshal: FEED_FORMAT is marshal; the exporter used is scrapy.exporters.MarshalItemExporter. |
Storage Backends
A storage backend defines where the feed is stored, and it is selected through the scheme of the feed URI.
The following table shows the supported storage backends:
| Sr.No | Storage Backend & Description |
|---|---|
| 1 | Local filesystem: The URI scheme is file; the feeds are stored on the local filesystem. |
| 2 | FTP: The URI scheme is ftp; the feeds are stored on an FTP server. |
| 3 | S3: The URI scheme is s3; the feeds are stored on Amazon S3. The external library botocore or boto is required. |
| 4 | Standard output: The URI scheme is stdout; the feeds are written to the standard output. |
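
To illustrate how a URI scheme selects a backend, here is a small sketch built on the standard urllib.parse module. It is not Scrapy's own lookup code: the BACKENDS mapping and the backend_for helper are hypothetical names that simply mirror the table above, and the example URIs are placeholders.

```python
from urllib.parse import urlparse

# Scheme -> backend description, mirroring the table above.
BACKENDS = {
    "file": "local filesystem",
    "ftp": "FTP server",
    "s3": "Amazon S3",
    "stdout": "standard output",
}


def backend_for(uri: str) -> str:
    """Return the backend description for a feed URI (illustrative helper)."""
    scheme = urlparse(uri).scheme
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported feed URI scheme: {scheme!r}")
    return BACKENDS[scheme]


# Placeholder URIs, one per supported scheme.
for uri in (
    "file:///tmp/export.csv",
    "ftp://user:pass@ftp.example.com/path/to/export.csv",
    "s3://mybucket/path/to/export.csv",
    "stdout:",
):
    print(uri, "->", backend_for(uri))
```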
Storage URI Parameters
The storage URI can contain the following parameters, which are replaced when the feed is being created:
- %(time)s: This parameter is replaced by a timestamp.
- %(name)s: This parameter is replaced by the spider name.
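
The substitution can be sketched with Python's printf-style string formatting, which uses the same %(key)s syntax. The expand_feed_uri helper below is a hypothetical name, and the timestamp layout (ISO format with colons replaced by hyphens) is an assumption for this example rather than a guaranteed Scrapy format.

```python
from datetime import datetime


def expand_feed_uri(template: str, spider_name: str, now: datetime) -> str:
    """Fill %(name)s and %(time)s in a feed URI template (illustrative helper)."""
    params = {
        "name": spider_name,
        # Assumed layout: ISO timestamp with ":" swapped for "-" so the
        # value is safe to use inside a file name.
        "time": now.replace(microsecond=0).isoformat().replace(":", "-"),
    }
    return template % params


uri = expand_feed_uri(
    "ftp://user:pass@ftp.example.com/feeds/%(name)s/%(time)s.json",
    "quotes",
    datetime(2024, 1, 2, 3, 4, 5),
)
print(uri)
# ftp://user:pass@ftp.example.com/feeds/quotes/2024-01-02T03-04-05.json
```

Putting the spider name in a directory and the timestamp in the file name keeps the output of different spiders and different runs from overwriting each other.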
Settings
The following table shows the settings with which feed exports can be configured:
| Sr.No | Setting & Description |
|---|---|
| 1 | FEED_URI: The URI of the export feed; setting it enables feed exports. |
| 2 | FEED_FORMAT: The serialization format used for the feed. |
| 3 | FEED_EXPORT_FIELDS: Defines the fields that need to be exported. |
| 4 | FEED_STORE_EMPTY: Defines whether to export feeds with no items. |
| 5 | FEED_STORAGES: A dictionary with additional feed storage backends. |
| 6 | FEED_STORAGES_BASE: A dictionary with the built-in feed storage backends. |
| 7 | FEED_EXPORTERS: A dictionary with additional feed exporters. |
| 8 | FEED_EXPORTERS_BASE: A dictionary with the built-in feed exporters. |
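
As a rough sketch, a project's settings.py might combine several of these settings as shown below. The path and the field names are hypothetical, and the setting names follow the table above; it is worth checking the documentation of the installed Scrapy version, since newer releases may prefer a different way of configuring feeds.

```python
# settings.py (sketch): the values below are placeholders, not recommendations.

# Where to write the feed; %(name)s and %(time)s are filled in per run.
FEED_URI = "file:///tmp/feeds/%(name)s/%(time)s.csv"

# Serialization format, one of the formats listed earlier.
FEED_FORMAT = "csv"

# Export only these fields (hypothetical item fields), in this order.
FEED_EXPORT_FIELDS = ["title", "price"]

# Skip writing a feed when the spider scraped no items.
FEED_STORE_EMPTY = False
```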