FileSystemStorageClient

File system implementation of the storage client.

This storage client provides access to datasets, key-value stores, and request queues that persist data to the local file system. Each storage type is implemented with its own specific file system client that stores data in a structured directory hierarchy.

Data is stored in JSON format in predictable file paths, making it easy to inspect and manipulate the stored data outside of the Crawlee application if needed.
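The idea of JSON files at predictable paths can be illustrated with a minimal, standard-library-only sketch. The directory layout (`datasets/<name>/<index>.json`) and the helper names `write_item` and `read_items` are assumptions made for illustration; they are not taken from Crawlee and may differ from what this client actually writes.

```python
import json
import tempfile
from pathlib import Path


def write_item(storage_dir: Path, dataset_name: str, index: int, item: dict) -> Path:
    """Write one item as a JSON file under a predictable, zero-padded file name."""
    # Hypothetical layout: <storage_dir>/datasets/<dataset_name>/<index>.json
    dataset_dir = storage_dir / "datasets" / dataset_name
    dataset_dir.mkdir(parents=True, exist_ok=True)
    path = dataset_dir / f"{index:09d}.json"
    path.write_text(json.dumps(item, indent=2), encoding="utf-8")
    return path


def read_items(storage_dir: Path, dataset_name: str) -> list[dict]:
    """Read every item back in file-name order, as an external tool could."""
    dataset_dir = storage_dir / "datasets" / dataset_name
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(dataset_dir.glob("*.json"))
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_item(root, "default", 1, {"url": "https://example.com", "title": "Example"})
    write_item(root, "default", 2, {"url": "https://example.org", "title": "Other"})
    items = read_items(root, "default")
    print(items)
```

Because each item is an ordinary JSON file, any editor or command-line tool can read or modify it without going through the crawler.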

All data persists between program runs but can be accessed only from the local machine where the files are stored.

Warning: This storage client is not safe for concurrent access from multiple crawler processes. Use it only when running a single crawler process at a time.
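One way to enforce the single-process rule from application code is an exclusive lock file in the storage directory. The sketch below is an assumption-laden illustration, not something this client provides: the `StorageDirLock` class and the `.crawler.lock` file name are hypothetical, and it relies only on the standard library.

```python
import os
import tempfile
from pathlib import Path


class StorageDirLock:
    """Exclusive lock file marking a storage directory as in use by one process."""

    def __init__(self, storage_dir: Path) -> None:
        # Hypothetical lock file name, chosen for this example only.
        self._lock_path = storage_dir / ".crawler.lock"

    def __enter__(self) -> "StorageDirLock":
        # O_CREAT | O_EXCL raises FileExistsError if the lock file already exists.
        fd = os.open(self._lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))
        return self

    def __exit__(self, *exc_info: object) -> None:
        # Release the lock by removing the file.
        self._lock_path.unlink()


with tempfile.TemporaryDirectory() as tmp:
    storage_dir = Path(tmp)
    with StorageDirLock(storage_dir):
        try:
            # A second holder on the same directory is refused.
            with StorageDirLock(storage_dir):
                second_acquired = True
        except FileExistsError:
            second_acquired = False
    lock_released = not (storage_dir / ".crawler.lock").exists()
    print(second_acquired, lock_released)
```

A lock file like this is left behind if the process is killed, so a real setup would also need a way to detect and clear stale locks.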

Hierarchy

StorageClient
- FileSystemStorageClient

Methods

create_dataset_client

async create_dataset_client(*, id, name, alias, configuration): DatasetClient

Overrides StorageClient.create_dataset_client
Create a dataset client.
Parameters
- id: str | None = None (optional, keyword-only)
- name: str | None = None (optional, keyword-only)
- alias: str | None = None (optional, keyword-only)
- configuration: Configuration | None = None (optional, keyword-only)
Returns DatasetClient

create_kvs_client

async create_kvs_client(*, id, name, alias, configuration): KeyValueStoreClient

Overrides StorageClient.create_kvs_client
Create a key-value store client.
Parameters
- id: str | None = None (optional, keyword-only)
- name: str | None = None (optional, keyword-only)
- alias: str | None = None (optional, keyword-only)
- configuration: Configuration | None = None (optional, keyword-only)
Returns KeyValueStoreClient

create_rq_client

async create_rq_client(*, id, name, alias, configuration): RequestQueueClient

Overrides StorageClient.create_rq_client
Create a request queue client.
Parameters
- id: str | None = None (optional, keyword-only)
- name: str | None = None (optional, keyword-only)
- alias: str | None = None (optional, keyword-only)
- configuration: Configuration | None = None (optional, keyword-only)
Returns RequestQueueClient

get_rate_limit_errors

get_rate_limit_errors(): dict[int, int]

Inherited from StorageClient.get_rate_limit_errors
Return statistics about rate limit errors encountered by the HTTP client in the storage client.
Returns dict[int, int]

get_storage_client_cache_key

get_storage_client_cache_key(configuration): Hashable

Overrides StorageClient.get_storage_client_cache_key
Return a cache key that can differentiate between different storages of this and other clients.

The key can be based on the configuration or on the client itself. By default, it is the module and name of the client class.
Parameters
- configuration: Configuration
Returns Hashable
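The two options described above (a key derived from the client class alone, and a key that also depends on the configuration) can be sketched as follows. `FakeConfiguration`, `BaseClient`, and `DirAwareClient` are hypothetical stand-ins, and including `storage_dir` in the key is an assumption made for illustration, not a statement of how this class builds its key.

```python
from collections.abc import Hashable
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeConfiguration:
    """Hypothetical stand-in for Configuration, reduced to one field."""

    storage_dir: str


class BaseClient:
    def get_storage_client_cache_key(self, configuration: FakeConfiguration) -> Hashable:
        # Default described above: module and name of the client class.
        cls = type(self)
        return f"{cls.__module__}.{cls.__name__}"


class DirAwareClient(BaseClient):
    def get_storage_client_cache_key(self, configuration: FakeConfiguration) -> Hashable:
        # Assumed override: add the storage directory so that clients pointing
        # at different directories do not share cached storages.
        return (super().get_storage_client_cache_key(configuration), configuration.storage_dir)


config_a = FakeConfiguration(storage_dir="./storage-a")
config_b = FakeConfiguration(storage_dir="./storage-b")

base_key = BaseClient().get_storage_client_cache_key(config_a)
key_a = DirAwareClient().get_storage_client_cache_key(config_a)
key_b = DirAwareClient().get_storage_client_cache_key(config_b)
print(base_key, key_a, key_b)
```

With the class-only key, every configuration maps to the same cache entry; the tuple variant separates entries per storage directory while staying hashable.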
