FileSystemStorageClient
Hierarchy
- StorageClient
  - FileSystemStorageClient
 
Index
Methods
create_dataset_client
- Create a dataset client.
- Parameters
  - optional keyword-only id: str | None = None
  - optional keyword-only name: str | None = None
  - optional keyword-only alias: str | None = None
  - optional keyword-only configuration: Configuration | None = None
- Returns DatasetClient
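To make the idea of a file-system-backed dataset concrete, here is a minimal sketch of storing dataset items on disk. `ToyDatasetDir`, its methods, and the `datasets/<name>/000000001.json` naming scheme are hypothetical, chosen for illustration; they are not the `DatasetClient` API or Crawlee's documented on-disk layout.

```python
import json
from pathlib import Path
from typing import List


class ToyDatasetDir:
    """Hypothetical dataset store: one JSON file per pushed item (not Crawlee's API)."""

    def __init__(self, root: Path, name: str = "default") -> None:
        # Assumed layout: <root>/datasets/<name>/000000001.json, 000000002.json, ...
        self.path = root / "datasets" / name
        self.path.mkdir(parents=True, exist_ok=True)

    def push_item(self, item: dict) -> Path:
        # The next index is one more than the number of item files already on disk.
        index = len(list(self.path.glob("*.json"))) + 1
        file = self.path / f"{index:09d}.json"
        file.write_text(json.dumps(item), encoding="utf-8")
        return file

    def items(self) -> List[dict]:
        # Zero-padded file names sort lexicographically in insertion order.
        return [
            json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(self.path.glob("*.json"))
        ]
```

Because every item lives in its own file, reopening the same directory in a later run sees the earlier items, which mirrors the persistence behavior described for this client.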
create_kvs_client
- Create a key-value store client.
- Parameters
  - optional keyword-only id: str | None = None
  - optional keyword-only name: str | None = None
  - optional keyword-only alias: str | None = None
  - optional keyword-only configuration: Configuration | None = None
- Returns KeyValueStoreClient
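A key-value store maps naturally onto files named after their keys. The sketch below assumes a `key_value_stores/<name>/<key>.json` layout; `ToyKeyValueDir`, `set_value`, and `get_value` are hypothetical names for illustration, not the `KeyValueStoreClient` interface.

```python
import json
from pathlib import Path
from typing import Any


class ToyKeyValueDir:
    """Hypothetical key-value store: each key maps to <key>.json (not Crawlee's API)."""

    def __init__(self, root: Path, name: str = "default") -> None:
        # Assumed layout: <root>/key_value_stores/<name>/<key>.json
        self.path = root / "key_value_stores" / name
        self.path.mkdir(parents=True, exist_ok=True)

    def _file(self, key: str) -> Path:
        # Reject empty keys and keys containing path separators, which could
        # otherwise write outside the store directory.
        if not key or "/" in key or "\\" in key:
            raise ValueError(f"invalid key: {key!r}")
        return self.path / f"{key}.json"

    def set_value(self, key: str, value: Any) -> None:
        self._file(key).write_text(json.dumps(value), encoding="utf-8")

    def get_value(self, key: str, default: Any = None) -> Any:
        file = self._file(key)
        if not file.exists():
            return default
        return json.loads(file.read_text(encoding="utf-8"))
```

Restricting keys to plain file names is a design choice of this sketch; a real implementation may encode or escape keys differently.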
create_rq_client
- Create a request queue client.
- Parameters
  - optional keyword-only id: str | None = None
  - optional keyword-only name: str | None = None
  - optional keyword-only alias: str | None = None
  - optional keyword-only configuration: Configuration | None = None
- Returns RequestQueueClient
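A request queue on disk needs deduplication, ordering, and a "handled" state. This is a minimal sketch under those assumptions: one JSON file per request, named by a hash of the URL so re-adding a URL is a no-op. `ToyRequestQueueDir` and its methods are hypothetical and do not reflect `RequestQueueClient` or Crawlee's real file format.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


class ToyRequestQueueDir:
    """Hypothetical request queue: one JSON file per request (not Crawlee's API)."""

    def __init__(self, root: Path, name: str = "default") -> None:
        self.path = root / "request_queues" / name
        self.path.mkdir(parents=True, exist_ok=True)

    def _file(self, url: str) -> Path:
        # Derive a stable file name from the URL so duplicates map to the same file.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
        return self.path / f"{digest}.json"

    def add_request(self, url: str) -> bool:
        """Return True if the URL was newly enqueued, False if it was already known."""
        file = self._file(url)
        if file.exists():
            return False
        order = len(list(self.path.glob("*.json")))
        record = {"url": url, "order": order, "handled": False}
        file.write_text(json.dumps(record), encoding="utf-8")
        return True

    def fetch_next(self) -> Optional[dict]:
        # Return the oldest request that has not been handled yet, or None.
        records = [json.loads(p.read_text(encoding="utf-8")) for p in self.path.glob("*.json")]
        pending = [r for r in records if not r["handled"]]
        return min(pending, key=lambda r: r["order"]) if pending else None

    def mark_handled(self, url: str) -> None:
        file = self._file(url)
        record = json.loads(file.read_text(encoding="utf-8"))
        record["handled"] = True
        file.write_text(json.dumps(record), encoding="utf-8")
```

The read-modify-write in `mark_handled` is exactly the kind of step that is unsafe when two processes share the directory, which is why the single-process restriction on this page matters.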
get_rate_limit_errors
- Return statistics about rate limit errors encountered by the HTTP client used in the storage client.
- Returns dict[int, int]
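The reference gives only the return type, `dict[int, int]`, not what the keys and values mean. The sketch below assumes keys are attempt numbers and values are counts of HTTP 429 (Too Many Requests) responses at that attempt; `ToyRateLimitTracker` and `total_rate_limit_errors` are hypothetical helpers showing how such a dict could be built and summarized.

```python
from collections import Counter
from typing import Dict


class ToyRateLimitTracker:
    """Hypothetical tracker; assumes key = attempt number, value = number of 429 errors."""

    def __init__(self) -> None:
        self._errors = Counter()  # attempt number -> count

    def record_response(self, status_code: int, attempt: int) -> None:
        # Only HTTP 429 is counted as a rate-limit error in this sketch.
        if status_code == 429:
            self._errors[attempt] += 1

    def get_rate_limit_errors(self) -> Dict[int, int]:
        # Return a plain dict copy so callers cannot mutate internal state.
        return dict(self._errors)


def total_rate_limit_errors(stats: Dict[int, int]) -> int:
    # Sum the counts regardless of what the integer keys represent.
    return sum(stats.values())
```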
get_storage_client_cache_key
- Return a cache key that distinguishes storages belonging to this client from those belonging to other clients.
- The key can be based on the configuration or on the client itself. By default, it consists of the module and name of the client class.
- Parameters
  - configuration: Configuration
- Returns Hashable
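The default described above (module plus class name) can be sketched as a tuple, with a subclass that also keys on part of the configuration. `ToyConfig`, its `storage_dir` field, the tuple shape, and both toy client classes are assumptions for illustration, not Crawlee's actual implementation.

```python
from collections.abc import Hashable
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for Configuration; `storage_dir` is an assumed field."""

    storage_dir: str = "./storage"


class ToyStorageClient:
    def get_storage_client_cache_key(self, configuration: ToyConfig) -> Hashable:
        # Default behavior as described: module and name of the client class.
        return (type(self).__module__, type(self).__name__)


class ToyFileSystemClient(ToyStorageClient):
    def get_storage_client_cache_key(self, configuration: ToyConfig) -> Hashable:
        # Assumed override: also include the storage directory, so clients pointed
        # at different directories never share cached storages.
        base = super().get_storage_client_cache_key(configuration)
        return (*base, configuration.storage_dir)
```

With the default key, two configurations map to the same cache entry; the override separates them, which is the kind of differentiation the method is meant to allow.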
File system implementation of the storage client.
This storage client provides access to datasets, key-value stores, and request queues that persist data to the local file system. Each storage type is implemented with its own specific file system client that stores data in a structured directory hierarchy.
Data is stored as JSON at predictable file paths, which makes it easy to inspect and modify the stored data outside of the Crawlee application if needed.
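Because the stored files are plain JSON, the standard library alone can inspect them. This sketch assumes only that the files end in `.json` somewhere under a storage root directory; `summarize_storage` is a hypothetical helper, and the directory names used to exercise it are made up.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_storage(root: Path) -> Dict[str, str]:
    """Map each JSON file under `root` (as a relative POSIX path) to its top-level JSON type."""
    summary: Dict[str, str] = {}
    for file in sorted(root.rglob("*.json")):
        data = json.loads(file.read_text(encoding="utf-8"))
        summary[file.relative_to(root).as_posix()] = type(data).__name__
    return summary
```

Pointing `summarize_storage` at the directory your configuration uses gives a quick overview of what a crawl left on disk without starting Crawlee.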
All data persists between program runs, but it can be accessed only from the local machine where the files are stored.
Warning: This storage client is not safe for concurrent access from multiple crawler processes. Use it only when running a single crawler process at a time.
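One way to avoid accidentally starting a second crawler process against the same storage directory is an exclusive lock file. This is a hypothetical guard, not part of Crawlee; the `.crawler.lock` file name is an assumption, and it only protects processes that also use the guard.

```python
import os
from pathlib import Path


class SingleProcessGuard:
    """Hypothetical guard (not part of Crawlee): refuse to start if the lock file exists."""

    def __init__(self, storage_dir: Path) -> None:
        self.lock_path = storage_dir / ".crawler.lock"

    def __enter__(self) -> "SingleProcessGuard":
        self.lock_path.parent.mkdir(parents=True, exist_ok=True)
        try:
            # O_CREAT | O_EXCL creates the file atomically and fails if it already exists.
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise RuntimeError(
                f"{self.lock_path} exists; another crawler may be using this storage"
            ) from None
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owning process for debugging
        return self

    def __exit__(self, *exc_info) -> None:
        # Remove the lock on normal exit or exception.
        self.lock_path.unlink(missing_ok=True)
```

If a process is killed before `__exit__` runs, the lock file stays behind and must be removed manually; a stale-lock check based on the stored PID would be a possible refinement.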