Upload

The Upload class represents the data you want to store in a storage backend. It's a key component in file-keeper, encapsulating the file's content, metadata, and instructions for how to transfer it to storage. This document explains how to create and use Upload objects, covering streaming uploads and hashing for data integrity.

What is an Upload?

Think of an Upload as a package containing everything needed to send a file to storage. It's more than just the raw data; it includes information about the file itself, like its name, size, and content type. This metadata is essential for proper storage and retrieval.

The Upload object decouples the source of the data from the transfer process. This allows file-keeper to handle various data sources – files on disk, in-memory buffers, network streams – without changing the core storage logic.

Creating an Upload Object

You can create an Upload object in several ways, depending on the source of your data. The recommended way is using make_upload helper:

File-like objectByte stringWerkzeug's FileStorageManually

If you have an open file, you can directly pass it to the make_upload function. file-keeper will handle reading the data from the file.

src = open("my_image.jpg", "rb")
upload = make_upload(src)

If your data is already in memory as a byte string, you can pass it directly.

data = b"This is the content of my file."
upload = make_upload(data)

When writing an application using werkzeug-based framework you can handle uploaded files in this way.

from werkzeug.datastructures import FileStorage

data = FileStorage(..., "my_data.txt")
upload = make_upload(data)

This is useful for large files that don't fit in memory. You need to provide an object that has methods read and __iter__ producing byte string as a first argument to Upload class. If you have a generator that yields data, wrap it into IterableBytesReader instead of manually implementing class with required methods:

from file_keeper import Upload, IterableBytesReader

def data_generator():
    yield b"hello"
    yield b" "
    yield b"world"

stream = IterableBytesReader(data_generator())
upload = Upload(stream, "my_file.txt", 11, "text/plain")

The make_upload function automatically determines the file size and content type for supported source types.

Streaming Uploads

For very large files, loading the entire content into memory is impractical. Streaming uploads allow you to send the data in chunks, reducing memory usage and improving performance.

As shown in the example above, you can create an Upload object from an iterable of bytes. file-keeper will then stream the data to the storage backend as it becomes available. This is the preferred method for handling large files, but it expects that you compute the size and content type of the upload in advance and provide these details to the Upload constructor.

Hashing for Data Integrity

Data integrity is crucial when transferring files. Hashing ensures that the data you upload is exactly the same as the data stored in the backend. file-keeper automatically calculates a hash of the upload data during the transfer process.

The calculated hash is stored as part of the FileData metadata. When you retrieve the file, file-keeper can recalculate the hash via Storage.analyze method to verify data integrity. If the hashes don't match, it indicates that the file has been corrupted or tampered with.

Hashing is performed transparently during the upload process, so you don't need to worry about implementing it yourself. It provides an extra layer of assurance that your data is stored reliably.