Multipart, resumable and signed uploads
This feature has many names, but it essentially splits a single upload into multiple stages. It can be used in the following situations:
- a really big file must be uploaded to the cloud. It cannot fit into the server's temporary storage, so you split the file into smaller parts and upload them separately. Every part is uploaded to the server, and the next part must wait until the previous one has been moved from the server to the cloud. This is a multipart upload.
- the client has an unstable or slow connection. Any upload takes ages, and the connection is often interrupted, so the user has to spend extra time re-uploading files. To improve the user experience, you want to track the upload progress and keep the incomplete file on the server. If the connection is interrupted, the user can continue the upload from the point where it stopped, appending content to the existing incomplete file. This is a resumable upload.
- files are kept in the cloud and the portal handles a heavy volume of uploads. You don't want to spend server resources on transferring content from the client to the cloud. Instead, you generate a URL that allows the user to upload a single file directly into a specific location in the cloud. The user sends data to this URL and only notifies the application when the upload is finished, so that the application can make the file visible. This is a signed upload.
All these situations are handled by 4 API actions, which are available if the storage has the MULTIPART capability:
- files_multipart_start: initialize the multipart upload and set the expected final size and MIME type. A real multipart upload usually just returns the upload ID from this action. A resumable upload creates an empty file in the storage to accumulate content inside it. A signed upload produces a URL for direct upload.
- files_multipart_update: upload a fragment of the file or modify the upload in some other way. Most often this action accepts the ID of the upload and an upload field with a fragment of the uploaded file.
- files_multipart_refresh: synchronize and return the current upload progress. It can be used if the upload was paused and the client does not know how many bytes were uploaded and from which byte the next fragment starts.
- files_multipart_complete: finalize the upload and convert it into a normal file, available to other parts of the application. A multipart upload usually combines all uploaded parts into a single file here. A resumable upload verifies that the result has the expected MIME type and size. A signed upload just registers the completed file in the system.
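The four stages can be illustrated with a small in-memory model of a resumable upload. This is only a sketch: the ResumableUpload class and its method names are hypothetical and are not part of the extension's API. It shows how the expectations set at start, the appended fragments, the progress query and the final check relate to each other.

```python
from dataclasses import dataclass, field


@dataclass
class ResumableUpload:
    """Hypothetical in-memory stand-in for a resumable upload (not the real API)."""

    size: int          # expected final size, set when the upload starts
    content_type: str  # expected MIME type, set when the upload starts
    data: bytearray = field(default_factory=bytearray)

    def update(self, fragment: bytes) -> int:
        """Append a fragment and return the number of bytes stored so far."""
        if len(self.data) + len(fragment) > self.size:
            raise ValueError("fragment exceeds the expected size")
        self.data.extend(fragment)
        return len(self.data)

    def refresh(self) -> int:
        """Report progress, i.e. the offset where the next fragment starts."""
        return len(self.data)

    def complete(self) -> bytes:
        """Finalize only when the accumulated size matches the expectation."""
        if len(self.data) != self.size:
            raise ValueError(
                f"uploaded {len(self.data)} bytes, expected {self.size}"
            )
        return bytes(self.data)


content = b"hello world!\n"
upload = ResumableUpload(size=len(content), content_type="text/plain")  # start
upload.update(content[:5])       # first fragment, then the connection drops
offset = upload.refresh()        # the client asks where to continue
upload.update(content[offset:])  # resume from the reported offset
result = upload.complete()       # succeeds because all 13 bytes arrived
```

A real adapter keeps this state in the storage rather than in memory, and a signed upload would replace update with a direct transfer to the generated URL.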
The implementation of multipart upload depends on the adapter in use, so make sure you check its documentation before using any multipart actions. Some steps of the multipart upload workflow are usually the same across all adapters:
- files_multipart_start requires the content_type and size parameters. These values will be used to validate the completed upload.
- files_multipart_start allows the hash parameter. This value will be used to validate the completed upload. Unlike content_type and size, hash is usually optional, because it may be difficult for the client to compute it.
- files_multipart_update accepts the upload ID as id and a fragment of the file as upload. A sequence of calls to files_multipart_update with non-overlapping fragments can be used to upload the file, even if the adapter implements signed uploads and the client is supposed to send the file to the signed URL instead of using files_multipart_update.
- files_multipart_complete compares the content_type, size and hash (if present) specified during initialization of the upload with the actual values. If they differ, the upload is not converted into a normal file. Depending on the implementation, the storage may just ignore incorrect initial expectations and assign the real values to the file, as long as they are allowed by the storage configuration. However, rejecting such uploads is recommended, so it is safer to assume that incorrect expectations are not accepted.
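As a rough illustration of the validation described above, the following sketch computes the values a client could send at initialization and compares them with the values of the received content. The helper names expected_values and mismatches are hypothetical, and MD5 is an assumption made only for the sake of the example; the actual hash algorithm depends on the adapter.

```python
import hashlib
import mimetypes


def expected_values(name: str, content: bytes) -> dict:
    """Collect the values a client would pass when starting an upload."""
    content_type, _ = mimetypes.guess_type(name)
    return {
        "name": name,
        "size": len(content),
        "content_type": content_type or "application/octet-stream",
        # MD5 is an assumption; check the adapter documentation.
        "hash": hashlib.md5(content).hexdigest(),
    }


def mismatches(expected: dict, actual: dict) -> list:
    """Return the names of fields whose actual value differs from the expectation.

    An empty expected hash is skipped, because the hash is optional.
    """
    fields = ["content_type", "size"]
    if expected.get("hash"):
        fields.append("hash")
    return [name for name in fields if expected[name] != actual[name]]


content = b"hello world!\n"
expected = expected_values("file.txt", content)
partial = expected_values("file.txt", content[:5])

print(mismatches(expected, expected))  # []
print(mismatches(expected, partial))   # ['size', 'hash']
```

A storage that follows the recommendation would refuse to complete the upload whenever the list of mismatches is not empty.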
Incomplete files support most normal file actions, but you need to pass completed=False to the action when working with them. For example, if you want to remove an incomplete upload, use its ID and completed=False:
ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24 completed=False
Incomplete files do not support streaming or downloading via the public interface of the extension. However, a storage adapter can expose such features via custom methods if it's technically possible.
An example of a basic multipart upload is shown below. The files:fs adapter can be used for running this example, as it implements MULTIPART.
First, create a text file and check its size:
echo 'hello world!' > /tmp/file.txt
wc -c /tmp/file.txt
... 13 /tmp/file.txt
The size is 13 bytes and the content type is text/plain. These values must be used for upload initialization.
ckanapi action files_multipart_start name=file.txt size=13 content_type=text/plain
... {
... "content_type": "text/plain",
... "ctime": "2024-06-22T14:47:01.313016+00:00",
... "hash": "",
... "id": "90ebd047-96a0-4f32-a810-ffc962cbc380",
... "location": "77e629f2-8938-4442-b825-8e344660e119",
... "name": "file.txt",
... "owner_id": "59ea0f6c-5c2f-438d-9d2e-e045be9a2beb",
... "owner_type": "user",
... "pinned": false,
... "size": 13,
... "storage": "default",
... "storage_data": {
... "uploaded": 0
... }
... }
Here storage_data contains {"uploaded": 0}. It may be different for other adapters, especially if they implement non-consecutive uploads, but generally it's the recommended way to keep upload progress.
Now we'll upload the first 5 bytes of the file.
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \
upload@<(dd if=/tmp/file.txt bs=1 count=5)
... {
... "content_type": "text/plain",
... "ctime": "2024-06-22T14:47:01.313016+00:00",
... "hash": "",
... "id": "90ebd047-96a0-4f32-a810-ffc962cbc380",
... "location": "77e629f2-8938-4442-b825-8e344660e119",
... "name": "file.txt",
... "owner_id": "59ea0f6c-5c2f-438d-9d2e-e045be9a2beb",
... "owner_type": "user",
... "pinned": false,
... "size": 13,
... "storage": "default",
... "storage_data": {
... "uploaded": 5
... }
... }
If you try to finalize the upload right now, you'll get an error.
ckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380
... ckan.logic.ValidationError: None - {'upload': ['Actual value of upload size(5) does not match expected value(13)']}
Let's upload the rest of the bytes and complete the upload.
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \
upload@<(dd if=/tmp/file.txt bs=1 skip=5)
ckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380
... {
... "atime": null,
... "content_type": "text/plain",
... "ctime": "2024-06-22T14:57:18.483716+00:00",
... "hash": "c897d1410af8f2c74fba11b1db511e9e",
... "id": "a740692f-e3d5-492f-82eb-f04e47c13848",
... "location": "77e629f2-8938-4442-b825-8e344660e119",
... "mtime": null,
... "name": "file.txt",
... "owner_id": null,
... "owner_type": null,
... "pinned": false,
... "size": 13,
... "storage": "default",
... "storage_data": {}
... }
Now the file can be used normally. You can transfer file ownership to someone, stream the file or modify it. Pay attention to the ID: the completed file has its own unique ID, which is different from the ID of the incomplete upload.
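The dd commands in the walkthrough above slice the file by byte offset. A client written in Python could produce the same non-overlapping fragments with a small generator. The fragments function below is a hypothetical helper, not part of the extension, and the chunk size of 5 is chosen only to match the example.

```python
from typing import Iterator, Tuple


def fragments(
    content: bytes, chunk_size: int, start: int = 0
) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, fragment) pairs that cover content[start:] without overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(start, len(content), chunk_size):
        yield offset, content[offset:offset + chunk_size]


content = b"hello world!\n"

# Fresh upload: three fragments of at most 5 bytes each.
parts = list(fragments(content, chunk_size=5))

# Resumed upload: skip the 5 bytes reported in storage_data["uploaded"].
remaining = list(fragments(content, chunk_size=5, start=5))
```

Each fragment would be sent as the upload field of a separate files_multipart_update call, in order of increasing offset.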