Group/organization images
Note
Internally, groups and organizations are the same entity, so this workflow applies to both of them.
First of all, you need a configured storage that supports public links. Because all group/organization images are stored in the local filesystem, you can use the files:public_fs storage adapter.
Storage name
This extension expects the storage for group images to be named group_images. This name is used in all other commands of this migration workflow. If you want to use a different name for the group images storage, override the ckanext.files.group_images_storage config option, which defaults to group_images, and don't forget to adapt the commands to the name you chose.
Size restriction
This configuration example sets a 10MiB limit on upload size via the ckanext.files.storage.group_images.max_size option. Feel free to change it or remove it completely to allow uploads of any size. This restriction applies to future uploads only. Any existing file that exceeds the limit is kept.
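To make the 10MiB figure concrete, here is a minimal sketch of how such a value translates into bytes. The helper names size_to_bytes and exceeds_limit are hypothetical and are not the parser used by ckanext-files; the sketch only assumes binary units (1MiB = 1024 * 1024 bytes).

```python
import re

# Hypothetical helper for illustration; NOT the parser used by ckanext-files.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def size_to_bytes(value):
    """Convert a string such as '10MiB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(B|KiB|MiB|GiB)\s*", value)
    if match is None:
        raise ValueError(f"Unsupported size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def exceeds_limit(file_size, limit):
    """Return True when a file of file_size bytes is larger than the limit."""
    return file_size > size_to_bytes(limit)


print(size_to_bytes("10MiB"))  # 10485760
```

Under this assumption, a file of exactly 10485760 bytes would still be accepted, and one byte more would be rejected.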
Type restriction
Uploads are restricted to the image/* MIME type via the ckanext.files.storage.group_images.supported_types option. You can make this option more or less restrictive. This restriction applies to future uploads only. Any existing file with a wrong MIME type is kept.
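The sketch below illustrates one plausible reading of the type restriction: an entry such as image accepts every image/* type, while a full type such as image/png accepts only that exact type. The function is_supported is hypothetical, and the matching rule is an assumption made for illustration, not the extension's actual implementation.

```python
def is_supported(mimetype, supported_types):
    """Check a MIME type against a list of allowed types.

    Assumption: an entry matches either the full type ("image/png")
    or only the main type before the slash ("image").
    """
    main_type = mimetype.split("/", 1)[0]
    return any(entry in (mimetype, main_type) for entry in supported_types)


print(is_supported("image/png", ["image"]))        # True
print(is_supported("application/pdf", ["image"]))  # False
```

Making the option more restrictive would then mean listing full types, for example only image/png and image/jpeg.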
Location
ckanext.files.storage.group_images.path controls the location of the upload folder in the filesystem. It should match the value of the ckan.storage_path option plus storage/uploads/group. The example below assumes that the value of ckan.storage_path is /var/storage/ckan.
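As a quick check of the rule above, this small sketch derives the expected folder from a given ckan.storage_path value. The function group_images_path is a hypothetical helper, not part of CKAN or ckanext-files.

```python
import posixpath


def group_images_path(storage_path):
    """Build the legacy upload folder for group images.

    Follows the rule from the text: ckan.storage_path plus
    storage/uploads/group.
    """
    return posixpath.join(storage_path, "storage", "uploads", "group")


print(group_images_path("/var/storage/ckan"))
# /var/storage/ckan/storage/uploads/group
```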
Public URL
The ckanext.files.storage.group_images.public_root option specifies the base URL from which every group image can be accessed. In most cases it's the CKAN URL plus uploads/group. If you are serving the CKAN application from ckan.site_url, leave this option unchanged. If you are using ckan.root_path, like /data/, insert this root path into the value of the option. The example below uses the %(ckan.site_url)s wildcard, which is automatically replaced with the value of the ckan.site_url config option. You can specify the site URL explicitly if you don't like this wildcard syntax.
ckanext.files.storage.group_images.type = files:public_fs
ckanext.files.storage.group_images.max_size = 10MiB
ckanext.files.storage.group_images.supported_types = image
ckanext.files.storage.group_images.path = /var/storage/ckan/storage/uploads/group
ckanext.files.storage.group_images.public_root = %(ckan.site_url)s/uploads/group
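If you are unsure what value public_root should have, the following sketch composes it from the site URL and an optional root path. The function public_root and the host ckan.example.org are hypothetical, and the sketch assumes that ckan.root_path is a plain prefix such as /data/ without a language placeholder.

```python
def public_root(site_url, root_path=""):
    """Compose the base URL of group images.

    Assumption: root_path is a plain prefix like "/data/"; anything
    more elaborate has to be handled manually.
    """
    parts = [site_url.rstrip("/")]
    prefix = root_path.strip("/")
    if prefix:
        parts.append(prefix)
    parts.append("uploads/group")
    return "/".join(parts)


print(public_root("https://ckan.example.org"))
# https://ckan.example.org/uploads/group
print(public_root("https://ckan.example.org", "/data/"))
# https://ckan.example.org/data/uploads/group
```

With the root path from the second call, the config line would end with %(ckan.site_url)s/data/uploads/group.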
Now let's run a command that shows the list of files available under the newly configured storage:
ckan files scan -s group_images
None of these files is tracked by the files extension yet, i.e. they don't have a corresponding record in the DB with basic details such as size, MIME type, file hash, etc. Let's create these details via the command below. It's safe to run this command multiple times: it gathers and stores information about files that are not yet registered in the system and ignores any previously registered file.
ckan files scan -s group_images -t
Finally, let's run the command that shows only untracked files. Ideally, you'll see nothing upon executing it, because you just registered every file in the system.
ckan files scan -s group_images -u
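The three scan invocations can be pictured with the in-memory sketch below. It is only a model of the behaviour described in the text: the dictionary registry stands in for the DB table, and the functions track and untracked are hypothetical names, not code from ckanext-files.

```python
def track(files_on_disk, registry):
    """Register files that are not known yet; return the names added now."""
    added = []
    for name in sorted(files_on_disk):
        if name not in registry:
            # The real command also stores size, MIME type and hash.
            registry[name] = {"name": name}
            added.append(name)
    return added


def untracked(files_on_disk, registry):
    """List files that exist in the storage but have no record."""
    return sorted(set(files_on_disk) - set(registry))


files = ["file-1.png", "file-2.png", "file-3.png"]
registry = {}

print(untracked(files, registry))  # all three files
print(track(files, registry))      # first run registers all three
print(track(files, registry))      # second run registers nothing: []
print(untracked(files, registry))  # []
```

Running the tracking step twice changes nothing the second time, which is why repeating the command is safe.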
Note
All the files are still available inside the storage directory. If the previous command shows nothing, it only means that CKAN already knows the details of each file from the storage directory. If you want to see the list of files again, omit the -u flag (which stands for "untracked") and you'll see all the files in the command output again:
ckan files scan -s group_images
Now that all images are tracked by the system, we can give the ownership of these files to the groups/organizations that are using them. Run the command below to connect files with their owners. It first searches for groups/organizations and reports how many connections were identified. It then offers to show the identified relationships and the list of files that have no owner (if there are such files). The presence of files without an owner usually means that you removed a group/organization from the database but did not remove its image.
Finally, you'll be asked whether you want to transfer ownership of the files. This operation does not change existing data, and if you disable ckanext-files after the ownership transfer, you won't see any difference. The whole ownership transfer is managed inside custom DB tables generated by ckanext-files, so it's a safe operation.
ckan files migrate groups group_images
Here's an example of output that you can see when running the command:
Found 3 files. Searching file owners...
[####################################] 100% Located owners for 2 files out of 3.
Show group IDs and corresponding file? [y/N]: y
d7186937-3080-429f-a434-22b74b9a8d39: file-1.png
87e2a1aa-7905-4a28-a087-90433f8e169e: file-2.png
Show files that do not belong to any group? [y/N]: y
file-3.png
Transfer file ownership to group identified in previous steps? [y/N]: y
Transfering file-2.png [####################################] 100%
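The owner lookup reported in this output can be sketched as follows. This is an illustration and not the command's actual code: the function locate_owners is hypothetical, and it assumes that the image_url field of a group holds the bare name of its uploaded file.

```python
def locate_owners(files, groups):
    """Split file names into owned files and files without an owner.

    Assumption: group["image_url"] contains the bare file name of an
    uploaded image.
    """
    by_image = {g["image_url"]: g["id"] for g in groups if g.get("image_url")}
    owned = {}
    orphans = []
    for name in files:
        if name in by_image:
            owned[name] = by_image[name]
        else:
            orphans.append(name)
    return owned, orphans


groups = [
    {"id": "d7186937-3080-429f-a434-22b74b9a8d39", "image_url": "file-1.png"},
    {"id": "87e2a1aa-7905-4a28-a087-90433f8e169e", "image_url": "file-2.png"},
]
owned, orphans = locate_owners(["file-1.png", "file-2.png", "file-3.png"], groups)
print(len(owned), "of 3 files have owners")  # 2 of 3 files have owners
print(orphans)  # ['file-3.png']
```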
Now comes the most complex part. You need to change the metadata schema and UI in order to:
- make sure that all new files are uploaded and managed by ckanext-files instead of CKAN's native uploader
- generate image URLs using ckanext-files functionality. Right now, while the files are stored in the original storage folder, it makes no difference. But if you change the upload directory in the future, or even decide to move files from the local filesystem into a different storage backend, it guarantees that the files remain visible.
The original CKAN workflow for uploading files was:
- just save the image URL provided by the user, or
- upload a file
- put it into a directory that is publicly served by the application
- replace the uploaded file in the HTML form/group metadata with the public URL of the uploaded file
This approach differs from the strategy recommended by ckanext-files. But in order to make the migration as simple as possible, we'll stay close to the original workflow.
Note
The suggested approach resembles the existing process of file uploads in CKAN. But ckanext-files was designed as a system that gives you a choice. Check file upload strategies to learn more about alternative implementations of upload and their pros/cons.
First, we need to replace the Upload/Link widget on the group/organization form. If you are using native group templates, create group/snippets/group_form.html and organization/snippets/organization_form.html. Inside both files, extend the original template and override the basic_fields block. You only need to replace the last field
{{ form.image_upload(
data, errors, is_upload_enabled=h.uploads_enabled(),
is_url=is_url, is_upload=is_upload) }}
with
{{ form.image_upload(
data, errors, is_upload_enabled=h.files_group_images_storage_is_configured(),
is_url=is_url, is_upload=is_upload,
field_upload="files_image_upload") }}
There are two differences from the original. First, we use h.files_group_images_storage_is_configured() instead of h.uploads_enabled(). As we are using different storages for different upload types, upload widgets can now be enabled independently. Second, we pass the field_upload="files_image_upload" argument into the macro. It sends the uploaded file to CKAN inside files_image_upload instead of the original image_upload field. This must be done because CKAN unconditionally strips the image_upload field from the submission payload, which makes processing of the file too unreliable. CKAN keeps the renamed upload field, so we can process it as we wish.
Tip
If you are using ckanext-scheming, you only need to replace the form_snippet of the image_url field, instead of rewriting the whole template.
Now, let's define validation rules for this new upload field. We need to create plugins that modify the validation schema for group and organization. Due to CKAN implementation details, you need a separate plugin for group and for organization.
Tip
If you are using ckanext-scheming, you can add files_image_upload validators to the schemas of organization and group. Check the list of validators that must be applied to this new field below.
Here's an example of plugins that modify validation schemas of group and organization. As you can see, they are mostly the same:
import ckan.plugins as p
import ckan.plugins.toolkit as tk
from ckan.lib.plugins import DefaultGroupForm, DefaultOrganizationForm
from ckan.logic.schema import default_create_group_schema, default_update_group_schema


def _modify_schema(schema, group_type):
    # Attach the ckanext-files validators to the renamed upload field.
    schema["files_image_upload"] = [
        tk.get_validator("ignore_empty"),
        tk.get_validator("files_into_upload"),
        tk.get_validator("files_validate_with_storage")("group_images"),
        tk.get_validator("files_upload_as")(
            "group_images",
            group_type,
            "id",
            "public_url",
            group_type + "_patch",
            "image_url",
        ),
    ]
    # The plugin methods below return the result of this function,
    # so the modified schema has to be returned explicitly.
    return schema


class FilesGroupPlugin(p.SingletonPlugin, DefaultGroupForm):
    p.implements(p.IGroupForm, inherit=True)
    is_organization = False

    def group_types(self):
        return ["group"]

    def create_group_schema(self):
        return _modify_schema(default_create_group_schema(), "group")

    def update_group_schema(self):
        return _modify_schema(default_update_group_schema(), "group")


class FilesOrganizationPlugin(p.SingletonPlugin, DefaultOrganizationForm):
    p.implements(p.IGroupForm, inherit=True)
    is_organization = True

    def group_types(self):
        return ["organization"]

    def create_group_schema(self):
        return _modify_schema(default_create_group_schema(), "organization")

    def update_group_schema(self):
        return _modify_schema(default_update_group_schema(), "organization")
There are 4 validators that must be applied to the new upload field:
- ignore_empty: skips validation when the image URL is set manually and no upload is selected.
- files_into_upload: converts the value of the upload field into the normalized format expected by ckanext-files.
- files_validate_with_storage(STORAGE_NAME): requires one argument, the name of the storage we are using for image uploads. The validator uses the storage settings to verify the size and MIME type of the upload.
- files_upload_as(STORAGE_NAME, GROUP_TYPE, NAME_OF_ID_FIELD, "public_url", NAME_OF_PATCH_ACTION, NAME_OF_URL_FIELD): this validator is the most challenging. It accepts 6 arguments:
  - the name of the storage used for image uploads
  - group or organization, depending on the processed entity
  - the name of the ID field of the processed entity. It's id in our case.
  - public_url: use this exact value. It tells which property of the file you want to use as the link to the file.
  - group_patch or organization_patch, depending on the processed entity
  - image_url: the name of the field that contains the URL of the image. ckanext-files puts the public link of the uploaded file into this field when the form is processed.
That's all. Now every image upload for a group/organization is handled by ckanext-files. To verify it, do the following. First, check the list of files currently stored in the group_images storage via the command that we used at the beginning of the migration:
ckan files scan -s group_images
You'll see a list of existing files. Their names follow the format <ISO_8601_DATETIME><FILENAME>, e.g. 2024-06-14-133840.539670photo.jpg.
Now upload an image into an existing group, or create a new group with any image. When you check the list of files again, you'll see one new record. But this time the name of the record looks like a UUID: da046887-e76c-4a68-97cf-7477665710ff.
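If the storage contains many files, a small script can tell the two naming schemes apart. The sketch below is a hypothetical helper and not part of ckanext-files. It assumes that legacy names start with a timestamp shaped like 2024-06-14-133840.539670 and that new names are canonical lowercase UUIDs, as in the examples above.

```python
import re
import uuid

# Timestamp prefix as in "2024-06-14-133840.539670photo.jpg" (assumed shape).
LEGACY_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-\d{6}\.\d{6}.+")


def classify(name):
    """Return "uuid", "legacy" or "unknown" for a file name."""
    try:
        # Accept only the canonical form to avoid loose matches.
        if str(uuid.UUID(name)) == name:
            return "uuid"
    except ValueError:
        pass
    if LEGACY_NAME.match(name):
        return "legacy"
    return "unknown"


print(classify("2024-06-14-133840.539670photo.jpg"))     # legacy
print(classify("da046887-e76c-4a68-97cf-7477665710ff"))  # uuid
```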