DCAT JSON

The DCAT JSON harvester is a CKAN harvester that collects dataset metadata from DCAT JSON files.

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.

Warning

This harvester is based on the original DCAT harvester from ckanext-dcat, so the ckanext-dcat extension must be installed.

Enable the Harvester

To enable the harvester, add basket_dcat_json_harvester to the ckan.plugins setting in your CKAN configuration file (e.g., ckan.ini or production.ini).

ckan.plugins = ... basket_dcat_json_harvester ...

Configuration options

tsm_schema [optional]

A transmute schema that is used to transform the harvested data before the harvester creates or updates a dataset in CKAN.

This is useful when the harvested data doesn't match the CKAN dataset schema and you need to transform it.

Otherwise, you'd need to write a custom harvester and process the remote data yourself.

See the ckanext-transmute documentation to learn more about the transmute schema syntax.
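
To make the idea concrete, here is a minimal sketch of two kinds of rule a schema can express: renaming a field (`map`) and supplying a value when a field is missing (`default`). The `apply_field_rules` helper is hypothetical and written only for illustration. It is not the ckanext-transmute implementation, and it ignores validators, nested types and every other schema key.

```python
from typing import Any


def apply_field_rules(
    record: dict[str, Any], fields: dict[str, dict[str, Any]]
) -> dict[str, Any]:
    """Toy illustration of 'map' and 'default' rules (hypothetical helper)."""
    result = dict(record)
    for name, rules in fields.items():
        # Fill in a default when the harvested record lacks the field.
        if name not in result and "default" in rules:
            result[name] = rules["default"]
        # Rename the field to the key expected in the resulting dataset dict.
        if "map" in rules and name in result:
            result[rules["map"]] = result.pop(name)
    return result


# Made-up remote record and a two-field rule set in the style of the schema.
remote = {"title": "Air quality 2022"}
fields = {
    "title": {"map": "name"},
    "metadata_created": {"default": "2022-02-03T15:54:26.359453"},
}
transformed = apply_field_rules(remote, fields)
```

In this sketch, `transformed` carries the title under the `name` key and gains a `metadata_created` value that the remote record did not provide.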

Example
{
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_lowercase",
                        "tsm_name_validator"
                    ],
                    "map": "name"
                },
                "resources": {
                    "type": "Resource",
                    "multiple": true,
                    "map": "attachments"
                },
                "metadata_created": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default": "2022-02-03T15:54:26.359453"
                },
                "metadata_modified": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default_from": "metadata_created"
                },
                "metadata_reviewed": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "replace_from": "metadata_modified"
                }
            }
        },
        "Resource": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "name"
                },
                "extension": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_uppercase"
                    ],
                    "map": "format"
                },
                "web": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "url"
                },
                "sub-resources": {
                    "type": "Sub-Resource",
                    "multiple": true
                }
            }
        }
    }
}

Type: dict[str, Any]

Default: None
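
Since the option is a dictionary, one way to prepare it is to build the schema in Python and serialize it to JSON. The sketch below assumes that `tsm_schema` sits at the top level of the harvest source configuration and that the configuration is supplied as a JSON document. The trimmed-down schema is for illustration only.

```python
import json
from typing import Any

# A trimmed-down transmute schema (illustrative only).
tsm_schema: dict[str, Any] = {
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {"validators": ["tsm_string_only"], "map": "name"},
            }
        }
    },
}

# Assumption: the option is a top-level key of the harvest source configuration.
source_config: dict[str, Any] = {"tsm_schema": tsm_schema}

# Serialize to JSON; json.dumps never emits trailing commas, so the output is valid JSON.
config_json = json.dumps(source_config, indent=4)

# Round-trip check: the serialized text parses back to the same structure.
parsed = json.loads(config_json)
```

Generating the JSON this way avoids hand-editing mistakes such as trailing commas, which strict JSON parsers reject.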