Junar

The Junar harvester is a CKAN harvester that can be used to harvest metadata from Junar data portals.

Junar is a data platform that allows organizations to publish, share, and manage data in the cloud.

Enable the Harvester

To enable the harvester, add junar_harvester to the ckan.plugins setting in your CKAN configuration file (e.g., ckan.ini or production.ini).

ckan.plugins = ... junar_harvester ...

Configuration options

auth_key [optional]

The Junar API requires a key to operate. To get an auth key, open the "Developers" page on the relevant Junar portal. For example, see the City of Palo Alto data portal.

Warning

If auth_key is not provided, a SearchError exception is raised.

Type: str

Default: None

To use the Junar harvester, you need to provide a Junar API key. You can get one by registering at Junar.

max_datasets [optional]

The max_datasets parameter limits the number of datasets harvested from this harvest source.

This feature is useful for testing or development purposes, allowing you to perform a quick test with a smaller subset of data and verify that the harvested data meets your requirements.

If set to 0, all available datasets will be harvested.

Type: int

Default: 0
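
The two options above can be combined in one harvest source configuration. The following is a minimal sketch, assuming the configuration is supplied as a JSON object (the usual convention for CKAN harvest sources). The helper name build_source_config and the placeholder key are hypothetical and are not part of the harvester.

```python
import json


def build_source_config(auth_key, max_datasets=0):
    """Return a harvest source configuration as a JSON string.

    Hypothetical helper for illustration only; the option names
    (auth_key, max_datasets) come from this page.
    """
    if not isinstance(auth_key, str) or not auth_key:
        raise ValueError("auth_key must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_datasets, bool) or not isinstance(max_datasets, int):
        raise ValueError("max_datasets must be an integer")
    if max_datasets < 0:
        raise ValueError("max_datasets must not be negative")
    return json.dumps(
        {"auth_key": auth_key, "max_datasets": max_datasets}, indent=4
    )


# Placeholder key; replace it with the key from the portal's "Developers" page.
print(build_source_config("YOUR_JUNAR_AUTH_KEY", max_datasets=10))
```

A small max_datasets value such as 10 keeps a test run short. Set it back to 0 to harvest everything.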

tsm_schema [optional]

The transmute schema defines how the harvested data is transformed before a dataset is created or updated in CKAN.

This is useful when the harvested data doesn't match the CKAN dataset schema and you need to transform it.

Otherwise, you'd need to write a custom harvester and process the remote data yourself.

See the ckanext-transmute documentation to learn more about the transmute schema syntax.

Example
{
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_lowercase",
                        "tsm_name_validator"
                    ],
                    "map": "name"
                },
                "resources": {
                    "type": "Resource",
                    "multiple": true,
                    "map": "attachments"
                },
                "metadata_created": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default": "2022-02-03T15:54:26.359453"
                },
                "metadata_modified": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default_from": "metadata_created"
                },
                "metadata_reviewed": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "replace_from": "metadata_modified"
                }
            }
        },
        "Resource": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "name"
                },
                "extension": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_uppercase"
                    ],
                    "map": "format"
                },
                "web": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "url"
                },
                "sub-resources": {
                    "type": "Sub-Resource",
                    "multiple": true
                }
            }
        }
    }
}

Type: dict[str, Any]

Default: None
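
Before pasting a schema into the source configuration, it can help to check that it parses as strict JSON and that every type it references is defined. The sketch below is a standalone pre-flight check, not part of ckanext-transmute. The function name undefined_types is hypothetical, and it only assumes the "root", "types", "fields", and "type" keys shown in the example above. Whether the extension itself rejects undefined types is not covered here.

```python
import json


def undefined_types(schema):
    """Return type names that are referenced but missing from schema["types"]."""
    types = schema.get("types", {})
    referenced = set()
    if "root" in schema:
        referenced.add(schema["root"])
    for type_def in types.values():
        for field in type_def.get("fields", {}).values():
            # Nested fields point at another type via the "type" key.
            if "type" in field:
                referenced.add(field["type"])
    return sorted(referenced - set(types))


# Trimmed-down schema used only for this illustration.
schema_text = """
{
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "resources": {"type": "Resource", "multiple": true}
            }
        },
        "Resource": {
            "fields": {
                "sub-resources": {"type": "Sub-Resource", "multiple": true}
            }
        }
    }
}
"""

# json.loads raises json.JSONDecodeError on invalid JSON such as trailing commas.
schema = json.loads(schema_text)
print(undefined_types(schema))  # prints ['Sub-Resource']
```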