OpenDataSoft
The OpenDataSoft Harvester is a CKAN harvester for harvesting datasets from OpenDataSoft platforms.
OpenDataSoft is a cloud-based turnkey platform for data publishing and API management. It is designed for data owners who need to share data with a larger audience in a simple and cost-effective way.
Enable the Harvester
To enable the harvester, add ods_harvester to the ckan.plugins setting in your CKAN configuration file (e.g., ckan.ini or production.ini).
ckan.plugins = ... ods_harvester ...
Configuration options
where
[optional]
The where parameter filters the results returned by the remote API.
A where filter is a text expression that performs a simple full-text search. It can also include logical operations (NOT, AND, OR...) and many other functions for complex and precise search operations.
For more information, see the Opendatasoft Query Language (ODSQL) reference documentation.
Example
my_numeric_field > 10 and my_text_field like "paris" or within_distance(my_geo_field, geom'POINT(1 1)', 1km)
Type: str
Default: None
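A minimal sketch of preparing a where filter for a harvest source, assuming the source configuration is supplied as a JSON object (as is common for CKAN harvesters). The helper name build_where_config is hypothetical and not part of the harvester.

```python
import json
from typing import Optional


def build_where_config(where: Optional[str] = None) -> str:
    """Return a JSON configuration string (hypothetical helper).

    The "where" key is only included when a filter is given, which
    mirrors the documented default of None.
    """
    config = {}
    if where is not None:
        config["where"] = where
    # json.dumps escapes the double quotes that ODSQL string literals use.
    return json.dumps(config)


if __name__ == "__main__":
    print(build_where_config('my_numeric_field > 10 and my_text_field like "paris"'))
```

Serializing with json.dumps rather than writing the JSON by hand avoids quoting mistakes, since ODSQL expressions often contain double-quoted literals.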
max_datasets
[optional]
The max_datasets parameter limits the number of datasets harvested from this harvest source.
This feature is useful for testing or development purposes, allowing you to perform a quick test with a smaller subset of data and verify that the harvested data meets your requirements.
If set to 0, all available datasets will be harvested.
Type: int
Default: 0
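The semantics of max_datasets can be illustrated with a short sketch. This is not the harvester's actual implementation; limit_datasets is a hypothetical helper that only demonstrates the documented rule that 0 means "no limit".

```python
from itertools import islice
from typing import Iterable, Iterator


def limit_datasets(datasets: Iterable[dict], max_datasets: int = 0) -> Iterator[dict]:
    """Yield at most max_datasets items; 0 yields everything (hypothetical helper)."""
    if max_datasets == 0:
        # The documented default: harvest all available datasets.
        return iter(datasets)
    # islice stops after max_datasets items without consuming the rest.
    return islice(datasets, max_datasets)


if __name__ == "__main__":
    remote = [{"id": i} for i in range(5)]
    print(list(limit_datasets(remote, 2)))
    print(len(list(limit_datasets(remote))))
```

Because the limit is applied lazily, a small max_datasets value keeps a test run short even when the remote catalog is large.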
tsm_schema
[optional]
The tsm_schema option takes a transmute schema that defines how the harvested data is transformed before the harvester creates or updates a dataset in CKAN.
This is useful when the harvested data doesn't match the CKAN dataset schema and you need to transform it.
Otherwise, you'd need to write a custom harvester and process the remote data yourself.
See the ckanext-transmute documentation to learn more about the transmute schema syntax.
Example
{
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_lowercase",
                        "tsm_name_validator"
                    ],
                    "map": "name"
                },
                "resources": {
                    "type": "Resource",
                    "multiple": true,
                    "map": "attachments"
                },
                "metadata_created": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default": "2022-02-03T15:54:26.359453"
                },
                "metadata_modified": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default_from": "metadata_created"
                },
                "metadata_reviewed": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "replace_from": "metadata_modified"
                }
            }
        },
        "Resource": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "name"
                },
                "extension": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_uppercase"
                    ],
                    "map": "format"
                },
                "web": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "url"
                },
                "sub-resources": {
                    "type": "Sub-Resource",
                    "multiple": true
                }
            }
        }
    }
}
Type: dict[str, Any]
Default: None
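To make the field options in the example more concrete, here is a simplified illustration of how map, default, default_from and replace_from could behave on a flat dictionary. It is a toy sketch and not ckanext-transmute's implementation: apply_field_rules is a hypothetical function, validators and nested types are ignored, and the exact semantics should be checked against the ckanext-transmute documentation.

```python
from typing import Any


def apply_field_rules(data: dict[str, Any], fields: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Toy model of a few transmute field options (hypothetical helper)."""
    values = dict(data)  # work on a copy, leave the input untouched
    for name, rules in fields.items():
        if "replace_from" in rules:
            # Assumed behavior: always take the sibling's value.
            values[name] = values.get(rules["replace_from"])
        elif not values.get(name):
            if "default" in rules:
                # Assumed behavior: fixed fallback for an empty field.
                values[name] = rules["default"]
            elif "default_from" in rules:
                # Assumed behavior: fall back to a sibling's value.
                values[name] = values.get(rules["default_from"])
    # "map" renames the field in the resulting dictionary.
    return {fields.get(name, {}).get("map", name): value for name, value in values.items()}


if __name__ == "__main__":
    fields = {
        "title": {"map": "name"},
        "metadata_created": {"default": "2022-02-03T15:54:26.359453"},
        "metadata_modified": {"default_from": "metadata_created"},
        "metadata_reviewed": {"replace_from": "metadata_modified"},
    }
    print(apply_field_rules({"title": "Air quality"}, fields))
```

In this sketch the rules are applied in declaration order, so metadata_modified can pick up the default already assigned to metadata_created.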