Basic Information
We utilize different technologies to gather data from various platforms like ODS, ArcGIS, Socrata, DKAN, Junar, and more. With these harvesters, you can automatically pull datasets from different sources into your CKAN instance, helping you manage and share data more efficiently.
Some data portals provide an API to access their data, and some do not. For those that do not, we can use web scraping, a technique for extracting data directly from web pages. Web scraping is less reliable than using an API, but it is the only option when the data portal does not provide a public API.
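To illustrate what scraping involves, here is a minimal sketch using only the Python standard library. The HTML fragment and the "/dataset/" link pattern are hypothetical and are not taken from any of the harvesters. The sketch only shows the general technique: the parser depends on the page markup, which is why this approach breaks more easily than an API call.

```python
from html.parser import HTMLParser


class DatasetLinkParser(HTMLParser):
    """Collect href values of <a> tags whose path contains '/dataset/'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # The '/dataset/' pattern is an assumption about the portal's URL layout.
        if href and "/dataset/" in href:
            self.links.append(href)


# Hypothetical HTML fragment standing in for a portal page without an API.
page = """
<ul>
  <li><a href="/dataset/air-quality">Air quality</a></li>
  <li><a href="/about">About</a></li>
  <li><a href="/dataset/road-works">Road works</a></li>
</ul>
"""

parser = DatasetLinkParser()
parser.feed(page)
print(parser.links)
```

If the portal changes its markup or URL layout, a parser like this silently returns nothing, whereas an API usually fails with an explicit error.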
Transmute Schema
Each harvester has an integration with the ckanext-transmute extension, which allows you to transform datasets during the harvesting process using a harvest source configuration.
This is helpful when you need to adjust the harvested data to fit the dataset schema of your CKAN instance. To use ckanext-transmute, you have to install and enable it in your CKAN instance. Apart from that, you need to provide a tsm_config key in your harvest source configuration.
The syntax and all the information about installing and using the ckanext-transmute extension can be found in the ckanext-transmute documentation.
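As a rough sketch of where the schema goes, the snippet below builds a harvest source configuration in Python and serializes it to JSON. It assumes the configuration is a JSON object and that the schema sits directly under the tsm_config key described above. The schema itself is trimmed to one field for brevity, and the variable names are illustrative only.

```python
import json

# A trimmed transmute schema with a single field rule.
schema = {
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {"validators": ["tsm_string_only"], "map": "name"},
            }
        }
    },
}

# Assumption: the schema is placed under the tsm_config key of the
# harvest source configuration, next to any harvester-specific options.
source_config = {"tsm_config": schema}

# The resulting JSON text is what would be pasted into the source configuration.
config_json = json.dumps(source_config, indent=2)
print(config_json)
```

Generating the JSON this way, rather than writing it by hand, avoids syntax slips such as trailing commas.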
Example
{
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_lowercase",
                        "tsm_name_validator"
                    ],
                    "map": "name"
                },
                "resources": {
                    "type": "Resource",
                    "multiple": true,
                    "map": "attachments"
                },
                "metadata_created": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default": "2022-02-03T15:54:26.359453"
                },
                "metadata_modified": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default_from": "metadata_created"
                },
                "metadata_reviewed": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "replace_from": "metadata_modified"
                }
            }
        },
        "Resource": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "name"
                },
                "extension": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_uppercase"
                    ],
                    "map": "format"
                },
                "web": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "url"
                },
                "sub-resources": {
                    "type": "Sub-Resource",
                    "multiple": true
                }
            }
        }
    }
}
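To give an intuition for the field rules used in the example, the sketch below models map, default, default_from and replace_from on a flat dictionary. This is a simplified illustration and not the ckanext-transmute implementation. The function apply_fields and the to_lowercase stand-in validator are hypothetical, and the real tsm_* validators and rule ordering may differ, so consult the ckanext-transmute documentation for the actual behaviour.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a validator; the real tsm_* validators
# live in ckanext-transmute and may behave differently.
VALIDATORS: Dict[str, Callable[[Any], Any]] = {
    "to_lowercase": lambda value: value.lower(),
}


def apply_fields(data: Dict[str, Any], fields: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Apply a simplified subset of field rules to a flat dict."""
    result = dict(data)
    for name, rules in fields.items():
        value = result.get(name)
        # replace_from: always take the value of another field.
        if "replace_from" in rules:
            value = result.get(rules["replace_from"])
        # default / default_from: only used when the value is missing.
        if value is None and "default" in rules:
            value = rules["default"]
        if value is None and "default_from" in rules:
            value = result.get(rules["default_from"])
        if value is None:
            continue
        for validator in rules.get("validators", []):
            value = VALIDATORS[validator](value)
        # map: store the value under a new key instead of the original one.
        result.pop(name, None)
        result[rules.get("map", name)] = value
    return result


fields = {
    "title": {"validators": ["to_lowercase"], "map": "name"},
    "metadata_created": {"default": "2022-02-03T15:54:26.359453"},
    "metadata_modified": {"default_from": "metadata_created"},
    "metadata_reviewed": {"replace_from": "metadata_modified"},
}

harvested = {"title": "My Dataset", "metadata_reviewed": "2020-01-01"}
transformed = apply_fields(harvested, fields)
print(transformed)
```

In this model the title is lowercased and stored as name, the two missing dates are filled from the default, and metadata_reviewed is overwritten even though the harvested record already had a value for it.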