Socrata
The Socrata harvester is a CKAN harvester that can be used to harvest metadata from Socrata open data portals.
Socrata is a company that provides an open data repository service that many government agencies use to make open data available to the public.
Enable the Harvester
To enable the harvester, add socrata_harvester to the ckan.plugins setting in your CKAN configuration file (e.g., ckan.ini or production.ini).
ckan.plugins = ... socrata_harvester ...
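As a quick sanity check, the sketch below reads an INI file and reports whether a plugin name appears in ckan.plugins. It assumes the setting lives in the [app:main] section and that plugin names are separated by whitespace, as in a typical CKAN configuration file. The helper name plugin_enabled and the sample INI text (including the stats placeholder plugin) are hypothetical and used only for illustration.

```python
from configparser import ConfigParser


def plugin_enabled(ini_text: str, plugin: str) -> bool:
    """Return True if `plugin` is listed in ckan.plugins under [app:main]."""
    # Interpolation is disabled because CKAN ini files often contain
    # values such as %(here)s that ConfigParser would try to expand.
    parser = ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    plugins = parser.get("app:main", "ckan.plugins", fallback="")
    return plugin in plugins.split()


# Hypothetical configuration used only for this example.
SAMPLE_INI = """
[app:main]
ckan.plugins = stats socrata_harvester
"""

print(plugin_enabled(SAMPLE_INI, "socrata_harvester"))
```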
Configuration options
limit
[optional]
The limit parameter controls the number of entities returned in a single response.
When working with remote APIs, the harvester uses pagination to fetch results in pages of this size. This avoids performance issues and keeps each response manageable.
Type: int
Default: 50
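The pagination idea behind limit can be sketched as follows. This is not the harvester's actual code: iter_pages, fake_fetch, and FAKE_REMOTE are hypothetical names, and the sketch assumes a simple limit/offset style API in which a page shorter than limit signals the end of the data.

```python
from typing import Any, Callable, Iterator


def iter_pages(
    fetch_page: Callable[[int, int], list[dict[str, Any]]],
    limit: int = 50,
) -> Iterator[dict[str, Any]]:
    """Yield entities page by page until the remote source is exhausted."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        # A short (or empty) page means there is nothing left to request.
        if len(page) < limit:
            break
        offset += limit


# Hypothetical stand-in for a remote API holding 120 entities.
FAKE_REMOTE = [{"id": i} for i in range(120)]


def fake_fetch(limit: int, offset: int) -> list[dict[str, Any]]:
    return FAKE_REMOTE[offset : offset + limit]


entities = list(iter_pages(fake_fetch, limit=50))
print(len(entities))
```

With the default of 50, the 120 fake entities are fetched in three requests instead of one large response.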
max_datasets
[optional]
The max_datasets parameter limits the number of datasets harvested from this harvest source.
This feature is useful for testing or development purposes, allowing you to perform a quick test with a smaller subset of data and verify that the harvested data meets your requirements.
If set to 0, all available datasets will be harvested.
Type: int
Default: 0
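The behaviour described for max_datasets, where 0 means no cap, might look like the sketch below. The function name cap_datasets is hypothetical, and rejecting negative values is an assumption of this sketch rather than documented harvester behaviour.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def cap_datasets(
    datasets: Iterable[dict[str, Any]], max_datasets: int = 0
) -> Iterator[dict[str, Any]]:
    """Return an iterator over at most `max_datasets` items; 0 means no cap."""
    # Assumption of this sketch: negative values are treated as an error.
    if max_datasets < 0:
        raise ValueError("max_datasets must be 0 or a positive integer")
    if max_datasets == 0:
        return iter(datasets)
    return islice(datasets, max_datasets)


# Hypothetical sample data used only for this example.
sample = [{"id": i} for i in range(5)]

print(list(cap_datasets(sample, 3)))
print(len(list(cap_datasets(sample, 0))))
```

Because the cap is applied lazily, it can be combined with a paginated source so that no further pages are requested once the cap is reached.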
tsm_schema
[optional]
The transmute schema lets you define how the harvested data is transformed before a dataset is created or updated in CKAN.
This is useful when the harvested data doesn't match the CKAN dataset schema and you need to transform it.
Otherwise, you'd need to write a custom harvester and process the remote data yourself.
See the ckanext-transmute documentation to learn more about the transmute schema syntax.
Example
{
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_lowercase",
                        "tsm_name_validator"
                    ],
                    "map": "name"
                },
                "resources": {
                    "type": "Resource",
                    "multiple": true,
                    "map": "attachments"
                },
                "metadata_created": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default": "2022-02-03T15:54:26.359453"
                },
                "metadata_modified": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "default_from": "metadata_created"
                },
                "metadata_reviewed": {
                    "validators": [
                        "tsm_isodate"
                    ],
                    "replace_from": "metadata_modified"
                }
            }
        },
        "Resource": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "name"
                },
                "extension": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_uppercase"
                    ],
                    "map": "format"
                },
                "web": {
                    "validators": [
                        "tsm_string_only"
                    ],
                    "map": "url"
                },
                "sub-resources": {
                    "type": "Sub-Resource",
                    "multiple": true
                }
            }
        }
    }
}
Type: dict[str, Any]
Default: None
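To see how the three options fit together, the sketch below builds a source configuration as a Python dict and serializes it to JSON. It assumes the options are supplied as a single JSON object in the harvest source configuration. The values, and the trimmed-down tsm_schema reusing one field from the example above, are illustrative only.

```python
import json

# Illustrative values; adjust them to your harvest source.
source_config = {
    "limit": 50,
    "max_datasets": 10,
    "tsm_schema": {
        "root": "Dataset",
        "types": {
            "Dataset": {
                "fields": {
                    "title": {
                        "validators": ["tsm_string_only"],
                        "map": "name",
                    }
                }
            }
        },
    },
}

# Serializing with json.dumps guarantees strictly valid JSON,
# for example without trailing commas.
config_json = json.dumps(source_config, indent=2)
print(config_json)
```

Generating the JSON from a dict rather than writing it by hand avoids syntax slips that a strict JSON parser would reject.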