Usage
Register collection
Collection can be initialized anywhere in code
Example
from my.module import MyCollection
col = MyCollection()
But it's recommended to register collections globally.
Collections are registered via ICollection interface or via CKAN signals. Registered collection can be initialized anywhere in code using helper. It can also be used in a number of generic endpoints that render collection as HTML or export it into different formats.
import ckan.plugins as p
from ckanext.collection import shared
class MyPlugin(p.SingletonPlugin):
p.implements(shared.ICollection, inherit=True)
def get_collection_factories(
self,
) -> dict[str, shared.types.CollectionFactory]:
return {
"my-collection": MyCollection,
}
import ckan.plugins as p
class MyPlugin(p.SingletonPlugin):
p.implements(p.ISignal)
def get_signal_subscriptions(self) -> types.SignalMapping:
return {
tk.signals.ckanext.signal("collection:register_collections"): [
get_collection_factories,
],
}
def get_collection_factories(sender: None): # (1)!
return {
"my-collection": MyCollection,
}
- Signal listerners must receive at least one argument containing the sender
of the signal. Signal that register collections always sets
None
as a sender.
get_collection_factories
returns a dictionary with collection names(letters,
digits, underscores and hyphens are allowed) as keys, and collection factories
as values. In most generic case, collection factory is just a collection's
class itself. But you can use any function with signature (str, dict[str,
Any], **Any) -> Collection
as a factory.
Example
The following function is a valid collection factory and it can be returned
from get_collection_factories
def my_factory(name: str, params: dict[str, Any], **kwargs: Any):
"""Collection that shows 100 numbers per page"""
params.setdefault("rows_per_page", 100)
return MyCollection(name, params, **kwargs)
Initialize collection
Collection class defines the data source of collection and different aspects of its behavior. But collection class itself does not contain any data and collection instance must be created to work with data.
Any collection can be initialized directly, using collection class. And every
registered collection can be initialized via
get_collection
function. Arguments are the same in both cases. Collection
requires the name, parameters and accepts arbitrary number of keyword-only
arguments, that are passed to underlying services.
col = get_collection(
"my-collection",
{},
pager_settings={"rows_per_page": 100}
)
Tip
Second argument of get_collection
expects parameters prefixed by collection
name. In example above, to choose the second page, you need to pass
{"my-collection:page": 2}
as parameters.
If you are using unprefixed parameters, like {"page": 2}
and don't want to
adapt them to expected form, pass True
as the third argument to
get_collection
, and every key inside parameters will get required prefix
automatically.
col = get_collection(
"my-collection",
{"page": 2},
True,
pager_settings={"rows_per_page": 100}
)
col = MyCollection(
"",
{},
pager_settings={"rows_per_page": 100},
)
Use collection data
Tip
If you want to try examples below, but you haven't defined any collection yet, you can use the following definition for collection of numbers from 1 to 25:
from ckanext.collection.shared import collection, data
class MyCollection(collection.Collection):
DataFactory = data.StaticData.with_attributes(data=[
{"number": num, "index": idx}
for idx, num in enumerate(range(1,26))
])
Intended way to access the data is iteration over collection instance. In this way, you access only specific chunk of data, limited by collection's pager.
>>> col = MyCollection()
>>> list(col)
[{'number': 1, 'index': 0},
{'number': 2, 'index': 1},
{'number': 3, 'index': 2},
{'number': 4, 'index': 3},
{'number': 5, 'index': 4},
{'number': 6, 'index': 5},
{'number': 7, 'index': 6},
{'number': 8, 'index': 7},
{'number': 9, 'index': 8},
{'number': 10, 'index': 9}]
Different page can be accessed by passing page
inside params to collection's
constructor
>>> col = MyCollection("", {"page": 3}) # (1)!
>>> list(col)
[{'number': 21, 'index': 20},
{'number': 22, 'index': 21},
{'number': 23, 'index': 22},
{'number': 24, 'index': 23},
{'number': 25, 'index': 24}]
- More idiomatic form of this initialization is
MyCollection(pager_settings={"page": 3}))
. But this form is longer an required deeper knowledge of collections.
If you need to iterate over all collection items, without pagination, you can
use data
attribute of the collection.
Warning
Using data directly can result in enormous memory consumption. Avoid
transforming data into list(list(col.data)
) or processing it as single object
in any other way. Instead, iterate over collection items using loops or similar
tools.
Example
>>> sum = 0
>>> for item in col.data:
>>> sum += item["number"]
>>>
>>> print(sum)
325
Serialize collection
The ultimate goal of every collection is serialization. It may be serialization as HTML to show collection on one of application web-pages. Or serialization as JSON to send collection to the external API. Or serialization as CSV to allow user downloading the collection. Or even serialization as pandas' DataFrame to process data from the collection using more advanced tools.
serializer
service of collection is responsible for serialization. If
required format of collections output is known in advance, SerializerFactory
can be defined on collection level.
If the format of serialization can vary, serializer
can be initialized
separately.
from ckanext.collection.shared import collection, serialize, data
class MyCollection(collection.Collection):
DataFactory = data.StaticData.with_attributes(data=[
{"number": num, "index": idx}
for idx, num in enumerate(range(1,26))
])
SerializerFactory = serialize.CsvSerializer
col = MyCollection()
print(col.serializer.serialize())
from ckanext.collection.shared import collection, serialize, data
class MyCollection(collection.Collection):
DataFactory = data.StaticData.with_attributes(data=[
{"number": num, "index": idx}
for idx, num in enumerate(range(1,26))
])
col = MyCollection()
json = serialize.JsonSerializer(col)
print(json.serialize())
csv = serialize.CsvSerializer(col)
print(csv.serialize())
Warning
Keep in mind, as any other collection's service, serializer attaches itself to collection when initialized and replaces the previous serializer.
>>> col = MyCollection()
>>> isinstance(col.serializer, serialize.Serializer)
True
>>> serialize.JsonSerializer(col)
>>> isinstance(col.serializer, serialize.JsonSerializer)
True
>>> serialize.CsvSerializer(col)
>>> isinstance(col.serializer, serialize.CsvSerializer)
True