2020
Marcer, Arnald; Haston, Elspeth; Groom, Quentin; Ariño, Arturo H; Chapman, Arthur D; Bakken, Torkild; Braun, Paul; Dillen, Mathias; Ernst, Marcus; Escobar, Agustí; Fichtmüller, David; Livermore, Laurence; Nicolson, Nicky; Paragamian, Kaloust; Paul, Deborah; Pettersson, Lars B; Phillips, Sarah; Plummer, Jack; Rainer, Heimo; Rey, Isabel; Robertson, Tim; Röpert, Dominik; Santos, Joaquim; Uribe, Francesc; Waller, John; Wieczorek, John R Quality issues in georeferencing: From physical collections to digital data repositories for ecological research Journal Article 2020. Links | BibTeX | Tags: eco-evolutionary research, georeferencing, global biodiversity information facility, natural history collections, uncertainty workshop @article{Marcer2020, title = {Quality issues in georeferencing: From physical collections to digital data repositories for ecological research}, author = {Arnald Marcer and Elspeth Haston and Quentin Groom and Arturo H. Ariño and Arthur D. Chapman and Torkild Bakken and Paul Braun and Mathias Dillen and Marcus Ernst and Agustí Escobar and David Fichtmüller and Laurence Livermore and Nicky Nicolson and Kaloust Paragamian and Deborah Paul and Lars B. Pettersson and Sarah Phillips and Jack Plummer and Heimo Rainer and Isabel Rey and Tim Robertson and Dominik Röpert and Joaquim Santos and Francesc Uribe and John Waller and John R. Wieczorek}, url = {https://doi.org/10.1111/ddi.13208}, doi = {10.1111/ddi.13208}, year = {2020}, date = {2020-12-03}, keywords = {eco-evolutionary research, georeferencing, global biodiversity information facility, natural history collections, uncertainty workshop}, pubstate = {published}, tppubtype = {article} }
Hardy, Helen; Knapp, Sandra; Allan, Louise E; Berger, Frederik; Dixey, Katherine; Döme, Bernadette; Gagnier, Pierre-Yves; Frank, Jiri; Haston, Elspeth Margaret; Holstein, Joachim; Kiel, Steffen; Marschler, Maria; Mergen, Patricia; Phillips, Sarah; Rabinovich, Rivka; Sanchez Chillón, Begoña; Sorensen, Martin V; Thines, Marco; Trekels, Maarten; Vogt, Robert; Wilson, Scott; Wiltschke-Schrotta, Karin SYNTHESYS+ Virtual Access - Report on the Ideas Call (October to November 2019) Journal Article Research Ideas and Outcomes, 6, pp. e50354, 2020. Abstract | Links | BibTeX | Tags: access, collaboration, digital data, digitisation, digitization, natural history collections, virtual data @article{10.3897/rio.6.e50354, title = {SYNTHESYS+ Virtual Access - Report on the Ideas Call (October to November 2019)}, author = {Helen Hardy and Sandra Knapp and Louise E Allan and Frederik Berger and Katherine Dixey and Bernadette Döme and Pierre-Yves Gagnier and Jiri Frank and Elspeth Margaret Haston and Joachim Holstein and Steffen Kiel and Maria Marschler and Patricia Mergen and Sarah Phillips and Rivka Rabinovich and Begoña Sanchez Chillón and Martin V Sorensen and Marco Thines and Maarten Trekels and Robert Vogt and Scott Wilson and Karin Wiltschke-Schrotta}, url = {https://doi.org/10.3897/rio.6.e50354}, doi = {10.3897/rio.6.e50354}, year = {2020}, date = {2020-01-01}, journal = {Research Ideas and Outcomes}, volume = {6}, pages = {e50354}, publisher = {Pensoft Publishers}, abstract = {The SYNTHESYS consortium has been operational since 2004, and has facilitated physical access by individual researchers to European natural history collections through its Transnational Access programme (TA). For the first time, SYNTHESYS+ will be offering virtual access to collections through digitisation, with two calls for the programme, the first in 2020 and the second in 2021. 
The Virtual Access (VA) programme is not a direct digital parallel of Transnational Access - proposals for collections digitisation will be prioritised and carried out based on community demand, and data must be made openly available immediately. A key feature of Virtual Access is that, unlike TA, it does not select the researchers to whom access is provided. Because Virtual Access in this way is new to the community and to the collections-holding institutions, the SYNTHESYS+ consortium invited ideas through an Ideas Call, which opened on 7th October 2019 and closed on 22nd November 2019, in order to assess interest and to trial procedures. This report is intended to provide feedback to those who participated in the Ideas Call and to help all applicants to the first SYNTHESYS+ Virtual Access Call that will be launched on 20th February 2020.}, keywords = {access, collaboration, digital data, digitisation, digitization, natural history collections, virtual data}, pubstate = {published}, tppubtype = {article} }
Harjes, Janno; Link, Anton; Weibulat, Tanja; Triebel, Dagmar; Rambold, Gerhard FAIR digital objects in environmental and life sciences should comprise workflow operation design data and method information for repeatability of study setups and reproducibility of results Journal Article Database, 2020, 2020, ISSN: 1758-0463, (baaa059). Abstract | Links | BibTeX | Tags: @article{10.1093/database/baaa059b, title = {FAIR digital objects in environmental and life sciences should comprise workflow operation design data and method information for repeatability of study setups and reproducibility of results}, author = {Janno Harjes and Anton Link and Tanja Weibulat and Dagmar Triebel and Gerhard Rambold}, url = {https://doi.org/10.1093/database/baaa059}, doi = {10.1093/database/baaa059}, issn = {1758-0463}, year = {2020}, date = {2020-01-01}, journal = {Database}, volume = {2020}, abstract = {Repeatability of study setups and reproducibility of research results by underlying data are major requirements in science. Until now, abstract models for describing the structural logic of studies in environmental sciences are lacking and tools for data management are insufficient. Mandatory for repeatability and reproducibility is the use of sophisticated data management solutions going beyond data file sharing. Particularly, it implies maintenance of coherent data along workflows. Design data concern elements from elementary domains of operations being transformation, measurement and transaction. Operation design elements and method information are specified for each consecutive workflow segment from field to laboratory campaigns. The strict linkage of operation design element values, operation values and objects is essential. 
DWB-DD allows for an individual specification of operation design elements and their linking to objects. Two workflow design use cases, one for DNA barcoding and another for cultivation of fungal isolates, are given. To publish those structured data, standard schema mapping and XML-provision of digital objects are essential. Schemas useful for this mapping include the Ecological Markup Language, the Schema for Meta-omics Data of Collection Objects and the Standard for Structured Descriptive Data. Data pipelines with DWB-DD include the mapping and conversion between schemas and functions for data publishing and archiving according to the Open Archival Information System standard. The setting allows for repeatability of study setups, reproducibility of study results and for supporting work groups to structure and maintain their data from the beginning of a study. The theory of ‘FAIR++’ digital objects is introduced.}, note = {baaa059}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Addink, Wouter; Hardisty, Alex R ‘openDS’ – Progress on the New Standard for Digital Specimens Journal Article Biodiversity Information Science and Standards, 4, pp. e59338, 2020. Abstract | Links | BibTeX | Tags: @article{10.3897/biss.4.59338, title = {‘openDS’ – Progress on the New Standard for Digital Specimens}, author = {Wouter Addink and Alex R Hardisty}, url = {https://doi.org/10.3897/biss.4.59338}, doi = {10.3897/biss.4.59338}, year = {2020}, date = {2020-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {4}, pages = {e59338}, publisher = {Pensoft Publishers}, abstract = {In a Biodiversity_Next 2019 symposium, a vision of Digital Specimens based on the concept of a Digital Object Architecture (DOA; Kahn and Wilensky 2006) was discussed as a new layer between the data infrastructure of natural science collections and user applications for processing and interacting with information about specimens and collections. This vision would enable the transformation of institutional curatorial practices into joint community curation of the scientific data by providing seamless global access to specimens and collections spanning multiple collection-holding institutions and sources. A DOA-based implementation (Lannom et al. 2020) also offers wider, more flexible, and ‘FAIR’ (Findable, Accessible, Interoperable, Reusable) access for varied research and policy uses: recognising curatorial work, annotating with latest taxonomic treatments, understanding variations, working with DNA sequences or chemical analyses, supporting regulatory processes for health, food, security, sustainability and environmental change, inventions/products critical to the bio-economy, and educational uses. To make this vision a reality, a specification is needed that describes what a Digital Specimen is, and how to technically implement it. This specification is named 'openDS' for open Digital Specimen. 
It needs to describe how machines and humans can act on a Digital Specimen and gain attribution for their work; how the data can be serialized and packaged; and it needs to describe the object model (the scientific content part and its structure). The object model should describe how to include the specimen data itself as well as all data derived from the specimen, which is in principle the same as what the Extended Specimen model aims to describe. This part will therefore be developed in close collaboration with people working on that model. After the Biodiversity_Next symposium, the idea of a standard for Digital Specimens has been further discussed and detailed in a MOBILISE Workshop in Warsaw, 2020, with stakeholders like GBIF, iDigBio, CETAF and DiSSCo. The workshop examined the technical basis of the new specification, agreed on scope and structure of the new specification and laid groundwork for future activities in the Research Data Alliance (RDA), Biodiversity Information Standards (TDWG), and technical workshops. A working group in the DiSSCo Prepare project has begun on the technical specification of the ‘open Digital Specimen’ (openDS). This specification will provide the definition of what a Digital Specimen is, its logical structure and content, and the operations permitted on that. The group is also working on a document with frequently asked questions. Realising the vision of Digital Specimen on a global level requires openDS to become a new TDWG standard and to be aligned with the vision for Extended Specimens. A TDWG Birds-of-a-Feather working session in September 2020 discusses and plans this further. The object model will include concepts from ABCD 3.0 and EFG extension for geo-sciences, and also extend from bco:MaterialSample in the OBO Foundry’s Biological Collection Ontology (BCO), which is linked to Darwin Core, and from iao:InformationContentEntity in OBO Foundry's Information Artifact Ontology (IAO). 
openDS will also make use of the RDA/TDWG attribution metadata recommendation and other RDA recommendations. A publication is in preparation that describes the relationship with RDA recommendations in more detail, which will also be presented in the TDWG symposium.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Woodburn, Matt; Paul, Deborah L; Addink, Wouter; Baskauf, Steven J; Blum, Stanley; Chapman, Cat; Grant, Sharon; Groom, Quentin; Jones, Janeen; Petersen, Mareike; Raes, Niels; Smith, David; Tilley, Laura; Trekels, Maarten; Trizna, Michael; Ulate, William; Vincent, Sarah; Walls, Ramona; Webbink, Kate; Zermoglio, Paula Unity in Variety: Developing a collection description standard by consensus Journal Article Biodiversity Information Science and Standards, 4, pp. e59233, 2020. Abstract | Links | BibTeX | Tags: @article{10.3897/biss.4.59233, title = {Unity in Variety: Developing a collection description standard by consensus}, author = {Matt Woodburn and Deborah L Paul and Wouter Addink and Steven J Baskauf and Stanley Blum and Cat Chapman and Sharon Grant and Quentin Groom and Janeen Jones and Mareike Petersen and Niels Raes and David Smith and Laura Tilley and Maarten Trekels and Michael Trizna and William Ulate and Sarah Vincent and Ramona Walls and Kate Webbink and Paula Zermoglio}, url = {https://doi.org/10.3897/biss.4.59233}, doi = {10.3897/biss.4.59233}, year = {2020}, date = {2020-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {4}, pages = {e59233}, publisher = {Pensoft Publishers}, abstract = {Digitisation and publication of museum specimen data is happening worldwide, but is far from complete. Museums can start by sharing what they know about their holdings at a higher level, long before each object has its own record. Information about what is held in collections worldwide is needed by many stakeholders including collections managers, funders, researchers, policy-makers, industry, and educators. To aggregate this information from collections, the data need to be standardised (Johnston and Robinson 2002). 
So, the Biodiversity Information Standards (TDWG) Collection Descriptions (CD) Task Group is developing a data standard for describing collections, which gives the ability to provide: automated metrics, using standardised collection descriptions and/or data derived from specimen datasets (e.g., counts of specimens), and a global registry of physical collections (i.e., digitised or non-digitised). Outputs will include a data model to underpin the new standard, and guidance and reference implementations for the practical use of the standard in institutional and collaborative data infrastructures. The Task Group employs a community-driven approach to standard development. With international participation, workshops at the Natural History Museum (London 2019) and the MOBILISE workshop (Warsaw 2020) allowed over 50 people to contribute to this work. Our group organized online "barbecues" (BBQs) so that many more could contribute to standard definitions and address data model design challenges. Cloud-based tools (e.g., GitHub, Google Sheets) are used to organise and publish the group's work and make it easy to participate. A Wikibase instance is also used to test and demonstrate the model using real data. There are a range of global, regional, and national initiatives interested in the standard (see Task Group charter). Some, like GRSciColl (now at the Global Biodiversity Information Facility (GBIF)), Index Herbariorum (IH), and the iDigBio US Collections List are existing catalogues. Others, including the Consortium of European Taxonomic Facilities (CETAF) and the Distributed System of Scientific Collections (DiSSCo), include collection descriptions as a key part of their near-term development plans. 
As part of the EU-funded SYNTHESYS+ project, GBIF organized a virtual workshop: Advancing the Catalogue of the World's Natural History Collections to get international input for such a resource that would use this CD standard. Some major complexities present themselves in designing a standardised approach to represent collection descriptions data. It is not the first time that the natural science collections community has tried to address them (see the TDWG Natural Collections Description standard). Beyond natural sciences, the library community in particular gave thought to this (Heaney 2001, Johnston and Robinson 2002), noting significant difficulties. One hurdle is that collections may be broken down into different degrees of granularity according to different criteria, and may also overlap so that a single object can be represented in more than one collection description. Managing statistics such as numbers of objects is complex due to data gaps and variable degrees of certainty about collection contents. It also takes considerable effort from collections staff to generate structured data about their undigitised holdings. We need to support simple, high-level collection summaries as well as detailed quantitative data, and to be able to update as needed. We need a simple approach, but one that can also handle the complexities of data, scope, and social needs, for digitised and undigitised collections. The data standard itself is a defined set of classes and properties that can be used to represent groups of collection objects and their associated information. These incorporate common characteristics ('dimensions') by which we want to describe, group and break down our collections, metrics for quantifying those collections, and properties such as persistent identifiers for tracking collections and managing their digital counterparts. Existing terms from other standards (e.g. Darwin Core, ABCD) are re-used if possible. The data model (Fig. 1) underpinning the standard defines the relationships between those different classes, and ensures that the structure as well as the content are comparable across different datasets. It centres around the core concept of an 'object group', representing a set of physical objects that is defined by one or more dimensions (e.g., taxonomy and geographic origin), and linked to other entities such as the holding institution. To the object group, quantitative data about its contents are attached (e.g. counts of objects or taxa), along with more qualitative information describing the contents of the group as a whole. In this presentation, we will describe the draft standard and data model with examples of early adoption for real-world and example data. We will also discuss the vision of how the new standard may be adopted and its potential impact on collection discoverability across the collections community.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
ResearchGate Link: https://www.researchgate.net/project/MOBILISE-COST-Action-CA17106-Mobilising-Data-Policies-and-Experts-in-Scientific-Collections