2022
Arnald Marcer, Arthur D. Chapman, John R. Wieczorek, F. Xavier Picó, Francesc Uribe, John Waller, Arturo H. Ariño: Uncertainty matters: ascertaining where specimens in natural history collections come from and its implications for predicting species distributions. Journal Article. Ecography, vol. 2022, 2022. ISSN: 0906-7590, 1600-0587. DOI: 10.1111/ecog.06025. URL: https://onlinelibrary.wiley.com/doi/epdf/10.1111/ecog.06025
Tags: ecological niche modelling (ENM), ecological research, GBIF, georeferencing, natural history collections, preserved specimens, species distribution modelling (SDM), uncertainty

Abstract: Natural history collections (NHCs) represent an enormous and largely untapped wealth of information on the Earth's biota, made available through GBIF as digital preserved specimen records. Precise knowledge of where specimens were collected is paramount to rigorous ecological studies, especially in the field of species distribution modelling. Here, we present a first comprehensive analysis of georeferencing quality for all preserved specimen records served by GBIF, and illustrate the impact that coordinate uncertainty may have on predicted potential distributions. We used all GBIF preserved specimen records to analyse the availability of coordinates and associated spatial uncertainty across geography, spatial resolution, taxonomy, publishing institutions and collection time. We used three plant species across their native ranges in different parts of the world to show the impact of uncertainty on predicted potential distributions. We found that 38% of the 180+ million records provide coordinates only, and 18% provide both coordinates and uncertainty. Georeferencing quality is determined more by the country of collection and publication than by taxonomic group: distinct georeferencing practices matter more than the implicit characteristics and georeferencing difficulty of the specimens themselves. Availability and quality of records contrast across world regions. Uncertainty values are not normally distributed but peak at very distinct values, which can be traced back to specific regions of the world. Uncertainty leads to a wide spectrum of range sizes when modelling species distributions, potentially affecting conclusions in biogeographical and climate change studies. In summary, the digitised fraction of the world's NHCs is far from optimal in terms of georeferencing, and quality mainly depends on where the collections are hosted. A collective effort between the communities around NHC institutions, ecological research and data infrastructure is needed to bring the data on a par with their importance and relevance for ecological research.
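The headline figures above (38% of records with coordinates only, 18% with both coordinates and uncertainty) were computed over the full GBIF preserved-specimen corpus. As a small-scale illustration of the same kind of completeness check, the sketch below pulls one page of preserved-specimen records from GBIF's public occurrence API and counts how many carry coordinates and an explicit coordinateUncertaintyInMeters value. The endpoint, parameters and Darwin Core field names are real; the single-page sampling is illustrative only, not the authors' pipeline.

```python
# A small-scale illustration, not the paper's analysis pipeline: sample one
# page of preserved-specimen records from the public GBIF occurrence API and
# count how many carry coordinates and an explicit uncertainty value.
import requests

API = "https://api.gbif.org/v1/occurrence/search"
params = {"basisOfRecord": "PRESERVED_SPECIMEN", "limit": 300}  # 300 = page maximum
records = requests.get(API, params=params, timeout=30).json()["results"]

with_coords = [r for r in records
               if "decimalLatitude" in r and "decimalLongitude" in r]
with_uncertainty = [r for r in with_coords
                    if "coordinateUncertaintyInMeters" in r]

print(f"sampled records:              {len(records)}")
print(f"with coordinates:             {len(with_coords)}")
print(f"with coordinates+uncertainty: {len(with_uncertainty)}")
```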
Arnald Marcer, Agustí Escobar, Víctor Garcia-Font, Francesc Uribe: Ali-Bey - an open collaborative georeferencing web application. Journal Article. Biodiversity Data Journal, 10, e81282, 2022. DOI: 10.3897/BDJ.10.e81282
Tags: collaborative database, digital specimens, georeferencing, natural history collections, site name versioning, traceability, web application
2021
Arnald Marcer, Quentin Groom, Elspeth Haston, Francesc Uribe: Natural History Collections Georeferencing Survey Report. Current georeferencing practices across institutions worldwide. Report. Zenodo, 2021. DOI: 10.5281/zenodo.4644529
Tags: Botanical Garden, Data quality, georeferencing, Herbaria, Natural History Collections (NHC), Natural History Institutions Museum, Uncertainty
2020
Arnald Marcer, Elspeth Haston, Quentin Groom, Arturo H. Ariño, Arthur D. Chapman, Torkild Bakken, Paul Braun, Mathias Dillen, Marcus Ernst, Agustí Escobar, David Fichtmüller, Laurence Livermore, Nicky Nicolson, Kaloust Paragamian, Deborah Paul, Lars B. Pettersson, Sarah Phillips, Jack Plummer, Heimo Rainer, Isabel Rey, Tim Robertson, Dominik Röpert, Joaquim Santos, Francesc Uribe, John Waller, John R. Wieczorek: Quality issues in georeferencing: From physical collections to digital data repositories for ecological research. Journal Article. Diversity and Distributions, 2020. DOI: 10.1111/ddi.13208
Tags: eco-evolutionary research, georeferencing, global biodiversity information facility, natural history collections, uncertainty workshop
Helen Hardy, Sandra Knapp, Louise E. Allan, Frederik Berger, Katherine Dixey, Bernadette Döme, Pierre-Yves Gagnier, Jiri Frank, Elspeth Margaret Haston, Joachim Holstein, Steffen Kiel, Maria Marschler, Patricia Mergen, Sarah Phillips, Rivka Rabinovich, Begoña Sanchez Chillón, Martin V. Sorensen, Marco Thines, Maarten Trekels, Robert Vogt, Scott Wilson, Karin Wiltschke-Schrotta: SYNTHESYS+ Virtual Access - Report on the Ideas Call (October to November 2019). Journal Article. Research Ideas and Outcomes, 6, e50354, Pensoft Publishers, 2020. DOI: 10.3897/rio.6.e50354
Tags: access, collaboration, digital data, digitisation, digitization, natural history collections, virtual data

Abstract: The SYNTHESYS consortium has been operational since 2004 and has facilitated physical access by individual researchers to European natural history collections through its Transnational Access (TA) programme. For the first time, SYNTHESYS+ will be offering virtual access to collections through digitisation, with two calls for the programme, the first in 2020 and the second in 2021. The Virtual Access (VA) programme is not a direct digital parallel of Transnational Access: proposals for collections digitisation will be prioritised and carried out based on community demand, and data must be made openly available immediately. A key feature of Virtual Access is that, unlike TA, it does not select the researchers to whom access is provided. Because Virtual Access is in this way new to the community and to the collections-holding institutions, the SYNTHESYS+ consortium invited ideas through an Ideas Call, which opened on 7th October 2019 and closed on 22nd November 2019, in order to assess interest and to trial procedures. This report is intended to provide feedback to those who participated in the Ideas Call and to help all applicants to the first SYNTHESYS+ Virtual Access Call, launched on 20th February 2020.
Janno Harjes, Anton Link, Tanja Weibulat, Dagmar Triebel, Gerhard Rambold: FAIR digital objects in environmental and life sciences should comprise workflow operation design data and method information for repeatability of study setups and reproducibility of results. Journal Article. Database, vol. 2020, 2020 (article baaa059). ISSN: 1758-0463. DOI: 10.1093/database/baaa059

Abstract: Repeatability of study setups and reproducibility of research results from the underlying data are major requirements in science. Until now, abstract models for describing the structural logic of studies in the environmental sciences have been lacking, and tools for data management are insufficient. Repeatability and reproducibility require sophisticated data management solutions that go beyond data file sharing; in particular, they imply the maintenance of coherent data along workflows. Design data concern elements from the elementary domains of operations: transformation, measurement and transaction. Operation design elements and method information are specified for each consecutive workflow segment, from field to laboratory campaigns. The strict linkage of operation design element values, operation values and objects is essential. To enable coherence of corresponding objects along consecutive workflow segments, the assignment of unique identifiers and the specification of their relations are mandatory. The abstract model presented here addresses these aspects, and the software DiversityDescriptions (DWB-DD) facilitates the management of digital data objects and structures connected in this way. DWB-DD allows for an individual specification of operation design elements and their linking to objects. Two workflow design use cases are given, one for DNA barcoding and another for the cultivation of fungal isolates. To publish such structured data, standard schema mapping and XML provision of digital objects are essential. Schemas useful for this mapping include the Ecological Metadata Language, the Schema for Meta-omics Data of Collection Objects and the Standard for Structured Descriptive Data. Data pipelines with DWB-DD include mapping and conversion between schemas, and functions for data publishing and archiving according to the Open Archival Information System standard. The setting allows for repeatability of study setups and reproducibility of study results, and supports work groups in structuring and maintaining their data from the beginning of a study. The theory of 'FAIR++' digital objects is introduced.
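The core requirement here, unique identifiers plus explicit relations so that objects stay coherent across consecutive workflow segments, can be shown with a toy model. The sketch below is not the DiversityDescriptions (DWB-DD) data model; class and field names are invented for illustration.

```python
# A minimal sketch, not the DWB-DD data model: each digital object carries a
# unique identifier and an explicit derivation link, so material remains
# traceable across consecutive workflow segments (field -> lab -> sequencing).
import uuid
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    label: str
    segment: str                      # e.g. "field campaign", "DNA extraction"
    pid: str = field(default_factory=lambda: uuid.uuid4().urn)
    derived_from: str | None = None   # PID of the upstream object, if any

sample  = DigitalObject("soil sample S-017", "field campaign")
extract = DigitalObject("DNA extract of S-017", "DNA extraction",
                        derived_from=sample.pid)
barcode = DigitalObject("ITS barcode sequence", "sequencing",
                        derived_from=extract.pid)

# Walk the derivation chain back to the original field object.
by_pid = {o.pid: o for o in (sample, extract, barcode)}
obj = barcode
while obj.derived_from:
    obj = by_pid[obj.derived_from]
print("ultimate source:", obj.label)
```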
Wouter Addink, Alex R. Hardisty: 'openDS' - Progress on the New Standard for Digital Specimens. Journal Article. Biodiversity Information Science and Standards, 4, e59338, Pensoft Publishers, 2020. DOI: 10.3897/biss.4.59338

Abstract: In a Biodiversity_Next 2019 symposium, a vision of Digital Specimens based on the concept of a Digital Object Architecture (DOA) (Kahn and Wilensky 2006) was discussed as a new layer between the data infrastructure of natural science collections and user applications for processing and interacting with information about specimens and collections. This vision would enable the transformation of institutional curatorial practices into joint community curation of the scientific data, by providing seamless global access to specimens and collections spanning multiple collection-holding institutions and sources. A DOA-based implementation (Lannom et al. 2020) also offers wider, more flexible and 'FAIR' (Findable, Accessible, Interoperable, Reusable) access for varied research and policy uses: recognising curatorial work, annotating with the latest taxonomic treatments, understanding variations, working with DNA sequences or chemical analyses, supporting regulatory processes for health, food, security, sustainability and environmental change, inventions and products critical to the bio-economy, and educational uses. To make this vision a reality, a specification is needed that describes what a Digital Specimen is and how to implement it technically. This specification is named 'openDS', for open Digital Specimen. It needs to describe how machines and humans can act on a Digital Specimen and gain attribution for their work; how the data can be serialised and packaged; and the object model (the scientific content part and its structure). The object model should describe how to include the specimen data itself as well as all data derived from the specimen, which is in principle the same as what the Extended Specimen model aims to describe; this part will therefore be developed in close collaboration with the people working on that model. After the Biodiversity_Next symposium, the idea of a standard for Digital Specimens was further discussed and detailed in a MOBILISE workshop in Warsaw, 2020, with stakeholders such as GBIF, iDigBio, CETAF and DiSSCo. The workshop examined the technical basis of the new specification, agreed on its scope and structure, and laid the groundwork for future activities in the Research Data Alliance (RDA), Biodiversity Information Standards (TDWG) and technical workshops. A working group in the DiSSCo Prepare project has begun work on the technical specification of the open Digital Specimen (openDS). This specification will provide the definition of what a Digital Specimen is, its logical structure and content, and the operations permitted on it. The group is also working on a document with frequently asked questions. Realising the vision of the Digital Specimen at a global level requires openDS to become a new TDWG standard and to be aligned with the vision for Extended Specimens. A TDWG Birds-of-a-Feather working session in September 2020 discusses and plans this further. The object model will include concepts from ABCD 3.0 and its EFG extension for the geosciences, and will also extend from bco:MaterialSample in the OBO Foundry's Biological Collection Ontology (BCO), which is linked to Darwin Core, and from iao:InformationContentEntity in the OBO Foundry's Information Artifact Ontology (IAO). openDS will also make use of the RDA/TDWG attribution metadata recommendation and other RDA recommendations. A publication is in preparation that describes the relationship with RDA recommendations in more detail, and it will also be presented in the TDWG symposium.
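Since openDS was still being specified at the time of this abstract, the field names below are hypothetical, not the standard's. The sketch only illustrates the general shape the abstract describes: a persistently identified object that packages the specimen's own data together with links to data derived from it, plus a place for community annotations.

```python
# Illustrative only: a hypothetical serialization of a Digital Specimen in the
# spirit of the openDS vision. All field names and identifier values below are
# invented for the example, not taken from the draft specification.
import json

digital_specimen = {
    "id": "20.5000.1025/ABC-123",            # handle-style PID (hypothetical value)
    "type": "DigitalSpecimen",
    "specimen": {
        "scientificName": "Arabidopsis thaliana",
        "physicalSpecimenId": "BC-912345",   # catalogue number in the holding institution
        "institution": "https://ror.org/example",  # placeholder institution identifier
    },
    "derivedData": [                          # data derived from the physical specimen
        {"type": "DNASequence", "id": "https://example.org/sequences/EX123"},
        {"type": "Image", "id": "https://example.org/media/912345.jpg"},
    ],
    "annotations": [],                        # community curation attaches here
}
print(json.dumps(digital_specimen, indent=2))
```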
Matt Woodburn, Deborah L. Paul, Wouter Addink, Steven J. Baskauf, Stanley Blum, Cat Chapman, Sharon Grant, Quentin Groom, Janeen Jones, Mareike Petersen, Niels Raes, David Smith, Laura Tilley, Maarten Trekels, Michael Trizna, William Ulate, Sarah Vincent, Ramona Walls, Kate Webbink, Paula Zermoglio: Unity in Variety: Developing a collection description standard by consensus. Journal Article. Biodiversity Information Science and Standards, 4, e59233, Pensoft Publishers, 2020. DOI: 10.3897/biss.4.59233

Abstract: Digitisation and publication of museum specimen data is happening worldwide, but is far from complete. Museums can start by sharing what they know about their holdings at a higher level, long before each object has its own record. Information about what is held in collections worldwide is needed by many stakeholders, including collections managers, funders, researchers, policy-makers, industry and educators. To aggregate this information from collections, the data need to be standardised (Johnston and Robinson 2002). The Biodiversity Information Standards (TDWG) Collection Descriptions (CD) Task Group is therefore developing a data standard for describing collections, which gives the ability to provide: automated metrics, using standardised collection descriptions and/or data derived from specimen datasets (e.g., counts of specimens); and a global registry of physical collections (digitised or non-digitised). Outputs will include a data model to underpin the new standard, and guidance and reference implementations for the practical use of the standard in institutional and collaborative data infrastructures. The Task Group employs a community-driven approach to standard development. With international participation, workshops at the Natural History Museum (London, 2019) and the MOBILISE workshop (Warsaw, 2020) allowed over 50 people to contribute to this work. Our group organised online "barbecues" (BBQs) so that many more could contribute to standard definitions and address data model design challenges. Cloud-based tools (e.g., GitHub, Google Sheets) are used to organise and publish the group's work and make it easy to participate. A Wikibase instance is also used to test and demonstrate the model using real data. A range of global, regional and national initiatives are interested in the standard (see the Task Group charter). Some, like GRSciColl (now at the Global Biodiversity Information Facility, GBIF), Index Herbariorum (IH) and the iDigBio US Collections List, are existing catalogues. Others, including the Consortium of European Taxonomic Facilities (CETAF) and the Distributed System of Scientific Collections (DiSSCo), include collection descriptions as a key part of their near-term development plans. As part of the EU-funded SYNTHESYS+ project, GBIF organised a virtual workshop, Advancing the Catalogue of the World's Natural History Collections, to get international input for such a resource, which would use this CD standard. Some major complexities present themselves in designing a standardised approach to represent collection description data. It is not the first time that the natural science collections community has tried to address them (see the TDWG Natural Collections Description standard), and beyond the natural sciences, the library community in particular has given this thought (Heaney 2001, Johnston and Robinson 2002), noting significant difficulties. One hurdle is that collections may be broken down into different degrees of granularity according to different criteria, and may also overlap, so that a single object can be represented in more than one collection description. Managing statistics such as numbers of objects is complex due to data gaps and variable degrees of certainty about collection contents. It also takes considerable effort from collections staff to generate structured data about their undigitised holdings. We need to support simple, high-level collection summaries as well as detailed quantitative data, and to be able to update them as needed: a simple approach, but one that can also handle the complexities of data, scope and social needs, for digitised and undigitised collections. The data standard itself is a defined set of classes and properties that can be used to represent groups of collection objects and their associated information. These incorporate common characteristics ('dimensions') by which we want to describe, group and break down our collections; metrics for quantifying those collections; and properties such as persistent identifiers for tracking collections and managing their digital counterparts. Existing terms from other standards (e.g., Darwin Core, ABCD) are re-used where possible. The data model (Fig. 1) underpinning the standard defines the relationships between those different classes and ensures that the structure as well as the content are comparable across different datasets. It centres on the core concept of an 'object group', representing a set of physical objects defined by one or more dimensions (e.g., taxonomy and geographic origin) and linked to other entities such as the holding institution. To the object group, quantitative data about its contents are attached (e.g., counts of objects or taxa), along with more qualitative information describing the contents of the group as a whole. In this presentation, we will describe the draft standard and data model, with examples of early adoption for real-world and example data. We will also discuss how the new standard may be adopted and its potential impact on collection discoverability across the collections community.
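The abstract's core concept, an 'object group' defined by dimensions and carrying metrics, lends itself to a small data-structure sketch. Class and field names below are illustrative, not the final TDWG Collection Descriptions vocabulary, and the data values are invented.

```python
# A minimal sketch of the 'object group' idea: a set of physical objects
# defined by one or more dimensions, with quantitative metrics and a link to
# the holding institution. Names are illustrative, not the CD standard's terms.
from dataclasses import dataclass, field

@dataclass
class ObjectGroup:
    institution: str                  # holding institution (e.g. a collection code)
    dimensions: dict[str, str]        # e.g. taxonomy and geographic origin
    metrics: dict[str, int] = field(default_factory=dict)  # e.g. object counts
    description: str = ""             # qualitative summary of the group

group = ObjectGroup(
    institution="NHMUK",
    dimensions={"taxon": "Lepidoptera", "geographicOrigin": "Western Palearctic"},
    metrics={"objectCount": 250_000, "digitisedCount": 40_000},
    description="Pinned butterflies and moths, partly digitised.",
)
share = group.metrics["digitisedCount"] / group.metrics["objectCount"]
print(f"{share:.0%} of the group is digitised")
```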
2019
Wouter Addink, Dimitrios Koureas, Ana Casino Rubio: DiSSCo as a New Regional Model for Scientific Collections in Europe. Journal Article. Biodiversity Information Science and Standards, 3, e37502, Pensoft Publishers, 2019. DOI: 10.3897/biss.3.37502
Tags: DiSSCo, Europe, natural history, research infrastructures, RI, scientific collections, specimens

Abstract: European natural science collections (NSC) are part of the global natural and cultural capital and represent 80% of the world's bio- and geodiversity. Data derived from these collections underpin thousands of scholarly publications and official reports (used to support legislative and regulatory processes relating to health, food, security, sustainability and environmental change) and have led to inventions and products that today play an important role in our bio-economy. In recent decades, research practice in the natural sciences has changed dramatically. Advances in digital, genomic and information technologies enable natural science collections to provide new insights, but they also call for a change to the current operational and business models of individual collections held at local natural history museums and universities: a new business model that provides unified access to collection objects and to all scientific data derived from them. Although aggregating infrastructures like the Global Biodiversity Information Facility, GenBank and the Catalogue of Life now successfully aggregate specific data classes, the landscape remains fragmented, with limited capacity to bring this information together in a systematic and robust manner and with scattered access to the physical objects. The Distributed System of Scientific Collections (DiSSCo) represents a pan-European initiative, and the largest ever agreement of natural science museums, to jointly address the fragmentation of European collections. DiSSCo is unifying European natural science collections into a coherent new research infrastructure, able to provide bio- and geodiversity data at the scale, form and precision required by a multi-disciplinary user base in science. DiSSCo is harmonising digitisation, curation and publication processes and workflows across the scientific collections in Europe, and enables the linking of occurrence, genomic, chemical and morphological data classes, as well as publications and experts, to the physical object. In this paper we present the socio-cultural and governance aspects of this research infrastructure. DiSSCo is receiving political support from 11 countries in Europe and will gradually change its funding model from institutional to national funding, with temporary funding from the EC to support the preparation and development. Solutions to achieve large-scale digitisation are currently being designed in the EC-funded ICEDIG project, to underpin the future large-scale digitisation carried out by the countries. Unified virtual (digitisation on demand) and transnational physical access to the collections is being developed over the next four years in the EC-funded SYNTHESYS+ project. The governance of DiSSCo is designed to change gradually from a steering committee, composed of a few large natural history museums contributing in cash to initiate the development, into a legal entity in which national consortia are represented, with a central coordination office for daily management. Each country individually decides how its entities (scientific collection facilities, research councils, governmental bodies) are organised in its national consortium. A stakeholder and user forum, a Scientific Advisory Board and an International Advisory Board will ensure that DiSSCo enables science across disciplines and within the international landscape of infrastructures. Training and short scientific missions are being developed in the MOBILISE COST Action, to build capacity in FAIR data production, publication and usage of scientific-collection-derived data in Europe and to initiate the socio-cultural changes needed in the collection-holding institutes. A helpdesk is being constructed in the SYNTHESYS+ and DiSSCo Prepare projects to further facilitate use, and scientific use cases have been collected in ICEDIG to develop and facilitate e-services tailored to scientific needs.
Ana Casino, Niels Raes, Wouter Addink, Matt Woodburn: Collections Digitization and Assessment Dashboard, a Tool for Supporting Informed Decisions. Journal Article. Biodiversity Information Science and Standards, 3, e37505, Pensoft Publishers, 2019. DOI: 10.3897/biss.3.37505
Tags: alignment, biodiversity and geodiversity, dashboard, digitization, DiSSCo, high-level information, informed decision-making, institutional description, mechanisms, natural science collections, research infrastructure, tools, visualization

Abstract: Natural science collections (NSCs) contain specimen-related data from which we extract valuable information for science and policy. The openness of those collections facilitates the development of science. Moreover, virtual accessibility to the physical containers by means of their digitization will allow an exponential increase in the level of available information. Digitization of collections will allow us to set up a comprehensive registry of reliable, accurate, up-to-date, comparable and interconnected information. Equally, the scope of interested potential users will expand greatly, and so will the different levels of granularity required by researchers, institutions and governmental bodies. Meeting diverse needs entails a special effort in data management and data analysis to extract, digest and present information in a compressed but still precise and objective-oriented format. The Collections Digitisation Dashboard (CDD) underpins such an attempt. The CDD stands as a practical tool that specifically aims to support high-level decisions with a wide coverage of data, by providing a visual, simplified and structured arrangement that allows the discovery of key indicators concerning the digitization of bio- and geodiversity collections. The realm of possible approaches to the CDD covers levels of digitization, collection exceptionality, resource availability and many others; still, all those different angles need to be aligned and processed at once to provide an overall overview of the status of NSCs in the digitization process and to analyse its further development. The CDD is a powerful mechanism to identify priorities, specialisation lines together with regional development, gaps, niches and future capabilities, as well as strengths and weaknesses across collections, institutions, countries and regions. It can underpin measurable and comparable assessments, with evolution indexes and progress indicators, all under an overarching homogeneous approach. The Distributed System of Scientific Collections (DiSSCo) research infrastructure, currently in its preparatory phase, is built on top of the largest ever community of collections-related institutions across Europe and anchored in the Consortium of European Taxonomic Facilities (CETAF). It aims to provide a unique virtual access point to NSCs by facilitating a large and massive digitisation effort throughout Europe. Setting priorities and specialization areas is pivotal to its success. To that end, the DiSSCo CDD will provide a valuation tool to summarize and showcase NSCs' digitization status in a first-hand visualization. Different projects and initiatives will contribute, jointly and synergetically, to the production of the DiSSCo CDD. The ICEDIG project will address its basic features, terms of classification and tiers of information, and will produce a prototype and a set of recommendations on how best to attempt a massive dashboard by collating specific collections-based information and defining global strategic representations. CETAF working groups on collections and digitization will provide the desired homogeneity in describing and capturing the different implementation requirements from the users' perspectives, complemented by contributions made under the umbrella of the COST Action MOBILISE. The Action will use networking activities to identify the right standards and policies to enlarge the scope of the DiSSCo CDD and broaden its implementation by linking to the TDWG criteria and adopted standards. Complementarily, the ELViS platform, to be developed under the SYNTHESYS+ project, will provide the right virtual environment. Furthermore, SYNTHESYS+ will address the assessment capabilities of the CDD, enabling the visual representation to become a practical assessment mechanism and endowing it with a dynamic feature for analysis over time. The DiSSCo CDD will thus become an instrumental mechanism for decision-making, embedded into the clustering initiative of products and services provided to the EOSC by the ENVRI-FAIR project in the environmental domain.
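As a purely illustrative calculation (not the CDD itself), the sketch below shows the kind of progress indicator such a dashboard could surface: digitisation counts rolled up from collection to institution level. All data values are invented for the example.

```python
# Illustrative only: roll up invented per-collection digitisation counts into
# an institution-level progress indicator of the kind a dashboard could show.
import pandas as pd

collections = pd.DataFrame([
    {"institution": "Museum A", "collection": "Herbarium",  "objects": 500_000,   "digitised": 120_000},
    {"institution": "Museum A", "collection": "Entomology", "objects": 2_000_000, "digitised": 150_000},
    {"institution": "Museum B", "collection": "Geology",    "objects": 80_000,    "digitised": 60_000},
])

per_institution = collections.groupby("institution")[["objects", "digitised"]].sum()
per_institution["progress"] = (per_institution["digitised"] / per_institution["objects"]).round(3)
print(per_institution)
```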
B Georgiev, Boyko; Casino, Ana; Voreadou, Catherina Training Taxonomists for the Digital World: Are we prepared? Journal Article Biodiversity Information Science and Standards, 3 , pp. e36106, 2019. Abstract | Links | BibTeX | Tags: capacity building, Digital knowledge, digital skills, DiSSCo, education, MOBILISE, natural history collections, research, taxonomy, training, young researchers @article{10.3897/biss.3.36106, title = {Training Taxonomists for the Digital World: Are we prepared?}, author = {Boyko B Georgiev and Ana Casino and Catherina Voreadou}, url = {https://doi.org/10.3897/biss.3.36106}, doi = {10.3897/biss.3.36106}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e36106}, publisher = {Pensoft Publishers}, abstract = {Digital knowledge and skills are rapidly becoming integral part of the work of the modern taxonomist. Their importance is further increased with the recent recognition of DiSSCo (Distributed System of Scientific Collections, https://dissco.eu). This new pan-European research infrastructure envisions placing European natural science collections at the centre of data-intensive scientific excellence and innovation for taxonomic and environmental research, food security, health and the bioeconomy. The mission of this ambitious project is to mobilise, unify and deliver bio- and geo-diversity information at the scale, form and precision required by scientific communities as well as to transform a fragmented landscape into a coherent and responsive research infrastructure. An important step in improving the capacity of the research community underpinning DiSSCo is the COST Action MOBILISE (Mobilising Data, Policies and Experts in Scientific Collections, https://www.mobilise-action.eu). One of major capacity-building objectives is to facilitate implementation of common standards and newly-developed techniques by training and education. Its achievement is envisaged by standardised training modules such as training courses, workshops, webinars, online tutorials and short-term visits to other research units. The first impression from surveying interests of candidates to be included into training events, demonstrates an uneven distribution of digital knowledge and skills across countries, institutions and generations. We advocate that a massive coordinated training programme may result in more efficient establishment of common standards and, consequently, better implementation of the forthcoming joint efforts in the development of the new pan-European research infrastricture.}, keywords = {capacity building, Digital knowledge, digital skills, DiSSCo, education, MOBILISE, natural history collections, research, taxonomy, training, young researchers}, pubstate = {published}, tppubtype = {article} } Digital knowledge and skills are rapidly becoming integral part of the work of the modern taxonomist. Their importance is further increased with the recent recognition of DiSSCo (Distributed System of Scientific Collections, https://dissco.eu). This new pan-European research infrastructure envisions placing European natural science collections at the centre of data-intensive scientific excellence and innovation for taxonomic and environmental research, food security, health and the bioeconomy. 
The mission of this ambitious project is to mobilise, unify and deliver bio- and geo-diversity information at the scale, form and precision required by scientific communities, as well as to transform a fragmented landscape into a coherent and responsive research infrastructure. An important step in improving the capacity of the research community underpinning DiSSCo is the COST Action MOBILISE (Mobilising Data, Policies and Experts in Scientific Collections, https://www.mobilise-action.eu). One of its major capacity-building objectives is to facilitate the implementation of common standards and newly developed techniques through training and education. Its achievement is envisaged through standardised training modules such as training courses, workshops, webinars, online tutorials and short-term visits to other research units. First impressions from surveying the interests of candidates for training events demonstrate an uneven distribution of digital knowledge and skills across countries, institutions and generations. We advocate that a massive coordinated training programme may result in a more efficient establishment of common standards and, consequently, better implementation of the forthcoming joint efforts in the development of the new pan-European research infrastructure. |
Groom, Quentin J.; Besombes, Chloé; Brown, Josh; Chagnoux, Simon; Georgiev, Teodor; Kearney, Nicole; Marcer, Arnald; Nicolson, Nicky; Page, Roderic; Phillips, Sarah; Rainer, Heimo; Riccardi, Greg; Röpert, Dominik; Shorthouse, David Peter; Stoev, Pavel; Haston, Elspeth Margaret Progress in Authority Management of People Names for Collections Journal Article Biodiversity Information Science and Standards, 3 , pp. e35074, 2019. Abstract | Links | BibTeX | Tags: authority file, identifier, ORCID, PID, VIAF, Wikidata @article{10.3897/biss.3.35074, title = {Progress in Authority Management of People Names for Collections}, author = {Quentin J. Groom and Chloé Besombes and Josh Brown and Simon Chagnoux and Teodor Georgiev and Nicole Kearney and Arnald Marcer and Nicky Nicolson and Roderic Page and Sarah Phillips and Heimo Rainer and Greg Riccardi and Dominik Röpert and David Peter Shorthouse and Pavel Stoev and Elspeth Margaret Haston}, url = {https://doi.org/10.3897/biss.3.35074}, doi = {10.3897/biss.3.35074}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e35074}, publisher = {Pensoft Publishers}, abstract = {The concept of building a network of relationships between entities, a knowledge graph, is one of the most effective methods to understand the relations between data. By organizing data, we facilitate the discovery of complex patterns not otherwise evident in the raw data. Each datum at the nodes of a knowledge graph needs a persistent identifier (PID) to reference it unambiguously. In the biodiversity knowledge graph, people are key elements (Page 2016). They collect and identify specimens, they publish, observe, work with each other and they name organisms. Yet biodiversity informatics has been slow to adopt PIDs for people, and people are currently represented in collection management systems as text strings in various formats. These text strings often do not separate individuals within a collecting team, and little biographical information is collected to disambiguate collectors. In March 2019 we organised an international workshop to find solutions to the problem of PIDs for people in collections, with the aim of identifying people unambiguously across the world's natural history collections in all of their various roles. Stakeholders from 11 countries were represented, including libraries, collections, publishers, developers and name registers. We want to identify people for many reasons. Cross-validation of information about a specimen with biographical information on the specimen can be used to clean data. Mapping specimens from individual collectors across multiple herbaria can geolocate specimens accurately. By linking literature to specimens through their authors and collectors we can create collaboration networks, leading to a much better understanding of the scientific contribution of collectors and their institutions. For taxonomists, it will be easier to identify nomenclatural type and syntype material, essential for reliable typification. Overall, it will mean that geographically dispersed specimens can be treated much more like a single distributed infrastructure of specimens, as is envisaged in the European Distributed System of Scientific Collections Infrastructure (DiSSCo). There are several person identifier systems in use. For example, the Virtual International Authority File (VIAF) is a widely used system for published authors.
The International Standard Name Identifier (ISNI) has broader scope and incorporates VIAF. The ORCID identifier system provides self-registration of living researchers. Also, Wikidata has identifiers of people, which have the advantage of being easy to add to and correct. There are also national systems, such as the French and German authority files, and considerable sharing of identifiers, particularly on Wikidata. This creates an integrated network of identifiers that could act as a brokerage system. Attendees agreed that no single identifier system should be recommended; however, some are more appropriate for particular circumstances. Some difficulties still have to be resolved before these identifier schemes can be used for biodiversity: 1) duplicate entries in the same identifier system; 2) handling collector teams and preserving the order of collectors; 3) how to integrate identifiers with standards such as Darwin Core and ABCD, and in the Global Biodiversity Information Facility; and 4) many living and dead collectors are only known from their specimens and so they may not pass notability standards required by many authority systems. The participants of the workshop are now working on a number of fronts to make progress on the adoption of PIDs for people in collections. This includes extending pilots that have already been trialed, working with identifier systems to make them more suitable for specimen collectors, and talking to service providers to encourage them to use ORCID iDs to identify their users. It was concluded that resolving the problem of person identifiers for collections is largely not a lack of a solution, but a need to implement solutions that already exist.}, keywords = {authority file, identifier, ORCID, PID, VIAF, Wikidata}, pubstate = {published}, tppubtype = {article} } The concept of building a network of relationships between entities, a knowledge graph, is one of the most effective methods to understand the relations between data. By organizing data, we facilitate the discovery of complex patterns not otherwise evident in the raw data. Each datum at the nodes of a knowledge graph needs a persistent identifier (PID) to reference it unambiguously. In the biodiversity knowledge graph, people are key elements (Page 2016). They collect and identify specimens, they publish, observe, work with each other and they name organisms. Yet biodiversity informatics has been slow to adopt PIDs for people, and people are currently represented in collection management systems as text strings in various formats. These text strings often do not separate individuals within a collecting team, and little biographical information is collected to disambiguate collectors. In March 2019 we organised an international workshop to find solutions to the problem of PIDs for people in collections, with the aim of identifying people unambiguously across the world's natural history collections in all of their various roles. Stakeholders from 11 countries were represented, including libraries, collections, publishers, developers and name registers. We want to identify people for many reasons. Cross-validation of information about a specimen with biographical information on the specimen can be used to clean data. Mapping specimens from individual collectors across multiple herbaria can geolocate specimens accurately.
By linking literature to specimens through their authors and collectors we can create collaboration networks, leading to a much better understanding of the scientific contribution of collectors and their institutions. For taxonomists, it will be easier to identify nomenclatural type and syntype material, essential for reliable typification. Overall, it will mean that geographically dispersed specimens can be treated much more like a single distributed infrastructure of specimens, as is envisaged in the European Distributed System of Scientific Collections Infrastructure (DiSSCo). There are several person identifier systems in use. For example, the Virtual International Authority File (VIAF) is a widely used system for published authors. The International Standard Name Identifier (ISNI) has broader scope and incorporates VIAF. The ORCID identifier system provides self-registration of living researchers. Also, Wikidata has identifiers of people, which have the advantage of being easy to add to and correct. There are also national systems, such as the French and German authority files, and considerable sharing of identifiers, particularly on Wikidata. This creates an integrated network of identifiers that could act as a brokerage system. Attendees agreed that no single identifier system should be recommended; however, some are more appropriate for particular circumstances. Some difficulties still have to be resolved before these identifier schemes can be used for biodiversity: 1) duplicate entries in the same identifier system; 2) handling collector teams and preserving the order of collectors; 3) how to integrate identifiers with standards such as Darwin Core and ABCD, and in the Global Biodiversity Information Facility; and 4) many living and dead collectors are only known from their specimens and so they may not pass notability standards required by many authority systems. The participants of the workshop are now working on a number of fronts to make progress on the adoption of PIDs for people in collections. This includes extending pilots that have already been trialed, working with identifier systems to make them more suitable for specimen collectors, and talking to service providers to encourage them to use ORCID iDs to identify their users. It was concluded that resolving the problem of person identifiers for collections is largely not a lack of a solution, but a need to implement solutions that already exist. |
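The brokerage idea described in this abstract, resolving a collector's name string against interlinked identifier systems, can be illustrated with a short script. The sketch below is a hypothetical example (not code from the paper): it queries the public Wikidata SPARQL endpoint for humans carrying a given name label and returns any linked ORCID (property P496) and VIAF (property P214) identifiers; the endpoint URL and property IDs are standard Wikidata conventions.

# Minimal sketch: look up candidate persons for a collector name on
# Wikidata. Hypothetical example; P496 = ORCID iD, P214 = VIAF ID.
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

def candidates_for_collector(name: str):
    """Return Wikidata items labelled `name`, with ORCID/VIAF when present."""
    query = """
    SELECT ?person ?personLabel ?orcid ?viaf WHERE {
      ?person wdt:P31 wd:Q5;                 # instance of: human
              rdfs:label "%s"@en.
      OPTIONAL { ?person wdt:P496 ?orcid. }  # ORCID iD
      OPTIONAL { ?person wdt:P214 ?viaf. }   # VIAF ID
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    """ % name
    url = SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    req = urllib.request.Request(url, headers={"User-Agent": "pid-demo/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [{k: b[k]["value"] for k in b}
            for b in data["results"]["bindings"]]

# A real collector string usually needs several such lookups
# (exact label, aliases, expanded initials) before a confident match.
for hit in candidates_for_collector("Pierre Edmond Boissier"):
    print(hit)

An exact-label query like this is only the first step; the duplicate-entry and team-handling difficulties listed in the abstract are precisely what such a lookup cannot resolve on its own.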
Frick, Holger; Stieger, Pia A; Scheidegger, Christoph SwissCollNet – A National Initiative for Natural History Collections in Switzerland Journal Article Biodiversity Information Science and Standards, 3 , pp. e37188, 2019. Abstract | Links | BibTeX | Tags: data management, digitisation, educational potential, scientific potential, standards, strategy @article{10.3897/biss.3.37188, title = {SwissCollNet – A National Initiative for Natural History Collections in Switzerland}, author = {Holger Frick and Pia A Stieger and Christoph Scheidegger}, url = {https://doi.org/10.3897/biss.3.37188}, doi = {10.3897/biss.3.37188}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e37188}, publisher = {Pensoft Publishers}, abstract = {More than 60 million specimens are housed in geological and biological collections in numerous museums and botanical gardens located all over Switzerland. They are of national and international origin. Taken together they form an entity with a high scientific value and international recognition for their contribution to scientific research. Due to the federal organisation of Switzerland, natural history collections are located and curated in numerous institutions. So far, no common strategy for digitisation, documentation and long-term data archiving has been developed. This shortcoming has been widely identified by concerned parties. Under the lead of the Swiss Academy of Sciences, several organisations have assembled information about Swiss natural history collections. They identified measures to be taken to promote the scientific and educational potential of natural history collections in Switzerland (Beer et al. 2019). With a national initiative, the Swiss Natural History Collections Network (SwissCollNet) aims to unite Swiss natural history collections under a common vision and with a common strategy. The goal is to promote the collections themselves and to harness the scientific and educational potential of these collections for research and training. SwissCollNet consists of representatives of research, teaching, museums and botanical gardens, the data centers for information on the national fauna and flora, the Swiss Systematics Society and the Swiss node of GBIF, the Global Biodiversity Information Facility. The initiative aims to foster research on natural history collections. It will provide a single decentralised data infrastructure framework for Swiss research related to natural history. It will help to harmonise nationwide collection data management, digitisation and long-term data archiving. It will facilitate identification of specimens and revision of taxonomic groups. New research techniques, fast-evolving computer technologies and internet connectivity create new opportunities for deciphering and using the wealth of information housed in Swiss and international collections. The development of an agreed strategy and research priorities on a national scale will allow fluid and permanent collaboration across all Swiss natural history collections by promoting interoperability and unified access to collections as well as creating opportunities for scientific collaboration and innovation. This national approach will create an internationally compatible research data infrastructure, while respecting and integrating regional and decentralized conditions and requirements.
Thus, it will maximize the impact for science, policy and society.}, keywords = {data management, digitisation, educational potential, scientific potential, standards, strategy}, pubstate = {published}, tppubtype = {article} } More than 60 million specimens are housed in geological and biological collections in numerous museums and botanical gardens located all over Switzerland. They are of national and international origin. Taken together they form an entity with a high scientific value and international recognition for their contribution to scientific research. Due to the federal organisation of Switzerland, natural history collections are located and curated in numerous institutions. So far, no common strategy for digitisation, documentation and long-term data archiving has been developed. This shortcoming has been widely identified by concerned parties. Under the lead of the Swiss Academy of Sciences, several organisations have assembled information about Swiss natural history collections. They identified measures to be taken to promote the scientific and educational potential of natural history collections in Switzerland (Beer et al. 2019). With a national initiative, the Swiss Natural History Collections Network (SwissCollNet) aims to unite Swiss natural history collections under a common vision and with a common strategy. The goal is to promote the collections themselves and to harness the scientific and educational potential of these collections for research and training. SwissCollNet consists of representatives of research, teaching, museums and botanical gardens, the data centers for information on the national fauna and flora, the Swiss Systematics Society and the Swiss node of GBIF, the Global Biodiversity Information Facility. The initiative aims to foster research on natural history collections. It will provide a single decentralised data infrastructure framework for Swiss research related to natural history. It will help to harmonise nationwide collection data management, digitisation and long-term data archiving. It will facilitate identification of specimens and revision of taxonomic groups. New research techniques, fast-evolving computer technologies and internet connectivity create new opportunities for deciphering and using the wealth of information housed in Swiss and international collections. The development of an agreed strategy and research priorities on a national scale will allow fluid and permanent collaboration across all Swiss natural history collections by promoting interoperability and unified access to collections as well as creating opportunities for scientific collaboration and innovation. This national approach will create an internationally compatible research data infrastructure, while respecting and integrating regional and decentralized conditions and requirements. Thus, it will maximize the impact for science, policy and society. |
Ball-Damerow, Joan E; Brenskelle, Laura; Barve, Narayani; Soltis, Pamela S; Sierwald, Petra; Bieler, Rüdiger; LaFrance, Raphael; Ariño, Arturo H; Guralnick, Robert P Research applications of primary biodiversity databases in the digital age Journal Article PLOS ONE, 14 (9), pp. 1-26, 2019. Abstract | Links | BibTeX | Tags: @article{10.1371/journal.pone.0215794, title = {Research applications of primary biodiversity databases in the digital age}, author = {Joan E Ball-Damerow and Laura Brenskelle and Narayani Barve and Pamela S Soltis and Petra Sierwald and Rüdiger Bieler and Raphael LaFrance and Arturo H Ariño and Robert P Guralnick}, url = {https://doi.org/10.1371/journal.pone.0215794}, doi = {10.1371/journal.pone.0215794}, year = {2019}, date = {2019-01-01}, journal = {PLOS ONE}, volume = {14}, number = {9}, pages = {1-26}, publisher = {Public Library of Science}, abstract = {Our world is in the midst of unprecedented change—climate shifts and sustained, widespread habitat degradation have led to dramatic declines in biodiversity rivaling historical extinction events. At the same time, new approaches to publishing and integrating previously disconnected data resources promise to help provide the evidence needed for more efficient and effective conservation and management. Stakeholders have invested considerable resources to contribute to online databases of species occurrences. However, estimates suggest that only 10% of biocollections are available in digital form. The biocollections community must therefore continue to promote digitization efforts, which in part requires demonstrating compelling applications of the data. Our overarching goal is therefore to determine trends in use of mobilized species occurrence data since 2010, as online systems have grown and now provide over one billion records. To do this, we characterized 501 papers that use openly accessible biodiversity databases. Our standardized tagging protocol was based on key topics of interest, including: database(s) used, taxa addressed, general uses of data, other data types linked to species occurrence data, and data quality issues addressed. We found that the most common uses of online biodiversity databases have been to estimate species distribution and richness, to outline data compilation and publication, and to assist in developing species checklists or describing new species. Only 69% of papers in our dataset addressed one or more aspects of data quality, which is low considering common errors and biases known to exist in opportunistic datasets. Globally, we find that biodiversity databases are still in the initial stages of data compilation. Novel and integrative applications are restricted to certain taxonomic groups and regions with higher numbers of quality records. Continued data digitization, publication, enhancement, and quality control efforts are necessary to make biodiversity science more efficient and relevant in our fast-changing environment.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Our world is in the midst of unprecedented change—climate shifts and sustained, widespread habitat degradation have led to dramatic declines in biodiversity rivaling historical extinction events. At the same time, new approaches to publishing and integrating previously disconnected data resources promise to help provide the evidence needed for more efficient and effective conservation and management. 
Stakeholders have invested considerable resources to contribute to online databases of species occurrences. However, estimates suggest that only 10% of biocollections are available in digital form. The biocollections community must therefore continue to promote digitization efforts, which in part requires demonstrating compelling applications of the data. Our overarching goal is therefore to determine trends in use of mobilized species occurrence data since 2010, as online systems have grown and now provide over one billion records. To do this, we characterized 501 papers that use openly accessible biodiversity databases. Our standardized tagging protocol was based on key topics of interest, including: database(s) used, taxa addressed, general uses of data, other data types linked to species occurrence data, and data quality issues addressed. We found that the most common uses of online biodiversity databases have been to estimate species distribution and richness, to outline data compilation and publication, and to assist in developing species checklists or describing new species. Only 69% of papers in our dataset addressed one or more aspects of data quality, which is low considering common errors and biases known to exist in opportunistic datasets. Globally, we find that biodiversity databases are still in the initial stages of data compilation. Novel and integrative applications are restricted to certain taxonomic groups and regions with higher numbers of quality records. Continued data digitization, publication, enhancement, and quality control efforts are necessary to make biodiversity science more efficient and relevant in our fast-changing environment. |
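The standardized tagging protocol described in this abstract (database(s) used, taxa addressed, general uses, linked data types, quality issues) is essentially a structured record per paper, from which trend questions reduce to frequency counts. The sketch below is purely illustrative and not the authors' code; the field names and example values are invented to mirror the protocol's key topics.

# Illustrative sketch of the tagging protocol as a data structure,
# plus a tally answering "what are the most common uses of the data?".
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class TaggedPaper:
    doi: str
    databases: list[str] = field(default_factory=list)    # database(s) used
    taxa: list[str] = field(default_factory=list)         # taxa addressed
    uses: list[str] = field(default_factory=list)         # general uses of data
    linked_data: list[str] = field(default_factory=list)  # other linked data types
    quality_issues: list[str] = field(default_factory=list)

papers = [
    TaggedPaper(doi="10.xxxx/example",  # hypothetical entry
                databases=["GBIF"],
                taxa=["Aves"],
                uses=["species distribution"],
                quality_issues=["georeferencing"]),
]

use_counts = Counter(u for p in papers for u in p.uses)
print(use_counts.most_common(5))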
Galicia, David; Amezcua, Ana; Baquero, Enrique; Cancellario, Tommaso; Chaves, Angel; De Biurrun, Gabriel; Escribano, Nora; Fernández-Eslava, Blanca; González-Alonso, Mónica; Hernández-Soto, Rubén; Ibáñez, Ricardo; Imas, María; Miqueleiz, Imanol; Miranda, Rafael; Rodeles, Amaia A; Valerio, Mercedes; Ariño, Arturo H Investment in the Long-Tail of Biodiversity Data: From local research to global knowledge Journal Article Biodiversity Information Science and Standards, 3 , pp. e37310, 2019. Abstract | Links | BibTeX | Tags: investment, local biodiversity, long-tail biodiversity, resource allocation, staff training @article{10.3897/biss.3.37310, title = {Investment in the Long-Tail of Biodiversity Data: From local research to global knowledge}, author = {David Galicia and Ana Amezcua and Enrique Baquero and Tommaso Cancellario and Angel Chaves and Gabriel De Biurrun and Nora Escribano and Blanca Fernández-Eslava and Mónica González-Alonso and Rubén Hernández-Soto and Ricardo Ibáñez and María Imas and Imanol Miqueleiz and Rafael Miranda and Amaia A Rodeles and Mercedes Valerio and Arturo H Ariño}, url = {https://doi.org/10.3897/biss.3.37310}, doi = {10.3897/biss.3.37310}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e37310}, publisher = {Pensoft Publishers}, abstract = {In business, the “long-tail economy” refers to a market strategy where the center of gravity shifts from a few high-demand products to many, varied products focused on small niches. Commercialization of individually low-demand products can be profitable as long as their production cost is low and, all taken together, they aggregate into a big chunk of the market. Similarly, in the “business” of biodiversity data acquisition, we can find several mainstream products that produce zillions of bits of information every year and account for most of the budget allocated to increase our primary data-based knowledge about Earth’s biological diversity. These products play a crucial role in biodiversity research. However, along with these large global projects, there is a constellation of small-scale institutions that work locally, but whose contribution to our understanding of natural processes should not be dismissed. These information datasets can be collectively referred to as the “long-tail biodiversity data”. Here we present the case of the Museum of Sciences of the University of Navarra, which harbors the research activity of the Museum of Zoology (MZNA) and herbarium (PAMP) of the University of Navarra in Spain. For the last 40 years, its members have been involved in hundreds of research projects, from the local to the international level—but quantitatively, the vast majority of its biodiversity records come from Navarra, a smallish (10,000 sq. km) administrative region in the north of Spain. Despite its modest area, the available information about the region in the Museum database approaches one million records of thousands of species, including dozens of type series. Fifteen years ago, a series of national research grants made it possible to boost digitization and public access to the database records through the Global Biodiversity Information Facility (GBIF). Although those grants were never renewed, the Museum continued its digitizing and standardizing program on vouchered collections, as well as sampling additional raw biodiversity data through long-term ecological sites using the Museum’s resources, which annually provide thousands of new records at the local level.
Currently, the Museum has already published 30 datasets through GBIF’s IPT (Integrated Publishing Toolkit), containing more than half a million records of about 5700 taxa. Its records have contributed to more than 60 peer-reviewed publications over the last five years. Institutions that mainly harvest biodiversity data at a local scale usually show a high degree of specialization, gathering researchers with strong (albeit often narrow) expertise in the taxonomy and ecology of nearby ecosystems. They are thus an extremely valuable tool when dealing with processes resulting in diversity changes that can be identified rather precisely, especially when their work can be traced back many decades. Making all this fine-scale information accessible and actionable requires, in most cases, a rather modest investment in staff training on data management (e.g., standards, database interoperability) or museum curation procedures, and on informatics infrastructure. As in the case of business, it is not a matter of choosing between producing blockbusters or independent cinema, but of leveraging available resources and maximizing output.}, keywords = {investment, local biodiversity, long-tail biodiversity, resource allocation, staff training}, pubstate = {published}, tppubtype = {article} } In business, the “long-tail economy” refers to a market strategy where the center of gravity shifts from a few high-demand products to many, varied products focused on small niches. Commercialization of individually low-demand products can be profitable as long as their production cost is low and, all taken together, they aggregate into a big chunk of the market. Similarly, in the “business” of biodiversity data acquisition, we can find several mainstream products that produce zillions of bits of information every year and account for most of the budget allocated to increase our primary data-based knowledge about Earth’s biological diversity. These products play a crucial role in biodiversity research. However, along with these large global projects, there is a constellation of small-scale institutions that work locally, but whose contribution to our understanding of natural processes should not be dismissed. These information datasets can be collectively referred to as the “long-tail biodiversity data”. Here we present the case of the Museum of Sciences of the University of Navarra, which harbors the research activity of the Museum of Zoology (MZNA) and herbarium (PAMP) of the University of Navarra in Spain. For the last 40 years, its members have been involved in hundreds of research projects, from the local to the international level—but quantitatively, the vast majority of its biodiversity records come from Navarra, a smallish (10,000 sq. km) administrative region in the north of Spain. Despite its modest area, the available information about the region in the Museum database approaches one million records of thousands of species, including dozens of type series. Fifteen years ago, a series of national research grants made it possible to boost digitization and public access to the database records through the Global Biodiversity Information Facility (GBIF). Although those grants were never renewed, the Museum continued its digitizing and standardizing program on vouchered collections, as well as sampling additional raw biodiversity data through long-term ecological sites using the Museum’s resources, which annually provide thousands of new records at the local level.
Currently, the Museum has already published 30 datasets through GBIF’s IPT (Integrated Publishing Toolkit), containing more than half a million records of about 5700 taxa. Its records have contributed to more than 60 peer-reviewed publications over the last five years. Institutions that mainly harvest biodiversity data at a local scale usually show a high degree of specialization, gathering researchers with strong (albeit often narrow) expertise in the taxonomy and ecology of nearby ecosystems. They are thus an extremely valuable tool when dealing with processes resulting in diversity changes that can be identified rather precisely, especially when their work can be traced back many decades. Making all this fine-scale information accessible and actionable requires, in most cases, a rather modest investment in staff training on data management (e.g., standards, database interoperability) or museum curation procedures, and on informatics infrastructure. As in the case of business, it is not a matter of choosing between producing blockbusters or independent cinema, but of leveraging available resources and maximizing output. |
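Datasets published through GBIF's IPT, as in this abstract, are typically packaged as Darwin Core Archives whose core is a delimited text file of occurrence records keyed by standard Darwin Core terms. The sketch below is a minimal, hedged illustration of one such core row: the term names are real Darwin Core terms (https://dwc.tdwg.org/terms/), while the identifier and all values are invented for the example.

# Minimal sketch of a Darwin Core occurrence row of the kind a museum
# publishes via GBIF's IPT. Values are invented; term names are standard.
import csv

DWC_TERMS = [
    "occurrenceID", "basisOfRecord", "institutionCode", "collectionCode",
    "catalogNumber", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "countryCode",
]

record = {
    "occurrenceID": "urn:catalog:MZNA:VERT:000001",  # hypothetical identifier
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "MZNA",
    "collectionCode": "VERT",
    "catalogNumber": "000001",
    "scientificName": "Salmo trutta",
    "eventDate": "1998-06-15",
    "decimalLatitude": "42.81",
    "decimalLongitude": "-1.64",
    "countryCode": "ES",
}

# Write a one-row, tab-separated core file like those bundled in an archive.
with open("occurrence.txt", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=DWC_TERMS, delimiter="\t")
    writer.writeheader()
    writer.writerow(record)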
Escribano, Nora; Galicia, David; Ariño, Arturo H Game of Tops: Trends in GBIF’s Community of Users Journal Article Biodiversity Information Science and Standards, 3 , pp. e37187, 2019. Abstract | Links | BibTeX | Tags: biodiversity, community structure, data users, GBIF, research trends @article{10.3897/biss.3.37187, title = {Game of Tops: Trends in GBIF’s Community of Users}, author = {Nora Escribano and David Galicia and Arturo H Ariño}, url = {https://doi.org/10.3897/biss.3.37187}, doi = {10.3897/biss.3.37187}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e37187}, publisher = {Pensoft Publishers}, abstract = {Building on the development of Biodiversity Informatics, the Global Biodiversity Information Facility (GBIF) undertook the task of enabling access to the world’s wealth of biodiversity data via the Internet. To date, GBIF has become, in many respects, the most extensive biodiversity information exchange infrastructure in the world, opening up a full range of possibilities for science. Science has benefited from such access to biodiversity data in research areas ranging from the effects of environmental change on biodiversity to the spread of invasive species, among many others. As of this writing, more than 7,000 published items (scientific papers, reviews, conference proceedings) have been indexed in the GBIF Secretariat’s literature tracking programme. On the basis of this database, we will chart trends in GBIF users’ behaviour over time regarding openness, social structure, and other features associated with such scientific production: what is the measurable impact of research using GBIF data? How is the GBIF community of users growing? Is the science made with, and enabled by, open data, actually open? Mapping GBIF users’ choices will show how biodiversity research is evolving through time, synthesising past and current priorities of this community in an attempt to forecast whether summer—or winter—is coming.}, keywords = {biodiversity, community structure, data users, GBIF, research trends}, pubstate = {published}, tppubtype = {article} } Building on the development of Biodiversity Informatics, the Global Biodiversity Information Facility (GBIF) undertook the task of enabling access to the world’s wealth of biodiversity data via the Internet. To date, GBIF has become, in many respects, the most extensive biodiversity information exchange infrastructure in the world, opening up a full range of possibilities for science. Science has benefited from such access to biodiversity data in research areas ranging from the effects of environmental change on biodiversity to the spread of invasive species, among many others. As of this writing, more than 7,000 published items (scientific papers, reviews, conference proceedings) have been indexed in the GBIF Secretariat’s literature tracking programme. On the basis of this database, we will chart trends in GBIF users’ behaviour over time regarding openness, social structure, and other features associated with such scientific production: what is the measurable impact of research using GBIF data? How is the GBIF community of users growing? Is the science made with, and enabled by, open data, actually open? Mapping GBIF users’ choices will show how biodiversity research is evolving through time, synthesising past and current priorities of this community in an attempt to forecast whether summer—or winter—is coming. |
Theeten, Franck; Adam, Marielle; Vandenberghe, Thomas; Dillen, Mathias; Semal, Patrick; Scory, Serge; Herpers, Jean-Marc; den Spiegel, Didier Van; Mergen, Patricia; Smirnova, Larissa; Engledow, Henry; Casino, Ana; Gödderz, Karsten NaturalHeritage: Bridging Belgian natural history collections Journal Article Biodiversity Information Science and Standards, 3 , pp. e37854, 2019. Abstract | Links | BibTeX | Tags: data analysis, data quality and cleaning, interoperable databases, natural history collections, search portal, standardisation, webservices @article{10.3897/biss.3.37854, title = {NaturalHeritage: Bridging Belgian natural history collections}, author = {Franck Theeten and Marielle Adam and Thomas Vandenberghe and Mathias Dillen and Patrick Semal and Serge Scory and Jean-Marc Herpers and Didier Van den Spiegel and Patricia Mergen and Larissa Smirnova and Henry Engledow and Ana Casino and Karsten Gödderz}, url = {https://doi.org/10.3897/biss.3.37854}, doi = {10.3897/biss.3.37854}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e37854}, publisher = {Pensoft Publishers}, abstract = {The Royal Belgian Institute of Natural Sciences (RBINS), the Royal Museum for Central Africa (RMCA) and Meise Botanic Garden house more than 50 million specimens covering all fields of natural history. While many different research topics have their own specificities, throughout the years it became apparent that, with regard to collection data management, data publication and exchange via community standards, collection-holding institutions face similar challenges (James et al. 2018, Rocha et al. 2014). In the past, these have been tackled in different ways by Belgian natural history institutions. In addition to local and national collaborations, there is a great need for a joint structure to share data between scientific institutions in Europe and beyond. It is the aim of large networks and infrastructures such as the Global Biodiversity Information Facility (GBIF), the Biodiversity Information Standards (TDWG), the Distributed System of Scientific collections (DiSSCo) and the Consortium of European Taxonomic Facilities (CETAF) to further implement and improve these efforts, thereby gaining ever increasing efficiencies. In this context, the three institutions mentioned above submitted the NaturalHeritage project (http://www.belspo.be/belspo/brain-be/themes_3_HebrHistoScien_en.stm), funded in 2017 by the Belgian Science Policy Office and running from 2017 to 2020. The project provides links among databases and services. The unique qualities of each database are maintained, while the information can be concentrated and exposed in a structured way via one access point. This approach also aims to link data that are unconnected at present (e.g. the relationship between soil/substrate, vegetation and associated fauna) and to improve the cross-validation of data. (1) The NaturalHeritage prototype (http://www.naturalheritage.be) is a shared research portal with an open access infrastructure, which is still in the development phase. Its backbone is an ElasticSearch catalogue, with Kibana, and a Python aggregator gathering several types of (re)sources: relational databases, REpresentational State Transfer (REST) services of object databases and bibliographical data, collections metadata and the GBIF Integrated Publishing Toolkit (IPT) for observational and taxonomic data.
Semi-structured data in English are semantically analysed and linked to a rich autocomplete mechanism. Keywords and identifiers are indexed and grouped in four categories (“what”, “who”, “where”, “when”). The portal can also act as an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) service and ease indexing of the original webpage on the internet with microdata enrichment. (2) DaRWIN (Data Research Warehouse Information Network), the collection data management system of RBINS and RMCA, has been improved as well. External (meta)data requirements, i.e. foremost publication according to the practices and standards of GBIF and OBIS (Ocean Biogeographic Information System: https://obis.org) for biodiversity data, and INSPIRE (https://inspire.ec.europa.eu) for geological data, have been identified and evaluated. New and extended data structures have been created to be compliant with these standards, and the necessary procedures have been developed to expose the data. Quality control tools for taxonomic and geographic names have been developed. Geographic names can be hard to confirm, as their lack of context often requires human validation. To address this, a similarity measure is used to help rank candidate matches. Species, locations, sampling devices and other properties have been mapped to Darwin Core and the World Register of Marine Species (http://www.marinespecies.org), Marine Regions and GeoNames, the AGRO agronomy ontology (http://www.obofoundry.org/ontology/agro.html), the Vertebrate Trait Ontology, and the British Oceanographic Data Centre (BODC) vocabularies. Extensive mapping is necessary to make use of the ExtendedMeasurementOrFact extension of Darwin Core (https://tools.gbif.org/dwca-validator/extensions.do).}, keywords = {data analysis, data quality and cleaning, interoperable databases, natural history collections, search portal, standardisation, webservices}, pubstate = {published}, tppubtype = {article} } The Royal Belgian Institute of Natural Sciences (RBINS), the Royal Museum for Central Africa (RMCA) and Meise Botanic Garden house more than 50 million specimens covering all fields of natural history. While many different research topics have their own specificities, throughout the years it became apparent that, with regard to collection data management, data publication and exchange via community standards, collection-holding institutions face similar challenges (James et al. 2018, Rocha et al. 2014). In the past, these have been tackled in different ways by Belgian natural history institutions. In addition to local and national collaborations, there is a great need for a joint structure to share data between scientific institutions in Europe and beyond. It is the aim of large networks and infrastructures such as the Global Biodiversity Information Facility (GBIF), the Biodiversity Information Standards (TDWG), the Distributed System of Scientific collections (DiSSCo) and the Consortium of European Taxonomic Facilities (CETAF) to further implement and improve these efforts, thereby gaining ever increasing efficiencies. In this context, the three institutions mentioned above submitted the NaturalHeritage project (http://www.belspo.be/belspo/brain-be/themes_3_HebrHistoScien_en.stm), funded in 2017 by the Belgian Science Policy Office and running from 2017 to 2020. The project provides links among databases and services. The unique qualities of each database are maintained, while the information can be concentrated and exposed in a structured way via one access point.
This approach also aims to link data that are unconnected at present (e.g. the relationship between soil/substrate, vegetation and associated fauna) and to improve the cross-validation of data. (1) The NaturalHeritage prototype (http://www.naturalheritage.be) is a shared research portal with an open access infrastructure, which is still in the development phase. Its backbone is an ElasticSearch catalogue, with Kibana, and a Python aggregator gathering several types of (re)sources: relational databases, REpresentational State Transfer (REST) services of object databases and bibliographical data, collections metadata and the GBIF Integrated Publishing Toolkit (IPT) for observational and taxonomic data. Semi-structured data in English are semantically analysed and linked to a rich autocomplete mechanism. Keywords and identifiers are indexed and grouped in four categories (“what”, “who”, “where”, “when”). The portal can also act as an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) service and ease indexing of the original webpage on the internet with microdata enrichment. (2) DaRWIN (Data Research Warehouse Information Network), the collection data management system of RBINS and RMCA, has been improved as well. External (meta)data requirements, i.e. foremost publication according to the practices and standards of GBIF and OBIS (Ocean Biogeographic Information System: https://obis.org) for biodiversity data, and INSPIRE (https://inspire.ec.europa.eu) for geological data, have been identified and evaluated. New and extended data structures have been created to be compliant with these standards, and the necessary procedures have been developed to expose the data. Quality control tools for taxonomic and geographic names have been developed. Geographic names can be hard to confirm, as their lack of context often requires human validation. To address this, a similarity measure is used to help rank candidate matches. Species, locations, sampling devices and other properties have been mapped to Darwin Core and the World Register of Marine Species (http://www.marinespecies.org), Marine Regions and GeoNames, the AGRO agronomy ontology (http://www.obofoundry.org/ontology/agro.html), the Vertebrate Trait Ontology, and the British Oceanographic Data Centre (BODC) vocabularies. Extensive mapping is necessary to make use of the ExtendedMeasurementOrFact extension of Darwin Core (https://tools.gbif.org/dwca-validator/extensions.do). |
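The similarity-based matching of geographic names described in this abstract can be sketched in a few lines. The abstract does not specify which measure the project uses; the example below substitutes the standard-library difflib ratio as a stand-in, and the toy gazetteer, threshold and example label are invented for illustration.

# Hedged sketch of similarity-based geographic name matching.
# difflib's ratio stands in for the project's unspecified measure.
from difflib import SequenceMatcher

GAZETTEER = ["Antwerpen", "Anvers", "Tervuren", "Leuven", "Louvain"]  # toy list

def candidate_matches(raw_name: str, threshold: float = 0.6):
    """Rank gazetteer entries by string similarity to a label transcription."""
    scored = (
        (SequenceMatcher(None, raw_name.lower(), place.lower()).ratio(), place)
        for place in GAZETTEER
    )
    # Keep matches above the threshold, best first, for human validation.
    return sorted(((s, p) for s, p in scored if s >= threshold), reverse=True)

print(candidate_matches("Tervueren"))  # an older spelling on a specimen label

Ranking rather than auto-accepting matches fits the workflow the abstract describes: the measure narrows the options, and a human validates the result.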
Huybrechts, Pieter; Alonso-Sánchez, Felix; Böttinger, Petra; Dillen, Mathias; Groom, Quentin; Hanquart, Nicole; Koch, Walter; Gordon, Martin; Mergen, Patricia LinBi: Linking biodiversity and culture information Journal Article Biodiversity Information Science and Standards, 3 , pp. e37407, 2019. Abstract | Links | BibTeX | Tags: biodiversity, citizen science, crowdsourcing, digital heritage, Europeana, metadata enrichment @article{10.3897/biss.3.37407, title = {LinBi: Linking biodiversity and culture information}, author = {Pieter Huybrechts and Felix Alonso-Sánchez and Petra Böttinger and Mathias Dillen and Quentin Groom and Nicole Hanquart and Walter Koch and Martin Gordon and Patricia Mergen}, url = {https://doi.org/10.3897/biss.3.37407}, doi = {10.3897/biss.3.37407}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e37407}, publisher = {Pensoft Publishers}, abstract = {The LinBi project aims to enhance the discoverability of digitized objects from natural history collections hosted by institutes all over Europe. This enhancement is achieved by publishing new and enriched content to the Europeana collections platform. The use of simple vocabularies and machine-readable metadata encourages reuse and has the additional advantage of facilitating the clustering of interesting content for user groups beyond biodiversity and natural history researchers. Linking the collections of Europe together in an openly available platform and sharing our common cultural and natural heritage with a broad audience will increase the public’s awareness of biodiversity collections. Furthermore, it will help us reach out to new user groups such as teachers, journalists and artists, who were previously unaware of, or distant from, our collections. Suitable content was selected and harmonized for interlinking with Europeana. Contributions include a large quantity of herbarium sheets, digitized glass plate negatives taken between 1880 and 1930 and a portrait collection dating from the late 19th and early 20th century. With the help of the DoeDat crowdsourcing platform, existing metadata were enriched and mobilized to allow for publication in the form of linked open data. The integration of geographical data and common names allows the Europeana platform to link scientific specimens with literature and fine art from different collections and to guide users to interesting and inspiring content via themed virtual exhibitions. One such theme is composed of content from the "Wild Roses of Crépin" collection, which will be enriched by pictures of living plants, herbarium specimens and illustrations old and modern. A second content cluster consists of an enriched and curated collection of botanical illustrations originating from a corpus of special and rare books ranging from the 15th to 19th centuries. Careful curation increases the potential for re-use and provides additional opportunities for the general public to interact with this collection. The LinBi platform has the long-term ambition of forming a sustainable and open-source solution integrated into the Europeana Core Service.
This will further improve cooperation between institutes by building international infrastructure and networks, thus contributing to a more open cross-border society.}, keywords = {biodiversity, citizen science, crowdsourcing, digital heritage, Europeana, metadata enrichment}, pubstate = {published}, tppubtype = {article} } The LinBi project aims to enhance the discoverability of digitized objects from natural history collections hosted by institutes all over Europe. This enhancement is achieved by publishing new and enriched content to the Europeana collections platform. The use of simple vocabularies and machine-readable metadata encourages reuse and has the additional advantage of facilitating the clustering of interesting content for user groups beyond biodiversity and natural history researchers. Linking the collections of Europe together in an openly available platform and sharing our common cultural and natural heritage with a broad audience will increase the public’s awareness of biodiversity collections. Furthermore, it will help us reach out to new user groups such as teachers, journalists and artists, who were previously unaware of, or distant from, our collections. Suitable content was selected and harmonized for interlinking with Europeana. Contributions include a large quantity of herbarium sheets, digitized glass plate negatives taken between 1880 and 1930 and a portrait collection dating from the late 19th and early 20th century. With the help of the DoeDat crowdsourcing platform, existing metadata were enriched and mobilized to allow for publication in the form of linked open data. The integration of geographical data and common names allows the Europeana platform to link scientific specimens with literature and fine art from different collections and to guide users to interesting and inspiring content via themed virtual exhibitions. One such theme is composed of content from the "Wild Roses of Crépin" collection, which will be enriched by pictures of living plants, herbarium specimens and illustrations old and modern. A second content cluster consists of an enriched and curated collection of botanical illustrations originating from a corpus of special and rare books ranging from the 15th to 19th centuries. Careful curation increases the potential for re-use and provides additional opportunities for the general public to interact with this collection. The LinBi platform has the long-term ambition of forming a sustainable and open-source solution integrated into the Europeana Core Service. This will further improve cooperation between institutes by building international infrastructure and networks, thus contributing to a more open cross-border society. |
Arvanitidis, Christos; Hollingsworth, Peter; Mergen, Patricia; Semal, Patrick; Keklikoglou, Kleoniki; Chatzinikolaou, Eva; Addink, Wouter; Smith, Vincent; Koureas, Dimitrios Combined High-Throughput Imaging and Sequencing: Addressing the collections on demand requirement in SYNTHESYS+ project Journal Article Biodiversity Information Science and Standards, 3 , pp. e37290, 2019. Abstract | Links | BibTeX | Tags: DiSSCo, high-throughput imaging, high-throughput sequencing, LifeWatch ERIC, research infrastructures, SYNTHESYS+, virtual laboratories (vLabs), virtual research environments (VREs) @article{10.3897/biss.3.37290, title = {Combined High-Throughput Imaging and Sequencing: Addressing the collections on demand requirement in SYNTHESYS+ project}, author = {Christos Arvanitidis and Peter Hollingsworth and Patricia Mergen and Patrick Semal and Kleoniki Keklikoglou and Eva Chatzinikolaou and Wouter Addink and Vincent Smith and Dimitrios Koureas}, url = {https://doi.org/10.3897/biss.3.37290}, doi = {10.3897/biss.3.37290}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e37290}, publisher = {Pensoft Publishers}, abstract = {Imagine you are a scientist working on collections. You have your pet taxon and you need information which is distributed across a number of books and publications, but also in the specimens deposited in museums or herbaria. Instead of paying visits to these establishments around the world, you wish there were a means to transform all the information you need into a digitized form of the physical objects, one you can reach from the screen of your laptop, tablet or cell phone. You dream of being able to view, inspect and even dissect the type material you need online, and to compare it with others the way sequences are blasted against large databases these days. You plan to conduct global research on this taxon, derive patterns from both the molecular and organismal levels of biological organization, and link the resulting patterns to the drivers of change for this taxon. This is the vision of the Virtual Museum of Natural History, and one way to achieve this vision is to address the “collections on demand” requirement. One possible means of addressing this requirement is digitization through micro-CT technology. The micro-CT virtual laboratory (vLab), developed by the LifeWatchGreece research infrastructure (RI), makes possible the online exploration and dissemination of micro-CT datasets, which are only rarely made available to the public due to their large size and the lack of dedicated online platforms for the interactive manipulation of 3D data. This presentation shows the development of such a “collections on demand” function, implemented by the SYNTHESYS+ project (DiSSCo RI), which combines high-throughput technologies, namely micro-CT and genomics, to address the scientific community’s requirements. We show that this approach of combining patterns derived from the application of novel techniques, representing different kinds of observations, is possible, and we propose certain case studies as examples.
The innovation aspects of this function include: expansion and development of cost models for Collections on Demand; development of standards and guidelines for the exchange of collection-derived imaging data; construction of new data pipelines and standard workflows, enabling access to complex digital content such as 3D scans; and development of novel molecular lab protocols, workflows and informatics pipelines to enable large-scale DNA sequencing of NH collections.}, keywords = {DiSSCo, high-throughput imaging, high-throughput sequencing, LifeWatch ERIC, research infrastructures, SYNTHESYS+, virtual laboratories (vLabs), virtual research environments (VREs)}, pubstate = {published}, tppubtype = {article} } Imagine you are a scientist working on collections. You have your pet taxon and you need information which is distributed across a number of books and publications, but also in the specimens deposited in museums or herbaria. Instead of paying visits to these establishments around the world, you wish there were a means to transform all the information you need into a digitized form of the physical objects, one you can reach from the screen of your laptop, tablet or cell phone. You dream of being able to view, inspect and even dissect the type material you need online, and to compare it with others the way sequences are blasted against large databases these days. You plan to conduct global research on this taxon, derive patterns from both the molecular and organismal levels of biological organization, and link the resulting patterns to the drivers of change for this taxon. This is the vision of the Virtual Museum of Natural History, and one way to achieve this vision is to address the “collections on demand” requirement. One possible means of addressing this requirement is digitization through micro-CT technology. The micro-CT virtual laboratory (vLab), developed by the LifeWatchGreece research infrastructure (RI), makes possible the online exploration and dissemination of micro-CT datasets, which are only rarely made available to the public due to their large size and the lack of dedicated online platforms for the interactive manipulation of 3D data. This presentation shows the development of such a “collections on demand” function, implemented by the SYNTHESYS+ project (DiSSCo RI), which combines high-throughput technologies, namely micro-CT and genomics, to address the scientific community’s requirements. We show that this approach of combining patterns derived from the application of novel techniques, representing different kinds of observations, is possible, and we propose certain case studies as examples. The innovation aspects of this function include: expansion and development of cost models for Collections on Demand; development of standards and guidelines for the exchange of collection-derived imaging data; construction of new data pipelines and standard workflows, enabling access to complex digital content such as 3D scans; and development of novel molecular lab protocols, workflows and informatics pipelines to enable large-scale DNA sequencing of NH collections. |
Knapp, Sandra; Vincent, Sarah; Arvanitidis, Christos; Dixey, Katherine; Mergen, Patricia Access to Natural History Collections – from SYNTHESYS to DiSSCo Journal Article Biodiversity Information Science and Standards, 3 , pp. e37149, 2019. Abstract | Links | BibTeX | Tags: access, collections, DiSSCo, physical access, SYNTHESYS+, virtual access @article{10.3897/biss.3.37149, title = {Access to Natural History Collections – from SYNTHESYS to DiSSCo}, author = {Sandra Knapp and Sarah Vincent and Christos Arvanitidis and Katherine Dixey and Patricia Mergen}, url = {https://doi.org/10.3897/biss.3.37149}, doi = {10.3897/biss.3.37149}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e37149}, publisher = {Pensoft Publishers}, abstract = {Any one collection of objects never tells the whole story. Enabling access to natural history collections by users external to a given institution has a long history–even that great stay-at-home, Linnaeus, relied on specimens in the hands of others. Neglecting collections outside one’s institution results in duplication and inefficiency, as can be seen in the history of synonymy. Physical access had always been the norm, but was difficult for the single individual. A student in the late 20th century had to decide if money were better spent going to one collection or another, or if the sometimes rather fuzzy photographs really represented the taxon she was working with. Loans between institutions were a way to provide access, but came with their own risks. The very individualised–to users as well as institutions–system of access provisioning still operates today but has fundamentally changed in several respects. The SYNTHESYS (Synthesis of Systematic Resources) projects brought a set of European institutions into a consortium with one aim: to provide access to natural history collections in order to stimulate their use across communities. The SYNTHESYS Transnational Access (TA) programme provided access not only to the collections of participating institutions, but also to infrastructures such as laboratories and analytical facilities. The trajectory of TA has led to a change in thinking about natural history collections, along with access to them. Because access has been subsidised at both the individual and institutional levels, participating institutions began to function more as a collective: one infrastructure, albeit loosely dispersed. In the most recent iteration of the SYNTHESYS programme, SYNTHESYS+, access has changed yet again with the times. Technological advances in imaging permit high-quality surrogates of natural history specimens to be exchanged more freely, and Virtual Access (VA) forms an integral part of the SYNTHESYS+ access programme, alongside TA. Virtual access has been operating for some time in the natural history collections community, but, like TA, with individual scientists requesting images/sequences/scans from individual institutions or curators. VA, as a centralised service, will be piloted in SYNTHESYS+ in order to establish the basis for community change in access provisioning. But what next? Will we continue to need physical access to specimens and facilities as VA becomes increasingly feasible? As European collections-based institutions coalesce into the DiSSCo (Distributed System of Scientific Collections) infrastructure, will the model established in SYNTHESYS+ continue to function in the absence of centralised funding?
In this talk, we will explore the trajectory of access through SYNTHESYS and provide some scenarios for how access to natural history collections–both physical and virtual–may change as we transition to the broader infrastructure that DiSSCo represents.}, keywords = {access, collections, DiSSCo, physical access, SYNTHESYS+, virtual access}, pubstate = {published}, tppubtype = {article} } Any one collection of objects never tells the whole story. Enabling access to natural history collections by users external to a given institution has a long history–even that great stay-at-home, Linnaeus, relied on specimens in the hands of others. Neglecting collections outside one’s institution results in duplication and inefficiency, as can be seen in the history of synonymy. Physical access had always been the norm, but it was difficult for the single individual. A student in the late 20th century had to decide if money were better spent going to one collection or another, or if the sometimes rather fuzzy photographs really represented the taxon she was working with. Loans between institutions were a way to provide access, but came with their own risks. The very individualised–to users as well as institutions–system of access provisioning still operates today but has fundamentally changed in several respects. The SYNTHESYS (Synthesis of Systematic Resources) projects brought a set of European institutions into a consortium with one aim: to provide access to natural history collections in order to stimulate their use across communities. The SYNTHESYS Transnational Access (TA) programme provided access not only to the collections of participating institutions, but also to infrastructures such as laboratories and analytical facilities. The trajectory of TA has led to a change in thinking about natural history collections, along with access to them. Because access has been subsidised at both the individual and institutional levels, participating institutions began to function more as a collective; one infrastructure, albeit loosely dispersed. In the most recent iteration of the SYNTHESYS programme, SYNTHESYS+, access has changed yet again with the times. Technological advances in imaging permit high-quality surrogates of natural history specimens to be exchanged more freely, and Virtual Access (VA) forms an integral part of the SYNTHESYS+ access programme, alongside TA. Virtual access has been operating for some time in the natural history collections community, but like TA, with individual scientists requesting images/sequences/scans from individual institutions or curators. VA, as a centralised service, will be piloted in SYNTHESYS+ in order to establish the basis for community change in access provisioning. But what next? Will we continue to need physical access to specimens and facilities as VA becomes increasingly feasible? As European collections-based institutions coalesce into the DiSSCo (Distributed System of Scientific Collections) infrastructure, will the model established in SYNTHESYS+ continue to function in the absence of centralised funding? In this talk, we will explore the trajectory of access through SYNTHESYS and provide some scenarios for how access to natural history collections–both physical and virtual–may change as we transition to the broader infrastructure that DiSSCo represents. |
Lecoq, Marie-Elise; Archambeau, Anne-Sophie; Figueira, Rui; Martin, David; Pamerlon, Sophie; Robertson, Tim; Lebbe, Régine Vignes; Villaverde, Cristina The Living Atlases Community of Practice Journal Article Biodiversity Information Science and Standards, 3 , pp. e35779, 2019. Abstract | Links | BibTeX | Tags: ALA, GBIF, Living Atlases @article{10.3897/biss.3.35779, title = {The Living Atlases Community of Practice}, author = {Marie-Elise Lecoq and Anne-Sophie Archambeau and Rui Figueira and David Martin and Sophie Pamerlon and Tim Robertson and Régine Vignes Lebbe and Cristina Villaverde}, url = {https://doi.org/10.3897/biss.3.35779}, doi = {10.3897/biss.3.35779}, year = {2019}, date = {2019-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {3}, pages = {e35779}, publisher = {Pensoft Publishers}, abstract = {The power and configurability of the Atlas of Living Australia tools have enabled more and more institutions and participants of the Global Biodiversity Information Facility to adapt and install biodiversity platforms. For six years, we have demonstrated that the community around this platform was needed and ready for its adoption. During the symposium organized for the SPNHC+TDWG 2018, we started a discussion that has led us to the creation of a more structured and sustainable community of practice. We want to create a community that follows the structure of open-source technical projects such as the Linux or Apache foundations. After the GBIF Governing Board (GB25), the Kilkenny accord was agreed among 8 country or institution partners and early adopters of the ALA platform to outline the scope of the new Living Atlases community. Thanks to this accord, we have begun to set up a new structure based on the Community of Practice (CoP) model. In summary, governance will be provided by a Management committee and a Technical advisory committee. In addition, the Living Atlases community will have two coordinators with technical and administrative duties. This presentation will briefly summarise the community history leading up to the agreement of the Kilkenny accord and provide information and context for the key points it contains. Then, we will present and launch the new Living Atlases Community of Practice. Through this presentation, we aim to collect lessons learned and good practices from other CoPs on topics like governance, communications and sustainability, among others, to incorporate them into the consolidation process of the Living Atlases community.}, keywords = {ALA, GBIF, Living Atlases}, pubstate = {published}, tppubtype = {article} } The power and configurability of the Atlas of Living Australia tools have enabled more and more institutions and participants of the Global Biodiversity Information Facility to adapt and install biodiversity platforms. For six years, we have demonstrated that the community around this platform was needed and ready for its adoption. During the symposium organized for the SPNHC+TDWG 2018, we started a discussion that has led us to the creation of a more structured and sustainable community of practice. We want to create a community that follows the structure of open-source technical projects such as the Linux or Apache foundations. After the GBIF Governing Board (GB25), the Kilkenny accord was agreed among 8 country or institution partners and early adopters of the ALA platform to outline the scope of the new Living Atlases community.
Thanks to this accord, we have begun to set up a new structure based on the Community of Practice (CoP) model. In summary, governance will be provided by a Management committee and a Technical advisory committee. In addition, the Living Atlases community will have two coordinators with technical and administrative duties. This presentation will briefly summarise the community history leading up to the agreement of the Kilkenny accord and provide information and context for the key points it contains. Then, we will present and launch the new Living Atlases Community of Practice. Through this presentation, we aim to collect lessons learned and good practices from other CoPs on topics like governance, communications and sustainability, among others, to incorporate them into the consolidation process of the Living Atlases community. |
Hardisty, Alex R; Michener, William K; Agosti, Donat; García, Enrique Alonso; Bastin, Lucy; Belbin, Lee; Bowser, Anne; Buttigieg, Pier Luigi; Canhos, Dora A L; Egloff, Willi; Giovanni, Renato De; Figueira, Rui; Groom, Quentin; Guralnick, Robert P; Hobern, Donald; Hugo, Wim; Koureas, Dimitris; Ji, Liqiang; Los, Wouter; Manuel, Jeffrey; Manset, David; Poelen, Jorrit; Saarenmaa, Hannu; Schigel, Dmitry; Uhlir, Paul F; Kissling, Daniel W The Bari Manifesto: An interoperability framework for essential biodiversity variables Journal Article Ecological Informatics, 49 , pp. 22 - 31, 2019, ISSN: 1574-9541. Abstract | Links | BibTeX | Tags: Cyberinfrastructure, Data products, E-infrastructure, Essential biodiversity variables, Informatics, Interoperability @article{HARDISTY201922, title = {The Bari Manifesto: An interoperability framework for essential biodiversity variables}, author = {Alex R Hardisty and William K Michener and Donat Agosti and Enrique Alonso García and Lucy Bastin and Lee Belbin and Anne Bowser and Pier Luigi Buttigieg and Dora A L Canhos and Willi Egloff and Renato De Giovanni and Rui Figueira and Quentin Groom and Robert P Guralnick and Donald Hobern and Wim Hugo and Dimitris Koureas and Liqiang Ji and Wouter Los and Jeffrey Manuel and David Manset and Jorrit Poelen and Hannu Saarenmaa and Dmitry Schigel and Paul F Uhlir and Daniel W Kissling}, url = {http://www.sciencedirect.com/science/article/pii/S1574954118301961}, doi = {https://doi.org/10.1016/j.ecoinf.2018.11.003}, issn = {1574-9541}, year = {2019}, date = {2019-01-01}, journal = {Ecological Informatics}, volume = {49}, pages = {22 - 31}, abstract = {Essential Biodiversity Variables (EBV) are fundamental variables that can be used for assessing biodiversity change over time, for determining adherence to biodiversity policy, for monitoring progress towards sustainable development goals, and for tracking biodiversity responses to disturbances and management interventions. Data from observations or models that provide measured or estimated EBV values, which we refer to as EBV data products, can help to capture the above processes and trends and can serve as a coherent framework for documenting trends in biodiversity. Using primary biodiversity records and other raw data as sources to produce EBV data products depends on cooperation and interoperability among multiple stakeholders, including those collecting and mobilising data for EBVs and those producing, publishing and preserving EBV data products. Here, we encapsulate ten principles for the current best practice in EBV-focused biodiversity informatics as ‘The Bari Manifesto’, serving as implementation guidelines for data and research infrastructure providers to support the emerging EBV operational framework based on trans-national and cross-infrastructure scientific workflows. The principles provide guidance on how to contribute towards the production of EBV data products that are globally oriented, while remaining appropriate to the producer's own mission, vision and goals. These ten principles cover: data management planning; data structure; metadata; services; data quality; workflows; provenance; ontologies/vocabularies; data preservation; and accessibility. For each principle, desired outcomes and goals have been formulated. 
Some specific actions related to fulfilling the Bari Manifesto principles are highlighted in the context of each of four groups of organizations contributing to enabling data interoperability: data standards bodies, research data infrastructures, the pertinent research communities, and funders. The Bari Manifesto provides a roadmap enabling support for routine generation of EBV data products, and increases the likelihood of success for a global EBV framework.}, keywords = {Cyberinfrastructure, Data products, E-infrastructure, Essential biodiversity variables, Informatics, Interoperability}, pubstate = {published}, tppubtype = {article} } Essential Biodiversity Variables (EBV) are fundamental variables that can be used for assessing biodiversity change over time, for determining adherence to biodiversity policy, for monitoring progress towards sustainable development goals, and for tracking biodiversity responses to disturbances and management interventions. Data from observations or models that provide measured or estimated EBV values, which we refer to as EBV data products, can help to capture the above processes and trends and can serve as a coherent framework for documenting trends in biodiversity. Using primary biodiversity records and other raw data as sources to produce EBV data products depends on cooperation and interoperability among multiple stakeholders, including those collecting and mobilising data for EBVs and those producing, publishing and preserving EBV data products. Here, we encapsulate ten principles for the current best practice in EBV-focused biodiversity informatics as ‘The Bari Manifesto’, serving as implementation guidelines for data and research infrastructure providers to support the emerging EBV operational framework based on trans-national and cross-infrastructure scientific workflows. The principles provide guidance on how to contribute towards the production of EBV data products that are globally oriented, while remaining appropriate to the producer's own mission, vision and goals. These ten principles cover: data management planning; data structure; metadata; services; data quality; workflows; provenance; ontologies/vocabularies; data preservation; and accessibility. For each principle, desired outcomes and goals have been formulated. Some specific actions related to fulfilling the Bari Manifesto principles are highlighted in the context of each of four groups of organizations contributing to enabling data interoperability: data standards bodies, research data infrastructures, the pertinent research communities, and funders. The Bari Manifesto provides a roadmap enabling support for routine generation of EBV data products, and increases the likelihood of success for a global EBV framework. |
2018 |
H. Ariño, Arturo Putting your Finger upon the Simplest Data Journal Article Biodiversity Information Science and Standards, 2 , pp. e26300, 2018. Abstract | Links | BibTeX | Tags: bias, digitally accessible knowledge (DAK), digitization, Natural History Collections (NHC), primary biodiversity data records (PBR), trends @article{10.3897/biss.2.26300, title = {Putting your Finger upon the Simplest Data}, author = {Arturo H. Ariño}, url = {https://doi.org/10.3897/biss.2.26300}, doi = {10.3897/biss.2.26300}, year = {2018}, date = {2018-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {2}, pages = {e26300}, publisher = {Pensoft Publishers}, abstract = {Over the past decades, digitization endeavors across many institutions holding natural history collections (NHCs) have multiplied with three broad aims: first, to facilitate collection management by moving existing analog catalogues into digital form; second, to efficiently document and inventory specimens in collections, including imaging them as taxonomical surrogates; and third, to enable discovery of, and access to, the resulting collection data. NHCs contain a unique wealth of potential knowledge in the form of primary biodiversity data records (PBR): at its most basic level, the “what, where and when” of occurrences of the specimens in the collections. But as T.S. Eliot famously said, “knowledge is invariably a matter of degree”. For such data to be transformed into digitally accessible knowledge (DAK) that is conducive to an understanding of how the natural world works, release of digitized data (the “this we know”) is necessary. At least two billion specimens are estimated to exist in NHCs already, but only a small fraction can be considered properly DAK: most have either not been digitized yet, or not released through a discovery facility. Digitizing is relatively costly as it often entails manually processing each specimen unit (e.g. a herbarium sheet, a pinned insect, or a vial full of invertebrates). How long could it take us to transform all NHCs into DAK? Can we keep up with the natural growth in collections? The Global Biodiversity Information Facility (GBIF) has become the de facto main index of PBR, whether originating in NHCs or as field observations. Digitized NHCs that are standards-compliant and can be connected to, or harvested by, GBIF, effectively become DAK. I have examined GBIF growth data looking for a pattern of DAK generation. I found that the rate of NHC-based PBR accrual is remarkably constant: the total DAK shows a strongly linear growth, as opposed to the exponential growth exhibited by cumulative observation data. Projecting the trend onto the estimated holdings places completion many decades ahead. In addition, digitized data appear to be taxonomically biased. Digitization efforts must therefore step up qualitatively in order to enable processing of the backlog, let alone newly-acquired accessions, within one generation.
Among several possible solutions, emerging, industrial-scale mass-digitization techniques may help tackle this otherwise daunting task—but there’s also a risk that DAK becomes even more uneven across taxon groups because of the narrow application specificity of such techniques, thus potentially biasing our knowledge of nature.}, keywords = {bias, digitally accessible knowledge (DAK), digitization, Natural History Collections (NHC), primary biodiversity data records (PBR), trends}, pubstate = {published}, tppubtype = {article} } Over the past decades, digitization endeavors across many institutions holding natural history collections (NHCs) have multiplied with three broad aims: first, to facilitate collection management by moving existing analog catalogues into digital form; second, to efficiently document and inventory specimens in collections, including imaging them as taxonomical surrogates; and third, to enable discovery of, and access to, the resulting collection data. NHCs contain a unique wealth of potential knowledge in the form of primary biodiversity data records (PBR): at its most basic level, the “what, where and when” of occurrences of the specimens in the collections. But as T.S. Eliot famously said, “knowledge is invariably a matter of degree”. For such data to be transformed into digitally accessible knowledge (DAK) that is conducive to an understanding of how the natural world works, release of digitized data (the “this we know”) is necessary. At least two billion specimens are estimated to exist in NHCs already, but only a small fraction can be considered properly DAK: most have either not been digitized yet, or not released through a discovery facility. Digitizing is relatively costly as it often entails manually processing each specimen unit (e.g. a herbarium sheet, a pinned insect, or a vial full of invertebrates). How long could it take us to transform all NHCs into DAK? Can we keep up with the natural growth in collections? The Global Biodiversity Information Facility (GBIF) has become the de facto main index of PBR, whether originating in NHCs or as field observations. Digitized NHCs that are standards-compliant and can be connected to, or harvested by, GBIF, effectively become DAK. I have examined GBIF growth data looking for a pattern of DAK generation. I found that the rate of NHC-based PBR accrual is remarkably constant: the total DAK shows a strongly linear growth, as opposed to the exponential growth exhibited by cumulative observation data. Projecting the trend onto the estimated holdings places completion many decades ahead. In addition, digitized data appear to be taxonomically biased. Digitization efforts must therefore step up qualitatively in order to enable processing of the backlog, let alone newly-acquired accessions, within one generation. Among several possible solutions, emerging, industrial-scale mass-digitization techniques may help tackle this otherwise daunting task—but there’s also a risk that DAK becomes even more uneven across taxon groups because of the narrow application specificity of such techniques, thus potentially biasing our knowledge of nature. |
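The projection in this abstract is a simple linear extrapolation: at a constant accrual rate, the remaining backlog divided by the yearly rate gives the time to completion. The sketch below illustrates only that arithmetic; every figure in it is an assumed placeholder (the "at least two billion" total comes from the abstract, the other numbers do not come from the paper).

```python
# Linear-extrapolation sketch of the digitization-completion argument above.
# All quantities are illustrative assumptions, not figures from the paper.

TOTAL_SPECIMENS = 2_000_000_000   # assumed global NHC holdings ("at least two billion")
DIGITISED_SO_FAR = 150_000_000    # assumed NHC-based records already online
RECORDS_PER_YEAR = 10_000_000     # assumed constant (linear) accrual rate

backlog = TOTAL_SPECIMENS - DIGITISED_SO_FAR
years_to_completion = backlog / RECORDS_PER_YEAR
print(f"At this rate, completion takes roughly {years_to_completion:.0f} years")
# -> roughly 185 years under these assumptions, i.e. "many decades ahead";
#    an exponentially growing stream, by contrast, would close the gap far sooner.
```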
Güntsch, Anton; Groom, Quentin; Hyam, Roger; Chagnoux, Simon; Röpert, Dominik; Berendsohn, Walter G; Casino, Ana; Droege, Gabriele; Gerritsen, Willfred; Holetschek, Jörg; Marhold, Karol; Mergen, Patricia; Rainer, Heimo; Stuart Smith, Vincent; Triebel, Dagmar Standardised Globally Unique Specimen Identifiers Journal Article Biodiversity Information Science and Standards, 2 , pp. e26658, 2018. Abstract | Links | BibTeX | Tags: collection management, linked open data, specimen identifier @article{10.3897/biss.2.26658, title = {Standardised Globally Unique Specimen Identifiers}, author = {Anton Güntsch and Quentin Groom and Roger Hyam and Simon Chagnoux and Dominik Röpert and Walter G Berendsohn and Ana Casino and Gabriele Droege and Willfred Gerritsen and Jörg Holetschek and Karol Marhold and Patricia Mergen and Heimo Rainer and Vincent Stuart Smith and Dagmar Triebel}, url = {https://doi.org/10.3897/biss.2.26658}, doi = {10.3897/biss.2.26658}, year = {2018}, date = {2018-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {2}, pages = {e26658}, publisher = {Pensoft Publishers}, abstract = {A simple, permanent and reliable specimen identifier system is needed to take the informatics of collections into a new era of interoperability. A system of identifiers based on HTTP URI (Uniform Resource Identifiers), endorsed by the Consortium of European Taxonomic Facilities (CETAF), has now been rolled out to 14 member organisations (Güntsch et al. 2017). CETAF-Identifiers have a Linked Open Data redirection mechanism for both human- and machine-readable access and, if fully implemented, provide Resource Description Framework (RDF)-encoded specimen data following best practices continuously improved by members of the initiative. To date, more than 20 million physical collection objects have been equipped with CETAF Identifiers (Groom et al. 2017). To facilitate the implementation of stable identifiers, simple redirection scripts and guidelines for deciding on the local identifier syntax have been compiled (http://cetafidentifiers.biowikifarm.net/wiki/Main_Page). Furthermore, a capable "CETAF Specimen URI Tester" (http://herbal.rbge.info/) provides an easy-to-use service for testing whether the existing identifiers are operational. For the usability and potential of any identifier system associated with evolving data objects, active links to the source information are critically important. This is particularly true for natural history collections facing the next wave of industrialised mass digitisation, where specimens come online with only basic, but rapidly evolving, label data. Specimen identifier systems must therefore have components for monitoring the availability and correct implementation of individual data objects. Our next implementation steps will involve the development of a "Semantic Specimen Catalogue", which has a list of all existing specimen identifiers together with the latest RDF metadata snapshot. The catalogue will be used for semantic inference across collections as well as the basis for periodic testing of identifiers.}, keywords = {collection management, linked open data, specimen identifier}, pubstate = {published}, tppubtype = {article} } A simple, permanent and reliable specimen identifier system is needed to take the informatics of collections into a new era of interoperability.
A system of identifiers based on HTTP URI (Uniform Resource Identifiers), endorsed by the Consortium of European Taxonomic Facilities (CETAF), has now been rolled out to 14 member organisations (Güntsch et al. 2017). CETAF-Identifiers have a Linked Open Data redirection mechanism for both human- and machine-readable access and, if fully implemented, provide Resource Description Framework (RDF)-encoded specimen data following best practices continuously improved by members of the initiative. To date, more than 20 million physical collection objects have been equipped with CETAF Identifiers (Groom et al. 2017). To facilitate the implementation of stable identifiers, simple redirection scripts and guidelines for deciding on the local identifier syntax have been compiled (http://cetafidentifiers.biowikifarm.net/wiki/Main_Page). Furthermore, a capable "CETAF Specimen URI Tester" (http://herbal.rbge.info/) provides an easy-to-use service for testing whether the existing identifiers are operational. For the usability and potential of any identifier system associated with evolving data objects, active links to the source information are critically important. This is particularly true for natural history collections facing the next wave of industrialised mass digitisation, where specimens come online with only basic, but rapidly evolving, label data. Specimen identifier systems must therefore have components for monitoring the availability and correct implementation of individual data objects. Our next implementation steps will involve the development of a "Semantic Specimen Catalogue", which has a list of all existing specimen identifiers together with the latest RDF metadata snapshot. The catalogue will be used for semantic inference across collections as well as the basis for periodic testing of identifiers. |
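The kind of check performed by a tool like the "CETAF Specimen URI Tester" can be pictured as requesting one identifier with different Accept headers and confirming that content negotiation serves both a human-readable page and RDF. The sketch below is a minimal illustration under assumptions, not the tester's actual code; the URI is a hypothetical placeholder, and the widely used requests library is assumed to be available.

```python
# Minimal content-negotiation probe for a specimen identifier (an illustrative
# sketch only; not the implementation of the CETAF Specimen URI Tester).
import requests

# Hypothetical placeholder -- substitute a real CETAF specimen identifier.
uri = "https://example.org/specimen/ABC123"

for accept in ("text/html", "application/rdf+xml"):
    resp = requests.get(uri, headers={"Accept": accept},
                        allow_redirects=True, timeout=10)
    served = resp.headers.get("Content-Type", "unknown")
    print(f"{accept:20} -> HTTP {resp.status_code}, Content-Type: {served}, "
          f"after {len(resp.history)} redirect(s)")
    # A well-behaved identifier should answer 200 for both representations,
    # typically via a redirect from the stable URI to the current data object.
```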
Mergen, Patricia; Meissner, Kristian; Hering, Daniel; Leese, Florian; Bouchez, Agnès; Weigand, Alexander; Kueckmann, Sarah DNAqua-Net or how to navigate on the stormy waters of standards and legislations Journal Article Biodiversity Information Science and Standards, 2 , pp. e25953, 2018. Abstract | Links | BibTeX | Tags: Bioassessments, e-DNA, GDPR, legislation, metagenomics, Nagoya Protocol, standards, water quality @article{10.3897/biss.2.25953, title = {DNAqua-Net or how to navigate on the stormy waters of standards and legislations}, author = {Patricia Mergen and Kristian Meissner and Daniel Hering and Florian Leese and Agnès Bouchez and Alexander Weigand and Sarah Kueckmann}, url = {https://doi.org/10.3897/biss.2.25953}, doi = {10.3897/biss.2.25953}, year = {2018}, date = {2018-01-01}, journal = {Biodiversity Information Science and Standards}, volume = {2}, pages = {e25953}, publisher = {Pensoft Publishers}, abstract = {Several national and international environmental laws require countries to meet clearly defined targets with respect to the ecological status of aquatic ecosystems. In Europe, the EU-Water Framework Directive (WFD; 2000/60/EC) represents such a detailed piece of legislation. The WFD requires the European member countries to achieve at least a ‘good’ ecological status for all surface waters by the year 2027 at the latest. In order to assess the ecological status of a given water body under the WFD, data on its aquatic biodiversity are obtained and compared to a reference status. The mismatch between these two metrics is then used to derive the respective ecological status class. While the workflow to carry out the assessment is well established, it relies on only a few biological groups (typically fish, macroinvertebrates and a few algal taxa such as diatoms), is time consuming and remains at a lower taxonomic resolution, so that the identifications can be done routinely by non-experts with an acceptable learning curve. Here, novel genetic and genomic tools provide new solutions to speed up the process and allow the inclusion of a much greater proportion of biodiversity in the assessment process. Further, results are easily comparable through the genetic ‘barcodes’ used to identify organisms. The aim of the large international COST Action DNAqua-Net (http://dnaqua.net/) is to develop strategies on how to include novel genetic tools in bioassessment of aquatic ecosystems in Europe and beyond and how to standardize these among the participating countries. It is the ambition of the network to have these new genetic tools accepted in future legal frameworks such as the EU-Water Framework Directive (WFD; 2000/60/EC) and the Marine Strategy Framework Directive (2008/56/EC). However, a prerequisite is that various aspects, from the validation and completion of DNA barcode reference databases, to the lab and field protocols, to the analysis processes, as well as the subsequently derived biotic indices and metrics, are dealt with and commonly agreed upon. Furthermore, many pragmatic questions such as adequate short and long-term storage of samples or specimens for further processing or to serve as an accessible reference also need to be addressed. In Europe, the conformity and backward compatibility of the new methods with the existing legislation and workflows are also of high importance. Without rigorous harmonization and inter-calibration concepts, the implementation of the powerful new genetic tools will be substantially delayed in real-world legal framework applications.
After a short introduction on the structure and vision of DNAqua-Net, we discuss how the DNAqua-Net community considers possibilities to include novel DNA-based approaches into current bioassessment and how formal standardization e.g. through the framework of CEN (The European Committee for Standardization) may aid in that process (Hering et al. 2018, Leese et al. 2016, Leese et al. 2018). We also explore how TDWG data standards can further facilitate swift adoption of the genetic methods in routine use. We further present potential impacts of the legislative requirements of the Nagoya Protocol on the exchange of genetic resources and their implications for biomonitoring. Last but not least, we will touch upon the rather unexpected influence that the new General Data Protection Regulation (GDPR) may have on the bioassessment work in practice.}, keywords = {Bioassessments, e-DNA, GDPR, legislation, metagenomics, Nagoya Protocol, standards, water quality}, pubstate = {published}, tppubtype = {article} } Several national and international environmental laws require countries to meet clearly defined targets with respect to the ecological status of aquatic ecosystems. In Europe, the EU-Water Framework Directive (WFD; 2000/60/EC) represents such a detailed piece of legislation. The WFD requires the European member countries to achieve at least a ‘good’ ecological status for all surface waters by the year 2027 at the latest. In order to assess the ecological status of a given water body under the WFD, data on its aquatic biodiversity are obtained and compared to a reference status. The mismatch between these two metrics is then used to derive the respective ecological status class. While the workflow to carry out the assessment is well established, it relies on only a few biological groups (typically fish, macroinvertebrates and a few algal taxa such as diatoms), is time consuming and remains at a lower taxonomic resolution, so that the identifications can be done routinely by non-experts with an acceptable learning curve. Here, novel genetic and genomic tools provide new solutions to speed up the process and allow the inclusion of a much greater proportion of biodiversity in the assessment process. Further, results are easily comparable through the genetic ‘barcodes’ used to identify organisms. The aim of the large international COST Action DNAqua-Net (http://dnaqua.net/) is to develop strategies on how to include novel genetic tools in bioassessment of aquatic ecosystems in Europe and beyond and how to standardize these among the participating countries. It is the ambition of the network to have these new genetic tools accepted in future legal frameworks such as the EU-Water Framework Directive (WFD; 2000/60/EC) and the Marine Strategy Framework Directive (2008/56/EC). However, a prerequisite is that various aspects, from the validation and completion of DNA barcode reference databases, to the lab and field protocols, to the analysis processes, as well as the subsequently derived biotic indices and metrics, are dealt with and commonly agreed upon. Furthermore, many pragmatic questions such as adequate short and long-term storage of samples or specimens for further processing or to serve as an accessible reference also need to be addressed. In Europe, the conformity and backward compatibility of the new methods with the existing legislation and workflows are also of high importance.
Without rigorous harmonization and inter-calibration concepts, the implementation of the powerful new genetic tools will be substantially delayed in real-world legal framework applications. After a short introduction on the structure and vision of DNAqua-Net, we discuss how the DNAqua-Net community considers possibilities to include novel DNA-based approaches into current bioassessment and how formal standardization e.g. through the framework of CEN (The European Committee for Standardization) may aid in that process (Hering et al. 2018, Leese et al. 2016, Leese et al. 2018). We also explore how TDWG data standards can further facilitate swift adoption of the genetic methods in routine use. We further present potential impacts of the legislative requirements of the Nagoya Protocol on the exchange of genetic resources and their implications for biomonitoring. Last but not least, we will touch upon the rather unexpected influence that the new General Data Protection Regulation (GDPR) may have on the bioassessment work in practice. |
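The barcode-based identification this abstract relies on reduces, at its simplest, to matching a query sequence against a curated reference database and assigning the closest match above a similarity threshold. The toy sketch below illustrates only that idea; the sequences and the 97% threshold are invented for the example, and real pipelines use alignment-based or probabilistic classifiers rather than raw positional identity.

```python
# Toy barcode assignment: nearest reference by percent identity (sequences
# and threshold are invented for illustration; not a production classifier).

REFERENCES = {  # hypothetical pre-aligned reference barcodes
    "Baetis rhodani":    "ACGTACGTACGTACGTACGT",
    "Gammarus fossarum": "ACGTTCGTACGAACGTACCT",
}

def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def assign(query: str, threshold: float = 0.97) -> str:
    """Return the best-matching taxon, or 'unassigned' below the threshold."""
    taxon, score = max(((t, identity(query, ref)) for t, ref in REFERENCES.items()),
                       key=lambda pair: pair[1])
    return taxon if score >= threshold else "unassigned"

print(assign("ACGTACGTACGTACGTACGT"))  # -> Baetis rhodani
print(assign("TTTTTTTTTTTTTTTTTTTT"))  # -> unassigned
```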
Silva, Ana Serra; Groz, Maria Pitta; Leandro, Paula; Assis, Carlos A; Figueira, Rui Ichthyological collection of the Museu Oceanográfico D. Carlos I Journal Article ZooKeys, 752 , pp. 137-148, 2018, ISSN: 1313-2989. Abstract | Links | BibTeX | Tags: Actinopterygii, Animalia, D. Carlos I, Elasmobranchii, Holocephali, Myxini, Natural History collection, Occurrence, Petromyzonti, Portugal @article{10.3897/zookeys.752.20086, title = {Ichthyological collection of the Museu Oceanográfico D. Carlos I}, author = {Ana Serra Silva and Maria Pitta Groz and Paula Leandro and Carlos A Assis and Rui Figueira}, url = {https://doi.org/10.3897/zookeys.752.20086}, doi = {10.3897/zookeys.752.20086}, issn = {1313-2989}, year = {2018}, date = {2018-01-01}, journal = {ZooKeys}, volume = {752}, pages = {137-148}, publisher = {Pensoft Publishers}, abstract = {The collection of the Museu Oceanográfico D. Carlos I is a historical specimen, instrument, and document collection that has been housed at the Aquário Vasco da Gama since 1935. The collection is largely the result of several scientific campaigns conducted by Dom Carlos de Bragança between 1896 and 1907. Specifically, the ichthyological collection consists of 675 surviving catalogue records of specimens caught, acquired or offered to D. Carlos I between 1892 and 1907, and includes the type specimen for Odontaspis nasutus Bragança, 1904 (junior synonym of Mitsukurina owstoni Jordan, 1898), along with several specimens of deep sea species. All specimens were captured in coastal Portuguese waters, and were preserved in alcohol or formalin, or mounted.}, keywords = {Actinopterygii, Animalia, D. Carlos I, Elasmobranchii, Holocephali, Myxini, Natural History collection, Occurrence, Petromyzonti, Portugal}, pubstate = {published}, tppubtype = {article} } The collection of the Museu Oceanográfico D. Carlos I is a historical specimen, instrument, and document collection that has been housed at the Aquário Vasco da Gama since 1935. The collection is largely the result of several scientific campaigns conducted by Dom Carlos de Bragança between 1896 and 1907. Specifically, the ichthyological collection consists of 675 surviving catalogue records of specimens caught, acquired or offered to D. Carlos I between 1892 and 1907, and includes the type specimen for Odontaspis nasutus Bragança, 1904 (junior synonym of Mitsukurina owstoni Jordan, 1898), along with several specimens of deep sea species. All specimens were captured in coastal Portuguese waters, and were preserved in alcohol or formalin, or mounted. |
2017 |
Monteiro, Miguel; Figueira, Rui; Melo, Martim; Mills, Michael Stuart Lyne; Beja, Pedro; Bastos-Silveira, Cristiane; Ramos, Manuela; Rodrigues, Diana; Neves, Isabel Queirós; Consciência, Susana; Reino, Luís The collection of birds from Mozambique at the Instituto de Investigação Científica Tropical of the University of Lisbon (Portugal) Journal Article ZooKeys, 708 , pp. 139-152, 2017, ISSN: 1313-2989. Abstract | Links | BibTeX | Tags: Animalia, Aves, Biodiversity databases, Chordata, museum, southern Africa, species occurrence data, specimen @article{10.3897/zookeys.708.13351, title = {The collection of birds from Mozambique at the Instituto de Investigação Científica Tropical of the University of Lisbon (Portugal)}, author = {Miguel Monteiro and Rui Figueira and Martim Melo and Michael Stuart Lyne Mills and Pedro Beja and Cristiane Bastos-Silveira and Manuela Ramos and Diana Rodrigues and Isabel Queirós Neves and Susana Consciência and Luís Reino}, url = {https://doi.org/10.3897/zookeys.708.13351}, doi = {10.3897/zookeys.708.13351}, issn = {1313-2989}, year = {2017}, date = {2017-01-01}, journal = {ZooKeys}, volume = {708}, pages = {139-152}, publisher = {Pensoft Publishers}, abstract = {The Instituto de Investigação Científica Tropical of the University of Lisbon, which resulted from the recent merger (in 2015) of the former state laboratory Instituto de Investigação Científica Tropical into the University of Lisbon, holds an important collection of bird skins from the Portuguese-speaking African Countries (Angola, Mozambique, São Tomé and Príncipe, Guinea Bissau and Cape Verde), gathered as a result of several scientific expeditions made during the colonial period. In this paper, the subset from Mozambique is described, which was taxonomically revised and georeferenced. It contains 1585 specimens belonging to 412 taxa, collected between 1932 and 1971, but mainly in 1948 (43% of specimens) and 1955 (30% of specimens). The collection covers all eleven provinces of the country, although areas south of the Zambezi River are better represented than those north of the river. The provinces with the highest number of specimens were Maputo, Sofala, and Gaza. Although it is a relatively small collection with a patchy coverage, it adds significantly to the Global Biodiversity Information Facility: representing 15% of all records available before and during the collecting period (1830–1971), it is the second largest dataset for that period for Mozambique.}, keywords = {Animalia, Aves, Biodiversity databases, Chordata, museum, southern Africa, species occurrence data, specimen}, pubstate = {published}, tppubtype = {article} } The Instituto de Investigação Científica Tropical of the University of Lisbon, which resulted from the recent merger (in 2015) of the former state laboratory Instituto de Investigação Científica Tropical into the University of Lisbon, holds an important collection of bird skins from the Portuguese-speaking African Countries (Angola, Mozambique, São Tomé and Príncipe, Guinea Bissau and Cape Verde), gathered as a result of several scientific expeditions made during the colonial period. In this paper, the subset from Mozambique is described, which was taxonomically revised and georeferenced. It contains 1585 specimens belonging to 412 taxa, collected between 1932 and 1971, but mainly in 1948 (43% of specimens) and 1955 (30% of specimens). The collection covers all eleven provinces of the country, although areas south of the Zambezi River are better represented than those north of the river.
The provinces with the highest number of specimens were Maputo, Sofala, and Gaza. Although it is a relatively small collection with a patchy coverage, it adds significantly to the Global Biodiversity Information Facility: representing 15% of all records available before and during the collecting period (1830–1971), it is the second largest dataset for that period for Mozambique. |
2016 |
Dauby, Gilles; Zaiss, Rainer; Blach-Overgaard, Anne; Catarino, Luís; Damen, Theo; Deblauwe, Vincent; Dessein, Steven; Dransfield, John; Droissart, Vincent; Duarte, Maria Cristina; Engledow, Henry; Fadeur, Geoffrey; Figueira, Rui; Gereau, Roy E; Hardy, Olivier J; Harris, David J; de Heij, Janneke; Janssens, Steven; Klomberg, Yannick; Ley, Alexandra C; MacKinder, Barbara A; Meerts, Pierre; van de Poel, Jeike L; Sonké, Bonaventure; Sosef, Marc S M; Stévart, Tariq; Stoffelen, Piet; Svenning, Jens-Christian; Sepulchre, Pierre; van der Burgt, Xander; Wieringa, Jan J; Couvreur, Thomas L P RAINBIO: a mega-database of tropical African vascular plants distributions Journal Article PhytoKeys, 74 , pp. 1-18, 2016, ISSN: 1314-2011. Abstract | Links | BibTeX | Tags: biodiversity assessment, cultivated species, digitization, georeferencing, habit, Herbarium specimens, native species, taxonomic backbone, tropical forests @article{10.3897/phytokeys.74.9723, title = {RAINBIO: a mega-database of tropical African vascular plants distributions}, author = {Gilles Dauby and Rainer Zaiss and Anne Blach-Overgaard and Luís Catarino and Theo Damen and Vincent Deblauwe and Steven Dessein and John Dransfield and Vincent Droissart and Maria Cristina Duarte and Henry Engledow and Geoffrey Fadeur and Rui Figueira and Roy E Gereau and Olivier J Hardy and David J Harris and Janneke de Heij and Steven Janssens and Yannick Klomberg and Alexandra C Ley and Barbara A MacKinder and Pierre Meerts and Jeike L van de Poel and Bonaventure Sonké and Marc S M Sosef and Tariq Stévart and Piet Stoffelen and Jens-Christian Svenning and Pierre Sepulchre and Xander van der Burgt and Jan J Wieringa and Thomas L P Couvreur}, url = {https://doi.org/10.3897/phytokeys.74.9723}, doi = {10.3897/phytokeys.74.9723}, issn = {1314-2011}, year = {2016}, date = {2016-01-01}, journal = {PhytoKeys}, volume = {74}, pages = {1-18}, publisher = {Pensoft Publishers}, abstract = {The tropical vegetation of Africa is characterized by high levels of species diversity but is undergoing important shifts in response to ongoing climate change and increasing anthropogenic pressures. Although our knowledge of plant species distribution patterns in the African tropics has been improving over the years, it remains limited. Here we present RAINBIO, a unique comprehensive mega-database of georeferenced records for vascular plants in continental tropical Africa. The geographic focus of the database is the region south of the Sahel and north of Southern Africa, and the majority of data originate from tropical forest regions. RAINBIO is a compilation of 13 datasets, either publicly available or personal. Numerous in-depth data quality checks, both automatic and manual (the latter via several African flora experts), were undertaken for georeferencing, standardization of taxonomic names, and identification and merging of duplicated records. The resulting RAINBIO data allows exploration and extraction of distribution data for 25,356 native tropical African vascular plant species, which represent ca. 89% of all known plant species in the area of interest.
Habit information is also provided for 91% of these species.}, keywords = {biodiversity assessment, cultivated species, digitization, georeferencing, habit, Herbarium specimens, native species, taxonomic backbone, tropical forests}, pubstate = {published}, tppubtype = {article} } The tropical vegetation of Africa is characterized by high levels of species diversity but is undergoing important shifts in response to ongoing climate change and increasing anthropogenic pressures. Although our knowledge of plant species distribution patterns in the African tropics has been improving over the years, it remains limited. Here we present RAINBIO, a unique comprehensive mega-database of georeferenced records for vascular plants in continental tropical Africa. The geographic focus of the database is the region south of the Sahel and north of Southern Africa, and the majority of data originate from tropical forest regions. RAINBIO is a compilation of 13 datasets, either publicly available or personal. Numerous in-depth data quality checks, both automatic and manual (the latter via several African flora experts), were undertaken for georeferencing, standardization of taxonomic names, and identification and merging of duplicated records. The resulting RAINBIO data allows exploration and extraction of distribution data for 25,356 native tropical African vascular plant species, which represent ca. 89% of all known plant species in the area of interest. Habit information is also provided for 91% of these species. |
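One of the RAINBIO quality steps, identification and merging of duplicated records, can be pictured as grouping records on a normalised taxon name, rounded coordinates and collection year, then keeping one record per group. The sketch below illustrates that idea with invented records; it is not the RAINBIO pipeline itself, whose checks also involved expert review.

```python
# Illustrative duplicate-merging step (invented records; not RAINBIO code).
from collections import defaultdict

records = [  # hypothetical herbarium occurrence records
    {"taxon": "Coffea canephora ", "lat": 0.38271, "lon": 9.45401, "year": 1937},
    {"taxon": "coffea canephora",  "lat": 0.38268, "lon": 9.45405, "year": 1937},
    {"taxon": "Raphia regalis",    "lat": 3.86710, "lon": 11.52130, "year": 1952},
]

def merge_key(rec, precision=3):
    """Normalise the name and snap coordinates to ~100 m so near-identical
    records of the same gathering fall into the same group."""
    name = " ".join(rec["taxon"].lower().split())
    return (name, round(rec["lat"], precision),
            round(rec["lon"], precision), rec["year"])

groups = defaultdict(list)
for rec in records:
    groups[merge_key(rec)].append(rec)

merged = [group[0] for group in groups.values()]  # keep one record per group
print(f"{len(records)} records -> {len(merged)} after merging duplicates")
```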
2014 |
Romeiras, Maria M; Figueira, Rui; Duarte, Maria Cristina; Beja, Pedro; Darbyshire, Iain Documenting Biogeographical Patterns of African Timber Species Using Herbarium Records: A Conservation Perspective Based on Native Trees from Angola Journal Article PLOS ONE, 9 (7), pp. 1-11, 2014. Abstract | Links | BibTeX | Tags: @article{10.1371/journal.pone.0103403, title = {Documenting Biogeographical Patterns of African Timber Species Using Herbarium Records: A Conservation Perspective Based on Native Trees from Angola}, author = {Maria M Romeiras and Rui Figueira and Maria Cristina Duarte and Pedro Beja and Iain Darbyshire}, url = {https://doi.org/10.1371/journal.pone.0103403}, doi = {10.1371/journal.pone.0103403}, year = {2014}, date = {2014-01-01}, journal = {PLOS ONE}, volume = {9}, number = {7}, pages = {1-11}, publisher = {Public Library of Science}, abstract = {In many tropical regions the development of informed conservation strategies is hindered by a dearth of biodiversity information. Biological collections can help to overcome this problem, by providing baseline information to guide research and conservation efforts. This study focuses on the timber trees of Angola, combining herbarium (2670 records) and bibliographic data to identify the main timber species, document biogeographic patterns and identify conservation priorities. The study recognized 18 key species, most of which are threatened or near-threatened globally, or lack formal conservation assessments. Biogeographical analysis reveals three groups of species associated with the enclave of Cabinda and northwest Angola, which occur primarily in Guineo-Congolian rainforests, and evergreen forests and woodlands. A fourth group is widespread across the country, and is mostly associated with dry forests. There is little correspondence between the spatial pattern of species groups and the ecoregions adopted by WWF, suggesting that these may not provide an adequate basis for conservation planning for Angolan timber trees. Eight of the species evaluated should be given high conservation priority since they are of global conservation concern, they have very restricted distributions in Angola, their historical collection localities are largely outside protected areas and they may be under increasing logging pressure. High conservation priority was also attributed to another three species that have a large proportion of their global range concentrated in Angola and that occur in dry forests where deforestation rates are high. Our results suggest that timber tree species in Angola may be under increasing risk, thus calling for efforts to promote their conservation and sustainable exploitation. The study also highlights the importance of studying historic herbarium collections in poorly explored regions of the tropics, though new field surveys remain a priority to update historical information.}, keywords = {}, pubstate = {published}, tppubtype = {article} } In many tropical regions the development of informed conservation strategies is hindered by a dearth of biodiversity information. Biological collections can help to overcome this problem, by providing baseline information to guide research and conservation efforts. This study focuses on the timber trees of Angola, combining herbarium (2670 records) and bibliographic data to identify the main timber species, document biogeographic patterns and identify conservation priorities. The study recognized 18 key species, most of which are threatened or near-threatened globally, or lack formal conservation assessments.
Biogeographical analysis reveals three groups of species associated with the enclave of Cabinda and northwest Angola, which occur primarily in Guineo-Congolian rainforests, and evergreen forests and woodlands. A fourth group is widespread across the country, and is mostly associated with dry forests. There is little correspondence between the spatial pattern of species groups and the ecoregions adopted by WWF, suggesting that these may not provide an adequate basis for conservation planning for Angolan timber trees. Eight of the species evaluated should be given high conservation priority since they are of global conservation concern, they have very restricted distributions in Angola, their historical collection localities are largely outside protected areas and they may be under increasing logging pressure. High conservation priority was also attributed to another three species that have a large proportion of their global range concentrated in Angola and that occur in dry forests where deforestation rates are high. Our results suggest that timber tree species in Angola may be under increasing risk, thus calling for efforts to promote their conservation and sustainable exploitation. The study also highlights the importance of studying historic herbarium collections in poorly explored regions of the tropics, though new field surveys remain a priority to update historical information. |
ResearchGate Link : https://www.researchgate.net/project/MOBILISE-COST-Action-CA17106-Mobilising-Data-Policies-and-Experts-in-Scientific-Collections