British Library (BL) – Contact person: Sergio Ruiz

The British Library is the national library of the United Kingdom and one of the world’s greatest research libraries. It supports the UK’s research infrastructure, serving business and industry, researchers, academics and students, both in the UK and worldwide. The BL receives a copy of every publication produced in the UK and Ireland via legal deposit. Its collection includes well over 150 million items in most known languages, and grows by approximately three million items a year. A major concern is the collection and preservation of born-digital material, including websites. The British Library is passionate about providing both physical and digital access to world-class information where and when people need it. Over 16,000 people use the collections of the British Library each day (on site and online). In the course of a year, six million searches are generated by the British Library online catalogue and nearly 400,000 people visit the Reading Rooms. With these figures and its accumulated experience, the British Library demonstrates the capability to engage with the social sciences research community and to take responsibility for delivering the proof of concept in this area.

The British Library is committed to improving access to, reuse of and tracking impact of research datasets. It is a founding member of DataCite, which assigns persistent digital object identifiers (DOI names) to research datasets, and it is the UK registration agency for this project. DataCite has assigned DOI names to the majority of the collection of the Economic and Social Data Service, which is the UK’s main data repository for social sciences research data. The Library’s social science strategy is committed to improving access to research data, and supporting efforts to improve re-use of social sciences data and to highlight the impact of research.

The BL is the project coordinator and will lead WP3, providing the proof of concept in the HSS domain.


Australian National Data Service (ANDS) – Contact person: Amir Aryani

The Australian government recognises the need for Australian researchers to have unprecedented access to data, enabling new and more efficient research to be conducted in a richer data environment that can address major research challenges. To enable this, the Australian National Data Service (ANDS) was established, focusing on bringing about four transformations to data – from unmanaged to managed, from disconnected to connected, from invisible to findable and from single use to reusable – that will enable Australia’s research data as a whole to become a national strategic resource.

To this end ANDS is partnering with research institutions across Australia to create and build the Australian Research Data Commons (ARDC) – a cohesive national collection of research resources – to provide:
– A set of data collections that are shareable
– Descriptions of those collections
– An infrastructure that enables populating and exploiting the commons
– Connections between data, researchers, research, instruments and institutions

The ARDC is a meeting place for researchers and data, designed to make better use of Australia’s research outputs; enable Australian researchers to easily publish, discover, access and use data; and enable new and more efficient research.

ANDS is implementing a number of significant national services to enable researchers and research institutions to improve their research data management, leading to routine publication of their data with ANDS persistent identifiers into data stores that feed information to the ANDS collections registry. In addition, researchers will be able to find a wide variety of data sets through a variety of discovery paths. Most importantly ANDS will have engaged the research community to the extent that researchers see publishing their research data as their default practice.

The Australian National Data Service was established in January 2009. ANDS is supported by the Australian Government. The ANDS Project partners are Monash University (lead institution), CSIRO and the Australian National University.

ANDS will contribute its perspective as a national, multi-disciplinary registration agency for persistent identifiers (PIs) and its experience from the ARDC. It will lead WP6.


Cornell University Library (arXiv) – Contact person: Simeon Warner 

The arXiv e-print repository has transformed scholarly communication in multiple fields of physics, and plays an increasingly prominent role in other areas of physics, mathematics, computer science, and related disciplines. Open access to the full text of all articles is a founding principle, and early adoption of the web allowed arXiv to demonstrate a model of linked online scholarly communication that has since been widely adopted.

Since its inception in 1991, then with a focus on the high energy physics community, arXiv has expanded enormously in both its subject coverage and user base. It moved to Cornell with its founder, physicist Paul Ginsparg, in 2001, and is operated and managed by the Cornell University Library. arXiv now has over 710,000 articles, with more than 70,000 new articles deposited each year by some of its 120,000 registered authors. There were over 40M full-text downloads in 2010. It is considered an essential service by many scientists, and Cornell is engaged in work to develop an effective sustainability model for arXiv to guarantee continued development and innovation.

Data sets are of critical importance in the subject areas arXiv serves. For some years it has provided rudimentary facilities for data to be submitted alongside articles and held locally. However, while articles in an environment such as arXiv must link to associated datasets, it is too constraining to demand that data be held locally; instead, data must be accurately identified and linked to, whether stored locally or elsewhere. Remote data deposit is currently provided in collaboration with the NSF-funded Data Conservancy at Johns Hopkins University and we are eager to generalize this prototype service.

The first web interface for arXiv was created in 1994, and ever since searching became available the most popular type of search has been on author names. The limitations of string-based author search have been apparent for many years, and arXiv has a local author identifier system tied to user accounts which provides more accurate authorship information than is possible with search. ORCID will not only allow arXiv to implement this system in a standard way but also allow author identities to be linked with other systems such as INSPIRE, ADS and other information systems. Cornell University Library is a founding member of ORCID.

arXiv distributes the majority of new articles in high-energy physics, and together with INSPIRE forms by far the most important scholarly communication system for the discipline. The two systems thus provide an ideal venue for the proof-of-concept in WP3, demonstrating data set and author linking in physics. arXiv will also support agile validation of the interoperability findings of WP4.


International Data Citation Initiative (DataCite) – Contact person: Jan Brase

The association DataCite, founded under German law in December 2009, has set itself the goals of making online access to research data easier for scientists, promoting the acceptance of research data as individual, referable scientific objects in their own right, and in doing so guaranteeing adherence to the rules of sound scientific practice. 17 partners from 12 countries have come together under the DataCite umbrella:
– The German National Library of Science and Technology (TIB)
– The German National Library of Medicine (ZB MED)
– The German National Library of Economics (ZBW)
– The Leibniz Institute for Social Sciences GESIS in Germany
– The British Library
– The French L’Institut de L’Information Scientifique et Technique (INIST)
– The Technical Information Center of Denmark
– The TU Delft Library from the Netherlands
– The Swedish National Data Service (SND)
– Conferenza dei Rettori delle Università Italiane (CRUI)
– The Library of the ETH Zurich in Switzerland
– The Canada Institute for Scientific and Technical Information (CISTI)
– The California Digital Library (USA)
– Purdue University (USA)
– The Office of Scientific and Technical Information (OSTI) of the U.S. Department of Energy
– The Australian National Data Service (ANDS)
– The Korea Institute of Science and Technology Information (KISTI)

DataCite is an official Digital Object Identifier (DOI) registration agency and a member of the International DOI Foundation (IDF). DataCite builds on the successful work of the German National Library of Science and Technology (TIB), which was the first registration agency for research data worldwide. DataCite has already registered around 1,300,000 research data sets and in doing so has enabled simple access to data and citation of data sets. DataCite is cooperating with libraries, data centres and publishers to allow easy referencing and linking of datasets from online catalogues and from scientific articles. DataCite’s office is hosted by the TIB in Hannover; it is a registered not-for-profit association (gemeinnütziger Verein).

DataCite will be the scientific coordinator in WP1 and will lead WP4. Through its members, it will actively support the assignment of DOI names to data sets in all other relevant activities of the project.


European Organization for Nuclear Research (CERN) – Contact person: Salvatore Mele

CERN, “where the Web was born”, is funded by 20 European Member States with a budget of more than €800M/yr. It has 2,500 permanent staff and hosts over 10,000 HEP scientists from more than 250 institutes in 85 countries. CERN’s flagship project, the Large Hadron Collider (LHC), is breaking all records for particle accelerators and its results will soon unveil a new understanding of how our Universe works. It is the world’s largest and most powerful scientific instrument, producing around 15 Petabytes of particle-collision data per year. CERN chose Grid technology to address the huge data storage and analysis challenge of the LHC. It has prominently contributed to dozens of EC co-funded Grid projects and operates the largest multi-disciplinary Grid infrastructure in the world. Over half a century ago the CERN charter enshrined that “… the results of its experimental and theoretical work shall be published or otherwise made generally available”, and it plays a leading role in both the European and worldwide Open Access movements. This Open Access vision and IT innovation come together through the development of Invenio, an Open Source digital library platform, which CERN and collaborating partners in Europe, the United States and Japan use to power INSPIRE, a one-stop-shop digital library for HEP, serving 1 million records to 50,000 scientists worldwide, and increasingly hosting additional content, bridging the gap between data and publications.

CERN is a founding member of ORCID and contributes to several FP7 projects relevant to the topics of this call: SOAP (coordinator, 230220), charged with studying Open Access publishing business models; PARSE.Insight (223758), aiming to shed light on issues of preservation of data in the light of a pan-European e-Infrastructure for scientific data; ODE (261530, coordinator), which unveiled opportunities for e-infrastructures to enable open data; and OpenAIRE (246686) and OpenAIREPlus (283595), which are powering the Open Access pilot of the EC and its extension to data.

CERN offers the unique complementary perspective of a producer of unique primary research data, the host of scientific collaborations of unprecedented scale, a pioneer in scholarly communication, and a major player in the design and construction of e-Infrastructures. CERN’s motivation for participating in ODIN is to explore opportunities generated by widespread access to unique and non-reproducible primary research data in HEP and beyond. It will lead WP5 and deliver the HEP proof of concept in WP3.


Duke University (Dryad Digital Repository) – Contact person: Todd Vision

The mission of Dryad is to enable the long-term availability of the data that forms the evidence base for findings in the published bioscience literature for validation and reuse. The centerpiece of the organization is the Dryad Digital Repository, which aims to make it both simple and rewarding for researchers to publish data.

Data files may be associated with any published article, and may include scripts or other information integral to the article. There is no restriction regarding format, and files may be larger than typically allowed by journal websites. The deposit process is streamlined through integration with the manuscript submission process of a growing number of journals. Terms of reuse are explicit and non-restrictive, through application of a Creative Commons Zero waiver, thus maximizing scientific impact. Data may be embargoed until publication, or for a limited time afterward, as consistent with journal policy. Researchers may update data files and make corrections or amendments without overwriting the original versions. Data are made securely available for peer review upon request of the journal. Researchers get credit for reuse of data through promotion of best-practice data citation policy and infrastructure for tracking data citations. The DataCite DOI system provides global, persistent, resolvable identifiers that stamp data as first class scholarly objects and allow reciprocal linking from journal articles. Metadata are curated, and are indexed by major bibliographic and web resources, to promote data discoverability. Download statistics are available to assess usage of individual data files, which captures use even in contexts that citations may miss (e.g. classrooms). Data are preserved and made available for the long-term, even beyond the life of Dryad, through file format migration and a multilayered backup and replication network.

Dryad frees publishers from the responsibility for managing supplemental data and increases the benefits of journals to their user communities. For funding organizations, Dryad provides a cost-effective mechanism for enabling new science and making publicly funded research results publicly available.

Dryad is designed to curate the data from thousands of new articles each year. Since its founding in 2009, Dryad has already received data associated with nearly 100 different journals. The organization is governed by the Dryad Consortium, which includes scientific societies, publishers, and other stakeholder organizations. Dryad is in the process of registering as a non-profit member organization in the United States and implementing a revenue model for long-term sustainability based primarily on deposit charges. Dryad is an active participant in organizations developing best-practices for data management such as DataCite and ORCID.

Dryad will contribute to WP3, 4 and 5 through its advisory and validation role in WP6.


Open Researcher & Contributor ID initiative (ORCID EU) – Contact person: Josh Brown

ORCID Inc was incorporated as an independent, non-profit organization in 2010 to solve the name ambiguity problem in scholarly research and communications by establishing a global, open registry to provide persistent, unique person-identifiers for researchers. This, and associated disambiguation, will enhance the process of scientific discovery as well as the efficiency of research funding and collaboration and accuracy of output and impact tracking.

More than 275 organizations from 40 countries (40% of them from 19 European countries) are now participating in ORCID, including academic institutions, publishers, corporate organizations, non-profit organizations, scholarly societies, and government agencies. ORCID EU is a non-profit organisation based in Belgium, created to “solve the author/contributor name ambiguity problem in scholarly communications by encouraging the use in Europe of ORCID as a central registry of unique identifiers for individual researchers” and “create interoperability between ORCID and other researcher identifier schemes in Europe”.

The ORCID system creates a unique identifier for each researcher. Researchers are given an ID through a combination of bulk uploading of basic researcher details by funders, academic institutions, and publishers, and by individual researchers registering and ‘claiming’ an ID.
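
As an illustration of how a system receiving such identifiers (for example, a data centre or a manuscript submission system) might catch mistyped IDs, the following minimal Python sketch checks that an identifier is well formed. It assumes the 16-character format with an ISO 7064 MOD 11-2 check character that ORCID documents for its identifiers; the text above does not specify the format, and the function names are hypothetical. The check says nothing about whether an ID has actually been registered or claimed.

```python
def orcid_check_character(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(identifier: str) -> bool:
    """Check the shape and check character of an ID such as 0000-0002-1825-0097."""
    compact = identifier.replace("-", "")
    base, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check.upper()


if __name__ == "__main__":
    # 0000-0002-1825-0097 is a sample identifier used in ORCID documentation.
    print(is_well_formed_orcid("0000-0002-1825-0097"))  # True
    print(is_well_formed_orcid("0000-0002-1825-0098"))  # False (wrong check digit)
```

A check of this kind only filters out typing errors locally; confirming that an ID belongs to a given researcher would still require a lookup against the registry.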

The ORCID system is being designed with an ability to inter-operate with other researcher and author ID schemes and researcher profile systems. ORCIDs should become part of the publication manuscript submission system, be used by data centers for submitted datasets, be used by funding agencies in grant applications, and be used to populate researcher profiles and keep them up-to-date by bringing key information together at minimum burden to researchers.

The ORCID initiative is guided by a set of principles that reflect the mission of the organization, its inclusiveness and its openness:
– ORCID will work to support the creation of a permanent, clear and unambiguous record of scholarly communication by enabling reliable attribution of authors and contributors.
– ORCID will transcend discipline, geographic, national and institutional boundaries.
– Participation in ORCID is open to any organization that has an interest in scholarly communications.
– Researchers will be able to create, edit, and maintain an ORCID ID and profile free of charge.
– Researchers will control the defined privacy settings of their own ORCID profile data.
– All profile data contributed to ORCID by researchers or claimed by them will be available in standard formats for free download (subject to the researchers’ own privacy settings) that is updated once a year and released under the CC0 waiver.
– ORCID will be governed by representatives from a broad cross-section of stakeholders, the majority of whom are not-for-profit, and will strive for maximal transparency by publicly posting summaries of all board meetings and annual financial reports.

Through its openness and collaboration with all key stakeholders, ORCID aims to become the de facto standard for scholarly author identification.

ORCID will lead WP2. Through its members, it will actively support the assignment of DOI names to data sets in all other relevant activities of the project.