Useful data: developing standards for describing clinical research data across repositories

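The movement towards greater transparency has led to a growing number of documents, journal publications and raw datasets relating to particular clinical trials being held in a wide range of repositories. This is a welcome direction for the field, but if we are to use these 'data objects' in the most effective way, we need consistency in how they are described (and therefore how they can be searched for) across these repositories. Steve Canham and Christian Ohmann from the European Clinical Research Infrastructure Network have recently published a Methodology in Trials proposing standards for consistent descriptions of these data objects and here, both authors highlight its importance for the field and summarize their proposal.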

Trialists are under increasing pressure, from funders, journal editors, and a general cultural shift towards full transparency in science, to make the original data and documents of their studies available to others.

Systems are slowly evolving to support such access, including the development of various data repositories. But a fundamental issue, of ‘discoverability’, needs to be resolved before the full promise of this new transparency can be realized, given the fact that the various ‘data objects’ (the generic term for any document or dataset available in electronic format) may be scattered between different repositories, publishers and institutions.

…the clinical research community needs to agree a simple, consistent metadata scheme for clinical research data objects, and deploy it…

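Specifically, we need systems, for use by both humans and machines, that can locate and characterize the various datasets and documents that exist for a study, together with the information needed to access them. We argue, however, that this can only be done in a cost-effective (and therefore sustainable) way if the various data objects are described consistently at source, so that data which software systems can harvest is provided on a regular basis.

In short, we assert that the clinical research community needs to agree a simple, consistent metadata schema for all these data objects, and to deploy it, or at least map to it, in all the locations where they are held.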

Any such schema has to:

  1. Unambiguously identify the study or studies for which the data object was generated (or from / within which it was used).
  2. Characterize the research object itself – e.g. its type, authorship, contents, size, language, etc.
  3. Describe the object’s location and the access regime under which it is available. If not public, the regime needs to be described in sufficient detail for a potential user to be able to apply for access.
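  4. Be lightweight enough to be applied easily, especially by those who generate the data objects in the first place.
  5. Make use, as far as possible, of elements from existing data schemas.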

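In an article published today in Trials, we propose such a schema, which is based on the widely used DataCite standard to describe the data or document itself, but with two extensions to cover the specific needs of clinical researchers.

These provide: a) study identification data, including clinical trial registry IDs, and b) data covering the location, ownership and access to the data object. Table 1 summarizes our proposal and indicates which data points we consider mandatory, recommended or optional.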

Mandatory Recommended Optional
A.1 Source Study Title* A.2 Study Identifier records*

A.3 Study Topics*

B.1 DOI (1)

B.3 Object Title

B.5 Version

B.2 Object Other Identifiers*

B.4 Object Additional Titles*

C.1 Creators* C.2 Contributors*
D.1 Creation Year D.2 Dates*
E.1 Resource Type General E.2 Resource Type

E.3 Description*

E.5 Language

E.6 Related Identifiers*

E.4 Subjects (of data object)*
F.1 Publisher

F.3 Access Type

F.4 Access Details (2)

F.5 Access Contact (2)

F.6 Resources*

F.2 Other Hosting Institutions*

F.7 Rights*

(1) Mandatory for publicly accessible data objects, recommended for all others;

(2) Mandatory if access is non-public.
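
To make the two footnotes concrete, here is a minimal Python sketch of how a repository might check these conditional requirements before publishing a record. It is an illustration only, not part of the proposal: the class name, the field names and the access type values "public" and "restricted" are all assumptions, and only the rules stated in footnotes (1) and (2) are checked.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataObjectRecord:
    """Hypothetical, heavily simplified record; codes refer to Table 1."""
    object_title: str                      # B.3
    access_type: str                       # F.3, assumed "public" or "restricted"
    doi: Optional[str] = None              # B.1, see footnote (1)
    access_details: Optional[str] = None   # F.4, see footnote (2)
    access_contact: Optional[str] = None   # F.5, see footnote (2)
    study_registry_ids: List[str] = field(default_factory=list)  # part of A.2


def missing_conditional_elements(record: DataObjectRecord) -> List[str]:
    """Return the Table 1 codes of conditionally mandatory elements that are absent."""
    missing = []
    if record.access_type == "public":
        # Footnote (1): a DOI is mandatory for publicly accessible objects.
        if not record.doi:
            missing.append("B.1")
    else:
        # Footnote (2): access details and a contact are mandatory if access is non-public.
        if not record.access_details:
            missing.append("F.4")
        if not record.access_contact:
            missing.append("F.5")
    return missing
```

A record for a restricted dataset that gives access details but no contact would be reported as missing F.5, while a public object with a DOI would pass. A non-public object without a DOI is not flagged, because the DOI is only recommended in that case.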


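We are well aware of the problem, summarized by the well-known cartoon at https://xkcd.com/927/, that any attempt to develop a new common standard risks simply adding one more to the list of existing standards (even if those existing standards are currently usually specific to particular repository systems). We also acknowledge that developing unambiguous identifiers for studies and data objects may be problematic, but we do not believe these problems are insurmountable.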

We argue instead that the use of a common metadata schema is an absolute pre-requisite for implementing systems that can discover and index clinical research data objects on an ongoing basis. It is therefore key to supporting data sharing and the effective cataloging of clinical research resources in general.
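
As a rough illustration of why a shared schema matters for indexing, the Python sketch below groups records harvested from different repositories by study identifier. Everything in it is hypothetical: the repository names, the identifiers and the dictionary keys are invented for the example, and a real system would need to handle several identifiers per study.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical records, assumed to be already expressed in (or mapped to)
# the same schema by the repositories that hold them.
harvested = [
    {"repository": "Repository A", "study_id": "EXAMPLE-0001",
     "object_title": "Study protocol", "access_type": "public"},
    {"repository": "Repository B", "study_id": "EXAMPLE-0001",
     "object_title": "Individual participant dataset", "access_type": "restricted"},
    {"repository": "Repository B", "study_id": "EXAMPLE-0002",
     "object_title": "Statistical analysis plan", "access_type": "public"},
]


def index_by_study(records: List[dict]) -> Dict[str, List[dict]]:
    """Group data object records by the study they belong to."""
    index = defaultdict(list)
    for record in records:
        index[record["study_id"]].append(record)
    return dict(index)


study_index = index_by_study(harvested)
for study_id, objects in study_index.items():
    titles = ", ".join(obj["object_title"] for obj in objects)
    print(f"{study_id}: {titles}")
```

The grouping step is trivial only because every record uses the same key for the study identifier. Without a common schema, each repository would need its own bespoke mapping before a step like this could run.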

Our initial proposals are made with the intention of initiating a debate amongst interested stakeholders, and inviting others – repositories, trialists and standard development organizations – to comment upon them and to discuss ways in which these proposals, or any schema that evolves from them, can be implemented as a widely used standard. Please, contact us with your comments!
