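I'm not a librarian: August 2019

Sunday, August 4, 2019

Cambridge University Press develops a new open research platform, Cambridge Open Engage

In a future oriented toward open science and open research, research findings are disseminated and shared ever faster, and sharing begins ever "earlier".

Cambridge University Press is using the latest technology to build a new open research platform, Cambridge Open Engage, which helps researchers and authors publish preprints, abstracts, conference papers, conference posters, grey literature, and open data. Authors can upload these materials free of charge, and the materials are free for readers to access.

The platform is meant not only to disseminate research content but also to support and encourage more collaboration and connection throughout the research process. Researchers can share their work before it passes peer review, discuss findings with peers, and build a readership before formal publication. The platform can also help researchers collaborate and connect across disciplines.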

Cambridge Open Engage is the new early content platform from Cambridge University Press, designed to provide researchers with the space and resources to connect and collaborate with their communities, and rapidly disseminate early research. The platform is currently under development using a co-creation approach and we’re inviting researchers to actively input to help us shape the features and functionality. Register your interest below to stay up to date and to participate in its progress!

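Platform URL and page screenshot, including the platform's introductory video: https://www.cambridge.org/engage/

Summarized and translated from: Cambridge announces open research platform, Cambridge Open Engage

Thursday, August 1, 2019

JISC's Research Data Shared Service (RDSS)

JISC adopts the definition of research data used by the Engineering and Physical Sciences Research Council (EPSRC):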

‘Research data is defined as recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings; although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created.’

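JISC began the Research data shared service (RDSS) project around 2015. The project focuses on the data lifecycle: within that cycle it covers the ingest, publication, and long-term storage and preservation of finalized data objects for publication or archiving, and it can link to existing services for data creation and managing active data.

Source: https://insights.uksg.org/articles/10.1629/uksg.346/

Establishing the RDSS

Requirements gathering - Institutions

JISC launched a requirements-gathering exercise in the second half of 2015 to understand the research data management (RDM) needs of higher education institutions. It had three main parts:

An institutional survey around research systems, to understand the current state of RDM at JISC member institutions;
Desk research around existing requirements;
Requirements gathering workshops or expert meetings.

The final requirements findings and analysis were published with the tender notice in The Official Journal of the European Union (OJEU) (Jisc Research Data Shared Service Operational Requirements, https://zenodo.org/record/48261#.XUOqavL7SUk)

Building the team - Pilot institutions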

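From the institutions that applied to take part in the pilot, JISC selected 13 as pilot institutions, balancing the selection by size and type of institution. They include Imperial College London, the University of Cambridge, and the University of York, and they work with JISC to develop the RDSS.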

Building the team - Suppliers

JISC used the OJEU procurement process to establish a supplier framework, which is divided into 8 lots:

Lot 1 – Research Data Repository Suppliers
Lot 2 – Repository Interfaces Suppliers
Lot 3 – Research Data Exchange Interface Suppliers
Lot 4 – Research Information and Administration Systems Integrations Suppliers
Lot 5 – Research Data Preservation Platforms Suppliers
Lot 6 – Research Data Preservation Tools Development Suppliers
Lot 7 – Research Data Reporting Suppliers
Lot 8 – User Experience Enhancement Suppliers.

For supplier requirements and more information, see:

RDSS Operational Requirements document

Jisc RDM blog

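Meeting researchers' needs - Data Asset Framework

The RDSS will let researchers deposit data for publication, discovery, safe storage, and long-term archiving and preservation. This raises questions such as: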

What forms of data do researchers have?
How much data are we talking about?
Where do they store their data currently?
Who else needs access to it?
How long does the data need to be kept?
What motivates researchers to share their data – or to keep it closed?

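The Data Asset Framework (DAF) was developed in 2009 and at the time helped many organizations deal with their data sets. After several years of change, however, the original DAF no longer met actual needs, so JISC and the RDSS pilot institutions ran a new version of the survey (The 2016 DAF survey) to understand the current state of RDM in the UK. The results were roughly as follows: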

The RDSS can fill an important gap – 75% of researchers look first to their institution to preserve their data – but we know a lot of institutions cannot fully meet this need at present. This is where the RDSS can help.
Access to institutional support for RDM remains low – only 16% of respondents are currently accessing university RDM support services. This is a twofold challenge: institutions not only need to make appropriate support services available, but also make researchers aware that they exist.
We are pushing at an open door – 68% of respondents either already share data, or expect to do so in the future. Most of them do so because they believe that research is a public good which should be open to all. We just need to make data-sharing easier.
We still have a long way to go – only 40% of respondents currently have an RDM plan, and only 18% follow established metadata standards or guidelines. Delivering change will take time.

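Meeting researchers' needs - Metadata

To find out whether the RDSS metadata and data model meet requirements, JISC worked with Clax to run 9 focus groups whose participants were researchers from the pilot institutions. A brief summary of the results: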

The focus groups expressed concern about a number of areas with regard to metadata. Some can be addressed by training and support; many can be addressed by suppliers working with institutions and RDSS. A few require new technologies or culture change.
Early creation and collection of metadata was often mentioned. This can be achieved through the use of dynamic data management plans so that metadata is collected from the planning stage and updated throughout the data collection and analysis process.
Systems should preserve the form and content of the deposited data while allowing updating of the metadata to link to related data sets, subsequent publications and other materials which may have been created after the data was deposited. They should also allow updating of keywords and descriptive materials to reflect changes in the discipline. The facility to allow metadata to include links to other digital object identifiers (DOIs) and URLs – where a DOI does not exist – is essential.
It is often assumed that the collection of metadata will involve researchers in arduous and time-consuming form filling at data deposit time. This is undesirable and unlikely to produce good metadata. Instead, automation of tools, collection processes, equipment and metadata collection integrated into researchers’ workflow throughout the research will, ideally, allow a push-button submission of the data, with metadata already attached, to the repository.
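The third point above (deposited data stays fixed, while metadata can still be updated and can link to DOIs, or to URLs where no DOI exists) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, and relation labels below are hypothetical and are not taken from the RDSS data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DepositedData:
    """The deposited file itself; frozen so it cannot change after deposit."""
    filename: str
    checksum: str


@dataclass
class RelatedLink:
    """A link from a dataset to a related item (hypothetical structure)."""
    relation: str               # hypothetical label, e.g. "is_described_by"
    doi: Optional[str] = None   # preferred identifier when one exists
    url: Optional[str] = None   # fallback when no DOI exists

    def __post_init__(self):
        if not self.doi and not self.url:
            raise ValueError("a related link needs a DOI or a URL")

    def target(self) -> str:
        """Return a resolvable address, preferring the DOI over the URL."""
        if self.doi:
            return "https://doi.org/" + self.doi
        return self.url


@dataclass
class DatasetRecord:
    """Fixed data plus metadata that may be updated after deposit."""
    data: DepositedData
    title: str
    keywords: List[str] = field(default_factory=list)
    related: List[RelatedLink] = field(default_factory=list)

    def add_related(self, link: RelatedLink) -> None:
        # Later publications or data sets can be linked at any time.
        self.related.append(link)

    def update_keywords(self, keywords: List[str]) -> None:
        # Keywords can be revised as the discipline's vocabulary changes.
        self.keywords = list(keywords)
```

A record built this way could link to the source article by its DOI (10.1629/uksg.346) and to a resource without a DOI by its URL, while any attempt to modify the `DepositedData` object raises an error.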

RDSS conceptual architecture diagram

Source: https://insights.uksg.org/articles/10.1629/uksg.346/

RDSS challenges

Feedback from the pilot institutions points to several challenges the RDSS currently faces:

defining a ‘minimum viable product’ with a multitude of systems, priorities and expectations
fitting with existing institution and researcher workflows – for example, fitting RDSS into an institutional policy with the CRIS as the front door for researchers
making preservation work for research data, when the development of systems and tools have been led by the cultural heritage system
managing large data, data too large to be uploaded over the web, so greater than 5GB and including the challenges of big data
managing sensitive data including commercial, personally identifiable information and medical data.

Research Data Network

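Summarized and translated from:

Establishing a shared research data service for UK universities

Open Science concept map

Open Science concept map, saved for reference.

The original image at the link below explains each term; recommended.

Original image URL: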
FOSTER Help us promote Open Science and contribute training content.
https://www.fosteropenscience.eu/resources

A new model for supporting STEM data sharing - Data Communities

"STEM researchers must be convinced to share their data in the first place before they can be taught how to share it well."

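Chemists, agricultural scientists, public health scholars, and civil and environmental engineers tend to share data individually, one-to-one, or with people they know and trust (most often collaborators). They rely on networks within their specialist fields to share, and informal networks among researchers are also an important sharing channel. Data sharing becomes a social activity, and successful data sharing often happens within a data community.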

“A data community is a fluid and informal network of researchers who share and use a certain type of data.”

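Another important angle of analysis is the creation of data curation profiles. Analyzing empirical evidence about the data lifecycle lets us focus on the technical process of sharing a particular type of data rather than generalizing by discipline. We also need to pay attention to the data sharing already taking place on a broader, technically supported scale. Several data repositories can be studied as success stories:

1. Cambridge Structural Database (Cambridge Crystallographic Data Centre’s Cambridge Structural Database, CSD) - Dating from 1965, a data repository of crystal structures established by the Cambridge Crystallographic Data Centre.

2. FlyBase - A database of genes and genome sequences
Established in 1992 with funding from the NIH's National Human Genome Research Institute, the site not only supports data access and submission but also offers sophisticated navigation tools, a researcher directory, online forums, and more.

3. DesignSafe-CI
A data repository built by the Natural hazards engineering research infrastructure project, which is funded by the US National Science Foundation (NSF). Researchers can store, access, and analyze natural-hazards data in the cloud. The repository accepts any kind of data, but the file formats are standardized. Another strength is that it integrates the stages of the research workflow: researchers can upload more than 100 TB of raw data, analyze it with built-in tools, and make the data openly accessible.

From these three successful data community cases we can identify three shared characteristics:

1. Bottom-up Development: starting from the bottom up and from small to large
All three long-standing data communities began as small-scale collaborations among researchers. Long-term funding and organizational support gradually allowed them to adopt new technologies for data generation, storage, and sharing. As researchers and their colleagues noticed the benefits of data sharing, the communities grew, and publisher and funder requirements for data sharing then helped the communities develop their norms.

2. Absence or Mitigation of Technical Barriers: reducing technical barriers
In the three success cases, the data that researchers share is technically easy to upload, transfer, and reuse: data files are not large, contain no sensitive or personal data, and come in standard, intelligible formats. The emergence of a data community is tied to technical developments that make it easy to capture important metadata and standardize it; for example, the CSD's success is partly due to the wide adoption of the .CIF file format. Developing a data community should involve reducing the ethical and technical barriers to data sharing, or developing technologies that lower those barriers and increase the standardization of data files.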

3. Community Norms
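Data communities thrive when they cultivate formal or informal norms through which data sharing comes to be expected within the community. The motivation for sharing data cannot rest only on the data being cited; norms and a culture of sharing need to be built within the community. For example, attaching persistent identifiers (PIDs) such as DOIs when sharing data supports data citation and lets authors receive credit, and publisher and funder requirements for data sharing can also effectively establish community norms and a culture of sharing.

The role of academic libraries and librarians

Academic libraries may not have the scale to handle the challenges that researchers and scientists face. Librarians who want to support scientists effectively must find creative ways to contribute their expertise more broadly, across institutions and across disciplines.

Awareness matters: librarians should know which data communities the scientists and researchers at their institution belong to. Much of the networking and infrastructure behind data communities is hosted at particular institutions. For example, DesignSafe-CI is led by researchers at the University of Texas at Austin, so its cloud storage and analysis capabilities come from the Texas Advanced Computing Center. Librarians can contribute their expertise on intellectual property and copyright issues.

Summarized and translated from the ITHAKA S+R report:
Data Communities: A New Model for Supporting STEM Data Sharing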

