Open Science, Open Data, Open Access – How open can my research be? How open ought it to be?

This will be far from a comprehensive post about the topics in the title – and how could it be! But I want to share my two cents on the issues at hand against the backdrop of the current debate, drawing on my own experience and my own research and teaching projects.

Last year I decided to move all my projects to GitHub. GitHub was originally a platform for developers, and before I took my first Software Carpentry course in early 2016, I had never heard of it. Since I am not a developer (at least, I don’t see myself as one), it didn’t come naturally to me to use GitHub for much more than keeping track of a couple of digital humanities projects I thought were cool and creating the occasional ‘guacamole recipe’ or ‘moon base’ repository when learning and teaching version control with Git and GitHub with The Carpentries.

I don’t remember exactly when it happened – or what triggered the decision – but at some point I decided to embrace GitHub not just as a place to store my few attempts at coding in Python but as a home for all of my research, and I finally ‘moved’ all my stuff there. This also means spreading it out in front of the world in all its states: unfinished, unpolished, fragmented, interrupted, rewritten, edited, occasionally aborted, often halted, and eventually finished and published.

Since my background is in the humanities, my research has mainly consisted of gathering historical texts, reading and analysing them, and then publishing articles, chapters, or books about them. Not much of this is suitable for being ‘put online’: if copyright isn’t the obstacle, then it is the sheer number and size of the files, which are mostly not machine-readable and would produce huge repositories of little use to anyone but me. And since ‘open science’ hasn’t been much of a thing in the humanities, releasing my research data (texts in various stages of ‘old’) hasn’t been seen as necessary or even recommended. Also: who wants to store all this stuff, and where? If it’s available in digital form, it’s usually libraries that offer some form of access, so I don’t need to re-publish it. If it’s in analogue form (which, unfortunately, most of my research material is…), I am simply not allowed to make a digital copy and put it just anywhere online. Or, if copyright isn’t the issue, it is web space (which I don’t want to pay for in abundance) and accessibility (just putting it somewhere isn’t helpful if you want it to be easily findable and reusable).

So, I came up with the following workflow for research projects:

Zotero Reference Manager

All reference material – as well as source material in textual form – will be collected, stored and maintained with Zotero. For those who don’t know it: Zotero is awesome! Not only does it let you grab bibliographical data from the web and extract metadata from uploaded PDFs, it also lets you tag and categorise items and create sophisticated queries. On top of that, it integrates easily with your word processor of choice or Google Docs! It can also be used with a couple of analytical plugins, for example for topic modelling with the Zotero-Voyant plugin or for a form of citation network analysis with ZotNet. The best part is: you can publish your references as a (curated) bibliography and share it online! This makes the often invisible part of a research project not only visible (and thus open to critique) but also easily reusable. When you are done with your research project, someone else might want to carry on and build on your collections. I’ve been doing this for my book history project “Ethica Complementoria” as well as for the “Norwegian Correspondences” multi-partner project, and I will release the mega bibliography I created for my Ph.D. dissertation on modern German textual scholarship alongside its open access publication after some necessary editing.

GitHub

The research project itself will be hosted on GitHub. This includes, for example, all the files I create for the digital edition of the Ethica Complementoria, all style sheets for later publication in the open access repository “Deutsches Textarchiv”, and also the XML versions of my book on the transmission of the Ethica and other digital publications. It will, in addition, contain scripts for analysing texts and other data sets. For the Norwegian Correspondences project, the data is stored in XML files as well as in CSV files, which, for now, are also hosted on GitHub, along with schema files for creating structured data sets. The repositories I contribute to the drama network analysis project DraCor (IbsDraCor and NorDraCor) contain only TEI P5-encoded XML versions of Norwegian dramatic texts that will be integrated into a larger corpus. The repository of my project on Medieval Religious Plays contains a couple of Jupyter notebooks where I develop code for network analysis with Python and store network data in different formats.

These projects are pretty much all “works in progress”. In the past, I would have developed them on my local machine and only sent them on for publication once they were finished. I will not do this anymore. Everything I do, I will develop openly – be it on GitHub, GitLab or some other platform. Because most of my projects have never received any funding, I have to work on them on weekends and in my free time, or during vacations. This means they take significantly longer than a funded project, where you can dedicate 100% of your working time and resources to it. It also means that they are always in danger of never being finished: because time runs out, energy is depleted or one has simply moved on. But why throw the research done up to that point into the trash bin? If I cannot finish a project, perhaps someone else can. That’s why I share it online, so that people can pick it up or contribute their time and effort.

Academic Blogging

Another outlet for my research has been blogging. Initially, I was very sceptical of blogs, that is, until I learned about academic blog platforms like Hypotheses.org. Hypotheses is a fantastic outlet for smaller research publications, and it can also serve as the “face” of your project and a place to share information about its current status. I started my first academic blog on Hypotheses in 2014 to accompany my digital edition project Ethica Complementoria. I had hosted a website for this project on my own domain since 2012 but decided to move to a curated platform that not only indexes each blog and website but also archives everything and has an editorial board actively maintaining and promoting quality contributions. Hypotheses has a variety of search functions that make a blog easily findable, and since it is such a vast platform, it is likely the first place where someone would look for a research blog. Another benefit is that a weblog on Hypotheses isn’t tied to an institution. In the age of fixed-term contracts and academic nomadism, I wouldn’t want to publish my blogs on an institution’s website to which I lose all editing rights once my contract has expired. Hypotheses is independent of this, and it doesn’t exclude independent researchers without institutional affiliations either. Since 2014, I have also been running another thematic weblog for my digital humanities interests and projects, called Digital Textology, and in spring 2018 I created and used yet another weblog for my master’s seminar on Digital Humanities at the University of Oslo.

Twitter, Google+ etc.

In addition to all the above, I have been using Twitter and Google+ for sharing milestones and publications. I joined (academic) Twitter in July 2012 at the International Conference for Digital Humanities in Hamburg, Germany, after seeing almost every scholar there using it, mainly for sharing news about research and publications, but also for following conferences, workshops, seminars and panels when one cannot be physically present. I’ve since built quite a network, following conversations in Norway, Germany, and the English-speaking world, with the occasional French, Spanish, Russian and Japanese tweet in between.

I also have a Google+ account, which I rarely use but on which I nevertheless share my blog posts. It has generated some traffic to my blogs, though, which I count as a success in reaching audiences. It’s a strictly professional profile and serves mainly as a point of entry, with links to my personal website, blogs and research platform profiles.

ResearchGate, Google Scholar etc.

As an independent researcher, I was an early user of Academia.edu, Mendeley and other, much less well-known platforms, and I have been following the trail of academic nomads making camp here and there. I like having the option not just to share some professional information about myself and create another point of entry for an audience for my research, but also to share papers and other publications online. We have the same problem here that we have with institution-hosted websites: once your contract has expired and you’re forced to move on, you lose access to your university repository. Sure, all the papers and articles that you archived there while working at that institution are still there, but keeping a personal collection of your research publications in one place is not possible this way. Self-hosting is another option but, in my view, it might make it harder for your audience to find what they are looking for, because they are unlikely to look on your personal website. So I now use ResearchGate for all of my publications, even though not all of them have PDFs or other files attached. I rarely use Google Scholar, but I take a peek at my citations semi-regularly (it’s bad at tracking citations in print-only or paywalled publications, who would have thought!). I also have an ORCID profile and recently joined Zenodo (where you don’t have a profile, but can archive your stuff!). I’m not 100% happy with any of these and will likely move on to something else when the cons outweigh the pros.

Unlike, for example, the biosciences, the humanities have not yet established a culture of archiving and sharing (pre-)publications online on platforms like arXiv or bioRxiv, and our publication habits are still strongly attached to print culture and publishing houses. The digital humanities have opened this up a bit but are far from fully embracing open access.

Conclusion

So, what’s the conclusion? I believe that research should be shared openly. From the perspective of publicly funded institutions, I don’t think I have to repeat the arguments for fully embracing open science and open data. This has been done ad nauseam, and there’s hardly any substantial, strong counterargument. As for research conducted by private companies and other non-publicly funded institutions: I don’t know. I think it depends on a number of variables, and decisions cannot be made without taking these into account. It is also something I have very little knowledge of or insight into, so I’ll leave it at that. When it comes to individual researchers, though, I would urge them to share as much of their research as openly as possible. Without an institutional affiliation, it is hard enough to get resources and access to academia; it can be extremely hard to “get in”, that is, to get published or even considered for publication, to attend conferences, etc. Sharing your research openly increases your and your research’s visibility and adds credibility, because it can be assessed and critiqued by peers without having to squeeze through the institutional bottleneck. Publishing and sharing your data, your analysis scripts, your research design and your results online will help them get archived (remember: the cloud is just someone else’s computer!). You will make a contribution even if your projects don’t reach their goal because you run out of funding, time, or energy. Someone else might pick up where you left off or build on your material. In any case, it will be of more use than it is on your local hard drive!

With that being said, I will continue putting my research online, using many different channels. Not all of it will have a home on GitHub (though some will, as will many of my future projects), and not all of it will live on institutionally run servers (but I will use them whenever I am allowed to). And I will more openly invite people to contribute and make use of my stuff: it’s a lot, go play with it!