“Research software” presents a significant challenge for efforts aimed at ensuring reproducibility of scholarship. In a collaboration between the UC Berkeley Library and the California Digital Library, John Borghi and I (Yasmin AlNoamany) conducted a survey study examining practices and perceptions related to research software. Based on 215 participants, representing a variety of research disciplines, we presented the findings of asking researchers questions related to using, sharing, and valuing software. We addressed three main research questions: What are researchers doing with code? How do researchers share their code? What do researchers value about their code? The survey instrument consisted of 56 questions.
We are pleased to announce the publication of paper describing the results of our survey “Towards computational reproducibility: researcher perspectives on the use and sharing of software” in PeerJ Computer Science. Here are some interesting findings from our research:
- Results showed that software-related practices are often misaligned with those broadly related to reproducibility. In particular, while scholars often save their software for long periods of time, many do not actively preserve or maintain it. This perspective is perhaps best encapsulated by one of our participants who, when completing our open response question about the definition of sharing and preserving software, wrote ” ‘Sharing’ means making it publicly available on Github. ‘Preserving’ means leaving it on GitHub”.
- Only 50.51% of our participants were aware of software-related community standards in their field or discipline.
- Participants from computer scientists reported that they provide information about dependencies and comments in their source code more than those from other disciplines.
- Regarding to sharing software, we found that the majority of participants who do not share their code, they indicated that had privacy issues and time limitation to prepare code for sharing.
- Regarding to preservation, only a 20% of our participants reported that they save their software for eight years or more, 40% indicated that they do not prepare their software for long term preservation. The majority of participants (76.2%) indicated that they use Github for preserving software.
- The majority of our participants indicated that view code or software as “first class” research products that should be assessed, valued, and shared in the same way as a journal article. However, our results also indicate that there remains a significant gap between this perception and actual practice. As a result we encourage the community to work together for creating programs to train researchers early on how to maintain their code in the active phase of their research.
- Some of researchers’ perspectives on the usage of code/software:
“Software is the main driver of my research and development program. I use it for everything from exploratory data analysis, to writing papers…
- “I use code to document in a reproducible manner all steps of data analysis, from collecting data from where they are stored to preparing the final reports (i.e. a set of scripts can fully reproduce a report or manuscript given the raw data, with little human intervention).”
- Some of researchers’ perspectives on sharing and preservation:
- “I think of sharing code as making it publicly accessible, but not necessarily advertising it. I think of preserving code as depositing it somewhere remotely, where I can’t accidentally delete it. I realize that GitHub should not be the end goal of code preservation, but as of yet I have not taken steps to preserve my code anywhere more permanently than GitHub.”
- “…’Sharing’, to me, means that somebody else can discover and obtain the code, probably (but not necessarily) along with sufficient documentation to use it themselves. ‘Preserve’ has stronger connotations. It implies a higher degree of documentation, both about the software itself, but also its history, requirements, dependencies, etc., and also feels more “official”- so my university’s data repository feels more ‘preserve’-ish than my group’s Github page.”
For more details and in-depth discussion on the initial research, the paper is available and open access here: https://peerj.com/articles/cs-163/. All the other related files to this project can be found here: https://yasmina85.github.io/swcuration/