Hopefully you know about AGU’s new Data Policy; I have a previous post about it. A big question that has come up is the availability of code.
The policy, located here, includes this line:
- “New code/computer software used to generate results or analyses reported in the paper.”
The availability of “code” comes at several levels: it could mean the code used to actually make the plot (e.g., a short IDL script), or it could mean the first-principles code the authors used to generate the “numerical data” (e.g., a complicated set of FORTRAN files). While both ends of the spectrum pose issues, it is the latter, larger request that raises the greatest concern. This is a very sensitive issue for some in our community.
For me, it is about reproducibility. Scientific advancement is based on the ability of others to reproduce your results and verify the accuracy of the analysis, and therefore the robustness of the finding. To that end, it would be great if everything were open source and available for public scrutiny. In reality, though, your code is your intellectual property, and the copyright on it is probably held by your company, university, or institute. The code might be (and these days usually is) a large collaborative effort rather than a single person’s work, making it awkward for one contributor (the author of a paper using the code) to provide access without the other code authors’ consent. Moreover, the authors of the code might want to restrict access to the source files to maintain a competitive advantage on funding proposals. In addition, if you are basing your paper on results from the Community Coordinated Modeling Center (CCMC), then you might not even have access to the original code.
It has been argued to me that a scientific code is a lot like a scientific instrument: we do not require authors to share the actual instrument that collected the data, only the data itself. By that logic, the numerical code used to produce the “numerical data” should not be required to be public, just its output. This point is especially clear for those basing a study on CCMC output. In general, I agree with this assessment.
However, before the data analysis papers appear in journals, the instrument is usually written up in a detailed description paper, or series of papers, and perhaps also in patents covering specific parts. Furthermore, the calibration and processing software is extensively tested and usually available to those who ask for it.
What I am getting at is this: like a scientific instrument, a numerical model on which scientific findings are based needs to be thoroughly tested and verified. Such a model description and verification could appear as a lengthy methodology section within a “research article” JGR-Space Physics paper, as its own paper in JGR-Space Physics under the “Technical Reports: Methods” category, or in another venue such as Journal of Computational Physics, Geoscientific Model Development, or, in the near future, AGU’s new journal Earth and Space Science. That is, the “instrument” used to produce the “data” needs to be adequately described and tested so that readers know the “data” are trustworthy. I think that the code itself, though, does not need to be made public, just as the actual instrument (or its technical design documents) does not.
The best verification, however, is to make the model available and let others examine and assess the source code. So, I urge you to release your model to the world. At the very least, share it with a colleague and have another set of eyes scrutinize it.
Until we are told or decide otherwise, this will be the implementation of the “code” part of the Data Policy for JGR-Space Physics.