Data Set Repositories

AGU’s enforcement of its open data policy for all papers in its journals has moved up a notch. Under the current enforcement, “available upon request from the author” is no longer allowed. The data that you use in a paper, with which you are advancing our understanding of the space environment, must be available to others. Remember, “data” means not just observed values but numerically generated values as well.

For many observational data sets, openness is required by the major US funding agencies, NASA and NSF. In fact, even for small grants, they now require data management plans about how the data produced by the project will be stored and made accessible to others. NOAA has a lot of its data freely available through several avenues. If you do simulations at the CCMC, then your run output is available to all at that website. That is, for many things, we can simply list a website and call it good.

The issue is for small data sets, like laboratory experiments or temporary instrument installations, and in-house simulation results. Authors using such data need to make the numbers available to readers without the reader being required to go through the author. Furthermore, the website where the data are available needs to be a permanent and independent repository, not the author’s personal site. We need others to be able to independently check our results, reproduce our plots and tables, and verify our claims.

For those at big institutions, like me, such places are creating open repositories for their researchers. For instance, the University of Michigan has a site called Deep Blue. We are putting data bricks there from specific, published papers.

Many have asked about “public repositories” that will accept a data brick accompanying a journal article. There are several. AGU is associated with COPDESS, the Coalition on Publishing Data in the Earth and Space Sciences, which maintains a list of scientific repositories. It is easily searchable and includes heliophysics and space physics as taxonomy groups. One database listed there for space physics is NCEI, the National Centers for Environmental Information, which has a front page for data archive submissions. AGU also recommends general ones like Zenodo, Dryad, or Figshare; each can assign a DOI for deposited data. GitHub is becoming a common place to share not only code but also code output.

The AGU Data Policy FAQ page has a lot of good information about current implementation and additional suggestions of repositories willing to host your data.

Another question that I get is, “how much to upload?” My common answer is, “As much as you can.” Seriously, though, some numerical simulations produce hundreds of GB of output, and some statistical surveys of observational data can cover several TB of values. I don’t want to quote all of the policies for every data repository, but there are some out there that will take very large data sets. The minimum set should be “those data used in the paper.” This includes the values behind any plot, table, or value in the paper.

Unconscious Bias in Space Physics

I attended the Triennial Earth-Sun Summit meeting a couple of weeks ago, and there was a very good plenary session on unconscious bias in space physics. The presenters were the authors of the Clancy et al. paper in JGR Planets on bias in astronomy and planetary science. They summarized the findings of that paper, which quantified the extent to which women and minorities report feeling unsafe or encountering a hostile work environment in these science fields. The numbers are not encouraging, with 80% of women experiencing some kind of sexist remark and two-thirds of women of color hearing racist remarks in the workplace. Furthermore, over a quarter of women have felt unsafe in their current position because of their gender or race. It is disturbing to me that the numbers are so large in 2018.

Fortunately, the conversation is not ending with the TESS plenary session. The organizers created a handout that was available to everyone at the session and online with the session description. I highly encourage everyone to read this tri-fold pamphlet. They encourage people to take the Harvard implicit bias test and read through the materials at the University of Arizona’s StepUp! bystander intervention program. The sheet is filled with tips on how to identify and minimize implicit bias. Two of the biggest things that individuals can do immediately: amplify minority voices in group discussions (but don’t he-peat) and avoid making sexual remarks in the work environment.

As for JGR Space Physics, fighting implicit bias can be done in several ways. The first is to be cordial in your correspondence, especially to early career researchers like graduate students, and to apply the Platinum Rule in your interactions with others, thinking about how they want to be treated and considering the interaction from their perspective. Authors, please use gender-neutral pronouns in responses to anonymous reviewers. Reviewers, consider using one of the links in the handout for quantifying gender bias in writing. Finally, I hope that you all make a personal DEI pledge to promote diversity, equity, and inclusion. People leave the field because of sexism in the workplace, and for our discipline, the workplace includes manuscript correspondence. I occasionally hear from advisors whose students have had a bad interaction with a reviewer.

Thanks to the TESS meeting and session organizers for coordinating this panel discussion. Let’s continue to strive to do better to reduce implicit bias in space physics.

The Moldwin Paper on Citations

Mark Moldwin and I recently published a Commentary on, well, hopefully the title says it all: High-citation papers in space physics: Examination of gender, country, and paper characteristics. He obtained the article information for every paper published in JGR Space Physics in the year 2012, including the citation count as of June 2016 for each paper, and then classified the papers according to, you guessed it, gender, country, and paper characteristics. There were 705 papers in the journal that year, so this task took quite a while to complete, plus we took some time discussing which parameters to even classify for later use. We then analyzed these results to see which qualities about the paper had a statistically significant connection to citation count. A fairly recent year was deliberately chosen to investigate the factors related to citations early in a paper’s lifespan, a time interval of relevance to the calculation of the Journal Impact Factor. As of today, it is still “in press,” so just the accepted version is online, but the paper is Open Access so it is free to read the full text.

Here are the major findings. These qualities of the paper are correlated with more citations in the first few years after publication:

  • More coauthors
  • More institutions in the author affiliations
  • More countries in the author affiliations
  • More references in the paper
  • A colon in the title

These qualities of the paper had no significant correlation with citations:

  • Gender of the first author
  • Number of words in the title
  • Acronyms in the title
  • Geophysical region names in the title

Keep in mind that the standard deviations are wide, so these findings are not necessarily true when comparing any two papers from the “high” and “low” classes. Welch’s t-test statistic uses the standard deviation of the mean, which is a much smaller number than the standard deviation (the spread for any one data point in the set). Any individual paper, regardless of its characteristics, could have a high or low citation count a few years after publication. That is, we did not find a “magic parameter” that clearly identifies what will make a paper get many citations, nor one that easily picks out the low-citation papers.
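
To make the difference between the standard deviation and the standard deviation of the mean concrete, here is a minimal Python sketch of Welch’s t statistic. The citation counts are made-up numbers for illustration only, not data from the paper, and the function name `welch_t` is hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and the standard error of each sample mean."""
    # Sample standard deviation: the spread of the individual values.
    sd_a = statistics.stdev(a)
    sd_b = statistics.stdev(b)
    # Standard deviation of the mean: shrinks as 1/sqrt(n).
    sem_a = sd_a / math.sqrt(len(a))
    sem_b = sd_b / math.sqrt(len(b))
    # Welch's t: difference of means over the combined standard error,
    # without assuming the two groups share a variance.
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sem_a**2 + sem_b**2)
    return t, sem_a, sem_b

# Hypothetical citation counts for two classes of papers (illustration only).
with_colon = [12, 3, 25, 8, 15, 6, 30, 9, 11, 4]
without_colon = [7, 2, 18, 5, 10, 3, 22, 6, 8, 1]

t, sem_a, sem_b = welch_t(with_colon, without_colon)
print(f"t = {t:.2f}")
print(f"standard deviation = {statistics.stdev(with_colon):.2f}, "
      f"standard deviation of the mean = {sem_a:.2f}")
```

Even in this toy example the individual values overlap heavily between the two groups, which is why a difference in the means says little about any single paper.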

Furthermore, the underlying distribution of values is not Gaussian (for any subset we considered, there is a long, positive tail creating a non-negligible skew to the histogram), yet the probabilities for significance that we used are based on a normal distribution for the two populations. This is why we used a 99% “highly significant” threshold to determine those qualities that are connected to citations.
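
As a rough illustration of what a long, positive tail does to a sample, the sketch below computes a simple moment-based skewness for a made-up set of citation counts. The numbers and the function name `sample_skewness` are hypothetical and are not taken from the paper.

```python
import statistics

def sample_skewness(values):
    """Return the moment-based skewness g1 = m3 / m2**1.5 (zero for a symmetric sample)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical citation counts with a few highly cited outliers (illustration only).
citations = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 45]

skew = sample_skewness(citations)
print(f"mean = {statistics.fmean(citations):.1f}, median = {statistics.median(citations)}")
print(f"skewness = {skew:.2f}")
```

A positive skewness and a mean well above the median are the signature of the long tail described above.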

So, take all of these findings with a grain of salt. Nevertheless, we think the results are interesting for the space physics community to know. The main conclusions, that more authors, institutions, countries, and references increase eventual citations, are not particularly surprising, but this is the first time it’s been quantified for papers in the field of space physics.

Two results are surprising to us. The first is that there is not a statistical difference in the citation of papers based on the gender of the first author. Other studies have found such bias in other fields, including in other closely related natural sciences, like astronomy. Unlike those studies of other fields, we did not find a statistically significant difference in citations to JGR Space Physics papers based on that parameter.

We did not expect to find any “title parameters” to be connected with citations and most were not. We were rather amused, however, to find that a colon in the title is linked to higher citations. About 20% of the papers that year had a colon in the title. That’s over 100 papers so this is a decently large sample size. We have guesses but, really, we have no good explanation for this. For those wondering, yes, this finding did indeed influence the title of our paper.

In summary, our advice to potential authors of manuscripts for JGR Space Physics is this: collaborate with others and cite the literature. It’s not a guarantee that your paper will receive above-average citations but, based on our analysis, it might help. Happy writing!

Preprint Servers: Challenges

A third (and probably final, for now) post on ESSOAr, AGU’s new preprint server for Earth and space sciences. The first described it, the second touted it, and now this one is the ethical scold of how best to use it.

The biggest point to remember is that preprint servers are not peer-reviewed journals. Yes, there is an editorial board that checks submissions for scientific scope, but there is no vetting of the accuracy of the content. The editorial check takes a day or two, maybe a week max, but it is not a real review process. Yes, content here gets a DOI, but we should all remember that content on preprint servers is essentially just a step above “private communication” in terms of referencing authority. That is, it could be wrong.

We hope that content on ESSOAr, and any other preprint server, will eventually be published in a scientific journal. Researchers are putting their reputation out there with each new post on one of these servers, so the content is, for the most part, respectable. Go ahead and use it to learn what is being done by your colleagues. Because preprint server content has not been through the peer review process, though, it should be replaced with the “final” version of the study from whatever journal it eventually appears in.

To summarize in a graphic:

Caution-preprints

Peer review should still be the standard for what is accepted as “knowledge” of the subject. Even this can be wrong but at least it has been thoroughly scrutinized by experts. You should be very skeptical of older preprints on the server (say, more than 2 years since original posting) that lack a link to a final published version of the paper. That work either was not submitted or did not pass peer review. If the former, then it is perhaps the case that the authors found a problem with the study and therefore never submitted that version of the paper. If the latter, then perhaps the editor or referees found a problem with the study and declined publication of it. Either way, the study did not reach its “final” form in the literature.

The advice to the community about older preprints can be summed up like this:

  • Authors: use caution when citing an older preprint.
  • Reviewers: pay extra attention to citations of older preprints.
  • Editors: ask reviewers to check the appropriateness of older preprint citations.
  • Societies: set policy about citing older preprints.

I am told that the astrophysics community, which regularly uses the arXiv preprint server, understands this difference in “publication” levels. That is, research communities can learn to use preprint servers and make them their go-to place for the latest content across a number of journals, as I am told that many in astrophysics do. They also know, however, that when it comes time to write their own papers, they should not rely on preprints as the main entries in the reference list. The astrophysics community, I am told, understands the guidelines about preprint servers and uses them only for finding the latest work on a topic.

We, the Earth and space science research community, should adopt this same mentality about preprint servers, not only ESSOAr but any server (and there are several being created). Such servers should be a place to get the latest studies from across a variety of journals, learning about content as the manuscripts are submitted rather than months later when they are accepted and eventually published. We should use them only for the latest work, though. A preprint server is not the place for full literature searches; those should be done in Web of Science, Google Scholar, Scopus, ADS, or other services that scan the published, peer-reviewed literature. And, as an editor, I strongly urge you to please conduct a full literature search, because a recent study by Mark Moldwin and me showed that the more complete your reference list is, the more citations your paper will get (on average).

Use ESSOAr, but know its purpose within the hierarchy of scientific publications.

Preprint Servers: Benefits

With the launch of ESSOAr, AGU (and all of the other supporting societies on the advisory board) has entered the market of posting scholarly content prior to official acceptance by a peer-reviewed journal. Yesterday I discussed the “how” of ESSOAr; here I discuss the “why.”

The big reason is to increase scientific communication and collaboration. AGU’s mission is to promote discovery in Earth and space sciences, and many of the society’s honors, medals, and awards cite “unselfish cooperation in research” as a primary criterion for selection. Posting scholarly work to a preprint server increases its visibility and, hopefully, impact within the research community. It gets your findings into the hands of other scientists a bit sooner than normal – a bit closer to when the work was done rather than after months of reviews and revisions. It helps increase the “speed” of scientific discovery, as we learn about what’s new a little bit earlier than we would have from journals alone.

Here is the “why” answer from the ESSOAr FAQ page:

ESSOAr_banner_and_benefits

In addition to a lot of the same arguments I wrote above, there is an interesting comment in the middle of the paragraph, “You can establish priority.” Rather than the publication date being your time stamp laying claim to some finding, posting on a preprint server establishes that claim a bit sooner.

In a somewhat selfish consideration, the anecdotal evidence that I have heard is that posting your work on a preprint server increases the “early lifetime” citations to the paper. That is, it is thought that the page views and downloads of the preprint lead to faster incorporation of your findings in the work of other scientists, and citations to it therefore should begin a few months sooner. I am not sure how true this is, because the citation rate with year since publication is fairly constant at ~3/year in JGR Space Physics. Furthermore, I am told that the solar physics community extensively uses the arXiv preprint server, yet the journal Solar Physics has a Journal Impact Factor about the same as or even slightly lower than that of JGR Space Physics. In support of preprint servers, I am told the astrophysics community uses arXiv even more so than solar researchers, and The Astrophysical Journal has a JIF several points higher than the JIF for JGR Space Physics. So, perhaps I have overestimated the solar community’s usage of that server. This is all speculation, though; we need some quantitative statistics on usage and eventual citations to robustly claim anything. My point is that, while the evidence is mixed about the effectiveness of preprint servers, there is a plausible argument that they should lead to higher citations soon after publication.

Because it takes very little time and effort to upload, I think that it is worth it to do so. I suggest doing this when you submit to the peer-reviewed journal. I haven’t gone through it yet to see it for myself, but I am told that there is a link within the GEMS process for automatically sending the newly submitted manuscript over to ESSOAr. The trickiest thing about submitting to ESSOAr was the license agreement. There are 4 levels of user licenses available to you. The most lenient is “CC-BY,” for which the only restriction is that users must properly cite it. For my Fall AGU poster, I selected the second level, “CC-BY-NC,” which places the additional constraint of no commercial reuse without my permission. The next level adds a restriction on “derivative use” without permission of the authors. The fourth one is the most restrictive and basically says it can be here on ESSOAr with no other use allowed. Aside from this, the process is very straightforward and easy.

The second step to achieving the full benefits of a preprint server is using ESSOAr as a place to learn about the latest results in your field. This requires signing up for new content alerts. Once you have logged in, conduct a search with some keywords of relevance to you. Once the results are up, then in the upper right area of the page is this:

ESSOAr_followresults

The first link, the magnifying glass with the plus symbol, will “save the search” for you. This opens up a new window where you can name the search and indicate how often you want it to automatically run this for you and send you an alert about it. It looks like this:

ESSOAr_savethissearch

The second symbol opens a page for setting up RSS alerts for the individual posters and preprints found in the search. Actually, both of these links are there regardless of whether you have signed in; you just can’t save the search until you log in.

On the page for each poster or preprint in the database, there are two links, “Track Citations” and “Add to Favorites.” The first allows you to get alerts on citations to that specific post, while the second just provides a quick link to that post. These settings, and the saved searches, can all be managed from your profile page. To get there, click on your name in the upper right corner and then on the “Profile” tab. On the new page that loads, the left-column menu has Alerts, Favorites, and Saved Searches.

There isn’t much content available yet – a handful of manuscript preprints and about 50 poster PDFs. If we all collectively start using it, though, then ESSOAr will blossom into a place where space scientists go to learn about the latest work being prepared for publication.