This is a continuation of the last two posts, the first on the general concept of a Similarity Report and the second walking through an example report. In this one, I will discuss how we interpret these reports and make a decision based on them: that is, whether to send a paper out for review or reject it for excessive overlapping content.
First off, I would just like to say that evaluating a Similarity Report is a subjective task because we do not want to impose rigid rules on ourselves regarding the numeric values within the report. That said, we have some guidelines that we follow. Here’s a quick overview.
In a Similarity Report, there are several numbers that we check. The first is the Similarity Index, the total overlap with other sources. A Similarity Index greater than 15% is unusual and requires special assessment by the editor. A value over 25% will almost certainly have the paper sent back to the authors for revision.
When we open the Similarity Report in the “Document Viewer” display, we see the highlighted manuscript in one frame and a listing of the sources in another. For each source, two numbers are given: the total number of words of overlap and the percentage of the new manuscript that was found to overlap with this source. Any source with an overlap below 50 words is nothing to worry about. However, overlap values for individual sources over 100 words or over 5% are unusual and require special assessment by the editor. Sometimes these flagged passages turn out to be a collection of small, innocuous matches (affiliations, citations, and commonly used phrases), and there is nothing to worry about.
Other times, however, entire sentences or even paragraphs are copied verbatim from a previous paper. It is overlap of this kind that we scan the document for. Even so, one big chunk of overlapping text usually isn’t enough to make us send the paper back. It rises to the level of rejection when the large-block overlaps add up to a page or more.
Here is an example of a page with serious overlap:
I’ve lowered the resolution so that you cannot read the text. You can still see the highlighting, however, and you can see that the colored text covers nearly the entire page. A typical double-spaced manuscript page has 350-400 words, so this represents hundreds of words of overlap. At this point, we really start to consider rejection.
For a typical paper of 5,000–10,000 words, a single-source value over 10% will probably have the paper sent back to the authors for revision. Let me reiterate, though, that there is no hard threshold here: a single-source value as low as 3% could warrant rejection, while a value of 15% might actually be okay. It all depends on how the overlap is distributed through the manuscript, and it is, in the end, a judgment call by the editor.
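To pull the guidelines above together, here is a rough sketch of the screening heuristic in code form. To be clear, this is not an actual tool we run; the function name and structure are my own invention, the thresholds simply mirror the guideline numbers in this post, and, as I keep stressing, the final call is always a human judgment by the editor.

```python
def screen_report(similarity_index, sources):
    """Flag the parts of a Similarity Report an editor would look at.

    similarity_index: total overlap with all sources, in percent.
    sources: list of (words_of_overlap, percent_of_manuscript) pairs,
             one per matched source.
    Returns a list of human-readable flags; an empty list means
    nothing stood out under these rough guidelines.
    """
    flags = []

    # Overall Similarity Index guidelines from this post.
    if similarity_index > 25:
        flags.append("overall index > 25%: likely returned for revision")
    elif similarity_index > 15:
        flags.append("overall index > 15%: needs special assessment")

    # Per-source guidelines.
    for words, pct in sources:
        if words < 50:
            continue  # small overlaps are nothing to worry about
        if words > 100 or pct > 5:
            flags.append(f"source with {words} words ({pct}%): assess manually")
        if pct > 10:
            flags.append(f"source at {pct}%: probably returned for revision")

    return flags
```

Even a report that produces flags here might be fine once the editor looks at how the overlap is distributed, and a report that produces none could still hide a problem, which is exactly why we avoid treating these numbers as rigid rules.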
I understand that authors only see these reports when the editor deems the overlap a problem, so authors don’t have much experience viewing and interpreting these files. This is why I am spending several blog posts on this topic. If you would like to know more, please just ask and I’ll keep posting about it.
One more point of confusion is that we, the editors, perhaps haven’t been as clear as we could be about what exactly in the Similarity Report led us to reject a manuscript, and therefore what specifically the authors need to change to make it acceptable. We will try to be much better about this in the future by adding a paragraph to our decision letters that itemizes the passages we want changed in the text.