There is no magical advice here; I am just pointing out the basics. I think it’s worth pointing out, though, because I have been asked whether similarity-check software is available for everyone’s use. AGU sends every “initial submission” manuscript it receives through iThenticate, which checks the manuscript against tens of millions of scholarly documents. You can use this service too, but it is not free: running a check on a single paper will cost you $100. AGU gets a volume discount from this base rate, but it still pays a hefty sum to process the thousands of new manuscripts received across its 19 journal titles.
Really, though, you probably do not need the full iThenticate service. You simply need to check a few relevant papers against your manuscript to ensure that you didn’t inadvertently use verbatim text from another source. I know that this can happen all too easily. For me, the common trap is that I borrow text from a paper for a proposal and then borrow it from that proposal into a later paper, thinking that I wrote it from scratch when I wrote it for the proposal. After writing so many blog posts about avoiding self-plagiarism, I know it is in my karmic future to have a manuscript rejected due to a high crosscheck overlap.
My method for conducting a pre-submission crosscheck is rather rudimentary and takes a few minutes, but it gets the job done. Basically, it entails putting a copy of all of the possible source documents into a folder along with the new manuscript and conducting searches, just within that folder, for exact matches of complete phrases or even full sentences. I usually do several searches, one for each key phrase or sentence taken from the sections of the new manuscript that I think might have been copied from an earlier paper.
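For anyone who would rather script the procedure above than run the searches one at a time, here is a minimal sketch in Python. It is only an illustration under some assumptions: every document in the folder has first been saved as a plain-text (.txt) file, sentences are split crudely on end punctuation, and the function names (normalize, split_sentences, find_overlaps) and the eight-word threshold are my own hypothetical choices, not part of any existing tool.

```python
import re
from pathlib import Path


def normalize(text):
    """Lowercase and collapse whitespace so line wraps do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_sentences(text, min_words=8):
    """Crudely split text into sentences, keeping only the longer ones."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if len(p.split()) >= min_words]


def find_overlaps(manuscript_path, folder, min_words=8):
    """Return (sentence, filename) pairs for manuscript sentences that
    appear verbatim (ignoring case and spacing) in another .txt file."""
    manuscript_path = Path(manuscript_path)
    text = manuscript_path.read_text(encoding="utf-8")
    sentences = split_sentences(text, min_words)
    hits = []
    for source in sorted(Path(folder).glob("*.txt")):
        # Do not compare the manuscript against itself.
        if source.resolve() == manuscript_path.resolve():
            continue
        haystack = normalize(source.read_text(encoding="utf-8", errors="ignore"))
        for sentence in sentences:
            if normalize(sentence) in haystack:
                hits.append((sentence, source.name))
    return hits
```

Calling find_overlaps("crosscheck/manuscript.txt", "crosscheck") (both paths are placeholders) would return every long sentence of the manuscript that also turns up in one of the other text files, along with the name of that file. It only catches exact matches, just like the manual search, so lightly reworded text will slip through.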
From the Command Line (Linux/Unix/Mac Terminal): the grep command does it. Navigate to the folder you created with the new manuscript and the possible overlap files and type this:
grep -F 'string' *
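where string is the phrase or sentence you want to find. The funny thing about grep is that it treats PDF and Office files as binary files, so it won’t show you the exact line of the match; it only reports that the file contains the string. Be aware, too, that the text inside .docx files and many PDFs is stored compressed, so grep can miss a phrase that really is there. Saving a plain-text copy of each document before searching makes the check more reliable.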
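Since grep only sees the raw bytes of an Office file, one workaround is to pull the text out first. The sketch below is a rough illustration, not a polished tool: it relies on the fact that a .docx file is a ZIP archive whose body text lives in word/document.xml, and it uses only the Python standard library. The function names docx_text and docx_contains are hypothetical, and the sketch ignores footnotes, headers, and text boxes. It does not handle PDFs.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # Word may split one phrase across several runs, so join them.
        paragraphs.append("".join(node.text or "" for node in para.iter(W + "t")))
    return "\n".join(paragraphs)


def docx_contains(path, phrase):
    """True if the phrase appears in the document, ignoring case and spacing."""
    def squash(s):
        return " ".join(s.split()).lower()
    return squash(phrase) in squash(docx_text(path))
```

For example, docx_contains("old_proposal.docx", "some suspicious sentence") (the file name is a placeholder) returns True or False, and docx_text can be used to write out a plain-text copy that grep will search line by line.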
Also for Linux and Windows: there is a program called searchmonkey that apparently does a grep-like search, but fancier.
On the Mac: this is done in the search box in the upper right corner of each Finder window. Copy and paste the suspicious text into the box, but put it in double quotes. If the Finder window is open at the folder with all of the files, then once the search starts you can click the button across the top that restricts the search to that folder. I have found that this isn’t very good, though, as my Mac, at least, doesn’t find all of the files with the exact phrase in them. Perhaps it has to do with what is indexed by Spotlight. Personally, I open the terminal and use grep.
On a Windows machine: I don’t use one, but I have been told that FileString Finder is a good program for searching for long phrases in multiple files, as is WindowsGrep, a menu-driven version of grep for PCs.
If others have better software or a better procedure that conducts these searches for identical text in multiple files, then please post a comment below sharing your knowledge.