Alex Zhurakovskyi's Blog

Bulk Reference Resolving

My friend invited me to co-author a review with him. After a brief moment of excitement, I realized that it meant browsing through thousands of references in order to select the few hundred relevant to us.

The initial "seed" was provided as a bunch of Scopus search outputs, 100 to 200 pages each.

Scopus Result Listing

The Scopus hyperlinks were of no use to me, since they linked to my friend's account. I had to find all the papers manually. At first, I copied and pasted the DOIs into Chemistry Reference Resolver with a mouse, one by one. After a terrible evening, I was terrified by the prospect of doing the same again. Something had to be done. Whenever you have a repetitive task on a computer, it can usually be automated. I pasted all the raw text from a PDF into a plain text editor (Notepad++ in this case). Then I searched using the following regexp:

DOI Regexp

This selected all the lines containing "DOI:" in them. The way it works can be better explained as an image:

DOI Regexp Explanation
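
For readers who would rather script this step than use an editor, here is a minimal Python sketch of the same idea: keep only the lines containing "DOI:" and pull out the identifier that follows. The pattern below is my own assumption, not the regexp from the screenshot, and the function name `extract_dois` and the sample DOIs are made up for illustration.

```python
import re

# Assumed pattern (not the one from the screenshot): "DOI:", optional
# spaces or tabs, then a run of non-whitespace characters.
DOI_PATTERN = re.compile(r"DOI:[ \t]*(\S+)")


def extract_dois(raw_text):
    """Return the DOIs found on lines containing 'DOI:', in order, without repeats."""
    dois = []
    for line in raw_text.splitlines():
        match = DOI_PATTERN.search(line)
        if match is None:
            continue  # this line has no "DOI:" marker
        # Drop trailing punctuation that may stick to the DOI in pasted text.
        doi = match.group(1).rstrip(".,;")
        if doi not in dois:
            dois.append(doi)
    return dois


if __name__ == "__main__":
    # Made-up sample that only mimics the shape of pasted search output.
    sample = (
        "Some paper title\n"
        "Journal name, 1 (2), pp. 3-4.\n"
        "DOI: 10.1000/example.1\n"
        "\n"
        "Another paper title\n"
        "DOI: 10.1000/example.2.\n"
    )
    for doi in extract_dois(sample):
        print(doi)
```

Going line by line keeps the match from spilling over onto the next line when a "DOI:" marker happens to be empty.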

I then copy-pasted the selected text into a new document and fed it into my local copy of the Resolver. This one has been tweaked to allow 20 references at once. Doing batches of 20, I quickly finished the preliminary search. What was left was the much more daunting task of actually reading those 200 initial references.

DOI to Process
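
The batching step can be scripted as well. The sketch below only splits a list of DOIs into groups of 20, matching the limit of my tweaked Resolver copy; it does not talk to the Resolver, and the helper name `batches` and the placeholder DOIs are hypothetical.

```python
def batches(items, size=20):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Placeholder identifiers, made up for the example.
    dois = ["10.1000/example.%d" % n for n in range(1, 46)]
    for number, batch in enumerate(batches(dois), start=1):
        # Each batch can be pasted into the resolver as one block of text.
        print("Batch %d: %d DOIs" % (number, len(batch)))
        print("\n".join(batch))
        print()
```

With 45 entries this gives two full batches of 20 and a final one of 5.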