Caution, DOI! Bibliographic detective story in the era of digitalization
Victor Kozyakin

TL;DR
This paper examines inconsistencies in bibliographic data from popular digital services, analyzing their causes and highlighting challenges in maintaining accurate scholarly records.
Contribution
It provides a detailed case study of bibliographic inconsistencies and discusses underlying reasons in the context of digitalization.
Findings
Identifies common sources of bibliographic errors
Highlights impact of digitalization on data accuracy
Suggests need for improved bibliographic verification
Abstract
An example of inconsistencies in information provided by popular bibliographic services is described and the reasons for these inconsistencies are discussed.
Kharkevich Institute for Information Transmission Problems
Russian Academy of Sciences
Bolshoj Karetny lane 19, Moscow 127051, Russia
Keywords: bibliography, digital object identifier, DOI, BibTeX, RIS
*The best laid schemes o’ Mice an’ Men Gang aft agley.
Robert Burns, “To a Mouse”*
Introduction
Collecting exact and complete bibliographic references is an unavoidable part of scientific research. Researchers of the precomputer era remember how difficult it was to compile a reference list on paper correctly and in accordance with a journal's typographic requirements for reference lists. The situation changed drastically once computer methods for preparing publications were introduced. We will talk only about the LaTeX system and its various add-ons, although what follows is, to a certain extent, true for other typesetting systems, both proprietary and free.
The BibTeX software created by Oren Patashnik became a widespread and, importantly, convenient way of preparing bibliographic references. Preparing references with BibTeX is divided into two stages: the manual preparation of a database of the needed references, in which the necessary bibliographic data for every publication are entered in a certain format, and the automatic typesetting (by BibTeX) of the reference list according to .bst style files, which many publishers design to reflect their preferences for bibliography typography and citation style.
Creating a database of publications for subsequent processing with BibTeX takes time and requires attention and some knowledge of the rules for formatting its structural elements. However, the time and effort spent on the database are more than compensated by the ease of reusing it to format bibliographies in different papers and, more importantly, by the sharp reduction in formatting errors. Moreover, in most cases the BibTeX database need not be composed manually, because the needed records are usually supplied in the required format by publishers and by many online bibliographic services.
Figure 1 shows a fragment of the title page with the publisher’s imprint of a paper [1].
Below, we present the record in BibTeX format corresponding to this paper, from the bibliographic system MR Lookup of the American Mathematical Society (https://mathscinet.ams.org/mrlookup):
@article {MR2999086,
    AUTHOR = {Kloeden, Peter E. and Kozyakin, Victor S.},
    TITLE = {Asymptotic behaviour of random tridiagonal {M}arkov chains in biological applications},
    JOURNAL = {Discrete Contin. Dyn. Syst. Ser. B},
    FJOURNAL = {Discrete and Continuous Dynamical Systems. Series B. A Journal Bridging Mathematics and Sciences},
    VOLUME = {18},
    YEAR = {2013},
    NUMBER = {2},
    PAGES = {453--465},
    ISSN = {1531-3492},
    MRCLASS = {60J10 (15B48 92C99)},
    MRNUMBER = {2999086},
    MRREVIEWER = {Ross S. McVinish},
    DOI = {10.3934/dcdsb.2013.18.453},
    URL = {https://doi.org/10.3934/dcdsb.2013.18.453}}
A similar record provided by the zbMATH system of the European Mathematical Society (https://zbmath.org) is as follows:
@Article{zbMATH06146721,
    Author = {Peter E. {Kloeden} and Victor {Kozyakin}},
    Title = {{Asymptotic behaviour of random tridiagonal Markov chains in biological applications.}},
    FJournal = {{Discrete and Continuous Dynamical Systems. Series B}},
    Journal = {{Discrete Contin. Dyn. Syst., Ser. B}},
    ISSN = {1531-3492; 1553-524X/e},
    Volume = {18},
    Number = {2},
    Pages = {453--465},
    Year = {2013},
    Publisher = {American Institute of Mathematical Sciences (AIMS), Springfield, MO},
    Language = {English},
    MSC2010 = {60J10 15B48 37H10},
    Zbl = {1277.60118}}
We see that all significant bibliographic information (authors, name of the journal, publisher’s imprint, etc.) in both BibTeX records coincides. At the same time, the formatting of the records is slightly different and, in addition, each record includes some service-specific fields (for instance, the identification numbers in the corresponding systems, MRNUMBER and Zbl) reflecting the preferences of the authors of these records. In particular, the MR Lookup record contains an important DOI field, the digital object identifier. Using the service of the International DOI Foundation (IDF) (http://www.doi.org), the DOI takes us to the publisher’s webpage, which contains at least the abstract and bibliographic data of the sought publication and sometimes its full text.
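The comparison just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parser is a minimal brace matcher written for these two records (abridged here to the fields of interest), not a general BibTeX reader, and the names parse_bibtex_fields, normalize and compare are hypothetical helpers that do not come from either service.

```python
import re

# Abridged copies of the two records quoted above.
MR_RECORD = """@article {MR2999086,
  AUTHOR = {Kloeden, Peter E. and Kozyakin, Victor S.},
  TITLE = {Asymptotic behaviour of random tridiagonal {M}arkov chains in biological applications},
  JOURNAL = {Discrete Contin. Dyn. Syst. Ser. B},
  VOLUME = {18}, YEAR = {2013}, NUMBER = {2}, PAGES = {453--465},
  ISSN = {1531-3492}, MRNUMBER = {2999086},
  DOI = {10.3934/dcdsb.2013.18.453}}"""

ZB_RECORD = """@Article{zbMATH06146721,
  Author = {Peter E. {Kloeden} and Victor {Kozyakin}},
  Title = {{Asymptotic behaviour of random tridiagonal Markov chains in biological applications.}},
  Journal = {{Discrete Contin. Dyn. Syst., Ser. B}},
  ISSN = {1531-3492; 1553-524X/e},
  Volume = {18}, Number = {2}, Pages = {453--465}, Year = {2013},
  Zbl = {1277.60118}}"""

FIELD_START = re.compile(r"(\w+)\s*=\s*\{")


def parse_bibtex_fields(record):
    """Return {lowercased field name: value without its outer braces}."""
    fields = {}
    pos = record.index(",") + 1  # skip the "@type{key," header
    while True:
        match = FIELD_START.search(record, pos)
        if match is None:
            break
        depth, i = 1, match.end()
        while depth:  # walk to the brace that closes this value
            if record[i] == "{":
                depth += 1
            elif record[i] == "}":
                depth -= 1
            i += 1
        fields[match.group(1).lower()] = record[match.end():i - 1]
        pos = i
    return fields


def normalize(value):
    """Drop braces, extra whitespace, a trailing period, and letter case."""
    value = value.replace("{", "").replace("}", "")
    return " ".join(value.split()).rstrip(".").lower()


def compare(a, b):
    """For every field present in both records, report whether the values agree."""
    return {key: normalize(a[key]) == normalize(b[key])
            for key in sorted(a.keys() & b.keys())}


mr = parse_bibtex_fields(MR_RECORD)
zb = parse_bibtex_fields(ZB_RECORD)
for field, same in compare(mr, zb).items():
    print(f"{field:8} {'same' if same else 'differs'}")
print("only in MR Lookup:", sorted(mr.keys() - zb.keys()))
print("only in zbMATH:", sorted(zb.keys() - mr.keys()))
```

With this naive string comparison, title, volume, number, pages and year agree, while author, journal and ISSN are reported as different only because the two services format them differently.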
Finally, we have arrived at DOI
The DOI system was created on the initiative of the publishing industry, which recognized the need to identify content objects uniquely rather than refer to their location. In 1998, the International DOI Foundation was founded to develop the system, and the technologies and standards necessary for introducing it were created [2, 3]. The first service for registering DOI names began operating in 2000 and, by the beginning of 2009, approximately eight million DOI names had already been allocated through eight registration services. The most widely used application of the DOI system is Crossref (https://www.crossref.org), a cross-linking service between publishers that connects references in a citation directly with the cited content on another publisher's platform, subject to the access-control methods of the target publisher.
The original DOI names may be long strings of characters, which is sometimes inconvenient when organizing references. To avoid this, the International DOI Foundation opened a service of shortened DOI names called shortDOI (http://shortdoi.org). When we submit an original DOI to the shortDOI® service, a shortened alias of the form 10/abcde is created (or the previously created alias is returned), and we may then work with it as with the original DOI.
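As a small illustration of working with DOI names as identifiers rather than locations, the sketch below strips the common prefixes from a DOI string and builds a resolver link. The function names, the list of prefixes and the pattern used to recognize a shortened alias are assumptions made for this example; only the form 10/abcde itself is taken from the text.

```python
import re
from urllib.parse import quote

# Prefixes under which a DOI is commonly written (an assumed, non-exhaustive list).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Assumed pattern for a shortened alias of the form 10/abcde.
SHORT_DOI = re.compile(r"10/[0-9a-z]+")


def normalize_doi(raw):
    """Return the bare DOI name without a resolver URL or 'doi:' prefix."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            return doi[len(prefix):]
    return doi


def is_short_doi(raw):
    """True if the string looks like a shortDOI alias such as 10/abcde."""
    return SHORT_DOI.fullmatch(normalize_doi(raw).lower()) is not None


def resolver_url(raw):
    """Build a resolver link for a DOI name or a shortened alias."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


print(resolver_url("doi:10.3934/dcdsb.2013.18.453"))
print(is_short_doi("10/abcde"), is_short_doi("10.1000/182"))
```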
Currently, various DOI-based functions are carried out by multiple services, including proprietary and free bibliography managers, such as
- EndNote by the Clarivate Analytics company, https://endnote.com,
- Mendeley by the Elsevier company, https://www.mendeley.com,
- Citavi by the Swiss Academic Software company, https://www.citavi.com,
- Zotero, a free and open-source reference management software, https://www.zotero.org,
- ZoteroBib, https://zbib.org (a simplified variant of Zotero),
- Docear, http://www.docear.org,
free desktop applications
- JabRef, http://www.jabref.org (Windows, Linux, macOS),
- BibDesk, https://bibdesk.sourceforge.io (macOS),
- KBibTeX, https://userbase.kde.org/KBibTeX (Linux),
as well as many other internet services and desktop applications, among which we also mention the doi2bib service (https://www.doi2bib.org), which converts DOI names into bibliographic records in BibTeX format.
The introduction of the DOI system dramatically changes the entire technology of using bibliographic data: users gain a tool for instantaneous access to the electronic version of a publication through the Digital Object Identifier service, and for equally instantaneous access to the required bibliographic information through the above-mentioned Crossref, EndNote, Mendeley services, etc.
This would seem to be a time of universal happiness for users of bibliographic data, when all required data can be obtained practically instantaneously and already verified. Unfortunately, it has turned out to be not entirely a blessing (see the epigraph).
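One way such instantaneous access to bibliographic data can be obtained programmatically is HTTP content negotiation on the DOI resolver: the DOI link is requested with an Accept header naming the desired citation format. This mechanism is not described in the paper. The sketch below assumes that the registration agency behind the DOI honors the media types shown, and the function names are hypothetical.

```python
import urllib.request

# Media types for citation formats (assumed to be supported for the DOI in question).
ACCEPT = {
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
}


def metadata_request(doi, fmt="bibtex"):
    """Build, without sending, a request for the metadata of a DOI."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": ACCEPT[fmt]},
    )


def fetch_metadata(doi, fmt="bibtex", timeout=10.0):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(metadata_request(doi, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


request = metadata_request("10.3934/dcdsb.2013.18.453", "ris")
print(request.full_url, request.get_header("Accept"))
```

Whatever record comes back reflects only what the registration agency holds for that DOI, so it is subject to the same caveats as any other DOI-based lookup.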
1. Dead DOIs have appeared that correspond to nothing. The reasons vary: an error in the DOI name, the closure or restructuring of the website where the publication was located, the transfer of the publication to another website, etc.
2. Semidead DOIs have appeared that are processed by some services but ignored by others. For instance, the DOI 10.1000/182 of publication [2] is apparently processed only by the Digital Object Identifier service and not by the Crossref, EndNote and Mendeley services. The same holds for DOIs generated at the request of users of the researchers’ social network ResearchGate (https://www.researchgate.net).
3. There are wrong DOIs that point to other publications.
4. Finally, the bibliographic data provided by different services in response to a DOI request may differ. For instance, requesting the data of [1] with DOI 10.3934/dcdsb.2013.18.453 from the Crossref service yields the citation given in Fig. 2, in which the year of publication, 2012, differs from the true year (2013) of the journal (print) publication. A similar situation arises with the EndNote, Mendeley, ZoteroBib, and doi2bib services and when obtaining data via JabRef: given the DOI of paper [1], all of them provide the wrong year of publication.
The first two disadvantages are not critical: here, at least, the service asked for bibliographic data by DOI tells us directly that the data cannot be provided. The third disadvantage is annoying, but everyone makes mistakes. Fortunately, the first three disadvantages are accidental in character.
The last disadvantage is considerably more unpleasant because it manifests itself systematically, and none of the above-mentioned services informs us that the data provided require additional verification. To a large extent, this undermines the purpose of the digital object identifier, the DOI.
Investigation
A reasonable question arises: How could it be that different services provide different information on the same DOI?
We note that BibTeX is neither the only nor the most widespread format for storing bibliographic data. BibTeX became widely used for scientific publications prepared mostly with the LaTeX system and its various add-ons. In the publishing industry, other formats for storing and exchanging bibliographic data, which appeared well before BibTeX, are the most widespread. Among them, one of the most widely used is the RIS format, developed by Research Information Systems and used as the main format by digital libraries such as IEEE Xplore, Scopus, ScienceDirect, and SpringerLink and by bibliographic services such as Zotero, Citavi, Mendeley, EndNote, and Crossref. The record for paper [1] in RIS format from Crossref has the following form:
TY  - JOUR
DO  - 10.3934/dcdsb.2013.18.453
UR  - http://dx.doi.org/10.3934/dcdsb.2013.18.453
TI  - Asymptotic behaviour of random tridiagonal Markov chains in biological applications
T2  - Discrete and Continuous Dynamical Systems - Series B
AU  - Kloeden, Peter E.
AU  - Kozyakin, Victor
PY  - 2012
DA  - 2012/11
PB  - American Institute of Mathematical Sciences (AIMS)
SP  - 453-465
IS  - 2
VL  - 18
SN  - 1531-3492
ER  -
This record contains two parameters characterizing the date, DA (Date) and PY (Publication year), and both specify the year 2012!
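To make the two date parameters easy to inspect, the record can be read with a few lines of Python. This is a minimal sketch that assumes the simple "TAG - value" line layout of the record shown above; it is not a complete RIS implementation, and parse_ris is a hypothetical helper name.

```python
import re

# The Crossref record quoted above.
RIS_RECORD = """TY  - JOUR
DO  - 10.3934/dcdsb.2013.18.453
UR  - http://dx.doi.org/10.3934/dcdsb.2013.18.453
TI  - Asymptotic behaviour of random tridiagonal Markov chains in biological applications
T2  - Discrete and Continuous Dynamical Systems - Series B
AU  - Kloeden, Peter E.
AU  - Kozyakin, Victor
PY  - 2012
DA  - 2012/11
PB  - American Institute of Mathematical Sciences (AIMS)
SP  - 453-465
IS  - 2
VL  - 18
SN  - 1531-3492
ER  -"""

# Two-character tag, whitespace, a hyphen, then the value (which may be empty).
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text):
    """Return {tag: [values]}; tags such as AU may occur more than once."""
    record = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip())
        if match:
            record.setdefault(match.group(1), []).append(match.group(2).strip())
    return record


ris = parse_ris(RIS_RECORD)
print("PY:", ris["PY"], "DA:", ris["DA"])
```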
Unfortunately, the RIS description available at http://refman.com/support/risformat_intro.asp gives no detailed explanation of the meaning of these parameters. However, the value 2012/11 of the DA parameter in the RIS record of paper [1] coincides with the date when the first online version of this paper was published. This is probably also the meaning of the PY parameter: the year of the first public appearance of the publication. At the same time, the Year parameter in BibTeX is described as the imprint year of the printed publication. Apparently, it is this never-mentioned difference in the interpretation of the concept “publication year” that leads to the inconsistencies in the bibliographic data of paper [1] provided by MR Lookup and zbMATH on the one side and by Crossref, Mendeley, ZoteroBib, doi2bib, and JabRef on the other.
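A simple safeguard against this kind of silent disagreement is to collect the year reported by each source and flag the record for manual checking whenever the values differ. The sketch below uses only the three values quoted in this paper; the dictionary labels and function names are hypothetical.

```python
from collections import defaultdict

# Years reported for paper [1], taken from the records quoted in the text.
REPORTED_YEARS = {
    "MR Lookup (BibTeX YEAR)": 2013,
    "zbMATH (BibTeX Year)": 2013,
    "Crossref (RIS PY)": 2012,
}


def group_by_year(reported):
    """Return {year: [sources reporting that year]}."""
    groups = defaultdict(list)
    for source, year in reported.items():
        groups[year].append(source)
    return dict(groups)


def needs_verification(reported):
    """True when the sources do not all agree on the publication year."""
    return len(set(reported.values())) > 1


if needs_verification(REPORTED_YEARS):
    for year, sources in sorted(group_by_year(REPORTED_YEARS).items()):
        print(year, "reported by", ", ".join(sources))
```

Such a check cannot tell which year is the intended one. It only shows that the sources interpret "publication year" differently and that the record should be compared against the publisher's imprint.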
Conclusions
We described the inconsistency in the data provided by different bibliographic services on the forum of an application (its name is unimportant) that unambiguously promotes the advantages of obtaining bibliographic data from internet sources. Unfortunately, this appeal led to nothing: we received an answer “explaining” that the application has nothing to do with the problem and relies, without verification, on the data provided by the services. Organizing a dialog between the two groups of information services, with an invitation to coordinate and standardize their interpretation of bibliographic data, is more than an ordinary user can manage. Since the situation described above is not unique, we conclude this note as follows: Do you use DOI? Trust but verify.
References
- [1] P. E. Kloeden, V. Kozyakin, Asymptotic behaviour of random tridiagonal Markov chains in biological applications, Discrete Contin. Dyn. Syst. Ser. B 18 (2) (2013) 453–466. arXiv:1112.5844, doi:10.3934/dcdsb.2013.18.453. URL http://www.aimsciences.org/article/doi/10.3934/dcdsb.2013.18.453
- [2] DOI® Handbook, International DOI Foundation, [Online; updated August 16, 2018]. doi:10.1000/182. URL http://www.doi.org/hb.html
- [3] N. Paskin, Digital Object Identifier (DOI®) System, in: M. J. Bates, M. N. Maack (Eds.), Encyclopedia of Library and Information Sciences, 3rd Edition, CRC Press, Boca Raton, 2009, pp. 1586–1592. URL http://www.doi.org/overview/DOI_article_ELIS3.pdf
