LAC Session Type
Paper
Date & Time
Thursday, November 7, 2024, 10:30 AM - 12:30 PM
Location Name
Galleria South
Name
Our Future is in the Past - The Predictive Power of Consortial ILL Transaction Data
Description

Purpose & Goals

The primary goal of the Boston Library Consortium’s Controlled Digital Lending (CDL) Data Analysis project was to identify a set of print book titles that could pre-seed a digital repository, so that interlibrary CDL transactions would not require scan-on-demand workflows. A pre-seeded repository would instead allow the project to focus on other aspects of file delivery (library workflows, user experience, etc.). We therefore sought to identify titles with a high probability of being requested by BLC member libraries going forward. More broadly, the project produced a template and methodology that consortia and other non-integrated library networks can use to work with non-standardized (aka “dirty”) data: de-duplicating such data and enriching them from multiple sources in order to gain insight into the shared use of the collective holdings among these institutions.

Design & Methodology

An unusual challenge was integrating usage data from 13 BLC member libraries. We opted for a methodology using non-standardized ILL transaction data for the consortium, which required integrating data elements drawn from additional sources. Gathering and cleaning the data was a multistage process. As our goal was to produce a list of potential titles for a corpus, not to delineate full bibliographic data for every title that was lent, there were several instances where we worked with “good enough” data rather than seeking the complete dataset. Even accepting that the data would be imperfect, the work still required several rounds of data cleanup at multiple stages of the process. We focused on titles that had been requested and filled between BLC libraries via RapidR during the last five years. Because libraries limited operations during the pandemic, data from March 2020 forward showed dramatically fewer transactions than the period of January 2018-March 2020, so we started from titles that were requested and filled by BLC libraries in both 2018 and 2019. That dataset was then enriched using the OCLC API to return more complete bibliographic data, including year of publication, publisher, and LC class. We cleaned up the year of publication and used a list of publishers from Gobi to standardize publisher names and to mark each publisher as trade or university. This dataset was then matched against 2021 and 2022 (to May 19) RapidR transactions to identify books that were requested in at least 3 of the past 5 years: that is, in both 2018 and 2019, and in either 2021 or 2022. This resulted in a corpus of 588 titles. Finally, we shared the OCLC numbers of those 588 titles with Gobi, who then provided information on e-book availability.
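The selection rule described above (requested in both 2018 and 2019, and again in 2021 or 2022) can be illustrated with a minimal Python sketch. This is not the project's actual code: the function name `select_corpus`, the `(oclc_number, year)` input shape, and the sample records are assumptions made for illustration, and the sketch presumes the transactions have already been cleaned and de-duplicated to one identifier per title.

```python
from collections import defaultdict

# Hypothetical input shape: one (oclc_number, year) pair per filled request.
def select_corpus(transactions):
    """Return titles requested in both 2018 and 2019 and in 2021 or 2022."""
    years_by_title = defaultdict(set)
    for oclc_number, year in transactions:
        years_by_title[oclc_number].add(year)

    return sorted(
        oclc_number
        for oclc_number, years in years_by_title.items()
        # Subset test: both baseline years must be present.
        # Intersection test: at least one follow-up year must be present.
        if {2018, 2019} <= years and years & {2021, 2022}
    )

# Invented sample records, for illustration only.
sample = [
    ("1001", 2018), ("1001", 2019), ("1001", 2021),   # qualifies
    ("1002", 2018), ("1002", 2019),                   # no follow-up year
    ("1003", 2018), ("1003", 2022),                   # missing 2019
    ("1004", 2018), ("1004", 2019), ("1004", 2022), ("1004", 2022),  # qualifies
]

print(select_corpus(sample))
```

Collecting years into a set per title means repeated requests within one year count once, which matches a rule stated in terms of years rather than transaction volume.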

Findings

When we received the complete annual RapidR data at the end of calendar year 2022, we decided to test the utility of the corpus by seeing how many titles were requested after the data cut-off date (May 19, 2022). Of the 588 titles, 129 were requested between May 20 and December 31, 2022, representing 22% of the corpus. Many of the titles had multiple requests, with the number of transactions in this seven-and-a-half-month period totaling 173 (more than 5 per week). As a comparison, we pulled a random sample of 615 titles from the 2018-2019 data: only 48 of these titles (8%) were requested between May 20 and December 31, 2022, demonstrating that our corpus performed considerably better than chance. This was corroborated by running a similar comparison using the 2023 RapidR transaction data: 38% of the corpus had at least one loan during that period. While deemed a success, the project did raise questions that are nearly impossible to answer from the data in hand. For example, is it possible to establish when requests for different editions are meaningful? Does a patron have a rationale for requesting a particular version of Orwell’s 1984, or would they be satisfied with any copy of the text? Slightly easier, but still difficult to tease out, is establishing unique requests for a title: one patron submission can generate multiple requests if it is rejected by several lenders before it is fulfilled, or several individuals may have submitted requests for a single title almost simultaneously. Telling one situation from the other, while perhaps possible, is far from straightforward.
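The comparison above can be reproduced from the reported counts with a short Python sketch. The hit rates and the requests-per-week figure follow directly from the numbers in the text. The two-proportion z-test is an added illustration, not an analysis the project reports, and it assumes the corpus and the random sample can be treated as independent groups.

```python
from datetime import date
from math import sqrt

# Counts reported in the text.
corpus_size, corpus_hits = 588, 129
sample_size, sample_hits = 615, 48
corpus_requests = 173

corpus_rate = corpus_hits / corpus_size
sample_rate = sample_hits / sample_size

# Length of the test window, May 20 to December 31, 2022, inclusive.
window_days = (date(2022, 12, 31) - date(2022, 5, 20)).days + 1
requests_per_week = corpus_requests / (window_days / 7)

# Two-proportion z-test (illustrative assumption, not the authors' method).
pooled = (corpus_hits + sample_hits) / (corpus_size + sample_size)
std_err = sqrt(pooled * (1 - pooled) * (1 / corpus_size + 1 / sample_size))
z_score = (corpus_rate - sample_rate) / std_err

print(f"corpus hit rate: {corpus_rate:.1%}")
print(f"sample hit rate: {sample_rate:.1%}")
print(f"requests per week: {requests_per_week:.2f}")
print(f"z-score: {z_score:.2f}")
```

A z-score well above 2 would suggest the gap between the two hit rates is unlikely to be a sampling accident, under the independence assumption stated above.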

Action & Impact

As the Boston Library Consortium’s CDL program enters its pilot phase, our project has confirmed the existence of a body of print titles distributed across the BLC membership that is routinely being physically shipped between campuses to meet patron demand. This is a base-level confirmation that CDL is a worthy venture for the Consortium. More significantly, it is an indication that maintaining a shared digital repository will be a positive component of the project, as it will reduce the need for scanning at the point of request, which would require more elaborate technologies and workflows, increased staffing, and longer delivery times. It also establishes that ebook availability of print holdings is an issue we should take under consideration, with about half of the routinely requested corpus titles having a library-licensable ebook version. It has surfaced the need for member libraries to pursue whole-ebook ILL rights collectively and more systematically, as well as the need to incorporate functionality for routing print requests to ebook alternatives (including purchase-on-demand options) as part of the development of any CDL systems.

Practical Implications & Value

Consortium and shared-print programs play an increasingly strategic role for libraries and their approaches to maintaining their circulating collections. There is a growing need for tools and methodologies that support the analysis and assessment of these “collective collections,” whether among a set of institutions or by a single member of a group. The methodology we have developed can be extended to probe other questions and aid in collective decision-making. For example, the ebook availability data could be used to identify priorities for consortial purchasing. Or the subject matter of transacted titles could illuminate gaps in collecting and serve as a basis for adjusting acquisition policies in order to achieve more diverse holdings across a consortium; while this could be done using our dataset based on LC classification, such a project could also enhance the dataset with subject headings to improve granularity. ILL data is limited and inconsistent, but it is also capable of providing a unique window on how the collective collections are shared among the members of a consortium. Our methodology demonstrates that ILL data can be a viable option for informal or ad hoc peer and assessment groups.

Keywords
controlled digital lending (CDL), consortial analytics, interlibrary loan usage & analysis, collection assessment, shared print programs/collective collections