Simulating patron behavior to understand how people use the New York Public Library

Written by Shengqi Zhu

The Challenge

The neighborhood branches of the New York Public Library (NYPL) serve as local public spaces where people can read, learn, and connect. The branches offer a wide range of materials, from books to digital resources, along with community programs for all ages, such as story times for children and workshops for adults. Each branch aims to reflect the character and needs of its neighborhood and is always exploring how to better support its patrons.

Understanding how a community uses its library, however, poses notable practical challenges. Examining a patron's full records and detailed habits can be seen as overly invasive, especially for non-members and casual visitors (e.g., those who occasionally drop by to work in the public space). Further, even when data is collected with patron consent, traditional survey methods are limited by the available sample size and by self-reporting biases (e.g., respondents may overstate how often they visit).

Another way to study patron behavior and usage patterns, without heavy manual intervention, is through anonymized, naturally occurring data, such as entrance/exit records, anonymized item circulation records, and program attendance numbers. Yet such data are unaligned and cannot be traced to individuals: there is no way to tell who is who, or how many of the people who attended a program also picked up a reserved book on their way out. We therefore needed an approach that could incorporate aggregate data, such as "number of people exiting in the past half hour" or "total number of participants in this program", without tracking the full, exact trajectories of any individual.
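To make "aggregate-only" concrete, here is a minimal sketch (an illustration, not NYPL's actual pipeline) that reduces individual exit timestamps to counts per half-hour window, the form in which real and simulated exits can both be compared without identifying anyone. The function name `half_hour_counts`, the 10:00 to 18:00 opening hours, and the sample timestamps are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)


def half_hour_counts(exit_times, open_time, close_time):
    """Count exits in each 30-minute window between open_time and close_time.

    Only per-window totals are returned; nothing links an exit back to a
    particular person, mirroring anonymized entrance/exit records.
    """
    n_windows = int((close_time - open_time) / WINDOW)
    counts = Counter()
    for t in exit_times:
        if open_time <= t < close_time:
            counts[int((t - open_time) / WINDOW)] += 1
    return [counts[i] for i in range(n_windows)]


# Hypothetical example: a branch open 10:00 to 18:00 with five recorded exits.
opening = datetime(2024, 7, 1, 10, 0)
closing = datetime(2024, 7, 1, 18, 0)
exits = [opening + timedelta(minutes=m) for m in (5, 20, 40, 95, 479)]
counts = half_hour_counts(exits, opening, closing)
print(counts)
```

The same binning can be applied to simulated exits, so that model output and real records are compared at exactly the same level of aggregation.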

The Project

As a Siegel PiTech PhD Impact Fellow working with NYPL this summer, I used an Agent-Based Model (ABM) simulation as a data-centric approach to modeling a library's community use patterns. ABM was a good fit for this work because it did not rely on assumptions about the actual data; instead, it created a simulated scenario that produced predicted statistics, such as program attendance, which could then be compared against the library’s real observations.

With valuable guidance from NYPL Strategy & Public Impact staff, I first created a “day model” that simulated a typical day (during library opening hours) at a chosen branch, including the programs the branch offered. I then created probabilistic “patron models” that simulated the possible behaviors of a library visitor. Within a simulated day, we “observe” patron behavior every 15 minutes: at each step, each simulated patron either continues their current activity or “transitions” to a different one. A combination of parameters determines the probability threshold for continuing or discontinuing the current activity and for switching to a new one, including the option to leave.
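A stripped-down version of this setup might look like the following. It is only a sketch: the four activity types, the continuation probabilities in `CONTINUE_PROB`, the `LEAVE_PROB` split between leaving and switching, and the arrival and program schedules are made-up values for illustration, not the parameters used in the project. Each iteration of the day loop is one 15-minute observation step.

```python
import random

STEP_MINUTES = 15
ACTIVITIES = ["reading", "computer", "public_space", "program"]

# Hypothetical per-step probability of continuing the current activity.
CONTINUE_PROB = {"reading": 0.7, "computer": 0.8, "public_space": 0.6, "program": 0.85}
# Hypothetical probability that a patron who stops an activity leaves the
# library instead of switching to another activity.
LEAVE_PROB = 0.5


class Patron:
    """Probabilistic patron model: a current activity plus a visit history."""

    def __init__(self, rng, options):
        self.activity = rng.choice(options)
        self.history = [self.activity]

    def step(self, rng, options):
        """Advance one 15-minute step; return False if the patron leaves."""
        # An activity that is no longer offered (e.g. a program that has
        # ended) cannot be continued.
        if self.activity in options and rng.random() < CONTINUE_PROB[self.activity]:
            return True
        if rng.random() < LEAVE_PROB:
            return False
        self.activity = rng.choice([a for a in options if a != self.activity])
        self.history.append(self.activity)
        return True


def simulate_day(arrivals_per_step, program_steps, seed=0):
    """Day model: arrivals and scheduled programs for each 15-minute step.

    Returns the number of exits at each step and every patron's history.
    """
    rng = random.Random(seed)
    no_program = [a for a in ACTIVITIES if a != "program"]
    inside, exits, visits = [], [], []
    for t, arrivals in enumerate(arrivals_per_step):
        # Programs can only be chosen while one is scheduled.
        options = ACTIVITIES if t in program_steps else no_program
        inside.extend(Patron(rng, options) for _ in range(arrivals))
        staying = []
        for patron in inside:
            if patron.step(rng, options):
                staying.append(patron)
            else:
                visits.append(patron.history)
        exits.append(len(inside) - len(staying))
        inside = staying
    # Everyone still inside leaves at closing time.
    if exits:
        exits[-1] += len(inside)
    visits.extend(p.history for p in inside)
    return exits, visits


n_steps = 8 * 60 // STEP_MINUTES  # hypothetical 8-hour day -> 32 steps
arrivals = [3, 5, 8, 6, 4, 2] + [1] * (n_steps - 6)
exits, visits = simulate_day(arrivals, program_steps={8, 9, 10, 11})
print(sum(exits), len(visits))
```

Because the simulation only emits per-step exit counts and per-visit activity sequences, its output can be aggregated to the same form as the library's anonymized records.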

After setting up the simulation framework, I ran the daily library model over a span of half a year, initialized with real-world visitor entrance records. I then used an algorithm that makes increasingly informed estimates of the parameters in the day and patron models (such as the expected peak visit time on a weekday, or an average patron's probability of continuing to read after 30 minutes) by comparing the predicted and real exit sequences and program attendance. Over successive iterations of comparison with real data, the algorithm narrowed down the plausible values of these parameters and began choosing parameter sets more accurately. Finally, we collected the results from the final iteration of this estimation and optimization.
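The write-up does not name the estimation algorithm, so the sketch below swaps in one simple, well-known choice, the cross-entropy method, purely for illustration. It fits a single hypothetical parameter, `stay_prob` (the probability that a patron stays for another 15-minute step), by repeatedly sampling candidate values, simulating exits for each, and refitting the sampling distribution to the candidates whose simulated exits best match the observed ones. The arrival profile and the "observed" exits are synthetic stand-ins generated with a known `stay_prob` of 0.8, not NYPL data.

```python
import random
import statistics


def simulate_exits(stay_prob, arrivals, seed):
    """Toy patron model: each patron stays another step with prob stay_prob."""
    rng = random.Random(seed)
    inside, exits = 0, []
    for arriving in arrivals:
        inside += arriving
        left = sum(1 for _ in range(inside) if rng.random() >= stay_prob)
        inside -= left
        exits.append(left)
    exits[-1] += inside  # everyone remaining leaves at closing time
    return exits


def discrepancy(simulated, observed):
    """Sum of squared differences between simulated and observed exit counts."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def cross_entropy_fit(arrivals, observed, iterations=15, samples=30, n_elite=6, seed=0):
    """Estimate stay_prob by iteratively narrowing a Gaussian search distribution."""
    rng = random.Random(seed)
    mean, sd = 0.5, 0.25  # broad initial guess

    def loss(p):
        # A fixed simulation seed (common random numbers) makes the loss
        # deterministic in p, so candidates are compared on equal footing.
        return discrepancy(simulate_exits(p, arrivals, seed=1), observed)

    for _ in range(iterations):
        candidates = [min(max(rng.gauss(mean, sd), 0.0), 0.99) for _ in range(samples)]
        elite = sorted(candidates, key=loss)[:n_elite]
        mean = statistics.mean(elite)
        sd = max(statistics.stdev(elite), 1e-3)
    return mean


# Synthetic stand-in for real records: an arrival peak, then a quiet period.
arrivals = [50, 100, 200, 300, 200, 100, 50] + [0] * 9
observed = simulate_exits(0.8, arrivals, seed=123)
estimate = cross_entropy_fit(arrivals, observed)
print(round(estimate, 2))
```

A real calibration would fit many parameters at once and compare several statistics (exit sequences and program attendance), but the loop has the same shape: propose parameters, simulate, score against observations, and concentrate the next proposals around the best-scoring ones.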

The approach I selected with NYPL’s Strategy & Public Impact team proved effective in describing and analyzing the complexities of patron behavior, and most findings matched well with the small-scale survey data we collected at the start of my Fellowship. We found that in our library of interest, children's programs and computer use were the activities expected to last the longest. Additionally, using the public space and participating in children's programs were the two activities most likely to be accompanied by other activities at the library, rather than being followed directly by an exit with no other interaction.
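Both kinds of finding can be stated precisely under simple assumptions. If, at every 15-minute step, a patron continues an activity with a fixed probability p, the number of steps the activity lasts is geometrically distributed, so its expected duration is 15 / (1 - p) minutes and the chance of still being at it after 30 minutes is p². Likewise, "accompanied by other activities" can be measured as the share of visits that include a given activity and also at least one other one. The sketch below assumes both definitions; the function names, the probability 0.75, and the five sample visits are hypothetical and are not the study's estimates.

```python
from collections import Counter

STEP_MINUTES = 15


def expected_duration_minutes(continue_prob, step_minutes=STEP_MINUTES):
    """Mean activity length when each step is continued with a fixed probability.

    The number of steps is geometric with mean 1 / (1 - continue_prob).
    """
    if not 0 <= continue_prob < 1:
        raise ValueError("continue_prob must be in [0, 1)")
    return step_minutes / (1 - continue_prob)


def still_engaged_prob(continue_prob, minutes, step_minutes=STEP_MINUTES):
    """Probability the activity is still going after `minutes` (whole steps)."""
    return continue_prob ** (minutes // step_minutes)


def multi_activity_share(visits):
    """For each activity, the share of visits containing it that also
    contain at least one other activity."""
    containing, combined = Counter(), Counter()
    for history in visits:
        kinds = set(history)
        for activity in kinds:
            containing[activity] += 1
            if len(kinds) > 1:
                combined[activity] += 1
    return {a: combined[a] / containing[a] for a in containing}


# Hypothetical inputs, for illustration only.
print(expected_duration_minutes(0.75))  # 15 / 0.25 = 60.0 minutes
print(still_engaged_prob(0.75, 30))     # 0.75 ** 2 = 0.5625
visits = [["reading"], ["program", "public_space"], ["computer"],
          ["public_space", "reading"], ["program"]]
print(multi_activity_share(visits))
```

Ranking activities by `expected_duration_minutes` of their fitted continuation probabilities, or by `multi_activity_share` over simulated visits, is one way such comparisons could be read off a calibrated model.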

Shengqi Zhu

Ph.D. Student, Information Science, Cornell University

Impact and Path Forward

Our work provided novel insights into how well a library currently serves its patrons, especially into how the library's layout and atmosphere affect patrons’ perceptions and their willingness to engage in certain activities. We also sought to understand the multifunctionality of a branch, i.e., patrons interacting with multiple functions of the library in a single visit. In the future, we hope our model will generalize to more branches and possibly support an interpretable system that allows direct comparisons between branches in terms of how each library motivates and encourages (or discourages) its community to engage in different activities.
