Abstract
This article details the technical development of a Retrieval-Augmented Generation (RAG) system designed to improve discovery within an academic library's institutional repository. Conducted during a six-month research leave in 2025, the project explores the practical application of emerging cloud-based AI tools in a library context. We developed a prototype that integrates the University of Manitoba’s MSpace repository with Microsoft Azure AI services. The system uses an OAI-PMH harvester to retrieve metadata, generates semantic vector embeddings with the text-embedding-ada-002 model, and indexes these vectors in Azure AI Search. A custom front-end application supports both traditional keyword search and generative, context-aware chat interactions. The article documents the development environment, the script logic, and the specific technical challenges overcome, such as OAI-PMH pagination errors and API versioning conflicts, and offers a reproducible roadmap for libraries seeking to explore semantic search technologies.
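To make the harvesting step concrete, the following is a minimal sketch of OAI-PMH pagination in Python. It is not the project's own script: the names `harvest`, `fetch_url`, and `base_url` are hypothetical, and the sketch assumes the repository exposes Dublin Core records under the `oai_dc` metadata prefix. It illustrates two protocol rules that commonly cause pagination errors: a follow-up request carries only the `resumptionToken` argument (no `metadataPrefix`), and an absent or empty `resumptionToken` element marks the end of the list.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def fetch_url(url):
    """Default fetcher (hypothetical helper): return the body of a GET request."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def harvest(base_url, fetch=fetch_url, metadata_prefix="oai_dc"):
    """Yield (identifier, title) pairs, following resumption tokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))

        # Protocol-level errors (e.g. badResumptionToken) arrive as <error> elements.
        error = root.find(OAI + "error")
        if error is not None:
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")

        for record in root.iter(OAI + "record"):
            header = record.find(OAI + "header")
            if header.get("status") == "deleted":
                continue  # deleted records carry no metadata
            identifier = header.findtext(OAI + "identifier")
            # Take the first dc:title, if the record has one.
            title = next((t.text for t in record.iter(DC + "title")), None)
            yield identifier, title

        # An absent or empty token means the list is complete.
        token = next(root.iter(OAI + "resumptionToken"), None)
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests must carry the token alone, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Calling `list(harvest(base_url))` with a repository's OAI-PMH base URL would collect every non-deleted record; the `fetch` parameter exists only so the loop can be exercised against canned XML without network access. In a pipeline like the one described, the harvested titles and other metadata fields would then be passed to the embedding step.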

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2026 International Journal of Librarianship

