A Sleepless Night from Generative A.I.

So. I had a difficult weekend, working on a collaborative research project using an open-source generative A.I. server I’ve built with a small research team (https://ai.sauer.studio/) in the laundry room of my house.

Today, I seem to have dug myself out of the hole? I survived, and I’ve learned something from the experience.

Those of you who read my posts know that I’ve been generally very pleased by open-source generative A.I. LLM systems. I’ve been convinced that they’re an excellent alternative to soulless, environment-destroying, intellectual-property-abusing, mediocrity-inspiring, and prosaic corporate generative A.I.s.

I have been upset since last Friday, however, when I discovered that my own prototype alternative system wasn’t as smart as I’d thought.

It was having trouble parsing some actual student prompts, gathered during a usability-testing session run by a colleague and friend. The system simply provided incorrect, incomplete answers to one of the student questions. Across thirty variants of LLM and temperature settings, almost none of the system’s answers was substantially correct.

Sigh.

As a result, I didn’t sleep well on Friday night. I’m collaborating with teammates on this research, and we’d committed six months already to this technology. If it didn’t work, we could still report negative results, but I didn’t want to. The system had worked so well, at first. I feared I’d be letting my entire team down.

I was unhappy-worried.

TL;DR

Problem: Open‑WebUI’s default RAG setup gave spotty answers for graduate‑student queries about the TTU English Graduate Handbook.

Fix: I swapped Open‑WebUI’s default document extraction for Apache Tika, bumped the retrieval top‑K from 4 to 10, and crafted a longer, more detailed RAG system prompt.

Result: Accuracy jumped from “slightly useful” to “highly reliable” for real‑world student questions.

The Context: Why RAG Matters for Students

Graduate students often need quick, reliable answers to procedural questions—deadlines, funding policies, coursework requirements. I have found in recent years, at two universities, that graduate students don’t tend to absorb PDF handbooks very well; they’re just not confident that they understand the system that governs them. I’d theorized (widely, on social media) that a small, free, open-source LLM backed by Retrieval‑Augmented Generation (RAG) could pull up‑to‑date information from institutional documents, behaving like a living FAQ that answers any question a student has.

Open‑WebUI is a popular open‑source front‑end that connects a powerful, ChatGPT‑style LLM to a simple built‑in RAG pipeline. By default, it uses a lightweight retriever and a modest top‑K of 4 passages. That works well for general knowledge, but when you ask the system to pull from a specialized, dense text like the TTU English Graduate Handbook, my research team and I found that the answers begin to drift. The LLM is especially prone to being distracted by errant words, spelling errors, and grammatical errors in users’ prompts.

The Symptom: Inexact Answers

When I ran a test suite of actual graduate‑student prompts, complete with unclear phrasing and spelling and grammar errors (for example, “Does Texas Tech’s PhD Program in Technical Communication and Rhetoric give a sequence of exams and [sic] to receive the degree and what are the consequences if these milestones are not met?”), the default Open‑WebUI configuration produced responses that were nearly correct but often missed key details about the third-year qualifying exam (the only event the Handbook explicitly marks as an exam).

Why? 

Sparse Retrieval – The default retriever (a simple TF-IDF/BM25 ranking) fetched only 4 passages (the default “top K” setting). The Handbook is long; the relevant snippet can be buried among many others. The sketch after this list shows how easily that happens.

Shallow Prompt – The prompt given to the language model was short, lacking explicit instructions to synthesize information from the retrieved passages.

Engine Limitations – The default embedding model wasn’t tuned for legal or procedural text, so the ranking was sub‑optimal.
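
To make the first point concrete, here is a minimal sketch of sparse retrieval with a small top‑K, using the rank_bm25 package. The chunks, the query, and the chunking itself are illustrative stand‑ins, not the actual Open‑WebUI pipeline or the real Handbook text.

```python
# Illustrative sketch: sparse (BM25) retrieval with a small top-K.
# The chunks and query below are placeholders, not the real handbook or pipeline.
from rank_bm25 import BM25Okapi

chunks = [
    "The third-year qualifying examination follows completion of coursework...",
    "Funding is renewable annually, contingent on satisfactory progress...",
    "The program of study must be filed with the Graduate School by...",
    # ...dozens more chunks from a long handbook...
]

tokenized_chunks = [c.lower().split() for c in chunks]
bm25 = BM25Okapi(tokenized_chunks)

query = "what exams does the PhD require and what happens if I miss them"
top_passages = bm25.get_top_n(query.lower().split(), chunks, n=4)  # a top-K of 4

# With only four passages returned from a long document, the exam passage can be
# crowded out by chunks that merely share vocabulary with a messy student prompt.
```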

The Fix: A Three‑Step Customization

Replace the Default Content Extractor with Apache Tika

As I lay in bed, sleepless (I’d spent six months on this prototype!), I remembered a conversation with an expert who, a few months earlier, had themselves been very dissatisfied with Open-WebUI’s default RAG settings.

I remembered, from that conversation, that Apache’s open-source Tika is a more robust content extraction framework that can parse PDFs, Word docs, and many other formats, converting them into clean text that the embedding pipeline can chunk and store in the LLM’s vector database. On Saturday, by adding a Tika container to the same Docker network as the Open-WebUI front-end and then re-feeding Tika‑cleaned PDF documents into the embedding pipeline, I was able to reduce noise and improve retrieval quality significantly.
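
A quick script along these lines can confirm that the Tika container is reachable and producing clean text. This is a minimal sketch, assuming Tika’s default port (9998), a container hostname of `tika` on the shared Docker network, and a placeholder filename.

```python
# Minimal sketch: send a PDF to a Tika server container and get plain text back.
# Assumes Tika is reachable at http://tika:9998 on the shared Docker network;
# the hostname "tika" and the handbook filename are illustrative.
import requests

TIKA_URL = "http://tika:9998/tika"  # Tika server's plain-text extraction endpoint

def extract_text(pdf_path: str) -> str:
    """Send a PDF to the Tika server and return its extracted plain text."""
    with open(pdf_path, "rb") as f:
        resp = requests.put(
            TIKA_URL,
            data=f,
            headers={"Accept": "text/plain"},  # ask Tika for plain text, not XHTML
            timeout=120,
        )
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    text = extract_text("ttu_english_graduate_handbook.pdf")
    print(text[:500])  # sanity check: the first few hundred characters
```

Once Tika is confirmed working, Open-WebUI’s document settings can point at that same Tika URL so that uploaded PDFs flow through Tika before they’re chunked and embedded.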

Raising the RAG Top-K

Retrieving more passages also gives the language model a richer context pool. So I raised the RAG retriever’s top‑K from the default value (4) to 10. This consumed more RAM (as did the Tika container), but now the model can cross‑check facts across multiple snippets, reducing hallucination still further.
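
For readers who want to see what that knob controls, here is a generic sketch of embedding-based retrieval with an adjustable top‑K, using sentence-transformers. The model name, chunks, and helper function are illustrative assumptions, not Open-WebUI’s internal code.

```python
# Generic sketch of dense retrieval with an adjustable top-K.
# The embedding model and chunk list are illustrative, not Open-WebUI's internals.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(query: str, chunks: list[str], top_k: int = 10) -> list[str]:
    """Return the top_k handbook chunks most similar to the query."""
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ query_vec            # cosine similarity (vectors are normalized)
    ranked = np.argsort(scores)[::-1][:top_k]  # best-scoring chunks first
    return [chunks[i] for i in ranked]

# With top_k=4 the qualifying-exam passage can fall just outside the window;
# top_k=10 gives the model enough surrounding context to cross-check itself.
```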

Adjusting the RAG System Prompt

I also lengthened the system prompt Open-WebUI uses when engaging with RAG content, adding details to its default text to ensure that it considers the source PDFs more carefully.
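
I won’t reproduce my exact wording here, but the template below illustrates the kind of explicit instructions that made a difference; the placeholders {context} and {question} stand in for whatever variables your RAG engine substitutes.

```python
# An illustrative RAG system prompt (not the exact text deployed on my server);
# {context} and {question} are placeholders filled in by the RAG engine.
RAG_SYSTEM_PROMPT = """You are an assistant answering graduate students' questions
using ONLY the excerpts from the TTU English Graduate Handbook provided below.

<context>
{context}
</context>

Instructions:
- Read every excerpt before answering; relevant details may be split across passages.
- Cite or paraphrase the specific passage that supports each claim you make.
- Pay particular attention to named milestones, deadlines, and examinations.
- If the excerpts do not contain the answer, say so plainly rather than guessing.

Question: {question}
"""
```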

After deploying the above changes, I reran the same graduate‑student prompts. Amazingly, it worked!

(You aren’t likely to be amazed. You’re probably either not understanding this technical jargon at all, or you understand it so well that you’re shaking your head that I didn’t think of it right away. But I’m sharing the level of expertise I have, without shame, hoping it helps those who haven’t faced quite this problem yet.)

After this fix, the model consistently cited the correct passage, reducing its guessing.

My team and I will spend the next few weeks coding several hundred AI-generated answers to student prompts, but I’m now much more confident in our system’s ability to help students. (And I’m sleeping better!)

Take‑Away Tips

Document Pre‑Processing Matters – If you’re using an open-source LLM server, consider adding a robust parser (Tika, PyMuPDF) to clean PDFs before embedding; a brief PyMuPDF sketch follows these tips.

Top‑K is a Trade‑off – 10 is a sweet spot for medium‑sized handbooks; adjust based on document size and latency tolerance. 

Prompt Engineering is Important – Explicit instructions in the RAG system prompt drastically improve answer quality.

Iterate with Real Users – When building any open-source LLM solution, always test with actual student queries, then don’t be afraid to refine the system’s settings based on feedback. It may take more fine-tuning than most English faculty are used to — but we live in a new world, and debugging scholarly tools isn’t entirely unlike revising our scholarly writing.
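
As promised in the first tip, here is a minimal pre-processing sketch using PyMuPDF; the filename and the whitespace cleanup are illustrative assumptions, and Tika-based extraction (shown earlier) works just as well.

```python
# Minimal pre-processing sketch using PyMuPDF (imported as "fitz").
# The filename and cleanup step are illustrative assumptions.
import re
import fitz  # PyMuPDF

def pdf_to_clean_text(pdf_path: str) -> str:
    """Extract page text from a PDF and collapse stray whitespace before embedding."""
    doc = fitz.open(pdf_path)
    pages = [page.get_text("text") for page in doc]
    doc.close()
    text = "\n".join(pages)
    return re.sub(r"[ \t]+", " ", text)  # collapse runs of spaces/tabs left by the PDF layout

if __name__ == "__main__":
    print(pdf_to_clean_text("ttu_english_graduate_handbook.pdf")[:500])
```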

Conclusions

Open‑WebUI is a powerful foundation for building an institutional FAQ system, but the default RAG pipeline isn’t always “plug‑and‑play” for specialized documents like graduate handbooks. By swapping in Apache Tika, increasing the top‑K to 10, and giving the RAG engine a clearer, longer prompt, it may be possible to transform a “good enough” assistant into a trusted resource for your student community.

But thinking back on the experience, I remember how dismissive I’d been of the conversation with a colleague a few months ago, when they’d reported issues with the default RAG engine. It worked fine for me, didn’t it?

I see now that I hadn’t wanted to consider my colleague’s experiences, in part because I was afraid I’d have to do the complex work of adjusting the server containerization and adding yet another piece of software to the already-complex flowchart of how my system worked. That fear led me to tune out their story.

But best practices don’t result from what I want. They come from consistent, quality results.

It was only later, reflecting on why I was facing failure, that I properly considered what my colleague had said and remembered, in detail, the solution they had described. I’m thankful I didn’t let my fear of their experience cause me to “zone out” as they spoke; because I paid attention to the details of their solution, even when I doubted it applied to me, I was able to recognize its significance when I needed it. Only because I’d been lucky enough to be paying attention to a newly emerging community of experts could I have solved this problem at all.

Now we’ll be able to complete our study, report very positive findings at the SIGDOC conference next month, and write our results for peer-reviewed publication.

Happy building—and may your graduate students always find the answers they need!
