Written by
Lev Katsnelson
Professional Services Trainer at Ascribe
A few weeks back, I watched a newly onboarded research team deliver in under 48 hours what used to take them three weeks. AI Coder analyzed and categorized several thousand open-ends and helped create cross-tabs, and ASK drafted the initial insights.
Then came the email that changed everything.
"Why did the AI group these specific responses differently than our first wave?" And just like that it felt like we were back at "you can have it fast, OR you can have it good."
The Speed vs. Quality Dilemma
We're rushing to integrate AI into every aspect of research, from survey design to reporting. But in our efficiency-driven glee, we're tripping over a quality issue we didn't prep a helmet for.
LLMs can process thousands of responses in minutes, but there's a catch. Generative AI models can, and do, produce different results from the exact same inputs depending on temperature settings, random seeds, or even the time of day you run them. Run the same 1,000 responses through ChatGPT twice and you're likely to get different categorizations. Not wildly different, most of the time, but different enough to shift your strategic recommendations.
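If you want to see how much of that variability you can actually control, here's a minimal sketch using the OpenAI Python SDK. Pinning temperature to 0 and fixing a seed narrows the spread, but the API only promises best-effort determinism, so even identical calls can still diverge. The model name and prompt here are illustrative, not recommendations.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

PROMPT = "Assign one short category label to this survey response: {text}"

def categorize(text: str, temperature: float = 0.0, seed: int = 42) -> str:
    """One categorization call. temperature=0 plus a fixed seed reduces,
    but does not eliminate, run-to-run variation."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=temperature,
        seed=seed,  # best-effort determinism only, not a guarantee
    )
    return resp.choices[0].message.content.strip()

# Run the same open-end through the model twice and compare.
a = categorize("The checkout page kept timing out on my phone.")
b = categorize("The checkout page kept timing out on my phone.")
print("Identical runs agree:", a == b)  # not guaranteed to print True
```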
The Real Problem: Magic vs. Math
At Ascribe, we've processed over 50 million open-ends in our lifetime. A significant portion of those in the past year were processed using AI, and what we've learned is that AI does not work without human intelligence in the mix.
The problem isn't the technology itself. It's pretending AI is magic instead of math. It's applying an older framework to fundamentally new processes. We're claiming to maintain rigorous methodology up until the moment we hand the reins over to an AI we don't fully understand.
The Three Pillars of Research Quality: Then and Now
For decades, research quality rested on three pillars. These same three pillars stand today, but are made of slightly different materials:
- Representative Data now also includes training data. Can your AI handle medical terminology while also being asked to categorize social media reviews of the newest haute cuisine restaurant specializing in microgreens planted exclusively on every third Wednesday of the month? The answer depends as much on the AI's training data as on your sample demographics.
- Sound Methodology has expanded to include AI algorithms and prompt crafting. It's not enough to have a good sampling plan if your AI is introducing systematic bias.
- Human Oversight has become even more critical but requires a new approach. When an AI processes 10,000 responses in under 10 minutes, how do you review the results in a meaningful way? The sheer volume can make traditional quality checks feel like inspecting a river with a spork.
Building Guardrails: A Practical Framework
Successful research teams aren't the ones avoiding AI. They're the ones building guardrails and processes that make their results explainable, while leveraging AI tools that allow for transparency.
- Establish consistency benchmarks: Run the same dataset through your AI solution multiple times. Note where it drifts and where the results are reproducible, and document the areas you know you will always need to verify. (A simple way to score that drift is sketched after this list.)
- Demand confidence scores: Ask for easily accessible, understandable confidence scores so you can tell when the AI is approximating a grouping of ideas rather than matching it cleanly. Establish a baseline and check anything that drops below it.
- Create audit trails: Document successful, and especially unsuccessful, outputs along with the AI settings and decisions that produced them. That record is what lets you build trust and confidence over time. (The second sketch after this list shows one way to log it, together with the confidence baseline above.)
- Implement staged review: This may be the most important part of the process. While the previous steps matter most during adoption, staged review should be part of every project that involves AI. Do what makes sense for your team and provides the most peace of mind.
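Here's what a consistency benchmark can look like in practice. This is a hypothetical Python sketch, not an Ascribe feature: it takes several runs of category labels over the same responses and scores how often pairs of runs agree on each item, so anything short of full agreement gets routed to a human.

```python
from itertools import combinations

def agreement_report(runs: list[list[str]]) -> dict:
    """Given N runs of category labels over the same responses, score
    the share of run-pairs that agree on each individual response."""
    n_items = len(runs[0])
    per_item = []
    for i in range(n_items):
        labels = [run[i] for run in runs]
        pairs = list(combinations(labels, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return {
        "mean_agreement": sum(per_item) / n_items,
        "unstable_items": [i for i, s in enumerate(per_item) if s < 1.0],
    }

# Three passes over the same five open-ends (labels are made up).
runs = [
    ["price", "ux", "support", "price", "ux"],
    ["price", "ux", "support", "shipping", "ux"],
    ["price", "ux", "support", "price", "ux"],
]
report = agreement_report(runs)
print(f"Mean pairwise agreement: {report['mean_agreement']:.0%}")  # 87%
print("Send to human review:", report["unstable_items"])           # [3]
```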
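And here's one way the confidence-score and audit-trail steps can work together. Again, a sketch under our own assumptions (the log format, field names, and 0.80 baseline are all illustrative): append each run's settings and outputs to a JSONL log, and return whatever falls below your confidence baseline for review.

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("ai_coding_audit.jsonl")
CONFIDENCE_BASELINE = 0.80  # illustrative threshold; tune per study

def log_run(settings: dict, results: list[dict]) -> list[dict]:
    """Append one run's settings and outputs to the audit log, then
    return the items whose confidence falls below the baseline."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "settings": settings,  # model, temperature, seed, prompt version
        "results": results,    # [{response_id, label, confidence}, ...]
    }
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return [r for r in results if r["confidence"] < CONFIDENCE_BASELINE]

flagged = log_run(
    settings={"model": "gpt-4o-mini", "temperature": 0.0, "seed": 42,
              "prompt_version": "v3"},
    results=[{"response_id": 101, "label": "price", "confidence": 0.95},
             {"response_id": 102, "label": "ux", "confidence": 0.62}],
)
print("Route to human review:", [r["response_id"] for r in flagged])  # [102]
```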
Real-World Review Strategies
We've seen everything from hybrid approaches, where teams require human review of any category that captures less than 5% or more than 30% of responses, to teams that have developed enough trust in and understanding of the platform that they only require targeted human review of specific findings they've identified as needing intervention.
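The first of those threshold rules is easy to operationalize. A toy version, with the 5% and 30% cutoffs left as parameters you can tune per study:

```python
from collections import Counter

def categories_needing_review(labels: list[str],
                              low: float = 0.05,
                              high: float = 0.30) -> dict[str, float]:
    """Return categories whose share of responses falls outside the
    [low, high] band and therefore gets mandatory human review."""
    total = len(labels)
    shares = {cat: n / total for cat, n in Counter(labels).items()}
    return {cat: s for cat, s in shares.items() if s < low or s > high}

labels = ["price"] * 40 + ["ux"] * 35 + ["support"] * 22 + ["other"] * 3
print(categories_needing_review(labels))
# {'price': 0.4, 'ux': 0.35, 'other': 0.03}
```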
"The new standard for research quality isn't lower or higher, it's different. Clients are buying trust in processes they can't fully see."
The Trust Factor
When you tell clients that AI was involved, they need to know you haven't sacrificed rigor for speed just to meet their deadline. The teams that thrive in the current market aren't those who resist AI or those who blindly incorporate it into every facet of their work. They're the ones that develop frameworks for quality that take into account both AI's power and its limitations.
The efficiency metrics of AI in research are seductive: faster turnarounds, larger sample processing, and the ability to find needles in a haystack of noise. But AI is a tool and not a replacement for human intelligence in research.
The Bottom Line
The question isn't really whether to use AI in research—that ship has sailed. The question is whether we'll use it in a way that enhances our credibility or erodes it.
The answer lies not in choosing between speed and quality, but in building new frameworks that deliver both.