Over the past few days, we’ve been quietly running a series of internal “stress tests” on the latest open-source language models as candidates for powering our NORAI RAG Bot framework. The mission: find an AI engine that truly respects its boundaries, always answers from the documents provided, and never invents facts or drifts into generic waffle.
Most AI providers promise this. Very few deliver it.
OpenAI's OSS 20B Model Surpasses Expectations
Our internal trials with OpenAI’s new OSS 20B parameter model produced results that, quite frankly, blew us away. With only a simple system prompt and two uploaded documents, the model:
Strictly answered from our docs. Even when pressed with off-topic or personal questions (“tell me about Elon Musk”, “tell me about OpenAI”, “who owns you?”), it refused to hallucinate or invent.
Handled conversation naturally. It remained friendly, helpful and context-aware – but always within the rails set by our simple system prompt and documents.
Displayed real-world business sense. For example, it solved a customer pricing question from the contract, but would not speculate or invent answers when data was missing (“Are there 5-year discounts?”).
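The setup above can be sketched in a few lines. This is an illustrative reconstruction, not our production configuration: the prompt wording, document contents, and helper name are assumptions, and the actual model call is omitted.

```python
# Minimal sketch of a document-grounded chat request, in the spirit of the
# tests described above. Prompt text and documents are illustrative only.

SYSTEM_PROMPT = (
    "You are a support assistant. Answer ONLY from the documents below. "
    "If the documents do not contain the answer, say you don't know. "
    "Never speculate or use outside knowledge."
)

def build_messages(documents, question):
    """Assemble a chat request that keeps the model inside the documents."""
    context = "\n\n".join(
        f"--- Document {i + 1} ---\n{doc}" for i, doc in enumerate(documents)
    )
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\n{context}"},
        {"role": "user", "content": question},
    ]

docs = ["Pricing: 3-year contracts receive a 10% discount."]
messages = build_messages(docs, "Are there 5-year discounts?")
```

The point of the structure is that the model never sees the question without the documents and the refusal instruction attached, which is what makes off-topic probes ("tell me about Elon Musk") fail safely.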

Why Does This Matter?
For enterprise buyers, trust is everything. A chatbot that makes up facts, guesses, or loses track of compliance boundaries is not just useless – it’s a liability. Our tests show it’s now possible to build a conversational, responsive AI that’s as safe and reliable as a classic expert system, but far more usable and scalable.
This is a milestone for GDPR-compliant, EU-hosted AI:
Low hallucination risk: fit for customer service, compliance, and regulated environments (with LLMs, the risk can never be zero).
No overseas API calls: Everything runs on open infrastructure, fully within Europe.
Faster onboarding: You upload your docs, set the guardrails and go live – no weeks of fine-tuning or firefighting.
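The "upload your docs and go live" step hides a retrieval stage: relevant passages must be selected before the model sees them. A minimal sketch of that idea, using simple word overlap as a stand-in (a real deployment would use embedding search; the function names and example chunks are assumptions):

```python
# Toy retriever: rank document chunks by word overlap with the question.
# Illustrative only; production RAG uses embedding-based similarity search.
import re

def words(text):
    """Lowercase alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(chunks, question, top_k=2):
    """Return the top_k chunks sharing the most words with the question."""
    q = words(question)
    scored = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return scored[:top_k]

chunks = [
    "Standard contracts run for 3 years with a 10% discount.",
    "Support is available on weekdays from 9 to 17 CET.",
]
top = retrieve(chunks, "What discount applies to 3-year contracts?", top_k=1)
```

Only the retrieved chunks are passed to the model, which keeps both context windows small and answers traceable to a source passage.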
Why Obedience Matters
When it comes to enterprise AI, it’s tempting to chase benchmark scores and leaderboard rankings. But inside the real-world trenches of Retrieval-Augmented Generation (RAG), performance isn’t measured in abstract percentages – it’s measured in trust.
The difference between a model that “knows everything” and one that only knows what it’s told is the difference between a reliable digital assistant and a rogue liability.
Obedience to documents and strict guardrails isn’t a limitation; it’s a feature. Our research confirms that even the most advanced LLMs, if left unchecked, will inevitably hallucinate or make leaps outside the evidence. In contrast, a well-instructed, document-bound model delivers what enterprises need: answers grounded in the facts you provide, zero speculation, and a safety net for compliance.
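One way to enforce that safety net is a post-hoc grounding check: flag any answer sentence whose vocabulary barely appears in the source documents. The sketch below is a simplified illustration of the idea; the threshold and helper names are assumptions, not a description of our framework.

```python
# Illustrative grounding check: flag answer sentences whose content words
# are mostly absent from the source documents. Threshold is an assumption.
import re

def word_set(text):
    """Lowercase alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def ungrounded_sentences(answer, documents, min_ratio=0.5):
    """Return answer sentences with too little word overlap with the docs."""
    doc_words = set().union(*(word_set(d) for d in documents))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        content = word_set(sentence)
        if content and len(content & doc_words) / len(content) < min_ratio:
            flagged.append(sentence)
    return flagged

docs = ["Contracts run for 3 years with a 10 percent discount."]
answer = "Contracts run for 3 years. The CEO is Elon Musk."
flagged = ungrounded_sentences(answer, docs)
```

A flagged sentence can then trigger a refusal or a human review instead of reaching the customer.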
In the end, obedience trumps benchmarks – because what matters isn’t what the model knows, but how faithfully it serves your business’s reality.
What’s Next?
The new OpenAI OSS 120B model is already available; our tests so far have used its “little brother”, the 20B parameter version. If the early results are any indication, the larger model should deliver even greater reliability, context-awareness and multilingual performance.
With this, we can confidently say we’re ready to offer enterprises a truly unique, out-of-the-box RAG solution: strict, document-grounded, conversational and safe. Testing continues, but the breakthrough is real – and the gap between “benchmarked” and “battle-ready” AI is closing fast.
Want to see the difference for yourself?
Contact us for a private demo or proof-of-concept.
This is what enterprise-ready AI actually looks like.
Victor A. Lausas
Chief Executive Officer
Subscribe to North Atlantic’s email newsletter and get your free copy of my eBook,
Artificial Intelligence Made Unlocked. 👉 https://www.northatlantic.fi/contact/
Discover Europe’s best free AI education platform, NORAI Connect, start learning AI or level up your skills with free AI courses and future-proof your AI knowledge. 👉 https://www.norai.fi/

