Our Story — Satcove

The idea Satcove makes concrete wasn't born in 2026. It is more than thirty years old. In 1991, in a landmark paper, “Adaptive Mixtures of Local Experts,” Robert Jacobs, Michael Jordan, Steven Nowlan and Geoffrey Hinton asked a question that feels strikingly current today: rather than handing a task to a single monolithic neural network, what if several specialized sub-networks collaborated, each strong in its own area, arbitrated by a mechanism that decides whom to trust depending on the question?

It was a conceptual break. Until then the dominant intuition was to build one model, as large and complete as possible, meant to know everything. The authors showed the opposite: splitting the problem across distinct experts, then combining their views, produces better results and more stable learning. Strength no longer came from the size of a single brain, but from cooperation between several.

The seed lay dormant for years, for lack of the computing power to exploit it at scale. It took until 2017 for it to resurface with force. Noam Shazeer and his colleagues at Google published their work on the sparsely-gated mixture-of-experts layer: an architecture that, for each input, activates only a fraction of a gigantic network, the few most relevant experts. You get models of unprecedented capacity without paying for the whole network on every computation. The 1991 idea finally became industrial.

In late 2023 the public reaped the fruit, often without knowing it. Mistral AI popularized the approach with Mixtral 8x7B, a model where eight experts share the work, two being called on for each token produced. The “mixture of experts” moved from lab to product; it became one of the manufacturing secrets of the most capable modern AIs.
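For readers who want to see the mechanism, here is a minimal sketch of the "eight experts, two per token" routing. It is an illustration under stated assumptions, not Mixtral's actual code: the experts are toy scalar functions, the router scores are made up, and weighting the chosen experts with a softmax over their scores is one common choice among several.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k experts with the highest router scores for one token.

    Returns (expert_index, weight) pairs. The weights are a softmax over the
    selected scores only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(token, experts, router_logits, k=2):
    """Weighted sum of the k selected experts' outputs; the others never run."""
    return sum(w * experts[i](token) for i, w in top_k_route(router_logits, k))

# Toy setup (hypothetical): eight "experts" that simply scale their input.
experts = [lambda x, m=m: m * x for m in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]  # made-up scores

routes = top_k_route(router_logits)
output = moe_layer(3.0, experts, router_logits)
```

Only two of the eight functions are evaluated for this token, which is where the saving comes from: capacity grows with the number of experts, while the cost per token grows only with k.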

That success has a consequence often misunderstood: many of the most capable AIs we use today already rest, internally, on a form of collaboration between experts. A large part of the industry has therefore settled the old 1991 question: yes, cooperation beats the single brain. But it settled it behind closed doors, inside each model, where the experts share the same birth certificate, the same training data and, inevitably, the same mistaken certainties. Disagreement there is domesticated, never head-on.

In other words: since 1991, science has known that a single point of view gets things wrong and that confronting several intelligences produces better answers. It's an established fact, not a fashion. What was still missing wasn't the idea of consensus. It was making it play out no longer between the parts of one machine, but between genuinely different AIs, built by different teams, able to contradict each other for real. That step, no one had yet taken for the general public.

A good idea is worth only as much as the measurement it can withstand. And that is what changed between 2023 and 2026: multi-AI consensus went from an appealing intuition to a demonstrated, reproduced, quantified result.

The founding moment is a May 2023 paper by Yilun Du, Shuang Li, Antonio Torralba, Joshua Tenenbaum and Igor Mordatch, from MIT and Google DeepMind: “Improving Factuality and Reasoning in Language Models through Multiagent Debate.” The method is clear. You put a question to several AI instances; each answers; then you give them the others' answers to read and ask them to revise their own; you repeat over a few rounds. The result is unambiguous: by the end of the debate, the answers are more factual and the reasoning more solid than with a single AI. The paper would be accepted at ICML 2024, one of the field's most demanding conferences — a stamp of scientific seriousness.
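The debate loop described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the `Agent` type, the `debate` and `majority` helpers and the stub agents are all hypothetical stand-ins for real model calls.

```python
from collections import Counter
from typing import Callable, List

# An agent receives the question plus the other agents' latest answers
# and returns its own (possibly revised) answer. Hypothetical interface.
Agent = Callable[[str, List[str]], str]

def debate(question: str, agents: List[Agent], rounds: int = 2) -> List[str]:
    # Round 0: every agent answers independently, with no peer answers to read.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent reads everyone else's previous answer, then revises its own.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return answers

def majority(answers: List[str]) -> str:
    """Most frequent answer in the list."""
    return Counter(answers).most_common(1)[0][0]

def make_stub(initial: str, stubborn: bool) -> Agent:
    """Toy agent: keeps its first answer, or adopts the peers' majority view."""
    def agent(question: str, peers: List[str]) -> str:
        if not peers or stubborn:
            return initial
        return majority(peers)
    return agent

agents = [make_stub("42", True), make_stub("42", True), make_stub("41", False)]
final = debate("What is 6 * 7?", agents, rounds=2)
```

With real models, each call would send the question and the peers' answers as a prompt. The stubs only show the shape of the loop: independent first answers, then a few rounds of reading and revising.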

That work also surfaced subtleties precious to anyone who wants to do this well. Making identical copies of the same model debate helps less than confronting genuinely distinct viewpoints: a brain that re-reads itself stays a prisoner of its own blind spots. And the way the AIs exchange arguments matters: structured communication, where each truly reads the other before answering, beats a mere critique run in parallel without dialogue.

In 2024, Kamal Hegazy, a researcher affiliated with Mila, drove the point home with “Diversity of Thought.” His conclusion is direct and consequential: making different models deliberate beats multiplying instances of the same model. Diversity of training — AIs that haven't seen the world through the same data — weighs more than diversity of prompt. This is precisely the frontier the intra-model “mixture of experts” could not cross: to win, you need intelligences that don't resemble each other.

That result shifts the problem's center of gravity: the question is no longer only to make models debate, but to make intelligences debate that are dissimilar enough for the confrontation to actually teach something. Two models raised on different corpora don't err in the same places; where one slips, the other often holds firm. It's this complementarity of blind spots — not one more vote — that gives the panel its value. Stacking clones reassures; crossing perspectives corrects.

Then come the numbers, which leave little room for doubt. In April 2026, the “Council Mode” study put figures on the gain. On HaluEval, a benchmark designed to hunt hallucinations, council mode reduces them by 35.9%. On TruthfulQA, which measures a model's tendency to tell the truth rather than echo widespread falsehoods, consensus gains 7.8 points over the best individual model. Not over an average or weak model: over the best one, taken alone.

The cumulative message of these works is unambiguous. Where intuition was once pitted against skepticism, we now have a body of converging evidence, signed by the best institutions, published at the best conferences. Several AIs that confront each other hallucinate less and reason better than a single AI, however brilliant. It's no longer an opinion about the future; it's a fact about the present. One question remained, and it wasn't scientific: who would take this truth out of the labs and put it into ordinary people's hands, at the precise moment they need it?

A discovery can stay confidential for years, known only to researchers. Sometimes it takes a cultural trigger for an entire category to settle into the public mind. For multi-AI consensus, that trigger has a date: 22 November 2025.

That day, Andrej Karpathy published on GitHub a project named “LLM Council.” Karpathy is no anonymous developer: a major figure of contemporary AI, formerly at OpenAI, of which he was a founding member, then director of artificial intelligence at Tesla, he is one of the most listened-to voices in the field. And he recounts coding this “council of models” in a single weekend, almost for the pleasure of the experiment.

The architecture he proposes is elegant and unfolds in three stages. First, several large models receive the same question and answer it in parallel, independently of one another. Then — and this is the finest touch — they are given the others' answers to evaluate, but anonymized: no AI knows which answer came from which model. That anonymization aims to neutralize the authority bias, the reflex of judging an answer better simply because it bears the label of a renowned model. Finally, a “Chairman,” a presiding model placed outside the panel, reads everything and writes the final synthesis.
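The three stages can be sketched as follows. This is not the code of the LLM Council repository: `run_council`, the stub reviewer and the stub chairman are hypothetical, and plain functions stand in for calls to real models.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_council(question, panel, reviewers, chairman, seed=0):
    """panel maps a model name to a callable(question) -> answer text."""
    # Stage 1: every model answers the same question, in parallel and independently.
    with ThreadPoolExecutor() as pool:
        texts = list(pool.map(lambda model: model(question), panel.values()))
    answers = dict(zip(panel, texts))

    # Stage 2: shuffle, relabel as "Response A", "Response B", ... and give
    # reviewers only the labels, so no reviewer knows which model wrote what.
    names = list(answers)
    random.Random(seed).shuffle(names)
    label_to_model = {f"Response {chr(ord('A') + i)}": name
                      for i, name in enumerate(names)}
    blind = {label: answers[name] for label, name in label_to_model.items()}
    rankings = {name: review(question, blind) for name, review in reviewers.items()}

    # Stage 3: the chairman reads the anonymized answers and rankings,
    # then writes the final synthesis.
    verdict = chairman(question, blind, rankings)
    return verdict, blind, rankings, label_to_model

def stub_reviewer(question, blind):
    """Toy reviewer: ranks labels from most to least detailed (longest first)."""
    return sorted(blind, key=lambda label: len(blind[label]), reverse=True)

def stub_chairman(question, blind, rankings):
    """Toy chairman: returns the answer with the best summed rank position."""
    totals = {label: 0 for label in blind}
    for order in rankings.values():
        for position, label in enumerate(order):
            totals[label] += position
    return blind[min(totals, key=totals.get)]

panel = {
    "model-a": lambda q: "Paris",
    "model-b": lambda q: "Paris, France",
    "model-c": lambda q: "Lyon",
}
reviewers = {name: stub_reviewer for name in panel}
verdict, blind, rankings, label_to_model = run_council(
    "What is the capital of France?", panel, reviewers, stub_chairman
)
```

The mapping from label to model name never reaches the reviewers or the chairman; it is kept aside only so the result can be de-anonymized afterwards.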

The repository went viral within hours. Thousands of developers cloned it, commented on it, tinkered with it. The reason for the frenzy isn't only technical: it's a signal. When someone of that calibre takes the trouble to code this idea publicly and the community seizes on it at once, the tech world validates the entire category in one stroke. The implicit message is clear: querying a single AI is already the past; the future is to make them deliberate.

Beyond the enthusiasm, in a few days the project fixed a vocabulary and best practices that the whole category adopted almost immediately: the parallel answer to preserve the independence of views, the blind evaluation to neutralize brand prestige, the synthesis entrusted to an arbiter distinct from the panel. An idea scattered until then across papers became, overnight, a shared design pattern — a common language anyone could pick up and discuss.

But Karpathy also makes an observation that, far from weakening the idea, shows its maturity and maps out what remains to be solved. He notes that the models are “surprisingly willing” to judge a competitor's answer superior to their own. This deference between AIs, a form of sycophancy in which each gives way too readily, is an open problem: a panel has value only if its members honestly defend their position instead of falling into line out of politeness. To acknowledge this flaw is to admit that multi-AI consensus is no magic recipe but a demanding discipline, one that requires engineering, guardrails and honesty.

In the spring of 2026, let's take stock honestly. Research is teeming: more than a hundred academic papers explore debate and consensus between models. The market is busy: about ten commercial products claim some form of multi-AI. Open source abounds: repositories, templates and integrations for developers are multiplying. On paper, the category looks saturated.

And yet the most important observation is one of absence. No known product in this vertical has truly broken through: none, to our knowledge, exceeds a hundred thousand active users, and none has raised more than fifty million dollars in this pure niche. The scientific proof had been made and the cultural validation won, but no one had turned all of it into a product that ordinary people actually adopt, day to day.

The reason for this void lies in a misunderstanding about the recipient. Researchers had their papers, written for their peers. Karpathy had offered a magnificent repository — for engineers, able to handle API keys, a command line and a little configuration. But the person truly concerned with the reliability of an answer is neither researcher nor engineer. It's someone facing a decision that binds them: a medical test result to understand, a contract clause to decipher, a financial trade-off, a life choice. That person had no simple app, no clear verdict, and not the slightest guarantee that their most intimate question wouldn't be used elsewhere.

The obstacles to clear aren't merely cosmetic. Making six AIs deliberate is expensive: you multiply the tokens, hence the bill. It takes time: a panel's latency exceeds that of a single model. It creates dependence on suppliers: a product named OpenClaw disappeared in a single day in April 2026, when Anthropic cut off the access it relied on entirely. And there is a more insidious trap still: the false signal of authority. Hearing that “six AIs agree” can be falsely reassuring if, in reality, only two answered, or if they all share the same erroneous source. A poorly presented consensus lies by omission.

That was the landscape before our eyes. On one side, thirty years of science saying the same thing: don't trust a single opinion. On the other, a tech world that had just, in a few weeks, nodded in unison. And in the middle, a glaring gap: no real object — reliable, mobile, privacy-respecting — to carry this idea into the hand of the person who needs it, the moment they need it.

It's exactly into this void that Satcove was conceived. Not to reinvent a wheel research had already cut, but to solve the problems no one had wanted to take head-on: cost, latency, supplier dependence, the honesty of the signal, and above all the distance between a laboratory truth and a life decision. Filling that void was anything but obvious: it meant agreeing to industrialize what others left as a prototype, and to carry alone constraints research could ignore. Science had been right for thirty years; the world had just acknowledged it; someone was missing to make it real, robust and accessible. That's our reason for being.

Satcove is the finished form of this thirty-year story, brought down to one simple gesture. You ask a question. Six of the world's best artificial intelligences answer in parallel, then confront one another: they read the others' answers, defend or revise their position, and surface their disagreements instead of hiding them. In the end, you get a synthesized verdict — clear, readable, usable — together with what really matters: the measure of their agreement, and the map of their divergences.

We made three choices no one else combines, and it's that combination, more than any single element, that defines Satcove. The first choice is the native iOS app. Not a website checked in haste from a browser tab, but a real application designed for the phone you always have on you — because an important decision rarely shows up when you're sitting at a computer, and a second opinion is only worth something if it's available in the moment.

The second choice is Europe. Hosting and data stay in Europe, under the most demanding protection regime there is. Nothing you entrust to us leaks, nothing is used to train anyone's models, nothing is resold. For the very questions that justify cross-checking several opinions (health, money, law, private life), this confidentiality isn't a marketing option: it's the condition for daring to ask the real question, the one you'd never type into a service that feeds on your data. Our Privacy Shield anonymizes personal information before any AI even sees it.

The third choice is honesty, and it may be the most important. Satcove will never serve you a fake “everyone agrees.” The app shows you how many AIs actually answered, and where, exactly, they diverge. If agreement is strong, you know it and can move forward. If it's weak, you know that too: this disagreement isn't a product flaw, it's information — the signal to dig deeper, or to speak with a professional. We prefer an uncomfortable truth to a manufactured certainty. Satcove helps you decide; it doesn't decide for you, and replaces neither a doctor, nor a lawyer, nor an adviser.

This demand for honesty runs through every detail. When a model is unavailable, we say so rather than fill the silence; when the panel shrinks, the score reflects it instead of mimicking a façade of unanimity. We saw the trap this field sets — the false signal of authority, that “six AIs agree” which reassures wrongly when only two have spoken — and we chose to defuse it rather than profit from it. A number is only worth something if it tells the truth about what it measures.
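To make the principle concrete, here is one way such a score could be computed. It is a sketch under assumptions, not a description of Satcove's actual scoring: `consensus_report` and its field names are hypothetical, and each model's answer is assumed to have already been reduced to a short comparable verdict.

```python
from collections import Counter
from typing import Dict, Optional

def consensus_report(verdicts: Dict[str, Optional[str]]) -> dict:
    """verdicts maps a model name to its verdict, or None if it did not answer."""
    panel_size = len(verdicts)
    answered = {m: v for m, v in verdicts.items() if v is not None}
    if not answered:
        return {"panel_size": panel_size, "responded": 0, "majority": None,
                "agree_among_respondents": 0.0, "agree_over_panel": 0.0,
                "dissenters": []}
    majority, votes = Counter(answered.values()).most_common(1)[0]
    return {
        "panel_size": panel_size,
        "responded": len(answered),          # how many models actually spoke
        "majority": majority,
        # Agreement among those who answered: this is the flattering number.
        "agree_among_respondents": votes / len(answered),
        # Agreement measured against the full panel: silence counts against it.
        "agree_over_panel": votes / panel_size,
        "dissenters": sorted(m for m, v in answered.items() if v != majority),
    }

# Six models on the panel, only two answered, and both said "yes".
report = consensus_report(
    {"m1": "yes", "m2": "yes", "m3": None, "m4": None, "m5": None, "m6": None}
)
```

In this example the two respondents agree fully, yet the support measured over the whole panel is only one third. Showing both figures, along with the count of respondents, is what keeps a shrunken panel from passing as unanimity.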

We don't claim to have invented multi-AI consensus. That would be false, as this story has made clear: the idea belongs to Jacobs and Hinton, to Shazeer, to the MIT and DeepMind team, to Hegazy, to Karpathy, to hundreds of researchers. What we claim is more modest and more useful: to have been the ones who finally made it real for you. To have taken a scientific truth more than thirty years old, freshly validated by the whole world, and turned it into something you can open, understand and use in thirty seconds, with confidence.

It's the legitimate culmination of a long movement, not a solitary rupture. Thirty years of science proved us right; it only remained for someone to keep the promise. If a question weighs on you today, you now know where the idea comes from — and where to find it.