SubQ: The Startup Claiming to Fix LLM Speed Bottlenecks

A Miami Startup Says It Cracked a Decade-Old AI Problem. Here's What the Evidence Shows.

Large language models have transformed the technology landscape, powering everything from customer service chatbots to advanced coding assistants. But underneath all the excitement lies a stubborn, expensive problem — one that has frustrated AI researchers for nearly a decade. A Miami-based startup called Subquadratic is now claiming it has finally found the solution, and it has brought in outside validators to help make the case.

The company's new model, SubQ, promises to be dramatically faster, cheaper, and more energy-efficient than existing large language models. Those are extraordinary claims in a field full of hyperbole, but early third-party testing suggests there may be something genuinely worth watching here.

What Is the Bottleneck Holding Back Large Language Models?

To understand why Subquadratic's claims matter, it helps to understand the core problem that modern LLMs face: a mathematical issue known as quadratic scaling in the attention mechanism.

When a large language model processes text, it uses a technique called "attention" to figure out which pieces of the text are most relevant to each other. In the dominant transformer architecture that underpins models like GPT-4, Claude, and Gemini, text is split into tokens (words or fragments of words), and every token is compared against every other token. That means if you double the length of a document, the computational cost of attention doesn't double; it quadruples. Triple the length, and the cost grows ninefold.
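The arithmetic is easy to check with a short sketch. It assumes the simplest possible cost model, in which cost is just the number of pairwise comparisons; the function name pairwise_comparisons is illustrative, and real systems carry other costs that this ignores.

```python
def pairwise_comparisons(num_tokens: int) -> int:
    """Count comparisons when every token is compared against every token."""
    return num_tokens * num_tokens


# Assumed baseline of 1,000 tokens, chosen only for illustration.
base = pairwise_comparisons(1_000)
for factor in (1, 2, 3):
    cost = pairwise_comparisons(1_000 * factor)
    print(f"{factor}x length -> {cost // base}x comparisons")
# prints:
# 1x length -> 1x comparisons
# 2x length -> 4x comparisons
# 3x length -> 9x comparisons
```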

This is the "quadratic" problem. As documents get longer, the cost of processing them grows far faster than their length. It is a major reason why running and training state-of-the-art LLMs on long inputs requires enormous data centers, consumes vast amounts of electricity, and remains prohibitively expensive for many use cases. For years, researchers at major AI labs have tried to find a way around it, and for years, no solution has fully delivered.

How Subquadratic Claims to Have Solved It

Subquadratic's answer is a technique called sparse attention. Rather than comparing every word to every other word in a document, sparse attention selectively skips comparisons that are unlikely to be meaningful. The intuition is simple: when reading a long research paper, not every sentence is equally relevant to every other sentence. A smart reader focuses on the parts that matter most, and sparse attention tries to teach an AI model to do the same.
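To make the idea concrete, here is a minimal sketch of one well-known form of sparse attention, a sliding window in which each position is compared only with its near neighbors. This is not Subquadratic's method, which the company has not described in detail; the function name, the window parameter, and the toy input are all assumptions made for illustration.

```python
import math


def attention(queries, keys, values, window=None):
    """Scaled dot-product attention over lists of vectors.

    window=None compares every position with every other (dense).
    window=w compares position i only with positions within w steps of i.
    Returns (outputs, number_of_comparisons).
    """
    n, dim = len(queries), len(queries[0])
    value_dim = len(values[0])
    outputs, comparisons = [], 0
    for i in range(n):
        if window is None:
            neighbors = range(n)
        else:
            neighbors = range(max(0, i - window), min(n, i + window + 1))
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(dim)
            for j in neighbors
        ]
        comparisons += len(scores)
        # Softmax over only the positions that were actually compared.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * values[j][d] for w, j in zip(weights, neighbors)) / total
            for d in range(value_dim)
        ])
    return outputs, comparisons


# Toy input: 256 two-dimensional vectors (hypothetical data).
n = 256
vectors = [[math.sin(i * 0.1), math.cos(i * 0.1)] for i in range(n)]
_, dense_count = attention(vectors, vectors, vectors)
_, sparse_count = attention(vectors, vectors, vectors, window=8)
print(dense_count, sparse_count)
```

With a fixed window, the number of comparisons grows in proportion to document length rather than with its square. The catch is that a fixed window cannot see relationships between distant passages, which is why more selective schemes exist and why quality is the hard part.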

The concept of sparse attention is not new. Researchers have been exploring it for years, and several attempts have been made to build competitive models around it. The problem has always been that in practice, cutting those comparisons tends to hurt model quality: the model can miss connections between distant parts of a document, benchmark scores drop, and the trade-off ends up not being worth it.

Subquadratic claims its implementation is different — that it has found a way to make sparse attention work at a competitive level without sacrificing the model quality that users and businesses actually care about. The company emerged from stealth mode last month with that announcement, though initial details were sparse and skepticism in the AI community was high.

What Do the Independent Test Results Say?

Rather than asking the industry to take its word for it, Subquadratic commissioned an independent evaluation from Appen, a third-party AI data and evaluation firm. The results, which the company has now shared publicly, are striking.

SubQ ran 56 times faster than rival approaches in comparative testing.
The model scored 98% on a key long-document retrieval benchmark, a type of test specifically designed to stress-test how well a model handles lengthy, complex inputs.
Energy consumption and cost metrics were also reported to be significantly lower than competing models.

A 56x speed improvement, if it holds up under broader scrutiny, would be a genuinely transformative result. Long-document processing is one of the most expensive and practically important challenges in the enterprise AI market — the ability to reliably read and reason over hundreds of pages of contracts, research, or financial records is something businesses have been eager to achieve at scale.

Why the AI Community Remains Cautious

Despite the promising numbers, there are real reasons for caution. Critics within the AI research community have pointed out several important caveats.

First, SubQ is not yet widely available. Independent researchers and developers have not been able to run their own evaluations, which means the results rely entirely on Subquadratic's chosen evaluation partner. The broader open testing that typically validates breakthroughs in AI has not yet happened.

Second, and perhaps more significantly, the SubQ model was reportedly built on borrowed weights from an existing Chinese open-source model. That raises questions about the originality of the underlying architecture and whether the efficiency gains are truly coming from Subquadratic's sparse attention innovation or from other inherited characteristics of the base model.

These are not trivial concerns. Extraordinary claims require extraordinary evidence, and while the Appen results are encouraging, the AI field has seen enough premature announcements to warrant patience before declaring a decade-old problem solved.

Why This Still Matters for the Future of AI

Even amid the skepticism, Subquadratic's work deserves attention. The quadratic scaling problem is real, its costs are enormous, and the incentive to solve it is immense. If SubQ's sparse attention approach does hold up, the implications stretch across the entire AI industry.

Faster, cheaper, and more energy-efficient LLMs would lower the barrier to entry for smaller companies and developers, reduce the environmental footprint of AI infrastructure, and unlock new applications that are currently too expensive to run at scale — particularly those involving very long documents or real-time processing of large datasets.

Whether Subquadratic has truly cracked this problem or whether the results will soften under broader independent testing remains to be seen. But the company has done what many stealth-mode startups fail to do: it has put forward concrete, third-party-validated numbers that give the conversation a factual foundation. The next step is wider availability and open benchmarking — and the AI world will be watching closely when that happens.

The Bottom Line

Subquadratic and its SubQ model represent one of the more intriguing claims to emerge from the AI startup space in recent memory. The core problem they are targeting, quadratic scaling in LLM attention mechanisms, is genuine and consequential. The early evidence from the company-commissioned third-party testing is promising. The skepticism from the broader research community is also warranted and healthy.

For businesses, developers, and AI researchers, the story is worth tracking. If the performance claims survive open scrutiny, SubQ could mark a meaningful turning point in how large language models are built and deployed. If they don't, it will serve as another reminder of how difficult it truly is to move the needle on fundamental AI infrastructure challenges.

Either way, the conversation Subquadratic has started is one the industry needed to have.