Poetiq Meta-System Achieves State-of-the-Art on LiveCodeBench Pro via Automated Inference Harnesses

Poetiq’s Meta-System Automatically Builds a Model-Agnostic Harness That Improved Every LLM Tested on LiveCodeBench Pro Without Fine-Tuning

Poetiq has released results showing its Meta-System reached a new state-of-the-art on the LiveCodeBench Pro competitive coding benchmark. The system automatically builds and optimizes its own inference harness, enabling Gemini 3.1 Pro to jump from 78.6% to 90.9% accuracy.

Why This Matters

Most LLM performance gains currently rely on expensive fine-tuning or proprietary architectural changes that are inaccessible to external developers. Poetiq's results suggest that an orchestration layer built through recursive self-improvement can deliver large gains instead, separating task-specific performance from the underlying model's weights. The results also carry weight given concerns about benchmark contamination: LiveCodeBench Pro checks solutions against runtime and memory constraints and withholds ground-truth code, so it rewards procedural problem solving rather than pattern matching against static datasets.

Key Insights

Through recursive self-improvement, the Meta-System built its harness from scratch in 2026 using only API access to Gemini 3.1 Pro.
The harness is model-agnostic, meaning optimization performed on one model successfully improved every other model tested, including GPT 5.5 High and Nemotron 3 Super 120B.
Gemini 3.0 Flash with the harness reached 82.3%, outperforming larger, more expensive models like Claude Opus 4.7 and GPT 5.2 High.
Kimi K2.6 showed the largest individual gain, rising 29.9 percentage points from a 50.0% baseline to 79.9% when wrapped in the Meta-System harness.
The LiveCodeBench Pro benchmark (25Q2) validates solutions against memory and runtime constraints in C++, resisting overfitting by withholding ground-truth code.
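
To put the reported scores side by side, the short sketch below computes the absolute gain in percentage points and the share of previously failed problems that the harness recovered. Only the four accuracy figures come from the results above; the function names and the "error reduction" framing are illustrative assumptions, not metrics reported by Poetiq.

```python
# Accuracy figures (percent) as reported above: (baseline, with harness).
REPORTED = {
    "Gemini 3.1 Pro": (78.6, 90.9),
    "Kimi K2.6": (50.0, 79.9),
}


def point_gain(baseline: float, with_harness: float) -> float:
    """Absolute improvement in percentage points."""
    return round(with_harness - baseline, 1)


def error_reduction(baseline: float, with_harness: float) -> float:
    """Percent of previously failed problems that are now solved.

    Hypothetical framing for illustration; not a metric from the source.
    """
    failed_before = 100.0 - baseline
    return round((with_harness - baseline) / failed_before * 100.0, 1)


for model, (base, harnessed) in REPORTED.items():
    print(
        f"{model}: +{point_gain(base, harnessed)} points, "
        f"{error_reduction(base, harnessed)}% of prior failures recovered"
    )
```

Seen this way, the two models gain very different numbers of points (12.3 versus 29.9) but recover a similar fraction of the problems they previously failed.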

Practical Applications

Cost-efficient Scaling: Pair smaller, cheaper models such as Gemini 3.0 Flash with an optimized harness to surpass flagship models in production. Pitfall: relying on raw model size to handle complex logic, which leads to ballooning compute costs.
Cross-Model Deployment: Use a single task-specific inference harness to maintain performance across proprietary and open-weights models without re-tuning. Pitfall: hard-coding prompt structures for specific APIs, which limits portability and resilience to model updates.
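
The source does not describe the internals of Poetiq's harness, so the following is only a minimal sketch of what "model-agnostic" can mean in practice: the harness depends on a plain prompt-in, text-out callable rather than on any vendor API. Every name here (`ModelFn`, `CheckFn`, `run_harness`, the stub functions) is hypothetical, and the retry-with-feedback loop is an assumed strategy chosen for illustration, not Poetiq's method.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: any model is just "prompt in, completion out".
ModelFn = Callable[[str], str]
# Hypothetical checker: returns None if the candidate passes, else feedback text.
CheckFn = Callable[[str], Optional[str]]


@dataclass
class HarnessResult:
    solution: Optional[str]  # accepted candidate, or None if all attempts failed
    attempts: int            # number of model calls made


def run_harness(
    model: ModelFn, check: CheckFn, task: str, max_attempts: int = 3
) -> HarnessResult:
    """Query the model, check the candidate, and retry with feedback."""
    prompt = task
    for attempt in range(1, max_attempts + 1):
        candidate = model(prompt)
        feedback = check(candidate)
        if feedback is None:
            return HarnessResult(candidate, attempt)
        # Feed the failed attempt and the checker's message back to the model.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{candidate}\n\nFeedback:\n{feedback}"
        )
    return HarnessResult(None, max_attempts)


# Stub model and checker standing in for a real API client and a real judge.
def stub_model(prompt: str) -> str:
    return "return a + b" if "Feedback" in prompt else "return a - b"


def stub_check(candidate: str) -> Optional[str]:
    return None if "a + b" in candidate else "wrong answer on sample 1"


result = run_harness(stub_model, stub_check, "Add two integers.")
print(result)
```

Because `run_harness` only sees a callable, swapping one model for another means passing a different function. This is the portability that the hard-coded-prompt pitfall above gives up.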

References:

https://www.marktechpost.com/2026/05/14/poetiqs-meta-system-automatically-builds-a-model-agnostic-harness-that-improved-every-llm-tested-on-livecodebench-pro-without-fine-tuning/
