Qwen 3.5: A Local AI Model Rivals the Cloud

Welcome to AI Pulse. Every Friday we share what caught our attention in AI that week — the models we’re trying, the papers we’re reading, the tools that change how we build. Written for people who’d rather skip the hype.

This first edition: Qwen 3.5-35B running locally takes on the cloud, an ASIC chip hits 16,000 tokens per second, and 8 of 12 AI models running food trucks went bankrupt.

Topic of the Week

Alibaba dropped Qwen 3.5-35B-A3B — a Mixture of Experts model with 35B parameters but only 3B active at inference. Reddit’s verdict: a gamechanger for agentic coding.
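To make "35B parameters but only 3B active" concrete, here is a minimal sketch of top-k expert routing, the general idea behind Mixture of Experts layers. It is a toy under stated assumptions, not Qwen's actual architecture: the expert count, the top-k value, the scalar "experts", and the random gate scores are all hypothetical. In a real model the gate scores come from a learned router applied to each token.

```python
import math
import random


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the mixing weights sum to 1.
    peak = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_scores, k):
    """Run only the k routed experts and mix their outputs."""
    routed = top_k_route(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]


# Hypothetical toy setup: 8 experts, 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=s + 1: x * scale for s in range(NUM_EXPERTS)]

# Stand-in for a learned router: random scores with a fixed seed.
rng = random.Random(0)
gate_scores = [rng.uniform(-1, 1) for _ in range(NUM_EXPERTS)]

output, used = moe_forward(1.0, experts, gate_scores, TOP_K)
print(f"experts used: {used} of {NUM_EXPERTS}")

# Ratio quoted in the article: about 3B active out of 35B total parameters.
active_fraction = 3 / 35
print(f"active share of parameters: {active_fraction:.1%}")
```

The unused experts are never evaluated, which is why per-token compute tracks the active parameter count rather than the total. All weights still have to be stored somewhere, so memory needs follow the full 35B.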

The early reports are impressive. The model runs locally on consumer hardware, and users report Opencode results comparable to far more expensive cloud models. The real value: no API dependency and no code leaving your machine, which matters a lot for projects with sensitive code (banking, compliance).

But the full picture is more nuanced. A separate benchmark across 70 real repositories shows Qwen 3.5 falls apart on hard tasks — complex refactors, multi-file changes, deep codebase navigation. It nails the easy stuff, but it won’t replace frontier models for heavy lifting.

Bottom line: a great option for local pair programming and day-to-day work, but for agents operating on large codebases you still want Claude or GPT.

Fresh Tools

ASIC inference — 16,000 tokens per second
Startup Taalas built a chip dedicated to running AI models. Llama 3.1 8B runs on it at 16,000 tok/s; for reference, a typical Claude response streams at 50–100 tok/s. They’ve opened a free API as a proof of concept. If results like this hold up, the future of inference may be dedicated hardware, not GPUs.
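A quick back-of-the-envelope calculation shows what those rates mean in wall-clock time. The 1,000-token response length is an assumption chosen for illustration, and the comparison pits a small 8B model against a much larger hosted one, so treat the ratio as a rough order of magnitude.

```python
def seconds_to_generate(num_tokens, tokens_per_second):
    """Time to stream a response at a constant generation rate."""
    return num_tokens / tokens_per_second


RESPONSE_TOKENS = 1_000  # hypothetical response length, not from the article

# Rates quoted in the article.
rates = {
    "ASIC at 16,000 tok/s": 16_000,
    "cloud at 100 tok/s": 100,
    "cloud at 50 tok/s": 50,
}

for label, rate in rates.items():
    print(f"{label}: {seconds_to_generate(RESPONSE_TOKENS, rate):.2f} s")

speedup_low = 16_000 / 100   # 160x against the faster cloud figure
speedup_high = 16_000 / 50   # 320x against the slower cloud figure
print(f"speedup: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

At these rates a 1,000-token answer takes about 0.06 seconds on the chip versus 10 to 20 seconds at the quoted cloud speeds.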

Claude Code Security Reviews
Anthropic added a security review mode to Claude Code. The agent scans your codebase for vulnerabilities, identifies attack vectors, and proposes fixes. For teams working on banking and compliance code — useful out of the box.

New Models

Claude Sonnet 4.6
Better coding, more consistent instruction following. On certain office-style tasks it actually outperforms the more expensive Opus.

Google Gemini 3.1 Pro
Roughly doubled its reasoning scores compared with the previous version: on ARC-AGI-2 it hit 77.1%, about twice what 3 Pro scored. The race is heating up, and more competition means better tools for all of us.

Mercury 2
A diffusion language model from Inception Labs. Instead of generating one token at a time (like GPT, Claude, Qwen), Mercury generates many tokens in parallel. Result: ultra-fast inference without specialized hardware. An interesting architectural direction.
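The speed argument comes down to how many sequential model calls a response needs. The toy below only counts calls. It is not Mercury's algorithm: the "model" simply copies from a known target, the step count is a made-up parameter, and a real diffusion model would pick positions by confidence and could revise earlier guesses.

```python
import math

MASK = "_"


def autoregressive_decode(target):
    """One model call per token, strictly left to right."""
    out, calls = [], 0
    for token in target:
        out.append(token)  # stand-in for "predict the next token"
        calls += 1
    return out, calls


def parallel_decode(target, num_steps):
    """Start fully masked; each call fills in a batch of positions at once."""
    out = [MASK] * len(target)
    per_step = math.ceil(len(target) / num_steps)
    calls = 0
    while MASK in out:
        masked = [i for i, tok in enumerate(out) if tok == MASK]
        for i in masked[:per_step]:  # stand-in for "denoise these positions"
            out[i] = target[i]
        calls += 1
    return out, calls


target = "the quick brown fox jumps over the lazy dog".split()

ar_out, ar_calls = autoregressive_decode(target)
par_out, par_calls = parallel_decode(target, num_steps=3)

print(f"autoregressive: {ar_calls} sequential calls")
print(f"parallel:       {par_calls} sequential calls")
```

For this 9-token sentence the sequential loop needs 9 calls and the parallel one needs 3. Each parallel call does more work, so the win depends on hardware that can process many positions at once.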

Liquid AI
A reasoning model from Liquid AI that fits in under 900 MB of RAM. It mixes attention and convolutional layers instead of using the standard transformer stack. Targeted at edge deployment: mobile, IoT, embedded. Small, fast, efficient.

Interlude

Someone on Reddit gave 12 AI models $2,000 and a food truck. They had to run the business for 30 days — location, menu, prices, staff, inventory.

Opus made $49K. GPT-5.2 — $28K. Eight models went bankrupt. And the best stat of all: every single model that took out a loan went bankrupt (8 out of 8).

Before you ask AI for a business strategy — remember the food truck.

Notable Papers

CodeCompass — why AI gets lost in your code
Anyone who uses Claude Code knows this problem — the agent is looking for a file that’s right under its nose, and it can’t see it. Researchers named it the Navigation Paradox: agents fail not because their context is too small, but because navigating code and searching text are two different problems.

The fix? Instead of keyword search, CodeCompass gives the agent access to a dependency graph — the agent “sees” the project structure, not just text. Result: 99.4% task completion vs 76.2% without it.
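The difference between searching text and navigating structure can be sketched with a toy repository. This is not CodeCompass's implementation: the file names, contents, and the hand-written import graph below are all hypothetical, and a real tool would build the graph by parsing the code.

```python
from collections import deque

# Hypothetical mini-repo: file path -> source text.
files = {
    "api/routes.py": "from services.billing import charge\n",
    "services/billing.py": "from core.ledger import post_entry\n",
    "core/ledger.py": "def post_entry(amount): ...\n",
    "docs/notes.md": "TODO: rounding bug when invoices are split\n",
}

# Hypothetical dependency graph: file path -> files it imports.
deps = {
    "api/routes.py": ["services/billing.py"],
    "services/billing.py": ["core/ledger.py"],
    "core/ledger.py": [],
    "docs/notes.md": [],
}


def keyword_search(query):
    """Text search: return files whose contents contain the query string."""
    return sorted(path for path, text in files.items() if query in text)


def reachable(start):
    """Graph navigation: breadth-first walk over imports from a start file."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dep in deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


# Searching for the symptom finds only a note, not the code involved.
print(keyword_search("rounding"))

# Walking the graph from the entry point surfaces the files actually in play.
print(reachable("api/routes.py"))
```

The keyword search returns only docs/notes.md, while the graph walk reaches the route handler, the billing service, and the ledger module, even though none of them mentions the word "rounding".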

“Vibe Coding” and Epistemic Debt
A paper on the growing problem of vibe coding — the code works, but the author has no idea why. Researchers call this epistemic debt and propose a concrete fix: metacognitive scripts — structured prompts woven into the AI interaction that, after every generated block, force the developer to explain what’s happening, identify edge cases, and predict behavior under different inputs. In tests, the scripts noticeably improved code understanding without slowing work down. An interesting direction — AI as tutor, not as ghostwriter.

In the Background

Anthropic vs the distillers
Anthropic published a report on how DeepSeek, Moonshot AI, and MiniMax set up 24,000 fake accounts and ran 16 million conversations with Claude to copy agentic reasoning and tool use. Reddit erupted into a debate about whether this is theft or whether the complaint is hypocritical, given that Western companies also train on other people’s data.

Hegseth gives Anthropic an ultimatum
The US Secretary of Defense demanded Anthropic remove safety guardrails from Claude for Pentagon use. Anthropic refused. CEO Dario Amodei is meeting Hegseth this week.

See you next week.

Prepared at Tenvalleys — a delivery-first AI engineering partner — by Nikola Powałka. Feedback? Email us at contact@tenvalleys.com or reach out on LinkedIn.
