This was the week AI agents got new eyes. LSP in Claude Code turned code navigation from minutes into milliseconds, Google turned documents into films, and 88-year-old Knuth showed how AI is actually supposed to be used. One thing at a time.

Topic of the Week

LSP in Claude Code — From Minutes to Milliseconds

Remember CodeCompass from last week? The paper that described the Navigation Paradox — AI agents struggle with code not because their context is too small, but because navigating code and searching text are two different problems.

A week later, someone on Reddit found a practical solution: enabling LSP in Claude Code. LSP (Language Server Protocol) is the same system VS Code uses to “understand” code — go to definition, find references, autocomplete. Someone wired it up to Claude Code and the results are brutal: code navigation dropped from 30–60 seconds to 50 milliseconds. Exact answers instead of guessing.

808 upvotes on r/ClaudeCode. People are calling it the biggest quality-of-life improvement in a long time.

Why does this matter? Because it fits a bigger trend: AI agents are getting better and better “eyes.” LSP gives them an understanding of code structure. MCP (Model Context Protocol) gives them access to external tools. GitNexus (more on that in tools) builds a dependency graph of an entire repo. Each of these solutions attacks the same problem: AI is good at generating code, but bad at understanding existing code. And that’s exactly what’s changing.
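To make "go to definition" concrete, here is a minimal sketch of what such a query looks like on the wire under the LSP specification: a JSON-RPC request framed with a Content-Length header. How Claude Code talks to the language server internally is not described in the Reddit post, so treat this only as an illustration of the protocol. The file URI and cursor position are hypothetical.

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    """Encode a JSON-RPC payload with the Content-Length header LSP requires."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def definition_request(request_id: int, file_uri: str, line: int, character: int) -> dict:
    """Build a textDocument/definition request (line and character are zero-based)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": file_uri},
            "position": {"line": line, "character": character},
        },
    }


if __name__ == "__main__":
    # Hypothetical file and cursor position, for illustration only.
    request = definition_request(1, "file:///project/src/app.py", 41, 8)
    print(frame_lsp_message(request).decode("utf-8"))
```

The server typically answers with a location (a file URI plus a range), which is why the agent gets an exact answer instead of grepping for a name and guessing which match is the right one.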

Reddit — LSP in Claude Code (808 upvotes)

Fresh Papers

A hierarchical agent system for payments
LLMs can automate workflows, but payments? Different league — it requires security, verification, and error tolerance. A new paper proposes a hierarchical multi-agent system where each agent has its role (verification, authorization, execution), with a “supervisor” overseeing everything. Practical AI in fintech — not a chatbot answering questions, but an agent that actually processes transactions.

arXiv
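A rough sketch of the supervisor-plus-roles idea described above, with plain Python functions standing in for the agents. This is not the paper's architecture: the `Payment` type, the spending limit, and the function names are all hypothetical, and in a real system each role would be an LLM agent with its own tools.

```python
from dataclasses import dataclass, field


@dataclass
class Payment:
    payer: str
    payee: str
    amount: float
    log: list = field(default_factory=list)  # (role, passed) entries


def verify(p: Payment) -> bool:
    """Verification role: basic sanity checks on the request."""
    return bool(p.payer) and bool(p.payee) and p.amount > 0


def authorize(p: Payment, limit: float = 1000.0) -> bool:
    """Authorization role: hypothetical per-payment limit."""
    return p.amount <= limit


def execute(p: Payment) -> bool:
    """Execution role: stand-in for the real transfer."""
    return True


def supervisor(p: Payment) -> str:
    """Run the roles in order and stop at the first one that fails."""
    steps = (("verification", verify), ("authorization", authorize), ("execution", execute))
    for name, step in steps:
        passed = step(p)
        p.log.append((name, passed))
        if not passed:
            return f"rejected at {name}"
    return "completed"


if __name__ == "__main__":
    print(supervisor(Payment("alice", "bob", 50.0)))
    print(supervisor(Payment("alice", "bob", 5000.0)))
```

The point of the hierarchy is that execution is never reached unless every earlier role has signed off, and the log records which role stopped a payment.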

RIVA: LLM agents watching your infrastructure
Infrastructure as Code sounds great in theory, but in practice configuration “drifts” — someone changes something manually, the system updates itself, and suddenly what’s in Terraform doesn’t match what’s in production. RIVA is a framework that uses LLM agents to automatically detect those differences. Practical AI in DevOps — not hype, just a solution to a real problem.

arXiv
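For readers new to the term, "drift" boils down to a diff between what is declared and what is actually running. The sketch below shows only that core comparison on two hypothetical config dictionaries. It is not how RIVA works, and the keys and values are made up for illustration.

```python
def find_drift(declared: dict, actual: dict) -> dict:
    """Return {key: (declared_value, actual_value)} for every mismatch.

    A key missing on one side shows up with None on that side.
    """
    drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want, have = declared.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    # Hypothetical declared (IaC) state versus observed production state.
    declared = {"instance_type": "t3.small", "port": 443, "public": False}
    actual = {"instance_type": "t3.medium", "port": 443, "public": False, "debug": True}
    for key, (want, have) in find_drift(declared, actual).items():
        print(f"{key}: declared={want!r} actual={have!r}")
```

A plain diff like this only works when both sides are already in the same shape. Getting real infrastructure state into a comparable form is the hard part, and that is where an agent-based approach is meant to help.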

New Models

GPT-5.3 Instant
OpenAI shipped an update focused on conversational fluidity. Less “cringe” tone, fewer unnecessary refusals and preachy disclaimers, hallucinations down 26.8%. Not a revolution, but a solid everyday-use upgrade.

Qwen 3.5 Small (9B)
Alibaba released a series of small models (0.8B–9B). The most interesting one: Qwen3.5-9B beats OpenAI’s GPT-OSS-120B on key benchmarks (GPQA Diamond: 81.7 vs 71.5) — and runs on a regular laptop. AI on edge devices is getting serious.

Gemini 3.1 Flash-Lite
Google released a “small but mighty” model. Faster and cheaper than 2.5 Flash, with better scores. New feature: “thinking levels,” which let you dial in how much reasoning the model does, balancing speed vs quality.

Phi-4-reasoning-vision-15B
Microsoft released an open-weight multimodal reasoning model. 15B parameters, sees images, thinks. Microsoft is opening up more models, building out the ecosystem.

Claude Code & Coding AI

Three Claude Code releases this week:

v2.1.69 (yesterday!)
Big release, 103 changes. New /claude-api skill for building apps with the Claude API. Improved /remote-control. Ctrl+U on an empty bash prompt closes bash mode.

v2.1.68
Opus 4.6 default reasoning effort lowered to medium (the sweet spot between speed and accuracy). The keyword “ultrathink” is back for high effort on demand. Old Opus 4 and 4.1 models removed.

v2.1.63
New /simplify and /batch commands. Project configs and auto memory now work across git worktrees. New env var ENABLE_CLAUDEAI_MCP_SERVERS=false to disable MCP servers from claude.ai.

From the community:

– Best Practices repo: a repo with all the tips and workflows in one place. Already 5,000 stars. GitHub
– Free Max x20 for open source: Anthropic is giving 6 months of Claude Max (20x) to open-source maintainers with 5K+ stars or 1M+ monthly NPM downloads.

Interlude

Knuth and Claude — 88 Years Old, 30 Attempts, 1 Proof

Donald Knuth — computer science legend, author of “The Art of Computer Programming” (the bible of algorithms) — used Claude to solve a math problem. He’s 88.

Claude generated 30 different attempts at a solution. Knuth reviewed EVERY one of them, picked the one that worked empirically, and wrote the formal mathematical proof himself.

The internet immediately called it “vibemathing.” But this is the EXACT OPPOSITE of vibemathing. Knuth didn’t blindly trust the AI — he used it as a brainstorming partner, then applied human verification at the highest possible level.

This might be the most beautiful example of how we should be using AI: the machine generates options, the human verifies and proves. Especially when that human is 88 years old and still doing it better than most of us.

Tools of the Week

NotebookLM Cinematic Video Overviews (released March 4!)
Google added a feature to NotebookLM that turns your documents into animated explainer films. Not slides with narration, but full animated scenes with a storyline. Under the hood: Gemini 3 plans the narration, Veo 3 generates the animation, Nano Banana Pro creates the graphic assets. Drop in a PDF, meeting notes, or a product spec → get a mini-film. For now, Ultra subscribers only and English only. Google Blog

GitNexus
An open-source tool that turns any GitHub repo into an interactive knowledge graph plus an AI agent you can talk to about your code. Runs entirely in the browser. Has an MCP server with 7 tools, including search, symbol lookup, blast radius, and git-diff impact mapping. 7.3K stars. Same trend as the LSP topic above: giving AI better tools to understand code. GitHub
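"Blast radius" is the set of things that could break when one piece changes. A minimal sketch of the idea, assuming the dependency graph is already available as a dictionary: invert the edges and walk outward from the changed module. This is not GitNexus's implementation, and the module names are hypothetical.

```python
from collections import deque


def blast_radius(depends_on: dict, changed: str) -> set:
    """Return every module that directly or transitively depends on `changed`."""
    # Invert the edges: for each module, collect the modules that depend on it.
    dependents = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk over the inverted graph.
    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)

    # With circular dependencies the walk can loop back to the start.
    affected.discard(changed)
    return affected


if __name__ == "__main__":
    # Hypothetical repo: each key lists the modules it imports.
    graph = {"api": ["auth", "db"], "auth": ["db"], "cli": ["api"], "db": [], "docs": []}
    print(sorted(blast_radius(graph, "db")))
```

In this toy graph a change to `db` reaches `auth`, `api`, and `cli`, while a change to `docs` reaches nothing, which is the kind of answer an agent needs before it edits a shared module.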

AI at Tenvalleys

CV Builder

This is a new section — every week we’ll share how we use AI in our day-to-day work at Tenvalleys.

We built a Claude Code skill that automates preparing CVs for client proposals. You drop in an old CV, a LinkedIn profile, or even raw notes from a conversation — and out comes a professional CV in the Tenvalleys branded template. HTML rendered to PDF via headless Chrome.

What it does:

– Generates a CV in the Tenvalleys branded template (two-column layout, A4, 3 density presets)
– Writes bullet points using the CAR method (Context-Action-Result): concrete achievements, not “worked on projects”
– Updates the CV from Linear data: projects, technologies, roles
– Job-match: compares the CV to client requirements and produces a fit report
– Searches our CV database for the best people for a specific role
– Built-in quality checklist: the AI checks itself for hallucinations

It saves a lot of time on a process that used to be manual and slow. The same approach scales to other “we have lots of structured text, we need branded output, the AI does the first 80%” workflows.
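The final step mentioned above, HTML rendered to PDF via headless Chrome, can be sketched in a few lines. This is an illustration, not the actual skill: the function names and file paths are hypothetical, and the `google-chrome` binary name is an assumption that varies by platform (for example `chromium` or a full application path on macOS).

```python
import subprocess
from pathlib import Path


def build_pdf_command(html_path: Path, pdf_path: Path, chrome: str = "google-chrome") -> list:
    """Build the headless Chrome command that prints an HTML file to PDF."""
    return [
        chrome,  # assumed binary name; adjust for your platform
        "--headless",
        f"--print-to-pdf={pdf_path}",
        html_path.resolve().as_uri(),  # Chrome expects a URL, so use file://
    ]


def render_pdf(html_path: Path, pdf_path: Path, chrome: str = "google-chrome") -> None:
    """Run Chrome; raises CalledProcessError if rendering fails."""
    subprocess.run(build_pdf_command(html_path, pdf_path, chrome), check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command without launching Chrome.
    print(build_pdf_command(Path("cv.html"), Path("cv.pdf")))
```

Keeping the command builder separate from the function that runs it makes the step easy to check without a browser installed. The layout itself (A4, two columns) would live in the HTML and CSS of the template, not in the command.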

If you’d like to automate something similar inside your own organisation — reach out at contact@tenvalleys.com.

In the Background

Claude #1 in the App Store
Claude overtook ChatGPT as the most popular app in the App Store. A big chunk of that is fallout from the OpenAI/Pentagon controversy — Anthropic refused to remove safety guardrails, users voted with their wallets.

OpenAI VP moves to Anthropic
The VP responsible for Post-Training (RLHF, safety, instructions) left OpenAI for Anthropic. Not a random engineer, but someone who had direct influence over how GPT “thinks.” OpenAI is losing not just users, but key people too.

OpenAI raises $110B
A record private funding round. Meanwhile they’re losing people and users. Lots of money, lots of questions.

Hot Take

Vibe Coding in AR Glasses, While Doing the Dishes

Someone posted on Reddit: “Vibe coding while doing the dishes in augmented reality.” The guy is literally coding in AR glasses while washing dishes.

On one hand — Knuth, in the same week, shows that the best results come from AI + careful human verification. On the other — someone’s coding at the kitchen sink because “the AI’s going to write the code anyway, I just nudge it.” And in the background, research says AI scores 84% on coding benchmarks but 25% on real production code.

Three completely different approaches to AI in one week.

Reddit — Vibe Coding While Doing Dishes in AR

See you next week.

Prepared at Tenvalleys — a delivery-first AI engineering partner — by Nikola Powałka. Feedback? Email us at contact@tenvalleys.com or reach out on LinkedIn.