Claude Code Reviews Your Pull Requests

Topic of the Week

Claude Code Review — AI Reviews Your PRs

Anthropic released a new feature in Claude Code: Code Review. When you open a pull request, Claude dispatches a team of agents that scan the changes for bugs, security holes, and regressions. Results show up as inline comments on GitHub, tagged by criticality.

Why does this matter? Because Anthropic itself says code output per engineer has grown 200% in a year — and review has become the bottleneck. More AI-generated code = more code to review = a need for AI to review it. This doesn’t replace human review, but it filters out the obvious problems before a person even looks.

Currently in research preview for Team and Enterprise plans.

Fresh Papers

SPD-RAG: a separate agent per document

Answering complex questions often means stitching together facts spread across many documents. Standard RAG pipelines lose context on large corpora, and even models with large context windows struggle to reason reliably over that much text. SPD-RAG proposes a different approach: each document gets its own agent that “knows” it inside out, and then the agents collaborate to assemble the answer.

Imagine a team where each person is an expert on one document — and together they answer the boss’s questions. Instead of one person trying to wrap their head around 500 pages at once.
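To make the "one expert per document" idea concrete, here is a toy sketch in Python. It is not the SPD-RAG implementation: the names `DocumentAgent` and `coordinate` are made up for illustration, and plain word overlap stands in for the LLM call each agent would make in a real system.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real language understanding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class DocumentAgent:
    """Hypothetical agent that only ever sees its own document."""

    name: str
    text: str

    def answer(self, question: str) -> tuple[str, str, int] | None:
        """Return (document name, best sentence, score), or None if nothing matches."""
        question_words = tokenize(question)
        best_sentence, best_score = "", 0
        # Split the document into sentences and score each by word overlap.
        for sentence in re.split(r"(?<=[.!?])\s+", self.text):
            score = len(question_words & tokenize(sentence))
            if score > best_score:
                best_sentence, best_score = sentence, score
        if best_score == 0:
            return None
        return self.name, best_sentence, best_score


def coordinate(
    agents: list[DocumentAgent], question: str, top_k: int = 2
) -> list[tuple[str, str]]:
    """Ask every agent, then keep the top_k strongest partial findings."""
    findings = [f for agent in agents if (f := agent.answer(question)) is not None]
    findings.sort(key=lambda finding: finding[2], reverse=True)
    return [(name, sentence) for name, sentence, _ in findings[:top_k]]


agents = [
    DocumentAgent("contract", "The contract starts in March. Payment is due in 30 days."),
    DocumentAgent("policy", "Refunds are handled by finance. Late payment triggers a fee."),
    DocumentAgent("menu", "Soup of the day. Fresh bread."),
]
evidence = coordinate(agents, "When is payment due?")
```

In this sketch the coordinator only ranks the evidence each agent hands back; a real system would presumably pass those findings to a final synthesis step that writes the answer.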

A trend worth noticing: two weeks ago we wrote about CodeCompass (code navigation), last week about LSP in Claude Code, and now SPD-RAG attacks the same problem, just in the world of documents instead of code. “How to find information faster and more reliably” might become one of the hottest topics in AI in the near future.

arXiv

Red Teaming LLMs in Banks — How to Test AI in Finance

Banks are deploying LLMs more and more, but standard AI safety tests don’t catch the risks specific to the financial sector. This paper proposes risk-adjusted harm scoring — an automated red teaming framework tuned to banking regulations. Instead of asking “can the model be broken?”, it asks “what financial and regulatory damage could a break cause?”.

This methodology is especially interesting for banking and financial services deployments. Testing AI for financial and compliance risk is going to be increasingly required, and it’s good to know concrete methodologies are emerging.
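The shift from "can it be broken?" to "what would a break cost?" can be illustrated with a few lines of Python. This is only a sketch under our own assumptions: the severity weights, the harm categories, and the formula (attack success rate times severity) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical severity weights per harm category (illustrative values only).
SEVERITY = {
    "reputational": 1.0,
    "financial_loss": 3.0,
    "regulatory_breach": 5.0,
}


@dataclass
class Finding:
    """One red-team result: which attack, what kind of harm, how often it worked."""

    prompt_id: str
    category: str
    success_rate: float  # fraction of attempts that got past the model, 0.0 to 1.0


def risk_adjusted_score(finding: Finding) -> float:
    """Weight how often an attack succeeds by how costly the resulting harm is."""
    return finding.success_rate * SEVERITY[finding.category]


def rank(findings: list[Finding]) -> list[Finding]:
    """Order findings so the most damaging ones come first."""
    return sorted(findings, key=risk_adjusted_score, reverse=True)


findings = [
    Finding("rude-reply", "reputational", 0.9),
    Finding("bypass-kyc-advice", "regulatory_breach", 0.2),
]
ranked = rank(findings)
```

With these made-up weights, the rare regulatory break outranks the frequent but mild one, which is the kind of reordering a purely success-rate-based test would miss.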

arXiv

New Models

Qwen again. Last week we wrote that Alibaba had released the Qwen 3.5 series. Since then, hard data has come in — and it’s interesting:

Fine-tuned Qwen3 SLMs beat frontier LLMs on narrow tasks
someone did a systematic comparison of small models (0.6–8B parameters) against the largest APIs: GPT-5, Gemini 2.5, Claude Opus 4.6. The conclusion: a small, well-tuned model can beat a giant API on a specific task. This changes the cost calculus — instead of paying for an expensive Opus, you stand up a small Qwen on your own server. (412 upvotes) Reddit

Qwen3.5-35B almost matches Claude Opus on SWE-bench
37.8% vs Opus’s 40% on a coding benchmark. A model you can run on your own hardware almost matches the most expensive API on the market. (423 upvotes) Reddit
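The cost calculus behind "small model on your own server versus a frontier API" comes down to a break-even point. Here is a minimal Python sketch; the function names are ours, and the prices in the usage lines are placeholders, not real quotes from any provider.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Cost of sending a given monthly token volume to a pay-per-token API."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(server_cost_per_month: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which a fixed-cost server matches the API bill."""
    return server_cost_per_month / price_per_million_tokens * 1_000_000


# Placeholder numbers, purely for illustration.
threshold = break_even_tokens(server_cost_per_month=500.0, price_per_million_tokens=10.0)
api_bill_at_threshold = monthly_api_cost(threshold, price_per_million_tokens=10.0)
```

Above the threshold the self-hosted model is cheaper per month, assuming it is good enough for the task; the sketch ignores engineering time, fine-tuning cost, and hardware utilization.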

Claude Code & Coding AI

Five releases this week (v2.1.70 → v2.1.74). The most interesting changes:

/loop (v2.1.71) — new command for running prompts in a loop (e.g. /loop 5m check the deploy). Automatic monitoring and background tasks without leaving the terminal.

/context (v2.1.74) — suggests how to optimize your session: detects memory bloat, heavy tools, and other things slowing you down.

Memory leak fix (v2.1.74) — streaming wasn’t releasing memory, so long sessions kept getting slower. Fixed.

Tools of the Week

Context Hub
a tool from Andrew Ng (DeepLearning.AI) that solves a specific problem: coding agents don’t know about APIs and libraries that came out after their training cutoff. Context Hub is a crowdsourced documentation database that you plug into your coding agent — and suddenly it knows how to use the latest version of a framework instead of hallucinating outdated syntax. Newsletter

SurfSense — open-source alternative to NotebookLM
connects any AI model to your company’s internal knowledge sources (documents, wikis, databases). The team can collaboratively chat with the data, comment, and work together in real time. For those who need more than NotebookLM offers or want something self-hosted. Reddit

Apple M5 Max 128GB — local model benchmarks
new chip, 128GB unified memory, and r/LocalLLaMA immediately started testing. The post has 1,886 upvotes and 300 comments, with results being posted live in the comments. Details are in the thread. Reddit

AI at Tenvalleys

Uncoursed.ai

Some of our engineers are building Uncoursed.ai — a platform that turns any material (PDFs, presentations, internal documents) into full interactive courses with flashcards, quizzes, an AI tutor, and gamification. Drop in a 300-page textbook — get a finished course in minutes.

The idea came from a frustration we all know: you get 200 pages to read, you open the PDF, you read three pages and… you fall asleep. Or you dump it into ChatGPT, get a summary, and come away feeling you “get the topic” while half the content has been lost somewhere. The team wanted to create something that walks you through the entire material step by step, skipping nothing, but in a way that actually pulls you in.

And here’s the key: Uncoursed doesn’t summarize, doesn’t shorten, doesn’t “highlight the most important parts.” It guarantees 100% material coverage — you see exactly what you’ve worked through and what’s still ahead. On top of that, the platform combines scientifically validated learning techniques (active recall, spaced repetition) with mechanics borrowed from Duolingo — short sessions, quizzes, flashcards, AI tutor, gamification. Something like YouTube Shorts, except instead of doomscrolling — you’re actually learning.

There’s already a working MVP, with conversations underway with several large enterprises across banking, telco, retail, and publishing, as well as pilot rollouts inside our own education partnerships.

If you want to see a demo or have material you’d like to test on it — reach out at contact@tenvalleys.com.

For Dessert

Yann LeCun, Turing Award laureate, is 65. He just stepped down as Chief AI Scientist at Meta, and instead of retirement — he went out to raise a billion dollars for a startup (AMI Labs, $1.03B seed — likely the largest seed in European history). LeCun has been arguing for years that LLMs are a dead end and we need a fundamentally different architecture. Now he has the money to prove it. Most of the industry says he’s wrong. But what if he isn’t?

See you next week.

Prepared at Tenvalleys — a delivery-first AI engineering partner — by Nikola Powałka. Feedback? Email us at contact@tenvalleys.com or reach out on LinkedIn.

