Claude Can Now Use Your Computer

This was Anthropic’s week. Claude learned to use your computer, got a new auto mode, started dreaming (yes, really), and showed up on Discord. The Computer Use announcement alone drew 74 million views on Twitter. But it wasn’t just Anthropic: Xiaomi released a top open-source coding model at a fraction of frontier prices, Google released a differentially private model built for regulated industries like banking, and OpenAI quietly killed Sora. Let’s get into it.

Topic of the Week

Claude Can Now Use Your Computer

Anthropic shipped Computer Use on March 23, and the internet lost it: the announcement post drew 74 million views, 139K likes, and 25K reposts. Those aren’t normal numbers for an AI feature launch.

Here’s what it does: Claude can now open apps on your Mac, navigate your browser, fill in spreadsheets — anything you’d do sitting at your desk. You can send it a task from your phone, go do something else, and come back to finished work on your computer.

Now, technically, “computer use” existed before — Anthropic launched a developer API version back in October 2024. But that was raw infrastructure. You needed Docker containers, VNC servers, and coding skills to make it work. What shipped this week is completely different: no setup required, just enable it and Claude sees your screen. Think of it like self-driving cars — the old version gave engineers access to raw sensor data. This one lets a normal person press “drive me to work.”

The smart part: when Claude has a proper integration (like Google Calendar or Slack), it uses that. But when there’s no connector — say, your company’s internal HR tool or that legacy system nobody built an API for — it falls back to clicking through the app like a human would.

Available now as a research preview for Pro and Max subscribers, macOS only. Anthropic recommends not using it with sensitive data yet.

CNBC | Engadget

Claude Code & Coding AI

But Anthropic didn’t stop there. The release pace has been relentless, and Twitter and Reddit have noticed. Here’s everything else they shipped this week:

Auto Mode (Mar 24) — The middle ground between “approve every single action” and “let Claude do whatever it wants.” A classifier checks each action before it runs — safe ones proceed automatically, risky ones get blocked and Claude finds another way. Enable with claude --enable-auto-mode, cycle to it with Shift+Tab. Available on Team plan now, Enterprise rolling out.

TechCrunch
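The core idea of Auto Mode is a gate in front of every action. This is not Anthropic’s classifier; the sketch only illustrates the control flow described above: each proposed action is checked, safe ones run, and risky ones are held back so the agent can re-plan. The keyword rule in is_safe is a crude, hypothetical stand-in for the real classifier, and Action, is_safe, and run_with_auto_mode are illustrative names, not part of Claude Code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    tool: str      # hypothetical tool name, e.g. "read_file" or "shell"
    argument: str  # e.g. a file path or a shell command

# Hypothetical stand-in for the real classifier: a crude keyword rule.
RISKY_MARKERS = ("rm -rf", "git push --force", "curl ", "DROP TABLE")

def is_safe(action: Action) -> bool:
    """Return True if the action may run without asking the user."""
    if action.tool == "read_file":
        return True  # reads are treated as harmless in this toy rule
    return not any(marker in action.argument for marker in RISKY_MARKERS)

def run_with_auto_mode(actions: list[Action], execute: Callable[[Action], str]):
    """Run safe actions immediately; block risky ones and hand them back for re-planning."""
    results: list[str] = []
    blocked: list[Action] = []
    for action in actions:
        if is_safe(action):
            results.append(execute(action))
        else:
            blocked.append(action)  # never executed; the agent must find another way
    return results, blocked

plan = [
    Action("read_file", "README.md"),
    Action("shell", "rm -rf build/"),
    Action("shell", "pytest"),
]
done, blocked = run_with_auto_mode(plan, execute=lambda a: f"ran {a.argument}")
print(done)     # ['ran README.md', 'ran pytest']
print(blocked)  # [Action(tool='shell', argument='rm -rf build/')]
```

The design point is that blocking is not the same as failing: the blocked list goes back to the agent, which is what lets it "find another way" instead of stopping.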

Auto Dream
This one is wild. Claude Code now has a “REM sleep” cycle for its memory. Every 24 hours (after at least 5 sessions), a background agent wakes up and cleans house: converts relative dates like “yesterday” to actual dates, removes contradicted facts, merges duplicate entries, and prunes the memory index to stay under 200 lines. If Auto Memory is the note-taking, Auto Dream is the filing system that keeps those notes useful over time.

claudefa.st
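This is not Claude Code’s actual implementation. The toy pass below only mirrors the four cleanup steps listed above: absolute dates, removal of contradicted facts, merging of duplicates, and pruning to a 200-line cap. The Note schema, the "newest note wins" rule for contradictions, and recency-based pruning are assumptions for illustration; the 24-hour and five-session triggers are left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_LINES = 200  # index size cap reported for Auto Dream

@dataclass
class Note:
    key: str           # what the note is about (hypothetical schema)
    text: str          # the remembered fact, possibly with relative dates
    recorded_on: date  # the session day the note was written

def absolutize(note: Note) -> str:
    """Replace 'yesterday' and 'today' with ISO dates relative to when the note was written."""
    yesterday = (note.recorded_on - timedelta(days=1)).isoformat()
    text = note.text.replace("yesterday", yesterday)
    return text.replace("today", note.recorded_on.isoformat())

def consolidate(notes: list[Note]) -> list[str]:
    """Toy 'dream' pass: fix dates, drop contradicted notes, merge duplicates, prune."""
    latest: dict[str, Note] = {}
    # Oldest first, so a newer note on the same key replaces an older one.
    # This single rule both drops contradicted facts and collapses exact duplicates.
    for note in sorted(notes, key=lambda n: n.recorded_on):
        latest[note.key] = note
    # Prune: keep the most recently recorded notes, up to the cap.
    kept = sorted(latest.values(), key=lambda n: n.recorded_on, reverse=True)[:MAX_LINES]
    return [f"{n.key}: {absolutize(n)}" for n in kept]

notes = [
    Note("test command", "run pytest", date(2026, 3, 20)),
    Note("test command", "run pytest -x", date(2026, 3, 25)),  # contradicts the note above
    Note("last deploy", "deployed yesterday", date(2026, 3, 25)),
    Note("last deploy", "deployed yesterday", date(2026, 3, 25)),  # exact duplicate
]
for line in consolidate(notes):
    print(line)
# test command: run pytest -x
# last deploy: deployed 2026-03-24
```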

Claude Code Channels
Claude Code is now on Discord and Telegram. Message it a task from your phone, it executes on your machine. Anthropic is clearly building toward a world where your AI assistant is always reachable, not just when you’re at your terminal.

MCP Tools on Mobile (Mar 26) — Figma, Canva, Amplitude, Slack — all the integrations that launched on desktop in January now work on the Claude mobile app. Explore designs, create slides, check dashboards, all from your phone.

Projects in Cowork (Mar 20) — Keep your tasks and context in one place, focused on one area. Files and instructions stay local on your computer.

Version releases (v2.1.81 → v2.1.84)
Three releases this week. Highlights: --bare flag for scripted calls (v2.1.81), managed-settings.d/ for team policy fragments (v2.1.83), and PowerShell tool for Windows (v2.1.84).

Fresh Papers

Governed Memory: A Production Architecture for Multi-Agent Workflows

Your AI agents are all working on the same customer, but none of them remember what the others learned. This paper proposes a fix.

The problem: enterprise AI deploys dozens of agents across workflows — sales, support, ops — each acting on the same entities with no shared memory and no governance. RAG solves retrieval but not governance: who stores what, which policies apply, and whether quality is silently degrading.

The solution is a four-layer architecture: ingestion, governance routing, retrieval, and schema lifecycle. Results: 99.6% fact recall, 92% governance routing precision, 50% token reduction through progressive delivery, and zero cross-entity leakage across 500 adversarial queries. Already in production at Personize.ai.

Multi-agent systems with governance are emerging as a clear pattern for enterprise AI delivery — banking, insurance, regulated workflows in particular.

arXiv
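The paper’s four-layer design is more involved, but one simple way to get the "zero cross-entity leakage" property is to key every stored fact to an entity, tag it with a policy, and scope every read to a single entity. The sketch below illustrates only that idea; it is not Personize.ai’s implementation, and Fact, EntityScopedMemory, and the policy tags are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    entity_id: str  # the customer or account this fact is about
    text: str
    policy: str     # hypothetical governance tag, e.g. "general" or "pii"

class EntityScopedMemory:
    """Toy shared store: all agents write here, but every read is scoped to
    exactly one entity and filtered by the policies the caller may see."""

    def __init__(self) -> None:
        self._by_entity = defaultdict(list)  # entity_id -> list of Fact

    def write(self, fact: Fact) -> None:
        self._by_entity[fact.entity_id].append(fact)

    def read(self, entity_id: str, allowed_policies: set[str]) -> list[str]:
        # Only this entity's bucket is consulted, so a query about one
        # customer cannot surface another customer's facts.
        bucket = self._by_entity.get(entity_id, [])
        return [f.text for f in bucket if f.policy in allowed_policies]

memory = EntityScopedMemory()
memory.write(Fact("acme", "Renewal is due in May", "general"))       # e.g. from a sales agent
memory.write(Fact("acme", "Billing contact's home address", "pii"))  # e.g. from an ops agent
memory.write(Fact("globex", "Opened two support tickets", "general"))
print(memory.read("acme", {"general"}))  # ['Renewal is due in May']
```

A support agent asking about acme sees what the sales agent learned, but never globex’s facts, and never pii-tagged facts unless its policy set allows them.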

VaultGemma: The World’s Most Capable Differentially Private LLM

Google trained a 1B-parameter model with differential privacy (DP): calibrated noise is added during training so that the influence of any single training sequence on the final model is strictly limited. The formal guarantee is epsilon ≤ 2.0 and delta ≤ 1.1e-10 at the sequence level, which bounds, rather than eliminates, what an attacker could learn about any one training sequence. In practice, Google’s memorization tests found no detectable memorization of training data.

The catch: it’s not as smart as today’s best models. Google is honest about it — current DP-trained models perform roughly like non-private models from 5 years ago. But the gap is closing. And for regulated industries like banking and healthcare, where “good enough + guaranteed private” beats “amazing but might leak” — this is a big deal.

Google Research
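The standard recipe behind differentially private training of neural networks is DP-SGD: clip each training example’s gradient to a fixed norm, add Gaussian noise scaled to that norm, then take the step. The sketch below shows one such step in plain Python. It illustrates the technique, not Google’s training code; the parameter values are arbitrary, and it omits the privacy accountant that turns noise level, sampling rate, and step count into an (epsilon, delta) figure like VaultGemma’s.

```python
import math
import random

def clip(grad: list[float], max_norm: float) -> list[float]:
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params: list[float], per_example_grads: list[list[float]],
                lr: float, max_norm: float, noise_multiplier: float,
                rng: random.Random) -> list[float]:
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average, step."""
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            summed[i] += g
    # Clipping caps any single example's contribution at max_norm; noise with a
    # standard deviation proportional to that cap masks whether it was present.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(per_example_grads)
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]

rng = random.Random(0)
params = [0.0, 0.0]
per_example_grads = [[3.0, 4.0], [0.0, 0.5]]  # toy gradients for a batch of two examples
params = dp_sgd_step(params, per_example_grads, lr=0.1, max_norm=1.0,
                     noise_multiplier=1.1, rng=rng)
print(params)
```

The same two knobs explain the utility gap mentioned above: more noise and tighter clipping mean stronger privacy but noisier, less informative updates.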

OpenAI Model Spec: How Should AI Behave?

OpenAI published a 100-page framework defining exactly how their models should behave — who they listen to, what they refuse, and how they resolve conflicting instructions. It’s built around a chain of command: safety first, then OpenAI’s policies, then developer rules, then user preferences. The whole thing is public so researchers and policymakers can “read, inspect, and debate” it.

Interesting data point: current compliance rates range from 72% (GPT-4o) to 89% (GPT-5 Thinking). So even with a spec, models don’t follow it perfectly. The gap between “intended behavior” and “actual behavior” is itself a research problem.

Time | OpenAI
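At its core, a chain of command is a precedence rule: when two instructions conflict, the one from the higher level wins. The toy resolver below encodes the ordering described above. Authority and resolve are hypothetical names, and the real spec relies on judgment about what counts as a conflict, which a lookup like this cannot capture.

```python
from enum import IntEnum

class Authority(IntEnum):
    """Higher value wins; order follows the chain of command described above."""
    USER = 1
    DEVELOPER = 2
    OPENAI_POLICY = 3
    SAFETY = 4

def resolve(instructions: list[tuple[Authority, str, str]]) -> dict[str, str]:
    """Keep, for each topic, the directive from the highest authority.
    On a tie, the first directive seen is kept (an arbitrary choice for this sketch)."""
    winners: dict[str, tuple[Authority, str]] = {}
    for authority, topic, directive in instructions:
        if topic not in winners or authority > winners[topic][0]:
            winners[topic] = (authority, directive)
    return {topic: directive for topic, (_, directive) in winners.items()}

conflicting = [
    (Authority.USER, "language", "reply in French"),
    (Authority.DEVELOPER, "language", "reply in English only"),
    (Authority.USER, "tone", "be casual"),
]
print(resolve(conflicting))
# {'language': 'reply in English only', 'tone': 'be casual'}
```

Note that the user’s tone preference survives because nothing above it conflicts; only the contested topic is overridden.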

New Models

Xiaomi MiMo-V2-Flash
A phone company just built the #1 open-source coding model. MiMo-V2-Flash scores 73.4% on SWE-Bench Verified, beating every other open model. It’s a 309B-parameter MoE model (15B active parameters) with a 256K context window. Price: $0.10 per million input tokens, roughly 3% of the $3 that Claude Sonnet charges, for comparable coding performance. Open-source under the MIT license.

In edition #002 we covered Qwen beating big models on narrow tasks. In #003, Qwen nearly matched Claude Opus on SWE-bench. Now Xiaomi enters the ring. The Chinese open-source wave is widening beyond Alibaba and now includes hardware makers.

Reddit | GitHub

Google Gemini 3.1 Flash Lite
Google’s answer to the pricing war. 2.5x faster than Gemini 2.5 Flash, $0.25/M input tokens, 381 tokens/sec, 1M context window. Beats GPT-5 mini on 6 out of 11 benchmarks. Google’s clearly going after high-volume enterprise workloads where speed and cost matter more than peak intelligence.

OpenAI GPT-5.4 mini + nano
OpenAI going smaller too. GPT-5.4 mini is 2x faster than GPT-5 mini, optimized for coding, computer use, and subagents. Nano goes even smaller at $0.05/M input tokens. Everyone’s racing to the bottom on cost.

The trend is clear: the pricing war is collapsing the cost curve for capable AI. Coding performance that costs $3 per million input tokens from a frontier model now costs $0.10 from an open one, a 30x drop.
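To see what these per-token prices mean for a budget, here is the arithmetic for a hypothetical workload of 500 million input tokens a month. Prices are per million input tokens; all come from this issue except Claude Sonnet’s, which is its $3 list input price. Output tokens, which are priced separately, are ignored.

```python
def monthly_input_cost(tokens_per_month: int, usd_per_million: float) -> float:
    """Input-token cost only; output tokens are priced separately and ignored here."""
    return tokens_per_month / 1_000_000 * usd_per_million

# USD per million input tokens.
PRICES = {
    "Claude Sonnet": 3.00,
    "Gemini 3.1 Flash Lite": 0.25,
    "MiMo-V2-Flash": 0.10,
    "GPT-5.4 nano": 0.05,
}

TOKENS = 500_000_000  # hypothetical workload: 500M input tokens per month
for model, price in PRICES.items():
    print(f"{model:<22} ${monthly_input_cost(TOKENS, price):>8,.2f}")

ratio = PRICES["Claude Sonnet"] / PRICES["MiMo-V2-Flash"]
print(f"Sonnet vs MiMo input price: {ratio:.0f}x")
```

At this volume the gap is the difference between a line item someone has to approve and a rounding error.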

Fun Break

AI Makes Music Now (For Real This Time)

Remember Lyria 3 from our first edition? Back then, Google’s music AI could generate 30-second clips — fun to play with, but not exactly a song.

One month later: Lyria 3 Pro generates full 3-minute tracks with intros, verses, choruses, and bridges. It actually understands song structure now — you can prompt for specific musical elements and get something that feels composed, not just generated.

Available in the Gemini app for paid subscribers, and on Vertex AI for businesses who need audio at scale (think game soundtracks, video platforms).

From 30-second jingles to full songs in a month. That’s the pace of AI right now.

Google Blog

In the Background

Sora is dead. OpenAI is shutting down its video generation app just six months after launch. Sora shot to #1 on the App Store in September, but by January downloads had dropped 45%. Disney was supposed to invest $1 billion and license Mickey Mouse for Sora content; the deal never closed. OpenAI says the research team will refocus on “world simulation for robotics.” Our read: the compute was too expensive for a product that wasn’t sticking.

AI at Tenvalleys

Introducing 10vOS

At Tenvalleys we’ve been building our own AI operating system — 10vOS. It’s a system of specialized agents, skills, and automations that powers how we build client presentations, proposals, interactive dashboards, landing pages, branded documents — and yes, this very newsletter. The data collection, article deep-reading, and trend analysis behind every edition is orchestrated by 10vOS; a human editor does the final pass.

The bet is simple: AI is most useful where it automates the parts of work that drain time without adding judgment, leaving humans more space for the judgment calls. We’re using 10vOS internally first because we’d rather find the bruises ourselves than on a client project.

If you’re thinking about building something similar inside your own organisation — or you’d like to see what 10vOS already does in practice — reach out at contact@tenvalleys.com.

See you next week.

Prepared at Tenvalleys — a delivery-first AI engineering partner — by Nikola Powałka. Feedback? Email us at contact@tenvalleys.com or reach out on LinkedIn.