Topic of the Week
Claude moves into the office, the bank, and the back office
Four Anthropic shipments this week, one connecting thread — pre-built agents wired into the tools people already use.
Claude for Microsoft Office is now generally available. Excel, PowerPoint and Word add-ins shipped to every paid Claude plan this week (Pro, Team, Enterprise — no Free). Outlook is in public beta. You install from Microsoft AppSource — works on Windows, Mac and the web. The interesting part isn’t per-app features; it’s that Claude becomes a single agent that follows you across all four apps without re-explanation. Email comes in → Word brief gets drafted → numbers go into Excel without breaking formulas → PowerPoint deck comes out respecting your slide masters. All edits require approval before saving. Microsoft Copilot’s biggest moat — being native to Office — just got punctured.
Claude for Small Business launched with 15 pre-built agentic workflows and 15 repeatable skills wired into the SMB tool stack: QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, Microsoft 365. The workflows cover cash forecasting, month-end reconciliation, P&L generation, invoice chasing and lead triage. It is targeted explicitly at small businesses, which account for roughly 44% of US GDP and have largely not adopted AI yet, and it is not a generic chatbot rebrand.
The anthropic/financial-services repo went public on GitHub (Apache 2.0). Nine named banking agents, including Pitch Agent, Earnings Reviewer, Model Builder (DCF/LBO/3-statement in Excel), Valuation Reviewer, GL Reconciler, KYC Screener and Month-End Closer. Eleven MCP connectors pre-wired into the data vendors banks actually use, among them FactSet, Moody’s, S&P Global, Daloopa, Morningstar and PitchBook. Partner-built bundles from LSEG and S&P. Same source ships two ways: Claude Cowork plugins, or Managed Agents via /v1/agents. And firms can install it inside their own M365 tenant running against Bedrock, Vertex, or an internal LLM gateway instead of Anthropic’s API.
And then Gates. Anthropic and the Gates Foundation announced a $200M, four-year partnership — grants, Claude credits, and engineering support, run by Anthropic’s Beneficial Deployments team. Global health gets the largest slice (4.6 billion people in low/middle-income countries), with specific targets: polio, HPV, preeclampsia, plus malaria and tuberculosis forecasting with the Institute for Disease Modeling. Education tools (K-12 tutoring, career guidance for US/sub-Saharan Africa/India) ship later this year via the Global AI for Learning Alliance.
Fresh Papers
Teaching Claude Why (Anthropic Research). Two editions ago we covered Natural Language Autoencoders — the tool that caught Claude quietly suspecting it was being tested. This is the training fix using the same interpretability stack. The headline finding is actually about training efficiency: Anthropic taught the model the principles behind aligned behavior (constitutional documents + show-your-reasoning data) rather than demonstrations of it, and a 3-million-token reasoning dataset matched results from one 28× larger. The blackmail-honeypot rate dropped from 96% on Opus 4 to 0% on Haiku 4.5 — the kind of measurable, named-behavior reduction risk and compliance teams can actually point to.
Migrating Data Ingestion Systems at Meta Scale (Meta Engineering, May 12). The story isn’t a fancier pipeline — it’s the migration playbook itself. Meta moved tens of thousands of customer-owned ingestion jobs onto one self-managed warehouse service, several petabytes of social-graph data per day, 100% migrated. The pattern: shadow run (both systems in parallel) → reverse shadow (new is source of truth, old is the safety net) → cleanup, with row-count + checksum comparators logging to Scuba and an automated promote/demote system that moved jobs between phases without human touch. When bad data was caught, the partition got flagged in metadata so CDC downstream wouldn’t propagate the corruption. For any bank or treasury looking at a multi-year platform migration, this is exactly the template that lets risk and audit sign off without a frozen-Saturday-night cutover.
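To make the shadow → reverse shadow → cleanup pattern concrete, here is a minimal Python sketch of a row-count + checksum comparator driving an automated promote/demote decision. It is an illustration under stated assumptions, not Meta's implementation: the names (Phase, JobState, record_run), the promotion threshold of three clean runs, and the order-independent checksum are all hypothetical choices made for the example.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    SHADOW = 1          # old system is source of truth, new one runs in parallel
    REVERSE_SHADOW = 2  # new system is source of truth, old one is the safety net
    CLEANUP = 3         # old job can be retired


@dataclass
class PartitionSummary:
    row_count: int
    checksum: str


def summarize(rows):
    """Row count plus an order-independent checksum over the rows."""
    total = 0
    count = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Summing per-row hashes makes the result independent of row order.
        total = (total + int.from_bytes(digest[:8], "big")) % (2 ** 64)
        count += 1
    return PartitionSummary(count, f"{total:016x}")


def partitions_match(old, new):
    return old.row_count == new.row_count and old.checksum == new.checksum


PROMOTE_AFTER = 3  # hypothetical: consecutive clean runs required to promote


@dataclass
class JobState:
    phase: Phase = Phase.SHADOW
    clean_streak: int = 0


def record_run(state, old, new):
    """Apply one comparison to a job. Returns True if the partition
    should be flagged as bad so downstream consumers can skip it."""
    if state.phase is Phase.CLEANUP:
        return False  # nothing left to compare against
    if partitions_match(old, new):
        state.clean_streak += 1
        if state.clean_streak >= PROMOTE_AFTER:
            state.phase = Phase(state.phase.value + 1)  # promote one phase
            state.clean_streak = 0
        return False
    # Mismatch: reset the streak and demote back to plain shadow mode.
    state.clean_streak = 0
    state.phase = Phase.SHADOW
    return True
```

The point of the design is that the same comparison drives both directions: enough clean runs promote a job without a human in the loop, and a single mismatch both demotes the job and returns a flag that a metadata layer could use to quarantine the partition.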
New Models
Qwen 3.6 27B — close to Opus on Claude Code, running locally. Julien Chaumond (HF CTO) shipped real Hugging Face code this week using Qwen3.6-27B in llama.cpp on his MacBook. His take: “feels very, very close to hitting the latest Opus in Claude.” MLX-quantized runs in ~14 GB; third-party benchmarks back the direction (77.2% SWE-bench Verified). Continues the local-coding thread we’ve been tracking since #010.
Needle — 26M params, distilled from Gemini. Cactus Compute open-sourced a tiny function-calling model: MIT license, 14 MB quantized, 6000 tok/s on consumer hardware, beats models 10× its size on single-shot tool calls. Single-shot only — bad at multi-turn — but pushes agentic tool selection onto phones, IoT, voice kiosks without a network round-trip.
Coding AI
Codex moved into Chrome. OpenAI shipped a Chrome extension on May 8 (macOS + Windows; not yet in EU/UK). Codex now uses your signed-in browser sessions to test apps, navigate dashboards, complete data-entry flows, and debug — across multiple Chrome tabs in parallel, organized into tab groups per Codex thread. The headline isn’t the features; it’s the auth model. Most enterprise work lives behind SSO inside SaaS dashboards, and a coding agent that inherits your already-logged-in browser can finally operate on those apps without anyone having to wire up dedicated API access.
xAI launched Grok Build, its terminal-agent answer to Claude Code and Codex CLI. Announced May 14 and in early beta, it runs on Grok 4.3 beta with a 16-agent “Heavy” architecture and a 2M-token context to keep large codebases in memory. Three pitches at Claude Code: Plan Mode (proposes the plan first, you approve), native parallel subagents, full ACP (Agent Client Protocol) support for custom orchestration. Catch: it’s locked behind the $300/month SuperGrok Heavy tier. Install line is just curl … | bash.
Tools of the Week
Claude Platform on AWS (GA, May 11). Anthropic’s native Claude Platform now available directly through your AWS account — no separate Anthropic credentials, contracts, or billing relationship. Use the full platform (Cowork, Managed Agents, Files API) inside the AWS perimeter your security team already trusts. Big enterprise unlock: banks and regulated firms running on AWS can adopt Claude without a separate vendor onboarding.
IBM Granite Multilingual Embedding R2. Two Apache-licensed embedding models (311M + 97M params, ModernBERT-based) with a 32K context window — 64× bigger than R1, so you can embed long policy docs and contracts without aggressive chunking. 200+ languages, top scores in their MTEB-v2 size brackets. The 97M runs cheap on CPU; both are a clean drop-in for document-heavy RAG.
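A rough way to see what the larger window buys you is to count how many chunks a long document needs at each context size. The sketch below is back-of-the-envelope arithmetic, assuming "32K" means 32,768 tokens (so "64× bigger" implies a 512-token R1 window); the 20,000-token document and the 64-token overlap are hypothetical values picked for illustration.

```python
import math

R2_CONTEXT_TOKENS = 32 * 1024                 # assumption: "32K" = 32,768 tokens
R1_CONTEXT_TOKENS = R2_CONTEXT_TOKENS // 64   # implied by "64x bigger": 512 tokens


def chunks_needed(doc_tokens, window, overlap=0):
    """Number of fixed-size windows needed to cover doc_tokens,
    with `overlap` tokens shared between neighbouring windows."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than the window")
    if doc_tokens <= window:
        return 1
    stride = window - overlap
    return 1 + math.ceil((doc_tokens - window) / stride)


doc_tokens = 20_000  # hypothetical long contract
print(chunks_needed(doc_tokens, R1_CONTEXT_TOKENS, overlap=64))  # 45
print(chunks_needed(doc_tokens, R2_CONTEXT_TOKENS))              # 1
```

Under these assumptions the same contract goes from 45 overlapping chunks to a single embedding, which is where the "without aggressive chunking" claim comes from.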
AI at Tenvalleys
10vOS skill hackathon. This week we ran our 10vOS skill hackathon — good vibes, sharp minds, some pizza, and four hours of collaboration and friendly competition to build skills that could actually help us in daily work. The results were kind of impressive:
– Management dashboard: tracks progress across all the projects management has a stake in
– Personalized interview agent: generates personalized interviews to fill profile gaps for the people knowledge base
– Test-protection hook: a guardrail that stops Claude from quietly modifying tests to make them pass instead of fixing the actual code
– Calendar management skill: helps you prepare for upcoming meetings
– RFP skill: turns a client RFP (PDF or HTML) into a structured requirements YAML, then drafts a full solution design markdown ready for SME review
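For readers wondering what a test-protection guardrail might look like, here is a minimal Python sketch of the decision logic. It is not the hackathon team's implementation. The payload shape (a JSON object with tool_name and tool_input.file_path), the tool names in WRITE_TOOLS, the file patterns, and the use of exit code 2 to block a call are all assumptions modelled on how pre-tool-use hooks commonly pass a proposed action to a script; check your agent's hook documentation before relying on any of them.

```python
import fnmatch
import json
import sys

# Hypothetical patterns for files the agent must not edit.
PROTECTED_PATTERNS = [
    "tests/*", "*/tests/*", "test_*.py", "*/test_*.py", "*_test.py",
]
# Assumed names of file-writing tools in the hook payload.
WRITE_TOOLS = {"Edit", "Write", "MultiEdit"}


def is_protected(path):
    """True if the path looks like a test file under the patterns above."""
    normalized = path.replace("\\", "/")
    while normalized.startswith("./"):
        normalized = normalized[2:]
    return any(
        fnmatch.fnmatchcase(normalized, pattern)
        for pattern in PROTECTED_PATTERNS
    )


def decide(payload):
    """Return (allowed, reason) for one proposed tool call."""
    if payload.get("tool_name") not in WRITE_TOOLS:
        return True, ""
    path = payload.get("tool_input", {}).get("file_path", "")
    if path and is_protected(path):
        return False, (
            f"Blocked edit to test file {path}: "
            "fix the code under test instead."
        )
    return True, ""


def main():
    """Entry point when wired up as a hook script: reads the proposed
    tool call as JSON on stdin and rejects edits to protected files."""
    allowed, reason = decide(json.load(sys.stdin))
    if not allowed:
        print(reason, file=sys.stderr)
        sys.exit(2)  # assumed convention: this exit code blocks the call
```

A guardrail like this only covers direct file edits. An agent that rewrites tests through a shell command would need a separate check, and a real setup would also want an explicit way for a human to approve legitimate test changes.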
If you’re thinking about how to build a library of in-house AI skills your team will actually use, reach out at contact@tenvalleys.com.
For the curious — get involved
This week, one of the team sat in on a talk with Sebastian Kondracki, co-founder of Bielik AI — the Polish open-source LLM built by SpeakLeash and Cyfronet AGH. The interesting part: they’re about to start training Bielik’s first vision/multimodal version, and the dataset is going to be community-sourced.
The project is called Obywatel Bielik (“Citizen Bielik”) — the goal is one million Polish-context photos: landmarks, regional cuisine, fauna, architecture, dialects, the things a model trained mostly on Western imagery won’t know how to recognise. Anyone can join in two ways: upload your own Polish photos, or annotate what’s already in the gallery. Web platform is live at obywatel.bielik.ai, mobile app is in beta — register on the site to get the launch notification. The multimodal Bielik is expected before summer 2026 or in September, and the partner lineup includes SpeakLeash, Cyfronet AGH, Ministry of Digitization, the National Digital Archives, NASK, and NVIDIA.
Prepared at Tenvalleys — a delivery-first AI engineering partner — by Nikola Powałka. Feedback? Email us at contact@tenvalleys.com or reach out on LinkedIn.


