This week frontier AI stopped asking you to come to it. Apple rebuilt Siri on Google’s Gemini and, the bigger news, let you pick Claude or ChatGPT instead; OpenAI’s Codex landed inside AWS; Claude showed up as a button in Excel. And just as the “AI ships everything now” story peaked, an NBER paper covering 100,000+ developers arrived to remind everyone that writing 180% more code isn’t the same as shipping 180% more software.
Topic of the Week
Apple opens Apple Intelligence to Claude, ChatGPT and Gemini
What happened. At WWDC on Monday (Apple’s Worldwide Developers Conference, the annual June keynote where it previews the next year of iOS, macOS and friends), Apple did two things. First, it rebuilt Siri on a custom Google Gemini model: Apple is renting the brain rather than building it, reportedly for around $1B a year. Second, and more interesting for us: iOS 27 lets you route questions to a chatbot of your choice instead of Siri, with Claude and Gemini now named alongside the existing ChatGPT integration. You can even set a third-party AI as the default for Writing Tools and Image Playground. Claude is a native option on iPhone, iPad and Mac for the first time.
Why it matters. This is bring-your-own-model, baked into the OS that sits on ~2.2 billion devices. The platform usually known for locking its ecosystem down made model choice a system setting. It’s a concrete data point for a pattern we keep seeing: distribution matters as much as raw model quality, and supporting more than one model is becoming the normal way to ship — something worth weighing when a stack depends on a single vendor.
Fresh Papers
Writing code vs. shipping code: the AI productivity reality check (NBER, Demirer/Musolff/Yang). This is the one to read this week. The authors tracked 100,000+ GitHub developers linked to their actual AI-tool telemetry, across three generations of tooling. The finding is sobering and precise: each new generation sharply lifts raw coding output (autocomplete +40% commits, interactive agents +140%, autonomous agents +180%), but that 180% commit gain shrinks to +50% for the number of projects and just +30% for actual releases. They estimate an elasticity of substitution between AI and human effort of 0.25. Anything below 1 means the two inputs are complements, so AI complements people rather than replacing them, because the bottleneck was never the typing. It’s review, integration, deployment, adoption: the unglamorous “ship it” half.
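To make the 0.25 figure concrete, here is a minimal sketch using a textbook CES (constant elasticity of substitution) production function. Only sigma = 0.25 comes from the paper as summarized above; the functional form, the equal weights and the input levels are illustrative assumptions, not the authors' estimated model.

```python
def ces_output(human: float, ai: float, sigma: float = 0.25, weight: float = 0.5) -> float:
    """Textbook CES aggregate of two inputs (illustrative, not the paper's model)."""
    rho = (sigma - 1.0) / sigma  # sigma = 0.25 gives rho = -3
    return (weight * human**rho + (1.0 - weight) * ai**rho) ** (1.0 / rho)


# Assumed inputs: both normalized to 1.0 before the AI boost.
baseline = ces_output(human=1.0, ai=1.0)

# Raise the AI-side input by 180% while human effort stays fixed.
boosted = ces_output(human=1.0, ai=2.8)

gain = boosted / baseline - 1.0
print(f"output gain: {gain:.0%}")
```

Under these assumptions the 180% boost on the AI side lifts output by about 24%, because the fixed human input becomes the binding constraint. The direction of the effect matters here, not the exact number.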
Adaptive Auto-Harness — self-improving agents quietly rot (Emory + Amazon). Last week we tracked the agent-memory problem from LongMINT to FluxMem. This week the frontier moved one layer up: agents that improve themselves. The paper’s diagnosis is great — let one agent endlessly re-optimize its own prompts and skills and it bloats and degrades: one run grew from 12 to 34 skills and a 2KB prompt to 68KB, with accuracy peaking early then sliding. The fix is a “harness tree” — version-controlled, task-routed specialization (think git branches per task type) instead of one ever-growing config. Result: 80.9% on PolyBench vs 50.8% for the best baseline.
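The “git branches per task type” idea can be sketched roughly as below. This is a toy illustration of task-routed, versioned configs, not the paper's implementation; every class, method and task name here (Harness, HarnessTree, route, commit, rollback, "bugfix") is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    """One agent configuration: a prompt plus the skills it may use."""
    prompt: str
    skills: list[str] = field(default_factory=list)


@dataclass
class HarnessTree:
    """A shared root config plus one versioned branch per task type."""
    root: Harness
    branches: dict[str, list[Harness]] = field(default_factory=dict)

    def route(self, task_type: str) -> Harness:
        # Use the newest version on the task's branch, else fall back to the root.
        history = self.branches.get(task_type)
        return history[-1] if history else self.root

    def commit(self, task_type: str, update: Harness) -> None:
        # Improvements land on one branch only, so the root never bloats.
        self.branches.setdefault(task_type, []).append(update)

    def rollback(self, task_type: str) -> None:
        # Drop the latest version if it made results worse.
        history = self.branches.get(task_type)
        if history:
            history.pop()


tree = HarnessTree(root=Harness(prompt="You are a coding agent.", skills=["run_tests"]))
tree.commit("bugfix", Harness(prompt="Reproduce the bug first.", skills=["run_tests", "bisect"]))
tree.commit("bugfix", Harness(prompt="Reproduce, then bisect.", skills=["run_tests", "bisect", "blame"]))
tree.rollback("bugfix")  # the second revision is discarded, the first stays active
```

The point of the structure is that a bad self-edit affects one branch and can be reverted, instead of accumulating in a single ever-growing prompt.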
How Anthropic does self-service analytics with Claude — a case study, not a paper, but it reads like one. Anthropic automated 95% of internal business-analytics queries at ~95% accuracy, freeing the data-science team for real modeling work. The number worth remembering: with no Skills, the agent hit 21% accuracy mapping questions to data; with structured Skills (markdown procedural knowledge), 95%+ — and 99% in specific domains. Their thesis: analytics accuracy is a context and governance problem, not a SQL-generation problem. Fewer, heavily-owned canonical datasets; colocate modeling code, semantic layer and docs in one repo with CI; “start lean — a handful of datasets, a few dozen evals, a thin knowledge skill captures most of the upside.”
New Models
Gemini 3.5 — the cost-leadership play. Google’s pitch isn’t bigger benchmarks, it’s frontier-level output at roughly a third of competitors’ cost (Pichai’s framing), with Gemini now at 900M monthly users (double a year ago). Gemini 3.5 Flash has shipped; 3.5 Pro is promised “this month.” Worth watching given how much of the Apple deal runs on Gemini under the hood.
Microsoft goes first-party: MAI-Code-1-Flash + MAI-Thinking-1. Microsoft shipped its own reasoning and code models (June 2) — a quiet but real signal that it wants less dependence on OpenAI even while reselling everyone’s models through Foundry. Vendor strategy, not just a spec bump.
Claude Code & Coding AI
The plugin/skills layer grew up (v2.1.157–162). Last week Claude learned to write its own orchestration script (/workflows); this week the platform underneath it caught up. The headline change: skills now auto-load straight from .claude/skills with no marketplace required, and a new claude plugin init <name> command scaffolds a plugin. For a skills-heavy setup, that removes a whole step. The other theme is safety: Claude Code now asks permission before writing to files that can execute code, such as shell startup files (.zshenv, .zlogin) and build configs like .npmrc, .bazelrc, .pre-commit-config.yaml and devcontainers. Parallel tool calls and a pile of WSL/paste fixes round it out.
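For teams building their own agent tooling, the same kind of guard is easy to approximate. The sketch below is not Claude Code's implementation: the function name is hypothetical, the file list is only the examples named above, and mapping “devcontainers” to the .devcontainer directory and devcontainer.json file names is our assumption.

```python
from pathlib import PurePosixPath

# File names taken from the examples above; a real list would be longer.
SENSITIVE_NAMES = {
    ".zshenv",
    ".zlogin",
    ".npmrc",
    ".bazelrc",
    ".pre-commit-config.yaml",
    "devcontainer.json",   # assumption: common devcontainer file names
    ".devcontainer.json",
}
SENSITIVE_DIRS = {".devcontainer"}  # assumption: conventional devcontainer folder


def needs_confirmation(path: str) -> bool:
    """Return True if a write to this path should be confirmed by a human first."""
    p = PurePosixPath(path)
    in_sensitive_dir = any(part in SENSITIVE_DIRS for part in p.parts[:-1])
    return p.name in SENSITIVE_NAMES or in_sensitive_dir


for candidate in ["/home/dev/.zshenv", "repo/.devcontainer/Dockerfile", "repo/src/app.py"]:
    print(candidate, needs_confirmation(candidate))
```

Matching on file name and parent directory keeps the check cheap; a stricter version would also resolve symlinks before deciding.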
Who’s actually building on Claude — the Problem Solvers showcase. Anthropic put up founder interviews on what gets built on Claude, and the line-up is a decent snapshot of the coding-agent economy: Lovable (conversational app-building, millions of users in two months), Legora (AI-native legal OS), Cognition (Scott Wu: engineers “going three, five times faster — and just shipping so much more”), Replit (50M+ users; “Anthropic continues to have the best coding models on the market”), and Genspark (Kay Zhu: “with every other model, we had to predefine every step — Anthropic’s model changed everything”). Read it next to the NBER paper above for a nice tension: founders feel the 3-5x; the data says watch what actually ships.
Tools of the Week
OpenAI’s frontier models + Codex are now GA on AWS. GPT-5.5 and GPT-5.4, plus Codex (OpenAI’s coding agent, 5M+ weekly users), now run inside Amazon Bedrock. Pay-per-token at OpenAI’s own rates, no seat licenses, no per-developer commitments — and it all sits under your existing AWS governance, IAM and billing. Frontier access without onboarding a new vendor.
AlloyDB Remote MCP Server hits GA (Google Cloud). A ready-made, secure way for AI agents to read a company’s database — no password sitting in a config file, read-only by default (the agent can’t delete anything), and every query written to an audit log. Exactly the controlled, “who-asked-what” access that regulated clients keep asking for.
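The two properties that matter here, read-only access and a “who-asked-what” trail, can be shown with a small stand-in. This sketch uses Python's built-in sqlite3 module, not AlloyDB or its MCP server; the table, the agent name and the in-memory audit list are all made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone

audit_log = []  # stand-in; a real system would use durable, append-only storage


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database, then switch the connection to read-only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("INSERT INTO invoices (amount) VALUES (120.0), (80.5)")
    conn.commit()
    conn.execute("PRAGMA query_only = ON")  # SQLite rejects writes from here on
    return conn


def audited_query(conn: sqlite3.Connection, who: str, sql: str, params=()):
    """Record who asked what before running the statement."""
    audit_log.append({"who": who, "sql": sql, "at": datetime.now(timezone.utc).isoformat()})
    return conn.execute(sql, params).fetchall()


conn = make_demo_db()
rows = audited_query(conn, "agent-1", "SELECT id, amount FROM invoices ORDER BY id")

try:
    audited_query(conn, "agent-1", "DELETE FROM invoices")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True  # the delete is refused, but the attempt is still logged
```

Logging before execution means a rejected write still leaves a record, which is usually what an auditor wants to see.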
Claude inside Excel (Microsoft Foundry). Microsoft Foundry now runs Claude Opus 4.8 in Excel’s “Agent Mode” — the model is reachable from the spreadsheet itself, no separate window. Combined with the Apple news above, the theme of the week is plain: Claude is showing up where people already work, not as a destination you visit.
In the Background
Following the $965B raise we covered two weeks ago, Anthropic confidentially filed a draft S-1 with the SEC (May 31) — the first paperwork on a path to an IPO. On the policy side, the CEOs of OpenAI, Anthropic, Google DeepMind and Microsoft signed a joint letter to Congress (June 5) urging mandatory biosecurity screening of all US synthetic-DNA providers, warning that AI is eroding the barriers to weaponizing biology.
AI at Tenvalleys
A lot happens off the screen, too. Our team pulled together the tech and AI events worth knowing in Warsaw for June and July, plus one big hackathon in October — some pure networking, some that could start a genuinely useful conversation.
June
9.06 (Tue), 19:00 — Tech and Beers
UWAGA PIWO (Żelazna 51/53). Casual tech networking, no registration, runs every two weeks.
10.06 (Wed), 18:30 — WarsawJS #139
WeWork Mennica Legacy Tower (Prosta 20). Six talks: AI in a dev career, React Native, architecture, Docker security. Free, registration.
10.06 (Wed), 18:30 — Tech Startups in the Pub
British Bulldog (Al. Jerozolimskie 42). Founder/investor networking, no panels or pitches. Free.
11.06 (Thu), 18:30 — Hands-on Agile #75: Token Economics
online. Claude token economics, i.e. how not to burn your AI budget. Free, registration.
17.06 (Wed), 18:00 — Mindstone Warsaw June AI
Świetlica Wolności (Nowy Świat 6/12). AI community meetup. Free, registration.
18.06 (Thu), 18:00 — Boxtech #3: AI in Engineering
Box Poland, Varso Tower (Chmielna 69). “Insights for Builders and Leaders”, two talks. Free, limited seats, Google Form.
July
10.07 (Fri) — Mindstone Warsaw July AI
Świetlica Wolności (Nowy Świat 6/12). Talks: “Beyond the Buzz: Practical Lessons from Bringing LLMs to Life” (Sergey Parkhomenko) and “How I use AI every day to improve sales” (Jacek Gabanowicz). Free, ~30 seats.
October — Kraków
3–4.10 (Sat–Sun) — HackYeah 2026
Tauron Arena, Kraków. Europe’s largest on-site hackathon, 24h, teams of 1–6. Categories include AI, Defence, Smart City, Sport & Healthcare. On-site mentoring, conference alongside. PL/EN, registration required.
If you’re heading to any of these, come say hi. And if there’s a good AI or engineering meetup we’ve missed, tell us at contact@tenvalleys.com.
For Dessert
Here’s the number that stuck with me this week: Anthropic disclosed that Claude now writes over 80% of its own code — up from under 10% in February 2025. Marina Favaro and Jack Clark put it plainly in a June 5 post: “AI that can build itself would be a major development in the history of technology… but full recursive self-improvement also might increase the risks of humans losing control over AI systems.” A model writing eight of every ten lines of its own next version, with the company shipping it flagging that it finds the pace a little unsettling. (And yes — the same day, Claude itself went down: an outage hit claude.ai, the API, Claude Code and Cowork around 15:08 UTC on June 5, with full service confirmed back by 18:27 UTC. Even the self-improving need a coffee break.)
Prepared at Tenvalleys — a delivery-first AI engineering partner — by Nikola Powałka. Feedback? Email us at contact@tenvalleys.com or reach out on LinkedIn.


