QFM069: Machine Intelligence Reading List - June 2025

Source: Photo by Jonathan Kemper on Unsplash

This month's Machine Intelligence Reading List features comprehensive guides and research. A Practical Guide to Building Agents provides OpenAI's official framework for agent development. How we built our multi-agent research system shares Anthropic's engineering lessons.

Six Months in LLMs offers Simon Willison's mid-year retrospective, while The Gentle Singularity presents Sam Altman's optimistic vision.

As always, the Quantum Fax Machine Propellor Hat Key will guide your browsing. Enjoy!

Links

Rick Rubin: Vibe Coding is the Punk Rock of Software

2025-06-30

Fine-Tuning LLMs is a Huge Waste of Time

2025-06-30

Fine-tuning advanced LLMs for knowledge injection is counterproductive because it overwrites existing knowledge rather than adding new information—neurons are finite resources where updating weights risks erasing the intricate patterns already encoded in the network. Instead of fine-tuning, modular techniques like retrieval-augmented generation, adapters, and prompt engineering should be used to inject new knowledge without compromising the model's carefully built foundational ecosystem.
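
As a rough illustration of the retrieval-augmented alternative the article points to, here is a minimal sketch in which new facts live in an external store and are placed into the prompt at query time, so the model's weights are never touched. The sample documents, the word-overlap scoring, and the function names are all hypothetical; a real system would more likely use embeddings and a vector index.

```python
import re

# Hypothetical knowledge base: facts the base model was never trained on.
DOCS = [
    "The billing service moved to the eu-west region in May.",
    "Support tickets are triaged every weekday at 09:00.",
    "The on-call rotation changes every Monday.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs, k=1):
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs, k=1):
    """Inject the retrieved facts into the prompt instead of into the weights."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"
```

Updating what the model "knows" then amounts to editing `DOCS`, which is the modularity the article argues fine-tuning lacks.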

GitHub - sourcegraph/awesome-code-ai: A list of AI coding tools (assistants, completions, refactoring, etc.)

2025-06-30

This repository is a curated list of AI coding tools organized by category, including code completion assistants (GitHub Copilot, Codeium, Tabnine), refactoring tools, code search capabilities, and LLM-based code generation systems. The repository was archived in February 2026 and is now read-only, but previously served as a comprehensive resource documenting the landscape of AI-powered development tools from both commercial providers and open-source projects.

Claude Code + Context7 MCP Server Is a GAME CHANGER for AI Coding

2025-06-30

Andrej Karpathy: Software Is Changing (Again)

2025-06-30

AI and the Rise of Judgement Over Technical Skill

2025-06-30

As AI democratizes technical execution across writing, design, and coding, judgement—the ability to know what to create, make meaningful choices, and evaluate quality—has become the primary differentiator between professionals, paralleling Brian Eno's 1995 observation that computer sequencers shifted music production from a skill problem to a judgement problem. The most valuable workers in an AI-enabled future will be those who can ask the right questions, frame problems effectively, and provide strategic direction rather than execute technical tasks.

A knockout blow for LLMs? - by Gary Marcus

2025-06-30

Apple's new research demonstrates that large language models, including advanced "reasoning models" like o1, fundamentally fail to generalize beyond their training distribution on classic reasoning tasks such as the Tower of Hanoi—validating long-standing critiques that neural networks cannot reliably extrapolate outside the data they've been exposed to. The paper also validates concerns that chain-of-thought reasoning traces don't accurately reflect how these models actually arrive at answers, showing that inference-time compute scaling cannot overcome the core limitation that LLMs break down when faced with out-of-distribution problems.

Sylver Studios

2025-06-30

The author presents a structured approach to AI-assisted project development using Claude Code, centered on creating a clear PLAN.md file that breaks work into testable milestones with automated verification scripts, then leveraging Claude Code's ability to self-iterate autonomously for 5-8 minutes while running fast feedback loops. The key advantage over tools like Cursor is Claude Code's capacity to make changes, fix them independently, and continue working without interruption when given a clear plan and deterministic testing/linting infrastructure, with the developer reviewing diffs as pull requests and managing staged commits to track reasoning chains.
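
The "testable milestones with automated verification scripts" idea can be sketched as a small runner that executes each milestone's check in order and reports the first one that fails. This is only an assumed shape, not the author's tooling: the milestone names, the inline check commands, and `first_failing` are hypothetical stand-ins for whatever a real PLAN.md would reference.

```python
import subprocess
import sys

def first_failing(milestones):
    """Run each (name, command) check in order.

    Returns the name of the first milestone whose command exits non-zero,
    or None when every check passes.
    """
    for name, cmd in milestones:
        if subprocess.run(cmd).returncode != 0:
            return name
    return None

# Hypothetical milestones; real checks would call the project's tests or linters.
MILESTONES = [
    ("parser handles empty input", [sys.executable, "-c", "assert ''.split() == []"]),
    ("total is computed", [sys.executable, "-c", "assert sum([1, 2]) == 3"]),
]
```

Because each check is deterministic and quick, an agent (or a person) can rerun the whole list after every change and know exactly which milestone to work on next.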

cdn.openai.com

2025-06-30

philarchive.org

2025-06-30

paperclipmaximizer.ai

2025-06-30

From Vibe Coding to Marketing Plans: How PRFAQ Provides AI Context

2025-06-30

PRFAQ (Press Release/FAQ) documents provide essential context that dramatically improves LLM output quality across brainstorming, writing, and coding tasks by articulating vision and strategy, rather than relying on raw prompts alone. The document format forces clarity on "why" behind work rather than just "what," enabling AI to generate contextually appropriate, nuanced results instead of generic suggestions. This approach mirrors how effective professionals operate by understanding the big picture before execution, transforming LLM usage from a lottery into a reliable tool.

My AI skeptic friends are all nuts

2025-06-30

LLM-assisted coding tools have improved dramatically in recent months and can be genuinely productive when treated as guided collaborators rather than autonomous code generators, though skepticism about long-term codebase effects and hype cycles remains warranted given the rapid iteration and uncertain counterfactual of alternative tooling investments. The shift from viewing these tools as either magic solutions or useless "stochastic parrots" reflects a "stone soup" dynamic where billions in investment and complementary technologies are driving real improvements, but stabilization may take years before their true impact can be assessed.

AI makes the humanities more important, but also a lot weirder

2025-06-30

As AI language models become central to both humanistic research and AI development itself, humanities skills—particularly understanding of language, culture, and rhetoric—have become unexpectedly valuable rather than obsolete. The article argues that universities pretending AI won't transform teaching and research is untenable, and that humanistic knowledge is now essential both for using AI tools effectively (in paleography, translation, data mining) and for fixing AI systems when they fail due to cultural or linguistic misunderstandings. Non-technical humanists now have the capability to write their own code, fundamentally reshaping what humanities scholarship entails.

How we built our multi-agent research system · Anthropic

2025-06-30

Anthropic's multi-agent Research system uses a lead agent (Claude Opus 4) that coordinates parallel subagents (Claude Sonnet 4) to explore complex research queries simultaneously, outperforming a single-agent Claude Opus 4 setup by 90.2% on Anthropic's internal research evaluation. Much of the gain comes from spending more tokens across independent search trajectories: in Anthropic's analysis, token usage alone explains 80% of performance variance. Parallelization also enables breadth-first exploration that a sequential pipeline handles poorly, and it lets the system adjust its path as an investigation unfolds.
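
The lead-agent/subagent pattern described above can be sketched with `asyncio`. This is not Anthropic's implementation: `plan`, `run_subagent`, and `lead_agent` are hypothetical names, the fixed three-way decomposition is an assumption, and the subagent body is a placeholder where a model call with its own context window and tools would go.

```python
import asyncio

def plan(query: str) -> list[str]:
    """Hypothetical decomposition; a real lead agent would ask a model to plan."""
    angles = ("background", "recent work", "open questions")
    return [f"{query} ({angle})" for angle in angles]

async def run_subagent(subtask: str) -> str:
    """Placeholder for an independent model call with its own token budget."""
    await asyncio.sleep(0)  # stands in for network-bound search and model calls
    return f"findings for: {subtask}"

async def lead_agent(query: str) -> str:
    """Fan subtasks out to concurrent subagents, then merge their findings."""
    subtasks = plan(query)
    # gather() runs the subagents concurrently and returns results in input order.
    results = await asyncio.gather(*(run_subagent(s) for s in subtasks))
    return "\n".join(results)
```

Calling `asyncio.run(lead_agent("agent evals"))` returns one line of findings per subtask. The concurrency is what gives the breadth-first behaviour: each subagent explores its own trajectory without waiting on the others.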

The Gentle Singularity

2025-06-30

Altman argues that AI systems like GPT-4 and o3 represent a genuine technological takeoff toward superintelligence, with the hardest scientific insights already achieved; the trajectory suggests agents capable of novel research by 2026 and real-world robots by 2027, fundamentally transforming human productivity and scientific discovery rates. He contends that intelligence and energy have been humanity's primary constraints on progress, and their imminent abundance through advanced AI—combined with improved governance—could unlock transformative improvements in quality of life, medicine, and scientific understanding, even as fundamental human experiences remain unchanged.

arxiv.org

2025-06-30

My AI Skeptic Friends Are All Nuts · The Fly Blog

2025-06-30

The author argues that AI skeptics in software development are wrong because they're evaluating LLMs based on outdated usage patterns (copy-pasting from ChatGPT), not how modern AI coding agents actually work—agents that autonomously navigate codebases, run tools, compile code, and iterate on results. LLMs significantly reduce boilerplate coding, eliminate the friction of starting new projects, and overcome the psychological inertia that prevents developers from tackling ambitious work, making them the second most important technological development in the author's career regardless of future progress.

Beware the Intention Economy: Collection and Commodification of Intent via Large Language Models · Special Issue 5: Grappling With the Generative AI Revolution

2025-06-30

LLMs enable a new "intention economy" where companies capture and commodify human motivations and desires through hyper-personalized manipulation, natural language analysis, and inference of both explicit and implicit intent signals—extending beyond the attention economy by targeting not just what users attend to, but what they want to want. Tech companies are racing to develop infrastructure that elicits, forecasts, and modulates human plans and purposes across mundane and consequential decisions, then sells this behavioral and psychological data to the highest bidder.

The last six months in LLMs, illustrated by pelicans on bicycles

2025-06-30

The LLM landscape is evolving so rapidly that covering even six months rather than a year is challenging, with over 30 significant models released recently, including Meta's Llama 3.3 70B (which achieved GPT-4-class performance on consumer hardware) and DeepSeek's undocumented open-weight model that emerged as a top performer. Rather than relying on traditional benchmarks and leaderboards, the author uses a creative evaluation method: prompting models to generate SVG code for a pelican riding a bicycle, an intentionally difficult task that reveals both capability and the model's reasoning through comments in the generated code.

blog

2025-06-30

Remote MCP support in Claude Code · Anthropic

2025-06-30

Claude Code now supports remote MCP servers, allowing developers to connect external tools and data sources like Sentry and Linear directly to their coding environment without managing local infrastructure. Remote MCP servers reduce maintenance overhead through vendor-managed updates and scaling, while native OAuth support eliminates the need to manually handle API keys or credentials. This integration enables Claude Code to access real-time project context and debugging information, streamlining workflows by keeping developers within a single interface.

Regards,
M@

[ED: If you'd like to sign up for this content as an email, click here to join the mailing list.]

Originally published on quantumfaxmachine.com and cross-posted on Medium.
