QFM085: Machine Intelligence Reading List - October 2025
Source: Photo by Steve Johnson on Unsplash
This month's Machine Intelligence Reading List explores AI development tools and industry dynamics. Claude in Excel brings AI capabilities directly into spreadsheets, while Introducing Agent Skills demonstrates evolving approaches to building autonomous AI systems. Industry analysis features prominently with You Have No Idea How Screwed OpenAI Actually Is examining competitive pressures, and This Is How Much Anthropic and Cursor Spend On Amazon Web Services revealing the substantial infrastructure costs behind AI services.
The collection also addresses AI capabilities and limitations, with Andrej Karpathy — AGI is still a decade away offering a measured perspective on timelines, and Why do LLMs freak out over the seahorse emoji? exploring the quirky edge cases that reveal model limitations. Development practices receive attention through Train with coding assistants like Magnus Carlsen trains with chess engines, advocating for deliberate practice approaches when working with AI tools.
As always, the Quantum Fax Machine Propellor Hat Key will guide your browsing. Enjoy!

Links
A joint study with the UK AI Security Institute found that as few as 250 malicious documents can create a backdoor vulnerability in large language models regardless of model size, challenging the assumption that attackers need to control a percentage of training data. The research demonstrates that poisoning attacks may be more practical than previously believed, since creating a fixed small number of malicious documents is far more feasible than the massive volume that percentage-based attacks would require. The findings apply to narrow backdoors in their experiments, but suggest data-poisoning defenses warrant urgent investigation as model scales grow.
The Hedonistic Imperative proposes using genetic engineering to eliminate suffering across all sentient life by replacing pain and malaise pathways with neural architectures based on heritable gradients of bliss, making well-being the genetically pre-programmed norm rather than an exception. Drawing parallels to how synthetic painkillers and anesthetics transformed physical pain from inevitable to avoidable over two centuries, the framework argues that abolishing psychological suffering is technically feasible and therefore becomes an ethical imperative rather than a biological inevitability.
Large language models consistently hallucinate a seahorse emoji that does not exist in Unicode, with GPT-5 and Claude reporting 100% confidence in its existence across multiple queries. Using logit lens analysis on Llama 3.3-70B reveals that the model's internal representations become increasingly chaotic and incoherent as it attempts to generate an emoji token, cycling through corrupted Unicode fragments and unrelated characters rather than settling on any valid emoji, which explains why LLMs produce garbled emoji spam instead of gracefully acknowledging the emoji doesn't exist like a human would.
Human cognition operates at approximately 10 bits per second for memory, decision-making, and imagination, while sensory systems gather data at roughly one billion bits per second, creating a billion-fold disparity that researchers term "the Musk illusion." This bottleneck means the brain can only process one task at a time at this fixed slow speed, regardless of technological augmentation, and a lifetime of human learning could fit on a small USB drive. The findings, compiled from nearly a century of psychology and neuroscience research, suggest the brain is far less impressive than commonly perceived and highlight opportunities for reorganizing scientific research around this fundamental cognitive constraint.
Technical professionals across the industry—engineers and product managers who actually build AI systems—consistently view LLMs as useful tools being catastrophically over-hyped and forced onto users, yet this pragmatic majority opinion remains invisible while billionaire-led hype dominates public discourse. The suppression of this moderate view by fear of retaliation has narrowed technological possibilities, allowing centralized Big Tech companies to implement the worst approaches to AI (unconsented data use, environmental damage, monopolistic control) rather than better alternatives that respect creators and sustainability.
Anthropic spent $2.66 billion on AWS compute through September 2025, exceeding its estimated $2.55 billion annual revenue, while Cursor's AWS bills doubled from $6.2 million to $12.6 million between May and June 2025 following price increases from Anthropic's Priority Service Tiers. These figures reveal that major AI companies are spending more on compute infrastructure than they generate in revenue, with Anthropic's total costs likely significantly higher when accounting for Google Cloud spending and operational expenses beyond AWS.
The article argues that Silicon Valley's focus on scaling large language models (LLMs) represents a brittle technical foundation that may not justify the trillion-dollar datacenter investments being made, and suggests alternative approaches like Tiny Recursion Models (TRMs) could achieve AI competence where LLMs fail while requiring dramatically lower computational costs. The author contends that while AI possesses genuine value, the industry has conflated scaling LLMs with inevitable progress without examining fundamental technical vulnerabilities or exploring more efficient alternatives.
Claude in Excel is a beta feature that integrates Claude AI directly into spreadsheets to help users navigate complex financial models, test scenarios, debug errors, and build models while maintaining formula integrity and transparency. Available for paid Claude plan users, it enables users to query specific cells and formulas, update assumptions across entire models without breaking dependencies, and trace calculation errors to their source with explanations. The tool operates within existing security frameworks and recognizes financial modeling conventions, supporting .xlsx and .xlsm files with size limits based on plan type.
Abacus AI Deep Agent is an autonomous AI system capable of executing complex multi-step tasks including app development, report generation, data analysis, and integration with external tools and APIs. The platform demonstrates versatility across diverse use cases—from building Stripe-integrated websites and Telegram bots to conducting equity research, processing invoices, generating marketing videos, and automating social media strategies. Pricing starts at $7/month for the first month, then $10/month, with examples showcasing the agent's ability to simulate user behavior, connect to multiple business systems (Gmail, Slack, GitHub, Twitter), and deliver structured outputs like presentations and dashboards.
Claude now introduces Agent Skills, which are modular folders containing instructions, scripts, and resources that Claude automatically loads when relevant to improve performance on specialized tasks like Excel work or brand guideline compliance. Skills are composable, portable across all Claude products, and efficiently load only necessary components, with developers able to create custom skills via the API's new /v1/skills endpoint or through an interactive skill-creator tool. Organization-wide management, partner-built skill directories, and an open standard format enable skills to function consistently across Claude apps, Claude Code, and the developer platform.
AI-assisted coding can achieve 10x productivity gains, but this requires a fundamental shift in how teams manage quality and risk—the probability of bugs must decrease by an order of magnitude or more to offset the increased commit velocity, as what was once an annual production issue can become a weekly occurrence at high throughput. Magerramov's team addresses this through human-reviewed commits, Rust's compile-time safety checks, and steering rules for AI agents, while recognizing that testing alone is insufficient and advocating for aerospace-level quality practices to maintain stability at significantly increased development speeds.
The piece argues that online tech sentiment is shaped by three distinct worldviews: Builders (who value fidelity and respect evolved constraints), Solvers (who value sincerity and believe in rational planning), and a rising third group called Cynics (who value authenticity and fear being deceived or appearing gullible). While Builders and Solvers have long been in ideological opposition, the emergence of Cynics as a significant force creates a dangerous three-player dynamic, particularly because Cynics and Solvers are increasingly finding common cause against Builders' progress-oriented vision. The fundamental divide now hinges on tolerance for uncertainty: whether one can accept the unresolved, emerging, and in-flight nature of progress.
OpenAI is facing catastrophic financial losses, with first-half 2025 results showing a $13.5 billion net loss on $4.3 billion in revenue—meaning the company is losing approximately three times more money than it earns and is on track for a $27 billion annual loss. For every dollar of revenue growth, OpenAI's operational costs are increasing by $7.77, revealing that the company's expansion is fundamentally unsustainable and the investor narrative about future profitability is disconnected from financial reality.
The author argues that information production has accelerated beyond humanity's cognitive capacity to process it meaningfully, creating a state of "enslopification"—where signal degrades into noise and discourse becomes self-reinforcing low-quality content. Rather than proposing resistance or acceptance, they advocate for "meta-slop awareness" and "perpendicular emergence" as a third way to navigate this information collapse, claiming they've detected measurable slop-fields extending from digital devices and built detectors to monitor the phenomenon. The piece presents technological acceleration as simultaneously a threat to human cognition and a potential catalyst for transcendence, positioning engagement with enslopification dynamics as inevitable but potentially navigable.
The author argues that developers should use AI coding assistants the way Magnus Carlsen uses chess engines—not as autopilot but as a learning tool through rigorous post-game analysis. Rather than accepting LLM-generated code directly, developers should review it critically during code review to understand why solutions work and catch mistakes, shifting the time investment from coding to learning-focused review. The effectiveness of this approach depends entirely on whether the developer engages thoughtfully with the AI's output or blindly accepts it without comprehension.
Regards,
M@
[ED: If you'd like to sign up for this content as an email, click here to join the mailing list.]
Originally published on quantumfaxmachine.com and cross-posted on Medium.
hello@matthewsinclair.com | matthewsinclair.com | bsky.app/@matthewsinclair.com | masto.ai/@matthewsinclair | medium.com/@matthewsinclair | xitter/@matthewsinclair
Was this useful?