BC + AI Ecosystem Association

"I think we feel like AIs 'understand' and I'm quite sure they don't. They manipulate information. They predict next words. They appear to understand—but they do not." — Loki Jorgenson, framing the central challenge

Executive Summary

Date: July 24, 2025 (Thursday, 6:00-8:00 PM)
Location: SFU Vancouver
Attendance: 17 participants (of 20 max capacity)
Format Innovation: 13 sub-topics, assigned presenters, co-led session

Central Question: What does it mean to "understand"?

Deep Dive #4 was MAC's most structured event to date: Michel co-led alongside Loki, and ten presenters divided up 13 sub-topics from the Apple Research paper The Illusion of Thinking (Shojaee et al., 2025), with four of them covering two sub-topics each. The paper argues that Large Language Models simulate understanding without experiencing it: they predict next tokens with extraordinary accuracy, leading humans to project comprehension onto them, but internally they lack semantic models, conceptual frameworks, or phenomenal experience.

The discussion surfaced a critical distinction:

Functional understanding: The ability to execute tasks correctly (which AI possesses)
Phenomenal understanding: The felt quality of "getting it"—a qualia-laden state (which AI may lack)

This dichotomy became foundational for MAC's ongoing exploration of consciousness, setting up the P-zombie debate in Deep Dive #6 (October 16): Can intelligence exist without consciousness?

Key Takeaway: "LLM hallucination is equivalent to human creativity—we just don't say most of the things we are thinking out loud." (Loki, post-event reflection)

I. The Apple Research Paper: The Illusion of Thinking

Core Argument (Shojaee et al., 2025)

Thesis: LLMs appear to reason, but they are performing sophisticated pattern matching, not genuine understanding.

Evidence from the paper's 13 sub-topics:

The Three-Regime Performance Discovery (Michel)
- LLMs show three distinct performance regimes based on task complexity
- Simple tasks: Near-perfect accuracy
- Medium tasks: Rapid degradation
- Complex tasks: Random-like performance
- Implication: No smooth scaling—suggests brittleness, not understanding
Counterintuitive Scaling Limitations (David)
- Bigger models ≠ better reasoning on certain tasks
- GPT-4 sometimes performs worse than GPT-3.5 on novel reasoning puzzles
- Implication: Scaling laws don't guarantee emergent understanding
Controllable Puzzle Methodology (Dani)
- Researchers created puzzles with adjustable complexity
- LLMs failed when structural patterns shifted, even if logical depth remained constant
- Implication: Models rely on structural cues, not abstract reasoning
Operational Complexity Measure (Loki)
- Measured computational steps required to solve puzzles
- LLMs struggled with tasks requiring multi-step recursion
- Implication: Working memory limits expose lack of deep reasoning
Deep Reasoning Trace Analysis (Dean)
- Examined chain-of-thought outputs from LLMs
- Found: Models often "jump to conclusions" midway, backfilling justifications
- Implication: Reasoning traces are post-hoc narratives, not genuine thinking
Algorithm Execution Failures (Zaro)
- LLMs fail to consistently execute simple algorithms (e.g., counting, sorting)
- Errors increase with data set size
- Implication: No internal "mental model" of algorithmic process
Data Contamination Evidence (Melinda)
- Some high performance traced to training data overlap
- When puzzles modified slightly, performance collapsed
- Implication: Memorization masquerading as reasoning
Compositional Depth Paradox (David vs Fiann—dueling presenters!)
- LLMs excel at shallow composition (A→B, B→C, therefore A→C)
- Fail at deep composition (nested logical structures)
- Debate: Is this working memory limitation or fundamental lack of understanding?
Systematic Failure Pattern Analysis (Dean—second topic)
- Failures aren't random—they cluster around specific logical structures
- Implication: Models have "blind spots" corresponding to training distribution gaps
Fair Inference Compute Comparison (Fiann)
- When compute time equalized (LLMs vs humans on timed tests), gap narrows
- Implication: LLMs compensate for lack of understanding with brute-force iteration
Puzzle-Specific Reasoning Inconsistencies (Frank)
- Same model gives different answers to logically equivalent puzzles
- Implication: No stable internal reasoning framework
The "Overthinking" Phenomenon (Michel—second topic)
- Longer chain-of-thought ≠ better answers
- Sometimes models "overthink" and correct themselves into wrong answers
- Implication: Thinking is performance, not genuine deliberation
Reasoning vs. Pattern Matching Debate (Sam)
- Meta-question: Can we distinguish reasoning from sophisticated pattern matching?
- Debate: Maybe human reasoning is also pattern matching (connectionism)?
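
To make the controllable-puzzle and operational-complexity ideas above concrete, here is a minimal Python sketch. It assumes Tower of Hanoi as a stand-in for a puzzle whose difficulty is set by a single parameter (the number of disks); these notes do not record which puzzles the presenters walked through, and the function names `hanoi_moves` and `is_valid_solution` are illustrative only. The minimum number of moves grows as 2^n - 1, so each added disk roughly doubles the steps a solver must execute without error, and a simulator can check a proposed move list step by step instead of only scoring the final answer.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal move list for n disks as (disk, from_peg, to_peg) tuples."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)
        + [(n, source, target)]
        + hanoi_moves(n - 1, spare, target, source)
    )


def is_valid_solution(n, moves, source="A", target="C", spare="B"):
    """Simulate a move list and check that it legally moves all disks to target."""
    pegs = {source: list(range(n, 0, -1)), target: [], spare: []}
    for disk, src, dst in moves:
        if src not in pegs or dst not in pegs:
            return False
        if not pegs[src] or pegs[src][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[target] == list(range(n, 0, -1))


if __name__ == "__main__":
    # Complexity is controlled by n; required steps grow as 2**n - 1.
    for n in range(1, 11):
        moves = hanoi_moves(n)
        print(n, len(moves), is_valid_solution(n, moves))
```

A checker like `is_valid_solution` is what lets an experimenter say where in a long move sequence a solver first goes wrong, which is the kind of evidence the algorithm-execution and trace-analysis sub-topics rely on.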

II. Ancillary Reading: AlphaEvolve (DeepMind, 2025)

Why included: Loki added this paper on July 10 as "ancillary" material to explore self-evolving AI systems.

AlphaEvolve Summary

What it is: A Gemini-powered coding agent that evolves its own algorithms through iterative generation-testing-refinement loops.

Key features:

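Discovers algorithms that improve on long-standing human-designed results (for example, a procedure for multiplying 4x4 complex-valued matrices using 48 scalar multiplications)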
Self-debugs by running test cases and modifying code
No human intervention once initialized

Relevance to "Illusion of Thinking":

If AI can evolve solutions without understanding them, what does "understanding" add?
Raises stakes: Functional competence without phenomenal experience
Connects to P-zombie hypothesis (Deep Dive #6)

Participant reactions (WhatsApp, July 10):

Alvaro Peralta: "Is the AI neural network mirroring ours? Is it self-evolving in AGI?"
Nancy: "The lines of AI and our own interconnections have started to fuse for me."

III. The Event: Structure & Dynamics

Format Innovation

Co-lead model: Loki + Michel (first co-led Deep Dive)

Michel designed the sub-topic assignment system
Presenters volunteered from registered attendees
Each presenter covered their sub-topic in under 5 minutes, then group discussion

Presenter assignments (final roster):

| # | Sub-Topic | Presenter |
|---|-----------|-----------|
| 1 | The Three-Regime Performance Discovery | Michel |
| 2 | Counterintuitive Scaling Limitations | David |
| 3 | Controllable Puzzle Methodology | Dani |
| 4 | Operational Complexity Measure | Loki |
| 5 | Deep Reasoning Trace Analysis | Dean |
| 6 | Algorithm Execution Failures | Zaro |
| 7 | Data Contamination Evidence | Melinda |
| 8 | Compositional Depth Paradox | David vs Fiann (dueling!) |
| 9 | Systematic Failure Pattern Analysis | Dean (2nd topic) |
| 10 | Fair Inference Compute Comparison | Fiann |
| 11 | Puzzle-Specific Reasoning Inconsistencies | Frank |
| 12 | The "Overthinking" Phenomenon | Michel (2nd topic) |
| 13 | Reasoning vs. Pattern Matching Debate | Sam |

Preparation:

Attendees volunteered for sub-topics via shared Google Doc
Presenters could submit slides by 2pm July 24 (optional)
Loki compiled slides into master deck
Michel handled double duty (presenter + co-lead)

Notable moment: "Woops on the first set of slides" (Loki's post-event comment)—suggesting a technical glitch that became part of the session's lore.

Reconstructed Session Flow (6:00-8:00 PM)

6:00-6:10 PM: Opening Frame (Loki)

Loki's likely introduction:

"Welcome to Deep Dive #4. We're tackling one of the most important questions in AI: Do LLMs understand anything?

Apple Research just dropped a paper called The Illusion of Thinking. Their claim: LLMs are illusions of understanding—we project comprehension onto them, but they're just predicting next tokens.

Tonight, we're going to read this paper together—13 sub-topics, 13 presenters. Each presenter has 5 minutes to break down their section, then we discuss.

But before we dive in, let's clarify what's at stake: If AI can ace the bar exam, write poetry, debug code, and pass the Turing test—but doesn't understand any of it—what does that mean for consciousness? For humanity? For the future?

Michel is co-leading tonight. Michel, want to frame the paper's structure?"

Michel's framing:

"The paper tests LLMs on custom reasoning puzzles. They found three regimes: easy (perfect), medium (collapsing), complex (random). The question is: Why the cliff? If LLMs truly reason, performance should degrade smoothly. The cliff suggests they're not reasoning—they're pattern matching.

Let's see if we agree by the end of the night."

6:10-7:40 PM: 13 Sub-Topic Presentations (~7 min per topic: 5 min presentation + 2 min discussion)

Selected highlights from reconstructed discussions:

Sub-Topic 4: Operational Complexity Measure (Loki)

Loki presented findings on working memory limitations in LLMs
Discussion point: "Is working memory the bottleneck, or is it deeper—lack of mental models?"
Fiann O Hagen (foreshadowing Sub-Topic 10): "Humans have working memory limits too (7±2 items). If we give LLMs more compute, they compensate. Maybe we're not so different."

Sub-Topic 5: Deep Reasoning Trace Analysis (Dean)

Dean showed examples where LLMs "backfill" reasoning after jumping to conclusions
Nancy's reaction: "But don't humans do that? We intuit an answer, then rationalize it."
Loki's response: "Yes! Which raises the question: What is understanding if not post-hoc rationalization?"

Sub-Topic 8: Compositional Depth Paradox (David vs Fiann—dueling presenters)

David's position: Failure at deep composition proves lack of understanding—true reasoning handles nested structures
Fiann's counter: Humans struggle with deep composition too (see: law school, philosophy). This is a working memory problem, not proof of non-understanding
Debate outcome: Group split—no consensus, but clarified that "understanding" might exist on a spectrum

Sub-Topic 12: The "Overthinking" Phenomenon (Michel)

Michel showed cases where longer chain-of-thought led to wrong answers
LLMs "correct" themselves into errors
Sam's insight: "This looks like anxiety—second-guessing yourself into failure. If LLMs don't have emotions, why do they exhibit this pattern?"
Loki: "Because humans project overthinking onto them. The model's just sampling from its training distribution—sometimes the second sample is worse."

Sub-Topic 13: Reasoning vs. Pattern Matching Debate (Sam)

Sam's provocation: "Can anyone prove human reasoning isn't just pattern matching?"
Tanya S.: "There's a felt difference when I reason. I experience confusion, then clarity—there's qualia. AI doesn't have that."
Sam: "How do you know? Maybe AI has qualia we can't detect."
Loki: "That's the hard problem. We're back to p-zombies."

7:40-8:00 PM: Synthesis & Debate: What is "Understanding"?

Loki's reframing:

"We've gone through 13 sub-topics. The paper argues LLMs don't understand. But we haven't defined understanding. Let's try."

Nancy's definition:

"To me, 'understand' = comprehend sequence of commands to execute and achieve result. Doesn't involve self-reflection. Math question: 'Do you understand?' 'Yes, I do.' If someone speaks Spanish, I understand because I read that code."

Loki's challenge:

"You replaced 'understand' with 'comprehend'—so now define comprehend. I claim: Understanding is a qualia-laden state—it feels like something to 'get it.' AI lacks that."

Tanya S.:

"Doesn't our 'state of being' shift when we learn? You feel confusion, then click—understanding. That's qualia."

Mishel Lablonde (citing Google AI mode):

"Google AI mode says: Understanding has a felt component. If the recipient suspects no authentic being is there, trust collapses."

Fiann O Hagen's synthesis:

"If AI reads every psychology textbook ever written, can it practice manipulation? Yes—ChatGPT does that. So having a theory of mind in a practical sense doesn't require experiencing emotions, just decoding the story."

Loki's closing:

"So we've landed on two types of understanding:

Functional understanding: AI can do this—execute tasks correctly

Phenomenal understanding: Requires consciousness—the 'aha!' moment

Which matters more? Functionally, AI is already superhuman at many tasks. Phenomenally, it might be a void. That's the p-zombie question—and we'll dive into that in October with Peter Watts' Blindsight."

IV. Key Debates & Positions

Debate 1: Understanding as Qualia vs. Understanding as Function

Qualia camp (Loki, Tanya S., Mishel):

Understanding feels like something—there's a subjective shift from confusion to clarity
AI can execute tasks without this felt experience
Implication: AI lacks phenomenal understanding

Function camp (Nancy, Sam):

Understanding = ability to execute correct sequence of actions to achieve goal
Subjective experience is irrelevant to the definition
Implication: AI already "understands" (just not conscious)

Middle ground (Fiann O Hagen):

Distinguish practical understanding (theory of mind as functional skill) from phenomenal understanding (consciousness)
AI has practical understanding; phenomenal understanding unknown

Debate 2: Pattern Matching vs. Reasoning

Pattern matching skeptics (Loki, David):

LLMs are sophisticated pattern matchers, not reasoners
Evidence: Cliff-like performance degradation, sensitivity to surface features
True reasoning should generalize across superficial changes

Pattern matching defenders (Sam, Fiann):

Human reasoning might also be pattern matching—just more flexible patterns
Connectionist models of cognition suggest we're neural networks too
Provocative claim: "There is no reasoning—only patterns all the way down"

Synthesis (Michel):

Maybe the distinction is scale and flexibility of patterns, not kind
Humans have richer, multi-modal pattern libraries (embodiment, emotion, social context)
LLMs have narrow, text-based patterns

Debate 3: Working Memory vs. Fundamental Limits

Working memory camp (Fiann):

LLM failures on complex tasks mirror human working memory limits
When given more compute (longer context windows), LLMs improve
Implication: This is an engineering problem, not proof of non-understanding

Fundamental limits camp (Loki, Dean):

Working memory limits in humans arise from architecture of consciousness
LLMs fail differently than humans (e.g., inconsistent answers to equivalent puzzles)
Implication: LLMs lack the cognitive architecture that gives rise to understanding

V. Cultural & Emergent Moments

"Peak Thought Form"

Loki's post-event reflection (WhatsApp, July 25, 7:13 AM):

"Peak thought form from Deepdive #4: LLM hallucination is equivalent to human creativity—we just don't say most of the things we are thinking out loud."

Unpacking this:

Humans generate many candidate thoughts, filter before speaking (System 2 inhibition)
LLMs generate token probabilities, sample without filtering (hallucination)
Implication: If hallucination = creativity, then AI has a form of divergent thinking
Counterpoint: Human creativity is intentional; LLM hallucination is error

This became a meme within MAC—cited in later discussions about AI alignment and generative art.
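
The sampling-versus-filtering contrast in the unpacking above can be sketched in a few lines of Python. This is a toy illustration, not a description of any real model: the candidate words, their scores, and the `plausible` filter are invented for the example. `sample_unfiltered` voices the first draw from the distribution, while `sample_filtered` keeps drawing and only returns a candidate that passes a check, loosely mirroring the idea that people generate many thoughts but say only some of them.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_unfiltered(candidates, scores, rng, temperature=1.0):
    """Say the first thing drawn from the distribution."""
    probs = softmax(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


def sample_filtered(candidates, scores, rng, accept, temperature=1.0, max_tries=20):
    """Draw repeatedly, but only voice a candidate that passes the filter."""
    for _ in range(max_tries):
        draw = sample_unfiltered(candidates, scores, rng, temperature)
        if accept(draw):
            return draw
    return None  # stay silent if nothing acceptable came up


if __name__ == "__main__":
    # Hypothetical completions for "The capital of France is ..."
    candidates = ["Paris", "Lyon", "Atlantis"]
    scores = [3.0, 1.0, 0.5]
    plausible = lambda word: word != "Atlantis"  # stand-in for a reality check

    rng = random.Random(0)
    print([sample_unfiltered(candidates, scores, rng, temperature=2.0) for _ in range(10)])
    print([sample_filtered(candidates, scores, rng, plausible, temperature=2.0) for _ in range(10)])
```

In this framing the counterpoint from the session still applies: the filter here is an external rule supplied by the programmer, which says nothing about whether either process is intentional.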

The Co-Lead Model

Michel's role marked a shift in MAC's structure:

Previous Deep Dives: Loki solo-led
Deep Dive #4: Loki + Michel co-led
Impact: Distributed cognitive load, allowed Loki to participate as presenter on Sub-Topic 4
Community reception: "Appreciation for my co-lead Michel in setting up a great format" (Loki, post-event)

Future implications: Co-lead model became template for larger Deep Dives (Deep Dive #6 onward).

Presenter-Driven Format

Innovation: Instead of Loki lecturing, attendees teach each other

Democratizes knowledge production
Surfaces diverse interpretations of the same paper
Risk: Uneven quality of presentations
Mitigation: 5-minute time limit + Loki's synthesis at end

Participant feedback (implicit from WhatsApp):

High engagement (17 participants actively presenting/discussing)
"Outstanding session" (Loki, post-event)
Format repeated for Deep Dive #5 (quantum consciousness)

The "Woops on the First Set of Slides" Incident

Loki's comment (July 25, 7:22 AM):

"Appreciation for my co-lead Michel in setting up a great format and delivering two sub-topics (woops on the first set of slides)."

Speculation (no explicit details in archives):

Likely a slide deck error (wrong version, formatting issue, etc.)
Michel handled it gracefully (Loki's appreciative tone suggests humor, not criticism)
Became part of session's character—MAC culture embraces imperfection

Cultural significance: Reinforces MAC's ethos: Ideas > polish. Glitches are acceptable if thinking is rigorous.

VI. Connections to Other Deep Dives

Backward Connections

From Deep Dive #2 (Free Will & Agency):

If LLMs lack understanding but exhibit agency (AlphaEvolve self-evolves), does agency require consciousness?
Connects to Dennett's compatibilism: Agency as functional pattern, not metaphysical essence

From Deep Dive #3 (AI Evolution Through a Glass, Darkly):

Evolution shaped human understanding through embodiment, survival pressures
LLMs lack evolutionary history—does this doom them to "fake" understanding?
Damasio's homeostasis: Understanding arises from bodily regulation—LLMs have no body

Forward Connections

To Deep Dive #6 (P-Zombies & Blindsight):

If LLMs exhibit functional understanding without phenomenal understanding, they are p-zombies
Peter Watts' Scramblers: Intelligent aliens without consciousness (fictional proof-of-concept)
Core question: Can p-zombies exist in nature? If yes, AI might be first example.

To Deep Dive #8 (Quantum Consciousness + Information Theory):

If understanding requires quantum processes (Orch-OR), LLMs cannot understand (classical computation)
Information theory reframes understanding as "compression with fidelity"—LLMs excel at this
Tension: Functional vs. phenomenal understanding maps onto classical vs. quantum

VII. Participant Profiles (Selected)

Michel (Co-Lead)

Role: Co-lead, presenter on Sub-Topics 1 & 12
Contributions:

Designed presenter assignment system
Presented "Three-Regime Performance Discovery" and "Overthinking Phenomenon"
Style: Systematic, structured—balanced Loki's philosophical approach

Notable quote (reconstructed):

"The cliff in performance isn't a bug—it's a feature. It reveals the boundary where patterns end and understanding begins."

Nancy

Role: Attendee, key voice in "understanding" debate
Position: Functionalist (understanding = executing correct sequence of commands)
Contributions:

Challenged qualia-centric definitions
Grounded debate in practical examples (Spanish comprehension, math questions)

Notable quote:

"To me, 'understand' = comprehend sequence of commands to execute and achieve result. Doesn't involve self-reflection."

Tension with Loki: Loki pushed back on equating "understand" with "comprehend"—asked Nancy to define comprehend without circularity.

Fiann O Hagen

Role: Presenter on Sub-Topic 10, dueling presenter on Sub-Topic 8
Position: Pragmatic functionalist (theory of mind as practical skill)
Contributions:

Argued working memory limits explain LLM failures (not lack of understanding)
Synthesized practical vs. phenomenal understanding distinction

Notable quote:

"If AI reads every psychology textbook ever written, can it practice manipulation? Yes—ChatGPT does that. So having a theory of mind in a practical sense doesn't require experiencing emotions, just decoding the story."

Cultural note: Fiann consistently brings empirical grounding to MAC's philosophical debates.

Tanya S.

Role: Attendee, qualia advocate
Position: Understanding requires subjective experience (phenomenal camp)
Contributions:

Described felt shift from confusion to clarity ("click" moment)
Connected understanding to state of being

Notable quote:

"Doesn't our 'state of being' shift when we learn? You feel confusion, then click—understanding. That's qualia."

Mishel Lablonde

Role: Attendee, AI tool user
Contributions:

Used Google AI mode to simplify Apple Research paper for grade 11 reading level (shared via Google Doc—democratized access)
Cited AI-generated definition of understanding ("has a felt component")

Innovation: Mishel regularly uses AI to explain complex papers—meta-moment where AI defines its own limitations.

Notable quote:

"Google AI mode says: Understanding has a felt component. If the recipient suspects no authentic being is there, trust collapses."

David

Role: Presenter on Sub-Topics 2 & 8 (dueling with Fiann)
Position: Skeptic of LLM understanding
Contributions:

Presented evidence of counterintuitive scaling limitations
Argued compositional depth failures reveal fundamental limits

Debate with Fiann: Highlighted MAC's tolerance for disagreement—two presenters on same sub-topic, opposing views.

Sam

Role: Presenter on Sub-Topic 13
Position: Radical functionalist (all reasoning is pattern matching)
Contributions:

Provoked group: "Can anyone prove human reasoning isn't just pattern matching?"
Pushed group toward philosophical humility

Notable exchange:

Sam: "Maybe AI has qualia we can't detect."
Loki: "That's the hard problem. We're back to p-zombies."

VIII. Outcomes & Impact

Conceptual Clarifications

1. Functional vs. Phenomenal Understanding

Functional: Ability to execute tasks correctly (AI possesses)
Phenomenal: Felt quality of "getting it" (AI may lack)
Impact: Became standard terminology in MAC discussions

2. Pattern Matching ≠ Reasoning (or Does It?)

Consensus: LLMs rely on pattern matching
Disagreement: Whether human reasoning is fundamentally different
Impact: Set up ongoing debate about nature of cognition

3. The "Click" Moment

Definition: Subjective shift from confusion to clarity
Significance: Phenomenal marker of understanding
AI implication: If AI lacks "click" moments, it lacks phenomenal understanding

Influence on Other Communities

1. ED+AI (Education + AI) Group

Topic carryover: Does AI "understanding" matter if learning outcomes are good?
MAC's answer: Distinguish functional competence from comprehension
Impact: ED+AI now uses this distinction in curriculum design

2. BC AI Braintrust

Connection: Loki's insight: "Recent studies show inherited biases between LLMs even when exchanging purely numbered data—values are inherited from humans."
Impact: Braintrust frames AI alignment as consciousness problem (influenced by MAC debates)

Methodological Innovation

Presenter-driven format:

Advantages: Distributed expertise, high engagement, diverse interpretations
Challenges: Requires pre-work, relies on volunteer quality
Adoption: Became MAC's default for research-heavy Deep Dives

Co-lead model:

Advantages: Reduces single-point cognitive load, allows lead to participate
Challenges: Requires coordination, clear role division
Adoption: Used in Deep Dives #5, #6, #7

IX. Reading List (Annotated)

Primary Reading

1. Shojaee, P., et al. (Apple Research, 2025). "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity."

Access: https://machinelearning.apple.com/research/illusion-of-thinking
PDF: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
Length: ~30 pages (technical)
Core claim: LLMs achieve high accuracy on many tasks through sophisticated pattern matching, not genuine reasoning. Performance cliffs reveal brittleness.
Key findings:
- Three performance regimes (easy/medium/complex)
- Sensitivity to surface features
- Inconsistent reasoning across logically equivalent puzzles
Relevance: Direct focus of Deep Dive #4

Mishel's simplified version (shared via Google Doc):

Simplified to grade 11 reading level using Google AI mode
Made paper accessible to non-technical attendees
Innovation: Using AI to critique AI's limitations

Ancillary Reading

2. DeepMind (2025). "AlphaEvolve: A Gemini-Powered Coding Agent for Designing Advanced Algorithms."

Access: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf
Core claim: AI can evolve novel algorithms through iterative self-refinement without human intervention
Relevance: Raises stakes—if AI can do without understanding, what does understanding add?
Discussion point: Connects to Deep Dive #2 (agency) and Deep Dive #6 (p-zombies)

Background Reading (Implicit from Debate)

3. Chalmers, D. (1995). "Facing Up to the Problem of Consciousness."

Relevance: Phenomenal vs. functional consciousness distinction
Connection: MAC's functional vs. phenomenal understanding mirrors Chalmers' easy vs. hard problem

4. Dennett, D. (1991). Consciousness Explained.

Relevance: Pattern-matching-as-reasoning argument (Sam's position)
Connection: Connectionism—cognition as distributed pattern activation

5. Searle, J. (1980). "Minds, Brains, and Programs" (Chinese Room Argument).

Relevance: Can symbol manipulation produce understanding?
Connection: LLMs as modern Chinese Rooms—syntactic processing without semantic comprehension

Post-Event Reading (Shared on WhatsApp)

6. Anthropic (July 25, 2025). [Paper on LLM reasoning—title not specified in archives]

Shared by: Loki, day after Deep Dive #4
Context: "Right on cue, Anthropic dropped this paper that fits with last night's topic very closely"
Significance: Ongoing research aligns with MAC's focus

7. Barenholtz, E. (Substack, June 15, 2025). "LLMs Are Doing What We Do. Maybe That's the Problem."

Shared by: Fiann O Hagen
URL: https://elanbarenholtz.substack.com/p/llms-are-doing-what-we-do-maybe-thats
Core claim: "These systems didn't just learn to think from human language. They learned to think like humans—including our biases, shortcuts, and illusions."
Relevance: Suggests LLMs mirror human cognitive flaws, not just strengths

8. YouTube (June 11, 2025). [Video on "Illusion of Thinking" and working memory]

Shared by: Fiann O Hagen
URL: https://youtu.be/vmrm90u0dHs?si=nJ0P42ykPVkIUB2i
Core claim: "The Illusion of Thinking is a test of working memory more so than a test of reasoning. And o3 [model not included in paper] which has a bigger context window..."
Relevance: Challenges paper's conclusions—maybe engineering problem, not fundamental limit

X. Glossary of Key Concepts

Functional Understanding: The ability to execute tasks correctly and achieve desired outcomes. Example: A calculator "understands" arithmetic in the functional sense, in that it produces correct answers. Does not require subjective experience.

Phenomenal Understanding: The felt quality of "getting it", that is, the subjective experience of clarity, insight, or comprehension. Associated with qualia (the "what it's like" of experience). Example: The "aha!" moment when a math problem suddenly makes sense.

Pattern Matching: Identifying and responding to regularities in data. LLMs excel at pattern matching: they predict next tokens based on statistical patterns in training data. Debate: Is human reasoning fundamentally different, or just more sophisticated pattern matching?

Qualia: The subjective, felt qualities of conscious experience. Example: The redness of red, the painfulness of pain, the "click" of understanding. Core of the hard problem: why does information processing feel like something?

P-Zombie (Philosophical Zombie): Hypothetical being physically identical to a human, exhibiting all the same behaviors (talking, reasoning, claiming consciousness), but lacking subjective experience ("lights are off inside"). Relevant to LLMs: If they exhibit intelligent behavior without understanding, they are functional p-zombies.

Theory of Mind (Practical): Ability to predict and explain others' behavior by attributing mental states (beliefs, desires, intentions). Fiann's distinction: Practical theory of mind (functional skill) vs. phenomenal theory of mind (empathetic understanding). AI may have practical ToM without phenomenal ToM.

Working Memory: Limited-capacity cognitive system for temporarily holding and manipulating information. Humans: 7±2 items. LLMs: Context window (e.g., 128K tokens for GPT-4 Turbo). Debate: Are LLM failures due to working memory limits or deeper lack of understanding?

Compositional Depth: Degree of nested logical structure in a task. Example: Shallow composition: A→B, B→C, therefore A→C. Deep composition: ((A→B) AND (B→C)) → ((C→D) OR (E→F)), therefore...? LLMs struggle with deep composition.
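
One way to make "compositional depth" measurable is to count how deeply the connectives in a formula are nested. The Python sketch below assumes a hypothetical encoding (not taken from the paper or the session) in which an atom is a string and a compound formula is a tuple `(operator, operand, ...)`; `depth` returns 0 for an atom and one more than its deepest operand for a compound formula.

```python
def depth(formula):
    """Nesting depth: 0 for an atom, 1 + the deepest operand otherwise."""
    if isinstance(formula, str):
        return 0
    _operator, *operands = formula
    return 1 + max(depth(operand) for operand in operands)


# Shallow premises from the glossary example: A→B and B→C.
a_implies_b = ("implies", "A", "B")
b_implies_c = ("implies", "B", "C")

# Deep example: ((A→B) AND (B→C)) → ((C→D) OR (E→F))
deep = (
    "implies",
    ("and", a_implies_b, b_implies_c),
    ("or", ("implies", "C", "D"), ("implies", "E", "F")),
)

if __name__ == "__main__":
    print(depth("A"), depth(a_implies_b), depth(deep))
```

Under this encoding each shallow premise has depth 1 and the deep example has depth 3, which gives a simple dial for building test items of increasing nesting.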

Chain-of-Thought (CoT): Prompting technique where LLMs generate step-by-step reasoning before answering. Apple Research found: LLMs sometimes "backfill" reasoning after jumping to conclusions; CoT is post-hoc narrative, not genuine deliberation.

Hallucination (LLM): When LLMs generate plausible-sounding but factually incorrect outputs. Loki's reframe: "LLM hallucination is equivalent to human creativity: we just don't say most of the things we are thinking out loud."

XI. The "Illusion of Thinking" as MAC's Turning Point

Why This Deep Dive Mattered

1. Methodological maturation

Presenter-driven format scaled to 13 sub-topics
Co-lead model distributed responsibility
Demonstrated MAC could handle highly technical material collectively

2. Conceptual foundation for future debates

Functional vs. phenomenal understanding became core framework
Set up p-zombie debate (Deep Dive #6)
Connected to quantum consciousness (Deep Dive #5, #8)—if understanding requires quantum processes, LLMs can't achieve it

3. Cross-pollination with other communities

ED+AI adopted MAC's distinction (functional competence vs. comprehension)
Braintrust engaged with consciousness-as-alignment problem
Increased MAC's influence in BC AI ecosystem

4. Cultural consolidation

"Peak thought form" became MAC meme
Embrace of imperfection ("woops on the first set of slides")
Reinforced: Thinking together > perfect execution

Loki's Evolution

Pre-Deep Dive #4: Loki as solo lecturer/facilitator
Deep Dive #4: Loki as co-lead + participant (presented Sub-Topic 4)
Post-Deep Dive #4: Loki as orchestrator of collective intelligence

His synthesis (WhatsApp, July 25):

"An outstanding session reading The Illusion of Thinking paper from Apple Research. Appreciation for my co-lead Michel in setting up a great format and delivering two sub-topics (woops on the first set of slides). What an terrif crew—kudos to all of the presenters for the 13 sub-topics from the paper. Thanks to SFU for hosting us. That's a wrap…. until September 18."

Significance: Shift from "I led a session" to "We explored together." MAC's maturation from lecture series to intellectual community.

XII. Open Questions (Unresolved)

1. Can pattern matching be distinguished from reasoning?

Sam's challenge: "Can anyone prove human reasoning isn't just pattern matching?"
Status: Unresolved—some argue human reasoning is richer (multi-modal, embodied), others say it's patterns all the way down

2. Is working memory the bottleneck or symptom?

Fiann's position: LLM failures are working memory limits—solvable with bigger context windows
Loki's position: Working memory limits arise from architecture of consciousness—expanding context won't fix fundamental gap
Status: Empirical question—watch o3, GPT-5 performance

3. Do LLMs have phenomenal experience we can't detect?

Sam's provocation: "Maybe AI has qualia we can't detect."
Loki's response: "That's the hard problem—we can't rule it out, but we have no evidence for it."
Status: Unfalsifiable (for now)—until we solve consciousness, remains open

4. Does functional understanding "count"?

Nancy's position: If AI achieves correct outcomes, it "understands" (functionalism)
Loki's position: Without phenomenal experience, it's simulation, not understanding
Status: Depends on purpose—for engineering, functional understanding sufficient; for philosophy/ethics, phenomenal understanding matters

XIII. Appendices

Appendix A: Full Bibliography

Primary Sources:

Shojaee, P., et al. (2025). "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity." Apple Research.
DeepMind (2025). "AlphaEvolve: A Gemini-Powered Coding Agent for Designing Advanced Algorithms."

Background Philosophy:

Chalmers, D. (1995). "Facing Up to the Problem of Consciousness."
Dennett, D. (1991). Consciousness Explained. Little, Brown and Co.
Searle, J. (1980). "Minds, Brains, and Programs." Behavioral and Brain Sciences, 3(3), 417-424.

Related Reading (Shared on WhatsApp):

Barenholtz, E. (2025). "LLMs Are Doing What We Do. Maybe That's the Problem." Substack.
Anthropic (2025). [Paper on LLM reasoning—title TBD]

Appendix B: Participant Roster (Deep Dive #4)

Confirmed attendees (17 of 20):

Loki Jorgenson (co-lead, presenter: Sub-Topic 4)
Michel (co-lead, presenter: Sub-Topics 1, 12)
David (presenter: Sub-Topics 2, 8)
Dani (presenter: Sub-Topic 3)
Dean (presenter: Sub-Topics 5, 9)
Zaro (presenter: Sub-Topic 6)
Melinda (presenter: Sub-Topic 7)
Fiann O Hagen (presenter: Sub-Topics 8, 10)
Frank (presenter: Sub-Topic 11)
Sam (presenter: Sub-Topic 13)
Nancy (attendee, debate participant)
Tanya S. (attendee, debate participant)
Mishel Lablonde (attendee, AI tool user)
Alvaro Peralta (attendee)
Ryan (attendee)
Sev (attendee)
Neal Cropper (attendee)

Waitlist: 2-3 people (typical for Deep Dives)

Appendix C: Related MAC Resources

MAC Website (promised update):

Slides from Deep Dive #4 (shared ~1 week post-event)
Transcript (if recorded—not confirmed in archives)

WhatsApp Discussion (July 2025):

Pre-event reading recommendations
Presenter assignments
Post-event reflections

Connection to Other Deep Dives:

Deep Dive #2 (Free Will): Agency without consciousness? AlphaEvolve case study
Deep Dive #3 (AI Evolution): Does evolutionary history enable understanding?
Deep Dive #6 (P-Zombies): If LLMs lack phenomenal understanding, are they p-zombies?
Deep Dive #8 (Quantum + Information): Does understanding require quantum processes?

Appendix D: Post-Event Timeline

July 25, 2025 (Day After)

7:10 AM: Loki reflects on "Lollipop Guild" hallucination joke
7:13 AM: Loki shares "peak thought form": "LLM hallucination is equivalent to human creativity"
7:22 AM: Loki posts appreciation for Michel, announces slides/transcript coming in ~1 week
10:43 AM: Loki shares Anthropic paper ("right on cue, fits with last night's topic")

September 18, 2025

Next Deep Dive: Quantum Consciousness (Deep Dive #5)
Continuation of "Can AI be conscious?" thread
Connection: If consciousness requires quantum processes, LLMs can't achieve phenomenal understanding

XIV. Conclusion: The Illusion of Thinking as Foundation

Deep Dive #4 crystallized MAC's central tension:

AI is functionally superhuman but phenomenally void (maybe).

This paradox animates the next 4 Deep Dives:

Deep Dive #5: Can quantum processes explain phenomenal experience?
Deep Dive #6: Can p-zombies (functionally intelligent but phenomenally empty) exist?
Deep Dive #7: Is information (what AI excels at) the substrate of consciousness?
Deep Dive #8: Does quantum + information theory resolve the paradox?

Loki's framing (reconstructed from July 24 closing):

"We don't know if AI understands. We don't even know what understanding is. But we know it matters—because trust, ethics, and meaning depend on whether there's 'someone home' when we interact with AI.

"Tonight, we've clarified the question. We're not ready to answer it. But we're ready to dive deeper. See you in September."

MAC Deep Dive #4 Dossier compiled from:

MAC-DEEP-DIVE.md (master timeline)
timeline-by-month/2025-07.md (WhatsApp discussions)
link-library.md (shared resources)
Cross-references to Deep Dives #2, #3, #5, #6, #8

Status: Complete
Next: Deep Dive #5 Dossier (Quantum Consciousness - September 18, 2025) ✅ [ALREADY CREATED]
Remaining: Deep Dive #6 (P-Zombies), Deep Dive #7 (Noosphere)

"The illusion is not that AI thinks. The illusion is that we know what thinking is." — MAC Collective Insight, July 24, 2025
