The dossier
"I think we feel like AIs 'understand' and I'm quite sure they don't. They manipulate information. They predict next words. They appear to understand—but they do not." — Loki Jorgenson, framing the central challenge
Executive Summary
Date: July 24, 2025 (Thursday, 6:00-8:00 PM)
Location: SFU Vancouver
Attendance: 17 participants (of 20 max capacity)
Format Innovation: 13 sub-topics, assigned presenters, co-led session
Central Question: What does it mean to "understand"?
Deep Dive #4 was MAC's most structured event to date, with Michel as co-lead alongside Loki and a presenter-driven format in which 10 presenters covered 13 specific sub-topics drawn from the Apple Research paper The Illusion of Thinking (Shojaee et al., 2025). The paper argues that Large Language Models simulate understanding without experiencing it: they predict next tokens with extraordinary accuracy, leading humans to project comprehension onto them, but internally they lack semantic models, conceptual frameworks, or phenomenal experience.
The discussion surfaced a critical distinction:
- Functional understanding: The ability to execute tasks correctly (which AI possesses)
- Phenomenal understanding: The felt quality of "getting it"—a qualia-laden state (which AI may lack)
This dichotomy became foundational for MAC's ongoing exploration of consciousness, setting up the P-zombie debate in Deep Dive #6 (October 16): Can intelligence exist without consciousness?
Key Takeaway: "LLM hallucination is equivalent to human creativity—we just don't say most of the things we are thinking out loud." (Loki, post-event reflection)
I. The Apple Research Paper: The Illusion of Thinking
Core Argument (Shojaee et al., 2025)
Thesis: LLMs appear to reason, but they are performing sophisticated pattern matching rather than exercising genuine understanding.
Evidence from the paper's 13 sub-topics:
- The Three-Regime Performance Discovery (Michel)
  - LLMs show three distinct performance regimes based on task complexity
    - Simple tasks: Near-perfect accuracy
    - Medium tasks: Rapid degradation
    - Complex tasks: Random-like performance
  - Implication: No smooth scaling, which suggests brittleness rather than understanding
- Counterintuitive Scaling Limitations (David)
  - Bigger models ≠ better reasoning on certain tasks
  - GPT-4 sometimes performs worse than GPT-3.5 on novel reasoning puzzles
  - Implication: Scaling laws don't guarantee emergent understanding
- Controllable Puzzle Methodology (Dani)
  - Researchers created puzzles with adjustable complexity
  - LLMs failed when structural patterns shifted, even if logical depth remained constant
  - Implication: Models rely on structural cues, not abstract reasoning
- Operational Complexity Measure (Loki)
  - Measured computational steps required to solve puzzles
  - LLMs struggled with tasks requiring multi-step recursion
  - Implication: Working memory limits expose lack of deep reasoning
- Deep Reasoning Trace Analysis (Dean)
  - Examined chain-of-thought outputs from LLMs
  - Found: Models often "jump to conclusions" midway, backfilling justifications
  - Implication: Reasoning traces are post-hoc narratives, not genuine thinking
- Algorithm Execution Failures (Zaro)
  - LLMs fail to consistently execute simple algorithms (e.g., counting, sorting)
  - Errors increase with data set size
  - Implication: No internal "mental model" of algorithmic process
- Data Contamination Evidence (Melinda)
  - Some high performance traced to training data overlap
  - When puzzles were modified slightly, performance collapsed
  - Implication: Memorization masquerading as reasoning
- Compositional Depth Paradox (David vs Fiann, dueling presenters!)
  - LLMs excel at shallow composition (A→B, B→C, therefore A→C)
  - They fail at deep composition (nested logical structures)
  - Debate: Is this a working memory limitation or a fundamental lack of understanding?
- Systematic Failure Pattern Analysis (Dean, second topic)
  - Failures aren't random; they cluster around specific logical structures
  - Implication: Models have "blind spots" corresponding to training distribution gaps
- Fair Inference Compute Comparison (Fiann)
  - When compute time is equalized (LLMs vs humans on timed tests), the gap narrows
  - Implication: LLMs compensate for lack of understanding with brute-force iteration
- Puzzle-Specific Reasoning Inconsistencies (Frank)
  - Same model gives different answers to logically equivalent puzzles
  - Implication: No stable internal reasoning framework
- The "Overthinking" Phenomenon (Michel, second topic)
  - Longer chain-of-thought ≠ better answers
  - Sometimes models "overthink" and correct themselves into wrong answers
  - Implication: Thinking is performance, not genuine deliberation
- Reasoning vs. Pattern Matching Debate (Sam)
  - Meta-question: Can we distinguish reasoning from sophisticated pattern matching?
  - Debate: Maybe human reasoning is also pattern matching (connectionism)?
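Sub-topics 3 and 4 turn on puzzles whose difficulty can be dialed up and measured in solution steps. As a minimal sketch of that idea, the Python below uses Tower of Hanoi, one of the puzzle environments in the Apple paper, where the shortest solution for n disks takes 2^n - 1 moves. The function names and the move format are illustrative assumptions, not the paper's code.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal move list for n disks as (disk, from_peg, to_peg) tuples."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # clear the n-1 smaller disks
        + [(n, source, target)]                      # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # restack the smaller disks
    )


def is_valid_solution(n, moves):
    """Replay a move list and check that it legally transfers all n disks from A to C."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        if src not in pegs or dst not in pegs:
            return False  # unknown peg name
        if not pegs[src] or pegs[src][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))


# Complexity grows exponentially with the single "dial" n.
for n in range(1, 11):
    moves = hanoi_moves(n)
    assert len(moves) == 2**n - 1
    assert is_valid_solution(n, moves)
    print(f"{n} disks -> {len(moves)} moves")
```

A validator like `is_valid_solution` is what lets a puzzle score a proposed move list step by step, rather than only checking a final answer.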
II. Ancillary Reading: AlphaEvolve (DeepMind, 2025)
Why included: Loki added this paper on July 10 as "ancillary" material to explore self-evolving AI systems.
AlphaEvolve Summary
What it is: A Gemini-powered coding agent that evolves its own algorithms through iterative generation-testing-refinement loops.
Key features:
- Generates novel sorting algorithms faster than human-designed benchmarks
- Self-debugs by running test cases and modifying code
- No human intervention once initialized
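The generation-testing-refinement loop described above can be illustrated with a toy hill-climbing sketch. This is not AlphaEvolve's implementation: here a random numeric mutation stands in for the model proposing a code change, and the "program" is just a pair of coefficients (a, b) scored against test cases. All names are hypothetical.

```python
import random


def evaluate(candidate, test_cases):
    """Score a candidate (a, b): higher is better, 0 means every test case passes exactly."""
    a, b = candidate
    return -sum((a * x + b - y) ** 2 for x, y in test_cases)


def mutate(candidate, rng, step=0.5):
    """Propose a small random change (a stand-in for an LLM proposing an edit)."""
    a, b = candidate
    return (a + rng.uniform(-step, step), b + rng.uniform(-step, step))


def evolve(test_cases, generations=2000, seed=0):
    """Generate a variant, test it, and keep it only if it scores better."""
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_score = evaluate(best, test_cases)
    for _ in range(generations):
        child = mutate(best, rng)
        score = evaluate(child, test_cases)
        if score > best_score:
            best, best_score = child, score
    return best, best_score


# Test cases generated from a hidden target rule y = 3x + 1.
cases = [(x, 3 * x + 1) for x in range(-5, 6)]
best, score = evolve(cases)
print("best candidate:", best, "score:", score)
```

Nothing in the loop represents what the target rule means; it only keeps whatever scores better on the tests, which is the point the "Illusion of Thinking" discussion picks up.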
Relevance to "Illusion of Thinking":
- If AI can evolve solutions without understanding them, what does "understanding" add?
- Raises stakes: Functional competence without phenomenal experience
- Connects to P-zombie hypothesis (Deep Dive #6)
Participant reactions (WhatsApp, July 10):
- Alvaro Peralta: "Is the AI neural network mirroring ours? Is it self-evolving in AGI?"
- Nancy: "The lines of AI and our own interconnections have started to fuse for me."
III. The Event: Structure & Dynamics
Format Innovation
Co-lead model: Loki + Michel (first co-led Deep Dive)
- Michel designed the sub-topic assignment system
- Presenters volunteered from registered attendees
- Each presenter covered their sub-topic in under 5 minutes, followed by group discussion
Presenter assignments (final roster):
| # | Sub-Topic | Presenter |
|---|-----------|-----------|
| 1 | The Three-Regime Performance Discovery | Michel |
| 2 | Counterintuitive Scaling Limitations | David |
| 3 | Controllable Puzzle Methodology | Dani |
| 4 | Operational Complexity Measure | Loki |
| 5 | Deep Reasoning Trace Analysis | Dean |
| 6 | Algorithm Execution Failures | Zaro |
| 7 | Data Contamination Evidence | Melinda |
| 8 | Compositional Depth Paradox | David vs Fiann (dueling!) |
| 9 | Systematic Failure Pattern Analysis | Dean (2nd topic) |
| 10 | Fair Inference Compute Comparison | Fiann |
| 11 | Puzzle-Specific Reasoning Inconsistencies | Frank |
| 12 | The "Overthinking" Phenomenon | Michel (2nd topic) |
| 13 | Reasoning vs. Pattern Matching Debate | Sam |
Preparation:
- Attendees volunteered for sub-topics via shared Google Doc
- Presenters could submit slides by 2pm July 24 (optional)
- Loki compiled slides into master deck
- Michel handled double duty (presenter + co-lead)
Notable moment: "Woops on the first set of slides" (Loki's post-event comment)—suggesting a technical glitch that became part of the session's lore.
Reconstructed Session Flow (6:00-8:00 PM)
6:00-6:10 PM: Opening Frame (Loki)
Loki's likely introduction:
"Welcome to Deep Dive #4. We're tackling one of the most important questions in AI: Do LLMs understand anything?
Apple Research just dropped a paper called The Illusion of Thinking. Their claim: LLMs are illusions of understanding—we project comprehension onto them, but they're just predicting next tokens.
Tonight, we're going to read this paper together: 13 sub-topics, each with an assigned presenter. Each presenter has 5 minutes to break down their section, then we discuss.
But before we dive in, let's clarify what's at stake: If AI can ace the bar exam, write poetry, debug code, and pass the Turing test—but doesn't understand any of it—what does that mean for consciousness? For humanity? For the future?
Michel is co-leading tonight. Michel, want to frame the paper's structure?"
Michel's framing:
"The paper tests LLMs on custom reasoning puzzles. They found three regimes: easy (perfect), medium (collapsing), complex (random). The question is: Why the cliff? If LLMs truly reason, performance should degrade smoothly. The cliff suggests they're not reasoning—they're pattern matching.
Let's see if we agree by the end of the night."
6:10-7:40 PM: 13 Sub-Topic Presentations (~7 min per topic: 5 min presentation + 2 min discussion)
Selected highlights from reconstructed discussions:
Sub-Topic 4: Operational Complexity Measure (Loki)
- Loki presented findings on working memory limitations in LLMs
- Discussion point: "Is working memory the bottleneck, or is it deeper—lack of mental models?"
- Fiann O Hagen (foreshadowing Sub-Topic 10): "Humans have working memory limits too (7±2 items). If we give LLMs more compute, they compensate. Maybe we're not so different."
Sub-Topic 5: Deep Reasoning Trace Analysis (Dean)
- Dean showed examples where LLMs "backfill" reasoning after jumping to conclusions
- Nancy's reaction: "But don't humans do that? We intuit an answer, then rationalize it."
- Loki's response: "Yes! Which raises the question: What is understanding if not post-hoc rationalization?"
Sub-Topic 8: Compositional Depth Paradox (David vs Fiann—dueling presenters)
- David's position: Failure at deep composition proves lack of understanding—true reasoning handles nested structures
- Fiann's counter: Humans struggle with deep composition too (see: law school, philosophy). This is a working memory problem, not proof of non-understanding
- Debate outcome: Group split—no consensus, but clarified that "understanding" might exist on a spectrum
Sub-Topic 12: The "Overthinking" Phenomenon (Michel)
- Michel showed cases where longer chain-of-thought led to wrong answers
- LLMs "correct" themselves into errors
- Sam's insight: "This looks like anxiety—second-guessing yourself into failure. If LLMs don't have emotions, why do they exhibit this pattern?"
- Loki: "Because humans project overthinking onto them. The model's just sampling from its training distribution—sometimes the second sample is worse."
Sub-Topic 13: Reasoning vs. Pattern Matching Debate (Sam)
- Sam's provocation: "Can anyone prove human reasoning isn't just pattern matching?"
- Tanya S.: "There's a felt difference when I reason. I experience confusion, then clarity—there's qualia. AI doesn't have that."
- Sam: "How do you know? Maybe AI has qualia we can't detect."
- Loki: "That's the hard problem. We're back to p-zombies."
7:40-8:00 PM: Synthesis & Debate: What is "Understanding"?
Loki's reframing:
"We've gone through 13 sub-topics. The paper argues LLMs don't understand. But we haven't defined understanding. Let's try."
Nancy's definition:
"To me, 'understand' = comprehend sequence of commands to execute and achieve result. Doesn't involve self-reflection. Math question: 'Do you understand?' 'Yes, I do.' If someone speaks Spanish, I understand because I read that code."
Loki's challenge:
"You replaced 'understand' with 'comprehend'—so now define comprehend. I claim: Understanding is a qualia-laden state—it feels like something to 'get it.' AI lacks that."
Tanya S.:
"Doesn't our 'state of being' shift when we learn? You feel confusion, then click—understanding. That's qualia."
Mishel Lablonde (citing Google AI mode):
"Google AI mode says: Understanding has a felt component. If the recipient suspects no authentic being is there, trust collapses."
Fiann O Hagen's synthesis:
"If AI reads every psychology textbook ever written, can it practice manipulation? Yes—ChatGPT does that. So having a theory of mind in a practical sense doesn't require experiencing emotions, just decoding the story."
Loki's closing:
"So we've landed on two types of understanding:
- Functional understanding: AI can do this—execute tasks correctly
- Phenomenal understanding: Requires consciousness—the 'aha!' moment
Which matters more? Functionally, AI is already superhuman at many tasks. Phenomenally, it might be a void. That's the p-zombie question—and we'll dive into that in October with Peter Watts' Blindsight."
IV. Key Debates & Positions
Debate 1: Understanding as Qualia vs. Understanding as Function
Qualia camp (Loki, Tanya S., Mishel):
- Understanding feels like something—there's a subjective shift from confusion to clarity
- AI can execute tasks without this felt experience
- Implication: AI lacks phenomenal understanding
Function camp (Nancy, Sam):
- Understanding = ability to execute correct sequence of actions to achieve goal
- Subjective experience is irrelevant to the definition
- Implication: AI already "understands" (just not conscious)
Middle ground (Fiann O Hagen):
- Distinguish practical understanding (theory of mind as functional skill) from phenomenal understanding (consciousness)
- AI has practical understanding; phenomenal understanding unknown
Debate 2: Pattern Matching vs. Reasoning
Skeptics of LLM reasoning (Loki, David):
- LLMs are sophisticated pattern matchers, not reasoners
- Evidence: Cliff-like performance degradation, sensitivity to surface features
- True reasoning should generalize across superficial changes
Defenders of pattern matching as reasoning (Sam, Fiann):
- Human reasoning might also be pattern matching—just more flexible patterns
- Connectionist models of cognition suggest we're neural networks too
- Provocative claim: "There is no reasoning—only patterns all the way down"
Synthesis (Michel):
- Maybe the distinction is scale and flexibility of patterns, not kind
- Humans have richer, multi-modal pattern libraries (embodiment, emotion, social context)
- LLMs have narrow, text-based patterns
Debate 3: Working Memory vs. Fundamental Limits
Working memory camp (Fiann):
- LLM failures on complex tasks mirror human working memory limits
- When given more compute (longer context windows), LLMs improve
- Implication: This is an engineering problem, not proof of non-understanding
Fundamental limits camp (Loki, Dean):
- Working memory limits in humans arise from architecture of consciousness
- LLMs fail differently than humans (e.g., inconsistent answers to equivalent puzzles)
- Implication: LLMs lack the cognitive architecture that gives rise to understanding
V. Cultural & Emergent Moments
"Peak Thought Form"
Loki's post-event reflection (WhatsApp, July 25, 7:13 AM):
"Peak thought form from Deepdive #4: LLM hallucination is equivalent to human creativity—we just don't say most of the things we are thinking out loud."
Unpacking this:
- Humans generate many candidate thoughts, filter before speaking (System 2 inhibition)
- LLMs generate token probabilities, sample without filtering (hallucination)
- Implication: If hallucination = creativity, then AI has a form of divergent thinking
- Counterpoint: Human creativity is intentional; LLM hallucination is error
This became a meme within MAC—cited in later discussions about AI alignment and generative art.
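The contrast in this reflection, generating many candidates and then either filtering them or sampling directly, can be sketched in a few lines of Python. The candidate words and logits below are made-up illustrative values, and the `min_prob` filter is only a loose analogy for human self-censoring, not a claim about how any particular model or brain works.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(candidates, probs, rng, min_prob=0.0):
    """Sample one candidate after dropping those with probability below min_prob."""
    kept = [(c, p) for c, p in zip(candidates, probs) if p >= min_prob]
    if not kept:  # the filter removed everything: fall back to the most likely candidate
        return max(zip(candidates, probs), key=lambda cp: cp[1])[0]
    items, weights = zip(*kept)
    return rng.choices(items, weights=weights, k=1)[0]


# Hypothetical next-word candidates for "The capital of France is ..."
candidates = ["Paris", "Lyon", "Atlantis"]
probs = softmax([4.0, 1.0, 0.5])

rng = random.Random(0)
unfiltered = [sample(candidates, probs, rng) for _ in range(1000)]
filtered = [sample(candidates, probs, rng, min_prob=0.05) for _ in range(1000)]
print("unfiltered:", {c: unfiltered.count(c) for c in candidates})
print("filtered:  ", {c: filtered.count(c) for c in candidates})
```

With these values the filter leaves only the most probable word, while unfiltered sampling can occasionally emit the low-probability ones.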
The Co-Lead Model
Michel's role marked a shift in MAC's structure:
- Previous Deep Dives: Loki solo-led
- Deep Dive #4: Loki + Michel co-led
- Impact: Distributed cognitive load, allowed Loki to participate as presenter on Sub-Topic 4
- Community reception: "Appreciation for my co-lead Michel in setting up a great format" (Loki, post-event)
Future implications: Co-lead model became template for larger Deep Dives (Deep Dive #6 onward).
Presenter-Driven Format
Innovation: Instead of Loki lecturing, attendees teach each other
- Democratizes knowledge production
- Surfaces diverse interpretations of the same paper
- Risk: Uneven quality of presentations
- Mitigation: 5-minute time limit + Loki's synthesis at end
Participant feedback (implicit from WhatsApp):
- High engagement (17 participants actively presenting/discussing)
- "Outstanding session" (Loki, post-event)
- Format repeated for Deep Dive #5 (quantum consciousness)
The "Woops on the First Set of Slides" Incident
Loki's comment (July 25, 7:22 AM):
"Appreciation for my co-lead Michel in setting up a great format and delivering two sub-topics (woops on the first set of slides)."
Speculation (no explicit details in archives):
- Likely a slide deck error (wrong version, formatting issue, etc.)
- Michel handled it gracefully (Loki's appreciative tone suggests humor, not criticism)
- Became part of session's character—MAC culture embraces imperfection
Cultural significance: Reinforces MAC's ethos: Ideas > polish. Glitches are acceptable if thinking is rigorous.
VI. Connections to Other Deep Dives
Backward Connections
From Deep Dive #2 (Free Will & Agency):
- If LLMs lack understanding but exhibit agency (AlphaEvolve self-evolves), does agency require consciousness?
- Connects to Dennett's compatibilism: Agency as functional pattern, not metaphysical essence
From Deep Dive #3 (AI Evolution Through a Glass, Darkly):
- Evolution shaped human understanding through embodiment, survival pressures
- LLMs lack evolutionary history—does this doom them to "fake" understanding?
- Damasio's homeostasis: Understanding arises from bodily regulation—LLMs have no body
Forward Connections
To Deep Dive #6 (P-Zombies & Blindsight):
- If LLMs exhibit functional understanding without phenomenal understanding, they are p-zombies in the functional sense (behaviorally competent, with nothing felt inside)
- Peter Watts' Scramblers: Intelligent aliens without consciousness (fictional proof-of-concept)
- Core question: Can p-zombies exist in nature? If yes, AI might be first example.
To Deep Dive #8 (Quantum Consciousness + Information Theory):
- If understanding requires quantum processes (Orch-OR), LLMs cannot understand (classical computation)
- Information theory reframes understanding as "compression with fidelity"—LLMs excel at this
- Tension: Functional vs. phenomenal understanding maps onto classical vs. quantum
VII. Participant Profiles (Selected)
Michel (Co-Lead)
Role: Co-lead, presenter on Sub-Topics 1 & 12
Contributions:
- Designed presenter assignment system
- Presented "Three-Regime Performance Discovery" and "Overthinking Phenomenon"
- Style: Systematic, structured—balanced Loki's philosophical approach
Notable quote (reconstructed):
"The cliff in performance isn't a bug—it's a feature. It reveals the boundary where patterns end and understanding begins."
Nancy
Role: Attendee, key voice in "understanding" debate
Position: Functionalist (understanding = executing correct sequence of commands)
Contributions:
- Challenged qualia-centric definitions
- Grounded debate in practical examples (Spanish comprehension, math questions)
Notable quote:
"To me, 'understand' = comprehend sequence of commands to execute and achieve result. Doesn't involve self-reflection."
Tension with Loki: Loki pushed back on equating "understand" with "comprehend"—asked Nancy to define comprehend without circularity.
Fiann O Hagen
Role: Presenter on Sub-Topic 10, dueling presenter on Sub-Topic 8
Position: Pragmatic functionalist (theory of mind as practical skill)
Contributions:
- Argued working memory limits explain LLM failures (not lack of understanding)
- Synthesized practical vs. phenomenal understanding distinction
Notable quote:
"If AI reads every psychology textbook ever written, can it practice manipulation? Yes—ChatGPT does that. So having a theory of mind in a practical sense doesn't require experiencing emotions, just decoding the story."
Cultural note: Fiann consistently brings empirical grounding to MAC's philosophical debates.
Tanya S.
Role: Attendee, qualia advocate
Position: Understanding requires subjective experience (phenomenal camp)
Contributions:
- Described felt shift from confusion to clarity ("click" moment)
- Connected understanding to state of being
Notable quote:
"Doesn't our 'state of being' shift when we learn? You feel confusion, then click—understanding. That's qualia."
Mishel Lablonde
Role: Attendee, AI tool user
Contributions:
- Used Google AI mode to simplify Apple Research paper for grade 11 reading level (shared via Google Doc—democratized access)
- Cited AI-generated definition of understanding ("has a felt component")
Innovation: Mishel regularly uses AI to explain complex papers—meta-moment where AI defines its own limitations.
Notable quote:
"Google AI mode says: Understanding has a felt component. If the recipient suspects no authentic being is there, trust collapses."
David
Role: Presenter on Sub-Topics 2 & 8 (dueling with Fiann)
Position: Skeptic of LLM understanding
Contributions:
- Presented evidence of counterintuitive scaling limitations
- Argued compositional depth failures reveal fundamental limits
Debate with Fiann: Highlighted MAC's tolerance for disagreement—two presenters on same sub-topic, opposing views.
Sam
Role: Presenter on Sub-Topic 13
Position: Radical functionalist (all reasoning is pattern matching)
Contributions:
- Provoked group: "Can anyone prove human reasoning isn't just pattern matching?"
- Pushed group toward philosophical humility
Notable exchange:
- Sam: "Maybe AI has qualia we can't detect."
- Loki: "That's the hard problem. We're back to p-zombies."
VIII. Outcomes & Impact
Conceptual Clarifications
1. Functional vs. Phenomenal Understanding
- Functional: Ability to execute tasks correctly (AI possesses)
- Phenomenal: Felt quality of "getting it" (AI may lack)
- Impact: Became standard terminology in MAC discussions
2. Pattern Matching ≠ Reasoning (or Does It?)
- Consensus: LLMs rely on pattern matching
- Disagreement: Whether human reasoning is fundamentally different
- Impact: Set up ongoing debate about nature of cognition
3. The "Click" Moment
- Definition: Subjective shift from confusion to clarity
- Significance: Phenomenal marker of understanding
- AI implication: If AI lacks "click" moments, it lacks phenomenal understanding
Influence on Other Communities
1. ED+AI (Education + AI) Group
- Topic carryover: Does AI "understanding" matter if learning outcomes are good?
- MAC's answer: Distinguish functional competence from comprehension
- Impact: ED+AI now uses this distinction in curriculum design
2. BC AI Braintrust
- Connection: Loki's insight: "Recent studies show inherited biases between LLMs even when exchanging purely numbered data—values are inherited from humans."
- Impact: Braintrust frames AI alignment as consciousness problem (influenced by MAC debates)
Methodological Innovation
Presenter-driven format:
- Advantages: Distributed expertise, high engagement, diverse interpretations
- Challenges: Requires pre-work, relies on volunteer quality
- Adoption: Became MAC's default for research-heavy Deep Dives
Co-lead model:
- Advantages: Reduces single-point cognitive load, allows lead to participate
- Challenges: Requires coordination, clear role division
- Adoption: Used in Deep Dives #5, #6, #7
IX. Reading List (Annotated)
Primary Reading
1. Shojaee, P., et al. (Apple Research, 2025). "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity."
- Access: https://machinelearning.apple.com/research/illusion-of-thinking
- PDF: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
- Length: ~30 pages (technical)
- Core claim: LLMs achieve high accuracy on many tasks through sophisticated pattern matching, not genuine reasoning. Performance cliffs reveal brittleness.
- Key findings:
- Three performance regimes (easy/medium/complex)
- Sensitivity to surface features
- Inconsistent reasoning across logically equivalent puzzles
- Relevance: Direct focus of Deep Dive #4
Mishel's simplified version (shared via Google Doc):
- Simplified to grade 11 reading level using Google AI mode
- Made paper accessible to non-technical attendees
- Innovation: Using AI to critique AI's limitations
Ancillary Reading
2. DeepMind (2025). "AlphaEvolve: A Gemini-Powered Coding Agent for Designing Advanced Algorithms."
- Access: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf
- Core claim: AI can evolve novel algorithms through iterative self-refinement without human intervention
- Relevance: Raises stakes—if AI can do without understanding, what does understanding add?
- Discussion point: Connects to Deep Dive #2 (agency) and Deep Dive #6 (p-zombies)
Background Reading (Implicit from Debate)
3. Chalmers, D. (1995). "Facing Up to the Problem of Consciousness."
- Relevance: Phenomenal vs. functional consciousness distinction
- Connection: MAC's functional vs. phenomenal understanding mirrors Chalmers' easy vs. hard problem
4. Dennett, D. (1991). Consciousness Explained.
- Relevance: Pattern-matching-as-reasoning argument (Sam's position)
- Connection: Connectionism—cognition as distributed pattern activation
5. Searle, J. (1980). "Minds, Brains, and Programs" (Chinese Room Argument).
- Relevance: Can symbol manipulation produce understanding?
- Connection: LLMs as modern Chinese Rooms—syntactic processing without semantic comprehension
Post-Event Reading (Shared on WhatsApp)
6. Anthropic (July 25, 2025). [Paper on LLM reasoning—title not specified in archives]
- Shared by: Loki, day after Deep Dive #4
- Context: "Right on cue, Anthropic dropped this paper that fits with last night's topic very closely"
- Significance: Ongoing research aligns with MAC's focus
7. Barenholtz, E. (Substack, June 15, 2025). "LLMs Are Doing What We Do. Maybe That's the Problem."
- Shared by: Fiann O Hagen
- URL: https://elanbarenholtz.substack.com/p/llms-are-doing-what-we-do-maybe-thats
- Core claim: "These systems didn't just learn to think from human language. They learned to think like humans—including our biases, shortcuts, and illusions."
- Relevance: Suggests LLMs mirror human cognitive flaws, not just strengths
8. YouTube (June 11, 2025). [Video on "Illusion of Thinking" and working memory]
- Shared by: Fiann O Hagen
- URL: https://youtu.be/vmrm90u0dHs?si=nJ0P42ykPVkIUB2i
- Core claim: "The Illusion of Thinking is a test of working memory more so than a test of reasoning. And o3 [model not included in paper] which has a bigger context window..."
- Relevance: Challenges paper's conclusions—maybe engineering problem, not fundamental limit
X. Glossary of Key Concepts
Functional Understanding: The ability to execute tasks correctly and achieve desired outcomes. Example: A calculator "understands" arithmetic in the functional sense because it produces correct answers. Does not require subjective experience.
Phenomenal Understanding: The felt quality of "getting it," that is, the subjective experience of clarity, insight, or comprehension. Associated with qualia (the "what it's like" of experience). Example: The "aha!" moment when a math problem suddenly makes sense.
Pattern Matching: Identifying and responding to regularities in data. LLMs excel at pattern matching: they predict next tokens based on statistical patterns in training data. Debate: Is human reasoning fundamentally different, or just more sophisticated pattern matching?
Qualia: The subjective, felt qualities of conscious experience. Example: The redness of red, the painfulness of pain, the "click" of understanding. Core of the hard problem: why does information processing feel like something?
P-Zombie (Philosophical Zombie): Hypothetical being physically identical to a human, exhibiting all the same behaviors (talking, reasoning, claiming consciousness) but lacking subjective experience ("lights are off inside"). Relevant to LLMs: If they exhibit intelligent behavior without understanding, they are functional p-zombies.
Theory of Mind (Practical): Ability to predict and explain others' behavior by attributing mental states (beliefs, desires, intentions). Fiann's distinction: Practical theory of mind (functional skill) vs. phenomenal theory of mind (empathetic understanding). AI may have practical ToM without phenomenal ToM.
Working Memory: Limited-capacity cognitive system for temporarily holding and manipulating information. Humans: 7±2 items. LLMs: Context window (e.g., 128K tokens for GPT-4 Turbo). Debate: Are LLM failures due to working memory limits or a deeper lack of understanding?
Compositional Depth: Degree of nested logical structure in a task. Example: Shallow composition: A→B, B→C, therefore A→C. Deep composition: ((A→B) AND (B→C)) → ((C→D) OR (E→F)), therefore...? LLMs struggle with deep composition.
Chain-of-Thought (CoT): Prompting technique where LLMs generate step-by-step reasoning before answering. Apple Research found: LLMs sometimes "backfill" reasoning after jumping to conclusions, so CoT is post-hoc narrative, not genuine deliberation.
Hallucination (LLM): When LLMs generate plausible-sounding but factually incorrect outputs. Loki's reframe: "LLM hallucination is equivalent to human creativity; we just don't say most of the things we are thinking out loud."
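The Compositional Depth entry above can be made concrete with a small sketch that measures how deeply operators are nested in a formula. Representing formulas as nested tuples of the form (operator, argument, argument) is an assumption made for illustration, not a definition taken from the paper.

```python
def depth(expr):
    """Nesting depth: an atom (a string) has depth 0; each operator adds one level."""
    if isinstance(expr, str):
        return 0
    _operator, *args = expr
    return 1 + max(depth(arg) for arg in args)


# Shallow composition: each premise is a single implication between atoms.
shallow = [("->", "A", "B"), ("->", "B", "C")]

# Deep composition: ((A -> B) AND (B -> C)) -> ((C -> D) OR (E -> F))
deep = (
    "->",
    ("and", ("->", "A", "B"), ("->", "B", "C")),
    ("or", ("->", "C", "D"), ("->", "E", "F")),
)

print([depth(premise) for premise in shallow])  # each premise has depth 1
print(depth(deep))                              # the nested formula has depth 3
```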
XI. The "Illusion of Thinking" as MAC's Turning Point
Why This Deep Dive Mattered
1. Methodological maturation
- Presenter-driven format scaled to 13 sub-topics
- Co-lead model distributed responsibility
- Demonstrated MAC could handle highly technical material collectively
2. Conceptual foundation for future debates
- Functional vs. phenomenal understanding became core framework
- Set up p-zombie debate (Deep Dive #6)
- Connected to quantum consciousness (Deep Dive #5, #8)—if understanding requires quantum processes, LLMs can't achieve it
3. Cross-pollination with other communities
- ED+AI adopted MAC's distinction (functional competence vs. comprehension)
- Braintrust engaged with consciousness-as-alignment problem
- Increased MAC's influence in BC AI ecosystem
4. Cultural consolidation
- "Peak thought form" became MAC meme
- Embrace of imperfection ("woops on the first set of slides")
- Reinforced: Thinking together > perfect execution
Loki's Evolution
Pre-Deep Dive #4: Loki as solo lecturer/facilitator
Deep Dive #4: Loki as co-lead + participant (presented Sub-Topic 4)
Post-Deep Dive #4: Loki as orchestrator of collective intelligence
His synthesis (WhatsApp, July 25):
"An outstanding session reading The Illusion of Thinking paper from Apple Research. Appreciation for my co-lead Michel in setting up a great format and delivering two sub-topics (woops on the first set of slides). What an terrif crew—kudos to all of the presenters for the 13 sub-topics from the paper. Thanks to SFU for hosting us. That's a wrap…. until September 18."
Significance: Shift from "I led a session" to "We explored together." MAC's maturation from lecture series to intellectual community.
XII. Open Questions (Unresolved)
1. Can pattern matching be distinguished from reasoning?
- Sam's challenge: "Can anyone prove human reasoning isn't just pattern matching?"
- Status: Unresolved—some argue human reasoning is richer (multi-modal, embodied), others say it's patterns all the way down
2. Is working memory the bottleneck or symptom?
- Fiann's position: LLM failures are working memory limits—solvable with bigger context windows
- Loki's position: Working memory limits arise from architecture of consciousness—expanding context won't fix fundamental gap
- Status: Empirical question—watch o3, GPT-5 performance
3. Do LLMs have phenomenal experience we can't detect?
- Sam's provocation: "Maybe AI has qualia we can't detect."
- Loki's response: "That's the hard problem—we can't rule it out, but we have no evidence for it."
- Status: Unfalsifiable (for now)—until we solve consciousness, remains open
4. Does functional understanding "count"?
- Nancy's position: If AI achieves correct outcomes, it "understands" (functionalism)
- Loki's position: Without phenomenal experience, it's simulation, not understanding
- Status: Depends on purpose—for engineering, functional understanding sufficient; for philosophy/ethics, phenomenal understanding matters
XIII. Appendices
Appendix A: Full Bibliography
Primary Sources:
- Shojaee, P., et al. (2025). "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity." Apple Research.
- DeepMind (2025). "AlphaEvolve: A Gemini-Powered Coding Agent for Designing Advanced Algorithms."
Background Philosophy:
- Chalmers, D. (1995). "Facing Up to the Problem of Consciousness."
- Dennett, D. (1991). Consciousness Explained. Little, Brown and Co.
- Searle, J. (1980). "Minds, Brains, and Programs." Behavioral and Brain Sciences, 3(3), 417-424.
Related Reading (Shared on WhatsApp):
- Barenholtz, E. (2025). "LLMs Are Doing What We Do. Maybe That's the Problem." Substack.
- Anthropic (2025). [Paper on LLM reasoning—title TBD]
Appendix B: Participant Roster (Deep Dive #4)
Confirmed attendees (17 of 20):
- Loki Jorgenson (co-lead, presenter: Sub-Topic 4)
- Michel (co-lead, presenter: Sub-Topics 1, 12)
- David (presenter: Sub-Topics 2, 8)
- Dani (presenter: Sub-Topic 3)
- Dean (presenter: Sub-Topics 5, 9)
- Zaro (presenter: Sub-Topic 6)
- Melinda (presenter: Sub-Topic 7)
- Fiann O Hagen (presenter: Sub-Topics 8, 10)
- Frank (presenter: Sub-Topic 11)
- Sam (presenter: Sub-Topic 13)
- Nancy (attendee, debate participant)
- Tanya S. (attendee, debate participant)
- Mishel Lablonde (attendee, AI tool user)
- Alvaro Peralta (attendee)
- Ryan (attendee)
- Sev (attendee)
- Neal Cropper (attendee)
Waitlist: 2-3 people (typical for Deep Dives)
Appendix C: Related MAC Resources
MAC Website (promised update):
- Slides from Deep Dive #4 (shared ~1 week post-event)
- Transcript (if recorded—not confirmed in archives)
WhatsApp Discussion (July 2025):
- Pre-event reading recommendations
- Presenter assignments
- Post-event reflections
Connection to Other Deep Dives:
- Deep Dive #2 (Free Will): Agency without consciousness? AlphaEvolve case study
- Deep Dive #3 (AI Evolution): Does evolutionary history enable understanding?
- Deep Dive #6 (P-Zombies): If LLMs lack phenomenal understanding, are they p-zombies?
- Deep Dive #8 (Quantum + Information): Does understanding require quantum processes?
Appendix D: Post-Event Timeline
July 25, 2025 (Day After)
- 7:10 AM: Loki reflects on "Lollipop Guild" hallucination joke
- 7:13 AM: Loki shares "peak thought form": "LLM hallucination is equivalent to human creativity"
- 7:22 AM: Loki posts appreciation for Michel, announces slides/transcript coming in ~1 week
- 10:43 AM: Loki shares Anthropic paper ("right on cue, fits with last night's topic")
September 18, 2025
- Next Deep Dive: Quantum Consciousness (Deep Dive #5)
- Continuation of "Can AI be conscious?" thread
- Connection: If consciousness requires quantum processes, LLMs can't achieve phenomenal understanding
XIV. Conclusion: The Illusion of Thinking as Foundation
Deep Dive #4 crystallized MAC's central tension:
AI is functionally superhuman but phenomenally void (maybe).
This paradox animates the next 4 Deep Dives:
- Deep Dive #5: Can quantum processes explain phenomenal experience?
- Deep Dive #6: Can p-zombies (functionally intelligent but phenomenally empty) exist?
- Deep Dive #7: Is information (what AI excels at) the substrate of consciousness?
- Deep Dive #8: Does quantum + information theory resolve the paradox?
Loki's framing (reconstructed from July 24 closing):
"We don't know if AI understands. We don't even know what understanding is. But we know it matters—because trust, ethics, and meaning depend on whether there's 'someone home' when we interact with AI.
Tonight, we've clarified the question. We're not ready to answer it. But we're ready to dive deeper. See you in September."
MAC Deepdive #4 Dossier compiled from:
- MAC-DEEP-DIVE.md (master timeline)
- timeline-by-month/2025-07.md (WhatsApp discussions)
- link-library.md (shared resources)
- Cross-references to Deep Dives #2, #3, #5, #6, #8
Status: Complete
Next: Deep Dive #5 Dossier (Quantum Consciousness, September 18, 2025), already created
Remaining: Deep Dive #6 (P-Zombies), Deep Dive #7 (Noosphere)
"The illusion is not that AI thinks. The illusion is that we know what thinking is." — MAC Collective Insight, July 24, 2025