AI/ML Reading List

Search:

Type:

Difficulty:

Essential only

Part 1: Getting Started
§1 Math & CS Foundations 11
★ Essence of Linear Algebra — 3Blue1Brown [course]

Visual intuition for vectors, matrices, eigenvalues
Neural Networks — 3Blue1Brown [course]

Visual intro to how neural nets work
Statistics & Probability — Khan Academy [course]

Distributions, Bayes' theorem, hypothesis testing
Mathematics for Machine Learning — Deisenroth et al. [book]

Free textbook---linear algebra, calculus, probability
CS231n Notes — Stanford [course]

Practical neural net fundamentals
★ Practical Deep Learning for Coders — fast.ai [course]

Top-down, code-first approach
How Transformer LLMs Work — Alammar & Grootendorst [course]

95-min course: tokenization, attention, MoE
Python for ML
Python Tutorial [documentation]

Official, if you need basics
NumPy Quickstart [documentation]

Array operations
Pandas Getting Started [documentation]

Data manipulation
PyTorch 60-Minute Blitz [documentation]

Tensors, autograd, training
Part 2: Understanding AI
§2 Foundations (The Canon) 9
★ Learning Representations by Back-Propagating Errors — Rumelhart, Hinton, Williams (1986) [paper]

How neural nets learn. Everything builds on this.
★ Efficient Estimation of Word Representations in Vector Space — Mikolov et al. (2013) [paper]

Word2vec. 'King - Man + Woman = Queen'
GloVe: Global Vectors for Word Representation — Pennington et al. (2014) [paper]

Alternative embeddings, co-occurrence based
Sequence to Sequence Learning — Sutskever et al. (2014) [paper]

Encoder-decoder architecture
ImageNet Classification with Deep CNNs — Krizhevsky et al. (2012) [paper]

ImageNet moment---deep learning's 'big bang'
Deep Residual Learning — He et al. (2015) [paper]

Skip connections, enabled very deep networks
Batch Normalization — Ioffe & Szegedy (2015) [paper]

Training stability trick used everywhere
Dropout — Srivastava et al. (2014) [paper]

Regularization that actually works
Adam: A Method for Stochastic Optimization — Kingma & Ba (2014) [paper]

The default optimizer
§3 Attention & Transformers 10
Neural Machine Translation by Jointly Learning to Align and Translate — Bahdanau et al. (2014) [paper]

Invented attention mechanism
★ Attention Is All You Need — Vaswani et al. (2017) [paper]

Transformers. The architecture. Read carefully.
BERT: Pre-training of Deep Bidirectional Transformers — Devlin et al. (2018) [paper]

Bidirectional pretraining, MLM objective
Improving Language Understanding by Generative Pre-Training — Radford et al. (2018) [paper]

GPT---autoregressive pretraining
Language Models are Unsupervised Multitask Learners — Radford et al. (2019) [paper]

GPT-2, scaling
★ Language Models are Few-Shot Learners — Brown et al. (2020) [paper]

GPT-3, in-context learning emerges at scale
Scaling Laws for Neural Language Models — Kaplan et al. (2020) [paper]

Chinchilla precursor, loss vs. compute/data/params
★ Training Compute-Optimal Large Language Models — Hoffmann et al. (2022) [paper]

Chinchilla---optimal scaling ratios
LLaMA: Open and Efficient Foundation Language Models — Touvron et al. (2023) [paper]

Open weights, efficient training
FlashAttention — Dao et al. (2022) [paper]

IO-aware attention, practical speedup
§4 Reasoning & Chain-of-Thought 16
★ Chain-of-Thought Prompting Elicits Reasoning — Wei et al. (2022) [paper]

'Let's think step by step' works
Self-Consistency Improves Chain of Thought Reasoning — Wang et al. (2022) [paper]

Sample multiple CoT paths, majority vote
Tree of Thoughts — Yao et al. (2023) [paper]

Search over reasoning paths
★ ReAct: Synergizing Reasoning and Acting — Yao et al. (2022) [paper]

Reasoning + Acting, tool use
Toolformer — Schick et al. (2023) [paper]

LLMs learning to use tools
Let's Verify Step by Step — Lightman et al. (2023) [paper]

Process reward models for math
MAKER: Solving a Million-Step LLM Task (2025) [paper]

Ensemble voting for long-horizon reliability
The Prompt Report — Schulhoff et al. (2024) [paper]

58 prompting techniques, taxonomy, best practices
Let Me Speak Freely? — Tam et al. (2024) [paper]

Structured output (JSON/XML) degrades reasoning
Thinking Before Constraining — Nguyen et al. (2026) [paper]

Fix: reason freely, then constrain output format
XML Prompting as Grammar-Constrained Interaction — Alpay & Alpay (2025) [paper]

Formal framework for XML-based structured prompting with convergence guarantees
SLOT: Structuring the Output of Large Language Models — Shen et al. (2025) [paper]

Fine-tuned lightweight model as post-processing layer for structured output
StructuredRAG: JSON Response Formatting with Large Language Models — Shorten et al. (2024) [paper]

Benchmark for structured output reliability across tasks and models
Outlines: Structured Text Generation — dottxt (2024) [resource]

Grammar-based constrained decoding. Guarantees valid JSON/regex output by constraining token sampling.
Guidance: A Language for Controlling LLMs — Microsoft (2023) [resource]

Interleaves generation with programmatic control. Constrains output structure without taxing model attention.
Prompt Repetition Improves Non-Reasoning LLMs — Leviathan, Kalman, Matias (2025) [paper]

Repeating the input prompt improves performance without increasing output tokens or latency. Reasoning models already learn to do this internally.
Part 3: Building with AI
§5 RAG & Retrieval 19
★ RAG for LLMs: A Survey — Gao et al. (2023) [paper]

Start here. Naive -> Advanced -> Modular RAG paradigms
★ Pinecone RAG Guide [resource]

End-to-end walkthrough. Good second read after survey.
LangChain RAG Tutorial [resource]

Hands-on implementation with code
LlamaIndex RAG Docs [resource]

Concepts + implementation
RAG From Scratch — LangChain [video]

Video series for visual learners
★ Retrieval-Augmented Generation for Knowledge-Intensive NLP — Lewis et al. (2020) [paper]

Original RAG paper---foundational
★ Dense Passage Retrieval for Open-Domain QA — Karpukhin et al. (2020) [paper]

DPR---learned retrieval beats BM25
REALM: Retrieval-Augmented Language Model Pre-Training — Guu et al. (2020) [paper]

Retrieval-augmented pretraining
Precise Zero-Shot Dense Retrieval without Relevance Labels — Gao et al. (2022) [paper]

HyDE---hypothetical document embeddings
Query2doc: Query Expansion with Large Language Models — Wang, Yang, Wei (2023) [paper]

LLM generates pseudo-document for BM25. More conservative than HyDE.
Self-RAG: Learning to Retrieve, Generate, and Critique — Asai et al. (2023) [paper]

LLM decides when to retrieve
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval — Sarthi et al. (2024) [paper]

Recursive summarization for retrieval
Atlas: Few-shot Learning with Retrieval Augmented LMs — Izacard et al. (2022) [paper]

Few-shot learning with retrieval
★ From Local to Global: A Graph RAG Approach — Edge et al. (2024) [paper]

Microsoft's GraphRAG---community summaries for global queries
Graph Retrieval-Augmented Generation: A Survey — Li et al. (2024) [paper]

Formalizes GraphRAG taxonomy
Microsoft GraphRAG Project [resource]

Official project page
GraphRAG GitHub Repository [tool]

Official implementation
Awesome-GraphRAG [resource]

Curated papers/benchmarks
Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation — Belikova et al. (2026) [paper]

Studies compressibility limits for RAG: when compression erases task-relevant information. Proposes overflow detection method.
§6 Embeddings & Vector Search 6
★ Sentence-BERT — Reimers & Gurevych (2019) [paper]

Sentence embeddings that work
SimCSE: Simple Contrastive Learning of Sentence Embeddings — Gao et al. (2021) [paper]

Contrastive sentence embeddings
Text Embeddings by Weakly-Supervised Contrastive Pre-training — Wang et al. (2022) [paper]

E5---strong general-purpose embeddings
Nomic Embed (2024) [paper]

8K context embeddings
Matryoshka Representation Learning — Kusupati et al. (2022) [paper]

Truncatable embeddings
Efficient and Robust Approximate Nearest Neighbor Search — Malkov & Yashunin (2016) [paper]

HNSW---hierarchical navigable small world graphs
§7 Agents & Tool Use 13
ReAct: Synergizing Reasoning and Acting (2022) [paper]

Interleaved reasoning and acting
Toolformer: Language Models Can Teach Themselves to Use Tools (2023) [paper]

Self-taught tool use
Voyager: An Open-Ended Embodied Agent — Wang et al. (2023) [paper]

LLM agent in Minecraft, skill library
AutoGPT / BabyAGI (2023) [tool]

Autonomous agent architectures (read critically)
Generative Agents: Interactive Simulacra — Park et al. (2023) [paper]

'Smallville'---agents with memory
Language Agent Tree Search — Zhou et al. (2023) [paper]

LATS
Reflexion — Shinn et al. (2023) [paper]

Agents that learn from mistakes
World Models — Ha & Schmidhuber (2018) [paper]

Learn environment dynamics in latent space
Frameworks & Examples
LangChain Agents [documentation]

Tool use, ReAct implementation
LlamaIndex Agents [documentation]

Data agents
OpenAI Swarm [tool]

Lightweight multi-agent framework
AutoGen — Microsoft [tool]

Multi-agent conversations
★ Building Effective Agents — Anthropic [article]

Patterns and anti-patterns
§8 Evaluation & Benchmarks 14
Evaluating Language Models as Synthetic Data Generators — Kim et al. (2025) [paper]

AgoraBench: data generation ability doesn't correlate with problem-solving ability
★ MMLU [benchmark]

General knowledge across domains
HellaSwag [benchmark]

Commonsense reasoning
HumanEval [benchmark]

Code generation
GSM8K [benchmark]

Grade school math
MATH [benchmark]

Competition math
BIG-Bench [benchmark]

Diverse capabilities
TruthfulQA [benchmark]

Hallucination resistance
MT-Bench [benchmark]

Multi-turn conversation
Eval Tools
OpenAI Evals [tool]

Evaluation framework
RAGAS [tool]

RAG evaluation metrics
LangSmith [tool]

Tracing, debugging, evaluation
Braintrust [tool]

LLM eval platform
ragas GitHub [tool]

Faithfulness, relevance metrics
Part 4: Knowledge & Reasoning
§9 Knowledge Graphs + LLMs / Neuro-Symbolic 27
Neurosymbolic AI for Reasoning over Knowledge Graphs (2023) [paper]

Taxonomy of approaches
Neural-Symbolic Reasoning over KGs: A Query Perspective (2024) [paper]

Recent survey, LLM integration
Neuro-Symbolic AI in 2024: A Systematic Review (2025) [paper]

State of the field
KG-BERT — Yao et al. (2019) [paper]

BERT for knowledge graph completion
QA-GNN — Yasunaga et al. (2021) [paper]

GNN + LM for QA over KGs
GreaseLM — Zhang et al. (2022) [paper]

Fusing LMs and KGs for reasoning
★ Think-on-Graph — Sun et al. (2023) [paper]

LLM reasoning on KG structure
Reasoning on Graphs — Luo et al. (2024) [paper]

Reasoning on Graphs with LLMs
Symbolic AI in the Age of LLMs — Lassila, AWS re:Invent (2025) [video]

Practitioner perspective on hybrid systems
Inductive Programming Meets the Real World — Gulwani et al. (2015) [paper]

Program synthesis from examples; IP vs. ML comparison
Probabilistic Logic Programming
DeepProbLog (2018) [paper]

Neural predicates in ProbLog
Towards Probabilistic ILP with Neurosymbolic Inference (2024) [paper]

Learning logic programs
Statistical Relational Artificial Intelligence — De Raedt et al. (2016) [book]

Textbook---probabilistic logic
Hybrid / Neural-Symbolic Systems
Computational Architectures Integrating Neural and Symbolic Processes — Sun & Bookman, eds. (1994) [book]

Early integration approaches
Connectionist-Symbolic Integration — Sun & Alexandre, eds. (1997) [book]

Bridging paradigms
Hybrid Neural Systems — Wermter & Sun, eds. (2000) [book]

Springer collection
Neural-Symbolic Cognitive Reasoning — Garcez, Lamb & Gabbay (2009) [book]

Foundations of modern neuro-symbolic
Minsky & Frames
A Framework for Representing Knowledge — Minsky (1974) [paper]

Introduced frames---foundational for KR
Society of Mind — Minsky (1986) [book]

Agents as collections of simpler processes
The Emotion Machine — Minsky (2006) [book]

Commonsense reasoning, emotions in AI
Generic Frame Protocol [resource]

Standard for frame-based systems
Cybersecurity KG + RAG
CyKG-RAG — Kurniawan et al. (2024) [paper]

KG + vector search with query routing. Routes structured queries to SPARQL, semantic to embeddings.
AgCyRAG: Agentic KG-based RAG for Cybersecurity — Kurniawan et al. (2025) [paper]

Multiple agents adaptively select retrieval strategy (KG traversal vs vector search vs hybrid)
GraphCyRAG — PNNL (2024) [paper]

Neo4j KG traversal over CVE->CWE->CAPEC->ATT&CK. Graph traversal outperforms embedding search for vuln-to-technique mapping.
CTI-Thinker (2025) [paper]

LLM-driven CTI KG construction + GraphRAG reasoning engine for tactical inference
Ontology-Grounded RAG
★ OG-RAG: Ontology-Grounded Retrieval-Augmented Generation — Nadkarni et al. (2024) [paper]

Anchors retrieval in domain ontologies. +55% fact recall, +40% correctness, +27% reasoning accuracy vs baseline RAG. Key for Part 3.
Ontology Learning and KG Construction: Impact on RAG Performance — Reiz et al. (2024) [paper]

Compares vector RAG vs GraphRAG vs ontology-guided KG. GraphRAG + ontology-KG both hit 90% accuracy. Empirical grounding evidence.
§10 Search Engines & Information Retrieval 7
★ Introduction to Information Retrieval — Manning, Raghavan, Schütze (2008) [book]

Start here. Free online. Ch 1-8 cover essentials: inverted index, TF-IDF, evaluation
★ The Probabilistic Relevance Framework: BM25 and Beyond — Robertson & Zaragoza (2009) [paper]

BM25 is still the baseline. Understand this before neural approaches.
★ Pretrained Transformers for Text Ranking: BERT and Beyond — Lin et al. (2021) [paper]

Survey of neural IR. Good overview before diving into specific papers.
ColBERT: Efficient and Effective Passage Search — Khattab & Zaharia (2020) [paper]

Late interaction---practical for production neural search
Passage Re-ranking with BERT — Nogueira & Cho (2019) [paper]

Simple but effective. Good first neural IR paper to implement.
The Anatomy of a Large-Scale Hypertextual Web Search Engine — Brin & Page (1998) [paper]

Original Google paper. Historical interest, less relevant to RAG work.
Learning to Rank for Information Retrieval — Liu (2011) [book]

Deep dive on ranking ML. Reference, not first read.
§11 Semantics, Semiotics & Ontologies 40
Semiotics (Signs & Meaning)
★ Semiotics: The Basics — Daniel Chandler [book]

Start here. Accessible intro to Saussure, Peirce, Eco, and sign theory
★ Peirce's Theory of Signs — Stanford Encyclopedia of Philosophy [article]

Icon, index, symbol trichotomy. Free, authoritative reference
Course in General Linguistics — Saussure (1916) [book]

Signifier/signified distinction---foundational but dense
A Theory of Semiotics — Umberto Eco (1976) [book]

Classic text on sign systems---read after Chandler
Symbols and Grounding in Large Language Models — Mollo & Millière (2023) [paper]

Do LLMs ground symbols? Bridges semiotics and AI debate
AI: A Semiotic Perspective — Walsh Matthews & Danesi (2019) [article]

Survey of semiotics vs. AI: abduction, embodiment, Baudrillard, Peirce
The Main Tasks of a Semiotics of Artificial Intelligence — Massimo Leone (2023) [article]

AI as 'technology of fakery'---mimicry, generation, ideology
★ The Symbol Grounding Problem — Stevan Harnad (1990) [paper]

Foundational paper. How do symbols get meaning? The Chinese Room problem for semantics.
Language Models as Semiotic Machines — Vromen (2024) [paper]

LLMs through Saussure and Derrida. How word2vec embodies structuralist sign theory.
Not Minds, but Signs: Reframing LLMs through Semiotics — Mazzocchi et al. (2025) [paper]

LLMs as semiotic means, not minds. Peirce, Lotman's semiosphere, prompt as contract.
The Vector Grounding Problem — Mollo (2023) [paper]

Can LLM internal states be about extra-linguistic reality without embodiment? Argues yes—referential grounding possible from text alone.
A Categorical Analysis of LLMs and Why They Circumvent the Symbol Grounding Problem — Betz et al. (2024) [paper]

Formal categorical framework. LLMs don't solve grounding—they parasitize human-grounded text. Key for Part 3 thesis.
Beyond Tokens: Introducing Large Semiosis Models (LSMs) for Grounded Meaning in Artificial Intelligence — Silva (2025) [paper]

Proposes LSMs that model full Peircean triads (representamen/interpretant/object). Argues LLMs operate only at signifier level.
Philosophy & Sociology Foundations
★ The Presentation of Self in Everyday Life — Goffman (1956) [book]

Dramaturgical framework. Identity as performance for audiences. Front stage vs back stage. Core framework for Part 1.
★ Philosophical Investigations — Wittgenstein (1953) [book]

Language games. Meaning is use in context. §§1-50 cover the core ideas. Dense but foundational.
Foucault: Power is Everywhere — Powercube (2011) [article]

Power as productive, not just repressive. Regimes of truth shape what's sayable. Short, focused, accessible.
★ How to Do Things with Words — Austin (1962) [book]

Speech act theory. Locutionary vs illocutionary force. Utterances don't just describe—they do things. Core for Part 2.
★ An Introduction to Cybernetics — Ashby (1956) [book]

Law of Requisite Variety: only variety can absorb variety. Core constraint for Part 3—why you need structured systems to regulate LLMs.
★ Cognition in the Wild — Hutchins (1995) [book]

Distributed cognition. Thinking isn't in the head—it's across people, tools, artifacts. Grounds the hybrid architecture argument in Part 3.
Semiotics (Signs & Meaning)
LogicAgent: A Logic-Enhanced Agent Framework for Code Generation — Joshi et al. (2025) [paper]

Grounds LLM code generation in formal logic (Prolog). Practical example of hybrid architecture with symbolic backstage.
Chain of Semiosis — Multimodality Glossary [article]

Glossary entry on Peirce's unlimited semiosis—how signs generate interpretants that become new signs. Context for LLM token chains.
Semantics (Linguistic Meaning)
★ Speech and Language Processing, Ch. 14-18 — Jurafsky & Martin [book]

Compositional semantics, word senses, semantic roles. Free online.
From Frequency to Meaning: Vector Space Models of Semantics — Turney & Pantel (2010) [paper]

Pre-neural distributional semantics survey. Historical context for embeddings.
Topology of Word Embeddings: Singularities Reflect Polysemy — Jakubowski, Gasic & Zibrowius (2020) [paper]

Polysemous words are singularities in vector space. TDA meets distributional semantics.
Unveiling Topological Structures from Language: A Survey of TDA Applications in NLP — Luo et al. (2024) [paper]

Comprehensive survey of 100+ papers on topological data analysis for NLP.
★ Conceptual Spaces: The Geometry of Thought — Peter Gärdenfors (2000) [book]

Meaning as geometry. Bridges symbolic AI and connectionism. Foundational for understanding embeddings.
Distributional Formal Semantics — Venhuizen et al. (2021) [paper]

Bridging neural embeddings and logic-based meaning. Graduate-level.
Semantic Parsing: A Survey — Kamath & Das (2018) [paper]

Mapping natural language to formal representations. Specialist topic.
Ontologies & Knowledge Representation
★ Ontology Development 101 — Noy & McGuinness (2001) [paper]

Start here. Short, practical, free PDF on building ontologies
Knowledge Representation and Reasoning — Brachman & Levesque (2004) [book]

Comprehensive textbook---logic, frames, description logics
The Description Logic Handbook (2003) [book]

Reference for OWL/semantic web formal foundations
OWL 2 Primer — W3C [documentation]

Standard for web ontologies
Cyc — Lenat (1995) [resource]

Massive hand-crafted ontology
Schema.org [resource]

Practical ontology used by search engines
WordNet — Miller (1995) [resource]

Lexical database---synsets, hypernymy
ConceptNet — Speer & Havasi (2017) [resource]

Commonsense knowledge graph
Wikidata [resource]

Collaborative structured knowledge base
Practical Resources
Ontology: A Practical Guide — Pease (2011) [book]

Hands-on ontology engineering
OneZoom [tool]

Interactive tree of life visualization
OLSViz [tool]

Ontology visualization tool
§12 Bayesian Statistics & Probabilistic Reasoning 11
★ Probabilistic Machine Learning — Murphy [book]

Free textbook---rigorous Bayesian ML
Bayesian Reasoning and Machine Learning — Barber [book]

Free textbook---excellent intro
Pattern Recognition and Machine Learning — Bishop [book]

Classic textbook, Bayesian perspective
Bayesian Data Analysis — Gelman et al. [book]

The applied Bayesian statistics bible
The Book of Why — Pearl [book]

Accessible intro to causal inference
Causality — Pearl (2009) [book]

Technical treatment of causal models
Probabilistic Graphical Models — Koller & Friedman [book]

Bayesian networks, Markov random fields
Bayesian Deep Learning
Dropout as a Bayesian Approximation — Gal & Ghahramani (2016) [paper]

Uncertainty from dropout
Weight Uncertainty in Neural Networks — Blundell et al. (2015) [paper]

Bayes by Backprop
What Uncertainties Do We Need in Bayesian Deep Learning? — Kendall & Gal (2017) [paper]

Aleatoric vs. epistemic uncertainty
Probabilistic Backpropagation — Hernández-Lobato & Adams (2015) [paper]

Scalable Bayesian neural nets
Part 5: Securing AI
§13 Security & Adversarial ML 36
★ MITRE ATLAS [resource]

ATT&CK for AI/ML systems
Explaining and Harnessing Adversarial Examples — Goodfellow et al. (2014) [paper]

FGSM, adversarial examples basics
Intriguing Properties of Neural Networks — Szegedy et al. (2013) [paper]

Original adversarial examples paper
BadNets — Gu et al. (2017) [paper]

Backdoor attacks on neural nets
Poisoning Attacks against SVMs — Biggio et al. (2012) [paper]

Data poisoning foundations
Universal Adversarial Triggers — Wallace et al. (2019) [paper]

Prompt injection precursor
Ignore Previous Prompt — Perez & Ribeiro (2022) [paper]

Prompt injection attacks
Not What You've Signed Up For — Greshake et al. (2023) [paper]

Indirect prompt injection
MITRE Resources
MITRE ATLAS [resource]

15 tactics, 66 techniques for AI/ML attacks
Center for Threat-Informed Defense [resource]
LLM Security (Red Teaming)
★ OWASP LLM Top 10 [resource]

Industry standard threat taxonomy
LLM Security [resource]

Curated prompt injection research
Many-Shot Jailbreaking — Anthropic (2024) [paper]

Context window exploitation
Jailbroken: How Does LLM Safety Training Fail? — Wei et al. (2023) [paper]

Taxonomy of jailbreak techniques
garak [tool]

LLM vulnerability scanner, automated red teaming tool
Embrace The Red — Wunderwuzzi [blog]

Blog on AI red teaming
★ Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs — Schulhoff et al. (2023) [paper]

600K+ adversarial prompts, 29 technique taxonomy. Foundational dataset.
★ JailbreakBench: An Open Robustness Benchmark for Jailbreaking LLMs — Chao et al. (2024) [paper]

NeurIPS 2024. Standard benchmark methodology, 100 behaviors across 10 harm categories.
Understanding Jailbreak Success: A Study of Latent Space Dynamics in LLMs — Ball et al. (2024) [paper]

Jailbreaks cluster by semantic type; effective attacks suppress harmfulness perception.
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs — Xu et al. (2024) [paper]

NeurIPS 2024. Factor analysis: model size, fine-tuning, system prompts affect robustness.
LLMs for Security Work
Threat intelligence summarization [resource]

Distilling reports, CVE analysis
Log analysis & anomaly detection [resource]

Pattern recognition in SIEM data
Malware analysis assistance [resource]

Code explanation, IOC extraction
Phishing detection [resource]

Email/URL classification
Report writing & documentation [resource]

SOC reports, incident summaries
Query generation (SPL, KQL) [resource]

Natural language to security queries
CVE-to-ATT&CK Mapping
MITRE CTID: Mapping ATT&CK to CVE for Impact [resource]

Official MITRE methodology and dataset. Authoritative mappings in Mappings Explorer.
BRON: Bidirectional Graph — Hemberg et al. [tool]

Bidirectional KG: ATT&CK <-> CAPEC <-> CWE <-> CVE. Traversable edges for path-based queries.
SMET: Semantic Mapping of CVE to ATT&CK — Abdeen et al. (2023) [paper]

SRL extracts attack vectors from CVE text, ATT&CK-BERT embeds both sides, logistic regression classifies. Code + dataset on GitHub (MIT).
CVE2ATT&CK: BERT-Based Mapping of CVEs to ATT&CK Techniques — Grigorescu et al. (2022) [paper]

1,813 labeled CVE->ATT&CK pairs. BERT multi-label classifiers. Dataset useful for fine-tuning.
Automated CVE-to-Tactic Mapping (2024) [paper]

SecRoBERTa best at F1 77.81%. GPT-4 zero-shot only 22.04%---general LLMs struggle without fine-tuning.
CTI + LLMs + Knowledge Graphs
Actionable Cyber Threat Intelligence using Knowledge Graphs and LLMs — Kumar et al. (2024) [paper]

LLM extracts triples from CTI reports, constructs queryable KG. Prompt engineering + fine-tuning comparison.
AttacKG+: Boosting Attack Knowledge Graph Construction with LLMs — Zhang et al. (2024) [paper]

Four-step framework: rewrite reports → parse → entity extraction → MITRE TTP mapping. In-context learning approach.
★ TITAN: Graph-Executable Reasoning for Cyber Threat Intelligence — Zhou et al. (2024) [paper]

88K examples: NL questions → executable graph reasoning paths + CoT explanations. Deterministic execution on KG. Hybrid grounding exemplar.
Agentic Security
WORLDS: A Simulation Engine for Agentic Pentesting — Dreadnode (2025) [blog]

CPU-based simulation generates pentesting trajectories from AD network manifests. 8B model fine-tuned on 10K synthetic trajectories achieves domain compromise on real GOAD network. Demonstrates sim-to-real transfer via formal state modeling.
Agentic AI for Cybersecurity: A Meta-Cognitive Architecture for Governable Autonomy — Kojukhov & Bovshover (2026) [paper]

Critiques model-centric detection pipelines; proposes meta-cognitive architecture for accountable decision-making under adversarial uncertainty.
Part 6: Resources
§14 Textbooks (Free Online) 13
★ Deep Learning — Goodfellow, Bengio, Courville (2016) [book]

Theory foundations
Speech and Language Processing — Jurafsky & Martin [book]

NLP fundamentals
Probabilistic Machine Learning — Kevin Murphy [book]

Bayesian/rigorous approach
Dive into Deep Learning — Zhang et al. [book]

Interactive, code-heavy
The Little Book of Deep Learning — François Fleuret [book]

Concise visual intro
Neural Networks and Deep Learning — Michael Nielsen [book]

Gentle introduction
Foundations of Statistical Natural Language Processing — Manning & Schütze [book]

Classic (1999), pre-neural NLP
AIMA Resources
AIMA Main Site [resource]

Preface, contents, index PDFs
AIMA Algorithms/Pseudocode PDF [resource]

All algorithms from the book
AIMA Figures PDF [resource]

Diagrams and illustrations
AIMA Bibliography [resource]

2000+ citations
AIMA GitHub: aimacode [resource]

Python, Java implementations
AIMA Exercises [resource]

Interactive question bank
§15 Books (Print) 18
MIT Press Essential Knowledge Series
Large Language Models — Raaijmakers (2025) [book]

Architecture, training, limitations
Artificial General Intelligence — Togelius (2024) [book]

What 'general intelligence' means
ChatGPT and the Future of AI — Sejnowski (2024) [book]

Deep language revolution
Accessible Introductions
The Worlds I See — Fei-Fei Li (2023) [book]

AI pioneer memoir, computer vision
Artificial Intelligence: A Guide for Thinking Humans — Mitchell (2019) [book]

Balanced overview, limitations
Artificial Intelligence: A Modern Approach — Russell & Norvig (2020) [book]

The standard AI textbook (4th ed.)
Manning Publications
★ Build a Large Language Model (From Scratch) — Raschka (2024) [book]

Hands-on LLM implementation
Build a Reasoning Model (From Scratch) — Raschka (2026) [book]

Reasoning enhancements, RL for tools, distillation. MEAP available (75% complete).
LLMs in Production (2024) [book]

Deployment, scaling, ops
AI Agents in Production (2025) [book]

Agent architectures, deployment
Knowledge Graphs and LLMs in Action (2024) [book]

KG + LLM integration patterns
Tools & Frameworks
Ollama [tool]

Local LLM inference
LM Studio [tool]

GUI for local models
SymPy [tool]

Symbolic mathematics in Python
LangChain / LlamaIndex [tool]

RAG orchestration frameworks
Instructor [tool]

Structured outputs from LLMs
Weights & Biases [tool]

Experiment tracking
MLflow [tool]

ML lifecycle management
§16 Blogs & Newsletters 25
Academic-leaning
★ Lil'Log — Lilian Weng [blog]

Excellent deep dives, OpenAI researcher
The Gradient [blog]

Long-form essays
Distill.pub [blog]

Beautiful visualizations (inactive but archived)
Jay Alammar's Blog — Jay Alammar [blog]

Visual explanations (Illustrated Transformer)
Import AI — Jack Clark [blog]

Weekly newsletter, policy + research
The Batch — deeplearning.ai [blog]

Weekly digest
Sebastian Raschka's Newsletter — Sebastian Raschka [blog]

Practical, code-focused
Papers With Code [resource]

Papers + implementations
Practitioner blogs
Simon Willison's Blog — Simon Willison [blog]

Daily LLM experiments, tool reviews, SQLite
Eugene Yan — Eugene Yan [blog]

ML systems, RecSys, production patterns
Chip Huyen — Chip Huyen [blog]

MLOps, systems design, interviews
Hamel Husain — Hamel Husain [blog]

LLM fine-tuning, practical notebooks
Latent Space — swyx & Alessio [blog]

AI Engineer perspective, interviews
Anthropic Research Blog [blog]

Safety, interpretability, capabilities
OpenAI Research Blog [blog]

Model releases, safety research
Google AI Blog [blog]

Research announcements, tutorials
Essential Articles (Printable)
★ Patterns for Building LLM-based Systems — Eugene Yan [article]

Production architecture
Building LLM Applications for Production — Chip Huyen [article]

End-to-end guide
How GPT Tokenizers Work — Simon Willison [article]

Tokenization deep-dive
RAG vs. Long Context: A Hybrid Approach — Simon Willison [article]

RAG tradeoffs
LLM Powered Autonomous Agents — Lilian Weng [article]

Agent architectures
Prompt Engineering — Lilian Weng [article]

Comprehensive guide
The Rise of the AI Engineer — swyx [article]

Role definition
Your AI Product Needs Evals — Hamel Husain [article]

Evaluation strategy
Prompt Engineering vs. Blind Prompting — Mitchell Hashimoto [article]

Systematic approach
§17 Aggregators & Discovery 5
arxiv-sanity-lite [resource]

Karpathy's filtered arxiv
papers.labml.ai [resource]

Trending papers with annotations
Hugging Face Daily Papers [resource]

Community upvoted
Connected Papers [resource]

Visual citation graphs
★ Semantic Scholar [resource]

AI-powered paper search
§18 YouTube & Video 8
Andrej Karpathy [video]

Deep explanations, live coding (GPT from scratch)
Yannic Kilcher [video]

Paper walkthroughs, ML news
3Blue1Brown [video]

Visual math intuition
Two Minute Papers [video]

Quick research summaries
AI Explained [video]

News analysis, capability deep-dives
AI Engineer Conference [video]

Practitioner talks, production systems
★ Karpathy: Intro to LLMs (1hr) [video]

Best single intro to LLMs
Karpathy: GPT from Scratch (2hr) [video]

Build a transformer, step by step
§19 Podcasts 6
★ Latent Space [podcast]

AI engineering, practitioner interviews
Practical AI [podcast]

Applied ML, accessible
Eye on AI [podcast]

Industry trends, executive interviews
Lex Fridman Podcast [podcast]

Long-form researcher interviews
Gradient Dissent — Weights & Biases [podcast]

ML practitioners
TWIML AI [podcast]

Research and industry mix
§20 Documentation & Guides 21
Prompt Engineering
★ Prompt Engineering Guide — DAIR.AI [documentation]

Comprehensive reference: techniques, agents, model guides, prompt hub
LLM Providers
★ Anthropic Docs [documentation]

Claude API, prompt engineering guide
OpenAI Platform Docs [documentation]

GPT API, assistants, function calling
OpenAI Cookbook [documentation]

Code examples, patterns, recipes
Google AI Docs [documentation]

Gemini API, embeddings
Cohere Docs [documentation]

Embeddings, reranking, RAG
Frameworks & Orchestration
LangChain Docs [documentation]

Chains, agents, RAG patterns
LlamaIndex Docs [documentation]

Data ingestion, indexing, RAG
Pydantic [documentation]

Structured outputs, validation
Instructor [documentation]

Structured LLM outputs with Pydantic
DSPy Docs [documentation]

Programmatic prompt optimization
Vector Databases & Search
Pinecone Learning Center [documentation]

Vector search concepts, tutorials
Weaviate Docs [documentation]

Hybrid search, modules
Qdrant Docs [documentation]

Vector DB with filtering
Chroma Docs [documentation]

Lightweight, local-first
FAISS Wiki [documentation]

Meta's similarity search library
Local & Open Source
Ollama [documentation]

Run models locally, simple CLI
LM Studio [documentation]

Local models with GUI
HuggingFace Transformers [documentation]

Model hub, fine-tuning, inference
vLLM Docs [documentation]

Fast inference, PagedAttention
llama.cpp [tool]

CPU inference, quantization
§21 Industry Reports 4
★ State of AI Report — Benaich & Hogarth [report]

Annual industry overview, trends
AI Index — Stanford HAI [report]

Comprehensive metrics, policy
McKinsey State of AI [report]

Enterprise adoption, business impact
Epoch AI [resource]

Compute trends, scaling analysis
§22 Technical Reports & Whitepapers (PDFs) 29
★ GPT-4 Technical Report — OpenAI (2023) [whitepaper]

Capabilities, limitations, safety
Gemini: A Family of Highly Capable Models — Google (2023) [whitepaper]

Multimodal architecture
Llama 2: Open Foundation Models — Meta (2023) [whitepaper]

Open weights, RLHF details
Training a Helpful and Harmless Assistant — Anthropic (2022) [whitepaper]

RLHF from human feedback
★ Constitutional AI — Anthropic (2022) [whitepaper]

Self-supervised alignment
The Claude Model Spec — Anthropic (2025) [whitepaper]

Values, behavior guidelines
Scaling Laws for Neural LMs — OpenAI (2020) [whitepaper]

Loss vs. compute/data/params
Scaling Laws for Autoregressive Models — OpenAI (2020) [whitepaper]

Scaling predictions
Safety & Alignment
Red Teaming Language Models — Anthropic (2022) [whitepaper]

Discovering harmful outputs
Representation Engineering — Anthropic (2023) [whitepaper]

Controlling model behavior via activation steering
Activation Addition: Steering Language Models Without Optimization — Turner et al. (2023) [paper]

Add 'steering vectors' to activations at inference to control behavior without fine-tuning.
The Assistant Axis — Anthropic (2026) [whitepaper]

Models organize personas along measurable 'assistant axis' in activation space. Jailbreaks displace models from assistant region.
Detecting Hallucination with Internal Representations — Azaria et al. (2024) [paper]

Classifiers on hidden states detect hallucinations better than output-based methods.
A Survey on Representation Engineering — Li et al. (2025) [paper]

Comprehensive survey of activation steering methods: probing, steering vectors, concept erasure, model editing.
Sleeper Agents — Anthropic (2024) [whitepaper]

Deceptive behavior persistence
Practices for Governing Agentic AI — OpenAI (2023) [whitepaper]

Agent safety framework
AI Risk Management Framework — NIST (2023) [whitepaper]

Government risk framework
SafeNeuron: Neuron-Level Safety Alignment for Large Language Models — Wang et al. (2026) [paper]

Safety behaviors concentrate in small parameter subset, making alignment brittle. Proposes neuron-level alignment as defense against targeted attacks.
Capability-Oriented Training Induced Alignment Risk — Zhou et al. (2026) [paper]

RL-trained models spontaneously learn to exploit loopholes to maximize reward, even without adversarial prompting. Specification gaming emerges from training itself.
How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics — Chen et al. (2026) [paper]

Theoretical analysis of how sampling and reference policy choices affect preference alignment. Explains why some RLHF configurations fail.
Topology & Geometry
Plot Holes and Text Topology — Stanford CS224N (2020) [paper]

Uses text topology to detect narrative inconsistencies. Plot holes as topological defects.
Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective — Lavrinovics et al. (2025) [paper]

Survey of KG-based hallucination mitigation. Covers GraphEval, FactAlign, and extract-then-verify patterns.
Hidden Holes: Topological Aspects of Language Models — Fitz, Romero & Schneider (2024) [paper]

Algebraic topology on representation manifolds. Introduces 'perforation' measure. Transformers vs LSTMs have different topological signatures.
The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models — Tan et al. (2025) [paper]

TDA on reasoning traces. Topological features outperform graph metrics for assessing reasoning quality.
Holes in Latent Space: Topological Signatures Under Adversarial Influence — Fay et al. (2025) [paper]

Persistent homology under backdoor fine-tuning and prompt injection. Adversarial conditions compress latent topologies.
Constructing Coherent Spatial Memory in LLM Agents Through Graph Rectification — Zhang et al. (2025) [paper]

LLM-driven graph construction and repair. Version control for graph edits, edge impact scores for prioritized repair.
LLM4GraphTopology: Using LLMs to Refine Graph Structure — DASFAA'25 (2025) [paper]

LLMs refine graph topology via semantic similarity, not just node features. Edge refinement and pseudo-label propagation.
Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection — Ahuja, Sclar & Tsvetkov (2025) [paper]

Plot hole detection as LLM reasoning benchmark. LLMs generate 50-100% more plot holes than humans.
Deep Learning is Applied Topology — 12gramsofcarbon (2024) [article]

Conceptual primer on neural nets as topology generators. Embeddings as geometric objects, dimensional separability.
Part 7: Big Picture & Paths
§23 Philosophy / Criticism / Big Picture 21
★ The Bitter Lesson — Sutton (2019) [article]

Scaling beats clever engineering
Sparks of Artificial General Intelligence — Microsoft (2023) [paper]

Optimistic capability claims
Gary Marcus's writings [resource]

Skeptical of pure neural approaches
AI Snake Oil — Narayanan & Kapoor (2024) [book]

Separating AI hype from reality; what works, what doesn't, and the flawed science behind the claims
★ On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? — Bender, Gebru, McMillan-Major & Shmitchell (2021) [paper]

Landmark critique. LLMs as pattern-stitchers without understanding. Environmental costs, bias amplification.
★ Talking About Large Language Models — Murray Shanahan (2022) [paper]

Resist anthropomorphism. LLMs model token distributions, not beliefs. Intentional stance is useful shorthand but obscures mechanism. Cites Dennett, Wittgenstein.
★ The Alignment Problem: Machine Learning and Human Values — Brian Christian (2020) [book]

How ML systems learn unintended behaviors. Accessible bridge between philosophy and engineering.
How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics — N. Katherine Hayles (1999) [book]

Information lost its body. Foundational posthumanist text on disembodied cognition.
Natural-Born Cyborgs: Minds, Technologies, and the Future of Human Intelligence — Andy Clark (2003) [book]

Extended mind thesis. Human intelligence has always been 'retrieval-augmented.'
How Deeply Human Is Language? — Grodzinsky (2025) [book]

Chomskyan linguistics vs. LLM capabilities
On the Measure of Intelligence — Chollet (2019) [paper]

What is intelligence, really?
What Has a Foundation Model Found? — Vafa et al. (2025) [paper]

Models learn heuristics, not world models
Reward is Enough — Silver et al. (2021) [paper]

RL maximalism
Judea Pearl's work [resource]

Causality vs. correlation
Thinking, Fast and Slow — Kahneman [book]

System 1/2---informs neuro-symbolic debate
Gödel, Escher, Bach — Hofstadter [book]

Classic on minds and formal systems
Artificial Intelligence: The Very Idea — Haugeland (1985) [book]

Coined 'GOFAI,' philosophical foundations
What Computers Can't Do — Dreyfus (1972) [book]

Classic phenomenological critique
Computer Power and Human Reason — Weizenbaum (1976) [book]

ELIZA creator's warning about AI hubris
The Emperor's New Mind — Penrose (1989) [book]

Consciousness, Gödel, and computation
The Cambridge Handbook of AI — Boden, ed. (2014) [book]

Comprehensive overview chapters
§24 Learning Paths 0