
A.8 Learning Rescue for Stuck Points

[Figure: Troubleshooting map for learning stuck points]

[Figure: Flowchart for minimal reproduction and asking for help]

When stuck, first turn the problem from “I cannot learn this” into “I can locate this failure.”

First classify the problem

| Symptom | Likely problem | First move |
| --- | --- | --- |
| ModuleNotFoundError | Wrong environment or missing dependency | Check Python and pip path |
| File not found | Wrong working directory or relative path | Print Path.cwd() |
| Code runs but result is strange | Input, label, or metric issue | Print samples and intermediate values |
| Training does not improve | Data, loss, learning rate, or label format | Try to overfit a tiny dataset |
| GPU memory explodes | Batch, input, or model too large | Reduce batch size first |
| Project feels too big | No minimal closed loop | Define one input, one process, one output |
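The "try to overfit a tiny dataset" move from the table can be sketched without any framework. This is only an illustration under stated assumptions: the four data points, the starting parameters, the learning rate, and the step count are all made up for the example. The idea is that a training loop which cannot drive the loss near zero on a handful of clean samples has a bug in the data, loss, or update step, not a model that is too small.

```python
# Tiny dataset: four points that lie exactly on y = 2x + 1 (illustrative values).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def mse(w, b):
    """Mean squared error of the line y = w*x + b on the tiny dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, b = 0.0, 0.0        # hypothetical starting parameters
learning_rate = 0.05   # assumed value, small enough to be stable on this data
first_loss = mse(w, b)

for step in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print("loss went down:", final_loss < first_loss)
print("fits tiny data:", final_loss < 1e-6)
```

If both checks print True on the tiny dataset but the full training run still does not improve, the loop itself is probably fine and the data or labels are the next place to look.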

Run these checks first

python --version
which python
pip --version
pip list
pwd
ls

If you use an NVIDIA GPU:

nvidia-smi
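The same environment check can also be run from inside Python, which helps when the shell and the editor use different interpreters. The sketch below is a minimal example, not a complete diagnostic: `locate_module` is a hypothetical helper name, and `definitely_not_installed_pkg` is a made-up package name used to show the missing case.

```python
import importlib.util
import sys

def locate_module(name):
    """Return where a top-level module would be imported from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin

# The interpreter actually running this code, and the environment it belongs to.
print("interpreter:", sys.executable)
print("environment:", sys.prefix)

# One module from the standard library and one made-up name that should be missing.
for name in ["json", "definitely_not_installed_pkg"]:
    print(name, "->", locate_module(name))
```

If the interpreter path printed here differs from the one reported by `which python`, packages installed with `pip` in the shell may not be visible to this interpreter.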

For path issues:

from pathlib import Path

print(Path.cwd())
print(Path("data").exists())

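Expected output (your folder will differ). False on the second line means there is no data folder in the current working directory, which is the usual cause of a file-not-found error: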

/your/current/project
False
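When a relative path does not exist, it often helps to see what it resolves to and what is actually in the base folder. The following is a small sketch of that idea; `explain_path` and its `base` parameter are hypothetical names introduced here for illustration.

```python
from pathlib import Path

def explain_path(relative, base=None):
    """Describe how a relative path resolves against a base folder (default: cwd)."""
    base = Path(base) if base is not None else Path.cwd()
    target = (base / relative).resolve()
    return {
        "base": str(base),
        "target": str(target),
        "exists": target.exists(),
        # Listing the base folder often reveals a typo or a wrong directory.
        "base_contents": sorted(p.name for p in base.iterdir()) if base.is_dir() else [],
    }

report = explain_path("data")
for key, value in report.items():
    print(f"{key}: {value}")
```

If `base_contents` does not show the folder you expected, the script is probably being run from a different directory than you think.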

Debug code in this order

  1. Print the first 2 inputs and labels.
  2. Print shapes, lengths, and value ranges.
  3. Print one intermediate result before the model.
  4. Print one model output before calculating metrics.
  5. Only then change the model or parameters.

Minimal inspection example:

texts = ["refund request", "invoice copy", "shipping delay"]
labels = ["support", "billing", "support"]

print("samples:", len(texts))
print("first texts:", texts[:2])
print("first labels:", labels[:2])
print("label set:", sorted(set(labels)))

Expected output:

samples: 3
first texts: ['refund request', 'invoice copy']
first labels: ['support', 'billing']
label set: ['billing', 'support']
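The inspection above can be extended to cover lengths and value ranges from the debug order. This is one possible sketch that reuses the same toy data; `inspect_dataset` is a hypothetical helper, and the checks it runs are examples rather than a complete list.

```python
from collections import Counter

texts = ["refund request", "invoice copy", "shipping delay"]
labels = ["support", "billing", "support"]

def inspect_dataset(texts, labels):
    """Collect basic facts about a text dataset before any model sees it."""
    return {
        # A length mismatch means inputs and labels are misaligned.
        "same_length": len(texts) == len(labels),
        # Label counts expose typos and heavy class imbalance.
        "label_counts": dict(Counter(labels)),
        # Shortest and longest text, in characters.
        "text_lengths": (min(len(t) for t in texts), max(len(t) for t in texts)),
        # Blank inputs are a common source of strange results.
        "empty_texts": sum(1 for t in texts if not t.strip()),
    }

summary = inspect_dataset(texts, labels)
for key, value in summary.items():
    print(f"{key}: {value}")
```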

Ask for help with a complete question

What I am doing:
What I expected:
What happened:
Last 20 lines of the error:
What I already tried:
Minimal reproducible code:
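For the "Last 20 lines of the error" field, the traceback can be captured in code instead of copied by hand from a long log. A minimal sketch, assuming the failing step can be wrapped in a function; `last_error_lines` and `broken_step` are hypothetical names used only for this example.

```python
import traceback

def last_error_lines(func, n=20):
    """Run func and return the last n lines of its traceback, or [] if it succeeds."""
    try:
        func()
    except Exception:
        return traceback.format_exc().splitlines()[-n:]
    return []

def broken_step():
    # Hypothetical failure used only for illustration.
    return {"a": 1}["b"]

for line in last_error_lines(broken_step):
    print(line)
```

The final line of a Python traceback names the exception type and message, so keeping the tail of the output preserves the most useful part for whoever reads the question.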

Minimal reproduction habit

When a project is messy, shrink it until it runs:

def predict(x):
    return x * 2

data = [1, 2, 3]
preds = [predict(x) for x in data]
print(preds)

Expected output:

[2, 4, 6]

Then add real logic back one layer at a time. The layer that breaks is the layer to inspect.
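Adding logic back one layer at a time can be sketched as a small loop that applies each layer in order and reports the first one that raises. The `clean` and `scale` functions below are hypothetical stand-ins for real project logic, and `first_failing_layer` is a helper name introduced for this example.

```python
def predict(x):
    return x * 2

# Hypothetical layers standing in for real project logic.
def clean(x):
    return float(x)

def scale(x):
    return x / 10

def first_failing_layer(layers, sample):
    """Apply layers in order and return the name of the first one that raises, else None."""
    value = sample
    for layer in layers:
        try:
            value = layer(value)
        except Exception as error:
            print(f"{layer.__name__} failed on {value!r}: {error!r}")
            return layer.__name__
        print(f"{layer.__name__} ok -> {value!r}")
    return None

print(first_failing_layer([clean, scale, predict], "3"))
print(first_failing_layer([clean, scale, predict], "three"))
```

Printing the value after every layer also shows the last known-good intermediate result, which is usually the most useful input for the next hypothesis.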

Pause or keep going?

| Situation | Better action |
| --- | --- |
| You have tried random fixes for 30 minutes | Pause and write hypotheses |
| You cannot explain the command you are copying | Stop and inspect the environment |
| You have 1-2 clear hypotheses | Keep testing |
| You know the next observable result | Keep going |