---
name: test-driven-development
description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.
version: 1.0.0
author: Hermes Agent (adapted from Superpowers)
license: MIT
metadata:
  hermes:
    tags: [testing, tdd, development, quality, red-green-refactor]
    related_skills: [systematic-debugging, writing-plans, subagent-driven-development]
---

# Test-Driven Development (TDD)

## Overview

Write the test first. Watch it fail. Write minimal code to pass.

**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.

**Violating the letter of the rules is violating the spirit of the rules.**

## When to Use

**Always:**
- New features
- Bug fixes
- Refactoring
- Behavior changes

**Exceptions (ask your human partner):**
- Throwaway prototypes
- Generated code
- Configuration files

Thinking "skip TDD just this once"? Stop. That's rationalization.

## The Iron Law

```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```

Write code before the test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete

Implement fresh from tests. Period.

## Red-Green-Refactor Cycle

### RED - Write Failing Test

Write one minimal test showing what should happen.

**Good Example:**
```python
def test_retries_failed_operations_3_times():
    attempts = 0
    def operation():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise Exception('fail')
        return 'success'
    
    result = retry_operation(operation)
    
    assert result == 'success'
    assert attempts == 3
```
- Clear name, tests real behavior, one thing

**Bad Example:**
```python
def test_retry_works():
    mock = MagicMock()
    mock.side_effect = [Exception(), Exception(), 'success']
    
    result = retry_operation(mock)
    
    assert result == 'success'  # What about retry count? Timing?
```
- Vague name, mocks behavior not reality, unclear what it tests

### Verify RED

Run the test. It MUST fail.

**If it passes:**
- Test is wrong (not testing what you think)
- Code already exists (delete it, start over)
- Wrong test file running

**What to check:**
- Error message makes sense
- Fails for expected reason
- Stack trace points to right place

### GREEN - Minimal Code

Write just enough code to pass. Nothing more.

**Good:**
```python
def add(a, b):
    return a + b  # Nothing extra
```

**Bad:**
```python
def add(a, b):
    result = a + b
    logging.info(f"Adding {a} + {b} = {result}")  # Extra!
    return result
```

Cheating is OK in GREEN:
- Hardcode return values
- Copy-paste
- Duplicate code
- Skip edge cases

**We'll fix it in refactor.**

### Verify GREEN

Run tests. All must pass.

**If fails:**
- Fix minimal code
- Don't expand scope
- Stay in GREEN

### REFACTOR - Clean Up

Now improve the code while keeping tests green.

**Safe refactorings:**
- Rename variables/functions
- Extract helper functions
- Remove duplication
- Simplify expressions
- Improve readability

**Golden rule:** Tests stay green throughout.

**If tests fail during refactor:**
- Undo immediately
- Smaller refactoring steps
- Check you didn't change behavior

## Implementation Workflow

### 1. Create Test File

```bash
# Python
touch tests/test_feature.py

# JavaScript
touch tests/feature.test.js

# Rust
touch tests/feature_tests.rs
```

### 2. Write First Failing Test

```python
# tests/test_calculator.py
def test_adds_two_numbers():
    calc = Calculator()
    result = calc.add(2, 3)
    assert result == 5
```

### 3. Run and Verify Failure

```bash
pytest tests/test_calculator.py -v
# Expected: FAIL - Calculator not defined
```

### 4. Write Minimal Implementation

```python
# src/calculator.py
class Calculator:
    def add(self, a, b):
        return a + b  # Minimal!
```

### 5. Run and Verify Pass

```bash
pytest tests/test_calculator.py -v
# Expected: PASS
```

### 6. Commit

```bash
git add tests/test_calculator.py src/calculator.py
git commit -m "feat: add calculator with add method"
```

### 7. Next Test

```python
def test_adds_negative_numbers():
    calc = Calculator()
    result = calc.add(-2, -3)
    assert result == -5
```

Repeat cycle.

## Testing Anti-Patterns

### Mocking What You Don't Own

**Bad:** Mock database, HTTP client, file system
**Good:** Abstract behind interface, test interface

### Testing Implementation Details

**Bad:** Test that function was called
**Good:** Test the result/behavior

### Happy Path Only

**Bad:** Only test expected inputs
**Good:** Test edge cases, errors, boundaries

### Brittle Tests

**Bad:** Test breaks when refactoring
**Good:** Tests verify behavior, not structure

## Common Pitfalls

### "I'll Write Tests After"

No, you won't. Write them first.

### "This is Too Simple to Test"

Simple bugs cause complex problems. Test everything.

### "I Need to See It Work First"

Temporary code becomes permanent. Test first.

### "Tests Take Too Long"

Untested code takes longer. Invest in tests.

## Language-Specific Commands

### Python (pytest)

```bash
# Run all tests
pytest

# Run specific test
pytest tests/test_feature.py::test_name -v

# Run with coverage
pytest --cov=src --cov-report=term-missing

# Watch mode
pytest-watch
```

### JavaScript (Jest)

```bash
# Run all tests
npm test

# Run specific test
npm test -- test_name

# Watch mode
npm test -- --watch

# Coverage
npm test -- --coverage
```

### TypeScript (Jest with ts-jest)

```bash
# Run tests
npx jest

# Run specific file
npx jest tests/feature.test.ts
```

### Go

```bash
# Run all tests
go test ./...

# Run specific test
go test -run TestName

# Verbose
go test -v

# Coverage
go test -cover
```

### Rust

```bash
# Run tests
cargo test

# Run specific test
cargo test test_name

# Show output
cargo test -- --nocapture
```

## Integration with Other Skills

### With writing-plans

Every plan task should specify:
- What test to write
- Expected test failure
- Minimal implementation
- Refactoring opportunities

### With systematic-debugging

When fixing bugs:
1. Write test that reproduces bug
2. Verify test fails (RED)
3. Fix bug (GREEN)
4. Refactor if needed

### With subagent-driven-development

Subagent implements one test at a time:
1. Write failing test
2. Minimal code to pass
3. Commit
4. Next test

## Success Indicators

**You're doing TDD right when:**
- Tests fail before code exists
- You write <10 lines between test runs
- Refactoring feels safe
- Bugs are caught immediately
- Code is simpler than expected

**Red flags:**
- Writing 50+ lines without running tests
- Tests always pass
- Fear of refactoring
- "I'll test later"

## Remember

1. **RED** - Write failing test
2. **GREEN** - Minimal code to pass  
3. **REFACTOR** - Clean up safely
4. **REPEAT** - Next behavior

**If you didn't see it fail, it doesn't test what you think.**