The most valuable developer skill right now isn't writing more code faster. It's learning unfamiliar codebases, building context that guides decisions, planning strategic approaches to problems, and shipping production code with confidence.
I recently added .env file support to xc, a Markdown-based task runner written in Go. The codebase was completely unfamiliar. I'm not a Go expert. But in 2.5 hours, I went from zero knowledge to a production-ready pull request with 84% test coverage and zero bugs in manual testing.
Here's what's different: I didn't write a single line of code. Not one. AI wrote everything: tests, implementation, integration, documentation. My role was entirely different: I questioned, I planned, I directed, I reviewed. I read the code, but I didn't write it.
This isn't another "I asked ChatGPT to build an app" story. This is about the skills that separate developers who use AI as a force multiplier from those who just ask it to generate code. It's about onboarding fast, documenting strategically, planning thoroughly, directing execution, and reviewing confidently. The code writing? That's handled.
Complete .ai/ folder in the working fork: github.com/sudoish/xc/tree/ai-context/.ai
Production-ready PR: github.com/joerdav/xc/pull/167
Note: The .ai/ folder lives in a separate ai-context branch so it doesn't clutter the main codebase but remains available for reference and iteration.
Why This Matters
Most AI coding demos show you the magic: "I asked ChatGPT to build X and it worked!" They skip the parts that actually matter for professional development: How do you onboard to a codebase you've never seen? How do you make architectural decisions when you don't understand the patterns yet? How do you ensure your code is production-ready when AI helped write it?
These are the skills that matter now. Code generation is table stakes. What matters is context building, strategic planning, and confident execution.
Here's the project: xc, a task runner that reads tasks from Markdown files. About 5,000 lines of Go. Completely unfamiliar to me. The feature request was straightforward: add .env file support (Issue #162). In 2.5 hours, using free AI models and a structured approach, I went from knowing nothing about the codebase to a merged pull request.
The difference wasn't better prompts. It was better process.
The Actual Workflow: What I Did vs What AI Did
Here's the honest breakdown of who did what. I didn't write a single line of code myself. That's not the valuable work anymore.
What I did:
- Explored the codebase with AI: asked questions, challenged its understanding, verified explanations against the actual code
- Built the .ai/ structure: wrote context docs, ADRs, rules, and implementation specs based on my growing understanding
- Questioned the strategy: evaluated alternatives, captured trade-offs, made architectural decisions
- Directed the implementation: "Follow the spec. Implement test 1. Now test 2." Each step validated before moving forward
- Reviewed iteratively: asked AI to review the code, digested its findings, confirmed issues, asked it to fix them. Repeated multiple times
- Final deep review: read through the entire PR on GitHub, verified everything made sense, marked ready for review
What AI did:
- Answered my questions: explained architecture, pointed me to relevant files, clarified patterns
- Wrote all the code: tests, implementation, integration, everything
- Found its own bugs: self-review caught 5 issues before I even looked at the code
- Fixed the issues: applied fixes based on its own review findings
- Followed the plan: implemented exactly what the spec described, in the order specified
What we did together:
- Built understanding through conversation
- Validated each step before proceeding
- Caught subtle bugs through TDD
- Created production-ready code with high confidence
The key insight: I never typed code. I read it, reviewed it, directed changes to it. But I didn't write it. My value was in understanding, planning, and judgment. AI's value was in execution and self-checking. This is the new division of labor.
The Four Skills
This walkthrough demonstrates four skills that matter more than code generation:
Skill 1: Rapid Onboarding. Learning an unfamiliar codebase fast by building structured context instead of reading every file. The .ai/ folder captures architecture, patterns, and limitations in a way both humans and AI can reference.
Skill 2: Strategic Documentation. Building documentation that guides development, not just records it. Architecture Decision Records (ADRs) capture the "why" behind choices, evaluate alternatives, and create a shared understanding before code is written.
Skill 3: Systematic Planning. Breaking down problems into testable steps. Each test defines expected behavior. Each implementation proves the behavior works. Each commit tells part of the story. No guessing, no hoping.
Skill 4: Confident Execution. Shipping code you trust because you've tested it thoroughly, reviewed it critically, and validated it works in real scenarios. AI can help write code, but you own the quality.
These skills work regardless of the AI tool you use. They work with free models. They work on unfamiliar codebases.
The Feature Request
First, a quick primer on how xc works: it's a task runner that reads tasks directly from your README.md (or any markdown file). Tasks are defined as markdown headings with code blocks. When you run xc test, it finds the ## test heading in your README and executes the code block beneath it. The genius is that your documentation is your task runner, so they never get out of sync.
A user opened Issue #162 asking for .env file support. They wanted to use the same set of tasks for different environments without cluttering the Markdown with environment variables.
Before the feature, you'd have to write this in your README.md:
## deploy
Deploy to production.
Env: DATABASE_URL=postgres://prod/db, API_KEY=secret123, ENV=production
```
kubectl apply -f deployment.yaml
```
Then run with xc deploy.
After the feature, your README stays clean:
## deploy
Deploy to production.
```
kubectl apply -f deployment.yaml
```
The environment variables live in a separate .env file:
DATABASE_URL=postgres://prod/db
API_KEY=secret123
ENV=production
You still run the same command, but now the credentials are managed in .env instead of cluttering your documentation.
Simple ask, but the implementation requires real decisions. When do you load the files? What about overrides? How do you handle security? What about backward compatibility?
The .ai/ Structure: Context as Code
Before writing any code, I created a structured context folder. This turned out to be the key to working with AI effectively. It's not about better prompts, it's about better structure.
Full .ai/ folder: github.com/sudoish/xc/tree/ai-context/.ai
The folder looks like this:
.ai/
├── agents.md                         # Who's working on what
├── context.md                        # Project overview, architecture
├── architecture/
│   ├── decisions.md                  # Current design patterns
│   └── adrs/
│       └── 001-dotenv-support.md     # Design decisions for this feature
├── rules/
│   ├── code-style.md                 # Go conventions
│   ├── testing.md                    # TDD workflow
│   └── commits.md                    # Commit message format
└── tasks/
    └── 001-dotenv-implementation.md  # Step-by-step plan
Important: This structure is an investment, not overhead you repeat for every feature. You build it once during your first feature, then leverage it for every feature after. The context.md, architecture/decisions.md, and rules/ files rarely change. Each new feature just adds a new ADR (like 002-api-caching.md) and a new task spec (like 002-api-caching-implementation.md).
Think of it like setting up your development environment. The initial setup takes time, but every feature after that is faster because the foundation exists.
Each file serves a specific purpose. The context.md file becomes AI's memory. It explains what xc does, how it's architected with its cmd/, models/, run/, and parser/ packages, what key behaviors exist like dependencies and environment handling, and what current limitations we're working around. Every time I ask AI a question, this context gets included automatically.
The rules/testing.md file defines the TDD workflow we follow: write a failing test first (red), write minimal code to make it pass (green), clean up without changing behavior (refactor), then commit. This keeps both me and AI honest. No skipping tests. No shortcuts.
The real gem is adrs/001-dotenv-support.md, the Architecture Decision Record. This is where design happens. It's not "build me a feature," it's "here's why we chose this approach." We decided to load .env files at application startup rather than per-task, to support .env.local overrides, to skip world-readable files for security, and to add CLI flags like --env-file and --no-env. We considered alternatives like per-task loading (rejected as too complex) and requiring an explicit flag (rejected as too much friction). This ADR becomes the source of truth. When AI suggests something different, I can just say "check the ADR."
The living documentation principle: As the codebase evolves, so does the .ai/ folder. When you add a new feature, you write a new ADR (002, 003, etc.). When architecture changes, you update architecture/decisions.md or add a new ADR explaining the change. When patterns emerge, you document them. The folder grows with the project, but the structure stays the same. Each feature builds on the understanding captured before it.
This means the second feature is faster than the first. The third is faster than the second. The documentation compounds.
The Task Spec: Planning Before Coding
Before writing any code, I created tasks/001-dotenv-implementation.md, a step-by-step plan for implementing the feature. This isn't a project management document. It's a development spec that breaks the feature into TDD cycles.
The spec listed each test I needed to write, what behavior it should verify, and the expected implementation. Test for file not found. Test for loading valid env. Test for .env.local overrides. Test for security checks. Each one became a TDD cycle.
This is what makes AI effective. Without the spec, I'd be asking AI "what should I do next?" every five minutes. With the spec, I'm asking "implement the next test according to the plan." The spec keeps development focused and systematic. It's the difference between wandering and following a map.
For your second feature, you write a new spec. For your third, another one. The format is consistent, but each spec is tailored to its feature. This is the work that makes development fast and confident.
The TDD Flow: Red → Green → Refactor → Commit
Here's where the real work happens. Each test defines acceptance criteria for exactly what needs to be built.
Cycle 1: Valid .env should load variables
First behavior: if a .env file exists and contains KEY=value pairs, those should be loaded into the environment. Test written, test failed (red) because no loader existed yet. Implementation added using the godotenv library (green). Test passed. Committed with "load env vars from dotenv file".
Cycle 2: .env.local should override .env
Expected behavior: if both .env and .env.local exist, and both define the same variable, the .env.local value wins. This is crucial for local development where you want to override defaults without modifying the base file. Test written, test failed initially because I was using the wrong function, fixed the implementation, test passed. Committed.
Cycle 3: World-readable files should be skipped
Security requirement: if a .env file has permissions that allow other users to read it (like chmod 644), skip loading it and warn the user. This prevents accidentally exposing secrets. Test created, test failed (secrets were being loaded), added permission check, test passed. Committed.
This rhythm of define → test → implement → verify → commit creates a clean history. When I looked at the final commit log, I could see exactly how the feature evolved: add godotenv dependency, load env vars from dotenv file, support dotenv local overrides, add security check for world readable files, integrate dotenv loading into main, add env file cli flags. Thirteen commits total, each one atomic and meaningful. Each commit is a story about one specific behavior being added.
The Review Process
After the implementation was done, I did a deep review of my own code. I found five issues that needed fixing.
Issue 1: Test Isolation (Critical)
Tests were modifying the global environment without properly restoring it. If a test set TEST_KEY=value, the cleanup would delete it, but what if that key already existed before the test ran? The cleanup wasn't restoring the original value, just removing the key. This breaks parallel test execution because tests can interfere with each other.
The fix: create a helper function that saves the current state of environment variables before the test runs, then restores that exact state (including whether the variable existed at all) when the test completes. Now tests are safe to run in parallel. Committed with "add test environment isolation helper".
Issue 2: Windows Test Bug (Critical)
One test needed to skip execution on Windows because file permission models are different. I had written the check incorrectly, reading from an environment variable instead of the language's built-in constant. This would break Windows CI. Small mistake, but important. Fixed and committed with "fix windows test skip to use runtime goos".
Issue 3: Early Exit Timing
The .env loading was happening even for commands like --help and --version, which meant users could see security warnings when just checking the version. Moved the loading to happen after those early exits. Performance optimization and better user experience. Committed.
Issue 4: Error Context
When file operations failed, errors didn't indicate which file caused the problem. Added context wrapping so errors show the specific file path. Makes debugging much easier. Committed.
Issue 5: Test Coverage
One helper function didn't have its own test. Added coverage to bring the total to 84%. Committed.
Each issue got its own fix, its own verification, its own commit. The same disciplined process for fixes that I used for features.
The Manual Testing
Code works in tests, but does it work for real users? I installed my version and created a test project to verify everything worked end-to-end.
I created a .env file with some variables, created a .env.local file that overrode some of them, and made sure the permissions were correct with chmod 600. Then I added a task to my README.md to verify the variables were loaded:
In README.md:
## check-env
Check loaded environment variables.
```
echo "Environment: $ENV"
echo "Database: $DATABASE_URL"
echo "API Key: ${API_KEY:0:8}..."
```
When I ran xc check-env, I saw exactly what I expected. The xc command read the task from the README and executed it with the environment variables from .env and .env.local. The environment was set to "development" from the base .env, but the database URL and API key were overridden by .env.local. Perfect.
I ran eight manual test scenarios: default .env loading, .env.local overrides, the --no-env flag skipping loading, --env-file loading a custom path, security warnings for world-readable files, task-level Env statements still working, --help not loading .env (avoiding unnecessary warnings), and a real-world multi-variable scenario. All eight passed.
The PR
I submitted everything as PR #167. The changes included thirteen commits (eight for the feature, five for fixes), about 200 lines of code including tests, four unit tests plus six integration tests, 84% code coverage, and zero bugs found in manual testing.
The documentation was complete with a README section showing examples, a .env.example template file, the load order documented clearly, and security best practices explained. Most importantly, everything was backward compatible. Existing task-level Env: statements still work exactly as before.
What I Learned
The .ai/ folder was the game-changer. Instead of writing long prompts like "Build me a .env loader with security checks and…", I could just say "Implement the loader per ADR-001". The ADR contains all the decisions. AI just implements them.
I used free models throughout. No expensive API calls. The key wasn't the model, it was the context. Clear architecture docs, explicit ADRs, and well-defined tests gave AI everything it needed to generate good code.
TDD kept everything honest. Every cycle followed the same pattern: write a test that defines the behavior, let AI suggest an implementation, let the test validate it works, then commit. No guessing. No "it probably works." The test proves it.
Thirteen commits might seem like a lot for 200 lines of code, but each commit serves a purpose. Each one is reviewable on its own. Each one tells part of the story. Each one is revertible if needed. Git bisect works perfectly with this kind of history.
The .env.local override issue shows the workflow clearly. AI suggested the wrong approach first, using Load() instead of Overload(). But the test caught it. That's how it should work: AI suggests, test validates, human decides.
The Real Value
This isn't about "AI wrote code for me." It's about process, collaboration, and documentation.
The process matters. Structured context in the .ai/ folder. Design decisions captured in ADRs. TDD discipline with tests written first. Small commits with one change at a time. This is how you ship production code.
The collaboration matters. AI acts as a pair programmer, not a magic wand. Tests validate AI suggestions. Human makes the design decisions. Both contribute to better code.
The documentation matters. Future contributors now have context about the project, the architecture, and why decisions were made the way they were. The implementation plan is explicit. The tests document the expected behavior. Six months from now, none of this is lost.
The compounding matters most. You build the foundation once. Every feature after that leverages it. The second feature doesn't need new context.md or rules/ files, just a new ADR and task spec. The third feature is even faster. The documentation evolves as the codebase evolves. New ADRs when architecture changes. Updates to context.md when understanding deepens. Updates to rules/ when patterns emerge. The investment pays dividends forever.
Try It Yourself
Want to replicate this process? Pick a project and create the .ai/ structure right in your working directory:
mkdir -p .ai/{architecture/adrs,rules,tasks}
Use the template structure as a guide. Build the foundation files once (context.md, architecture/decisions.md, rules/), then for each feature add a new ADR and task spec. The .ai/ folder lives alongside your code and evolves with it; commit it with your changes so it stays in sync.
Direct AI through TDD: "Implement test 1 from the spec." AI writes the test and implementation. "Run it." Test passes. "Commit." Repeat. When done, have AI review its own work, confirm findings, direct fixes. Then do your final review for strategic correctness.
Each feature adds a new ADR and task spec to the .ai/ folder. The foundation files rarely change. The documentation compounds.
The Full Timeline
I spent about 45 minutes on documentation upfront: exploring the codebase with AI, questioning its understanding, writing the ADRs, rules, and context. This sounds like a lot, but it's a one-time investment. The context.md, architecture/decisions.md, and rules/ files I wrote for this first feature will be reused for every future feature. I'll only spend 10-15 minutes on feature-specific docs (ADR + task spec) for the next feature.
The implementation took 40 minutes: directing AI through TDD cycles, one test at a time, validating each step. Integration of CLI flags and wiring into main.go took 15 minutes of the same directed approach. Documentation like README updates and examples took another 15 minutes. Manual testing took 15 minutes: I installed the binary and ran real scenarios. The review process took 30 minutes: first AI reviewed its own code (found 5 issues), then I reviewed the fixes, then I did a final deep review on GitHub.
Important: AI wrote 100% of the code. I wrote 100% of the strategy, asked 100% of the questions, and made 100% of the decisions. I reviewed every line, but I didn't type any of them. Total time from fork to production-ready PR was about 2.5 hours.
If I added a second feature tomorrow, it would take less time. By the third feature even faster. The documentation compounds.
Resources
The complete .ai/ structure and documentation is at github.com/sudoish/xc/tree/ai-context/.ai. The pull request with all code and tests is at github.com/joerdav/xc/pull/167. The working fork is at github.com/sudoish/xc. The original issue is joerdav/xc#162.
Note: The .ai/ folder lives in a separate branch in this example only because I wanted to reference it for this article without including it in the PR to the upstream project. In your own work, keep the .ai/ folder in your main working branch and commit it with your changes; it should evolve alongside your code, not separately.
The Skills That Actually Matter
AI wrote every line of code. I read every line, but I didn't write any of them. The feature is production-ready because I focused on what actually matters.
The four skills transformed from framework to practice: rapid onboarding through questioning AI and building structured context, strategic documentation through ADRs written before code, systematic planning through testable specs, and iterative review through AI self-checks followed by strategic verification.
The .ai/ folder, the ADRs, the task specs, the review cycles: they all worked exactly as planned. The result: 84% coverage, zero bugs, 2.5 hours from fork to production-ready PR.
These skills work with free models. They work on unfamiliar codebases. They separate developers who use AI effectively from those who just generate code and hope it works.
The magic isn't in the AI. It's in the process. And the process is this: you think, you plan, you direct, you review. AI executes.
What This Means for Your Career
The developer who can onboard to unfamiliar codebases fast, document decisions strategically, plan systematically, and execute with confidence is far more valuable than the developer who can write code quickly. Because here's the reality: code writing is no longer the bottleneck; it never was. AI just made this a lot more evident.
I shipped a production-ready feature to an unfamiliar codebase in 2.5 hours without writing a single line of code. The bottleneck wasn't typing. It was understanding, planning, and judging. Those are the skills that matter.
AI tools are getting better at code generation every month. They're not getting better at understanding your codebase's architecture, making strategic trade-offs, or ensuring production quality. Those skills are still yours. Those skills are what companies pay for.
The question isn't "Will AI replace developers?" It's "Which developers will thrive when everyone has access to AI?" The answer is the ones who master onboarding, documentation, planning, and review. The ones who understand that their job is no longer to write code; it's to think clearly, plan thoroughly, and judge correctly.
This is the junior dev role being redefined. It's not about writing boilerplate anymore. That work is done. It's about learning systems fast, making good decisions, directing execution, and ensuring quality. If you can do that, you're not competing with AI. You're orchestrating it.
Writing code is optional. Reading it, understanding it, and judging it: those aren't.
This post documents a real open source contribution made using AI as a pair programmer. All code, tests, documentation, and the complete .ai/ folder structure are publicly available in the sudoish/xc fork for anyone who wants to replicate this approach.