Beyond "It Works on My Machine": Lessons from Modernizing a Legacy Ansible Role

I’ve been meaning to update my Ansible role for chezmoi for a while now. Chezmoi is my favorite dotfiles manager, and ages ago I wrote an Ansible role to install it on fresh machines. It worked. I moved on. You know how it goes.

I use this role specifically in my chezmoi-dotfiles project to bootstrap new environments.

Then, recently, I decided to actually run the tests. What I found was… not pretty.

In the world of DevOps, “bit rot” is a silent killer. A project that worked perfectly a year ago can easily crumble under the weight of updated OS kernels, changing licensing models, and evolving tool standards. What started as a simple version bump turned into a deep dive into the current state of the Ansible ecosystem. I didn’t have the first clue about half the changes that had happened while I wasn’t looking, so I started digging.

Here are the key learnings from that journey.

The Infrastructure Shift: Saying Goodbye to Fedora

For years, Fedora was my go-to for testing “bleeding edge” Red Hat-family compatibility. But recent shifts in the container ecosystem have made it increasingly difficult to maintain stable CI runs for Ansible roles. I kept running into persistent sudo and PAM authentication issues in minimal Fedora images, and it felt like fighting against the tide.

It turns out there’s a better way. The community is coalescing around Rocky Linux as the standard for RHEL-family testing. By switching the matrix to Rocky Linux 9 and 10, I got faster, more reliable tests that better reflect enterprise environments. Sometimes the right move is just to follow where the ecosystem is heading.

Molecule v6: The Great Decoupling

If you haven’t updated your Molecule tests in a while, you’re in for a surprise. Molecule has moved to a more modular architecture. The classic molecule[docker] installation no longer includes the driver by default.

You must now explicitly install molecule-plugins[docker]. This reflects a broader shift toward smaller, more focused tools, but it’s a major “gotcha” for automated CI pipelines that rely on standard pip installs. I am inherently lazy about reading changelogs (it’s a very human thing), and this one bit me.
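As a rough sketch, the install step in a GitHub Actions job might look like the fragment below. The step name and the unpinned package list are assumptions for illustration, not the role’s actual workflow. The only point is that the Docker driver is requested through molecule-plugins[docker].

```yaml
# Hypothetical fragment of a GitHub Actions job; not the role's actual workflow.
steps:
  - name: Install test dependencies
    # The Docker driver lives in molecule-plugins, so it is requested explicitly.
    run: python3 -m pip install ansible molecule "molecule-plugins[docker]"
```

The quotes around the package name keep the shell from treating the square brackets as a glob pattern.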

Ansible Core 2.20+: The Death of Implicit Facts

Ansible is becoming stricter, and that’s a good thing for long-term stability. A major deprecation warning I had to address involves the INJECT_FACTS_AS_VARS setting.

Previously, you could use {{ ansible_architecture }} directly. Starting with ansible-core 2.24, facts will no longer be injected as top-level variables by default. The fix is to use the explicit {{ ansible_facts['architecture'] }} syntax everywhere. It’s more verbose, sure, but it keeps your roles from breaking when that default flips. A little pain now saves a lot of pain later.
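Here is a minimal before-and-after sketch of that change. The task names and the debug messages are made up for illustration; only the two variable spellings come from the migration itself.

```yaml
# Before: relies on facts being injected as top-level variables.
- name: Show CPU architecture (legacy style)
  ansible.builtin.debug:
    msg: "Architecture is {{ ansible_architecture }}"

# After: reads the same fact from the ansible_facts dictionary.
- name: Show CPU architecture (explicit style)
  ansible.builtin.debug:
    msg: "Architecture is {{ ansible_facts['architecture'] }}"
```

Note that the key inside ansible_facts drops the ansible_ prefix. If you want to catch leftovers early, setting inject_facts_as_vars = False under [defaults] in ansible.cfg should make any remaining bare references fail right away instead of at the next upgrade.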

The GLIBC Version Trap

One of the biggest hurdles was a GLIBC mismatch on Debian 11. Newer binaries (like the latest chezmoi releases) are often built on environments like Ubuntu 22.04, which ships glibc 2.35, while an older stable distro like Debian 11 (Bullseye) only provides glibc 2.31. A binary that needs symbols from the newer library fails to start on the older one.

I spent more time on this than I’d like to admit. In the end, the decision was straightforward: focus support on Debian 12 (Bookworm) and Debian 13 (Trixie). Sometimes, modernization means knowing when to cut ties with the past to ensure the reliability of the future.
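For anyone who would rather fail loudly than drop a distro, a pre-flight check along these lines is one option. To be clear, this is not what the role does. It is a sketch that assumes a glibc-based target where ldd --version prints the version as the last word of its first line, and the required_glibc value is a hypothetical threshold, not chezmoi’s documented requirement.

```yaml
- name: Read the glibc version reported by ldd
  ansible.builtin.command: ldd --version
  register: glibc_check
  changed_when: false

- name: Stop early when glibc is older than the binary needs
  ansible.builtin.assert:
    that:
      # The first line looks like "ldd (Debian GLIBC 2.36-9) 2.36"; keep the last word.
      - (glibc_check.stdout_lines[0].split() | last) is version(required_glibc, '>=')
    fail_msg: "glibc on this host is older than {{ required_glibc }}"
  vars:
    required_glibc: "2.34"  # hypothetical minimum, for illustration only
```

With a check like this, an unsupported host produces a readable message instead of a cryptic loader error the first time the binary runs.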

AI as a Senior Peer, Not Just a Code Generator

I’ll be honest: this entire modernization was done in close collaboration with an AI assistant. And the most valuable part wasn’t the AI writing the YAML. It was the AI debugging the environment.

Here’s the kind of thing it helped with:

  • Permission errors: It identified why remote_tmp needed to be moved to /tmp to fix permission errors in locked-down containers.
  • CI discovery issues: It helped adopt the “Geerlingguy pattern” for CI checkout paths to fix role discovery.
  • Shell interpretation quirks: It navigated through nested shell interpretation errors when updating PRs via the CLI.
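To make the first two items concrete, here is a trimmed workflow sketch. Everything specific in it is an assumption: example.chezmoi is a placeholder role name, the main branch and step names are invented, and passing remote_tmp through the ANSIBLE_REMOTE_TMP environment variable is just one place to set it (the role may well set it elsewhere). The checkout path and working directory follow the pattern used in geerlingguy’s role repositories, where the repository is checked out into a directory named after the role so the test playbook can find it by name.

```yaml
# Hypothetical workflow sketch; "example.chezmoi" is a placeholder role name.
name: CI

on:
  pull_request:
  push:
    branches:
      - main

# Cancel a run that is still in progress when a newer one starts for the same ref.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

defaults:
  run:
    # Every "run" step executes inside the directory named after the role.
    working-directory: example.chezmoi

jobs:
  molecule:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the role under its own name
        uses: actions/checkout@v4
        with:
          path: example.chezmoi

      - name: Install test dependencies
        run: python3 -m pip install ansible molecule "molecule-plugins[docker]"

      - name: Run Molecule
        run: molecule test
        env:
          # Assumption: remote_tmp is overridden through Ansible's environment variable.
          ANSIBLE_REMOTE_TMP: /tmp
```

Pointing remote_tmp at /tmp sidesteps home directories that are missing or not writable in minimal container images, which is the usual cause of those permission errors.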

I’m not saying AI replaces your understanding of the tools. But it’s an architectural peer that helps you navigate the specialized, often undocumented “dark corners” of infrastructure as code. That’s been a game-changer for me.

What Changed, in a Nutshell

For those who want the quick version:

  • OS Support: Updated to Debian 12 & 13, Ubuntu 22.04 & 24.04, and Rocky Linux 9 & 10. Dropped deprecated versions.
  • CI/CD: Upgraded all GitHub Actions, added Dependabot, and added concurrency groups to cancel redundant runs.
  • Core Logic: Migrated from yum to dnf, rewrote conditionals to evaluate to strict booleans, and replaced ignore_errors with precise failed_when logic for cleaner output.
  • Documentation: Completely rewrote the README with CI badges, requirements, and usage examples.
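The Core Logic item is easier to see in a small example. The tasks below are illustrative only: the git package and the variable name are stand-ins, not tasks from the role. They rely on rpm -q exiting with 0 when the package is installed and 1 when it is not.

```yaml
# Before: every failure is swallowed and the run prints an "ignoring" error.
- name: Check whether git is installed (legacy style)
  ansible.builtin.command: rpm -q git
  register: git_query
  changed_when: false
  ignore_errors: true

# After: exit code 1 ("not installed") is expected; anything else still fails.
- name: Check whether git is installed
  ansible.builtin.command: rpm -q git
  register: git_query
  changed_when: false
  failed_when: git_query.rc not in [0, 1]

- name: Install git with dnf when it is missing
  ansible.builtin.dnf:
    name: git
    state: present
  # An explicit comparison, so the condition evaluates to a real boolean.
  when: git_query.rc == 1
```

The failed_when version keeps genuine problems (a broken RPM database, for instance) visible while treating the expected "not installed" result as a normal outcome.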

Check It Out

The role is now cleaner, faster, and fully compatible with the next generation of Linux distributions.

That’s a wrap. If you’re managing dotfiles with chezmoi and Ansible, or trying to maintain legacy roles in a rapidly evolving ecosystem, I hope this helps.