<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>ral-arturo.org</title>
    <description>ral-arturo blog, about free software, debian, networks, systems, or whatever</description>
    <link>https://ral-arturo.org/</link>
    <atom:link href="https://ral-arturo.org/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Fri, 27 Mar 2026 11:10:06 +0000</pubDate>
    <lastBuildDate>Fri, 27 Mar 2026 11:10:06 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>New job at Chainguard</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/chainguard-logo.png&quot; alt=&quot;Chainguard logo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A few months ago, in June 2025, I joined &lt;a href=&quot;https://www.chainguard.dev&quot;&gt;Chainguard&lt;/a&gt;, a company focused on software supply chain security.
This post is a reflection on how I got here, what I’ve been doing, and why this role feels like a natural
fit for my interests in Linux and open source technology.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;the-company-and-its-mission&quot;&gt;The company and its mission&lt;/h2&gt;

&lt;p&gt;Chainguard’s mission is to make the software supply chain secure by default. The company is built around
the idea that the software we all depend on — from operating system packages to container base images — carries
hidden risk in the form of vulnerabilities, unverified provenance, and untrusted build processes.&lt;/p&gt;

&lt;p&gt;The company is perhaps best known for &lt;a href=&quot;https://www.chainguard.dev/chainguard-images&quot;&gt;Chainguard Images&lt;/a&gt;: a catalog of minimal, hardened container
base images that are continuously rebuilt and kept free of known CVEs. Each image is accompanied by a signed
&lt;a href=&quot;https://www.cisa.gov/sbom&quot;&gt;SBOM&lt;/a&gt; (Software Bill of Materials) and a verifiable &lt;a href=&quot;https://slsa.dev&quot;&gt;provenance attestation&lt;/a&gt;, making it possible
to cryptographically verify what went into a given image and how it was built.&lt;/p&gt;
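As an illustration of what consuming such an SBOM can look like, here is a minimal Python sketch that lists the packages declared in a small, hypothetical SPDX-style document (inlined for the example; real image SBOMs are much larger, and the cryptographic verification itself is done with signing tooling such as cosign):

```python
import json

# Hypothetical, minimal SPDX-style SBOM document, inlined for illustration.
sbom_json = """
{
  "spdxVersion": "SPDX-2.3",
  "name": "example-image-sbom",
  "packages": [
    {"name": "busybox", "versionInfo": "1.36.1"},
    {"name": "ca-certificates", "versionInfo": "20240203"}
  ]
}
"""

def list_packages(document: str) -> list[tuple[str, str]]:
    """Return (name, version) pairs for every package declared in the SBOM."""
    sbom = json.loads(document)
    return [(p["name"], p["versionInfo"]) for p in sbom.get("packages", [])]

for name, version in list_packages(sbom_json):
    print(name, version)
```

In practice you would fetch the SBOM attached to an image and feed its contents to a check like this, for example to match package versions against a vulnerability database.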

&lt;p&gt;Chainguard has an extensive catalog of software, and keeping it up to date and CVE-free is a significant
engineering challenge.&lt;/p&gt;

&lt;h2 id=&quot;what-i-do&quot;&gt;What I do&lt;/h2&gt;

&lt;p&gt;I joined the Chainguard Sustaining Engineering team as a Senior Software Engineer. We are responsible
for keeping the packages and images in the software catalog up to date and CVE-free. The core of the business, basically.&lt;/p&gt;

&lt;p&gt;We focus on the horizontal dimension of the catalog (pretty much all packages and images).&lt;/p&gt;

&lt;p&gt;With more than 30,000 packages and 2,000 images, this is indeed an interesting task.&lt;/p&gt;

&lt;p&gt;My role as a Debian Developer, and my experience in the &lt;a href=&quot;https://ral-arturo.org/2025/04/17/lts.html&quot;&gt;Debian LTS project&lt;/a&gt;, were extremely valuable when joining this
new team.&lt;/p&gt;

&lt;h2 id=&quot;looking-ahead&quot;&gt;Looking ahead&lt;/h2&gt;

&lt;p&gt;The software supply chain is a truly deep topic, gaining more and more relevance every day, especially as new technologies emerge
and get adopted everywhere.&lt;/p&gt;

&lt;p&gt;Since early in my career, I have seen a recurring problem in how companies, enterprises, and even governments relate to and consume
open source software in a reliable, secure way. I believe Chainguard is doing the right things in the ecosystem,
and I’m happy to be participating in the effort.&lt;/p&gt;

</description>
        <pubDate>Fri, 27 Mar 2026 08:00:00 +0000</pubDate>
        <link>https://ral-arturo.org/2026/03/27/chainguard.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2026/03/27/chainguard.html</guid>
        
        <category>job</category>
        
        <category>chainguard</category>
        
        <category>security</category>
        
        <category>linux</category>
        
        
      </item>
    
      <item>
        <title>Wikimedia Cloud VPS: IPv6 support</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20250520-cape-town-za-sea-point-nachtansicht.png&quot; alt=&quot;Cape Town (ZA), Sea Point, Nachtansicht&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://commons.wikimedia.org/wiki/User:XRay&quot;&gt;Dietmar Rabich&lt;/a&gt;,
&lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Cape_Town_(ZA),_Sea_Point,_Nachtansicht_--_2024_--_1867-70_-_2.jpg&quot;&gt;Cape Town (ZA), Sea Point, Nachtansicht — 2024 — 1867-70 –
2&lt;/a&gt;,
&lt;a href=&quot;https://creativecommons.org/licenses/by-sa/4.0/legalcode&quot;&gt;CC BY-SA 4.0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was originally published in the &lt;a href=&quot;https://techblog.wikimedia.org/2025/05/06/wikimedia-cloud-vps-ipv6-support/&quot;&gt;Wikimedia Tech blog&lt;/a&gt;, authored by Arturo Borrero Gonzalez.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS&quot;&gt;Wikimedia Cloud VPS&lt;/a&gt; is a service offered by the Wikimedia
Foundation, built using &lt;a href=&quot;https://en.wikipedia.org/wiki/OpenStack&quot;&gt;OpenStack&lt;/a&gt; and managed by the Wikimedia Cloud Services
team. It provides cloud computing resources for projects related to the
&lt;a href=&quot;https://meta.wikimedia.org/wiki/Wikimedia_movement&quot;&gt;Wikimedia movement&lt;/a&gt;, including virtual machines, databases, storage,
Kubernetes, and DNS.&lt;/p&gt;

&lt;p&gt;A few weeks ago, in April 2025,
&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/News/2025_Cloud_VPS_VXLAN_IPv6_migration&quot;&gt;we were finally able to introduce IPv6&lt;/a&gt; to
the cloud virtual network, enhancing the platform’s scalability, security, and future-readiness. This major
milestone was many years in the making, and it serves as an excellent moment to reflect on the road that got us here.
A number of challenges had to be addressed before we could get to IPv6. This post covers the journey to this
implementation.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The Wikimedia Foundation was an early adopter of the OpenStack technology, and the original OpenStack deployment in the
organization dates back to 2011. At that time, IPv6 support was still nascent and had limited implementation across
various OpenStack components.
&lt;a href=&quot;https://phabricator.wikimedia.org/T37947&quot;&gt;In 2012, the Wikimedia cloud users formally requested IPv6 support&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When Cloud VPS was originally deployed, we had set up the network following some of the upstream-recommended patterns:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;nova-networks as the engine in charge of the software-defined virtual network&lt;/li&gt;
  &lt;li&gt;using a flat network topology – all virtual machines would share the same network&lt;/li&gt;
  &lt;li&gt;using a physical VLAN in the datacenter&lt;/li&gt;
  &lt;li&gt;using Linux bridges to make this physical datacenter VLAN available to virtual machines&lt;/li&gt;
  &lt;li&gt;using a single virtual router as the edge network gateway, also executing a global egress NAT – barring some
exceptions, using what was called “dmz_cidr” mechanism&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In order for us to be able to implement IPv6 in a way that aligned with our architectural goals and operational
requirements, pretty much all the elements in this list would need to change. First of all, we needed to migrate from
nova-networks into Neutron,
&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Neutron_SDN&quot;&gt;a migration effort that started in 2017&lt;/a&gt;.
Neutron was the more modern component to implement software-defined networks in OpenStack. To facilitate this
transition, we made the strategic decision to backport certain functionalities from nova-networks into Neutron,
specifically &lt;a href=&quot;https://phabricator.wikimedia.org/T167357&quot;&gt;the “dmz_cidr” mechanism and some egress NAT capabilities&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once in Neutron, we started to think about IPv6. In 2018 there was an initial attempt to decide on the network CIDR
allocations that Wikimedia Cloud Services would have. This initiative encountered unforeseen challenges
&lt;a href=&quot;https://phabricator.wikimedia.org/T187929#7315518&quot;&gt;and was subsequently put on hold&lt;/a&gt;. We focused on removing the previously
backported nova-networks patches from Neutron.&lt;/p&gt;

&lt;p&gt;Between 2020 and 2021, we initiated another
&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/2020_Network_refresh&quot;&gt;significant network refresh&lt;/a&gt;.
We were able to introduce the cloudgw project, as part of a larger effort to rework the Cloud VPS edge network. The new
edge routers allowed us to drop all the custom backported patches we had in Neutron from the nova-networks era,
unblocking further progress. It is worth mentioning that the cloudgw router uses nftables as its firewalling and NAT engine.&lt;/p&gt;
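As a purely illustrative sketch (the interface name and addresses below are invented, not the actual Wikimedia configuration), an nftables egress NAT ruleset in the spirit of cloudgw could look like this:

```
# Hypothetical cloudgw-style egress NAT ruleset; names and addresses are invented.
table ip cloudgw {
    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        # exception: do not NAT traffic towards internal service ranges
        ip daddr 10.0.0.0/8 accept
        # NAT everything else leaving the cloud realm through the wan interface
        ip saddr 172.16.0.0/21 oifname "wan0" snat to 198.51.100.1
    }
}
```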

&lt;p&gt;A pivotal decision in 2022 was to
&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Help:Using_OpenStack_APIs&quot;&gt;expose the OpenStack APIs to the internet&lt;/a&gt;, which
crucially enabled infrastructure management via OpenTofu. This was key in the IPv6 rollout as will be explained later.
Before this, management was limited to Horizon – the OpenStack graphical interface – or the command-line interface
accessible only from internal control servers.&lt;/p&gt;

&lt;p&gt;Later, in 2023, following the OpenStack project’s announcement of the deprecation of the neutron-linuxbridge-agent, we
began to &lt;a href=&quot;https://phabricator.wikimedia.org/T326373&quot;&gt;seriously consider migrating to the neutron-openvswitch-agent&lt;/a&gt;.
This transition would, in turn, simplify the enablement of “tenant networks” – a feature allowing each OpenStack project
to define its own isolated network, rather than all virtual machines sharing a single flat network.&lt;/p&gt;

&lt;p&gt;Once we replaced neutron-linuxbridge-agent with neutron-openvswitch-agent, we were ready to migrate virtual machines to
VXLAN. Demonstrating perseverance, we decided to execute the VXLAN migration in conjunction with the IPv6 rollout.&lt;/p&gt;

&lt;p&gt;We &lt;a href=&quot;https://phabricator.wikimedia.org/T389958&quot;&gt;prepared&lt;/a&gt; and tested several things, including the rework of the edge
routing to be based on BGP/OSPF instead of static routing. In 2024 we were ready for the initial attempt to deploy
IPv6, &lt;a href=&quot;https://phabricator.wikimedia.org/T380728&quot;&gt;which failed for unknown reasons&lt;/a&gt;. There was a full network outage and
we immediately reverted the changes. This quick rollback was feasible due to
&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/tofu-infra&quot;&gt;our adoption of OpenTofu&lt;/a&gt;: deploying IPv6 had
been reduced to a single code change within our repository.&lt;/p&gt;

&lt;p&gt;We started an investigation, corrected a few issues, and
&lt;a href=&quot;https://phabricator.wikimedia.org/T391325&quot;&gt;increased our network functional testing coverage&lt;/a&gt; before trying again. One
of the problems we discovered was that Neutron would enable the “enable_snat” configuration flag for our main router
when adding the new external IPv6 address.&lt;/p&gt;

&lt;p&gt;Finally, in April 2025,
&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/IPv6/initial_deploy#Log&quot;&gt;after many years in the making&lt;/a&gt;,
IPv6 was successfully deployed.&lt;/p&gt;

&lt;p&gt;Compared to the network from 2011, we now have:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Neutron as the engine in charge of the software-defined virtual network&lt;/li&gt;
  &lt;li&gt;tenant networks ready to use&lt;/li&gt;
  &lt;li&gt;a VXLAN-based overlay network&lt;/li&gt;
  &lt;li&gt;neutron-openvswitch-agent providing networking to virtual machines&lt;/li&gt;
  &lt;li&gt;a modern and robust edge network setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Over time, the WMCS team has skillfully navigated numerous challenges to ensure our service offerings consistently meet
high standards of quality and operational efficiency. By often engaging in multi-year planning, we have been able to set
and achieve significant milestones.&lt;/p&gt;

&lt;p&gt;The successful IPv6 deployment stands as further testament to the team’s dedication and hard work over the years. I
believe we can confidently say that the 2025 Cloud VPS represents its most advanced and capable iteration to date.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was originally published in the &lt;a href=&quot;https://techblog.wikimedia.org/2025/05/06/wikimedia-cloud-vps-ipv6-support/&quot;&gt;Wikimedia Tech blog&lt;/a&gt;, authored by Arturo Borrero Gonzalez.&lt;/em&gt;&lt;/p&gt;

</description>
        <pubDate>Tue, 20 May 2025 13:00:00 +0000</pubDate>
        <link>https://ral-arturo.org/2025/05/20/wmcs-ipv6.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2025/05/20/wmcs-ipv6.html</guid>
        
        
      </item>
    
      <item>
        <title>My experience in the Debian LTS and ELTS projects</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/debian-logo.jpg&quot; alt=&quot;Debian&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Last year, I decided to start participating in the Debian LTS and ELTS projects. It was a great opportunity to engage in
something new within the Debian community. I had been following these projects for many years, observing their evolution
and how they gained traction both within the ecosystem and across the industry.&lt;/p&gt;

&lt;p&gt;I was curious to explore how contributors were working internally — especially how they managed security patching and
remediation for older software. I’ve always felt this was a particularly challenging area, and I was fortunate to
experience it firsthand.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;As of April 2025, the &lt;a href=&quot;https://wiki.debian.org/LTS&quot;&gt;Debian LTS project&lt;/a&gt; was primarily focused on providing security
maintenance for Debian 11 &lt;em&gt;Bullseye&lt;/em&gt;. Meanwhile, the &lt;a href=&quot;https://wiki.debian.org/LTS/Extended&quot;&gt;Debian ELTS project&lt;/a&gt; was
targeting Debian 8 &lt;em&gt;Jessie&lt;/em&gt;, Debian 9 &lt;em&gt;Stretch&lt;/em&gt;, and Debian 10 &lt;em&gt;Buster&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;During my time with the projects, I worked on a variety of packages and CVEs. Some of the most notable ones include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://security-tracker.debian.org/tracker/source-package/bluez&quot;&gt;bluez&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://security-tracker.debian.org/tracker/source-package/nss&quot;&gt;nss&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://security-tracker.debian.org/tracker/source-package/libmojolicious-perl&quot;&gt;libmojolicious-perl&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://security-tracker.debian.org/tracker/source-package/uw-imap&quot;&gt;uw-imap&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://security-tracker.debian.org/tracker/source-package/dnsmasq&quot;&gt;dnsmasq&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://security-tracker.debian.org/tracker/source-package/firmware-nonfree&quot;&gt;firmware-nonfree&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://security-tracker.debian.org/tracker/source-package/activemq&quot;&gt;activemq&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://security-tracker.debian.org/tracker/source-package/frr&quot;&gt;frr&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://security-tracker.debian.org/tracker/source-package/libmodbus&quot;&gt;libmodbus&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are several technical highlights I’d like to share — things I learned or had to apply while participating:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;p&gt;&lt;strong&gt;CI/CD pipelines&lt;/strong&gt;: We consistently used CI/CD pipelines on &lt;a href=&quot;https://salsa.debian.org&quot;&gt;salsa.debian.org&lt;/a&gt; to automate
tasks such as building, linting, and testing packages. For any package I worked on that lacked CI/CD integration,
setting it up became my first step.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;autopkgtest&lt;/strong&gt;: There’s a strong emphasis on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;autopkgtest&lt;/code&gt; as the mechanism for running functional tests and ensuring
that security patches don’t introduce regressions. I contributed by both extending existing test suites and writing
new ones from scratch.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Toolchain complexity for older releases&lt;/strong&gt;: Working with older Debian versions like &lt;em&gt;Jessie&lt;/em&gt; brought some unique
challenges. Getting a development environment up and running often meant troubleshooting issues with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sbuild&lt;/code&gt; chroots,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu&lt;/code&gt; images, and other tools that don’t “just work” like they tend to on Debian stable.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Community collaboration&lt;/strong&gt;: The people involved in LTS and ELTS are extremely helpful and collaborative. Requests for
help, code reviews, and general feedback were usually answered quickly.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Shared ownership&lt;/strong&gt;: This collaborative culture also meant that contributors would regularly pick up work left by
others or hand off their own tasks when needed. That mutual support made a big difference.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Backporting security fixes&lt;/strong&gt;: This is probably the most intense —and most rewarding— activity. It involves manually
adapting patches to work on older codebases when upstream patches don’t apply cleanly. This requires deep code
understanding and thorough testing.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Upstream collaboration&lt;/strong&gt;: Reaching out to upstream developers was a key part of my workflow. I often asked if they
could provide patches for older versions or at least review my backports. Sometimes they were available, but most of
the time, the responsibility remained on us.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Diverse tech stack&lt;/strong&gt;: The work exposed me to a wide range of programming languages and frameworks—Python, Java, C,
Perl, and more. Unsurprisingly, some modern languages (like Go) are less prevalent in older releases like &lt;em&gt;Jessie&lt;/em&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Security team interaction&lt;/strong&gt;: I had frequent contact with the core Debian Security Team—the folks responsible for
security in Debian stable. This gave me a broader perspective on how Debian handles security holistically.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
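The autopkgtest work mentioned above revolves around the debian/tests/control file of a package; a minimal sketch (with a hypothetical test name) looks like this:

```
# debian/tests/control: minimal, hypothetical autopkgtest declaration.
# The test itself lives in an executable at debian/tests/smoke.
Tests: smoke
Depends: @
Restrictions: allow-stderr
```

Here, Depends: @ pulls in the binary packages built from the source package under test, and the smoke executable exercises the installed software.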

&lt;p&gt;In March 2025, I decided to scale back my involvement in the projects due to some changes in my personal
life. Still, this experience has been one of the highlights of my career, and I would definitely recommend it to others.&lt;/p&gt;

&lt;p&gt;I’m very grateful for the warm welcome I received from the LTS/ELTS community, and I don’t rule out the possibility of
rejoining the LTS/ELTS efforts in the future.&lt;/p&gt;

&lt;p&gt;The Debian LTS/ELTS projects are currently coordinated by folks at &lt;a href=&quot;https://www.freexian.com/&quot;&gt;Freexian&lt;/a&gt;.
Many thanks to Freexian and &lt;a href=&quot;https://www.freexian.com/lts/debian/#sponsors&quot;&gt;sponsors&lt;/a&gt; for providing this opportunity!&lt;/p&gt;

</description>
        <pubDate>Thu, 17 Apr 2025 09:00:00 +0000</pubDate>
        <link>https://ral-arturo.org/2025/04/17/lts.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2025/04/17/lts.html</guid>
        
        
      </item>
    
      <item>
        <title>Wikimedia Toolforge: migrating Kubernetes from PodSecurityPolicy to Kyverno</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20240704-wikimedia-commons-chateau-de-valere.png&quot; alt=&quot;Le château de Valère et le Haut de Cry en juillet 2022&quot; /&gt;
&lt;em&gt;&lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Ch%C3%A2teau_de_Val%C3%A8re_et_Haut_de_Cry_-_juillet_2022.jpg&quot;&gt;Christian
David&lt;/a&gt;,
&lt;a href=&quot;https://creativecommons.org/licenses/by-sa/4.0&quot;&gt;CC BY-SA 4.0&lt;/a&gt;, via Wikimedia Commons&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was originally published in the &lt;a href=&quot;https://techblog.wikimedia.org/2024/07/03/wikimedia-toolforge-migrating-kubernetes-from-podsecuritypolicy-to-kyverno/&quot;&gt;Wikimedia Tech blog&lt;/a&gt;, authored by Arturo Borrero Gonzalez.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Summary: this article shares the experience and learnings of migrating from Kubernetes PodSecurityPolicy to
Kyverno in the Wikimedia Toolforge platform.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Portal:Toolforge&quot;&gt;Wikimedia Toolforge&lt;/a&gt; is a Platform-as-a-Service, built with
Kubernetes, and maintained by the Wikimedia Cloud Services team (WMCS). It is completely free and open, and we welcome
anyone to use it to build and host tools (bots, webservices, scheduled jobs, etc) in support of Wikimedia projects.&lt;/p&gt;

&lt;p&gt;We provide a set of platform-specific services, command line interfaces, and shortcuts to help in the task of setting up
webservices, jobs, and stuff like building container images, or using databases. Using these interfaces makes the
underlying Kubernetes system pretty much invisible to users. We also allow direct access to the Kubernetes API, and some
advanced users do directly interact with it.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Each account has a Kubernetes namespace where they can freely deploy their workloads. We have a number of controls in
place to ensure performance, stability, and fairness of the system, including quotas, RBAC permissions, and up until
recently PodSecurityPolicies (PSP). At the time of this writing, we had around 3,500 Toolforge tool accounts in the
system. We adopted PSP early, in 2019, as a way to make sure Pods had the correct runtime configuration, staying
within the safe boundaries of a set of pre-defined parameters. Back when we adopted PSP there was already the
option to use third-party agents, like &lt;a href=&quot;https://open-policy-agent.github.io/gatekeeper/website/&quot;&gt;OpenPolicyAgent Gatekeeper&lt;/a&gt;,
but we decided not to invest in them, and went with a native, built-in mechanism instead.&lt;/p&gt;

&lt;p&gt;In 2021 it was &lt;a href=&quot;https://kubernetes.io/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/&quot;&gt;announced&lt;/a&gt;
that the PSP mechanism would be deprecated and removed in Kubernetes 1.25. Even though we had been warned years in
advance, we did not prioritize the PSP migration until we were on Kubernetes 1.24 and blocked, unable to upgrade
further without taking action.&lt;/p&gt;

&lt;p&gt;The WMCS team explored different alternatives for this migration, but eventually we &lt;a href=&quot;https://phabricator.wikimedia.org/T362233&quot;&gt;decided to go with
Kyverno&lt;/a&gt; as a replacement for PSP. With that decision, the
journey described in this blog post began.&lt;/p&gt;

&lt;p&gt;First, we needed a source code refactor for one of the key components of our Toolforge Kubernetes:
&lt;a href=&quot;https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers&quot;&gt;maintain-kubeusers&lt;/a&gt;. This custom piece of
software, built in-house, contains the logic to fetch accounts from LDAP and do the necessary instrumentation on
Kubernetes to accommodate each one: create the namespace, RBAC, quota, a kubeconfig file, etc. With the refactor, we
introduced a proper reconciliation loop, so that the software has a notion of what needs to be done for
each account: what is missing, what to delete, what to upgrade, and so on. This allows us to easily deploy new
resources for each account, or iterate on their definitions.&lt;/p&gt;
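The reconciliation idea can be sketched in a few lines of Python (a simplified, hypothetical model, not the actual maintain-kubeusers code):

```python
# Simplified, hypothetical reconciliation step in the spirit of the
# maintain-kubeusers refactor: compare the desired resources for an account
# against what currently exists, then create what is missing and delete
# what is stale.

DESIRED = ["namespace", "rbac", "quota", "kubeconfig", "kyverno-policy"]

def reconcile(account: str, current: set[str]) -> dict[str, list[str]]:
    """Return the actions needed to bring one account up to date."""
    desired = set(DESIRED)
    return {
        "create": sorted(desired - current),
        "delete": sorted(current - desired),
    }

# Example: an account that still carries a stale resource from a previous era.
actions = reconcile("tool-example", {"namespace", "rbac", "old-psp"})
print(actions)
```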

&lt;p&gt;The initial version of the refactor had a number of problems, though. For one, the new version of maintain-kubeusers was
doing more filesystem interaction than the previous version, resulting in a slow reconciliation loop over all the
accounts. We used NFS as the underlying storage system for Toolforge, and it could be very slow for reasons
beyond the scope of this blog post. This was corrected in the days following the initial refactor rollout. A side note with an
implementation detail: we stored a configmap in each account namespace with the state of each resource. Storing more
state in this configmap was our solution to avoid additional NFS latency.&lt;/p&gt;

&lt;p&gt;I initially estimated this refactor would take me a week to complete, but unfortunately it took around three weeks
instead. Prior to the refactor, several manual steps and cleanups were required when updating the
definition of a resource. The process is now automated, more robust, performant, efficient, and clean. So in my opinion
it was worth it, even if it took more time than expected.&lt;/p&gt;

&lt;p&gt;Then, we worked on the Kyverno policies themselves. Because we had a very particular PSP setup, we tried to
replicate its semantics on a 1:1 basis as much as possible to ease the transition. This involved things like
transparent mutation of Pod resources, followed by validation. Additionally, we had a different PSP definition for each
account, so we decided to create a different Kyverno namespaced policy resource for each account namespace — remember,
we had 3.5k accounts.&lt;/p&gt;

&lt;p&gt;We created a &lt;a href=&quot;https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/blob/main/maintain_kubeusers/resources/kyverno_pod_policy.yaml.tpl?ref_type=heads&quot;&gt;Kyverno policy
template&lt;/a&gt;
that we would then render and inject for each account.&lt;/p&gt;
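The per-account policy idea can be sketched roughly like this (a simplified, hypothetical example, not the actual Toolforge template linked above):

```yaml
# Hypothetical namespaced Kyverno Policy, rendered once per tool account.
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: toolforge-pod-policy
  namespace: tool-example      # one policy per account namespace
spec:
  validationFailureAction: Audit   # later flipped to Enforce
  rules:
    - name: require-run-as-non-root
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Pods must not run as root."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
```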

&lt;p&gt;For developing and testing all this, maintain-kubeusers and the Kyverno bits, we had a project called
&lt;a href=&quot;https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo&quot;&gt;lima-kilo&lt;/a&gt;, which was a local Kubernetes setup
replicating production Toolforge. This was used by each engineer in their laptop as a common development environment.&lt;/p&gt;

&lt;p&gt;We had planned the migration from PSP to Kyverno policies in stages, like this:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;update our internal template generators to make Pod security settings explicit&lt;/li&gt;
  &lt;li&gt;introduce Kyverno policies in Audit mode&lt;/li&gt;
  &lt;li&gt;see how the cluster would behave with them, and if we had any offending resources reported by the new policies, and
correct them&lt;/li&gt;
  &lt;li&gt;modify Kyverno policies and set them in Enforce mode&lt;/li&gt;
  &lt;li&gt;drop PSP&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In stage 1, we &lt;a href=&quot;https://phabricator.wikimedia.org/T362050&quot;&gt;updated things&lt;/a&gt; like the toolforge-jobs-framework and tools-webservice.&lt;/p&gt;

&lt;p&gt;In stage 2, when we deployed the 3.5k Kyverno policy resources, our production cluster died almost immediately.
Surprise. All the monitoring went red, the Kubernetes apiserver became unresponsive, and we were unable to perform any
administrative actions in the Kubernetes control plane, or even on the underlying virtual machines. All Toolforge users
were impacted. This was a &lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Incidents/2024-06-12_WMCS_toolforge_k8s_control_plane&quot;&gt;full-scale
outage&lt;/a&gt; that required the
energy of the whole WMCS team to recover from. We temporarily disabled Kyverno until we could learn what had occurred.&lt;/p&gt;

&lt;p&gt;This incident happened despite having tested before in lima-kilo and in another pre-production cluster we had, called
&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolsbeta&quot;&gt;Toolsbeta&lt;/a&gt;. But we had not tested that many
policy resources. Clearly, this was something scale-related. After the incident, I went on and created 3.5k Kyverno
policy resources on lima-kilo, and indeed I was able to reproduce the outage. We took a number of measures, corrected a
&lt;a href=&quot;https://phabricator.wikimedia.org/T367389&quot;&gt;few errors&lt;/a&gt; in our infrastructure, reached out to the Kyverno upstream
developers &lt;a href=&quot;https://github.com/kyverno/kyverno/issues/10458&quot;&gt;asking for advice&lt;/a&gt;, and in the end we did the following to
accommodate the setup to our needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;corrected the external HAProxy health checks for the Kubernetes apiservers, from checking just for open TCP ports to actually
checking the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/healthz&lt;/code&gt; HTTP endpoint, which more accurately reflects the health of each k8s apiserver.&lt;/li&gt;
  &lt;li&gt;having a more realistic development environment. In lima-kilo, we created a couple of &lt;a href=&quot;https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/blob/main/helpers/toolforge_create_many_accounts.sh?ref_type=heads&quot;&gt;helper scripts&lt;/a&gt; to create/delete
4000 policy resources, each on a different namespace.&lt;/li&gt;
&lt;li&gt;greatly over-provisioned memory in the Kubernetes control plane servers, that is, more memory in the base virtual
machines hosting the control plane. Scaling the memory headroom of the apiserver prevents it from running out of
memory and crashing the whole system. We went from 8GB RAM per virtual machine to 32GB. In our cluster, a
single apiserver pod could eat 7GB of memory on a normal day, so having 8GB on the base virtual machine was clearly
not enough. I also sent a &lt;a href=&quot;https://github.com/kyverno/website/pull/1295&quot;&gt;patch proposal&lt;/a&gt; to the Kyverno upstream documentation suggesting they clarify the additional
memory pressure on the apiserver.&lt;/li&gt;
  &lt;li&gt;corrected &lt;a href=&quot;https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/blob/main/components/kyverno/values/common/settings.yaml?ref_type=heads&quot;&gt;resource requests and limits&lt;/a&gt; of Kyverno, to more accurately describe our actual usage.&lt;/li&gt;
&lt;li&gt;increased the number of replicas of the Kyverno admission controller to 7, so admission requests could be handled
in a more timely fashion by Kyverno.&lt;/li&gt;
&lt;/ul&gt;
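The health-check measure above can be sketched as an HAProxy backend along these lines (a hedged example; server names and addresses are hypothetical):

```
# Hypothetical HAProxy backend for the k8s apiservers: probe the /healthz
# endpoint instead of only testing that the TCP port is open.
backend k8s-apiserver
    option httpchk GET /healthz
    http-check expect status 200
    server control-1 192.0.2.10:6443 check check-ssl verify none
    server control-2 192.0.2.11:6443 check check-ssl verify none
```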

&lt;p&gt;I have to admit, I was &lt;a href=&quot;https://phabricator.wikimedia.org/T367950&quot;&gt;briefly tempted to drop Kyverno&lt;/a&gt;, stop pursuing an external policy agent entirely,
and write our own custom admission controller, out of concerns over the performance of this architecture. However, after
applying all the measures listed above, the system became very stable, so we decided to move forward. The second attempt
at deploying it all went through just fine. No outage this time 🙂&lt;/p&gt;

&lt;p&gt;When we were in stage 4 we detected another bug. We had been following the Kubernetes upstream documentation for setting
securityContext to the right values. In particular, we were enforcing procMount to be set to the default value,
which per the docs was
‘&lt;a href=&quot;https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#securitycontext-v1-core&quot;&gt;DefaultProcMount&lt;/a&gt;’.
However, that string is the name of the internal variable in the source code, whereas the actual default value is the
string ‘&lt;a href=&quot;https://github.com/kubernetes/api/blob/release-1.24/core/v1/types.go#L6422&quot;&gt;Default&lt;/a&gt;’. This caused pods to be
rightfully rejected by Kyverno while we figured out the problem. I sent a &lt;a href=&quot;https://github.com/kubernetes/kubernetes/pull/125782&quot;&gt;patch
upstream&lt;/a&gt; to fix this problem.&lt;/p&gt;
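
&lt;p&gt;For reference, this is the kind of pod manifest fragment involved. The value that needs to appear in the manifest is
the API string, not the Go constant name:&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;securityContext:
  procMount: Default             # correct: the actual API value
  # procMount: DefaultProcMount  # wrong: the internal Go constant name
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;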

&lt;p&gt;We finally had everything in place, reached stage 5, and were able to disable PSP. We unloaded the PSP controller
from the Kubernetes apiserver, and deleted every individual PSP definition. Everything was very smooth in this last step
of the migration.&lt;/p&gt;

&lt;p&gt;This whole PSP project, including the maintain-kubeusers refactor, the outage, and all the different migration stages
took roughly three months to complete.&lt;/p&gt;

&lt;p&gt;For me, there are a number of valuable lessons to learn from this project. For one, scale is something to consider,
and test, when evaluating a new architecture or software component. Not doing so can lead to service outages or
unexpectedly poor performance. This is in the first chapter of the SRE handbook, but we got a reminder the hard way 🙂&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was originally published in the &lt;a href=&quot;https://techblog.wikimedia.org/2024/07/03/wikimedia-toolforge-migrating-kubernetes-from-podsecuritypolicy-to-kyverno/&quot;&gt;Wikimedia Tech blog&lt;/a&gt;, authored by Arturo Borrero Gonzalez.&lt;/em&gt;&lt;/p&gt;

</description>
        <pubDate>Thu, 04 Jul 2024 09:00:00 +0000</pubDate>
        <link>https://ral-arturo.org/2024/07/04/kyverno.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2024/07/04/kyverno.html</guid>
        
        
      </item>
    
      <item>
        <title>Kubecon and CloudNativeCon 2024 Europe summary</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20240401-kubecon-logo.png&quot; alt=&quot;Kubecon EU 2024 Paris logo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This blog post shares my thoughts on attending Kubecon and CloudNativeCon 2024 Europe in Paris. It was my third time at
this conference, and it felt bigger than last year’s edition in Amsterdam. Apparently it had an impact on public transport: I
missed part of the opening keynote because of the extremely busy rush-hour tram in Paris.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;&lt;strong&gt;On Artificial Intelligence, Machine Learning and GPUs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Talks about AI, ML, and GPUs were everywhere this year. While it wasn’t my main interest, I did learn about GPU resource
sharing and power usage on Kubernetes. There were also ideas about offering Models-as-a-Service, which could be cool for
Wikimedia Toolforge in the future.&lt;/p&gt;

&lt;p&gt;See also:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://phabricator.wikimedia.org/T336905&quot;&gt;Phabricator T336905: Supporting AI, LLM, and data models on WMCS&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ollama.com/&quot;&gt;ollama.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;On security, policy and authentication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was probably my main interest in the event, given that Wikimedia Toolforge was about to migrate away from Pod
Security Policy, and we were evaluating different alternatives at the time.&lt;/p&gt;

&lt;p&gt;In contrast to my previous attendances at Kubecon, where three policy agents had a presence in the program
schedule (Kyverno, Kubewarden and OpenPolicyAgent, or OPA), this time only OPA had relevant sessions.&lt;/p&gt;

&lt;p&gt;One surprising bit I got from one of the OPA sessions was that it can be used to authorize Linux PAM sessions. Could this
be useful for Wikimedia Toolforge?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20240401-kubecon-opa.png&quot; alt=&quot;OPA talk&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I attended several sessions related to authentication topics. I discovered the Keycloak software, which looks very
promising. I also attended an OAuth2 session which I had a hard time following, because I clearly lacked some additional
knowledge about how OAuth2 works internally.&lt;/p&gt;

&lt;p&gt;I also attended a couple of sessions that ended up being a vendor sales talk.&lt;/p&gt;

&lt;p&gt;See also:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://phabricator.wikimedia.org/T279110&quot;&gt;Phabricator T279110: Replace PodSecurityPolicy in Toolforge Kubernetes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.keycloak.org/&quot;&gt;keycloak.org&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;On container image builds, harbor registry, etc&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This topic was also of interest to me because, again, it is a core part of Wikimedia Toolforge.&lt;/p&gt;

&lt;p&gt;I attended a couple of sessions regarding container image builds, including topics like general best practices, image
minimization, and buildpacks. I learned about kpack, which at first sight felt like a nice simplification of how the
Toolforge build service was implemented.&lt;/p&gt;

&lt;p&gt;I also attended a session by the Harbor project maintainers where they shared some valuable information on things
happening soon or in the future, for example:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;a new Harbor command line interface is coming soon, though only a first iteration.&lt;/li&gt;
  &lt;li&gt;the Harbor operator, to install and manage Harbor, is looking for new maintainers, otherwise it will be archived.&lt;/li&gt;
  &lt;li&gt;the project is now experimenting with adding support for hosting more artifact types: Maven, npm, PyPI. I wonder if they will
consider hosting Debian .deb packages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;On networking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I attended a couple of sessions regarding networking.&lt;/p&gt;

&lt;p&gt;I paid special attention to one session in particular, regarding network policies. It discussed new semantics being
added to the Kubernetes API.&lt;/p&gt;

&lt;p&gt;The different layers of abstraction being added to the API, the different hook points, and the override layers clearly
resembled (to me at least) the network packet filtering stack of the Linux kernel (netfilter), but without the 20-plus
years of experience building the right semantics and user interfaces.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20240401-kubecon-net.png&quot; alt=&quot;Network talk&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I very recently missed some semantics for limiting the number of open connections per namespace; see &lt;a href=&quot;https://phabricator.wikimedia.org/T356164&quot;&gt;Phabricator
T356164: [toolforge] several tools get periods of connection refused (104) when connecting to
wikis&lt;/a&gt;. This functionality is available in the lower-level tooling, I
mean Netfilter. I may submit a proposal upstream at some point, so they consider adding this to the Kubernetes API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final notes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In general, I believe I learned many things, and perhaps even more importantly I re-learned some stuff I had forgotten
because of lack of daily exposure. I’m really happy that the cloud native way of thinking was reinforced in me, which I
still need because most of my muscle memory for approaching systems architecture and engineering is from the old pre-cloud
days. That being said, I felt less engaged with the content of the conference schedule compared to last year. I don’t
know if the schedule itself was less interesting, or if I’m losing interest.&lt;/p&gt;

&lt;p&gt;Finally, not an official track in the conference, but we met a bunch of folks from
&lt;a href=&quot;https://www.wikimedia.de/&quot;&gt;Wikimedia Deutschland&lt;/a&gt;. We had a really nice time talking about how
&lt;a href=&quot;https://wikibase.cloud&quot;&gt;wikibase.cloud&lt;/a&gt; uses Kubernetes, whether they could run in Wikimedia Cloud Services, and why
structured data is so nice.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20240401-kubecon-group.png&quot; alt=&quot;Group photo&quot; /&gt;&lt;/p&gt;

</description>
        <pubDate>Mon, 01 Apr 2024 09:00:00 +0000</pubDate>
        <link>https://ral-arturo.org/2024/04/01/kubecon.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2024/04/01/kubecon.html</guid>
        
        
      </item>
    
      <item>
        <title>Back to the Wikimedia Foundation!</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/wmf-logo-black.png&quot; alt=&quot;Wikimedia Foundation logo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In October 2023, I departed from the &lt;a href=&quot;https://wikimediafoundation.org&quot;&gt;Wikimedia Foundation&lt;/a&gt;, the non-profit organization
behind well-known projects like Wikipedia and others, to &lt;a href=&quot;https://ral-arturo.org/2023/11/20/spryker.html&quot;&gt;join Spryker&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, in January 2024 Spryker conducted a round of layoffs reportedly due to budget and business reasons.
I was among those affected, being let go just three months after joining the company.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Fortunately, the &lt;a href=&quot;https://www.mediawiki.org/wiki/Wikimedia_Cloud_Services_team&quot;&gt;Wikimedia Cloud Services team&lt;/a&gt;, where I previously worked, was still seeking to backfill my
position, so I reached out to them. They graciously welcomed me back as a Senior Site Reliability Engineer,
in the same team and position as before.&lt;/p&gt;

&lt;p&gt;Although this three-month career “detour” wasn’t the outcome I initially envisioned, I found it to be a valuable experience.
During this time, I gained knowledge in a new tech stack, based on AWS, and discovered new engineering methodologies.
Additionally, I had the opportunity to meet some wonderful individuals. I believe I have emerged stronger from this experience.&lt;/p&gt;

&lt;p&gt;Returning to the Wikimedia Foundation is truly motivating. It feels like a privilege to be part of this mature organization,
its community, and movement, with its inspiring &lt;a href=&quot;https://meta.wikimedia.org/wiki/Mission&quot;&gt;mission&lt;/a&gt; and &lt;a href=&quot;https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Values&quot;&gt;values&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In addition, I’m hoping that this also means I can once again dedicate a bit more attention to my FLOSS activities,
such as my duties within the Debian project.&lt;/p&gt;

&lt;p&gt;My email address is back online: &lt;a href=&quot;mailto:aborrero@wikimedia.org&quot;&gt;aborrero@wikimedia.org&lt;/a&gt;.
You can find me again in the IRC libera.chat server, in the usual wikimedia channels, nick &lt;em&gt;arturo&lt;/em&gt;.&lt;/p&gt;

</description>
        <pubDate>Tue, 13 Feb 2024 09:30:00 +0000</pubDate>
        <link>https://ral-arturo.org/2024/02/13/wmf.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2024/02/13/wmf.html</guid>
        
        <category>job</category>
        
        <category>wikimedia foundation</category>
        
        
      </item>
    
      <item>
        <title>OpenTofu: handcrafted include-file mechanism with YAML</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/opentofu-logo.png&quot; alt=&quot;Post logo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I recently started playing with Terraform/&lt;a href=&quot;https://opentofu.org/&quot;&gt;OpenTofu&lt;/a&gt; almost on a daily basis.&lt;/p&gt;

&lt;p&gt;The other day I was working with Amazon Managed Prometheus (or AMP), and wanted to define Prometheus alert rules in YAML files.&lt;/p&gt;

&lt;p&gt;I decided that I needed a way to put the alerts in a bunch of files, and then have the declarative code load them into the correct
AMP workspace.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;I came up with this code pattern that I’m sharing here, for my future reference, and in case it is interesting to someone else.&lt;/p&gt;

&lt;p&gt;The YAML file where I specify the AMP workspace, and where the alert rule files live:&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nn&quot;&gt;---&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;alert_files&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;my_alerts_production&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;amp&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;workspace&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;production&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;files&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;alert_rules/production/*.yaml&quot;&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;my_alerts_staging&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;amp&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;workspace&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;staging&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;files&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;alert_rules/staging/*.yaml&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;files&lt;/code&gt; entry contains a file pattern. I will later expand the pattern using the &lt;a href=&quot;https://opentofu.org/docs/language/functions/fileset&quot;&gt;fileset()&lt;/a&gt; function.&lt;/p&gt;
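
&lt;p&gt;As a quick illustration (with hypothetical file names), &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fileset()&lt;/code&gt; expands a glob pattern, relative to a base path, into the set of matching file names:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ tofu console
&amp;gt; fileset(&quot;&quot;, &quot;alert_rules/production/*.yaml&quot;)
toset([
  &quot;alert_rules/production/cpu.yaml&quot;,
  &quot;alert_rules/production/memory.yaml&quot;,
])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;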

&lt;p&gt;Each rule file would be something like this:&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nn&quot;&gt;---&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;my_rule_namespace&quot;&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;rule_data&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;|&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;# this is prometheus-specific config&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;groups:&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;- name: &quot;example_alert_group&quot;&lt;/span&gt;
      &lt;span class=&quot;s&quot;&gt;rules:&lt;/span&gt;
      &lt;span class=&quot;s&quot;&gt;- alert: Example_Alert_Cpu&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;# just arbitrary values, to produce an example alert&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;expr: avg(rate(ecs_cpu_seconds_total{container=~&quot;something&quot;}[2m])) &amp;gt; 1&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;for: 10s&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;annotations:&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;summary: &quot;CPU usage is too high&quot;&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;description: &quot;The container average CPU usage is too high.&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I’m interested in the data structure mutating into something similar to this:&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nn&quot;&gt;---&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;alert_files&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;my_alerts_production&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;amp&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;workspace&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;production&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;alerts_data&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;rule_namespace_1&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;rule_data&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;|&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;# actual alert definition here&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;[..]&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;rule_namespace_2&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;rule_data&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;|&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;# actual alert definition here&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;[..]&lt;/span&gt;

  &lt;span class=&quot;na&quot;&gt;my_alerts_staging&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;amp&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;workspace&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;staging&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;alerts_data&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;rule_namespace_1&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;rule_data&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;|&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;# actual alert definition here&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;[..]&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;rule_namespace_2&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;rule_data&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;|&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;# actual alert definition here&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;[..]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is the algorithm that does the trick:&lt;/p&gt;

&lt;div class=&quot;language-terraform highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nx&quot;&gt;locals&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;alerts_config&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;alert_files&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;amp&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;amp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;files&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;fileset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;files&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;amp&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;amp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;alertmanager_data&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;z&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;files&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;yamldecode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;z&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because of the declarative nature of the Terraform/OpenTofu language, I needed to implement 3 different for loops. Each loop
reads the map and transforms it in some way, passing the result into the next loop. A bit convoluted, if you ask me.&lt;/p&gt;

&lt;p&gt;To explain the logic, I think it makes more sense to read it from inside out.&lt;/p&gt;

&lt;p&gt;First loop:&lt;/p&gt;
&lt;div class=&quot;language-terraform highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;nx&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;v&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;local&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;alert_files&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;amp&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;amp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;files&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;fileset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;files&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This loop iterates the input YAML map in key-value pairs, remapping each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amp&lt;/code&gt; entry, and expanding the file globs with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fileset()&lt;/code&gt; into a temporary &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;files&lt;/code&gt; entry.&lt;/p&gt;

&lt;p&gt;Second loop:&lt;/p&gt;
&lt;div class=&quot;language-terraform highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;nx&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# previous fileset() loop&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;amp&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;amp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;alertmanager_data&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# yamldecode() loop&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This intermediate loop is responsible for building the final data structure. It iterates the previous &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fileset()&lt;/code&gt; loop
and remaps it by calling the next loop, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;yamldecode()&lt;/code&gt; one. Note how the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amp&lt;/code&gt; entry is being “rebuilt” in each remap (the first loop
and this one); otherwise we would lose it!&lt;/p&gt;

&lt;p&gt;Third loop:&lt;/p&gt;
&lt;div class=&quot;language-terraform highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;nx&quot;&gt;alertmanager_data&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;z&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;files&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;yamldecode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;z&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And finally, this is maybe the easiest loop of the 3: we iterate the temporary &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;files&lt;/code&gt; entry that was created in the first loop,
calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;yamldecode()&lt;/code&gt; for each of the file names generated by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fileset()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The resulting data structure should allow you to easily create resources later in a &lt;a href=&quot;https://opentofu.org/docs/language/meta-arguments/for_each/&quot;&gt;for_each&lt;/a&gt; loop.&lt;/p&gt;
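
&lt;p&gt;For example, here is a sketch of how the structure could be consumed. The resource type is from the Terraform AWS
provider; the flattening helper and the workspace ID lookup are assumptions for illustration, not part of the original
setup:&lt;/p&gt;

&lt;div class=&quot;language-terraform highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# flatten into one entry per (workspace, rule namespace) pair
locals {
  rule_namespaces = merge([
    for k, v in local.alerts_config : {
      for rule in v.alertmanager_data :
      &quot;${k}.${rule.name}&quot; =&amp;gt; {
        workspace = v.amp.workspace
        name      = rule.name
        data      = rule.rule_data
      }
    }
  ]...)
}

resource &quot;aws_prometheus_rule_group_namespace&quot; &quot;this&quot; {
  for_each     = local.rule_namespaces
  # hypothetical lookup of the AMP workspace ID by name
  workspace_id = local.amp_workspace_ids[each.value.workspace]
  name         = each.value.name
  data         = each.value.data
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;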

</description>
        <pubDate>Thu, 14 Dec 2023 15:00:00 +0000</pubDate>
        <link>https://ral-arturo.org/2023/12/14/opentofu.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2023/12/14/opentofu.html</guid>
        
        
      </item>
    
      <item>
        <title>New job at Spryker</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/spryker_logo.png&quot; alt=&quot;Post logo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Last month, in October 2023, I started a new job as a Senior Site Reliability Engineer at the Germany-headquartered technology company
&lt;a href=&quot;https://spryker.com/&quot;&gt;Spryker&lt;/a&gt;. They are primarily focused on e-commerce and infrastructure businesses.&lt;/p&gt;

&lt;p&gt;I joined this company with excitement and curiosity, as I would be working with a new technology stack: AWS and Terraform. I had not been
directly exposed to them in the past, but I was aware of the vast industry impact of both.&lt;/p&gt;

&lt;p&gt;Suddenly, things like PXE boot, NIC firmware, and Linux kernel upgrades are mostly AWS problems.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;This is a full-time, 100% remote job position. I’m writing this post one month into the new position, and all I can say is that the
onboarding was smooth, and I found some good people in my team.&lt;/p&gt;

&lt;p&gt;Prior to joining Spryker, I was a Senior SRE at the Wikimedia Cloud Services team at the Wikimedia Foundation. I had been there
&lt;a href=&quot;https://ral-arturo.org/2017/10/23/first-day-wmf.html&quot;&gt;since 2017, for 6 years&lt;/a&gt;. It was a privilege working there.&lt;/p&gt;

</description>
        <pubDate>Mon, 20 Nov 2023 07:44:00 +0000</pubDate>
        <link>https://ral-arturo.org/2023/11/20/spryker.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2023/11/20/spryker.html</guid>
        
        
      </item>
    
      <item>
        <title>Wikimedia Hackathon 2023 Athens summary</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20230530-hackathon-logo.png&quot; alt=&quot;Post logo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;During the weekend of 19-23 May 2023 I attended the &lt;a href=&quot;https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023&quot;&gt;Wikimedia hackathon 2023&lt;/a&gt; in Athens,
Greece. The event reunited folks interested in the more technological aspects of the Wikimedia movement in person for the
first time &lt;a href=&quot;https://ral-arturo.org/2019/08/28/wikimania2019.html&quot;&gt;since 2019&lt;/a&gt;. The scope of the hacking projects included (but was not limited to)
tools, Wikipedia bots, gadgets, server and network infrastructure, data, and other technical systems.&lt;/p&gt;

&lt;p&gt;My role in the event was twofold: on one hand, I was there because of my role as an SRE in the Wikimedia Cloud Services team, which provides
very valuable services to the community, and I was expected to support the technical contributors of the movement who were around. Additionally, and
because of that same role, I did some hacking myself too, which was especially fruitful given that I generally collaborate on a daily basis with some
community members who were present in the hacking room.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The hackathon also had a conference-style track, and I ran a session with my coworker Bryan called
&lt;a href=&quot;https://phabricator.wikimedia.org/T333939&quot;&gt;Past, Present and Future of Wikimedia Cloud Services (Toolforge and friends)&lt;/a&gt;
&lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Past_Present_and_Future_of_WMCS.pdf&quot;&gt;(slides)&lt;/a&gt;, which was very satisfying to deliver given how friendly the space was.
I attended a bunch of other sessions, and all of them were interesting and well presented. The number of ML themes present in
the program schedule was exciting. I definitely learned a lot from attending those sessions, from how LLMs work and some
fascinating applications for them in the Wikimedia space, to current industry trends for training and hosting ML models.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20230531-hackathon-session.png&quot; alt=&quot;Session&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Sessions aside, the main purpose of the hackathon was, well, hacking. While I was in the hacking space for more than 12 hours each day, my
ability to get things done was greatly reduced by the constant conversations, help requests, and other social interactions with folks. Don’t get
me wrong, I embraced that reality with joy, because the social bonding aspect is perhaps the main reason why we gathered in person instead of
virtually.&lt;/p&gt;

&lt;p&gt;That being said, this is a rough list of what I did:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Helped review the status of &lt;a href=&quot;https://phabricator.wikimedia.org/T319593&quot;&gt;Migrate bldrwnsch from Toolforge GridEngine to Toolforge Kubernetes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Helped the maintainer of the &lt;a href=&quot;https://phabricator.wikimedia.org/T337190&quot;&gt;bodh Toolforge tool get it working with a reverse proxy&lt;/a&gt;
and started conversations about &lt;a href=&quot;https://phabricator.wikimedia.org/T337191&quot;&gt;how to facilitate the use case&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Discussed persistent storage options in Toolforge, which resulted in &lt;a href=&quot;https://phabricator.wikimedia.org/T337192&quot;&gt;some conversations within the WMCS team&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Debated with several folks what computing abstractions we should provide, including Toolforge as a PaaS, raw Kubernetes access, or even whether
we should continue offering virtual machines to the community.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/commit/5ce8bb56d64d490009a39fada0293e3ed4e3ef53&quot;&gt;Patched toolforge-weld&lt;/a&gt; to
support some stuff required by &lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Jobs_framework&quot;&gt;jobs-framework-cli&lt;/a&gt;.
Also made a release of it.&lt;/li&gt;
  &lt;li&gt;Started &lt;a href=&quot;https://gerrit.wikimedia.org/r/c/cloud/toolforge/jobs-framework-cli/+/921412&quot;&gt;a patch for jobs-framework-cli&lt;/a&gt; to
use &lt;a href=&quot;https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld&quot;&gt;toolforge-weld&lt;/a&gt;, which is still unfinished as of this writing.&lt;/li&gt;
  &lt;li&gt;Debated about how to host ML models, what to do with GPUs etc.&lt;/li&gt;
  &lt;li&gt;Supported fellow SRE Riccardo in working on &lt;a href=&quot;https://phabricator.wikimedia.org/T335759&quot;&gt;cloud-private subnet: introduce new domain&lt;/a&gt; to
integrate with &lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Netbox&quot;&gt;Netbox&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Uncovered &lt;a href=&quot;https://phabricator.wikimedia.org/T337010&quot;&gt;a flavor definition problem in our OpenStack&lt;/a&gt; deployment.&lt;/li&gt;
  &lt;li&gt;Reviewed many Toolforge account requests (most of them not related to the hackathon though), some quota requests and similar things.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hackathon also marked the final days of Technical Engagement as an umbrella group for the WMCS and Developer Advocacy teams within the Technology
department of the Wikimedia Foundation, because of an internal reorg. We used the chance to reflect on the pleasant time we had had together since 2019
and to take a final picture of the few of us who were at the event in person.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20230531-hackathon-te.png&quot; alt=&quot;Technical Engagement&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It wasn’t the first Wikimedia Hackathon for me, and I felt the same as in previous iterations: it was a welcoming space, and I was surrounded by
friends and nice human beings. I ended the event with a profound feeling of being privileged, because I was part of the Wikimedia movement, and
because I was invited to participate in it.&lt;/p&gt;

</description>
        <pubDate>Wed, 31 May 2023 12:11:00 +0000</pubDate>
        <link>https://ral-arturo.org/2023/05/31/hackathon.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2023/05/31/hackathon.html</guid>
        
        <category>wikimedia</category>
        
        
      </item>
    
      <item>
        <title>Kubecon and CloudNativeCon 2023 Europe summary</title>
        <description>&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20230427-kubecon-logo.png&quot; alt=&quot;Post logo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This post serves as a report on my attendance at Kubecon and CloudNativeCon 2023 Europe, which took place in
Amsterdam in April 2023. It was my second time attending this conference in person; the first was in
Austin, Texas (USA) in 2017. I also attended once virtually.&lt;/p&gt;

&lt;p&gt;The content here is mostly written for the sake of my own recollection and learning, based on
the notes I took during the event.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;The very first session was the opening keynote, which reunited the whole crowd to bootstrap the event and
share the excitement about the days ahead. Some astonishing numbers were announced: there were more than
10,000 people attending, and it could apparently be said with confidence that it was the largest open source
technology conference held in Europe in recent times.&lt;/p&gt;

&lt;p&gt;It was also communicated that the next couple of iterations of the event would be held in China in September 2023
and in Paris in March 2024.&lt;/p&gt;

&lt;p&gt;More numbers: the CNCF was hosting about 159 projects, involving 1,300 maintainers and about 200,000
contributors. The cloud-native community is ever-growing, and there seems to be a strong trend in the
industry toward cloud-native technology adoption and all things related to PaaS and IaaS.&lt;/p&gt;

&lt;p&gt;The event program had different tracks, and each one offered an interesting mix of low-level and higher-level
talks for a variety of audiences. On many occasions I found that reading the talk title alone was not
enough to know in advance whether a talk was a 101 kind of thing or aimed at experienced engineers. But unlike in
previous editions, I didn’t have the feeling that the purpose of the conference was to try to sell me
anything. Obviously, speakers would make sure to mention, or highlight in a subtle way, the involvement of a
given company in a given solution or piece of the ecosystem. But it was non-invasive and fair enough for me.&lt;/p&gt;

&lt;p&gt;On a different note, I found the breakout rooms often too small. I think there were only a couple of rooms
that could accommodate more than 500 people, which is a fairly small allowance for 10k attendees. I realized
with frustration that the more interesting talks filled up immediately, with people waiting in line
some 45 minutes before the session time. Because of this, I missed a few important sessions that I’ll
hopefully watch online later.&lt;/p&gt;

&lt;p&gt;Finally, on the more technical side, I learned many things, which I’ll group by topic rather than by session,
given how some subjects were mentioned in several talks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On gitops and CI/CD pipelines&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most of the mentions went to &lt;a href=&quot;https://fluxcd.io/&quot;&gt;FluxCD&lt;/a&gt; and &lt;a href=&quot;https://argoproj.github.io/cd/&quot;&gt;ArgoCD&lt;/a&gt;. At
that point there was no doubt that gitops was a mature approach and that both Flux and ArgoCD could do an
excellent job. ArgoCD seemed engineered to be a more general-purpose CD pipeline, while Flux
felt more tailored to simpler gitops setups. I discovered that both have nice web user interfaces that
I wasn’t previously familiar with.&lt;/p&gt;

&lt;p&gt;However, in two different talks I got the impression that while the initial setup is simple, migrating
your current workflow to gitops can be a bumpy ride. That is, the challenge is not deploying
flux/argo itself, but moving everything into a state that both humans and flux/argo can understand. I also
saw some curious mentions of the config drift that can happen in some cases, even though the goal of gitops is
precisely for that never to happen. Such mentions were usually accompanied by some hints on how to handle
the situation by hand.&lt;/p&gt;

&lt;p&gt;Worth mentioning: I missed practical information about one of the key pieces of this whole gitops story:
building container images. Most of the showcased scenarios used pre-built container images, so in that
sense they were simple. Building and pushing to an image registry is one of the two key points we would need
to solve in Toolforge Kubernetes if we adopted gitops.&lt;/p&gt;

&lt;p&gt;In general, even if gitops was already on our radar for
&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Toolforge_Kubernetes_component_workflow_improvements&quot;&gt;Toolforge Kubernetes&lt;/a&gt;, 
I think it climbed a few steps up my priority list after the conference.&lt;/p&gt;

&lt;p&gt;Another learning was this site: &lt;a href=&quot;https://opengitops.dev/&quot;&gt;https://opengitops.dev/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20230427-kubecon-group.png&quot; alt=&quot;Group&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On etcd, performance and resource management&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I attended a talk focused on etcd performance tuning that was very encouraging. They were basically talking 
about the &lt;a href=&quot;https://phabricator.wikimedia.org/T333929&quot;&gt;exact&lt;/a&gt; 
&lt;a href=&quot;https://phabricator.wikimedia.org/T267966&quot;&gt;same&lt;/a&gt; &lt;a href=&quot;https://phabricator.wikimedia.org/T333931&quot;&gt;problems&lt;/a&gt; we 
have had in Toolforge Kubernetes, like api-server and etcd failure modes, and how sensitive etcd is to &lt;a href=&quot;https://phabricator.wikimedia.org/T283296&quot;&gt;disk 
latency&lt;/a&gt;, IO pressure, and network throughput. Even though 
the Toolforge Kubernetes scale is small compared to other Kubernetes deployments out there, I found it very 
interesting to see others’ approaches to the same set of challenges.&lt;/p&gt;

&lt;p&gt;I learned how most Kubernetes components and apps can overload the api-server, because even the api-server 
talks to itself. Simple things like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl&lt;/code&gt; may have a completely different impact on the API depending on 
usage, for example when listing all objects (very expensive) versus a single object.&lt;/p&gt;

&lt;p&gt;The conclusion was to avoid hitting the api-server with LIST calls and to use ResourceVersion, which 
avoids full dumps from etcd (full dumps are, by the way, what you get with bare &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl get&lt;/code&gt; calls). I 
already knew some of this; for example, the &lt;em&gt;jobs-framework-emailer&lt;/em&gt; was already making use of this 
&lt;a href=&quot;https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/toolforge/jobs-framework-emailer/+/refs/heads/main/emailer/events.py#428&quot;&gt;ResourceVersion functionality&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There have been a lot of improvements on the performance side of Kubernetes in recent times, or more 
specifically, in how resources are managed and used by the system. I saw a review of resource management from 
the perspective of the container runtime and kubelet, and plans to support fancy things like topology-aware 
scheduling decisions and dynamic resource claims (changing a pod’s resource claims without 
re-defining/restarting the pod).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On cluster management, bootstrapping and multi-tenancy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I attended a couple of talks that mentioned kubeadm, and one in particular was from the maintainers 
themselves. This was of interest to me because as of today &lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Deploying&quot;&gt;we use it for 
Toolforge&lt;/a&gt;. They shared all 
the latest developments and improvements, and the plans and roadmap for the future, with a special mention to 
something they called “kubeadm operator”, apparently capable of auto-upgrading the cluster, auto-renewing 
certificates and such.&lt;/p&gt;

&lt;p&gt;I also saw a comparison between the different cluster bootstrappers, which to me confirmed that kubeadm was 
the best, given its well-established and well-known workflow and its very 
active contributor base. The kubeadm developers invited the audience to submit feature requests,
&lt;a href=&quot;https://github.com/kubernetes/kubeadm/issues/2866&quot;&gt;so I did&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The different talks confirmed that the basic unit for multi-tenancy in Kubernetes is the namespace, and any 
serious multi-tenant usage should leverage it. There were some ongoing conversations, in official sessions 
and in the hallway, about the right tool to implement K8s-within-K8s, and &lt;a href=&quot;https://www.vcluster.com/&quot;&gt;vcluster&lt;/a&gt;
was mentioned enough times for me to be convinced it was the right candidate. This was despite my impression
that multiclusters / multicloud are regarded as hard topics in the general community. I definitely would like to play
with it sometime down the road.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On networking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I attended a couple of basic sessions that served really well to help me understand how Kubernetes instruments the 
network to achieve its goals. The conference program had sessions covering topics ranging from network 
debugging recommendations and CNI implementations to IPv6 support. Also, one of the keynote sessions 
mentioned that kube-proxy is not able to perform NAT for SIP connections, which is interesting because I 
believe Netfilter Conntrack could do it if properly configured. One of the conclusions on the CNI front was 
that Calico has massive community adoption (in Netfilter mode), which is reassuring, especially considering 
it is &lt;a href=&quot;https://gitlab.wikimedia.org/repos/cloud/toolforge/calico&quot;&gt;the one we use for Toolforge Kubernetes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://ral-arturo.org/assets/20230427-kubecon-slide.png&quot; alt=&quot;Slide&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On jobs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I attended a couple of talks related to HPC/grid-like usages of Kubernetes. I was truly impressed 
by some folks out there who were using Kubernetes Jobs on massive scales, for tasks such as training machine learning 
models and other fancy AI projects.&lt;/p&gt;

&lt;p&gt;It is acknowledged in the community that the early implementations of things like Jobs and CronJobs had some 
limitations that are now gone, or at least greatly reduced, and some new functionality has been added as 
well. Indexed Jobs, for example, give each Pod in a Job a number (index) so it can process a chunk of a larger 
batch of data based on that index. This allows for full grid-like features such as sequential (or again, 
indexed) processing, coordination between the Job’s pods, and more graceful Job restarts. My first reaction was: is that 
something we would like to enable in the &lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework&quot;&gt;Toolforge Jobs Framework&lt;/a&gt;?&lt;/p&gt;
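
&lt;p&gt;As a minimal sketch of what an Indexed Job manifest looks like (the name, image, and command here are hypothetical):&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-example
spec:
  completionMode: Indexed  # each completion gets its own index
  completions: 5           # indexes 0..4
  parallelism: 2           # at most 2 pods running at once
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox
          # the index is exposed to each pod via the JOB_COMPLETION_INDEX
          # env var, so each pod can pick its own chunk of the batch
          command: [&quot;sh&quot;, &quot;-c&quot;, &quot;echo processing chunk $JOB_COMPLETION_INDEX&quot;]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;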

&lt;p&gt;&lt;strong&gt;On policy and security&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A surprisingly good number of sessions covered interesting topics related to policy and security. It was nice 
to learn two realities:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Kubernetes is capable of doing pretty much anything security-wise and of creating 
highly secure environments.&lt;/li&gt;
  &lt;li&gt;It does not do so by default; the defaults are not security-strict on purpose.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It kind of made sense to me: Kubernetes is used for a wide range of use cases, and its developers couldn’t know 
beforehand which particular setup the default security levels should accommodate.&lt;/p&gt;

&lt;p&gt;One session in particular covered the most basic security features that should be enabled for any Kubernetes 
system that would get exposed to random end users. In my opinion, the Toolforge Kubernetes setup was already 
doing a good job in that regard. To my joy, some sessions referred to the Pod Security Admission mechanism, 
which is one of the key security features we’re about to adopt (when migrating away from
&lt;a href=&quot;https://phabricator.wikimedia.org/T279110&quot;&gt;Pod Security Policy&lt;/a&gt;).&lt;/p&gt;
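
&lt;p&gt;For reference, Pod Security Admission is driven by namespace labels; a minimal sketch (the namespace name is hypothetical):&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: Namespace
metadata:
  name: tool-example
  labels:
    # reject pods that do not meet the &quot;restricted&quot; policy...
    pod-security.kubernetes.io/enforce: restricted
    # ...and warn about violations of the latest version of it
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;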

&lt;p&gt;I also learned a bit more about Secret resources, their current implementation and how to leverage a 
combo of CSI and RBAC for a more secure usage of external secrets.&lt;/p&gt;

&lt;p&gt;Finally, one of the major takeaways from the conference was learning about &lt;a href=&quot;https://kyverno.io/&quot;&gt;kyverno&lt;/a&gt; and
&lt;a href=&quot;https://github.com/Shopify/kubeaudit&quot;&gt;kubeaudit&lt;/a&gt;. I was previously aware of
&lt;a href=&quot;https://open-policy-agent.github.io/gatekeeper/website/docs/&quot;&gt;OPA Gatekeeper&lt;/a&gt;. From the several demos I saw, it
seemed to me that kyverno could help us make Toolforge Kubernetes more sustainable by replacing all of our
&lt;a href=&quot;https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Custom_components&quot;&gt;custom admission controllers&lt;/a&gt;
with it. I already opened a ticket to &lt;a href=&quot;https://phabricator.wikimedia.org/T335131&quot;&gt;track this idea&lt;/a&gt;, which I’ll be
proposing to my team soon.&lt;/p&gt;
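
&lt;p&gt;To illustrate the kind of rule kyverno can express declaratively, here is a minimal sketch of a validation policy (the policy name and rule are hypothetical, not one of our actual admission controls):&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-non-root
spec:
  validationFailureAction: Enforce  # reject non-compliant pods
  rules:
    - name: check-run-as-non-root
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: &quot;Pods must set runAsNonRoot to true.&quot;
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;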

&lt;p&gt;&lt;strong&gt;Final notes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In general, I believe I learned many things, and perhaps even more importantly I re-learned some stuff I had 
forgotten due to a lack of daily exposure. I’m really happy that the cloud-native way of thinking was 
reinforced in me, which I still need because most of my muscle memory for approaching systems architecture and 
engineering comes from the old pre-cloud days.&lt;/p&gt;

&lt;p&gt;List of sessions I attended on the first day:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Keynote&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://kccnceu2023.sched.com/event/1HyVB&quot;&gt;Node Resource Management: The Big Picture - Sascha Grunert &amp;amp; Swati Sehgal, Red Hat; Alexander Kanevskiy, Intel; Evan Lezar, NVIDIA; David Porter, Google.&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HyYu&quot;&gt;How We Securely Scaled Multi-Tenancy with VCluster, Crossplane, and Argo CD - Ilia Medvedev &amp;amp; Kostis Kapelonis, Codefresh.&lt;/a&gt; &lt;em&gt;(Couldn’t really attend, room full)&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HySr&quot;&gt;Flux Beyond Git: Harnessing the Power of OCI - Stefan Prodan &amp;amp; Hidde Beydals, Weaveworks.&lt;/a&gt; &lt;em&gt;(Couldn’t really attend, room full)&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HyWa&quot;&gt;Tutorial: Measure Twice, Cut Once: Dive Into Network Foundations the Right Way! - Marino Wijay &amp;amp; Jason Skrzypek, Solo.io&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HySi&quot;&gt;Argo CD Core - A Pure GitOps Agent for Kubernetes - Alexander Matyushentsev, Akuity &amp;amp; Leonardo Luz Almeida, Intuit&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1Iki0&quot;&gt;Kubeadm Deep Dive - Rohit Anand, NEC &amp;amp; Paco Xu, Dao&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1Hyd1&quot;&gt;Cloud Operate Multi-Tenancy Service Mesh with ArgoCD in Production - Lin Sun, Solo.io &amp;amp; Faseela K, Ericsson Software Technology&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;List of sessions I attended on the second day:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Keynote&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1Hycg&quot;&gt;Setting up Etcd with Kubernetes to Host Clusters with Thousands of Nodes - Marcel Zięba, Isovalent &amp;amp; Laurent Bernaille, Datadog&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HybZ&quot;&gt;Container Is the New VM: The Paradigm Change No One Explained to You - Marga  Manterola, Isovalent &amp;amp; Rodrigo Campos Catelin, Microsoft&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HyXe&quot;&gt;Ephemeral Clusters as a Service with ClusterAPI and GitOps - Alessandro Vozza, Solo.io &amp;amp; Joaquin Rodriguez, Microsoft&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HydS&quot;&gt;Automating Configuration and Permissions Testing for GitOps with OPA Conftest - Eve Ben Ezra &amp;amp; Michael Hume, The New York Times&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HyVT&quot;&gt;Across Kubernetes Namespace Boundaries: Your Volumes Can Be Shared Now! - Masaki Kimura &amp;amp; Takafumi Takahashi, Hitachi&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;List of sessions I attended on the third day:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Keynote&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1Hyb5&quot;&gt;Prevent Embarrassing Cluster Takeovers with This One Simple Trick! - Daniele de Araujo dos Santos &amp;amp; Shane Lawrence, Shopify&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1M6nq&quot;&gt;Hacking and Defending Kubernetes Clusters: We’ll Do It LIVE!!! - Fabian Kammel &amp;amp; James Cleverley-Prance, ControlPlane&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HyUn&quot;&gt;Painless Multi-Cloud to the Edge Powered by NATS &amp;amp; Kubernetes - Tomasz Pietrek &amp;amp; David Gee, Synadia&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HyWI&quot;&gt;Demystifing IPv6 Kubernetes - Antonio Jose Ojea Garcia, Google &amp;amp; Fernando Gont, Yalo&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HyTa&quot;&gt;Open Policy Agent. (OPA) Intro &amp;amp; Deep Dive - Charlie Egan, Styra, Inc.&lt;/a&gt; &lt;em&gt;(Couldn’t really attend, room full)&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1Hydn&quot;&gt;Practical Challenges with Pod Security Admission - V Körbes &amp;amp; Christian Schlotter, VMware&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HyaG&quot;&gt;Enabling HPC and ML Workloads with the Latest Kubernetes Job Features - Michał Woźniak, Google &amp;amp; Vanessa Sochat, Lawrence Livermore National Laboratory&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sched.co/1HyY5&quot;&gt;Can You Keep a Secret? on Secret Management in Kubernetes - Liav Yona &amp;amp; Gal Cohen, Firefly&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The videos have been &lt;a href=&quot;https://youtube.com/playlist?list=PLj6h78yzYM2PyrvCoOii4rAopBswfz1p7&quot;&gt;published on YouTube&lt;/a&gt;.&lt;/p&gt;

</description>
        <pubDate>Thu, 27 Apr 2023 10:47:00 +0000</pubDate>
        <link>https://ral-arturo.org/2023/04/27/kubecon.html</link>
        <guid isPermaLink="true">https://ral-arturo.org/2023/04/27/kubecon.html</guid>
        
        <category>kubecon</category>
        
        
      </item>
    
  </channel>
</rss>
