<feed xmlns="http://www.w3.org/2005/Atom"><title>Doing the needful</title><id>https://phabricator.wikimedia.org/phame/blog/feed/1/</id><link rel="self" type="application/atom+xml" href="https://phabricator.wikimedia.org/phame/blog/feed/1/" /><updated>2025-07-10T22:51:07+00:00</updated><subtitle>Occasional updates from the #release-engineering-team</subtitle><entry><title>Investigate a PHP segmentation fault</title><link href="/phame/live/1/post/306/investigate_a_php_segmentation_fault/" /><id>https://phabricator.wikimedia.org/phame/post/view/306/</id><author><name>hashar (Antoine Musso)</name></author><published>2023-07-28T12:06:08+00:00</published><updated>2024-03-07T21:36:37+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><strong>Summary</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Install debugging packages: <tt class="remarkup-monospaced">apt-get -y install php7.4-common-dbgsym php7.4-cli-dbgsym</tt></li>
<li class="remarkup-list-item">curl -o ~/php-gdbinit <a href="https://raw.githubusercontent.com/php/php-src/php-7.4.30/.gdbinit" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://raw.githubusercontent.com/php/php-src/php-7.4.30/.gdbinit</a></li>
<li class="remarkup-list-item">gdb &lt;your php command&gt;</li>
<li class="remarkup-list-item">Enter <tt class="remarkup-monospaced">run</tt> then once the command has failed: <tt class="remarkup-monospaced">bt</tt> and <tt class="remarkup-monospaced">zbacktrace</tt></li>
</ul>
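
<p>The steps above can also be replayed without typing at the prompt. The following is a minimal sketch rather than the procedure used in this post: the helper name <tt class="remarkup-monospaced">write_gdb_commands</tt> and the file paths are made up for illustration, and it assumes the <tt class="remarkup-monospaced">.gdbinit</tt> file has already been downloaded.</p>

```shell
# Write a gdb command file listing the session steps from the summary.
# Hypothetical helper: the name and arguments are for illustration only.
write_gdb_commands() {
  gdbinit="$1"   # path to the PHP .gdbinit downloaded with curl
  out="$2"       # where to write the gdb command file
  printf '%s\n' \
    "source $gdbinit" \
    "run" \
    "bt" \
    "zbacktrace" > "$out"
}

# Example (assumed paths):
#   write_gdb_commands "$HOME/php-gdbinit" /tmp/php-segv.gdb
#   gdb -batch -x /tmp/php-segv.gdb --args php your-script.php
```

<p>Running <tt class="remarkup-monospaced">gdb</tt> with <tt class="remarkup-monospaced">-batch</tt> also disables pagination, so a long backtrace is printed in full instead of stopping at the pager prompt.</p>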

<hr class="remarkup-hr" />

<p>The <a href="/tag/beta-cluster-infrastructure/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_7"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_6" aria-hidden="true"></span>Beta-Cluster-Infrastructure</span></a> is a farm of wikis we use for experimentation and integration testing. It is updated continuously: new code is deployed every ten minutes and the databases are updated every hour by running MediaWiki <tt class="remarkup-monospaced">maintenance/update.php</tt>. The scheduling and running are driven by Jenkins jobs whose statuses can be seen on the <a href="https://integration.wikimedia.org/ci/view/Beta/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Beta view</a>:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/jqdpiylajgidxcdfy5mi/PHID-FILE-iixbuea2ghkb2tepmxrn/image.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_0"><img src="https://phab.wmfusercontent.org/file/data/jqdpiylajgidxcdfy5mi/PHID-FILE-iixbuea2ghkb2tepmxrn/image.png" height="192" width="928" loading="lazy" alt="image.png (192×928 px, 33 KB)" /></a></div></p>

<p>On top of that, Jenkins emits notification messages to IRC for as long as one of the update jobs keeps failing. One of them started failing on July 25th and this is how I saw the alarm (times are for France, UTC+2):</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/m35mgweehganqyt7mbxw/PHID-FILE-l5i7ndbakwmcrjezoijo/image.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_1"><img src="https://phab.wmfusercontent.org/file/data/m35mgweehganqyt7mbxw/PHID-FILE-l5i7ndbakwmcrjezoijo/image.png" height="92" width="941" loading="lazy" alt="image.png (92×941 px, 37 KB)" /></a></div></p>

<p>(wmf-insecte is the Jenkins bot, <em>insecte</em> is French for bug (the animal), and the <em>wmf-</em> prefix identifies it as a Wikimedia Foundation robot).</p>

<p>Clicking on the link gives the output of the update script which eventually fails with:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">+ /usr/local/bin/mwscript update.php --wiki=wikifunctionswiki --quick --skip-config-validation
20:31:09 ...wikilambda_zlanguages table already exists.
20:31:09 ...have wlzl_label_primary field in wikilambda_zobject_labels table.
20:31:09 ...have wlzl_return_type field in wikilambda_zobject_labels table.
20:31:09 /usr/local/bin/mwscript: line 27:  1822 Segmentation fault      sudo -u &quot;$MEDIAWIKI_WEB_USER&quot; $PHP &quot;$MEDIAWIKI_DEPLOYMENT_DIR_DIR_USE/multiversion/MWScript.php&quot; &quot;$@&quot;</pre></div>

<p>The important bit is <strong>Segmentation fault</strong> which indicates the program (php) had a fatal fault and it got rightfully killed by the Linux Kernel. Looking at the instance Linux Kernel messages via <tt class="remarkup-monospaced">dmesg -T</tt>:</p>

<div class="remarkup-code-block code-block-counterexample" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code remarkup-counterexample">[Mon Jul 24 23:33:55 2023] php[28392]: segfault at 7ffe374f5db8 ip 00007f8dc59fc807 sp 00007ffe374f5da0 error 6 in libpcre2-8.so.0.7.1[7f8dc59b9000+5d000]
[Mon Jul 24 23:33:55 2023] Code: ff ff 31 ed e9 74 fb ff ff 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 55 48 89 d5 53 44 89 c3 48 81 ec 98 52 00 00 &lt;48&gt; 89 7c 24 18 4c 8b a4 24 d0 52 00 00 48 89 74 24 10 48 89 4c 24
[Mon Jul 24 23:33:55 2023] Core dump to |/usr/lib/systemd/systemd-coredump 28392 33 33 11 1690242166 0 php pipe failed</pre></div>
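
<p>The addresses in the first <tt class="remarkup-monospaced">dmesg</tt> line can be put to use with a little hexadecimal arithmetic. A small sketch (the helper name <tt class="remarkup-monospaced">hex_diff</tt> is made up for illustration): subtracting the mapping base address from <tt class="remarkup-monospaced">ip</tt> gives the offset of the faulting instruction inside <tt class="remarkup-monospaced">libpcre2-8.so.0.7.1</tt>, and subtracting <tt class="remarkup-monospaced">sp</tt> from the faulting address shows how far from the stack pointer the access was.</p>

```shell
# Subtract two hexadecimal addresses (given without the 0x prefix)
# and print the difference in hexadecimal.
hex_diff() {
  printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# Offset of the faulting instruction inside the library mapping:
# ip minus the base address shown in the square brackets.
hex_diff 00007f8dc59fc807 7f8dc59b9000   # prints 0x43807

# Distance between the faulting address and the stack pointer.
hex_diff 7ffe374f5db8 00007ffe374f5da0   # prints 0x18
```

<p>The first value falls within the mapping size reported by the kernel (<tt class="remarkup-monospaced">5d000</tt>). Such an offset may later be handed to a symbol lookup tool when debug symbols for the library are available; that is an assumption about a possible follow-up, not something done here.</p>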

<p>With that data, I had enough to take the most urgent step: file a task (<a href="https://phabricator.wikimedia.org/T342769" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_2"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T342769</span></span></a>) which can be used as an audit trail and reference for the future. It is the <strong>single most important step</strong> I take whenever I am debugging an issue, since if I have to stop due to time constraints or lack of technical abilities, others can step in and continue. It also provides a historical record that can be looked up in the future, and indeed this specific problem had already been investigated and fully documented a couple of years ago. A task is invaluable whenever one is debugging. For PHP segmentation faults, we even have a dedicated project <a href="/tag/php-segfault/" class="phui-tag-view phui-tag-type-shade phui-tag-yellow phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_9"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-tags" data-meta="0_8" aria-hidden="true"></span>php-segfault</span></a>.</p>

<p>With the task filed, I continued the investigation. The previous successful build had:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">19:30:18 ...have wlzl_label_primary field in wikilambda_zobject_labels table.
19:30:18 ...have wlzl_return_type field in wikilambda_zobject_labels table.
19:30:18 	❌ Unable to make a page for Z7138: The provided content&#039;s label clashes with Object &#039;Z10138&#039; for the label in &#039;Z1002&#039;.
19:30:18 	❌ Unable to make a page for Z7139: The provided content&#039;s label clashes with Object &#039;Z10139&#039; for the label in &#039;Z1002&#039;.
19:30:18 	❌ Unable to make a page for Z7140: The provided content&#039;s label clashes with Object &#039;Z10140&#039; for the label in &#039;Z1002&#039;.
19:30:18 ...site_stats is populated...done.</pre></div>

<p>The successful build started at 19:20 UTC and the failing one finished at 20:30 UTC which gives us a short time window to investigate. Since the failure seems to happen after updating the WikiLambda MediaWiki extension, I went to inspect the few commits that got merged at that time. I took advantage of Gerrit adding review actions as git notes, notably the exact time a change got submitted and subsequently merged. The process:</p>

<p>Clone the suspect repository:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">git clone https://gerrit.wikimedia.org/r/extensions/WikiLambda
cd WikiLambda</pre></div>

<p>Fetch the Gerrit review notes:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">git fetch origin refs/notes/review:refs/notes/review</pre></div>

<p>The <tt class="remarkup-monospaced">review</tt> notes can be shown below the commit by passing <tt class="remarkup-monospaced">--notes=review</tt> to <tt class="remarkup-monospaced">git log</tt> or <tt class="remarkup-monospaced">git show</tt>, an example for the current HEAD of the repository:</p>

<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ git show -q --notes=review</span>
<span class="go">commit c7f8071647a1aeb2cef6b9310ccbf3a87af2755b (HEAD -&gt; master, origin/master, origin/HEAD)</span>
<span class="go">Author: Genoveva Galarza &lt;ggalarzaheredero@wikimedia.org&gt;</span>
<span class="go">Date:   Thu Jul 27 00:34:03 2023 +0200</span>
<span class="go"></span>
<span class="go">    Initialize blank function when redirecting to FunctionEditor from DefaultView</span>
<span class="go">    </span>
<span class="go">    Bug: T342802</span>
<span class="go">    Change-Id: I09d3400db21983ac3176a0bc325dcfe2ddf23238</span>
<span class="go"></span>
<span class="go">Notes (review):</span>
<span class="go">    Verified+1: SonarQube Bot &lt;kharlan+sonarqubebot@wikimedia.org&gt;</span>
<span class="go">    Verified+2: jenkins-bot</span>
<span class="go">    Code-Review+2: Jforrester &lt;jforrester@wikimedia.org&gt;</span>
<span class="go">    Submitted-by: jenkins-bot</span>
<span class="go">    Submitted-at: Wed, 26 Jul 2023 22:47:59 +0000</span>
<span class="go">    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/942026</span>
<span class="go">    Project: mediawiki/extensions/WikiLambda</span>
<span class="go">    Branch: refs/heads/master</span></pre></div>

<p>This shows the change was approved by Jforrester and entered the repository on Wed, 26 Jul 2023 22:47:59 UTC. Then, to find the commits in the time window, I ask <tt class="remarkup-monospaced">git log</tt> to list:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">anything that has a commit date for the day (it is not necessarily correct but in this case it is a good enough approximation)</li>
<li class="remarkup-list-item">from oldest to newest</li>
<li class="remarkup-list-item">sorted in topological order (that is, in the order the commits entered the repository rather than by commit date)</li>
<li class="remarkup-list-item">show the review notes to get the <tt class="remarkup-monospaced">Submitted-at</tt> field</li>
</ul>

<p>I can then scroll to the commits having a <tt class="remarkup-monospaced">Submitted-at</tt> in the time window of 19:20 UTC - 20:30 UTC. I have amended the below output to remove most of the review notes except for the first commit:</p>

<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ git log --oneline --since=2023/07/25 --reverse --notes=review --no-merges --topo-order</span>
<span class="go">&lt;scroll&gt;</span>
<span class="go">653ea81a Handle oldid url param to view a particular revision</span>
<span class="go">Notes (review):</span>
<span class="go">    Verified+1: SonarQube Bot &lt;kharlan+sonarqubebot@wikimedia.org&gt;</span>
<span class="go">    Verified+2: jenkins-bot</span>
<span class="go">    Code-Review+2: Jforrester &lt;jforrester@wikimedia.org&gt;</span>
<span class="go">    Submitted-by: jenkins-bot</span>
<span class="go">    Submitted-at: Tue, 25 Jul 2023 19:26:53 +0000</span>
<span class="go">    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941482</span>
<span class="go">    Project: mediawiki/extensions/WikiLambda</span>
<span class="go">    Branch: refs/heads/master</span>
<span class="go"></span>
<span class="go">fe4b0446 AUTHORS: Update for July 2023</span>
<span class="go">Notes (review):</span>
<span class="go">    Submitted-at: Tue, 25 Jul 2023 19:49:43 +0000</span>
<span class="go">    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941507</span>
<span class="go"></span>
<span class="go">73fcb4a4 Update function-schemata sub-module to HEAD (1c01f22)</span>
<span class="go">Notes (review):</span>
<span class="go">    Submitted-at: Tue, 25 Jul 2023 19:59:23 +0000</span>
<span class="go">    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941384</span>
<span class="go"></span>
<span class="go">598f5fcc PageRenderingHandler: Don&#039;t make &#039;read&#039; selected if we&#039;re on the edit tab</span>
<span class="go">Notes (review):</span>
<span class="go">    Submitted-at: Tue, 25 Jul 2023 20:16:05 +0000</span>
<span class="go">    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941456</span></pre></div>

<p>Or in a Phabricator task and human friendly way:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">19:26 <a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941482" class="remarkup-link remarkup-link-ext" rel="noreferrer">Handle oldid url param to view a particular revision</a></li>
<li class="remarkup-list-item">19:49 <a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941507" class="remarkup-link remarkup-link-ext" rel="noreferrer">AUTHORS: Update for July 2023</a></li>
<li class="remarkup-list-item">19:59 <a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941384" class="remarkup-link remarkup-link-ext" rel="noreferrer">Update function-schemata sub-module to HEAD (1c01f22)</a> for <a href="https://phabricator.wikimedia.org/T335583" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_3"><span class="phui-tag-core phui-tag-color-object">T335583</span></a></li>
<li class="remarkup-list-item">20:16 <a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941456" class="remarkup-link remarkup-link-ext" rel="noreferrer">PageRenderingHandler: Don&#039;t make &#039;read&#039; selected if we&#039;re on the edit tab</a></li>
</ul>
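
<p>The list above can be produced from the <tt class="remarkup-monospaced">git log</tt> output rather than by scrolling. This is a rough sketch under stated assumptions: the function name <tt class="remarkup-monospaced">submitted_between</tt> is made up, it expects the <tt class="remarkup-monospaced">--oneline --notes=review</tt> format shown earlier, and it compares only the time of day, so it relies on <tt class="remarkup-monospaced">--since</tt> having already limited the log to a single day.</p>

```shell
# Read `git log --oneline --notes=review` output on stdin and print
# "HH:MM subject" for commits whose Submitted-at time of day lies
# between start and end (HH:MM:SS strings, compared as text).
submitted_between() {
  awk -v start="$1" -v end="$2" '
    # A one-line commit entry: abbreviated hash followed by the subject.
    /^[0-9a-f]+ / { subject = $0; sub(/^[0-9a-f]+ /, "", subject) }
    # The note line carrying the submission time (second to last field).
    /^ +Submitted-at:/ {
      t = $(NF - 1)
      if (t >= start) {
        if (!(t > end)) { print substr(t, 1, 5), subject }
      }
    }
  '
}

# Example:
#   git log --oneline --since=2023/07/25 --reverse --notes=review \
#     --no-merges --topo-order | submitted_between 19:20:00 20:30:00
```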

<p>The <em>Update function-schemata sub-module to HEAD (1c01f22)</em> has a short log of changes it introduces:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">New changes:</li>
<li class="remarkup-list-item">abc4aa6 definitions: Add Z1908/bug-bugi and Z1909/bug-lant ZNaturalLanguages</li>
<li class="remarkup-list-item">0f1941e definitions: Add Z1910/piu ZNaturalLanguage</li>
<li class="remarkup-list-item">1c01f22 definitions: Re-label all objects to drop the &#039;Z&#039; per Amin</li>
</ul>

<p>Since the update script fails on WikiLambda, I reached out to its developers so they can investigate their code and maybe find what triggers the issue.</p>

<p>On the PHP side we need a trace. That can be done by configuring the Linux kernel to take a dump of the program before terminating it and to store it on disk. It did not quite work, due to a configuration issue on the machine, and in the first attempt we forgot to ask bash to allow the dump generation before running the command (<tt class="remarkup-monospaced">ulimit -c unlimited</tt>). Drawing on a past debugging session, I instead ran the command directly under the GNU debugger: <tt class="remarkup-monospaced">gdb</tt>.</p>
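
<p>Whether a core file can be produced depends on the shell's core size limit and on the kernel's <tt class="remarkup-monospaced">core_pattern</tt> setting. Below is a sketch of how one might inspect both; the helper name <tt class="remarkup-monospaced">describe_core_pattern</tt> is made up for illustration and this is not necessarily what was checked on the instance.</p>

```shell
# Classify a kernel core_pattern value: a leading "|" means the dump is
# piped to a helper program, otherwise it is a file name template.
describe_core_pattern() {
  case "$1" in
    '|'*) echo "piped to helper: ${1#?}" ;;
    *)    echo "written to file: $1" ;;
  esac
}

# Example, on a Linux host:
#   describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
#   ulimit -c            # current core size limit for this shell
#   ulimit -c unlimited  # allow dumps for commands started from this shell
```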

<p>There are a few preliminary steps to debug the PHP program. First, one needs to install the debug symbols, which let the debugger map the binary entries to lines of the original source code. Since the error mentions libpcre2, I also installed its debugging symbols:</p>

<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ sudo apt-get -y install php7.4-common-dbgsym php7.4-cli-dbgsym libpcre2-dbg</span></pre></div>

<p>I then used <tt class="remarkup-monospaced">gdb</tt> to start a debugging session:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">sudo  -s -u www-data gdb --args /usr/bin/php /srv/mediawiki-staging/multiversion/MWScript.php update.php --wiki=wikifunctionswiki --quick --skip-config-validation
gdb&gt;</pre></div>

<p>Then ask <tt class="remarkup-monospaced">gdb</tt> to start the program by entering in the input prompt: <tt class="remarkup-monospaced">run</tt> <kbd title="Enter">⏎</kbd>. After several minutes, it caught the segmentation fault:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">gdb&gt; run
&lt;output&gt;
&lt;output freeze for several minutes while update.php is doing something&gt;

Thread 1 &quot;php&quot; received signal SIGSEGV, Segmentation fault.
0x00007ffff789e807 in pcre2_match_8 (code=0x555555ce1fb0, 
    subject=subject@entry=0x7fffcb410a98 &quot;Z1002&quot;, length=length@entry=5, 
    start_offset=start_offset@entry=0, options=0, 
    match_data=match_data@entry=0x555555b023e0, mcontext=0x555555ad5870)
    at src/pcre2_match.c:6001
6001	src/pcre2_match.c: No such file or directory.</pre></div>

<p><em>I could not find a debugging symbol package containing <tt class="remarkup-monospaced">src/pcre2_match.c</tt> but that was not needed after all</em>.</p>

<p>To retrieve the stack trace, enter <tt class="remarkup-monospaced">bt</tt> <kbd title="Enter">⏎</kbd> at the gdb prompt:</p>

<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code" style=" max-height: 24em; overflow: auto;"><span></span><span class="n">gdb</span><span class="o">&gt;</span> <span class="n">bt</span>
<span class="cp">#0  0x00007ffff789e807 in pcre2_match_8 (code=0x555555ce1fb0, </span>
    <span class="n">subject</span><span class="o">=</span><span class="n">subject</span><span class="err">@</span><span class="n">entry</span><span class="o">=</span><span class="mh">0x7fffcb410a98</span> <span class="s">&quot;Z1002&quot;</span><span class="p">,</span> <span class="n">length</span><span class="o">=</span><span class="n">length</span><span class="err">@</span><span class="n">entry</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> 
    <span class="n">start_offset</span><span class="o">=</span><span class="n">start_offset</span><span class="err">@</span><span class="n">entry</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">options</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> 
    <span class="n">match_data</span><span class="o">=</span><span class="n">match_data</span><span class="err">@</span><span class="n">entry</span><span class="o">=</span><span class="mh">0x555555b023e0</span><span class="p">,</span> <span class="n">mcontext</span><span class="o">=</span><span class="mh">0x555555ad5870</span><span class="p">)</span>
    <span class="n">at</span> <span class="n">src</span><span class="o">/</span><span class="n">pcre2_match</span><span class="p">.</span><span class="nl">c</span><span class="p">:</span><span class="mi">6001</span>
<span class="cp">#1  0x00005555556a3b24 in php_pcre_match_impl (pce=0x7fffe83685a0, </span>
    <span class="n">subject_str</span><span class="o">=</span><span class="mh">0x7fffcb410a80</span><span class="p">,</span> <span class="n">return_value</span><span class="o">=</span><span class="mh">0x7fffcb44b220</span><span class="p">,</span> <span class="n">subpats</span><span class="o">=</span><span class="mh">0x0</span><span class="p">,</span> <span class="n">global</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> 
    <span class="n">use_flags</span><span class="o">=&lt;</span><span class="n">optimized</span> <span class="n">out</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">flags</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">start_offset</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">ext</span><span class="o">/</span><span class="n">pcre</span><span class="o">/</span><span class="n">php_pcre</span><span class="p">.</span><span class="nl">c</span><span class="p">:</span><span class="mi">1300</span>
<span class="cp">#2  0x00005555556a493b in php_do_pcre_match (execute_data=0x7fffcb44b710, </span>
    <span class="n">return_value</span><span class="o">=</span><span class="mh">0x7fffcb44b220</span><span class="p">,</span> <span class="n">global</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">ext</span><span class="o">/</span><span class="n">pcre</span><span class="o">/</span><span class="n">php_pcre</span><span class="p">.</span><span class="nl">c</span><span class="p">:</span><span class="mi">1149</span>
<span class="cp">#3  0x00007ffff216a3cb in tideways_xhprof_execute_internal ()</span>
   <span class="n">from</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">php</span><span class="o">/</span><span class="mi">20190902</span><span class="o">/</span><span class="n">tideways_xhprof</span><span class="p">.</span><span class="n">so</span>
<span class="cp">#4  0x000055555587ddee in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()</span>
    <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">Zend</span><span class="o">/</span><span class="n">zend_vm_execute</span><span class="p">.</span><span class="nl">h</span><span class="p">:</span><span class="mi">1732</span>
<span class="cp">#5  execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539</span>
<span class="cp">#6  0x00007ffff2169c89 in tideways_xhprof_execute_ex ()</span>
   <span class="n">from</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">php</span><span class="o">/</span><span class="mi">20190902</span><span class="o">/</span><span class="n">tideways_xhprof</span><span class="p">.</span><span class="n">so</span>
<span class="cp">#7  0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()</span>
    <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">Zend</span><span class="o">/</span><span class="n">zend_vm_execute</span><span class="p">.</span><span class="nl">h</span><span class="p">:</span><span class="mi">1714</span>
<span class="cp">#8  execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539</span>
<span class="cp">#9  0x00007ffff2169c89 in tideways_xhprof_execute_ex ()</span>
   <span class="n">from</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">php</span><span class="o">/</span><span class="mi">20190902</span><span class="o">/</span><span class="n">tideways_xhprof</span><span class="p">.</span><span class="n">so</span>
<span class="cp">#10 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()</span>
    <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">Zend</span><span class="o">/</span><span class="n">zend_vm_execute</span><span class="p">.</span><span class="nl">h</span><span class="p">:</span><span class="mi">1714</span>
<span class="cp">#11 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539</span>
<span class="cp">#12 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()</span>
   <span class="n">from</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">php</span><span class="o">/</span><span class="mi">20190902</span><span class="o">/</span><span class="n">tideways_xhprof</span><span class="p">.</span><span class="n">so</span>
<span class="cp">#13 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()</span>
    <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">Zend</span><span class="o">/</span><span class="n">zend_vm_execute</span><span class="p">.</span><span class="nl">h</span><span class="p">:</span><span class="mi">1714</span>
<span class="cp">#14 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539</span>
<span class="cp">#15 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()</span>
   <span class="n">from</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">php</span><span class="o">/</span><span class="mi">20190902</span><span class="o">/</span><span class="n">tideways_xhprof</span><span class="p">.</span><span class="n">so</span>
<span class="cp">#16 0x000055555587c63c in ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER ()</span>
    <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">Zend</span><span class="o">/</span><span class="n">zend_vm_execute</span><span class="p">.</span><span class="nl">h</span><span class="p">:</span><span class="mi">1602</span>
<span class="cp">#17 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53535</span>
<span class="cp">#18 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()</span>
   <span class="n">from</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">php</span><span class="o">/</span><span class="mi">20190902</span><span class="o">/</span><span class="n">tideways_xhprof</span><span class="p">.</span><span class="n">so</span>
<span class="cp">#19 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()</span>
    <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">Zend</span><span class="o">/</span><span class="n">zend_vm_execute</span><span class="p">.</span><span class="nl">h</span><span class="p">:</span><span class="mi">1714</span>
<span class="cp">#20 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539</span>
<span class="cp">#21 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()</span>
   <span class="n">from</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">php</span><span class="o">/</span><span class="mi">20190902</span><span class="o">/</span><span class="n">tideways_xhprof</span><span class="p">.</span><span class="n">so</span>
<span class="cp">#22 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()</span>
    <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">Zend</span><span class="o">/</span><span class="n">zend_vm_execute</span><span class="p">.</span><span class="nl">h</span><span class="p">:</span><span class="mi">1714</span>
<span class="cp">#23 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539</span>
<span class="cp">#24 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()</span>
   <span class="n">from</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">php</span><span class="o">/</span><span class="mi">20190902</span><span class="o">/</span><span class="n">tideways_xhprof</span><span class="p">.</span><span class="n">so</span>
<span class="cp">#25 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()</span>
 <span class="n">at</span> <span class="p">.</span><span class="o">/</span><span class="n">Zend</span><span class="o">/</span><span class="n">zend_vm_execute</span><span class="p">.</span><span class="n">Quit</span>
<span class="n">CONTINUING</span></pre></div>

<p>Which is not that helpful. Thankfully the PHP project provides a set of macros for <tt class="remarkup-monospaced">gdb</tt> which let one map the low-level C code to the PHP code being executed. They are provided in the PHP source repository as <tt class="remarkup-monospaced">/.gdbinit</tt>, and one should use the version from the PHP branch being debugged. Since we use PHP 7.4, I used the version from the latest 7.4 series (7.4.30 at the time of this writing): <a href="https://raw.githubusercontent.com/php/php-src/php-7.4.30/.gdbinit" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://raw.githubusercontent.com/php/php-src/php-7.4.30/.gdbinit</a></p>

<p>Download the file to your home directory (e.g. <tt class="remarkup-monospaced">/home/hashar/gdbinit</tt>) and ask <tt class="remarkup-monospaced">gdb</tt> to import it with, for example, <tt class="remarkup-monospaced">source /home/hashar/gdbinit</tt> <kbd title="Enter">⏎</kbd>:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">(gdb) source /home/hashar/gdbinit</pre></div>

<p>This provides a few new commands to show PHP Zend values and to generate a very helpful stack trace (<tt class="remarkup-monospaced">zbacktrace</tt>):</p>

<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span><span class="p">(</span><span class="n">gdb</span><span class="p">)</span> <span class="n">zbacktrace</span>
<span class="p">[</span><span class="mh">0x7fffcb44b710</span><span class="p">]</span> <span class="n">preg_match</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\7</span><span class="s">^Z[1-9]\d*$</span><span class="se">\7</span><span class="s">u&quot;</span><span class="p">,</span> <span class="s">&quot;Z1002&quot;</span><span class="p">)</span> <span class="p">[</span><span class="n">internal</span> <span class="n">function</span><span class="p">]</span>
<span class="p">[</span><span class="mh">0x7fffcb44aba0</span><span class="p">]</span> <span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">Validator</span><span class="o">-&gt;</span><span class="n">validateString</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">reference</span><span class="p">,</span> <span class="n">array</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x7fffcb44ac10</span><span class="p">],</span> <span class="n">array</span><span class="p">(</span><span class="mi">7</span><span class="p">)[</span><span class="mh">0x7fffcb44ac20</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44ac30</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44ac40</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44ac50</span><span class="p">])</span> <span class="o">/</span><span class="n">srv</span><span class="o">/</span><span class="n">mediawiki</span><span class="o">-</span><span class="n">staging</span><span class="o">/</span><span class="n">php</span><span class="o">-</span><span class="n">master</span><span class="o">/</span><span class="n">vendor</span><span class="o">/</span><span class="n">opis</span><span class="o">/</span><span class="n">json</span><span class="o">-</span><span class="n">schema</span><span class="o">/</span><span class="n">src</span><span class="o">/</span><span class="n">Validator</span><span class="p">.</span><span class="nl">php</span><span class="p">:</span><span class="mi">1219</span> 
<span class="p">[</span><span class="mh">0x7fffcb44a760</span><span class="p">]</span> <span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">Validator</span><span class="o">-&gt;</span><span class="n">validateProperties</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">reference</span><span class="p">,</span> <span class="n">array</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x7fffcb44a7d0</span><span class="p">],</span> <span class="n">array</span><span class="p">(</span><span class="mi">7</span><span class="p">)[</span><span class="mh">0x7fffcb44a7e0</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44a7f0</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44a800</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44a810</span><span class="p">],</span> <span class="nb">NULL</span><span class="p">)</span> <span class="o">/</span><span class="n">srv</span><span class="o">/</span><span class="n">mediawiki</span><span class="o">-</span><span class="n">staging</span><span class="o">/</span><span class="n">php</span><span class="o">-</span><span class="n">master</span><span class="o">/</span><span class="n">vendor</span><span class="o">/</span><span class="n">opis</span><span class="o">/</span><span class="n">json</span><span class="o">-</span><span class="n">schema</span><span class="o">/</span><span class="n">src</span><span class="o">/</span><span class="n">Validator</span><span class="p">.</span><span class="nl">php</span><span class="p">:</span><span class="mi">943</span> 
<span class="p">[</span><span class="mh">0x7fffcb44a4c0</span><span class="p">]</span> <span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">Validator</span><span class="o">-&gt;</span><span class="n">validateKeywords</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">reference</span><span class="p">,</span> <span class="n">array</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x7fffcb44a530</span><span class="p">],</span> <span class="n">array</span><span class="p">(</span><span class="mi">7</span><span class="p">)[</span><span class="mh">0x7fffcb44a540</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44a550</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44a560</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44a570</span><span class="p">])</span> <span class="o">/</span><span class="n">srv</span><span class="o">/</span><span class="n">mediawiki</span><span class="o">-</span><span class="n">staging</span><span class="o">/</span><span class="n">php</span><span class="o">-</span><span class="n">master</span><span class="o">/</span><span class="n">vendor</span><span class="o">/</span><span class="n">opis</span><span class="o">/</span><span class="n">json</span><span class="o">-</span><span class="n">schema</span><span class="o">/</span><span class="n">src</span><span class="o">/</span><span class="n">Validator</span><span class="p">.</span><span class="nl">php</span><span class="p">:</span><span class="mi">519</span> 
<span class="p">[</span><span class="mh">0x7fffcb44a310</span><span class="p">]</span> <span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">Validator</span><span class="o">-&gt;</span><span class="n">validateSchema</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">reference</span><span class="p">,</span> <span class="n">array</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x7fffcb44a380</span><span class="p">],</span> <span class="n">array</span><span class="p">(</span><span class="mi">7</span><span class="p">)[</span><span class="mh">0x7fffcb44a390</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44a3a0</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44a3b0</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44a3c0</span><span class="p">])</span> <span class="o">/</span><span class="n">srv</span><span class="o">/</span><span class="n">mediawiki</span><span class="o">-</span><span class="n">staging</span><span class="o">/</span><span class="n">php</span><span class="o">-</span><span class="n">master</span><span class="o">/</span><span class="n">vendor</span><span class="o">/</span><span class="n">opis</span><span class="o">/</span><span class="n">json</span><span class="o">-</span><span class="n">schema</span><span class="o">/</span><span class="n">src</span><span class="o">/</span><span class="n">Validator</span><span class="p">.</span><span class="nl">php</span><span class="p">:</span><span class="mi">332</span> 
<span class="p">[</span><span class="mh">0x7fffcb449350</span><span class="p">]</span> <span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">Validator</span><span class="o">-&gt;</span><span class="n">validateConditionals</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">reference</span><span class="p">,</span> <span class="n">array</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x7fffcb4493c0</span><span class="p">],</span> <span class="n">array</span><span class="p">(</span><span class="mi">7</span><span class="p">)[</span><span class="mh">0x7fffcb4493d0</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb4493e0</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb4493f0</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb449400</span><span class="p">])</span> <span class="o">/</span><span class="n">srv</span><span class="o">/</span><span class="n">mediawiki</span><span class="o">-</span><span class="n">staging</span><span class="o">/</span><span class="n">php</span><span class="o">-</span><span class="n">master</span><span class="o">/</span><span class="n">vendor</span><span class="o">/</span><span class="n">opis</span><span class="o">/</span><span class="n">json</span><span class="o">-</span><span class="n">schema</span><span class="o">/</span><span class="n">src</span><span class="o">/</span><span class="n">Validator</span><span class="p">.</span><span class="nl">php</span><span class="p">:</span><span class="mi">703</span> 
<span class="p">[</span><span class="mh">0x7fffcb4490b0</span><span class="p">]</span> <span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">Validator</span><span class="o">-&gt;</span><span class="n">validateKeywords</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">reference</span><span class="p">,</span> <span class="n">array</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x7fffcb449120</span><span class="p">],</span> <span class="n">array</span><span class="p">(</span><span class="mi">7</span><span class="p">)[</span><span class="mh">0x7fffcb449130</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb449140</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb449150</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb449160</span><span class="p">])</span> <span class="o">/</span><span class="n">srv</span><span class="o">/</span><span class="n">mediawiki</span><span class="o">-</span><span class="n">staging</span><span class="o">/</span><span class="n">php</span><span class="o">-</span><span class="n">master</span><span class="o">/</span><span class="n">vendor</span><span class="o">/</span><span class="n">opis</span><span class="o">/</span><span class="n">json</span><span class="o">-</span><span class="n">schema</span><span class="o">/</span><span class="n">src</span><span class="o">/</span><span class="n">Validator</span><span class="p">.</span><span class="nl">php</span><span class="p">:</span><span class="mi">523</span> 
<span class="p">[</span><span class="mh">0x7fffcb448f00</span><span class="p">]</span> <span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">Validator</span><span class="o">-&gt;</span><span class="n">validateSchema</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">reference</span><span class="p">,</span> <span class="n">array</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x7fffcb448f70</span><span class="p">],</span> <span class="n">array</span><span class="p">(</span><span class="mi">7</span><span class="p">)[</span><span class="mh">0x7fffcb448f80</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb448f90</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb448fa0</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb448fb0</span><span class="p">])</span> <span class="o">/</span><span class="n">srv</span><span class="o">/</span><span class="n">mediawiki</span><span class="o">-</span><span class="n">staging</span><span class="o">/</span><span class="n">php</span><span class="o">-</span><span class="n">master</span><span class="o">/</span><span class="n">vendor</span><span class="o">/</span><span class="n">opis</span><span class="o">/</span><span class="n">json</span><span class="o">-</span><span class="n">schema</span><span class="o">/</span><span class="n">src</span><span class="o">/</span><span class="n">Validator</span><span class="p">.</span><span class="nl">php</span><span class="p">:</span><span class="mi">332</span> 
<span class="o">&lt;</span><span class="n">loop</span><span class="o">&gt;</span></pre></div>

<p>The stack trace shows that the code entered an infinite loop while validating a JSON schema, and kept going until the process was stopped.</p>

<p>The arguments can be inspected further with <tt class="remarkup-monospaced">printzv</tt>, passing it the address shown next to an argument. For the line:</p>

<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span><span class="p">[</span><span class="mh">0x7fffcb44aba0</span><span class="p">]</span> <span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">Validator</span><span class="o">-&gt;</span><span class="n">validateString</span><span class="p">(</span><span class="n">reference</span><span class="p">,</span> <span class="n">reference</span><span class="p">,</span> <span class="n">array</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x7fffcb44ac10</span><span class="p">],</span> <span class="n">array</span><span class="p">(</span><span class="mi">7</span><span class="p">)[</span><span class="mh">0x7fffcb44ac20</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44ac30</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44ac40</span><span class="p">],</span> <span class="n">object</span><span class="p">[</span><span class="mh">0x7fffcb44ac50</span><span class="p">])</span> <span class="o">/</span><span class="n">srv</span><span class="o">/</span><span class="n">mediawiki</span><span class="o">-</span><span class="n">staging</span><span class="o">/</span><span class="n">php</span><span class="o">-</span><span class="n">master</span><span class="o">/</span><span class="n">vendor</span><span class="o">/</span><span class="n">opis</span><span class="o">/</span><span class="n">json</span><span class="o">-</span><span class="n">schema</span><span class="o">/</span><span class="n">src</span><span class="o">/</span><span class="n">Validator</span><span class="p">.</span><span class="nl">php</span><span class="p">:</span><span class="mi">1219</span></pre></div>

<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span><span class="p">(</span><span class="n">gdb</span><span class="p">)</span> <span class="n">printzv</span> <span class="mh">0x7fffcb44ac10</span>
<span class="p">[</span><span class="mh">0x7fffcb44ac10</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="nl">array</span><span class="p">:</span>     <span class="n">Hash</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x5555559d7f00</span><span class="p">]</span><span class="o">:</span> <span class="p">{</span>
<span class="p">}</span></pre></div>

<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span><span class="p">(</span><span class="n">gdb</span><span class="p">)</span> <span class="n">printzv</span> <span class="mh">0x7fffcb44ac20</span>
<span class="p">[</span><span class="mh">0x7fffcb44ac20</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">21</span><span class="p">)</span> <span class="nl">array</span><span class="p">:</span>     <span class="n">Packed</span><span class="p">(</span><span class="mi">7</span><span class="p">)[</span><span class="mh">0x7fffcb486118</span><span class="p">]</span><span class="o">:</span> <span class="p">{</span>
      <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="mi">0</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb445748</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">17</span><span class="p">)</span> <span class="nl">string</span><span class="p">:</span> <span class="n">Z2K2</span>
      <span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="mi">1</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb445768</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">18</span><span class="p">)</span> <span class="nl">string</span><span class="p">:</span> <span class="n">Z4K2</span>
      <span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="mi">2</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb445788</span><span class="p">]</span> <span class="kt">long</span><span class="o">:</span> <span class="mi">1</span>
      <span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="mi">3</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb4457a8</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">15</span><span class="p">)</span> <span class="nl">string</span><span class="p">:</span> <span class="n">Z3K3</span>
      <span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="mi">4</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb4457c8</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span> <span class="nl">string</span><span class="p">:</span> <span class="n">Z12K1</span>
      <span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="mi">5</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb4457e8</span><span class="p">]</span> <span class="kt">long</span><span class="o">:</span> <span class="mi">1</span>
      <span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="mi">6</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb445808</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">6</span><span class="p">)</span> <span class="nl">string</span><span class="p">:</span> <span class="n">Z11K1</span>
<span class="p">}</span></pre></div>

<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span><span class="p">(</span><span class="n">gdb</span><span class="p">)</span> <span class="n">printzv</span> <span class="mh">0x7fffcb44ac30</span>
<span class="p">[</span><span class="mh">0x7fffcb44ac30</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">22</span><span class="p">)</span> <span class="n">object</span><span class="p">(</span><span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">Schema</span><span class="p">)</span> <span class="err">#</span><span class="mi">485450</span> <span class="p">{</span>
<span class="n">id</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb40f508</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="nl">string</span><span class="p">:</span> <span class="o">/</span><span class="n">Z6</span><span class="err">#</span>
<span class="n">draft</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb40f518</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="nl">string</span><span class="p">:</span> <span class="mo">07</span>
<span class="n">internal</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb40f528</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="nl">reference</span><span class="p">:</span> <span class="p">[</span><span class="mh">0x7fffcb6704e8</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="nl">array</span><span class="p">:</span>     <span class="n">Hash</span><span class="p">(</span><span class="mi">1</span><span class="p">)[</span><span class="mh">0x7fffcb4110e0</span><span class="p">]</span><span class="o">:</span> <span class="p">{</span>
      <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="s">&quot;/Z6#&quot;</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb71d280</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="n">object</span><span class="p">(</span><span class="n">stdClass</span><span class="p">)</span> <span class="err">#</span><span class="mi">480576</span>
<span class="p">}</span></pre></div>

<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span><span class="p">(</span><span class="n">gdb</span><span class="p">)</span> <span class="n">printzv</span> <span class="mh">0x7fffcb44ac40</span>
<span class="p">[</span><span class="mh">0x7fffcb44ac40</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span> <span class="n">object</span><span class="p">(</span><span class="n">stdClass</span><span class="p">)</span> <span class="err">#</span><span class="mi">483827</span>
<span class="n">Properties</span>     <span class="n">Hash</span><span class="p">(</span><span class="mi">1</span><span class="p">)[</span><span class="mh">0x7fffcb6aa2a0</span><span class="p">]</span><span class="o">:</span> <span class="p">{</span>
      <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="s">&quot;pattern&quot;</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb67e3c0</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="nl">string</span><span class="p">:</span> <span class="o">^</span><span class="n">Z</span><span class="p">[</span><span class="mi">1-9</span><span class="p">]</span><span class="err">\</span><span class="n">d</span><span class="o">*</span><span class="n">$</span>
<span class="p">}</span></pre></div>

<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span><span class="p">(</span><span class="n">gdb</span><span class="p">)</span> <span class="n">printzv</span> <span class="mh">0x7fffcb44ac50</span>
<span class="p">[</span><span class="mh">0x7fffcb44ac50</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span> <span class="n">object</span><span class="p">(</span><span class="n">Opis</span><span class="err">\</span><span class="n">JsonSchema</span><span class="err">\</span><span class="n">ValidationResult</span><span class="p">)</span> <span class="err">#</span><span class="mi">486348</span> <span class="p">{</span>
<span class="n">maxErrors</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb4393e8</span><span class="p">]</span> <span class="kt">long</span><span class="o">:</span> <span class="mi">1</span>
<span class="n">errors</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mh">0x7fffcb4393f8</span><span class="p">]</span> <span class="p">(</span><span class="n">refcount</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="nl">array</span><span class="p">:</span>     <span class="n">Hash</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mh">0x5555559d7f00</span><span class="p">]</span><span class="o">:</span> <span class="p">{</span>
<span class="p">}</span></pre></div>
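
<p>The pattern and subject recovered from these frames can be replayed outside of PHP. The following is a minimal Python sketch, not the code PHP or Opis actually runs. It assumes that the <tt class="remarkup-monospaced">\7</tt> surrounding the pattern in the <tt class="remarkup-monospaced">preg_match</tt> frame is a BEL character used as the pattern delimiter and that the trailing <tt class="remarkup-monospaced">u</tt> is a modifier; the helper name <tt class="remarkup-monospaced">split_preg_pattern</tt> is hypothetical.</p>

<div class="remarkup-code-block" data-code-lang="python" data-sigil="remarkup-code-block"><pre class="remarkup-code">
```python
import re

BEL = "\x07"  # assumption: the "\7" in the backtrace is a BEL delimiter


def split_preg_pattern(wrapped):
    """Split a delimited PHP-style pattern into (body, modifiers)."""
    delimiter = wrapped[0]
    end = wrapped.rindex(delimiter)
    if end == 0:
        raise ValueError("missing closing delimiter")
    return wrapped[1:end], wrapped[end + 1:]


# Values copied from the preg_match() frame and the printzv output.
wrapped = BEL + r"^Z[1-9]\d*$" + BEL + "u"
body, modifiers = split_preg_pattern(wrapped)

# Replay the match with Python's own regex engine (not PCRE2).
matched = re.search(body, "Z1002") is not None
print(repr(body), repr(modifiers), matched)
```
</pre></div>

<p>If the subject matches as easily as it appears to, the regular expression itself is probably not at fault, which is consistent with the crash coming from the depth of the call stack it was invoked from.</p>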

<p>Extracting the parameters was enough for the WikiLambda developers to find the immediate root cause. They removed some definitions which triggered the infinite loop and manually ran a script to reload the data in the database. Eventually the Jenkins job managed to update the wiki database:</p>

<div class="remarkup-code-block" data-code-lang="irc" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span><span class="cp">16:30:26 </span><span class="nt">&lt;wmf-insecte&gt; </span>Project beta-update-databases-eqiad build #69029: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69029/</pre></div>

<p>One problem solved!</p>

<p>References:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="/T342769" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_4"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T342769: Beta update.php fails due to PHP segfault in libpcre2-8.so.0.7.1</span></span></a></li>
<li class="remarkup-list-item">A similar debugging session I had to do for update.php back in 2021: <a href="https://phabricator.wikimedia.org/T296539#7531235" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_5"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T296539#7531235</span></span></a></li>
<li class="remarkup-list-item">2006 notes by <a href="https://phabricator.wikimedia.org/p/tstarling/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_10"><span class="phui-tag-core phui-tag-color-person">@tstarling</span></a> from whom I have learned about gdb and the PHP gdb config: <a href="https://wikitech.wikimedia.org/wiki/GDB_with_PHP" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/GDB_with_PHP</a></li>
</ul></div></content></entry><entry><title>CI: Get notified immediately when a job fails</title><link href="/phame/live/1/post/302/ci_get_notified_immediately_when_a_job_fails/" /><id>https://phabricator.wikimedia.org/phame/post/view/302/</id><author><name>kostajh (Kosta Harlan)</name></author><published>2023-03-07T09:59:27+00:00</published><updated>2023-04-17T14:24:07+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>If you&#039;ve submitted patches for MediaWiki core, skins or extensions, you&#039;ve seen this output in Gerrit:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/jyrs77a2eladcub55o72/PHID-FILE-m32cokm25r6r3rebws5z/image.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_11"><img src="https://phab.wmfusercontent.org/file/data/jyrs77a2eladcub55o72/PHID-FILE-m32cokm25r6r3rebws5z/image.png" height="498" width="1802" loading="lazy" alt="image.png (498×1 px, 253 KB)" /></a></div></p>

<p>That is a list of links to each job&#039;s console output for a patch that failed verification.</p>

<p>You can see a job that failed at <strong>1m 54s</strong>. But <tt class="remarkup-monospaced">jenkins-bot</tt> does not post a comment on the patch until all jobs have completed. That means you won&#039;t get email/IRC notifications for test failures on your patch until the longest running job completes, in this case, after <strong>14m 57s</strong>.[0] ⏳⏱️</p>

<p>With all due respect to <a href="https://xkcd.com/303" class="remarkup-link remarkup-link-ext" rel="noreferrer">xkcd/303</a>... wouldn&#039;t it be nice to get notified as soon as a failure occurs, so you can fix your patch earlier to avoid context switching, or losing time during a backport window?</p>

<p>IMHO, yes, and now it&#039;s possible!</p>

<h3 class="remarkup-header">⚙️ Get started</h3>

<p>Commit a <tt class="remarkup-monospaced">quibble.yaml</tt> file (<a href="https://wikitech.wikimedia.org/wiki/Tool:Early_warning_bot#Using_with_your_project" class="remarkup-link remarkup-link-ext" rel="noreferrer">documentation</a>, <a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/867531" class="remarkup-link remarkup-link-ext" rel="noreferrer">example patch</a>) to your MediaWiki project[1]:</p>

<div class="remarkup-code-block" data-code-lang="yaml" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span><span class="nt">earlywarning</span><span class="p">:</span>
    <span class="nt">should_vote</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">1</span>
    <span class="nt">should_comment</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">1</span></pre></div>

<p>The next time that there is a test failure[2] in your repository, you will see a comment from the <a href="https://wikitech.wikimedia.org/wiki/Tool:Early_warning_bot" class="remarkup-link remarkup-link-ext" rel="noreferrer">Early warning bot</a> and a <tt class="remarkup-monospaced">Verified: -1</tt> vote.</p>

<p>Here&#039;s an example of how that might look in practice:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/ox5szdrfty3uiz7sb6ly/PHID-FILE-ma6ps7xl62dcbheyypqi/image.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_12"><img src="https://phab.wmfusercontent.org/file/data/ox5szdrfty3uiz7sb6ly/PHID-FILE-ma6ps7xl62dcbheyypqi/image.png" height="1104" width="1648" loading="lazy" alt="image.png (1×1 px, 303 KB)" /></a></div>[3]<br />
(Yes, the formatting needs some work still, patches welcome!)</p>

<p>So, the bot announces 2 minutes after the patch is updated that there&#039;s a problem, with the output of the failed command. The full report from <tt class="remarkup-monospaced">jenkins-bot</tt> arrives 14 minutes later.</p>

<h3 class="remarkup-header">📚 Further reading</h3>

<p>For details on how this works, please see the documentation for the <a href="https://wikitech.wikimedia.org/wiki/Tool:Early_warning_bot#Using_with_your_project" class="remarkup-link remarkup-link-ext" rel="noreferrer">Early warning bot</a>. Your feedback and contributions are very welcome on <a href="/T323750" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_14"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T323750: Provide early feedback when a patch has job failures</span></span></a> (feel free to tag <a href="https://phabricator.wikimedia.org/T323750" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_15"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T323750</span></span></a> with patches adding <tt class="remarkup-monospaced">quibble.yaml</tt> to your project.)</p>

<h3 class="remarkup-header">🙌🏻 Thank you</h3>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_19"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a> for code &amp; architecture review, pushing through Jenkins configuration updates, and solving some low-level plumbing for <a href="/T331061" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_16"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T331061: Capture output from failed command and transmit to earlywarningbot</span></span></a></li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/Urbanecm/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_20"><span class="phui-tag-core phui-tag-color-person">@Urbanecm</span></a> for fixing a bug where the bot reported on non-essential pipelines (<a href="https://phabricator.wikimedia.org/T331236" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_17"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T331236</span></span></a>)</li>
</ul>

<p>Cheers,<br />
Kosta</p>

<p>[0] An alternative for getting real time progress is to watch <a href="https://integration.wikimedia.org/zuul/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Zuul TV</a> <div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/sygjyrbm5spmeahns7oq/PHID-FILE-i3babhxbrmk4dqwo47oz/image.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_13"><img src="https://phab.wmfusercontent.org/file/data/73f5bwjuzbhgd23xrrke/PHID-FILE-nfpu6x433rilzlqzznqr/preview-image.png" width="154.85193621868" height="220" alt="image.png (878×618 px, 118 KB)" /></a></div>. There is also the excellent work in <a href="/T214068" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_18"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T214068: Display Zuul status of jobs for a change on Gerrit UI</span></span></a> but this does not generate email/IRC notifications or set a verification label.<br />
[1] This will work for MediaWiki core, extensions, skins; in theory any CI job using Quibble could use it, though.<br />
[2] Some jobs, like mwext-phan, won&#039;t report back early because they are not yet run via Quibble.</p></div></content></entry><entry><title>Shrinking H2 database files</title><link href="/phame/live/1/post/300/shrinking_h2_database_files/" /><id>https://phabricator.wikimedia.org/phame/post/view/300/</id><author><name>hashar (Antoine Musso)</name></author><published>2022-12-16T15:38:36+00:00</published><updated>2023-01-09T17:23:24+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our code review system <a href="/tag/gerrit/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_25"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_24" aria-hidden="true"></span>Gerrit</span></a> has several caches, the largest ones being backed by disk. The disk caches offload memory usage and persist the data between restarts. Gerrit being a Java application, the caches are stored in <a href="https://h2database.com/" class="remarkup-link remarkup-link-ext" rel="noreferrer">H2 database</a> files, and I recently had to find out how to connect to them in order to inspect their content and reduce their size.</p>

<p>In short: <tt class="remarkup-monospaced">java -Dh2.maxCompactTime=15000 ...</tt> would cause the <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">H2</span></span></span> driver to compact the database upon disconnection.</p>

<h3 class="remarkup-header">Context</h3>

<p>During an upgrade, the Gerrit installation filled up the system root partition entirely (<a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-11-17_Gerrit_3.5_upgrade" class="remarkup-link remarkup-link-ext" rel="noreferrer">incident report for Gerrit 3.5 upgrade</a>). The cause was two caches occupying 9G and 11G of the 40G system partition. Those caches hold differences to files made by patchsets and are stored in two files:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>/var/lib/gerrit2/review_site/cache/</th><th>Size (MB)</th></tr>
<tr><td>git_file_diff.h2.db</td><td>8376</td></tr>
<tr><td>gerrit_file_diff.h2.db</td><td>11597</td></tr>
</table></div>

<p>An easy fix would have been to stop the service, delete all caches, restart the service and let the application refill the cold caches. That is only a short-term solution: if the issue lies in the application, we would have to do the same all over again within a few weeks. The large discrepancy also triggered my curiosity, and I had to know the exact root cause to find a definitive fix. Thus started my debugging journey.</p>

<h3 class="remarkup-header">They are all empty?</h3>

<p>Looking at the caches through the application shows they are far smaller, at around <strong>150MBytes</strong> each:</p>

<div class="remarkup-code-block" data-code-lang="org gerrit show-caches" data-sigil="remarkup-code-block"><div class="remarkup-code-header">ssh -p 29418 gerrit.wikimedia.org gerrit show-caches</div><pre class="remarkup-code">  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
D gerrit_file_diff              | 24562 150654 157.36m|  14.9ms | 72%  44%|
D git_file_diff                 | 12998 143329 158.06m|  14.8ms |  3%  14%|
                                               ^^^^^^^</pre></div>

<p>One could assume some overhead, but there is no reason for metadata to occupy a <strong>hundred</strong> times more space than the actual data it describes, especially given that each cached item is a file diff, which is more than a few bytes. To retrieve the files locally I compressed them with gzip and they shrank to a mere 32 MBytes! That is a strong indication the files are mostly filled with empty data, which suggests the database layer never reclaims blocks that are no longer used. Reclaiming is known as compacting in <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">H2</span></span></span> database or vacuuming in SQLite.</p>
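
<p>The gzip check can be reproduced in a few lines of Java. This is only a minimal sketch: the class name <tt class="remarkup-monospaced">EmptySpaceProbe</tt> is made up for the illustration, and a buffer of zero bytes stands in for a cache file made mostly of unused blocks.</p>

<div class="remarkup-code-block" data-code-lang="java" data-sigil="remarkup-code-block"><pre class="remarkup-code">import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class EmptySpaceProbe {
    /** Returns the gzip-compressed size, in bytes, of the given data. */
    public static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the gzip stream flushes the trailer into the sink.
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(data);
        }
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 1 MiB of zero bytes stands in for a mostly empty file.
        byte[] mostlyEmpty = new byte[1024 * 1024];
        int compressed = gzipSize(mostlyEmpty);
        System.out.println(mostlyEmpty.length + " bytes compress to " + compressed + " bytes");
    }
}</pre></div>

<p>A file that compresses to a tiny fraction of its size, as the cache files did, is very likely dominated by repeated filler rather than real diff data.</p>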

<h3 class="remarkup-header">Connecting</h3>

<p>Once I had retrieved the files, I tried to connect to them using the <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">H2</span></span></span> database jar and kept making mistake after mistake due to my complete lack of knowledge on that front:</p>

<p><strong>Version matters</strong></p>

<p>At first I tried the latest version, h2-2.1.214.jar, and it did not find any data. I eventually found out the underlying storage system has changed since version 1.3.176, the one used by Gerrit. I thus had to use the older version, which can be retrieved from the Gerrit.war package.</p>

<p><strong>File parameter which is not a file</strong></p>

<p>I then wanted to take a SQL dump of the database to inspect it, using the <tt class="remarkup-monospaced">Script</tt> Java class: <tt class="remarkup-monospaced">java -cp h2-1.3.176.jar org.h2.tools.Script</tt>. It requires a <tt class="remarkup-monospaced">-url</tt> option, which is a JDBC URI containing the database name. Intuitively I gave the full file name:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">java -cp h2-1.3.176.jar org.h2.tools.Script -url jdbc:h2:git_file_diff.h2.db</pre></div>

<p>It returns instantly and generates the dump:</p>

<div class="remarkup-code-block" data-code-lang="sql" data-sigil="remarkup-code-block"><div class="remarkup-code-header">backup.sql</div><pre class="remarkup-code"><span></span><span class="k">CREATE</span> <span class="k">USER</span> <span class="k">IF</span> <span class="k">NOT</span> <span class="k">EXISTS</span> <span class="ss">&quot;&quot;</span> <span class="n">SALT</span> <span class="s1">&#39;&#39;</span> <span class="n">HASH</span> <span class="s1">&#39;&#39;</span> <span class="k">ADMIN</span><span class="p">;</span></pre></div>

<p>Essentially an empty file. Looking at the files on disk, it had created a <tt class="remarkup-monospaced">git_file_diff.h2.db.h2.db</tt> file of 24 kbytes. Lesson learned: the <tt class="remarkup-monospaced">.h2.db</tt> suffix must be removed from the URI. I was then able to create the dump using:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">java -cp h2-1.3.176.jar org.h2.tools.Script -url jdbc:h2:git_file_diff</pre></div>

<p>Which resulted in a properly sized <tt class="remarkup-monospaced">backup.sql</tt>.</p>
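
<p>To avoid repeating the suffix mistake, the URL can be derived from the file name. The helper below is a hypothetical sketch (the <tt class="remarkup-monospaced">H2Url</tt> class is not part of H2 or Gerrit); it only assumes the <tt class="remarkup-monospaced">.h2.db</tt> naming seen above.</p>

<div class="remarkup-code-block" data-code-lang="java" data-sigil="remarkup-code-block"><pre class="remarkup-code">public class H2Url {
    /** Suffix of the database files seen on disk; H2 appends it to the name given in the URL. */
    static final String SUFFIX = ".h2.db";

    /** Builds a JDBC URL from a database file name, dropping the suffix when present. */
    public static String fromFileName(String fileName) {
        String name = fileName.endsWith(SUFFIX)
            ? fileName.substring(0, fileName.length() - SUFFIX.length())
            : fileName;
        return "jdbc:h2:" + name;
    }

    public static void main(String[] args) {
        // Prints the URL to pass to the -url option.
        System.out.println(fromFileName("git_file_diff.h2.db"));
    }
}</pre></div>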

<p><strong>Web based admin</strong></p>

<p>I altered the SQL to make it fit SQLite in order to load it in SqliteBrowser (a graphical interface which is very convenient to inspect those databases).  Then I found that invoking the jar directly starts a background process attached to the database and opens my web browser on a web UI: <tt class="remarkup-monospaced">java -jar h2-1.3.176.jar -url jdbc:h2:git_file_diff</tt>:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/ov5ovekdltkufsx35gdn/PHID-FILE-2apqmrcz7bwdyge77vxm/h2_web_ui.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_21"><img src="https://phab.wmfusercontent.org/file/data/ov5ovekdltkufsx35gdn/PHID-FILE-2apqmrcz7bwdyge77vxm/h2_web_ui.png" height="470" width="810" loading="lazy" alt="h2_web_ui.png (470×810 px, 92 KB)" /></a></div></p>

<p>That is very convenient to inspect the file.  The caches are key-value stores with a column keeping track of the size of each record. Summing that column is how <tt class="remarkup-monospaced">gerrit show-caches</tt> finds out the size of the caches (roughly 150Mbytes for each of the two diff caches).</p>

<h3 class="remarkup-header">Compacting solutions</h3>

<p>The <a href="http://www.h2database.com/html/features.html#compacting" class="remarkup-link remarkup-link-ext" rel="noreferrer">H2 Database feature page</a> mentions empty space is to be re-used, which is not the case as seen above. The documentation states that when the database connection is closed, the database is compacted for up to 200 milliseconds. Gerrit establishes the connection on startup and keeps it open until it is shut down, at which point the compaction occurs.  That is not frequent enough, and the short delay is apparently not sufficient to compact our huge databases. To run a full compaction several methods are possible:</p>

<p><strong><tt class="remarkup-monospaced">SHUTDOWN COMPACT</tt>:</strong> this requests an explicit compaction and terminates the connection. The documentation implies it is not subject to the time limit. That would have required a change in the Gerrit Java code to issue the command.</p>

<p><strong><tt class="remarkup-monospaced">org.h2.samples.Compact</tt> script</strong>: <tt class="remarkup-monospaced">H2</tt> has an <tt class="remarkup-monospaced">org.h2.samples.Compact</tt> class to manually compact a given database. It would need some instrumentation to trigger it against each file after Gerrit is shut down, possibly as a <tt class="remarkup-monospaced">systemd.service</tt> <tt class="remarkup-monospaced">ExecStopPost</tt> iterating through each file.</p>

<p><strong>JDBC URL parameter <tt class="remarkup-monospaced">MAX_COMPACT_TIME</tt></strong>: the 200 milliseconds can be bumped by adding the parameter to the JDBC connection URL (separated by a semicolon <tt class="remarkup-monospaced">;</tt>). Again, it would require a change in the Gerrit Java code to modify the way it connects.</p>

<p>The beauty of open source is that I could access the database source code. It is hosted at <a href="https://github.com/h2database/h2database" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://github.com/h2database/h2database</a> in the <tt class="remarkup-monospaced">version-1.3</tt> tag, which holds a subdirectory for each sub-version.  When looking up a setting, the database driver uses the following piece of code (<a href="http://h2database.com/html/license.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">code licensed under Mozilla Public License Version 2.0 or Eclipse Public License 1.0</a>):</p>

<div class="remarkup-code-block" data-code-lang="java" data-sigil="remarkup-code-block"><div class="remarkup-code-header">version-1.3.176/h2/src/main/org/h2/engine/SettingsBase.java</div><pre class="remarkup-code">    <span class="c">/**</span>
<span class="c">     * Get the setting for the given key.</span>
<span class="c">     *</span>
<span class="c">     * @param key the key</span>
<span class="c">     * @param defaultValue the default value</span>
<span class="c">     * @return the setting</span>
<span class="c">     */</span>
    <span class="k">protected</span> <span class="n">String</span> <span class="n">get</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">String</span> <span class="n">defaultValue</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">StringBuilder</span> <span class="n">buff</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StringBuilder</span><span class="o">(</span><span class="s">&quot;h2.&quot;</span><span class="o">);</span>
        <span class="kt">boolean</span> <span class="n">nextUpper</span> <span class="o">=</span> <span class="kc">false</span><span class="o">;</span>
        <span class="k">for</span> <span class="o">(</span><span class="kt">char</span> <span class="n">c</span> <span class="o">:</span> <span class="n">key</span><span class="o">.</span><span class="n">toCharArray</span><span class="o">())</span> <span class="o">{</span>
            <span class="k">if</span> <span class="o">(</span><span class="n">c</span> <span class="o">==</span> <span class="s">&#039;_&#039;</span><span class="o">)</span> <span class="o">{</span>
                <span class="n">nextUpper</span> <span class="o">=</span> <span class="kc">true</span><span class="o">;</span>
            <span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
                <span class="c">// Character.toUpperCase / toLowerCase ignores the locale</span>
                <span class="n">buff</span><span class="o">.</span><span class="n">append</span><span class="o">(</span><span class="n">nextUpper</span> <span class="o">?</span> <span class="n">Character</span><span class="o">.</span><span class="n">toUpperCase</span><span class="o">(</span><span class="n">c</span><span class="o">)</span> <span class="o">:</span> <span class="n">Character</span><span class="o">.</span><span class="n">toLowerCase</span><span class="o">(</span><span class="n">c</span><span class="o">));</span>
                <span class="n">nextUpper</span> <span class="o">=</span> <span class="kc">false</span><span class="o">;</span>
            <span class="o">}</span>
        <span class="o">}</span>
        <span class="n">String</span> <span class="n">sysProperty</span> <span class="o">=</span> <span class="n">buff</span><span class="o">.</span><span class="n">toString</span><span class="o">();</span>
        <span class="n">String</span> <span class="n">v</span> <span class="o">=</span> <span class="n">settings</span><span class="o">.</span><span class="n">get</span><span class="o">(</span><span class="n">key</span><span class="o">);</span>
        <span class="k">if</span> <span class="o">(</span><span class="n">v</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">v</span> <span class="o">=</span> <span class="n">Utils</span><span class="o">.</span><span class="n">getProperty</span><span class="o">(</span><span class="n">sysProperty</span><span class="o">,</span> <span class="n">defaultValue</span><span class="o">);</span>
            <span class="n">settings</span><span class="o">.</span><span class="n">put</span><span class="o">(</span><span class="n">key</span><span class="o">,</span> <span class="n">v</span><span class="o">);</span>
        <span class="o">}</span>
        <span class="k">return</span> <span class="n">v</span><span class="o">;</span>
    <span class="o">}</span></pre></div>

<p>When retrieving the setting <tt class="remarkup-monospaced">MAX_COMPACT_TIME</tt>, it forges a camel case version of the setting name prefixed by <tt class="remarkup-monospaced">h2.</tt>, which gives <tt class="remarkup-monospaced">h2.maxCompactTime</tt>. If the setting was not given explicitly, it then looks that name up in the JVM system properties and, if set, picks its value.</p>

<p>Raising the compact time limit to 15 seconds is thus all about passing to <tt class="remarkup-monospaced">java</tt>: <tt class="remarkup-monospaced">-Dh2.maxCompactTime=15000</tt>.</p>
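
<p>The name mangling can be tried in isolation. The following is a standalone sketch that copies only the loop from <tt class="remarkup-monospaced">SettingsBase.get()</tt>; the <tt class="remarkup-monospaced">H2PropertyName</tt> class is hypothetical, and the fallback of 200 is taken from the documented default rather than from the H2 source.</p>

<div class="remarkup-code-block" data-code-lang="java" data-sigil="remarkup-code-block"><pre class="remarkup-code">public class H2PropertyName {
    /** Converts a setting name such as MAX_COMPACT_TIME to its system property name. */
    public static String toSystemProperty(String key) {
        StringBuilder buff = new StringBuilder("h2.");
        boolean nextUpper = false;
        for (char c : key.toCharArray()) {
            if (c == '_') {
                // Drop the underscore and capitalize the character that follows.
                nextUpper = true;
            } else {
                buff.append(nextUpper ? Character.toUpperCase(c) : Character.toLowerCase(c));
                nextUpper = false;
            }
        }
        return buff.toString();
    }

    public static void main(String[] args) {
        String property = toSystemProperty("MAX_COMPACT_TIME");
        // Assumption: fall back to 200 (milliseconds) when the JVM was started without the -D flag.
        System.out.println(property + " = " + System.getProperty(property, "200"));
    }
}</pre></div>

<p>Running it with <tt class="remarkup-monospaced">java -Dh2.maxCompactTime=15000 H2PropertyName</tt> should report the raised limit, which is a quick way to check the flag reaches the JVM.</p>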

<h3 class="remarkup-header">Applying and resolution</h3>

<p><a href="https://phabricator.wikimedia.org/rOPUP7f6215e039d90840f4617c1e8278e1dd51abf2ea" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_23"><span class="phui-tag-core phui-tag-color-object">7f6215e039</span></a> in our Puppet repository applies the fix and summarizes the above. Once it was applied, I restarted Gerrit a first time to have the setting taken into account, then a second time to have it disconnect from the databases with the setting applied. The results speak for themselves. Here are the largest gains:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>File</th><th>Before</th><th>After</th></tr>
<tr><td>approvals.h2.db</td><td><strong>610M</strong></td><td><strong>313M</strong></td></tr>
<tr><td>gerrit_file_diff.h2.db</td><td><strong>12G</strong></td><td><strong>527M</strong></td></tr>
<tr><td>git_file_diff.h2.db</td><td><strong>8.2G</strong></td><td><strong>532M</strong></td></tr>
<tr><td>git_modified_files.h2.db</td><td><strong>899M</strong></td><td><strong>149M</strong></td></tr>
<tr><td>git_tags.h2.db</td><td><strong>1.1M</strong></td><td><strong>32K</strong></td></tr>
<tr><td>modified_files.h2.db</td><td><strong>905M</strong></td><td><strong>208M</strong></td></tr>
<tr><td>oauth_tokens.h2.db</td><td><strong>1.1M</strong></td><td><strong>32K</strong></td></tr>
<tr><td>pure_revert.h2.db</td><td><strong>1.1M</strong></td><td><strong>32K</strong></td></tr>
</table></div>

<p>The <tt class="remarkup-monospaced">gerrit_file_diff</tt> and <tt class="remarkup-monospaced">git_file_diff</tt> caches went from 12G and 8.2G respectively to about 0.5G each, which addresses the issue.</p>

<h3 class="remarkup-header">Conclusion</h3>

<p>Setting the Java property <tt class="remarkup-monospaced">-Dh2.maxCompactTime=15000</tt> was a straightforward fix which does not require any change to the application code. It also guarantees the databases will keep being compacted each time Gerrit is restarted, so the issue that led to a longer maintenance window than expected should not reappear.</p>

<p>Happy end of year 2022!</p>

<p>References:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="/T323754" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_22"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T323754: Investigate Gerrit h2 cache being way too large</span></span></a></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-11-17_Gerrit_3.5_upgrade" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-11-17 incident report: Gerrit 3.5 upgrade</a></li>
<li class="remarkup-list-item"><a href="http://h2database.com/html/features.html#compacting" class="remarkup-link remarkup-link-ext" rel="noreferrer">H2 Database Features: Compacting a Database</a></li>
<li class="remarkup-list-item"><a href="https://github.com/h2database/h2database/blob/version-1.3/version-1.3.176/h2/src/main/org/h2/engine/SettingsBase.java" class="remarkup-link remarkup-link-ext" rel="noreferrer">SettingsBase.java for version 1.3.176</a></li>
<li class="remarkup-list-item"><a href="https://gerrit.wikimedia.org/r/c/operations/puppet/+/865023" class="remarkup-link remarkup-link-ext" rel="noreferrer">operations/puppet.git change 865023</a></li>
</ul></div></content></entry><entry><title>scap backport Makes Deployments Easy</title><link href="/phame/live/1/post/297/scap_backport_makes_deployments_easy/" /><id>https://phabricator.wikimedia.org/phame/post/view/297/</id><author><name>jeena (Jeena Huneidi)</name></author><published>2022-09-26T22:47:23+00:00</published><updated>2022-11-24T01:20:15+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Mediawiki developers, have you ever thought, “I wish I could deploy my own code for Mediawiki”? Now you can! More deploys! More fun!</p>

<p>Next time you want to get some code deployed, why not try <tt class="remarkup-monospaced">scap backport</tt>?</p>

<h2 class="remarkup-header">One Command To Deploy</h2>

<p><tt class="remarkup-monospaced">scap backport</tt> is one command that will +2 your patch, deploy to mwdebug and wait for your approval, and finally sync to all servers. You only need to provide the change number or Gerrit URL of your change.</p>

<p>You can run <tt class="remarkup-monospaced">scap backport</tt> on patches that have already merged, or re-run <tt class="remarkup-monospaced">scap backport</tt> if you decided to cancel in the middle of a run. <tt class="remarkup-monospaced">scap backport</tt> can also handle multiple patches at a time. After all the patches have been merged, they’ll be deployed all together. <tt class="remarkup-monospaced">scap backport</tt> will confirm that your patches are deployable before merging, and double check no extra patches have sneaked into your deployment.</p>

<h2 class="remarkup-header">One Command To Revert</h2>

<p>And if your code didn’t work out, don’t worry, there’s <tt class="remarkup-monospaced">scap backport --revert</tt>, which will create a revert patch, send it to Gerrit, and run all steps of scap backport to revert your work. You’re offered the choice to give a reason for revert, which will show up in the commit message. Just be aware that you&#039;ll need to wait for tests to run and your code to merge before it gets synced, so in an emergency this might not be the best option.</p>

<h3 class="remarkup-header">Extra Information</h3>

<p>You can also list available backports or reverts using the <tt class="remarkup-monospaced">--list</tt> flag!</p>

<p>If you&#039;d like some guidance on deploying backports, please sign up <a href="https://phabricator.wikimedia.org/maniphest/task/edit/form/96/" class="remarkup-link" rel="noreferrer">here</a> to join us for backport training, which happens once a week on Thursday during the UTC late backport window!</p>

<h2 class="remarkup-header">Scap Backport In Action</h2>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/vfy6q3lkzhojhftfnbnf/PHID-FILE-zasxa3oxiqlwgojozsyw/ezgif.com-gif-maker%281%29.gif" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_26"><img src="https://phab.wmfusercontent.org/file/data/vfy6q3lkzhojhftfnbnf/PHID-FILE-zasxa3oxiqlwgojozsyw/ezgif.com-gif-maker%281%29.gif" height="450" width="800" loading="lazy" alt="ezgif.com-gif-maker(1).gif (450×800 px, 2 MB)" /></a></div></p>

<h2 class="remarkup-header">Compare to Manual Steps</h2>

<p>For comparison, the previous way to backport would require the user to enter the following commands on the deployment host:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">cd /srv/mediawiki-staging/php-&lt;version&gt;
git status
git fetch
git log -p HEAD..@{u}
git rebase</pre></div>

<p>Then, if there were changes to an extension: <tt class="remarkup-monospaced">git submodule update [extensions|skins]/&lt;name&gt;</tt><br />
Then, log in to mwdebug and run <tt class="remarkup-monospaced">scap pull</tt><br />
Then, back on the deployment host: <tt class="remarkup-monospaced">scap sync-file php-&lt;version&gt;/&lt;path to file&gt; &#039;Backport: [[gerrit:&lt;change no&gt;|&lt;subject&gt; (&lt;bug no&gt;)]]&#039;</tt> for each changed file</p>

<h2 class="remarkup-header">Example Usage</h2>

<p><strong>List backports</strong><br />
<tt class="remarkup-monospaced">scap backport --list</tt></p>

<p><strong>Backport change(s)</strong><br />
<tt class="remarkup-monospaced">scap backport 1234</tt><br />
<tt class="remarkup-monospaced">scap backport https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1234</tt><br />
<tt class="remarkup-monospaced">scap backport 1234 5678</tt></p>

<p><strong>Merge but do not sync</strong><br />
<tt class="remarkup-monospaced">scap backport --stop-before-sync 1234</tt></p>

<p><strong>List revertable changes</strong><br />
<tt class="remarkup-monospaced">scap backport --revert --list</tt></p>

<p><strong>Revert change(s)</strong><br />
<tt class="remarkup-monospaced">scap backport --revert 1234</tt><br />
<tt class="remarkup-monospaced">scap backport --revert 1234 5678</tt></p>

<p>That&#039;s all for now, and happy backporting!</p></div></content></entry><entry><title>Production Excellence #46: July &amp; August 2022</title><link href="/phame/live/1/post/296/production_excellence_46_july_august_2022/" /><id>https://phabricator.wikimedia.org/phame/post/view/296/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2022-09-08T23:02:42+00:00</published><updated>2022-09-11T15:16:05+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How are we doing in our strive for operational excellence? Read on to find out!</p>

<h4 class="remarkup-header">Incidents</h4>

<p>7 documented incidents in July, and 4 in August (<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>). Read more about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-07-03_shellbox_request_spike" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-07-03 shellbox</a><br />
Impact: For 16 minutes, edits and previews for pages with Score musical notes were slow or unavailable.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-07-10_thumbor" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-07-10 thumbor</a><br />
Impact: For several days, Thumbor p75 service response times gradually regressed by several seconds.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-07-11_FrontendUnavailable_cache_text" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-07-11 FrontendUnavailable cache text</a><br />
Impact: For 5 minutes, the MediaWiki API cluster in eqiad responded with higher latencies or errors.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-07-11_Shellbox_and_parsoid_saturation" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-07-11 Shellbox and parsoid saturation</a><br />
Impact: For 13 minutes, the mobileapps service was serving HTTP 503 errors to clients.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-07-12_codfw_A5_powercycle" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-07-12 codfw A5 power cycle</a><br />
Impact: No observed public-facing impact. Internal clean up took some work, e.g. for Ganeti VMs.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-07-13_brief_outbound_bandwidth_spike_eqsin" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-07-13 eqsin bandwidth</a><br />
Impact: For 20 minutes, there was a small increase in error responses for thumbnails served from the Eqsin data center (Singapore).</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-07-20_network_interruption" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-07-20 eqiad network</a><br />
Impact: For 10-15 minutes, a portion of wiki traffic from Eqiad-served regions was lost (about 1M uncached requests). For ~30 minutes, Phabricator was unable to access its database.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-08-10_cassandra_disk_space" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-08-10 cassandra disk space</a><br />
Impact: During planned downtime, other hosts ran out of space due to accumulating logs. No external impact.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-08-10_confd_all_hosts" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-08-10 confd all hosts</a><br />
Impact: No external impact.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-08-16_Beta_Cluster_502" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-08-16 Beta Cluster 502</a><br />
Impact: For 7 hours, all Beta Cluster sites were unavailable.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-08-16_x2_databases_replication_breakage" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-08-16 x2 database replication</a><br />
Impact: For 36 minutes, errors were noticeable for some editors. Saving edits was unaffected.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/7ryfnwymatltanafdfiy/PHID-FILE-iuxx3uozzbe4spraevhy/proderr-incidents_2022-08.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_27"><img src="https://phab.wmfusercontent.org/file/data/7ryfnwymatltanafdfiy/PHID-FILE-iuxx3uozzbe4spraevhy/proderr-incidents_2022-08.png" height="300" alt="proderr-incidents 2022-08.png (800×1 px, 107 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Incident follow-up</h4>

<p>Recently completed incident follow-up:</p>

<p><a href="https://phabricator.wikimedia.org/T315386" class="remarkup-link" rel="noreferrer">Replace certificate on elastic09 in Beta Cluster</a><br />
Brian (<a href="https://phabricator.wikimedia.org/p/bking/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_30"><span class="phui-tag-core phui-tag-color-person">@bking</span></a>, WMF Search) noticed during an incident review that an internal server used an expired cert and renewed it in accordance with a documented process.</p>

<p><a href="https://phabricator.wikimedia.org/T263872" class="remarkup-link" rel="noreferrer">Localisation cache must be purged after train deploy</a><br />
<a href="https://phabricator.wikimedia.org/p/Tchanders/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_31"><span class="phui-tag-core phui-tag-color-person">@Tchanders</span></a> (WMF AHT) filed this in 2020 after a recurring issue with stale interface labels. Work led by Ahmon (<a href="https://phabricator.wikimedia.org/p/dancy/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_32"><span class="phui-tag-core phui-tag-color-person">@dancy</span></a>, WMF RelEng).</p>

<p>Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up work</a> in Phabricator! These are preventive measures and tech debt mitigations written down after an incident is concluded.</p>

<p>Highlight from the &quot;Oldest incident follow-up&quot; query:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T83729" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_29"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T83729</span></span></a> Fix monitoring of poolcounter service.</li>
</ul>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Trends</h4>

<p>The month of July saw <a href="https://phabricator.wikimedia.org/maniphest/query/XHYmsxx4VNRI/#R" class="remarkup-link" rel="noreferrer">22 new production errors</a> of which 9 are still open today. In August we encountered <a href="https://phabricator.wikimedia.org/maniphest/query/BnX.PiwEomZt/#R" class="remarkup-link" rel="noreferrer">29 new production errors</a> of which 10 remain open today and have carried over to September.</p>

<p>Take a look at the <a href="/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-yellow phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_34"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-tags" data-meta="0_33" aria-hidden="true"></span>Wikimedia-production-error</span></a> workboard and look for tasks that could use your help.</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>Did you know?</strong></div>
<div class="remarkup-reply-body"><p>To zoom in and find your team&#039;s error reports, use the appropriate &quot;Filter&quot; link in the <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">sidebar</a> of the workboard.</p></div>
</blockquote>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/wwkvlsw5kjv2x45s747n/PHID-FILE-qjzrt3nnqhkm3tzrxaxx/proderr-unified_2022-08.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_28"><img src="https://phab.wmfusercontent.org/file/data/wwkvlsw5kjv2x45s747n/PHID-FILE-qjzrt3nnqhkm3tzrxaxx/proderr-unified_2022-08.png" height="400" alt="proderr-unified 2022-08.png (1×1 px, 110 KB)" /></a></div></p>

<p>For the month-over-month numbers, refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Thanks!</h4>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p></div></content></entry><entry><title>Production Excellence #45: June 2022</title><link href="/phame/live/1/post/292/production_excellence_45_june_2022/" /><id>https://phabricator.wikimedia.org/phame/post/view/292/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2022-07-30T00:14:09+00:00</published><updated>2022-07-30T00:39:02+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How are we doing in our pursuit of operational excellence? Read on to find out!</p>

<h4 class="remarkup-header">Incidents</h4>

<p>There were 6 incidents in June this year. That&#039;s double the median of three per month over the past two years (<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>).</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-06-01_Lost_index_in_cloudelastic" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-06-01 cloudelastic</a><br />
Impact: For 41 days, Cloudelastic was missing search results about files from commons.wikimedia.org.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-06-10_overload_varnish_haproxy" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-06-10 overload varnish haproxy</a><br />
Impact: For 3 minutes, wiki traffic was disrupted in multiple regions for cached and logged-in responses.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-06-12_appserver_latency" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-06-12 appserver latency</a><br />
Impact: For 30 minutes, wiki backends were intermittently slow or unresponsive, affecting a portion of logged-in requests and uncached page views.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-06-16_MariaDB_password_leak" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-06-16 MariaDB password</a><br />
Impact: For 2 hours, a current production database password was publicly known. Other measures ensured that no data could be compromised (e.g. firewalls and selective IP grants).</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-06-21_asw-a2-codfw_accidental_power_cycle" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-06-21 asw-a2-codfw power</a><br />
Impact: For 11 minutes, one of the Codfw server racks lost network connectivity. Among the affected servers was an LVS host. Another LVS host in Codfw automatically took over its load balancing responsibility for wiki traffic. During the transition, there was a brief increase in latency for regions served by Codfw (Mexico, and parts of US/Canada).</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-06-30_asw-a4-codfw_accidental_power_cycle" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-06-30 asw-a4-codfw power</a><br />
Impact: For 18 minutes, servers in the A4-codfw rack lost network connectivity. Little to no external impact.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/kijef6xlici3aaw6ahvl/PHID-FILE-aotgnoqzbwoie3pko3uk/proderr-incidents_2022-06.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_35"><img src="https://phab.wmfusercontent.org/file/data/kijef6xlici3aaw6ahvl/PHID-FILE-aotgnoqzbwoie3pko3uk/proderr-incidents_2022-06.png" height="300" alt="proderr-incidents 2022-06.png (800×1 px, 139 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Incident follow-up</h4>

<p>Recently completed incident follow-up:</p>

<p><a href="https://phabricator.wikimedia.org/T307648" class="remarkup-link" rel="noreferrer">Audit database usage of GlobalBlocking extension</a><br />
Filed by Amir (<a href="https://phabricator.wikimedia.org/p/Ladsgroup/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_37"><span class="phui-tag-core phui-tag-color-person">@Ladsgroup</span></a>) in May following an outage due to DB load from GlobalBlocking. Amir reduced the extension&#039;s DB load by 10% by skipping checks for edit traffic from WMCS and Toolforge. He also implemented stats for monitoring GlobalBlocking DB queries going forward.</p>

<p><a href="https://phabricator.wikimedia.org/T312319" class="remarkup-link" rel="noreferrer">Reduce Lilypond shellouts from VisualEditor</a><br />
Filed by Reuven (<a href="https://phabricator.wikimedia.org/p/RLazarus/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_38"><span class="phui-tag-core phui-tag-color-person">@RLazarus</span></a>) and Kunal (<a href="https://phabricator.wikimedia.org/p/Legoktm/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_39"><span class="phui-tag-core phui-tag-color-person">@Legoktm</span></a>) after a shellbox incident. Ed (<a href="https://phabricator.wikimedia.org/p/Esanders/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_40"><span class="phui-tag-core phui-tag-color-person">@Esanders</span></a>) and Sammy (<a href="https://phabricator.wikimedia.org/p/TheresNoTime/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_41"><span class="phui-tag-core phui-tag-color-person">@TheresNoTime</span></a>) improved the Score extension&#039;s VisualEditor plugin to increase its debounce duration.</p>

<p>Remember to review and schedule <span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/project/view/4758/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_0" aria-hidden="true"></span>Incident Follow-up work</span></a></span> in Phabricator! These are preventive measures and tech debt mitigations written down after an incident is concluded. Read more about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech.</p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Trends</h4>

<p>In June and July (which is almost over), we reported <a href="https://phabricator.wikimedia.org/maniphest/query/WDqlrITVmIoX/#R" class="remarkup-link" rel="noreferrer">27 new production errors</a> and <a href="https://phabricator.wikimedia.org/maniphest/query/pzOAOpbnF3PX/#R" class="remarkup-link" rel="noreferrer">25 production errors</a> respectively. Of these 52 new issues, 27 were closed in the weeks since then, and 25 remain unresolved and will carry over to August.</p>

<p>We also addressed 25 stagnant problems that we carried over from previous months, so the workboard overall remains at exactly 299 unresolved production errors.</p>

<p>Take a look at the <a href="/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-yellow phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_43"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-tags" data-meta="0_42" aria-hidden="true"></span>Wikimedia-production-error</span></a> workboard and look for tasks that could use your help.</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>Did you know?</strong></div>
<div class="remarkup-reply-body"><p>To zoom in and find your team&#039;s error reports, use the appropriate &quot;Filter&quot; link in the <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">sidebar</a> of the workboard.</p></div>
</blockquote>

<p>For the month-over-month numbers, refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/zca4kbqwovqytpms6swp/PHID-FILE-nij2b3fpqx5nr25rw3lh/proderr-unified_2022-06.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_36"><img src="https://phab.wmfusercontent.org/file/data/zca4kbqwovqytpms6swp/PHID-FILE-nij2b3fpqx5nr25rw3lh/proderr-unified_2022-06.png" height="400" alt="proderr-unified 2022-06.png (1×1 px, 111 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Thanks!</h4>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote><p>&quot;Mr. Vice President. No numbers, no bubbles.&quot;<br />
— <a href="https://en.wikiquote.org/wiki/Hans_Rosling" class="remarkup-link remarkup-link-ext" rel="noreferrer">🔴🟠🟡🟢🔵🟣</a></p></blockquote></div></content></entry><entry><title>GitLab-a-thon!</title><link href="/phame/live/1/post/288/gitlab-a-thon/" /><id>https://phabricator.wikimedia.org/phame/post/view/288/</id><author><name>brennen (Brennen Bearnes)</name></author><published>2022-05-31T18:30:19+00:00</published><updated>2022-06-03T18:39:22+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Release Engineering&#039;s <a href="https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/GitLab-a-thon" class="remarkup-link remarkup-link-ext" rel="noreferrer">&quot;GitLab-a-thon&quot; sprint for May 10th-24th (roughly)</a> focused on the mechanics of migrating a Wikimedia service to GitLab, setting up a CI pipeline, building container images from that service, and publishing images to the Wikimedia registry. We selected the Blubber project as a good candidate for experimentation:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Workboard: <a href="/project/view/5873/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_56"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-map-marker" data-meta="0_55" aria-hidden="true"></span>Release-Engineering-Team (GitLab-a-thon 🦊)</span></a></li>
<li class="remarkup-list-item"><a href="/T301168" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_44"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T301168: Migrate Blubber project to GitLab</span></span></a></li>
<li class="remarkup-list-item"><a href="/T307536" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_45"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T307536: Build and Run Blubber test variant on GitLab untrusted runners</span></span></a></li>
</ul>

<p>We evaluated build mechanisms including GitLab&#039;s suggested docker-in-docker, <a href="https://github.com/GoogleContainerTools/kaniko" class="remarkup-link remarkup-link-ext" rel="noreferrer">Kaniko</a>, <a href="https://podman.io/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Podman</a>, and <a href="https://github.com/moby/buildkit/" class="remarkup-link remarkup-link-ext" rel="noreferrer">BuildKit</a>:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="/T307599" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_46"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T307599: Investigate alternatives to docker-in-docker for container image creation in GitLab</span></span></a></li>
<li class="remarkup-list-item"><a href="/T308213" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_47"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T308213: Investigate Kaniko as an option to build CI images</span></span></a></li>
<li class="remarkup-list-item"><a href="/T307810" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_48"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T307810: Investigate buildkitd instances as image builders for GitLab</span></span></a></li>
</ul>

<p>We ultimately landed on BuildKit as the least constraining for future options, and the most in line with features we&#039;d like to offer.</p>

<p>We explored a range of options for building and publishing, including variations on:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Building on runners provisioned on a DigitalOcean Kubernetes cluster and importing to the production registry from some trusted location (contint, for example) by way of a shim.</li>
<li class="remarkup-list-item">Building on trusted runners and publishing to the GitLab Container Registry, then importing to the production registry by way of a shim.</li>
<li class="remarkup-list-item">Building on trusted runners and publishing directly from there to the prod registry, authenticated against GitLab by way of JWT.</li>
</ul>

<p>We eventually settled on the last of these options, and implementation is well underway: <a href="/T308501" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_49"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T308501: Authenticate trusted runners for registry access against GitLab using temporary JSON Web Token</span></span></a></p>

<p>Other work included implementing CI for Blubber on GitLab (<a href="https://phabricator.wikimedia.org/T307534" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_50"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T307534</span></span></a>), improvements to user-facing documentation (<a href="https://phabricator.wikimedia.org/T307535" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_51"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T307535</span></span></a>, <a href="https://phabricator.wikimedia.org/T307538" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_52"><span class="phui-tag-core phui-tag-color-object">T307538</span></a>), enforcing the allowlist for container images in GitLab CI (<a href="https://phabricator.wikimedia.org/T291978" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_53"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T291978</span></span></a>), experimentation with the GitLab Container Registry (<a href="https://phabricator.wikimedia.org/T307537" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_54"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T307537</span></span></a>), and extensive discussions with ServiceOps on GitLab infrastructure.</p></div></content></entry><entry><title>Production Excellence #44: May 2022</title><link href="/phame/live/1/post/285/production_excellence_44_may_2022/" /><id>https://phabricator.wikimedia.org/phame/post/view/285/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2022-06-16T00:07:21+00:00</published><updated>2022-06-16T01:13:36+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our pursuit of operational excellence last month? Read on to find out!</p>

<h4 class="remarkup-header">Incidents</h4>

<p>By golly, we&#039;ve had quite the month! 10 documented incidents, which is more than three times the two-year median of 3. The last time we experienced ten or more incidents in one month was June 2019, when we had eleven (<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>, <a href="https://phabricator.wikimedia.org/phame/post/view/163/production_excellence_12_june_2019/" class="remarkup-link" rel="noreferrer">Excellence monthly of June 2019</a>).</p>

<p>I&#039;d like to draw your attention to something positive. As you read the below, take note of incidents that did <em>not</em> impact public services, and did <em>not</em> have lasting impact or data loss. For example, the <a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-24_Failed_Apache_restart" class="remarkup-link remarkup-link-ext" rel="noreferrer">Apache incident</a> benefited from PyBal&#039;s automatic health-based depooling. The <a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-02_deployment" class="remarkup-link remarkup-link-ext" rel="noreferrer">deployment server incident</a> recovered without loss thanks to Bacula. The <a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-01_etcd" class="remarkup-link remarkup-link-ext" rel="noreferrer">Etcd incident</a> impact was limited by serving stale data. And, the <a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-31_Analytics_Data_Lake_-_Hadoop_Namenode_failure" class="remarkup-link remarkup-link-ext" rel="noreferrer">Hadoop incident</a> recovered by resuming from Kafka right where it left off.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/sod427mtvhnj2uv6hehm/PHID-FILE-fy7zemebsxvm3656do77/proderr-incidents_2022-05.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_57"><img src="https://phab.wmfusercontent.org/file/data/sod427mtvhnj2uv6hehm/PHID-FILE-fy7zemebsxvm3656do77/proderr-incidents_2022-05.png" height="300" alt="proderr-incidents 2022-05.png (800×1 px, 135 KB)" /></a></div></p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-01_etcd" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-01 etcd</a><br />
Impact: For 2 hours, Conftool could not sync Etcd data between our core data centers. Puppet and some other internal services were unavailable or out of sync. The issue was isolated, with no impact on public services.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-02_deployment" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-02 deployment server</a><br />
Impact: For 4 hours, we could not update or deploy MediaWiki and other services, due to corruption on the active deployment server. No impact on public services.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-05_Wikimedia_full_site_outage" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-05 site outage</a><br />
Impact: For 20 minutes, all wikis were unreachable for logged-in users and non-cached pages. This was due to a GlobalBlocks schema change causing significant slowdown in a frequent database query.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-09_confctl" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-09 Codfw confctl</a><br />
Impact: For 5 minutes, all web traffic routed to Codfw received error responses. This affected central USA and South America (local time after midnight). The cause was human error and lack of CLI parameter validation.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-09_exim-bdat-errors" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-09 exim-bdat-errors</a><br />
Impact: Over five days, about 14,000 incoming emails from Gmail users to wikimedia.org were rejected and returned to sender.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-21_varnish_cache_busting" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-21 varnish cache busting</a><br />
Impact: For 2 minutes, all wikis and services behind our CDN were unavailable to all users.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-24_Failed_Apache_restart" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-24 failed Apache restart</a><br />
Impact: For 35 minutes, numerous internal services that use Apache on the backend were down. This included Kibana (logstash) and Matomo (piwik). For 20 of those minutes, there was also reduced MediaWiki server capacity, but no measurable end-user impact for wiki traffic.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-25_de.wikipedia.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-25 de.wikipedia.org</a><br />
Impact: For 6 minutes, a portion of logged-in users and non-cached pages experienced a slower response or an error. This was due to increased load on one of the databases.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-26_Database_hardware_failure" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-26 m1 database hardware</a><br />
Impact: For 12 minutes, internal services hosted on the m1 database (e.g. Etherpad) were unavailable or at reduced capacity.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-05-31_Analytics_Data_Lake_-_Hadoop_Namenode_failure" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-05-31 Analytics Hadoop failure</a><br />
Impact: For 1 hour, all HDFS writes and reads were failing. After recovery, ingestion from Kafka resumed and caught up. No data loss or other lasting impact on the Data Lake.</p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Incident follow-up</h4>

<p>Recently completed incident follow-up:</p>

<p><a href="https://phabricator.wikimedia.org/T308100" class="remarkup-link" rel="noreferrer">Invalid confctl selector should either error out or select nothing</a><br />
Filed by Amir (<a href="https://phabricator.wikimedia.org/p/Ladsgroup/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_59"><span class="phui-tag-core phui-tag-color-person">@Ladsgroup</span></a>) after the confctl incident this past month. Giuseppe (<a href="https://phabricator.wikimedia.org/p/Joe/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_60"><span class="phui-tag-core phui-tag-color-person">@Joe</span></a>) implemented CLI parameter validation to prevent human error from causing a similar outage in the future.</p>

<p><a href="https://phabricator.wikimedia.org/T237224" class="remarkup-link" rel="noreferrer">Backup opensearch dashboards data</a><br />
Filed back in 2019 by Filippo (<a href="https://phabricator.wikimedia.org/p/fgiunchedi/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_61"><span class="phui-tag-core phui-tag-color-person">@fgiunchedi</span></a>). The OpenSearch homepage dashboard (at logstash.wikimedia.org) was accidentally deleted last month. Bryan (<a href="https://phabricator.wikimedia.org/p/bd808/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_62"><span class="phui-tag-core phui-tag-color-person">@bd808</span></a>) tracked down its content and re-created it. Cole (<a href="https://phabricator.wikimedia.org/p/colewhite/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_63"><span class="phui-tag-core phui-tag-color-person">@colewhite</span></a>) and Jaime (<a href="https://phabricator.wikimedia.org/p/jcrespo/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_64"><span class="phui-tag-core phui-tag-color-person">@jcrespo</span></a>) worked out a strategy and set up automated backups going forward.</p>

<p>Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up work</a> in Phabricator! These are preventive measures and tech debt mitigations written down after an incident is concluded. Read more about past incidents at Incident status on Wikitech.</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>Did you know?</strong></div>
<div class="remarkup-reply-body"><p>The form on the <strong><a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a></strong> page now includes a date, to more easily create backdated reports.</p></div>
</blockquote>



<hr class="remarkup-hr" />

<h4 class="remarkup-header">Trends</h4>

<p>In May we discovered <a href="https://phabricator.wikimedia.org/maniphest/query/z7vLwJdXtLu2/#R" class="remarkup-link" rel="noreferrer">28 new production errors</a>, of which 20 remain unresolved and have come with us to June.</p>

<p>Last month the workboard totalled 292 tasks still open from prior months. Since the last edition, we completed 11 tasks from previous months, gained 11 additional errors from May (some of May was counted in last month), and have 7 fresh errors in the current month of June. As of today, the workboard houses 299 open production error tasks (<a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet</a>, <a href="https://phabricator.wikimedia.org/project/reports/1055/" class="remarkup-link" rel="noreferrer">phab report</a>).</p>

<p>Take a look at the workboard and look for tasks that could use your help.<br />
<span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_1" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/iehaaq6poun2oomlcwbs/PHID-FILE-kuqfntrg64zo6amslmqr/proderr-unified_2022-05.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_58"><img src="https://phab.wmfusercontent.org/file/data/iehaaq6poun2oomlcwbs/PHID-FILE-kuqfntrg64zo6amslmqr/proderr-unified_2022-05.png" height="370" alt="proderr-unified 2022-05.png (1×1 px, 191 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Thanks!</h4>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p></div></content></entry><entry><title>Production Excellence #43: April 2022</title><link href="/phame/live/1/post/284/production_excellence_43_april_2022/" /><id>https://phabricator.wikimedia.org/phame/post/view/284/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2022-05-12T21:00:02+00:00</published><updated>2022-05-12T21:00:02+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our pursuit of operational excellence last month? Read on to find out!</p>

<h4 class="remarkup-header">Incidents</h4>

<p>Last month we experienced 2 (public) incidents. This is below the three-year median of 3 incidents a month (<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>).</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-04-06_esams_network" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-04-06 esams network</a><br />
Impact: For 30 minutes, wikis were slow or unreachable for a portion of clients routed to the Esams data center. Esams is one of two DCs primarily serving Europe, the Middle East, and Africa.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-04-26_cr2-eqord_down" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-04-26 cr2-eqord down</a><br />
Impact: No external impact. Internally, for 2 hours we were unable to access our Eqord routers by any means. This was due to a fiber cut on a redundant link to Eqiad, which then coincided with planned vendor maintenance on the links to Ulsfo and Eqiad. See also <a href="https://wikitech.wikimedia.org/wiki/Network_design" class="remarkup-link remarkup-link-ext" rel="noreferrer">Network design</a>.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/gx2ya7ay4oadwbvstys3/PHID-FILE-eepkypklh223tt6cnbss/proderr-incidents_2022-04.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_65"><img src="https://phab.wmfusercontent.org/file/data/gx2ya7ay4oadwbvstys3/PHID-FILE-eepkypklh223tt6cnbss/proderr-incidents_2022-04.png" height="300" alt="proderr-incidents 2022-04.png (800×1 px, 127 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Incident follow-up</h4>

<p>Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up work</a> in Phabricator! These are preventive measures and tech debt mitigations written down after an incident is concluded. Read more about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech.</p>

<p>Recently resolved incident follow-up:</p>

<p><a href="https://phabricator.wikimedia.org/T249683" class="remarkup-link" rel="noreferrer">Reduce mysql grants for wikiadmin scripts</a><br />
Filed in 2020 after the Wikidata drop-table incident (<a href="https://wikitech.wikimedia.org/wiki/Incidents/2020-04-07_Wikidata%27s_wb_items_per_site_table_dropped" class="remarkup-link remarkup-link-ext" rel="noreferrer">details</a>). Carried out over the last six months by Amir <a href="https://phabricator.wikimedia.org/p/Ladsgroup/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_67"><span class="phui-tag-core phui-tag-color-person">@Ladsgroup</span></a> (SRE Data Persistence).</p>

<p><a href="https://phabricator.wikimedia.org/T308204" class="remarkup-link" rel="noreferrer">Improve reliability of Toolforge k8s cron jobs</a> and <a href="https://phabricator.wikimedia.org/T308205" class="remarkup-link" rel="noreferrer">Re-enable CronJobControllerV2</a><br />
Filed earlier this week after a Toolforge incident and carried out by Taavi <span class="phabricator-remarkup-mention-unknown">@Majavah</span>.</p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Trends</h4>

<p>During the month of April we reported <a href="https://phabricator.wikimedia.org/maniphest/query/OZ99DkeJf85D/#R" class="remarkup-link" rel="noreferrer">27 new production errors</a>. Of these new errors, we resolved 14, and the remaining 13 are still open and have carried over to May.</p>

<p>Last month, the workboard totalled 298 unresolved error reports. Of these older reports that carried over from previous months, 16 were resolved. Most of these were reports from before 2019.</p>

<p>The new total, including some tasks for the current month of May, is 292. A slight decrease! (<a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet</a>).</p>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_2" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/p33bkc5shgi2xiiqy3ma/PHID-FILE-grqk7ehgojobicytegoi/proderr-unified_2022-04.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_66"><img src="https://phab.wmfusercontent.org/file/data/p33bkc5shgi2xiiqy3ma/PHID-FILE-grqk7ehgojobicytegoi/proderr-unified_2022-04.png" height="400" alt="proderr-unified 2022-04.png (1×1 px, 116 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Thanks!</h4>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><a href="https://en.wikiquote.org/wiki/Pirates_of_the_Caribbean:_Dead_Men_Tell_No_Tales" class="remarkup-link remarkup-link-ext" rel="noreferrer">No Tales</a>:</div>
<div class="remarkup-reply-body"><p>In a fair fight, I&#039;d kill you!<br />
— Well, that&#039;s not much incentive for me to fight fair then, is it?</p></div>
</blockquote></div></content></entry><entry><title>Production Excellence #42: March 2022</title><link href="/phame/live/1/post/283/production_excellence_42_march_2022/" /><id>https://phabricator.wikimedia.org/phame/post/view/283/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2022-04-21T21:29:56+00:00</published><updated>2022-04-21T21:29:56+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h4 class="remarkup-header">Incidents</h4>

<p>We&#039;ve had quite the month, with 8 documented incidents. That&#039;s more than double the two-year median of three a month (<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>).</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-03-01_ulsfo_network" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-03-01 ulsfo network</a><br />
Impact: For 20 minutes, clients normally routed to Ulsfo were unable to reach our projects. This includes New Zealand, parts of Canada, and the United States west coast.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-03-04_esams_availability_banner_sampling" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-03-04 esams availability banner sampling</a><br />
Impact: For 1.5 hours, all wikis were largely unreachable from Europe (via Esams), with more limited impact across the globe via other data centers as well.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-03-06_wdqs-categories" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-03-06 wdqs-categories</a><br />
Impact: For 1.5 hours, some requests to the public Wikidata Query Service API were sporadically blocked.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-03-10_MediaWiki_availability" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-03-10 site availability</a><br />
Impact: For 12 minutes, all wikis were unreachable to logged-in users, and to unregistered users trying to access uncached content.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-03-27_api" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-03-27 api</a><br />
Impact: For ~4 hours, in three segments of 1-2 hours each over two days, there were higher levels of failed or slow MediaWiki API requests.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-03-27_wdqs_outage" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-03-27 wdqs outage</a><br />
Impact: For 30 minutes, all WDQS queries failed due to an internal deadlock.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-03-29_network" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-03-29 network</a><br />
Impact: For approximately 5 minutes, Wikipedia and other Wikimedia sites were slow or inaccessible for many users, mostly in Europe/Africa/Asia. (Details not public at this time.)</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incidents/2022-03-31_api_errors" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-03-31 api errors</a><br />
Impact: For 22 minutes, API server and app server availability were slightly decreased (~0.1% errors, all for s7-hosted wikis such as Spanish Wikipedia), and the latency of API servers was elevated as well.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/xedrjee3sul2gydfwoai/PHID-FILE-h2ogt6dmse5kmxqmhqm2/proderr-incidents_2022-03.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_68"><img src="https://phab.wmfusercontent.org/file/data/xedrjee3sul2gydfwoai/PHID-FILE-h2ogt6dmse5kmxqmhqm2/proderr-incidents_2022-03.png" height="300" alt="proderr-incidents 2022-03.png (800×1 px, 107 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Incident follow-up</h4>

<p>Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up (Sustainability)</a> in Phabricator, which are preventive measures and tech debt mitigations written down after an incident is concluded. Read more about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech. Some recently completed sustainability work:</p>

<p><a href="https://phabricator.wikimedia.org/T248506" class="remarkup-link" rel="noreferrer">Add linecard diversity to router-to-router interconnect at Codfw</a><br />
Filed by Chris <a href="https://phabricator.wikimedia.org/p/CDanis/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_70"><span class="phui-tag-core phui-tag-color-person">@CDanis</span></a> (SRE Infra) in 2020 after an incident where all hosts in the Codfw data center lost connectivity at once. Completed by Arzhel <a href="https://phabricator.wikimedia.org/p/ayounsi/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_71"><span class="phui-tag-core phui-tag-color-person">@ayounsi</span></a> and Cathal cmooney (SRE Infra), and <a href="https://phabricator.wikimedia.org/p/Papaul/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_72"><span class="phui-tag-core phui-tag-color-person">@Papaul</span></a> (DC Ops); including in Esams where the same issue existed.</p>

<p><a href="https://phabricator.wikimedia.org/T295187" class="remarkup-link" rel="noreferrer">Expand parser tests to cover language conversion variants in table-of-contents output</a><br />
Suggested and carried out by <a href="https://phabricator.wikimedia.org/p/cscott/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_73"><span class="phui-tag-core phui-tag-color-person">@cscott</span></a> (Parsoid) after reviewing an incident in November. The TOC on wikis that rely on the LanguageConverter service (such as Chinese Wikipedia) was no longer localized.</p>

<p><a href="https://phabricator.wikimedia.org/T304323" class="remarkup-link" rel="noreferrer">Fix unquoted URL parameters in Icinga health checks</a><br />
Suggested by Riccardo <a href="https://phabricator.wikimedia.org/p/Volans/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_74"><span class="phui-tag-core phui-tag-color-person">@Volans</span></a> (SRE Infra) in response to an early warning signal for TLS certificate expiry. He realized that automated checks for a related cluster were still claiming to be in good health, when they in fact should have been firing a similar warning. Carried out by Filippo <a href="https://phabricator.wikimedia.org/p/fgiunchedi/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_75"><span class="phui-tag-core phui-tag-color-person">@fgiunchedi</span></a> and Daniel <a href="https://phabricator.wikimedia.org/p/Dzahn/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_76"><span class="phui-tag-core phui-tag-color-person">@Dzahn</span></a>.</p>

<p><a href="https://phabricator.wikimedia.org/T281249" class="remarkup-link" rel="noreferrer">Provide automation to quickly show replication status when primary is down</a><br />
Filed in April by Jaime (SRE Data Persistence), carried out by John <a href="https://phabricator.wikimedia.org/p/jbond/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_77"><span class="phui-tag-core phui-tag-color-person">@jbond</span></a> and Amir <a href="https://phabricator.wikimedia.org/p/Ladsgroup/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_78"><span class="phui-tag-core phui-tag-color-person">@Ladsgroup</span></a>.</p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Trends</h4>

<p>Since the last edition, we resolved 24 of the 301 unresolved errors that carried over from previous months.</p>

<p>In March, we reported <a href="https://phabricator.wikimedia.org/maniphest/query/ryOkF_JP6cV1/#R" class="remarkup-link" rel="noreferrer">54 new production errors</a>. That&#039;s quite high compared to the twenty-odd reports we find most months. Of these, 17 remain open today, a month later.</p>

<p>In the month of April, so far, we reported <a href="https://phabricator.wikimedia.org/maniphest/query/1LEA6jQzf7iU/#R" class="remarkup-link" rel="noreferrer">20 new errors</a>, of which 17 also remain open today.</p>

<p>The production error workboard once again adds up to exactly 298 open tasks (<a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet</a>).</p>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_3" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/ulgvetgwehwqbcggdhfa/PHID-FILE-iwrq5jalk5zuf7se5x34/proderr-unified_2022-03.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_69"><img src="https://phab.wmfusercontent.org/file/data/ulgvetgwehwqbcggdhfa/PHID-FILE-iwrq5jalk5zuf7se5x34/proderr-unified_2022-03.png" height="400" alt="proderr-unified 2022-03.png (1×1 px, 113 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Thanks!</h4>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><a href="https://en.wikiquote.org/wiki/What%27s_Eating_Gilbert_Grape%3F" class="remarkup-link remarkup-link-ext" rel="noreferrer">♥️ Grape</a>:</div>
<div class="remarkup-reply-body"><p>Becky: What do you want?<br />
Gilbert: I want Momma to take aerobics classes. I want Ellen to grow up. I want a new brain for Arnie.<br />
Becky: — What do you want for you? Just for you?<br />
Gilbert: I want to be a good person.</p></div>
</blockquote></div></content></entry><entry><title>What We Learned from Trainsperiment Week</title><link href="/phame/live/1/post/281/what_we_learned_from_trainsperiment_week/" /><id>https://phabricator.wikimedia.org/phame/post/view/281/</id><author><name>thcipriani (Tyler Cipriani)</name></author><published>2022-04-20T20:15:37+00:00</published><updated>2022-04-28T08:56:31+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Developers should own the process of putting their code into production. They should decide when to deploy, monitor their deployment, and make decisions about rollback.</p>

<p>But that’s not how we work at Wikimedia today, and we on Release Engineering aren’t sure how to get there, so we’ve decided to experiment.</p>

<p>Typically a deployment takes us a full week to complete—the week of March 21st, 2022, <strong>we deployed MediaWiki four times</strong>.</p>

<p>We called that week <strong><a href="https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Trainsperiment_week" class="remarkup-link remarkup-link-ext" rel="noreferrer">🚂🧪Trainsperiment Week</a></strong>.</p>

<h3 class="remarkup-header">📻 Deployment frequency</h3>

<p>MediaWiki&#039;s mainline branch is changing constantly, but we deploy MediaWiki weekly (<a href="https://phabricator.wikimedia.org/phame/post/view/253/how_we_deploy_code/" class="remarkup-link" rel="noreferrer">kind of</a>). We keep stats that measure how far our main branch is from production.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/mlbbo4puonjr2eycdxu2/PHID-FILE-enqifqdgzv7fqtht4uuj/deployment-fidelity.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_79"><img src="https://phab.wmfusercontent.org/file/data/mlbbo4puonjr2eycdxu2/PHID-FILE-enqifqdgzv7fqtht4uuj/deployment-fidelity.png" height="427" width="627" loading="lazy" alt="deployment-fidelity.png (427×627 px, 18 KB)" /></a></div></p>

<p>The trainsperiment changed our deployment frequency, which affected all the other metrics, too. <strong>Faster deployment means smaller batch size, and shorter change lead time.</strong></p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/vtiowlue5g6e7jw67giv/PHID-FILE-g2sy2nr5nxxwzuyjy4fg/deployment-fidelity-cd.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_80"><img src="https://phab.wmfusercontent.org/file/data/vtiowlue5g6e7jw67giv/PHID-FILE-g2sy2nr5nxxwzuyjy4fg/deployment-fidelity-cd.png" height="427" width="627" loading="lazy" alt="deployment-fidelity-cd.png (427×627 px, 13 KB)" /></a></div></p>

<h3 class="remarkup-header">📦 Change lead time</h3>

<p>The number that we knew would change during trainsperiment week was <strong>change lead time</strong>—the time from merge to deploy. If I merge a change, then a minute later I deploy it, that change’s lead time is one minute.</p>

<p>This chart shows the average lead time of all patches in a given train:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/rcpxzav4lzk4fdxmjuyv/PHID-FILE-hhrwtzmrpklcmfcj2igv/image2273.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_81"><img src="https://phab.wmfusercontent.org/file/data/rcpxzav4lzk4fdxmjuyv/PHID-FILE-hhrwtzmrpklcmfcj2igv/image2273.png" height="877" width="1079" loading="lazy" alt="image2273.png (877×1 px, 136 KB)" /></a></div></p>
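<p>As a rough sketch of how the lead time of a single change, and the average for a whole train, could be computed: the timestamps below are made-up illustration values, and the function names are hypothetical rather than part of any Release Engineering tooling.</p>

```python
from datetime import datetime, timedelta


def lead_time(merged_at, deployed_at):
    """Lead time of one change: the time from merge to deploy."""
    return deployed_at - merged_at


def average_lead_time(merge_times, deployed_at):
    """Mean lead time of all changes that shipped in one train."""
    total = sum((lead_time(m, deployed_at) for m in merge_times), timedelta())
    return total / len(merge_times)


# Hypothetical train: one deploy and three merged changes (not real data).
deploy = datetime(2022, 3, 22, 12, 0)
merges = [
    datetime(2022, 3, 22, 11, 59),  # merged one minute before the deploy
    datetime(2022, 3, 22, 9, 0),    # merged three hours before the deploy
    datetime(2022, 3, 21, 12, 0),   # merged a full day before the deploy
]

print(lead_time(merges[0], deploy))       # 0:01:00
print(average_lead_time(merges, deploy))  # 9:00:20
```

<p>In this sketch the one change that waited a full day dominates the average, which is one reason a per-change view can be more telling than the mean alone.</p>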

<p>The chart below compares a typical week (1.38.0-wmf.1) to trainsperiment week (1.38.0-wmf.2, wmf.3, and wmf.4). Each dot is a change in a particular version—fewer dots mean fewer changes.</p>

<p>During trainsperiment week, we deployed faster. Each deployment was smaller, and the lead time of each patch in a release was shorter.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/4ew45dgtuat5mqm6hrpw/PHID-FILE-qx3nsmus4g62453w6axc/Annotated_Trainsperiment_Week_%281%29.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_82"><img src="https://phab.wmfusercontent.org/file/data/4ew45dgtuat5mqm6hrpw/PHID-FILE-qx3nsmus4g62453w6axc/Annotated_Trainsperiment_Week_%281%29.png" height="448" width="925" loading="lazy" alt="Annotated Trainsperiment Week (1).png (448×925 px, 63 KB)" /></a></div></p>

<p>Here’s the same data on a logarithmic scale. During trainsperiment week there were only a few hours between trains, so the lead time could be measured in hours, not days!</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/dvpuhkpslgdyennggvxv/PHID-FILE-t57vwh62ghgpspvixcfq/leadtime-log.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_83"><img src="https://phab.wmfusercontent.org/file/data/dvpuhkpslgdyennggvxv/PHID-FILE-t57vwh62ghgpspvixcfq/leadtime-log.png" height="665" width="1353" loading="lazy" alt="leadtime-log.png (665×1 px, 50 KB)" /></a></div></p>

<h3 class="remarkup-header">📝 Survey Feedback</h3>

<p>At the end of the week, we asked for feedback via the Wikitech-l mailing list. We collected comments from the mediawiki.org talk page and the summaries of candid conversations.</p>

<h4 class="remarkup-header">👍 Satisfaction</h4>

<p>A small number of people took the time to respond to the survey—20 people answered our questions.</p>

<p>Almost everyone who took the survey seemed satisfied with communication. Most were satisfied with the experiment overall.</p>

<p>There were concerns on the <a href="https://www.mediawiki.org/wiki/Talk:Wikimedia_Release_Engineering_Team/Trainsperiment_week" class="remarkup-link remarkup-link-ext" rel="noreferrer">talk page</a> and in the survey responses about testing. <strong>Testing felt time-crunched</strong>, and everyone was worried about the time pressure on our Quality and Test Engineering Team (QTE).</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/jsmsjmwk3xnpmsr7eydg/PHID-FILE-jt4hvbsohrwqqwpbx3ta/%F0%9F%9A%82%F0%9F%A7%AATrainsperiment_Satsifaction_%282%29.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_84"><img src="https://phab.wmfusercontent.org/file/data/jsmsjmwk3xnpmsr7eydg/PHID-FILE-jt4hvbsohrwqqwpbx3ta/%F0%9F%9A%82%F0%9F%A7%AATrainsperiment_Satsifaction_%282%29.png" height="512" width="826" loading="lazy" alt="🚂🧪Trainsperiment Satsifaction (2).png (512×826 px, 21 KB)" /></a></div></p>

<h4 class="remarkup-header">🌚 Impact</h4>

<p>Fewer than half of our respondents felt that the Trainsperiment positively impacted their work, with one respondent strongly disagreeing that there was a positive impact.</p>

<p><strong>Most people were neutral about the impact of this experiment on their work.</strong></p>

<p>The person who felt that there was a negative impact was concerned about the lack of time allotted for testing—they urged us to rethink testing if we wanted to try this again.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/swfv4em56ai4bgpupj5c/PHID-FILE-5l4cntozhiizfteeyn4x/%F0%9F%9A%82%F0%9F%A7%AATrainsperiment_Agreement_%281%29.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_85"><img src="https://phab.wmfusercontent.org/file/data/swfv4em56ai4bgpupj5c/PHID-FILE-5l4cntozhiizfteeyn4x/%F0%9F%9A%82%F0%9F%A7%AATrainsperiment_Agreement_%281%29.png" height="514" width="839" loading="lazy" alt="🚂🧪Trainsperiment Agreement (1).png (514×839 px, 23 KB)" /></a></div></p>

<h4 class="remarkup-header">💌 Comments</h4>

<p>The survey contained free-form prompts for feedback. Below is a smattering of representative responses. Most of the comments below are amalgamations and simplifications, but the reactions in quotes are verbatim.</p>

<p><strong>What should RelEng have done differently?</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Automated alerts: emails whenever there’s a deploy or the train is blocked</li>
</ul>

<p><strong>What would you need to change if we did this every week?</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">No time to find and fix regressions means the QA process would need to change somehow</li>
<li class="remarkup-list-item">More transparency around when train rolls out and a clearer blocking process</li>
<li class="remarkup-list-item">Translations</li>
<li class="remarkup-list-item">“my mental model.”</li>
</ul>

<p><strong>Other Feedback</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">“With less time between groups, breakage will reach all wikis very quickly”</li>
<li class="remarkup-list-item">“Often Tuesdays are currently used to deploy bug fixes that are hard to test locally […] we would need to revisit many of our workflows”</li>
<li class="remarkup-list-item">“This, at least on paper, will help devs”</li>
<li class="remarkup-list-item">“This was a pure win, IMO.”</li>
</ul>

<h3 class="remarkup-header">🗣️ Conversations</h3>

<p>We talked individually to people who had concerns about the experiment on Slack and IRC, in meetings, in the survey feedback, and on the talk page.</p>

<p><strong>People were concerned about shortening the time for review.</strong> This is understandable given that we shortened a 168-hour process to a 12-hour process. </p>

<p>Our QA process takes time. Our overburdened principal engineers take time to review code going live on a weekly basis. Due to some esoteric details, even our CI system gives us more confidence given more time—it was possible that MediaWiki could have broken compatibility with an extension without alerting anyone.</p>

<p><strong>We have come to rely on the weekly cadence to make a careful release, and a faster process would mean rethinking our process pipeline to production.</strong></p>

<h3 class="remarkup-header">🎀 Release Engineering&#039;s Feedback</h3>

<p><strong>The weekly train hides a lot of technical debt</strong>—it’s a giant feature flag and the missing testing environment rolled into one. It goes out every week (mostly), and Release Engineering spends about 20% of its time monitoring the release.</p>

<p>During trainsperiment week, we spent 100% of our time deploying—that’s not sustainable for our team.</p>

<p>We surfaced process pain points with this experiment, which was a success. We added to the already overlarge burdens of our principal engineers and quality engineers, which was a failure.</p>

<p><strong>But this isn’t the end of the experiments</strong>. We endeavor to bring developers and production closer together—preferably with us standing back a healthy distance. If you’d like to help us get there—get in touch.</p>

<hr class="remarkup-hr" />

<p>Thanks to <a href="https://phabricator.wikimedia.org/p/kchapman/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_86"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@kchapman</span></a>, <a href="https://phabricator.wikimedia.org/p/brennen/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_87"><span class="phui-tag-core phui-tag-color-person">@brennen</span></a>, and <a href="https://phabricator.wikimedia.org/p/Krinkle/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_88"><span class="phui-tag-core phui-tag-color-person">@Krinkle</span></a> for reading earlier drafts of this post and offering their feedback.</p></div></content></entry><entry><title>A Trainsperiments Week Reflection</title><link href="/phame/live/1/post/278/a_trainsperiments_week_reflection/" /><id>https://phabricator.wikimedia.org/phame/post/view/278/</id><author><name>dduvall (Dan Duvall)</name></author><published>2022-04-01T02:29:46+00:00</published><updated>2022-04-08T13:59:27+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Over here in the <a href="/tag/release-engineering-team/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_90"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_89" aria-hidden="true"></span>Release-Engineering-Team</span></a>, <a href="https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys" class="remarkup-link remarkup-link-ext" rel="noreferrer">Train Deployment</a> is usually a rotating duty. We&#039;ve <a href="https://phabricator.wikimedia.org/J253" class="remarkup-link" rel="noreferrer">written about it before</a>, so I won&#039;t go into the exact process, but I want to tell you something new about it.</p>

<p>It&#039;s awful, incredibly stressful, and a bit lonely.</p>

<p>And last week we ran an experiment where we endeavored to perform the full train cycle four times in a single week... What is wrong with us? (Okay. I need to own this. It was technically my idea.) So what is wrong with me? Why did I wish this on my team? Why did everyone agree to it?</p>

<p>First I think it&#039;s important to portray (and perhaps with a little more color) how terrible running the train can be.</p>

<h4 class="remarkup-header">How it usually feels to run a Train Deployment and why</h4>

<blockquote><p>Here&#039;s a little chugga-choo with a captain and a crew. Would the llama like a ride? Llama Llama tries to hide.</p>

<p>―Llama Llama, Llama Llama Misses Mama</p></blockquote>

<p>At the outset of many a week I have wondered why, when the kids are safely in childcare and I&#039;m finally in a quiet house well fed and preparing a nice hot shower to not frantically use but actually enjoy, my shoulder is cramping and there&#039;s a strange buzzing ballooning in my abdomen.</p>

<p>Am I getting sick? Did I forget something? This should be nice. Why can&#039;t I have nice things? Why... Oh. Yes. Right. I&#039;m on train this week.</p>

<p>Train begins in the body before it terrorizes the mind, and I&#039;m not the only one who feels that way.</p>

<blockquote><p>A week of periodic drudgery which at any moment threatens to tip into the realm of waking nightmare.</p>

<p>―Stoic yet Hapless Conductor</p></blockquote>

<p>Aptly put. The nightmare is anything from a tiny visual regression to taking some of the largest sites on the Internet down completely.</p>

<blockquote><p>Giving a presentation but you have no idea what the slides are.</p>

<p>―Bravely Befuddled Conductor</p></blockquote>

<p>Yes. There&#039;s no visibility into what we are deploying. It&#039;s a week&#039;s worth of changes, other teams&#039; changes, changes from teams with different workflows and development cycles, all touching hundreds of different codebases. The changes have gone through review, they&#039;ve been hammered by automated tests, and yet we are still too far removed from them to understand what might happen when they&#039;re exposed to real world conditions.</p>

<blockquote><p>It&#039;s like throwing a penny into a well, a well of snakes, bureaucratic snakes that hate pennies, and they start shouting at you to fill out oddly specific sounding forms of which you have none.</p>

<p>―Lost Soul been &#039;round these parts</p></blockquote>

<p>Kafkaesque.</p>

<p>When under the stress and threat of the aforementioned nightmare, it&#039;s difficult to think straight. But we have to. We have to parse and investigate intricate stack traces, run <tt class="remarkup-monospaced">git blame</tt>s on the deployment server, navigate our bug reporting forms, and try to recall which teams are responsible for which parts of the aggregate MediaWiki codebase we&#039;ve put together. That codebase is itself highly specific to WMF&#039;s production installation, and really only becomes that long after changes merge to the main branches of the constituent codebases.</p>

<p>We have to exercise clear judgement and make decisive calls on whether to roll back partially (previous group) or completely (all groups to previous version). We may have to halt everything and start hollering in IRC, Slack channels, and mailing lists to get the signal to the right folks (wonderful and gracious folks) that no more code changes will be deployed until what we&#039;re seeing is dealt with. We have to play the bad guys and gals to get the train back on track.</p>

<h4 class="remarkup-header">Trainsperiments Week and what was different about it</h4>

<blockquote><p>Study after study shows that having a good support network constitutes the single most powerful protection against becoming traumatized. Safety and terror are incompatible. When we are terrified, nothing calms us down like a reassuring voice or the firm embrace of someone we trust.</p>

<p>―Bessel Van Der Kolk, M.D., The Body Keeps the Score</p></blockquote>

<p>Four trains in a single week and everyone in Release Engineering is onboard. What could possibly be better about that?</p>

<p>Well, there is safety in numbers, as they say, and not in some Darwinistic way where most of us will be picked off by the train demons and the others will somehow take solace in their incidental fitness, but in a way where we are mutually trusting, supportive, and feeling collectively resourced enough to do the needful with aplomb.</p>

<p>So we set up video meetings for all scheduled deployment windows, and had synchronous handoffs between our European colleagues and our North American ones. We welcomed folks from other teams into our deployments to show them the good, the bad, and the ugly of how their code gets its final send-off &#039;round the bend and into the setting hot fusion reaction that is production. We found and fixed <a href="https://phabricator.wikimedia.org/T223287" class="remarkup-link" rel="noreferrer">longstanding and mysterious bugs in our tooling</a>. <em>We deployed four full trains in a single week</em>.</p>

<p>And it felt markedly different.</p>

<blockquote><p>One of those barn raising projects you read about where everybody pushes the walls up en masse.</p>

<p>―Our Stoic Now Softened but Still Sardonic Conductor</p></blockquote>

<p>Yes! Lonely and unwitnessed work is de facto drudgery. Toiling safely together we have a greater chance at staving off the stress and really feeling the accomplishment.</p>

<blockquote><p>Giving a presentation with your friends and everyone contributes one slide.</p>

<p>―Our No Longer Befuddled but Simply Brave Conductor</p></blockquote>

<p>Many hands make light work!</p>

<blockquote><p>It was like throwing a handful of pennies into a well, a well of snakes, still bureaucratic and shouty, oh hey but my friends are here and they remind me these are just stack traces, words on a screen, and my friends happen to be great at filling out forms.</p>

<p>―Our Once Lost Now Found Conductor</p></blockquote>

<p>When no one person is overwhelmed or unsafe, we all think and act more clearly.</p>

<h4 class="remarkup-header">The hidden takeaways of Trainsperiment Week</h4>

<p>So how should what we&#039;ve learned during our Trainsperiment Week inform our future deployment strategies and process? How should train deployments change?</p>

<p>The known hypothesis we wanted to test by performing this experiment was in essence:</p>

<ol class="remarkup-list">
<li class="remarkup-list-item">More frequent deployments will result in fewer changes being deployed each time.</li>
<li class="remarkup-list-item">Fewer changes on average means the deployment is less likely to fail. The deployment is safer.</li>
<li class="remarkup-list-item">A safer deployment can be performed more frequently. (Positive feedback loop to #1.)</li>
<li class="remarkup-list-item">Overall we will: <strong>move faster; break less</strong>.</li>
</ol>

<p>I don&#039;t know if we&#039;ve proved that yet but we got an inkling that yes, the smaller subsequent deployments of the week did seem to go more smoothly. One week, however, even a week of four deployment cycles, is not a large enough sample to say definitively whether running the train more frequently will result in safer, more frequent deployments with fewer failures.</p>

<p>What was not apparent until we did our retrospective, however, is that it simply felt easier to do deployments together. It was still a kind of drudgery, but it was not abjectly <em>terrible</em>.</p>

<p>My personal takeaway is that <strong>a conductor who feels resourced and safe is the basis for all other improvements to the deployment process</strong>, and I want conductors to not only have tooling that works reliably with actionable logging at their disposal, but to feel a sense of community there with them when they&#039;re pushing the buttons. I want them to feel that the hard calls of whether or not to halt everything and rollback are not just their calls but shared in the moment among numerous people with intimate knowledge of the overall MediaWiki software ecosystem.</p>

<p>The need for better tooling, particularly around error reporting and escalation, is a barrier to entry for sure. Once we&#039;ve made sufficient improvements there we need to get that tooling into other people&#039;s hands and show them that this process does not have to be so terrifying. And I think we&#039;re on the right track here with increased frequency and smaller sets of changes, but <strong>we can&#039;t lose sight of the human/social element and foundational basis of safety</strong>.</p>

<p>More than anything else, I want wider participation in the train deployment process by engineers in the entire organization along with volunteers.</p>

<hr class="remarkup-hr" />

<p>Thanks to <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_91"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a> for reading my drafts and unblocking me from myself a number of times. Thanks to <a href="https://phabricator.wikimedia.org/p/jeena/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_92"><span class="phui-tag-core phui-tag-color-person">@jeena</span></a> and <a href="https://phabricator.wikimedia.org/p/brennen/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_93"><span class="phui-tag-core phui-tag-color-person">@brennen</span></a> for the inspirational analogies.</p></div></content></entry><entry><title>GitLab: Rethinking how we handle access control</title><link href="/phame/live/1/post/273/gitlab_rethinking_how_we_handle_access_control/" /><id>https://phabricator.wikimedia.org/phame/post/view/273/</id><author><name>brennen (Brennen Bearnes)</name></author><published>2022-03-04T22:44:08+00:00</published><updated>2022-03-10T00:42:03+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I&#039;ll start with a bit of general administrivia.  First, our migration of Wikimedia code review &amp; CI to GitLab continues, and we&#039;re mindful that people could use regular updates on progress.  Second, I need to think through some stuff about the project, and doing that in writing is helpful for all involved.  I&#039;m going to try writing occasional blog entries here for both purposes.</p>

<p>Now on to the main topic of this post: Access control for groups and projects on the Wikimedia GitLab instance.</p>

<p><strong>The tl;dr:</strong> We&#039;ve been modeling access to things on GitLab by using groups under <tt class="remarkup-monospaced">/people</tt> to contain individual users and then granting those groups access to things under <tt class="remarkup-monospaced">/repos</tt>.  This has been tricky to explain and doesn&#039;t work as well at a technical level as we&#039;d hoped, so we&#039;re mostly scrapping the distinction, and moving control of project access to individual memberships in groups under <tt class="remarkup-monospaced">/repos</tt>.  This should be easier to think about, simpler to manage, and seems like it will suit our needs better. Read on for the nitty-gritty detail.</p>

<p>(Thanks to <a href="https://phabricator.wikimedia.org/p/Dzahn/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_96"><span class="phui-tag-core phui-tag-color-person">@Dzahn</span></a>, <span class="phabricator-remarkup-mention-unknown">@Majavah</span>, <a href="https://phabricator.wikimedia.org/p/bd808/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_97"><span class="phui-tag-core phui-tag-color-person">@bd808</span></a>, <a href="https://phabricator.wikimedia.org/p/AntiCompositeNumber/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_98"><span class="phui-tag-core phui-tag-color-person">@AntiCompositeNumber</span></a>, and <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_99"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a> for helping me think through the issues underlying this post.)</p>

<h2 class="remarkup-header">Background</h2>

<p>During the GitLab consultation, when we were working on building up a model of how we&#039;d use GitLab for Wikimedia projects, we wrote up a draft policy for managing users and their access to projects.</p>

<p>GitLab supports <a href="https://docs.gitlab.com/ee/user/group/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Groups</a>.  GitLab groups are similar to GitHub&#039;s concept of organizations, although the specifics differ.  Groups can contain:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Other, nested groups</li>
<li class="remarkup-list-item">Individual projects (repositories &amp; metadata)</li>
<li class="remarkup-list-item">Users as members; members of other groups can be invited to a group<ul class="remarkup-list">
<li class="remarkup-list-item">A user who is a member of a top-level group is also a member of every group it contains</li>
</ul></li>
</ul>
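
<p>The inheritance rule in the last bullet can be sketched in a few lines of Python. This is only an illustrative model, not GitLab&#039;s implementation or API: the <tt class="remarkup-monospaced">Group</tt> class, its fields, and the user names <tt class="remarkup-monospaced">alice</tt> and <tt class="remarkup-monospaced">bob</tt> are all hypothetical, and the sketch covers direct user memberships only, not groups invited to other groups.</p>

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """Hypothetical model of a group: a name, an optional parent, direct members."""
    name: str
    parent: Optional["Group"] = None
    direct_members: set = field(default_factory=set)

    def path(self):
        # Full path such as "repos/cloud", built by walking up to the root.
        if self.parent is None:
            return self.name
        return self.parent.path() + "/" + self.name

    def effective_members(self):
        # Direct members plus everyone inherited from ancestor groups.
        members = set(self.direct_members)
        if self.parent is not None:
            members |= self.parent.effective_members()
        return members


# Hypothetical users: alice belongs to the top-level group, bob to a sub-group.
repos = Group("repos", direct_members={"alice"})
cloud = Group("cloud", parent=repos, direct_members={"bob"})
```

<p>In this model, <tt class="remarkup-monospaced">cloud.effective_members()</tt> contains both users, while <tt class="remarkup-monospaced">repos.effective_members()</tt> contains only the one added at the top level: membership flows down the tree, never up.</p>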

<p>We&#039;ve since changed the original draft policy in some small ways - in particular, we decided to move most projects into a top-level <tt class="remarkup-monospaced">/repos</tt> group in order to offer shared CI runners (see <a href="https://phabricator.wikimedia.org/T292094" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_94"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T292094</span></span></a>). You can read the policy we landed on at <a href="https://www.mediawiki.org/w/index.php?title=GitLab/Policy&amp;oldid=5067934" class="remarkup-link remarkup-link-ext" rel="noreferrer">the latest revision of GitLab/Policy on mediawiki.org</a>.</p>

<p>The basic idea was that we would separate groups out into:</p>

<ol class="remarkup-list">
<li class="remarkup-list-item">Sub-groups of <tt class="remarkup-monospaced">/repos</tt>: Namespaces for projects, split up by functional area of code</li>
<li class="remarkup-list-item">Sub-groups of <tt class="remarkup-monospaced">/people</tt>: Namespaces for individual users, split up by organizational units like:<ul class="remarkup-list">
<li class="remarkup-list-item">Volunteer group</li>
<li class="remarkup-list-item">Teams at organizations such as the WMF, WMDE, etc.</li>
</ul></li>
</ol>

<p>Groups in <tt class="remarkup-monospaced">/people</tt> could then be given access to projects under <tt class="remarkup-monospaced">/repos</tt>.</p>

<p>Our hope was that this would let us decouple the management of groups of humans from the individual projects they work on, and ease onboarding for new contributors.  A new member of the WMF Release Engineering team, for example, could be added to a single group and then have access to all the things they need to do their job.</p>

<p>We intended for most <tt class="remarkup-monospaced">/people</tt> groups to be owned by their members, who would in turn have ownership-level access to their projects under <tt class="remarkup-monospaced">/repos</tt>, allowing for contributors to a project to manage access and invite new contributors.</p>

<p>As a concrete example:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://gitlab.wikimedia.org/repos/cloud" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gitlab.wikimedia.org/repos/cloud</a> contains various cloud services projects</li>
<li class="remarkup-list-item">The Wikimedia Foundation Cloud Services team and volunteer cloud administrators are modeled by membership in:<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://gitlab.wikimedia.org/people/wmf-team-cloud-services" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gitlab.wikimedia.org/people/wmf-team-cloud-services</a></li>
<li class="remarkup-list-item"><a href="https://gitlab.wikimedia.org/people/volunteer-group-cloud-admin" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gitlab.wikimedia.org/people/volunteer-group-cloud-admin</a></li>
</ul></li>
</ul>

<h2 class="remarkup-header">Problems with this scheme</h2>

<p>I&#039;ve been proceeding under this plan as people request the creation of GitLab project groups, but there turn out to be some problems.</p>

<p><strong>First, it doesn&#039;t seem like permission inheritance for nested groups with other groups as members works the way you&#039;d expect &amp; hope</strong>:  See <a href="https://phabricator.wikimedia.org/T300939" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_95"><span class="phui-tag-core phui-tag-color-object">T300939</span></a> - &quot;GitLab group permissions are not inherited by sub-groups for groups of users invited to the parent repo&quot;.</p>

<p><strong>Second, users have concerns about equity of access and tight coupling of things like employment with a specific organization to project access.</strong> We didn&#039;t have any intention of modeling any group of users as second-class citizens within this scheme, but it seems to create the impression of one all the same.  It&#039;s also striking that the set of projects people work on just isn&#039;t that cleanly mapped to any particular organizational structure.  Once you&#039;ve been a technical contributor for a while, you&#039;ve almost certainly collected responsibilities that no org chart reflects accurately.</p>

<p><strong>Finally, and maybe most importantly, this is a complicated way to do things.</strong>  People have a hard time thinking about it, and it requires a lot of explanation.  That seems bad for an abstraction that we&#039;d like to be basically self-serve for most users.</p>

<h2 class="remarkup-header">Proposed solution</h2>

<p>Mostly, my plan is to use groups closer to how they seem to be designed:</p>

<ol class="remarkup-list">
<li class="remarkup-list-item">Sub-groups of <tt class="remarkup-monospaced">/repos</tt> will contain both individual contributor memberships and projects</li>
<li class="remarkup-list-item">Except in occasional one-off cases, access should be granted at the level of a containing group rather than at the level of individual projects, so as to avoid micromanaging access to many projects.</li>
<li class="remarkup-list-item">We&#039;ll keep <tt class="remarkup-monospaced">/people</tt> in mind as a potential solution for some problems (for example, it might be a good tool for synchronizing groups of users from LDAP and granting access to certain projects on that basis), but not rely on it for anything at the moment.</li>
</ol>

<p>There are some unanswered questions here, but I plan to redraft the policy doc, move existing project layouts to this scheme, and start creating new project groups on this basis in the coming week or so.</p>

<p>My main philosophical takeaway here is that I work with a bunch of anarchists, and it&#039;s always best to plan accordingly.</p>

<p>Originally, one of our goals for this migration was avoiding a repeat of the weird, nested morass that is our current set of Gerrit permissions. While it would be a good idea to keep the structure of things on GitLab flatter and easier to think about, I&#039;m no longer <em>that</em> worried about it. Some of the complexity is inherent to any large set of projects and contributors; some of it just reflects a long-lived technical culture that&#039;s emergent and largely self-governing, tendencies that nearly always resist well-intentioned efforts to rationalize and map structure to things like official organizational layout.</p></div></content></entry><entry><title>Diving Into Our Deployment Data</title><link href="/phame/live/1/post/272/diving_into_our_deployment_data/" /><id>https://phabricator.wikimedia.org/phame/post/view/272/</id><author><name>thcipriani (Tyler Cipriani)</name></author><published>2022-02-15T21:17:21+00:00</published><updated>2022-02-25T22:11:38+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>If you’ve ever experienced the pride of seeing your name on <a href="https://en.wikipedia.org/wiki/Special:Version/Credits" class="remarkup-link remarkup-link-ext" rel="noreferrer">MediaWiki&#039;s contributor list</a>, you&#039;ve been involved in our deployment process (whether you knew it or not).</p>

<p>The Wikimedia deployment process, 🚂🌈 The Train, <strong>pushed over 13,000 developer changes to production in 2021</strong>. That&#039;s more than a change per hour for every single hour of the year, 24 hours per day, seven days per week!</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/pveqatsonicr76wilj4b/PHID-FILE-g5lnykyelfntxdhzl52s/Trainbows_Not_Painbows1.svg.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_100"><img src="https://phab.wmfusercontent.org/file/data/pveqatsonicr76wilj4b/PHID-FILE-g5lnykyelfntxdhzl52s/Trainbows_Not_Painbows1.svg.png" height="351" width="640" loading="lazy" alt="Trainbows_Not_Painbows1.svg.png (351×640 px, 33 KB)" /></a></div></p>

<p>As you deploy more software to production, you may begin to wonder: is anything I&#039;ve been working on going to be deployed this week? What&#039;s the status of production? Where can I find data about any of this?</p>

<h3 class="remarkup-header">🤔 Current train info</h3>

<p>Bryan Davis (<a href="https://phabricator.wikimedia.org/p/bd808/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_109"><span class="phui-tag-core phui-tag-color-person">@bd808</span></a>) created the <a href="https://versions.toolforge.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">versions</a> Toolforge tool in 2017. <strong>The versions tool is a dashboard showing the current status of Wikimedia&#039;s more than 900 wikis.</strong></p>

<p>Other places to find info about the current deployment:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Deployments" class="remarkup-link remarkup-link-ext" rel="noreferrer">Deployment calendar</a> – shows upcoming deployments happening this week and next</li>
<li class="remarkup-list-item"><a href="https://train-blockers.toolforge.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Current deployment task</a> – an indispensable tool created by <span class="phabricator-remarkup-mention-unknown">@Majavah</span> that always points to this week&#039;s train</li>
</ul>

<h3 class="remarkup-header">📈 Past train data</h3>

<p>There&#039;s an aphorism in management: you can&#039;t manage what you can&#039;t measure. For years the train chugged along steadily, but it&#039;s only recently that we&#039;ve begun to collect data on its chuggings.</p>

<p>The <a href="https://gitlab.wikimedia.org/thcipriani/train-stats#train-stats" class="remarkup-link remarkup-link-ext" rel="noreferrer">train stats</a> project started in early 2021 and contains train data going back to March 2016.</p>

<p><strong>Now we&#039;re able to talk about our deployments informed by the data</strong>. <a href="/tag/release-engineering-team/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_111"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_110" aria-hidden="true"></span>Release-Engineering-Team</span></a> partnered with <a href="/tag/research/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_113"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_112" aria-hidden="true"></span>Research</span></a> late last year to explore the data we have.</p>

<p>We&#039;re able to see metrics like <strong>Lead time</strong> and <strong>Cycle time</strong>.</p>

<blockquote><p>We measured product delivery lead time as the time it takes to go from code committed to code running in production.</p>

<p>– Accelerate (pg. 14, 15)</p></blockquote>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/6cygqwwjjmmadnd6xoif/PHID-FILE-oucgwtngqjwr7eguejmv/leadtime.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_101"><img src="https://phab.wmfusercontent.org/file/data/6cygqwwjjmmadnd6xoif/PHID-FILE-oucgwtngqjwr7eguejmv/leadtime.png" height="637" width="1315" loading="lazy" alt="leadtime.png (637×1 px, 162 KB)" /></a></div></p>

<p>Our lead time — the time to go from commit in mainline to production — is always less than a week. In the scatter plots above, we can see some evidence of work-life balance: not many patches land two days before deployment — that&#039;s the weekend!</p>

<blockquote><p>For the software delivery process, the most important global metric is cycle time. This is the time between deciding that a feature needs to be implemented and having that feature released to users.</p>

<p>– Continuous Delivery (pg 138)</p></blockquote>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/vulwpiopim4etq7flysa/PHID-FILE-jrag7zgur7jwcvfcr5gj/cycle-time.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_102"><img src="https://phab.wmfusercontent.org/file/data/vulwpiopim4etq7flysa/PHID-FILE-jrag7zgur7jwcvfcr5gj/cycle-time.png" height="637" width="1341" loading="lazy" alt="cycle-time.png (637×1 px, 48 KB)" /></a></div></p>

<p>Our cycle time — the time between a patch requesting code review and its deployment — varies. Some trains have massive outliers. In the chart above, for example, you can see one train that had a patch that was five years old!</p>
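
<p>As a rough sketch of the two definitions used in this post, both metrics are differences between timestamps. The <tt class="remarkup-monospaced">Patch</tt> class, its field names, and the example dates below are assumptions made up for illustration; they are not the schema or contents of the train-stats data.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Patch:
    # All three field names are hypothetical, not train-stats columns.
    review_requested: datetime  # patch first uploaded for code review
    merged: datetime            # patch committed to mainline
    deployed: datetime          # train carrying the patch reached production


def lead_time(patch):
    # Lead time: commit in mainline to production.
    return patch.deployed - patch.merged


def cycle_time(patch):
    # Cycle time: request for code review to production.
    return patch.deployed - patch.review_requested


# Invented dates, for illustration only.
example = Patch(
    review_requested=datetime(2022, 1, 10, 9, 0),
    merged=datetime(2022, 1, 14, 15, 0),
    deployed=datetime(2022, 1, 20, 20, 0),
)
```

<p>Because a patch has to be uploaded for review before it can merge, its cycle time is never shorter than its lead time; the gap between the two is the time spent in code review.</p>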

<p>It is now possible to see what we on Release Engineering had long suspected: the number of patches for each train has slowly been ticking up over time:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/abhlxdnmb75mcv2qlzwm/PHID-FILE-z7ujxfmuy7ucqgd2hay5/patches-per-train.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_103"><img src="https://phab.wmfusercontent.org/file/data/abhlxdnmb75mcv2qlzwm/PHID-FILE-z7ujxfmuy7ucqgd2hay5/patches-per-train.png" height="489" width="1003" loading="lazy" alt="patches-per-train.png (489×1 px, 78 KB)" /></a></div></p>

<p>Also shown above: as the number of patches continues to rise, the number of comments per patch — that is, code-review comments per patch — has dropped.</p>

<p>The data also show that the average number of lines of code per patch is slightly going up:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/ahu6c43snei26bt7gpgi/PHID-FILE-wdgot6lh6bhk3gz24qrd/loc-per-patch.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_104"><img src="https://phab.wmfusercontent.org/file/data/ahu6c43snei26bt7gpgi/PHID-FILE-wdgot6lh6bhk3gz24qrd/loc-per-patch.png" height="485" width="1017" loading="lazy" alt="loc-per-patch.png (485×1 px, 55 KB)" /></a></div></p>

<h3 class="remarkup-header">🔥 Train derailment</h3>

<p>The <tt class="remarkup-monospaced">train-stats</tt> repo has data on blockers and delays. Most trains have a small number of blockers and deploy without fanfare. Other trains are plagued by problems that explode into an endless number of blockers — cascading into a series of psychological torments, haunting deployers like the train-equivalent of ringwraiths. Trainwraiths, let’s say.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/k5hd3ygwfdcfucxji7zx/PHID-FILE-bzy3z5vu5sbcegb5cpa6/trainwraith.jpg" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_105"><img src="https://phab.wmfusercontent.org/file/data/k5hd3ygwfdcfucxji7zx/PHID-FILE-bzy3z5vu5sbcegb5cpa6/trainwraith.jpg" height="332" width="600" loading="lazy" alt="trainwraith.jpg (332×600 px, 91 KB)" /></a></div></p>

<p>The shape of the histogram of this data shows that <strong>blockers per train follows a power law</strong> — most trains have a few blockers:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/x7n2cwmptvwq25ch7k5b/PHID-FILE-3etll6axwvnyl2ya26rv/blockers-per-train.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_106"><img src="https://phab.wmfusercontent.org/file/data/x7n2cwmptvwq25ch7k5b/PHID-FILE-3etll6axwvnyl2ya26rv/blockers-per-train.png" height="681" width="787" loading="lazy" alt="blockers-per-train.png (681×787 px, 15 KB)" /></a></div></p>

<p>Surprisingly, most of our blockers happen before we even start a train. These are bugs from the previous week that we couldn&#039;t justify halting everything to fix, but that need to be fixed before we lay down more code on top.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/aleg3y2opppdzronxckg/PHID-FILE-hmkxr7ksabiebby35laq/blockers-by-group.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_107"><img src="https://phab.wmfusercontent.org/file/data/aleg3y2opppdzronxckg/PHID-FILE-hmkxr7ksabiebby35laq/blockers-by-group.png" height="426" width="1016" loading="lazy" alt="blockers-by-group.png (426×1 px, 17 KB)" /></a></div></p>

<p>The data also let us correlate train characteristics with failure signals. Here we see that the number of patches (“patches”) per train (trending ↑) positively correlates with blockers, and lines of code review (“loc_per_train_bug”) per patch (trending ↓) negatively correlates with blockers — <strong>more patches and less code review are both correlated with more blockers</strong>:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/5siy4wzv5jbrwucr2eze/PHID-FILE-woumahylmb6asdxx3ur5/correlation.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_108"><img src="https://phab.wmfusercontent.org/file/data/5siy4wzv5jbrwucr2eze/PHID-FILE-woumahylmb6asdxx3ur5/correlation.png" height="716" width="713" loading="lazy" alt="correlation.png (716×713 px, 82 KB)" /></a></div></p>
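
<p>For readers who want to try this kind of comparison themselves, here is a minimal sketch of a Pearson correlation in plain Python. It is not the method used in the notebooks behind the chart above, and the three lists are invented numbers chosen only to show a positive and a negative correlation; they do not come from the train data.</p>

```python
from math import sqrt


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented numbers, for illustration only; not taken from train-stats.
patches = [300, 350, 420, 500, 610]
blockers = [1, 2, 2, 4, 6]
comments_per_patch = [5.0, 4.5, 4.1, 3.2, 2.8]

patches_vs_blockers = pearson(patches, blockers)
comments_vs_blockers = pearson(comments_per_patch, blockers)
```

<p>With these made-up inputs the first coefficient is positive and the second is negative, which is the shape of relationship described above. A coefficient only describes how two series move together; it says nothing about which one causes the other.</p>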

<p>Contrast this with Facebook&#039;s view of train risk. In a 2016 paper entitled &quot;<a href="https://research.facebook.com/publications/development-and-deployment-at-facebook/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Development and Deployment at Facebook</a>,&quot; Facebook&#039;s researchers documented how their Release Engineering team quantified deployment risk:</p>

<blockquote><p>Inputs affecting the amount of oversight exercised over new code are <strong>the size of the change and the amount of discussion about it during code reviews; higher levels for either of these indicate higher risk</strong>.<br />
– Development and Deployment at Facebook (emphasis added)</p></blockquote>

<p>In other words, to Facebook, more code, and more discussion about code, means riskier code. Our preliminary data seem to only partially support this: <strong>more code is riskier, but more discussion seems to lessen our risk</strong>.</p>

<h3 class="remarkup-header">🧭 Explore on your own</h3>

<p>This train data is open for anyone to explore. You can download the SQLite database that contains all train data from <a href="https://gitlab.wikimedia.org/thcipriani/train-stats/-/blob/main/data/train.db" class="remarkup-link remarkup-link-ext" rel="noreferrer">our GitLab repo</a>, or play with it live on our <a href="https://data.releng.team/train" class="remarkup-link remarkup-link-ext" rel="noreferrer">datasette install</a>.</p>
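
<p>If you download the database, a first step might look like the sketch below, which uses only Python&#039;s standard <tt class="remarkup-monospaced">sqlite3</tt> module. It makes no assumption about the table names inside the file and simply asks SQLite what is there. The helper names <tt class="remarkup-monospaced">list_tables</tt>, <tt class="remarkup-monospaced">row_counts</tt>, and <tt class="remarkup-monospaced">summarize</tt> are made up for this example.</p>

```python
import sqlite3


def list_tables(conn):
    # Names of all tables in the database, read from SQLite's own catalog.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def row_counts(conn):
    # Map each table name to its row count.
    counts = {}
    for table in list_tables(conn):
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        counts[table] = conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]
    return counts


def summarize(path):
    # Open a SQLite file (for example a downloaded train.db) and summarize it.
    conn = sqlite3.connect(path)
    try:
        return row_counts(conn)
    finally:
        conn.close()
```

<p>Assuming the file was saved locally as <tt class="remarkup-monospaced">train.db</tt>, calling <tt class="remarkup-monospaced">summarize(&quot;train.db&quot;)</tt> returns a dictionary of table names and row counts, which is a reasonable starting point before writing real queries. Note that <tt class="remarkup-monospaced">sqlite3.connect</tt> creates an empty database if the path does not exist, so an empty result usually means the path is wrong.</p>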

<p>There are a few Jupyter notebooks that explore the data:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://github.com/mirrys/release-engineering-data/blob/main/metrics_reng_data.ipynb" class="remarkup-link remarkup-link-ext" rel="noreferrer">Miriam Redi&#039;s excellent exploration</a> (where most of the charts in this post came from), and a <a href="https://docs.google.com/presentation/d/12K7k0LkgxK8ovPM0zLaK7Q9az7jZca0tAV91uwSIib8/edit" class="remarkup-link remarkup-link-ext" rel="noreferrer">presentation of her analysis</a>.</li>
<li class="remarkup-list-item"><a href="https://gitlab.wikimedia.org/thcipriani/train-stats/-/blob/main/README.ipynb" class="remarkup-link remarkup-link-ext" rel="noreferrer">My ham-fisted attempts at data science</a></li>
</ul>

<p>An audacious dream for the future of this data is to build a model to quantify exactly how risky a patchset is. We keep data on everything from bugs to rollbacks. Perhaps in future a model will help us roll out code faster and more safely.</p>

<hr class="remarkup-hr" />

<p>Thanks to <a href="https://phabricator.wikimedia.org/p/Miriam/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_114"><span class="phui-tag-core phui-tag-color-person">@Miriam</span></a>, <a href="https://phabricator.wikimedia.org/p/bd808/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_115"><span class="phui-tag-core phui-tag-color-person">@bd808</span></a>, and <a href="https://phabricator.wikimedia.org/p/brennen/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_116"><span class="phui-tag-core phui-tag-color-person">@brennen</span></a> for reading early drafts of this post: it&#039;d be wronger without their input 💖.</p></div></content></entry><entry><title>Production Excellence #41: February 2022</title><link href="/phame/live/1/post/267/production_excellence_41_february_2022/" /><id>https://phabricator.wikimedia.org/phame/post/view/267/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2022-03-15T00:59:04+00:00</published><updated>2022-03-20T12:49:25+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h4 class="remarkup-header">Incidents</h4>

<p>3 documented incidents last month.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-02-01_ulsfo_network" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-02-01 ulsfo network</a><br />
Impact: For 3 minutes, clients served by the ulsfo POP were not able to contribute or display un-cached pages.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-02-22_wdqs_updater_codfw" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-02-22 wdqs updater codfw</a><br />
Impact: For 2 hours, WDQS updates failed to be processed. Most bots and tools were unable to edit Wikidata during this time.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-02-22_vrts" class="remarkup-link remarkup-link-ext" rel="noreferrer">2022-02-22 vrts</a><br />
Impact: For 12 hours, incoming emails to a specific recently created VRTS queue were not processed, and senders received a bounce with an SMTP 550 error.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/qtjizm7iwxlgl2itls6l/PHID-FILE-4a5sgpvplyxkxryrx26h/proderr-incidents_2022-02.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_117"><img src="https://phab.wmfusercontent.org/file/data/qtjizm7iwxlgl2itls6l/PHID-FILE-4a5sgpvplyxkxryrx26h/proderr-incidents_2022-02.png" height="300" alt="proderr-incidents 2022-02.png (800×1 px, 122 KB)" /></a></div></p>

<p>Figure from <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>.</p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Incident follow-up</h4>

<p>Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up work</a> in Phabricator. These tasks are preventive measures and tech debt mitigations written down after an incident is concluded. Read about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech.</p>

<p>Recently conducted incident follow-up:</p>

<p><a href="https://phabricator.wikimedia.org/T222102" class="remarkup-link" rel="noreferrer">Create a dashboard for Prometheus metrics about health of Prometheus itself.</a><br />
Pitched by CDanis after an April 2019 incident, carried out by Filippo (<a href="https://phabricator.wikimedia.org/p/fgiunchedi/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_119"><span class="phui-tag-core phui-tag-color-person">@fgiunchedi</span></a>).</p>

<p><a href="https://phabricator.wikimedia.org/T200036" class="remarkup-link" rel="noreferrer">Improve wording around AbuseFilter messages about throttling functionality.</a><br />
Originally filed in 2018. This came up last month during an incident where the wording may&#039;ve led to a misunderstanding. Now resolved by <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_120"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a>.</p>

<p><a href="https://phabricator.wikimedia.org/T290902" class="remarkup-link" rel="noreferrer">Exclude restart procedure from automated Elasticsearch provisioning.</a><br />
There can be too much automation! Filed after an incident last September. Fixed by <a href="https://phabricator.wikimedia.org/p/RKemper/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_121"><span class="phui-tag-core phui-tag-color-person">@RKemper</span></a>.</p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Outstanding errors</h4>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_4" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p>I skip breakdowns most months as each breakdown has its flaws. However, I hear people find them useful, so I&#039;ll try to do them from time to time with my noted caveats. The last breakdown was in <a href="https://phabricator.wikimedia.org/phame/post/view/265/production_excellence_39_december_2021/" class="remarkup-link" rel="noreferrer">the December edition</a>, which focussed on throughput during a typical month. It is important to recognise that neither high nor low throughput is per se good or bad. It&#039;s good when issues are detected, reported, and triaged correctly. It&#039;s also good if a team&#039;s components are stable and don&#039;t produce any errors. A report may be found to be invalid or a duplicate, which is sometimes only determined a few weeks later.</p>

<p>The below &quot;after six months&quot; breakdown takes more of that into consideration by looking at what&#039;s on the table after six months (tasks up to Sept 2021). This may be considered &quot;fairer&quot; in some sense, although it has the drawback of suffering from hindsight bias, and possibly not highlighting current or most urgent areas.</p>

<p>WMF Product:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Anti Harassment Tools (3): 1 MW Blocks, 2 SecurePoll.</li>
<li class="remarkup-list-item">Community Tech (0).</li>
<li class="remarkup-list-item">Design Systems (1): 1 WVUI.</li>
<li class="remarkup-list-item">Editing Team (15): 14 VisualEditor, 1 OOUI.</li>
<li class="remarkup-list-item">Growth Team (13): 11 Flow, 1 GrowthExperiments, 1 MW Recent changes.</li>
<li class="remarkup-list-item">Language Team (6): 4 ContentTranslation, 1 CX-server, 1 Translate extension.</li>
<li class="remarkup-list-item">Parsoid Team (9): 8 Parsoid, 1 ParserFunctions extension .</li>
<li class="remarkup-list-item">Product Infrastructure: 2 JsonConfig, 1 Kartographer, 1 WikimediaEvents.</li>
<li class="remarkup-list-item">Reading Web (0).</li>
<li class="remarkup-list-item">Structured Data (4): 2 MW Uploading, 1 WikibaseMediaInfo, 1 3D extension.</li>
</ul>

<p>WMF Tech:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Data Engineering: 1 EventLogging.</li>
<li class="remarkup-list-item">Fundraising Tech: 1 CentralNotice.</li>
<li class="remarkup-list-item">Performance: 1 Rdbms.</li>
<li class="remarkup-list-item">Platform MediaWiki Team (19): 4 MW-Page-data, 1 MW-REST-API, 1 MW-Action-API, 1 MW-Snapshots, 1 MW-ContentHandler, 1 MW-JobQueue, 1 MW-libs-RequestTimeout, 9 Other.</li>
<li class="remarkup-list-item">Search Platform: 1 MW-Seach.</li>
<li class="remarkup-list-item">SRE Service Operations: 1 Other.</li>
</ul>

<p>WMDE:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">WMDE-Wikidata (7): 5 Wikibase, 2 Lexeme.</li>
<li class="remarkup-list-item">WMDE-TechWish: 1 FileImporter.</li>
</ul>

<p>Other:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Missing steward (7): 2 Graph, 2 LiquidThreads, 2 TimedMediaHandler, 1 MW Special-Contributions-page.</li>
<li class="remarkup-list-item">Individually maintained (2): 1 WikimediaIncubator, 1 Score extension.</li>
</ul>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Trends</h4>

<p>In February, we reported <a href="https://phabricator.wikimedia.org/maniphest/query/1B79KZ8KkRj6/#R" class="remarkup-link" rel="noreferrer">25 new production errors</a>. Of those, 13 have since been resolved, and 12 remain open as of today (two weeks into the following month). We also resolved 22 errors that remained open from previous months. The overall workboard has grown slightly to a total of 301 outstanding error reports.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/jeeafywlhglcltdw5kar/PHID-FILE-p2zzjt55ae7fzqrlyvrg/proderr-unified_2022-02.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_118"><img src="https://phab.wmfusercontent.org/file/data/jeeafywlhglcltdw5kar/PHID-FILE-p2zzjt55ae7fzqrlyvrg/proderr-unified_2022-02.png" height="380" alt="proderr-unified 2022-02.png (1×1 px, 105 KB)" /></a></div></p>

<p>For the month-over-month numbers, refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Thanks!</h4>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p></div></content></entry><entry><title>Production Excellence #40: January 2022</title><link href="/phame/live/1/post/266/production_excellence_40_january_2022/" /><id>https://phabricator.wikimedia.org/phame/post/view/266/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2022-02-04T04:32:13+00:00</published><updated>2022-02-04T16:21:02+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our pursuit of operational excellence last month? Read on to find out!</p>

<h4 class="remarkup-header">Incidents</h4>

<p>There were no incidents this January. Phew! Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up work</a> in Phabricator. These are preventive measures and tech debt mitigations written down after an incident is concluded. Read about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/onxjy5lsnoeqxrmc4alq/PHID-FILE-np2uhhmfqjrqcqnini2f/proderr-incidents_2022-01.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_122"><img src="https://phab.wmfusercontent.org/file/data/onxjy5lsnoeqxrmc4alq/PHID-FILE-np2uhhmfqjrqcqnini2f/proderr-incidents_2022-01.png" height="300" alt="proderr-incidents 2022-01.png (800×1 px, 166 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Trends <a name="trends"></a></h4>

<p>During 2021, I compared us to the median of 4 incidents per month, as measured over the two years prior (2019-2020).</p>

<p>I&#039;m glad to announce our median has lowered to 3 per month over the past two years (2020-2021). For more plots and numbers about our incident documentation, refer to <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident stats</a>.</p>

<p>Since the previous edition, we resolved 17 tasks from previous months. In January, there were <a href="https://phabricator.wikimedia.org/maniphest/query/f24Xwi0bGGZU/#R" class="remarkup-link" rel="noreferrer">45 new error reports</a>, of which 28 were resolved within the same month; the remaining 17 have carried over to February.</p>

<p>With precisely 17 tasks both closed and added, the workboard remains at the exact total of 298 open tasks, for the third month in a row. That&#039;s quite the coincidence.</p>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_5" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/nhkndseqrhipaq4x3hch/PHID-FILE-r3qtyo4cueh6n5luayxw/proderr-unified_2022-01.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_123"><img src="https://phab.wmfusercontent.org/file/data/nhkndseqrhipaq4x3hch/PHID-FILE-r3qtyo4cueh6n5luayxw/proderr-unified_2022-01.png" height="300" alt="Figure 1: Unresolved error reports by month." /></a></div></p>

<p>For the month-over-month numbers, refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h4 class="remarkup-header">Thanks!</h4>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><strong><a href="https://en.wikiquote.org/wiki/Back_to_the_Future_Part_II" class="remarkup-link remarkup-link-ext" rel="noreferrer">🎸 Doc Brown</a></strong>:</div>
<div class="remarkup-reply-body"><p>It could mean that that point in time contains some cosmic significance.., as if it were the temporal junction point of the entire space-time continuum… Or it could just be an amazing coincidence.</p></div>
</blockquote>

</div></content></entry><entry><title>Production Excellence #39: December 2021</title><link href="/phame/live/1/post/265/production_excellence_39_december_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/265/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2022-01-17T22:16:19+00:00</published><updated>2022-01-19T04:59:52+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our pursuit of operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>One documented incident last month:</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-12-03_mx" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-12-03 mx</a><br />
Impact: A portion of outgoing email from wikimedia.org was delivered with a delay of up to 24 hours. This affected staff Gmail, and Znuny/Phabricator notifications. No mail was lost; it was eventually delivered.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/riuomdseyt5qnoly3u2m/PHID-FILE-f7s2rgi35uuqrl2ewszk/proderr-incidents_2021-12.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_124"><img src="https://phab.wmfusercontent.org/file/data/riuomdseyt5qnoly3u2m/PHID-FILE-f7s2rgi35uuqrl2ewszk/proderr-incidents_2021-12.png" height="300" alt="proderr-incidents 2021-12.png (840×2 px, 154 KB)" /></a></div></p>

<p>Image from <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Incident follow-up</h5>

<p>Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up work</a> in Phabricator. These are preventive measures and tech debt mitigations written down after an incident. Read about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech.</p>

<p>Recently resolved incident follow-up:</p>

<p><a href="https://phabricator.wikimedia.org/T297144" class="remarkup-link" rel="noreferrer">Create paging alert for high MX queues</a>.<br />
Filed in December after the mail delivery incident, resolved later that month by Keith (Herron).</p>

<p><a href="https://phabricator.wikimedia.org/T297708" class="remarkup-link" rel="noreferrer">Limit db execution time of expensive MW special pages</a>.<br />
Filed in December after various incidents due to high DB/appserver load, carried out by Amir (Ladsgroup).</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<p>In December we reported <a href="https://phabricator.wikimedia.org/maniphest/query/DhZaBJ5PI1NA/#R" class="remarkup-link" rel="noreferrer">22 new errors</a>, of which 5 have since been resolved, and 17 remain open and have carried over to January. From the 298 issues previously carried over, we also resolved 17, thus the workboard still adds up to 298 in total.</p>

<p>In previous editions, we sometimes looked at the breakdown of tasks that remained unresolved. This time, I&#039;d like to draw attention to the throughput and distribution of tasks that did get resolved.</p>

<p>Production errors resolved in the month of December, by team and component (<a href="https://phabricator.wikimedia.org/maniphest/query/vIEXYsei8lwE/#R" class="remarkup-link" rel="noreferrer">query</a>):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Community-Tech (2): GlobalPreferences (1), CodeMirror (1).</li>
<li class="remarkup-list-item">DBA: DjVuHandler (1).</li>
<li class="remarkup-list-item">Editing-team: DiscussionTools (1).</li>
<li class="remarkup-list-item">Fundraising Tech: CentralNotice (1).</li>
<li class="remarkup-list-item">Growth-Team (8): GrowthExperiments (6), Image-Suggestions (1), StructuredDiscussions (1).</li>
<li class="remarkup-list-item">Language-Team: UniversalLanguageSelector (1).</li>
<li class="remarkup-list-item">Parsoid (1).</li>
<li class="remarkup-list-item">Product-Infrastructure: TemplateStyles (1).</li>
<li class="remarkup-list-item">Readers-Web (2).</li>
<li class="remarkup-list-item">Structured-Data (2).</li>
<li class="remarkup-list-item">Wikidata team: Wikidata-Page-Banner (1).</li>
<li class="remarkup-list-item">Missing steward (1): MediaWiki-Logevents (<a href="https://phabricator.wikimedia.org/T289806" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_126"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T289806</span></span></a>: Thanks <a href="https://phabricator.wikimedia.org/p/Umherirrender/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_129"><span class="phui-tag-core phui-tag-color-person">@Umherirrender</span></a>!).</li>
</ul>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/zqtivfrjnvpohsd3knbe/PHID-FILE-liv2dyiwebt2fsnmieon/proderr-unified_2021-12.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_125"><img src="https://phab.wmfusercontent.org/file/data/zqtivfrjnvpohsd3knbe/PHID-FILE-liv2dyiwebt2fsnmieon/proderr-unified_2021-12.png" height="300" alt="Figure 1: Unresolved error reports by month." /></a></div></p>

<p>For the month-over-month numbers, refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Outstanding errors</h5>

<p>Take a look at the workboard and look for tasks that could use your help.<br />
<span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_6" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p>Oldest unresolved errors:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">(June 2020) WikibaseClient: RuntimeException in wblistentityusage API. <a href="https://phabricator.wikimedia.org/T254334" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_127"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T254334</span></span></a></li>
<li class="remarkup-list-item">(June 2020) WikibaseClient: Deadlock in EntityUsageTable::addUsages method. <a href="https://phabricator.wikimedia.org/T255706" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_128"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T255706</span></span></a></li>
</ul>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 Did you know:</div>
<div class="remarkup-reply-body"><p>To find your team&#039;s error reports, use the appropriate <strong>&quot;Filter&quot; link in the sidebar</strong> of <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">the workboard</a>.</p></div>
</blockquote></div></content></entry><entry><title>Production Excellence #38: November 2021</title><link href="/phame/live/1/post/261/production_excellence_38_november_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/261/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-12-12T01:34:21+00:00</published><updated>2021-12-13T21:48:59+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our pursuit of operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>6 documented incidents last month. That&#039;s above the two-year and five-year median of 4 per month (per <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>).</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-04_large_file_upload_timeouts" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-11-04 large file upload timeouts</a><br />
Impact: For 9 months, editors were unable to upload large files (e.g. to Commons). Editors would receive generic error messages, typically after a timeout. In retrospect, a dozen distinct production errors had been reported and regularly observed that were related and provided different clues. However, most of these remained untriaged and uninvestigated for months. This may be related to the affected components having no active code steward.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-05_TOC_language_converter" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-11-05 TOC language converter</a><br />
Impact: For 6 hours, wikis experienced a blank or missing table of contents on many pages. For up to 3 days prior, wikis that have multiple language variants (such as Chinese Wikipedia) displayed the table of contents in an incorrect or inconsistent language variant (which is not understandable to some readers).</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-10_cirrussearch_commonsfile_outage" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-11-10 cirrussearch commonsfile outage</a><br />
Impact: For ~2.5 hours, the Search results page was unavailable on many wikis (except English Wikipedia). On Wikimedia Commons the search suggestions feature was unresponsive as well.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-18_codfw_ipv6_network" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-11-18 codfw ipv6 network</a><br />
Impact: For 8 minutes, the Codfw cluster experienced partial loss of IPv6 connectivity for upload.wikimedia.org. This did not affect availability of the service because the &quot;<a href="https://en.wikipedia.org/wiki/Happy_Eyeballs" class="remarkup-link remarkup-link-ext" rel="noreferrer">Happy Eyeballs</a>&quot; algorithm ensures browsers (and other clients) automatically fall back to IPv4. The Codfw cluster generally serves Mexico and parts of the US and Canada. The upload.wikimedia.org service serves photos and other media/document files, such as those displayed in Wikipedia articles.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-23_Core_Network_Routing" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-11-23 core network routing</a><br />
Impact: For about 12 minutes, Eqiad was unable to reach hosts in other data centers via public IP addresses. This was due to a BGP routing error. There was no impact on end-user traffic, and impact on internal traffic was limited (only Icinga alerts themselves) because internal traffic generally uses local IP subnets which we currently route with OSPF instead of BGP.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-25_eventgate-main_outage" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-11-25 eventgate-main outage</a><br />
Impact: For about 3 minutes, eventgate-main was down. This resulted in 25,000 MediaWiki backend errors due to inability to queue new jobs. About 1000 user-facing web requests failed (HTTP 500 Error). Event production briefly dropped from ~3000 per second to 0 per second.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/izk6vaswxmnkyplhgds3/PHID-FILE-jtefqtmdrr7klnbsnsot/proderr-incidents_2021-11.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_130"><img src="https://phab.wmfusercontent.org/file/data/izk6vaswxmnkyplhgds3/PHID-FILE-jtefqtmdrr7klnbsnsot/proderr-incidents_2021-11.png" height="300" alt="proderr-incidents 2021-11.png (800×2 px, 172 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Incident follow-up</h5>

<p>Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up work</a> in Phabricator. These are preventive measures and tech debt mitigations written down after an incident is concluded. Read more about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech.</p>

<p>Recently resolved incident follow-up:</p>

<p><a href="https://phabricator.wikimedia.org/T287916" class="remarkup-link" rel="noreferrer">Disable DPL on wikis that aren&#039;t using it</a>.<br />
Filed after a July 2021 incident, done by Amir (Ladsgroup) and Kunal (Legoktm).</p>

<p><a href="https://phabricator.wikimedia.org/T291352" class="remarkup-link" rel="noreferrer">Create easy access to MySQL ports for faster incident response and maintenance.</a><br />
Filed in Sep 2021, and carried out by Stevie (Kormat).</p>

<p><a href="https://phabricator.wikimedia.org/T233684" class="remarkup-link" rel="noreferrer">Create paging alert for primary DB hosts.</a><br />
Filed after a Sept 2019 incident, done by Stevie (Kormat).</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<p>November saw 27 new production error reports of which 14 were resolved, and 13 remain open and carry over to the next month.</p>

<p>Of the 301 errors still open from previous months, 16 were resolved. Together with the 13 carried over from November that brings the workboard to 298 unresolved tasks.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/s5ofg4xu5e7upncd3atp/PHID-FILE-hwzijny57227y2cqqlhb/proderr-unified_2021-11.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_131"><img src="https://phab.wmfusercontent.org/file/data/s5ofg4xu5e7upncd3atp/PHID-FILE-hwzijny57227y2cqqlhb/proderr-unified_2021-11.png" height="389" alt="Unresolved error reports by month." /></a></div></p>

<p>For the month-over-month numbers, refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Outstanding errors</h5>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 Did you know:</div>
<div class="remarkup-reply-body"><p>To find your team&#039;s error reports, use the appropriate <strong>&quot;Filter&quot; link in the sidebar</strong> of the <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">workboard</a>.</p></div>
</blockquote>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_7" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p>Issues carried over from recent months:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Apr 2021</td><td>9 of 42 issues left.</td></tr>
<tr><td>May 2021</td><td>16 of 54 issues left.</td></tr>
<tr><td>Jun 2021</td><td>9 of 26 issues left.</td></tr>
<tr><td>Jul 2021</td><td>11 of 31 issues left.</td></tr>
<tr><td>Aug 2021</td><td>10 of 46 issues left.</td></tr>
<tr><td>Sep 2021</td><td>10 of 24 issues left.</td></tr>
<tr><td>Oct 2021</td><td>20 of 49 issues left.</td></tr>
<tr><td><strong>Nov 2021</strong></td><td>13 of <a href="https://phabricator.wikimedia.org/maniphest/query/0W0Nuk9umBDc/#R" class="remarkup-link" rel="noreferrer">27 new issues</a> are carried forward.</td></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p></div></content></entry><entry><title>Production Excellence #37: October 2021</title><link href="/phame/live/1/post/260/production_excellence_37_october_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/260/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-11-05T02:05:31+00:00</published><updated>2021-11-05T02:05:31+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our pursuit of operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>There were 4 documented incidents last month. This is about average compared to the past five years (per <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>).</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-10-08_network_provider" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-10-08 network provider</a><br />
Impact: For up to an hour, some regions experienced a partial connectivity outage. This primarily affected the US East Coast for ~13 minutes, and Russia for 1 hour. It was caused by a routing problem with one of several redundant network providers.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-10-22_eqiad_return_path_timeouts" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-10-22 eqiad networking</a><br />
Impact: For ~40 minutes, clients that are normally geographically routed to Eqiad experienced connection or timeout errors. We lost about 7K req/s during this time. After initial recovery, Eqiad was ready and repooled in ~10 minutes.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-10-25_s3_db_recentchanges_replica" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-10-25 s3 db replica</a><br />
Impact: For ~30 minutes, MediaWiki backends were slower than usual. For ~12 hours, many wiki replicas were stale for Wikimedia Cloud Services such as Toolforge.</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-10-29_graphite" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-10-29 graphite</a><br />
Impact: During a server upgrade, historical data was lost for a subset of Graphite metrics. Some were recovered via the redundant server, but others were lost as the redundant server had also been upgraded since then and lost some data in a similar fashion.</p>

<p>Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up work</a> in Phabricator. These are preventive measures and tech debt mitigations written down after an incident is concluded. Read about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/uyp6oakdvmkg6667nd4r/PHID-FILE-ctay4wcioihgoll6gatq/proderr-incidents_2021-10.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_132"><img src="https://phab.wmfusercontent.org/file/data/uyp6oakdvmkg6667nd4r/PHID-FILE-ctay4wcioihgoll6gatq/proderr-incidents_2021-10.png" height="300" alt="proderr-incidents 2021-10.png (840×2 px, 182 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><strong>Norwegian blue</strong> 🐦</div>
<div class="remarkup-reply-body"><p>298 bugs were up on the board.<br />
We solved 20 of those over the past thirty days.</p>

<p>How many might now be left unexplored?<br />
We also added new bugs to our database.</p>

<p>Half those bugs are pining for their fjord.<br />
The other 23 carry on, with their dossiers.</p>

<p>All in all, 301 bugs up on the board.</p></div>
</blockquote>

<p>In October, <a href="https://phabricator.wikimedia.org/maniphest/query/3A8rqYpefUFF/#R" class="remarkup-link" rel="noreferrer">49 new tasks</a> were reported as production errors. Of these, we resolved 26, and 23 remain unresolved and carry forward to the next month.</p>

<p>Previously, the production error workboard held an accumulated total of 298 still-open error reports. We resolved 20 of those. Together with the 23 new errors carried over from October, this brings us to 301 unresolved errors on the board.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/55a76uy6qw65p6z6ma22/PHID-FILE-k75ud2zbhlvdytb2ye6e/proderr-unified_2021-10.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_133"><img src="https://phab.wmfusercontent.org/file/data/55a76uy6qw65p6z6ma22/PHID-FILE-k75ud2zbhlvdytb2ye6e/proderr-unified_2021-10.png" height="400" alt="Figure 1: Unresolved error reports by month." /></a></div></p>

<p>For the month-over-month numbers, refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Outstanding errors</h5>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_8" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p>Issues carried over from recent months:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Apr 2021</td><td>9 of 42 issues left.</td></tr>
<tr><td>May 2021</td><td>16 of 54 issues left.</td></tr>
<tr><td>Jun 2021</td><td>9 of 26 issues left.</td></tr>
<tr><td>Jul 2021</td><td>12 of 31 issues left.</td></tr>
<tr><td>Aug 2021</td><td>12 of 46 issues left.</td></tr>
<tr><td>Sep 2021</td><td>11 of 24 issues left.</td></tr>
<tr><td>Oct 2021</td><td>23 of <a href="https://phabricator.wikimedia.org/maniphest/query/3A8rqYpefUFF/#R" class="remarkup-link" rel="noreferrer">49 new issues</a> are carried forward.</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p></div></content></entry><entry><title>Production Excellence #36: September 2021</title><link href="/phame/live/1/post/259/production_excellence_36_september_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/259/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-10-21T23:31:26+00:00</published><updated>2024-09-17T02:48:39+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>We&#039;ve had quite an eventful month, with 8 documented incidents in September. That&#039;s the highest since last year (Feb 2020) and one of the three worst months of the last five years.</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-01_partial_parsoid_outage" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-09-01 partial Parsoid outage</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For 9 hours, 10% of Parsoid requests to parse/save pages were failing on all wikis. Little to no end-user impact apart from delays during RESTBase retries.</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-04_appserver_latency" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-09-04 appserver latency</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For 37 minutes, MW backends were slow with 2% of requests receiving errors. This affected all wikis through logged-in users, bots/API queries, and some page views from unregistered users (e.g. pages that were recently edited or expired from CDN cache).</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-06_Wikifeeds" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-09-06 Wikifeeds</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For 3 days, the Wikifeeds API failed ~1% of requests (e.g. 5 of 500 req/s).</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-12-Esams-upload" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-09-12 Esams upload</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For 20 minutes, images were unavailable for people in Europe, affecting all wikis.</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-13_cirrussearch_restart" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-09-13 CirrusSearch restart</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For ~2 hours, search was unavailable on Wikipedia from all regions. Search suggestions were missing or slow, and the search results page errored with &quot;Try again later&quot;.</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-18_appserver_latency" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-09-18 appserver latency</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For ~10 minutes, MW backends were slow or unavailable for all wikis.</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-26_appserver_latency" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-09-26 appserver latency</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For ~15 minutes, MW backends were slow or unavailable for all wikis.</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-29_eqiad-kubernetes" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-09-29 eqiad kubernetes</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For 2 minutes, MW backends were affected by a Kubernetes issue (via Kask sessionstore). 1500 edit attempts failed (8% of POSTs), and logged-in pageviews were slowed down, often taking several seconds.</li>
</ul></li>
</ul>

<p>Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up work</a> in Phabricator. These tasks are the preventive measures and tech debt mitigations written down after an incident has concluded.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/ggbmdna3pnlmuamch7kp/PHID-FILE-rekdrsz3zeh4p6k5aaqy/proderr-incidents_2021-09.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_134"><img src="https://phab.wmfusercontent.org/file/data/ggbmdna3pnlmuamch7kp/PHID-FILE-rekdrsz3zeh4p6k5aaqy/proderr-incidents_2021-09.png" height="300" alt="proderr-incidents 2021-09.png (830×1 px, 170 KB)" /></a></div></p>

<p>Image from <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<p>The month of September saw 24 new production error reports of which 11 have since been resolved, and today, three to six weeks later, 13 remain open and have thus carried over to the next month. This is about average, although it makes it no less sad that we continue to introduce (and carry over) more errors than we rectify in the same time frame.</p>

<p>On the other hand, last month we did have a healthy focus on some of the older reports. The workboard stood at 301 unresolved errors last month. Of those, 16 were resolved. With the 13 new errors from September, this reduces the total slightly, to 298 open tasks.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/4b455senzns4dbjjbo6q/PHID-FILE-iemr5htdi34vpze4ingd/proderr-unified_2021-09.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_135"><img src="https://phab.wmfusercontent.org/file/data/4b455senzns4dbjjbo6q/PHID-FILE-iemr5htdi34vpze4ingd/proderr-unified_2021-09.png" height="400" alt="Unresolved error reports, stacked by month." /></a></div></p>

<p>For the month-over-month numbers, refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Did you know</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">💡 The default <strong>&quot;system error&quot; page now includes a request ID</strong>. <a href="https://phabricator.wikimedia.org/T291192" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_136"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T291192</span></span></a></li>
</ul>

<ul class="remarkup-list">
<li class="remarkup-list-item">💡 To zoom in and find your team&#039;s error reports, <strong>use the appropriate &quot;Filter&quot; link in the sidebar</strong> of the <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">workboard</a>.</li>
</ul>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Outstanding errors</h5>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_0" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p>Summary over recent months:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Jan 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/gn7TOpf2LdVE/#R" class="remarkup-link" rel="noreferrer">50 issues</a>)</td><td>3 left. <em>Unchanged.</em></td></tr>
<tr><td>Feb 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/xQxnXZys4q97/#R" class="remarkup-link" rel="noreferrer">20 issues</a>)</td><td>5 &gt; 4 left.</td></tr>
<tr><td>Mar 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/To0edISjsA9s/#R" class="remarkup-link" rel="noreferrer">48 issues</a>)</td><td>10 &gt; 9 left.</td></tr>
<tr><td>Apr 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/ORxSVxnJBlLc/#R" class="remarkup-link" rel="noreferrer">42 issues</a>)</td><td>17 &gt; 10 left.</td></tr>
<tr><td>May 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/9y.PWGoGgWbK/#R" class="remarkup-link" rel="noreferrer">54 issues</a>)</td><td>20 &gt; 17 left.</td></tr>
<tr><td>Jun 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/DlpqBkLj0aP4/#R" class="remarkup-link" rel="noreferrer">26 issues</a>)</td><td>10 &gt; 9 left.</td></tr>
<tr><td>Jul 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/qQAV178rYaJ_/#R" class="remarkup-link" rel="noreferrer">31 issues</a>)</td><td>12 left. <em>Unchanged.</em></td></tr>
<tr><td>Aug 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/i1wawBd5GKVY/#R" class="remarkup-link" rel="noreferrer">46 issues</a>)</td><td>17 &gt; 12 left.</td></tr>
<tr><td>Sep 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/BA8dqsGwaE_a/#R" class="remarkup-link" rel="noreferrer">24 issues</a>)</td><td>13 unresolved issues remaining.</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Tally</th><th></th></tr>
<tr><td>301</td><td>issues open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/248/production_excellence_35_august_2021/" class="remarkup-link" rel="noreferrer">Excellence #35 (August 2021)</a></td></tr>
<tr><td>-16</td><td>issues closed, of the previous 301 open issues.</td></tr>
<tr><td>+13</td><td>new issues that survived September 2021.</td></tr>
<tr><td>298</td><td>issues open, as of today (19 Oct 2021).</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p></div></content></entry><entry><title>Benchmarking MediaWiki with PHPBench</title><link href="/phame/live/1/post/257/benchmarking_mediawiki_with_phpbench/" /><id>https://phabricator.wikimedia.org/phame/post/view/257/</id><author><name>kostajh (Kosta Harlan)</name></author><published>2021-10-28T13:54:06+00:00</published><updated>2022-03-30T12:14:03+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This post gives a quick introduction to a benchmarking tool, <a href="https://github.com/phpbench/phpbench" class="remarkup-link remarkup-link-ext" rel="noreferrer">phpbench</a>, ready for you to experiment with in core and skins/extensions.[1]</p>

<h3 class="remarkup-header">What is phpbench?</h3>

<p>From their <a href="https://phpbench.readthedocs.io/en/latest/" class="remarkup-link remarkup-link-ext" rel="noreferrer">documentation</a>:</p>

<blockquote><p>PHPBench is a benchmark runner for PHP analogous to PHPUnit but for performance rather than correctness.</p></blockquote>

<p>In other words, while a PHPUnit test will tell you if your code behaves a certain way given a certain set of inputs, a PHPBench benchmark only cares how long that same piece of code takes to execute.</p>

<p>The tooling and boilerplate will be familiar to you if you&#039;ve used PHPUnit. There&#039;s a command-line runner at <tt class="remarkup-monospaced">vendor/bin/phpbench</tt>, benchmarks are discoverable by default in <tt class="remarkup-monospaced">tests/Benchmark</tt>, a configuration file (<tt class="remarkup-monospaced">phpbench.json</tt>) allows for setting defaults across all benchmarks, and the benchmark classes and methods look pretty similar to PHPUnit test classes and tests.</p>

<p>Here&#039;s an example test for the <tt class="remarkup-monospaced">Html::openElement()</tt> function:</p>

<div class="remarkup-code-block" data-code-lang="php" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="k">namespace</span> <span class="no">MediaWiki\Tests\Benchmark</span><span class="o">;</span>

<span class="k">class</span> <span class="no">HtmlBench</span> <span class="o">{</span>

	<span class="sd">/**</span>
<span class="sd">	 * @Assert(&quot;mode(variant.time.avg) &lt; 85 microseconds +/- 10%&quot;)</span>
<span class="sd">	 */</span>
	<span class="k">public</span> <span class="k">function</span> <span class="no">benchHtmlOpenElement</span><span class="o">()</span> <span class="o">{</span>
		<span class="nc" data-symbol-name="\Html">\Html</span><span class="o">::</span><span class="nf" data-symbol-context="\Html" data-symbol-name="openElement">openElement</span><span class="o">(</span> <span class="s1">&#039;a&#039;</span><span class="o">,</span> <span class="o">[</span> <span class="s1">&#039;class&#039;</span> <span class="o">=&gt;</span> <span class="s1">&#039;foo&#039;</span> <span class="o">]</span> <span class="o">);</span>
	<span class="o">}</span>
<span class="o">}</span></pre></div>

<p>So, taking it line by line:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><tt class="remarkup-monospaced">class HtmlBench</tt> (placed in <tt class="remarkup-monospaced">tests/Benchmark/includes/HtmlBench.php</tt>) – the class where you can define the benchmarks for methods in a class. It would make sense to create a single benchmark class for a single class under test, just like with PHPUnit.</li>
<li class="remarkup-list-item"><tt class="remarkup-monospaced">public function benchHtmlOpenElement() {}</tt> – method names that begin with <tt class="remarkup-monospaced">bench</tt> will be executed by <tt class="remarkup-monospaced">phpbench</tt>; other methods can be used for set-up / teardown work. The contents of the method are benchmarked, so any set-up / teardown work should be done elsewhere.</li>
<li class="remarkup-list-item"><tt class="remarkup-monospaced">@Assert(&quot;mode(variant.time.avg) &lt; 85 microseconds +/- 10%&quot;)</tt> – we define a <a href="https://phpbench.readthedocs.io/en/latest/guides/assertions.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">phpbench assertion</a> that the mode of the average execution times will be less than 85 microseconds, with a tolerance of +/- 10%.</li>
</ul>
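
<p>How many times a benchmark method runs can be set with annotations as well. The following is a minimal sketch, not code from MediaWiki core: the <tt class="remarkup-monospaced">StringEscapeBench</tt> class is hypothetical, and <tt class="remarkup-monospaced">htmlspecialchars()</tt> merely stands in for the code under test. The values of 50 revs and 5 iterations mirror the <tt class="remarkup-monospaced">revs</tt> and <tt class="remarkup-monospaced">its</tt> columns in the report output in footnote [2]; check the PHPBench documentation for the annotation syntax supported by your version.</p>

<div class="remarkup-code-block" data-code-lang="php" data-sigil="remarkup-code-block"><pre class="remarkup-code">&lt;?php
namespace MediaWiki\Tests\Benchmark;

// Hypothetical benchmark class for illustration; not part of MediaWiki core.
class StringEscapeBench {

	/**
	 * Each iteration calls the method 50 times (revs); 5 iterations are measured.
	 *
	 * @Revs(50)
	 * @Iterations(5)
	 * @Assert(&quot;mode(variant.time.avg) &lt; 85 microseconds +/- 10%&quot;)
	 */
	public function benchEscape() {
		// htmlspecialchars() stands in for the code under test.
		htmlspecialchars( &#039;&lt;a class=&quot;foo&quot;&gt;&#039; );
	}
}</pre></div>

<p>Since the annotations live in docblocks, the class is plain PHP and can be loaded without PHPBench installed; only the runner gives the annotations meaning.</p>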

<p>If we run the test with <tt class="remarkup-monospaced">composer phpbench</tt>, we will see that the test passes. One thing to be careful with, though, is adding assertions that are too strict – you would not want a patch to fail CI because the assertion for execution was not flexible enough (more on this later on).</p>

<h4 class="remarkup-header">Measuring performance while developing</h4>

<p>One neat feature in PHPBench is the ability to tag current results and compare with another run. Looking at the <tt class="remarkup-monospaced">HtmlBench</tt> benchmark test from above, for example, we can compare the work done in <a href="/rMW5deb6a2a4546318d1fa94ad8c3fa54e9eb8fc67c" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_139"><span class="phui-tag-core phui-tag-color-object">rMW5deb6a2a4546: Html::openElement() micro-optimisations</span></a> to get before and after comparisons of the performance changes.</p>

<p>Here&#039;s a benchmark of <a href="https://phabricator.wikimedia.org/rMWe82c5e52d50a9afd67045f984dc3fb84e2daef44" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_141"><span class="phui-tag-core phui-tag-color-object">e82c5e52d50a9afd67045f984dc3fb84e2daef44</span></a>, the commit before the performance improvements were added to <tt class="remarkup-monospaced">Html::openElement()</tt> in <a href="/rMW5deb6a2a4546318d1fa94ad8c3fa54e9eb8fc67c" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_140"><span class="phui-tag-core phui-tag-color-object">rMW5deb6a2a4546: Html::openElement() micro-optimisations</span></a>:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">❯ git checkout -b html-before-optimizations e82c5e52d50a9afd67045f984dc3fb84e2daef44 # get the old Html::openElement code before optimizations
❯ git review -x 727429 # get the core patch which introduces phpbench support
❯ composer phpbench -- tests/Benchmark/includes/HtmlBench.php --tag=original</pre></div>

<p>And the output [2]:<br />
<div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/ylqovhvmxtmx3l7c74tw/PHID-FILE-5auaryyctiuxhzdayvwz/image.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_137"><img src="https://phab.wmfusercontent.org/file/data/ylqovhvmxtmx3l7c74tw/PHID-FILE-5auaryyctiuxhzdayvwz/image.png" height="842" width="2502" loading="lazy" alt="image.png (842×2 px, 176 KB)" /></a></div></p>

<p>Note that we&#039;ve used <tt class="remarkup-monospaced">--tag=original</tt> to store the results. Now we can check out the newer code, and use <tt class="remarkup-monospaced">--ref=original</tt> to compare with the baseline:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">❯ git checkout -b html-after-optimizations 5deb6a2a4546318d1fa94ad8c3fa54e9eb8fc67c # get the new Html::openElement code with optimizations
❯ git review -x 727429 # get the core patch which introduces phpbench support
❯ composer phpbench -- tests/Benchmark/includes/HtmlBench.php --ref=original --report=aggregate</pre></div>

<p>And the output [3]:<br />
<div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/wsbrunpgkdxn3j2wkg5a/PHID-FILE-l4fgxstqfmcwmgtwx3ks/image.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_138"><img src="https://phab.wmfusercontent.org/file/data/wsbrunpgkdxn3j2wkg5a/PHID-FILE-l4fgxstqfmcwmgtwx3ks/image.png" height="798" width="2502" loading="lazy" alt="image.png (798×2 px, 177 KB)" /></a></div></p>

<p>We can see that the execution time dropped by more than half (about 56%), from roughly 18.5 microseconds to 8.2 microseconds. (For understanding the other columns in the report, it&#039;s best to read through the <a href="https://phpbench.readthedocs.io/en/latest/quick-start.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">Quick Start guide for phpbench</a>.) PHPBench can also provide an error exit code if the performance decreased. One way that PHPBench might fit into our testing stack would be to have a job similar to <a href="https://wikitech.wikimedia.org/wiki/Performance/Fresnel" class="remarkup-link remarkup-link-ext" rel="noreferrer">Fresnel</a>, where a non-voting comment on a patch tells developers whether performance decreased in the patch.</p>
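
<p>The percentage shown in the comparison report is the relative change between the two mode times. As a rough sketch of that arithmetic (the <tt class="remarkup-monospaced">relativeChange()</tt> helper is hypothetical and not part of PHPBench):</p>

<div class="remarkup-code-block" data-code-lang="php" data-sigil="remarkup-code-block"><pre class="remarkup-code">&lt;?php
// Relative change between a baseline and a new measurement, in percent.
// Hypothetical helper for illustration; not part of PHPBench.
function relativeChange( float $baseline, float $actual ): float {
	return round( ( $actual - $baseline ) / $baseline * 100, 2 );
}

// Mode times in microseconds, taken from the two reports ([2] and [3]).
echo relativeChange( 18.514, 8.194 ), &quot;%\n&quot;; // prints -55.74%</pre></div>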

<h3 class="remarkup-header">Testing with extensions</h3>

<p>A slightly more complex example is available in GrowthExperiments (<a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/722594" class="remarkup-link remarkup-link-ext" rel="noreferrer">patch</a>). That patch makes use of setUp/tearDown methods to prepopulate the database entries needed for the code being benchmarked:</p>

<div class="remarkup-code-block" data-code-lang="php" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="sd">/**</span>
<span class="sd"> * @BeforeMethods (&quot;setUpLinkRecommendation&quot;)</span>
<span class="sd"> * @AfterMethods (&quot;tearDownLinkRecommendation&quot;)</span>
<span class="sd"> * @Assert(&quot;mode(variant.time.avg) &lt; 20000 microseconds +/- 10%&quot;)</span>
<span class="sd"> */</span>
<span class="k">public</span> <span class="k">function</span> <span class="no">benchFilter</span><span class="o">()</span> <span class="o">{</span>
	<span class="nv">$this</span><span class="o">-&gt;</span><span class="na" data-symbol-name="linkRecommendationFilter">linkRecommendationFilter</span><span class="o">-&gt;</span><span class="na" data-symbol-name="filter">filter</span><span class="o">(</span> <span class="nv">$this</span><span class="o">-&gt;</span><span class="na" data-symbol-name="tasks">tasks</span> <span class="o">);</span>
<span class="o">}</span></pre></div>

<p>The <tt class="remarkup-monospaced">setUpLinkRecommendation</tt> and <tt class="remarkup-monospaced">tearDownLinkRecommendation</tt> methods have access to MediaWikiServices, and generally you can do the same kinds of things you&#039;d do in an integration test to set up and tear down the environment. This test is towards the opposite end of the spectrum from the core test discussed above which looks at <tt class="remarkup-monospaced">Html::openElement()</tt>; here, the goal is to look at a higher level function that involves database queries and interacting with MediaWiki services.</p>
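
<p>The shape of such set-up and teardown hooks can be sketched without MediaWiki. In the hypothetical class below, a temporary file stands in for the database rows that the GrowthExperiments patch prepares; the class and method names are made up for illustration and are not taken from that patch.</p>

<div class="remarkup-code-block" data-code-lang="php" data-sigil="remarkup-code-block"><pre class="remarkup-code">&lt;?php
namespace MediaWiki\Tests\Benchmark;

// Hypothetical example; a temp file replaces the real database fixtures.
class FileReadBench {

	/** @var string Path of the fixture file */
	private $path;

	// Named in @BeforeMethods below: prepares the fixture outside the benchmarked method.
	public function setUpFile() {
		$this-&gt;path = tempnam( sys_get_temp_dir(), &#039;bench&#039; );
		file_put_contents( $this-&gt;path, str_repeat( &quot;line\n&quot;, 1000 ) );
	}

	// Named in @AfterMethods below: removes the fixture again.
	public function tearDownFile() {
		unlink( $this-&gt;path );
	}

	/**
	 * @BeforeMethods(&quot;setUpFile&quot;)
	 * @AfterMethods(&quot;tearDownFile&quot;)
	 */
	public function benchReadLines() {
		// Only this call is meant to be benchmarked.
		file( $this-&gt;path );
	}
}</pre></div>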

<h3 class="remarkup-header">What&#039;s next</h3>

<p>You can experiment with the tooling and see if it is useful to you. Some open questions:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">do we want to use <tt class="remarkup-monospaced">phpbench</tt>? or are the scripts in <tt class="remarkup-monospaced">maintenance/benchmarks</tt> already sufficient for our benchmarking needs?</li>
<li class="remarkup-list-item">we already have benchmarking tools in <tt class="remarkup-monospaced">maintenance/benchmarks</tt> that extend a Benchmarker class; would it make sense to convert these to use <tt class="remarkup-monospaced">phpbench</tt>?</li>
<li class="remarkup-list-item">what are sensible defaults for &quot;revs&quot; and &quot;iterations&quot; as well as retry thresholds?</li>
<li class="remarkup-list-item">do we want to run <tt class="remarkup-monospaced">phpbench</tt> assertions in CI?<ul class="remarkup-list">
<li class="remarkup-list-item">if yes, do we want assertions using absolute times (e.g. &quot;this function should take less than 20 ms&quot;) or relative assertions (e.g. &quot;patch code is within +/- 10% of old code&quot;)?</li>
<li class="remarkup-list-item">if yes, do we want to aggregate reports over time, so we can see trends for the code we benchmark?</li>
<li class="remarkup-list-item">should we disable <tt class="remarkup-monospaced">phpbench</tt> as part of the standard set of tests run by <a href="https://doc.wikimedia.org/quibble/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Quibble</a>, and only have it run as a non-voting job like Fresnel?</li>
</ul></li>
</ul>
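
<p>To make the absolute-versus-relative question concrete, here is a hedged sketch of the two assertion styles. The class is hypothetical and <tt class="remarkup-monospaced">md5()</tt> only stands in for real code. The relative form assumes that PHPBench&#039;s assertion language exposes a <tt class="remarkup-monospaced">baseline</tt> value when a stored run is selected (for example with <tt class="remarkup-monospaced">--ref=original</tt>); verify this against the assertions guide for your PHPBench version.</p>

<div class="remarkup-code-block" data-code-lang="php" data-sigil="remarkup-code-block"><pre class="remarkup-code">&lt;?php
namespace MediaWiki\Tests\Benchmark;

// Hypothetical class contrasting the two assertion styles.
class AssertionStyleBench {

	/**
	 * Absolute: a fixed time budget, as in the GrowthExperiments example.
	 *
	 * @Assert(&quot;mode(variant.time.avg) &lt; 20000 microseconds +/- 10%&quot;)
	 */
	public function benchAbsolute() {
		md5( &#039;example&#039; );
	}

	/**
	 * Relative: compare against a previously stored run (assumed syntax).
	 *
	 * @Assert(&quot;mode(variant.time.avg) &lt; mode(baseline.time.avg) +/- 10%&quot;)
	 */
	public function benchRelative() {
		md5( &#039;example&#039; );
	}
}</pre></div>

<p>An absolute budget depends on the hardware running CI, whereas a relative assertion needs a stored baseline to compare against.</p>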

<p>Looking forward to your feedback! [4]</p>

<hr class="remarkup-hr" />

<p>[1] Thank you, <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_142"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>, for working with me to include this in Quibble and roll out to CI to help with evaluation!</p>

<p>[2]</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">&gt; phpbench run --config=tests/Benchmark/phpbench.json --report=aggregate &#039;tests/Benchmark/includes/HtmlBench.php&#039; &#039;--tag=original&#039;
PHPBench (1.1.2) running benchmarks...
with configuration file: /Users/kostajh/src/mediawiki/w/tests/Benchmark/phpbench.json
with PHP version 7.4.24, xdebug ✔, opcache ❌

\MediaWiki\Tests\Benchmark\HtmlBench

    benchHtmlOpenElement....................R1 I1 ✔ Mo18.514μs (±1.94%)

Subjects: 1, Assertions: 1, Failures: 0, Errors: 0
Storing results ... OK
Run: 1346543289c75373e513cc3b11fbf5215d8fb6d0
+-----------+----------------------+-----+------+-----+----------+----------+--------+
| benchmark | subject              | set | revs | its | mem_peak | mode     | rstdev |
+-----------+----------------------+-----+------+-----+----------+----------+--------+
| HtmlBench | benchHtmlOpenElement |     | 50   | 5   | 2.782mb  | 18.514μs | ±1.94% |
+-----------+----------------------+-----+------+-----+----------+----------+--------+</pre></div>

<p>[3]</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">&gt; phpbench run --config=tests/Benchmark/phpbench.json --report=aggregate &#039;tests/Benchmark/includes/HtmlBench.php&#039; &#039;--ref=original&#039; &#039;--report=aggregate&#039;
PHPBench (1.1.2) running benchmarks...
with configuration file: /Users/kostajh/src/mediawiki/w/tests/Benchmark/phpbench.json
with PHP version 7.4.24, xdebug ✔, opcache ❌
comparing [actual vs. original]

\MediaWiki\Tests\Benchmark\HtmlBench

    benchHtmlOpenElement....................R5 I4 ✔ [Mo8.194μs vs. Mo18.514μs] -55.74% (±0.50%)

Subjects: 1, Assertions: 1, Failures: 0, Errors: 0
+-----------+----------------------+-----+------+-----+---------------+-----------------+----------------+
| benchmark | subject              | set | revs | its | mem_peak      | mode            | rstdev         |
+-----------+----------------------+-----+------+-----+---------------+-----------------+----------------+
| HtmlBench | benchHtmlOpenElement |     | 50   | 5   | 2.782mb 0.00% | 8.194μs -55.74% | ±0.50% -74.03% |
+-----------+----------------------+-----+------+-----+---------------+-----------------+----------------+</pre></div>

<p>[4] Thanks to <a href="https://phabricator.wikimedia.org/p/zeljkofilipin/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_143"><span class="phui-tag-core phui-tag-color-person">@zeljkofilipin</span></a> for reviewing a draft of this post.</p></div></content></entry><entry><title>How we deploy code</title><link href="/phame/live/1/post/253/how_we_deploy_code/" /><id>https://phabricator.wikimedia.org/phame/post/view/253/</id><author><name>thcipriani (Tyler Cipriani)</name></author><published>2021-09-27T18:44:05+00:00</published><updated>2022-04-13T01:13:23+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><div class="phabricator-remarkup-embed-layout-right phabricator-remarkup-embed-float-right"><a href="https://phab.wmfusercontent.org/file/data/vg2ma4dkfuk5wf3rgcqs/PHID-FILE-7wv4rremjceomzmysyei/I-bork-the-wikis-merit-badge.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_144"><img src="https://phab.wmfusercontent.org/file/data/tye7a3o5zvy5bq6x2fxn/PHID-FILE-mb7ed2qeyzxoaq5ksyh5/preview-I-bork-the-wikis-merit-badge.png" width="191.39013452915" height="220" alt="I broke Wikipedia and then I fixed it badge" /></a></div></p>

<p>Last week I spoke to a few of my Wikimedia Foundation (WMF) colleagues about how we deploy code—I completely botched it. I got too complex too fast. It only hit me later—to explain deployments, I need to start with a lie.</p>

<p><a href="https://sci-hub.se/10.1080/02564602.2014.891387" class="remarkup-link remarkup-link-ext" rel="noreferrer">M. Jagadesh Kumar</a> explains:</p>

<blockquote><p>Every day, I am faced with the dilemma of explaining some complex phenomena [...] To realize my goal, I tell &quot;lies to students.&quot;</p></blockquote>

<p>This idea comes from Terry Pratchett&#039;s &quot;<a href="https://en.wikipedia.org/wiki/Lie-to-children" class="remarkup-link remarkup-link-ext" rel="noreferrer">lies-to-children</a>&quot; — a false statement that leads to a more accurate explanation. Asymptotically approaching truth via approximation.</p>

<p>Every section of this post is a subtle lie, but approximately correct.</p>

<h3 class="remarkup-header">Release Train</h3>

<p>The first lie I need to tell is that <strong>we deploy code once a week</strong>.</p>

<p>Every Thursday, <a href="/tag/release-engineering-team/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_147"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_146" aria-hidden="true"></span>Release-Engineering-Team</span></a> deploys a MediaWiki release to all <a href="https://noc.wikimedia.org/conf/highlight.php?file=dblists/all.dblist" class="remarkup-link remarkup-link-ext" rel="noreferrer">978
wikis</a>. The &quot;release branch&quot; is 198 different branches—one branch each for <tt class="remarkup-monospaced">mediawiki/core</tt>, <tt class="remarkup-monospaced">mediawiki/vendor</tt>, 188 MediaWiki extensions, and eight skins—that get bundled up via <tt class="remarkup-monospaced">git submodule</tt>.</p>

<h3 class="remarkup-header">Progressive rollout</h3>

<p>The next lie gets a bit closer to the truth: we don&#039;t deploy on Thursday; <strong>we deploy Tuesday through Thursday</strong>.</p>

<p>The cleverly named <a href="https://gerrit.wikimedia.org/r/q/owner:TrainBranchBot+repo:mediawiki/core" class="remarkup-link remarkup-link-ext" rel="noreferrer">TrainBranchBot</a> creates a weekly train branch at 2 am UTC every Tuesday.</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><strong>Tuesday</strong><ul class="remarkup-list">
<li class="remarkup-list-item">Deploy to <tt class="remarkup-monospaced">Group0</tt>—132 wikis, including <a href="https://test.wikipedia.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">Test Wikipedia</a>, <a href="https://www.mediawiki.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">mediawiki.org</a>, and <a href="https://office.wikimedia.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">Office wiki</a> (our internal WMF MediaWiki)</li>
</ul></li>
<li class="remarkup-list-item"><strong>Wednesday</strong><ul class="remarkup-list">
<li class="remarkup-list-item">Deploy to <tt class="remarkup-monospaced">Group1</tt>—528 wikis, including <a href="https://commons.wikimedia.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">Commons</a> and <a href="https://www.wikidata.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikidata</a>. Most non-Wikipedia wikis (plus <a href="https://ca.wikipedia.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">Catalan Wikipedia</a>, <a href="https://it.wikipedia.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">Italian Wikipedia</a>, and <a href="https://he.wikipedia.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">Hebrew Wikipedia</a>)</li>
</ul></li>
<li class="remarkup-list-item"><strong>Thursday</strong><ul class="remarkup-list">
<li class="remarkup-list-item">Deploy to remaining 320 wikis, including our largest wiki: <a href="https://en.wikipedia.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">English Wikipedia</a></li>
</ul></li>
</ul>
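
<p>The schedule above can be sketched as a small table with a running total. This is only an illustration: the wiki counts are the ones quoted in the list, the name <tt class="remarkup-monospaced">Group2</tt> for the Thursday step is borrowed from the later mention of moving from <tt class="remarkup-monospaced">Group1</tt> to <tt class="remarkup-monospaced">Group2</tt>, and the data layout and function name are made up for this example.</p>

```python
# Rollout schedule as described in the post. "Group2" for Thursday and the
# tuple layout are assumptions made for this illustration.
ROLLOUT = [
    ("Tuesday", "Group0", 132),
    ("Wednesday", "Group1", 528),
    ("Thursday", "Group2", 320),
]


def cumulative_rollout(schedule):
    """Return (day, group, wikis_reached_so_far) for each rollout step."""
    total = 0
    steps = []
    for day, group, wikis in schedule:
        total += wikis
        steps.append((day, group, total))
    return steps


for day, group, reached in cumulative_rollout(ROLLOUT):
    print(f"{day}: {group} deployed, {reached} wikis on the new branch")
```

<p>Adding the three groups together gives the number of wikis running the new branch once Thursday is done.</p>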

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/kf4it7c672u5h5jaixh7/PHID-FILE-26z2eo5hxzwccmsmzcol/train_%281%29.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_145"><img src="https://phab.wmfusercontent.org/file/data/kf4it7c672u5h5jaixh7/PHID-FILE-26z2eo5hxzwccmsmzcol/train_%281%29.png" height="541" width="1106" loading="lazy" alt="Release train process" /></a></div></p>

<p><strong>Progressive rollouts give users time to spot bugs</strong>. We have an experienced user-base—as Risker attested on the <a href="https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/5ED3JCX2M7NSNOBZYH7SWYNJSWRRCYIB/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikitech-l mailing list</a>:</p>

<blockquote><p>It&#039;s not always possible for even the best developer and the best testing systems to catch an issue that will be spotted by a hands-on user, several of whom are much more familiar with the purpose, expected outcomes and change impact on extensions than the people who have written them or QA&#039;d them.</p></blockquote>



<h3 class="remarkup-header">Bugs</h3>

<p>Now I&#039;m nearing the complete truth: <strong>we deploy every day except for Fridays</strong>.</p>

<p>Brace yourself: we don&#039;t write perfect software. When we find <a href="https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train#Issues_that_hold_the_train" class="remarkup-link remarkup-link-ext" rel="noreferrer">serious bugs</a>, they <em>block the release train</em> — we will not progress from <tt class="remarkup-monospaced">Group1</tt> to <tt class="remarkup-monospaced">Group2</tt> (for example) until we fix the blocking issue. We fix the blocking issue by <em>backporting</em> a patch to the release branch. If there&#039;s a bug in this release, we patch that bug in our mainline branch, then <tt class="remarkup-monospaced">git cherry-pick</tt> that patch onto our release branch and deploy that code.</p>
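
<p>As a rough sketch of what a backport amounts to at the git level, the helper below only builds the commands; it does not run them. The function name, the commit hash and the branch name are hypothetical, and this is not the team's actual tooling.</p>

```python
def backport_commands(commit, release_branch):
    """Build the git commands that cherry-pick one mainline commit onto a release branch."""
    return [
        ["git", "checkout", release_branch],
        # -x appends a "cherry picked from commit ..." line to the new commit message
        ["git", "cherry-pick", "-x", commit],
    ]


# Hypothetical commit hash and branch name, for illustration only.
for argv in backport_commands("abc1234", "example-release-branch"):
    print(" ".join(argv))
```

<p>The fix lands on the mainline branch first, so the next weekly branch already contains it; the cherry-pick only patches the release branch that is currently deployed.</p>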

<p>We deploy backports three times a day during <em>backport deployment windows</em>.  In addition to backports, developers may opt to deploy new configuration or enable/disable features in the backport deployment windows.</p>

<p>Release engineers <a href="https://wikitech.wikimedia.org/wiki/Deployments/Training" class="remarkup-link remarkup-link-ext" rel="noreferrer">train others to deploy backports twice a week</a>.</p>

<h3 class="remarkup-header">Emergencies</h3>

<p><strong>We deploy on Fridays</strong> when there are <a href="https://wikitech.wikimedia.org/wiki/Deployments/Emergencies#Reasons_for_an_emergency_deploy" class="remarkup-link remarkup-link-ext" rel="noreferrer">major issues</a>. Examples of major issues are:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Security issues</li>
<li class="remarkup-list-item">Data loss or corruption</li>
<li class="remarkup-list-item">Availability of service</li>
<li class="remarkup-list-item">Preventing abuse</li>
<li class="remarkup-list-item">Major loss of functionality/visible breakage</li>
</ul>

<p>We avoid deploying on Fridays because we have a small team of people to respond to incidents. We want those people to be away from computers on the weekends (if they want to be), not responding to emergencies.</p>

<h3 class="remarkup-header">Non-MediaWiki code</h3>

<p>There are 42 microservices on Kubernetes deployed via helm. And there are 64 microservices running on bare metal. The service owners deploy those microservices outside of the train process.</p>

<p>We coordinate deployments on our <a href="https://wikitech.wikimedia.org/wiki/Deployments" class="remarkup-link remarkup-link-ext" rel="noreferrer">deployment calendar</a> wiki page.</p>

<h3 class="remarkup-header">The whole truth</h3>

<p>We progressively deploy a large bundle of MediaWiki patches (between 150 and 950) every week. There are 12 backport windows a week where developers can add new features, fix bugs, or deploy new configurations. There are microservices deployed by developers at their own pace.</p>

<h4 class="remarkup-header">Important Resources:</h4>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Deployments" class="remarkup-link remarkup-link-ext" rel="noreferrer">Deployment calendar</a></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Deployments/Training" class="remarkup-link remarkup-link-ext" rel="noreferrer">Deployment training</a></li>
<li class="remarkup-list-item"><a href="https://versions.toolforge.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Current release train status</a></li>
</ul>

<h4 class="remarkup-header">More resources:</h4>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Deployments/Train_vs_backport" class="remarkup-link remarkup-link-ext" rel="noreferrer">Trains vs Backports</a></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar" class="remarkup-link remarkup-link-ext" rel="noreferrer">Long-term deployment calendar</a></li>
<li class="remarkup-list-item"><a href="https://train-blockers.toolforge.org" class="remarkup-link remarkup-link-ext" rel="noreferrer">Train blocker task</a></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Deployments/Emergencies#step-by-step" class="remarkup-link remarkup-link-ext" rel="noreferrer">Emergencies</a></li>
</ul>

<hr class="remarkup-hr" />

<p>Thanks to <a href="https://phabricator.wikimedia.org/p/brennen/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_148"><span class="phui-tag-core phui-tag-color-person">@brennen</span></a>, <a href="https://phabricator.wikimedia.org/p/greg/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_149"><span class="phui-tag-core phui-tag-color-person">@greg</span></a>, <a href="https://phabricator.wikimedia.org/p/KSiebert/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_150"><span class="phui-tag-core phui-tag-color-person">@KSiebert</span></a>, <a href="https://phabricator.wikimedia.org/p/Risker/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_151"><span class="phui-tag-core phui-tag-color-person">@Risker</span></a>, and <a href="https://phabricator.wikimedia.org/p/VPuffetMichel/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_152"><span class="phui-tag-core phui-tag-color-person">@VPuffetMichel</span></a> for reading early drafts of this post. The feedback was very helpful. Stay tuned for &quot;How we deploy code: Part II.&quot;</p></div></content></entry><entry><title>Production Excellence #35: August 2021</title><link href="/phame/live/1/post/248/production_excellence_35_august_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/248/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-09-08T03:53:18+00:00</published><updated>2021-10-20T23:01:24+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>Zero documented incidents last month. Isn&#039;t that something!</p>

<p>Learn about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech.  Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up</a> in Phabricator, which are preventive measures and other action items to learn from.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/ggfsx3vumi7uvqbxqcxd/PHID-FILE-kbgcqy7wy3joaitvfw45/proderr-incidents_2021-08.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_153"><img src="https://phab.wmfusercontent.org/file/data/ggfsx3vumi7uvqbxqcxd/PHID-FILE-kbgcqy7wy3joaitvfw45/proderr-incidents_2021-08.png" height="278" alt="proderr-incidents 2021-08.png (834×2 px, 288 KB)" /></a></div></p>

<p>Image from <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident graphs</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<p>In August we resolved 18 of the 156 reports that carried over from previous months, and reported 46 new failures in production. Of the new ones, 17 remain unresolved as of writing and will carry over to next month.</p>

<p>The number of new error reports in August was fairly high at 46, compared to 31 reports in July, and 26 reports in June.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/2eb2dyaxn4w4gs46hpjj/PHID-FILE-2hbqmdnpgj342nftwuhe/proderr-monthly_2021-09-03.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_154"><img src="https://phab.wmfusercontent.org/file/data/2eb2dyaxn4w4gs46hpjj/PHID-FILE-2hbqmdnpgj342nftwuhe/proderr-monthly_2021-09-03.png" height="300" alt="Unresolved error reports, stacked by month" /></a></div></p>

<p>The backlog of &quot;Old&quot; issues saw no progress this past month and remained constant at 146 open error reports.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/y5sc5xhvko2sjc6v4wt7/PHID-FILE-7qkcf2tqicd2fvo3zqco/proderr-totals_2021-09-03.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_155"><img src="https://phab.wmfusercontent.org/file/data/y5sc5xhvko2sjc6v4wt7/PHID-FILE-7qkcf2tqicd2fvo3zqco/proderr-totals_2021-09-03.png" height="300" alt="Total open production error tasks, by month." /></a></div></p>

<p>Unified graph:<br />
<div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/qs2palfu5jyddm3tbhw2/PHID-FILE-nzvqzkd4264qk6w5asha/proderr-unified_2021-08.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_156"><img src="https://phab.wmfusercontent.org/file/data/qs2palfu5jyddm3tbhw2/PHID-FILE-nzvqzkd4264qk6w5asha/proderr-unified_2021-08.png" height="400" alt="proderr-unified 2021-08.png (1×1 px, 78 KB)" /></a></div></p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>Did you know</strong>:</div>
<div class="remarkup-reply-body"><p>You can zoom in to your team&#039;s error reports by using the appropriate &quot;Filter&quot; link in the sidebar of our shared <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">workboard</a>.</p></div>
</blockquote>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_9" aria-hidden="true"></span>View Workboard</span></a></span></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Progress</h5>

<p>Last few months in review:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Jan 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/gn7TOpf2LdVE/#R" class="remarkup-link" rel="noreferrer">50 issues</a>)</td><td>3 left.</td></tr>
<tr><td>Feb 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/xQxnXZys4q97/#R" class="remarkup-link" rel="noreferrer">20 issues</a>)</td><td>6 &gt; 5 left.</td></tr>
<tr><td>Mar 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/To0edISjsA9s/#R" class="remarkup-link" rel="noreferrer">48 issues</a>)</td><td>13 &gt; 10 left.</td></tr>
<tr><td>Apr 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/ORxSVxnJBlLc/#R" class="remarkup-link" rel="noreferrer">42 issues</a>)</td><td>18 &gt; 17 left.</td></tr>
<tr><td>May 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/9y.PWGoGgWbK/#R" class="remarkup-link" rel="noreferrer">54 issues</a>)</td><td>22 &gt; 20 left.</td></tr>
<tr><td>Jun 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/DlpqBkLj0aP4/#R" class="remarkup-link" rel="noreferrer">26 issues</a>)</td><td>11 &gt; 10 left.</td></tr>
<tr><td>Jul 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/qQAV178rYaJ_/#R" class="remarkup-link" rel="noreferrer">31 issues</a>)</td><td>16 &gt; 12 left.</td></tr>
<tr><td>Aug 2021 (<a href="https://phabricator.wikimedia.org/maniphest/query/_VlOsgZ9On4g/#R" class="remarkup-link" rel="noreferrer">46 issues</a>)</td><td>+ 17 new unresolved issues.</td></tr>
<tr></tr>
</table></div>

<p>Tally:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>156</td><td>issues open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/247/production_excellence_34_july_2021/" class="remarkup-link" rel="noreferrer">Excellence #34</a> (July 2021).</td></tr>
<tr><td>-18</td><td>issues closed, of the previously open issues.</td></tr>
<tr><td>+17</td><td>new issues that survived August 2021.</td></tr>
<tr><td>155</td><td>issues open, as of today (3 Sep 2021).</td></tr>
<tr></tr>
</table></div>
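
<p>The tally is plain bookkeeping, and it can be double-checked with a few lines of code. The function name is made up for this sketch; the figures are the ones from the table above.</p>

```python
def tally(previously_open, closed, new_unresolved):
    """Open issues now = carried-over issues, minus closures, plus new survivors."""
    return previously_open - closed + new_unresolved


# Figures from the August 2021 tally table.
open_now = tally(previously_open=156, closed=18, new_unresolved=17)
print(open_now)
```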

<p>For more month-over-month numbers refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p></div></content></entry><entry><title>Production Excellence #34: July 2021</title><link href="/phame/live/1/post/247/production_excellence_34_july_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/247/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-08-19T03:49:57+00:00</published><updated>2021-08-21T12:13:43+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>3 documented incidents last month. That&#039;s at the median for the past twelve months, and slightly below the median of 4 over the past five years (<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident stats</a>).</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-07-14 eventgate latency spike</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For ~ 10min MediaWiki API clients experienced request failures.</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-16_asw-a2-codfw_network" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-07-16 codfw-a2 network</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For ~ 1 hour Restbase clients received errors, affecting mobile apps and ContentTranslation.</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-26_ruwikinews_DynamicPageList" class="remarkup-link remarkup-link-ext" rel="noreferrer">2021-07-26 ruwikinews DynamicPageList</a><ul class="remarkup-list">
<li class="remarkup-list-item">Impact: For 30min, 15% of requests from contributors on all wikis failed. There were also brief moments during which no readers could load recently modified or uncached pages.</li>
</ul></li>
</ul>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/at7qv7l6f4wdysezs5v4/PHID-FILE-rl3m7cj2b5p35s5iggwx/proderr-incidents_2021-07.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_157"><img src="https://phab.wmfusercontent.org/file/data/at7qv7l6f4wdysezs5v4/PHID-FILE-rl3m7cj2b5p35s5iggwx/proderr-incidents_2021-07.png" height="266" alt="proderr-incidents 2021-07.png (796×1 px, 125 KB)" /></a></div></p>

<p>Learn about past incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech. Remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up</a> in Phabricator, which are preventive measures and other action items filed after an incident.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<p>Last month the workboard held 154 non-old unresolved error reports. Over the past thirty days, the collective efforts of our volunteers and engineering teams have closed 14 of those.</p>

<p>In the month of July we&#039;ve also introduced or discovered 31 new error reports (that&#039;s an average of one production regression every day!). Of those new error reports, 15 were resolved and 16 remain unresolved. The workboard now tallies up to 156 tasks.</p>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_10" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/vadgmwiump6pm7v5qjow/PHID-FILE-monglrhaxxjbucw5n5dz/proderr-monthly_2021-08-18.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_158"><img src="https://phab.wmfusercontent.org/file/data/vadgmwiump6pm7v5qjow/PHID-FILE-monglrhaxxjbucw5n5dz/proderr-monthly_2021-08-18.png" height="350" alt="proderr-monthly 2021-08-18.png (1×1 px, 97 KB)" /></a></div></p>

<p>Over on the backlog, we&#039;re continuing to ploddingly present progress on production problems from phantoms of christmases past.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/sj7aushp72hyi5fnghwp/PHID-FILE-olov3bkcq32iluo6pxqs/proderr-totals_2021-08-18.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_159"><img src="https://phab.wmfusercontent.org/file/data/sj7aushp72hyi5fnghwp/PHID-FILE-olov3bkcq32iluo6pxqs/proderr-totals_2021-08-18.png" height="300" alt="proderr-totals 2021-08-18.png (900×1 px, 96 KB)" /></a></div></p>

<p>For more month-over-month numbers refer to the <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Outstanding errors</h5>

<p>Below are various older issues that may have fallen by the wayside, taken from somewhat-random stab-in-the-dark queries.</p>

<p>Oldest unresolved errors that are still reproducible  (<a href="https://phabricator.wikimedia.org/maniphest/query/07CAHhY.GApw/#R" class="remarkup-link" rel="noreferrer">Phab query</a>):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Reported in 2015:  Unable to view history of protected Flow board (StructuredDiscussions, Growth team), <a href="https://phabricator.wikimedia.org/T118502" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_160"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T118502</span></span></a>.</li>
<li class="remarkup-list-item">Reported in 2016:  Error when deleting a heading next to a table (VisualEditor, Editing team), <a href="https://phabricator.wikimedia.org/T140871" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_161"><span class="phui-tag-core phui-tag-color-object">T140871</span></a>.</li>
</ul>

<p>Stalled error reports  (<a href="https://phabricator.wikimedia.org/maniphest/query/Dmy0AuERAQct/#R" class="remarkup-link" rel="noreferrer">Phab query</a>):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Stalled Mar 2021:  Constraints check for Q142 France times out (Wikidata, WMDE), <a href="https://phabricator.wikimedia.org/T212282" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_162"><span class="phui-tag-core phui-tag-color-object">T212282</span></a>.</li>
</ul>

<p>Oldest error with a patch for review  (<a href="https://phabricator.wikimedia.org/maniphest/query/eb6hYVaKr0Kx/#R" class="remarkup-link" rel="noreferrer">Phab query</a>):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Reported in 2016:  Maps broken during 2nd live preview (Maps, Product Infra), <a href="https://phabricator.wikimedia.org/T151524" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_163"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T151524</span></span></a>.</li>
<li class="remarkup-list-item">Reported in 2018:  Corrupt connection for cross-wiki db query (Platform team), <a href="https://phabricator.wikimedia.org/T193565" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_164"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T193565</span></span></a>.</li>
</ul>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Jan 2021 (3 of <a href="https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R" class="remarkup-link" rel="noreferrer">50 issues</a> left)</td><td>⚠️ <em>Unchanged. Have a look-see!</em></td></tr>
<tr><td>Feb 2021 (6 of <a href="https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R" class="remarkup-link" rel="noreferrer">20 issues</a> left)</td><td>⚠️ <em>Unchanged. Take a gander!</em></td></tr>
<tr><td>Mar 2021 (13 of <a href="https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R" class="remarkup-link" rel="noreferrer">48 issues</a> left)</td><td>⚠️ <em>Unchanged. Check it out!</em></td></tr>
<tr><td>Apr 2021 (18 of <a href="https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R" class="remarkup-link" rel="noreferrer">42 issues</a> left)</td><td>-1</td></tr>
<tr><td>May 2021 (22 of <a href="https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R" class="remarkup-link" rel="noreferrer">54 issues</a> left)</td><td>-3</td></tr>
<tr><td>June 2021 (11 of <a href="https://phabricator.wikimedia.org/maniphest/query/roL0TaxtcaLQ/#R" class="remarkup-link" rel="noreferrer">26 issues</a> left)</td><td>-4</td></tr>
<tr><td>July 2021 (16 of <a href="https://phabricator.wikimedia.org/maniphest/query/mUVAD8TJHE3n/#R" class="remarkup-link" rel="noreferrer">31 issues</a> left)</td><td>+31; -15</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Tally</h5>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>154</td><td>issues open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/240/production_excellence_33_june_2021/" class="remarkup-link" rel="noreferrer">Excellence #33 (June 2021)</a>.</td></tr>
<tr><td>-14</td><td>issues closed, of the previous 154 open issues.</td></tr>
<tr><td>+16</td><td>new issues that survived July 2021.</td></tr>
<tr><td>156</td><td>issues open, as of today.</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p></div></content></entry><entry><title>Shrinking the tasks backlog</title><link href="/phame/live/1/post/244/shrinking_the_tasks_backlog/" /><id>https://phabricator.wikimedia.org/phame/post/view/244/</id><author><name>hashar (Antoine Musso)</name></author><published>2021-07-02T15:05:04+00:00</published><updated>2021-07-14T18:29:30+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The release engineering team triages tasks flagged <a href="/tag/release-engineering-team/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_174"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_173" aria-hidden="true"></span>Release-Engineering-Team</span></a> on a weekly basis. It is an all hands on deck one hour meeting in which we pick tasks one by one and find out what to do with them. We have started with more than a hundred of them and are now down to just a dozen or so, most filed since the last meeting.</p>

<p>I have been doing those routine triages for the projects I closely manage, often on Friday afternoons. I recently started taking it more seriously and even set aside a couple of weeks entirely dedicated to acting on the backlog. This post summarizes some of my discoveries and will hopefully inspire readers to tackle their own backlogs and technical debt, so that in the end we will have improved our ecosystem.</p>

<h2 class="remarkup-header">Finding tasks</h2>

<h3 class="remarkup-header">Tasks you have filed</h3>

<p>I keep filing tasks rather than taking notes or writing emails. I find the Phabricator interface convenient since it lets me flag a task with whatever labels I want (<a href="/tag/technical-debt/" class="phui-tag-view phui-tag-type-shade phui-tag-yellow phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_176"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-tags" data-meta="0_175" aria-hidden="true"></span>Technical-Debt</span></a>, <a href="/tag/documentation/" class="phui-tag-view phui-tag-type-shade phui-tag-yellow phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_178"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-tags" data-meta="0_177" aria-hidden="true"></span>Documentation</span></a>, #mediawiki) and subscribe individuals or even a whole team. It is great. With time those tasks pile up and it is easy to forget old ones, so they have to be revisited from time to time. It is as easy as searching for all open tasks I have filed and ordering them by creation date:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Authors</td><td><span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Current viewer</span></span></span></td></tr>
<tr><td>Statuses</td><td><span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Open Stalled</span></span></span></td></tr>
<tr><td>Group By</td><td><span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">None</span></span></span></td></tr>
<tr><td>Order By</td><td><span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Creation (oldest First)</span></span></span></td></tr>
<tr></tr>
</table></div>

<p><a href="https://phabricator.wikimedia.org/maniphest/query/Wws2E0C7IaFd/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/Wws2E0C7IaFd/#R</a></p>
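
<p>The same search could in principle be scripted. The sketch below only builds the parameters and sends nothing; it assumes the Conduit <tt class="remarkup-monospaced">maniphest.search</tt> method with its <tt class="remarkup-monospaced">authorPHIDs</tt> and <tt class="remarkup-monospaced">statuses</tt> constraints and the built-in <tt class="remarkup-monospaced">oldest</tt> order. The function name and the user PHID are placeholders, so check the Conduit documentation of your Phabricator install before relying on it.</p>

```python
def oldest_open_tasks_query(author_phid):
    """Build maniphest.search parameters for one author's oldest open or stalled tasks."""
    return {
        "constraints": {
            "authorPHIDs": [author_phid],
            "statuses": ["open", "stalled"],
        },
        # Assumed built-in order key: oldest tasks first
        "order": "oldest",
    }


# "PHID-USER-example" is a placeholder, not a real account.
params = oldest_open_tasks_query("PHID-USER-example")
print(params)
```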

<p>The first bug in the list is the oldest you have created and most probably deserves to be acted on. From there, pick the tasks one by one.</p>

<p>Some will surely be obsolete since they have been acted on or the underlying infrastructure has entirely changed. An example of a six-year-old task I declined is <a href="https://phabricator.wikimedia.org/T100099" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_166"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T100099</span></span></a>, which followed a meeting about deploying MediaWiki services to <a href="/tag/beta-cluster-infrastructure/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_180"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_179" aria-hidden="true"></span>Beta-Cluster-Infrastructure</span></a>. The task was partially achieved for a few services (notably Parsoid) and was left open since we never moved all services to the same system. Nowadays developers deploy a Docker image and restart the Docker container. The notes are obsolete and the task thus no longer serves any purpose.</p>

<p><a href="https://phabricator.wikimedia.org/T149924" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_167"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T149924</span></span></a> came from deploying static web assets using git directly to <tt class="remarkup-monospaced">/srv</tt>. However, the partition also hosted dynamically generated content, such as all the content of <a href="https://doc.wikimedia.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://doc.wikimedia.org/</a> and <a href="https://integration.wikimedia.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://integration.wikimedia.org/</a>, or state from a CI daemon. That is a problem when we reimage the server, especially during the OS upgrades we do every two years, and the task history reflects that:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Filed in 2016 after an OS upgrade</li>
<li class="remarkup-list-item">The part affecting <a href="https://integration.wikimedia.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://integration.wikimedia.org/</a> is partially addressed in 2018 as part of an OS upgrade</li>
<li class="remarkup-list-item">In 2020 we had yet another OS upgrade and this time I decided to complete the task</li>
</ul>

<p>I completed it because that task showed up in my list of oldest bugs: it kept showing up whenever I did the triage, and that was an incentive to get it gone. We are now in much better shape: the services have been decoupled onto different machines and the static assets are deployed using our deployment tool, <a href="/tag/scap/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_182"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_181" aria-hidden="true"></span>Scap</span></a>.</p>

<h3 class="remarkup-header">Check your projects</h3>

<p>Besides your team projects, you surely have side pet projects or legacy tags you might want to revisit. You can find them by searching for the projects you are a member of (assuming you made yourself a member): <a href="https://phabricator.wikimedia.org/project/query/JS0zmX.yalpI/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/project/query/JS0zmX.yalpI/#R</a></p>

<p>I for example introduced <a href="/tag/doxygen/" class="phui-tag-view phui-tag-type-shade phui-tag-yellow phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_184"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-tags" data-meta="0_183" aria-hidden="true"></span>Doxygen</span></a> to generate the <a href="https://doc.wikimedia.org/mediawiki-core/master/php/" class="remarkup-link remarkup-link-ext" rel="noreferrer">MediaWiki PHP documentation</a>, and <tt class="remarkup-monospaced">git-review</tt> to assist interactions with Gerrit, for which bugs are tracked in a column of the <a href="/tag/gerrit/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_186"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_185" aria-hidden="true"></span>Gerrit</span></a> project, and I am probably the only one actively acting on those tasks.</p>

<p>You can again list the tasks filed against each project sorted by creation date, and since you are a member of the project you will most probably be able to act on those old tasks.</p>

<p>One of the oldest tasks I had was <a href="https://phabricator.wikimedia.org/T48148" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_168"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T48148</span></span></a>, which is to hide CI or robot comments from Gerrit changes. The task was filed in 2013; I found the upstream proposed solution back in 2019 and, well, *cough*, forgot about it. Since I encountered the task during a triage, I went to tackle it and in short the required code boils down to adding a single line to the CI configuration:</p>

<div class="remarkup-code-block" data-code-lang="diff" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span> gerrit:
   verified: 2
<span class="gi">+  tag: autogenerated:ci</span></pre></div>

<p>That took almost 9 months, since I was not actively triaging old tasks.</p>

<h3 class="remarkup-header">Technical debt</h3>

<p>Just like we have the generic <a href="/tag/documentation/" class="phui-tag-view phui-tag-type-shade phui-tag-yellow phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_188"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-tags" data-meta="0_187" aria-hidden="true"></span>Documentation</span></a> tag for any tasks relating to documentation, we have <a href="/tag/technical-debt/" class="phui-tag-view phui-tag-type-shade phui-tag-yellow phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_190"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-tags" data-meta="0_189" aria-hidden="true"></span>Technical-Debt</span></a> to mark a task as requiring extra effort to bring us to modernity. When triaging your own or your projects' tasks, you can flag them as technical debt to easily find them later on.</p>

<p>Some tasks can immediately be filed as technical debt. That was the case of <a href="https://phabricator.wikimedia.org/T141324" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_169"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T141324</span></span></a>, which is to send the logs of the Gerrit code review system to Logstash and thus make them easier to dig through or discover. Sounds simple? Well, not that much.</p>

<p>The story is a bit complicated, but in short Gerrit is a Java application, our team does not necessarily have much experience with it, and the state of Java logging is a bit unclear (Gerrit uses log4j). Luckily we had some support from actual Java developers and managed to do some injecting. The fields were not properly formatted, but it was progress.</p>

<p>After I got assigned as the primary maintainer of our Gerrit setup, I definitely needed proper logging. When we upgraded Gerrit to 3.2, the library we used to format the logs as JSON was no longer provided by upstream, forcing us to maintain a fork of Gerrit just for that purpose.</p>

<p>Luckily upstream has made improvements and I found out it supports JSON logging out of the box, while our logging infrastructure learned to ingest JSON logs. We even got as far as supporting <a href="https://doc.wikimedia.org/ecs/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Elastic Common Schema</a> to use predefined field names.</p>

<p>That task had been technical debt for 5 years, but since I kept seeing it I kept remembering it and managed to address it.</p>

<p>Some tasks cannot be acted on because they depend on an upstream change that might be delayed for some reason. A massive issue we have encountered since at least 2015 was slowness when doing a <tt class="remarkup-monospaced">git fetch</tt> from our busiest repository. I previously blogged about it in <a href="/J199" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_165"><span class="phui-tag-core phui-tag-color-object">Blog Post: Faster source code fetches thanks to git protocol version 2</span></a>, and Google addressed it by proposing version 2 of the git protocol. It was one of the incentives for us to upgrade Gerrit, and as soon as we upgraded I made a point of testing the fix and making it well known to our developers (do use <tt class="remarkup-monospaced">protocol.version=2</tt> in your <tt class="remarkup-monospaced">.gitconfig</tt>).</p>
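
<p>For reference, a minimal sketch of that setting as it would appear in a user-level <tt class="remarkup-monospaced">.gitconfig</tt>. The file location shown in the comment is an assumption; running <tt class="remarkup-monospaced">git config --global protocol.version 2</tt> writes the same entry:</p>

<div class="remarkup-code-block" data-code-lang="ini" data-sigil="remarkup-code-block"><pre class="remarkup-code"># ~/.gitconfig (assumed location of the user-level Git configuration)
[protocol]
	version = 2</pre></div>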

<h2 class="remarkup-header">Grooming pleasure</h2>

<p>When processing old tasks, you can find it hard to tackle the ones that require focus for a few days if not weeks, as in the examples above. But there are also a bunch of little annoying tasks that are surprisingly easy to solve and give an immediate reward. The positive feedback loop gets you in the mood to find more easy tasks and thus reduce your backlog. A few more examples:</p>

<p><a href="https://phabricator.wikimedia.org/T221510" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_170"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T221510</span></span></a>, filed in 2019 and addressed two years later, was requesting to expose a machine readable test coverage report. The file was there (<tt class="remarkup-monospaced">clover.xml</tt>) but was simply not exposed in the web page; a simple <tt class="remarkup-monospaced">&lt;a href=&quot;clover.xml&quot;&gt;clover.xml&lt;/a&gt;</tt> was the only thing required.</p>

<p>My favorite tasks are obviously the ones that <strong>already have been solved</strong> and are just pending the paperwork to mark them <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">resolved</span></span></span>. <a href="https://phabricator.wikimedia.org/T138653" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_171"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T138653</span></span></a> was for a user unable to log in to Gerrit due to a duplicate account. Three years after it had been filed, the user reported they were able to log in properly and I marked it resolved one hour later. I guess that user was grooming their old tasks as well.</p>

<p>And finally, some old tasks might not be worth fixing. We are probably too kind with those and should probably be more strict in declining very old tasks.  An example is <a href="https://phabricator.wikimedia.org/T63733" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_172"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T63733</span></span></a>, the MediaWiki source code is deployed to the Wikimedia production cluster under a directory named <tt class="remarkup-monospaced">php-&lt;version&gt;</tt>. Surely the <tt class="remarkup-monospaced">php-</tt> prefix does not offer any meaningful information. However, since it is hardcoded in various places and would require moving files around on the whole fleet of servers, it might be a bit challenging and would definitely be a risky change.  Should we drop that useless prefix? For sure. Is it worth facing outage and possibly multiple degraded services? Definitely not and I have thus just declined it.</p></div></content></entry><entry><title>Production Excellence #33: June 2021</title><link href="/phame/live/1/post/240/production_excellence_33_june_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/240/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-07-14T03:34:25+00:00</published><updated>2021-07-14T14:05:33+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>3 documented incidents. That&#039;s lower than June in the previous five years where the month saw 5-9 incidents. I&#039;ve added a <strong>new panel</strong> ⭐️  to the <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident statistics</a> tool. This one plots monthly statistics on top of previous years, to more easily compare them:</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/xnitivxf2ozlq5nb6wav/PHID-FILE-e7b7yko26l6nw7ctcko2/proderr-incidents_2021-06.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_191"><img src="https://phab.wmfusercontent.org/file/data/xnitivxf2ozlq5nb6wav/PHID-FILE-e7b7yko26l6nw7ctcko2/proderr-incidents_2021-06.png" height="254" alt="proderr-incidents 2021-06.png (381×730 px, 75 KB)" /></a></div></p>

<p>Learn more from the <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documents</a> on Wikitech, and remember to review and schedule <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Incident Follow-up</a> in Phabricator, which are preventive measures and other action items filed after an incident.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<p>In June, work on production errors appears to have stagnated a bit. Or more precisely, the work only resulted in relatively few tasks being resolved. 15 of the 26 new tasks are still open as of writing.</p>

<p>Of the tasks from previous months, only 11 were resolved, leaving most columns unchanged. See the table further down for a more detailed breakdown and links to Phabricator queries for the tasks in question.</p>

<p>With the 15 remaining new tasks, and the 11 tasks resolved from our backlog, this raises the chart from 150 to 154 tasks.</p>

<p>Take a look at the <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">workboard</a> and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_11" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/6lg7duetxaqzzy2o4kxw/PHID-FILE-im6kwriwv6qugwoyudur/proderr-monthly_2021-07-12.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_192"><img src="https://phab.wmfusercontent.org/file/data/6lg7duetxaqzzy2o4kxw/PHID-FILE-im6kwriwv6qugwoyudur/proderr-monthly_2021-07-12.png" height="384" alt="Unresolved error reports, stacked by month." /></a></div></p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/4ege4cdaoruwzx3nr64p/PHID-FILE-ydxjn2j336br72cngyi6/proderr-totals_2021-07-12.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_193"><img src="https://phab.wmfusercontent.org/file/data/4ege4cdaoruwzx3nr64p/PHID-FILE-ydxjn2j336br72cngyi6/proderr-totals_2021-07-12.png" height="361" alt="Total open production error tasks, by month." /></a></div></p>

<p>Month-over-month plots based on <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Outstanding errors</h5>

<p>Summary over recent months:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Jan 2020 (1 of 7 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Mar 2020 (2 of 2 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Apr 2020 (4 of 14 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>May 2020 (5 of 14 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Jun 2020 (5 of 14 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Jul 2020 (4 of 24 issues)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Aug 2020 (11 of 53 issues)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Sep 2020 (7 of 33 issues)</td><td>⚠️ Unchanged.</td><td></td></tr>
<tr><td>Oct 2020 (19 of 69 issues)</td><td>⚠️ Unchanged.</td><td></td></tr>
<tr><td>Nov 2020 (8 of 38 issues)</td><td>⚠️ Unchanged.</td><td></td></tr>
<tr><td>Dec 2020 (7 of 33 issues)</td><td>⚠️ Unchanged.</td><td></td></tr>
<tr><td>Jan 2021 (3 of <a href="https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R" class="remarkup-link" rel="noreferrer">50 issues</a>)</td><td>⚠️ Unchanged.</td><td></td></tr>
<tr><td>Feb 2021 (6 of <a href="https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R" class="remarkup-link" rel="noreferrer">20 issues</a>)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Mar 2021 (13 of <a href="https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R" class="remarkup-link" rel="noreferrer">48 issues</a>)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Apr 2021 (19 of <a href="https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R" class="remarkup-link" rel="noreferrer">42 issues</a>)</td><td>⬇️ Four tasks resolved.</td><td>-4</td></tr>
<tr><td>May 2021 (25 of <a href="https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R" class="remarkup-link" rel="noreferrer">54 issues</a>)</td><td>⬇️ Four tasks resolved.</td><td>-4</td></tr>
<tr><td>June 2021 (15 of <a href="https://phabricator.wikimedia.org/maniphest/query/roL0TaxtcaLQ/#R" class="remarkup-link" rel="noreferrer">26 issues</a>)</td><td>📌  26 new issues, of which 11 were closed.</td><td>+26, -11</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Tally</th><th></th></tr>
<tr><td>150</td><td>issues open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/236/production_excellence_32_may_2021/" class="remarkup-link" rel="noreferrer">Excellence #32 (May 2021)</a>.</td></tr>
<tr><td>-11</td><td>issues closed, of the previous 150 open issues.</td></tr>
<tr><td>+15</td><td>new issues that survived June 2021.</td></tr>
<tr><td>154</td><td>issues open as of yesterday.</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote><p><em><a href="https://en.wikiquote.org/wiki/Stargate_SG-1#4.6" class="remarkup-link remarkup-link-ext" rel="noreferrer">🕳 O&#039;Neill</a>: We&#039;ve done this!</em><br />
<em>Dr Jackson: We do this every day.</em><br />
<em>O&#039;Neill: I&#039;m not talking about briefings in general, Daniel, I&#039;m talking about </em>this briefing<em>; I&#039;m talking about </em>this day.<br />
<em>Teal&#039;c: Col. O&#039;Neill is correct. Events do appear to be repeating themselves.</em></p></blockquote></div></content></entry><entry><title>Production Excellence #32: May 2021</title><link href="/phame/live/1/post/236/production_excellence_32_may_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/236/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-06-21T01:31:27+00:00</published><updated>2021-06-21T17:47:50+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>Zero incidents recorded in the past month. Yay! That&#039;s only five months after November 2020, the last month without documented incidents (<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident stats</a>).</p>

<p>Remember to review <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator, which are action items filed after an incident.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<p>In May, we unfortunately saw a repeat of the worrying pattern we saw <a href="https://phabricator.wikimedia.org/phame/post/view/235/production_excellence_31_april_2021/#trends" class="remarkup-link" rel="noreferrer">in April</a>, but with higher numbers. We found 54 new errors. This is the most new errors in a single month, since the Excellence monthly began three years ago in 2018. About half of these (29 of 54) remain unresolved as of writing, two weeks into the following month.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/digp3zsi53jtva6bsy54/PHID-FILE-jyprkokubucfyegwvvnz/proderr-monthly_2021-06-12.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_194"><img src="https://phab.wmfusercontent.org/file/data/digp3zsi53jtva6bsy54/PHID-FILE-jyprkokubucfyegwvvnz/proderr-monthly_2021-06-12.png" height="338" alt="Unresolved error reports, stacked by month." /></a></div><br />
<div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/ojuhk35bog7lr4fp7cay/PHID-FILE-ces7nfpoxkqtirlg6jf3/proderr-totals_2021-06-12.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_195"><img src="https://phab.wmfusercontent.org/file/data/ojuhk35bog7lr4fp7cay/PHID-FILE-ces7nfpoxkqtirlg6jf3/proderr-totals_2021-06-12.png" height="308" alt="Total open production error tasks, by month." /></a></div></p>

<p>Month-over-month plots based on <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">New errors in May</h5>

<p>Below is a snapshot of just the <a href="https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R" class="remarkup-link" rel="noreferrer">54 new issues</a> found last month, listed by their <a href="https://www.mediawiki.org/wiki/Developers/Maintainers" class="remarkup-link remarkup-link-ext" rel="noreferrer">code steward</a>.</p>

<p>Be mindful that the reporting of errors is not itself a negative point per se. I think it should be celebrated when teams have good telemetry, detect their issues early, and address them within their development cycle. It might be more worrisome when teams lack telemetry or time to find such issues, or can&#039;t keep up with the pace at which issues are found.</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Anti Harassment Tools</th><td>None.</td><td></td></tr>
<tr><th>Community Tech</th><td>None.</td><td></td></tr>
<tr><th>Editing Team</th><td>+2, -1</td><td>Cite (<a href="https://phabricator.wikimedia.org/T283755" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_196"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283755</span></span></a>); OOUI (<a href="https://phabricator.wikimedia.org/T282176" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_197"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282176</span></span></a>).</td></tr>
<tr><th>Growth Team</th><td>+17, -4</td><td>Add-Link (<a href="https://phabricator.wikimedia.org/T281960" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_198"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T281960</span></span></a>); GrowthExperiments (<a href="https://phabricator.wikimedia.org/T281525" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_199"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T281525</span></span></a> <a href="https://phabricator.wikimedia.org/T281703" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_200"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T281703</span></span></a> <a href="https://phabricator.wikimedia.org/T283546" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_201"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283546</span></span></a> <a href="https://phabricator.wikimedia.org/T283638" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_202"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283638</span></span></a> <a href="https://phabricator.wikimedia.org/T283924" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_203"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283924</span></span></a>); Echo (<a href="https://phabricator.wikimedia.org/T282446" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_204"><span class="phui-tag-core phui-tag-color-object">T282446</span></a>); Recent-changes (<a href="https://phabricator.wikimedia.org/T282047" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_205"><span class="phui-tag-core-closed"><span class="phui-tag-core 
phui-tag-color-object">T282047</span></span></a> <a href="https://phabricator.wikimedia.org/T282726" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_206"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282726</span></span></a>); StructuredDiscussions (<a href="https://phabricator.wikimedia.org/T281521" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_207"><span class="phui-tag-core phui-tag-color-object">T281521</span></a> <a href="https://phabricator.wikimedia.org/T281523" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_208"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T281523</span></span></a> <a href="https://phabricator.wikimedia.org/T281782" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_209"><span class="phui-tag-core phui-tag-color-object">T281782</span></a> <a href="https://phabricator.wikimedia.org/T281784" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_210"><span class="phui-tag-core phui-tag-color-object">T281784</span></a> <a href="https://phabricator.wikimedia.org/T282069" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_211"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282069</span></span></a> <a href="https://phabricator.wikimedia.org/T282146" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_212"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282146</span></span></a> <a href="https://phabricator.wikimedia.org/T282599" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_213"><span class="phui-tag-core phui-tag-color-object">T282599</span></a> <a href="https://phabricator.wikimedia.org/T282605" class="phui-tag-view phui-tag-type-object " 
data-sigil="hovercard" data-meta="0_214"><span class="phui-tag-core phui-tag-color-object">T282605</span></a>).</td></tr>
<tr><th>Language Team</th><td>+1</td><td>Translate extension (<a href="https://phabricator.wikimedia.org/T283828" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_215"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283828</span></span></a>).</td></tr>
<tr><th>Parsing Team</th><td>+1</td><td>Parsoid (<a href="https://phabricator.wikimedia.org/T281932" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_216"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T281932</span></span></a>).</td></tr>
<tr><th>Reading Web</th><td>None.</td><td></td></tr>
<tr><th>Structured Data</th><td>None.</td><td></td></tr>
<tr><th>Product Infra Team</th><td>+1</td><td>WikimediaEvents (<a href="https://phabricator.wikimedia.org/T282580" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_217"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282580</span></span></a>).</td></tr>
<tr><th>Analytics</th><td>None.</td><td></td></tr>
<tr><th>Performance Team</th><td>None.</td><td></td></tr>
<tr><th>Platform Engineering</th><td>+16, -11</td><td>MediaWiki-API (<a href="https://phabricator.wikimedia.org/T282122" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_218"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282122</span></span></a>); MediaWiki-General (<a href="https://phabricator.wikimedia.org/T282173" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_219"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282173</span></span></a>);  MediaWiki-Page-derived-data (<a href="https://phabricator.wikimedia.org/T281714" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_220"><span class="phui-tag-core phui-tag-color-object">T281714</span></a> <a href="https://phabricator.wikimedia.org/T281802" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_221"><span class="phui-tag-core phui-tag-color-object">T281802</span></a> <a href="https://phabricator.wikimedia.org/T282180" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_222"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282180</span></span></a> <a href="https://phabricator.wikimedia.org/T283282" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_223"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283282</span></span></a>), MediaWiki-Revision-backend (<a href="https://phabricator.wikimedia.org/T282145" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_224"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282145</span></span></a> <a href="https://phabricator.wikimedia.org/T282723" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_225"><span class="phui-tag-core-closed"><span 
class="phui-tag-core phui-tag-color-object">T282723</span></span></a> <a href="https://phabricator.wikimedia.org/T282825" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_226"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282825</span></span></a> <a href="https://phabricator.wikimedia.org/T283170" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_227"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283170</span></span></a>); MediaWiki-User-management (<a href="https://phabricator.wikimedia.org/T283167" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_228"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283167</span></span></a>); MW Expedition (<a href="https://phabricator.wikimedia.org/T281526" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_229"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T281526</span></span></a> <a href="https://phabricator.wikimedia.org/T281981" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_230"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T281981</span></span></a> <a href="https://phabricator.wikimedia.org/T282038" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_231"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282038</span></span></a> <a href="https://phabricator.wikimedia.org/T282181" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_232"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282181</span></span></a> <a href="https://phabricator.wikimedia.org/T283196" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_233"><span 
class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283196</span></span></a>).</td></tr>
<tr><th>Search Platform</th><td>+3, -2</td><td>CirrusSearch (<a href="https://phabricator.wikimedia.org/T282036" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_234"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282036</span></span></a> <a href="https://phabricator.wikimedia.org/T282207" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_235"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282207</span></span></a>); GeoData (<a href="https://phabricator.wikimedia.org/T282735" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_236"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282735</span></span></a>).</td></tr>
<tr><th>WMDE TechWish</th><td>+2, -1</td><td>Revision-Slider (<a href="https://phabricator.wikimedia.org/T282067" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_237"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282067</span></span></a>); VisualEditor Template dialog (<a href="https://phabricator.wikimedia.org/T283511" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_238"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283511</span></span></a>).</td></tr>
<tr><th>WMDE Wikidata</th><td>+3, -1</td><td>Wikibase (<a href="https://phabricator.wikimedia.org/T282534" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_239"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282534</span></span></a> <a href="https://phabricator.wikimedia.org/T283198" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_240"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283198</span></span></a> <a href="https://phabricator.wikimedia.org/T283862" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_241"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283862</span></span></a>).</td></tr>
<tr><th><span class="remarkup-highlight">No owner</span></th><td>+7, -6</td><td>CentralAuth (<a href="https://phabricator.wikimedia.org/T282834" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_242"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282834</span></span></a> <a href="https://phabricator.wikimedia.org/T283635" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_243"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283635</span></span></a>); Change-tagging (<a href="https://phabricator.wikimedia.org/T283098" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_244"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283098</span></span></a> <a href="https://phabricator.wikimedia.org/T283099" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_245"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283099</span></span></a>); MapSources (<a href="https://phabricator.wikimedia.org/T282833" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_246"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T282833</span></span></a>); MediaWiki-Page-information (<a href="https://phabricator.wikimedia.org/T283751" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_247"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283751</span></span></a>); Other (<a href="https://phabricator.wikimedia.org/T283252" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_248"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T283252</span></span></a>).</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Outstanding errors</h5>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_12" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p>Summary over recent months:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Aug 2019 (0 of 14 left)</td><td>✅ Last task resolved!</td><td>-1</td></tr>
<tr><td>Jan 2020 (1 of 7 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Mar 2020 (2 of 2 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Apr 2020 (4 of 14 left)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>May 2020 (5 of 14 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Jun 2020 (5 of 14 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Jul 2020 (4 of 24 issues)</td><td>⏸ —</td><td></td></tr>
<tr><td>Aug 2020 (12 of 53 issues)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Sep 2020 (7 of 33 issues)</td><td>⏸ —</td><td></td></tr>
<tr><td>Oct 2020 (19 of 69 issues)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Nov 2020 (8 of 38 issues)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Dec 2020 (7 of 33 issues)</td><td>⏸ —</td><td></td></tr>
<tr><td>Jan 2021 (3 of <a href="https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R" class="remarkup-link" rel="noreferrer">50 issues</a>)</td><td>⏸ —</td><td></td></tr>
<tr><td>Feb 2021 (7 of <a href="https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R" class="remarkup-link" rel="noreferrer">20 issues</a>)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Mar 2021 (14 of <a href="https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R" class="remarkup-link" rel="noreferrer">48 issues</a>)</td><td>⬇️ Four tasks resolved.</td><td>-4</td></tr>
<tr><td>Apr 2021 (23 of <a href="https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R" class="remarkup-link" rel="noreferrer">42 issues</a>)</td><td>⬇️ Two tasks resolved.</td><td>-2</td></tr>
<tr><td><strong>May 2021</strong> (29 of <a href="https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R" class="remarkup-link" rel="noreferrer">54 issues</a>)</td><td>54 new issues found, of which 29 remain open.</td><td>+54; -25</td></tr>
<tr></tr>
</table></div>



<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Tally</th><th></th></tr>
<tr><td>133</td><td>issues open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/235/production_excellence_31_april_2021/" class="remarkup-link" rel="noreferrer">Excellence #31</a> (12 May 2021).</td></tr>
<tr><td>-12</td><td>issues closed, of the previous 133 open issues.</td></tr>
<tr><td>+29</td><td>new issues that survived May 2021.</td></tr>
<tr><td>150</td><td>issues open, as of today (12 June 2021).</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:<br />
<a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status, Wikitech</a>.<br />
<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia incident stats by Krinkle, CodePen</a>.<br />
<a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">Production error data (spreadsheet and plots)</a>.<br />
<a href="https://phabricator.wikimedia.org/project/reports/1055/" class="remarkup-link" rel="noreferrer">Phabricator report charts for Wikimedia-production-error project</a>.</p></div></content></entry><entry><title>Production Excellence #31: April 2021</title><link href="/phame/live/1/post/235/production_excellence_31_april_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/235/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-05-13T03:49:23+00:00</published><updated>2021-06-12T17:39:06+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>6 documented incidents. That&#039;s above the <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">historical average</a> of 3–4 per month.</p>

<p>Learn about recent incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<p>In April, we saw a continuation of the healthy trend that started <a href="https://phabricator.wikimedia.org/phame/post/view/227/production_excellence_28_january_2021/#trends" class="remarkup-link" rel="noreferrer">this January</a> —  a trend where the back of the line is moving forward at least as quickly as the front of the line. We did take a little breather <a href="https://phabricator.wikimedia.org/phame/post/view/229/production_excellence_30_march_2021/#trends" class="remarkup-link" rel="noreferrer">in March</a> where we almost broke even, but otherwise the trend is going well.</p>

<p>Last month we bade farewell to the production errors we found in July 2019. This month we cleared out the column for October 2019.</p>

<p>One point of concern is that we did encounter a high number of new production errors —  errors that we failed to catch during development, code review, continuous integration, beta testing, or pre-deployment checks. Where we used to discover about a dozen of those a month,  we found 42 during this month. As of writing, 17 of the 42 April-discovered errors have been resolved.</p>

<p>The &quot;Old&quot; column (generally tracking pre-2019 tasks) grew for the first time in six months. This increase can largely be attributed to improved telemetry of client-side errors uncovering issues in under-resourced products, such as the old Kaltura video player.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/4j53dwyjk44g5h365jca/PHID-FILE-gdfdvgzzdinhfqtg4nvo/proderr-monthly_2021-05-12.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_249"><img src="https://phab.wmfusercontent.org/file/data/4j53dwyjk44g5h365jca/PHID-FILE-gdfdvgzzdinhfqtg4nvo/proderr-monthly_2021-05-12.png" height="338" alt="Unresolved error reports, stacked by month." /></a></div><br />
<div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/blmdm7ub2nvehdypxw3b/PHID-FILE-slh3ajbeqfiil5l3o65d/proderr-totals_2021-05-12.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_250"><img src="https://phab.wmfusercontent.org/file/data/blmdm7ub2nvehdypxw3b/PHID-FILE-slh3ajbeqfiil5l3o65d/proderr-totals_2021-05-12.png" height="308" alt="Total open production error tasks, by month." /></a></div></p>

<p>Month-over-month plots based on <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Outstanding errors</h5>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_13" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p>Summary over recent months, per <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet</a>:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Aug 2019 (1 of 14 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Oct 2019 (0 of 12 left)</td><td>✅ Last three tasks resolved!</td><td>-3</td></tr>
<tr><td>Jan 2020 (1 of 7 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Mar 2020 (2 of 2 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>Apr 2020 (5 of 14 left)</td><td>⚠️ Unchanged (over one year old).</td><td></td></tr>
<tr><td>May 2020 (5 of 14 left)</td><td>⏸ —</td><td></td></tr>
<tr><td>Jun 2020 (5 of 14 left)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Jul 2020 (4 of <a href="https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R" class="remarkup-link" rel="noreferrer">24 issues</a>)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Aug 2020 (13 of <a href="https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R" class="remarkup-link" rel="noreferrer">53 issues</a>)</td><td>⬇️ Two tasks resolved.</td><td>-2</td></tr>
<tr><td>Sep 2020 (7 of <a href="https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R" class="remarkup-link" rel="noreferrer">33 issues</a>)</td><td>⏸ —</td><td></td></tr>
<tr><td>Oct 2020 (20 of <a href="https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R" class="remarkup-link" rel="noreferrer">69 issues</a>)</td><td>⬇️ Two tasks resolved.</td><td>-2</td></tr>
<tr><td>Nov 2020 (9 of <a href="https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R" class="remarkup-link" rel="noreferrer">38 issues</a>)</td><td>⏸ —</td><td></td></tr>
<tr><td>Dec 2020 (7 of <a href="https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R" class="remarkup-link" rel="noreferrer">33 issues</a>)</td><td>⬇️ Four tasks resolved.</td><td>-4</td></tr>
<tr><td>Jan 2021 (3 of <a href="https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R" class="remarkup-link" rel="noreferrer">50 issues</a>)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Feb 2021 (8 of <a href="https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R" class="remarkup-link" rel="noreferrer">20 issues</a>)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Mar 2021 (18 of <a href="https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R" class="remarkup-link" rel="noreferrer">48 issues</a>)</td><td>⬇️ Sixteen tasks resolved.</td><td>-16</td></tr>
<tr><td><strong>Apr 2021</strong> (25 of <a href="https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R" class="remarkup-link" rel="noreferrer">42 issues</a>)</td><td>42 new issues found, of which 25 remained open.</td><td>+42; -17</td></tr>
<tr></tr>
</table></div>



<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Tally</th><th></th></tr>
<tr><td>139</td><td>issues open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/229/production_excellence_30_march_2021/" class="remarkup-link" rel="noreferrer">Excellence #30</a> (March 2021).</td></tr>
<tr><td>-31</td><td>issues closed, of the previously open issues.</td></tr>
<tr><td>+25</td><td>new issues that survived April 2021.</td></tr>
<tr><td>133</td><td>issues open, as of today (12 May 2021).</td></tr>
<tr></tr>
</table></div>

<p>Take a look at the workboard and look for tasks that could use your help:</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_14" aria-hidden="true"></span>View Workboard</span></a></span></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in production!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote><p><a href="https://en.wikiquote.org/wiki/One_Flew_Over_the_Cuckoo%27s_Nest_(film)" class="remarkup-link remarkup-link-ext" rel="noreferrer">🎥 McMurphy</a>: That nurse, man... she, uh, she ain&#039;t honest.<br />
Doctor: Ah now, look. Miss Ratched is one of the finest nurses we&#039;ve got in this institution.<br />
McMurphy: Ha! Well […] She likes a rigged game, know what I mean?</p></blockquote></div></content></entry><entry><title>Tracking memory issue in a Java application</title><link href="/phame/live/1/post/232/tracking_memory_issue_in_a_java_application/" /><id>https://phabricator.wikimedia.org/phame/post/view/232/</id><author><name>hashar (Antoine Musso)</name></author><published>2021-03-12T09:38:31+00:00</published><updated>2025-07-10T22:51:07+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>One of the critical pieces of our infrastructure is <a href="https://www.gerritcodereview.com/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Gerrit</a>. It hosts most of our git repositories and is the primary code review interface. Gerrit is written in the Java programming language which runs in the Java Virtual Machine (JVM).  For a couple of years we have been struggling with memory issues which eventually led to an unresponsive service and unattended restarts. The symptoms were the usual ones: application responses getting slower and slower until server-side errors render the service unusable. Eventually the JVM terminates with:</p>

<p><tt class="remarkup-monospaced">java.lang.OutOfMemoryError: Java heap space</tt></p>

<p>This post is my journey toward identifying the root cause and having it fixed by the upstream developers.  Given I barely knew anything about Java and even less about its ecosystem and tooling, I learned more than a few things along the way and felt it was worth sharing.</p>

<h2 class="remarkup-header">Prior work</h2>

<p>The first meaningful task was in June 2019 (<a href="https://phabricator.wikimedia.org/T225166" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_257"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T225166</span></span></a>) which over several months has led us to:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">replace aging underlying hardware</li>
<li class="remarkup-list-item">tune the memory garbage collector and switch to the G1 garbage collector</li>
<li class="remarkup-list-item">raise the amount of memory allocated to the JVM (the heap)</li>
<li class="remarkup-list-item">upgrade the Debian operating system by two major releases (<span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Jessie</span></span><span class="remarkup-nav-sequence-arrow"> → </span><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Stretch</span></span><span class="remarkup-nav-sequence-arrow"> → </span><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Buster</span></span></span>)</li>
<li class="remarkup-list-item">conduct a major upgrade of Gerrit (June 2020, Gerrit <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">2.15</span></span><span class="remarkup-nav-sequence-arrow"> → </span><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">3.2</span></span></span>)<ul class="remarkup-list">
<li class="remarkup-list-item">see <a href="https://phabricator.wikimedia.org/p/QChris/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_264"><span class="phui-tag-core phui-tag-color-person">@QChris</span></a> report <em><a href="https://groups.google.com/g/repo-discuss/c/G5wucKJg9Ag/m/pLin-i3mBgAJ" class="remarkup-link remarkup-link-ext" rel="noreferrer">Some insights from successful Gerrit 2.15 -&gt; 3.2 upgrade at Wikimedia</a></em></li>
</ul></li>
<li class="remarkup-list-item">move bots crawling the repositories to a replica</li>
<li class="remarkup-list-item">fix the lack of caching in a MediaWiki extension that queried Gerrit more than it should have</li>
</ul>

<p>All of those were sane operations that are part of any application life-cycle, and some were meant to address other issues. Raising the maximum heap size (20G to 32G) definitely reduced the frequency of crashes.</p>

<p>Still, memory kept filling up over and over. The graph below shows the memory usage from September 2019 to September 2020. The increase of maximum heap usage in October 2020 is the JVM heap being raised from 20G to 32G.  Each of the &quot;little green hills&quot; corresponds to memory filling up until we either restarted Gerrit or the JVM crashed unattended:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/fmswzzf4jgezwoqkwr4f/PHID-FILE-eys3f3lyngohr2ef7m7a/Gerrit_3months_usedMemory.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_251"><img src="https://phab.wmfusercontent.org/file/data/fmswzzf4jgezwoqkwr4f/PHID-FILE-eys3f3lyngohr2ef7m7a/Gerrit_3months_usedMemory.png" height="499" width="1040" loading="lazy" alt="Gerrit_3months_usedMemory.png (499×1 px, 21 KB)" /></a></div></p>

<p>Zooming in on a single week, it is clear that the memory was almost entirely filled by the time we had to restart:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/pmys4sr2oxxfpk5iqg75/PHID-FILE-cy5dtxlvekpfzosx5e7v/gerrit_used_memory.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_252"><img src="https://phab.wmfusercontent.org/file/data/f55idqizxdcok3f6jc2t/PHID-FILE-octorojgig2rckpewg3r/preview-gerrit_used_memory.png" width="220" height="105.55769230769" alt="gerrit_used_memory.png (499×1 px, 24 KB)" /></a></div></p>

<p>This had to stop. Complaints about Gerrit being unresponsive, <a href="/tag/sre/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_266"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_265" aria-hidden="true"></span>SRE</span></a> having to respond to  <tt class="remarkup-monospaced">java.lang.OutOfMemoryError: Java heap space</tt>, or us having to &quot;proactively&quot; restart before a weekend: none of these were good practices.  Back fresh from vacation, I filed a new task <a href="https://phabricator.wikimedia.org/T263008" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_258"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T263008</span></span></a> in September 2020 and started to tackle the problem in my spare time.  Would I be able to find my way in an ecosystem totally unknown to me?</p>

<p>Challenge accepted!</p>

<p><strong>stuff learned</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Routine maintenance is definitely needed</li>
<li class="remarkup-list-item">Don&#039;t expect things to magically solve themselves; commit to thoroughly identifying the root cause instead of hoping.</li>
</ul>



<h2 class="remarkup-header">Looking at memory</h2>

<p>Since the JVM runs out of memory, let&#039;s look at memory allocation.  The JDK provides several utilities to interact with a running JVM, be it to attach a debugger, write a copy of the whole heap, or send admin commands to the JVM.</p>

<p><tt class="remarkup-monospaced">jmap</tt> lets one take a full capture of the memory used by a Java virtual machine. It has to run as the same user as the application (we use Unix username <tt class="remarkup-monospaced">gerrit2</tt>) and when multiple JDKs are installed, one has to make sure to invoke the <tt class="remarkup-monospaced">jmap</tt> that is provided by the Java version running the targeted JVM.</p>

<p>Dumping the memory then takes a single command:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">sudo -u gerrit2 /usr/lib/jvm/java-8-openjdk-amd64/bin/jmap \
  -dump:live,format=b,file=/var/lib/gerrit-202009170755.hprof &lt;pid of java process here&gt;</pre></div>
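<p>Picking the <tt class="remarkup-monospaced">jmap</tt> that matches the running JVM can be sketched as below. This is only an illustration: <tt class="remarkup-monospaced">jmap_for_java</tt> is a hypothetical helper name, and it assumes the common JDK layout where <tt class="remarkup-monospaced">java</tt> lives under <tt class="remarkup-monospaced">bin/</tt> (or <tt class="remarkup-monospaced">jre/bin/</tt> for Java 8) and <tt class="remarkup-monospaced">jmap</tt> under <tt class="remarkup-monospaced">bin/</tt> of the same JDK.</p>

```shell
# Sketch only: jmap_for_java is a hypothetical helper name.
# Given the absolute path of the java executable behind a running JVM,
# print the path where the matching jmap is expected to live.
jmap_for_java() {
  java_exe="$1"
  jdk_home="${java_exe%/bin/java}"   # drop the trailing /bin/java
  jdk_home="${jdk_home%/jre}"        # Java 8 layouts keep java under jre/bin
  printf '%s/bin/jmap\n' "$jdk_home"
}

# Possible usage on Linux, with pid holding the JVM process id:
#   jmap_for_java "$(readlink -f "/proc/$pid/exe")"
```

<p>Reading <tt class="remarkup-monospaced">/proc/$pid/exe</tt> for a process owned by another user generally requires running as that user or as root, which fits the requirement of running <tt class="remarkup-monospaced">jmap</tt> as the application user anyway.</p>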

<p>It takes a few minutes depending on the number of objects. The resulting <tt class="remarkup-monospaced">.hprof</tt> file is a binary format, which can be interpreted by various tools.</p>

<p><tt class="remarkup-monospaced">jhat</tt>, a Java heap analyzer, is provided by the JDK alongside <tt class="remarkup-monospaced">jmap</tt>. I ran it with tracking of object allocations disabled (<tt class="remarkup-monospaced">-stack false</tt>) as well as references to objects (<tt class="remarkup-monospaced">-refs false</tt>), since otherwise, even with 64G of RAM and 32 cores, it took a few hours and eventually crashed. That is due to the insane number of live objects. On the server I thus ran:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">/usr/lib/jvm/java-8-openjdk-amd64/bin/jhat -stack false -refs false gerrit-202009170755.hprof</pre></div>

<p>It spawns a web service which I can reach from my machine over ssh using port forwarding, then open in a web browser:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">ssh  -C -L 8080:ip6-localhost:7000 gerrit1001.wikimedia.org &amp;
xdg-open http://ip6-localhost:8080/</pre></div>

<p><strong>Instance Counts for All Classes (excluding native types)</strong></p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">2237744 instances of class org.eclipse.jgit.lib.ObjectId
2128766 instances of class org.eclipse.jgit.lib.ObjectIdRef$PeeledNonTag
735294 instances of class org.eclipse.jetty.util.thread.Locker
735294 instances of class org.eclipse.jetty.util.thread.Locker$Lock
735283 instances of class org.eclipse.jetty.server.session.Session
...</pre></div>

<p>Another view shows 3.5G of byte arrays.</p>

<p>I got pointed to <a href="https://heaphero.io/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://heaphero.io/</a>; however, the file is too large to upload and it contains sensitive information (credentials, users&#039; personal information) which we cannot share with a third party.</p>

<p>Nothing really conclusive at this point: the heap dump had been taken shortly after a restart, when Gerrit was not in trouble.</p>

<p>Eventually I found that <a href="https://github.com/javamelody/javamelody/wiki" class="remarkup-link remarkup-link-ext" rel="noreferrer">JavaMelody</a> has a view providing the exact same information without all the trouble of figuring out the proper set of parameters for <tt class="remarkup-monospaced">jmap</tt>, <tt class="remarkup-monospaced">jhat</tt> and <tt class="remarkup-monospaced">ssh</tt>. Just browse to the monitoring page and:<br />
<div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/qxeddm4b2a5itzrkjp5y/PHID-FILE-mxou2gjfbacpavepjc7v/gerrit_javamelody_heaphisto.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_253"><img src="https://phab.wmfusercontent.org/file/data/qxeddm4b2a5itzrkjp5y/PHID-FILE-mxou2gjfbacpavepjc7v/gerrit_javamelody_heaphisto.png" height="517" width="990" loading="lazy" alt="gerrit_javamelody_heaphisto.png (517×990 px, 131 KB)" /></a></div></p>

<p><strong>stuff learned</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><tt class="remarkup-monospaced">jmap</tt> to issue commands to the JVM, including taking a heap dump</li>
<li class="remarkup-list-item"><tt class="remarkup-monospaced">jhat</tt> to run analysis with some options required to make it workable</li>
<li class="remarkup-list-item">Use JavaMelody instead</li>
</ul>

<h2 class="remarkup-header">JVM handling of out of memory error</h2>

<p>An idea was to take a heap dump whenever the JVM encounters an out of memory error. That can be turned on by passing the extended option <tt class="remarkup-monospaced">HeapDumpOnOutOfMemoryError</tt> to the JVM and specifying where the dump will be written to with <tt class="remarkup-monospaced">HeapDumpPath</tt>:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">java \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/srv/gerrit \
  -jar gerrit.war ...</pre></div>

<p>And sure enough, the next time it ran out of memory:</p>

<p>Nov 07 13:43:35 gerrit2001 java[30197]: java.lang.OutOfMemoryError: Java heap space<br />
Nov 07 13:43:35 gerrit2001 java[30197]: Dumping heap to /srv/gerrit/java_pid30197.hprof ...<br />
Nov 07 13:47:02 gerrit2001 java[30197]: Heap dump file created [35616147146 bytes in 206.962 secs]</p>

<p>This resulted in a 34GB dump file, which was not convenient for a full analysis. Even with 16G of heap for the analysis and a couple of hours of CPU churning, it was not helpful.</p>
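<p>Since a dump of a full heap is about as large as the heap itself, it may be worth checking that the directory given to <tt class="remarkup-monospaced">HeapDumpPath</tt> has room for it before relying on the option. A minimal sketch, where <tt class="remarkup-monospaced">has_room_for_dump</tt> and <tt class="remarkup-monospaced">bytes_to_gib</tt> are hypothetical helper names and the size to reserve is an assumption to adjust to the configured heap:</p>

```shell
# Sketch only: has_room_for_dump and bytes_to_gib are hypothetical helper names.

# Succeeds when the filesystem holding DIR has at least NEEDED_KB kilobytes free.
has_room_for_dump() {
  dir="$1"
  needed_kb="$2"
  # POSIX output format (-P) in 1024-byte units (-k); column 4 is "Available".
  avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  [ "$avail_kb" -ge "$needed_kb" ]
}

# Converts a byte count to whole GiB, rounded down.
bytes_to_gib() {
  echo "$(( $1 / 1024 / 1024 / 1024 ))"
}

# Possible usage, reserving a bit more than the size of the dump logged above:
#   has_room_for_dump /srv/gerrit 36000000 || echo "not enough room for a heap dump"
#   bytes_to_gib 35616147146
```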

<p>And at this point the JVM is still around, the <tt class="remarkup-monospaced">java</tt> process is still there and thus systemd does not restart the service for us even though we have instructed it to do so:</p>

<div class="remarkup-code-block" data-code-lang="ini" data-sigil="remarkup-code-block"><div class="remarkup-code-header">/lib/systemd/system/gerrit.service</div><pre class="remarkup-code"><span></span><span class="k">[Service]</span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">java -jar gerrit.war</span>
<span class="na">Restart</span><span class="o">=</span><span class="s">always</span>
<span class="na">RestartSec</span><span class="o">=</span><span class="s">2s</span></pre></div>

<p>That led to our Gerrit replica being down for a whole weekend with no alarm whatsoever (<a href="https://phabricator.wikimedia.org/T267517" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_259"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T267517</span></span></a>). I imagine the reason for the JVM not exiting on an <tt class="remarkup-monospaced">OutOfMemoryError</tt> is to let one investigate the reason. Just like the heap dump, the behavior can be configured via the <tt class="remarkup-monospaced">ExitOnOutOfMemoryError</tt> extended option:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">java -XX:+ExitOnOutOfMemoryError</pre></div>

<p>Next time the JVM will exit causing <tt class="remarkup-monospaced">systemd</tt> to notice the service went away and so it will happily restart it again.</p>
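<p>To double check that a running JVM really has the option turned on, the JDK <tt class="remarkup-monospaced">jinfo</tt> tool can print the value of a single flag, as <tt class="remarkup-monospaced">-XX:+Name</tt> when enabled and <tt class="remarkup-monospaced">-XX:-Name</tt> when disabled. A small sketch to interpret that output, where <tt class="remarkup-monospaced">flag_enabled</tt> is a hypothetical helper name and it is assumed that <tt class="remarkup-monospaced">jinfo</tt> comes from the same JDK as the target JVM:</p>

```shell
# Sketch only: flag_enabled is a hypothetical helper name.
# Succeeds when a flag line such as "-XX:+ExitOnOutOfMemoryError" shows the
# boolean flag turned on, and fails for "-XX:-ExitOnOutOfMemoryError".
flag_enabled() {
  case "$1" in
    -XX:+*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage against a live JVM (run as the application user):
#   flag_enabled "$(jinfo -flag ExitOnOutOfMemoryError "$pid")"
```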

<p><strong>stuff learned</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">automatic heap dumping with the JVM for future analysis</li>
<li class="remarkup-list-item">Be sure to have the JVM exit when running out of memory so <tt class="remarkup-monospaced">systemd</tt> will restart the service</li>
<li class="remarkup-list-item">A process can be up while still not serving its purpose</li>
</ul>



<h2 class="remarkup-header">Side track to jgit cache</h2>

<p>When I filed the task, I suspected enabling git protocol version 2 (<a href="https://phabricator.wikimedia.org/J199" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_256"><span class="phui-tag-core phui-tag-color-object">J199</span></a>) on CI might have been the root cause.  That eventually led me to look at how Gerrit caches git operations.  Being a Java application, it does not use the regular <tt class="remarkup-monospaced">git</tt> command but a pure Java implementation, <tt class="remarkup-monospaced">jgit</tt>, a project started by the same author as Gerrit (Shawn Pearce).</p>

<p>To speed up operations, jgit keeps git objects in memory with various tuning settings. You can read more about it at <a href="https://phabricator.wikimedia.org/T263008#6601490" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_260"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T263008#6601490</span></span></a>, but in the end it was of no use for the problem. <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_267"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a> would later point out that the jgit cache does not grow past its limit:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/dk7rcxuzqpzjvqn2mouv/PHID-FILE-cqfyjfptmz7b42tzjsqg/Screenshot-2020-12-21-13%3A09%3A10.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_254"><img src="https://phab.wmfusercontent.org/file/data/4xd54ky6what55jmcj24/PHID-FILE-zhpwlpewpbqlo24drmk2/preview-Screenshot-2020-12-21-13%3A09%3A10.png" width="220" height="86.955145118734" alt="Screenshot-2020-12-21-13:09:10.png (749×1 px, 152 KB)" /></a></div></p>

<p>This was not a good lead, but it did prompt us to get a better view of what is going on in the jgit cache. To do so, we would need to expose historical metrics about the state of the cache.</p>

<p><strong>Stuff learned</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">jgit has in-memory caches that hold frequently accessed repositories and objects in the JVM memory, speeding up access to them.</li>
</ul>
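
<p>For illustration, the jgit cache is tuned through the <tt class="remarkup-monospaced">[core]</tt> section of <tt class="remarkup-monospaced">gerrit.config</tt>. Since that file uses the git config format, the settings can be written and read back with <tt class="remarkup-monospaced">git config</tt>. The values and the relative path below are made-up examples, not the ones we run with:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code"># Maximum number of bytes of pack data cached in the JVM (example value)
git config -f etc/gerrit.config core.packedGitLimit 2g
# Maximum number of pack files held open at once (example value)
git config -f etc/gerrit.config core.packedGitOpenFiles 4096
# Read a setting back
git config -f etc/gerrit.config --get core.packedGitLimit</pre></div>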



<h2 class="remarkup-header">Metrics collection</h2>

<p>We always had trouble determining whether our jgit cache was properly sized and tuned it randomly with little information. Eventually I found out that Gerrit does have a wide range of metrics available which are described at <a href="https://gerrit.wikimedia.org/r/Documentation/metrics.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit.wikimedia.org/r/Documentation/metrics.html</a> .  I always wondered how we could access them without having to write a plugin.</p>

<p>The first step was to add the <a href="https://gerrit.wikimedia.org/r/plugins/metrics-reporter-jmx/Documentation/config.md" class="remarkup-link remarkup-link-ext" rel="noreferrer">metrics-reporter-jmx</a> plugin. It registers all the metrics with JMX, a Java system to manage resources. That is then exposed by JavaMelody and at least lets us browse the metrics:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/sakw2wyfiikgzmdqu7fp/PHID-FILE-dc3x3jyu3h5d65yh5akr/gerrit_jgit_cache_metrics.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_255"><img src="https://phab.wmfusercontent.org/file/data/sakw2wyfiikgzmdqu7fp/PHID-FILE-dc3x3jyu3h5d65yh5akr/gerrit_jgit_cache_metrics.png" height="329" width="422" loading="lazy" alt="gerrit_jgit_cache_metrics.png (329×422 px, 34 KB)" /></a></div></p>

<p>I long had a task to get those metrics exposed (<a href="https://phabricator.wikimedia.org/T184086" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_261"><span class="phui-tag-core phui-tag-color-object">T184086</span></a>) but never had a strong enough incentive to work on it.  The idea was to expose those metrics to the Prometheus monitoring system which would scrape them and make them available in Grafana.  They can be exposed using the <a href="https://gerrit.wikimedia.org/r/plugins/metrics-reporter-prometheus/Documentation/config.md" class="remarkup-link remarkup-link-ext" rel="noreferrer">metrics-reporter-prometheus</a> plugin. Some configuration is required to create an authentication token that lets Prometheus scrape the metrics, and after that they are all collected.</p>
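
<p>A minimal sketch of that setup, assuming the option name and endpoint path given in the plugin documentation linked above (check them against the plugin version you run). The token value and the hostname <tt class="remarkup-monospaced">gerrit.example.org</tt> are placeholders:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code"># Store the bearer token in the Gerrit configuration (TOKEN is a placeholder)
git config -f etc/gerrit.config plugin.metrics-reporter-prometheus.prometheusBearerToken "$TOKEN"

# Fetch the metrics the same way a scraper would (hypothetical hostname)
curl -H "Authorization: Bearer $TOKEN" https://gerrit.example.org/plugins/metrics-reporter-prometheus/metrics</pre></div>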

<p>In Grafana, discovering which metrics are of interest might be daunting. For the jgit cache we are only interested in a few metrics, and crafting a basic dashboard for them is simple enough. But since we now collect all those metrics, we should have dashboards for anything else that could be of interest to us.</p>

<p>While browsing the Gerrit upstream repositories, I found an unadvertised repository: <a href="https://gerrit.googlesource.com/gerrit-monitoring/" class="remarkup-link remarkup-link-ext" rel="noreferrer">gerrit/gerrit-monitoring</a>. The project aims at deploying to Kubernetes a monitoring stack for Gerrit composed of Grafana, Loki, Prometheus and Promtail.  While browsing the code, I found out they already had a Grafana template which I could import to our Grafana instance with some little modifications.</p>

<p>During the Gerrit Virtual Summit I raised that as a potentially interesting project for the whole community and, sure enough, a few days later:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">The existing dashboard got adjusted to be easily imported outside of the <tt class="remarkup-monospaced">gerrit-monitoring</tt> context ( <a href="https://gerrit-review.googlesource.com/c/gerrit-monitoring/+/289222/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit-review.googlesource.com/c/gerrit-monitoring/+/289222/</a> )</li>
<li class="remarkup-list-item">The authors have sent a wide announcement to highlight the project: <em><a href="https://groups.google.com/g/repo-discuss/c/AV0CNmBnDFQ/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Monitoring dashboards for Gerrit (An update about the gerrit-monitoring project)</a></em></li>
</ul>

<p>In the end we have a few useful Grafana dashboards, the ones imported from the <tt class="remarkup-monospaced">gerrit-monitoring</tt> repo are suffixed with <tt class="remarkup-monospaced">(upstream)</tt>: <a href="https://grafana.wikimedia.org/dashboards/f/5AnaHr2Mk/gerrit" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://grafana.wikimedia.org/dashboards/f/5AnaHr2Mk/gerrit</a></p>

<p>And I crafted one dedicated to jgit cache: <a href="https://grafana.wikimedia.org/d/8YPId9hGz/jgit-block-cache" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://grafana.wikimedia.org/d/8YPId9hGz/jgit-block-cache</a></p>

<p><strong>Stuff learned</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Prometheus scraping system with auth token</li>
<li class="remarkup-list-item">Querying Prometheus metrics in Grafana and its vector selection mechanism</li>
<li class="remarkup-list-item">Other Gerrit administrators had already created visualizations</li>
<li class="remarkup-list-item">Raising our reuse prompted upstream to further advertise their solution, which hopefully has led to more adoption of it.</li>
</ul>



<h2 class="remarkup-header">Despair</h2>

<p>After a couple of months, there was still no good lead. The issue had been around for a while, in a programming language I don&#039;t know, with tooling completely alien to me. I even found <tt class="remarkup-monospaced">jcmd</tt> to issue commands to the JVM, such as dumping a class histogram, the same view provided by JavaMelody:</p>

<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ sudo -u gerrit2 jcmd 2347 GC.class_histogram</span>
<span class="go">num     #instances         #bytes  class name</span>
<span class="go">----------------------------------------------</span>
<span class="go">   5:      10042773     1205132760  org.eclipse.jetty.server.session.SessionData</span>
<span class="go">   8:      10042773      883764024  org.eclipse.jetty.server.session.Session</span>
<span class="go">  11:      10042773      482053104  org.eclipse.jetty.server.session.Session$SessionInactivityTimer$1</span>
<span class="go">  13:      10042779      321368928  org.eclipse.jetty.util.thread.Locker</span>
<span class="go">  14:      10042773      321368736  org.eclipse.jetty.server.session.Session$SessionInactivityTimer</span>
<span class="go">  17:      10042779      241026696  org.eclipse.jetty.util.thread.Locker$Lock</span></pre></div>

<p>That is quite handy when already in a terminal: it saves the few clicks needed to switch to a browser, head to JavaMelody and find the link.</p>
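
<p>A few other invocations can be handy in the same situation. This is only a sketch reusing the user and PID from the example above; the <tt class="remarkup-monospaced">grep</tt> filter is an illustration:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code"># List the JVMs visible to the gerrit2 user, with their PIDs
sudo -u gerrit2 jcmd -l
# Show the diagnostic commands this JVM supports
sudo -u gerrit2 jcmd 2347 help
# Keep only the Jetty session classes from the histogram
sudo -u gerrit2 jcmd 2347 GC.class_histogram | grep jetty.server.session</pre></div>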

<p>But it is the last week of work of the year.</p>

<p>Christmas is in two days.</p>

<p>Kids are messing up all around the home office since we are under lockdown.</p>

<p>Despair.</p>

<p>Out of rage I just stall the task shamelessly hoping for Java 11 and Gerrit 3.3 upgrades to solve this. Much like we hoped the system would be fixed by upgrading.</p>

<p>Wait..</p>

<p>10 million?</p>

<p>TEN MILLION ??</p>

<p>TEN TO THE POWER OF SEVEN ???</p>

<p>WHY ARE THERE TEN MILLION HTTP SESSIONS HELD IN GERRIT !!!!!!?11??!!??</p>

<div class="remarkup-code-block code-block-counterexample" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code remarkup-counterexample">10042773  org.eclipse.jetty.server.session.SessionData</pre></div>

<p>There. Right there. It was there since the start. In plain sight. And sure enough, 19 hours later Gerrit had created 500k sessions using 56 MBytes of memory. It is slowly but surely leaking memory.</p>
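
<p>As a back-of-the-envelope check, dividing <tt class="remarkup-monospaced">#bytes</tt> by <tt class="remarkup-monospaced">#instances</tt> in the histogram above gives the shallow size of each object, and adding up the six classes listed there gives the memory they hold together. This is only shell arithmetic on the numbers already shown, nothing measured anew:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code"># Shallow bytes per instance = #bytes / #instances
echo $(( 1205132760 / 10042773 ))   # SessionData: 120
echo $(( 883764024 / 10042773 ))    # Session: 88
# Total bytes held by the six classes listed in the histogram
echo $(( 1205132760 + 883764024 + 482053104 + 321368928 + 321368736 + 241026696 ))   # 3454714248</pre></div>

<p>That is about 3.45 billion bytes, or roughly 344 bytes per session, held by objects that are never released.</p>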

<p><strong>Stuff learned</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Everything clears up once one has found the root cause</li>
</ul>



<h2 class="remarkup-header">When upstream saves you</h2>

<p>At this point it was just an intuition, albeit a strong one. I do not know much about Java or Gerrit internals, so I turned to upstream developers for further assistance.  But first, I had to reproduce the issue and investigate a bit more to give as many details as possible when filing a bug report.</p>

<h3 class="remarkup-header">Reproduction</h3>

<p>I copied a small heap dump I took just a few minutes after Gerrit got restarted; it had a manageable size, making it easier to investigate.  Since I am not that familiar with the Java debugging tools, I went with what I call a <em>clickodrome interface</em>, a UI that lets you interact solely with mouse clicks:  <a href="https://visualvm.github.io/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://visualvm.github.io/</a></p>

<p>Once the heap dump was loaded, I could easily access objects. Notably the <tt class="remarkup-monospaced">org.eclipse.jetty.server.session.Session</tt> objects had a property <tt class="remarkup-monospaced">expiry=0</tt>, often an indication of no expiry at all.  Expired sessions are cleared by Jetty via a HouseKeeper thread which inspects sessions and deletes expired ones. I confirmed it does run every 600 seconds, but since sessions are set to not expire, they pile up, leading to the memory leak.</p>

<p>On December 24th, a day before Christmas, I filed a private security issue to upstream (now public): <a href="https://bugs.chromium.org/p/gerrit/issues/detail?id=13858" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://bugs.chromium.org/p/gerrit/issues/detail?id=13858</a></p>

<p>After the Christmas and weekend break upstream acknowledged and I did more investigating to pinpoint the source of the issue. The sessions are created by a <tt class="remarkup-monospaced">SessionHandler</tt> and debug logs show: <tt class="remarkup-monospaced">dftMaxIdleSec=-1</tt> or <em>Default maximum idle seconds</em> set to <tt class="remarkup-monospaced">-1</tt>, which means that by default the sessions are created without any expiry.  The Jetty debug log then gave a bit more insight:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">DEBUG org.eclipse.jetty.server.session : Session xxxx is immortal &amp;&amp; no inactivity eviction</pre></div>

<p>It is immortal and is thus never picked up by the session cleaner:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">DEBUG org.eclipse.jetty.server.session : org.eclipse.jetty.server.session.SessionHandler
==dftMaxIdleSec=-1 scavenging session ids []
                                          ^^^ --- empty array</pre></div>

<p>Our Gerrit instance has several plugins and the leak can potentially come from one of them. I then booted a dummy Gerrit on my machine (<tt class="remarkup-monospaced">java -jar gerrit-3.3.war</tt>), cloned the built-in All-Projects.git repository repeatedly and observed objects with VisualVM.  Jetty sessions with no expiry were created, which rules out plugins and points at Gerrit itself.  Upstream developer Luca Milanesio pointed out that Gerrit creates a Jetty session which is intended for plugins.  I have also narrowed down the leak to only be triggered by git operations made over HTTP.  Eventually, by commenting out a single line of Gerrit code, I eliminated the memory leak and upstream pointed at a change released a few versions ago that may have been the cause.</p>

<p>Upstream then went on to reproduce on their side, took some measurements before and after commenting out the line, and confirmed the leak (750 bytes for each git request made over HTTP).  Given the amount of traffic we receive from humans, systems or bots, it is not surprising we ended up hitting the JVM memory limit rather quickly.</p>

<p>Eventually the fix landed in new Gerrit releases. We upgraded to the new release and haven&#039;t restarted Gerrit since then. Problem solved!</p>

<p><strong>Stuff learned</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Even with no knowledge about a programming language, if you can build and run it, you can still debug using <tt class="remarkup-monospaced">print</tt> or the universal optimization operator: <tt class="remarkup-monospaced">//</tt>.</li>
<li class="remarkup-list-item">Quickly acknowledge upstream hints, ideas and recommendations. Even if it is to dismiss one of their leads.</li>
<li class="remarkup-list-item">Write a report, this blog.</li>
</ul>

<p>Thank you upstream developers Luca Milanesio and David Ostrovsky for fixing the issue!</p>

<p>Thank you <a href="https://phabricator.wikimedia.org/p/dancy/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_268"><span class="phui-tag-core phui-tag-color-person">@dancy</span></a> for the added clarifications as well as typos and grammar fixes.</p>

<h3 class="remarkup-header">References</h3>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T225166" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_262"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T225166</span></span></a> - 2019 issue referencing out of heap</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T263008" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_263"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T263008</span></span></a> - 2020 fresh new task that eventually led to resolution</li>
<li class="remarkup-list-item">Upstream task <a href="https://bugs.chromium.org/p/gerrit/issues/detail?id=13858" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://bugs.chromium.org/p/gerrit/issues/detail?id=13858</a></li>
<li class="remarkup-list-item">The root cause, code adding auditing of git over HTTP commands: <a href="https://gerrit-review.googlesource.com/c/gerrit/+/203611" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit-review.googlesource.com/c/gerrit/+/203611</a></li>
<li class="remarkup-list-item">Fix: <a href="https://gerrit.googlesource.com/gerrit/+/4b2b2d831b39f57720923d5b0c23f1b6d94be0e9" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit.googlesource.com/gerrit/+/4b2b2d831b39f57720923d5b0c23f1b6d94be0e9</a></li>
<li class="remarkup-list-item"><a href="https://nvd.nist.gov/vuln/detail/CVE-2021-22553" class="remarkup-link remarkup-link-ext" rel="noreferrer">CVE-2021-22553</a> - Google Inc. vulnerability report</li>
</ul></div></content></entry><entry><title>Production Excellence #30: March 2021</title><link href="/phame/live/1/post/229/production_excellence_30_march_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/229/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-04-03T00:20:25+00:00</published><updated>2021-05-12T22:57:28+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">Incidents</h5>

<p>2 documented incidents. That&#039;s average for this time of year, when we <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">usually had</a> 1-4 incidents.</p>

<p>Learn about recent incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> Trends</h5>

<p>In March we made significant progress on the outstanding errors of previous months. Several of the 2020 months are finally starting to empty out. But with over 30 new tasks from March itself remaining, we did not break even, and ended up slightly higher than last month. This could be reversing two positive trends, but I hope not.</p>

<p>Firstly, there was a steep increase in the number of new production errors that were not resolved within the same month. This runs counter to the positive trend we started in November. The past four months typically saw 10-20 errors outlive their month of discovery, and this past month saw 34 of its 48 new errors remain unresolved.</p>

<p>Secondly, we saw the overall number of unresolved errors increase again. <a href="https://phabricator.wikimedia.org/phame/post/view/227/production_excellence_28_january_2021/" class="remarkup-link" rel="noreferrer">This January</a> began a downward trend for the first time in thirteen months, which continued nicely through February. But this past month we failed to break even and pushed upward by one task. I hope this is just a breather and we can continue our way downward.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/wpe4pce45edgshvcriin/PHID-FILE-qbs6i6pulphglxpvvlxu/proderr-monthly_2021-04-01.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_269"><img src="https://phab.wmfusercontent.org/file/data/wpe4pce45edgshvcriin/PHID-FILE-qbs6i6pulphglxpvvlxu/proderr-monthly_2021-04-01.png" height="344" alt="Unresolved error reports, stacked by month." /></a></div><br />
<div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/325aijec64hoa47wvmdl/PHID-FILE-yejdh7dqgvnfvqnj2d7t/proderr-totals_2021-04-01.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_270"><img src="https://phab.wmfusercontent.org/file/data/325aijec64hoa47wvmdl/PHID-FILE-yejdh7dqgvnfvqnj2d7t/proderr-totals_2021-04-01.png" height="308" alt="Total open production error tasks, by month." /></a></div></p>

<p>Month-over-month plots based on <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">Outstanding errors</h5>

<p>Take a look at the workboard and look for tasks that could use your help:</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_15" aria-hidden="true"></span>View Workboard</span></a></span></p>

<p>Summary over recent months, per <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet</a>:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td>Jul 2019 (0 of 18 left)</td><td>✅ Last two tasks resolved!</td><td>-2</td></tr>
<tr><td>Aug 2019 (1 of 14 left)</td><td>⚠️ <em>Unchanged (over one year old).</em></td><td></td></tr>
<tr><td>Oct 2019 (3 of 12 left)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Nov 2019 (0 of 5 left)</td><td>✅ Last task resolved!</td><td>-1</td></tr>
<tr><td>Dec 2019 (0 of 9 left)</td><td>✅ Last task resolved!</td><td>-1</td></tr>
<tr><td>Jan 2020 (1 of 7 left)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Feb 2020 (0 of 7 left)</td><td>✅ Last task resolved!</td><td>-1</td></tr>
<tr><td>Mar 2020 (2 of 2 left)</td><td>⚠️ <em>Unchanged (over one year old).</em></td><td></td></tr>
<tr><td>Apr 2020 (5 of 14 left)</td><td>⬇️ Four tasks resolved.</td><td>-4</td></tr>
<tr><td>May 2020 (5 of 14 left)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Jun 2020 (6 of 14 left)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Jul 2020 (5 of <a href="https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R" class="remarkup-link" rel="noreferrer">24 issues</a>)</td><td>⬇️ Four tasks resolved.</td><td>-4</td></tr>
<tr><td>Aug 2020 (15 of <a href="https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R" class="remarkup-link" rel="noreferrer">53 issues</a>)</td><td>⬇️ Five tasks resolved.</td><td>-5</td></tr>
<tr><td>Sep 2020 (7 of <a href="https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R" class="remarkup-link" rel="noreferrer">33 issues</a>)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Oct 2020 (22 of <a href="https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R" class="remarkup-link" rel="noreferrer">69 issues</a>)</td><td>⬇️ Four tasks resolved.</td><td>-4</td></tr>
<tr><td>Nov 2020 (9 of <a href="https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R" class="remarkup-link" rel="noreferrer">38 issues</a>)</td><td>⬇️ Two tasks resolved.</td><td>-2</td></tr>
<tr><td>Dec 2020 (11 of <a href="https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R" class="remarkup-link" rel="noreferrer">33 issues</a>)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Jan 2021 (4 of <a href="https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R" class="remarkup-link" rel="noreferrer">50 issues</a>)</td><td>⬇️ One task resolved.</td><td>-1</td></tr>
<tr><td>Feb 2021 (9 of <a href="https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R" class="remarkup-link" rel="noreferrer">20 issues</a>)</td><td>⬇️ Two tasks resolved.</td><td>-2</td></tr>
<tr><td><strong>Mar 2021</strong> (34 of <a href="https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R" class="remarkup-link" rel="noreferrer">48 issues</a>)</td><td>34 new tasks survived and remain unresolved.</td><td>+48; -14</td></tr>
<tr></tr>
</table></div>



<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Tally</th><th></th></tr>
<tr><td>138</td><td>issues open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/228/production_excellence_29_february_2021/" class="remarkup-link" rel="noreferrer">Excellence #29</a> (6 Mar 2021).</td></tr>
<tr><td>-33</td><td>issues closed, of the previous 138 open issues.</td></tr>
<tr><td>+34</td><td>new issues that survived March 2021.</td></tr>
<tr><td>139</td><td>issues open, as of today (2 Apr 2021).</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p><a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status, Wikitech</a>.<br />
<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia incident stats by Krinkle, CodePen</a>.<br />
<a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">Production Excellence: Month-over-month spreadsheet and plot</a>.<br />
<a href="https://phabricator.wikimedia.org/project/reports/1055/" class="remarkup-link" rel="noreferrer">Report charts for Wikimedia-production-error project, Phabricator</a>.</p></div></content></entry><entry><title>Production Excellence #29: February 2021</title><link href="/phame/live/1/post/228/production_excellence_29_february_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/228/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-03-06T01:03:09+00:00</published><updated>2021-03-06T01:03:09+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📈   Incidents</h5>

<p>3 documented incidents last month, [1] which is average for the time of year. [2]</p>

<p>Learn about these incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech, and their <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<p>For those with NDA-restricted access, there may be additional <a href="https://drive.google.com/drive/u/0/folders/1d2hlER4s_GPe0TWDqNfa2RQcYFuOdhOo" class="remarkup-link remarkup-link-ext" rel="noreferrer">private incident reports 🔒</a> available.</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>Did you know</strong>: Our <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident reports</a> have switched to using the ISO date format in their titles and listings, for improved readability and edit-ability (esp. when publishing on a later date). So long 20210221, and hello 2021-02-21!</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> 📊   Trends</h5>

<p>In February we saw a continuation of the new downward trend that began <a href="https://phabricator.wikimedia.org/phame/post/view/227/production_excellence_28_january_2021/" class="remarkup-link" rel="noreferrer">this January</a>, which came after twelve months of continued rising. Let&#039;s make sure this trend sticks with us as we work our way through the debt, whilst also learning to have a healthy week-to-week iteration where we monitor and follow-up on any new developments such that they don&#039;t introduce lasting regressions.</p>

<p>The recent tally (issues filed since we started reporting in March 2019) is down to 138 unresolved errors, from 152 last month. The old backlog (pre-2019 issues) also continued its 5-month streak and is down to 148, from 160 last month. If this progress continues we&#039;ll soon have fewer &quot;Old&quot; issues than &quot;Recent&quot; issues. Possibly by the start of 2022 we may be able to report and focus only on our rotation through recent issues, as hopefully by then we are balancing our work such that issues reported in a given month are mostly addressed in that same month, or otherwise later that quarter within 2-3 months. Visually that would manifest as the colored chunks having a short life on the chart, each drawn at a sharp downward angle, instead of being dragged out and building up an ever-taller shortcake. I do like cake, but I prefer the kind I can eat. 🍰</p>

<p>Month-over-month plots based on <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>. [3] [4]</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/fmxtcipjtwkpinbxrn7j/PHID-FILE-eaft3axea5nwajzfm7m7/2021-03-05_proderr-monthly.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_271"><img src="https://phab.wmfusercontent.org/file/data/fmxtcipjtwkpinbxrn7j/PHID-FILE-eaft3axea5nwajzfm7m7/2021-03-05_proderr-monthly.png" height="290" alt="Unresolved error reports stacked by recent month" /></a></div></td><td><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/hj2bgli6ejlrf2xfacze/PHID-FILE-mbzjlcscelaoeqprlqqz/2021-03-05_proderr-totals.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_272"><img src="https://phab.wmfusercontent.org/file/data/hj2bgli6ejlrf2xfacze/PHID-FILE-mbzjlcscelaoeqprlqqz/2021-03-05_proderr-totals.png" height="290" alt="Total open production error tasks, by month" /></a></div></td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖   Outstanding errors</h5>

<p>Summary over recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">⚠️ July 2019 (2 of 18 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ August 2019 (1 of 14 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ October 2019 (4 of 12 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ November 2019 (1 of 5 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ December 2019 (1 of 9 issues): One task resolved (-1).</li>
<li class="remarkup-list-item">⚠️ January 2020 (2 of 7 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ February 2020 (1 of 7 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ March 2020 (2 of 2 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">April 2020 (9 of 14 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">May 2020 (6 of 14 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">June 2020 (7 of 14 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">July 2020 (9 of <a href="https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R" class="remarkup-link" rel="noreferrer">24 new issues</a>): <em>no change</em>.</li>
<li class="remarkup-list-item">August 2020 (20 of <a href="https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R" class="remarkup-link" rel="noreferrer">53 new issues</a>): Two tasks resolved (-2).</li>
<li class="remarkup-list-item">September 2020 (9 of <a href="https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R" class="remarkup-link" rel="noreferrer">33 new issues</a>): Five tasks resolved (-5).</li>
<li class="remarkup-list-item">October 2020 (26 of <a href="https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R" class="remarkup-link" rel="noreferrer">69 new issues</a>): Five tasks resolved (-5).</li>
<li class="remarkup-list-item">November 2020 (11 of <a href="https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R" class="remarkup-link" rel="noreferrer">38 new issues</a>): Three tasks resolved (-3).</li>
<li class="remarkup-list-item">December 2020 (12 of  <a href="https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R" class="remarkup-link" rel="noreferrer">33 new issues</a>): Seven tasks resolved (-7).</li>
<li class="remarkup-list-item">January 2021 (5 of <a href="https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R" class="remarkup-link" rel="noreferrer">50 new issues</a>): Two tasks resolved (-2).</li>
<li class="remarkup-list-item"><strong>February 2021</strong>: 11 of <a href="https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R" class="remarkup-link" rel="noreferrer">20 new issues</a> survived the month and remained unresolved (+20; -9)</li>
</ul>



<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Recent tally</th><th></th></tr>
<tr><td>152</td><td>issues open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/227/production_excellence_28_january_2021/" class="remarkup-link" rel="noreferrer">Excellence #28</a> (16 Feb 2021).</td></tr>
<tr><td>-25</td><td>issues closed since, of the previous 152 open issues.</td></tr>
<tr><td>+11</td><td>new issues that survived Feb 2021.</td></tr>
<tr><td>138</td><td>issues open, as of today 5 Mar 2021.</td></tr>
<tr></tr>
</table></div>

<p>For the on-going month of March 2021, we&#039;ve got <a href="https://phabricator.wikimedia.org/maniphest/query/aDItN1Nm0ron/#R" class="remarkup-link" rel="noreferrer">12 new issues</a> so far.</p>

<p>Take a look at <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">the workboard</a> and look for tasks that could use your help!</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_16" aria-hidden="true"></span>View Workboard</span></a></span></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉   Thanks!</h5>

<p>Thank you to everyone else who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status Wikitech</a>.<br />
[2] <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia incident stats by Krinkle, CodePen</a>.<br />
[3] <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml" class="remarkup-link remarkup-link-ext" rel="noreferrer">Month-over-month, Production Excellence spreadsheet</a>.<br />
[4] <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">Open tasks, Wikimedia-prod-error, Phabricator</a>.</p></div></content></entry><entry><title>Production Excellence #28: January 2021</title><link href="/phame/live/1/post/227/production_excellence_28_january_2021/" /><id>https://phabricator.wikimedia.org/phame/post/view/227/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-02-19T06:45:02+00:00</published><updated>2021-03-05T23:57:55+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📈   Incidents</h5>

<p>1 documented incident last month. That&#039;s the third month in a row that we are at or near zero major incidents – not bad! [1] [2]</p>

<p>Learn about recent incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>Did you know</strong>: Our <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status</a> page provides a green-yellow status reflection over the past ten days, with a link to the most recent incident doc if there was any during that time.</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> 📊   Trends</h5>

<p>This January saw a small recovery in our otherwise negative upward trend. For the first time in twelve months, more reports were closed than new reports outlived the previous month without resolution. What happened twelve months ago? In January 2020, we also saw a small recovery during the otherwise upward trend before and after it.</p>

<p>Perhaps it&#039;s something about the post-December holidays that temporarily improves the quality and/or reduces the quantity of code changes. Only time will tell if this is the start of a new positive trend, or merely a post-holiday break. [3]</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/6un5ecki7ec2pvjtgviz/PHID-FILE-ziovncq43bjczexwxvtr/2021-02-16_proderr-monthly.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_273"><img src="https://phab.wmfusercontent.org/file/data/6un5ecki7ec2pvjtgviz/PHID-FILE-ziovncq43bjczexwxvtr/2021-02-16_proderr-monthly.png" height="384" alt="Unresolved error reports stacked by recent month" /></a></div></p>

<p>While our month-to-month trend might not (yet) be improving, we do see persistent improvements in our overall backlog of pre-2019 reports. This is in part because we generally don&#039;t file new reports there, so it makes sense that it doesn&#039;t go back up, but it&#039;s still good to see downward progress every month, unlike with reports from more recent months which often see no change month-to-month (see &quot;Outstanding errors&quot; below, for example).</p>

<p>This positive trend on our &quot;Old&quot; backlog started in October 2020 and has consistently progressed every month since then (refer to the &quot;Old&quot; numbers in red on the below chart, or the same column in the <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet</a>). [3][4]</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/slgtmuuvxaubyunrwduw/PHID-FILE-7qeuw4ijmcf64l77futs/2021-02-16_proderr-totals.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_274"><img src="https://phab.wmfusercontent.org/file/data/slgtmuuvxaubyunrwduw/PHID-FILE-7qeuw4ijmcf64l77futs/2021-02-16_proderr-totals.png" height="371" alt="Total open production error tasks, by month" /></a></div></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖   Outstanding errors</h5>

<p>Summary over recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">⚠️ July 2019 (2 of 18 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ August 2019 (1 of 14 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">✅  September 2019 (0 of 12 issues): Last two tasks were resolved (-2).</li>
<li class="remarkup-list-item">⚠️ October 2019 (4 of 12 issues): One task resolved (-1).</li>
<li class="remarkup-list-item">⚠️ November 2019 (1 of 5 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ December 2019 (2 of 9 issues): Two tasks resolved (-2).</li>
<li class="remarkup-list-item">⚠️ January 2020 (2 of 7 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ February 2020 (1 of 7 issues left): One task resolved (-1).</li>
<li class="remarkup-list-item">March 2020 (2 of 2 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">April 2020 (9 of 14 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">May 2020 (6 of 14 issues left): One task resolved (-1).</li>
<li class="remarkup-list-item">June 2020 (7 of 14 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">July 2020 (9 of <a href="https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R" class="remarkup-link" rel="noreferrer">24 new issues</a>): <em>no change</em>.</li>
<li class="remarkup-list-item">August 2020 (22 of <a href="https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R" class="remarkup-link" rel="noreferrer">53 new issues</a>): One task resolved (-1).</li>
<li class="remarkup-list-item">September 2020 (13 of <a href="https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R" class="remarkup-link" rel="noreferrer">33 new issues</a>): One task resolved (-1).</li>
<li class="remarkup-list-item">October 2020 (31 of <a href="https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R" class="remarkup-link" rel="noreferrer">69 new issues</a>): Four tasks fixed (-4).</li>
<li class="remarkup-list-item">November 2020 (14 of <a href="https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R" class="remarkup-link" rel="noreferrer">38 new issues</a>): <em>no change</em>.</li>
<li class="remarkup-list-item">December 2020 (19 of <a href="https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R" class="remarkup-link" rel="noreferrer">33 new issues</a>): Three tasks resolved (-3).</li>
<li class="remarkup-list-item"><strong>January 2021</strong>: 7 of <a href="https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R" class="remarkup-link" rel="noreferrer">50 new issues</a> survived the month and remained unresolved (+50; -43)</li>
</ul>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Recent tally</th><th></th></tr>
<tr><td>160</td><td>issues open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/219/production_excellence_27_december_2020/" class="remarkup-link" rel="noreferrer">Excellence #27</a> (4 Feb 2021).</td></tr>
<tr><td>-15</td><td>issues closed since, of the previous 160 open issues.</td></tr>
<tr><td>+7</td><td>new issues that survived January 2021.</td></tr>
<tr><td>152</td><td>issues open, as of today (16 Feb 2021).</td></tr>
<tr></tr>
</table></div>

<p>January saw  <span class="visual-only phui-icon-view phui-font-fa fa-arrow-circle-up red" data-meta="0_275" aria-hidden="true"></span> <strong>+50 new production errors</strong> reported in a single month, which is an unfortunate all-time high. However, we&#039;ve also done remarkably well on addressing <span class="visual-only phui-icon-view phui-font-fa fa-arrow-circle-down green" data-meta="0_276" aria-hidden="true"></span> <strong>43</strong> of them within a month, when the potential root cause and diagnostics data were still fresh in our minds. Well done!</p>

<p>For the on-going month of February, there have been <a href="https://phabricator.wikimedia.org/maniphest/query/xjFr73QLJYlE/#R" class="remarkup-link" rel="noreferrer">16 new issues</a> reported so far.</p>

<p>Take a look at <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">the workboard</a> and look for tasks that could use your help!</p>

<p><span class="remarkup-nav-sequence"><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view " target="_blank" rel="noreferrer"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-list-alt" data-meta="0_17" aria-hidden="true"></span>View Workboard</span></a></span></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉   Thanks!</h5>

<p>Thank you to everyone else who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] <a href="https://wikitech.wikimedia.org/wiki/Incident_status" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident status Wikitech</a>.<br />
[2] <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia incident stats by Krinkle, CodePen</a>.<br />
[3] <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit" class="remarkup-link remarkup-link-ext" rel="noreferrer">Month-over-month, Production Excellence spreadsheet</a>.<br />
[4] <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">Open tasks, Wikimedia-prod-error, Phabricator</a>.</p></div></content></entry><entry><title>Production Excellence #27: December 2020</title><link href="/phame/live/1/post/219/production_excellence_27_december_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/219/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2021-02-04T05:46:03+00:00</published><updated>2021-02-04T18:35:07+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📈   Incidents</h5>

<p>1 documented incident in December. [1] In previous years, December typically had 4 or fewer documented incidents. [3]</p>

<p>Learn about recent incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> 📊   Trends</h5>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/jvfj6u2mg7662yuphdtk/PHID-FILE-asss4ok3n3thzq3rkdid/20210127-proderror-monthly.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_277"><img src="https://phab.wmfusercontent.org/file/data/jvfj6u2mg7662yuphdtk/PHID-FILE-asss4ok3n3thzq3rkdid/20210127-proderror-monthly.png" height="384" alt="Unresolved error reports stacked by recent month" /></a></div> <div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/bqbbgiwp7ya526eoeu3b/PHID-FILE-2sb6727l6rzsfowcyv54/20210127-proderror-totals.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_278"><img src="https://phab.wmfusercontent.org/file/data/bqbbgiwp7ya526eoeu3b/PHID-FILE-2sb6727l6rzsfowcyv54/20210127-proderror-totals.png" height="372" alt="Total open production error tasks, by month" /></a></div></p>

<p>Month-over-month plots based on <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>. [4] [2]</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖   Outstanding errors</h5>

<p>Take a look at the workboard and look for tasks that could use your help.<br />
→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Summary over recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">⚠️ July 2019 (2 of 18 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ August 2019 (1 of 14 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ September 2019 (2 of 12 issues): One task resolved (-1).</li>
<li class="remarkup-list-item">⚠️ October 2019 (5 of 12 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ November 2019 (1 of 5 issues): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ December 2019 (4 of 9 issues), <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ January 2020 (2 of 7 issues), <em>no change</em>.</li>
<li class="remarkup-list-item">February 2020 (2 of 7 issues left), <em>no change</em>.</li>
<li class="remarkup-list-item">March 2020 (2 of 2 issues left), <em>no change</em>.</li>
<li class="remarkup-list-item">April 2020 (9 of 14 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">May 2020 (7 of 14 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">June 2020  (7 of 14 issues left): <em>no change</em>.</li>
<li class="remarkup-list-item">July 2020 (9 of <a href="https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R" class="remarkup-link" rel="noreferrer">24 new issues</a>): <em>no change</em>.</li>
<li class="remarkup-list-item">August 2020 (23 of <a href="https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R" class="remarkup-link" rel="noreferrer">53 new issues</a>): <em>no change</em>.</li>
<li class="remarkup-list-item">September 2020 (13 of <a href="https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R" class="remarkup-link" rel="noreferrer">33 new issues</a>): One task resolved (-1).</li>
<li class="remarkup-list-item">October 2020 (35 of <a href="https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R" class="remarkup-link" rel="noreferrer">69 new issues</a>): Four issues fixed (-4).</li>
<li class="remarkup-list-item">November 2020 (14 of <a href="https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R" class="remarkup-link" rel="noreferrer">38 new issues</a>): Five issues fixed (-5).</li>
<li class="remarkup-list-item"><strong>December 2020</strong>: 22 of <a href="https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R" class="remarkup-link" rel="noreferrer">33 new issues</a> survived the month and remained unresolved (+33; -22)</li>
</ul>



<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Recent tally</th><th></th></tr>
<tr><td>149</td><td>as of <a href="https://phabricator.wikimedia.org/phame/post/view/218/production_excellence_26_november_2020/" class="remarkup-link" rel="noreferrer">Excellence #26</a> (15 Dec 2020).</td></tr>
<tr><td>-11</td><td>closed of the 149 recent issues.</td></tr>
<tr><td>+22</td><td>new issues survived December 2020.</td></tr>
<tr><td>160 <span class="visual-only phui-icon-view phui-font-fa fa-arrow-circle-up red" data-meta="0_279" aria-hidden="true"></span></td><td>as of 27 Jan 2021.</td></tr>
<tr></tr>
</table></div>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉   Thanks!</h5>

<p>Thank you to everyone else who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2021" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation 2020, Wikitech</a>.<br />
[2] <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">Open tasks, Wikimedia-prod-error, Phabricator</a>.<br />
[3] <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia incident stats by Krinkle, CodePen</a>.<br />
[4] <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit" class="remarkup-link remarkup-link-ext" rel="noreferrer">Month-over-month, Production Excellence spreadsheet</a>.</p></div></content></entry><entry><title>Production Excellence #26: November 2020</title><link href="/phame/live/1/post/218/production_excellence_26_november_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/218/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-12-15T21:49:56+00:00</published><updated>2021-02-04T18:34:14+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📈   Incidents</h5>

<p>Zero documented incidents in November. [1] That&#039;s the only month this year without any (publicly documented) incidents. In 2019, November was also the only such month. [3]</p>

<p>Learn about recent incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header"><a name="trends"></a> 📊   Trends</h5>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/r3qxgidqo5h6ferwpthh/PHID-FILE-bwtbft5y4gtyevwt3wyv/20201215-proderror-monthly.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_280"><img src="https://phab.wmfusercontent.org/file/data/r3qxgidqo5h6ferwpthh/PHID-FILE-bwtbft5y4gtyevwt3wyv/20201215-proderror-monthly.png" height="384" alt="Unresolved error reports stacked by recent month" /></a></div> <div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/vracfxes6yx7qqb4ieon/PHID-FILE-dxrnvgt3upvh6oy4oovd/20201215-proderror-totals.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_281"><img src="https://phab.wmfusercontent.org/file/data/vracfxes6yx7qqb4ieon/PHID-FILE-dxrnvgt3upvh6oy4oovd/20201215-proderror-totals.png" height="372" alt="Total open production error tasks, by month" /></a></div></p>

<p>The overall increase in errors was relatively low this past month, similar to the November-December period last year.</p>

<p>What&#039;s new is that we can start to see a positive trend emerging in the backlogs where we&#039;ve shrunk issue count three months in a row, from the 233 high in October, down to the 181 <span class="visual-only phui-icon-view phui-font-fa fa-arrow-circle-down green" data-meta="0_282" aria-hidden="true"></span> we have in the ol&#039; backlog today.</p>

<p>Month-over-month plots based on <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>. [4]</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖   Outstanding errors</h5>

<p>Take a look at the workboard and look for tasks that could use your help.<br />
→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Summary over recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">⚠️ July 2019 (2 of 18 tasks): One task closed (-1).</li>
<li class="remarkup-list-item">⚠️ August 2019 (1 of 14 tasks): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ September 2019 (3 of 12 tasks): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ October 2019 (5 of 12 tasks): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ November 2019 (1 of 5 tasks): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ December 2019 (3 of 9 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">January 2020 (3 of 7 tasks left), One task closed (-1).</li>
<li class="remarkup-list-item">February (2 of 7 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">March (2 of 2 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">April (9 of 14 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">May (7 of 14 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">June (7 of 14 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">July 2020 (9 of <a href="https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R" class="remarkup-link" rel="noreferrer">24 new tasks</a>): <em>no change</em>.</li>
<li class="remarkup-list-item">August 2020 (23 of <a href="https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R" class="remarkup-link" rel="noreferrer">53 new tasks</a>): Three tasks closed (-3).</li>
<li class="remarkup-list-item">September 2020 (14 of <a href="https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R" class="remarkup-link" rel="noreferrer">33 new tasks</a>): One task closed (-1).</li>
<li class="remarkup-list-item">October 2020 (39 of <a href="https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R" class="remarkup-link" rel="noreferrer">69 new tasks</a>): Six tasks closed (-6).</li>
<li class="remarkup-list-item"><strong>November 2020</strong>: 19 of <a href="https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R" class="remarkup-link" rel="noreferrer">38 new tasks</a> survived the month and remain open today (+38; -19)</li>
</ul>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Recent tally</th><th></th></tr>
<tr><td>142</td><td>as of <a href="https://phabricator.wikimedia.org/phame/post/view/213/production_excellence_25_october_2020/" class="remarkup-link" rel="noreferrer">Excellence #25</a> (23 Oct 2020).</td></tr>
<tr><td>-12</td><td>closed of the 142 recent tasks.</td></tr>
<tr><td>+19</td><td>survived November 2020.</td></tr>
<tr><td>149 <span class="visual-only phui-icon-view phui-font-fa fa-arrow-circle-up red" data-meta="0_283" aria-hidden="true"></span></td><td>as of today, 15 Dec 2020.</td></tr>
<tr></tr>
</table></div>

<p>The on-going month of December has <a href="https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R" class="remarkup-link" rel="noreferrer">19 unresolved tasks</a> so far.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉   Thanks!</h5>

<p>Thank you to everyone else who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><a href="https://en.wikiquote.org/wiki/The_Grand_Budapest_Hotel" class="remarkup-link remarkup-link-ext" rel="noreferrer">👨🏻 Monsieur Gustave H.</a>:</div>
<div class="remarkup-reply-body"><p>❝   The plot &quot;thickens&quot; as they say. Why, by the way? Is it a soup metaphor? ❞</p></div>
</blockquote>

<p>Footnotes:</p>

<p>[1] <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2020" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation 2020, Wikitech</a>.<br />
[2] <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">Open tasks, Wikimedia-prod-error, Phabricator</a>.<br />
[3] <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia incident stats, Krinkle, CodePen</a>.<br />
[4] <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit" class="remarkup-link remarkup-link-ext" rel="noreferrer">Month-over-month, Production Excellence (spreadsheet)</a>.</p></div></content></entry><entry><title>Runnable runbooks</title><link href="/phame/live/1/post/217/runnable_runbooks/" /><id>https://phabricator.wikimedia.org/phame/post/view/217/</id><author><name>mmodell (Mukunda Modell)</name></author><published>2020-12-11T23:51:17+00:00</published><updated>2023-09-03T05:36:30+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Recently there has been a small effort on the <a href="/tag/release-engineering-team/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_288"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_287" aria-hidden="true"></span>Release-Engineering-Team</span></a> to encode some of our institutional knowledge as runbooks linked from a <a href="https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Runbooks" class="remarkup-link remarkup-link-ext" rel="noreferrer">page</a> in the team&#039;s wiki space.</p>

<p>What are runbooks, you might ask? This is how they are described on the aforementioned wiki page:</p>

<blockquote><p>This is a list of runbooks for the Wikimedia Release Engineering Team, covering step-by-step lists of what to do when things need doing, especially when things go wrong.</p></blockquote>

<p>So runbooks are each essentially a sequence of commands, intended to be pasted into a shell by a human: step-by-step instructions that are intended to help the reader accomplish an anticipated task or resolve a previously-encountered issue.</p>

<p>Presumably runbooks are created when someone encounters an issue, and, recognizing that it might happen again, helpfully documents the steps that were used to resolve said issue.</p>

<p>This all seems pretty sensible at first glance. This type of documentation can be really valuable when you&#039;re in an unexpected situation or trying to accomplish a task that you&#039;ve never attempted before. Just about anyone reading this probably has some experience running shell commands pasted from some online tutorials, setup instructions for a program, etc.</p>

<p>Despite the obvious value runbooks can provide, I&#039;ve come to harbor a fairly strong aversion to the idea of encoding what are essentially shell scripts as individual commands on a wiki page. As someone whose job involves a lot of automation, I would usually much prefer a shell script, a Python program, or even a &quot;<a href="https://www.mediawiki.org/wiki/Manual:Maintenance_scripts" class="remarkup-link remarkup-link-ext" rel="noreferrer">maintenance script</a>&quot; over a runbook.</p>

<p>After a lot of contemplation, I&#039;ve identified a few reasons that I don&#039;t like runbooks on wiki pages:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Runbooks are tedious and prone to human errors.<ul class="remarkup-list">
<li class="remarkup-list-item">It&#039;s easy to lose track of where you are in the process.</li>
<li class="remarkup-list-item">It&#039;s easy to accidentally skip a step.</li>
<li class="remarkup-list-item">It&#039;s easy to make typos.</li>
</ul></li>
<li class="remarkup-list-item">A script can be code reviewed and version controlled in git.</li>
<li class="remarkup-list-item">A script can validate its arguments, which helps to catch typos.</li>
<li class="remarkup-list-item">I think that command line terminal input is more like code than it is prose. I am more comfortable editing code in my usual text editor as opposed to editing in a web browser. The wikitext editor is sufficient for basic text editing, and visual editor is quite nice for rich text editing, but neither is ideal for editing code.</li>
</ul>
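
<p>As a rough illustration of the argument-validation point in the list above, here is a minimal sketch in Python. It assumes a hypothetical runbook step that takes a version string such as 1.36.0-wmf.30; the pattern, the function names and the step itself are made up for this example and are not taken from any actual runbook or deployment tool.</p>

```python
import argparse
import re

# Hypothetical pattern for a version argument such as "1.36.0-wmf.30".
# This is an assumption made for the example, not a real tool's rule.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+-wmf\.\d+")


def version_arg(value):
    """Return value unchanged if it looks like a version, else reject it."""
    if VERSION_RE.fullmatch(value) is None:
        # argparse turns this into a usage error and a non-zero exit.
        raise argparse.ArgumentTypeError(f"not a valid version: {value!r}")
    return value


def parse_args(argv):
    """Parse the command line for a single (hypothetical) runbook step."""
    parser = argparse.ArgumentParser(description="Example runbook step")
    parser.add_argument("version", type=version_arg)
    return parser.parse_args(argv)
```

<p>With this in place, a mistyped argument such as 1.36.0-wmf30 makes the script stop with a usage error before any command runs, whereas the same typo pasted into a shell from a wiki page would simply be executed.</p>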

<p>I do realize that MediaWiki does version control. I also realize that sometimes you just can&#039;t be bothered to write and debug a robust shell script to address some rare circumstances. The cost is high and it&#039;s uncertain whether the script will be worth such an effort. In those situations a runbook might be the perfect way to contribute to collective knowledge without investing a lot of time into perfecting a script.</p>

<p>My favorite web comic, xkcd, has a few things to say about this subject:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/gdh4hk7334s3xrpk2izh/PHID-FILE-x2pb5grw5h4juhvlw6jq/the_general_problem.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_284"><img src="https://phab.wmfusercontent.org/file/data/gdh4hk7334s3xrpk2izh/PHID-FILE-x2pb5grw5h4juhvlw6jq/the_general_problem.png" width="220" alt="the_general_problem.png (230×550 px, 23 KB)" /></a></div><span class="visual-only phui-icon-view phui-font-fa fa-creative-commons" data-meta="0_289" aria-hidden="true"></span> &quot;The General Problem&quot; <a href="https://xkcd.com/974/" class="remarkup-link remarkup-link-ext" rel="noreferrer">xkcd #974</a>.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/gze66i2bh2z6wcq5z6se/PHID-FILE-3xybreju4pa5l62rqu6n/automation_2x.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_285"><img src="https://phab.wmfusercontent.org/file/data/gze66i2bh2z6wcq5z6se/PHID-FILE-3xybreju4pa5l62rqu6n/automation_2x.png" width="220" alt="automation_2x.png (817×807 px, 55 KB)" /></a></div><span class="visual-only phui-icon-view phui-font-fa fa-creative-commons" data-meta="0_290" aria-hidden="true"></span> &quot;Automation&quot; <a href="https://xkcd.com/1319/" class="remarkup-link remarkup-link-ext" rel="noreferrer">xkcd #1319</a>.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/ugq6itxtk4kflomckhdc/PHID-FILE-23scaordz7o6vesi7dwa/is_it_worth_the_time_2x.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_286"><img src="https://phab.wmfusercontent.org/file/data/ugq6itxtk4kflomckhdc/PHID-FILE-23scaordz7o6vesi7dwa/is_it_worth_the_time_2x.png" width="220" alt="is_it_worth_the_time_2x.png (927×1 px, 125 KB)" /></a></div><span class="visual-only phui-icon-view phui-font-fa fa-creative-commons" data-meta="0_291" aria-hidden="true"></span> &quot;Is It Worth the Time?&quot; <a href="https://xkcd.com/1205/" class="remarkup-link remarkup-link-ext" rel="noreferrer">xkcd #1205</a>.</p>

<h2 class="remarkup-header">Potential Solutions</h2>

<p>I&#039;ve been pondering a solution to these issues for a long time, mostly motivated by the pain I have experienced (and the mistakes I&#039;ve made) while executing <a href="https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys" class="remarkup-link remarkup-link-ext" rel="noreferrer">the biggest runbook of all</a> on a regular basis.</p>

<p>Over the past couple of years I&#039;ve come across some promising ideas which I think can help with the problems I&#039;ve identified with runbooks. I think that one of the most interesting is <a href="https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-the-key-to-gradual-automation/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Do-nothing scripting</a>. Dan Slimmon identifies some of the same problems that I&#039;ve detailed here. He uses the term <em>slog</em> to refer to long and tedious procedures like the <a href="https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia Train Deploys</a>. The proposed solution comes in the form of a do-nothing script. You should go read that article; it&#039;s not very long. Here are a few relevant quotes:</p>

<blockquote><p>Almost any slog can be turned into a do-nothing script. A do-nothing script is a script that encodes the instructions of a slog, encapsulating each step in a function.</p></blockquote>

<p>...</p>

<blockquote><p>At first glance, it might not be obvious that this script provides value. Maybe it looks like all we’ve done is make the instructions harder to read. But the value of a do-nothing script is immense:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">It’s now much less likely that you’ll lose your place and skip a step. This makes it easier to maintain focus and power through the slog.</li>
<li class="remarkup-list-item">Each step of the procedure is now encapsulated in a function, which makes it possible to replace the text in any given step with code that performs the action automatically.</li>
<li class="remarkup-list-item">Over time, you’ll develop a library of useful steps, which will make future automation tasks more efficient.</li>
</ul>

<p>A do-nothing script doesn’t save your team any manual effort. It lowers the activation energy for automating tasks, which allows the team to eliminate toil over time.</p></blockquote>
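<p>To make the idea concrete, here is a minimal sketch of a do-nothing script in Python. The step names, the wording of the instructions and the <tt class="remarkup-monospaced">version</tt> value are all hypothetical and only illustrate the shape: each step is a function that returns instructions, and a small loop shows them one at a time.</p>

```python
"""Do-nothing script sketch: every step only tells the operator what to do."""


def cut_branch(context):
    # Hypothetical manual step; the wording is illustrative, not a real procedure.
    return "Create the new release branch for version {version}.".format(**context)


def notify_team(context):
    # Hypothetical manual step.
    return "Announce version {version} in the team channel.".format(**context)


STEPS = [cut_branch, notify_team]


def run(steps, context, confirm=input, show=print):
    """Show each step in order and wait for the operator to confirm it."""
    completed = []
    for number, step in enumerate(steps, start=1):
        show("Step {}: {}".format(number, step(context)))
        confirm("Press Enter when done: ")
        completed.append(step.__name__)
    return completed
```

<p>Calling <tt class="remarkup-monospaced">run(STEPS, {"version": "1.0.0"})</tt> walks through the steps interactively. Because each step is already a function, any one of them can later be replaced with code that performs the action, without touching the rest of the script.</p>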

<p>I was inspired by this and I think it&#039;s a fairly clever solution to the problems identified. What if we combined the best aspects of gradual automation with the best aspects of a wiki-based runbook? Others were inspired by this as well, resulting in tools like <span class="visual-only phui-icon-view phui-font-fa fa-github" data-meta="0_292" aria-hidden="true"></span> <a href="https://github.com/braintree/runbook" class="remarkup-link remarkup-link-ext" rel="noreferrer">braintree/runbook</a>, <span class="visual-only phui-icon-view phui-font-fa fa-github" data-meta="0_293" aria-hidden="true"></span> <a href="https://github.com/earldouglas/codedown" class="remarkup-link remarkup-link-ext" rel="noreferrer">codedown</a> and the one I&#039;m most interested in, <span class="visual-only phui-icon-view phui-font-fa fa-github" data-meta="0_294" aria-hidden="true"></span> <a href="https://github.com/eclecticiq/rundoc" class="remarkup-link remarkup-link-ext" rel="noreferrer">rundoc</a>.</p>

<h2 class="remarkup-header">Runnable Runbooks</h2>

<p>My ideal tool would combine code and instructions in a free-form &quot;<a href="http://www.literateprogramming.com/" class="remarkup-link remarkup-link-ext" rel="noreferrer">literate programming</a>&quot; style. By following some simple conventions in our runbooks we can use a tool to parse and execute the embedded code blocks in a controlled manner. With a little bit of tooling we can gain many benefits:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">The tooling will keep track of the steps to execute, ensuring that no steps are missed.</li>
<li class="remarkup-list-item">The tooling can ensure that errors aren&#039;t missed by carefully checking and logging the result of each step.</li>
<li class="remarkup-list-item">We could also provide a mechanism for inputting the values of any variables / arguments and validate the format of user input.</li>
<li class="remarkup-list-item">With flexible control flow management we can even allow resuming from anywhere in the middle of a runbook after an aborted run.</li>
<li class="remarkup-list-item">Manual steps can just consist of a block of prose that gets displayed to the operator. With embedded markup we can format the instructions nicely and render them in the terminal using <a href="https://rich.readthedocs.io/en/latest/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Rich</a>. Once the operator confirms that the step is complete, the workflow moves on to the next step.</li>
</ul>
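<p>As a rough illustration of the first, second and fourth points, the sketch below pulls fenced code blocks out of a Markdown-style runbook, runs them in order, stops at the first failing step, and can resume from a given step. This is not how any of the tools mentioned in this post are implemented; the function names, and the assumption that every block is a bash snippet, are mine.</p>

```python
import subprocess

FENCE = "`" * 3  # the three-backtick Markdown fence marker


def extract_steps(text):
    """Return the code blocks found in a Markdown-style runbook, in order."""
    steps, current = [], None
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            if current is None:
                current = []  # opening fence: start collecting lines
            else:
                steps.append("\n".join(current))
                current = None  # closing fence: the step is complete
        elif current is not None:
            current.append(line)
    return steps


def run_shell(code):
    """Run one step with bash and return its exit status."""
    return subprocess.run(["bash", "-c", code]).returncode


def run_steps(steps, start=0, runner=run_shell):
    """Run steps from index `start`, stopping at the first failure.

    Returns the index of the failed step, or None if every step succeeded.
    """
    for index in range(start, len(steps)):
        if runner(steps[index]) != 0:
            return index
    return None
```

<p>After an aborted run, passing the returned index back in as <tt class="remarkup-monospaced">start</tt> resumes from the step that failed. The <tt class="remarkup-monospaced">runner</tt> argument is there so that a step can be logged, confirmed by the operator, or dry-run instead of executed.</p>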



<h2 class="remarkup-header">Prior Art</h2>

<p>I&#039;ve found a few projects that already implement many of these ideas. Here are a few of the most relevant:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><span class="visual-only phui-icon-view phui-font-fa fa-github" data-meta="0_295" aria-hidden="true"></span> <a href="https://github.com/braintree/runbook" class="remarkup-link remarkup-link-ext" rel="noreferrer">braintree/runbook</a></li>
<li class="remarkup-list-item"><span class="visual-only phui-icon-view phui-font-fa fa-github" data-meta="0_296" aria-hidden="true"></span> <a href="https://github.com/earldouglas/codedown" class="remarkup-link remarkup-link-ext" rel="noreferrer">codedown</a></li>
<li class="remarkup-list-item"><span class="visual-only phui-icon-view phui-font-fa fa-github" data-meta="0_297" aria-hidden="true"></span> <a href="https://github.com/eclecticiq/rundoc" class="remarkup-link remarkup-link-ext" rel="noreferrer">rundoc</a></li>
</ul>

<p>The one I&#039;m most interested in is Rundoc. It&#039;s almost exactly the tool that I would have created. In fact, I started writing code before discovering rundoc but once I realized how closely this matched my ideal solution, I decided to abandon my effort. Instead I will add a couple of missing features to Rundoc in order to get everything that I want and hopefully I can contribute my enhancements back upstream for the benefit of others.</p>

<p>Demo: <a href="https://asciinema.org/a/MKyiFbsGzzizqsGgpI4Jkvxmx" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://asciinema.org/a/MKyiFbsGzzizqsGgpI4Jkvxmx</a><br />
Source: <a href="https://github.com/20after4/rundoc" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://github.com/20after4/rundoc</a></p>

<h2 class="remarkup-header">References</h2>

<p>[1]: <a href="https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Runbooks" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Runbooks</a> &quot;runbooks&quot;<br />
[2]: <a href="https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys</a> &quot;Train deploys&quot;<br />
[3]: <a href="https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-the-key-to-gradual-automation/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-the-key-to-gradual-automation/</a> &quot;Do-nothing scripting: the key to gradual automation by Dan Slimmon&quot;<br />
[4]: <a href="https://github.com/braintree/runbook" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://github.com/braintree/runbook</a> &quot;runbook by braintree&quot;<br />
[5]: <a href="https://github.com/earldouglas/codedown" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://github.com/earldouglas/codedown</a> &quot;codedown by earldouglas&quot;<br />
[6]: <a href="https://github.com/eclecticiq/rundoc" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://github.com/eclecticiq/rundoc</a> &quot;rundoc by eclecticiq&quot;<br />
[7]: <a href="https://rich.readthedocs.io/en/latest/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://rich.readthedocs.io/en/latest/</a> &quot;Rich python library&quot;</p></div></content></entry><entry><title>Production Excellence #25: October 2020</title><link href="/phame/live/1/post/213/production_excellence_25_october_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/213/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-11-24T05:13:15+00:00</published><updated>2020-11-24T05:50:14+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📈   Incidents</h5>

<p>2 documented incidents in October. [1] Historically,  that&#039;s just below the median of 3 for this time of year. [3]</p>

<p>Learn about recent incidents at <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator. <a name="trends"></a></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📊   Trends</h5>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/xbumgntrwps7oxutossh/PHID-FILE-e2jure5txzfm35z5zf7e/20201123-proderror-monthly.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_298"><img src="https://phab.wmfusercontent.org/file/data/xbumgntrwps7oxutossh/PHID-FILE-e2jure5txzfm35z5zf7e/20201123-proderror-monthly.png" height="410" alt="Unresolved error reports stacked by recent month" /></a></div> <div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/kiqdellzmmgl7y2qah6w/PHID-FILE-dtgzq5pdjqz335h3gzst/20201123-proderror-totals.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_299"><img src="https://phab.wmfusercontent.org/file/data/kiqdellzmmgl7y2qah6w/PHID-FILE-dtgzq5pdjqz335h3gzst/20201123-proderror-totals.png" height="359" alt="Total open production error tasks, by month" /></a></div></p>

<p>Month-over-month plots based on <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>. [4]</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖   Outstanding errors</h5>

<blockquote><p>Take a look at the workboard and look for tasks that could use your help.<br />
→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p></blockquote>

<p>Summary over recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">⚠️ July 2019 (3 of 18 tasks left): One task closed.</li>
<li class="remarkup-list-item">⚠️ August 2019 (1 of 14 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ September 2019 (3 of 12 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ October 2019 (5 of 12 tasks left): One task closed.</li>
<li class="remarkup-list-item">⚠️ November 2019 (1 of 5 tasks left): Two tasks closed.</li>
<li class="remarkup-list-item">December (3 of 9 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">January 2020 (4 of 7 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">February (2 of 7 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">March (2 of 2 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">April (9 of 14 tasks left): One task closed.</li>
<li class="remarkup-list-item">May (7 of 14 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">June (7 of 14 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">July 2020 (9 of <a href="https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R" class="remarkup-link" rel="noreferrer">24 new tasks</a>): One task closed.</li>
<li class="remarkup-list-item">August 2020 (26 of <a href="https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R" class="remarkup-link" rel="noreferrer">53 new tasks</a>): Five tasks closed.</li>
<li class="remarkup-list-item">September 2020 (15 of <a href="https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R" class="remarkup-link" rel="noreferrer">33 new tasks</a>): Two tasks closed.</li>
<li class="remarkup-list-item"><strong>October 2020</strong>: 45 of <a href="https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R" class="remarkup-link" rel="noreferrer">69 new tasks</a> survived the month of October and remain open today.</li>
</ul>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Recent tally</th><th></th></tr>
<tr><td>110</td><td>as of <a href="https://phabricator.wikimedia.org/phame/post/view/205/production_excellence_24_september_2020/" class="remarkup-link" rel="noreferrer">Excellence #24</a> (23rd Oct).</td></tr>
<tr><td>-13</td><td>closed of the 110 recent tasks.</td></tr>
<tr><td>+45</td><td>survived October 2020.</td></tr>
<tr><td>142</td><td>as of today, 23rd Nov.</td></tr>
<tr></tr>
</table></div>
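<p>The tally is simple bookkeeping: the previous total, minus the tasks closed since, plus the new tasks that survived the month. A small sketch, with a function name made up for this illustration:</p>

```python
def recent_tally(previous, closed, survived):
    """Open tasks carried over, minus those closed, plus newly surviving ones."""
    return previous - closed + survived


# Figures from the table: 110 carried over, 13 closed, 45 survived October.
print(recent_tally(previous=110, closed=13, survived=45))
```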

<p>For the on-going month of November, there are <a href="https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R" class="remarkup-link" rel="noreferrer">25 new tasks</a> so far.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉   Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><a href="https://en.wikipedia.org/wiki/The_Outsider_(miniseries)" class="remarkup-link remarkup-link-ext" rel="noreferrer"> 👤 </a> Howard Salomon:</div>
<div class="remarkup-reply-body"><p><a href="https://en.wikipedia.org/wiki/The_Outsider_(miniseries)" class="remarkup-link remarkup-link-ext" rel="noreferrer">❝  </a> Problem is when they arrest you, you get put on the justice train, and the train has no brain. <a href="https://en.wikipedia.org/wiki/The_Outsider_(miniseries)" class="remarkup-link remarkup-link-ext" rel="noreferrer">❞  </a></p></div>
</blockquote>

<p>Footnotes:</p>

<p>[1] <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation 2020, Wikitech</a><br />
[2] <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">Open tasks in Wikimedia-prod-error, Phabricator</a><br />
[3] <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia incident stats by Krinkle, CodePen</a><br />
[4] <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing" class="remarkup-link remarkup-link-ext" rel="noreferrer">Month-over-month, Production Excellence (spreadsheet)</a></p></div></content></entry><entry><title>CI now updates your deployment-charts</title><link href="/phame/live/1/post/208/ci_now_updates_your_deployment-charts/" /><id>https://phabricator.wikimedia.org/phame/post/view/208/</id><author><name>jeena (Jeena Huneidi)</name></author><published>2020-09-24T17:34:47+00:00</published><updated>2020-11-17T23:46:14+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>If you&#039;re making changes to a service that is deployed to Kubernetes, it sure is annoying to have to update the helm deployment-chart values with the newest image version before you deploy. At least, that&#039;s how I felt when developing on our dockerfile-generating service, blubber.</p>

<p>Over the last two months we&#039;ve added:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">A script to update the image versions (<a href="https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/update_version/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/update_version/</a>)</li>
<li class="remarkup-list-item">A new &#039;promote&#039; step to PipelineLib (<a href="https://wikitech.wikimedia.org/wiki/PipelineLib" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/PipelineLib</a>) to run the script and create the patchset on merge</li>
<li class="remarkup-list-item">Various infrastructure pieces to ensure necessary packages are available</li>
</ul>

<p>And I&#039;m excited to say that CI can now handle updating image versions for you (after your change has merged), in the form of a change to deployment-charts that you&#039;ll need to +2 in Gerrit. Here&#039;s what you need to do to get this working in your repo:</p>

<p>Add the following to your <tt class="remarkup-monospaced">.pipeline/config.yaml</tt> file&#039;s publish stage:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">promote: true</pre></div>

<p>The above assumes the defaults, which are the same as if you had added:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">promote:
  - chart: &quot;${setup.projectShortName}&quot;  # The project name
    environments: []                    # All environments
    version: &#039;${.imageTag}&#039;             # The image published in this stage</pre></div>

<p>You can specify any of these values, and you can promote to multiple charts, for example:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">promote:
  - chart: &quot;echostore&quot;
    environments: [&quot;staging&quot;, &quot;codfw&quot;]
  - chart: &quot;sessionstore&quot;</pre></div>

<p>The above values would promote the production image published after merging to all environments for the sessionstore service, and only the staging and codfw environments for the echostore service. You can see more examples at <a href="https://wikitech.wikimedia.org/wiki/PipelineLib/Reference#Promote" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/PipelineLib/Reference#Promote</a></p>
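<p>One way to picture how the defaults fill in is the sketch below. It is not PipelineLib&#039;s actual code (PipelineLib is not written in Python); the function name is hypothetical, and the project name and image tag are passed in as already-resolved strings rather than as the template variables shown earlier.</p>

```python
def normalize_promote(value, project_name, image_tag):
    """Expand a `promote` setting into a list of fully specified entries."""
    if value is True:
        value = [{}]  # `promote: true` means one entry that uses every default
    elif not value:
        return []  # absent or false: nothing to promote
    entries = []
    for item in value:
        entries.append({
            "chart": item.get("chart", project_name),
            "environments": item.get("environments", []),  # [] stands for all
            "version": item.get("version", image_tag),
        })
    return entries
```

<p>With the echostore and sessionstore example, the first entry keeps its two listed environments while the second falls back to an empty list, which stands for all environments.</p>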

<p>If your containerized service doesn&#039;t yet have a .pipeline/config.yaml, now is a great time to migrate it! This tutorial can help you with the basics: <a href="https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial#Publishing_Docker_Images" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial#Publishing_Docker_Images</a></p>

<p>This is just one step closer to achieving continuous delivery of our containerized services! I&#039;m looking forward to continuing to make improvements in that area.</p></div></content></entry><entry><title>Production Excellence #24: September 2020</title><link href="/phame/live/1/post/205/production_excellence_24_september_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/205/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-10-23T23:51:43+00:00</published><updated>2020-10-23T23:59:26+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📈   Incidents</h5>

<p>5 documented incidents. [1] Historically, that&#039;s right on average for the time of year. [3]</p>

<p>For more about recent incidents see <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📊   Trends</h5>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/gn4cadlaztehxgdfitqf/PHID-FILE-f2bnbcw5ajqqr5wxzrpr/month-stack-2020-10-23.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_300"><img src="https://phab.wmfusercontent.org/file/data/gn4cadlaztehxgdfitqf/PHID-FILE-f2bnbcw5ajqqr5wxzrpr/month-stack-2020-10-23.png" height="370" alt="month-stack-2020-10-23.png (1×1 px, 113 KB)" /></a></div> <div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/v4owc4guija5lyux3smd/PHID-FILE-lb6iv43wtn7humw4jftb/total-stack-2020-10-23.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_301"><img src="https://phab.wmfusercontent.org/file/data/v4owc4guija5lyux3smd/PHID-FILE-lb6iv43wtn7humw4jftb/total-stack-2020-10-23.png" height="400" alt="total-stack-2020-10-23.png (941×1 px, 97 KB)" /></a></div></p>

<p>Month-over-month plots based on <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing" class="remarkup-link remarkup-link-ext" rel="noreferrer">spreadsheet data</a>. [4]</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖   Outstanding errors</h5>

<blockquote><p>Take a look at the workboard and look for tasks that could use your help.<br />
→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p></blockquote>

<p>Summary over recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">⚠️ July 2019 (4 of 18 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ August 2019 (1 of 14 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ September 2019 (3 of 12 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">⚠️ October 2019 (6 of 12 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">November (3 of 5 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">December (3 of 9 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">January 2020 (4 of 7 tasks left), One task closed.</li>
<li class="remarkup-list-item">February (2 of 7 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">March (2 of 2 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">April (10 of 14 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">May (7 of 14 tasks left): <em>no change</em>.</li>
<li class="remarkup-list-item">June (7 of 14 tasks left): Three tasks closed.</li>
<li class="remarkup-list-item">July 2020 (10 of <a href="https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R" class="remarkup-link" rel="noreferrer">24 new tasks</a>): Three tasks closed.</li>
<li class="remarkup-list-item">August 2020 (31 of <a href="https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R" class="remarkup-link" rel="noreferrer">53 new tasks</a>): Six tasks closed.</li>
<li class="remarkup-list-item"><strong>September 2020</strong>: 17 of <a href="https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R" class="remarkup-link" rel="noreferrer">33 new tasks</a> survived the month of September and remain open today.</li>
</ul>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Recent tally</th><th></th></tr>
<tr><td>106</td><td>as of <a href="https://phabricator.wikimedia.org/phame/post/view/204/production_excellence_23_july_august_2020/" class="remarkup-link" rel="noreferrer">Excellence #23</a> (Sep 23rd).</td></tr>
<tr><td>-13</td><td>closed of the 106 recent tasks.</td></tr>
<tr><td>+17</td><td>survived September 2020.</td></tr>
<tr><td>110</td><td>as of today, Oct 23rd.</td></tr>
<tr></tr>
</table></div>

<p><a href="https://phabricator.wikimedia.org/phame/post/view/204/production_excellence_23_july_august_2020/" class="remarkup-link" rel="noreferrer">Previously</a>, we had 106 unresolved production errors from the recent months up to August. Since then, 13 of those were closed. But, the 17 errors surviving September raise our recent tally to 110.</p>

<p>The workboard overall (including errors from 2019 and earlier) holds 343 open tasks in total, an increase of +47 compared to the 296 total on Sept 23rd.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉   Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><a href="https://en.wikiquote.org/wiki/Elementary_(TV_series)#Season_5" class="remarkup-link remarkup-link-ext" rel="noreferrer">🕵️‍♀️</a> <em>Holmes: “So, she pulled five bullets out of you?”</em></div>
<div class="remarkup-reply-body"><p>     Shinwell: ”That&#039;s right.”<br />
      Holmes: “I too have been shot five times. But, uh..., separate occasions.”<br />
     Shinwell: “That&#039;s... great.”</p></div>
</blockquote>

<p>Footnotes:</p>

<p>[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2020" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Incident_documentation</a><br />
[2] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a><br />
[3] Wikimedia incident stats. – <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">codepen.io/Krinkle/full/wbYMZK</a><br />
[4] Month-over-month plots. – <a href="https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing" class="remarkup-link remarkup-link-ext" rel="noreferrer">docs.google.com/spreadsheets/d/1tRC…</a></p></div></content></entry><entry><title>Production Excellence #23: July &amp; August 2020</title><link href="/phame/live/1/post/204/production_excellence_23_july_august_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/204/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-09-23T18:10:33+00:00</published><updated>2020-09-23T18:10:33+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📈   Incidents</h5>

<p>4 documented incidents in July, and 2 documented incidents in August. [1] Historically, that&#039;s about average for this time of year. [5]</p>

<p>For more about recent incidents see <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📊   Trends</h5>

<blockquote><p>Take a look at the workboard and look for tasks that could use your help.<br />
→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p></blockquote>

<p>Summary over recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">⚠️ July 2019 (4 of 18 tasks left): One task closed.</li>
<li class="remarkup-list-item">⚠️ August 2019 (1 of 14 tasks left): <em>no change.</em></li>
<li class="remarkup-list-item">⚠️ September 2019 (3 of 12 tasks left): Two tasks closed.</li>
<li class="remarkup-list-item">October (6 of 12 tasks left), <em>no change.</em></li>
<li class="remarkup-list-item">November (3 of 5 tasks left): <em>no change.</em></li>
<li class="remarkup-list-item">December (3 of 9 tasks left), Two tasks closed.</li>
<li class="remarkup-list-item">January 2020 (5 of 7 tasks left), <em>no change.</em></li>
<li class="remarkup-list-item">February (2 of 7 tasks left), Two tasks closed.</li>
<li class="remarkup-list-item">March (2 of 2 tasks left), <em>no change.</em></li>
<li class="remarkup-list-item">April (10 of 14 tasks left): One task closed.</li>
<li class="remarkup-list-item">May (7 of 14 tasks left): Four tasks closed.</li>
<li class="remarkup-list-item">June (10 of 14 tasks left): Four tasks closed.</li>
<li class="remarkup-list-item">July 2020: 13 of <a href="https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R" class="remarkup-link" rel="noreferrer">24 new tasks</a> survived the month of July and remain open today.</li>
<li class="remarkup-list-item">August 2020: 37 of <a href="https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R" class="remarkup-link" rel="noreferrer">53 new tasks</a> survived the month of August and remain open today.</li>
</ul>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Recent tally</th><th></th></tr>
<tr><td>72</td><td>open, as of <a href="https://phabricator.wikimedia.org/phame/post/view/203/production_excellence_22_june_2020/" class="remarkup-link" rel="noreferrer">Excellence #22</a> (Jul 23rd).</td></tr>
<tr><td>-16</td><td>closed, of the previous 72 recent tasks.</td></tr>
<tr><td>+13</td><td>opened and survived July 2020.</td></tr>
<tr><td>+37</td><td>opened and survived August 2020.</td></tr>
<tr><td>106</td><td>open, as of today (Sep 23rd).</td></tr>
<tr></tr>
</table></div>

<p><a href="https://phabricator.wikimedia.org/phame/post/view/203/production_excellence_22_june_2020/" class="remarkup-link" rel="noreferrer">Previously</a>, we had 72 open production errors over the recent months up to June. Since then, 16 of those were closed. But, the 13 and 37 errors surviving July and August raise our recent tally to 106.</p>

<p>The workboard overall (including tasks from 2019 and earlier) held 192 open production errors on July 23rd. As of writing, the workboard holds 296 open tasks in total. [4] This +104 increase is largely due to the merged backlog of JavaScript client errors, which were previously untracked. Note that we backdated the majority of these JS errors under “Old”, so they are not among the elevated numbers of July and August.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉   Thanks!</h5>

<p>Thank you to everyone else who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><a href="https://en.wikiquote.org/wiki/Fight_Club_(film)#Dialogue" class="remarkup-link remarkup-link-ext" rel="noreferrer">👊</a><a href="https://www.youtube.com/watch?v=CWRTqMGvdpc" class="remarkup-link remarkup-link-ext" rel="noreferrer">🍺</a> <em>Tyler: “You know man, it could be worse! […]” Narrator: “[but] I was close... to being complete.”</em></div>
<div class="remarkup-reply-body"><p>Tyler: “Martha&#039;s polishing the brass on the Titanic. It&#039;s all going down, man. […] Evolve! Let the chips fall where they may.”<br />
Narrator: “What!?” Tyler: “The things you own..., they end up owning you.”</p></div>
</blockquote>

<p>Footnotes:<br />
[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2020" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/Incident_documentation</a><br />
[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/JuSOycDOe.7R/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query…</a><br />
[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/aKNrCHMosori/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query…</a> <br />
[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query…</a><br />
[5] Wikimedia incident stats. – <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://codepen.io/Krinkle/full/wbYMZK</a></p></div></content></entry><entry><title>Production Excellence #22: June 2020</title><link href="/phame/live/1/post/203/production_excellence_22_june_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/203/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-07-23T03:25:50+00:00</published><updated>2020-07-28T16:54:55+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📈   Month in review</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">4 documented incidents in June. [1]</li>
<li class="remarkup-list-item">37 new production errors were filed and 27 were closed. [2] [3]</li>
<li class="remarkup-list-item">72 recent production errors still open (up from 68).</li>
<li class="remarkup-list-item">203 total Wikimedia-prod-error tasks currently open (up from 192). [4]</li>
</ul>

<p>For more about recent incidents see <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖   Outstanding errors</h5>

<p>Breakdown of new errors reported in June that are still open today:</p>

<ol class="remarkup-list">
<li class="remarkup-list-item">(Needs owner) / Newsletter extension: Unexpected locking SELECT query. <a href="https://phabricator.wikimedia.org/T253926" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_302"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T253926</span></span></a></li>
<li class="remarkup-list-item">(Needs owner) / FlaggedRevs extension: Unable to submit review of page due to bad fr_page_id record. <a href="https://phabricator.wikimedia.org/T256296" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_303"><span class="phui-tag-core phui-tag-color-object">T256296</span></a></li>
<li class="remarkup-list-item">Editing team / MassMessage extension: Delivery fails due to system user conflict. <a href="https://phabricator.wikimedia.org/T171003" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_304"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T171003</span></span></a></li>
<li class="remarkup-list-item">Parsing team / Parsoid: Pagebundle data unavailable due to a bad UTF-8 string. <a href="https://phabricator.wikimedia.org/T236866" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_305"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T236866</span></span></a></li>
<li class="remarkup-list-item">Growth team / Recent changes: Update for ActiveUsers data failing due to deadlock. <a href="https://phabricator.wikimedia.org/T255059" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_306"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T255059</span></span></a></li>
<li class="remarkup-list-item">Growth team / GrowthExperiments: Issue with question display on personal homepage. <a href="https://phabricator.wikimedia.org/T255616" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_307"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T255616</span></span></a></li>
<li class="remarkup-list-item">Language team / Translate extension: Update jobs fail due to invalid function call. <a href="https://phabricator.wikimedia.org/T255669" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_308"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T255669</span></span></a></li>
<li class="remarkup-list-item">Language team / ContentTranslation: Save action fails due to duplicate insert query. <a href="https://phabricator.wikimedia.org/T256230" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_309"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T256230</span></span></a></li>
<li class="remarkup-list-item">Core Platform team / Content handling: Incompatible content type during content merge/stash. <a href="https://phabricator.wikimedia.org/T255700" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_310"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T255700</span></span></a></li>
<li class="remarkup-list-item">Core Platform team / Monolog: API usage logs and error logs sometimes missing due to socket failure. <a href="https://phabricator.wikimedia.org/T255578" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_311"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T255578</span></span></a></li>
<li class="remarkup-list-item">Search Platform team / WikibaseCirrus: Elevated error levels from EntitySearchElastic warnings. <a href="https://phabricator.wikimedia.org/T255658" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_312"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T255658</span></span></a></li>
<li class="remarkup-list-item">Wikidata / API: Generator query fails due to invalid API result format. <a href="https://phabricator.wikimedia.org/T254334" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_313"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T254334</span></span></a></li>
<li class="remarkup-list-item">Wikidata / API: EntityData query emits warning about bad RDF. <a href="https://phabricator.wikimedia.org/T255054" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_314"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T255054</span></span></a></li>
<li class="remarkup-list-item">Wikidata / Repo: Entity relation update jobs fail due to deadlock. <a href="https://phabricator.wikimedia.org/T255706" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_315"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T255706</span></span></a></li>
</ol>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📊   Trends</h5>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">Take a look at the workboard and look for tasks that could use your help.</div>
<div class="remarkup-reply-body"><p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p></div>
</blockquote>

<p>Summary over recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">July 2019 (5 of 18 tasks left): Two tasks closed.</li>
<li class="remarkup-list-item">August (1 of 14 tasks left): Another task closed, only one remaining! 🚀</li>
<li class="remarkup-list-item">September (5 of 12 tasks left): Two tasks closed.</li>
<li class="remarkup-list-item">October (6 of 12 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">November (3 of 5 tasks left): Another task closed.</li>
<li class="remarkup-list-item">December (5 of 9 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">January 2020 (5 of 7 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">February (4 of 7 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">March (2 of 2 tasks left), <em>no change</em>.</li>
<li class="remarkup-list-item">April (11 of 14 tasks left): Three tasks closed.</li>
<li class="remarkup-list-item">May (11 of 14 tasks left): Three tasks closed.</li>
<li class="remarkup-list-item"><strong>June</strong>: <strong>14 new tasks survived the month of June</strong>. ⚠️</li>
</ul>

<p>At the end of May the number of open production errors over recent months was 68. Of those, <span class="visual-only phui-icon-view phui-font-fa fa-arrow-circle-o-down green" data-meta="0_316" aria-hidden="true"></span> 10 got closed, but with <span class="visual-only phui-icon-view phui-font-fa fa-arrow-circle-up red" data-meta="0_317" aria-hidden="true"></span> 14 new tasks from June still open, the total has grown further to 72.</p>

<p>The workboard had 192 open tasks last month and saw another increase, to 203 open tasks now (this includes tasks from 2019 and earlier).</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉   Thanks!</h5>

<p>Thank you to everyone else who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote><p><em><a href="https://en.wikiquote.org/wiki/Close_Encounters_of_the_Third_Kind#Dialogue" class="remarkup-link remarkup-link-ext" rel="noreferrer">⛰</a> </em>ATC:  “Do you want to report a UFO?” Pilot: “Negative. We don&#039;t want to report.”<br />
   ATC: “Do you wish to file a report of any kind to us?” Pilot: “I wouldn&#039;t know what kind of report to file.”<br />
  ATC: “Me neither…”</p></blockquote>

<p>Footnotes:<br />
[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2020" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/Incident_documentation#2020</a><br />
[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/VTpmvaJLYVL1/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/VTpmvaJLYVL1/#R</a><br />
[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/qn5yeURqyl3D/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/qn5yeURqyl3D/#R</a><br />
[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R</a></p></div></content></entry><entry><title>Faster source code fetches thanks to git protocol version 2</title><link href="/phame/live/1/post/199/faster_source_code_fetches_thanks_to_git_protocol_version_2/" /><id>https://phabricator.wikimedia.org/phame/post/view/199/</id><author><name>hashar (Antoine Musso)</name></author><published>2020-07-06T10:57:02+00:00</published><updated>2020-10-29T10:21:40+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In 2015 I noticed that git fetches from our most active repositories were unreasonably slow, sometimes taking up to a minute, which hindered fast development and collaboration. You can read some of the debugging I did at the time on <a href="https://phabricator.wikimedia.org/T103990" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_318"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T103990</span></span></a>. <a href="https://bugs.chromium.org/p/gerrit/issues/detail?id=175" class="remarkup-link remarkup-link-ext" rel="noreferrer">Gerrit upstream was aware of the issue</a> and a workaround was presented, though we never implemented it.</p>

<p>When fetching source code from a git repository, the client and server conduct a negotiation to discover which objects have to be sent. The server sends an advertisement that lists every single reference it knows about. For a very active repository in Gerrit it means sending references for each patchset and each change ever made to the repository, or almost 200,000 references for mediawiki/core. That is a noticeable amount of data resulting in a slow fetch, especially on a slow internet connection.</p>
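<p>To get a feel for what the advertisement contains, the sketch below counts references per namespace. It assumes the two-column layout (object id, a tab, then the reference name) that <tt class="remarkup-monospaced">git ls-remote</tt> prints, and Gerrit's convention of storing each patchset under <tt class="remarkup-monospaced">refs/changes/</tt>. The sample data and the function name <tt class="remarkup-monospaced">count_refs_by_namespace</tt> are made up for illustration.</p>

```python
from collections import Counter

# Made-up sample: object id, a tab, then the reference name.
SAMPLE_LS_REMOTE = "\n".join([
    "1" * 40 + "\tHEAD",
    "1" * 40 + "\trefs/heads/master",
    "2" * 40 + "\trefs/changes/45/12345/1",  # change 12345, patchset 1
    "3" * 40 + "\trefs/changes/45/12345/2",  # change 12345, patchset 2
    "4" * 40 + "\trefs/changes/46/12346/1",  # change 12346, patchset 1
    "5" * 40 + "\trefs/tags/1.35.0",
])


def count_refs_by_namespace(ls_remote_output):
    """Hypothetical helper: count advertised references per top-level namespace."""
    counts = Counter()
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        _object_id, ref = line.split("\t", 1)
        parts = ref.split("/")
        # "refs/changes/45/12345/1" becomes "refs/changes"; a bare "HEAD" stays as-is.
        namespace = "/".join(parts[:2]) if parts[0] == "refs" else ref
        counts[namespace] += 1
    return counts


counts = count_refs_by_namespace(SAMPLE_LS_REMOTE)
print(counts.most_common())
```

<p>On a busy Gerrit repository one would expect the <tt class="remarkup-monospaced">refs/changes</tt> bucket to dwarf the others, which is the bulk of data the older protocol sends on every fetch.</p>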

<p>Gerrit originated at Google and has full-time maintainers. In 2017 a team at Google set out to tackle the problem and proposed a new protocol to address it, working closely with the git maintainers while doing so. The new protocol makes git smarter during the advertisement phase, notably by filtering out references the client is not interested in. You can read Google's introduction post at <a href="https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html</a></p>

<p>Since June 28th 2020, our Gerrit has been upgraded and now supports git protocol version 2. But to benefit from faster fetches, your client also needs to know about the newer protocol and have it explicitly enabled. For git, you will want version 2.18 or later. Enable the new protocol by setting git configuration <tt class="remarkup-monospaced">protocol.version</tt> to <tt class="remarkup-monospaced">2</tt>.</p>

<p>This can be done either on demand, for a single command:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">git -c protocol.version=2 fetch</pre></div>

<p>Or enabled in your user configuration file:</p>

<div class="remarkup-code-block" data-code-lang="ini" data-sigil="remarkup-code-block"><div class="remarkup-code-header">$HOME/.gitconfig</div><pre class="remarkup-code"><span></span><span class="k">[protocol]</span>
    <span class="na">version</span> <span class="o">=</span> <span class="s">2</span></pre></div>
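<p>Since the setting only helps with git 2.18 or later, it can be handy to check the client version first. The following is a minimal sketch, assuming the usual <tt class="remarkup-monospaced">git version X.Y.Z</tt> text printed by <tt class="remarkup-monospaced">git --version</tt>. The names <tt class="remarkup-monospaced">parse_git_version</tt> and <tt class="remarkup-monospaced">supports_protocol_v2</tt> are hypothetical helpers, not part of git.</p>

```python
import re

MINIMUM_GIT = (2, 18)  # minimum version recommended in this post


def parse_git_version(version_output):
    """Hypothetical helper: extract (major, minor) from text like 'git version 2.27.0'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: " + repr(version_output))
    return int(match.group(1)), int(match.group(2))


def supports_protocol_v2(version_output):
    """Return True when the reported git version is 2.18 or later."""
    return parse_git_version(version_output) >= MINIMUM_GIT


print(supports_protocol_v2("git version 2.27.0"))  # True
print(supports_protocol_v2("git version 2.17.1"))  # False
```

<p>In practice the input string could come from <tt class="remarkup-monospaced">subprocess.run([&quot;git&quot;, &quot;--version&quot;], capture_output=True, text=True).stdout</tt>.</p>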

<p>On my internet connection, fetching <tt class="remarkup-monospaced">mediawiki/core.git</tt> went from ~15 seconds to just 3 seconds. That is a noticeable difference in my day-to-day activity.</p>

<p>If you encounter any issue with the new protocol, you can file a task in our Phabricator and tag it with <a href="/tag/git-protocol-v2/" class="phui-tag-view phui-tag-type-shade phui-tag-disabled phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_320"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-flag-checkered" data-meta="0_319" aria-hidden="true"></span>git-protocol-v2</span></a>.</p></div></content></entry><entry><title>Production Excellence #21: May 2020</title><link href="/phame/live/1/post/198/production_excellence_21_may_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/198/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-06-24T20:00:31+00:00</published><updated>2020-06-24T20:00:31+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📊   Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">5 documented incidents in May. [1]</li>
<li class="remarkup-list-item">28 new production error tasks filed in May. [2] [3]</li>
<li class="remarkup-list-item">68 recent production errors currently open (up from 61).</li>
<li class="remarkup-list-item">193 currently open Wikimedia-prod-error tasks (up from 178). [4]</li>
</ul>

<p>For more about recent incidents see <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation</a> on Wikitech, or <a href="https://phabricator.wikimedia.org/project/view/4758" class="remarkup-link" rel="noreferrer">Preventive measures</a> in Phabricator.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉   Outstanding reports</h5>

<p>Take a look at the workboard and look for tasks that could use your help.<br />
→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Breakdown of recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">July 2019: One task closed, 7 of 18 tasks left. ⚠️</li>
<li class="remarkup-list-item">August: 2 of 14 tasks left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">September: 7 of 12 tasks left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">October: 4 of 12 tasks left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">November: 4 of 5 tasks left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">December: 4 of 9 tasks left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">January 2020: 5 of 7 tasks left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">February: Two tasks closed, 4 of 7 tasks left. ⚠️</li>
<li class="remarkup-list-item">March: 2 of 2 tasks left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">April: 14 of 14 tasks left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item"><strong>May</strong>: 14 new tasks survived the month of May.</li>
</ul>

<p>At the end of April the total of open production errors over recent months was 61. Of those, 7 got closed, but with 14 new tasks from May still open, the total has grown to 68.</p>

<p>The workboard had 178 open tasks in April, which saw a steep increase to now 192 open tasks (this includes June 2020 so far, and pre-2019 tasks).</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉   Thanks!</h5>

<p>Thank you to everyone else who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:<br />
[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2020" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/Incident_documentation#2020</a><br />
[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/7Z4Us2BS02Uo/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/7Z4Us2BS02Uo/#R</a><br />
[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/FoIFMu5UO8pw/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/FoIFMu5UO8pw/#R</a> <br />
[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R</a></p></div></content></entry><entry><title>Celebrating 600,000 commits for Wikimedia</title><link href="/phame/live/1/post/197/celebrating_600_000_commits_for_wikimedia/" /><id>https://phabricator.wikimedia.org/phame/post/view/197/</id><author><name>Jdforrester-WMF (James D. Forrester)</name></author><published>2020-05-29T22:47:22+00:00</published><updated>2020-07-10T11:27:00+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Earlier today, the 600,000th commit was pushed to Wikimedia&#039;s Gerrit server. We thought we&#039;d take this moment to reflect on the developer services we offer and our community of developers, be they Wikimedia staff, third party workers, or volunteers.</p>

<p>At Wikimedia, we currently use a self-hosted installation of <a href="https://en.wikipedia.org/wiki/Gerrit_(software)" class="remarkup-link remarkup-link-ext" rel="noreferrer">Gerrit</a> to provide code review workflow management, and code hosting and browsing. We adopted this in 2011–12, replacing <a href="https://en.wikipedia.org/wiki/Apache_Subversion" class="remarkup-link remarkup-link-ext" rel="noreferrer">Apache Subversion</a>.</p>

<p>Within Gerrit, we host several thousand repositories of code (2,441 as of today). This includes <a href="https://gerrit.wikimedia.org/g/mediawiki/core" class="remarkup-link remarkup-link-ext" rel="noreferrer">MediaWiki itself</a>, plus all the many hundreds of extensions and skins people have created for use with MediaWiki. Approximately 90% of the MediaWiki extensions we host are not used by Wikimedia, only by third parties. We also host key Wikimedia server configuration repositories like <a href="https://gerrit.wikimedia.org/g/operations/puppet" class="remarkup-link remarkup-link-ext" rel="noreferrer">puppet</a> or <a href="https://gerrit.wikimedia.org/g/operations/mediawiki-config" class="remarkup-link remarkup-link-ext" rel="noreferrer">site config</a>, build artefacts like <a href="https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/production-images/" class="remarkup-link remarkup-link-ext" rel="noreferrer">vetted docker images</a> for production services or local .deb build repos for software we use like <a href="https://gerrit.wikimedia.org/r/plugins/gitiles/operations/debs/etherpad-lite/+/master" class="remarkup-link remarkup-link-ext" rel="noreferrer">etherpad-lite</a>, ancillary software like our special <a href="https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dumps/" class="remarkup-link remarkup-link-ext" rel="noreferrer">database exporting orchestration tool</a> for dumps.wikimedia.org, and dozens of other uses.</p>

<p>Gerrit is not just (or even primarily) a code hosting service, but a code review workflow tool. Per the <a href="https://www.mediawiki.org/wiki/Gerrit/Privilege_policy" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia code review policy</a>, all MediaWiki code heading to production should go through separate development and code review for security, performance, quality, and community reasons. Reviewers are required to use their &quot;good judgement and careful action&quot;, which is a heavy burden, because &quot;[m]erging a change to the MediaWiki core or an extension deployed by Wikimedia is a big deal&quot;. Gerrit helps them do this, providing clear views of what is changing, supporting itemised, character-level, file-level, or commit-level feedback and revision, allowing series of complex changes to be chained together across multiple repositories, and ensuring that forthcoming and merged changes are visible to product owners, development teams, and other interested parties.</p>

<p>Across all of our repositories, we average over <a href="https://wikimedia.biterg.io/app/kibana#/dashboard/Gerrit" class="remarkup-link remarkup-link-ext" rel="noreferrer">200 human commits a day</a>, though activity levels vary widely. Some repositories have dozens of patches a week (MediaWiki itself gets almost 20 patches a day; puppet gets nearly 30), whereas others get a patch every few years. There are over 8,000 accounts registered with Gerrit, although activity is not distributed uniformly throughout that cohort.</p>

<p>To focus engineer time where it&#039;s needed, a fair amount of low-risk development work is automated. This happens in both creating patches and also, in some cases, merging them.</p>

<p>For example, for many years we have partnered with <a href="https://translatewiki.net/" class="remarkup-link remarkup-link-ext" rel="noreferrer">TranslateWiki.net</a>&#039;s volunteer community to translate and maintain MediaWiki interfaces in hundreds of languages. Exports of translators&#039; updates are pushed and merged automatically by one of the TWN team each day, helping our users keep a fresh, usable system whatever their preferred language.</p>

<p>Another key area is <a href="https://www.mediawiki.org/wiki/Libraryupgrader" class="remarkup-link remarkup-link-ext" rel="noreferrer">LibraryUpgrader</a>, a custom tool to automatically upgrade the libraries we use for continuous integration across hundreds of repositories, allowing us to make improvements and increase standards without a single central breaking change. Indeed, <a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GroupsSidebar/+/600000" class="remarkup-link remarkup-link-ext" rel="noreferrer">the 600,000th commit</a> was one of these automatic commits, upgrading the version of the <a href="https://packagist.org/packages/mediawiki/mediawiki-codesniffer" class="remarkup-link remarkup-link-ext" rel="noreferrer">mediawiki-codesniffer</a> tool in <a href="https://www.mediawiki.org/wiki/Extension:GroupsSidebar" class="remarkup-link remarkup-link-ext" rel="noreferrer">the GroupsSidebar extension</a> to the latest version, ensuring it is written following the latest <a href="https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia coding conventions for PHP</a>.</p>

<p>Right now, we&#039;re working on upgrading our installation of Gerrit, moving from our old version based on the 2.x branch through 2.16 to 3.1, which will mean a new user interface and other user-facing changes, as well as improvements behind the scenes. More on those changes will be coming in later posts.</p>

<hr class="remarkup-hr" />

<p><em>Header image: <a href="https://commons.wikimedia.org/wiki/File:Underground_Vehicle.jpg" class="remarkup-link remarkup-link-ext" rel="noreferrer">A vehicle used to transport miners to and from the mine face</a> by &#039;undergrounddarkride&#039;, used under <a href="https://creativecommons.org/licenses/by/2.0/deed.en" class="remarkup-link remarkup-link-ext" rel="noreferrer">CC-BY-2.0</a>.</em></p></div></content></entry><entry><title>Production Excellence #20: April 2020</title><link href="/phame/live/1/post/193/production_excellence_20_april_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/193/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-05-14T16:10:41+00:00</published><updated>2020-05-25T16:23:11+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How are we doing on that strive for operational excellence during these unprecedented times?</p>

<h5 class="remarkup-header">📊  Numbers for March and April</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">3 documented incidents. [1]</li>
<li class="remarkup-list-item">60 new Wikimedia-prod-error reports. [2]</li>
<li class="remarkup-list-item">58 Wikimedia-prod-error reports closed. [3]</li>
<li class="remarkup-list-item">178 currently open Wikimedia-prod-error reports in total. [4]</li>
</ul>

<p>For more about recent incidents and pending actionables see <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikitech</a> and <a href="https://phabricator.wikimedia.org/project/view/4758/" class="remarkup-link" rel="noreferrer">Phabricator</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉  Outstanding reports</h5>

<p>Take a look at the workboard and look for tasks that could use your help.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Breakdown of recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">April 2019: Two reports closed, 2 of 14 left.</li>
<li class="remarkup-list-item">May: <em>(All clear!)</em></li>
<li class="remarkup-list-item">June: 4 of 11 left <em>(unchanged)</em>. ⚠️</li>
<li class="remarkup-list-item">July: 8 of 18 left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">August: 2 of 14 reports left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">September: 7 of 12 left <em>(unchanged)</em>.</li>
<li class="remarkup-list-item">October: Two reports closed, 4 of 12 left.</li>
<li class="remarkup-list-item">November: One report closed, 4 of 5 left.</li>
<li class="remarkup-list-item">December: Two reports closed, 4 of 9 left.</li>
<li class="remarkup-list-item">January 2020: One report closed, 5 of 7 reports left.</li>
<li class="remarkup-list-item">February: One report closed, 6 of 7 reports left.</li>
<li class="remarkup-list-item"><strong>March</strong>: 2 new reports survived the month of March.</li>
<li class="remarkup-list-item"><strong>April</strong>: 13 new reports survived the month of April.</li>
</ul>

<p>At the end of February the total of open reports over recent months was 58. Of those, 12 got closed, but with 15 new reports from March/April still open, the total is now up at 61 open reports.</p>

<p>The workboard overall (which includes pre-2019 tasks) has 178 tasks open. This is the first decrease since October: December was at 196, January at 198, February at 199, and April is now at 178. This was largely due to the Release Engineering and Core Platform teams closing out forgotten reports that have since been resolved or otherwise obsoleted.</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>Tip</strong>: Verifying <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">existing tasks</a> is a good way to <a href="https://wikitech.wikimedia.org/wiki/Performance/Runbook/Kibana_monitoring" class="remarkup-link remarkup-link-ext" rel="noreferrer">(re)familiarise yourself with Kibana</a>. For example: Does the error still occur in the last 30 days? Does it only happen on a certain wiki? What do the URLs or stack traces have in common?</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉  Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:<br />
[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/Incident_documentation</a><br />
[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/HjopcKClxTfw/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/HjopcKClxTfw/#R</a><br />
[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/ts62HKYPBxod/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/ts62HKYPBxod/#R</a><br />
[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R</a></p></div></content></entry><entry><title>Production Excellence #19: February 2020</title><link href="/phame/live/1/post/192/production_excellence_19_february_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/192/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-03-24T21:40:10+00:00</published><updated>2020-03-25T13:46:41+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📊  Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">8 documented incidents. [1]</li>
<li class="remarkup-list-item">27 new Wikimedia-prod-error reports. [2]</li>
<li class="remarkup-list-item">26 Wikimedia-prod-error reports closed. [3]</li>
<li class="remarkup-list-item">199 currently open Wikimedia-prod-error reports in total. [4]</li>
</ul>

<p>With a median of 4–5 documented incidents per month (over the last three years), there were a fairly large number of them this past month.</p>

<p>To read more about these incidents and pending actionables, check <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2020" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation § 2020</a>, or <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Explore Wikimedia incident stats</a> (<em>interactive</em>).</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖  <em>Unset vs array splice</em></h5>

<p>Our error monitor (Logstash) received numerous reports about an “Undefined offset” error from the OATHAuth extension. This extension powers the <a href="https://meta.wikimedia.org/wiki/Help:Two-factor_authentication#Logging_in" class="remarkup-link remarkup-link-ext" rel="noreferrer">Two-factor auth</a> (2FA) login interface on Wikipedia.</p>

<p><a href="https://phabricator.wikimedia.org/p/ItSpiderman/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_322"><span class="phui-tag-core phui-tag-color-person">@ItSpiderman</span></a> and <a href="https://phabricator.wikimedia.org/p/Reedy/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_323"><span class="phui-tag-core phui-tag-color-person">@Reedy</span></a> investigated the problem. The error message:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">PHP Notice: Undefined offset: 8
at /srv/mediawiki/extensions/OATHAuth/src/Key/TOTPKey.php:188</pre></div>

<p>This error means that the code was accessing item number 8 from a list (an array), but the item does not exist. Normally, when a “2FA scratch token” is used, we remove it from a list, and save the remaining list for next time.</p>

<p>The code used the <tt class="remarkup-monospaced">count()</tt> function to compute the length of the list, and used a for-loop to iterate through the list. When the code found the user’s token, it used the <tt class="remarkup-monospaced">unset( $list[$num] )</tt> operation to remove token <tt class="remarkup-monospaced">$num</tt> from the list, and then saved <tt class="remarkup-monospaced">$list</tt> for next time.</p>

<p>The problem with removing a list item in this way is that it leaves a “gap”. Imagine a list with 4 items, like <tt class="remarkup-monospaced">[ 1: …, 2: …, 3: … , 4: … ]</tt>. If we unset item 2, then the remaining list will be <tt class="remarkup-monospaced">[ 1: …, 3: …, 4: … ]</tt>. The next time we check this list, the length of the list is now 3 (so far so good!), but the for-loop will access the items as 1-2-3. The code would not know that 3 comes after 1, causing an error because item 2 does not exist. And, the code would not even look at item 4!</p>

<p>When a user used their first ever scratch token, everything worked fine. But from their second token onwards, the tokens could be rejected as “wrong” because the code was not able to find them.</p>

<p>To avoid this bug, we changed the code to use <tt class="remarkup-monospaced">array_splice( $list, $num, 1 )</tt> instead of <tt class="remarkup-monospaced">unset( $list[$num] )</tt>. The important thing about <tt class="remarkup-monospaced">array_splice</tt> is that it renumbers the items in the list, leaving no gaps.</p>
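
<p>The difference between the two operations can be seen in a minimal sketch. The token values and variable names below are made up for illustration and are not taken from the OATHAuth code. Note that PHP lists start at key 0, whereas the example above counts from 1.</p>

```php
// Hypothetical list of scratch tokens (illustrative values only).
$tokens = [ 'aaa', 'bbb', 'ccc', 'ddd' ];

// unset() removes the item but keeps the original keys, leaving a gap.
$withUnset = $tokens;
unset( $withUnset[1] );
// Keys are now 0, 2, 3 while count() is 3. A for-loop over 0, 1, 2
// would hit the missing key 1 and never look at key 3.
$keysAfterUnset = array_keys( $withUnset );

// array_splice() removes the item and renumbers the remaining keys.
$withSplice = $tokens;
array_splice( $withSplice, 1, 1 );
// Keys are now 0, 1, 2, so a count()-based for-loop visits every item.
$keysAfterSplice = array_keys( $withSplice );
```

<p>Wrapping the result of <tt class="remarkup-monospaced">unset()</tt> in <tt class="remarkup-monospaced">array_values()</tt> would renumber the list in a similar way. The fix described above used <tt class="remarkup-monospaced">array_splice</tt>.</p>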

<p>– <a href="https://phabricator.wikimedia.org/T244308" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_321"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T244308</span></span></a> / <a href="https://gerrit.wikimedia.org/r/570253" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit.wikimedia.org/r/570253</a></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉  Outstanding reports</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Breakdown of recent months:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">March: 3 of 10 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">April: 4 of 14 left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">May: (<em>All clear!</em>)</li>
<li class="remarkup-list-item">June: 4 of 11 left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">July: 8 of 18 left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">August: Two reports closed! 2 of 14 reports left.</li>
<li class="remarkup-list-item">September: One report closed, 7 of 12 left.</li>
<li class="remarkup-list-item">October: Two reports closed, 6 of 12 left.</li>
<li class="remarkup-list-item">November: 5 of 5 left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">December: 6 of 9 left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">January: One report closed, 6 of 7 reports left.</li>
<li class="remarkup-list-item"><strong>February</strong>: 7 new reports survived the month of February.</li>
</ul>

<p>Last month’s total over recent months was 57 open reports. Of those, 6 got closed, but with 7 new reports from February still open, the total is now up at 58 open reports.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉  Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.</p>

<p>Together, we’re getting there!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:<br />
[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2020" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Incident_documentation#2020</a><br />
[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/aT3iqdM0EJKW/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a><br />
[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/jVexIrtOPkcX/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a><br />
[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Production Excellence #18: January 2020</title><link href="/phame/live/1/post/180/production_excellence_18_january_2020/" /><id>https://phabricator.wikimedia.org/phame/post/view/180/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-02-28T19:39:20+00:00</published><updated>2020-03-24T22:08:17+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📊  Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">3 documented incidents. [1]</li>
<li class="remarkup-list-item">26 new Wikimedia-prod-error reports. [2]</li>
<li class="remarkup-list-item">26 Wikimedia-prod-error reports closed. [3]</li>
<li class="remarkup-list-item">198 currently open Wikimedia-prod-error reports in total. [4]</li>
</ul>

<p>To read more about these incidents and pending actionables, check <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2020" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation § 2020</a>, or <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Explore Wikimedia incident stats</a> (<em>interactive</em>).</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖  <em>Paradoxical array key</em></h5>

<p>Wikimedia encountered several Zend engine bugs that could corrupt a PHP program at run-time, during the upgrade from HHVM to PHP 7.2. (Some of these bugs are still being worked on.) One of the bugs we fixed last month was particularly mysterious. Investigation led by <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_325"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a> and <a href="https://phabricator.wikimedia.org/p/tstarling/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_326"><span class="phui-tag-core phui-tag-color-person">@tstarling</span></a>.</p>

<p>MediaWiki would create an array in PHP and add a key-value pair to it. We could iterate this array, and see that our key was there. Moments later, if we tried to retrieve the key from that same array, sometimes the key would no longer exist!</p>

<p>After many ad-hoc debug logs, core dumps, and GDB sessions, the problem was tracked down to the <a href="https://en.wikipedia.org/wiki/String_interning" class="remarkup-link remarkup-link-ext" rel="noreferrer">string interning</a> system of Zend PHP.  String interning is a memory reduction technique. It means we only store one copy of a character sequence in RAM, even if many parts of the code use the same character sequence. For example, the words “user” and “edit” are frequently used in the MediaWiki codebase. One of those sequences is the empty string (“”), which is also used a lot in our code. This is the string we found disappearing most often from our PHP arrays. This bug affected several components, including Wikibase, the wikimedia/rdbms library, and ResourceLoader.</p>

<p>Tim used a hardware watchpoint in GDB, and traced the root cause to the Memcached client for PHP. The php-memcached client would “free” a string directly from the internal memory manager after doing some work. It did this even for “interned” strings that other parts of the program may still be depending on.</p>

<p><a href="https://phabricator.wikimedia.org/p/jijiki/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_327"><span class="phui-tag-core phui-tag-color-person">@jijiki</span></a> and <a href="https://phabricator.wikimedia.org/p/Joe/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_328"><span class="phui-tag-core phui-tag-color-person">@Joe</span></a> backported the upstream fix to our php-memcached package and deployed it to production. Thanks! — <a href="https://phabricator.wikimedia.org/T232613" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_324"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T232613</span></span></a></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉   Outstanding reports</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">March: 3 of 10 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">April: Two reports closed, 4 of 14 left.</li>
<li class="remarkup-list-item">May: (<em>All clear!</em>)</li>
<li class="remarkup-list-item">June: Two reports closed. 4 of 11 left.</li>
<li class="remarkup-list-item">July: Four reports closed, 8 of 18 left.</li>
<li class="remarkup-list-item">August: 4 of 14 reports left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">September: One report closed, 8 of 12 left.</li>
<li class="remarkup-list-item">October: 8 of 12 left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">November: 5 of 5 left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">December: Three reports closed, 6 of 9 left.</li>
<li class="remarkup-list-item"><strong>January</strong>: 7 new reports survived the month of January.</li>
</ul>

<p>There are a total of 57 reports filed in recent months that remain open. This is down from 62 last month.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉  Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:<br />
[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2020" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Incident_documentation#2020</a><br />
[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/qfCVpWqGX0tJ/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a><br />
[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/ndeCQjeJ6UNr/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a><br />
[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Production Excellence #17: December 2019</title><link href="/phame/live/1/post/179/production_excellence_17_december_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/179/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2020-01-10T02:51:24+00:00</published><updated>2020-07-23T03:09:36+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence in November and December? Read on to find out!</p>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">0 documented incidents in November, 5 incidents in December. [1]</li>
<li class="remarkup-list-item">17 new Wikimedia-prod-error reports. [2]</li>
<li class="remarkup-list-item">23 Wikimedia-prod-error reports closed. [3]</li>
<li class="remarkup-list-item">190 currently open Wikimedia-prod-error reports in total. [4]</li>
</ul>

<p>November had zero reported incidents. Prior to this, the last month with no documented incidents was December 2017. To read about past incidents and unresolved actionables, check <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation § 2019</a>.</p>

<p>Explore <strong><a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia incident graphs</a></strong> (<em>interactive</em>)</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/zomaiwru36rknqh3xqqr/PHID-FILE-76bnooco3aoanq2s2n7e/cap.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_329"><img src="https://phab.wmfusercontent.org/file/data/zomaiwru36rknqh3xqqr/PHID-FILE-76bnooco3aoanq2s2n7e/cap.png" height="218" alt="cap.png (654×1 px, 33 KB)" /></a></div></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖 <em>Many dots, do not a query make!</em></h5>

<p><a href="https://phabricator.wikimedia.org/p/dcausse/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_331"><span class="phui-tag-core phui-tag-color-person">@dcausse</span></a> investigated a flood of exceptions from SpecialSearch, which reported “Cannot consume query at offset 0 (need to go to 7296)”. This exception served as a safeguard in the parser for search queries. The code path was not meant to be reached. The root cause was narrowed down to the following regex:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">/\G(?&lt;negated&gt;[-!](?=[\w]))?(?&lt;word&gt;(?:\\\\.|[!-](?!&quot;)|[^&quot;!\pZ\pC-])+)/u</pre></div>

<p>This regex looks complex, but it can actually be simplified to:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">/(?:ab|c)+/</pre></div>

<p>This regex still triggers the problematic behavior in PHP. It fails with a <tt class="remarkup-monospaced">PREG_JIT_STACKLIMIT_ERROR</tt>, when given a long string. Below is a reduced test case:</p>

<div class="remarkup-code-block" data-code-lang="php" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="nv">$ret</span> <span class="o">=</span> <span class="nf" data-symbol-name="preg_match">preg_match</span><span class="o">(</span> <span class="s1">&#039;/(?:ab|c)+/&#039;</span><span class="o">,</span> <span class="nf" data-symbol-name="str_repeat">str_repeat</span><span class="o">(</span> <span class="s1">&#039;c&#039;</span><span class="o">,</span> <span class="mi">8192</span> <span class="o">)</span> <span class="o">);</span>
<span class="k">if</span> <span class="o">(</span> <span class="nv">$ret</span> <span class="o">===</span> <span class="kc">false</span> <span class="o">)</span> <span class="o">{</span>
    <span class="nf" data-symbol-name="print">print</span><span class="o">(</span> <span class="s2">&quot;failed with: &quot;</span> <span class="o">.</span> <span class="nf" data-symbol-name="preg_last_error">preg_last_error</span><span class="o">()</span> <span class="o">);</span>
<span class="o">}</span></pre></div>



<ul class="remarkup-list">
<li class="remarkup-list-item">Fails when given 1365 contiguous c on PHP 7.0.</li>
<li class="remarkup-list-item">Fails with 2731 characters on PHP 7.2, PHP 7.1, and PHP 7.0.13.</li>
<li class="remarkup-list-item">Fails with 8192 characters on PHP 7.3. (Might be due to <a href="https://github.com/php/php-src/commit/bb2f1a683003559ada1c70166557bd7ac2845a11" class="remarkup-link remarkup-link-ext" rel="noreferrer">php-src@bb2f1a6</a>).</li>
</ul>

<p>In the end, the fix we applied was to split the regex into two separate ones, removing the quantified non-capturing group and looping at the PHP level instead (<a href="https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CirrusSearch/+/546209" class="remarkup-link remarkup-link-ext" rel="noreferrer">Gerrit change 546209</a>).</p>

<p>The lesson learned here is that the code did not properly check the return value of <tt class="remarkup-monospaced">preg_match</tt>. This matters all the more because the size allowed for the JIT stack changes between PHP versions.</p>
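
<p>One way to make such a check hard to forget is to route matches through a small wrapper that turns a <tt class="remarkup-monospaced">false</tt> return value into an exception. The sketch below is only an illustration: <tt class="remarkup-monospaced">matchOrFail</tt> is a hypothetical helper name, not something that exists in MediaWiki or CirrusSearch, and it is not the fix that was applied.</p>

```php
// Hypothetical helper (not part of MediaWiki): distinguishes
// "no match" (0) from "matching failed" (false).
function matchOrFail( string $pattern, string $subject ): bool {
    $ret = preg_match( $pattern, $subject );
    if ( $ret === false ) {
        // preg_last_error() returns one of the PREG_*_ERROR constants,
        // for example PREG_JIT_STACKLIMIT_ERROR.
        throw new RuntimeException(
            'preg_match failed with error code ' . preg_last_error()
        );
    }
    return $ret === 1;
}

// A short subject matches normally on any PHP version.
$matched = matchOrFail( '/(?:ab|c)+/', 'abcc' );
```

<p>Whether a long subject triggers the failure depends on the PHP version and PCRE JIT settings, as the list above shows, so callers would still need to decide how to handle the exception.</p>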

<p>For future reference, <a href="https://phabricator.wikimedia.org/p/dcausse/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_332"><span class="phui-tag-core phui-tag-color-person">@dcausse</span></a> concluded: The regex could be optimized to support more chars (~3 times more) by using atomic groups, like so <tt class="remarkup-monospaced">/(?&gt;ab|c)+/</tt>. — <a href="https://phabricator.wikimedia.org/T236419" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_330"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T236419</span></span></a></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉  Outstanding reports</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Or help someone that’s already started with their patch:</p>

<p>→  Open prod-error tasks with a Patch-For-Review</p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">March: 3 of 10 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">April: Three reports closed, 6 of 14 left.</li>
<li class="remarkup-list-item">May: <em>(All clear!)</em></li>
<li class="remarkup-list-item">June: Three reports closed, 6 of 11 left.</li>
<li class="remarkup-list-item">July: One report closed, 12 of 18 left.</li>
<li class="remarkup-list-item">August: Two reports closed, 4 of 14 left.</li>
<li class="remarkup-list-item">September: One report closed, with 9 of 12 left.</li>
<li class="remarkup-list-item">October: Four reports closed, 8 of 12 left.</li>
<li class="remarkup-list-item"><strong>November</strong>: 5 new reports survived the month of November.</li>
<li class="remarkup-list-item"><strong>December</strong>: 9 new reports survived the month of December.</li>
</ul>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:<br />
[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Incident_documentation#2019</a><br />
[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/AFDaPqjd5PTe/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a><br />
[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/YkIxmhRvEZ8R/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a><br />
[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Production Excellence #16: October 2019</title><link href="/phame/live/1/post/178/production_excellence_16_october_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/178/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-11-08T05:57:12+00:00</published><updated>2020-07-23T03:09:58+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h6 class="remarkup-header">📊 Month in numbers</h6>

<ul class="remarkup-list">
<li class="remarkup-list-item">3 documented incidents. [1]</li>
<li class="remarkup-list-item">33 new Wikimedia-prod-error reports. [2]</li>
<li class="remarkup-list-item">30 Wikimedia-prod-error reports closed. [3]</li>
<li class="remarkup-list-item">207 currently open Wikimedia-prod-error reports in total. [4]</li>
</ul>

<p>There were three recorded incidents last month, which is slightly below our median of the past two years (<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Explore this data</a>). To read more about these incidents, their investigations, and pending actionables, check <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation § 2019</a>.</p>

<hr class="remarkup-hr" />

<h6 class="remarkup-header">📖 <em>To Log or not To Log</em></h6>

<p>MediaWiki uses the PSR-3 compliant Monolog library to send messages to Logstash (via <a href="https://wikitech.wikimedia.org/wiki/Logstash/Interface" class="remarkup-link remarkup-link-ext" rel="noreferrer">rsyslog and Kafka</a>). These messages are used to automatically detect (by quantity) when the production cluster is in an unstable state. For example, due to an increase in application errors when deploying code, or if a backend system is failing. Two distinct issues hampered the storing of these messages this month, and both affected us simultaneously.</p>

<p><strong>Elasticsearch mapping limit</strong></p>

<p>The Elasticsearch storage behind Logstash optimises responses to Logstash queries with an index. This index has an upper limit to how many distinct fields (or columns) it can have. When reached, messages with fields not yet in the index are discarded. Our Logstash indexes are sharded by date and source (one for “mediawiki”, one for “syslog”, and one for everything else).</p>

<p>This meant that an error message was only stored if it contained only fields already used by other errors stored that day, which in turn required that day’s columns not to have been fully taken yet. A seemingly random subset of error messages was then rejected for a full day. Each day, a given kind of error got a new chance at reserving its columns, so long as it was triggered early enough.</p>

<p>To unblock deployment automation and monitoring of MediaWiki, an interim solution was devised. The subset of messages from “mediawiki” that deal with application errors now have their own index shard. These error reports follow a consistent structure, and contain no free-form context fields. As such, this index (hopefully) can’t reach its mapping limit or suffer message loss.</p>

<p>The general index mapping limit was also raised from 1000 to 2000. For now that means we’re not dropping any non-critical/debug messages. More information about the incident at <a href="https://phabricator.wikimedia.org/T234564" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_333"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T234564</span></span></a>. The general issue with accommodating debug messages in Logstash long-term, is tracked at <a href="https://phabricator.wikimedia.org/T180051" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_334"><span class="phui-tag-core phui-tag-color-object">T180051</span></a>. Thanks <a href="https://phabricator.wikimedia.org/p/matmarex/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_336"><span class="phui-tag-core phui-tag-color-person">@matmarex</span></a>, <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_337"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>, and <a href="https://phabricator.wikimedia.org/p/herron/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_338"><span class="phui-tag-core phui-tag-color-person">@herron</span></a>.</p>

<p><strong>Crash handling</strong></p>

<p>Wikimedia’s PHP configuration has a “crash handler” that kicks in if everything else fails. For example, when the memory limit or execution timeout is reached, or if some crucial part of MediaWiki fails very early on. In that case our crash handler renders a Wikimedia-branded system error page (separate from MediaWiki and its skins). It also increments a counter metric for monitoring purposes, and sends a detailed report to Logstash. In migrating the crash handler from HHVM to PHP7, one part of the puzzle was forgotten. Namely the Logstash configuration that forwards these reports from php-fpm’s syslog channel to the one for mediawiki.</p>

<p>As such, our deployment automation and several Logstash dashboards were blind to a subset of potential fatal errors for a few days. Regressions during that week were instead found by manually digging through the raw feed of the php-fpm channel. As a temporary measure, Scap was updated to consider the php-fpm channel as well in its automation that decides whether a deployment is “green”.</p>

<p>We’ve created new Logstash configurations that forward PHP7 crashes in a similar way as we did for HHVM in the past. Bookmarked MW dashboards/queries you have for Logstash now provide a complete picture once again. Thanks <a href="https://phabricator.wikimedia.org/p/jijiki/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_339"><span class="phui-tag-core phui-tag-color-person">@jijiki</span></a> and <a href="https://phabricator.wikimedia.org/p/colewhite/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_340"><span class="phui-tag-core phui-tag-color-person">@colewhite</span></a>! – <a href="https://phabricator.wikimedia.org/T234283" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_335"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T234283</span></span></a></p>

<hr class="remarkup-hr" />

<h6 class="remarkup-header">📉  Outstanding reports</h6>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Or help someone that’s already started with their patch:<br />
→  <a href="https://phabricator.wikimedia.org/maniphest/query/CFzrDj3vFbE_/#R" class="remarkup-link" rel="noreferrer">Open prod-error tasks with a Patch-For-Review</a></p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">March: 1 report fixed. (3 of 10 reports left).</li>
<li class="remarkup-list-item">April: 8 of 14 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">May: (<em>All clear!</em>)</li>
<li class="remarkup-list-item">June: 9 of 11 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">July: 13 of 18 reports left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">August: 2 reports were fixed! (6 of 14 reports left).</li>
<li class="remarkup-list-item">September: 2 reports were fixed! (10 of 12 new reports left).</li>
<li class="remarkup-list-item"><strong>October</strong>: 12 new reports survived the month of October.</li>
</ul>

<hr class="remarkup-hr" />

<h6 class="remarkup-header">🎉 Thanks!</h6>

<p>Thank you, to everyone else who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">🌴“<em>Gotta love crab. In time, too. I couldn&#039;t take much more of those coconuts. Coconut milk is a natural laxative. That&#039;s something Gilligan never told us.</em>”</div>
<div class="remarkup-reply-body"></div>
</blockquote>

<p>Footnotes:<br />
[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201910&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident…</a><br />
[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/sJI9Af6LqvKL/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a><br />
[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/scW28HMEJemU/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a><br />
[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Introducing Phatality</title><link href="/phame/live/1/post/177/introducing_phatality/" /><id>https://phabricator.wikimedia.org/phame/post/view/177/</id><author><name>mmodell (Mukunda Modell)</name></author><published>2019-10-07T00:36:27+00:00</published><updated>2019-10-18T13:39:06+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><h3 class="remarkup-header">Introduction</h3>

<p>This past week marks the release of a little tool that I&#039;ve been working on for a while. In fact, it&#039;s something I&#039;ve wanted to build for more than a year. But before I tell you about the solution, I need to describe the problem that I set out to solve.</p>

<h4 class="remarkup-header">Problem</h4>

<p>Production errors are tracked with the tag <a href="/tag/wikimedia-production-error/" class="phui-tag-view phui-tag-type-shade phui-tag-yellow phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_345"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-tags" data-meta="0_344" aria-hidden="true"></span>Wikimedia-production-error</span></a>. As a member of the <a href="/tag/release-engineering-team/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_347"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_346" aria-hidden="true"></span>Release-Engineering-Team</span></a>, I&#039;ve spent a significant amount of time copying details from Kibana log entries and pasting into the <a href="https://phabricator.wikimedia.org/maniphest/task/edit/form/46/" class="remarkup-link" rel="noreferrer">Production Error Report form</a> here in Phabricator. There are several of us who do this on a regular basis, including most of my team and several others as well.  I don&#039;t know precisely how much time is spent on error reporting but at least a handful of people are going through this process several times each week.</p>

<p>This is what led to the idea for <a href="/source/Phatality/" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_343"><span class="phui-tag-core phui-tag-color-object">rPHAT Phatality</span></a>: I recognized immediately that if I could streamline the process and save even a few seconds each time, the aggregate time savings could really add up quickly.</p>

<h4 class="remarkup-header">Solution</h4>

<p>So after considering a few ways in which the process could be automated or otherwise streamlined, I finally focused on what seemed like the most practical: build a Kibana plugin that will format the log details and send them over to Phabricator, eliminating the tedious series of copy/paste operations.</p>

<p>Phatality has a couple of other tricks up its sleeve but the essence of it is just that: capture all of the pertinent details from a single log message in Kibana and send it to Phabricator all at once with the click of a button in Kibana.</p>

<p><span class="phabricator-remarkup-embed-layout-inline"><a href="https://phab.wmfusercontent.org/file/data/hlfxmk4xaxewuwj2y22a/PHID-FILE-xrzqnjd63sfdzox57es7/Screenshot_from_2019-10-06_13-58-11.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_341"><img src="https://phab.wmfusercontent.org/file/data/hlfxmk4xaxewuwj2y22a/PHID-FILE-xrzqnjd63sfdzox57es7/Screenshot_from_2019-10-06_13-58-11.png" height="216" width="631" loading="lazy" alt="Phatality screenshot showing the submit and search buttons" /></a></span></p>

<p>Clicking the [<span class="visual-only phui-icon-view phui-font-fa fa-plus" data-meta="0_348" aria-hidden="true"></span>Submit] button, as seen in the above screenshot, will take you to the Phabricator Production Error form with all of the details pre-filled and ready to submit:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/i42xfvacllxy3i4t3b6f/PHID-FILE-luvx2lyqgx5cz7oub2ny/Screenshot_from_2019-10-06_14-05-09.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_342"><img src="https://phab.wmfusercontent.org/file/data/i42xfvacllxy3i4t3b6f/PHID-FILE-luvx2lyqgx5cz7oub2ny/Screenshot_from_2019-10-06_14-05-09.png" height="742" width="990" loading="lazy" alt="Screenshot from 2019-10-06 14-05-09.png (742×990 px, 81 KB)" /></a></div></p>

<h4 class="remarkup-header">Conclusion</h4>

<p>Now that Phatality is deployed to production and a few of us have had a chance to use it to submit error reports, I can say that I definitely think it was a worthwhile effort. The Kibana plugin wasn&#039;t terribly difficult to write, and thanks to <a href="https://phabricator.wikimedia.org/p/fgiunchedi/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_349"><span class="phui-tag-core phui-tag-color-person">@fgiunchedi</span></a>&#039;s help, the deployment went fairly smoothly. Phatality definitely streamlines the reporting process, saving several clicks each time and ensuring accuracy in the details that get sent to Phabricator.  In a future version of the tool I plan to add more features such as duplicate detection to help avoid duplicate submissions.</p>

<p>If you use Wikimedia&#039;s Kibana to report errors in Phabricator then I encourage you to look for the Phatality tab in the log details section and save some clicks!</p>

<p>What other repetitive tasks are ripe for automation? I&#039;d love to hear suggestions and ideas in the comments.</p></div></content></entry><entry><title>Integrating code coverage metrics with your development workflow</title><link href="/phame/live/1/post/174/integrating_code_coverage_metrics_with_your_development_workflow/" /><id>https://phabricator.wikimedia.org/phame/post/view/174/</id><author><name>kostajh (Kosta Harlan)</name></author><published>2019-10-09T10:04:00+00:00</published><updated>2019-11-23T14:01:50+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In <a href="https://phabricator.wikimedia.org/phame/post/view/169/changes_and_improvements_to_phpunit_testing_in_mediawiki/" class="remarkup-link" rel="noreferrer">Changes and improvements to PHPUnit testing in MediaWiki</a>, I wrote about efforts to help speed up PHPUnit code coverage generation for local development.[0] While this improves code coverage generation time for local development, it could be better.</p>

<p>As the <a href="https://www.mediawiki.org/wiki/Manual:PHP_unit_testing/Code_coverage" class="remarkup-link remarkup-link-ext" rel="noreferrer">Manual:PHP unit testing/Code coverage</a> page advises, adjusting the whitelist in the PHPUnit XML configuration can speed things up dramatically.  The problem is, adjusting that file is a manual process and a little cumbersome, so I usually didn&#039;t do it. And then because code coverage generation reports were slow locally[1], I ended up not running them while working on a patch. True, you will get feedback on code coverage metrics from CI, but it would be nicer if you could quickly get this information in your local environment first.</p>

<p>This was the motivation to add a Composer script in MediaWiki core that will help you adjust the PHPUnit coverage whitelist quickly while you&#039;re working on a patch for an extension or skin.</p>

<p>You can run it with <tt class="remarkup-monospaced">composer phpunit:coverage-edit -- extensions/$EXT_NAME</tt>, e.g. <tt class="remarkup-monospaced">composer phpunit:coverage-edit -- extensions/GrowthExperiments</tt>.</p>

<p>The <a href="https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/composer/ComposerPhpunitXmlCoverageEdit.php" class="remarkup-link" rel="noreferrer">ComposerPhpunitXmlCoverageEdit.php</a> script copies the <tt class="remarkup-monospaced">phpunit.xml.dist</tt> file to <tt class="remarkup-monospaced">phpunit.xml</tt> (not version controlled), and modifies the whitelist to add directories for that extension/skin. <tt class="remarkup-monospaced">vendor/bin/phpunit</tt> then reads <tt class="remarkup-monospaced">phpunit.xml</tt> instead of the <tt class="remarkup-monospaced">phpunit.xml.dist</tt> file. Tip: Make sure &quot;Edit configurations&quot; in your IDE (PhpStorm in my case) is using <tt class="remarkup-monospaced">vendor/bin/phpunit</tt> and <tt class="remarkup-monospaced">phpunit.xml</tt>, not <tt class="remarkup-monospaced">phpunit.xml.dist</tt>, when executing the tests.</p>
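<p>The whitelist edit described above can be sketched in a few lines. This is an illustration in Python rather than the actual PHP script: the trimmed-down configuration, the <tt class="remarkup-monospaced">narrow_whitelist</tt> helper and the <tt class="remarkup-monospaced">includes</tt>/<tt class="remarkup-monospaced">src</tt> subdirectory list are assumptions made for the example, and it replaces the existing whitelist entries, which may differ from what the real script does. File copying is left out to keep the sketch short.</p>

```python
import xml.etree.ElementTree as ET


def build_dist_config():
    """Build a tiny stand-in for phpunit.xml.dist (the real file is much larger)."""
    root = ET.Element("phpunit")
    coverage_filter = ET.SubElement(root, "filter")
    whitelist = ET.SubElement(
        coverage_filter, "whitelist", addUncoveredFilesFromWhitelist="false"
    )
    for path in ("includes", "languages", "maintenance"):
        ET.SubElement(whitelist, "directory", suffix=".php").text = path
    return root


def narrow_whitelist(root, project_path, subdirs=("includes", "src")):
    """Point the coverage whitelist at one extension or skin only.

    The subdirectory names are a hypothetical hardcoded list.
    """
    whitelist = root.find("./filter/whitelist")
    # Drop the broad default entries so coverage only scans the project.
    for child in list(whitelist):
        whitelist.remove(child)
    for sub in subdirs:
        entry = ET.SubElement(whitelist, "directory", suffix=".php")
        entry.text = project_path + "/" + sub
    return root


config = narrow_whitelist(build_dist_config(), "extensions/GrowthExperiments")
whitelisted = [directory.text for directory in config.iter("directory")]
print(whitelisted)
```

<p>Narrowing the whitelist this way is what makes coverage runs faster: PHPUnit only has to instrument the files of the project you are working on.</p>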

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/3we3kpnm4hkkmsc3tx3u/PHID-FILE-rmrmzvise44yofzixyve/phpunit.gif" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_350"><img src="https://phab.wmfusercontent.org/file/data/3we3kpnm4hkkmsc3tx3u/PHID-FILE-rmrmzvise44yofzixyve/phpunit.gif" height="834" width="1427" loading="lazy" alt="generating phpunit.xml and running code coverage in phpstorm" /></a></div></p>

<p>When you want to reset your configuration, you can <tt class="remarkup-monospaced">rm phpunit.xml</tt> and <tt class="remarkup-monospaced">vendor/bin/phpunit</tt> will read from <tt class="remarkup-monospaced">phpunit.xml.dist</tt> again.</p>
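<p>The fallback between the two files can be pictured as a simple lookup. The sketch below is a Python illustration of that order, not PHPUnit's own code; <tt class="remarkup-monospaced">resolve_phpunit_config</tt> is a hypothetical helper name.</p>

```python
from pathlib import Path
import tempfile


def resolve_phpunit_config(directory):
    """Return phpunit.xml if present, otherwise phpunit.xml.dist, otherwise None."""
    for name in ("phpunit.xml", "phpunit.xml.dist"):
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "phpunit.xml.dist").write_text("dist")
    before = resolve_phpunit_config(tmp).name

    # A local phpunit.xml takes precedence as soon as it exists.
    (Path(tmp) / "phpunit.xml").write_text("local")
    with_local = resolve_phpunit_config(tmp).name

    # Equivalent of `rm phpunit.xml`: the .dist file is used again.
    (Path(tmp) / "phpunit.xml").unlink()
    after_reset = resolve_phpunit_config(tmp).name

print(before, with_local, after_reset)
```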

<p>Further improvements to the script could include:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Reading the <tt class="remarkup-monospaced">extension.json</tt> file to determine which directories to add to the whitelist, rather than using a hardcoded list (<a href="https://phabricator.wikimedia.org/T235029" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_351"><span class="phui-tag-core phui-tag-color-object">T235029</span></a>)</li>
<li class="remarkup-list-item">Allow passing arbitrary directories/filenames, e.g. for working with subsections of core or of a larger extension (<a href="https://phabricator.wikimedia.org/T235030" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_352"><span class="phui-tag-core phui-tag-color-object">T235030</span></a>)</li>
<li class="remarkup-list-item">Adding a flag for flipping the <tt class="remarkup-monospaced">addUncoveredFilesFromWhitelist</tt> property, so that <tt class="remarkup-monospaced">phpunit-suite-edit.py</tt> in the <a href="https://phabricator.wikimedia.org/source/integration-config/" class="remarkup-link" rel="noreferrer">integration/config repo</a> could be removed in favor of the Composer script (<a href="https://phabricator.wikimedia.org/T235031" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_353"><span class="phui-tag-core phui-tag-color-object">T235031</span></a>)</li>
</ul>

<p>Thanks to <a href="https://phabricator.wikimedia.org/p/Mainframe98/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_354"><span class="phui-tag-core phui-tag-color-person">@Mainframe98</span></a> and <a href="https://phabricator.wikimedia.org/p/Krinkle/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_355"><span class="phui-tag-core phui-tag-color-person">@Krinkle</span></a> for review of the patch and to <a href="https://phabricator.wikimedia.org/p/AnneT/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_356"><span class="phui-tag-core phui-tag-color-person">@AnneT</span></a> for reviewing this post. Happy hacking!</p>

<hr class="remarkup-hr" />

<p>[0] <a href="https://gerrit.wikimedia.org/r/c/mediawiki/core/+/520459" class="remarkup-link remarkup-link-ext" rel="noreferrer">One patch changed <tt class="remarkup-monospaced">&lt;whitelist addUncoveredFilesFromWhitelist=&quot;true&quot;&gt;</tt> to <tt class="remarkup-monospaced">false</tt></a> to help speed up PHPUnit code coverage generation; the <a href="https://gerrit.wikimedia.org/r/c/integration/config/+/521190" class="remarkup-link remarkup-link-ext" rel="noreferrer">second patch flipped the flag back to <tt class="remarkup-monospaced">true</tt> in CI</a> for generating complete coverage reports.<br />
[1] For <a href="https://phabricator.wikimedia.org/diffusion/EGRE/" class="remarkup-link" rel="noreferrer">GrowthExperiments</a>, generating coverage reports without a customized whitelist takes ~17 seconds. With a custom whitelist, it takes ~1 second. While 17 seconds is arguably not a lot of time, the near-instant feedback with a customized whitelist means one is less likely to face interruptions to their flow or concentration while working on a patch.</p></div></content></entry><entry><title>Production Excellence #15: September 2019</title><link href="/phame/live/1/post/173/production_excellence_15_september_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/173/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-10-24T23:25:57+00:00</published><updated>2020-04-03T16:16:21+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">5 documented incidents. [1]</li>
<li class="remarkup-list-item">22 new errors reported. [2]</li>
<li class="remarkup-list-item">31 error reports closed. [3]</li>
<li class="remarkup-list-item">213 currently open Wikimedia-prod-error reports in total. [4]</li>
</ul>

<p>There were five recorded incidents last month, equal to the median for this and last year. – <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Explore this data</a>.</p>

<p>To read more about these incidents, their investigations, and pending actionables, check <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation § 2019</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">*️⃣ <em>A Tale of Three Great Upgrades</em></h5>

<p>This month saw three major upgrades across the MediaWiki stack.</p>

<h6 class="remarkup-header">Migrate from HHVM to PHP 7.2</h6>

<p>The client-side switch to toggle between HHVM and PHP 7.2 saw its final push — from the 50% it was at previously, to 100% of page view sessions on 17 September. The switch further solidified on 24 September when static MediaWiki traffic followed suit (e.g. API and ResourceLoader). Thanks <a href="https://phabricator.wikimedia.org/p/jijiki/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_362"><span class="phui-tag-core phui-tag-color-person">@jijiki</span></a> and <a href="https://phabricator.wikimedia.org/p/Joe/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_363"><span class="phui-tag-core phui-tag-color-person">@Joe</span></a> for the final push. – More details at <a href="https://phabricator.wikimedia.org/T219150" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_357"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T219150</span></span></a> and <a href="https://phabricator.wikimedia.org/T176370" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_358"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T176370</span></span></a>.</p>

<h6 class="remarkup-header">Drop support for IE6 and IE7</h6>

<p>The RFC to discontinue basic compatibility for the IE6 and IE7 browsers entered Last Call on 18 September. It was approved on 2 Oct (<a href="https://phabricator.wikimedia.org/T232563" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_359"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T232563</span></span></a>). Thanks to <a href="https://phabricator.wikimedia.org/p/Volker_E/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_364"><span class="phui-tag-core phui-tag-color-person">@Volker_E</span></a> for leading the sprint to optimise our CSS payloads by removing now-redundant style rules for IE6-7 compat. – More at <a href="https://phabricator.wikimedia.org/T234582" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_360"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T234582</span></span></a>.</p>

<h6 class="remarkup-header">Transition from PHPUnit 4/6 to PHPUnit 8</h6>

<p>With HHVM behind us, our Composer configuration no longer needs to be compatible with a “PHP 5.6 like” run-time. Support for the real PHP 5.6 was dropped over 2 years ago, and the HHVM engine supports PHP 7 features. But, the HHVM engine identifies as “PHP 5.6.999-hhvm”. As such, Composer refused to install PHPUnit 6 (which requires PHP 7.0+). Instead, Composer could only install PHPUnit 4 under HHVM (as for PHP 5.6). Our unit tests have had to remain compatible with both PHPUnit 4 and PHPUnit 6 simultaneously.</p>
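<p>To make the version problem concrete, here is a minimal sketch of a platform check in Python. It is not Composer's resolver; <tt class="remarkup-monospaced">parse_version</tt> and <tt class="remarkup-monospaced">satisfies_minimum</tt> are hypothetical helpers, and <tt class="remarkup-monospaced">7.2.0</tt> is only an example version string. It shows why a run-time that reports itself as “5.6.999-hhvm” fails a “PHP 7.0+” requirement, whatever features it actually supports.</p>

```python
import re


def parse_version(reported):
    """Extract the numeric (major, minor, patch) tuple, ignoring suffixes like '-hhvm'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", reported)
    if match is None:
        raise ValueError("unrecognised version string: " + reported)
    return tuple(int(part) for part in match.groups())


def satisfies_minimum(reported, minimum):
    """True when the reported platform version is at least the required minimum."""
    return parse_version(reported) >= minimum


# PHPUnit 6 requires PHP 7.0+, as described in the text.
PHPUNIT6_MIN_PHP = (7, 0, 0)

hhvm_ok = satisfies_minimum("5.6.999-hhvm", PHPUNIT6_MIN_PHP)
php72_ok = satisfies_minimum("7.2.0", PHPUNIT6_MIN_PHP)
print(hhvm_ok, php72_ok)
```

<p>The check only sees the reported version number, which is why HHVM was limited to PHPUnit 4 until it was retired.</p>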

<p>Now that we’re fully on PHP 7.2+, our Composer configuration effectively drops PHP 5.6, 7.0 and 7.1 all at once. This means that we no longer run PHPUnit tests on multiple PHPUnit versions (PHPUnit 6 only). The upgrade to PHPUnit 8 (PHP 7.2+) is also unlocked! Thanks <a href="https://phabricator.wikimedia.org/p/MaxSem/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_365"><span class="phui-tag-core phui-tag-color-person">@MaxSem</span></a>, <a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_366"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a> and <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_367"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a> for leading this transition. – <a href="https://phabricator.wikimedia.org/T192167" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_361"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T192167</span></span></a></p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉  Outstanding reports</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Or help someone that’s already started with their patch:<br />
→  <a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Open prod-error tasks with a Patch-For-Review</a></p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">February: 1 report was closed. (1 / 5 reports left).</li>
<li class="remarkup-list-item">March: 4 / 10 reports left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">April: 8 / 14 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">May: The last 4 reports were resolved. Done! ❇️</li>
<li class="remarkup-list-item">June: 9 of 11 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">July: 4 reports were fixed! (13 / 18 reports left).</li>
<li class="remarkup-list-item">August: 6 reports were fixed! (8 / 14 reports left).</li>
<li class="remarkup-list-item"><strong>September</strong>: 12 new reports survived the month of September.</li>
</ul>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thank you, to everyone else who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">📖“<em>I&#039;m not crazy about reality, but it&#039;s still the only place to get a decent meal.</em>”</div>
<div class="remarkup-reply-body"></div>
</blockquote>

<p>Footnotes:</p>

<p>[1] Incidents. –<br />
<a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201909&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident…</a></p>

<p>[2] Tasks created. –<br />
<a href="https://phabricator.wikimedia.org/maniphest/query/XicVcsN1XkVH/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[3] Tasks closed. –<br />
<a href="https://phabricator.wikimedia.org/maniphest/query/SXjsllmYHwAO/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[4] Open tasks. –<br />
<a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Production Excellence #14: August 2019</title><link href="/phame/live/1/post/172/production_excellence_14_august_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/172/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-10-03T04:27:16+00:00</published><updated>2020-04-03T16:20:49+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence in August? Read on to find out!</p>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">3 documented incidents. [1]</li>
<li class="remarkup-list-item">42 new Wikimedia-prod-error reports. [2]</li>
<li class="remarkup-list-item">31 Wikimedia-prod-error reports closed. [3]</li>
<li class="remarkup-list-item">210 currently open Wikimedia-prod-error reports in total. [4]</li>
</ul>

<p>The number of recorded incidents in August, at three, was below average for the year so far. However, in previous years (2017-2018), August also had 2-3 incidents. – <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Explore this data</a>.</p>

<p>To read more about these incidents, their investigations, and pending actionables, check <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation § 2019</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">*️⃣ <em>When you have eliminated the impossible...</em></h5>

<p>Reports from Logstash indicated that some user requests were aborted by a fatal PHP error from the MessageCache class. The user would be shown a generic system error page. The affected requests didn’t seem to have anything obvious in common, however. This made it difficult to diagnose.</p>

<p>MessageCache is responsible for fetching interface messages, such as the localised word “Edit” on the edit button. It calls a “load()” function and then tries to access the loaded information. However, sometimes the load function would claim to have finished its work, yet the information was not there.</p>

<p>When the load function initialises all the messages for a particular language, it keeps track of this, so as to not do the same a second time. From whichever angle I looked at this code, no obvious mistakes stood out. A deeper investigation revealed that two unrelated changes (more than a year apart) each broke one assumption that was safe to break on its own. Put together, however, they produced this seemingly impossible problem. Check out <a href="https://phabricator.wikimedia.org/T208897#5373846" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_368"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T208897#5373846</span></span></a> for the details of the investigation.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉  Outstanding reports</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Or help someone that’s already started with their patch:<br />
→  <a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Open prod-error tasks with a Patch-For-Review</a></p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">January: 1 report left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">February: 2 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">March: 4 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">April: 2 reports got fixed! (8 of 14 reports left). ❇️</li>
<li class="remarkup-list-item">May: 4 of 10 reports left (<em>unchanged</em>).</li>
<li class="remarkup-list-item">June: 1 report got fixed! (8 of 11 reports left). ❇️</li>
<li class="remarkup-list-item">July: 2 reports got fixed (17 of 18 reports left).</li>
<li class="remarkup-list-item"><strong>August</strong>: 14 new reports remain unsolved.</li>
<li class="remarkup-list-item"><strong>September</strong>: 11 new reports remain unsolved.</li>
</ul>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thank you to <a href="https://phabricator.wikimedia.org/p/aaron/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_369"><span class="phui-tag-core phui-tag-color-person">@aaron</span></a>, <a href="https://phabricator.wikimedia.org/p/Catrope/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_370"><span class="phui-tag-core phui-tag-color-person">@Catrope</span></a>, <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_371"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a>, <a href="https://phabricator.wikimedia.org/p/dbarratt/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_372"><span class="phui-tag-core phui-tag-color-person">@dbarratt</span></a>, <a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_373"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a>, <a href="https://phabricator.wikimedia.org/p/kostajh/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_374"><span class="phui-tag-core phui-tag-color-person">@kostajh</span></a>, <a href="https://phabricator.wikimedia.org/p/pmiazga/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_375"><span class="phui-tag-core phui-tag-color-person">@pmiazga</span></a>, <a href="https://phabricator.wikimedia.org/p/Tarrow/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_376"><span class="phui-tag-core phui-tag-color-person">@Tarrow</span></a>, <a href="https://phabricator.wikimedia.org/p/zeljkofilipin/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_377"><span class="phui-tag-core phui-tag-color-person">@zeljkofilipin</span></a>, and everyone else who helped by reporting, investigating, or 
resolving problems in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">🎭“<em>I think you should call it Seb&#039;s because no one will come to a place called Chicken on a Stick.</em>”</div>
<div class="remarkup-reply-body"></div>
</blockquote>

<p>Footnotes:</p>

<p>[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201908&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident…</a></p>

<p>[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/8fpsoBLrmlFu/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/U9.KRVNW52Yb/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Changes and improvements to PHPUnit testing in MediaWiki</title><link href="/phame/live/1/post/169/changes_and_improvements_to_phpunit_testing_in_mediawiki/" /><id>https://phabricator.wikimedia.org/phame/post/view/169/</id><author><name>kostajh (Kosta Harlan)</name></author><published>2019-07-16T04:13:53+00:00</published><updated>2020-11-25T10:32:58+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Building off the work done at the <a href="https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">Prague Hackathon</a> (<a href="https://phabricator.wikimedia.org/T216260" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_380"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T216260</span></span></a>), we&#039;re happy to announce some significant changes and improvements to the PHP testing tools included with MediaWiki.</p>

<h3 class="remarkup-header">PHP unit tests can now be run statically, without installing MediaWiki</h3>

<p>You can now download MediaWiki, run <tt class="remarkup-monospaced">composer install</tt>, and then <tt class="remarkup-monospaced">composer phpunit:unit</tt> to run core&#039;s unit test suite (<a href="https://phabricator.wikimedia.org/T89432" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_381"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T89432</span></span></a>).</p>

<h3 class="remarkup-header">The standard PHPUnit entrypoint can be used, instead of the PHPUnit Maintenance class</h3>

<p>You can now use the plain PHPUnit entrypoint at <tt class="remarkup-monospaced">vendor/bin/phpunit</tt> instead of the MediaWiki maintenance class which wraps PHPUnit (<tt class="remarkup-monospaced">tests/phpunit/phpunit.php</tt>).</p>

<p>Both the unit tests and integration tests can be executed with the standard <tt class="remarkup-monospaced">phpunit</tt> entrypoint (<tt class="remarkup-monospaced">vendor/bin/phpunit</tt>) or if you prefer, with the composer scripts defined in <tt class="remarkup-monospaced">composer.json</tt> (e.g. <tt class="remarkup-monospaced">composer phpunit:unit</tt>). We accomplished this by writing a new <tt class="remarkup-monospaced">bootstrap.php</tt> file (the old one which the maintenance class uses was moved to <tt class="remarkup-monospaced">tests/phpunit/bootstrap.maintenance.php</tt>) which executes the minimal amount of code necessary to make core, extension and skin classes discoverable by test classes.</p>

<h3 class="remarkup-header">Tests should be placed in <tt class="remarkup-monospaced">tests/phpunit/{integration,unit}</tt></h3>

<p>Integration tests should be placed in <tt class="remarkup-monospaced">tests/phpunit/integration</tt>, while unit tests go in <tt class="remarkup-monospaced">tests/phpunit/unit</tt>; both directories are discoverable by the new test suites (<a href="https://phabricator.wikimedia.org/T87781" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_382"><span class="phui-tag-core phui-tag-color-object">T87781</span></a>). It sounds obvious to write this now, but a nice side effect of organizing tests into these directories is that it&#039;s immediately clear to authors and reviewers what type of test they are looking at.</p>
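
<p>As a rough illustration of the layout described above, the shell sketch below builds the two directories in a scratch location and puts an empty placeholder file into each. The scratch directory and the file names <tt class="remarkup-monospaced">ExampleTest.php</tt> and <tt class="remarkup-monospaced">ExampleIntegrationTest.php</tt> are hypothetical and exist only for this example; in a real checkout the directories sit under the MediaWiki (or extension) root.</p>

```shell
#!/bin/sh
# Sketch only: build the directory layout in a throwaway location.
set -eu

# Scratch directory standing in for a MediaWiki or extension checkout (assumption).
root="$(mktemp -d)"

# Unit tests and integration tests live in separate directories.
mkdir -p "$root/tests/phpunit/unit" "$root/tests/phpunit/integration"

# Hypothetical placeholder files, left empty on purpose.
: > "$root/tests/phpunit/unit/ExampleTest.php"
: > "$root/tests/phpunit/integration/ExampleIntegrationTest.php"

# List the resulting files, relative to the scratch root.
layout="$(cd "$root"; find tests -type f | sort)"
printf '%s\n' "$layout"
```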

<h3 class="remarkup-header">Introducing MediaWikiUnitTestCase</h3>

<p>A new base test case, <tt class="remarkup-monospaced">MediaWikiUnitTestCase</tt>, has been introduced with a minimal amount of boilerplate (the <tt class="remarkup-monospaced">@covers</tt> validator, checks that globals are disabled and that the tests are in the proper directory, and the default PHPUnit 4 and 6 compatibility layer). The <tt class="remarkup-monospaced">MediaWikiTestCase</tt> has been renamed to <tt class="remarkup-monospaced">MediaWikiIntegrationTestCase</tt> for clarity.</p>

<h3 class="remarkup-header">Please migrate tests to be unit tests where appropriate</h3>

<p>A significant portion of core&#039;s unit tests, approximately 50% of the total, has been ported to use <tt class="remarkup-monospaced">MediaWikiUnitTestCase</tt>. We have also worked on porting extension tests to the unit/integration directories. <a href="https://phabricator.wikimedia.org/p/Ladsgroup/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_384"><span class="phui-tag-core phui-tag-color-person">@Ladsgroup</span></a> wrote a helpful script that assists with identifying and moving unit tests automatically; see <a href="https://phabricator.wikimedia.org/P8702" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_378"><span class="phui-tag-core phui-tag-color-object">P8702</span></a>. Migrating tests from <tt class="remarkup-monospaced">MediaWikiIntegrationTestCase</tt> to <tt class="remarkup-monospaced">MediaWikiUnitTestCase</tt> makes them faster.</p>

<p>Note that unit tests in CI are still run with the PHPUnit maintenance class (<tt class="remarkup-monospaced">tests/phpunit/phpunit.php</tt>), so when reviewing unit test patches please execute them locally with <tt class="remarkup-monospaced">vendor/bin/phpunit /path/to/tests/phpunit/unit</tt> or <tt class="remarkup-monospaced">composer phpunit -- /path/to/tests/phpunit/unit</tt>.</p>
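
<p>For reviewers, the two local invocations mentioned above can be sketched as a small helper that prints, rather than runs, the commands for a given directory. The function name <tt class="remarkup-monospaced">print_unit_test_commands</tt> is hypothetical and not part of MediaWiki; the commands it prints are the ones named in this post.</p>

```shell
#!/bin/sh
# Sketch only: print the two local invocations instead of executing them.
set -eu

# Hypothetical helper; takes the directory that holds the unit tests under review.
print_unit_test_commands() {
    dir="$1"
    # Plain PHPUnit entrypoint.
    printf 'vendor/bin/phpunit %s\n' "$dir"
    # Composer script; "--" separates composer's own options from script arguments.
    printf 'composer phpunit -- %s\n' "$dir"
}

print_unit_test_commands tests/phpunit/unit
```

<p>Printing the commands keeps the sketch safe to run anywhere; inside a checkout with dependencies installed, either printed line can be pasted into the terminal.</p>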

<h3 class="remarkup-header">Generating code coverage is now faster</h3>

<p>The PHPUnit configuration file now resides at the root of the repository, and is called <tt class="remarkup-monospaced">phpunit.xml.dist</tt>. (As an aside, you can copy this to <tt class="remarkup-monospaced">phpunit.xml</tt> and make local changes, as that file is git-ignored, although you should not need to do that.) We made a modification (<a href="https://phabricator.wikimedia.org/T192078" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_383"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T192078</span></span></a>) to the PHPUnit configuration inside MediaWiki to speed up code coverage generation. This makes it feasible to have a split window in your IDE (e.g. PhpStorm), run &quot;Debug with coverage&quot;, and see the results in your editor fairly quickly after running the tests.</p>
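
<p>If you do want a local override, a cautious way to create one is to copy the shipped file only when no <tt class="remarkup-monospaced">phpunit.xml</tt> exists yet, so earlier local edits are never overwritten. The sketch below runs against a scratch directory with a placeholder file standing in for the real <tt class="remarkup-monospaced">phpunit.xml.dist</tt>; both are assumptions made for the example.</p>

```shell
#!/bin/sh
# Sketch only: create a local PHPUnit config without clobbering an existing one.
set -eu

# Scratch directory standing in for the repository root (assumption).
repo="$(mktemp -d)"

# Placeholder for the shipped configuration; the real file contains PHPUnit XML.
printf 'placeholder for phpunit.xml.dist\n' > "$repo/phpunit.xml.dist"

# Copy only when a local override does not exist yet.
if [ ! -e "$repo/phpunit.xml" ]; then
    cp "$repo/phpunit.xml.dist" "$repo/phpunit.xml"
fi

ls "$repo"
```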

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/gbkcvv6pirmvwrgiloix/PHID-FILE-uhpm6ku5jxgvqds2jt6g/image.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_379"><img src="https://phab.wmfusercontent.org/file/data/gbkcvv6pirmvwrgiloix/PHID-FILE-uhpm6ku5jxgvqds2jt6g/image.png" height="410" alt="Debug coverage in PhpStorm" /></a></div></p>

<h3 class="remarkup-header">What is next?</h3>

<p>Things we are working on:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Porting core tests to integration/unit</li>
<li class="remarkup-list-item">Porting extension tests to integration/unit.</li>
<li class="remarkup-list-item">Removing legacy testsuites or ensuring they can be run in a different way (passing the directory name for example).</li>
<li class="remarkup-list-item">Switching CI to use new entrypoint for unit tests, then for unit and integration tests</li>
</ul>

<p>Help is wanted in all areas of the above! We can be found in the <tt class="remarkup-monospaced">#wikimedia-codehealth</tt> channel and via the phab issues linked in this post.</p>

<h3 class="remarkup-header">Credits</h3>

<p>The above work has been done and supported by Máté (<a href="https://phabricator.wikimedia.org/p/TK-999/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_385"><span class="phui-tag-core phui-tag-color-person">@TK-999</span></a>), Amir (<a href="https://phabricator.wikimedia.org/p/Ladsgroup/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_386"><span class="phui-tag-core phui-tag-color-person">@Ladsgroup</span></a>), Kosta (<a href="https://phabricator.wikimedia.org/p/kostajh/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_387"><span class="phui-tag-core phui-tag-color-person">@kostajh</span></a>), James (<a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_388"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a>), Timo (<a href="https://phabricator.wikimedia.org/p/Krinkle/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_389"><span class="phui-tag-core phui-tag-color-person">@Krinkle</span></a>),  Leszek (<a href="https://phabricator.wikimedia.org/p/WMDE-leszek/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_390"><span class="phui-tag-core phui-tag-color-person">@WMDE-leszek</span></a>), Kunal (<a href="https://phabricator.wikimedia.org/p/Legoktm/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_391"><span class="phui-tag-core phui-tag-color-person">@Legoktm</span></a>), Daniel (<a href="https://phabricator.wikimedia.org/p/daniel/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_392"><span class="phui-tag-core phui-tag-color-person">@daniel</span></a>), Michael Große (<a href="https://phabricator.wikimedia.org/p/Michael/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_393"><span 
class="phui-tag-core phui-tag-color-person">@Michael</span></a>), Adam (<a href="https://phabricator.wikimedia.org/p/awight/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_394"><span class="phui-tag-core phui-tag-color-person">@awight</span></a>), Antoine (<a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_395"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>), JR (<a href="https://phabricator.wikimedia.org/p/Jrbranaa/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_396"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Jrbranaa</span></a>) and Greg (<a href="https://phabricator.wikimedia.org/p/greg/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_397"><span class="phui-tag-core phui-tag-color-person">@greg</span></a>) along with several others. Thank you!</p>

<p>Thanks for reading, and happy testing!</p>

<p>Amir, Kosta, &amp; Máté</p></div></content></entry><entry><title>Production Excellence #13: July 2019</title><link href="/phame/live/1/post/164/production_excellence_13_july_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/164/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-08-30T20:08:00+00:00</published><updated>2020-04-03T16:30:28+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’re we doing on that strive for operational excellence? Read this first anniversary edition to find out!</p>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">5 documented incidents. [1]</li>
<li class="remarkup-list-item">53 new Wikimedia-prod-error reports. [2]</li>
<li class="remarkup-list-item">44 closed Wikimedia-prod-error reports. [3]</li>
<li class="remarkup-list-item">218 currently open Wikimedia-prod-error reports in total. [4]</li>
</ul>

<p>The number of recorded incidents over the past month, at five, is equal to the median number of incidents per month (2016-2019). – <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">Explore this data</a>.</p>

<p>To read more about these incidents, their investigations, and pending actionables, check <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation § 2019</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖 One year of Excellent adventures!</h5>

<p>Exactly one year ago this periodical started to provide regular insights on production stability. The idea was to shorten the feedback cycle between deployment of code that leads to fatal errors and the discovery of those errors. This allows more people to find reports earlier, which (hopefully) prevents them from sneaking into a growing pile of “normal” errors.</p>

<p>576 reports were created between 15 July 2018 and 31 July 2019 (tagged Wikimedia-prod-error).<br />
425 reports got closed over that same time period.</p>

<p>Read the <a href="https://phabricator.wikimedia.org/phame/post/view/119/production_excellence_september_2018/" class="remarkup-link" rel="noreferrer">first issue in story format</a>, or the <a href="https://lists.wikimedia.org/pipermail/wikitech-l/2018-July/090363.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">initial e-mail</a>.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉  Outstanding reports</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists error reports, grouped by the month in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Or help someone who already started with their patch:<br />
→  <a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Open prod-error tasks with a Patch-For-Review</a></p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">November: 1 report left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">December: 3 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">January: 1 report left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">February: 2 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">March: 4 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">April: 10 of 14 reports left (<em>unchanged</em>). ⚠️</li>
<li class="remarkup-list-item">May: 2 reports got fixed! (4 of 10 reports left). ❇️</li>
<li class="remarkup-list-item">June: 2 reports got fixed! (9 of 11 reports left). ❇️</li>
<li class="remarkup-list-item"><strong>July</strong>: 18 new reports from last month remain unsolved.</li>
</ul>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thank you to <a href="https://phabricator.wikimedia.org/p/aaron/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_398"><span class="phui-tag-core phui-tag-color-person">@aaron</span></a>, <a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_399"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a>, <a href="https://phabricator.wikimedia.org/p/ArielGlenn/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_400"><span class="phui-tag-core phui-tag-color-person">@ArielGlenn</span></a>, <a href="https://phabricator.wikimedia.org/p/Catrope/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_401"><span class="phui-tag-core phui-tag-color-person">@Catrope</span></a>, <a href="https://phabricator.wikimedia.org/p/cscott/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_402"><span class="phui-tag-core phui-tag-color-person">@cscott</span></a>, <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_403"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a>, <a href="https://phabricator.wikimedia.org/p/dbarratt/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_404"><span class="phui-tag-core phui-tag-color-person">@dbarratt</span></a>, <a href="https://phabricator.wikimedia.org/p/dcausse/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_405"><span class="phui-tag-core phui-tag-color-person">@dcausse</span></a>, <a href="https://phabricator.wikimedia.org/p/EBernhardson/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_406"><span class="phui-tag-core phui-tag-color-person">@EBernhardson</span></a>, <a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" 
class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_407"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a>, <a href="https://phabricator.wikimedia.org/p/jeena/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_408"><span class="phui-tag-core phui-tag-color-person">@jeena</span></a>, <a href="https://phabricator.wikimedia.org/p/MarcoAurelio/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_409"><span class="phui-tag-core phui-tag-color-person">@MarcoAurelio</span></a>, <a href="https://phabricator.wikimedia.org/p/SBisson/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_410"><span class="phui-tag-core phui-tag-color-person">@SBisson</span></a>, <a href="https://phabricator.wikimedia.org/p/Tchanders/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_411"><span class="phui-tag-core phui-tag-color-person">@Tchanders</span></a>, <a href="https://phabricator.wikimedia.org/p/Tgr/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_412"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-orange"></span>@Tgr</span></a>, <a href="https://phabricator.wikimedia.org/p/tstarling/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_413"><span class="phui-tag-core phui-tag-color-person">@tstarling</span></a>, <a href="https://phabricator.wikimedia.org/p/Urbanecm/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_414"><span class="phui-tag-core phui-tag-color-person">@Urbanecm</span></a>; and everyone else who helped by finding, investigating, or resolving error reports in Wikimedia production. Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><em>Quote</em>: <a href="https://en.wikiquote.org/wiki/Fausto_Cercignani#Quotes" class="remarkup-link remarkup-link-ext" rel="noreferrer">🎙</a> “Unlike money, hope is for all: for the rich as well as for the poor.”</div>
<div class="remarkup-reply-body"></div>
</blockquote>

<p>Footnotes:</p>

<p>[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201907&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident…</a></p>

<p>[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/9m.NNvBAvuHF/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/6sekLiUhHCmq/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Production Excellence #12: June 2019</title><link href="/phame/live/1/post/163/production_excellence_12_june_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/163/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-07-31T18:44:42+00:00</published><updated>2020-04-03T16:29:38+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">11 documented incidents. ⚠️ [1]</li>
<li class="remarkup-list-item">39 new Wikimedia-prod-error reports. [2]</li>
<li class="remarkup-list-item">25 Wikimedia-prod-error reports closed. [3]</li>
</ul>

<p>The number of incidents in June was high compared to previous years. At 11 incidents, this is higher than this year’s median (5), the 2018 median (4), and the 2017 median (5). It is also higher than any month of June in the last 4 years. – More data at <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">CodePen</a>.</p>

<p>To read more about these incidents, their investigations, and pending actionables, check <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation#2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident documentation § 2019</a>.</p>

<p>There are currently 204 open Wikimedia-prod-error reports (up from 186 in April, and 201 in May). [4]</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖  [Op-ed] Integrated maintenance cost</h5>

<p>A shoutout to the Wikidata and Core Platform teams, at WMDE and WMF respectively. Both recently established a rotating subteam that focuses on incidental work, such as maintenance and other work that might otherwise hinder feature development.</p>

<p>I expect this to improve efficiency by avoiding context switches between feature and incidental work. The rotational aspect should distribute the work more evenly among team members (avoiding burnout). It may also increase exposure to other teams and to lesser-known areas of our code, which provides opportunities for personal growth and helps retain institutional knowledge.</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉  Current problems</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the month in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error</a></p>

<p>Or help someone who already started with their patch:<br />
→  <a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Open prod-error tasks with a </a><strong><a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Patch-For-Review</a></strong></p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">November: 1 issue got fixed! (1 issue left).</li>
<li class="remarkup-list-item">December: 3 issues left <em>(unchanged)</em>. ⚠️</li>
<li class="remarkup-list-item">January: 1 issue left <em>(unchanged)</em>. ⚠️</li>
<li class="remarkup-list-item">February: 2 issues left <em>(unchanged)</em>. ⚠️</li>
<li class="remarkup-list-item">March: 4 issues left <em>(unchanged)</em>. ⚠️</li>
<li class="remarkup-list-item">April: 2 issues got fixed! (10 of 14 issues, that survived April, remain open). ❇️</li>
<li class="remarkup-list-item">May: 4 issues got fixed! (6 of 10 issues, that survived May, are left). ❇️</li>
<li class="remarkup-list-item">June: 11 new issues from last month remain unresolved.</li>
</ul>

<p>By steward and software component, the unresolved issues that survived June:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">CPT / MW Auth (PHP fatal): <a href="https://phabricator.wikimedia.org/T228717" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_415"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T228717</span></span></a></li>
<li class="remarkup-list-item">CPT / MW Actor (DB contention): <a href="https://phabricator.wikimedia.org/T227739" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_416"><span class="phui-tag-core phui-tag-color-object">T227739</span></a></li>
<li class="remarkup-list-item">CPT or Multimedia / Thumb handler (MultiCurl error): <a href="https://phabricator.wikimedia.org/T225197" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_417"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T225197</span></span></a></li>
<li class="remarkup-list-item">Multimedia / File metadata (PHP error): <a href="https://phabricator.wikimedia.org/T226751" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_418"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T226751</span></span></a></li>
<li class="remarkup-list-item">Wikidata / Commons page view (PHP fatal): <a href="https://phabricator.wikimedia.org/T227360" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_419"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T227360</span></span></a></li>
<li class="remarkup-list-item">Wikidata / Jobrunner (PHP memory fatal): <a href="https://phabricator.wikimedia.org/T227450" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_420"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T227450</span></span></a></li>
<li class="remarkup-list-item">Wikidata / Jobrunner (Trx error): <a href="https://phabricator.wikimedia.org/T225098" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_421"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T225098</span></span></a></li>
<li class="remarkup-list-item">Product-Infra / ReadingList API (PHP fatal): <a href="https://phabricator.wikimedia.org/T226593" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_422"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T226593</span></span></a></li>
<li class="remarkup-list-item">(Unknown?) / Special:ConfirmEmail (PHP fatal): <a href="https://phabricator.wikimedia.org/T226337" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_423"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T226337</span></span></a></li>
<li class="remarkup-list-item">(Unknown?) / Page renaming (DB timeout): <a href="https://phabricator.wikimedia.org/T226898" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_424"><span class="phui-tag-core phui-tag-color-object">T226898</span></a></li>
<li class="remarkup-list-item">(Unknown?) / Page renaming (Bad revision fatal): <a href="https://phabricator.wikimedia.org/T225366" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_425"><span class="phui-tag-core phui-tag-color-object">T225366</span></a></li>
</ul>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡<strong>Ideas</strong>: To suggest something to investigate or highlight in a future edition, contact me by e-mail or private IRC message.</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thank you to everyone who has helped by reporting, investigating, or resolving problems in Wikimedia production. Including: <a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_426"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a>, <span class="phabricator-remarkup-mention-unknown">@brion</span>, <a href="https://phabricator.wikimedia.org/p/Catrope/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_427"><span class="phui-tag-core phui-tag-color-person">@Catrope</span></a>, <a href="https://phabricator.wikimedia.org/p/cscott/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_428"><span class="phui-tag-core phui-tag-color-person">@cscott</span></a>, <a href="https://phabricator.wikimedia.org/p/daniel/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_429"><span class="phui-tag-core phui-tag-color-person">@daniel</span></a>, <a href="https://phabricator.wikimedia.org/p/dcausse/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_430"><span class="phui-tag-core phui-tag-color-person">@dcausse</span></a>, <a href="https://phabricator.wikimedia.org/p/DerFussi/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_431"><span class="phui-tag-core phui-tag-color-person">@DerFussi</span></a>, <a href="https://phabricator.wikimedia.org/p/Ebe123/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_432"><span class="phui-tag-core phui-tag-color-person">@Ebe123</span></a>, <a href="https://phabricator.wikimedia.org/p/fgiunchedi/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_433"><span class="phui-tag-core phui-tag-color-person">@fgiunchedi</span></a>, <a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" class="phui-tag-view phui-tag-type-person " 
data-sigil="hovercard" data-meta="0_434"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a>, <a href="https://phabricator.wikimedia.org/p/kostajh/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_435"><span class="phui-tag-core phui-tag-color-person">@kostajh</span></a>, <a href="https://phabricator.wikimedia.org/p/Legoktm/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_436"><span class="phui-tag-core phui-tag-color-person">@Legoktm</span></a>, <a href="https://phabricator.wikimedia.org/p/Lucas_Werkmeister_WMDE/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_437"><span class="phui-tag-core phui-tag-color-person">@Lucas_Werkmeister_WMDE</span></a>, <a href="https://phabricator.wikimedia.org/p/matmarex/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_438"><span class="phui-tag-core phui-tag-color-person">@matmarex</span></a>, <a href="https://phabricator.wikimedia.org/p/matthiasmullie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_439"><span class="phui-tag-core phui-tag-color-person">@matthiasmullie</span></a>, <a href="https://phabricator.wikimedia.org/p/Michael/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_440"><span class="phui-tag-core phui-tag-color-person">@Michael</span></a>, <a href="https://phabricator.wikimedia.org/p/Nikerabbit/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_441"><span class="phui-tag-core phui-tag-color-person">@Nikerabbit</span></a>, <a href="https://phabricator.wikimedia.org/p/SBisson/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_442"><span class="phui-tag-core phui-tag-color-person">@SBisson</span></a>, <a href="https://phabricator.wikimedia.org/p/Smalyshev/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" 
data-meta="0_443"><span class="phui-tag-core phui-tag-color-person">@Smalyshev</span></a>, <a href="https://phabricator.wikimedia.org/p/Tchanders/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_444"><span class="phui-tag-core phui-tag-color-person">@Tchanders</span></a>, <a href="https://phabricator.wikimedia.org/p/Tgr/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_445"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-orange"></span>@Tgr</span></a>, <a href="https://phabricator.wikimedia.org/p/Tpt/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_446"><span class="phui-tag-core phui-tag-color-person">@Tpt</span></a>, <a href="https://phabricator.wikimedia.org/p/Umherirrender/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_447"><span class="phui-tag-core phui-tag-color-person">@Umherirrender</span></a>, and <a href="https://phabricator.wikimedia.org/p/Urbanecm/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_448"><span class="phui-tag-core phui-tag-color-person">@Urbanecm</span></a>.</p>

<p>Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head"><a href="https://en.wikiquote.org/wiki/Hook_(film)#Dialogue" class="remarkup-link remarkup-link-ext" rel="noreferrer">🔮</a>“<em>These are his marbles...</em>” “<em>Ha! He really did lose his marbles, didn&#039;t he?</em>” “<em>Yeah, he lost them good.</em>”</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<hr class="remarkup-hr" />

<p>Footnotes:</p>

<ol class="remarkup-list">
<li class="remarkup-list-item">Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201906&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:PrefixIndex…</a></li>
<li class="remarkup-list-item">Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/barA9u0RtzE3/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></li>
<li class="remarkup-list-item">Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/zXZdtEhocO1a/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></li>
<li class="remarkup-list-item">Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></li>
</ol></div></content></entry><entry><title>Production Excellence #11: May 2019</title><link href="/phame/live/1/post/162/production_excellence_11_may_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/162/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-07-01T18:56:32+00:00</published><updated>2020-10-04T22:05:06+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">6 documented incidents. [1]</li>
<li class="remarkup-list-item">41 new Wikimedia-prod-error tasks created. [2]</li>
<li class="remarkup-list-item">36 Wikimedia-prod-error tasks closed. [3]</li>
</ul>

<p>The number of incidents in May of this year was comparable to previous years (6 in May 2019, 2 in May 2018, 5 in May 2017), and previous months (6 in May, 8 in April, 8 in March) – comparisons at <a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">CodePen</a>.</p>

<p>To read more about these incidents, their investigations, and pending actionables, check <a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201905&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Incident_documentation#2019</a>.</p>

<p>As of writing, there are 201 open Wikimedia-prod-error tasks (up from 186 last month). [4]</p>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📉  Current problems</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the month in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error</a></p>

<p>Or help someone who’s already started with their patch:<br />
→  <a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Open prod-error tasks with a </a><strong><a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Patch-For-Review</a></strong></p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">November: 2 issues left (unchanged).</li>
<li class="remarkup-list-item">December: 1 issue got fixed. 3 issues left (down from 4).</li>
<li class="remarkup-list-item">January: 1 issue left (unchanged).</li>
<li class="remarkup-list-item">February: 2 issues left (unchanged).</li>
<li class="remarkup-list-item">March: 1 issue got fixed. 4 issues remaining (down from 5).</li>
<li class="remarkup-list-item">April: 2 issues got fixed. 12 issues remain unresolved (down from 14).</li>
<li class="remarkup-list-item">May: 10 new issues found last month survived the month of May, and remain unresolved.</li>
</ul>

<p>By steward and software component, unresolved issues from April and May:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Wikidata / Lexeme (API query fatal): <a href="https://phabricator.wikimedia.org/T223995" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_449"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T223995</span></span></a></li>
<li class="remarkup-list-item">Wikidata / WikibaseRepo (API Fatal hasSlot): <a href="https://phabricator.wikimedia.org/T225104" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_450"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T225104</span></span></a></li>
<li class="remarkup-list-item">Wikidata / WikibaseRepo (Diff link fatal): <a href="https://phabricator.wikimedia.org/T224270" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_451"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T224270</span></span></a></li>
<li class="remarkup-list-item">Wikidata / WikibaseRepo (Edit undo fatal): <a href="https://phabricator.wikimedia.org/T224030" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_452"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T224030</span></span></a></li>
<li class="remarkup-list-item">Growth / Echo (Notification storage): <a href="https://phabricator.wikimedia.org/T217079" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_453"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T217079</span></span></a></li>
<li class="remarkup-list-item">Growth / Flow (Topic link fatal): <a href="https://phabricator.wikimedia.org/T224098" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_454"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T224098</span></span></a></li>
<li class="remarkup-list-item">Growth / Page deletion (File pages): <a href="https://phabricator.wikimedia.org/T222691" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_455"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T222691</span></span></a></li>
<li class="remarkup-list-item">Multimedia or CPT / API (Image info fatal): <a href="https://phabricator.wikimedia.org/T221812" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_456"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T221812</span></span></a></li>
<li class="remarkup-list-item">CPT / PHP7 refactoring (File descriptions): <a href="https://phabricator.wikimedia.org/T223728" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_457"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T223728</span></span></a></li>
<li class="remarkup-list-item">CPT / Title refactor (Block log fatal): <a href="https://phabricator.wikimedia.org/T224811" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_458"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T224811</span></span></a></li>
<li class="remarkup-list-item">CPT / Title refactor (Pageview fatals): <a href="https://phabricator.wikimedia.org/T224814" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_459"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T224814</span></span></a></li>
<li class="remarkup-list-item">(Unstewarded) Page renaming: <a href="https://phabricator.wikimedia.org/T223175" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_460"><span class="phui-tag-core phui-tag-color-object">T223175</span></a>, <a href="https://phabricator.wikimedia.org/T205675" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_461"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T205675</span></span></a></li>
</ul>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡<strong>Ideas</strong>: To suggest an investigation to write about in a future edition, contact me by e-mail, or private message on IRC.</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thank you to everyone who has helped by reporting, investigating, or resolving problems in Wikimedia production.</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">🎙“<em>It’s not too shabby is it?</em>”</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] Incidents. –<br />
<a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201905&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:PrefixIndex…</a></p>

<p>[2] Tasks created. –<br />
<a href="https://phabricator.wikimedia.org/maniphest/query/fHFqxAZwk1fW/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[3] Tasks closed. –<br />
<a href="https://phabricator.wikimedia.org/maniphest/query/y2e1BxPmGlub/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[4] Open tasks. –<br />
<a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Introducing the codehealth pipeline beta</title><link href="/phame/live/1/post/160/introducing_the_codehealth_pipeline_beta/" /><id>https://phabricator.wikimedia.org/phame/post/view/160/</id><author><name>kostajh (Kosta Harlan)</name></author><published>2019-05-14T20:29:35+00:00</published><updated>2019-06-12T02:54:51+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>After many months of discussion, work and consultation across teams and departments[0], and with much gratitude and appreciation to the hard work and patience of <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_474"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a> and <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_475"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>, the <a href="/tag/code-health-metrics/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_465"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_464" aria-hidden="true"></span>Code-Health-Metrics</span></a> group is pleased to announce the introduction of the code health pipeline. 
The pipeline is currently in beta and enabled for <a href="/tag/growthexperiments/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_467"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_466" aria-hidden="true"></span>GrowthExperiments</span></a>, soon to be followed by <a href="/tag/notifications_echo/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_469"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_468" aria-hidden="true"></span>Notifications (Echo)</span></a>, <a href="/tag/pagetriage/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_471"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_470" aria-hidden="true"></span>PageTriage</span></a>, and <a href="/tag/structureddiscussions/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_473"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_472" aria-hidden="true"></span>StructuredDiscussions</span></a>. (If you&#039;d like to enable the pipeline for an extension you maintain or contribute to, please reach out to us via the comments on this post.)</p>

<h3 class="remarkup-header">What are we trying to do?</h3>

<p>The <a href="/tag/code-health-metrics/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_477"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_476" aria-hidden="true"></span>Code-Health-Metrics</span></a> group has been working to define a set of common code health metrics. Our current understanding is that the <a href="https://www.mediawiki.org/wiki/Code_Health" class="remarkup-link remarkup-link-ext" rel="noreferrer">code health</a> factors are simplicity, readability, testability, and buildability. Beyond analyzing a given patch set for these factors, we also want to have a historical view of code as it evolves over time. We want to be able to see which areas of code lack test coverage, where refactoring a class due to excessive complexity might be called for, and where possible bugs exist.</p>

<p>After talking through some options, we settled on a proof-of-concept to integrate Wikimedia&#039;s Gerrit patch sets with <a href="https://sonarqube.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">SonarQube</a> as the hub for analyzing and displaying metrics on our code[1]. SonarQube is a Java project that analyzes code according to a set of <a href="https://rules.sonarsource.com" class="remarkup-link remarkup-link-ext" rel="noreferrer">rules</a>. SonarQube has a concept of a &quot;Quality Gate&quot;, which can be defined organization-wide or overridden on a per-project basis. The <a href="https://sonarcloud.io/organizations/wmftest/quality_gates/show/9" class="remarkup-link remarkup-link-ext" rel="noreferrer">default Quality Gate</a> says that, of the code added in a patch set, over 80% must be covered by tests, less than 3% may consist of duplicated lines, and the maintainability, reliability and security ratings must each be graded as an A. If code meets these criteria, we say it has passed the quality gate; otherwise it has failed.</p>
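
<p>To make the gate criteria concrete, here is a minimal Python sketch of the check described above. It is only an illustration and not how SonarQube implements quality gates: the <tt class="remarkup-monospaced">NewCodeMetrics</tt> class, its field names and the <tt class="remarkup-monospaced">passes_quality_gate</tt> function are hypothetical, and only the thresholds come from the description of the default gate.</p>

```python
from dataclasses import dataclass

# Thresholds taken from the default Quality Gate described in the post.
MIN_COVERAGE_PERCENT = 80.0
MAX_DUPLICATION_PERCENT = 3.0


@dataclass
class NewCodeMetrics:
    """Hypothetical container for metrics about code added in a patch set."""
    coverage_percent: float
    duplicated_lines_percent: float
    maintainability_rating: str
    reliability_rating: str
    security_rating: str


def passes_quality_gate(metrics):
    """Return True when every criterion of the gate is met."""
    ratings = (
        metrics.maintainability_rating,
        metrics.reliability_rating,
        metrics.security_rating,
    )
    return (
        # "over 80%" of the new code must be covered by tests
        metrics.coverage_percent > MIN_COVERAGE_PERCENT
        # "less than 3%" of the new code may be duplicated
        and MAX_DUPLICATION_PERCENT > metrics.duplicated_lines_percent
        # all three ratings must be graded as an A
        and all(rating == 'A' for rating in ratings)
    )
```

<p>Under these assumptions, a patch with good coverage and no duplication would still fail the gate as soon as one of its ratings drops below A, for instance a maintainability rating of C.</p>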

<p>Here&#039;s <a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/509928" class="remarkup-link remarkup-link-ext" rel="noreferrer">an example of a patch</a> that <a href="https://sonarcloud.io/dashboard?branch=509928&amp;id=mediawiki-extensions-GrowthExperiments" class="remarkup-link remarkup-link-ext" rel="noreferrer">failed the quality gate</a>:</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/egypjll4pwun2javpmzm/PHID-FILE-2prsp7yigxna7hfdcoj2/image.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_462"><img src="https://phab.wmfusercontent.org/file/data/egypjll4pwun2javpmzm/PHID-FILE-2prsp7yigxna7hfdcoj2/image.png" height="400" alt="screenshot of sonarqube quality gate" /></a></div></p>

<p>If you click through to the report, you can see that it failed because the patch introduced an <a href="https://sonarcloud.io/project/issues?branch=509928&amp;id=mediawiki-extensions-GrowthExperiments&amp;resolved=false&amp;types=CODE_SMELL" class="remarkup-link remarkup-link-ext" rel="noreferrer">unused local variable (code smell)</a>, so the maintainability score for that patch was graded as a C.</p>

<h3 class="remarkup-header">How does it integrate with Gerrit?</h3>

<p>For projects that have been opted in to the code health pipeline, submitting a new patch or commenting with &quot;check codehealth&quot; will result in the following actions:</p>

<ol class="remarkup-list">
<li class="remarkup-list-item">The <tt class="remarkup-monospaced">mwext-codehealth-patch</tt> job checks out the patchset and installs MediaWiki</li>
<li class="remarkup-list-item">PHPUnit is run and a code coverage report is generated</li>
<li class="remarkup-list-item"><tt class="remarkup-monospaced">npm test:unit</tt> is run, which may generate a code coverage report if the <tt class="remarkup-monospaced">package.json</tt> file is configured to do so</li>
<li class="remarkup-list-item">The <tt class="remarkup-monospaced">sonar-scanner</tt> binary runs and sends 1) the code, 2) the PHP code coverage, and 3) the JavaScript code coverage to Sonar</li>
<li class="remarkup-list-item">After Sonar is done analyzing the code and coverage reports, the pipeline reports if the quality gate passed or failed. The outcome does not prevent merge in case of failure.</li>
</ol>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/gtfdoozb3but7chqw7ja/PHID-FILE-gonxhxbwmudnsm5n5jcf/image.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_463"><img src="https://phab.wmfusercontent.org/file/data/gtfdoozb3but7chqw7ja/PHID-FILE-gonxhxbwmudnsm5n5jcf/image.png" height="110" alt="pipeline screenshot" /></a></div></p>

<p>If you click the link, you&#039;ll be able to view the analysis in SonarQube. From there you can also view the code of a project and see which lines are covered by tests, which lines have issues, etc.</p>

<p>Also, when a patch merges, the <tt class="remarkup-monospaced">mwext-codehealth-master-non-voting</tt> job executes, which updates the default view of a project in SonarQube with the latest code coverage and code metrics.[3]</p>

<h3 class="remarkup-header">What&#039;s next?</h3>

<p>We would like to enable the code health pipeline for more projects, and eventually we would like to use it for core. One challenge with core is that it currently takes ~2 hours to generate the PHPUnit coverage report. We also want to gather feedback from the developer community on false positives and unhelpful rules. We have tried to start with a minimal set of rules that we think everyone could agree with but are happy to adjust based on developer feedback[2]. Our current list of rules can be seen in this <a href="https://sonarcloud.io/organizations/wmftest/quality_profiles/show?language=php&amp;name=MediaWiki" class="remarkup-link remarkup-link-ext" rel="noreferrer">quality profile</a>.</p>

<p>If you&#039;ll be at the Hackathon, we will be presenting on the code health pipeline and SonarQube at the <a href="https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2019/Program#Friday,_17_May:_Hackathon_starts" class="remarkup-link remarkup-link-ext" rel="noreferrer">Code health and quality metrics in Wikimedia continuous integration</a> session on Friday at 3 PM. We look forward to your feedback!</p>

<p>Kosta, for the <a href="/tag/code-health-metrics/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_479"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_478" aria-hidden="true"></span>Code-Health-Metrics</span></a> group</p>

<hr class="remarkup-hr" />

<p>[0] More about the Code Health Metrics group: <a href="https://www.mediawiki.org/wiki/Code_Health_Group/projects/Code_Health_Metrics" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Code_Health_Group/projects/Code_Health_Metrics</a>, currently comprised of Guillaume Lederrey (R), Jean-Rene Branaa (A), Kosta Harlan (R), Kunal Mehta (C), Piotr Miazga (C), Željko Filipin (R). Thank you also to <a href="https://phabricator.wikimedia.org/p/daniel/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_480"><span class="phui-tag-core phui-tag-color-person">@daniel</span></a> for feedback and review of rules in SonarQube.<br />
[1] While SonarQube is an open source project, we currently use the hosted version at <a href="https://sonarcloud.io" class="remarkup-link remarkup-link-ext" rel="noreferrer">sonarcloud.io</a>. We plan to eventually migrate to our own self-hosted SonarQube instance, so we have full ownership of tools and data.<br />
[2] You can add a topic here <a href="https://www.mediawiki.org/wiki/Talk:Code_Health_Group/projects/Code_Health_Metrics" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Talk:Code_Health_Group/projects/Code_Health_Metrics</a><br />
[3] You might have also noticed a post-merge job over the last few months, <tt class="remarkup-monospaced">wmf-sonar-scanner-change</tt>. This job did not incorporate code coverage, but it did analyze most of our extensions and MediaWiki core, and as a result there is a set of <a href="https://sonarcloud.io/organizations/wmftest/projects?sort=-analysis_date" class="remarkup-link remarkup-link-ext" rel="noreferrer">project data</a> and <a href="https://sonarcloud.io/organizations/wmftest/issues?resolved=false" class="remarkup-link remarkup-link-ext" rel="noreferrer">issues</a> that might be of interest to you. The Issues view in SonarQube might be interesting, for example, as a starting point for new developers who want to contribute to a project and want to make some small fixes.</p></div></content></entry><entry><title>Quibble hibernated, it is time to flourish</title><link href="/phame/live/1/post/155/quibble_hibernated_it_is_time_to_flourish/" /><id>https://phabricator.wikimedia.org/phame/post/view/155/</id><author><name>hashar (Antoine Musso)</name></author><published>2019-03-28T11:48:29+00:00</published><updated>2019-03-29T11:01:09+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Writing blog posts is neither my job nor something that I enjoy, so I am late with the Quibble updates. The last one, <a href="/J118" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_481"><span class="phui-tag-core phui-tag-color-object">Blog Post: Quibble in summer</span></a>, was written in September 2018 and I forgot to publish it until now. You might want to read it first to get a glimpse of some nice changes that got implemented last summer.</p>

<p>I guess personal changes that happened in October and the traditional northern hemisphere winter hibernation kind of explain the delay (see note [ 1 ]). Now that spring is finally here (<tt class="remarkup-monospaced">{{NPOV}}</tt>), it is time for another update.</p>

<p>Quibble went from 0.0.26 to 0.0.30, which I cut just before starting this post. I want to highlight a few changes from an overall small changelog:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Use stronger password in Quibble related browser tests - <a href="https://phabricator.wikimedia.org/T204569" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_482"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T204569</span></span></a></li>
<li class="remarkup-list-item">Parallelize ext/skin linter</li>
<li class="remarkup-list-item">Parallelize mediawiki/core linter</li>
<li class="remarkup-list-item">PHPUnit generates Junit results - <a href="https://phabricator.wikimedia.org/T207841" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_483"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T207841</span></span></a></li>
<li class="remarkup-list-item">readme: how to reproduce a CI build - <a href="https://phabricator.wikimedia.org/T200991" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_484"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T200991</span></span></a></li>
<li class="remarkup-list-item">doc: quibble-stretch no more has php</li>
<li class="remarkup-list-item">mediawiki.d: Avoid vars that look like core or wmf names</li>
<li class="remarkup-list-item">Drop /p from Gerrit clone URL - <a href="https://phabricator.wikimedia.org/T218844" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_485"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T218844</span></span></a></li>
<li class="remarkup-list-item">Support to clone repositories in parallel - <a href="https://phabricator.wikimedia.org/T211701" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_486"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T211701</span></span></a></li>
<li class="remarkup-list-item">Properly abort when git submodule processing fails - <a href="https://phabricator.wikimedia.org/T198980" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_487"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198980</span></span></a></li>
<li class="remarkup-list-item">mediawiki.d: Improve docs about dev settings and combine env sections</li>
<li class="remarkup-list-item">mediawiki.d: Merge into one file</li>
</ul>

<h2 class="remarkup-header">Parallelism [ 2 ]</h2>

<p>The first incarnation of Quibble did not have much thought put into speed. The main goal at the time was simply to gather all the complicated logic from CI shell scripts, Jenkins job shell snippets, and Python or JavaScript scripts into one single command. That in turn made it easier to reproduce a build, but with a serious limitation: commands were run serially, which is far from optimal.</p>

<p>Quibble now runs the lint commands in parallel for both extensions/skins and mediawiki/core. Internally, it forks to run <tt class="remarkup-monospaced">composer test</tt> and <tt class="remarkup-monospaced">npm test</tt> in parallel, which slightly reduces the time it takes for the linting commands to complete.</p>

<p>Another annoyance is that when testing multiple repositories together, preparing the git repositories can take several minutes. Examples are an extension depending on several other extensions, or the gated <tt class="remarkup-monospaced">wmf-quibble-*</tt> jobs, which run tests for several Wikimedia deployed extensions. Even when using a local cache of git repositories (<tt class="remarkup-monospaced">--git-cache</tt>), the serially run git commands take a while. Quibble 0.0.30 learned <tt class="remarkup-monospaced">--git-parallel</tt> to run the git commands in parallel. An example speedup using a git cache, several repositories and a DSL connection:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>git-parallel</th><th>Duration</th></tr>
<tr><td>16</td><td>30 seconds</td></tr>
<tr><td>1</td><td>50 seconds</td></tr>
</table></div>

<p>The option defaults to <tt class="remarkup-monospaced">1</tt>, which retains the exact same behavior / code path as before. I invite you to try <tt class="remarkup-monospaced">--git-parallel=8</tt>, for example, and draw your own conclusions. Wikimedia CI will be updated once Quibble 0.0.30 is deployed.</p>
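
<p>As a rough illustration of the idea behind a parallel clone, the Python sketch below runs one task per repository on a pool of worker threads. It is not the code used by Quibble: the <tt class="remarkup-monospaced">clone</tt> and <tt class="remarkup-monospaced">run_parallel</tt> helpers are hypothetical names, and the sketch assumes a <tt class="remarkup-monospaced">git</tt> binary is available when <tt class="remarkup-monospaced">clone</tt> is used.</p>

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def clone(repo_url, dest):
    """Clone one repository; raises CalledProcessError if git fails."""
    subprocess.check_call(['git', 'clone', '--quiet', repo_url, dest])


def run_parallel(task, jobs, workers=1):
    """Run task(*job) for every job using at most `workers` threads.

    Results come back in the order of `jobs`. An exception raised by a
    task is re-raised here when its result is collected. With workers=1
    the jobs are processed one after the other.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, *job) for job in jobs]
        return [future.result() for future in futures]
```

<p>With such a helper, cloning a list of (URL, destination) pairs with eight workers would look like <tt class="remarkup-monospaced">run_parallel(clone, pairs, workers=8)</tt>. Threads are sufficient in this sketch because the time is spent waiting on the external git processes and on the network rather than in Python itself.</p>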

<p>Parallelism was added by myself, <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_494"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>, and was partly tracked in <a href="https://phabricator.wikimedia.org/T211701" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_488"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T211701</span></span></a>.</p>

<h2 class="remarkup-header">Documentation</h2>

<p>Some parts of the documentation referred to Wikimedia CI containers that were no longer suitable for running tests due to refactoring. The documentation has thus been updated to use the proper containers: <tt class="remarkup-monospaced">docker-registry.wikimedia.org/releng/quibble-stretch-php72</tt> or <tt class="remarkup-monospaced">docker-registry.wikimedia.org/releng/quibble-stretch-hhvm</tt>. -- <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_495"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a></p>

<p>In August, Wikidata developers used Quibble to reproduce a test failure, and they took the extra step of capturing their session and documenting how to reproduce it. Thank you <a href="https://phabricator.wikimedia.org/p/Pablo-WMDE/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_496"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Pablo-WMDE</span></a> for leading this and <a href="https://phabricator.wikimedia.org/p/Tarrow/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_497"><span class="phui-tag-core phui-tag-color-person">@Tarrow</span></a>, <a href="https://phabricator.wikimedia.org/p/Addshore/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_498"><span class="phui-tag-core phui-tag-color-person">@Addshore</span></a>, <a href="https://phabricator.wikimedia.org/p/Michael/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_499"><span class="phui-tag-core phui-tag-color-person">@Michael</span></a>, <a href="https://phabricator.wikimedia.org/p/Ladsgroup/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_500"><span class="phui-tag-core phui-tag-color-person">@Ladsgroup</span></a> for the reviews - <a href="https://phabricator.wikimedia.org/T200991" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_489"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T200991</span></span></a>.</p>

<p>You can read the documentation online at:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://doc.wikimedia.org/quibble/#reproducing-a-ci-build" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://doc.wikimedia.org/quibble/#reproducing-a-ci-build</a></li>
</ul>

<p><em>Note: as of this writing, the CI git servers are NOT publicly reachable (<tt class="remarkup-monospaced">git://contint1001.wikimedia.org</tt> and <tt class="remarkup-monospaced">git://contint2001.wikimedia.org</tt>).</em></p>

<h2 class="remarkup-header">Submodule failures</h2>

<p>Some extensions or skins might have submodules; however, we never caught errors when they failed to process and simply kept going. That later caused tests to fail in non-obvious ways and caused several people to lose time recently. <a href="https://phabricator.wikimedia.org/T198980" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_490"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198980</span></span></a></p>

<p>The reason is that Quibble simply borrowed a legacy shell script to handle submodules, and that script had been broken since its introduction in 2014. It relied on the <tt class="remarkup-monospaced">find</tt> command, which still exits 0 even with <tt class="remarkup-monospaced">-exec /bin/false</tt>. Although the exit code of <tt class="remarkup-monospaced">/bin/false</tt> is 1, that simply causes <tt class="remarkup-monospaced">find</tt> to consider the <tt class="remarkup-monospaced">-exec</tt> predicate to be false. <tt class="remarkup-monospaced">find</tt> thus stops evaluating further predicates for that file, but that is not an error.</p>
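
<p>The behaviour is easy to check. The following is a minimal Python sketch, assuming a Unix-like system with <tt class="remarkup-monospaced">find</tt> and <tt class="remarkup-monospaced">false</tt> available on the <tt class="remarkup-monospaced">PATH</tt>. The helper name <tt class="remarkup-monospaced">find_exit_code</tt> is made up for this illustration and is not part of Quibble or of the legacy script:</p>

<div class="remarkup-code-block" data-code-lang="python" data-sigil="remarkup-code-block"><pre class="remarkup-code">import subprocess


def find_exit_code():
    # Visit only the current directory and run "false" on it.
    # "false" exits 1, which merely makes the -exec predicate evaluate
    # to false for that file; find itself does not treat it as an error.
    result = subprocess.run(
        ["find", ".", "-maxdepth", "0", "-exec", "false", ";"]
    )
    return result.returncode


print(find_exit_code())</pre></div>

<p>With GNU find this should print <tt class="remarkup-monospaced">0</tt>, which is why a script checking only the exit status of <tt class="remarkup-monospaced">find</tt> would never notice the failing command.</p>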

<p>The logic has been ported to pure Python and now properly aborts when <tt class="remarkup-monospaced">git submodule</tt> fails. That also drops the requirement to have the <tt class="remarkup-monospaced">find</tt> command available, which might help on Windows. -- <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_501"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a></p>
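
<p>To illustrate the idea of aborting on failure, here is a rough sketch only, and not Quibble's actual code. The function <tt class="remarkup-monospaced">update_submodules</tt>, its <tt class="remarkup-monospaced">git</tt> parameter and the exact <tt class="remarkup-monospaced">git submodule</tt> arguments are assumptions made for the example:</p>

<div class="remarkup-code-block" data-code-lang="python" data-sigil="remarkup-code-block"><pre class="remarkup-code">import os
import subprocess


def update_submodules(repo_dir, git="git"):
    """Hypothetical helper: update submodules and abort on failure."""
    # Nothing to do when the repository declares no submodules.
    if not os.path.exists(os.path.join(repo_dir, ".gitmodules")):
        return False
    # check_call raises CalledProcessError on a non-zero exit status,
    # so a failing "git submodule" stops the run instead of being ignored.
    subprocess.check_call(
        [git, "submodule", "update", "--init", "--recursive"],
        cwd=repo_dir,
    )
    return True</pre></div>

<p>The point of the design is that the failure surfaces as an exception at the place where it happens, rather than as an obscure test failure much later.</p>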

<h2 class="remarkup-header">Miscellaneous tweaks</h2>

<p>The configuration injected by Quibble in LocalSettings.php is now a single file when it previously was made of several small PHP files glued together by shelling out to php. The inline comments have been improved. -- <a href="https://phabricator.wikimedia.org/p/Krinkle/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_502"><span class="phui-tag-core phui-tag-color-person">@Krinkle</span></a></p>

<p>The MediaWiki installer uses a slightly stronger password (<tt class="remarkup-monospaced">testwikijenkinspass</tt>) to accommodate a security hardening in MediaWiki core itself. -- <a href="https://phabricator.wikimedia.org/p/Reedy/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_503"><span class="phui-tag-core phui-tag-color-person">@Reedy</span></a> <a href="https://phabricator.wikimedia.org/T204569" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_491"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T204569</span></span></a></p>

<p>The Gerrit URL to clone the canonical git repository from has been updated to catch up with a change in Gerrit. Updated <tt class="remarkup-monospaced">r/p</tt> to simply <tt class="remarkup-monospaced">/r</tt>. -- <a href="https://phabricator.wikimedia.org/p/Legoktm/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_504"><span class="phui-tag-core phui-tag-color-person">@Legoktm</span></a> <a href="https://phabricator.wikimedia.org/T218844" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_492"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T218844</span></span></a></p>

<p>PHPUnit generates JUnit test results in the log directory, intended to be captured and interpreted by CI. -- <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_505"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a> <a href="https://phabricator.wikimedia.org/T207841" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_493"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T207841</span></span></a></p>

<div class="remarkup-note"><span class="remarkup-note-word">NOTE:</span> those changes have not all been deployed to Wikimedia CI as of March 28th 2019 but should be next week.</div>



<hr class="remarkup-hr" />

<p><strong>footnotes</strong></p>

<p>[ 1 ] Seasons are location based and a cultural agreement, and they are quite interesting in their own right. They are reversed between the Northern and Southern hemispheres and do not exist at the equator, while India defines six seasons. Thus when I refer to a winter hibernation, it really just reflects my own biased point of view.</p>

<p>[ 2 ] Parallelism is fun, I can never manage to write that word without mixing up the number of <tt class="remarkup-monospaced">r</tt> or <tt class="remarkup-monospaced">l</tt> for some reason. As a sideway note, my favorite sport to watch is <a href="https://en.wikipedia.org/wiki/Parallel_bars" class="remarkup-link remarkup-link-ext" rel="noreferrer">parallel bars (enwiki)</a>.</p></div></content></entry><entry><title>CI working group report, with recommendations of new tools to try</title><link href="/phame/live/1/post/153/ci_working_group_report_with_recommendations_of_new_tools_to_try/" /><id>https://phabricator.wikimedia.org/phame/post/view/153/</id><author><name>LarsWirzenius (Lars Wirzenius)</name></author><published>2019-03-25T18:29:49+00:00</published><updated>2019-03-29T18:49:57+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The working group to consider future CI tooling for Wikimedia has finished and produced a report. The report is at <a href="https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG/Report" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG/Report</a> and the short summary is that the release engineering team should do prototype implementations of Argo, GitLab CI/CD, and Zuul v3.</p></div></content></entry><entry><title>Help my CI job fails with exit status -11</title><link href="/phame/live/1/post/152/help_my_ci_job_fails_with_exit_status_-11/" /><id>https://phabricator.wikimedia.org/phame/post/view/152/</id><author><name>hashar (Antoine Musso)</name></author><published>2019-03-21T09:52:59+00:00</published><updated>2022-09-01T13:16:09+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>For a few weeks, a CI job had PHPUnit tests abruptly ending with:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">returned non-zero exit status -11</pre></div>

<p>The connoisseur [ <a href="https://en.wiktionary.org/wiki/connoisseur" class="remarkup-link remarkup-link-ext" rel="noreferrer">1</a> ] would have recognized that the negative exit status indicates the process exited due to a signal. On Linux, <tt class="remarkup-monospaced">11</tt> is the value for the SIGSEGV signal, which is usually sent by the kernel to the process as a result of an invalid memory reference. The default behavior is to terminate the process (<tt class="remarkup-monospaced">man 7 signal</tt>) and to generate a core dump file (I will come to that later).</p>

<p>But why? Some PHP code ended up triggering a code path in HHVM that would eventually try to read outside of its memory range, or some similar low level fault.  The kernel knows that the process completely misbehaved and thus, well, terminates it. Problem solved, you never want your program to misbehave when the kernel is in charge.</p>
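
<p>The negative exit status convention can be reproduced with Python's <tt class="remarkup-monospaced">subprocess</tt> module, which reports a child killed by signal N as return code -N. This is a small sketch assuming a Linux host; the helper <tt class="remarkup-monospaced">describe_status</tt> is a name invented for the example:</p>

<div class="remarkup-code-block" data-code-lang="python" data-sigil="remarkup-code-block"><pre class="remarkup-code">import signal
import subprocess
import sys


def describe_status(returncode):
    """Map a subprocess return code to a signal name when possible."""
    try:
        # subprocess reports "killed by signal N" as -N.
        return signal.Signals(-returncode).name
    except ValueError:
        return "exit code {}".format(returncode)


# The child sends SIGSEGV to itself; the default action terminates it.
child = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
proc = subprocess.run([sys.executable, "-c", child])
print(proc.returncode, describe_status(proc.returncode))</pre></div>

<p>On Linux the return code should be the negated value of <tt class="remarkup-monospaced">signal.SIGSEGV</tt>, which matches the <tt class="remarkup-monospaced">-11</tt> reported by the CI job.</p>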

<p>The job had recently been switched to use a new container in order to benefit from more recent libraries and to match the OS distribution used by the Wikimedia production systems. My immediate recommendation was to roll back to the previous known state, but eventually I let the task linger and got absorbed by other work (such as updating MediaWiki on the infrastructure).</p>

<p>Last week, the job suddenly began to fail constantly. We prevent code from being merged when a test fails, and thus the code stays in a quarantine zone (Gerrit) and cannot be shipped. A whole team (the <a href="/tag/language-team/" class="phui-tag-view phui-tag-type-shade phui-tag-disabled phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_508"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_507" aria-hidden="true"></span>Language-Team</span></a>) could not ship code for one of their flagship projects (<a href="/tag/contenttranslation/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_510"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_509" aria-hidden="true"></span>ContentTranslation</span></a>). That in turn prevented end users from benefiting from new features they are eager for. The issue had to be acted on and became an <strong>unbreak now!</strong> kind of task. And so I set off on my journey.</p>

<p><tt class="remarkup-monospaced">returned non-zero exit status -11</tt>, that is a good enough error message. A process in a Docker container is really just an isolated process and is still managed by the host kernel.  First thing I did was to look at the <tt class="remarkup-monospaced">kernel</tt> syslog facility on our instances, which yields:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">kernel: [7943146.540511] php[14610]:
  segfault at 7f1b16ffad13 ip 00007f1b64787c5e sp 00007f1b53d19d30
     error 4 in libpthread-2.24.so[7f1b64780000+18000]</pre></div>

<p><tt class="remarkup-monospaced">php</tt> there is just HHVM invoked via a <tt class="remarkup-monospaced">php</tt> symbolic link. The message hints at libpthread which is where the fault is. But we need a stacktrace to better determine the problem, and ideally a reproduction case.</p>

<p>Thus, what I am really looking for is the core dump file I alluded to earlier. The file is generated by the kernel and contains an image of the process memory at the time of the failure. Given the full copy of the program instructions, the instructions it was running at that time, and all the memory segments, a debugger can reconstruct a human readable state of the failure. That is a backtrace, and is what we rely on to find faulty code and fix bugs.</p>

<p>No core file was generated; otherwise the error message would state <tt class="remarkup-monospaced">coredumped</tt>, i.e. that the kernel generated the core dump file.  Our default configuration is to not generate any core file, but usually one can adjust it from the shell with <tt class="remarkup-monospaced">ulimit -c XXX</tt> where XXX is the maximum size a core file can occupy (in kilobytes, in order to prevent filling the disk). Docker being just a fancy way to start a process, it has a setting to adjust the limit. The <tt class="remarkup-monospaced">docker run</tt> inline help states:</p>

<p><tt class="remarkup-monospaced">--ulimit ulimit        Ulimit options (default [])</tt></p>

<p>That is about as helpful as it gets. Eventually, the option to set turned out to be <tt class="remarkup-monospaced">--ulimit core=2147483648</tt>, or up to 2 gigabytes.  I updated the CI jobs and instructed them to capture a file named <tt class="remarkup-monospaced">core</tt>, the default file name. After a few runs, although I could confirm failures, no files got captured. Why not?</p>
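
<p>As an aside, the limit a process actually runs with can be inspected from within it. The sketch below uses Python's standard <tt class="remarkup-monospaced">resource</tt> module (Unix only); the helper name <tt class="remarkup-monospaced">allow_core_dumps</tt> is an assumption for the example and is not something the CI jobs use:</p>

<div class="remarkup-code-block" data-code-lang="python" data-sigil="remarkup-code-block"><pre class="remarkup-code">import resource


def allow_core_dumps():
    """Hypothetical helper: raise the soft core limit to the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A process may always raise its soft limit up to its hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


# Prints the (soft, hard) pair; resource.RLIM_INFINITY means unlimited.
print(allow_core_dumps())</pre></div>

<p>A soft limit of zero means no core file is written, whatever the crash.</p>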

<p>Our machines do not use <tt class="remarkup-monospaced">core</tt> as the default filename. It can be found in the kernel configuration:</p>

<p><tt class="remarkup-monospaced">name=/proc/sys/kernel/core_pattern</tt><br />
<tt class="remarkup-monospaced">/var/tmp/core/core.%h.%e.%p.%t</tt></p>
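
<p>The placeholders in that pattern are documented in <tt class="remarkup-monospaced">man 5 core</tt>: <tt class="remarkup-monospaced">%h</tt> is the hostname, <tt class="remarkup-monospaced">%e</tt> the executable name, <tt class="remarkup-monospaced">%p</tt> the process ID and <tt class="remarkup-monospaced">%t</tt> the time of the dump in seconds since the epoch. As a rough illustration of the file name to look for, here is a sketch covering only those four placeholders; <tt class="remarkup-monospaced">expand_core_pattern</tt> is a made-up helper and the sample values are taken from the core file name shown in the debugging session further down:</p>

<div class="remarkup-code-block" data-code-lang="python" data-sigil="remarkup-code-block"><pre class="remarkup-code">import re


def expand_core_pattern(pattern, hostname, executable, pid, timestamp):
    """Hypothetical helper: expand a subset of core_pattern specifiers."""
    values = {
        "h": hostname,        # %h: hostname
        "e": executable,      # %e: executable name
        "p": str(pid),        # %p: PID of the dumped process
        "t": str(timestamp),  # %t: time of dump, seconds since the epoch
        "%": "%",             # %%: a literal percent sign
    }
    return re.sub(r"%([hept%])", lambda m: values[m.group(1)], pattern)


print(expand_core_pattern(
    "/var/tmp/core/core.%h.%e.%p.%t",
    "606eb29eab46", "php", 2353, 1552570410,
))</pre></div>

<p>This prints <tt class="remarkup-monospaced">/var/tmp/core/core.606eb29eab46.php.2353.1552570410</tt>.</p>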

<p>I thus went on the hosts looking for such files. There were none.</p>

<p>Or maybe I mean <tt class="remarkup-monospaced">None</tt> or <tt class="remarkup-monospaced">NaN</tt>.</p>

<p>Nada, rien.</p>

<p>The void.</p>

<p>The next step is obvious: try to reproduce it! I ran a Docker container doing a basic while loop, and from the host I sent the <tt class="remarkup-monospaced">SIGSEGV</tt> signal to the process. The host still had no core file. But surprise, it was <strong>in the container</strong>.   Although the kernel is handling it from the host, it is not namespace-aware when it comes time to resolve the path. My quest was nearly over: I simply mounted a host directory into the containers at the expected place:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">mkdir /tmp/coredumps
docker run --volume /tmp/coredumps:/var/tmp/core ....</pre></div>
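
<p>Put together with the core size limit from earlier, the invocation can be sketched as an argument list. This is an illustration only: the helper <tt class="remarkup-monospaced">docker_run_args</tt> and the image name <tt class="remarkup-monospaced">example-image</tt> are placeholders, and the real CI jobs pass more options than shown here:</p>

<div class="remarkup-code-block" data-code-lang="python" data-sigil="remarkup-code-block"><pre class="remarkup-code">def docker_run_args(image, host_core_dir,
                    core_dir="/var/tmp/core", core_limit=2147483648):
    """Hypothetical helper: build a docker run command that keeps core files."""
    return [
        "docker", "run",
        # Allow core files of up to core_limit bytes.
        "--ulimit", "core={}".format(core_limit),
        # Expose the directory named by core_pattern on the host.
        "--volume", "{}:{}".format(host_core_dir, core_dir),
        image,
    ]


print(" ".join(docker_run_args("example-image", "/tmp/coredumps")))</pre></div>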

<p>After a few builds, I had harvested enough core files. The investigation is then very straightforward:</p>

<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ gdb /usr/bin/hhvm /coredump/core.606eb29eab46.php.2353.1552570410</span>
<span class="go">Core was generated by `php tests/phpunit/phpunit.php --debug-tests --testsuite extensions --exclude-gr&#039;.</span>
<span class="go">Program terminated with signal SIGSEGV, Segmentation fault.</span>
<span class="go">#0  0x00007f557214ac5e in __pthread_create_2_1 (newthread=newthread@entry=0x7f55614b9e18, attr=attr@entry=0x7f5552aa62f8, </span>
<span class="go">    start_routine=start_routine@entry=0x7f556f461c20 &lt;timer_sigev_thread&gt;, arg=&lt;optimized out&gt;) at pthread_create.c:813</span>
<span class="go">813	pthread_create.c: No such file or directory.</span>
<span class="go">[Current thread is 1 (Thread 0x7f55614be3c0 (LWP 2354))]</span>
<span class="go"></span>
<span class="go">(gdb) bt</span>
<span class="go">#0  0x00007f557214ac5e in __pthread_create_2_1 (newthread=newthread@entry=0x7f55614b9e18, attr=attr@entry=0x7f5552aa62f8, </span>
<span class="go">    start_routine=start_routine@entry=0x7f556f461c20 &lt;timer_sigev_thread&gt;, arg=&lt;optimized out&gt;) at pthread_create.c:813</span>
<span class="go">#1  0x00007f556f461bb2 in timer_helper_thread (arg=&lt;optimized out&gt;) at ../sysdeps/unix/sysv/linux/timer_routines.c:120</span>
<span class="go">#2  0x00007f557214a494 in start_thread (arg=0x7f55614be3c0) at pthread_create.c:456</span>
<span class="go">#3  0x00007f556aeebacf in __libc_ifunc_impl_list (name=&lt;optimized out&gt;, array=0x7f55614be3c0, max=&lt;optimized out&gt;)</span>
<span class="go">    at ../sysdeps/x86_64/multiarch/ifunc-impl-list.c:387</span>
<span class="go">#4  0x0000000000000000 in ?? ()</span></pre></div>

<p><a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_511"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a> kindly pointed out that this is an issue solved in libc6. Once the container had been rebuilt to apply the package update, the fault disappeared.</p>

<p>One can now expect new changes to appear to <a href="/tag/contenttranslation/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_513"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_512" aria-hidden="true"></span>ContentTranslation</span></a>.</p>

<hr class="remarkup-hr" />

<p>[ 1 ] <em>connoisseur</em>, from obsolete French, means &quot;to know&quot; <a href="https://en.wiktionary.org/wiki/connoisseur" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://en.wiktionary.org/wiki/connoisseur</a>. I guess the English language forgot to apply the update in due time and cannot make any such change now for fear of breaking backward compatibility or locution habits.</p>

<p>The task has all the technical details and log leading to solving the issue: <a href="/T216689" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_506"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T216689: Merge blocker: quibble-vendor-mysql-hhvm-docker in gate fails for most merges (exit status -11)</span></span></a></p>

<p>(Some light copyedits to above -- Brennen Bearnes)</p></div></content></entry><entry><title>Production Excellence #10: April 2019</title><link href="/phame/live/1/post/151/production_excellence_10_april_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/151/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-05-31T19:21:08+00:00</published><updated>2020-04-03T16:27:51+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Month in numbers.</li>
<li class="remarkup-list-item">Highlighted stories.</li>
<li class="remarkup-list-item">Current problems.</li>
</ul>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">8 documented incidents. [1]</li>
<li class="remarkup-list-item">30 new Wikimedia-prod-error tasks created. [2]</li>
<li class="remarkup-list-item">31 Wikimedia-prod-error tasks closed. [3]</li>
</ul>

<p>The number of incidents in April was relatively high at 8, both compared to earlier this year (4 in January, 7 in February, 8 in March) and compared to last year (4 in April 2018).</p>

<p>To read more about these incidents, their investigations, and conclusions, check <a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201904&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Incident_documentation#2019</a>.</p>

<p>As of writing, there are 186 open Wikimedia-prod-error issues (up from 177 last month). [4]</p>

<h5 class="remarkup-header">📖  Rehabilitation of MediaWiki-DateFormatter</h5>

<p>Following the report of a PHP error that happened when saving edits to certain pages, Tim Starling investigated. The <a href="https://phabricator.wikimedia.org/T220563#5099856" class="remarkup-link" rel="noreferrer">investigation</a> motivated a big commit that brings this class into the modern era. I think this change serves as a good overview of what’s changed in MediaWiki over the last 10 years, and demonstrates our current best practices.</p>

<p>Take a look at <a href="https://gerrit.wikimedia.org/r/502678" class="remarkup-link remarkup-link-ext" rel="noreferrer">Gerrit change 502678</a> / <a href="https://phabricator.wikimedia.org/T220563" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_514"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T220563</span></span></a>.</p>

<h5 class="remarkup-header">📉 Current problems</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the week in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error</a></p>

<p>Or help someone that’s already started with their patch:<br />
→  <a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Open prod-error tasks with a </a><strong><a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Patch-For-Review</a></strong></p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">November: 2 issues left (unchanged).</li>
<li class="remarkup-list-item">December: 4 issues left (unchanged).</li>
<li class="remarkup-list-item">January: 1 issue got fixed. One last issue remaining (down from 2).</li>
<li class="remarkup-list-item">February: 2 issues were fixed. Another 3 issues remaining (down from 5).</li>
<li class="remarkup-list-item">March: 5 issues were fixed. Another 5 issues remaining (down from 10).</li>
<li class="remarkup-list-item">April: 14 new issues were found last month that remain unresolved.</li>
</ul>

<p>By steward and software component, issues left from March and April:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Anti-Harassment / User blocking: <a href="https://phabricator.wikimedia.org/T222170" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_515"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T222170</span></span></a></li>
<li class="remarkup-list-item">CPT / Revision-backend (Save redirect pages): <a href="https://phabricator.wikimedia.org/T220353" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_516"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T220353</span></span></a></li>
<li class="remarkup-list-item">CPT / Revision-backend (Import a page): <a href="https://phabricator.wikimedia.org/T219702" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_517"><span class="phui-tag-core phui-tag-color-object">T219702</span></a></li>
<li class="remarkup-list-item">CPT / Revision-backend (Export pages for dumps): <a href="https://phabricator.wikimedia.org/T220160" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_518"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T220160</span></span></a></li>
<li class="remarkup-list-item">Growth / Watchlist: <a href="https://phabricator.wikimedia.org/T220245" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_519"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T220245</span></span></a></li>
<li class="remarkup-list-item">Growth / Page deletion (Restore an archived page): <a href="https://phabricator.wikimedia.org/T219816" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_520"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T219816</span></span></a></li>
<li class="remarkup-list-item">Growth / Page deletion (File pages): <a href="https://phabricator.wikimedia.org/T222691" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_521"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T222691</span></span></a></li>
<li class="remarkup-list-item">Growth / Echo (Job execution): <a href="https://phabricator.wikimedia.org/T217079" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_522"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T217079</span></span></a></li>
<li class="remarkup-list-item">Multimedia / File management (Upload mime error): <a href="https://phabricator.wikimedia.org/T223728" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_523"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T223728</span></span></a></li>
<li class="remarkup-list-item">Performance / Deferred-Updates: <a href="https://phabricator.wikimedia.org/T221577" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_524"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T221577</span></span></a></li>
<li class="remarkup-list-item">Search Platform / CirrusSearch (Job execution): <a href="https://phabricator.wikimedia.org/T222921" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_525"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T222921</span></span></a></li>
<li class="remarkup-list-item">(Unstewarded) / Page renaming: <a href="https://phabricator.wikimedia.org/T223175" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_526"><span class="phui-tag-core phui-tag-color-object">T223175</span></a>, <a href="https://phabricator.wikimedia.org/T221763" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_527"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T221763</span></span></a>, <a href="https://phabricator.wikimedia.org/T221595" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_528"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T221595</span></span></a></li>
</ul>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thank you to everyone who has helped by reporting, investigating, or resolving problems in Wikimedia production. Including: <a href="https://phabricator.wikimedia.org/p/aaron/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_529"><span class="phui-tag-core phui-tag-color-person">@aaron</span></a>, <a href="https://phabricator.wikimedia.org/p/ArielGlenn/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_530"><span class="phui-tag-core phui-tag-color-person">@ArielGlenn</span></a>, <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_531"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a>, <a href="https://phabricator.wikimedia.org/p/dcausse/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_532"><span class="phui-tag-core phui-tag-color-person">@dcausse</span></a>, <a href="https://phabricator.wikimedia.org/p/EBernhardson/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_533"><span class="phui-tag-core phui-tag-color-person">@EBernhardson</span></a>, <a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_534"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a>, <a href="https://phabricator.wikimedia.org/p/Joe/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_535"><span class="phui-tag-core phui-tag-color-person">@Joe</span></a>, <a href="https://phabricator.wikimedia.org/p/KartikMistry/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_536"><span class="phui-tag-core phui-tag-color-person">@KartikMistry</span></a>, <a href="https://phabricator.wikimedia.org/p/Ladsgroup/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_537"><span 
class="phui-tag-core phui-tag-color-person">@Ladsgroup</span></a>, <a href="https://phabricator.wikimedia.org/p/Lucas_Werkmeister_WMDE/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_538"><span class="phui-tag-core phui-tag-color-person">@Lucas_Werkmeister_WMDE</span></a>, <a href="https://phabricator.wikimedia.org/p/MaxSem/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_539"><span class="phui-tag-core phui-tag-color-person">@MaxSem</span></a>, <a href="https://phabricator.wikimedia.org/p/MusikAnimal/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_540"><span class="phui-tag-core phui-tag-color-person">@MusikAnimal</span></a>, <a href="https://phabricator.wikimedia.org/p/Mvolz/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_541"><span class="phui-tag-core phui-tag-color-person">@Mvolz</span></a>, <a href="https://phabricator.wikimedia.org/p/Niharika/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_542"><span class="phui-tag-core phui-tag-color-person">@Niharika</span></a>, <a href="https://phabricator.wikimedia.org/p/Nikerabbit/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_543"><span class="phui-tag-core phui-tag-color-person">@Nikerabbit</span></a>, <a href="https://phabricator.wikimedia.org/p/Pchelolo/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_544"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Pchelolo</span></a>, <a href="https://phabricator.wikimedia.org/p/pmiazga/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_545"><span class="phui-tag-core phui-tag-color-person">@pmiazga</span></a>, <a href="https://phabricator.wikimedia.org/p/Reedy/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_546"><span 
class="phui-tag-core phui-tag-color-person">@Reedy</span></a>, <a href="https://phabricator.wikimedia.org/p/SBisson/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_547"><span class="phui-tag-core phui-tag-color-person">@SBisson</span></a>, <a href="https://phabricator.wikimedia.org/p/tstarling/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_548"><span class="phui-tag-core phui-tag-color-person">@tstarling</span></a>, and <a href="https://phabricator.wikimedia.org/p/Umherirrender/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_549"><span class="phui-tag-core phui-tag-color-person">@Umherirrender</span></a>.</p>

<p>Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">🏴‍☠️ “<em>One good deed is not enough to save a man.</em>” “<em>Though it seems enough to condemn him?</em>” “<em>Indeed…</em>”</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] Incidents reports by month and year. –<br />
<a href="https://codepen.io/Krinkle/full/wbYMZK" class="remarkup-link remarkup-link-ext" rel="noreferrer">codepen.io/Krinkle/…</a></p>

<p>[2] Tasks created. –<br />
<a href="https://phabricator.wikimedia.org/maniphest/query/pJQdvhVYtHTi/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[3] Tasks closed. –<br />
<a href="https://phabricator.wikimedia.org/maniphest/query/uh7zL9HEnAmw/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[4] Open tasks. –<br />
<a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Production Excellence #9: March 2019</title><link href="/phame/live/1/post/150/production_excellence_9_march_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/150/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-04-21T18:51:31+00:00</published><updated>2020-04-03T16:26:51+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">8 documented incidents. [1]</li>
<li class="remarkup-list-item">31 new Wikimedia-prod-error issues reported. [2]</li>
<li class="remarkup-list-item">28 Wikimedia-prod-error issues closed. [3]</li>
</ul>

<p>The number of incidents this month was slightly above average compared to earlier this year (7 in February, 4 in January), and this time last year (4 in March 2018, 7 in February 2018).</p>

<p>To read more about these incidents, their investigations, and conclusions, check <a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201903&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Incident_documentation#2019-03</a>.</p>

<p>There are currently 177 open Wikimedia-prod-error issues, similar to last month. [4]</p>

<blockquote><p><em> 💡 <strong>Ideas</strong>:</em> To suggest an investigation to highlight in a future edition, feel free to contact me by e-mail, or private message on IRC.</p></blockquote>



<h5 class="remarkup-header">📉  Current problems</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the week in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Or help someone who’s already started with their patch:<br />
→  <a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Open prod-error tasks with a </a><strong><a href="https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R" class="remarkup-link" rel="noreferrer">Patch-For-Review</a></strong></p>

<p>Breakdown of recent months (past two weeks not included):</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">September: <span class="visual-only phui-icon-view phui-font-fa fa-check green" data-meta="0_562" aria-hidden="true"></span> Done! <em>The last two issues were resolved.</em></li>
<li class="remarkup-list-item">October: <span class="visual-only phui-icon-view phui-font-fa fa-check green" data-meta="0_563" aria-hidden="true"></span> Done! <em>The last issue was resolved.</em></li>
<li class="remarkup-list-item">November: 2 issues left (from 1.33-wmf.2). <em>1 issue was fixed.</em></li>
<li class="remarkup-list-item">December: 4 issues left (from 1.33-wmf.9). <em>1 issue was fixed.</em></li>
<li class="remarkup-list-item">January: 2 issues left (1.33-wmf.13 – 14). <em>1 issue was fixed.</em></li>
<li class="remarkup-list-item">February: 5 issues (1.33-wmf.16 – 19).</li>
<li class="remarkup-list-item">March: 10 new issues (1.33-wmf.20 – 23).</li>
</ul>

<p>By steward and software component, for issues remaining from February and March:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">CPT / JobQueue: <a href="https://phabricator.wikimedia.org/T218692" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_550"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T218692</span></span></a></li>
<li class="remarkup-list-item">CPT / MCR: <a href="https://phabricator.wikimedia.org/T217329" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_551"><span class="phui-tag-core phui-tag-color-object">T217329</span></a></li>
<li class="remarkup-list-item">CPT / Parser: <a href="https://phabricator.wikimedia.org/T216664" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_552"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T216664</span></span></a></li>
<li class="remarkup-list-item">CPT / Revision-backend: <a href="https://phabricator.wikimedia.org/T220353" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_553"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T220353</span></span></a>, <a href="https://phabricator.wikimedia.org/T220257" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_554"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T220257</span></span></a></li>
<li class="remarkup-list-item">Growth / Page deletion: <a href="https://phabricator.wikimedia.org/T219816" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_555"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T219816</span></span></a></li>
<li class="remarkup-list-item">Growth / Watchlist: <a href="https://phabricator.wikimedia.org/T220245" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_556"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T220245</span></span></a></li>
<li class="remarkup-list-item">Language / Translate: <a href="https://phabricator.wikimedia.org/T217380" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_557"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T217380</span></span></a>, <a href="https://phabricator.wikimedia.org/T219736" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_558"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T219736</span></span></a></li>
<li class="remarkup-list-item">Multimedia / WikibaseMediaInfo: <a href="https://phabricator.wikimedia.org/T217285" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_559"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T217285</span></span></a></li>
<li class="remarkup-list-item">Operations / PHP-7.2: <a href="https://phabricator.wikimedia.org/T221347" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_560"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T221347</span></span></a></li>
<li class="remarkup-list-item">Search Platform / Elastica: <a href="https://phabricator.wikimedia.org/T219234" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_561"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T219234</span></span></a></li>
</ul>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thanks to <a href="https://phabricator.wikimedia.org/p/aaron/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_564"><span class="phui-tag-core phui-tag-color-person">@aaron</span></a>, <a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_565"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a>, <a href="https://phabricator.wikimedia.org/p/Arlolra/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_566"><span class="phui-tag-core phui-tag-color-person">@Arlolra</span></a>, <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_567"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a>, <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_568"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>, <a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_569"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a>, <a href="https://phabricator.wikimedia.org/p/kostajh/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_570"><span class="phui-tag-core phui-tag-color-person">@kostajh</span></a>, <a href="https://phabricator.wikimedia.org/p/matmarex/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_571"><span class="phui-tag-core phui-tag-color-person">@matmarex</span></a>, <a href="https://phabricator.wikimedia.org/p/MaxSem/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_572"><span class="phui-tag-core phui-tag-color-person">@MaxSem</span></a>, <a href="https://phabricator.wikimedia.org/p/Niedzielski/" class="phui-tag-view 
phui-tag-type-person " data-sigil="hovercard" data-meta="0_573"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Niedzielski</span></a>, <a href="https://phabricator.wikimedia.org/p/Nikerabbit/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_574"><span class="phui-tag-core phui-tag-color-person">@Nikerabbit</span></a>, <a href="https://phabricator.wikimedia.org/p/Petar.petkovic/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_575"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Petar.petkovic</span></a>, <a href="https://phabricator.wikimedia.org/p/santhosh/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_576"><span class="phui-tag-core phui-tag-color-person">@santhosh</span></a>, <a href="https://phabricator.wikimedia.org/p/ssastry/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_577"><span class="phui-tag-core phui-tag-color-person">@ssastry</span></a>, <a href="https://phabricator.wikimedia.org/p/Umherirrender/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_578"><span class="phui-tag-core phui-tag-color-person">@Umherirrender</span></a>, <a href="https://phabricator.wikimedia.org/p/WMDE-leszek/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_579"><span class="phui-tag-core phui-tag-color-person">@WMDE-leszek</span></a>, <a href="https://phabricator.wikimedia.org/p/zeljkofilipin/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_580"><span class="phui-tag-core phui-tag-color-person">@zeljkofilipin</span></a>, and everyone else who helped last month by reporting, investigating, or patching errors found in production!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">🦅 “<em>This isn’t flying. This is falling… with style!</em>”</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201903&amp;namespace=0&amp;hideredirects=1&amp;stripprefix=1" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:PrefixIndex/Incident_documentation/201903 …</a></p>

<p>[2] Tasks created. – <a href="https://phabricator.wikimedia.org/maniphest/query/As7RRbh3r2Bq/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query …</a></p>

<p>[3] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/9N6mnWSsJTsu/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query …</a></p>

<p>[4] Open tasks. – <a href="https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query …</a></p></div></content></entry><entry><title>Work progresses on CI tool evaluation</title><link href="/phame/live/1/post/149/work_progresses_on_ci_tool_evaluation/" /><id>https://phabricator.wikimedia.org/phame/post/view/149/</id><author><name>LarsWirzenius (Lars Wirzenius)</name></author><published>2019-03-08T16:59:04+00:00</published><updated>2019-03-14T15:13:28+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The working group to consider future tooling for continuous integration is making progress (see previous blog post <a href="https://phabricator.wikimedia.org/J148" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_581"><span class="phui-tag-core phui-tag-color-object">J148</span></a> for more information). We&#039;re looking at and evaluating alternatives and learning of new needs within WMF.</p>

<p>If you have CI needs that are not covered by building from git in a Linux container, we would like to hear from you. For example, building iOS applications is difficult without a Mac/OS X build worker, so we&#039;re looking into what we can do to provide that. What else is needed?</p>

<p>We&#039;re currently aiming to make CI much more &quot;self-serve&quot; so that as much as possible can be done by developers themselves, without having to go via or through the Release Engineering team.</p>

<p>Our list of candidates includes systems that are not open source or are &quot;open core&quot; (open source, but with optional proprietary parts). We will be self-hosting, and open source is going to be a hard requirement. &quot;Open core&quot; may be an acceptable compromise for a system that is otherwise very good. We want to look at all alternatives, however, so that we know what&#039;s out there and what&#039;s possible.</p>

<p>We track our work in Phabricator, ticket <a href="https://phabricator.wikimedia.org/T217325" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_582"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T217325</span></span></a>.</p></div></content></entry><entry><title>Choosing tools for continuous integration</title><link href="/phame/live/1/post/148/choosing_tools_for_continuous_integration/" /><id>https://phabricator.wikimedia.org/phame/post/view/148/</id><author><name>LarsWirzenius (Lars Wirzenius)</name></author><published>2019-02-28T18:27:09+00:00</published><updated>2019-03-07T00:36:29+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The Release Engineering team has started a working group to discuss and consider our future continuous integration tooling. Please help!</p>

<p>The RelEng team is working with SRE to build a continuous delivery and deployment pipeline, as well as changing production to run things in containers under Kubernetes. We aim to improve the process of making changes to software behind our various sites by making it take less effort, happen faster, be less risky, and be as automated as possible. The developers will have a better development experience, be more empowered, and be more productive.</p>

<p>Wikimedia has had a CI system for many years now, but it is based on versions of tools that are reaching the end of their useful life. Those tools need to be upgraded, and this will probably require further changes due to how the new versions function. This is a good point to consider what tools and functionality we need and want.</p>

<p>The working group is tasked with considering the needs and wants, evaluating the available options, and recommending what to use in the future. The deadline is March 25. The work is being documented at <a href="https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG</a> and we&#039;re currently collecting requirements and candidates to evaluate.</p>

<p>We would welcome any feedback on those, via IRC (#wikimedia-pipeline), on the talk page of the working group&#039;s wiki page above, or as a comment on this blog post.</p></div></content></entry><entry><title>Production Excellence #8: February 2019</title><link href="/phame/live/1/post/141/production_excellence_8_february_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/141/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-03-21T19:11:32+00:00</published><updated>2020-04-03T16:24:44+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our pursuit of operational excellence? Read on to find out!</p>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">7 documented incidents. [1]</li>
<li class="remarkup-list-item">30 new Wikimedia-prod-error tasks created. [2] (17 new in Jan, and 18 in Dec.)</li>
<li class="remarkup-list-item">27 Wikimedia-prod-error tasks closed. [3] (16 closed in Jan, and 20 in Dec.)</li>
</ul>

<p>In total, there are 177 open Wikimedia-prod-error tasks today. (188 in Feb, 172 in Jan, and 165 in Dec.)</p>

<h5 class="remarkup-header">📉  Current problems</h5>

<p>There’s been an increase in how many application errors are reported each week. And, we’ve also managed to mostly keep up with those each week, so that’s great!</p>

<p>But, it does appear that most weeks we accumulated one or two unresolved errors, which is starting to add up. I believe this is mainly because they were reported a day after the branch went out. That is, if the same issues had been reported 24 hours earlier in a given week, then they might’ve blocked the train as a regression.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a></p>

<p>Below is a breakdown of unresolved prod errors since last quarter. (I’ve omitted the last three weeks.)</p>

<p>By month:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">February: 5 reports (1.33-wmf.16, 1.33-wmf.17, 1.33-wmf.18).</li>
<li class="remarkup-list-item">January: 3 reports (1.33-wmf.13, 1.33-wmf.14).</li>
<li class="remarkup-list-item">December 2018: 5 reports (1.33-wmf.9).</li>
<li class="remarkup-list-item">November 2018: 3 reports (1.33-wmf.2).</li>
<li class="remarkup-list-item">October 2018: 1 report (1.32-wmf.26).</li>
<li class="remarkup-list-item">September 2018: 2 reports (1.32-wmf.20).</li>
</ul>

<p>By steward and software component:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Core Platform:<ul class="remarkup-list">
<li class="remarkup-list-item">Parser: <a href="https://phabricator.wikimedia.org/T216664" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_583"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T216664</span></span></a>.</li>
<li class="remarkup-list-item">Revision backend: <a href="https://phabricator.wikimedia.org/T214035" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_584"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T214035</span></span></a>, <a href="https://phabricator.wikimedia.org/T212428" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_585"><span class="phui-tag-core phui-tag-color-object">T212428</span></a>.</li>
</ul></li>
<li class="remarkup-list-item">Fundraising-Tech:<ul class="remarkup-list">
<li class="remarkup-list-item">CentralNotice: <a href="https://phabricator.wikimedia.org/T209741" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_586"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T209741</span></span></a>.</li>
</ul></li>
<li class="remarkup-list-item">Growth:<ul class="remarkup-list">
<li class="remarkup-list-item">Echo: <a href="https://phabricator.wikimedia.org/T217079" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_587"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T217079</span></span></a>.</li>
<li class="remarkup-list-item">Flow: <a href="https://phabricator.wikimedia.org/T212742" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_588"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T212742</span></span></a>, <a href="https://phabricator.wikimedia.org/T204793" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_589"><span class="phui-tag-core phui-tag-color-object">T204793</span></a>.</li>
<li class="remarkup-list-item">Page deletion: <a href="https://phabricator.wikimedia.org/T203913" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_590"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T203913</span></span></a>.</li>
</ul></li>
<li class="remarkup-list-item">Multimedia:<ul class="remarkup-list">
<li class="remarkup-list-item">MediaWiki uploading: <a href="https://phabricator.wikimedia.org/T208539" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_591"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T208539</span></span></a>.</li>
</ul></li>
<li class="remarkup-list-item">Performance:<ul class="remarkup-list">
<li class="remarkup-list-item">Lib-rdbms: <a href="https://phabricator.wikimedia.org/T212284" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_592"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T212284</span></span></a>.</li>
</ul></li>
<li class="remarkup-list-item">Wikidata:<ul class="remarkup-list">
<li class="remarkup-list-item">Wikibase: <a href="https://phabricator.wikimedia.org/T217329" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_593"><span class="phui-tag-core phui-tag-color-object">T217329</span></a>, <a href="https://phabricator.wikimedia.org/T215380" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_594"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T215380</span></span></a>, <a href="https://phabricator.wikimedia.org/T213483" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_595"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T213483</span></span></a>.</li>
<li class="remarkup-list-item">WikibaseLexeme: <a href="https://phabricator.wikimedia.org/T207479" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_596"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T207479</span></span></a>, <a href="https://phabricator.wikimedia.org/T200906" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_597"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T200906</span></span></a>.</li>
<li class="remarkup-list-item">WikibaseQualityConstraints: <a href="https://phabricator.wikimedia.org/T212282" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_598"><span class="phui-tag-core phui-tag-color-object">T212282</span></a>.</li>
</ul></li>
<li class="remarkup-list-item">(Nobody - pending code ownership process):<ul class="remarkup-list">
<li class="remarkup-list-item">ImageMap extension: <a href="https://phabricator.wikimedia.org/T217087" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_599"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T217087</span></span></a>.</li>
<li class="remarkup-list-item">Nuke extension: <a href="https://phabricator.wikimedia.org/T212690" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_600"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T212690</span></span></a>.</li>
</ul></li>
</ul>

<h5 class="remarkup-header">📖  Fixed exposed fatal error on Special:Contributions</h5>

<p>Previously, a link to Special:Contributions could pass invalid options to a part of MediaWiki that doesn’t allow invalid options. Why would anything allow invalid options? Let’s find out.</p>

<p>Think about software as an onion. Software tends to have an outer layer where everything is allowed. If this layer finds illegal user input, it has to respond somehow. For example, by informing the user. In this outer layer, illegal input is not a problem in the software. It is a normal thing to see as we interact with the user. This outer layer responds directly to a user, is translated, and can do things like “view recent changes”, “view user contributions” or “rename a page”.</p>

<p>Internally, such an action is divided into many smaller tasks (or functions). For example, a function might be “get talk namespace for given subject namespace”. This would answer “Talk:” for “(Article)”, and “Wikipedia_talk:” for “Wikipedia:”. When searching for edits on My Contributions with “Associated namespaces” ticked, this function is used. It is also used by Move Page when renaming a page together with its talk page. And it’s used on Recent Changes and View History, for all those little “talk” links next to each page title and username.</p>

<p>If one of your edits is for a page that has no discussion namespace, what should MediaWiki do? Show no edits? Skip that edit and tell the user “1 edit was hidden”? Show normally, but without a talk link? That decision is made by the outer layer for a feature, when it catches the internal exception. Alternatively, it can sometimes avoid an exception by asking a different question first – a question that cannot fail. Such as “Does namespace X have a talk space?”, instead of “What is the talk space for X?”.</p>

<p>When a program doesn’t catch or avoid an exception, a fatal error occurs. Thanks to <span class="phabricator-remarkup-mention-unknown">@D3r1ck01</span> for fixing this fatal error. – <a href="https://phabricator.wikimedia.org/T150324" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_601"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T150324</span></span></a></p>
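
<p>The two options above (catch the exception, or first ask a question that cannot fail) can be sketched in a few lines of Python. This is only an illustration: the namespace table and every function name below are hypothetical and are not MediaWiki's actual code or API.</p>

```python
# Hypothetical namespace table for illustration; not MediaWiki's real data or API.
TALK_NAMESPACES = {"(Article)": "Talk:", "Wikipedia:": "Wikipedia_talk:"}


class NoTalkNamespaceError(Exception):
    """Raised by the inner layer when a namespace has no talk namespace."""


def get_talk_namespace(subject):
    # Inner layer: strict, raises on input it cannot answer.
    if subject not in TALK_NAMESPACES:
        raise NoTalkNamespaceError(subject)
    return TALK_NAMESPACES[subject]


def has_talk_namespace(subject):
    # The question that cannot fail: always returns True or False.
    return subject in TALK_NAMESPACES


def talk_link_by_catching(subject):
    # Outer layer, option 1: catch the exception and decide what to show.
    try:
        return get_talk_namespace(subject)
    except NoTalkNamespaceError:
        return None  # show the edit normally, but without a talk link


def talk_link_by_asking_first(subject):
    # Outer layer, option 2: ask first, so the exception is never raised.
    if has_talk_namespace(subject):
        return get_talk_namespace(subject)
    return None


for ns in ["(Article)", "Wikipedia:", "Special:"]:
    print(ns, talk_link_by_catching(ns), talk_link_by_asking_first(ns))
```

<p>In this sketch both outer-layer functions give the same result for a namespace without a talk space. If neither were used and the inner function were called directly, the uncaught exception would end the program, which is the kind of fatal error described above.</p>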

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>ProTip:</strong> If your Jenkins build is failing and you suspect it’s unrelated to the project itself, be sure to <a href="https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?project=shared-build-failure" class="remarkup-link" rel="noreferrer">report it to Phabricator under “Shared Build Failure”</a>.</div>
<div class="remarkup-reply-body"></div>
</blockquote>



<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thank you to everyone who has helped by reporting, investigating, or resolving problems in Wikimedia production, including: <a href="https://phabricator.wikimedia.org/p/aaron/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_602"><span class="phui-tag-core phui-tag-color-person">@aaron</span></a>, <a href="https://phabricator.wikimedia.org/p/Addshore/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_603"><span class="phui-tag-core phui-tag-color-person">@Addshore</span></a>, <a href="https://phabricator.wikimedia.org/p/alaa_wmde/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_604"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@alaa_wmde</span></a>, <a href="https://phabricator.wikimedia.org/p/Amorymeltzer/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_605"><span class="phui-tag-core phui-tag-color-person">@Amorymeltzer</span></a>, <a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_606"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a>, <span class="phabricator-remarkup-mention-unknown">@D3r1ck01</span>, <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_607"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a>, <a href="https://phabricator.wikimedia.org/p/daniel/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_608"><span class="phui-tag-core phui-tag-color-person">@daniel</span></a>, <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_609"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>, <a href="https://phabricator.wikimedia.org/p/hoo/" 
class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_610"><span class="phui-tag-core phui-tag-color-person">@hoo</span></a>, <a href="https://phabricator.wikimedia.org/p/jcrespo/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_611"><span class="phui-tag-core phui-tag-color-person">@jcrespo</span></a>, <a href="https://phabricator.wikimedia.org/p/KaMan/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_612"><span class="phui-tag-core phui-tag-color-person">@KaMan</span></a>, <a href="https://phabricator.wikimedia.org/p/Mainframe98/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_613"><span class="phui-tag-core phui-tag-color-person">@Mainframe98</span></a>, <a href="https://phabricator.wikimedia.org/p/Marostegui/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_614"><span class="phui-tag-core phui-tag-color-person">@Marostegui</span></a>, <a href="https://phabricator.wikimedia.org/p/matej_suchanek/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_615"><span class="phui-tag-core phui-tag-color-person">@matej_suchanek</span></a>, <a href="https://phabricator.wikimedia.org/p/Ottomata/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_616"><span class="phui-tag-core phui-tag-color-person">@Ottomata</span></a>, <a href="https://phabricator.wikimedia.org/p/Pchelolo/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_617"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Pchelolo</span></a>, <a href="https://phabricator.wikimedia.org/p/Reedy/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_618"><span class="phui-tag-core phui-tag-color-person">@Reedy</span></a>, <a href="https://phabricator.wikimedia.org/p/revi/" class="phui-tag-view 
phui-tag-type-person " data-sigil="hovercard" data-meta="0_619"><span class="phui-tag-core phui-tag-color-person">@revi</span></a>, <a href="https://phabricator.wikimedia.org/p/Smalyshev/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_620"><span class="phui-tag-core phui-tag-color-person">@Smalyshev</span></a>, <a href="https://phabricator.wikimedia.org/p/Tarrow/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_621"><span class="phui-tag-core phui-tag-color-person">@Tarrow</span></a>, <a href="https://phabricator.wikimedia.org/p/Tgr/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_622"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-orange"></span>@Tgr</span></a>, <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_623"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a>, <a href="https://phabricator.wikimedia.org/p/Umherirrender/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_624"><span class="phui-tag-core phui-tag-color-person">@Umherirrender</span></a>, and <a href="https://phabricator.wikimedia.org/p/Volker_E/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_625"><span class="phui-tag-core phui-tag-color-person">@Volker_E</span></a>.</p>

<p>Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] Incidents. — <a href="https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20190200&amp;to=Incident+documentation%2F20190300&amp;namespace=0" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:AllPages…</a></p>

<p>[2] Tasks created. — <a href="https://phabricator.wikimedia.org/maniphest/query/a0yuo6bqDOrh/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[3] Tasks closed. — <a href="https://phabricator.wikimedia.org/maniphest/query/7pmQcTvTWw_4/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<hr class="remarkup-hr" />

<blockquote><p>🍏 He got me invested in some kind of.. fruit company.</p></blockquote>

</div></content></entry><entry><title>Production Excellence #7: January 2019</title><link href="/phame/live/1/post/140/production_excellence_7_january_2019/" /><id>https://phabricator.wikimedia.org/phame/post/view/140/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-02-13T03:53:12+00:00</published><updated>2020-04-03T16:23:33+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our pursuit of operational excellence last month? Read on to find out!</p>

<h5 class="remarkup-header">📊 Month in numbers</h5>

<ul class="remarkup-list">
<li class="remarkup-list-item">4 documented incidents in January 2019. [1]</li>
<li class="remarkup-list-item">16 Wikimedia-prod-error tasks closed. [2]</li>
<li class="remarkup-list-item">17 Wikimedia-prod-error tasks created. [3]</li>
</ul>

<hr class="remarkup-hr" />

<h5 class="remarkup-header">📖  Unable to move certain file pages</h5>

<p>Xiplus reported that renaming a File page on zh.wikipedia.org led to a fatal database exception. Andre Klapper identified the stack trace from the logs, and Brad (<a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_634"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a>) investigated.</p>

<p>The File renaming failed because the File page did not have a media file associated with it (such a move is not currently allowed in MediaWiki). But while handling this error, the code caused a different error. The impact was that the user wasn&#039;t informed about why the move failed. Instead, they received a generic error page about a fatal database exception.</p>

<p><a href="https://phabricator.wikimedia.org/p/Tgr/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_635"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-orange"></span>@Tgr</span></a> fixed the code a few hours later, and it was deployed by Roan later that same day.<br />
Thanks! —  <a href="https://phabricator.wikimedia.org/T213168" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_626"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T213168</span></span></a></p>

<h5 class="remarkup-header">📖  DBPerformance regression detected and fixed</h5>

<p>During a routine audit of Logstash dashboards, I found a DBPerformance warning. The warning indicated that the limit of 0 for “master connections” was violated. That&#039;s a cryptic way of saying it found code in MediaWiki that uses a database master connection on a regular page view.</p>

<p>MediaWiki can have many replica database servers, but there can be only one master database at any given moment. To reduce the chances of overload, delayed edits, or network congestion, we make sure to use replicas whenever possible. We usually involve the master only when source data is being changed, or is about to be changed, for example when editing a page or saving changes.</p>

<p>As the vast majority of traffic is page views, we have lower thresholds for latency and dependency on page views. In particular, page views may (in the future) be routed to secondary data centres that don’t even have a master DB.</p>

<p><a href="https://phabricator.wikimedia.org/p/Tchanders/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_636"><span class="phui-tag-core phui-tag-color-person">@Tchanders</span></a> from the Anti-Harassment team investigated the issue, found the culprit, and fixed it in time for the next MediaWiki train. Thanks! — <a href="https://phabricator.wikimedia.org/T214735" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_627"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T214735</span></span></a></p>

<h5 class="remarkup-header">📖  TemplateData missing in action</h5>

<p><a href="https://phabricator.wikimedia.org/p/Tacsipacsi/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_637"><span class="phui-tag-core phui-tag-color-person">@Tacsipacsi</span></a> and <a href="https://phabricator.wikimedia.org/p/Evad37/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_638"><span class="phui-tag-core phui-tag-color-person">@Evad37</span></a> both independently reported the same TemplateData issue. TemplateData powers the template insertion dialog in VisualEditor. It wasn&#039;t working for some templates after we deployed the 1.33-wmf.13 branch.</p>

<p>The error was “Argument 1 passed to ApiResult::setIndexedTagName() must be an instance of array, null given”. This means there was code that called a function with the wrong argument. For example, the variable name may&#039;ve been misspelled, or it may&#039;ve been the wrong variable, or (in this case) the variable didn&#039;t exist. In such a case, PHP implicitly assumes “null”.</p>

<p>Bartosz (<a href="https://phabricator.wikimedia.org/p/matmarex/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_639"><span class="phui-tag-core phui-tag-color-person">@matmarex</span></a>) found the culprit. The week before, I made a change to TemplateData that changed the “template parameter order” feature to be optional. This allows users to decide whether VisualEditor should force an order for the parameters in the wikitext. It turned out I forgot to update one of the references to this variable, which still assumed it was always present.</p>

<p>Brad (Anomie) fixed it later that week, and it was deployed the next day. Thanks! — <a href="https://phabricator.wikimedia.org/T213953" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_628"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T213953</span></span></a></p>

<h5 class="remarkup-header">📈  Current problems</h5>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the week in which they were first observed.</p>

<p>→  <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/tag/wikimedia-production-error</a></p>

<p>There are currently 188 open Wikimedia-prod-error tasks as of 12 February 2019. (We’ve had a slight increase since November; 165 in December, 172 in January.)</p>

<p>For this month’s edition, I’d like to draw attention to a few older issues that are still reproducible:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">[2013; Collection extension] Special:Book fatal error for blocked users. <a href="https://phabricator.wikimedia.org/T56179" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_629"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T56179</span></span></a></li>
<li class="remarkup-list-item">[2013; CentralNotice] Fatal error when placeholder key contains a space. <a href="https://phabricator.wikimedia.org/T58105" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_630"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T58105</span></span></a></li>
<li class="remarkup-list-item">[2014; LQT] Fatal error when attempting to view certain threads. <a href="https://phabricator.wikimedia.org/T61791" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_631"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T61791</span></span></a></li>
<li class="remarkup-list-item">[2015; MassMessage] Warning about Invalid message parameters. <a href="https://phabricator.wikimedia.org/T93110" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_632"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T93110</span></span></a></li>
<li class="remarkup-list-item">[2015; Wikibase] Warning “UnresolvedRedirectException” for some pages on Wikidata (and Commons). <a href="https://phabricator.wikimedia.org/T93273" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_633"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T93273</span></span></a></li>
</ul>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 Terminology:</div>
<div class="remarkup-reply-body"><p>A “<strong>Fatal error</strong>” (or uncaught exception) prevents a user action. For example: a page might display “MWException: Unknown class NotificationCount.”, instead of the article content.<br />
A “<strong>Warning</strong>” (or non-fatal, or PHP error) lets the program continue to display a mostly complete page regardless. This may cause corrupt, incorrect, or incomplete information to be shown. For example: a user may receive a notification that says “You have (null) new messages”.</p></div>
</blockquote>



<hr class="remarkup-hr" />

<h5 class="remarkup-header">🎉 Thanks!</h5>

<p>Thank you to everyone who has helped by reporting, investigating, or resolving problems in Wikimedia production. Including: A2093064, <a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_640"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a>, <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_641"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a>, <a href="https://phabricator.wikimedia.org/p/Gilles/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_642"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Gilles</span></a>, <a href="https://phabricator.wikimedia.org/p/He7d3r/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_643"><span class="phui-tag-core phui-tag-color-person">@He7d3r</span></a>, <a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_644"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a>, <a href="https://phabricator.wikimedia.org/p/matmarex/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_645"><span class="phui-tag-core phui-tag-color-person">@matmarex</span></a>, <a href="https://phabricator.wikimedia.org/p/mmodell/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_646"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@mmodell</span></a>, <a href="https://phabricator.wikimedia.org/p/Nikerabbit/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_647"><span class="phui-tag-core phui-tag-color-person">@Nikerabbit</span></a>, <a 
href="https://phabricator.wikimedia.org/p/Catrope/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_648"><span class="phui-tag-core phui-tag-color-person">@Catrope</span></a>, <a href="https://phabricator.wikimedia.org/p/Tchanders/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_649"><span class="phui-tag-core phui-tag-color-person">@Tchanders</span></a>, <a href="https://phabricator.wikimedia.org/p/Tgr/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_650"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-orange"></span>@Tgr</span></a>, and <a href="https://phabricator.wikimedia.org/p/thiemowmde/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_651"><span class="phui-tag-core phui-tag-color-person">@thiemowmde</span></a>.</p>

<p>Thanks!</p>

<p>Until next time,</p>

<p>— Timo Tijhof</p>

<blockquote><p>👢There&#039;s a snake in my boot. Reach for the sky!</p></blockquote>



<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] Incidents. — <a href="https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20190100&amp;to=Incident+documentation%2F20190200&amp;namespace=0" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:AllPages…</a></p>

<p>[2] Tasks closed. — <a href="https://phabricator.wikimedia.org/maniphest/query/COTGbmxGcm_l/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p>

<p>[3] Tasks created. — <a href="https://phabricator.wikimedia.org/maniphest/query/DLRuzOg9bSJA/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query…</a></p></div></content></entry><entry><title>Gerrit now automatically adds reviewers</title><link href="/phame/live/1/post/139/gerrit_now_automatically_adds_reviewers/" /><id>https://phabricator.wikimedia.org/phame/post/view/139/</id><author><name>hashar (Antoine Musso)</name></author><published>2019-01-17T16:53:56+00:00</published><updated>2021-03-05T10:19:18+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><div class="remarkup-warning"><span class="remarkup-note-word">WARNING:</span> 2021-03-05: the <em>reviewers by blame</em> Gerrit plugin was disabled after it was announced in this blog post. It turns out the author of a change is not necessarily an adequate reviewer suggestion in our context, and some people were being added as reviewers for a whole lot more code than they would expect. The post still has some worthwhile information on how one can find reviewers.</div>



<hr class="remarkup-hr" />

<p>Finding reviewers for a change is often a challenge, especially for a newcomer or folks proposing changes to projects they are not familiar with. Since January 16th, 2019, Gerrit automatically adds reviewers on your behalf based on who last changed the code you are affecting.</p>

<p>Antoine &quot;<a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_656"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>&quot; Musso explains what led us to enable that feature and how to configure it to fit your project. He also offers tips on how to find more reviewers, based on years of experience.</p>

<hr class="remarkup-hr" />

<p><strong>When uploading a new patch, reviewers should be added automatically</strong>: that is the subject of task <a href="https://phabricator.wikimedia.org/T91190" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_654"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T91190</span></span></a>, opened almost four years ago (March 2015). I declined the task since we already have the <em>Reviewer bot</em> (see section below), but <a href="https://phabricator.wikimedia.org/p/Tgr/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_657"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-orange"></span>@Tgr</span></a> found a plugin for Gerrit which analyzes the code history with <tt class="remarkup-monospaced">git blame</tt> and uses that to determine potential reviewers for a change. It took us a while to add that particular Gerrit plugin, and the first version we installed was not compatible with our Gerrit version. The plugin was upgraded yesterday (Jan 16th) and is working fine (<a href="https://phabricator.wikimedia.org/T101131" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_655"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T101131</span></span></a>).</p>

<p>Let&#039;s have a look at the functionality the plugin provides, and how it can be configured per repository. I will then offer a refresher of how one can search for reviewers based on git history.</p>

<h2 class="remarkup-header">Reviewers by blame plugin</h2>

<div class="remarkup-note"><span class="remarkup-note-word">NOTE:</span> the <em>reviewers by blame plugin</em> was removed the day after this announcement blog post was published. This section thus does not apply to the Wikimedia Gerrit instance anymore. It is left here for historical reasons.</div>

<p>The Gerrit plugin looks at the affected code using <tt class="remarkup-monospaced">git blame</tt> and extracts the top three past authors, who are then added as reviewers to the change on your behalf. Added reviewers will thus receive a notification showing you have asked them for code review.</p>

<p>The configuration is done on a per-project basis and inherits from the parent project. Without any tweaks, your project inherits the <a href="https://gerrit.wikimedia.org/r/#/admin/projects/All-Projects" class="remarkup-link remarkup-link-ext" rel="noreferrer">configuration from All-Projects</a>. If you are a project owner, you can adjust the configuration. As an example, here is <a href="https://gerrit.wikimedia.org/r/#/admin/projects/operations/mediawiki-config" class="remarkup-link remarkup-link-ext" rel="noreferrer">the configuration for operations/mediawiki-config</a>, which shows inherited values and an exception to not process a file named <tt class="remarkup-monospaced">InitialiseSettings.php</tt>:</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/t4j6hvkvm4kzlc77mh7c/PHID-FILE-siq6b7jparedqwnxvawf/mwconfig-reviewers-by-blame-config.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_652"><img src="https://phab.wmfusercontent.org/file/data/t4j6hvkvm4kzlc77mh7c/PHID-FILE-siq6b7jparedqwnxvawf/mwconfig-reviewers-by-blame-config.png" height="136" width="542" loading="lazy" alt="mwconfig-reviewers-by-blame-config.png (136×542 px, 16 KB)" /></a></div></p>

<p>The three settings are described in the <a href="https://gerrit.wikimedia.org/r/plugins/reviewers-by-blame/Documentation/config.md" class="remarkup-link remarkup-link-ext" rel="noreferrer">documentation for the plugin</a>:</p>

<blockquote><p><strong>plugin.reviewers-by-blame.maxReviewers</strong><br />
The maximum number of reviewers that should be added to a change by this plugin.<br />
By default 3.</p>

<p><strong>plugin.reviewers-by-blame.ignoreFileRegEx</strong><br />
Ignore files where the filename matches the given regular expression when computing the reviewers. If empty or not set, no files are ignored.<br />
By default not set.</p>

<p><strong>plugin.reviewers-by-blame.ignoreSubjectRegEx</strong><br />
Ignore commits where the subject of the commit messages matches the given regular expression. If empty or not set, no commits are ignored.<br />
By default not set.</p></blockquote>
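<p>To make the three settings more concrete, here is a minimal Python sketch of the selection logic they describe. It is an illustration only, not the plugin&#039;s actual implementation (the plugin is written in Java and its internals are not shown in this post). The function name <tt class="remarkup-monospaced">pick_reviewers</tt>, the sample paths and the author names are all made up, and the sketch assumes the ignore pattern is searched anywhere in the file path.</p>

```python
import re
from collections import Counter


def pick_reviewers(blame_authors, max_reviewers=3, ignore_file_regex=None):
    """Return up to max_reviewers authors, most blamed lines first.

    blame_authors maps a file path to the list of authors that
    `git blame` reports for the lines touched in that file.
    """
    ignore = re.compile(ignore_file_regex) if ignore_file_regex else None
    tally = Counter()
    for path, authors in blame_authors.items():
        # Assumption: the ignore pattern is searched anywhere in the path.
        if ignore is not None and ignore.search(path):
            continue
        tally.update(authors)
    return [name for name, _count in tally.most_common(max_reviewers)]


# Hypothetical blame data for a change touching two files.
blame = {
    "wmf-config/InitialiseSettings.php": ["alice"] * 40,
    "wmf-config/CommonSettings.php": [
        "bob", "bob", "bob", "carol", "carol", "dave"],
}
reviewers = pick_reviewers(
    blame, max_reviewers=2, ignore_file_regex=r"InitialiseSettings\.php")
print(reviewers)
```

<p>In this sample, the heavily edited <tt class="remarkup-monospaced">InitialiseSettings.php</tt> is skipped entirely, so its author is never suggested, and the cap of two keeps only the authors with the most blamed lines in the remaining file.</p>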

<p>By making past authors aware of a change to code they previously altered, I believe you will get more reviews and hopefully get your changes approved faster.</p>

<p>Previously we had other methods to add reviewers, one opt-in based and the others being cumbersome manual steps. They should be used to complement the Gerrit reviewers by blame plugin, and I am giving an overview of each of them in the following sections.</p>

<h2 class="remarkup-header">Gerrit watchlist</h2>

<p><div class="phabricator-remarkup-embed-layout-left phabricator-remarkup-embed-float-left"><a href="https://phab.wmfusercontent.org/file/data/vyk33jchh4tu7wkynibj/PHID-FILE-iqqrkkuzjhmiuoz2qtab/gerrit-watched-projects.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_653"><img src="https://phab.wmfusercontent.org/file/data/vyk33jchh4tu7wkynibj/PHID-FILE-iqqrkkuzjhmiuoz2qtab/gerrit-watched-projects.png" width="400" alt="gerrit-watched-projects.png (493×1 px, 72 KB)" /></a></div></p>

<p>The original system from Gerrit lets you watch projects, similar to a user watch list on MediaWiki. In <a href="https://gerrit.wikimedia.org/r/#/settings/projects" class="remarkup-link remarkup-link-ext" rel="noreferrer">Gerrit preferences</a>, one can get notified for new changes, patchsets, comments... Simply indicate a repository, optionally a <a href="https://gerrit.wikimedia.org/r/Documentation/user-search.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">search query</a> and you will receive email notifications for matching events.</p>

<p>The attached image is my watched projects configuration. I thus receive notifications for any changes made to the <tt class="remarkup-monospaced">integration/config</tt> repository as well as for changes in <tt class="remarkup-monospaced">mediawiki/core</tt> which affect either <tt class="remarkup-monospaced">composer.json</tt> or one of the Wikimedia deployment branches for that repo.</p>

<p>One drawback is that we cannot watch a whole hierarchy of projects such as <tt class="remarkup-monospaced">mediawiki</tt> and all its descendants, which would be helpful to watch our deployment branch. It is still useful when you are the primary maintainer of a repository since you can keep track of all activity for the repository.</p>

<h2 class="remarkup-header">Reviewer bot</h2>

<p>The reviewer bot was written by Merlijn van Deen (<a href="https://phabricator.wikimedia.org/p/valhallasw/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_658"><span class="phui-tag-core phui-tag-color-person">@valhallasw</span></a>). It is similar to the Gerrit watched projects feature, with some major benefits:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">watcher is added as a reviewer, the author thus knows you were notified</li>
<li class="remarkup-list-item">it supports watching a hierarchy of projects (eg: <tt class="remarkup-monospaced">mediawiki/*</tt>)</li>
<li class="remarkup-list-item">the file/branch filtering might be easier to grasp compared to Gerrit search queries</li>
<li class="remarkup-list-item">the watchers are stored at a central place which is public to anyone, making it easy to add others as reviewers.</li>
</ul>

<p>One registers reviewers on a single wiki page: <a href="https://www.mediawiki.org/wiki/Git/Reviewers" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Git/Reviewers</a>.</p>

<p>Each repository filter is a wikitext section (eg: <tt class="remarkup-monospaced">=== mediawiki/core ===</tt>) followed by a wikitext template and a file filter <a href="https://docs.python.org/2/library/fnmatch.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">using Python fnmatch</a>. Some examples:</p>

<p>Listen to any changes that touch i18n:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">== Listen to repository groups ==
=== * ===
* {{Gerrit-reviewer|JohnDoe|file_regexp=&lt;nowiki&gt;i18n&lt;/nowiki&gt;}}</pre></div>

<p>Listen to MediaWiki core search related code:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">=== mediawiki/core ===
* {{Gerrit-reviewer|JaneDoe|file_regexp=&lt;nowiki&gt;^includes/search/&lt;/nowiki&gt;}}</pre></div>
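<p>For readers unfamiliar with <tt class="remarkup-monospaced">fnmatch</tt>, the following Python sketch shows how shell-style patterns can match a hierarchy of repositories. It only illustrates the matching idea mentioned above: the registration list, the reviewer names and the <tt class="remarkup-monospaced">reviewers_for</tt> helper are hypothetical, and the bot&#039;s real parsing of the wiki page and of its <tt class="remarkup-monospaced">file_regexp</tt> parameter is not reproduced here.</p>

```python
from fnmatch import fnmatch

# Hypothetical registrations: (reviewer, repository pattern, file pattern).
REGISTRATIONS = [
    ("JohnDoe", "*", "*i18n*"),
    ("JaneDoe", "mediawiki/core", "includes/search/*"),
    ("SamRoe", "mediawiki/*", "*"),
]


def reviewers_for(repository, changed_files):
    """Return the registered reviewers whose filters match the change."""
    matched = []
    for reviewer, repo_pattern, file_pattern in REGISTRATIONS:
        if not fnmatch(repository, repo_pattern):
            continue
        if any(fnmatch(path, file_pattern) for path in changed_files):
            matched.append(reviewer)
    return matched


print(reviewers_for("mediawiki/core", ["includes/search/SearchEngine.php"]))
print(reviewers_for("mediawiki/extensions/Echo", ["i18n/en.json"]))
```

<p>Note that in <tt class="remarkup-monospaced">fnmatch</tt> the <tt class="remarkup-monospaced">*</tt> wildcard also matches slashes, which is what lets a single <tt class="remarkup-monospaced">mediawiki/*</tt> pattern cover every repository below <tt class="remarkup-monospaced">mediawiki</tt>.</p>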

<p>The system works great, provided maintainers remember to register on the page and the files are not moved around. The bot is not that well known though, and most repositories do not have any reviewers listed.</p>

<h2 class="remarkup-header">Inspecting git history</h2>

<p>Another source of reviewers is the git history: one can easily retrieve a list of past authors, who should be good candidates to review code. I typically use <tt class="remarkup-monospaced">git shortlog --summary --no-merges</tt> for that (<tt class="remarkup-monospaced">--no-merges</tt> filters out merge commits crafted by Gerrit when a change is submitted). Example for the MediaWiki job queue system:</p>

<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ git shortlog --no-merges --summary --since &quot;one year ago&quot; includes/jobqueue/|sort -n|tail -n4</span>
<span class="go">     3	Petr Pchelko</span>
<span class="go">     4	Brad Jorsch</span>
<span class="go">     4	Umherirrender</span>
<span class="go">    16	Aaron Schulz</span></pre></div>

<p>That gives me four candidates who worked on that directory over the past year.</p>
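<p>If you would rather post-process that output in a script than with <tt class="remarkup-monospaced">sort</tt> and <tt class="remarkup-monospaced">tail</tt>, a small Python sketch could look like the following. The <tt class="remarkup-monospaced">top_authors</tt> helper is hypothetical, and it assumes each line of <tt class="remarkup-monospaced">git shortlog --summary</tt> is a commit count, a tab, then the author name, as in the output shown above.</p>

```python
def top_authors(shortlog_output, limit=4):
    """Parse `git shortlog --summary` text into (commits, author) pairs."""
    pairs = []
    for line in shortlog_output.splitlines():
        if not line.strip():
            continue
        # Each line is a right-aligned count, a tab, then the author name.
        count, author = line.strip().split("\t", 1)
        pairs.append((int(count), author))
    # Most active authors first.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs[:limit]


sample = "     3\tPetr Pchelko\n     4\tBrad Jorsch\n    16\tAaron Schulz\n"
print(top_authors(sample, limit=2))
```

<p>The sample string reuses three lines from the output above; in practice the text would come from running the git command itself.</p>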

<h2 class="remarkup-header">Past reviewers from git notes</h2>

<p>When a patch is merged, Gerrit records the votes and the canonical URL of the change in git. They are available in git notes under <tt class="remarkup-monospaced">refs/notes/review</tt>. Once the notes are fetched, they can be shown in <tt class="remarkup-monospaced">git show</tt> or <tt class="remarkup-monospaced">git log</tt> by passing <tt class="remarkup-monospaced">--show-notes=review</tt>. For each commit, the notes are displayed after the commit message and show votes among other metadata:</p>

<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ git fetch refs/notes/review:refs/notes/review</span>
<span class="gp">$ git log --no-merges --show-notes=review -n1</span>
<span class="go">commit e1d2c92ac69b6537866c742d8e9006f98d0e82e8</span>
<span class="go">Author: Gergő Tisza &lt;tgr.huwiki@gmail.com&gt;</span>
<span class="go">Date:   Wed Jan 16 18:14:52 2019 -0800</span>
<span class="go"></span>
<span class="go">    Fix error reporting in MovePage</span>
<span class="go">    </span>
<span class="go">    Bug: T210739</span>
<span class="go">    Change-Id: I8f6c9647ee949b33fd4daeae6aed6b94bb1988aa</span>
<span class="go"></span>
<span class="go">Notes (review):</span>
<span class="go">    Code-Review+2: Jforrester &lt;jforrester@wikimedia.org&gt;</span>
<span class="go">    Verified+2: jenkins-bot</span>
<span class="go">    Submitted-by: jenkins-bot</span>
<span class="go">    Submitted-at: Thu, 17 Jan 2019 05:02:23 +0000</span>
<span class="go">    Reviewed-on: https://gerrit.wikimedia.org/r/484825</span>
<span class="go">    Project: mediawiki/core</span>
<span class="go">    Branch: refs/heads/master</span></pre></div>

<p>I can then get the list of people who previously voted <tt class="remarkup-monospaced">Code-Review +2</tt> for a given path. Using the previous example of <tt class="remarkup-monospaced">includes/jobqueue/</tt> over a year, the list is slightly different:</p>

<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ git log --show-notes=review --since &quot;1 year ago&quot; includes/jobqueue/|grep &#039;Code-Review+2:&#039;|sort|uniq -c|sort -n|tail -n5</span>
<span class="go">      2     Code-Review+2: Umherirrender &lt;umherirrender_de.wp@web.de&gt;</span>
<span class="go">      3     Code-Review+2: Jforrester &lt;jforrester@wikimedia.org&gt;</span>
<span class="go">      3     Code-Review+2: Mobrovac &lt;mobrovac@wikimedia.org&gt;</span>
<span class="go">      9     Code-Review+2: Aaron Schulz &lt;aschulz@wikimedia.org&gt;</span>
<span class="go">     18     Code-Review+2: Krinkle &lt;krinklemail@gmail.com&gt;</span></pre></div>
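<p>The same tally can be sketched in Python, which may be handier when the result feeds another script. This is only an illustration of what the <tt class="remarkup-monospaced">grep | sort | uniq -c</tt> pipeline does: the <tt class="remarkup-monospaced">count_approvers</tt> helper is hypothetical, and the sample text is a shortened, made-up excerpt in the style of the notes shown earlier.</p>

```python
from collections import Counter

MARKER = "Code-Review+2:"


def count_approvers(log_output):
    """Count Code-Review+2 votes per reviewer in `git log` text."""
    votes = Counter()
    for line in log_output.splitlines():
        line = line.strip()
        if line.startswith(MARKER):
            # Keep whatever follows the marker (name and e-mail) as the key.
            votes[line[len(MARKER):].strip()] += 1
    return votes


sample = """
Notes (review):
    Code-Review+2: Krinkle
    Verified+2: jenkins-bot
Notes (review):
    Code-Review+2: Aaron Schulz
Notes (review):
    Code-Review+2: Krinkle
"""
for reviewer, count in count_approvers(sample).most_common():
    print(count, reviewer)
```

<p>Feeding it the real output of <tt class="remarkup-monospaced">git log --show-notes=review</tt> for a path should give the same counts as the shell pipeline, sorted from most to least frequent approver.</p>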

<p>User Krinkle has approved a lot of patches, even though he does not appear in the list of authors obtained by the previous method (inspecting git history).</p>

<h2 class="remarkup-header">Conclusion</h2>

<p>The Gerrit reviewers by blame plugin acts automatically, which offers a good chance your newly uploaded patch will get reviewers added out of the box. For finer tweaking, one should register as a reviewer on <a href="https://www.mediawiki.org/wiki/Git/Reviewers" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Git/Reviewers</a>, which benefits everyone. The last course of action, inspecting git notes for past reviewers, is meant to complement the git log history.</p>

<p>For any remarks, support, or concerns, reach out on the IRC freenode channel <tt class="remarkup-monospaced">#wikimedia-releng</tt> or <a href="https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?title=reviewers-by-blame%20plugin%3A+(a%20short%20description)&amp;owner=&amp;projects=gerrit" class="remarkup-link" rel="noreferrer">file a task in Phabricator</a>.</p>

<p><em>Thank you <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_659"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a> for the proofreading and English fixes</em>.</p></div></content></entry><entry><title>Code Health Metrics and SonarQube</title><link href="/phame/live/1/post/133/code_health_metrics_and_sonarqube/" /><id>https://phabricator.wikimedia.org/phame/post/view/133/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2019-01-10T14:54:38+00:00</published><updated>2019-01-15T11:42:06+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><h2 class="remarkup-header">Code Health</h2>

<p>Inside a broad <a href="https://www.mediawiki.org/wiki/Code_Health_Group" class="remarkup-link remarkup-link-ext" rel="noreferrer">Code Health</a> project there is a small <a href="https://www.mediawiki.org/wiki/Code_Health_Group/projects/Code_Health_Metrics" class="remarkup-link remarkup-link-ext" rel="noreferrer">Code Health Metrics</a> group. We meet weekly and discuss how code health could be improved by metrics. Each member has only a few hours each week to work on this, so our projects are small.</p>

<p>In our discussions, we have agreed on a few principles. Some of them are:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Metrics are about improving the process as much as improving the code.</li>
<li class="remarkup-list-item">Focus on new code, not existing code.</li>
<li class="remarkup-list-item">Humans are smarter than tools.</li>
</ul>

<p>The goal of the project is to provide fast and actionable feedback on code health metrics. Since our time for this project is limited, we&#039;ve decided to make a spike (<a href="https://phabricator.wikimedia.org/T207046" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_664"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T207046</span></span></a>). The spike focuses on:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">one repository,</li>
<li class="remarkup-list-item">one language,</li>
<li class="remarkup-list-item">one metric,</li>
<li class="remarkup-list-item">one tool,</li>
<li class="remarkup-list-item">one feedback mechanism.</li>
</ul>

<p>All of the above tasks are already completed, except for the last one. In parallel to finishing the spike, we are also working on expanding the scope to more repositories, languages and metrics. At the moment, the spike works for several Java repositories.</p>

<h2 class="remarkup-header">SonarQube</h2>

<p>After some investigation, the tool we have selected is <a href="https://www.sonarqube.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">SonarQube</a>. The tool does everything we need, and more. In this post I&#039;ll only mention one feature. We have decided not to host SonarQube ourselves at the moment. We are using a hosted solution, <a href="https://sonarcloud.io" class="remarkup-link remarkup-link-ext" rel="noreferrer">SonarCloud</a>. You can see our current dashboard at the <a href="https://sonarcloud.io/organizations/wmftest" class="remarkup-link remarkup-link-ext" rel="noreferrer">wmftest</a> organization at SonarCloud.</p>

<p>As mentioned in the principles, in order to make the metrics actionable, we&#039;ve decided to focus only on new code, ignoring existing code for now. That means that when you make a change to a repository with a lot of code, you are not overwhelmed with all metrics (and problems) the tool has found. Instead, the tool focuses just on the code you have written. So, for example, if a small patch you have submitted to a big repository does not introduce new problems, the tool says so. If the patch introduces new problems (like decreased <a href="https://en.wikipedia.org/wiki/Code_coverage" class="remarkup-link remarkup-link-ext" rel="noreferrer">branch coverage</a>), the tool lets you know.</p>

<p>Members of the Code Health Metrics group have reminded me multiple times that I have to mention <a href="https://www.sonarlint.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">SonarLint</a>, an IDE extension. I don&#039;t use it myself, since it doesn&#039;t support my favorite editor.</p>

<h2 class="remarkup-header">Example</h2>

<p>A good example is at the <a href="https://sonarcloud.io/organizations/wmftest" class="remarkup-link remarkup-link-ext" rel="noreferrer">wmftest</a> organization at SonarCloud. Elasticsearch extra plugins has failed the quality gate.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/t7gkna7g72nbjkgcsmm6/PHID-FILE-4nrpananwazligrkbnl2/wmftest.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_660"><img src="https://phab.wmfusercontent.org/file/data/t7gkna7g72nbjkgcsmm6/PHID-FILE-4nrpananwazligrkbnl2/wmftest.png" height="821" width="1392" loading="lazy" alt="wmftest.png (821×1 px, 173 KB)" /></a></div></p>

<p>Opening the <a href="https://sonarcloud.io/dashboard?id=org.wikimedia.search%3Aextra-parent" class="remarkup-link remarkup-link-ext" rel="noreferrer">Elasticsearch extra plugins</a> project, you see that the failure is related to test coverage (less than 80%).</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/sv4s5tf2ubn6f5tzxsal/PHID-FILE-4ihd5fhrdtasjgjtdjau/extra-parent.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_661"><img src="https://phab.wmfusercontent.org/file/data/sv4s5tf2ubn6f5tzxsal/PHID-FILE-4ihd5fhrdtasjgjtdjau/extra-parent.png" height="821" width="1392" loading="lazy" alt="extra-parent.png (821×1 px, 181 KB)" /></a></div></p>

<p>Click the warning and you get more details: <tt class="remarkup-monospaced">Coverage on New Code 0.0%</tt>.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/exlh4bwok5i3wfjdm42l/PHID-FILE-yykjqoedfvyu4ou7jyc5/new-coverage.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_662"><img src="https://phab.wmfusercontent.org/file/data/exlh4bwok5i3wfjdm42l/PHID-FILE-yykjqoedfvyu4ou7jyc5/new-coverage.png" height="821" width="1392" loading="lazy" alt="new-coverage.png (821×1 px, 175 KB)" /></a></div></p>

<p>Click the <a href="https://sonarcloud.io/component_measures?id=org.wikimedia.search%3Aextra-parent&amp;metric=new_coverage" class="remarkup-link remarkup-link-ext" rel="noreferrer">ExtraCorePlugin.java</a> file. New lines have yellow background. It&#039;s easy to see that there are lines that are marked red (meaning no coverage) but it&#039;s also easy to see which new lines (yellow background) have no coverage (red sidebar).</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/jyc7yvff6exf7voq2syz/PHID-FILE-xj2wyrnhdfomuzrs2ue4/ExtraCorePlugin.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_663"><img src="https://phab.wmfusercontent.org/file/data/jyc7yvff6exf7voq2syz/PHID-FILE-xj2wyrnhdfomuzrs2ue4/ExtraCorePlugin.png" height="793" width="1364" loading="lazy" alt="ExtraCorePlugin.png (793×1 px, 259 KB)" /></a></div></p>
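<p>The check behind this example can be sketched in a few lines of Python. This is a simplified, line-based model and not SonarQube&#039;s actual calculation: the function names and the input format (lists of line numbers) are assumptions made for illustration. Only the 80% threshold and the focus on new code come from the example above.</p>

```python
def new_code_coverage(new_lines, covered_lines):
    """Return the percentage of new lines that are covered by tests.

    Both arguments are iterables of line numbers (a hypothetical input
    format). Returns None when the patch adds no new lines.
    """
    new = set(new_lines)
    if not new:
        return None  # nothing new to measure
    covered = new.intersection(covered_lines)
    return 100.0 * len(covered) / len(new)


def quality_gate(new_lines, covered_lines, threshold=80.0):
    """Pass when coverage on new code reaches the threshold.

    Existing code is ignored on purpose: only lines touched by the
    patch count towards the result.
    """
    coverage = new_code_coverage(new_lines, covered_lines)
    if coverage is None:
        return "passed"
    return "passed" if coverage >= threshold else "failed"
```

<p>With this model, a patch whose new lines are all untested gets <tt class="remarkup-monospaced">0.0</tt> coverage on new code and fails the gate, no matter how well the rest of the repository is covered.</p>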

<h2 class="remarkup-header">Talks</h2>

<p>We have planned to present what we have so far during <a href="https://office.wikimedia.org/wiki/All_hands/2019" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia Foundation All Hands</a>. To prepare for that, we&#039;ve created this blog post and presented at <a href="https://office.wikimedia.org/wiki/Technology/5_Minute_Demo" class="remarkup-link remarkup-link-ext" rel="noreferrer">5 Minute Demo</a> and <a href="https://www.meetup.com/testival/events/257897967/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Testival Meetup</a>.</p>

<p>I would like to thank all members of the Code Health Metrics Working group for their help writing this post, especially Guillaume Lederrey and Kosta Harlan.</p>

<h2 class="remarkup-header">FAQ</h2>

<p>Q: Sonar-what?!<br />
A: <a href="https://www.sonarqube.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">SonarQube</a> is the tool. <a href="https://sonarcloud.io/" class="remarkup-link remarkup-link-ext" rel="noreferrer">SonarCloud</a> is the hosted version of the tool. <a href="https://www.sonarlint.org/" class="remarkup-link remarkup-link-ext" rel="noreferrer">SonarLint</a> is an IDE extension.</p>

<p>Q: When can I use this on my project?<br />
A: Soon. Probably when <a href="https://phabricator.wikimedia.org/T207046" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_665"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T207046</span></span></a> is resolved. If there are no blockers, in a few weeks.</p>

<p>Q: Why are we using SonarCloud instead of hosting SonarQube ourselves?<br />
A: We did not want to invest time in hosting it ourselves until we&#039;re sure the tool is the right choice for us.</p></div></content></entry><entry><title>Production Excellence #6: December 2018</title><link href="/phame/live/1/post/130/production_excellence_6_december_2018/" /><id>https://phabricator.wikimedia.org/phame/post/view/130/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2019-01-22T02:54:23+00:00</published><updated>2020-04-03T16:18:09+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Month in numbers.</li>
<li class="remarkup-list-item">Lightning round.</li>
<li class="remarkup-list-item">Current problems.</li>
</ul>

<h3 class="remarkup-header">📊 Month in numbers</h3>

<ul class="remarkup-list">
<li class="remarkup-list-item">4 documented incidents. [1]</li>
<li class="remarkup-list-item">20 Wikimedia-prod-error tasks closed. [2]</li>
<li class="remarkup-list-item">18 Wikimedia-prod-error tasks created. [3]</li>
<li class="remarkup-list-item">172 currently open Wikimedia-prod-error tasks (as of 16 January 2019).</li>
</ul>

<p>Terminology:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">An <strong>Exception</strong> (or <strong>fatal</strong>) prevents a user action. For example, a page would display “Exception: Unable to render page” instead of the article content.</li>
<li class="remarkup-list-item">An <strong>Error</strong> (or <strong>non-fatal</strong>, or <strong>warning</strong>) can produce pages that are technically unaware of a problem, but may show corrupt, incorrect, or incomplete information. For example, a user may receive a notification that says “<em>You have (null) new messages</em>”.</li>
</ul>

<p>For December, I haven’t prepared any stories or taken interviews. Instead, I’ve got a lightning round of errors in various areas that were found and fixed this past month.</p>

<h3 class="remarkup-header">⚡️ Contributions view fixed</h3>

<p>MarcoAurelio reported that Special:Contributions failed to load for certain user names on meta.wikimedia.org (PHP Fatal error, due to a faulty database record). Brad Jorsch investigated and found a relation to database maintenance from March 2018. He corrected the faulty records, which resolved the problem. Thanks!  — <a href="https://phabricator.wikimedia.org/T210985" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_666"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T210985</span></span></a></p>

<h3 class="remarkup-header">⚡️ Undefined talk space now defined</h3>

<p>The newly created Cantonese Wiktionary (yue.wiktionary.org) was encountering errors from the Siteinfo API. We found this was due to invalid site configuration. Urbanecm patched the issue, and also created a new unit test for wmf-config that will prevent this issue from happening on other wikis in the future. Thanks!  — <a href="https://phabricator.wikimedia.org/T211529" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_667"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T211529</span></span></a></p>

<h3 class="remarkup-header">⚡️ The undefined error status... error</h3>

<p>After deploying the 1.33.0-wmf.8 train to all wikis, we found a regression in the HTTP library for MediaWiki. When MediaWiki requested an HTTP resource from another service, and this resource was unavailable, then MediaWiki failed to correctly determine the HTTP status code of that error. Which then caused another error! This happened, for example, when Special:Collection was unable to reach the PediaPress.com backend in some cases. Patched by Bill Pirkle. Thanks!  — <a href="https://phabricator.wikimedia.org/T212005" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_668"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T212005</span></span></a></p>

<h3 class="remarkup-header">⚡️ Fatal error: Call to undefined function in Kartographer API</h3>

<p>When the 1.33.0-wmf.9 train reached the canary phase on Tue 18 December (aka, group0 [4]), Željko spotted a new fatal error in the logs. The fatal originated in the Kartographer extension and would have affected various users of the MediaWiki API. Patched the same day by Michael Holloway, reviewed by James Forrester, and deployed by Željko. Thanks!  — <a href="https://phabricator.wikimedia.org/T212218" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_669"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T212218</span></span></a></p>

<h3 class="remarkup-header">📉 Current problems</h3>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the week in which they were first observed.</p>

<p><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">→  https://phabricator.wikimedia.org/tag/wikimedia-production-error</a></p>

<p>November&#039;s theme will continue for now, as I imagine lots of you were on vacation during that time! I’d like to draw attention to a subset of PHP fatal errors. Specifically, those that are publicly exposed (e.g. don’t need elevated user rights) and emit an HTTP 500 error code.</p>

<ol class="remarkup-list">
<li class="remarkup-list-item">Wikibase: Clicking “undo” for certain revisions fatals with a PatcherException. — <a href="https://phabricator.wikimedia.org/T97146" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_670"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T97146</span></span></a></li>
<li class="remarkup-list-item">Flow: Unable to view certain talk pages due to workflow InvalidDataException. — <a href="https://phabricator.wikimedia.org/T70526" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_671"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T70526</span></span></a></li>
<li class="remarkup-list-item">Translate: Certain Special:Translate urls fatal. — <a href="https://phabricator.wikimedia.org/T204833" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_672"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T204833</span></span></a></li>
<li class="remarkup-list-item">MediaWiki (Special-pages): Special:DoubleRedirects unavailable on tt.wikipedia.org. — <a href="https://phabricator.wikimedia.org/T204800" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_673"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T204800</span></span></a></li>
<li class="remarkup-list-item">MediaWiki (Parser): Parse API exposes fatal content model error. — <a href="https://phabricator.wikimedia.org/T206253" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_674"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T206253</span></span></a></li>
<li class="remarkup-list-item">CentralNotice: Certain Special:CentralNoticeBanners urls fatal. — <a href="https://phabricator.wikimedia.org/T149240" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_675"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T149240</span></span></a></li>
<li class="remarkup-list-item">PageViewInfo: Certain “mostviewed” API queries fail. — <a href="https://phabricator.wikimedia.org/T208691" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_676"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T208691</span></span></a></li>
</ol>

<p>Public user requests resulting in fatals can cause (and have caused) alerts to fire that notify SRE of wikis potentially being less available or down.</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>ProTip:</strong></div>
<div class="remarkup-reply-body"><p>Use “Report Error” on <a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error/</a> to create a task with a helpful template. This template is also available as “Report Application Error”, from the “Create Task” dropdown menu, on any task creation form.</p></div>
</blockquote>



<h3 class="remarkup-header">🎉 Thanks!</h3>

<p>Thank you to everyone who has helped by reporting, investigating, or resolving problems in Wikimedia production. Including <a href="https://phabricator.wikimedia.org/p/MarcoAurelio/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_677"><span class="phui-tag-core phui-tag-color-person">@MarcoAurelio</span></a>, <a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_678"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a>, <a href="https://phabricator.wikimedia.org/p/Urbanecm/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_679"><span class="phui-tag-core phui-tag-color-person">@Urbanecm</span></a>, <a href="https://phabricator.wikimedia.org/p/BPirkle/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_680"><span class="phui-tag-core phui-tag-color-person">@BPirkle</span></a>, <a href="https://phabricator.wikimedia.org/p/zeljkofilipin/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_681"><span class="phui-tag-core phui-tag-color-person">@zeljkofilipin</span></a>, <a href="https://phabricator.wikimedia.org/p/Mholloway/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_682"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Mholloway</span></a>, <a href="https://phabricator.wikimedia.org/p/Esanders/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_683"><span class="phui-tag-core phui-tag-color-person">@Esanders</span></a>, <a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_684"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a>, and <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view 
phui-tag-type-person " data-sigil="hovercard" data-meta="0_685"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>.</p>

<p>Until next time,</p>

<p>— Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] Incidents. — <a href="https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20181200&amp;to=Incident+documentation%2F20190100&amp;namespace=0" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:AllPages...</a></p>

<p>[2] Tasks closed. — <a href="https://phabricator.wikimedia.org/maniphest/query/Pe2KaRZhJJ.H/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query...</a></p>

<p>[3] Tasks opened. — <a href="https://phabricator.wikimedia.org/maniphest/query/aqbDey80TU02/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query...</a></p>

<p>[4] What is group0? — <a href="https://wikitech.wikimedia.org/wiki/Deployments/One_week#Groups" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Deployments/One_week#Three_groups</a></p></div></content></entry><entry><title>Production Excellence #5: November 2018</title><link href="/phame/live/1/post/129/production_excellence_5_november_2018/" /><id>https://phabricator.wikimedia.org/phame/post/view/129/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2018-12-12T04:40:26+00:00</published><updated>2020-04-03T16:17:55+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our strive for operational excellence last month? Read on to find out!</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Month in numbers.</li>
<li class="remarkup-list-item">Highlighted stories.</li>
<li class="remarkup-list-item">Current problems.</li>
</ul>

<h3 class="remarkup-header">📊 Month in numbers</h3>

<ul class="remarkup-list">
<li class="remarkup-list-item">4 documented incidents in November 2018. [1]</li>
<li class="remarkup-list-item">42 Wikimedia-prod-error tasks closed in November 2018. [2]</li>
<li class="remarkup-list-item">36 Wikimedia-prod-error tasks created in November 2018. [3]</li>
<li class="remarkup-list-item">165 currently open Wikimedia-prod-error tasks (as of 12 December 2018).</li>
</ul>

<p>Terminology:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">An <strong>Exception</strong> (or <strong>fatal</strong>) causes user actions to be prevented. For example, a page would display &quot;Exception: Unable to render page&quot; instead of the article content.</li>
<li class="remarkup-list-item">An <strong>Error</strong> (or <strong>non-fatal</strong>, or <strong>warning</strong>) can produce page views that are technically unaware of a problem, but may show corrupt, incorrect, or incomplete information.  Examples – an article would display the code word “null” instead of the actual content, a user looking for Vegetables may be taken to an article about Vegetarians, a user may receive a notification that says “<em>You have (null) new messages.</em>”</li>
</ul>

<p>With that behind us... Let’s celebrate this month’s highlights!</p>

<h3 class="remarkup-header">*️⃣ Fatal DB exception at wikitech.wikimedia.org</h3>

<p>Quiddity reported that he was unable to disable a spam account, due to a fatal exception. Andre Klapper used the Exception ID to find the stack trace in the logs. The trace revealed that a table was missing in Wikitech’s database.</p>

<p>The MediaWiki software was recently expanded with a “Partial blocking” ability. [4] This involved introducing a new database table that stores block metadata differently. This software update was deployed to Wikitech, but this new table was not created.</p>

<p><a href="https://phabricator.wikimedia.org/p/Marostegui/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_699"><span class="phui-tag-core phui-tag-color-person">@Marostegui</span></a> (Database administrator) quickly applied the schema patches that create the missing table. Thanks Manuel, Andre, and Quiddity; Teamwork!</p>

<p>– <a href="https://phabricator.wikimedia.org/T209674" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_686"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T209674</span></span></a></p>

<h3 class="remarkup-header">*️⃣ Big-page Deletion Unleashed!</h3>

<p>It had been known for years, [5] that users are unable to delete or restore pages with more than a few hundred revisions. Attempts to do so could fail, with a fatal “DBTransactionSizeError” exception. This error indicates that the change is too big or too slow. Such changes risk replication lag, and may impact the stability of the infrastructure.</p>

<p>The database structure used by MediaWiki for page archives dates back to 2003 (over 15 years ago). I&#039;ll spare you the details, but it depends on database interactions that are inherently slow when applied to systems as big as Wikipedia! RFC <a href="https://phabricator.wikimedia.org/T20493" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_687"><span class="phui-tag-core phui-tag-color-object">T20493</span></a> intends to modernise this structure for the long-term.</p>

<p>Then along came <a href="https://phabricator.wikimedia.org/p/BPirkle/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_700"><span class="phui-tag-core phui-tag-color-person">@BPirkle</span></a>. Bill joined the WMF Core platform team earlier this year. He took on the challenge of making page deletion work for any size page, today.</p>

<p>Previously, page deletion happened in a single step. This simple approach had the benefit of either succeeding in its entirety, or safely rolling back like nothing happened. It also meant that the database protected us against conflicting changes. In August, Bill started a two-month effort that carefully split the logic for “delete a page” into smaller steps that each are safe and quick. It now uses our JobQueue to schedule and run these steps, without the user waiting for it.</p>

<p>–  <a href="https://phabricator.wikimedia.org/T198176" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_688"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198176</span></span></a></p>
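<p>The idea of splitting one large deletion into small queued steps can be sketched as follows. This is an illustrative Python model under stated assumptions, not MediaWiki&#039;s actual implementation: the function names, the job format, the in-memory queue, and the batch size of 100 are all hypothetical.</p>

```python
from collections import deque


def split_into_batches(revision_ids, batch_size):
    """Split a list of revision IDs into consecutive batches."""
    return [
        revision_ids[start:start + batch_size]
        for start in range(0, len(revision_ids), batch_size)
    ]


def schedule_page_deletion(queue, page_id, revision_ids, batch_size=100):
    """Queue one small job per batch instead of one large transaction."""
    for batch in split_into_batches(revision_ids, batch_size):
        queue.append({"page_id": page_id, "revision_ids": batch})


def run_jobs(queue, archive):
    """Run queued jobs one at a time; each job handles one small batch."""
    while queue:
        job = queue.popleft()
        # Stand-in for moving one batch of revisions to the archive.
        archive.extend(job["revision_ids"])


# Example: a page with 250 revisions becomes three small jobs.
queue = deque()
archive = []
schedule_page_deletion(queue, page_id=1, revision_ids=list(range(250)))
jobs_queued = len(queue)
run_jobs(queue, archive)
```

<p>The trade-off is the one described above: each step stays small and quick, but the deletion as a whole is no longer a single all-or-nothing operation, so the steps have to be safe to run on their own.</p>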

<h3 class="remarkup-header">📉 Current problems</h3>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the week in which they were first observed.</p>

<p><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">→  https://phabricator.wikimedia.org/tag/wikimedia-production-error</a></p>

<p>I’d like to draw attention to a subset of PHP fatal errors. Specifically, those that are publicly exposed (e.g. don’t require elevated user rights) and use an HTTP 500 status code.</p>

<ol class="remarkup-list">
<li class="remarkup-list-item">CentralNotice: Some Special:CentralNoticeBanners urls fatal. – <a href="https://phabricator.wikimedia.org/T149240" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_689"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T149240</span></span></a></li>
<li class="remarkup-list-item">Flow: Unable to view certain talk pages due to workflow InvalidDataException. – <a href="https://phabricator.wikimedia.org/T70526" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_690"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T70526</span></span></a></li>
<li class="remarkup-list-item">JsonConfig: Unable to diff certain “.map” pages on Commons. – <a href="https://phabricator.wikimedia.org/T203063" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_691"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T203063</span></span></a></li>
<li class="remarkup-list-item">MediaWiki (Parser): Parse API exposes fatal content model error. – <a href="https://phabricator.wikimedia.org/T206253" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_692"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T206253</span></span></a></li>
<li class="remarkup-list-item">MediaWiki (Special-pages): Special:DoubleRedirects unavailable on ttwiki. – <a href="https://phabricator.wikimedia.org/T204800" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_693"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T204800</span></span></a></li>
<li class="remarkup-list-item">MobileFrontend: Some Special:MobileDiff urls fatal. – <a href="https://phabricator.wikimedia.org/T156293" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_694"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T156293</span></span></a></li>
<li class="remarkup-list-item">ProofreadPage: Unable to edit certain pages on Wikisource. – <a href="https://phabricator.wikimedia.org/T176196" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_695"><span class="phui-tag-core phui-tag-color-object">T176196</span></a></li>
<li class="remarkup-list-item">Translate: Some Special:Translate urls fatal. – <a href="https://phabricator.wikimedia.org/T204833" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_696"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T204833</span></span></a></li>
<li class="remarkup-list-item">Wikibase: Clicking “undo” for some revisions fatals with a PatcherException. – <a href="https://phabricator.wikimedia.org/T97146" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_697"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T97146</span></span></a></li>
</ol>

<p>Public user requests resulting in fatals can cause (and have caused) alerts to fire that notify SRE of wikis potentially being less available or down.</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>ProTip:</strong></div>
<div class="remarkup-reply-body"><p>Cross-reference one workboard with another via <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view "><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-search" data-meta="0_18" aria-hidden="true"></span>Open Tasks</span></span><span class="remarkup-nav-sequence-arrow"> → </span><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Advanced Filter</span></span></span> and enter Tag(s) to apply as a filter.</p></div>
</blockquote>



<h3 class="remarkup-header">🎉 Thank you</h3>

<p>Thank you to everyone who helped by reporting or investigating problems in Wikimedia production; and for implementing or reviewing their solutions. Including: <a href="https://phabricator.wikimedia.org/p/tstarling/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_701"><span class="phui-tag-core phui-tag-color-person">@tstarling</span></a>, <a href="https://phabricator.wikimedia.org/p/thiemowmde/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_702"><span class="phui-tag-core phui-tag-color-person">@thiemowmde</span></a>, <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_703"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a>, <a href="https://phabricator.wikimedia.org/p/Tgr/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_704"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-orange"></span>@Tgr</span></a>, <a href="https://phabricator.wikimedia.org/p/Steinsplitter/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_705"><span class="phui-tag-core phui-tag-color-person">@Steinsplitter</span></a>, <a href="https://phabricator.wikimedia.org/p/Quiddity/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_706"><span class="phui-tag-core phui-tag-color-person">@Quiddity</span></a>, <a href="https://phabricator.wikimedia.org/p/pmiazga/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_707"><span class="phui-tag-core phui-tag-color-person">@pmiazga</span></a>, <a href="https://phabricator.wikimedia.org/p/Nikerabbit/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_708"><span class="phui-tag-core phui-tag-color-person">@Nikerabbit</span></a>, <a href="https://phabricator.wikimedia.org/p/Mvolz/" 
class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_709"><span class="phui-tag-core phui-tag-color-person">@Mvolz</span></a>, <a href="https://phabricator.wikimedia.org/p/Lucas_Werkmeister_WMDE/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_710"><span class="phui-tag-core phui-tag-color-person">@Lucas_Werkmeister_WMDE</span></a>, <a href="https://phabricator.wikimedia.org/p/kostajh/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_711"><span class="phui-tag-core phui-tag-color-person">@kostajh</span></a>, <a href="https://phabricator.wikimedia.org/p/jrbs/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_712"><span class="phui-tag-core phui-tag-color-person">@jrbs</span></a>, <a href="https://phabricator.wikimedia.org/p/JJMC89/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_713"><span class="phui-tag-core phui-tag-color-person">@JJMC89</span></a>, <a href="https://phabricator.wikimedia.org/p/Jdforrester-WMF/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_714"><span class="phui-tag-core phui-tag-color-person">@Jdforrester-WMF</span></a>, <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_715"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>, <a href="https://phabricator.wikimedia.org/p/Gilles/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_716"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Gilles</span></a>, <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_717"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a>, <a href="https://phabricator.wikimedia.org/p/Ciencia_Al_Poder/" 
class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_718"><span class="phui-tag-core phui-tag-color-person">@Ciencia_Al_Poder</span></a>, <a href="https://phabricator.wikimedia.org/p/Catrope/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_719"><span class="phui-tag-core phui-tag-color-person">@Catrope</span></a>, <a href="https://phabricator.wikimedia.org/p/BPirkle/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_720"><span class="phui-tag-core phui-tag-color-person">@BPirkle</span></a>, <a href="https://phabricator.wikimedia.org/p/Barkeep49/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_721"><span class="phui-tag-core phui-tag-color-person">@Barkeep49</span></a>, <a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_722"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a>, and <a href="https://phabricator.wikimedia.org/p/Aklapper/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_723"><span class="phui-tag-core phui-tag-color-person">@Aklapper</span></a>.</p>

<p>Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20181101&amp;to=Incident+documentation%2F20181131&amp;namespace=0" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:AllPages...</a><br />
[2] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/.PkyGL4Rz_4i/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query...</a><br />
[3] Tasks opened. – <a href="https://phabricator.wikimedia.org/maniphest/query/WsqbAxlHPLwk/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query...</a><br />
[4] Partial blocks. – <a href="https://meta.wikimedia.org/wiki/Community_health_initiative/Per-user_page,_namespace,_and_upload_blocking" class="remarkup-link remarkup-link-ext" rel="noreferrer">meta.wikimedia.org/wiki/Community_health_initiative</a><br />
[5] Bug report about page deletion, 2007. – <a href="https://phabricator.wikimedia.org/T13402" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_698"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T13402</span></span></a></p></div></content></entry><entry><title>Incident Documentation: An Unexpected Journey</title><link href="/phame/live/1/post/128/incident_documentation_an_unexpected_journey/" /><id>https://phabricator.wikimedia.org/phame/post/view/128/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2018-11-22T18:06:07+00:00</published><updated>2019-01-25T11:41:28+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><h2 class="remarkup-header">Introduction</h2>

<p>The Release Engineering team wants to continually improve the quality of our software over time. One of the ways in which we hoped to do that this year is by creating more useful Selenium smoke tests. (From now on, <em>test</em> will be used instead of <em>Selenium test</em>.) This blog post is about how we determined where the tests should focus and the relative priority.</p>

<p>At first, I thought this would be a trivial task. A few hours of work. A few days at most. A week or two if I&#039;ve completely underestimated it. A couple of months later, I know I have completely underestimated it.</p>

<p>Things I needed to do:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Define prioritization scheme.</li>
<li class="remarkup-list-item">Prioritize target repositories.</li>
</ul>

<h2 class="remarkup-header">Define Prioritization Scheme</h2>

<p>In general:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Does a repository have stewards? (Do the stewards want tests?)</li>
<li class="remarkup-list-item">Does a repository have existing tests?</li>
</ul>

<p>For the last year:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">How much change happened in a repository? Simply put: more change can lead to more risk.</li>
<li class="remarkup-list-item">How many incidents is a repository connected to? We wanted to make sure we didn&#039;t miss any obvious problematic areas.</li>
</ul>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/bwbl67gmjjntb23zqzue/PHID-FILE-7ravy7gkncmb5tavfje7/Coverage_Change_Incidents_Stewards.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_727"><img src="https://phab.wmfusercontent.org/file/data/bwbl67gmjjntb23zqzue/PHID-FILE-7ravy7gkncmb5tavfje7/Coverage_Change_Incidents_Stewards.png" height="559" width="945" loading="lazy" alt="Coverage Change Incidents Stewards.png (559×945 px, 25 KB)" /></a></div></p>

<h2 class="remarkup-header">Does a Repository Have Stewards?</h2>

<p>This was a relatively simple task. The best source of information is the <a href="https://www.mediawiki.org/wiki/Developers/Maintainers" class="remarkup-link remarkup-link-ext" rel="noreferrer">Developers/Maintainers</a> page.</p>

<h2 class="remarkup-header">Does a Repository Have Existing Tests?</h2>

<p>This was also easy. The <a href="https://www.mediawiki.org/wiki/Selenium/Node.js" class="remarkup-link remarkup-link-ext" rel="noreferrer">Selenium/Node.js</a> page has a list of repositories that have tests in Node.js. I already had all repositories with Node.js and Ruby tests on my machine, so a quick search for <tt class="remarkup-monospaced">webdriverio</tt> (Node.js) and <tt class="remarkup-monospaced">mediawiki_selenium</tt> (Ruby) found all the tests. In order to be really sure I&#039;ve found all repositories with tests, I&#039;ve <a href="https://github.com/zeljkofilipin/gerrit" class="remarkup-link remarkup-link-ext" rel="noreferrer">cloned all repositories from Gerrit</a>.</p>

<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ ack --json webdriverio</span>
<span class="go">extensions/Echo/package.json</span>
<span class="go">27:        &quot;webdriverio&quot;: &quot;4.12.0&quot;</span>
<span class="go">...</span></pre></div>



<div class="remarkup-code-block" data-code-lang="console" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span class="gp">$ ack --type-add=lock:ext:lock --lock mediawiki_selenium</span>
<span class="go">skins/MinervaNeue/Gemfile.lock</span>
<span class="go">42:    mediawiki_selenium (1.7.3)</span>
<span class="go">...</span></pre></div>

<p>To make extra sure I have not missed any repositories, I&#039;ve used MediaWiki code search (<a href="https://codesearch.wmflabs.org/search/?q=mediawiki_selenium" class="remarkup-link remarkup-link-ext" rel="noreferrer">mediawiki_selenium</a>, <a href="https://codesearch.wmflabs.org/search/?q=webdriverio" class="remarkup-link remarkup-link-ext" rel="noreferrer">webdriverio</a>) and GitHub search (<a href="https://github.com/search?q=org%3Awikimedia+extension%3Alock+mediawiki_selenium" class="remarkup-link remarkup-link-ext" rel="noreferrer">org:wikimedia extension:lock mediawiki_selenium</a>, <a href="https://github.com/search?q=org%3Awikimedia+extension%3Ajson+webdriverio" class="remarkup-link remarkup-link-ext" rel="noreferrer">org:wikimedia extension:json webdriverio</a>).</p>

<p>This is the list.</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Repository</th><th>Language</th></tr>
<tr><td>mediawiki/core</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/AdvancedSearch</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/CentralAuth</td><td>Ruby</td></tr>
<tr><td>mediawiki/extensions/CentralNotice</td><td>Ruby</td></tr>
<tr><td>mediawiki/extensions/CirrusSearch</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/Cite</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/Echo</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/ElectronPdfService</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/GettingStarted</td><td>Ruby</td></tr>
<tr><td>mediawiki/extensions/Math</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/MobileFrontend</td><td>Ruby</td></tr>
<tr><td>mediawiki/extensions/MultimediaViewer</td><td>Ruby</td></tr>
<tr><td>mediawiki/extensions/Newsletter</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/ORES</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/Popups</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/QuickSurveys</td><td>Ruby</td></tr>
<tr><td>mediawiki/extensions/RelatedArticles</td><td>JavaScript</td></tr>
<tr><td>mediawiki/extensions/RevisionSlider</td><td>Ruby</td></tr>
<tr><td>mediawiki/extensions/TwoColConflict</td><td>JavaScript, Ruby</td></tr>
<tr><td>mediawiki/extensions/Wikibase</td><td>JavaScript, Ruby</td></tr>
<tr><td>mediawiki/extensions/WikibaseLexeme</td><td>JavaScript, Ruby</td></tr>
<tr><td>mediawiki/extensions/WikimediaEvents</td><td>PHP</td></tr>
<tr><td>mediawiki/skins/MinervaNeue</td><td>Ruby</td></tr>
<tr><td>phab-deployment</td><td>JavaScript</td></tr>
<tr><td>wikimedia/community-tech-tools</td><td>Ruby</td></tr>
<tr><td>wikimedia/portals/deploy</td><td>JavaScript</td></tr>
<tr></tr>
</table></div>



<h2 class="remarkup-header">How Much Change Happened in a Repository?</h2>

<p>After reviewing several tools, I&#039;ve found that we already use <a href="https://wikimedia.biterg.io" class="remarkup-link remarkup-link-ext" rel="noreferrer">Bitergia</a> for various metrics. There is even a nice list of the top 50 repositories by number of commits. The tool also supports limiting the report to a date range. Exactly what I needed.</p>

<p>Bitergia &gt; Last 90 days &gt; Absolute &gt; From <tt class="remarkup-monospaced">2017-11-01 00:00:00.000</tt> &gt; To <tt class="remarkup-monospaced">2018-10-31 23:59:59.999</tt> &gt; Go &gt; Git &gt; Overview &gt; Repositories (raw data: <a href="https://phabricator.wikimedia.org/P7776" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_724"><span class="phui-tag-core phui-tag-color-object">P7776</span></a>, <a href="https://wikimedia.biterg.io/app/kibana#/dashboard/Git?_g=(filters:!((&#039;$state&#039;:(store:globalState),meta:(alias:&#039;Empty%20Commits&#039;,disabled:!f,index:git,key:files,negate:!t,params:(query:&#039;0&#039;,type:phrase),type:phrase,value:&#039;0&#039;),query:(match:(files:(query:&#039;0&#039;,type:phrase)))),(&#039;$state&#039;:(store:globalState),meta:(alias:Bots,disabled:!f,index:git,key:author_bot,negate:!t,params:(query:!t,type:phrase),type:phrase,value:true),query:(match:(author_bot:(query:!t,type:phrase))))),refreshInterval:(display:Off,pause:!f,value:0),time:(from:&#039;2017-10-31T23:00:00.000Z&#039;,mode:absolute,to:&#039;2018-10-31T22:59:59.999Z&#039;))&amp;_a=(description:&#039;Git%20Overview%20panel%20by%20Bitergia&#039;,filters:!(),fullScreenMode:!f,options:(darkTheme:!f,useMargins:!t),panels:!((gridData:(h:4,i:&#039;1&#039;,w:3,x:5,y:0),id:git_commits_organizations,panelIndex:&#039;1&#039;,title:Organizations,type:visualization,version:&#039;6.1.0-3&#039;),(gridData:(h:3,i:&#039;2&#039;,w:5,x:0,y:12),id:git_commits_timezone,panelIndex:&#039;2&#039;,title:&#039;Commits%20by%20Time%20Zone&#039;,type:visualization,version:&#039;6.1.0-3&#039;),(gridData:(h:2,i:&#039;3&#039;,w:5,x:0,y:4),id:git_evolution_authors,panelIndex:&#039;3&#039;,title:Authors,type:visualization,version:&#039;6.1.0-3&#039;),(gridData:(h:2,i:&#039;4&#039;,w:5,x:0,y:2),id:git_evolution_commits,panelIndex:&#039;4&#039;,title:Commits,type:visualization,version:&#039;6.1.0-3&#039;),(gridData:(h:5,i:&#039;5&#039;,w:7,x:5,y:4),id:git
_evolution_organizations,panelIndex:&#039;5&#039;,title:Organizations,type:visualization,version:&#039;6.1.0-3&#039;),(gridData:(h:2,i:&#039;6&#039;,w:5,x:0,y:0),id:git_main_numbers,panelIndex:&#039;6&#039;,title:Git,type:visualization,version:&#039;6.1.0-3&#039;),(gridData:(h:6,i:&#039;7&#039;,w:7,x:5,y:9),id:git_organizations_table,panelIndex:&#039;7&#039;,title:Organizations,type:visualization,version:&#039;6.1.0-3&#039;),(gridData:(h:6,i:&#039;8&#039;,w:5,x:0,y:6),id:git_top_authors,panelIndex:&#039;8&#039;,title:Authors,type:visualization,version:&#039;6.1.0-3&#039;),(gridData:(h:4,i:&#039;9&#039;,w:4,x:8,y:0),id:git_top_projects,panelIndex
:&#039;9&#039;,title:Projects,type:visualization,version:&#039;6.1.0-3&#039;),(gridData:(h:6,i:&#039;10&#039;,w:12,x:0,y:15),id:git_top_repositories,panelIndex:&#039;10&#039;,title:Repositories,type:visualization,version:&#039;6.1.0-3&#039;)),query:(language:lucene,query:(query_string:(analyze_wildcard:!t,default_field:&#039;*&#039;,query:&#039;*&#039;))),timeRestore:!f,title:Git,uiState:(P-1:(title:Organizations),P-10:(spy:(mode:(fill:!f,name:!n)),title:Repositories,vis:(params:(config:(searchKeyword:&#039;&#039;),sort:(columnIndex:!n,direction:!n)))),P-2:(title:&#039;Commits%20by%20Time%20Zone&#039;,vis:(legendOpen:!f)),P-3:(title:Authors,vis:(legendOpen:!f)),P-4:(title:Commits,vis:(legendOpen:!f)),P-5:(title:Organizations),P-6:(title:Git),P-7:(title:Organizations,vis:(params:(config:(searchKeyword:&#039;&#039;),sort:(columnIndex:!n,direction:!n)))),P-8:(title:Authors,vis:(params:(config:(searchKeyword:&#039;&#039;),sort:(columnIndex:!n,direction:!n)))),P-9:(title:Projects,vis:(params:(config:(searchKeyword:&#039;&#039;),sort:(columnIndex:!n,direction:!n))))),viewMode:view)" class="remarkup-link remarkup-link-ext" rel="noreferrer">direct link</a>).</p>

<p>This is the top 50 list (excludes empty commits and bots).</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Repository</th><th>Commits</th></tr>
<tr><td>mediawiki/extensions</td><td>11300</td></tr>
<tr><td>operations/puppet</td><td>7988</td></tr>
<tr><td>mediawiki/core</td><td>4590</td></tr>
<tr><td>operations/mediawiki-config</td><td>4005</td></tr>
<tr><td>integration/config</td><td>1652</td></tr>
<tr><td>operations/software/librenms</td><td>1169</td></tr>
<tr><td>pywikibot/core</td><td>927</td></tr>
<tr><td>mediawiki/extensions/Wikibase</td><td>806</td></tr>
<tr><td>apps/android/wikipedia</td><td>789</td></tr>
<tr><td>mediawiki/services/parsoid</td><td>700</td></tr>
<tr><td>mediawiki/extensions/VisualEditor</td><td>692</td></tr>
<tr><td>operations/dns</td><td>653</td></tr>
<tr><td>VisualEditor/VisualEditor</td><td>599</td></tr>
<tr><td>mediawiki/skins</td><td>570</td></tr>
<tr><td>mediawiki/extensions/MobileFrontend</td><td>504</td></tr>
<tr><td>mediawiki/extensions/ContentTranslation</td><td>491</td></tr>
<tr><td>translatewiki</td><td>486</td></tr>
<tr><td>oojs/ui</td><td>469</td></tr>
<tr><td>wikimedia/fundraising/crm</td><td>457</td></tr>
<tr><td>mediawiki/extensions/BlueSpiceFoundation</td><td>414</td></tr>
<tr><td>mediawiki/extensions/CirrusSearch</td><td>357</td></tr>
<tr><td>mediawiki/extensions/AbuseFilter</td><td>306</td></tr>
<tr><td>phabricator/phabricator</td><td>302</td></tr>
<tr><td>mediawiki/services/restbase</td><td>290</td></tr>
<tr><td>mediawiki/extensions/Flow</td><td>232</td></tr>
<tr><td>mediawiki/extensions/Echo</td><td>223</td></tr>
<tr><td>mediawiki/vagrant</td><td>221</td></tr>
<tr><td>mediawiki/extensions/Popups</td><td>184</td></tr>
<tr><td>mediawiki/extensions/Translate</td><td>182</td></tr>
<tr><td>mediawiki/extensions/DonationInterface</td><td>180</td></tr>
<tr><td>analytics/refinery</td><td>178</td></tr>
<tr><td>mediawiki/extensions/PageTriage</td><td>177</td></tr>
<tr><td>mediawiki/extensions/Cargo</td><td>176</td></tr>
<tr><td>mediawiki/tools/codesniffer</td><td>156</td></tr>
<tr><td>mediawiki/extensions/TimedMediaHandler</td><td>152</td></tr>
<tr><td>mediawiki/extensions/UniversalLanguageSelector</td><td>142</td></tr>
<tr><td>mediawiki/vendor</td><td>140</td></tr>
<tr><td>mediawiki/extensions/SocialProfile</td><td>139</td></tr>
<tr><td>analytics/refinery/source</td><td>138</td></tr>
<tr><td>operations/software</td><td>137</td></tr>
<tr><td>mediawiki/services/restbase/deploy</td><td>136</td></tr>
<tr><td>operations/debs/pybal</td><td>123</td></tr>
<tr><td>mediawiki/extensions/CentralAuth</td><td>116</td></tr>
<tr><td>mediawiki/tools/release</td><td>116</td></tr>
<tr><td>mediawiki/services/cxserver</td><td>112</td></tr>
<tr><td>mediawiki/extensions/BlueSpiceExtensions</td><td>110</td></tr>
<tr><td>mediawiki/extensions/WikimediaEvents</td><td>110</td></tr>
<tr><td>labs/private</td><td>108</td></tr>
<tr><td>operations/debs/python-kafka</td><td>104</td></tr>
<tr><td>labs/tools/heritage</td><td>96</td></tr>
<tr></tr>
</table></div>

<p>I&#039;ve got similar results with running <tt class="remarkup-monospaced">git rev-list</tt> for all repositories (<a href="https://github.com/zeljkofilipin/gerrit/blob/master/git-rev-list.sh" class="remarkup-link remarkup-link-ext" rel="noreferrer">script</a>, results: <a href="https://phabricator.wikimedia.org/P7834" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_725"><span class="phui-tag-core phui-tag-color-object">P7834</span></a>).</p>
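<p>For readers who want to reproduce a per-repository count, here is a minimal sketch of counting commits in the review period with <tt class="remarkup-monospaced">git rev-list</tt>. It is not the linked script. The function names are hypothetical, and counting from <tt class="remarkup-monospaced">HEAD</tt> only is an assumption. Unlike the Bitergia report, this count does not filter out bots or empty commits.</p>

```python
import subprocess


def rev_list_command(since, until, rev="HEAD"):
    """Build a `git rev-list` invocation that counts commits in a date range."""
    return [
        "git", "rev-list", "--count",
        f"--since={since}",
        f"--until={until}",
        rev,  # assumption: only history reachable from HEAD is counted
    ]


def count_commits(repo_path, since, until):
    """Run the command inside repo_path and return the commit count as an int."""
    result = subprocess.run(
        rev_list_command(since, until),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. repo_path is not a repository
    )
    return int(result.stdout.strip())
```

<p>Calling <tt class="remarkup-monospaced">count_commits(path, &quot;2017-11-01 00:00:00&quot;, &quot;2018-10-31 23:59:59&quot;)</tt> for each cloned repository would give numbers comparable to the raw results, assuming the same date range.</p>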

<h2 class="remarkup-header">How Many Incidents Is a Repository Connected To?</h2>

<p>This proved to be the most time-consuming task.</p>

<p>I started by reviewing existing <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">incident documentation</a>. Take a look at a few incidents. Can you tell which incident report is connected to which repository? I couldn&#039;t. (If you can, please let me know. I need your help.)</p>

<p>Incident reports are a wall of text. It was really hard for me to connect an incident report to a repository. An incident report has a title and text, for example: <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/20180724-Train" class="remarkup-link remarkup-link-ext" rel="noreferrer">20180724-Train</a>. The text has several sections, including Actionables. The text contains links to Gerrit patches and Phabricator tasks. (From now on, I&#039;ll use <em>patches</em> instead of <em>Gerrit patches</em> and <em>tasks</em> instead of <em>Phabricator tasks</em>.)</p>

<p>A patch belongs to a repository. Wikitext <tt class="remarkup-monospaced">[[gerrit:448103]]</tt> is patch <a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/448103" class="remarkup-link remarkup-link-ext" rel="noreferrer">mediawiki/extensions/Wikibase/+/448103</a>, so the repository is <tt class="remarkup-monospaced">mediawiki/extensions/Wikibase</tt>. That is the strongest link between an incident and a repository.</p>

<p>A task usually has patches associated with it. Wikitext <tt class="remarkup-monospaced">[[phab:T181315]]</tt> is task <a href="https://phabricator.wikimedia.org/T181315" class="remarkup-link" rel="noreferrer">T181315</a>. Gerrit search <a href="https://gerrit.wikimedia.org/r/q/bug:T181315" class="remarkup-link remarkup-link-ext" rel="noreferrer">bug:T181315</a> finds many connected patches, many of them in <tt class="remarkup-monospaced">operations/puppet</tt> and one in <tt class="remarkup-monospaced">mediawiki/vagrant</tt>. That is a useful, but not a strong, link between an incident and a repository. Some tasks have several related patches, so it provides a lot of data.</p>

<p>A task also usually has several tags. Most of them are not useful in this context, but tags that are components (and not, for example, milestones or tags) could be useful if the component can be linked to a repository. It is also not a strong link between an incident and a repository, and it usually does not provide a lot of data.</p>

<p>In the end, I wrote a tool with an imaginative name, <a href="https://github.com/zeljkofilipin/incident-documentation" class="remarkup-link remarkup-link-ext" rel="noreferrer">Incident Documentation</a>. The tool currently collects data from patches and tasks in the Actionables section of the incident report. It does not collect data from task components. That is tracked as issue <a href="https://github.com/zeljkofilipin/incident-documentation/issues/5" class="remarkup-link remarkup-link-ext" rel="noreferrer">#5</a>.</p>
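<p>To illustrate the kind of extraction involved, here is a minimal sketch that pulls patch and task references out of a chunk of wikitext. It is not the code of the Incident Documentation tool. The function name, the regular expressions and the sample text are assumptions made for the example, based only on the two link forms mentioned above.</p>

```python
import re

# Interwiki-style links as they appear in incident reports, e.g.
# [[gerrit:448103]] or [[phab:T181315]], optionally with a |label part.
GERRIT_LINK = re.compile(r"\[\[gerrit:(\d+)(?:\|[^\]]*)?\]\]")
PHAB_LINK = re.compile(r"\[\[phab:(T\d+)(?:\|[^\]]*)?\]\]")


def extract_links(wikitext):
    """Return (patch numbers, task ids) referenced in a chunk of wikitext."""
    # Deduplicate, then sort numerically so the result is stable.
    patches = sorted(set(GERRIT_LINK.findall(wikitext)), key=int)
    tasks = sorted(set(PHAB_LINK.findall(wikitext)), key=lambda t: int(t[1:]))
    return patches, tasks


# Made-up Actionables text; only the two link targets come from the post.
actionables = """
* Fix the regression: [[gerrit:448103]]
* Follow up in [[phab:T181315|the tracking task]] and [[phab:T181315]]
"""
print(extract_links(actionables))
```

<p>Each patch number could then be resolved to its repository, and each task to its patches, for example with the Gerrit search shown earlier.</p>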

<h2 class="remarkup-header">Incident Review 2017-11-01 to 2018-10-31</h2>

<p>After reviewing the Actionables section of each incident report, along with related patches and tasks, here are the results. Please note this table only connects incident reports and repositories. It does not show how many patches from a repository are connected to an incident report. That is tracked as issue <a href="https://github.com/zeljkofilipin/incident-documentation/issues/11" class="remarkup-link remarkup-link-ext" rel="noreferrer">#11</a>.</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Repository</th><th>Incidents</th></tr>
<tr><td>operations/puppet</td><td>22</td></tr>
<tr><td>mediawiki/core</td><td>6</td></tr>
<tr><td>operations/mediawiki-config</td><td>4</td></tr>
<tr><td>mediawiki/extensions/Wikibase</td><td>4</td></tr>
<tr><td>wikidata/query/rdf</td><td>2</td></tr>
<tr><td>operations/debs/pybal</td><td>2</td></tr>
<tr><td>mediawiki/extensions/ORES</td><td>2</td></tr>
<tr><td>integration/config</td><td>2</td></tr>
<tr><td>wikidata/query/blazegraph</td><td>1</td></tr>
<tr><td>operations/software</td><td>1</td></tr>
<tr><td>operations/dns</td><td>1</td></tr>
<tr><td>mediawiki/vagrant</td><td>1</td></tr>
<tr><td>mediawiki/tools/release</td><td>1</td></tr>
<tr><td>mediawiki/services/ores/deploy</td><td>1</td></tr>
<tr><td>mediawiki/services/eventstreams</td><td>1</td></tr>
<tr><td>mediawiki/extensions/WikibaseQualityConstraints</td><td>1</td></tr>
<tr><td>mediawiki/extensions/PropertySuggester</td><td>1</td></tr>
<tr><td>mediawiki/extensions/PageTriage</td><td>1</td></tr>
<tr><td>mediawiki/extensions/Cognate</td><td>1</td></tr>
<tr><td>mediawiki/extensions/Babel</td><td>1</td></tr>
<tr><td>maps/tilerator/deploy</td><td>1</td></tr>
<tr><td>maps/kartotherian/deploy</td><td>1</td></tr>
<tr><td>integration/jenkins</td><td>1</td></tr>
<tr><td>eventlogging</td><td>1</td></tr>
<tr><td>analytics/refinery/source</td><td>1</td></tr>
<tr><td>analytics/refinery</td><td>1</td></tr>
<tr><td>All-Projects</td><td>1</td></tr>
<tr></tr>
</table></div>
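<p>The counting behind a table like this can be sketched as follows. This is an illustration under stated assumptions, not the tool&#039;s implementation. The function name and the sample incidents are hypothetical. An incident counts once per repository, no matter how many of its patches touch that repository.</p>

```python
from collections import Counter


def incidents_per_repository(incident_repos):
    """Count, for each repository, how many incidents reference it.

    incident_repos maps an incident name to the repositories of every patch
    found for it; duplicates are collapsed so an incident counts once per repo.
    """
    counts = Counter()
    for repos in incident_repos.values():
        counts.update(set(repos))
    return counts


# Hypothetical input: incident names and patch lists are made up.
sample = {
    "incident-a": ["operations/puppet", "operations/puppet", "mediawiki/core"],
    "incident-b": ["operations/puppet"],
    "incident-c": ["mediawiki/extensions/Wikibase"],
}
for repo, count in incidents_per_repository(sample).most_common():
    print(repo, count)
```

<p>Counting patches rather than incidents would mean dropping the <tt class="remarkup-monospaced">set()</tt> call, which is the distinction noted before the table.</p>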



<h2 class="remarkup-header">Selecting Repositories</h2>

<p>This table is sorted by the amount of change. The only column that needs explanation is Selected. It shows whether a test makes sense for the repository, taking into account all available data. Repositories without maintainers or with existing tests are excluded.</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Repository</th><th>Change</th><th>Stewards</th><th>Coverage</th><th>Incidents</th><th>Selected</th></tr>
<tr><td>mediawiki/extensions</td><td>11300</td><td></td><td></td><td></td><td></td></tr>
<tr><td>operations/puppet</td><td>7988</td><td>SRE</td><td></td><td>22</td><td></td></tr>
<tr><td>mediawiki/core</td><td>4590</td><td>Core Platform</td><td>JavaScript</td><td>6</td><td></td></tr>
<tr><td>operations/mediawiki-config</td><td>4005</td><td>Release Engineering</td><td></td><td>4</td><td></td></tr>
<tr><td>integration/config</td><td>1652</td><td>Release Engineering</td><td></td><td>2</td><td></td></tr>
<tr><td>operations/software/librenms</td><td>1169</td><td>SRE</td><td></td><td></td><td></td></tr>
<tr><td>pywikibot/core</td><td>927</td><td></td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/Wikibase</td><td>806</td><td>WMDE</td><td>JavaScript, Ruby</td><td>4</td><td></td></tr>
<tr><td>apps/android/wikipedia</td><td>789</td><td></td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/services/parsoid</td><td>700</td><td>Parsing</td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/VisualEditor</td><td>692</td><td>Editing</td><td></td><td></td><td>✅</td></tr>
<tr><td>operations/dns</td><td>653</td><td>SRE</td><td></td><td>1</td><td></td></tr>
<tr><td>VisualEditor/VisualEditor</td><td>599</td><td>Editing</td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/skins</td><td>570</td><td>Reading</td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/MobileFrontend</td><td>504</td><td>Reading</td><td>Ruby</td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/ContentTranslation</td><td>491</td><td>Language engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>translatewiki</td><td>486</td><td></td><td></td><td></td><td></td></tr>
<tr><td>oojs/ui</td><td>469</td><td></td><td></td><td></td><td></td></tr>
<tr><td>wikimedia/fundraising/crm</td><td>457</td><td>Fundraising tech</td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/BlueSpiceFoundation</td><td>414</td><td></td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/CirrusSearch</td><td>357</td><td>Search Platform</td><td>JavaScript</td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/AbuseFilter</td><td>306</td><td>Contributors</td><td></td><td></td><td>✅</td></tr>
<tr><td>phabricator/phabricator</td><td>302</td><td>Release Engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/services/restbase</td><td>290</td><td>Core Platform</td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/Flow</td><td>232</td><td>Growth</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Echo</td><td>223</td><td>Growth</td><td>JavaScript</td><td></td><td></td></tr>
<tr><td>mediawiki/vagrant</td><td>221</td><td>Release Engineering</td><td></td><td>1</td><td></td></tr>
<tr><td>mediawiki/extensions/Popups</td><td>184</td><td>Reading</td><td>JavaScript</td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/Translate</td><td>182</td><td>Language engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/DonationInterface</td><td>180</td><td>Fundraising tech</td><td></td><td></td><td>✅</td></tr>
<tr><td>analytics/refinery</td><td>178</td><td>Analytics</td><td></td><td>1</td><td></td></tr>
<tr><td>mediawiki/extensions/PageTriage</td><td>177</td><td>Growth</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Cargo</td><td>176</td><td></td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/tools/codesniffer</td><td>156</td><td></td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/TimedMediaHandler</td><td>152</td><td>Reading</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/UniversalLanguageSelector</td><td>142</td><td>Language engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/vendor</td><td>140</td><td></td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/SocialProfile</td><td>139</td><td></td><td></td><td></td><td></td></tr>
<tr><td>analytics/refinery/source</td><td>138</td><td>Analytics</td><td></td><td>1</td><td></td></tr>
<tr><td>operations/software</td><td>137</td><td>SRE</td><td></td><td>1</td><td></td></tr>
<tr><td>mediawiki/services/restbase/deploy</td><td>136</td><td>Core Platform</td><td></td><td></td><td></td></tr>
<tr><td>operations/debs/pybal</td><td>123</td><td>SRE</td><td></td><td>2</td><td></td></tr>
<tr><td>mediawiki/extensions/CentralAuth</td><td>116</td><td></td><td>Ruby</td><td></td><td></td></tr>
<tr><td>mediawiki/tools/release</td><td>116</td><td></td><td></td><td>1</td><td></td></tr>
<tr><td>mediawiki/services/cxserver</td><td>112</td><td></td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/BlueSpiceExtensions</td><td>110</td><td></td><td></td><td></td><td></td></tr>
<tr><td>mediawiki/extensions/WikimediaEvents</td><td>110</td><td></td><td>PHP</td><td></td><td></td></tr>
<tr><td>labs/private</td><td>108</td><td></td><td></td><td></td><td></td></tr>
<tr><td>operations/debs/python-kafka</td><td>104</td><td>SRE</td><td></td><td></td><td></td></tr>
<tr><td>labs/tools/heritage</td><td>96</td><td></td><td></td><td></td><td></td></tr>
<tr></tr>
</table></div>
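<p>The exclusion rule for this table can be sketched as a simple filter. This is a hedged illustration: the <tt class="remarkup-monospaced">Repo</tt> type and the <tt class="remarkup-monospaced">candidates</tt> function are hypothetical names, and blank Incidents cells are represented as 0 by assumption. The filter only narrows the list. The Selected column also reflects a judgment about whether a test makes sense for the repository.</p>

```python
from typing import NamedTuple


class Repo(NamedTuple):
    name: str
    change: int
    stewards: str  # empty string when no steward is listed
    coverage: str  # empty string when no existing tests were found
    incidents: int  # assumption: a blank cell is stored as 0


def candidates(repos):
    """Keep repositories that have stewards and no existing tests."""
    return [r for r in repos if r.stewards and not r.coverage]


# A few rows copied from the table.
rows = [
    Repo("mediawiki/core", 4590, "Core Platform", "JavaScript", 6),
    Repo("pywikibot/core", 927, "", "", 0),
    Repo("mediawiki/extensions/VisualEditor", 692, "Editing", "", 0),
    Repo("mediawiki/extensions/AbuseFilter", 306, "Contributors", "", 0),
    Repo("operations/puppet", 7988, "SRE", "", 22),
]
for repo in candidates(rows):
    print(repo.name)
```

<p>Note that <tt class="remarkup-monospaced">operations/puppet</tt> passes this filter even though it is not selected in the table, which shows that the final choice was not purely mechanical.</p>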

<p>Since some of the repositories connected to incidents are not in the top 50 Bitergia report, I&#039;ve used <tt class="remarkup-monospaced">git rev-list</tt> to sort them. Numbers are different because Bitergia excludes empty commits and bots (<a href="https://github.com/zeljkofilipin/gerrit/blob/master/git-rev-list.sh" class="remarkup-link remarkup-link-ext" rel="noreferrer">script</a>, results: <a href="https://phabricator.wikimedia.org/P7834" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_726"><span class="phui-tag-core phui-tag-color-object">P7834</span></a>).</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Repository</th><th>Change</th><th>Stewards</th><th>Coverage</th><th>Incidents</th><th>Selected</th></tr>
<tr><td>mediawiki/extensions/WikibaseQualityConstraints</td><td>910</td><td>WMDE</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/ORES</td><td>364</td><td>Growth</td><td>JavaScript</td><td>2</td><td></td></tr>
<tr><td>wikidata/query/rdf</td><td>204</td><td>WMDE</td><td></td><td>2</td><td></td></tr>
<tr><td>mediawiki/extensions/Babel</td><td>146</td><td>Editing</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/services/ores/deploy</td><td>84</td><td>Growth</td><td></td><td>1</td><td></td></tr>
<tr><td>maps/kartotherian/deploy</td><td>80</td><td></td><td></td><td>1</td><td></td></tr>
<tr><td>mediawiki/extensions/PropertySuggester</td><td>67</td><td>WMDE</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>maps/tilerator/deploy</td><td>61</td><td></td><td></td><td>1</td><td></td></tr>
<tr><td>mediawiki/extensions/Cognate</td><td>47</td><td>WMDE</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>All-Projects</td><td>37</td><td></td><td></td><td>1</td><td></td></tr>
<tr><td>eventlogging</td><td>26</td><td></td><td></td><td>1</td><td></td></tr>
<tr><td>integration/jenkins</td><td>19</td><td>Release Engineering</td><td></td><td>1</td><td></td></tr>
<tr><td>mediawiki/services/eventstreams</td><td>16</td><td></td><td></td><td>1</td><td></td></tr>
<tr><td>wikidata/query/blazegraph</td><td>10</td><td>WMDE</td><td></td><td>1</td><td></td></tr>
<tr></tr>
</table></div>



<h2 class="remarkup-header">Prioritize Repositories</h2>

<p>The Change column uses Bitergia numbers. Numbers in italic are from <tt class="remarkup-monospaced">git rev-list</tt>.</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Repository</th><th>Change</th><th>Stewards</th><th>Coverage</th><th>Incidents</th><th>Selected</th></tr>
<tr><td>mediawiki/extensions/VisualEditor</td><td>692</td><td>Editing</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/ContentTranslation</td><td>491</td><td>Language engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/AbuseFilter</td><td>306</td><td>Contributors</td><td></td><td></td><td>✅</td></tr>
<tr><td>phabricator/phabricator</td><td>302</td><td>Release Engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Flow</td><td>232</td><td>Growth</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Translate</td><td>182</td><td>Language engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/DonationInterface</td><td>180</td><td>Fundraising tech</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/PageTriage</td><td>177</td><td>Growth</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/TimedMediaHandler</td><td>152</td><td>Reading</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/UniversalLanguageSelector</td><td>142</td><td>Language engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/WikibaseQualityConstraints</td><td><em>910</em></td><td>WMDE</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Babel</td><td><em>146</em></td><td>Editing</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/PropertySuggester</td><td><em>67</em></td><td>WMDE</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Cognate</td><td><em>47</em></td><td>WMDE</td><td></td><td>1</td><td>✅</td></tr>
<tr></tr>
</table></div>

<p>The same table grouped by stewards.</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Repository</th><th>Change</th><th>Stewards</th><th>Coverage</th><th>Incidents</th><th>Selected</th></tr>
<tr><td>mediawiki/extensions/VisualEditor</td><td>692</td><td>Editing</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Babel</td><td><em>146</em></td><td>Editing</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/ContentTranslation</td><td>491</td><td>Language engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Translate</td><td>182</td><td>Language engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/UniversalLanguageSelector</td><td>142</td><td>Language engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/AbuseFilter</td><td>306</td><td>Contributors</td><td></td><td></td><td>✅</td></tr>
<tr><td>phabricator/phabricator</td><td>302</td><td>Release Engineering</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Flow</td><td>232</td><td>Growth</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/PageTriage</td><td>177</td><td>Growth</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/DonationInterface</td><td>180</td><td>Fundraising tech</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/TimedMediaHandler</td><td>152</td><td>Reading</td><td></td><td></td><td>✅</td></tr>
<tr><td>mediawiki/extensions/WikibaseQualityConstraints</td><td><em>910</em></td><td>WMDE</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/PropertySuggester</td><td><em>67</em></td><td>WMDE</td><td></td><td>1</td><td>✅</td></tr>
<tr><td>mediawiki/extensions/Cognate</td><td><em>47</em></td><td>WMDE</td><td></td><td>1</td><td>✅</td></tr>
</table></div>



<h2 class="remarkup-header">Conclusions</h2>

<ul class="remarkup-list">
<li class="remarkup-list-item">There are some repositories that do not fit the Selenium/end-to-end testing model (eg: <tt class="remarkup-monospaced">operations/puppet</tt> or <tt class="remarkup-monospaced">operations/mediawiki-config</tt>) but could benefit from other testing mechanisms or deployment practices.</li>
<li class="remarkup-list-item">A test could prevent an outage if it runs:<ul class="remarkup-list">
<li class="remarkup-list-item">Every time a patch is uploaded to Gerrit. That way it could find a problem during development. That is already done for repositories that have tests.</li>
<li class="remarkup-list-item">After deployment. That way it could find a problem that was not found during development. In the ideal case, a deployment would first go to a test server in production and a test would run against that test server. If the test fails, further deployment would be cancelled. This is not yet done.</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://watirmelon.blog/2018/10/19/testbash-sydney-automated-e2e-testing-at-wordpress-com/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Automattic runs tests targeting WordPress.com production</a>:</li>
</ul>

<blockquote><p>We decided to implement some basic e2e test scenarios which would only run in production – both after someone deploys a change and a few times a day to cover situations where someone makes some changes to a server or something.</p></blockquote>

<p>Next steps:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">I will contact the owners of the selected repositories (see the Prioritize Repositories section) and offer help in creating the first test.</li>
<li class="remarkup-list-item">I will add results from the Incident Documentation tool to incident reports as a new Related Repositories section. The section will link to the tool and explain how it got the data. It will also ask for edits if the data is not correct.</li>
<li class="remarkup-list-item">I will reach out to people who created (or edited) incident reports and ask them to populate the Related Repositories section. This might have mixed results. For best results, the section will already be populated with the data from the Incident Documentation tool.</li>
<li class="remarkup-list-item">I will add a Related Repositories section to the <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/Report_Template" class="remarkup-link remarkup-link-ext" rel="noreferrer">incident report template</a>.</li>
</ul>

<p>Incident Documentation tool improvements:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">There are several ways to link from a wiki page to a patch or task. For now, the tool supports only <tt class="remarkup-monospaced">[[gerrit:]]</tt> and <tt class="remarkup-monospaced">[[phab:]]</tt>. Tracked as issue <a href="https://github.com/zeljkofilipin/incident-documentation/issues/6" class="remarkup-link remarkup-link-ext" rel="noreferrer">#6</a>.</li>
<li class="remarkup-list-item">Gerrit patches and Phabricator tasks from the Actionables section do not provide enough data. The entire incident report should be used. I limited it at first because I was collecting data manually (and Actionables looked like the most important part of the incident report), and later because of #6. Tracked as issue <a href="https://github.com/zeljkofilipin/incident-documentation/issues/4" class="remarkup-link remarkup-link-ext" rel="noreferrer">#4</a>.</li>
<li class="remarkup-list-item">Find Gerrit repository from task component. Tracked as issue <a href="https://github.com/zeljkofilipin/incident-documentation/issues/5" class="remarkup-link remarkup-link-ext" rel="noreferrer">#5</a>.</li>
<li class="remarkup-list-item">A table with the number of patches from each repository would be helpful. Tracked as issue <a href="https://github.com/zeljkofilipin/incident-documentation/issues/11" class="remarkup-link remarkup-link-ext" rel="noreferrer">#11</a>.</li>
<li class="remarkup-list-item">A report with folder/file names from a repository that are mentioned the most. Especially useful for big repositories like <tt class="remarkup-monospaced">operations/puppet</tt> and <tt class="remarkup-monospaced">mediawiki/core</tt>. Tracked as issue <a href="https://github.com/zeljkofilipin/incident-documentation/issues/12" class="remarkup-link remarkup-link-ext" rel="noreferrer">#12</a>.</li>
</ul></div></content></entry><entry><title>Bring in &#039;da noise, bring in defunct. It&#039;s a zombie party!</title><link href="/phame/live/1/post/127/bring_in_da_noise_bring_in_defunct._it_s_a_zombie_party/" /><id>https://phabricator.wikimedia.org/phame/post/view/127/</id><author><name>dduvall (Dan Duvall)</name></author><published>2018-11-16T19:22:51+00:00</published><updated>2023-02-07T22:01:01+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Halloween is a full two weeks behind us here in the United States, but it&#039;s still on my mind. It happens to be my favorite holiday, and I receive it both gleefully and somberly.</p>

<p>Some of the more obvious and delightful ways I appreciate Halloween include: busting out my giant spider to hang in the front yard; getting messy with gory and gaudy decorations; scaring neighborhood children; stuffing candy in my face. What&#039;s not to like about all that, really?</p>

<p>But there are more deeply felt reasons to appreciate Halloween, reasons that aren&#039;t often fully internalized or even discussed. Rooted in its pagan Celtic traditions and echoed by similar traditions worldwide, like Día de los Muertos of Mexico and Obon of Japan, Halloween asks us, for a night, to put away our timidness about living and dying. It asks us to turn toward the growing darkness of winter, turn toward the ones we&#039;ve lost, turn toward the decay of our own bodies, and honor these very real experiences as equal partners to the light, birth, and growth embodied by our everyday expectations. <em>More precisely it asks us to turn toward these often difficult aspects of life not with hesitation or fear but with strength, jubilation, a sense of humor.</em> It is this brave posture of Halloween&#039;s traditions that I appreciate so very much.</p>

<p>So Halloween is over and I&#039;m looking back. What does that have to do with anything here at WMF and in Phabricator no less? Well, I want to take you into another dark and ominous cauldron of our experience that most would rather just forget about.</p>

<p>I want to show you some Continuous Integration build metrics for the month of October!</p>

<p><em>Will we see darkness?</em> Oh yes. <em>Will we see decay?</em> Surely. <em>Was that an awkward transition to the real subject of this post?</em> Yep! Sorry, but I just had to have a thematic introduction, and brace yourself with a sigh because the theme will continue.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/ui2iwvm4ncaxkgefgmwk/PHID-FILE-mon2c6srjr45yv3b27d7/container-zombie.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_728"><img src="https://phab.wmfusercontent.org/file/data/ui2iwvm4ncaxkgefgmwk/PHID-FILE-mon2c6srjr45yv3b27d7/container-zombie.png" height="1411" width="1411" loading="lazy" alt="DOCKER WHALE – BRIIIIIINE!" /></a></div></p>

<p>You see, this past October, Release Engineering battled a <strong>HORDE OF ZOMBIE CONTAINERS!</strong> And we&#039;ll be seeing in our metrics proof that this horde was, for longer than anyone wishes zombies to ever hang around, chowing down on the brains of our CI.</p>

<p>Before I get to the zombies, let&#039;s look briefly at a big picture view of last month&#039;s build durations... Let&#039;s also get just a bit more serious.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/grdkknfb4im7wcvpctdj/PHID-FILE-emdjilf7mk5kacvzfg5t/Daily_75th%2C_95th_and_98th_percentiles_for_successful_build_durations_%E2%80%93_October_2018_%281%29.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_729"><img src="https://phab.wmfusercontent.org/file/data/grdkknfb4im7wcvpctdj/PHID-FILE-emdjilf7mk5kacvzfg5t/Daily_75th%2C_95th_and_98th_percentiles_for_successful_build_durations_%E2%80%93_October_2018_%281%29.png" height="626" width="988" loading="lazy" alt="Daily 75th, 95th, and 98th percentiles for successful build durations – October 2018" /></a></div></p>

<p>What are we looking at? We&#039;re looking at statistics for build <em>durations</em>. The above chart plots the daily 75th, 95th, and 98th percentiles of <em>successful</em> build durations during the month of October as well as the number of job configuration changes made within the same range of time.</p>

<p>These data points were chosen for a few reasons.</p>

<p>First, percentiles are used over daily means to better represent what the vast majority of users experience when they&#039;re waiting on CI[1]. Doing so excludes outliers, build durations that occur only about 2 percent of the time, not because they&#039;re unimportant to us, but because setting them aside temporarily allows us to find <em>patterns of most common use</em> and issues that might otherwise be obscured by the extra noise of extraordinarily long builds.</p>

<p>Next, three percentiles were chosen so that we might look for patterns among both faster builds and the longer running ones. Practically this means we can measure the effects of our changes on the chosen percentiles independently, and if we make changes to improve the build durations of jobs that typically perform closer to one percentile, we can measure the effect discretely while also making sure performance at other percentiles has not regressed.</p>

<p>Finally, job configuration changes are plotted alongside daily duration percentiles to help find indications of whether our changes to <tt class="remarkup-monospaced">integration/config</tt> during October had an impact on overall build performance. Of course, measuring the exact impact of these changes is quite a bit more difficult and requires the build data used to populate this chart to be classified and analyzed much further—as we&#039;ll see later—but having the extra information there is an important first step.</p>
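
<p>For the curious, the daily percentile calculation described above can be sketched roughly as follows. This is a minimal illustration and not the spreadsheet logic behind the chart: it assumes a simple nearest-rank percentile, and the function names and sample durations are hypothetical.</p>

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def daily_percentiles(builds, ranks=(75, 95, 98)):
    """Group (day, duration_minutes) pairs by day and compute each percentile per day."""
    by_day = defaultdict(list)
    for day, minutes in builds:
        by_day[day].append(minutes)
    return {
        day: {p: percentile(durations, p) for p in ranks}
        for day, durations in sorted(by_day.items())
    }


# Hypothetical successful build durations in minutes, not real CI data.
sample = [("2018-10-01", m) for m in range(1, 101)]
sample += [("2018-10-02", m / 2) for m in range(1, 101)]

for day, stats in daily_percentiles(sample).items():
    print(day, stats)
```

<p>Keeping the three ranks as a parameter makes it easy to check that an improvement aimed at one percentile has not caused a regression at another.</p>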

<p>So what can we see in this chart? Well, let&#039;s start with that very conspicuous dip smack dab in the middle.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/watjovjqcbr63qnimoii/PHID-FILE-wtpiczskjvmeevr4clp6/Daily_75th%2C_95th_and_98th_percentiles_for_successful_build_durations_%E2%80%93_October_2018_%E2%80%93_dip.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_730"><img src="https://phab.wmfusercontent.org/file/data/watjovjqcbr63qnimoii/PHID-FILE-wtpiczskjvmeevr4clp6/Daily_75th%2C_95th_and_98th_percentiles_for_successful_build_durations_%E2%80%93_October_2018_%E2%80%93_dip.png" height="626" width="988" loading="lazy" alt="Daily 75th, 95th, and 98th percentiles for successful build durations – dip around 10/14" /></a></div></p>

<p>And for background, another short thematic interlude:</p>

<p>Back in June, <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_740"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a> of Release Engineering was waiting on a particularly long build to complete (it was a &quot;dark and stormy night&quot; or something, *sighs and rolls eyes*), and during his investigation on the labs instance that was running the build, he noticed a curious thing: there was a Docker container just chugging away running a build that had started more than 6 hours prior, a build that was thought to have been canceled and reaped by Jenkins, a build that should have been long dead but was sitting there very much undead and seemingly loving its long and private binge before the terminal specter of a meat-space man had so rudely interrupted.</p>

<p>&quot;It&#039;s a zombie container,&quot; <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_741"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a> (probably) muttered as he felt his way backward on outstretched fingertips (ctrl-ccccc), logged out, and filed task <a href="https://phabricator.wikimedia.org/T198517" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_733"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198517</span></span></a> to which <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_742"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a> soon replied and offered a rational but disturbing explanation.</p>

<p>I&#039;m not going to explain the <em>why</em> in its entirety but you can read more about it in the comments of an associated task, <a href="https://phabricator.wikimedia.org/T176747" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_734"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T176747</span></span></a>, and the links posted therein. I will, however, briefly explain what I mean by &quot;zombie container.&quot;</p>

<p>A zombie container for the sake of this post is not strictly a zombie process in the POSIX sense, but means that a build&#039;s main process is still running, even after Jenkins has told it to stop. It is both taking up some amount of valuable host resources (CPU, memory, or disk space), and is invisible to anyone looking only at the monitoring interfaces of Gerrit, Zuul, or Jenkins.</p>

<p>We didn&#039;t see much evidence of these zombie containers having enough impact on the overall system to demand dropping other priorities—and to be perfectly honest, I half assumed that Tyler&#039;s account had simply been due to madness after ingesting a bad batch of homebrew honey mead—but the data shows that they continued to lurk and that they may have even proliferated under the generally increasing load on CI. By early October, these zombie containers were wreaking absolute havoc—compounded by the way our CI system deals with chains of dependent builds and superseding patchsets—and it was clear that hunting them down should be a priority.</p>

<p>Task <a href="https://phabricator.wikimedia.org/T198517" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_735"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198517</span></span></a> was claimed and conquered, and to the dismay of zombie containers across CI:</p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">In <a href="https://phabricator.wikimedia.org/T198517#4662026" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_736"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198517#4662026</span></span></a>, <a href="https://phabricator.wikimedia.org/p/dduvall/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_743"><span class="phui-tag-core phui-tag-color-person">@dduvall</span></a> wrote:</div>
<div class="remarkup-reply-body"><p>Two <tt class="remarkup-monospaced">integration/config</tt> patches were deployed to fix the issue. The first refactored all Docker based jobs to invoke <tt class="remarkup-monospaced">docker run</tt> via a common builder. The second adds to the common <tt class="remarkup-monospaced">docker-run</tt> builder the <tt class="remarkup-monospaced">--init</tt> option which ensures a PID 1 within the container that will properly reap child processes and forward signals, and <tt class="remarkup-monospaced">--label</tt> options which tag the running containers with the job name and build number; it also implements an additional safety measure, a <tt class="remarkup-monospaced">docker-reap-containers</tt> post-build script that kills any running containers that could be errantly running at the end of the build (using the added labels to filter for only the build&#039;s containers).</p>

<p>Between the deployed fix and periodically running a manual process to kill off long-running containers that were started prior to the fix being deployed, I think we may be out of the woods for now.</p></div>
</blockquote>
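
<p>To make the shape of that fix a little more concrete, here is a rough sketch of the two ideas from the quoted comment: start containers with <tt class="remarkup-monospaced">--init</tt> and per-build labels, then use those labels to find and kill stragglers. It is not the actual <tt class="remarkup-monospaced">integration/config</tt> builder or the <tt class="remarkup-monospaced">docker-reap-containers</tt> script. The label keys and function names are hypothetical, and it assumes that repeated label filters must all match.</p>

```python
import subprocess

# Hypothetical label keys; the post does not say which keys were actually used.
JOB_LABEL = "ci.job"
BUILD_LABEL = "ci.build"


def docker_run_args(image, job, build, command=()):
    """Build a `docker run` argv with an init process and per-build labels."""
    return [
        "docker", "run",
        "--init",  # an init runs as PID 1 to reap children and forward signals
        "--label", f"{JOB_LABEL}={job}",
        "--label", f"{BUILD_LABEL}={build}",
        image, *command,
    ]


def docker_list_args(job, build):
    """Build a `docker ps` argv that prints the IDs of one build's running containers."""
    # Assumption: a container must carry both labels to be listed.
    return [
        "docker", "ps", "--quiet",
        "--filter", f"label={JOB_LABEL}={job}",
        "--filter", f"label={BUILD_LABEL}={build}",
    ]


def reap_containers(job, build, run=subprocess.run):
    """Kill containers still running for the given build and return their IDs."""
    listing = run(docker_list_args(job, build), capture_output=True, text=True, check=True)
    ids = listing.stdout.split()
    if ids:
        run(["docker", "kill", *ids], check=True)
    return ids


print(" ".join(docker_run_args("example-image", "example-job", 42, ["true"])))
```

<p>Passing <tt class="remarkup-monospaced">run</tt> as a parameter keeps the reaping logic testable on a machine without Docker, since a stand-in function can play the part of <tt class="remarkup-monospaced">subprocess.run</tt>.</p>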

<p>Looking again at that dip in the percentiles chart, a few things are clear.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/h7zypyjdmsnpragi4fzq/PHID-FILE-ezaamtxkrjwz3yvx26yc/Daily_75th%2C_95th_and_98th_percentiles_for_successful_build_durations_%E2%80%93_October_2018_%E2%80%93_dip_zombie.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_731"><img src="https://phab.wmfusercontent.org/file/data/h7zypyjdmsnpragi4fzq/PHID-FILE-ezaamtxkrjwz3yvx26yc/Daily_75th%2C_95th_and_98th_percentiles_for_successful_build_durations_%E2%80%93_October_2018_%E2%80%93_dip_zombie.png" height="626" width="988" loading="lazy" alt="Daily 75th, 95th, and 98th percentiles for successful build durations – dip around 10/14" /></a></div></p>

<p>First, there&#039;s a noticeable drop among <em>all three</em> daily duration percentiles. Second, there also seems to be a decrease in both the variance of each day&#039;s percentile average, expressed by the plotted error bars (remember that our percentile precision demands we average multiple values for each percentile/day), and the day-to-day differences in plotted percentiles after the dip. And lastly, the dip strongly coincides with the job configuration changes that were made to resolve <a href="https://phabricator.wikimedia.org/T198517" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_737"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198517</span></span></a>.</p>

<p><strong>WE. DID. IT.</strong> WE&#039;VE FREED CI FROM THOSE DREADED ZOMBIE CONTAINERS! THEY ARE TRULY (UN)^2-DEAD AGAIN SO LET&#039;S DITCH THESE BORING CHARTS AND CELEBRA...</p>

<p><em>Say what?</em> Oh. Right. I guess we didn&#039;t adequately measure exactly how much of an improvement in duration there was pre-and-post <a href="https://phabricator.wikimedia.org/T198517" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_738"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198517</span></span></a> and whether or not there was unnoticed/unanticipated regression. Let&#039;s pause on that celebration and look a little deeper.</p>

<p>So how does one get a bigger picture of overall CI build durations before and after a change? Or of categories within <em>any</em> real and highly heterogeneous performance data for that matter? I did not have a good answer to this question, so I went searching and I found <a href="https://blog.apnic.net/2017/11/24/dns-performance-metrics-logarithmic-percentile-histogram/" class="remarkup-link remarkup-link-ext" rel="noreferrer">a lovely blog post on analyzing DNS performance across various geo-distributed servers</a>[2]. It&#039;s a great read really, and talks about a specific statistical tool that seemed like it might be useful in our case: The logarithmic percentile histogram.</p>

<p><em>&quot;I like the way you talk...&quot;</em> Yes, it&#039;s a fancy name, but it&#039;s pretty simple when broken down... backwards, because, well, English.</p>

<p>A <em>histogram</em> shows the distribution of one quantitative variable in a dataset, in our case build duration, across various &#039;buckets&#039;. A <em>percentile histogram</em> buckets values for the variable of the histogram by its percentiles, and a <em>logarithmic percentile histogram</em> plots the distribution of values across percentile buckets on a logarithmic scale.</p>

<p>I think it&#039;s a bit easier to show than to describe, so here&#039;s our plot of build duration percentiles before and after <a href="https://phabricator.wikimedia.org/T198517" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_739"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198517</span></span></a> was resolved, represented as a histogram on a logarithmic scale.</p>

<p><div class="phabricator-remarkup-embed-layout-center"><a href="https://phab.wmfusercontent.org/file/data/ekkbzc35klxd3vmnyae7/PHID-FILE-zisw3td26i4dqyq2rviu/High-to-low_percentiles_before_and_after_the_zombie_container_issue_was_resolved_%281%29.png" class="phabricator-remarkup-embed-image-full" data-sigil="lightboxable" data-meta="0_732"><img src="https://phab.wmfusercontent.org/file/data/ekkbzc35klxd3vmnyae7/PHID-FILE-zisw3td26i4dqyq2rviu/High-to-low_percentiles_before_and_after_the_zombie_container_issue_was_resolved_%281%29.png" height="670" width="880" loading="lazy" alt="High-to-low percentiles before and after the zombie container issue was resolved" /></a></div></p>

<p>First, note that while we ranked build durations low to high in our other chart, this one presents a high-to-low ranking, meaning that longer durations (slower builds) are ranked within lower percentiles and shorter durations (faster builds) are ranked in higher percentiles. This better fits the logarithmic scale, and more importantly it brings the lowest percentiles (the slowest durations) into focus, <em>letting us see where the biggest gains were made by resolving the zombie container issue</em>.</p>

<p>Also valuable about this representation is the fact that it shows <em>all percentiles</em>, not just the three that we saw earlier in the chart of daily calculations, which shows us that gains were made consistently across the board and there are no notable regressions among the percentile ranks where it would matter. There <em>is</em> a small section of the plot that shows percentiles of post-T198517 durations being slightly higher (slower), but this is among some of the percentiles for the very fastest of builds where the absolute values of differences are very small and perhaps not even statistically significant.</p>
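
<p>If you would like to reproduce the idea without a giant spreadsheet, the numbers behind such a plot can be sketched roughly as below. This is only an illustration under stated assumptions: it uses a nearest-rank lookup on durations sorted slowest first, the function names and sample data are hypothetical, and it stops short of drawing anything.</p>

```python
import math


def high_to_low_percentile(durations, p):
    """Duration at percentile p when builds are ranked slowest first (nearest rank)."""
    ordered = sorted(durations, reverse=True)
    # round() guards against floating-point noise before taking the ceiling.
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]


def log_percentile_rows(durations, ranks=(0.2, 1, 2, 10, 25, 50, 100)):
    """Pair each percentile with its log10 axis position and the duration found there."""
    return [
        (p, round(math.log10(p), 2), high_to_low_percentile(durations, p))
        for p in ranks
    ]


# Hypothetical durations in minutes, not the October 2018 build data.
sample = list(range(1, 1001))

for p, x, minutes in log_percentile_rows(sample):
    print(f"p{p}: log10 position {x}, duration {minutes}")
```

<p>Because the axis position is <tt class="remarkup-monospaced">log10(p)</tt>, the span from p0.2 to p2 takes up as much room as the span from p10 to p100, which is what pulls the slowest builds into focus. Running the same function over the before and after datasets gives the two series to compare.</p>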

<p>Looking at the percentage gains annotated parenthetically in the plot, we can see major gains at the 0.2, 1, 2, 10, 25, and 50th percentiles. Here they are as a table.</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td><strong>percentile</strong></td><td><strong>duration w/ zombies (minutes)</strong></td><td><strong>duration w/o zombies (minutes)</strong></td><td><strong>gain from killing zombies</strong></td></tr>
<tr><td>p0.2</td><td>43.3</td><td>39.3</td><td>-9.2%</td></tr>
<tr><td>p1</td><td>34.0</td><td>26.5</td><td>-22.2%</td></tr>
<tr><td>p2</td><td>27.7</td><td>22.2</td><td>-19.7%</td></tr>
<tr><td>p10</td><td>17.6</td><td>12.7</td><td>-27.9%</td></tr>
<tr><td>p25</td><td>11.0</td><td>7.2</td><td>-34.4%</td></tr>
<tr><td>p50</td><td>5.3</td><td>3.4</td><td>-36.9%</td></tr>
</table></div>
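
<p>The gain column is a plain relative change in duration. As a small worked sketch, using the rounded durations from the table above (the helper name is hypothetical):</p>

```python
# Durations in minutes copied from the table: (with zombies, without zombies).
table = {
    "p0.2": (43.3, 39.3),
    "p1": (34.0, 26.5),
    "p2": (27.7, 22.2),
    "p10": (17.6, 12.7),
    "p25": (11.0, 7.2),
    "p50": (5.3, 3.4),
}


def relative_change(before, after):
    """Percentage change from before to after; negative means builds got shorter."""
    return (after - before) / before * 100


for name, (before, after) in table.items():
    print(f"{name}: {relative_change(before, after):+.1f}%")
```

<p>For p0.2 this reproduces the -9.2% in the table. The other rows land within about a percentage point of the published gains, presumably because the gains in the table were computed from unrounded durations.</p>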

<p>So there it is quite plain, a CI world with and without zombie containers, and builds finishing up to 37% sooner without those zombies chomping away at our brains! It&#039;s demonstrably a better world without them I&#039;d say, but you be the judge; we all have different tastes. 8D</p>

<p>Now celebrate or don&#039;t celebrate accordingly!</p>

<p>Oh and please have at <a href="https://docs.google.com/spreadsheets/d/1-HLTy8Z4OqatLnufFEszbqkS141MBXJNEPZQScDD1hQ/edit#gid=1462593305" class="remarkup-link remarkup-link-ext" rel="noreferrer">the data</a>[3] yourself if you&#039;re interested in it. Better yet, find all the ways I screwed up and let me know! It was all done in a giant Google Sheet—that might crash your browser—because, well, I don&#039;t know R! (Side note: someone please teach me how to use R.)</p>

<h3 class="remarkup-header">References</h3>

<p>[1] <a href="https://www.dynatrace.com/news/blog/why-averages-suck-and-percentiles-are-great/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.dynatrace.com/news/blog/why-averages-suck-and-percentiles-are-great/</a><br />
[2] <a href="https://blog.apnic.net/2017/11/24/dns-performance-metrics-logarithmic-percentile-histogram/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://blog.apnic.net/2017/11/24/dns-performance-metrics-logarithmic-percentile-histogram/</a><br />
[3] <a href="https://docs.google.com/spreadsheets/d/1-HLTy8Z4OqatLnufFEszbqkS141MBXJNEPZQScDD1hQ/edit#gid=1462593305" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://docs.google.com/spreadsheets/d/1-HLTy8Z4OqatLnufFEszbqkS141MBXJNEPZQScDD1hQ/edit#gid=1462593305</a></p>

<h3 class="remarkup-header">Credits</h3>

<p><em>Thanks to <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_744"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a> and <a href="https://phabricator.wikimedia.org/p/greg/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_745"><span class="phui-tag-core phui-tag-color-person">@greg</span></a> for their review of this post!</em></p>

<p><em>&quot;DOCKER ZOMBIE&quot; is a derivative of <a href="https://linux.pictures/projects/dark-docker-picture-in-playing-cards-style" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://linux.pictures/projects/dark-docker-picture-in-playing-cards-style</a> and shared under the same idgaf license as original <a href="https://linux.pictures/about" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://linux.pictures/about</a>. It was inspired by but not expressly derived from a different work by drewdomkus <a href="https://flickr.com/photos/drewdomkus/3146756158/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://flickr.com/photos/drewdomkus/3146756158/</a></em></p></div></content></entry><entry><title>Wikimedia Release Engineering&#039;s 1st Annual Developer Satisfaction Survey</title><link href="/phame/live/1/post/126/wikimedia_release_engineering_s_1st_annual_developer_satisfaction_survey/" /><id>https://phabricator.wikimedia.org/phame/post/view/126/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2018-11-07T16:02:28+00:00</published><updated>2018-12-15T20:02:47+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><div class="remarkup-note"><span class="remarkup-note-word">NOTE:</span> The survey is now closed</div>

<p>This survey will help the Release Engineering team measure developer satisfaction and determine where to invest resources. The topics covered will include the following:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Local Development Environment</li>
<li class="remarkup-list-item">Beta Cluster / Staging Environment</li>
<li class="remarkup-list-item">Testing / CI</li>
<li class="remarkup-list-item">Code Review</li>
<li class="remarkup-list-item">Deployments</li>
<li class="remarkup-list-item">Production Systems</li>
<li class="remarkup-list-item">Development and Productivity Tools</li>
<li class="remarkup-list-item">Developer Documentation</li>
<li class="remarkup-list-item">General Feedback</li>
</ul>

<p>We are soliciting feedback from all Wikimedia developers, including Staff, 3rd party contributors and volunteer developers. The survey will be open for 2 weeks, closing on November 14th.</p>

<p>This survey will be conducted via a third-party service, which may subject it to additional terms. For more information on privacy and data-handling, see the <a href="https://foundation.wikimedia.org/wiki/Developer_Satisfaction_Survey_Privacy_Statement" class="remarkup-link remarkup-link-ext" rel="noreferrer">survey privacy statement</a>.</p>

<p>To participate in this survey, please start here: <a href="https://docs.google.com/forms/d/e/1FAIpQLSfXGpjUIO3ARqxPHOYPwI2Dw-jEg1xMeLi_HpZ_HcU-_i_Arw/viewform" class="remarkup-link remarkup-link-ext" rel="noreferrer">Developer Satisfaction Survey</a>.</p>

<p><em>Mukunda Modell</em></p></div></content></entry><entry><title>Production Excellence #4: October 2018</title><link href="/phame/live/1/post/125/production_excellence_4_october_2018/" /><id>https://phabricator.wikimedia.org/phame/post/view/125/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2018-11-28T17:47:20+00:00</published><updated>2020-03-24T22:06:23+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our striving for operational excellence last month? Read on to find out!</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Month in numbers.</li>
<li class="remarkup-list-item">Highlighted stories.</li>
<li class="remarkup-list-item">Current problems.</li>
</ul>

<h4 class="remarkup-header">📊 Month in numbers</h4>

<ul class="remarkup-list">
<li class="remarkup-list-item">7 documented incidents from 24 September to 31 October. [1]</li>
<li class="remarkup-list-item">79 Wikimedia-prod-error tasks closed from 24 September to 31 October. [2]</li>
<li class="remarkup-list-item">69 Wikimedia-prod-error tasks created from 24 September to 31 October. [3]</li>
<li class="remarkup-list-item">175 currently open Wikimedia-prod-error tasks (as of 25 November 2018).</li>
</ul>

<p>October had a relatively high number of incidents – compared to prior months and compared to the same month last year (<a href="https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20180924&amp;to=Incident+documentation%2F20181101&amp;namespace=0" class="remarkup-link remarkup-link-ext" rel="noreferrer">details</a>).</p>

<p>Terminology:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">An <strong>Exception</strong> (or <strong>fatal</strong>) causes user actions to be prevented. For example, a page would display &quot;Exception: Unable to render page&quot; instead of the article content.</li>
<li class="remarkup-list-item">A <strong>Warning</strong> (or <strong>non-fatal</strong>, or <strong>error</strong>) can produce page views that are technically unaware of a problem, but may show corrupt, incorrect, or incomplete information.  Examples – an article would display the code word “null” instead of the actual content, a user looking for Vegetables may be taken to an article about Vegetarians, a user may receive a notification that says “You have (null) new messages.”</li>
</ul>

<p>I’ve highlighted a few of last month’s resolved tasks below.</p>

<h4 class="remarkup-header">📖 Send your thanks for talk contributions</h4>

<p>Fixed by volunteer <a href="https://phabricator.wikimedia.org/p/Mh-3110/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_750"><span class="phui-tag-core phui-tag-color-person">@Mh-3110</span></a> (Mahuton).</p>

<p>The Thanks functionality for MediaWiki (created in 2013) wasn’t working in some cases. This problem was first reported in April, with four more reports since then. Mahuton investigated together with <a href="https://phabricator.wikimedia.org/p/SBisson/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_751"><span class="phui-tag-core phui-tag-color-person">@SBisson</span></a>. They found that the issue was specific to talk pages with structured discussions.</p>

<p>It turned out to be caused by an outdated array access key in SpecialThanks.php. Once adjusted, the functionality was restored to its former glory. The error existed for about eight months, since internal refactoring in March for <a href="https://phabricator.wikimedia.org/T186920" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_746"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T186920</span></span></a> changed the internal array.</p>

<p>This was Mahuton’s first Gerrit contribution. Thank you <a href="https://phabricator.wikimedia.org/p/Mh-3110/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_752"><span class="phui-tag-core phui-tag-color-person">@Mh-3110</span></a>, and welcome!</p>

<p>–  <a href="https://phabricator.wikimedia.org/T191442" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_747"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T191442</span></span></a> / <a href="https://gerrit.wikimedia.org/r/461189" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit.wikimedia.org/r/461189</a></p>

<h4 class="remarkup-header">📖 One space led to Fatal exception</h4>

<p>Fixed by volunteer <span class="phabricator-remarkup-mention-unknown">@D3r1ck01</span> (Derick Alangi).</p>

<p>Administrators use the Special:DeletedContributions page to search for edits that are hidden from public view. When an admin typed a space at the end of their search, the MediaWiki application would throw a fatal exception. The user would see a generic error page, suggesting that the website may be unavailable.</p>

<p>Derick went in and updated the input handler to automatically correct these inputs for the user.</p>

<p>– <a href="https://phabricator.wikimedia.org/T187619" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_748"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T187619</span></span></a></p>

<h4 class="remarkup-header">📖 Fatal exception from translation draft access</h4>

<p>Accessing the private link for ContentTranslation while logged out isn’t meant to work, but the code didn’t account for this. When users attempted to open such a URL while not logged in, the ContentTranslation code performed an invalid operation. This caused a fatal error from the MediaWiki application. The user would see a system error page without further details.</p>

<p>This could happen when opening the link from your bookmarks before logging in, or after restarting the browser, or after clearing one’s cookies.</p>

<p>Fixed by <a href="https://phabricator.wikimedia.org/p/santhosh/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_753"><span class="phui-tag-core phui-tag-color-person">@santhosh</span></a> (Santhosh Thottingal, WMF Language Engineering team).</p>

<p>– <a href="https://phabricator.wikimedia.org/T205433" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_749"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T205433</span></span></a></p>

<h4 class="remarkup-header">🎉 Thanks!</h4>

<p>Thank you to everyone who helped by reporting or investigating problems in Wikimedia production; and for devising, coding or reviewing the corrective measures. Including: <a href="https://phabricator.wikimedia.org/p/Addshore/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_754"><span class="phui-tag-core phui-tag-color-person">@Addshore</span></a>, <a href="https://phabricator.wikimedia.org/p/Aklapper/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_755"><span class="phui-tag-core phui-tag-color-person">@Aklapper</span></a>, <a href="https://phabricator.wikimedia.org/p/Anomie/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_756"><span class="phui-tag-core phui-tag-color-person">@Anomie</span></a>, <a href="https://phabricator.wikimedia.org/p/ArielGlenn/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_757"><span class="phui-tag-core phui-tag-color-person">@ArielGlenn</span></a>, <a href="https://phabricator.wikimedia.org/p/Catrope/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_758"><span class="phui-tag-core phui-tag-color-person">@Catrope</span></a>, <span class="phabricator-remarkup-mention-unknown">@D3r1ck01</span>, <a href="https://phabricator.wikimedia.org/p/Daimona/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_759"><span class="phui-tag-core phui-tag-color-person">@Daimona</span></a>, <a href="https://phabricator.wikimedia.org/p/Fomafix/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_760"><span class="phui-tag-core phui-tag-color-person">@Fomafix</span></a>, <a href="https://phabricator.wikimedia.org/p/Ladsgroup/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_761"><span class="phui-tag-core phui-tag-color-person">@Ladsgroup</span></a>, <a href="https://phabricator.wikimedia.org/p/Legoktm/" 
class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_762"><span class="phui-tag-core phui-tag-color-person">@Legoktm</span></a>, <a href="https://phabricator.wikimedia.org/p/MSantos/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_763"><span class="phui-tag-core phui-tag-color-person">@MSantos</span></a>, <a href="https://phabricator.wikimedia.org/p/Mainframe98/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_764"><span class="phui-tag-core phui-tag-color-person">@Mainframe98</span></a>, <a href="https://phabricator.wikimedia.org/p/Melos/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_765"><span class="phui-tag-core phui-tag-color-person">@Melos</span></a>, <a href="https://phabricator.wikimedia.org/p/Mh-3110/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_766"><span class="phui-tag-core phui-tag-color-person">@Mh-3110</span></a>, <a href="https://phabricator.wikimedia.org/p/SBisson/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_767"><span class="phui-tag-core phui-tag-color-person">@SBisson</span></a>, <a href="https://phabricator.wikimedia.org/p/Tgr/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_768"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-orange"></span>@Tgr</span></a>, <a href="https://phabricator.wikimedia.org/p/Umherirrender/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_769"><span class="phui-tag-core phui-tag-color-person">@Umherirrender</span></a>, <a href="https://phabricator.wikimedia.org/p/Vort/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_770"><span class="phui-tag-core phui-tag-color-person">@Vort</span></a>, <a href="https://phabricator.wikimedia.org/p/aaron/" class="phui-tag-view phui-tag-type-person " 
data-sigil="hovercard" data-meta="0_771"><span class="phui-tag-core phui-tag-color-person">@aaron</span></a>, <a href="https://phabricator.wikimedia.org/p/aezell/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_772"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@aezell</span></a>, <a href="https://phabricator.wikimedia.org/p/cscott/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_773"><span class="phui-tag-core phui-tag-color-person">@cscott</span></a>, <a href="https://phabricator.wikimedia.org/p/dcausse/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_774"><span class="phui-tag-core phui-tag-color-person">@dcausse</span></a>, <a href="https://phabricator.wikimedia.org/p/jcrespo/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_775"><span class="phui-tag-core phui-tag-color-person">@jcrespo</span></a>,  <a href="https://phabricator.wikimedia.org/p/kostajh/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_776"><span class="phui-tag-core phui-tag-color-person">@kostajh</span></a>, <a href="https://phabricator.wikimedia.org/p/matmarex/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_777"><span class="phui-tag-core phui-tag-color-person">@matmarex</span></a>, <a href="https://phabricator.wikimedia.org/p/mmodell/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_778"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@mmodell</span></a>, <a href="https://phabricator.wikimedia.org/p/mobrovac/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_779"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@mobrovac</span></a>, <a 
href="https://phabricator.wikimedia.org/p/santhosh/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_780"><span class="phui-tag-core phui-tag-color-person">@santhosh</span></a>, <a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_781"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a>, and <a href="https://phabricator.wikimedia.org/p/thiemowmde/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_782"><span class="phui-tag-core phui-tag-color-person">@thiemowmde</span></a>.</p>

<h4 class="remarkup-header">📉 Current problems</h4>

<p>Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the week in which they were first observed.</p>

<p><a href="https://phabricator.wikimedia.org/tag/wikimedia-production-error/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/tag/wikimedia-production-error</a></p>

<blockquote class="remarkup-reply-block">
<div class="remarkup-reply-head">💡 <strong>ProTip:</strong></div>
<div class="remarkup-reply-body"><p>Cross-reference one workboard with another via <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade phui-tag-icon-view "><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-search" data-meta="0_19" aria-hidden="true"></span>Open Tasks</span></span><span class="remarkup-nav-sequence-arrow"> → </span><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Advanced Filter</span></span></span> and enter Tag(s) to apply as a filter.</p></div>
</blockquote>

<p>Thanks!</p>

<p>Until next time,<br />
– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Footnotes:</p>

<p>[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20180924&amp;to=Incident+documentation%2F20181101&amp;namespace=0" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech.wikimedia.org/wiki/Special:AllPages...</a><br />
[2] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/2FueDFF3G9zU/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query...</a><br />
[3] Tasks opened. – <a href="https://phabricator.wikimedia.org/maniphest/query/Ifhw.G3VvBMJ/#R" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org/maniphest/query...</a></p></div></content></entry><entry><title>Production Excellence #3: September 2018</title><link href="/phame/live/1/post/119/production_excellence_3_september_2018/" /><id>https://phabricator.wikimedia.org/phame/post/view/119/</id><author><name>Krinkle (Timo Tijhof)</name></author><published>2018-09-25T18:41:42+00:00</published><updated>2020-03-24T22:06:14+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>How’d we do in our striving for operational excellence last month? Read on to find out!</p>

<h4 class="remarkup-header">Month in numbers</h4>

<ul class="remarkup-list">
<li class="remarkup-list-item">1 documented incident since August 9. [1]</li>
<li class="remarkup-list-item">113 Wikimedia-prod-error tasks closed since August 9. [2]</li>
<li class="remarkup-list-item">99 Wikimedia-prod-error tasks created since August 9. [3]</li>
</ul>

<h4 class="remarkup-header">Current problems</h4>

<p>Frequent:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">[MediaWiki-Logging] Exception from Special:Log (public GET). – <a href="https://phabricator.wikimedia.org/T201411" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_783"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T201411</span></span></a></li>
<li class="remarkup-list-item">[Graph] Warning &quot;data error&quot; from ApiGraph in gzdecode. – <a href="https://phabricator.wikimedia.org/T184128" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_784"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T184128</span></span></a></li>
<li class="remarkup-list-item">[RemexHtml] Exception &quot;backtrack_limit exhausted&quot; from search index jobs. – <a href="https://phabricator.wikimedia.org/T201184" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_785"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T201184</span></span></a></li>
</ul>

<p>Other:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">[MediaWiki-Redirects] Exception from NS_MEDIA redirect (public GET). – <a href="https://phabricator.wikimedia.org/T203942" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_786"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T203942</span></span></a></li>
</ul>

<p>This is an oldie: (<em>Well..., it&#039;s an oldie where I come from</em>... 🎸)</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">[FlaggedRevs] Exception from Special:ProblemChanges (since 2011). – <a href="https://phabricator.wikimedia.org/T176232" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_787"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T176232</span></span></a></li>
</ul>

<p>Terminology:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">An <strong>Exception</strong> (or <strong>fatal</strong> error) causes user actions to be aborted. For example, a page would display &quot;Exception: Unable to render page&quot; instead of the article content.</li>
<li class="remarkup-list-item">A <strong>Warning</strong> (or <strong>non-fatal</strong> error) can produce page views that are technically unaware of a problem, but may show corrupt or incomplete information.  For example, an article would display the word &quot;null&quot; instead of the actual content. Or, a user may be told &quot;<em>You have <tt class="remarkup-monospaced">null</tt> new messages.</em>&quot;</li>
</ul>

<p>The combined volume of infrequent non-fatal errors is high. This limits our ability to automatically detect whether a deployment caused problems. The “public GET” risks in particular can cause (and have caused) alerts to fire that notify Operations of wikis potentially being down. Such exceptions must not be publicly exposed.</p>

<p>With that behind us... Let’s celebrate this month’s highlights!</p>

<h4 class="remarkup-header">📖 Quiz defect – &quot;0&quot; is not nothing!</h4>

<p>Tyler Cipriani (Release Engineering) reported an error in Quiz. Wikiversity uses Quiz for interactive learning. Editors define quizzes in the source text (wikitext). The Quiz program processes this text, creates checkboxes with labels, and sends it to a user. When the sending part failed, &quot;Error: Undefined index&quot; appeared in the logs. @<strong>Umherirrender</strong> investigated.</p>

<p>A line in the source text can: define a question, or an answer, or nothing at all. The code that creates checkboxes needs to decide between &quot;something&quot; and &quot;nothing&quot;. The code utilised the PHP &quot;if&quot; statement for this, which compares a value to True and False. The answers to a quiz can be any text, which means PHP first transforms the text to one of True or False. In doing so, values like &quot;0&quot; became False. This meant the code thought &quot;0&quot; was not an answer. The code responsible for sending checkboxes did not have this problem. When the code tried to access the checkbox to send, it did not exist. Hence, &quot;Error: Undefined index&quot;.</p>

<p>Umherirrender fixed the problem by using a strict comparison. A strict comparison doesn&#039;t transform a value first, it only compares.</p>
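<p>The difference can be sketched in a few lines. The sketch below is Python rather than PHP, and it is not the Quiz extension&#039;s code: <tt class="remarkup-monospaced">php_truthy</tt>, <tt class="remarkup-monospaced">answers_loose</tt> and <tt class="remarkup-monospaced">answers_strict</tt> are hypothetical names, and <tt class="remarkup-monospaced">php_truthy</tt> only models the PHP rule that the strings &quot;&quot; and &quot;0&quot; convert to False.</p>

```python
def php_truthy(value: str) -> bool:
    """Model PHP's string-to-boolean conversion: only "" and "0" are False."""
    return value not in ("", "0")


def answers_loose(lines):
    # Buggy variant: a plain "if" on the value silently drops the answer "0".
    return [line for line in lines if php_truthy(line)]


def answers_strict(lines):
    # Fixed variant: compare strictly against the empty string only.
    return [line for line in lines if line != ""]


lines = ["42", "0", "", "7"]
print(answers_loose(lines))   # ['42', '7']
print(answers_strict(lines))  # ['42', '0', '7']
```

<p>With the loose check, the answer &quot;0&quot; never gets a checkbox, so any later code that looks it up finds nothing there.</p>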

<p>– <a href="https://phabricator.wikimedia.org/T196684" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_788"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T196684</span></span></a></p>

<h4 class="remarkup-header">📖 PageTriage enters JobQueue for better performance</h4>

<p><strong>Kosta Harlan</strong> (from Audiences&#039;s Growth team) investigated a warning for PageTriage. This extension provides the New Pages Feed tool on the English Wikipedia. Each page in the feed has metadata, usually calculated when an editor creates a page. Sometimes, this is not available. Then, it must be calculated on-demand, when a user triages pages. So far, so good. The information was then saved to the database for re-use by other triagers. This last part caused the serious performance warning: &quot;Unexpected database writes&quot;.</p>

<p>Database changes must not happen on page views. The database has many replicas for reading, but only one &quot;master&quot; for all writing. We avoid using the master during page views to make our systems independent. This is a key design principle for MediaWiki performance. [5] It lets a secondary data centre build pages without connecting to the primary (which can be far away).</p>

<p>Kosta addressed the warning by improving the code that saves the calculated information. Instead of saving it immediately, an instruction is now sent via a job queue, after the page view is ready. This job queue then calculates and saves the information to the master database. The master synchronises it to replicas, and then page views can use it.</p>
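<p>The pattern can be illustrated with a minimal sketch. This is Python, not the PageTriage code, and every name in it (<tt class="remarkup-monospaced">primary_db</tt>, <tt class="remarkup-monospaced">job_queue</tt>, <tt class="remarkup-monospaced">render_page_view</tt> and so on) is hypothetical; a dictionary and an in-memory queue stand in for the master database and the job queue.</p>

```python
from collections import deque

primary_db = {}      # stands in for the single writable master database
job_queue = deque()  # stands in for the job queue


def compute_metadata(page_id):
    # Placeholder for the real (expensive) metadata calculation.
    return {"page_id": page_id, "reviewed": False}


def save_metadata_job(page_id):
    # Runs later, outside the page view: calculate, then write to the master.
    primary_db[page_id] = compute_metadata(page_id)


def render_page_view(page_id):
    # The page view computes what it needs to display but performs no write;
    # it only queues an instruction to save the result afterwards.
    metadata = compute_metadata(page_id)
    job_queue.append(lambda: save_metadata_job(page_id))
    return metadata


def run_queued_jobs():
    while job_queue:
        job_queue.popleft()()


shown = render_page_view(7)
writes_during_view = len(primary_db)  # 0: nothing written during the view
run_queued_jobs()
writes_after_jobs = len(primary_db)   # 1: the job saved the metadata
```

<p>The page view itself never touches the writable database, which is what allows it to be served without a connection to the primary data centre.</p>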

<p>– <a href="https://phabricator.wikimedia.org/T199699" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_789"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T199699</span></span></a> / <a href="https://gerrit.wikimedia.org/r/455870" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit.wikimedia.org/r/455870</a></p>

<h4 class="remarkup-header">📖 Tomorrow, may be sooner than you think</h4>

<p>After developers submit code to Gerrit, they eagerly await the result from Jenkins, an automated test runner. It sometimes incorrectly reported a problem with the MergeHistory feature. The code assumed that the tests would finish by &quot;tomorrow&quot;.</p>

<p>It might be safe to assume our tests will not take one day to finish. Unfortunately, the programming utility &quot;strtotime&quot; does not interpret &quot;tomorrow&quot; as &quot;this time tomorrow&quot;. Instead, it means &quot;the start of tomorrow&quot;. In other words, the next strike of midnight! The tests use UTC as the neutral timezone.</p>

<p>Every day in the 15 minutes before 5 PM in San Francisco (which is midnight UTC), code submitted to Code Review could have mysteriously failing tests.</p>
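<p>To make the gap concrete, here is a small Python sketch of the two readings of &quot;tomorrow&quot;. It is not the MergeHistory test code; <tt class="remarkup-monospaced">start_of_tomorrow</tt> and <tt class="remarkup-monospaced">this_time_tomorrow</tt> are hypothetical helpers, and the example timestamp is arbitrary.</p>

```python
from datetime import datetime, timedelta, timezone


def start_of_tomorrow(now):
    """The next midnight, which is how strtotime reads "tomorrow"."""
    next_day = now.date() + timedelta(days=1)
    return datetime(next_day.year, next_day.month, next_day.day,
                    tzinfo=now.tzinfo)


def this_time_tomorrow(now):
    """Exactly 24 hours from now, which is what the test assumed."""
    return now + timedelta(days=1)


# A test run that starts ten minutes before midnight UTC.
now = datetime(2018, 8, 15, 23, 50, tzinfo=timezone.utc)
print(start_of_tomorrow(now) - now)   # 0:10:00
print(this_time_tomorrow(now) - now)  # 1 day, 0:00:00
```

<p>A test that starts shortly before midnight UTC therefore has only minutes, not a day, before &quot;tomorrow&quot; arrives.</p>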

<p>– Continue at <a href="https://gerrit.wikimedia.org/r/452873" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit.wikimedia.org/r/452873</a></p>

<h4 class="remarkup-header">📖 Continuous Whac-A-Mole</h4>

<p>In August, developers started to notice rare and mysterious failures from Jenkins. No obvious cause or solution was known at that time.</p>

<p>Later that month, <strong>Dan Duvall</strong> (Release Engineering team) started exploring ways to run our tests faster. Before, we had many small virtual servers, where each server runs only one test at a time. The idea: Have a smaller group of much larger virtual servers where each server could run many tests at the same time. We hope that during busier times this will better share the resources between tests. And, during less busy times, allow a single test to use more resources.</p>

<p>As implementation of this idea began, the mysterious test failures became commonplace. &quot;No space left on device&quot; was a common error: the hard disks of the test servers were full. This was surprising. The new (larger) servers seemed to have enough space to accommodate the number of tests they ran at the same time. Together with Antoine Musso and Tyler Cipriani, Dan identified and resolved two problems:</p>

<ol class="remarkup-list">
<li class="remarkup-list-item">Some automated tests did not clean up after themselves.</li>
<li class="remarkup-list-item">The test-templates were stored on the &quot;root disk&quot; (the hard drive for the operating system), instead of the hard drive with space reserved for tests. This root disk is quite small, and is the same size on small servers and large servers.</li>
</ol>

<p>– <a href="https://phabricator.wikimedia.org/T202160" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_790"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T202160</span></span></a> /  <a href="https://phabricator.wikimedia.org/T202457" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_791"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T202457</span></span></a></p>

<h4 class="remarkup-header">🎉 Thanks!</h4>

<p>Thank you to everyone who has helped report, investigate, or resolve production errors this past month. Including:</p>

<p>Tpt<br />
Ankry<br />
Daimona<br />
Legoktm<br />
Volker_E<br />
Pchelolo<br />
Dan Duvall<br />
Gilles Dubuc<br />
Daniel Kinzler<br />
Umherirrender<br />
Greg Grossmeier<br />
Gergő Tisza (Tgr)<br />
Sam Reed (Reedy)<br />
Giuseppe Lavagetto<br />
Brad Jorsch (Anomie)<br />
Tim Starling (tstarling)<br />
Kosta Harlan (kostajh)<br />
Jaime Crespo (jcrespo)<br />
Antoine Musso (hashar)<br />
Roan Kattouw (Catrope)<br />
Adam WMDE (Addshore)<br />
Stephane Bisson (SBisson)<br />
Niklas Laxström (Nikerabbit)<br />
Thiemo Kreuz (thiemowmde)<br />
Subramanya Sastry (ssastry)<br />
This, that and the other (TTO)<br />
Manuel Aróstegui (Marostegui)<br />
Bartosz Dziewoński (matmarex)<br />
James D. Forrester (Jdforrester-WMF)</p>

<p>Thanks!</p>

<p>Until next time,</p>

<p>– Timo Tijhof</p>

<hr class="remarkup-hr" />

<p>Further reading:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Production Excellence #2 (August 2018 edition). – <a href="https://lists.wikimedia.org/pipermail/wikitech-l/2018-August/090594.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://lists.wikimedia.org/pipermail/wikitech-l/2018-August/090594.html</a></li>
<li class="remarkup-list-item">Production Excellence #1 (July 2018 edition). – <a href="https://lists.wikimedia.org/pipermail/wikitech-l/2018-July/090363.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://lists.wikimedia.org/pipermail/wikitech-l/2018-July/090363.html</a></li>
</ul>

<p>Footnotes:</p>

<p>[1] Incidents. – <a href="https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20180809&amp;to=Incident+documentation%2F20180922&amp;namespace=0" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20180809&amp;to=Incident+documentation%2F20180922&amp;namespace=0</a> <br />
[2] Tasks closed. – <a href="https://phabricator.wikimedia.org/maniphest/query/wOuWkMNsZheu/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/wOuWkMNsZheu/#R</a> <br />
[3] Tasks opened. – <a href="https://phabricator.wikimedia.org/maniphest/query/6HpdI76rfuDg/#R" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/maniphest/query/6HpdI76rfuDg/#R</a><br />
[4] Quiz on Wikiversity. – <a href="https://en.wikiversity.org/wiki/How_things_work_college_course/Conceptual_physics_wikiquizzes/Velocity_and_acceleration" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://en.wikiversity.org/wiki/How_things_work_college_course/Conceptual_physics_wikiquizzes/Velocity_and_acceleration</a> <br />
[5] Operate multiple datacenters. – <a href="https://www.mediawiki.org/wiki/Requests_for_comment/Master-slave_datacenter_strategy_for_MediaWiki" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Requests_for_comment/Master-slave_datacenter_strategy_for_MediaWiki</a></p></div></content></entry><entry><title>Quibble in summer</title><link href="/phame/live/1/post/118/quibble_in_summer/" /><id>https://phabricator.wikimedia.org/phame/post/view/118/</id><author><name>hashar (Antoine Musso)</name></author><published>2019-03-28T10:42:04+00:00</published><updated>2019-03-28T10:47:34+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><em>Note: this post has been published on 03/28 but has been originally written in September 2018 after Quibble 0.0.26 and never got published.</em></p>

<hr class="remarkup-hr" />

<p>The last update about Quibble is from June 1st (<a href="/J107" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_792"><span class="phui-tag-core phui-tag-color-object">Blog Post: Quibble in May</span></a>); this post covers the progress made over the summer.</p>

<p>Since the <a href="https://phabricator.wikimedia.org/J107" class="remarkup-link" rel="noreferrer">last update</a>, Quibble went from version 0.0.17 to 0.0.26:</p>

<p>For <tt class="remarkup-monospaced">--commands</tt>, one passes them as shell snippets such as: <tt class="remarkup-monospaced">--commands &#039;echo starting&#039; &#039;phpunit&#039; &#039;echo done&#039;</tt>. A future version of Quibble would accept only a single argument per option, though the option could be repeated. In other words, in the future one would have to use: <tt class="remarkup-monospaced">--command &#039;echo starting&#039; --command &#039;phpunit&#039; --command &#039;echo done&#039;</tt>.</p>
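<p>The two calling conventions map onto two common ways of declaring an option with Python&#039;s <tt class="remarkup-monospaced">argparse</tt>. The sketch below only illustrates that difference; it is an assumption about how such options could be declared, not Quibble&#039;s actual argument parser.</p>

```python
import argparse

# Current style: one option followed by several values.
current = argparse.ArgumentParser()
current.add_argument("--commands", nargs="+", default=[])

# Future style: one value per option, and the option may be repeated.
future = argparse.ArgumentParser()
future.add_argument("--command", action="append", default=[])

old_args = current.parse_args(
    ["--commands", "echo starting", "phpunit", "echo done"])
new_args = future.parse_args(
    ["--command", "echo starting",
     "--command", "phpunit",
     "--command", "echo done"])

print(old_args.commands)  # ['echo starting', 'phpunit', 'echo done']
print(new_args.command)   # ['echo starting', 'phpunit', 'echo done']
```

<p>Both forms yield the same list of commands. The repeated form is more explicit about where each command ends, since an option that takes several values consumes every argument up to the next option.</p>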

<p>The MediaWiki PHPUnit test suite to use is determined based on <tt class="remarkup-monospaced">ZUUL_PROJECT</tt>. <tt class="remarkup-monospaced">--phpunit-testsuite</tt> lets one set it explicitly. A use case is to run extension tests for a change made to mediawiki/core and ensure it does not break extensions (<tt class="remarkup-monospaced">ZUUL_PROJECT=mediawiki/core quibble --phpunit-testsuite=extensions mediawiki/extensions/BoilerPlate</tt>). On Wikimedia CI they are the <tt class="remarkup-monospaced">wmf-quibble-*</tt> jobs.</p>

<p>You can get a great speed-up by using a tmpfs for the database. Create a tmpfs and then pass <tt class="remarkup-monospaced">--db-dir</tt> to make use of it. With a Docker container one would do: <tt class="remarkup-monospaced">docker run --tmpfs /workspace/db:size=320M quibble:latest --db-dir=/workspace/db</tt>.</p>

<p>In the future, I would like Quibble to be faster. It currently runs the commands serially, and could be sped up by parallelizing at least some of the test commands (edit: done in 0.0.29).</p>

<hr class="remarkup-hr" />

<p><strong>Changelog for 0.0.17 to 0.0.26</strong></p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T196013" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_793"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T196013</span></span></a> MediaWiki configuration injected by Quibble is now prepended at start of <tt class="remarkup-monospaced">LocalSettings.php</tt>, that makes the configuration snippets available to <tt class="remarkup-monospaced">wfLoadExtension()</tt> / <tt class="remarkup-monospaced">wfLoadSkin()</tt>.</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T197687" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_794"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T197687</span></span></a> - Fix Chrome autoplay policy which prevented QUnit tests from running for <a href="/tag/wikispeech/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_807"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-umbrella" data-meta="0_806" aria-hidden="true"></span>Wikispeech</span></a>  <a href="https://goo.gl/xX8pDD" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://goo.gl/xX8pDD</a></li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T198171" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_795"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T198171</span></span></a> - In Chrome, do not rate limit <tt class="remarkup-monospaced">history.pushState()</tt>; the limit prevented some QUnit tests from passing since they exceeded it.</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T195918" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_796"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T195918</span></span></a><ul class="remarkup-list">
<li class="remarkup-list-item">Enhance inline help for <tt class="remarkup-monospaced">--run</tt> and <tt class="remarkup-monospaced">--skip</tt> by grouping them in a <tt class="remarkup-monospaced">stages</tt> argument group.</li>
<li class="remarkup-list-item">New <tt class="remarkup-monospaced">--skip=all</tt> to skip all tests</li>
</ul></li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T195084" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_797"><span class="phui-tag-core phui-tag-color-object">T195084</span></a> <a href="https://phabricator.wikimedia.org/T195918" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_798"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T195918</span></span></a> - Support running any command inside the Quibble environment by using <tt class="remarkup-monospaced">--commands</tt> (see below). They are run with a web server exposed (<a href="https://phabricator.wikimedia.org/T203178" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_799"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T203178</span></span></a>).</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T22471" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_800"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T22471</span></span></a> <a href="https://phabricator.wikimedia.org/T196347" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_801"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T196347</span></span></a> - Run rebuildLocalisationCache after update.php. This fixes locking issues on the first page request, where multiple requests were racing to generate the localization cache.</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T200017" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_802"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T200017</span></span></a> - Allow overriding the PHPUnit testsuite to run.</li>
<li class="remarkup-list-item">Do not spawn a web server when running PHPUnit tests; it is only needed for QUnit and Selenium tests.</li>
<li class="remarkup-list-item">Add a link to <a href="https://doc.wikimedia.org/quibble/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://doc.wikimedia.org/quibble/</a> in the <tt class="remarkup-monospaced">README.rst</tt>.</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T192132" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_803"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T192132</span></span></a> - Quibble is now licensed under Apache 2.0</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T202710" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_804"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T202710</span></span></a> - Xvfb no longer listens on a Unix socket.</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/T200991" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_805"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T200991</span></span></a> - Passing <tt class="remarkup-monospaced">--dump-db-postrun</tt> will dump the content of the database to the log directory (<tt class="remarkup-monospaced">--log-dir</tt>).  Thanks <a href="https://phabricator.wikimedia.org/p/Pablo-WMDE/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_808"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@Pablo-WMDE</span></a></li>
<li class="remarkup-list-item">Add support for Zuul cloner <tt class="remarkup-monospaced">--branch</tt> and <tt class="remarkup-monospaced">--project-branch</tt>, used to test <a href="/tag/mediawiki-extensions-donationinterface/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_810"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_809" aria-hidden="true"></span>MediaWiki-extensions-DonationInterface</span></a> master branch against MediaWiki release branches.</li>
<li class="remarkup-list-item">The environment variable <tt class="remarkup-monospaced">TMPDIR</tt> set by Quibble is no longer hardcoded to <tt class="remarkup-monospaced">/tmp</tt>; it now follows the logic of Python <tt class="remarkup-monospaced">tempfile.gettempdir()</tt>.</li>
<li class="remarkup-list-item">When running under Docker, default the log directory to be under the workspace instead of <tt class="remarkup-monospaced">/log</tt>.</li>
<li class="remarkup-list-item">Allow specifying database data directory with <tt class="remarkup-monospaced">--db-dir</tt> (default is the temporary directory based on environment variable).</li>
</ul>
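
<p>As a rough illustration of the <tt class="remarkup-monospaced">TMPDIR</tt> change listed above, the following sketch asks Python which temporary directory it would pick. The directory name <tt class="remarkup-monospaced">quibble-tmp-demo</tt> is made up for this example, and the snippet only shows how <tt class="remarkup-monospaced">tempfile.gettempdir()</tt> itself behaves; it is not taken from the Quibble code.</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"># Hypothetical directory name, used only for this demonstration.
demo_dir="$(pwd)/quibble-tmp-demo"
mkdir -p "$demo_dir"

# tempfile.gettempdir() honours TMPDIR when it points to a usable directory,
# and otherwise falls back to platform defaults such as /tmp.
resolved="$(TMPDIR="$demo_dir" python3 -c 'import tempfile; print(tempfile.gettempdir())')"
echo "$resolved"</pre></div>

<p>Running the same command without <tt class="remarkup-monospaced">TMPDIR</tt> set shows the fallback directory Python would choose on your system.</p>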

</div></content></entry><entry><title>An introduction to Task Types in Phabricator</title><link href="/phame/live/1/post/116/an_introduction_to_task_types_in_phabricator/" /><id>https://phabricator.wikimedia.org/phame/post/view/116/</id><author><name>mmodell (Mukunda Modell)</name></author><published>2018-09-20T17:22:36+00:00</published><updated>2018-09-24T13:21:21+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This blog post describes how we are using the &quot;Task Types&quot; feature in Phabricator to facilitate better tracking of work and to streamline workflows with custom fields. Additionally, I will be soliciting feedback about potential use-cases which could take further advantage of this feature.</p>

<h3 class="remarkup-header">Introducing Task Types</h3>

<p><tt class="remarkup-monospaced">Task Types</tt> are a relatively new feature in Phabricator which allow tasks to be created with extra information fields that are unique to tasks of a given type. For example, <tt class="remarkup-monospaced">Release</tt> tasks have a <tt class="remarkup-monospaced">release date</tt> and <tt class="remarkup-monospaced">release version</tt> which are not relevant for other types of tasks.</p>

<p>Another task type that has been recently introduced is the <tt class="remarkup-monospaced">deadline</tt> type. Deadlines include a single extra field <tt class="remarkup-monospaced">Due Date</tt> which is displayed at the top of the task view as well as on workboard cards.</p>

<h4 class="remarkup-header">Example: Typed Tasks</h4>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Deadline</th><th>Release</th></tr>
<tr><td><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/5y2kyj5ixnenyopsdjji/PHID-FILE-66wnunlmmtwgluckqwpc/Screenshot_from_2018-09-20_06-45-44.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_811"><img src="https://phab.wmfusercontent.org/file/data/whzs7m3n2avbdjyp5yvu/PHID-FILE-crjpzgadt4gcilekliod/preview-Screenshot_from_2018-09-20_06-45-44.png" width="220" height="114.18569254186" alt="Screenshot from 2018-09-20 06-45-44.png (341×657 px, 29 KB)" /></a></div></td><td><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/ahjsynw6tcpi7xfdgxre/PHID-FILE-b3kbvgk44zwevvij6keg/Screenshot_from_2018-09-20_06-50-14.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_812"><img src="https://phab.wmfusercontent.org/file/data/cc3rk3ouqrv3mhqrxjjv/PHID-FILE-sxaropdtrtkwj42eq47r/preview-Screenshot_from_2018-09-20_06-50-14.png" width="220" height="104.39961575408" alt="Screenshot from 2018-09-20 06-50-14.png (494×1 px, 50 KB)" /></a></div></td></tr>
<tr></tr>
</table></div>



<h3 class="remarkup-header">More Uses for Task Types</h3>

<p>Task types have the potential to streamline workflows and support the use of Phabricator for collecting structured data.</p>

<h4 class="remarkup-header">Bug reports and Feature Requests</h4>

<p>One proposed use of task types is for collecting specific information in bug reports and feature requests. Bug reports, for example, might ask for OS or Browser version in separate fields to aid in sorting and searching through reports.</p>

<h4 class="remarkup-header">Security Issues</h4>

<p>Another potential use-case which is currently being developed is a <tt class="remarkup-monospaced">security issue</tt> task type. This will allow the security team to add fields relevant to security issues without cluttering the task form used by everyone for other types of tasks.</p>

<h3 class="remarkup-header">The Relationship Between Custom Forms and Custom Types</h3>

<p>Custom forms can be created which hide irrelevant fields and generally streamline the process of submitting a task for a given workflow or for a team&#039;s specific use-case. This is a great feature in Phabricator and we have made extensive use of it for various purposes. The drawback to custom forms is that they are generally only useful for submitting tasks. Once a task is created, editing takes place on the normal &quot;generic&quot; task edit form.</p>

<h4 class="remarkup-header">Enter: Typed forms</h4>

<p>It&#039;s now possible to assign a type to a form, so that whenever you edit a <tt class="remarkup-monospaced">Security</tt> task you always see the <tt class="remarkup-monospaced">Edit Security Task</tt> form. Thanks to typed forms, we can now add custom fields which are always visible when editing one type of task but hidden when editing other types.</p>

<h5 class="remarkup-header">Example: Custom Forms</h5>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><th>Security Issue Form</th><th>Standard Form</th></tr>
<tr><td><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/lneolup7wpmgo5bl4xdd/PHID-FILE-afbgbu6yuh6asj4beqz5/Screenshot_from_2018-09-20_08-00-29.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_813"><img src="https://phab.wmfusercontent.org/file/data/xawejcvkhxf7mf2xd6n6/PHID-FILE-ts6gfa5ejcbnzoc23sx5/preview-Screenshot_from_2018-09-20_08-00-29.png" width="180.21276595745" height="220" alt="Screenshot from 2018-09-20 08-00-29.png (940×770 px, 75 KB)" /></a></div></td><td><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/onyx6xkzthhqdmfucgyb/PHID-FILE-dmq5tfinwjassnjfgg46/Screenshot_from_2018-09-20_08-03-43.png" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_814"><img src="https://phab.wmfusercontent.org/file/data/c4wmq3xhdexs2yrhl2im/PHID-FILE-3omaowqqnwsa4joyf6tk/preview-Screenshot_from_2018-09-20_08-03-43.png" width="220" height="220" alt="Screenshot from 2018-09-20 08-03-43.png (696×696 px, 48 KB)" /></a></div></td></tr>
<tr></tr>
</table></div>



<h3 class="remarkup-header">Soliciting Feedback</h3>

<p>Your feedback will be helpful in shaping the types of tasks and forms available in Phabricator. In order to best meet the needs of everyone who uses Phabricator, I&#039;d love to hear your input on what forms and fields would be most useful for your needs. Describe a workflow or a use-case that you think would be well served by custom fields. You can comment here or on the task: <a href="/T93499" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_815"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T93499: Add support for task types (subtypes)</span></span></a></p></div></content></entry><entry><title> mediawiki_selenium 1.8.1 Ruby Gem Released</title><link href="/phame/live/1/post/108/mediawiki_selenium_1.8.1_ruby_gem_released/" /><id>https://phabricator.wikimedia.org/phame/post/view/108/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2018-06-14T15:05:31+00:00</published><updated>2018-09-04T17:49:57+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>It has been a while since the last <a href="https://phabricator.wikimedia.org/diffusion/MSEL/" class="remarkup-link" rel="noreferrer">mediawiki_selenium</a> release! 💎</p>

<p>I have just released version <a href="https://rubygems.org/gems/mediawiki_selenium/versions/1.8.1" class="remarkup-link remarkup-link-ext" rel="noreferrer">1.8.1</a>. 🚀</p>

<p>Notable changes:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Required Ruby version is 2.x</li>
<li class="remarkup-list-item">Upgrade selenium-webdriver to 3.2</li>
<li class="remarkup-list-item">Integration tests use Chrome instead of PhantomJS</li>
<li class="remarkup-list-item">Added license to readme file</li>
<li class="remarkup-list-item">Documented Sauce Labs usage in readme file</li>
<li class="remarkup-list-item">Updated Special:Preferences/reset page</li>
</ul>

<p>I would like to thank several contributors that have improved the gem since the last release: <a href="https://phabricator.wikimedia.org/p/hashar/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_816"><span class="phui-tag-core phui-tag-color-person">@hashar</span></a>, <a href="https://phabricator.wikimedia.org/p/Rammanojpotla/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_817"><span class="phui-tag-core phui-tag-color-person">@Rammanojpotla</span></a>, <a href="https://phabricator.wikimedia.org/p/demon/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_818"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@demon</span></a> and <a href="https://phabricator.wikimedia.org/p/thiemowmde/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_819"><span class="phui-tag-core phui-tag-color-person">@thiemowmde</span></a>! 👏</p></div></content></entry><entry><title>Quibble in May</title><link href="/phame/live/1/post/107/quibble_in_may/" /><id>https://phabricator.wikimedia.org/phame/post/view/107/</id><author><name>hashar (Antoine Musso)</name></author><published>2018-06-01T20:36:22+00:00</published><updated>2018-06-06T10:17:44+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>[Quibble] is the new test runner for MediaWiki (see the intro <a href="/J99" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_820"><span class="phui-tag-core phui-tag-color-object">Blog Post: Introducing Quibble</span></a>).  This post is to give an update of what happened during May 2018.</p>

<h2 class="remarkup-header">Updates</h2>

<p>Željko Filipin wrote a blog post <a href="/J100" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_821"><span class="phui-tag-core phui-tag-color-object">Blog Post: Run Selenium tests using Quibble and Docker</span></a>.</p>

<p>Since the <a href="https://phabricator.wikimedia.org/J99" class="remarkup-link" rel="noreferrer">last update</a>, Quibble version went from <a href="https://phabricator.wikimedia.org/source/quibble/compare/?head=0.0.17&amp;against=0.0.11" class="remarkup-link" rel="noreferrer">0.0.11 to 0.0.17</a>:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Use Sphinx to generate documentation and publish it online <a href="https://doc.wikimedia.org/quibble/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://doc.wikimedia.org/quibble/</a> - <a href="https://phabricator.wikimedia.org/T193164" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_822"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T193164</span></span></a> [Antoine &amp; Željko]</li>
<li class="remarkup-list-item">Composer timeout bumped to 900 seconds. PHP CodeSniffer against the entirety of mediawiki/core takes a while under HHVM. [Kunal Mehta]</li>
<li class="remarkup-list-item">Process git submodules in extensions and skins - <a href="https://phabricator.wikimedia.org/T130966" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_823"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T130966</span></span></a> [Antoine]</li>
<li class="remarkup-list-item">HHVM now serves .svg files with Content-Type: image/svg+xml - <a href="https://phabricator.wikimedia.org/T195634" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_824"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T195634</span></span></a> [Antoine]</li>
<li class="remarkup-list-item">Support for Postgres as a database backend. You will need postgres and pg_virtualenv installed, then pass <tt class="remarkup-monospaced">--db=postgres</tt>. - <a href="https://phabricator.wikimedia.org/T39602" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_825"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T39602</span></span></a> [Kunal Mehta]</li>
<li class="remarkup-list-item">Option --skip to skip one or more test commands. [Kunal Mehta]</li>
<li class="remarkup-list-item">Properly pass environment variables to all setup and test commands. Notably, <tt class="remarkup-monospaced">MW_INSTALL_PATH</tt> and <tt class="remarkup-monospaced">MW_LOG_DIR</tt> were missing, which caused some extensions to fail.  The Jenkins job now properly captures all logs. [Antoine]</li>
</ul>
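
<p>A minimal sketch of trying the Postgres backend locally, assuming Quibble is already installed and available on your <tt class="remarkup-monospaced">PATH</tt>. The function names <tt class="remarkup-monospaced">require_tool</tt> and <tt class="remarkup-monospaced">run_quibble_postgres</tt> are hypothetical helpers written for this example; only <tt class="remarkup-monospaced">--db=postgres</tt> and the <tt class="remarkup-monospaced">pg_virtualenv</tt> prerequisite come from the list above.</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"># Hypothetical helper: succeed only when the given program is installed.
require_tool() {
  if command -v "$1" > /dev/null; then
    return 0
  fi
  echo "missing prerequisite: $1"
  return 1
}

# Hypothetical wrapper: start Quibble with the Postgres backend only when
# pg_virtualenv is available; extra arguments are passed through unchanged.
run_quibble_postgres() {
  if require_tool pg_virtualenv; then
    quibble --db=postgres "$@"
  else
    return 1
  fi
}</pre></div>

<p>Calling <tt class="remarkup-monospaced">run_quibble_postgres</tt> then either starts Quibble or reports the missing prerequisite.</p>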



<h2 class="remarkup-header">How you can help</h2>

<h3 class="remarkup-header">Documentation</h3>

<p>The documentation can use tutorials for various use cases. It is in integration/quibble.git in the doc/source directory.  You should be able to generate it by simply running:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">tox -e doc
&lt;your web browser&gt; doc/build/index.html</pre></div>

<p>Any support requests or questions you might have are most welcome as Phabricator tasks against <a href="/tag/quibble/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_829"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_828" aria-hidden="true"></span>Quibble</span></a>.</p>

<h3 class="remarkup-header">Migrate CI</h3>

<p>I have migrated MediaWiki and a lot of extensions to use the Quibble jobs. There are still 229 MediaWiki extensions not yet migrated. A test report is built daily by Jenkins:</p>

<p><a href="https://integration.wikimedia.org/ci/job/integration-config-qa/lastCompletedBuild/testReport/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://integration.wikimedia.org/ci/job/integration-config-qa/lastCompletedBuild/testReport/</a></p>

<p>Tests named &quot;test_mediawiki_repos_use_quibble&quot; represent extensions not migrated yet. <a href="https://phabricator.wikimedia.org/T183512" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_826"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T183512</span></span></a> is the huge tracking task.</p>

<h3 class="remarkup-header">Postgres</h3>

<p>Make MediaWiki tests pass with Postgres!</p>

<p><a href="/T195807" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_827"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T195807: Fix failing MediaWiki core tests on Postgres database backend</span></span></a></p>

<h2 class="remarkup-header">Thank you</h2>

<p>Huge thanks to Kunal Mehta, Timo Tijhof, Adam Wight, Željko Filipin and Stephen Niedzielski.</p>

<p>That is all for May 2018.</p>

<p><em>References</em></p>

<p>[Quibble]<br />
<a href="https://lists.wikimedia.org/pipermail/wikitech-l/2018-April/089812.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://lists.wikimedia.org/pipermail/wikitech-l/2018-April/089812.html</a><br />
[Presentation]<br />
<a href="https://commons.wikimedia.org/wiki/File:20180519-QuibblePres.pdf" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://commons.wikimedia.org/wiki/File:20180519-QuibblePres.pdf</a><br />
[Last update]<br />
<a href="https://lists.wikimedia.org/pipermail/wikitech-l/2018-April/089858.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://lists.wikimedia.org/pipermail/wikitech-l/2018-April/089858.html</a></p></div></content></entry><entry><title>Technical Debt - The Contagion Effect</title><link href="/phame/live/1/post/106/technical_debt_-_the_contagion_effect/" /><id>https://phabricator.wikimedia.org/phame/post/view/106/</id><author><name>Jrbranaa (Jean-Rene Branaa)</name></author><published>2018-05-24T23:16:51+00:00</published><updated>2018-07-22T14:18:06+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>One particularly interesting topic discussed during the Hackathon Technical Debt session (<a href="https://phabricator.wikimedia.org/T194934" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_830"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T194934</span></span></a>) was that of the contagious aspect of technical debt.  Although this makes sense in hindsight, it&#039;s not something that I had really given much thought to previously.</p>

<p>The basic premise is that existing technical debt can have a contagious effect on other areas of code.  One aspect of this is developers new to the MediaWiki code base may use existing code as a pattern for new code development.  If that code has technical debt, the technical debt could get replicated in other areas of code.</p>

<p>This can be overcome with both education about desired patterns as well as sharing the technical debt state of existing code.  It&#039;s not clear how best to accomplish the latter, but perhaps it&#039;s as simple as a comment in the code, once it&#039;s been identified and is being tracked in Phabricator.</p>

<p>Another aspect of the contagion effect (perhaps more of a compound effect), is the result of maintaining code with existing technical debt.  As bugs are fixed or minor features added, those changes can, in effect, result in a spreading of the technical debt.  Of course this doesn&#039;t always need to be the case, but it can be, if one is not careful.</p>

<p>I&#039;d like to get your thoughts on this topic and your past experiences working with and around technical debt.</p>

<p>Thoughts/Questions:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Are some areas of code more contagious than others?</li>
<li class="remarkup-list-item">What are some ways to mark technical debt as such?</li>
<li class="remarkup-list-item">What do you do when you need to work on code with significant technical debt?</li>
</ul></div></content></entry><entry><title>Run Selenium tests using Quibble and Docker</title><link href="/phame/live/1/post/100/run_selenium_tests_using_quibble_and_docker/" /><id>https://phabricator.wikimedia.org/phame/post/view/100/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2018-05-02T13:46:33+00:00</published><updated>2020-03-04T14:35:43+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Dependencies are <a href="https://git-scm.com/downloads" class="remarkup-link remarkup-link-ext" rel="noreferrer">Git</a>, <a href="https://www.python.org/downloads/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Python 3</a>, and <a href="https://www.docker.com/get-docker" class="remarkup-link remarkup-link-ext" rel="noreferrer">Docker</a> Community Edition (CE).</p>

<p>First, the general setup.</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span>$ git clone https://gerrit.wikimedia.org/r/p/integration/quibble
...
       
$ <span class="nb">cd</span> quibble/

$ python3 -m pip install -e .
...

$ docker pull docker-registry.wikimedia.org/releng/quibble-stretch:latest
...
<span class="o">(</span>2m 26s<span class="o">)</span></pre></div>

<p>The simplest, and slowest, way to run Quibble.</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span>$ docker run -it --rm <span class="se">\</span>
 docker-registry.wikimedia.org/releng/quibble-stretch:latest
...
<span class="o">(</span>12m 54s<span class="o">)</span></pre></div>

<p>Speed things up by using local repositories.</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span>$ mkdir -p ref/mediawiki/skins

$ git clone --bare https://gerrit.wikimedia.org/r/mediawiki/core ref/mediawiki/core.git
...
<span class="o">(</span>3m 40s<span class="o">)</span>

$ git clone --bare https://gerrit.wikimedia.org/r/mediawiki/vendor ref/mediawiki/vendor.git
...

$ git clone --bare https://gerrit.wikimedia.org/r/mediawiki/skins/Vector ref/mediawiki/skins/Vector.git
...

$ mkdir cache
$ chmod <span class="m">777</span> cache

$ mkdir -p log
$ chmod <span class="m">777</span> log

$ mkdir -p src
$ chmod <span class="m">777</span> src

$ docker run -it --rm <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/cache:/cache <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/log:/workspace/log <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/ref:/srv/git:ro <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/src:/workspace/src <span class="se">\</span>
  docker-registry.wikimedia.org/releng/quibble-stretch:latest
...
<span class="o">(</span>18m 0s<span class="o">)</span></pre></div>

<p>The second run of everything, just to see if things get faster.</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span>$ docker run -it --rm <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/cache:/cache <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/log:/workspace/log <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/ref:/srv/git:ro <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/src:/workspace/src <span class="se">\</span>
  docker-registry.wikimedia.org/releng/quibble-stretch:latest
...
<span class="o">(</span>16m 50s<span class="o">)</span></pre></div>

<p>If you get this error message</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">A LocalSettings.php file has been detected. To upgrade this installation, please run update.php instead</pre></div>

<p>just remove the file</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span>$ rm src/LocalSettings.php</pre></div>

<p>Speed things up by skipping Zuul and not installing dependencies.</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span>$ docker run -it --rm <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/cache:/cache <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/log:/workspace/log <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/ref:/srv/git:ro <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/src:/workspace/src <span class="se">\</span>
  docker-registry.wikimedia.org/releng/quibble-stretch:latest --skip-zuul --skip-deps
...
<span class="o">(</span>6m 17s<span class="o">)</span></pre></div>
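
<p>The <tt class="remarkup-monospaced">docker run</tt> invocations in this post differ only in the trailing Quibble arguments. As a convenience, they could be wrapped in a small shell function. The name <tt class="remarkup-monospaced">quibble_docker</tt> is made up for this sketch and is not part of Quibble; the volume mounts and image name are copied from the commands above.</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"># Hypothetical wrapper around the docker invocation used in this post.
# Any arguments given to the function are passed through to Quibble.
quibble_docker() {
  docker run -it --rm \
    -v "$(pwd)"/cache:/cache \
    -v "$(pwd)"/log:/workspace/log \
    -v "$(pwd)"/ref:/srv/git:ro \
    -v "$(pwd)"/src:/workspace/src \
    docker-registry.wikimedia.org/releng/quibble-stretch:latest "$@"
}</pre></div>

<p>With that in place, the previous command becomes <tt class="remarkup-monospaced">quibble_docker --skip-zuul --skip-deps</tt>. Run it from the directory that holds <tt class="remarkup-monospaced">cache</tt>, <tt class="remarkup-monospaced">log</tt>, <tt class="remarkup-monospaced">ref</tt> and <tt class="remarkup-monospaced">src</tt>, since the mounts are resolved relative to the current directory.</p>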

<p>Speed things up by just running Selenium tests.</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span>$ docker run -it --rm <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/cache:/cache <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/log:/workspace/log <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/ref:/srv/git:ro <span class="se">\</span>
  -v <span class="s2">&quot;</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/src:/workspace/src <span class="se">\</span>
  docker-registry.wikimedia.org/releng/quibble-stretch:latest --skip-zuul --skip-deps --run selenium
...
<span class="o">(</span>1m 19s<span class="o">)</span></pre></div></div></content></entry><entry><title>Introducing Quibble</title><link href="/phame/live/1/post/99/introducing_quibble/" /><id>https://phabricator.wikimedia.org/phame/post/view/99/</id><author><name>hashar (Antoine Musso)</name></author><published>2018-04-30T09:09:00+00:00</published><updated>2018-05-30T21:12:47+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Running all tests for MediaWiki and matching what CI/Jenkins is running has been a constant challenge for everyone, myself included.  Today I am introducing Quibble, a Python script that clones MediaWiki, sets it up and runs test commands.</p>

<p>It is a follow-up to the Vienna Hackathon in 2017, where we had a lot of discussion about making the CI jobs reproducible on a local machine and unifying the logic in a single place.  Today, I have added a few jobs to<br />
<tt class="remarkup-monospaced">mediawiki/core</tt>.</p>

<p>An immediate advantage is that they run in Docker containers and will start running as soon as an execution slot is available. That will be faster than the old jobs (suffixed with -jessie) that had to wait for a<br />
virtual machine to be made available.</p>

<p>A second advantage is that one can exactly reproduce the build on a local computer and even hack on the code to fix it.</p>

<p>The setup guide is available from the source repository (<tt class="remarkup-monospaced">integration/quibble.git</tt>):<br />
<a href="https://gerrit.wikimedia.org/g/integration/quibble/" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://gerrit.wikimedia.org/g/integration/quibble/</a></p>

<p>The minimal example would be:</p>

<div class="remarkup-code-block" data-code-lang="shell" data-sigil="remarkup-code-block"><pre class="remarkup-code"><span></span>git clone https://gerrit.wikimedia.org/r/p/integration/quibble
<span class="nb">cd</span> quibble
python3 -m pip install -e .
quibble</pre></div>

<p>A few more details are available in this post on the QA list:<br />
<a href="https://lists.wikimedia.org/pipermail/qa/2018-April/002699.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://lists.wikimedia.org/pipermail/qa/2018-April/002699.html</a></p>

<p>Please give it a try and send issues and support requests to the Phabricator <a href="/tag/quibble/" class="phui-tag-view phui-tag-type-shade phui-tag-blue phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_832"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-briefcase" data-meta="0_831" aria-hidden="true"></span>Quibble</span></a> project.</p>

<p>It will eventually be used for all MediaWiki extensions and skins as well.</p></div></content></entry><entry><title>Selenium tests in Node.js project retrospective</title><link href="/phame/live/1/post/88/selenium_tests_in_node.js_project_retrospective/" /><id>https://phabricator.wikimedia.org/phame/post/view/88/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2018-03-26T14:28:12+00:00</published><updated>2018-05-15T23:35:31+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I have been working on this project, with varying focus, since 2015. Maybe the easiest way to follow the project is by taking a look at a few epic tasks:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="/T139740" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_833"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T139740: Port Selenium tests from Ruby to Node.js</span></span></a> (2015-2017)</li>
<li class="remarkup-list-item"><a href="/T182421" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_834"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T182421: Q3 Selenium framework improvements</span></span></a> (2018)</li>
<li class="remarkup-list-item"><a href="/T182986" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_835"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T182986: Selenium framework improvements</span></span></a> (where no one has gone before)</li>
</ul>

<p><a href="/T182421" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_836"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T182421: Q3 Selenium framework improvements</span></span></a> will come to an end in a few days, so last week a few of us had a meeting to discuss the project.</p>

<p>Conclusions:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">The new Node.js Selenium framework is simpler and easier to use than the previous Ruby framework.</li>
</ul>

<p>What could have gone better:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">A lot of effort is required to port large test suites. Some teams were able to do it, some teams were not.</li>
<li class="remarkup-list-item">It was not clear that both Ruby and Node.js frameworks could coexist.</li>
<li class="remarkup-list-item">It was not clear that Mocha is recommended, but not mandatory. It is still possible to write Cucumber tests.</li>
<li class="remarkup-list-item">Some features of the Ruby framework are not available in the Node.js framework, like multi-user login.</li>
<li class="remarkup-list-item">Node.js&#039;s built-in assertion library sometimes doesn&#039;t provide useful error messages. Chai is a good alternative.</li>
<li class="remarkup-list-item">It would be better if a meeting like this happened at the beginning of the project, and several times during the project.</li>
</ul>

<p>Things to do:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Meeting report (this blog post).</li>
<li class="remarkup-list-item"><a href="/T182421" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_837"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T182421: Q3 Selenium framework improvements</span></span></a> and <a href="/T182986" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_838"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T182986: Selenium framework improvements</span></span></a>.</li>
<li class="remarkup-list-item">I am always looking for people to help me review framework improvements, see <a href="/T188744" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_839"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T188744: Find a few people interested in reviewing Selenium patches</span></span></a>.</li>
<li class="remarkup-list-item">We will meet again, at least every quarter, as long as the project is ongoing.</li>
<li class="remarkup-list-item">This meeting was long (80 minutes). Time-box next meetings to 50-60, or even 20-30 minutes.</li>
<li class="remarkup-list-item">At <a href="https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2018" class="remarkup-link remarkup-link-ext" rel="noreferrer">Wikimedia Hackathon 2018</a> I will lead <a href="/T190046" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_840"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T190046: Write Selenium tests in JavaScript/Node.js workshop</span></span></a> workshop, and <a href="/T190687" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_841"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T190687: Pair on writing Selenium tests in JavaScript/Node.js</span></span></a> will be happening during the entire hackathon.</li>
</ul>

<p>Meeting notes are available at <a href="https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/20180320_Selenium_Retrospective" class="remarkup-link remarkup-link-ext" rel="noreferrer">20180320 Selenium Retrospective</a>.</p>

<hr class="remarkup-hr" />

<p>Image by Paul Friel - Meerkat II, CC BY 2.0, <a href="https://commons.wikimedia.org/w/index.php?curid=24567063" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://commons.wikimedia.org/w/index.php?curid=24567063</a></p></div></content></entry><entry><title>Phabricator Updates for February 2018</title><link href="/phame/live/1/post/85/phabricator_updates_for_february_2018/" /><id>https://phabricator.wikimedia.org/phame/post/view/85/</id><author><name>mmodell (Mukunda Modell)</name></author><published>2018-02-15T07:55:48+00:00</published><updated>2018-02-23T15:48:34+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This is a digest of the updates from several weeks of <a href="https://secure.phabricator.com/w/changelog/" class="remarkup-link remarkup-link-ext" rel="noreferrer">changelogs</a> which are published <a href="https://phacility.com/" class="remarkup-link remarkup-link-ext" rel="noreferrer">upstream</a>. This is an incomplete list as I&#039;ve cherry-picked just the changes which I think will be of significant interest to end-users of Wikimedia&#039;s phabricator. Please see the upstream changelogs for a detailed overview of everything that&#039;s changed recently.</p>

<h2 class="remarkup-header">General</h2>

<h5 class="remarkup-header">Bulk Editor</h5>

<blockquote><p><a href="https://secure.phabricator.com/T13025" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://secure.phabricator.com/T13025</a> The bulk editor (previously sometimes called the &quot;batch editor&quot;) has been rebuilt on top of modern infrastructure (<tt class="remarkup-monospaced">EditEngine</tt>) and a number of bugs have been fixed.</p>

<p>You can now modify the set of objects being edited from the editor screen, and a wider range of fields (including &quot;points&quot; and some custom fields) are supported. The bulk editor should also handle edits of workboard columns with large numbers of items more gracefully.</p>

<p>Bulk edits can now be made silently (suppressing notifications, feed stories, and email) with <tt class="remarkup-monospaced">bin/bulk make-silent</tt>. The need to run a command-line tool is a little clumsy and is likely to become easier in a future version of Phabricator, but the ability to act silently could help an attacker who compromised an account avoid discovery for an extended period of time.</p>

<p>Edits which were made silently show an icon in the timeline view to make it easier to identify them.</p></blockquote>



<h5 class="remarkup-header">Webhooks</h5>

<blockquote><p>Herald now supports formally defining webhooks. You can configure webhooks in &quot;firehose&quot; mode (so they receive all events) or use Herald rules to call them when certain conditions are met.</p></blockquote>



<h5 class="remarkup-header">Mail Stamps</h5>

<p>Several users have requested a way to differentiate notifications triggered by an <span class="phabricator-remarkup-mention-unknown">@mention</span> from the deluge of regular task subscription notification emails. This feature should provide a very good solution.  See <a href="https://phabricator.wikimedia.org/T150766" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_842"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T150766</span></span></a> for one such request.</p>

<blockquote><p>Mail now supports &quot;mail stamps&quot; to make it easier to use client rules to route or flag mail. Stamps are pieces of standardized metadata attached to mail in a machine-parseable format, like &quot;FRAGILE&quot; or &quot;RETURN TO SENDER&quot; might be stamped on a package.</p>

<p>By default, stamps are available in the <tt class="remarkup-monospaced">X-Phabricator-Stamps</tt> header. You can also enable them in the mail body by changing the <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Settings</span></span><span class="remarkup-nav-sequence-arrow"> → </span><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Email Format</span></span><span class="remarkup-nav-sequence-arrow"> → </span><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Send Stamps</span></span></span> setting. This may be useful if you use a client like Gmail which can not act on mail headers.</p>

<p>Stamps provide more comprehensive information about object and change state than was previously available, and you can now highlight important mail which has stamps like <tt class="remarkup-monospaced">mention(@alice)</tt> or <tt class="remarkup-monospaced">reviewer(@alice)</tt>.</p>

<p>See <a href="https://secure.phabricator.com/T13069" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://secure.phabricator.com/T13069</a> for additional discussion and plans for this feature.</p></blockquote>
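
<p>For anyone who filters mail with a script rather than client rules, here is a minimal Python sketch of how a stamp check might look. It assumes the stamps in the <tt class="remarkup-monospaced">X-Phabricator-Stamps</tt> header are separated by whitespace, and the sample message and the helper names <tt class="remarkup-monospaced">read_stamps</tt> and <tt class="remarkup-monospaced">is_mention_for</tt> are made up for illustration. Please check a real notification mail before relying on it.</p>

<div class="remarkup-code-block" data-code-lang="python" data-sigil="remarkup-code-block"><pre class="remarkup-code">from email import message_from_string

HEADER = "X-Phabricator-Stamps"


def read_stamps(raw_mail):
    """Return the set of stamps found in the stamps header of a raw message."""
    message = message_from_string(raw_mail)
    # Assumption: stamps are separated by whitespace inside the header value.
    return set((message.get(HEADER) or "").split())


def is_mention_for(raw_mail, username):
    """True when the mail carries a mention(@username) stamp."""
    return "mention(@" + username + ")" in read_stamps(raw_mail)


# Hypothetical message, trimmed to the parts that matter here.
SAMPLE = (
    "Subject: example task update\n"
    "X-Phabricator-Stamps: mention(@alice) reviewer(@bob)\n"
    "\n"
    "Body text.\n"
)

print(is_mention_for(SAMPLE, "alice"))
print(is_mention_for(SAMPLE, "bob"))</pre></div>

<p>A mail filter could call <tt class="remarkup-monospaced">is_mention_for</tt> with your own username and flag only the messages for which it returns true.</p>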



<h5 class="remarkup-header">Mute</h5>

<blockquote><p>You can now <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Mute Notifications</span></span></span> for any object which supports subscriptions. This action is available in the right-hand column under <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Subscribe</span></span></span>. Muting notifications for an object stops you from receiving mail from that object, except for mail triggered by <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Send me an email</span></span></span> rules in Herald.</p>

<p>This feature is &quot;on probation&quot; and may be removed in the future if it proves more confusing than useful.</p>

<p>See <a href="https://secure.phabricator.com/T13068" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://secure.phabricator.com/T13068</a> for some discussion.</p></blockquote>



<h5 class="remarkup-header">Task Close Date</h5>

<blockquote><p>Maniphest now explicitly tracks a closed date (and closing actor) for tasks. This data will be built retroactively by a migration during the upgrade. This will take a little while if you have a lot of tasks (see &quot;Migrations&quot; below).</p>

<p>The Maniphest search UI can now order by close date and filter tasks closed between particular dates or closed by certain users. The <tt class="remarkup-monospaced">maniphest.search</tt> API has similar support, and returns this data in result sets. This data is also now available via <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Export Data</span></span></span>.</p>

<p>For closed tasks, the main task list view now shows a checkmark icon and the close date. For open tasks, the view retains the old behavior (no icon, modified date).</p></blockquote>
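
<p>As a rough sketch of what a close-date query through the API might look like, the Python below only builds the form fields for a <tt class="remarkup-monospaced">maniphest.search</tt> call and does not contact any server. The constraint keys <tt class="remarkup-monospaced">closedStart</tt> and <tt class="remarkup-monospaced">closedEnd</tt> and the order key <tt class="remarkup-monospaced">closed</tt> are my assumption about the parameter names, and the token is a placeholder. The Conduit console page for the method on your install is the authoritative reference.</p>

<div class="remarkup-code-block" data-code-lang="python" data-sigil="remarkup-code-block"><pre class="remarkup-code">from datetime import datetime, timezone
from urllib.parse import urlencode


def epoch(year, month, day):
    """Midnight UTC on the given date, as a Unix timestamp."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def closed_between(api_token, start, end):
    """Form fields for a maniphest.search call limited to a close-date window."""
    return {
        "api.token": api_token,
        # Assumed constraint and order keys; check them on your install.
        "constraints[closedStart]": start,
        "constraints[closedEnd]": end,
        "order": "closed",
    }


# Tasks closed during February 2018, using a placeholder token.
fields = closed_between("api-placeholder", epoch(2018, 2, 1), epoch(2018, 3, 1))
print(urlencode(fields))</pre></div>

<p>The encoded string would be sent as the body of a POST request to the <tt class="remarkup-monospaced">/api/maniphest.search</tt> endpoint.</p>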



<h5 class="remarkup-header">Require secure mail</h5>

<blockquote><p>Herald rules can now <span class="remarkup-nav-sequence"><span class="phui-tag-view phui-tag-type-shade phui-tag-grey phui-tag-shade "><span class="phui-tag-core ">Require secure mail</span></span></span>. You can use this action to prevent discussion of sensitive objects (like security bugfixes) from being transmitted via email.</p>

<p>To use this feature, you&#039;ll generally write a Herald rule like this:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">Global Rule for Revisions
When: 
[ Projects ][ include ][ Security Fix ]
Take actions:
[ Require secure mail ]</pre></div>

<p>Users will still be notified that the corresponding object has been updated, but will have to follow a link in the mail to view details over HTTPS.</p>

<p>This may be useful if you use mailing lists with wide distributions or model sophisticated attackers as threats.</p>

<p>Note that this action is currently not stateful: the rule must keep matching every update to keep the object under wraps. This may change in the future. This flag may also support continuing to send mail content if GPG is configured in some future release.</p></blockquote>

<p>I expect that we will utilize this feature to improve the secrecy of critical security bugs which are kept private until a security patch has been released.</p>

<h5 class="remarkup-header">Minor</h5>

<blockquote><ul class="remarkup-list">
<li class="remarkup-list-item">Slightly reduced the level of bleeding/explosions on the Maniphest burnup chart.</li>
<li class="remarkup-list-item">Added date range filtering to activity logs, pull logs, and push logs.</li>
<li class="remarkup-list-item">Push logs are now more human readable.</li>
<li class="remarkup-list-item">&quot;Assign to&quot; should now work properly in the bulk editor.</li>
<li class="remarkup-list-item">Fixed an issue with comment actions that affect numeric fields like &quot;Points&quot; in Maniphest.</li>
<li class="remarkup-list-item">maniphest.edit should now accept null to unassign a task, as suggested by the documentation.</li>
<li class="remarkup-list-item">GitLFS over SSH no longer fatals on a bad getUser() call.</li>
<li class="remarkup-list-item">Commits and revisions may now Reverts &lt;commit|revision&gt; one another, and reverting or reverted changes are shown more clearly in the timeline.</li>
</ul></blockquote></div></content></entry><entry><title>Selenium Ruby framework deprecated</title><link href="/phame/live/1/post/79/selenium_ruby_framework_deprecated/" /><id>https://phabricator.wikimedia.org/phame/post/view/79/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2017-10-30T13:44:09+00:00</published><updated>2020-03-09T09:14:42+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This is your friendly but <strong>final</strong> warning that we are replacing Selenium tests written in Ruby with tests in Node.js. There will be no more reminders. Ruby stack will no longer be maintained. For more information see <a href="https://phabricator.wikimedia.org/T139740" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_844"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T139740</span></span></a> and <a href="https://phabricator.wikimedia.org/T173488" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_845"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T173488</span></span></a>.</p>

<p>Extensive <a href="https://www.mediawiki.org/wiki/Selenium/Node.js" class="remarkup-link remarkup-link-ext" rel="noreferrer">documentation</a> is available at mediawiki.org. If you need help with the migration, I am available for pairing and code review (zfilipin in <a href="https://www.mediawiki.org/wiki/Gerrit" class="remarkup-link remarkup-link-ext" rel="noreferrer">Gerrit</a>, <a href="https://www.mediawiki.org/wiki/MediaWiki_on_IRC" class="remarkup-link remarkup-link-ext" rel="noreferrer">zeljkof in #wikimedia-releng</a>).</p>

<p>To see how to write a test, watch the Selenium tests in Node.js tech talk (<a href="https://phabricator.wikimedia.org/J78" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_843"><span class="phui-tag-core phui-tag-color-object">J78</span></a>).</p></div></content></entry><entry><title>Tech talk: Selenium tests in Node.js</title><link href="/phame/live/1/post/78/tech_talk_selenium_tests_in_node.js/" /><id>https://phabricator.wikimedia.org/phame/post/view/78/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2017-10-27T12:04:18+00:00</published><updated>2017-11-07T11:32:31+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><h2 class="remarkup-header">Who 👨‍💻</h2>

<p>Željko Filipin, Engineer (Contractor) from the Release Engineering team. That&#039;s me! 👋</p>

<h2 class="remarkup-header">What 📆</h2>

<p>Selenium tests in Node.js. We will write a new simple test for a MediaWiki extension. An example: <a href="https://www.mediawiki.org/wiki/Selenium/Node.js/Write" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Selenium/Node.js/Write</a></p>

<h2 class="remarkup-header">When ⏳</h2>

<p>Tuesday, October 31, 16:00 UTC (<a href="https://phabricator.wikimedia.org/E766" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_846"><span class="phui-tag-core phui-tag-color-object">E766</span></a>).</p>

<h2 class="remarkup-header">Where 🌍</h2>

<p>The internet! The event will be streamed and recorded. Details coming soon.</p>

<h2 class="remarkup-header">Why 💻</h2>

<p>We are deprecating the Ruby Selenium framework (<a href="https://phabricator.wikimedia.org/T173488" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_847"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T173488</span></span></a>).</p>

<p>See you there!</p>

<h2 class="remarkup-header">Video 🎥</h2>

<p><a href="https://www.youtube.com/watch?v=Q7TT1Joze14" class="remarkup-link remarkup-link-ext" rel="noreferrer">Youtube</a>, Commons (coming soon)</p></div></content></entry><entry><title>Selenium Ruby framework deprecation (September)</title><link href="/phame/live/1/post/75/selenium_ruby_framework_deprecation_september/" /><id>https://phabricator.wikimedia.org/phame/post/view/75/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2017-09-25T15:27:52+00:00</published><updated>2017-09-25T15:41:26+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><em> Originally an email sent on September 25 2017 to <a href="https://lists.wikimedia.org/pipermail/qa/2017-September/002657.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">qa</a>, <a href="https://lists.wikimedia.org/pipermail/engineering/2017-September/000473.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">engineering</a> and <a href="https://lists.wikimedia.org/pipermail/wikitech-l/2017-September/088898.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech-l</a> mailing lists. </em></p>

<p>This is your friendly but <a href="https://en.wiktionary.org/wiki/penultimate" class="remarkup-link remarkup-link-ext" rel="noreferrer">penultimate</a> warning that we are replacing Selenium tests written in Ruby with tests in Node.js. There will be only one more reminder, in October. In the meantime, only critical problems will be resolved in the Ruby stack. After October we will no longer maintain it.</p>

<p>You can follow task <a href="https://phabricator.wikimedia.org/T139740" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_848"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T139740</span></span></a> or the <a href="https://phabricator.wikimedia.org/phame/blog/view/1/" class="remarkup-link" rel="noreferrer">Release Engineering blog</a> for more information.</p>

<p>Extensive <a href="https://www.mediawiki.org/wiki/Selenium/Node.js" class="remarkup-link remarkup-link-ext" rel="noreferrer">documentation</a> is available at mediawiki.org. If you need help with the migration, I am available for pairing and code review (zfilipin in <a href="https://www.mediawiki.org/wiki/Gerrit" class="remarkup-link remarkup-link-ext" rel="noreferrer">Gerrit</a>, <a href="https://www.mediawiki.org/wiki/MediaWiki_on_IRC" class="remarkup-link remarkup-link-ext" rel="noreferrer">zeljkof in #wikimedia-releng</a>).</p></div></content></entry><entry><title>Selenium Ruby framework deprecation</title><link href="/phame/live/1/post/74/selenium_ruby_framework_deprecation/" /><id>https://phabricator.wikimedia.org/phame/post/view/74/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2017-09-25T15:14:04+00:00</published><updated>2017-09-25T15:14:04+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><em> Originally an email sent on August 23 2017 to <a href="https://lists.wikimedia.org/pipermail/qa/2017-August/002646.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">qa</a>, <a href="https://lists.wikimedia.org/pipermail/engineering/2017-August/000459.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">engineering</a> and <a href="https://lists.wikimedia.org/pipermail/wikitech-l/2017-August/088653.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech-l</a> mailing lists. </em></p>

<p>As <a href="https://phabricator.wikimedia.org/phame/post/view/73/selenium_tests_in_node.js/" class="remarkup-link" rel="noreferrer">announced in April</a>, we are replacing Selenium tests written in Ruby with tests in Node.js. Now is the last responsible moment to make the move. There will be two more reminders, in September and October. In the meantime, only critical problems will be resolved in the Ruby stack. After October we will no longer maintain it. You can follow task <a href="https://phabricator.wikimedia.org/T139740" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_849"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T139740</span></span></a> for more information. Extensive <a href="https://www.mediawiki.org/wiki/Selenium/Node.js" class="remarkup-link remarkup-link-ext" rel="noreferrer">documentation</a> is available at mediawiki.org. If you need help with the migration, I am available for pairing and code review (zfilipin in <a href="https://www.mediawiki.org/wiki/Gerrit" class="remarkup-link remarkup-link-ext" rel="noreferrer">Gerrit</a>, <a href="https://www.mediawiki.org/wiki/MediaWiki_on_IRC" class="remarkup-link remarkup-link-ext" rel="noreferrer">zeljkof in #wikimedia-releng</a>).</p></div></content></entry><entry><title>Selenium tests in Node.js</title><link href="/phame/live/1/post/73/selenium_tests_in_node.js/" /><id>https://phabricator.wikimedia.org/phame/post/view/73/</id><author><name>zeljkofilipin (Željko Filipin)</name></author><published>2017-09-25T14:57:49+00:00</published><updated>2017-09-25T15:43:33+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><em> Originally an email sent on April 3 2017 to <a href="https://lists.wikimedia.org/pipermail/qa/2017-April/002628.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">qa</a>, <a href="https://lists.wikimedia.org/pipermail/engineering/2017-April/000409.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">engineering</a> and <a href="https://lists.wikimedia.org/pipermail/wikitech-l/2017-April/087888.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">wikitech-l</a> mailing lists. </em></p>

<h2 class="remarkup-header">TL;DR</h2>

<p>You can now write Selenium tests in Node.js! Learn more about it at <a href="https://www.mediawiki.org/wiki/Selenium/Node.js" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Selenium/Node.js</a></p>

<h2 class="remarkup-header">Introduction</h2>

<p>Five years ago we introduced browser tests using Selenium and a Ruby-based stack. It has worked great for some teams, and not so great for others. Last year we talked to people from several teams and ran a <a href="https://www.mediawiki.org/wiki/Browser_testing_user_satisfaction_survey" class="remarkup-link remarkup-link-ext" rel="noreferrer">survey</a>. The outcome is a preference toward using a language developers are familiar with: JavaScript/Node.js.</p>

<p>After several months of research and development, we are proud to announce support for writing tests in Node.js. We have decided to use <a href="http://webdriver.io/" class="remarkup-link remarkup-link-ext" rel="noreferrer">WebdriverIO</a>. It is already available in MediaWiki core and supports running tests for extensions.</p>

<p>You can give it a try <a href="https://www.mediawiki.org/wiki/Selenium/Node.js/Inside_MediaWiki-Vagrant" class="remarkup-link remarkup-link-ext" rel="noreferrer">in MediaWiki-Vagrant</a>:</p>

<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">vagrant up
vagrant ssh
sudo apt-get install chromedriver
export PATH=$PATH:/usr/lib/chromium
cd /vagrant/mediawiki
xvfb-run npm run selenium</pre></div>



<h2 class="remarkup-header">Documentation</h2>

<p>Extensive details are available on the landing page: <a href="https://www.mediawiki.org/wiki/Selenium/Node.js" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://www.mediawiki.org/wiki/Selenium/Node.js</a></p>

<h2 class="remarkup-header">Future</h2>

<p>We plan to replace the majority of Selenium tests written in Ruby with tests in Node.js in the next 6 months. We cannot force anybody to rewrite existing tests, but we will offer documentation and pairing sessions for teams that need help. After 6 months, teams that want to continue using the Ruby framework will be able to do so, but without support from the Release Engineering team.</p>

<p>I have submitted a skill share session for <a href="https://phabricator.wikimedia.org/T159945" class="remarkup-link" rel="noreferrer">Wikimedia Hackathon 2017 in Vienna</a>. If you would like to pair on Selenium tests in person, that would be a great time.</p>

<p>The list of short term actions is in task <a href="https://phabricator.wikimedia.org/T139740" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_850"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T139740</span></span></a>.</p>

<h2 class="remarkup-header">Thanks</h2>

<p>I would like to thank several people for reviews, advice and code: Jean-Rene Branaa, Dan Duvall, Antoine Musso, Jon Robson, Timo Tijhof. (Names are sorted alphabetically by last name. Apologies to people I have forgotten.)</p></div></content></entry><entry><title>New feature: Embed videos from Commons into Phabricator markup</title><link href="/phame/live/1/post/18/new_feature_embed_videos_from_commons_into_phabricator_markup/" /><id>https://phabricator.wikimedia.org/phame/post/view/18/</id><author><name>mmodell (Mukunda Modell)</name></author><published>2017-06-01T23:49:27+00:00</published><updated>2017-06-14T04:36:13+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just finished deploying an update to Phabricator which includes a simple but rather useful feature:</p>

<p><a href="/T116515" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_851"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T116515: Enable embedding of media from Wikimedia Commons</span></span></a></p>

<p>You can now embed videos from Wikimedia Commons into any Task, Comment or Post. Just paste the Commons URL to embed the standard Commons player in an iframe. For example, this URL:</p>

<p><tt class="remarkup-monospaced">https://commons.wikimedia.org/wiki/File:Saving_and_sharing_search_queries_in_Phabricator.webm</tt></p>

<p>Produces this embedded video:</p>

<p><div class="embedded-commons-video"><iframe width="650" height="400" style="margin: 1em auto; border: 0px;" src="https://commons.wikimedia.org/wiki/File:Saving_and_sharing_search_queries_in_Phabricator.webm?embedplayer=yes" frameborder="0"></iframe></div></p></div></content></entry><entry><title>Sponsored Phabricator Improvements</title><link href="/phame/live/1/post/9/sponsored_phabricator_improvements/" /><id>https://phabricator.wikimedia.org/phame/post/view/9/</id><author><name>mmodell (Mukunda Modell)</name></author><published>2016-07-27T10:44:53+00:00</published><updated>2021-06-05T15:46:47+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In <a href="https://phabricator.wikimedia.org/T135327" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_853"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T135327</span></span></a>, the <a href="https://meta.wikimedia.org/wiki/Technical_Collaboration" class="remarkup-link remarkup-link-ext" rel="noreferrer">WMF Technical Collaboration team</a> collected a list of Phabricator bugs and feature requests from the Wikimedia Developer Community. After identifying the most promising requests from the community, these were presented to <a href="https://phacility.com/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Phacility</a> (the organization that builds and maintains Phabricator) for sponsored <a href="https://secure.phabricator.com/w/prioritization/" class="remarkup-link remarkup-link-ext" rel="noreferrer">prioritization</a>.</p>

<p>I am very pleased to report that we are already seeing the benefits of this initiative. Several sponsored improvements have landed on <a href="https://phabricator.wikimedia.org/" class="remarkup-link" rel="noreferrer">https://phabricator.wikimedia.org/</a> over the past few weeks. For an overview of what&#039;s landed recently, read on!</p>

<h3 class="remarkup-header">Fixed</h3>

<p>The following tasks are now resolved:</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-check green" data-meta="0_866" aria-hidden="true"></span></td><td><a href="/T33" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_854"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T33: Phabricator should let you add dependencies both ways (depending and blocking)</span></span></a></td></tr>
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-check green" data-meta="0_867" aria-hidden="true"></span></td><td><a href="/T165" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_855"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T165: Provide a way to upload a file in Phabricator if drag&#039;n&#039;drop is not available or not wanted</span></span></a></td></tr>
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-check green" data-meta="0_868" aria-hidden="true"></span></td><td><a href="/T234" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_856"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T234: Projects dropdown should offer project descriptions</span></span></a></td></tr>
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-check green" data-meta="0_869" aria-hidden="true"></span></td><td><a href="/T634" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_857"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T634: pholio/new/ requires drag and drop</span></span></a></td></tr>
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-check green" data-meta="0_870" aria-hidden="true"></span></td><td><a href="/T75851" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_858"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T75851: Email notification for &quot;edited the task description&quot; does not contain the actual content changes (diff) nor a link</span></span></a></td></tr>
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-check green" data-meta="0_871" aria-hidden="true"></span></td><td><a href="/T78078" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_859"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T78078: Videos cannot be viewed without downloading</span></span></a></td></tr>
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-check green" data-meta="0_872" aria-hidden="true"></span></td><td><a href="/T78824" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_860"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T78824: Phabricator task description diffs inaccurate due to 80-character line wrapping</span></span></a></td></tr>
</table></div>

<p>Notice that four of those have task numbers lower than 2000. Those long-standing tasks date from the first months of WMF&#039;s Phabricator evaluation and RFC period. When those tasks were originally filed, Phabricator was just a test install running in WMF Labs. For me, it&#039;s especially satisfying to close so many long-standing issues that have affected many of us for more than a year.</p>

<h3 class="remarkup-header">Work in Progress</h3>

<p>Several more issues were identified for sponsorship which are still awaiting a complete solution. Some of these are at least partially fixed and some are still pending. You can find out more details by reading the comments on each task linked below.</p>

<div class="remarkup-table-wrap"><table class="remarkup-table">
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-check yellow" data-meta="0_873" aria-hidden="true"></span></td><td>Partially fixed</td><td><a href="/T76732" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_861"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T76732: Exact matches should always win when suggesting/auto-completing</span></span></a></td></tr>
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-hourglass green" data-meta="0_874" aria-hidden="true"></span></td><td>In Progress</td><td><a href="/T136071" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_862"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T136071: Get Phabricator i18n ready for translatewiki.net</span></span></a></td></tr>
<tr><td><span class="visual-only phui-icon-view phui-font-fa fa-hourglass yellow" data-meta="0_875" aria-hidden="true"></span></td><td>Stalled</td><td><a href="/T96464" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_863"><span class="phui-tag-core phui-tag-color-object">T96464: Editing an existing task description which mentions a user should not readd the user as Subscriber</span></a></td></tr>
<tr><td>x</td><td></td><td><a href="/T1035" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_864"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T1035: Consolidate the many tech events calendars in Phabricator&#039;s calendar</span></span></a></td></tr>
</table></div>



<h3 class="remarkup-header">Other recent changes</h3>

<p>Besides the sponsored features and bug fixes, there are several other recent improvements which are worth mentioning.</p>

<h4 class="remarkup-header">Milestones now include <tt class="remarkup-monospaced">Next</tt> / <tt class="remarkup-monospaced">Previous</tt> navigation</h4>

<ul class="remarkup-list">
<li class="remarkup-list-item">This was developed by <a href="https://phabricator.wikimedia.org/p/mmodell/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_876"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@mmodell</span></a> and the code is published in <a href="/source/phab-extensions/" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_865"><span class="phui-tag-core phui-tag-color-object">rPHEX phabricator-extensions</span></a></li>
</ul>

<h4 class="remarkup-header">Recurring calendar events also gained next / previous navigation</h4>

<ul class="remarkup-list">
<li class="remarkup-list-item">This was also developed by <a href="https://phabricator.wikimedia.org/p/mmodell/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_877"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@mmodell</span></a> and submitted upstream. That patch is still pending code review at <a href="https://secure.phabricator.com/D16179" class="remarkup-link remarkup-link-ext" rel="noreferrer">https://secure.phabricator.com/D16179</a></li>
</ul>

<h4 class="remarkup-header">New feature for Maniphest tasks: dependency graph</h4>

<p>This very helpful feature displays a graphical representation of a task&#039;s Parents and Subtasks.</p>

<p><div class="phabricator-remarkup-embed-layout-left"><a href="https://phab.wmfusercontent.org/file/data/b25unvffsbd6s6o6jg4x/PHID-FILE-6at5phf2wzuiuofmmvvo/Example_screenshot_of_the_Phabricator_Task_Graph" class="phabricator-remarkup-embed-image" data-sigil="lightboxable" data-meta="0_852"><img src="https://phab.wmfusercontent.org/file/data/ro72rtnmnyclzrv3t257/PHID-FILE-x25u6wwoaoufybgtjrmx/preview-Screenshot_from_2016-07-19_16-10-44.png" width="220" height="129.726443769" alt="Example screenshot of the Phabricator Task Graph (194×329 px, 14 KB)" /></a></div></p>

<p>Initially there was an issue with this feature that made tasks with many relationships unable to load. This was exacerbated by the historical use of &quot;tracking tasks&quot; in the Wikimedia Bugzilla context. Thankfully after a quick patch from <a href="https://phabricator.wikimedia.org/p/epriestley/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_878"><span class="phui-tag-core phui-tag-color-person">@epriestley</span></a> (the primary author of Phabricator) and lots of help and testing from <a href="https://phabricator.wikimedia.org/p/Danny_B/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_879"><span class="phui-tag-core phui-tag-color-person">@Danny_B</span></a> and <a href="https://phabricator.wikimedia.org/p/Paladox/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_880"><span class="phui-tag-core phui-tag-color-person">@Paladox</span></a>, <a href="https://phabricator.wikimedia.org/p/mmodell/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_881"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@mmodell</span></a> was able to deploy a fix for the issue a little over 24 hours after it was discovered.</p>

<p>Here&#039;s to yet more fruitful collaborations with upstream Phabricator!</p></div></content></entry><entry><title>Code Review Office Hours</title><link href="/phame/live/1/post/5/code_review_office_hours/" /><id>https://phabricator.wikimedia.org/phame/post/view/5/</id><author><name>mmodell (Mukunda Modell)</name></author><published>2016-05-09T21:50:08+00:00</published><updated>2016-05-15T09:59:10+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Starting Thursday, May 12th at 13:00 PDT (20:00 GMT), we will hold the first weekly Code Review office hours on freenode IRC in the #wikimedia-codereview channel.</p>

<p>Event details: <a href="/E179" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_882"><span class="phui-tag-core phui-tag-color-object">E179: Code Review Office Hours</span></a><br />
Background: <a href="/T128371" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_883"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T128371: Set up Code Review office hours</span></span></a></p>

<p>Thanks to everyone who&#039;s been helping to organize this. We welcome people who want to submit patches for review, as well as reviewers who can spare a few minutes to provide feedback and hopefully merge some patches!</p>

<p>If you can&#039;t make it during the scheduled time period then please feel free to suggest other times that would be better for you. I intend to set up one or two other weekly time slots, at least one of which should be at a time that&#039;s more convenient for people in Europe and Asia.</p>

<p>Looking forward to seeing you in #wikimedia-codereview</p></div></content></entry><entry><title>What&#039;s new: Lots of improvements on phabricator.wikimedia.org</title><link href="/phame/live/1/post/1/what_s_new_lots_of_improvements_on_phabricator.wikimedia.org/" /><id>https://phabricator.wikimedia.org/phame/post/view/1/</id><author><name>mmodell (Mukunda Modell)</name></author><published>2016-02-23T00:23:37+00:00</published><updated>2016-03-20T12:32:36+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Not a lot has changed for Wikimedia&#039;s instance of Phabricator over the past few months. That&#039;s because a lot has been happening behind the scenes, as well as upstream at <a href="https://phacility.com/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Phacility</a>. Members of the <a href="/tag/release-engineering-team/" class="phui-tag-view phui-tag-type-shade phui-tag-violet phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_894"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_893" aria-hidden="true"></span>Release-Engineering-Team</span></a> and <a href="/tag/team-practices/" class="phui-tag-view phui-tag-type-shade phui-tag-disabled phui-tag-shade phui-tag-icon-view " data-sigil="hovercard" data-meta="0_896"><span class="phui-tag-core "><span class="visual-only phui-icon-view phui-font-fa fa-users" data-meta="0_895" aria-hidden="true"></span>Team-Practices</span></a> group have been working since December 2015 to integrate various upstream changes. However, nothing was released to our production instance because so many important features were still in progress and not yet fully usable. Additionally, we had to figure out exactly how these features would fit with the specific needs of our project and test a lot of functionality to be sure that we would not break anyone&#039;s workflows.</p>

<p>So our Phabricator instance has been relatively unchanged since November of last year. This all changed last Wednesday night <em>(Thursday February 18th, 01:00 UTC)</em> when we unleashed several months of changes into production. If you use <a href="https://phabricator.wikimedia.org" class="remarkup-link" rel="noreferrer">phabricator.wikimedia.org</a> regularly then you have probably already noticed some of the more obvious improvements.</p>

<p>A whole lot of hard work went into this release. Thankfully, everyone&#039;s hard work seems to have paid off, as we only encountered a couple of relatively small issues, which were fixed quickly afterwards.</p>

<p>This post is to fill everyone in about what&#039;s changed and what you can expect from some of the exciting new functionality that has been added with this release.</p>

<h2 class="remarkup-header">Custom Forms</h2>

<ul class="remarkup-list">
<li class="remarkup-list-item">Some likely use cases include:<ul class="remarkup-list">
<li class="remarkup-list-item">Custom markup at the top of forms (<a href="https://phabricator.wikimedia.org/T115017" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_884"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T115017</span></span></a>)</li>
<li class="remarkup-list-item">Pre-filling information in fields</li>
<li class="remarkup-list-item">Hiding certain fields (<a href="https://phabricator.wikimedia.org/T120903" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_885"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T120903</span></span></a>)</li>
<li class="remarkup-list-item">Bug reporting and template tasks can be entered more easily (<a href="https://phabricator.wikimedia.org/T91538" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_886"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T91538</span></span></a>)</li>
</ul></li>
<li class="remarkup-list-item">A great deal of caution is required when using this new functionality.<ul class="remarkup-list">
<li class="remarkup-list-item">Form creation is limited to admins because it is currently too easy to accidentally override existing forms when someone creates a new form without fully understanding the subtleties of the new system</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/mmodell/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_897"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@mmodell</span></a> can answer questions about what is possible.</li>
<li class="remarkup-list-item">Anyone with a use-case for a custom form can request that one be set up by a Phabricator admin. We have not established a formal process for this yet.</li>
</ul></li>
</ul>

<h2 class="remarkup-header">Customizable Project Pages</h2>

<p>It&#039;s now possible to customize individual project pages to meet the needs of each type of project or the needs of specific teams.</p>

<ul class="remarkup-list">
<li class="remarkup-list-item">Custom links can be added to the navigation menu. This is great for prominently linking to a project wiki page or other URLs that are relevant to a project.</li>
<li class="remarkup-list-item">The default page that is shown when visiting a project can be configured. For some projects, it makes more sense to go directly to the workboard; for others, the project details page is more appropriate.</li>
<li class="remarkup-list-item">We can disable the workboard entirely for certain projects (useful for &#039;tag&#039; type projects)</li>
<li class="remarkup-list-item">There is an API for developing custom panels to be placed on project pages or as part of the navigation menus. These are new and unstable, but it seems like a promising way for us to extend Phabricator with new functionality in the future.</li>
</ul>

<h2 class="remarkup-header">Milestones &amp; Sub-Projects</h2>

<p>Projects can now be nested. There are two new types of projects in Phabricator and they could prove to be really useful for organizing all of the things. Sub-projects are just like regular projects, but nested inside of an existing project. Milestones are a special type of sub-project that can be used to represent a sprint or a software release. There are a few somewhat complex rules about how project membership, policies and tasks are affected by sub-projects. There is detailed coverage in the <a href="https://secure.phabricator.com/book/phabricator/article/projects/" class="remarkup-link remarkup-link-ext" rel="noreferrer">Phabricator Projects Documentation</a> and we have attempted to explain some of the implications here:</p>

<h3 class="remarkup-header">Comparison of Sub-projects vs. Milestones</h3>

<ul class="remarkup-list">
<li class="remarkup-list-item">Sub-projects have members, milestones do not.</li>
<li class="remarkup-list-item">Parent-projects&#039; members are the union of all sub-projects&#039; members. When adding the first sub-project to a parent, all existing members get moved to the subproject.</li>
<li class="remarkup-list-item">Tasks can only exist in a single milestone, but can exist in multiple sub-projects.</li>
<li class="remarkup-list-item">Milestones exist as columns within the parent project&#039;s workboard; sub-projects have their own workboard.</li>
</ul>

<h3 class="remarkup-header">Sub-projects in detail</h3>

<ul class="remarkup-list">
<li class="remarkup-list-item">Projects can have sub-projects.  A subproject behaves like a regular project, and moving a task between a project and sub-project is the same as moving a task between two unrelated projects, except:<ul class="remarkup-list">
<li class="remarkup-list-item">Filtering by project matches all Sub-Project tasks.</li>
<li class="remarkup-list-item">Moving a task from a project to a sub-project does auto-remove the parent project.</li>
</ul></li>
<li class="remarkup-list-item">It&#039;s very easy to navigate from viewing a sub-project to viewing a project, via the breadcrumb trail (one click, always in the same place, always present; and then a page reload).</li>
<li class="remarkup-list-item">It&#039;s possible, and maybe easier than searching, but not trivial, to navigate from projects to sub-projects.  You have to click on sub-projects in the menu, wait for page reload, see the list of projects, identify the one you want, click on it, and wait for page reload.</li>
<li class="remarkup-list-item">Sub-projects often appear in the UI as Project &gt; sub-project, but they appear in name completion as Sub-project, so if you name your sub-project &quot;bugs&quot;, it will be really confusing in completion.<ul class="remarkup-list">
<li class="remarkup-list-item">Hopefully we will get this fixed so that completion shows the parent project.</li>
</ul></li>
<li class="remarkup-list-item">A task can belong to two different sub-projects within the same project.</li>
</ul>

<h3 class="remarkup-header">Milestones are also regular projects, except:</h3>

<ul class="remarkup-list">
<li class="remarkup-list-item">They can be a child of a project or sub-project, but can&#039;t be a child of another milestone.</li>
<li class="remarkup-list-item">Milestones also appear as columns in their parent project, and so tasks in a project can be moved to milestones via drag and drop.</li>
<li class="remarkup-list-item">A task can&#039;t belong to both a project and to a milestone in that project; if it&#039;s in the milestone, adding the milestone&#039;s parent project to it removes the milestone (but, possible bug, in the UI it still appears in the Milestone&#039;s column).</li>
<li class="remarkup-list-item">Milestone names are not directly available in autocomplete.  Instead, you see the parent (sub)project, followed by the Milestone name in parentheses.</li>
<li class="remarkup-list-item">You can&#039;t assign a new task to a project and to a milestone in that project in one action; it takes several full steps.</li>
<li class="remarkup-list-item">There&#039;s some UI for auto-numbering milestones in sequence.</li>
</ul>

<h2 class="remarkup-header">Story points are now built into Phabricator</h2>

<p>Previously this functionality was provided by a custom field and <a href="/diffusion/PHSP/" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_890"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">rPHSP phabricator-Sprint</span></span></a></p>

<ul class="remarkup-list">
<li class="remarkup-list-item">All tasks will show a story point field by default<ul class="remarkup-list">
<li class="remarkup-list-item">A custom form could be created to restrict this per project</li>
</ul></li>
<li class="remarkup-list-item">All numeric story points have been transitioned to the new field; the old story points field is now disabled.</li>
</ul>

<h2 class="remarkup-header">Other new features and bugs fixed</h2>

<ul class="remarkup-list">
<li class="remarkup-list-item">Auto-completion of usernames and projects in all markup fields &amp; comments. (<a href="https://phabricator.wikimedia.org/T876" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_887"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T876</span></span></a>)</li>
<li class="remarkup-list-item">Non-members can watch projects (<a href="https://phabricator.wikimedia.org/T77228" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_888"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T77228</span></span></a>)</li>
<li class="remarkup-list-item">The &quot;Security&quot; field on tasks is now deprecated. Use the &quot;Report security issue&quot; form instead of submitting a regular task with &quot;security&quot; set to &quot;Software security bug.&quot;</li>
<li class="remarkup-list-item">It&#039;s now possible to make multiple changes to a task from the comment form instead of using the advanced edit form or submitting multiple times.</li>
<li class="remarkup-list-item">Marking a task as resolved no longer re-assigns it (<a href="https://phabricator.wikimedia.org/T84833" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_889"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">T84833</span></span></a>)</li>
</ul>

<h2 class="remarkup-header">Thanks to everyone who helped out testing this release</h2>

<p>This couldn&#039;t have happened without everyone&#039;s help  &lt;3</p>

<p>Specifically I&#039;d like to thank:</p>

<ul class="remarkup-list">
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/Luke081515/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_898"><span class="phui-tag-core phui-tag-color-person">@Luke081515</span></a> and <a href="https://phabricator.wikimedia.org/p/Paladox/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_899"><span class="phui-tag-core phui-tag-color-person">@Paladox</span></a> for testing various bugs and generally making helpful suggestions.</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/thcipriani/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_900"><span class="phui-tag-core phui-tag-color-person">@thcipriani</span></a> and <a href="https://phabricator.wikimedia.org/p/fgiunchedi/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_901"><span class="phui-tag-core phui-tag-color-person">@fgiunchedi</span></a> for all of the deployment-related things.</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/ArielGlenn/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_902"><span class="phui-tag-core phui-tag-color-person">@ArielGlenn</span></a> and <a href="https://phabricator.wikimedia.org/p/chasemp/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_903"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@chasemp</span></a> for reviewing, merging and babysitting my patches in <a href="/source/operations-puppet/" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_891"><span class="phui-tag-core phui-tag-color-object">rOPUP Wikimedia Puppet</span></a>.</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/csteipp/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_904"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@csteipp</span></a> for reviewing changes to <a href="/diffusion/PHES/" class="phui-tag-view phui-tag-type-object " data-sigil="hovercard" data-meta="0_892"><span class="phui-tag-core-closed"><span class="phui-tag-core phui-tag-color-object">rPHES phabricator-Security</span></span></a>.</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/JAufrecht/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_905"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@JAufrecht</span></a> for thoroughly testing sub-projects and milestones. Many of the points above were lifted from <a href="https://lists.wikimedia.org/pipermail/teampractices/2016-February/001010.html" class="remarkup-link remarkup-link-ext" rel="noreferrer">his email to the team practices mailing list.</a></li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/DStrine/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_906"><span class="phui-tag-core phui-tag-color-person"><span class="phui-tag-dot phui-tag-color-grey"></span>@DStrine</span></a> for helping with testing new features and organizing these notes</li>
<li class="remarkup-list-item"><a href="https://phabricator.wikimedia.org/p/epriestley/" class="phui-tag-view phui-tag-type-person " data-sigil="hovercard" data-meta="0_907"><span class="phui-tag-core phui-tag-color-person">@epriestley</span></a> for going out of his way on several occasions to address the bugs that we have collectively reported upstream, as well as proactively responding to bugs in our own phabricator instance. He&#039;s given us a lot of really valuable support and we would be much worse off without his help.</li>
</ul></div></content></entry></feed>