<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="https://syair.angkatogeljitu.workers.dev/host-https-timotijhof.net/wp-content/plugins/pretty-rss-feeds/xslt/pretty-feed.xsl" type="text/xsl" media="screen" ?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Timo Tijhof</title>
	<atom:link href="https://timotijhof.net/feed/" rel="self" type="application/rss+xml" />
	<link>https://timotijhof.net</link>
	<description></description>
	<lastBuildDate>Wed, 25 Mar 2026 00:22:01 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>John Cleese on Creativity (Transcript)</title>
		<link>https://timotijhof.net/posts/2026/john-cleese-on-creativity-transcript/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Sat, 14 Feb 2026 12:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[Linked]]></category>
		<guid isPermaLink="false">https://timotijhof.net/?p=1367</guid>

					<description><![CDATA[The below is transcribed from a 1991 talk by John Cleese titled Creativity in Management. I encourage you to watch the 30-minute recording on YouTube. The delivery is hilarious with great comedic timing that my transcript can&#8217;t begin to do justice. I edited the transcript for brevity, and added headings and links. This speech was given by…]]></description>
										<content:encoded><![CDATA[
<p>The below is transcribed from a 1991 talk by John Cleese titled <strong>Creativity in Management</strong>. I encourage you to watch the <a href="https://www.youtube.com/watch?v=toWQ_BQF8Aw">30-minute recording on YouTube</a>. The delivery is hilarious with great comedic timing that my transcript can&#8217;t begin to do justice. I edited the transcript for brevity, and added headings and links.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>This speech was given by John Cleese to an international audience linked by satellite at the Grosvenor House Hotel London, 23rd January 1991.</p>
</blockquote>



<h2 class="wp-block-heading">What creativity isn&#8217;t</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>A couple of years ago I got very excited because a friend of mine, who runs the psychology department at Sussex University, <a href="https://en.wikipedia.org/wiki/Brian_Bates_(psychologist)">Brian Bates</a>, showed me some research on creativity done at Berkeley in the 70s by a brilliant psychologist called <a href="https://en.wikipedia.org/wiki/Donald_W._MacKinnon">Donald MacKinnon</a>, which seemed to confirm in the most impressively scientific way all the vague observations and intuitions that I&#8217;d had over the years. […]</p>



<p>The reason why it is futile for me to talk about creativity, is that it simply cannot be explained. It&#8217;s like Mozart&#8217;s music, or Van Gogh&#8217;s painting. It is literally inexplicable.</p>



<p>Freud, who analysed practically everything else, repeatedly denied that psychoanalysis could shed any light whatsoever on the mysteries of creativity. Brian Bates wrote to me recently: &#8220;Most of the best research on creativity was done in the 60s and 70s with a quite dramatic drop-off in quantity after then&#8221;, largely, I suspect, because researchers began to feel that they had reached the limits of what science could discover about it.</p>



<p>The only thing from the research that I could tell you about how to be creative, is the sort of childhood that you should have had, which is of limited help to you at this point of your lives.</p>



<p>However there is one <em>negative</em> thing that I can say, and it&#8217;s negative because it&#8217;s easier to say what creativity isn&#8217;t. A bit like the sculptor who, when asked how he had sculpted a very fine elephant, explained that he&#8217;d taken a big block of marble and then knocked away all the bits that&nbsp;<em>didn&#8217;t</em>&nbsp;look like an elephant. Now here&#8217;s the negative thing:</p>



<p><strong>Creativity is not a talent.</strong> It is <em>not</em> a talent. <strong>It is a way of operating.</strong> […]</p>



<p>When I say &#8220;a way of operating&#8221;, what I mean is this: Creativity is not an ability that you either have or do not have. It is […] absolutely unrelated to IQ (provided you&#8217;re intelligent above a certain minimal level, that is).</p>



<p>MacKinnon showed in investigating scientists, architects, engineers, and writers, that those regarded by their peers as &#8220;most creative&#8221; were in no way whatsoever different in IQ from their less creative colleagues.</p>



<p>So in what way were they different?</p>
</blockquote>



<h2 class="wp-block-heading">Open and closed mode</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>MacKinnon showed that the most creative had simply acquired a facility for getting themselves into a particular mood, a way of operating, which allowed their natural creativity to function. MacKinnon described this particular facility as an ability to play. He described the most creative, when in this mood, as being childlike. They were able to play with ideas to explore them, not for any immediate practical purpose, but&nbsp;<em>just</em>&nbsp;for enjoyment. Play for its own sake.</p>



<p>I&#8217;m working at the moment with <a href="https://en.wikipedia.org/wiki/Robin_Skynner">Dr. Robin Skynner</a> on a successor to our psychiatry book <em><a href="https://en.wikipedia.org/wiki/Families_and_How_to_Survive_Them">Families and How to Survive Them</a></em>. We&#8217;re comparing the ways in which psychologically healthy families function, and the ways in which the most successful corporations and organisations function. We became fascinated by the fact that we can usefully describe the way in which people function at work in terms of two modes: open and closed. <strong>Creativity is not possible in the closed mode</strong>. […]</p>



<p>By the&nbsp;<em>closed mode</em>&nbsp;I mean the mode that we are in most of the time when we&#8217;re at work. We have inside us a feeling that there&#8217;s lots to be done, and we have to get on with it if we&#8217;re gonna get through it all. It&#8217;s an active, probably slightly anxious, mode. Although the anxiety can be exciting and pleasurable. It&#8217;s a mode in which we&#8217;re probably a little impatient, if only with ourselves. It has a little tension in it, not much humour, it&#8217;s a mode in which we&#8217;re very purposeful, and it&#8217;s a mode in which we&nbsp;<em>can</em>&nbsp;get very stressed and even a bit manic, but&nbsp;<em>not</em>&nbsp;creative.</p>



<p>By contrast the&nbsp;<em>open&nbsp;mode</em> is a relaxed, expansive, less purposeful, mode in which we&#8217;re probably more contemplative, more inclined to humour (which always accompanies a wider perspective), and consequently more playful. It&#8217;s a mood in which curiosity for its own sake can operate, because we&#8217;re not under pressure to get a specific thing done quickly. We can play. And that is what allows natural creativity to surface. Let me give you an example of what I mean.</p>
</blockquote>



<h3 class="wp-block-heading">Discovery of penicillin</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>When <a href="https://en.wikipedia.org/wiki/Alexander_Fleming">Alexander Fleming</a> had the thought that led to the discovery of penicillin, he must have been in the open mode. The previous day, he&#8217;d arranged a number of dishes so that culture would grow upon them. On the day in question, he glanced at the dishes, and he discovered that on one of them, no culture had appeared. If he&#8217;d been in the closed mode, he would have been so focused upon his need for dishes with cultures grown upon them, that when he saw that one dish was of no use to him for that purpose, he would quite simply have thrown it away.</p>



<p>Thank goodness, he was in the open mode, so he became curious about&nbsp;<em>why</em>&nbsp;the culture had not grown on this particular dish. That curiosity, as the world knows, led him […] to penicillin.</p>



<p>In the closed mode, an uncultured dish is an irrelevance. In the open mode, it&#8217;s a clue. One more example:</p>
</blockquote>



<h3 class="wp-block-heading">Hitchcock</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>One of Alfred Hitchcock&#8217;s regular co-writers has described working with him on screenplays. He says:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>When we came up against a block, and our discussions became very heated and intense, Hitchcock would suddenly stop and tell a story that had nothing to do with the work at hand. At first, I was almost outraged.</em></p>



<p><em>I discovered that he did this intentionally. He&nbsp;<strong>mistrusted</strong>&nbsp;working under pressure. He would say &#8220;We&#8217;re pressing, we&#8217;re pressing, we&#8217;re working too hard. Relax, it will come.&#8221; And, of course it finally always did.</em></p>
</blockquote>
</blockquote>



<h3 class="wp-block-heading">Implement in the closed mode</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Let me make one thing quite clear. We need to be in the open mode when we&#8217;re pondering a problem. But, once we come up with a solution, we must then switch to the closed mode to implement it. Once we&#8217;ve made a decision we are efficient only if we go through with it decisively, undistracted by doubts about its correctness. For example, if you decide to leap a ravine, the moment just before takeoff is a bad time to start reviewing alternative strategies!</p>
</blockquote>



<h3 class="wp-block-heading">Review in the open mode</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>We should once again switch back to the&nbsp;<em>open</em>&nbsp;mode to review the feedback arising from our action, in order to decide whether the course that we have taken is successful […], or whether we should create an alternative plan to correct any error we&#8217;ve perceived, and then back into the closed mode again to implement that next stage. And so on.</p>



<p>To be at our most efficient, we need to be able to switch backwards and forwards between the two modes.</p>



<p>But here&#8217;s the problem: We too often get stuck in the closed mode. Under the pressures which are all too familiar to us, we tend to maintain tunnel vision at times when we really need to step back and contemplate the wider view.</p>



<p>This is particularly true of politicians. The main complaint about them, from their non-political colleagues, is that they become so addicted to the adrenaline that they get from reacting to events on an hour-by-hour basis, that they almost completely lose the desire or the ability to ponder problems in the open mode.</p>



<p>So, as I say:&nbsp;Creativity is not possible in the closed mode. […]</p>
</blockquote>



<h3 class="wp-block-heading">Conditions for the open mode</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>There are certain conditions which make it more likely that you&#8217;ll get into the open mode, and that something creative will occur. More likely. You can&#8217;t guarantee anything will occur. You might sit around for hours, as I did last Tuesday, and nothing, zilch, bupkis, not a sausage.</p>



<p>I can at least tell you how to get yourselves into the open mode. You need five things:</p>



<ol class="wp-block-list">
<li>Space.</li>



<li>Time.</li>



<li>Time.</li>



<li>Confidence.</li>



<li>Humour.</li>
</ol>



<p>[…]</p>
</blockquote>



<h2 class="wp-block-heading">Factor 1: Space</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>You can&#8217;t become playful, and therefore creative, if you&#8217;re under your usual pressures. To cope with them, you&#8217;ve got to be in the closed mode, right? You have to create some space for yourself away from those demands, and that means sealing yourself off.</p>



<p>You must make a quiet space for yourself, where you will be undisturbed.</p>



<p>Next: Time.</p>
</blockquote>



<h2 class="wp-block-heading">Factor 2: Time</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>It&#8217;s not enough to create space. You have to create your space for a&nbsp;<em>specific</em>&nbsp;period of time.</p>



<p>You have to know that your space will last until, exactly, say, 3:30, and that at that moment your normal life will start again.</p>



<p>It&#8217;s only by having a&nbsp;<em>specific</em>&nbsp;moment when your space starts, and an equally specific moment when your space stops, that you can seal yourself off from the everyday closed mode in which we all habitually operate.</p>
</blockquote>



<h3 class="wp-block-heading">Johan Huizinga</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>I&#8217;d never realised how vital this was, until I read a historical study of play, by a Dutch historian called <a href="https://en.wikipedia.org/wiki/Johan_Huizinga">Johan Huizinga</a>. In it, he says:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Play is distinct from ordinary life, both as to locality and duration. This is its main characteristic: its secludedness, its limitedness.</p>



<p>Play begins and then, at a certain moment, it is over. Otherwise, it&#8217;s not play.</p>
</blockquote>
</blockquote>



<h3 class="wp-block-heading">Oasis of Quiet — Not so fast</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Combining the first two factors, we create an Oasis of Quiet, for ourselves, by setting boundaries of space, and of time. Now, creativity can happen, because play is possible, when we are separate from everyday life.</p>



<p>So, you&#8217;ve arranged to take no calls, you&#8217;ve closed your door, you&#8217;ve sat down somewhere comfortable. You take a couple of deep breaths and, if you&#8217;re anything like me, after you&#8217;ve pondered some problem that you want to turn into an opportunity for about 90 seconds, you find yourself thinking: Oh I forgot I&#8217;ve got to call Jim! I must tell Tina that I need the report on Wednesday and not Thursday, which means I must move my lunch with Joe, and […] I must pop out this afternoon to get Will&#8217;s birthday present, and those plants need watering, and none of my pencils are sharpened and&#8230; Right, I&#8217;ve got too much to do, so I&#8217;m going to start by sorting out my paper clips, then I shall make 27 phone calls, and I&#8217;ll do some thinking tomorrow, when I&#8217;ve got everything out of the way.</p>



<p>Because, it&#8217;s easier to do trivial things that are urgent, than it is to do important things that are not urgent, like thinking.</p>



<p>It&#8217;s also easier to do little things we&nbsp;<em>know</em>&nbsp;we can do, than to start on big things that we&#8217;re not so sure about.</p>



<p>So, when I say create an Oasis of Quiet, know that when you have, your mind will pretty soon start racing again, but you&#8217;re not going to take that very seriously. You just sit there, for a bit, tolerating the racing and the slight anxiety that comes with that, and after a time your mind will quieten down again.</p>



<p>Because it takes some time for your mind to quieten down, it&#8217;s absolutely no use arranging a space-time oasis lasting 30 minutes. Just as you&#8217;re getting quieter, and getting into the open mode, you&#8217;ll have to stop, and that is&nbsp;<em>very</em>&nbsp;deeply frustrating. You must allow yourself a good chunk of time. I&#8217;d suggest about an hour and a half. Then, after you&#8217;ve gotten to the open mode, you&#8217;ll have about an hour left for something to happen (if you&#8217;re lucky).</p>



<p>But, don&#8217;t put a whole morning aside. My experience is, after about an hour and a half, you need a break. So it&#8217;s far better to do an hour and a half now, and then an hour and a half next Thursday, and maybe an hour and a half a week after that, than to fix one four-hour session &#8220;now&#8221;.</p>



<p>There&#8217;s another reason, and that&#8217;s factor number three: Time.</p>



<p>Yes, I know we&#8217;ve just done&nbsp;<em>Time</em>, but that was half of creating our Oasis. Now, I&#8217;m going to tell you about how to&nbsp;<em>use</em>&nbsp;the Oasis you&#8217;ve created. Why do you still need time?</p>
</blockquote>



<h2 class="wp-block-heading">Factor 3: Time (really)</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Let me tell you a story. I was always intrigued that one of my Monty Python colleagues, who seemed to me to be more talented than I was, never produced scripts as original as mine. And I watched for some time, and then I began to see why.</p>



<p>If he was faced with a problem, and fairly soon saw a solution, he was inclined to take it. Even though, I think he knew, the solution was not very original. Whereas if I was in the same situation, although I was sorely tempted to take the easy way out and finish by five o&#8217;clock, I just couldn&#8217;t. I&#8217;d sit there, with the problem, for another hour and a quarter, and by sticking to it, would in the end almost always come up with something more original. It was that simple. My work was more creative than his, simply because I was prepared to stick with the problem longer.</p>



<p>So imagine my excitement when I found that this was&nbsp;<em>exactly</em>&nbsp;what MacKinnon found in his research! He discovered that the &#8220;most creative&#8221; professionals always played with the problem for much longer before they tried to resolve it, because they were prepared to tolerate that slight discomfort and anxiety that we all experience when we&nbsp;<em>haven&#8217;t</em>&nbsp;solved a problem. You know what I mean?</p>



<p>If we have a problem and we need to solve it, until we do, we feel it inside us: a kind of internal agitation or tension or uncertainty that makes us just plain uncomfortable. And we want to get rid of that discomfort. So, in order to do so, we take a decision; not because we&#8217;re sure it&#8217;s the best decision, but because taking it will make us feel better.</p>



<p>Well, the most creative people have learned to tolerate that discomfort for much longer. So, just because they put in more pondering time, their solutions are more creative.</p>



<p>The people I find it hardest to be creative with, are people who need (all the time) to project an image of themselves as decisive, and, who feel that to create this image, they need to decide everything very quickly, and with a great show of confidence. This behaviour, I suggest sincerely, is the most effective way of strangling creativity at birth.</p>



<p>Please note, I&#8217;m not arguing against real decisiveness. I&#8217;m 100% in favour of taking a decision when it has to be taken, and then sticking to it while it&#8217;s being implemented. What I&#8217;m suggesting to you, is that before you take a decision, you should always ask yourself the question: When does this decision have to be taken? And having answered that, you defer the decision until then, in order to give yourself maximum pondering time, which will lead you to the most creative solution.</p>



<p>And if, while you&#8217;re pondering, somebody accuses you of indecision say: Look babycakes, I don&#8217;t have to decide till Tuesday and I&#8217;m not chickening out of my creative discomfort by taking a snap decision before then; that&#8217;s too easy!</p>



<p>To summarise, the third factor that facilitates creativity is Time: giving your mind as long as possible to come up with something original.</p>
</blockquote>



<h2 class="wp-block-heading">Factor 4: Confidence</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>The next factor, number four, is Confidence.</p>



<p>When you&#8217;re in your space-time Oasis (getting into the open mode) nothing will stop you being creative so effectively as the fear of making a mistake. If you think about play, you&#8217;ll see why.</p>



<p>To play is to experiment: &#8220;what happens if I do this&#8221;, &#8220;what would happen if we do that&#8221;. The very essence of playfulness is an openness to&nbsp;<em>anything</em>&nbsp;that may happen; a feeling that whatever happens, it&#8217;s okay!</p>



<p>You cannot be playful if you&#8217;re frightened that moving in some direction will be &#8220;wrong&#8221;, something you &#8220;shouldn&#8217;t have done&#8221;. You&#8217;re either free to play, or you&#8217;re not.</p>



<p>As <a href="https://en.wikipedia.org/wiki/Alan_Watts">Alan Watts</a> puts it: &#8220;You can&#8217;t be spontaneous within reason.&#8221;</p>



<p>You&#8217;ve got to risk saying things that are silly, and illogical, and wrong. The best way to get the confidence to do that, is to know that, while you&#8217;re being creative,&nbsp;<em>nothing</em>&nbsp;is wrong; there&#8217;s no such thing as a mistake, and&nbsp;<em>any</em>&nbsp;drivel may lead to the breakthrough.</p>
</blockquote>



<h2 class="wp-block-heading">Factor 5: Humour</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Now the last factor, the fifth, Humour.</p>



<p>I happen to think the main evolutionary significance of humour, is that it gets us from the closed mode to the open mode quicker than anything else.</p>



<p>I think we all know that laughter brings relaxation, and that humour makes us playful. Yet, how many times have important discussions been held, where really original and creative ideas were desperately needed to solve important problems, but where humour was taboo, because the subject being discussed was &#8220;so serious&#8221;? This attitude seems to me to stem from a very basic misunderstanding of the difference between&nbsp;<em>serious</em>&nbsp;and&nbsp;<em>solemn</em>.</p>
</blockquote>



<h3 class="wp-block-heading">Serious does not mean solemn</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>A group of us could be sitting around after dinner, discussing matters that were extremely serious (like the education of our children, our marriages, the meaning of life, &#8230; not talking about <a href="https://en.wikipedia.org/wiki/Monty_Python%27s_The_Meaning_of_Life">the film</a>) and we could be laughing, and that would not make what we were discussing one bit less serious.</p>



<p>Solemnity, on the other hand, I don&#8217;t know what it&#8217;s for. What is the point of it?</p>



<p>The two most beautiful memorial services that I&#8217;ve ever attended, both had a lot of humour. It freed us all, and made the services inspiring and cathartic. But solemnity? It serves pomposity. The self-important [people] always know, at some level of their consciousness, that their egotism is going to be punctured by humour.&nbsp;<em>That&#8217;s</em>&nbsp;why they see it as a&nbsp;<em>threat</em>, and so dishonestly pretend that their deficiency makes their views more substantial, when it only makes&nbsp;<em>them</em>&nbsp;feel bigger.</p>



<p>Humour is an essential part of spontaneity; an essential part of playfulness; an essential part of the creativity that we need to solve problems, no matter how serious they may be.</p>



<p>When you set up a space-time Oasis, giggle all you want!</p>



<p>And there, are the five factors which you can arrange to make your lives more creative:</p>



<ul class="wp-block-list">
<li>Space,</li>



<li>Time,</li>



<li>Time,</li>



<li>Confidence,</li>



<li>and <a href="https://en.wikipedia.org/wiki/Jeffrey_Archer">Lord Jeffrey Archer</a>.</li>
</ul>
</blockquote>



<h2 class="wp-block-heading">Practising the open mode</h2>



<h3 class="wp-block-heading">Pondering</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Now you know how to get into the open mode, the only other requirement is that you keep your mind gently round the subject you ponder. You&#8217;ll daydream, of course, but you just keep bringing your mind back, like with meditation.</p>



<p>The extraordinary thing about creativity is: if you just keep your mind resting against the subject in a friendly but persistent way, sooner or later you will get a reward from your <strong>unconscious</strong>. Probably in the shower later, or at breakfast the next morning, but suddenly you are rewarded, out of the blue a new thought mysteriously appears. If you&#8217;ve put in the pondering time first.</p>
</blockquote>



<h3 class="wp-block-heading">Play requires trust</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>I think it&#8217;s easy to be creative, if you&#8217;ve got other people to play with. I always find that if two or more of us throw ideas backwards and forwards, I get to more interesting and original places than I could ever have got to on my own.</p>



<p>But, there is a danger, a&nbsp;<em>real</em>&nbsp;danger: If there&#8217;s&nbsp;<em>one</em>&nbsp;person around you who makes you feel defensive, you lose the confidence to play, and it&#8217;s goodbye creativity. Always make sure your play-friends are people that you like and trust. Never say anything to squash&nbsp;<em>them</em>, either. Never say &#8220;No&#8221;, or &#8220;Wrong&#8221;, or &#8220;I don&#8217;t like that&#8221;. Always be positive, and build on what&#8217;s been said: &#8220;Would it be&nbsp;<em>even</em>&nbsp;better if &#8230;&#8221;, &#8220;I don&#8217;t quite understand that, can you just explain it again?&#8221;, &#8220;Go on!&#8221;, &#8220;What if &#8230;?&#8221; Let&#8217;s pretend.</p>



<p>Try to establish as free an atmosphere as possible.</p>
</blockquote>



<h3 class="wp-block-heading">Japanese meetings</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Sometimes I wonder, if the success of the Japanese isn&#8217;t partly due to their instinctive understanding of how to use groups creatively. You know, Westerners are often amazed at the unstructured nature of Japanese meetings.</p>



<p>But maybe it&#8217;s that very lack of structure, that absence of time pressure, that frees them to solve problems so creatively. And how clever of the Japanese, sometimes to plan that unstructuredness by, for example, insisting that the first people to give their views are the most junior. So that they can speak freely, without the possibility of contradicting what&#8217;s already been said by somebody more important.</p>
</blockquote>



<h3 class="wp-block-heading">Connect two ideas in a new way</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>The very last thing that I can say about creativity is this: It&#8217;s like humour. In a joke, the laugh comes at a moment when you connect two different frameworks of reference in a new way.</p>



<p>For example there&#8217;s the old story about a woman, doing a survey into sexual attitudes, who stops an airline pilot and asks him when he last had sexual intercourse. He replies &#8220;1958&#8221;. Now, knowing airline pilots, the researcher is surprised and queries this. &#8220;Well&#8221;, says the pilot, &#8220;it&#8217;s only 21:10 now&#8221;.</p>



<p>We laugh at the moment of contact between two frameworks of reference: the way we express what year it is, and the 24-hour clock.</p>



<p>Having an idea, a new idea, is exactly the same thing. It&#8217;s connecting two separate ideas in a way that generates new meaning. Now, connecting different ideas isn&#8217;t difficult; you can connect cheese with motorcycles, or moral courage with light green, or bananas with international cooperation. You can get any computer to make a billion random connections for you, but&nbsp;<em>these</em>&nbsp;new connections or juxtapositions are significant&nbsp;<em>only</em>&nbsp;if they generate new meaning.</p>



<p>As you play, you can deliberately try inventing these random juxtapositions, and use your intuition to tell you whether any of them seem to have significance for you.&nbsp;<em>That&#8217;s</em>&nbsp;the bit the computer can&#8217;t do. It can produce millions of new connections, but it can&#8217;t tell which one of them&nbsp;<em>smells</em>&nbsp;interesting. Of course, you&#8217;ll produce some juxtapositions which are absolutely ridiculous. Absurd. Good for you!</p>



<p><a href="https://en.wikipedia.org/wiki/Edward_de_Bono">Edward de Bono</a>, who invented the notion of <em>lateral thinking</em>, specifically suggests in his book <em>Po: Beyond Yes and No</em>, that you can try loosening up your assumptions by playing with deliberately crazy connections. He calls such absurd ideas &#8220;intermediate impossibles&#8221;. He points out that the use of an intermediate impossible, is completely contrary to ordinary logical thinking, in which you have to be right at each stage. It doesn&#8217;t matter if the intermediate impossible is right or absurd, it can nevertheless be used as a stepping stone to another idea that <em>is</em> right. Another example of how when you&#8217;re playing, nothing is wrong.</p>



<p>If you really don&#8217;t know how to start, or if you&#8217;ve got stuck, start generating random connections and allow your intuition to tell you if one might lead somewhere interesting.</p>



<p>That really is all I can tell you that won&#8217;t help you to be creative. Everything.</p>
</blockquote>



<h2 class="wp-block-heading">How to kill creativity</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>[…] The important part. And that is: How to stop your subordinates becoming creative — which is the real threat.</p>



<p>Believe me, no one appreciates better than I do what trouble creative people are, and how they stop decisive hard-nosed bastards like us from running businesses efficiently. We encourage someone to be creative, the next thing is they&#8217;re rocking the boat, coming up with ideas, and asking&nbsp;<em>us</em>&nbsp;questions.</p>



<p>If we don&#8217;t nip this kind of thing in the bud, we&#8217;ll have to start justifying our decisions by reasoned argument. And sharing information, the concealment of which gives us considerable advantages in our power struggle.</p>



<p>So, here&#8217;s how to stamp out creativity in the rest of the organisation, and get a bit of respect going.</p>
</blockquote>



<h3 class="wp-block-heading">Allow no humour</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>One: Allow subordinates no humour.</p>



<p>It threatens your self-importance, especially your omniscience. Treat all humour as frivolous or subversive. Because subversive is, of course, what humour will be in your setup, as it&#8217;s the only way that people can express their opposition, since if they express it openly you&#8217;re down on them like a ton of bricks.</p>



<p>So, let&#8217;s get this clear: Blame humour for the resistance that your way of working creates. Then, you don&#8217;t have to blame your way of working. This is important, and I mean that solemnly: your dignity is no laughing matter.</p>
</blockquote>



<h3 class="wp-block-heading">Undermine confidence</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Second: Keeping ourselves feeling irreplaceable, involves cutting everybody else <a href="https://en.wiktionary.org/wiki/cut_down_to_size">down to size</a>.</p>



<p>Don&#8217;t miss an opportunity to undermine your employees&#8217; confidence. A perfect opportunity comes when you&#8217;re reviewing work that they&#8217;ve done: Use your authority to zero in&nbsp;<em>immediately</em>&nbsp;on all the things you can find wrong.</p>



<p>Never, never, balance the negatives with positives. Only criticise, just as your school teachers did.</p>



<p>Always remember: Praise makes people uppity!</p>
</blockquote>



<h3 class="wp-block-heading">Demand urgency</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Third: Demand that people should always be actively&nbsp;<em>doing</em>&nbsp;things.</p>



<p>If you catch anybody pondering, accuse them of laziness and/or indecision. This is to starve employees of thinking time, because that leads to creativity, and insurrection.</p>



<p>Demand urgency at all times. Use lots of fighting talk and war analogies. Establish a permanent atmosphere of stress, of breathless anxiety, and crisis.</p>



<p>In a phrase: Keep that mode closed!</p>



<p>Now, in this way, we no-nonsense types can be sure that the tiny, tiny, microscopic, quantity of creativity in our organisation will all be ours!</p>



<p>But, let your vigilance slip for one moment, and you could find yourself surrounded by&nbsp;happy,&nbsp;enthusiastic, and&nbsp;creative&nbsp;people whom you might never be able completely to control, ever again.</p>



<p>So be careful! Thank you, and good night.</p>
</blockquote>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2026/john-cleese-on-creativity-transcript/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20John%20Cleese%20on%20Creativity%20%28Transcript%29&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2026%2Fjohn-cleese-on-creativity-transcript%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>📎 Unifying Wikipedia mobile and desktop domains</title>
		<link>https://timotijhof.net/posts/2025/unifying-mobile-and-desktop-domains/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Mon, 24 Nov 2025 14:30:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[Linked]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/?p=1362</guid>

					<description><![CDATA[Until now, when you visited a wiki (like en.wikipedia.org), the server responded in one of two ways: a desktop page, or a redirect to the equivalent mobile URL (like en.m.wikipedia.org). This mobile URL in turn served the mobile version of the page. All wikis now serve mobile page views on the canonical domain, instead of…]]></description>
										<content:encoded><![CDATA[
<p>Until now, when you visited a wiki (like <code>en.wikipedia.org</code>), the server responded in one of two ways: a desktop page, or a redirect to the equivalent mobile URL (like <code>en.m.wikipedia.org</code>). This mobile URL in turn served the mobile version of the page.<br><br>All wikis now serve mobile page views on the canonical domain, instead of via a redirect.<br><br>The change improved mobile response time by 20% worldwide, un-broke Commons SEO, and fixed a long-standing UX issue with opening shared links on desktop. Read more about this on the Wikimedia Blog:</p>



<p><a href="https://techblog.wikimedia.org/2025/11/21/unifying-mobile-and-desktop-domains/">→ techblog.wikimedia.org</a></p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2025/unifying-mobile-and-desktop-domains/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20%F0%9F%93%8E%20Unifying%20Wikipedia%20mobile%20and%20desktop%20domains&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2025%2Funifying-mobile-and-desktop-domains%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>YouTube in a feed reader is&#8230; better?</title>
		<link>https://timotijhof.net/posts/2025/youtube-in-a-feed-reader-is-better/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Sat, 17 May 2025 13:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<guid isPermaLink="false">https://timotijhof.net/?p=900</guid>

					<description><![CDATA[Two months ago, I deleted my YouTube subscriptions. I now follow YouTube channels in my feed reader instead (I use the NetNewsWire app). How does that work? Is it better? How to follow a channel On desktop, or on the mobile site, copy from the address bar when on any channel page, or from the…]]></description>
										<content:encoded><![CDATA[
<p>Two months ago, I deleted my YouTube subscriptions. I now follow YouTube channels in my feed reader instead (<a href="https://timotijhof.net/posts/2020/uses-this/">I use</a> the <a href="https://netnewswire.com">NetNewsWire app</a>). How does that work? Is it better?</p>



<span id="more-900"></span>



<h2 class="wp-block-heading">How to follow a channel</h2>



<ol class="wp-block-list">
<li>Copy link to the YouTube channel.</li>



<li>Paste into your feed reader.</li>



<li>That&#8217;s it!</li>
</ol>



<p>On desktop, or on the mobile site, copy from the address bar when on any channel page, or from the share sheet, or copy a link to any channel in the search&nbsp;results (via right-click or long-press).</p>



<p>In the YouTube mobile app you can get the link via the &#8220;Copy link&#8221; button. Today, that sits in the unlabeled three-dotted &#8220;Share&#8221; menu. </p>



<div class="wp-block-group is-content-justification-center is-layout-flex wp-container-core-group-is-layout-64b26803 wp-block-group-is-layout-flex">
<figure class="wp-block-image size-full is-resized"><img fetchpriority="high" decoding="async" width="713" height="781" src="https://timotijhof.net/wp-content/uploads/2024_youtube_link_app.png" alt="" class="wp-image-902" style="width:300px"/><figcaption class="wp-element-caption">Share from YouTube app.</figcaption></figure>



<figure class="wp-block-image size-full is-resized"><img decoding="async" width="960" height="512" src="https://timotijhof.net/wp-content/uploads/2024_youtube_link_desktop.png" alt="" class="wp-image-903" style="width:400px"/><figcaption class="wp-element-caption">Copy the address from a browser tab.</figcaption></figure>



<figure class="wp-block-image size-full is-resized"><img decoding="async" width="680" height="313" src="https://timotijhof.net/wp-content/uploads/2024_youtube_link_search.png" alt="" class="wp-image-946" style="width:350px"/><figcaption class="wp-element-caption">YouTube search result.</figcaption></figure>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="1337" height="683" src="https://timotijhof.net/wp-content/uploads/2024_youtube_netnewswire_add.png" alt="" class="wp-image-919" style="width:400px;height:auto"/></figure>
</div>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">Reader experience</h2>



<p>How does the feed reader experience compare to YouTube&#8217;s own &#8220;Subscriptions&#8221; page?</p>



<div class="wp-block-group is-content-justification-center is-layout-flex wp-container-core-group-is-layout-64b26803 wp-block-group-is-layout-flex">
<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="2536" height="1667" src="https://timotijhof.net/wp-content/uploads/2024_youtube_netnewswire_read.png" alt="" class="wp-image-920" style="width:700px"/></figure>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="750" height="1288" src="https://timotijhof.net/wp-content/uploads/2024_youtube_netnewswire_read_ios_light.png" alt="" class="wp-image-1221" style="width:auto;height:430px"/></figure>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="750" height="1288" src="https://timotijhof.net/wp-content/uploads/2024_youtube_netnewswire_read_ios_dark.png" alt="" class="wp-image-1222" style="width:auto;height:430px"/></figure>
</div>



<p>I used to triage the YouTube Subscriptions page by either clicking &#8220;Hide&#8221; on videos I&#8217;m not interested in, via the three-dot menu on YouTube, or by adding videos to a &#8220;Watch Later&#8221; playlist. This regularly breaks and causes discomfort in a number of ways. By using a feed reader, we get:</p>



<p><strong>Fast and efficient triage.</strong> I now spend less time &#8220;managing&#8221; my YouTube stuff. I quickly swipe past videos I&#8217;m not interested in, each automatically marked as read. Videos to watch later, I star. Or, I watch &#8217;em right there with fullscreen and picture-in-picture support! (Works even without the YouTube app!) If I want to do something with the video on YouTube, it&#8217;s one tap on the post title (or the big &#8220;Watch on YouTube&#8221; button), and e.g. add to any playlist, or stream to a Chromecast or Smart&nbsp;TV.</p>



<p><strong>No sense of urgency</strong>. I am happy to no longer feel compelled to regularly open the YouTube app &#8220;just in case&#8221;, and am no longer urged to triage new videos from the YouTube Subscriptions page before they disappear. (YouTube deletes stuff there after a few weeks.) I can now trust that new uploads are reliably delivered, and never lost.</p>



<p><strong>Reclaimed sense of agency</strong>. Native apps tend to make it hard to finish a thought when you open them, by presenting you with options or otherwise distracting you. Now, I only end up in the app via a specific video link from the feed reader. This means <em>I</em> have decided what to do, <em>and</em> the technology knows my intent, so the app opens and goes straight to that one video. There is no &#8220;Home&#8221; feed or &#8220;Shorts&#8221; page to navigate past. (In case you&#8217;re interested, I describe further down how to disable &#8220;Home&#8221; and &#8220;Shorts&#8221; in the YouTube mobile app more generally.)</p>



<h3 class="wp-block-heading">Behind the scenes</h3>



<p>This is all possible because YouTube implements two open standards: it provides feeds in the <a href="https://en.wikipedia.org/wiki/RSS">RSS format</a>, and a <a href="https://blog.whatwg.org/feed-autodiscovery">discovery link</a> that lets you follow the channel from its web page (without needing to know about or find the URL to an RSS file).</p>
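


<p>To make the discovery step concrete, here is a minimal sketch of what a feed reader might do once it has loaded a channel page. It assumes the page&#8217;s link elements have already been parsed into plain objects; the <code>discoverFeeds</code> function, the <code>FEED_TYPES</code> list, and the example URLs are my own illustration, not how NetNewsWire or any particular reader implements it.</p>

```javascript
// Feed MIME types commonly used in autodiscovery links.
const FEED_TYPES = ['application/rss+xml', 'application/atom+xml'];

/**
 * Pick feed URLs from a page's link elements.
 * `links` is a hypothetical, already-parsed list of { rel, type, href }
 * attribute objects; a real reader would get these from an HTML parser.
 */
function discoverFeeds(pageUrl, links) {
  return links
    // Keep only links whose rel attribute contains the "alternate" keyword.
    .filter((link) => String(link.rel).toLowerCase().split(/\s+/).includes('alternate'))
    // Keep only links that declare a feed MIME type.
    .filter((link) => FEED_TYPES.includes(String(link.type).toLowerCase()))
    // Resolve relative hrefs against the page URL.
    .map((link) => new URL(link.href, pageUrl).href);
}

// Example input: illustrative values only, not copied from a real page.
const feeds = discoverFeeds('https://www.youtube.com/@example', [
  { rel: 'stylesheet', type: 'text/css', href: '/s/main.css' },
  { rel: 'alternate', type: 'application/rss+xml', href: '/feeds/videos.xml?channel_id=UC_EXAMPLE' },
]);
console.log(feeds);
```

<p>The point of the sketch is that the reader, not you, finds the feed URL: you paste the channel link, and the discovery link does the rest.</p>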



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">Pet peeves of the app</h2>



<p>Until recently, the main way I used YouTube (both via its website on desktop, and through its mobile app) was through the &#8220;Subscriptions&#8221; page.</p>



<h3 class="wp-block-heading">&#8220;Home&#8221; page</h3>


<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="750" height="1288" src="https://timotijhof.net/wp-content/uploads/2024_youtube_clean_home.png" alt="YouTube mobile app, showing an empty &quot;Home&quot; page with just a big search field and nothing else." class="wp-image-1228" style="width:auto;height:322px"/><figcaption class="wp-element-caption"><strong>What a delight! </strong></figcaption></figure>
</div>


<p>I&#8217;ve always disabled watch history on my YouTube account. <a href="https://www.theverge.com/2023/8/8/23824672/youtube-blank-homepage-watch-history" data-type="link" data-id="https://www.theverge.com/2023/8/8/23824672/youtube-blank-homepage-watch-history">As of 2023</a>, YouTube no longer offers non-personalised recommendations to logged-in users through the Home page. That means my YouTube &#8220;Home&#8221; page is now a clean landing page with nothing but a welcoming search bar.</p>



<p>It took YouTube ten years to decide this. I wonder if they thought the semi-personal recommendations were not useful (they seemed fine to me?), or whether YouTube is simply becoming more honest and bold in pushing their preferred economic transaction (use the platform in exchange for your consent to store and analyze your watch history, even if paying for Premium. If you disable watch history, they intentionally try to make it worse?). I don&#8217;t miss it, but I didn&#8217;t mind it either.</p>



<h3 class="wp-block-heading">How to disable YouTube Shorts, for real!</h3>



<p>Around the same time in 2023, YouTube decided to no longer let logged-in users access the endless Shorts feature via the YouTube app, unless you enable watch history. That&#8217;s been an absolute blessing. I miss <em>nothing</em> there.</p>



<p>Except perhaps the transparency. I would sometimes study what it serves to other people. Note that the endless Shorts feed is still available via the website when logged-out, so the generic version of this feature remains available there for anthropological research.</p>



<h3 class="wp-block-heading">Perennial breaking of &#8220;Hide&#8221;</h3>



<p>The &#8220;Hide&#8221; option on the subs page lets you maintain a list of videos from channels you follow. This UI feature on YouTube is buggy. It breaks all the time, and Google takes months to prioritize fixing it. I remember when YouTube Shorts was introduced and force-fed throughout the platform, the &#8220;Hide&#8221; button for Shorts on the subs page did nothing. Google probably didn&#8217;t intentionally launch Shorts with a broken &#8220;Hide&#8221; button. But, the lack of test coverage and lack of bug priority are a direct consequence of internal success metrics at YouTube — directing engineering teams toward what is valued and rewarded by management, and away from what is not.</p>



<h3 class="wp-block-heading">Unreliable delivery</h3>



<p>YouTube&#8217;s subscription system is famously unreliable. It is a decade-old meme at this point. Their system might report some &#8220;9s&#8221; after 99.9% internally, but it is <a href="https://timotijhof.net/posts/2018/measuring-wikipedia-page-load-times/#how-to-measure-percentiles">expected that on a service used by millions, this bug affects everyone</a>. People I talk to are affected multiple times a year. And, it doesn&#8217;t self-correct! Compare this to texting or emailing: When have you not received a message addressed to you? I don&#8217;t mean arrive late or miss the notification, but never arrive in your inbox? I suspect YouTube implements their subscription system such that new videos are individually added to a separate queue for each subscriber. And, if the stars don&#8217;t align for all millions of those one-shot attempts, there is no retry, and no on-demand detection or reconstruction. This is good enough for an algorithmic feed, but not for a personal subscription system.</p>



<p>YouTube is not in the business of delivering what you expect or ask for (unlike Netflix, Apple TV, or linear television). It is in the business of eyeball retention, by serving up whatever is &#8220;good enough&#8221; to keep users in the app. Step one: Minimize your use of the app.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p id="up1"><strong>Update (18 June 2025)</strong>: YouTube&#8217;s original RSS feeds contain only a linked title. Many feed reader services (like Feedbin, NewsBlur, FreshRSS, Inoreader, and Tiny Tiny RSS) detect YouTube feeds and <a href="https://github.com/Ranchero-Software/NetNewsWire/issues/3683#issuecomment-2990216544">enhance them by appending a video iframe</a> and description text. I use <strong>NetNewsWire with Feedbin</strong> as sync service, which yields the screenshots above. While some feed reader apps do the same locally, NetNewsWire doesn&#8217;t yet (<a href="https://github.com/Ranchero-Software/NetNewsWire/issues/3683">Feature request</a>). It works for me because I combine it with Feedbin. Without that, the feed entries only have a title linking to YouTube.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2025/youtube-in-a-feed-reader-is-better/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20YouTube%20in%20a%20feed%20reader%20is%26%238230%3B%20better%3F&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2025%2Fyoutube-in-a-feed-reader-is-better%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Lockfiles for apps, not packages (still)</title>
		<link>https://timotijhof.net/posts/2024/lockfiles-for-apps-not-packages-still/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Thu, 12 Sep 2024 23:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[JavaScript]]></category>
		<guid isPermaLink="false">https://timotijhof.net/?p=961</guid>

					<description><![CDATA[TL;DR: My updated take is Lockfiles for Node.js apps, not for other projects. When you run npm install, after you add or change a dependency in package.json, npm finds and selects the latest compatible version, downloads it, and replaces your package-lock.json file to describe what it found. The npm install command does not consider lockfiles…]]></description>
										<content:encoded><![CDATA[
<p>TL;DR: My updated take is <strong>Lockfiles for Node.js apps, not for other projects.</strong></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>When you run <code>npm install</code>, after you add or change a dependency in <code>package.json</code>, npm finds and selects the latest compatible version, downloads it, and replaces your <code>package-lock.json</code> file to describe what it found.</p>



<p>The <code>npm install</code> command does not consider lockfiles from upstream packages you depend on. This is not a bug. It&#8217;s by design. The <code>npm publish</code> command explicitly omits lockfiles from any package.<sup id="fnr1" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2024/lockfiles-for-apps-not-packages-still/#fn1" title="Jump to footnote 1">[1]</a></sup></p>



<p>This and other factors led Sindre Sorhus (@sindresorhus), author of some of the most well-known and popular packages on npm, to <a href="https://github.com/sindresorhus/ama/issues/479#issuecomment-310661514">adopt this policy in 2017</a>:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="has-large-font-size">Lockfiles for apps, but not for packages.</p>
</blockquote>



<p>This was in response to <a href="https://blog.npmjs.org/post/161081169345/v500">npm enabling package-lock.json</a> in the npm 5.0 release.</p>



<h2 class="wp-block-heading">Lockfiles are useful</h2>



<p>Over the past decade, I found lockfiles to really shine and be &#8220;worth it&#8221; when:</p>



<ul class="wp-block-list">
<li>You maintain a Node.js-based application that you deploy as a finished product. Or;</li>



<li>You maintain a command-line application that developers should install globally on their workstation, via <code>npm install -g</code>. </li>
</ul>



<p>When developing a Node.js-based service, you can commit a <code>package-lock.json</code> file alongside it. Combine this with a production deployment that runs <code>npm ci</code> (instead of <code>npm install</code>), and you can safely deploy changes (especially rollbacks, time-sensitive reverts after a faulty deployment) to your service <em>without</em> untimely updates to dependencies piggybacking as part of your deployment. There are other and better ways to accomplish this, but lockfiles are a decent start. In this case, I&#8217;d also run <code>npm shrinkwrap</code>, which renames the lockfile to <code>npm-shrinkwrap.json</code>. That clearly communicates that this lockfile is tied to your application&#8217;s deployment. But, any lockfile will do for this use case.</p>



<p>When installing a package globally, e.g. <code>npm install -g fresnel</code>, npm can consider an upstream lockfile. Such an upstream package <em>must</em> supply a shrinkwrap for npm to consider it. And, npm can only utilize it when installing the package standalone, i.e. globally. When developing an end-user application that you expect developers to install via <code>npm&nbsp;install&nbsp;‑g</code>, by all means use a lockfile. Any lockfile that isn&#8217;t &#8220;shrinkwrapped&#8221; won&#8217;t be published by npm as part of your package, and thus cannot benefit installations.</p>



<h2 class="wp-block-heading">Global dependencies</h2>



<p>Back in the early 2010s, it was common to find projects that couldn&#8217;t locally pass linters and tests, because they assumed a different version of JSHint or ESLint than the one I had installed for another project I contribute to. These kinds of problems tormented many frontend developers when they first dabbled in CLI and server-side scripting. They would have their project rely on globally installed tools and, invariably, on a specific (yet undocumented) version.</p>



<p>Over the past decade, the Node.js ecosystem has slowly learned its lesson. Packages now generally take care of their own dev tooling. In <code>package.json</code>, each package declares the relevant dev dependencies. We use <code>"scripts"</code> entries to execute commands like <code>eslint</code>, <code>qunit</code>, or <code>grunt</code>. This is especially convenient given that the commands of any dependency can be used directly in <code>"scripts"</code>. You need not specify the path to <code>node_modules</code> or call <code>npx</code> here.<sup id="fnr2" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2024/lockfiles-for-apps-not-packages-still/#fn2" title="Jump to footnote 2">[2]</a></sup></p>



<h2 class="wp-block-heading">Benefits and costs</h2>



<p>Most repositories containing a package.json file are either:</p>



<ol class="wp-block-list">
<li>packages published to npm, for use as dependency in another project, or</li>



<li>projects that use Node.js tooling during development only — such as PHP, Ruby, Python, or C++ projects that may use tools like ESLint and QUnit for frontend testing. This includes Composer packages, WordPress plugins, and MediaWiki extensions.</li>
</ol>



<p>Note that neither of these falls under the categories outlined earlier (Node.js services, and Node.js global tools), and thus neither has any use for a lockfile. However, as maintainer, a lockfile costs you in busywork, support tickets, and sunk costs that draw you further into justifying other equally-fruitless busywork.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="1024" height="683" src="https://timotijhof.net/wp-content/uploads/2024_lockfiles_faucet.jpg" alt="Open faucet splashing water from a fountain." class="wp-image-1154"/><figcaption class="wp-element-caption">In Dutch we have the idiom &#8220;<span lang="nl" translate="no">dweilen met de kraan open</span>&#8221;, to mop while the tap is running. This perfectly captures the idea of a <a href="https://en.wikipedia.org/wiki/Boondoggle">boondoggle</a> and <a href="https://en.wikipedia.org/wiki/Busy_work">busy work</a> more generally. (Image via <a href="https://commons.wikimedia.org/wiki/File:Water_Fountain_Frisch.jpg">Wikimedia Commons</a>)</figcaption></figure>
</div>


<h3 class="wp-block-heading">Security updates</h3>



<p>Okay, so you&#8217;re working on a project or package where you ostensibly don&#8217;t need a package-lock.json file. Can this impact security?</p>



<p>For packages, we&#8217;ve already established that the lockfile can&#8217;t benefit your users. Hence, it does not delay or provide any protection from problematic updates. When they install your package, npm selects the latest compatible version of your dependencies. To pin a dependency, you have to pin it in <code>package.json</code>. This is best paired with a general reduction in risk by <a href="https://timotijhof.net/posts/2019/protect-yourself-from-npm/#fewer-dependencies">reducing your dependencies</a>. Either way, a lockfile cannot help you.</p>



<p>Okay, what about you? Does it help you as maintainer?</p>



<p>For maintainers and contributors to your project, the first install downloads dependencies over the network, either way. Subsequent installs resolve versions against the online registry, then utilize the local npm cache, either way. Lockfiles accomplish nothing but a constant stream of patches (and conflicts) to said lockfile, to keep it identical to&#8230; how <code>npm install</code> leaves it. Also, notice what just happened. Yes, when you have a lockfile and run <code>npm install</code>, it changes. That&#8217;s because npm isn&#8217;t required to follow it. You could locally run <code>npm ci</code>, which does. However, assuming you semi-automatically update the lockfile regularly, what&#8217;s the difference? Have you ever <em>not</em> merged a patch that updates a lockfile to match <code>npm install</code>? Any issue captured by that would still be experienced by people using <code>npm install</code>, which is most people.</p>



<h3 class="wp-block-heading">Pinning dependencies</h3>



<p>Perhaps you have scars from a badly behaved dependency that broke compatibility in a semver-minor release. I know I do. It&#8217;s rare, but it happens. Lockfiles are an ineffective approach to pinning dependencies, though, as they aren&#8217;t applied in most cases, and get overwritten the very next time anyone runs <code>npm install</code>.</p>



<p>A more effective solution, even if you do utilize a lockfile, is to pin dependencies in package.json first.</p>



<p>I like to use <a href="https://docs.npmjs.com/cli/v10/configuring-npm/package-json#overrides">the &#8220;overrides&#8221; key</a>, to further separate these from my own dependencies.</p>



<h3 class="wp-block-heading">npm audit</h3>



<p><code>npm audit</code> is great, <a href="https://overreacted.io/npm-audit-broken-by-design/">mostly</a>, and works regardless of whether you commit a lockfile.</p>



<h3 class="wp-block-heading">Dependency update notifications</h3>



<p>Perhaps you use GitHub Dependabot, or <a href="https://www.mediawiki.org/wiki/LibUp">Wikimedia LibUp</a>. Whether for security, or for other reasons, it&#8217;s useful to learn about available software updates, right? Yes! And, the great thing is, these work <em>even better</em> on <code>package.json</code> — without lockfile.</p>



<p>GitHub scans for CVEs in indirect dependencies. It scans <code>package.json</code> too, and knows about affected packages and their downstream dependants. By not checking in your lockfile, it will inform you if, and only if, a change to <code>package.json</code> is needed. In most cases, a CVE or other bug is fixed in a patch release, and your package.json (or the one of the intermediary package) has a caret or tilde version that expands automatically to the newer version. By definition, if Dependabot only changed package-lock.json, then the change didn&#8217;t need to be made<sup id="fnr3" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2024/lockfiles-for-apps-not-packages-still/#fn3" title="Jump to footnote 3">[3]</a></sup>. Whether you change the lockfile or not, anyone installing your project was already getting the update. The lockfile is ignored by npm-install, and isn&#8217;t part of your package. The lockfile merely describes what <code>npm install</code> last did.</p>



<p>Suppose your project uses <code>eslint</code> and <code>@typescript-eslint/parser</code>, which has an indirect dependency on <a href="https://www.npmjs.com/package/micromatch">micromatch</a>. Then, a <a href="https://github.com/micromatch/micromatch/blob/4.0.8/CHANGELOG.md">CVE emerges</a>. The intermediary package uses a tilde or caret version, and the patch release is compatible and in-range. With a lockfile, you&#8217;d get notified and &#8220;have to&#8221; merge a patch to update your lockfile. Without a lockfile, this is a non-event as npm install was already installing said update. Okay, that one was easy.</p>



<p>Suppose the intermediary dependency pinned micromatch to an exact version (or maybe the fix was outside its semver range). To get this update, you&#8217;ll need to upgrade <code>@typescript-eslint/parser</code>. And you can, because GitHub Dependabot scans your <code>package.json</code>, notifying you of package versions you rely on that have insecure dependencies. By removing the lockfile, it now only notifies you when your own dependencies are affected and/or when you have to use a newer version of your dependencies to obtain the update.</p>
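


<p>The two scenarios above can be sketched in a few lines. This is a deliberately simplified matcher, assuming plain <code>major.minor.patch</code> versions with a major version of 1 or higher; npm itself relies on the node-semver library, which covers far more cases. The <code>parse</code> and <code>satisfies</code> helpers and the version numbers are illustrative assumptions on my part, not taken from any real dependency tree.</p>

```javascript
// Simplified sketch of caret/tilde matching. npm itself uses the
// node-semver library, which also handles pre-releases, 0.x caret
// rules, and compound ranges. None of that is modelled here.
function parse(version) {
  const [major, minor, patch] = version.split('.').map(Number);
  return { major, minor, patch };
}

function satisfies(version, range) {
  const op = range[0];
  const isRange = op === '^' || op === '~';
  // No operator: treat the range as an exact pin.
  if (!isRange) return version === range;

  const v = parse(version);
  const base = parse(range.slice(1));
  if (v.major !== base.major) return false;

  if (op === '~') {
    // Tilde: same minor, equal-or-later patch.
    if (v.minor !== base.minor) return false;
    return v.patch >= base.patch;
  }
  // Caret: any later minor, or same minor with an equal-or-later patch.
  if (v.minor !== base.minor) return v.minor > base.minor;
  return v.patch >= base.patch;
}

// Hypothetical: the fix ships as 4.0.8, the intermediary asked for 4.0.5.
console.log(satisfies('4.0.8', '^4.0.5')); // caret range
console.log(satisfies('4.0.8', '~4.0.5')); // tilde range
console.log(satisfies('4.0.8', '4.0.5'));  // exact pin
```

<p>The first two calls return <code>true</code>: the patch release is in range, so a fresh <code>npm install</code> picks it up with no action on your part. The last call returns <code>false</code>: with an exact pin, the fix only arrives once the intermediary package publishes a release that asks for it.</p>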



<p>Adding a lockfile in this scenario only serves to invite noise and churn over already-solved issues. In the event of malicious activity and compromised packages, the company behind npmjs.com (Microsoft/GitHub) deletes those releases from the registry. This isn&#8217;t what npm audit or lockfiles are for.</p>



<p>We all care about security. <a href="https://timotijhof.net/posts/2023/wikimedia-balances-security-and-openness/">I care about security</a>. But, be wary of performative security, which can cost you valuable code review time, CI resources, and support tickets (from users who mistakenly think you must update your lockfile to help them, when actually they need to update their own).</p>



<p>Except when deploying Node.js apps, a lockfile brings you nothing but lost opportunities.</p>



<h2 class="wp-block-heading">Further reading</h2>



<ul class="wp-block-list">
<li><a href="https://timotijhof.net/posts/2023/wikimedia-balances-security-and-openness/"><em>Practicing Security At Wikimedia</em></a>, Timo Tijhof, 2023.</li>



<li><a href="https://overreacted.io/npm-audit-broken-by-design/"><em>npm audit: Broken by Design</em></a>, Dan Abramov, 2021.</li>



<li><a href="https://www.economist.com/business/2022/01/07/the-rise-of-performative-work"><em>Office theatrics: The rise of performative work</em></a> (<a href="https://archive.is/0eG7w">archive</a>), The Economist, 2022.</li>
</ul>




<hr><div class="footnotes" role="doc-endnotes">Footnotes:<ol><li id="fn1" role="doc-endnote">npm publish uses <a href="https://github.com/npm/npm-packlist/blob/cb4a823cd42d50475a8e1e7582b95b15766f5ca2/lib/index.js#L290">@npm/npm-packlist</a> which specifically excludes any package-lock.json file when creating the package tarball, before uploading it to the npm Registry. <a href="#fnr1" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn2" role="doc-endnote">In other words, when you execute commands from <code>package.json</code> via <code>npm test</code> or <code>npm run</code>, the <code>node_modules/.bin/</code> directory inside your current working directory is automatically included in the shell <code>PATH</code>. <a href="#fnr2" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn3" role="doc-endnote">If you do maintain a lockfile, please, do not release updates to your package that only bump indirect dependencies in the lockfile. It is literally a no-op given that lockfiles are explicitly excluded from the package. <a href="#fnr3" role="doc-backlink" title="Jump back">↩︎</a></li></ol></div><hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2024/lockfiles-for-apps-not-packages-still/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Lockfiles%20for%20apps%2C%20not%20packages%20%28still%29&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2024%2Flockfiles-for-apps-not-packages-still%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How we balance security and openness at Wikimedia</title>
		<link>https://timotijhof.net/posts/2023/wikimedia-balances-security-and-openness/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Thu, 19 Oct 2023 20:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/?p=830</guid>

					<description><![CDATA[How does an open philosophy jibe with best practices in performance and security? In short, we&#8217;re selective in our dependencies and audit our own upstream sources. Progressive enhancement not only makes for a fast and accessible site, I argue it&#8217;s also the cheaper choice in the long run! Background The Wikimedia Foundation is the non-profit…]]></description>
										<content:encoded><![CDATA[
<p>How does an open philosophy jibe with best practices in performance and security? In short, we&#8217;re selective in our dependencies and audit our own upstream sources. Progressive enhancement not only makes for a fast and accessible site, I argue it&#8217;s also the cheaper choice in the long run!</p>



<span id="more-830"></span>



<h2 class="wp-block-heading">Background</h2>



<p>The Wikimedia Foundation is the non-profit that hosts Wikipedia and other free knowledge and open data projects. These projects are made possible by a global community who, together with the Foundation, comprise the &#8220;Wikimedia movement&#8221;. The Wikimedia movement is united by a vision: to bring about a world in which every single human being can freely share in the sum of all knowledge.</p>



<p>I&#8217;ve worked at the Wikimedia Foundation for over 10 years, first as a front-end developer and eventually as part of the Performance Team.</p>



<p>The Wikimedia movement is rooted in the culture of <a href="https://en.wikipedia.org/wiki/Free_and_open-source_software">freely licensed software</a>. The MediaWiki application that Wikipedia runs on, and all other software developed at the Foundation, is open source. That includes the <a href="https://diff.wikimedia.org/2011/09/19/ever-wondered-how-the-wikimedia-servers-are-configured/">configuration and datacenter automation</a> of our web servers, databases, and CDN service. The Wikimedia community and any other individual or organization may inspect, contribute to, reuse for themselves, or fork any aspect of the platform at any time. This philosophy is also the basis of long-standing security practices which support visibility and openness.</p>



<h2 class="wp-block-heading">Security through visibility and trust</h2>



<p>We live in an incredible world. Today, most online devices are powered by open source. Whether it is the data centers of video streaming giants and social media sites, or your smartphone, they likely run an open source operating system like Linux or a BSD derivative<sup id="fnr1" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2023/wikimedia-balances-security-and-openness/#fn1" title="Jump to footnote 1">[1]</a></sup>. The vast majority of websites are also built with open source tools, or run on open source platforms. When you build on existing software that is developed by another organization or community, that software and its maintainers are called your &#8220;upstream&#8221;.</p>



<p>The Wikimedia Foundation relies heavily on upstream technology to power its platforms. This allows the organization to focus on its core mission of providing free knowledge to the world, rather than on developing and maintaining technology from scratch. Additionally, by collaborating with other open source projects, the Foundation is able to give back to the broader free software ecosystem and help advance the state of technology for everyone.</p>



<p>We&#8217;re notable for operating exclusively with upstreams that are also open source. This ensures <a href="https://foundation.wikimedia.org/wiki/Resolution:Wikimedia_Foundation_Guiding_Principles">our freedom principles</a> (to freely inspect, modify, reuse, and fork) are not hindered by proprietary components.</p>



<p>New Wikimedia production software components or dependencies must pass certain&nbsp;<a href="https://www.mediawiki.org/wiki/Wikimedia_Security_Team/Third_Party_Code_Review_Checklist">fitness checks</a>&nbsp;and establish a chain of trust for the software’s security and integrity. When the Wikimedia community creates software that is peer-reviewed during development, this trust follows implicitly from its public policies and standards. When adding a new third-party package or dependency (&#8220;upstream&#8221;), this chain needs to be established by other means.</p>



<p>The Wikimedia Foundation extends its chain of trust to several credible upstream vendors and communities. For example, Debian, known for its Linux distribution, is host to the highly trusted and curated Debian package repository. When a package is present in the Debian repository, this signals trust, stability, and confidence to the industry. While we usually don’t audit the source code of Debian packages, installing a Debian package may still require a concept review to verify that the package is actually intended to meet our scale, threat model, and performance requirements.</p>



<p>When <a href="https://www.mediawiki.org/wiki/Manual:External_libraries">considering PHP or JavaScript libraries</a> from an anonymous and open registry like npm or Packagist, the Wikimedia Foundation audits the code as if it were its own. We keep ongoing costs to a minimum by only adopting upstream packages that solve non-trivial problems, have stable external requirements, and sit behind a module boundary. Dependencies should reduce cost, not increase it. In practice, we only consider packages with few or no transitive dependencies, written for a <a href="https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2018/Participants/Tim_Starling">stable runtime</a>.</p>



<p>As an added precaution, the Wikimedia Foundation prohibits networking to third-party services in its production realm. When deploying or installing the MediaWiki application, it does not download JavaScript or PHP packages from npm or Composer. Instead, upstream packages are&nbsp;<a href="https://github.com/wikimedia/mediawiki/tree/1.40.0/resources/lib">downloaded as a file</a>&nbsp;with an integrity hash, and are <a href="https://gerrit.wikimedia.org/r/q/project:mediawiki%252Fvendor+branch:master+is:merged">checked into Git</a>. This approach implements the organization’s security requirements, allowing for transparent auditing, patch-ability, and independent offline deployment. It also helps with faster onboarding, consistent and reproducible development, and creates a natural place for auditing upstream changes during code review.</p>
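<p>As a rough illustration of the integrity check described above, the sketch below compares the digest of a downloaded file against a hash that was pinned when the library was reviewed. This is not Wikimedia&#8217;s actual tooling: the function name <code>verify_integrity</code>, the sample bytes, and the choice of SHA-256 are assumptions made for the example, since the text above does not say which algorithm or script is used.</p>

```python
import hashlib


def verify_integrity(content: bytes, expected_sha256: str) -> bool:
    """Return True when the SHA-256 hex digest of content matches the pinned digest."""
    actual = hashlib.sha256(content).hexdigest()
    # Hex digests are case-insensitive, so normalise the pinned value first.
    return actual == expected_sha256.lower()


# Hypothetical workflow: the digest is recorded once, when the upstream file is reviewed.
reviewed = b"/* example library v1.0.0 */"
pinned = hashlib.sha256(reviewed).hexdigest()

# Later, a freshly downloaded copy is only accepted if it is byte-for-byte identical.
accepted = verify_integrity(reviewed, pinned)
rejected = verify_integrity(reviewed + b" ", pinned)
```

<p>Pinning a digest this way means an upstream release that changes silently fails the check, so the change has to go through code review before it can be checked into Git.</p>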



<h2 class="wp-block-heading">The most localized software</h2>



<p>With over&nbsp;<a href="https://en.wikipedia.org/wiki/Wikipedia#Language_editions">300 language editions</a>, Wikipedia might be among the&nbsp;<a href="https://en.wikipedia.org/wiki/List_of_literary_works_by_number_of_translations">most-translated</a>&nbsp;literature in the world. Wikipedia editors usually write or translate articles manually, and in recent years, the ContentTranslation tool has helped editors do this more efficiently, producing over&nbsp;<a href="https://diff.wikimedia.org/2021/11/16/content-translation-tool-helps-create-one-million-wikipedia-articles/">1 million articles</a>&nbsp;through this new tool alone.&nbsp;</p>



<p>The MediaWiki platform underneath it all recognizes and localizes its user interface in <a href="https://en.wikipedia.org/wiki/MediaWiki#Internationalization_and_localisation">over 400 languages</a>, including grammatical gender, pluralization rules (&#8220;10 new messages&#8221;), and sort order (via <a href="https://unicode-org.github.io/icu/userguide/collation/">ICU collations</a>). We <a href="https://translatewiki.net/wiki/CLDR">contribute</a> to the Unicode CLDR standard on behalf of Wikipedia’s language communities. These contributions flow downstream to other Unicode customers such as Linux, Apple, and Microsoft.</p>



<p>Languages like Arabic and Hebrew are written from right to left. <a href="https://www.mediawiki.org/wiki/CSSJanus">CSSJanus</a> takes stylesheets designed and developed for left-to-right languages like English, and automatically converts them into right-to-left layouts. We deploy the MediaWiki platform on a weekly basis. Each change to functionality is deployed to all supported languages from day 1, every time. CSSJanus is part of what makes this feasible, with little to no developer training.</p>
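<p>To give a feel for the idea of flipping a stylesheet, here is a deliberately tiny sketch that swaps the words <code>left</code> and <code>right</code> in CSS text. It is a toy under stated assumptions, not how CSSJanus is implemented: the function name <code>flip_css</code> and the sample rule are hypothetical, and a real converter handles many more cases than a whole-word swap.</p>

```python
import re

# Whole-word swap table; only "left" and "right" are handled in this toy version.
_SWAP = {"left": "right", "right": "left"}
_PATTERN = re.compile(r"\b(left|right)\b")


def flip_css(css: str) -> str:
    """Return a mirrored copy of a stylesheet by swapping left and right."""
    return _PATTERN.sub(lambda match: _SWAP[match.group(1)], css)


# Hypothetical rule written once, for a left-to-right layout.
ltr = ".sidebar { float: left; margin-right: 2em; text-align: left; }"
rtl = flip_css(ltr)
```

<p>Because the swap is symmetric, flipping the result again returns the original stylesheet, which is what lets developers keep writing for one direction only.</p>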



<p>Not all issues are that easy! During <a href="https://www.mediawiki.org/wiki/VisualEditor">VisualEditor</a> development, <a href="https://phabricator.wikimedia.org/T53314">extensive effort</a> went into localizing the bold and italic toolbar buttons. The familiar &#8220;B&#8221; and &#8220;I&#8221; buttons usually make way for an equivalent abbreviation, such as F (Fett) and K (Kursiv) in German, or a stylized &#8220;A&#8221; for language communities that have <a href="https://usability.wikimedia.org/wiki/Opinion_Icons">no accepted standard</a>. But, early adoption of English-centric software led to &#8220;B&#8221; and &#8220;I&#8221; becoming the established and culturally familiar design pattern in some other languages as well. In Hebrew, Czech, and Malayalam, &#8220;correcting&#8221; these with a translation actually created confusion.</p>



<h2 class="wp-block-heading">No profit motive</h2>



<p>Large corporations, driven by profit motives, regularly drop support for older devices and browsers. The Wikimedia Foundation, however, has an imperative to make information more accessible, not less.</p>



<p>How does the organization pull that off without the resources of a large corporation? Through equal parts being aggressively lean and aggressively uncompromising.</p>



<p>The organization saves development and testing costs by writing and deploying native JavaScript that targets only modern browsers. Through an approach inspired by BBC News’&nbsp;<a href="https://responsivenews.tumblr.com/post/18948466399/cutting-the-mustard">cutting the mustard</a>, the Foundation enables millions of people (1% of its 2 billion monthly readers) to access Wikipedia on older devices through a JavaScript-free experience. This is the same experience that all page views start at prior to the (optional) arrival of JavaScript code.</p>



<p>The Wikimedia Foundation’s&nbsp;<a href="https://wikitech.wikimedia.org/wiki/MediaWiki_Engineering/Guides/Frontend_performance_practices#Principles">development principles</a>&nbsp;and browser&nbsp;<a href="https://www.mediawiki.org/wiki/Compatibility#Browsers">support policy</a>&nbsp;reflect this by emphasizing the importance of&nbsp;<a href="https://en.wikipedia.org/wiki/Progressive_enhancement">progressive enhancement</a>.</p>



<p>Viewing Wikipedia through a web browser is the most common access method, but Wikipedia’s knowledge is consumed far beyond the canonical experience at Wikipedia.org. Wikipedia content goes everywhere. It’s distributed offline through&nbsp;<a href="https://kiwix.org/">Kiwix</a>&nbsp;and IPFS, rendered in native apps like Apple Dictionary, and even shared peer-to-peer through USB sticks. What these environments have in common is that they may not involve JavaScript as they require high security and high privacy. This is made possible at no extra cost due to APIs offering complete content HTML-first, with CSS and embedded media based on ubiquitous and open formats only.</p>



<h2 class="wp-block-heading">Summary</h2>



<p>The Wikimedia Foundation prioritizes both security and openness. To achieve this balance, it implements a number of practices and policies that ensure that it protects both the freedoms and the privacy of its audience, all while sharing information transparently.</p>



<p>For example, twice per year the Foundation publishes a&nbsp;<a href="https://wikimediafoundation.org/about/transparency/">transparency report</a>&nbsp;detailing its response to information and takedown requests. The Wikimedia Foundation’s&nbsp;<a href="https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Board_of_Trustees">Board positions</a>&nbsp;are largely held by community members, and appointed by public election through anonymous and cryptographically verifiable votes from any eligible Wikipedia account. Its&nbsp;<a href="https://foundation.wikimedia.org/wiki/Resolutions">Governance Wiki</a>&nbsp;publishes the Foundation’s bylaws, board decisions, and meetings.</p>



<p>The Foundation participates in an ecosystem of organizations that collaborate on freely licensed information and open-source software. Overall, the organization balances exceptional security and openness by implementing strong security practices and providing transparency about its policies and procedures.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published on <a href="https://openjsf.org/blog/2023/10/05/wikimedia-case-study">OpenJS Foundation Blog</a>.</p>

<hr><div class="footnotes" role="doc-endnotes">Footnotes:<ol><li id="fn1" role="doc-endnote">Note that Apple&#8217;s macOS and iOS are also Unix-like BSD derivatives, through their inheritance from the NeXTSTEP operating system, which continues to this day via the <a href="https://en.wikipedia.org/wiki/Darwin_(operating_system)">Darwin kernel</a>. <a href="#fnr1" role="doc-backlink" title="Jump back">↩︎</a></li></ol></div><hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2023/wikimedia-balances-security-and-openness/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20How%20we%20balance%20security%20and%20openness%20at%20Wikimedia&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2023%2Fwikimedia-balances-security-and-openness%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>An Internet of PHP</title>
		<link>https://timotijhof.net/posts/2023/an-internet-of-php/</link>
					<comments>https://timotijhof.net/posts/2023/an-internet-of-php/#comments</comments>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Mon, 04 Sep 2023 23:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[PHP]]></category>
		<guid isPermaLink="false">https://timotijhof.net/?p=478</guid>

					<description><![CDATA[Statistics and anecdotes about PHP at scale.]]></description>
										<content:encoded><![CDATA[
<p>PHP is <strong>big</strong>. The trolls can proclaim its all-but-certain &#8220;death&#8221; until the cows come home, but no amount of heckling changes that the Internet runs on PHP. The evidence is overwhelming. What follows is a loosely organised collection of precisely that evidence.</p>



<ol class="wp-block-list">
<li><a href="#statistics">Statistics</a></li>



<li><a href="#anecdotes">Anecdotes</a></li>



<li><a href="#php-at-scale">At scale</a></li>



<li><a href="#what-about-my-bubble">What about my bubble?</a></li>



<li><a href="#conclusion">Conclusion</a></li>
</ol>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">Statistics</h2>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="735" height="767" src="https://timotijhof.net/wp-content/uploads/2023_php_langs.png" alt="" class="wp-image-495" style="width:270px;height:auto"/></figure>
</div>


<h3 class="wp-block-heading">PHP as programming language of choice</h3>



<p>From <a href="https://w3techs.com/technologies/overview/programming_language">Language analysis by W3Techs</a> on the top 10 million websites worldwide:</p>



<ol class="wp-block-list">
<li>PHP at 77.2%.</li>



<li>ASP at 6.9%.</li>



<li>Ruby at 5.4%.</li>
</ol>



<h3 class="wp-block-heading">Content management on PHP</h3>



<p>The bulk of public sites build on PHP via a CMS. By market share, <strong>8 of the 12 largest CMS platforms are written in PHP</strong>. The list below is from <a href="https://w3techs.com/technologies/overview/content_management">CMS usage by W3Techs</a>, where each percent represents 100,000 of the top 10 million sites. There&#8217;s a similar <a href="https://trends.builtwith.com/cms/traffic/Entire-Internet">CMS report by BuiltWith</a> that analyses a larger set of 78 million websites.</p>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="512" height="512" src="https://timotijhof.net/wp-content/uploads/2023_php_wordpress.png" alt="WordPress logo" class="wp-image-502" style="width:256px;height:256px"/><figcaption class="wp-element-caption">© WordPress.org</figcaption></figure>
</div>


<ol class="wp-block-list">
<li>[<strong>PHP</strong>] WordPress ecosystem (63%&nbsp;of&nbsp;CMS-based sites, 43% of all sites)</li>



<li>[Ruby] Shopify</li>



<li>Wix</li>



<li>Squarespace</li>



<li>[<strong>PHP</strong>] Joomla ecosystem (3%)</li>



<li>[<strong>PHP</strong>] Drupal ecosystem (2%)</li>



<li>[<strong>PHP</strong>] Adobe Magento (2%)</li>



<li>[<strong>PHP</strong>] PrestaShop (1%)</li>



<li>[Python] Google Blogger</li>



<li>[<strong>PHP</strong>] Bitrix (1%)</li>



<li>[<strong>PHP</strong>] OpenCart (1%)</li>



<li>[<strong>PHP</strong>] TYPO3 (1%)</li>
</ol>



<h3 class="wp-block-heading">E-commerce on PHP</h3>



<p>From <a href="https://trends.builtwith.com/shop">BuiltWith&#8217;s report on online stores</a>, as of Aug 2023:</p>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="512" height="149" src="https://timotijhof.net/wp-content/uploads/2023_php_shopware.png" alt="" class="wp-image-552" style="width:256px;height:auto"/></figure>
</div>


<ul class="wp-block-list">
<li><a href="https://en.wikipedia.org/wiki/WooCommerce">WooCommerce for WordPress</a> (24% of global&nbsp;market&nbsp;share)</li>



<li><a href="https://en.wikipedia.org/wiki/Magento">Adobe Magento</a> (7% of global market share) </li>



<li>OpenCart (2% global market share, 24% <a href="https://trends.builtwith.com/shop/country/Russia">market share in Russia</a>)</li>



<li>PrestaShop (2% global market share, 14% <a href="https://trends.builtwith.com/shop/country/France">market share in France</a>)</li>



<li><a href="https://en.wikipedia.org/wiki/Shopware">Shopware</a> (1% global market share, 12% <a href="https://www.ehi.org/presse/e-commerce-2021-zeit-des-wachstums/">market share in Germany</a>)</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">Anecdotes</h2>



<p><a href="https://kinsta.com/blog/is-php-dead/">Kinsta published a retort</a> demonstrating that PHP is fast, lively, and popular:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Well, first off, it’s important to point out that there’s a big difference between &#8220;wanting&#8221; and &#8220;being&#8221;. People have been calling for the death of PHP […] as far back as 2011.</p>



<p>PHP 7.3 was pushing 2-3x the number of requests per second as PHP 5.6. And PHP 8.1 is even faster.</p>



<p>[…] Because of PHP’s popularity, it’s <strong>easy to find PHP developers</strong>. And not just PHP developers – but PHP developers with experience.</p>
</blockquote>



<p>Matt Brown from Vimeo Engineering in <a href="https://medium.com/vimeo-engineering-blog/its-not-legacy-code-it-s-php-1f0ee0462580">It’s not legacy code — it’s PHP</a>:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>PHP hasn’t stopped innovating</strong> […]. A new wave of backend engineers planned how we might carve up 500,000 lines of PHP into a bunch of [services]. […] Ultimately none of the proposals took hold.</p>



<p>Vimeo had grown many times over in the ten years since 2004, and our PHP codebase along with it […]</p>
</blockquote>



<p>Ars Technica tells us: <a href="https://arstechnica.com/gadgets/2021/09/php-maintains-an-enormous-lead-in-server-side-programming-languages/">PHP maintains an enormous lead</a>. Ars published a version of the W3Techs report that includes historical data.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Despite many infamous quirks, the server-side language seems here to stay. […]<br>Within that dataset, the story told is clear. […] PHP held a 72.5 percent share in 2010 and holds a 78.9 percent share as of today. […] There doesn&#8217;t appear to be any clear contender for PHP to worry about.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="900" height="500" src="https://timotijhof.net/wp-content/uploads/2023_php_arstechnica_w3techs.png" alt="" class="wp-image-506" title="Usage of server-side programming languages for websites, September 2021, W3Techs.com."/></figure>
</blockquote>



<p>Lex Fridman put it as follows in an interview with Python-creator Guido van Rossum on his podcast (<a href="https://lexfridman.com/guido-van-rossum-2">episode</a>, <a href="https://www.youtube.com/watch?v=-DVyjdw4t9I&amp;t=25m50s">timestamp</a>):</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>Lex</strong>: &#8220;PHP probably still runs most of the back-end of the Internet.&#8221;<br><strong>Guido</strong>: &#8220;Oh yeah, yeah. […]&#8221;</p>
</blockquote>



<p>Daniel Stenberg&#8217;s annual <a href="https://daniel.haxx.se/blog/2023/06/17/curl-user-survey-2023-analysis/">curl user survey</a> (page 18) asks where people use curl. After curl&#8217;s own interface (78.4%), the most familiar curl binding is PHP. It has been since the survey began in 2015. In 2023, 19.6% of curl survey respondents reported they use curl via PHP.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>curl (CLI) 78.4%, php-curl 19.6%, pycurl 13%, […], node-libcurl 4.1%.</p>
</blockquote>



<p>Ember.js famously originated from the Ruby community. But, as a frontend framework Ember can pair with any backend. The <a href="https://emberjs.com/survey/2022/">Ember Community Survey</a> reports PHP as the third-most favoured among survey participants, after Ruby and Java.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="1090" height="532" src="https://timotijhof.net/wp-content/uploads/2023_php_ember_survey.png" alt="Ember Survey 2022 results:
First 29.9% Rails (Ruby).
Second 14.3% Spring (Java).
Third 7.6% PHP.
Fourth 6.5% Express (Node.js)." class="wp-image-537" style="width:auto;height:266px"/></figure>
</div>


<p>The Ember survey also asked general industry questions. For example, <strong>24% described their employer&#8217;s infrastructure as &#8220;self-hosted&#8221;</strong>, and not at a major cloud provider. This isn&#8217;t a representative survey per se, but may still be a surprise, especially for folks who rely on social media and conference talks for their sense of what businesses do in the real world. It is more important than ever for companies to have a <a href="https://www.infoworld.com/article/3211374/public-cloud-consolidation-requires-an-exit-plan-even-from-the-big-guys.html">cloud exit strategy</a> ready (<a href="https://digital.nhs.uk/services/cloud-centre-of-excellence/strategy/nhs-cloud-exit-strategy">NHS example</a>). You can read how <a href="https://world.hey.com/dhh/we-have-left-the-cloud-251760fb">Basecamp&#8217;s cloud exit</a> saves them millions of dollars a year.</p>



<h2 class="wp-block-heading">PHP at scale</h2>



<p>The stats cited above measure the number of distinct sites and companies. The vast majority of those build on PHP. But, all that says about their scale is that they&#8217;re somewhere in the top 10 million. Does that worry you? What&#8217;s in the top 500?</p>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="50" height="52" src="https://timotijhof.net/wp-content/uploads/2023_php_laravel.svg" alt="Laravel logo" class="wp-image-565" style="object-fit:contain;width:200px;height:208px"/><figcaption class="wp-element-caption">Laravel</figcaption></figure>
</div>


<p>Jack Ellis from Fathom Analytics in <a href="https://usefathom.com/blog/does-laravel-scale">Does Laravel Scale?</a> makes the case that you shouldn&#8217;t make choices based on handling millions of requests per second. You&#8217;re not likely to reach that, and will face many other bottlenecks. But, it turns out, PHP is one of the languages that does scale to that level.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>When we started seeing incredible growth in our software, Fathom Analytics (which is built on Laravel), […] never had moments of &#8220;does the framework do enough requests per second?&#8221;.  […]</p>



<p>I&#8217;ve worked with enterprise companies using Laravel to power their entire business, and companies such as Twitch, Disney, New York Times, WWE and Warner Bros are using Laravel for various projects they run. <strong>Laravel can handle your application at scale.</strong></p>
</blockquote>



<p>Matt Brown again, from Vimeo Engineering in <a href="https://medium.com/vimeo-engineering-blog/its-not-legacy-code-it-s-php-1f0ee0462580">It’s not legacy code</a>:</p>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="70" height="20" src="https://timotijhof.net/wp-content/uploads/2023_php_vimeo.svg" alt="" class="wp-image-618" style="object-fit:contain;width:210px;height:60px"/></figure>
</div>


<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>I’m here to tell you that it can, and Vimeo’s continued success with PHP is proof that it’s a great tool for <strong>fast-moving companies in 2020</strong>.</p>
</blockquote>



<p>Vimeo is also known as the developer of <a href="https://psalm.dev/">Psalm</a>, a popular open-source static analysis tool for PHP.</p>



<p>From Keith Adams, Chief Architect at Slack Engineering in <a href="https://slack.engineering/taking-php-seriously/">Taking PHP Seriously</a>:</p>


<div class="wp-block-image is-style-default">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="127" height="127" src="https://timotijhof.net/wp-content/uploads/2023_php_slack.svg" alt="" class="wp-image-676" style="object-fit:contain;width:110px;height:110px"/></figure>
</div>


<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Slack uses PHP for most of its server-side application logic […].</p>



<p>the advantages of the PHP environment (reduced cost of bugs through <strong>fault isolation</strong>; <strong>safe&nbsp;concurrency</strong>; and high <strong>developer throughput</strong>) are more valuable than the problems […]</p>
</blockquote>



<p>Let&#8217;s take another look at the <a href="https://w3techs.com/technologies/overview/content_management">W3Techs report</a>, and this time focus on the size of some individual businesses. At the top, we have WordPress, which of course powers Automattic&#8217;s WordPress.com. That&#8217;s 20 billion page views <a href="https://wordpress.com/activity/">each month</a> (Alexa rank 55 worldwide).</p>



<p>If we move further down the report, to entries with 0.1% market share, we find PHP systems that power massive websites. Yet, these are also the platform of choice for over 100,000 smaller websites.</p>



<ul class="wp-block-list">
<li>#23 CMS: <a href="https://en.wikipedia.org/wiki/Moodle">Moodle</a></li>



<li>#25 CMS: phpBB, e.g. Google&#8217;s <a href="https://www.waze.com/forum/">Waze Community</a>, ApacheFriends Forum, VideoLAN&nbsp;Forums.</li>



<li>#31 CMS: XenForo forums, e.g. <a href="https://arstechnica.com/civis/">ArsTechnica.com</a>, <a href="https://forums.macrumors.com/">MacRumors.com</a>.</li>



<li>#33 CMS: Roundcube</li>



<li>#45 CMS: MediaWiki</li>



<li>#49 CMS: vBulletin forums</li>



<li>#53 CMS: IPS Community, e.g. <a href="https://forums.malwarebytes.com">MalwareBytes.com</a>, <a href="https://en.wikipedia.org/wiki/Bleeping_Computer">BleepingComputer</a>, and Squarespace.com Forums.</li>
</ul>


<div class="wp-block-image">
<figure class="alignright size-large is-resized"><img loading="lazy" decoding="async" width="270" height="300" src="https://timotijhof.net/wp-content/uploads/2023_php_mediawiki_white.svg" alt="" class="wp-image-626" style="object-fit:cover;width:200px;height:222px"/></figure>
</div>


<p><a href="https://en.wikipedia.org/wiki/MediaWiki">MediaWiki</a> is the <a href="https://wikitech.wikimedia.org/wiki/MediaWiki_at_WMF">platform behind Wikipedia.org</a> with <a href="https://stats.wikimedia.org/">25 billion page views</a> a month (Alexa #12). MediaWiki also powers <a href="https://en.wikipedia.org/wiki/Fandom_(website)">Fandom</a> with <a href="https://about.fandom.com/news/fandoms-2021-state-of-fandom-study-identifies-pandemic-era-consumer-behavior-trends-in-entertainment-gaming">2 billion page views</a> a month (Similarweb #44), and <a href="https://en.wikipedia.org/wiki/WikiHow">WikiHow</a> with 100 million monthly visitors (Alexa #215).</p>



<p>Other major Internet properties powered by PHP include Facebook (Alexa #7), Etsy (Alexa #66), Vimeo (Alexa #165), and Slack (Similarweb #362).</p>



<p>Etsy is interesting due to its high proportion of active sessions and dynamic content. This is unlike Wikipedia or WordPress, which can serve most page views from a static cache. This means that, despite a similar scale, Etsy&#8217;s PHP application is a lot more exposed to <a href="https://www.etsy.com/codeascraft/how-etsy-prepared-for-historic-volumes-of-holiday-traffic-in-2020/">their high traffic</a>.</p>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img decoding="async" src="https://timotijhof.net/wp-content/uploads/2023_php_etsy.svg" alt="" class="wp-image-697" style="object-fit:contain;width:200px;height:100px"/></figure>
</div>


<p>Etsy is also where PHP-creator <a href="https://en.wikipedia.org/wiki/Rasmus_Lerdorf">Rasmus Lerdorf</a> is employed. He sometimes features snippets from Etsy&#8217;s codebase in his tech talks. (Geek side&nbsp;note: His <a href="https://www.youtube.com/watch?v=Hc4S74LCXHo&amp;t=1620s">2021 Modern PHP talk</a> explains how Etsy deploys with <code>rsync</code>, just as Wikipedia did for the past decade with <a href="https://wikitech.wikimedia.org/w/index.php?title=Scap&amp;oldid=2007017">Scap</a>.) Etsy&#8217;s engineering blog occasionally covers work on their modular PHP monolith, e.g. <a href="https://www.etsy.com/uk/codeascraft/plurals-at-etsy">Plural localisation</a>, or their detailed <a href="https://www.etsy.com/uk/codeascraft/q1-2016-site-performance-report">Etsy&nbsp;Site&nbsp;Performance</a> reports:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Happily, this quarter we saw site-wide performance improvements, due to our upgrade to PHP7.</p>



<p>[…] we saw significant performance gains on all our pages.</p>
</blockquote>



<h2 class="wp-block-heading">What about my bubble?</h2>



<p>One could critique the PHP community for not occupying much space in public discourse. Whether PHP core developers, or authors of PHP packages (like Laravel, Symfony, WordPress, Composer, and PHPUnit), or the average engineer using it in their day job&#8230; we&#8217;re not seen much in arguments on social media.</p>



<p>You also don&#8217;t see us give many conference talks prescribing formulas for a stack that will &#8220;definitely be better&#8221; for your company. If talks by fans of certain JavaScript frameworks are anything to go by, we should believe that most companies use their stack today, and that you should feel sorry if you still don&#8217;t. I don&#8217;t say that to judge JavaScript. What bothers me is prescriptive messaging without considering technical or business needs, without assessing what &#8220;better&#8221; means — better compared to what? It&#8217;s hard to compare the one thing you know.</p>



<p>The above isn&#8217;t to say JavaScript doesn&#8217;t have its place. Share your experience! Share your results (and the benchmarks behind them), what worked, what didn&#8217;t. Keep searching, keep innovating, keep sharing, and above all: keep pushing the human race forward. That&#8217;s <a href="https://en.wikipedia.org/wiki/Free_software_movement">free software</a>!</p>



<p>One could question merits through the <a href="https://infrequently.org/2023/02/the-market-for-lemons/">lost decade</a> and <a href="https://www.zachleat.com/web/react-criticism/">critique on React</a>, but&#8230; React holds a <a href="https://w3techs.com/technologies/overview/javascript_library">3% market share</a>. Add the smaller frameworks (Vue, Angular, Svelte) and we reach a sum of 5%. Similarly, Node.js as web server holds <a href="https://w3techs.com/technologies/overview/web_server">3% market share</a>. Does that mean over 90% missed out on This One Trick That Will Boost Your Business?</p>



<p>Lest we forget, this 5% represents 500,000 major websites. That&#8217;s huge. Node.js has its place and its strengths (real-time message streams). But, Node.js also has its weaknesses (<a href="https://www.langton.cloud/misconception-on-cpu-node-js-vs-php-blocking-web-requests/">blocking the main thread</a>). And remember, market share doesn&#8217;t say much about scale. It could be powering several organisations in the top 1% (like MediaWiki), or the bottom 1%. Or, be WordPress and power both the top 1% <em>and</em> over 40 <em>million</em> other sites.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Companies young and old, small and big, <strong>might not be utilising</strong> the software stacks we hear talked about most in public spaces. This is especially true outside the bubble of personal projects and cash-burning startups.</p>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><a href="https://www.php.net/releases/8.1/en.php"><img loading="lazy" decoding="async" width="930" height="390" src="https://timotijhof.net/wp-content/uploads/2023_php_php8.png" alt="" class="wp-image-772" style="object-fit:contain;width:310px;height:130px"/></a></figure>
</div>


<p>Is PHP <em>the</em> most economic choice for growing and sustainable businesses today? Is it in the top three? Does language runtime matter at all when scaling up a business and team of people around it? We don&#8217;t know. </p>



<p>What we do know is that a great many businesses today build on PHP, and PHP has proven to be a sustainable option. It stands the test of time. That includes new companies like Fathom that turned <a href="https://usefathom.com/blog/spending-money">profitable</a> in just three years. Like the Fathom article said, most of us will never reach that scale. But, <strong>it&#8217;s comforting to know that PHP is a sustainable and economical option</strong> even at scale. Is it the only option? No, certainly not.</p>



<p>There are languages that are even faster (Rust), have an even larger community (Node.js), or have more mature compilers (Java); but those tend to trade away other values.</p>



<p>PHP hits a certain Goldilocks sweetspot. It is pretty fast, has a <a href="https://packagist.org/statistics">large community</a> for <a href="https://www.youtube.com/watch?v=x7OsH3bH6DA">productivity</a>, features <a href="https://stitcher.io/blog/evolution-of-a-php-object">modern syntax</a>, is actively <a href="https://wiki.php.net/RFC#implemented">developed</a>, easy to learn, easy to scale, and has a capable standard library. It offers high and safe concurrency at scale, yet without async complexity or blocking a main thread. It also tends to carry low maintenance cost due to a stable platform, and through a community that values compatibility and <a href="https://blog.jim-nielsen.com/2023/software-crisis-dependencies/">low dependency count</a>. You will have different needs at times, of course, but for this particular sweetspot, PHP stands among very few others. Which others? You tell me!</p>



<h2 class="wp-block-heading">Further reading</h2>



<ul class="wp-block-list">
<li><a href="https://mcfunley.com/choose-boring-technology"><em>Choose Boring Technology</em></a>, Dan McKinley, 2015.</li>



<li><a href="https://motherduck.com/blog/the-simple-joys-of-scaling-up/"><em>The Simple Joys of Scaling Up</em></a>, Jordan Tigani, 2023.</li>



<li><a href="https://timotijhof.net/posts/2019/protect-yourself-from-npm/"><em>How to protect yourself from npm</em></a>, Timo Tijhof, 2019.</li>



<li><em><a href="https://snarfed.org/2022-03-10_were-drowning-software-dependencies">We’re drowning in software dependencies</a></em>, Ryan Barrett, 2022.</li>



<li><a href="https://blog.jim-nielsen.com/2023/software-crisis-dependencies/"><em>“Out of the Software Crisis”: Dependencies</em></a>, Baldur Bjarnason.</li>



<li><a href="https://blog.danslimmon.com/2023/08/11/squeeze-the-hell-out-of-the-system-you-have/"><em>Squeeze the hell out of the system you have</em></a>, Dan Slimmon, 2023.</li>



<li><a href="https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2018/Participants/Tim_Starling"><em>On language choice and maintenance burden at Wikimedia</em></a>, Tim Starling, 2018.</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p><strong>Update (6 Sep 2023)</strong>: About HHVM, Wikipedia and Etsy indeed both tried it as a PHP5-compatible alternative runtime (no Hacklang). After <a href="https://kinsta.com/blog/hhvm-wordpress/">performance improvements</a> in PHP 7, Wikipedia reverted its <a href="https://techblog.wikimedia.org/2014/12/29/how-we-made-editing-wikipedia-twice-as-fast/">rollout</a> and <a href="https://phabricator.wikimedia.org/T176370">upgraded to PHP 7.2</a>. Etsy also abandoned the <a href="https://www.etsy.com/uk/codeascraft/experimenting-with-hhvm-at-etsy?ref=codeascraft">experiment</a> and <a href="https://www.etsy.com/uk/codeascraft/q1-2015-site-performance-report?ref=codeascraft">partial</a> use and similarly <a href="https://www.etsy.com/uk/codeascraft/q1-2016-site-performance-report?ref=codeascraft">moved to PHP 7</a>, stating <a href="https://www.etsy.com/uk/codeascraft/api-first-transformation-at-etsy-operations?ref=codeascraft">later</a>: &#8220;<em>hhvm was a catalyst for performance improvements that made it into PHP7. We are now completely switched over to PHP7 everywhere</em>&#8221;.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2023/an-internet-of-php/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20An%20Internet%20of%20PHP&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2023%2Fan-internet-of-php%2F">Reply via email</a></p>]]></content:encoded>
					
					<wfw:commentRss>https://timotijhof.net/posts/2023/an-internet-of-php/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
		<item>
		<title>Browser adoption rates</title>
		<link>https://timotijhof.net/posts/2023/browser-adoption/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Thu, 16 Feb 2023 20:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Testing]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/?p=418</guid>

					<description><![CDATA[For two years in 2020 and 2021, I shared Wikipedia&#8217;s worldwide browser statistics on Mastodon under #browserstats. They looked a little something like this: As the data includes the browser&#8217;s major version, I wondered whether I could use this to follow the adoption rate through each browser&#8217;s release cycle. The short answer is&#8230; Yes! Here…]]></description>
										<content:encoded><![CDATA[
<p>For two years in 2020 and 2021, I shared Wikipedia&#8217;s worldwide browser statistics on Mastodon under <a href="https://fosstodon.org/tags/browserstats">#browserstats</a>. They looked a little something like this: </p>


<div class="wp-block-image wp-duotone-unset-1">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/browsers_2021_05.png" alt="Browser data from 3 May 2021 to 29 May 2021." class="wp-image-425" style="width:215px;height:250px" width="215" height="250"/></figure>
</div>


<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Wikipedia.org and sister projects, browserstats for May 2021:</p>



<ul class="wp-block-list">
<li>49%: Chrome + Chrome Mobile</li>



<li>24.7%: Safari + Mobile Safari</li>



<li>5.2%: Firefox + Firefox Mobile</li>



<li>2.8%: Edge</li>



<li>2.5%: Samsung Internet</li>



<li>[…]</li>
</ul>



<p>100% = 16.4 billion page views (not including bots)</p>
</blockquote>



<p>As the data includes the browser&#8217;s major version, I wondered whether I could use this to follow the adoption rate through each browser&#8217;s release cycle. The short answer is&#8230; Yes! Here is what I found as of May 2021:</p>



<ul class="wp-block-list">
<li>Firefox: 1 week (peaks ~87%, every 4 weeks).</li>



<li>Edge: 1 week (peaks ~97%, every 6 weeks).</li>



<li>Chrome: 2 weeks (peaks ~91%, every 6 weeks).</li>



<li>Safari: 1-2 months (peaks ~86%, yearly).</li>



<li>Chrome Mobile: 2 weeks (peaks ~80%, every 6 weeks).</li>



<li>Mobile Safari: 4 months (peaks ~92%, yearly).</li>
</ul>



<p>For each browser family I identified the typical adoption &#8220;peak&#8221;, which is the highest percentage of clients having the same major version of that browser during the last six months. I then measured the time it takes for a given version to reach that peak. To discount noise (such as from early betas and fake user agents) I count from 2% to 90% relative to the browser&#8217;s own adoption peak.</p>
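<p>As a rough sketch of this measurement (not the actual tooling behind the graphs), the 2% and 90% thresholds can be derived from a browser&#8217;s peak, and the adoption time read off a series of daily samples. The <code>samples</code> shape and the function names below are assumptions made for illustration.</p>

```javascript
// Sketch: derive the 2%-90% window from a browser's adoption peak.
// `peak` is the highest share (in percent) reached by one major version.
function adoptionWindow(peak) {
  return { start: peak * 0.02, end: peak * 0.9 };
}

// Count the days between a version first reaching the lower threshold
// and first reaching the upper threshold.
// `samples` is a hypothetical array of { date: 'YYYY-MM-DD', share: Number },
// ordered by date, with `share` in percent of that browser's traffic.
function adoptionDays(samples, peak) {
  const { start, end } = adoptionWindow(peak);
  const first = samples.find((s) => s.share >= start);
  const last = samples.find((s) => s.share >= end);
  if (!first || !last) {
    return null; // The version never crossed one of the thresholds.
  }
  const msPerDay = 24 * 60 * 60 * 1000;
  // Date-only ISO strings are parsed as UTC, so the difference is whole days.
  return Math.round((Date.parse(last.date) - Date.parse(first.date)) / msPerDay);
}
```

<p>For a peak of 87%, the window runs from about 1.7% to about 78%, which matches the range quoted for Firefox below.</p>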



<h2 class="wp-block-heading">Firefox (desktop)</h2>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2d8e38b2a794d371.png" alt="" class="wp-image-447" style="width:930px;height:365px" width="930" height="365"/></figure>



<p>Release cadence: every 4 weeks.<br>Adoption peak: ~ 87%.<br>Adoption time: ~ 1 week.</p>



<p>from 1.7% to 78% (2-90% of peak):</p>



<ul class="wp-block-list">
<li>v85: 26 Jan &#8211; 3 Feb.</li>



<li>v86: 23 Feb &#8211; 2 Mar.</li>



<li>v87: 23 Mar &#8211; 31 Mar.</li>
</ul>



<h2 class="wp-block-heading">Microsoft Edge</h2>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/64b8c499ffbd5032.png" alt="" class="wp-image-449" style="width:925px;height:365px" width="925" height="365"/></figure>



<p>Release cadence: every 6 weeks.<br>Adoption peak: ~ 97%.<br>Adoption time: ~ 1 week.</p>



<p>from 1.9% to 87% (2-90% of peak):</p>



<ul class="wp-block-list">
<li>v87: 19 Nov &#8211; 29 Nov.</li>



<li>v88: 21 Jan &#8211; 30 Jan.</li>



<li>v89: 4 Mar &#8211; 12 Mar.</li>
</ul>



<p>As of August 2020, Edge aligns its schedule to Chromium releases.</p>



<h2 class="wp-block-heading">Chrome (desktop)</h2>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/26512abd4c74f83a.png" alt="" class="wp-image-450" style="width:925px;height:364px" width="925" height="364"/></figure>



<p>Release cadence: every 6 weeks.<br>Adoption peak: ~ 91%.<br>Adoption time: ~ 2 weeks.</p>



<p>from 1.8% to 82% (2-90% of peak):</p>



<ul class="wp-block-list">
<li>v86: 7 Oct &#8211; 18 Oct.</li>



<li>v87: (had a bumpy ride).</li>



<li>v88: 20 Jan &#8211; 6 Feb.</li>



<li>v89: 3 Mar &#8211; 19 Mar.</li>
</ul>



<h2 class="wp-block-heading">Safari (desktop)</h2>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/d3543a049910138a.png" alt="" class="wp-image-451" style="width:925px;height:368px" width="925" height="368"/></figure>



<p>Release cadence: every 12 months.<br>Adoption peak: ~ 86%.<br>Adoption time: 1-2 months.</p>



<p>from 1.7% to 77% (2-90% of peak):</p>



<ul class="wp-block-list">
<li>v13: 14 Sep 2019 &#8211; 17 Nov 2019.</li>



<li>v14: 16 Sep 2020 &#8211; 25 Dec 2020.</li>
</ul>



<h2 class="wp-block-heading">Chrome Mobile</h2>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/7ace35d7ad4fb08f.png" alt="" class="wp-image-454" style="width:948px;height:371px" width="948" height="371"/></figure>



<p>Release cadence: every 6 weeks.<br>Adoption peak: ~ 80%.<br>Adoption time: ~ 2 weeks.</p>



<p>from 1.6% to 72% (2-90% of peak):</p>



<ul class="wp-block-list">
<li>v86: 7 Oct &#8211; 24 Oct.</li>



<li>v88: 20 Jan &#8211; 3 Feb.</li>



<li>v89: 3 Mar &#8211; 19 Mar.</li>
</ul>



<h2 class="wp-block-heading">Mobile Safari (iOS)</h2>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1950" height="775" src="https://timotijhof.net/wp-content/uploads/3456b6f518ef5d7b.png" alt="" class="wp-image-456"/></figure>



<p>Release cadence: every 12 months.<br>Adoption peak: ~ 92%.<br>Adoption time: ~ 4 months.</p>



<p>from 1.8% to 82% (2-90% of peak):</p>



<ul class="wp-block-list">
<li>iOS 13: 9 Sep 2019 &#8211; 12 Feb 2020.</li>



<li>iOS 14: 16 Sep 2020 &#8211; 31 Dec 2020.</li>
</ul>



<h2 class="wp-block-heading">See also</h2>



<p>You can interact with the adoption graphs on the <a href="https://grafana.wikimedia.org/d/000000218/navigation-timing-by-browser?viewPanel=17&amp;orgId=1&amp;from=now-6M&amp;to=now-1d&amp;var-source=navtiming2&amp;var-metric=loadEventEnd&amp;var-browserFamily=Firefox&amp;var-browserVersion=all&amp;var-m=p75">Navigation Timing by browser</a> dashboard in our public Grafana instance.<sup id="fnr1" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2023/browser-adoption/#fn1" title="Jump to footnote 1">[1]</a></sup></p>



<p>Explore the general browser usage and pageview data for yourself, visually:</p>



<ul class="wp-block-list">
<li><a href="https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-browser/browser-family-and-major-hierarchical-view">by browser family</a>,</li>



<li><a href="https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os">by operating system</a>, and</li>



<li><a href="https://stats.wikimedia.org/#/all-projects/reading/total-page-views/normal|bar|2-year|(access)~desktop*mobile-web+agent~user|monthly">overall Wikipedia pageview counts</a>.</li>
</ul>



<p>Or, access the open data in its pure form:</p>



<ul class="wp-block-list">
<li><a href="https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/browser/">OS/browser dataset</a>,</li>



<li><a href="https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews">Pageview API</a>.</li>
</ul>

<hr><div class="footnotes" role="doc-endnotes">Footnotes:<ol><li id="fn1" role="doc-endnote">I used Navigation Timing instead of the dedicated browser usage data, because the browser usage visualisations focussed only on browser family over time, or total usage of a particular browser version over a date range. While the data is there, there isn&#8217;t yet a plot for all of one browser&#8217;s major versions over time. In our Grafana dashboard for Navigation Timing data we did have this. The pageview/browser dataset is unsampled, based on aggregate server logs, filtered to only pageviews and non-bots (thus excluding visits to URLs that are not considered pageviews, e.g. when editing articles, or using the account login form, etc.). The Navigation Timing data is randomly 1:1000 sampled, based on any URL where the JS successfully loads. <a href="#fnr1" role="doc-backlink" title="Jump back">↩︎</a></li></ol></div><hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2023/browser-adoption/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Browser%20adoption%20rates&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2023%2Fbrowser-adoption%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>HTTP/2 performance revisited</title>
		<link>https://timotijhof.net/posts/2022/http-2-performance-revisited/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Sun, 20 Nov 2022 06:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/?p=395</guid>

					<description><![CDATA[Deploying HTTP/2 support to the Wikimedia CDN significantly changed how browsers negotiate and transfer data during the page load process. We found regressions in performance during the transition and are sharing the lessons we learned. Hello, HTTP/2! In 2016, the Wikimedia Foundation deployed HTTP/2 (or “H2”) support to our CDN. At the time, we used…]]></description>
										<content:encoded><![CDATA[
<p>Deploying HTTP/2 support to the Wikimedia CDN significantly changed how browsers negotiate and transfer data during the page load process. We found regressions in performance during the transition and are sharing the lessons we learned.</p>



<span id="more-395"></span>



<h2 class="wp-block-heading">Hello, HTTP/2!</h2>



<p>In 2016, the Wikimedia Foundation deployed HTTP/2 (or “H2”) support to our CDN. At the time, we used Nginx for TLS termination and <a href="https://wikitech.wikimedia.org/wiki/Caching_overview">two layers</a> of Varnish for caching. We anticipated a possible speed-up as part of the transition, and also identified opportunities to leverage H2 in our architecture.</p>



<p>The <a href="https://en.wikipedia.org/wiki/HTTP/2">HTTP/2 protocol</a> was standardized through the IETF, with Google Chrome shipping support for the experimental SPDY protocol ahead of the standard. Brandon Black (SRE Traffic) led the deployment and <a href="https://phabricator.wikimedia.org/T96848#1856035">had to make a choice</a> between SPDY and H2. We launched with SPDY in 2015, as H2 support was still lacking in many browsers, and Nginx did not support having both. By May 2016, browser support had picked up and we switched to H2.</p>



<h2 class="wp-block-heading">Goodbye domain sharding?</h2>



<p>You can benefit more from HTTP/2 through domain consolidation. The following improvements were achieved by effectively undoing <a href="https://hpbn.co/http1x/#domain-sharding">domain sharding</a>:</p>



<ul class="wp-block-list">
<li>Faster delivery of static CSS/JS assets. We changed <a href="https://www.mediawiki.org/wiki/ResourceLoader/Architecture">ResourceLoader</a> to no longer use a dedicated cookieless domain (“bits.wikimedia.org”), and folded our asset entrypoint back into the MediaWiki platform for faster requests local to a given wiki domain name (<a href="https://phabricator.wikimedia.org/T107430">T107430</a>).</li>



<li>Speed up mobile page loads, specifically mobile-device “m-dot” redirects. We consolidated the canonical and mobile domains behind the scenes, through DNS. This allows the browser to reuse and carry the same HTTP/2 connection over a <em>cross-domain</em> redirect (<a href="https://phabricator.wikimedia.org/T124482">T124482</a>).</li>



<li>Faster Geo service and faster localized fundraising banner rendering. The Geo service was moved from geoiplookup.wikimedia.org to /geoiplookup on each wiki. The service was later removed entirely, in favor of an even faster zero-roundtrip solution (0-RTT): An edge-injected cookie within the Wikimedia CDN (<a href="https://phabricator.wikimedia.org/T100902">T100902</a>, <a href="https://gerrit.wikimedia.org/r/q/owner:ori.livneh%2540gmail.com+message:geo">patch</a>). This transfers the information directly alongside the pageview without the delay of a JavaScript payload requesting it after the fact.</li>
</ul>



<h2 class="wp-block-heading">Could HTTP/2 be slower than HTTP/1?</h2>



<p>During the SPDY experiment, Peter Hedenskog noticed early on that SPDY and HTTP/2 have a very real risk of being slower than HTTP/1. We <a href="https://phabricator.wikimedia.org/T125208">observed</a> this through our <a href="https://wikitech.wikimedia.org/wiki/Performance/Synthetic_testing">synthetic testing</a> infrastructure.</p>



<p>In HTTP/1, all resources are considered equal. When your browser navigates to an article, it creates a dedicated connection and starts downloading HTML from the server. The browser streams, parses, and renders in real-time as each chunk arrives. The browser creates additional connections to fetch stylesheets and images when it encounters references to them. For a typical article, MediaWiki’s stylesheets are notably smaller than the body content. This means, despite naturally being discovered from within (and thus after the start of) the HTML download, the CSS download generally finishes first, while chunks from the HTML continue to trickle in. This is good, because it means we can achieve the First Paint and Visually Complete milestones (above-the-fold) on page views <em>before</em> the HTML has fully downloaded in the background.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/CSS-HTTP1-marked.png" alt="" class="wp-image-402" width="672" height="213"/><figcaption class="wp-element-caption">Page load over HTTP/1.</figcaption></figure>



<p>In HTTP/2, the browser assigns a bandwidth priority to each resource, and resources share a single connection. This is different from HTTP/1, where each resource has its own connection, with lower-level networks and routers dividing their bandwidth equally as two seemingly unrelated connections. During the time where HTML and CSS downloads overlap, HTTP/1 connections each enjoyed about half the available bandwidth. This was enough for the CSS to slip through without any apparent delay. With HTTP/2, we observed that Chrome was not getting any CSS response until after the HTML was mostly done.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/SPDY-CSS.png" alt="" class="wp-image-404" width="951" height="379"/><figcaption class="wp-element-caption">Page load over SPDY.</figcaption></figure>



<p>This HTTP/2 feature can solve a similar issue in reverse. If a webpage suffers from large amounts of JavaScript code and below-the-fold images being downloaded during the page load, under HTTP/1 those low-priority resources would compete for bandwidth and starve the critical HTML and CSS downloads. The HTTP/2 priority system allows the browser and server to agree, and give more bandwidth to the important resources first. In our case, however, a bug in Chrome caused CSS to effectively have a lower priority relative to HTML (<a href="https://bugs.chromium.org/p/chromium/issues/detail?id=586938">chromium&nbsp;#586938</a>).</p>
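<p>To make the difference concrete, here is a toy model of the two situations. It is a simplification under stated assumptions, not a measurement: the sizes, the bandwidth, and the two mode names are made up for illustration, and real networks do not divide bandwidth this neatly.</p>

```javascript
// Toy model: when does the CSS finish downloading?
// Sizes are in KB, bandwidth in KB/s, times in seconds.
// The HTML download starts at t=0; the CSS request starts at `cssStart`.
function cssFinishTime({ htmlSize, cssSize, bandwidth, cssStart, mode }) {
  if (mode === 'html-first') {
    // One shared connection where the CSS gets no bandwidth until the HTML
    // is done (roughly the behaviour we observed with the priority bug).
    const htmlDone = htmlSize / bandwidth;
    return Math.max(htmlDone, cssStart) + cssSize / bandwidth;
  }
  // 'equal-split': two connections, each getting about half the bandwidth
  // for as long as both downloads are active.
  const htmlLeft = Math.max(0, htmlSize - bandwidth * cssStart);
  const half = bandwidth / 2;
  if (cssSize <= htmlLeft) {
    // The CSS completes while the HTML is still downloading.
    return cssStart + cssSize / half;
  }
  // The HTML finishes first; the rest of the CSS then gets full bandwidth.
  return cssStart + htmlLeft / half + (cssSize - htmlLeft) / bandwidth;
}
```

<p>With a hypothetical 100&nbsp;KB article, 20&nbsp;KB of CSS requested one second in, and 10&nbsp;KB/s of bandwidth, the CSS arrives after 5 seconds in the <code>equal-split</code> case and after 12 seconds in the <code>html-first</code> case.</p>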



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/first-paint-vs-spdy.png" alt="" class="wp-image-403" width="785" height="513"/><figcaption class="wp-element-caption">First paint regression correlated with SPDY rollout. (Ori Livneh, <a href="https://phabricator.wikimedia.org/T96848#2199791">T96848</a>)</figcaption></figure>



<p>We confirmed the hypothesis by disabling SPDY support on the Wikimedia CDN for a week (<a href="https://phabricator.wikimedia.org/T125979">T125979</a>). After Chrome resolved the bug, we transitioned from SPDY to HTTP/2 (<a href="https://phabricator.wikimedia.org/T166129#3294333">T166129</a>, <a href="https://phabricator.wikimedia.org/T193221">T193221</a>). This transition saw improvements both to how web browsers give signals to the server, and the way Nginx handled those signals.</p>



<p>As it stands today, page load time is overall faster on HTTP/2, and the CSS once again often finishes before the HTML. Thus, we achieve the same great early First Paint and Visually Complete milestones that we were used to from HTTP/1. But, we do still see edge cases where HTTP/2 is sometimes not able to re-negotiate priorities quickly enough, causing CSS to needlessly be held back by HTML chunks that have already filled up the network pipes for that connection (<a href="https://bugs.chromium.org/p/chromium/issues/detail?id=849106">chromium #849106</a>, still unresolved as of this writing).</p>



<h2 class="wp-block-heading">Lessons learned</h2>



<p>These difficulties in controlling bandwidth prioritization taught us that domain consolidation isn’t a cure-all. We decided to keep operating our thumbnail service at upload.wikimedia.org through a dedicated IP and thus a dedicated connection, for now (<a href="https://phabricator.wikimedia.org/T116132">T116132</a>).</p>



<p>Browsers may reuse connections for multiple domains if an existing HTTPS connection carries a TLS certificate that includes the other domain in its Subject Alternative Name (SAN) list, <em>even</em> when this connection is for a domain that corresponds to a different IP address in DNS. Under certain conditions, this can lead to a surprising HTTP 404 error (<a href="https://phabricator.wikimedia.org/T207340">T207340</a>, <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1363451">mozilla&nbsp;#1363451</a>, <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1222136">mozilla&nbsp;#1222136</a>). Emanuele Rocca from SRE Traffic Team mitigated this by implementing HTTP 421 response codes in compliance with the spec. This way, visitors affected by non-compliant browsers and middleware will automatically recover and reconnect accordingly.</p>
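<p>The idea behind the 421 (Misdirected Request) response can be sketched in a few lines. This is an illustration only, not how the Wikimedia CDN implements it; the function name and the list of hosts are hypothetical.</p>

```javascript
// Sketch: decide whether this server should answer a request, based on the
// host the client asked for. `servedHosts` is a hypothetical allow-list of
// the domains that this particular server (or IP address) is meant to serve.
function statusForHost(host, servedHosts) {
  // Normalise case and strip an optional port before comparing.
  const name = host.toLowerCase().replace(/:\d+$/, '');
  // 200 stands in for "handle the request as usual".
  // 421 tells the client to retry over a different connection.
  return servedHosts.includes(name) ? 200 : 421;
}
```

<p>For example, a server that only serves <code>upload.wikimedia.org</code> would answer 421 to a request for a different host that arrived over a reused connection, prompting the client to open a new connection to the right place.</p>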



<h2 class="wp-block-heading">Further reading</h2>



<ul class="wp-block-list">
<li><em><a href="https://simonhearne.com/2020/network-faster-than-cache/">When Network is Faster than Cache</a></em>, Simon Hearne, 2020.</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published on <a href="https://techblog.wikimedia.org/2022/11/04/http-2-performance-revisited/">techblog.wikimedia.org</a>.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2022/http-2-performance-revisited/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20HTTP%2F2%20performance%20revisited&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2022%2Fhttp-2-performance-revisited%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How does Internet Archive know?</title>
		<link>https://timotijhof.net/posts/2022/internet-archive-crawling/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Mon, 20 Jun 2022 19:30:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2022/internet-archive-crawling</guid>

					<description><![CDATA[The Internet Archive discovers in real-time when WordPress blogs publish a new post, and when Wikipedia articles reference new sources. How does that work? Wikipedia Wikipedia, and its sister projects such as Wiktionary and Wikidata, run on the MediaWiki open-source software. One of its core features is “Recent changes”. This enables the Wikipedia community to…]]></description>
										<content:encoded><![CDATA[
<p>The Internet Archive discovers in real-time when WordPress blogs publish a new post, and when Wikipedia articles reference new sources. How does that work?</p>



<span id="more-343"></span>



<h2 class="wp-block-heading" id="wikipedia">Wikipedia</h2>



<p>Wikipedia, and its <a href="https://www.wikimedia.org/">sister projects</a> such as Wiktionary and Wikidata, run on the <a href="https://en.wikipedia.org/wiki/MediaWiki">MediaWiki</a> open-source software. One of its core features is “<a href="https://www.mediawiki.org/wiki/Help:Recent_changes">Recent changes</a>”. This enables the Wikipedia community to monitor site activity in real-time. We use it to facilitate anti-spam, counter-vandalism, machine learning, and many more quality and research efforts.</p>



<p>MediaWiki’s built-in REST API exposes this data in machine-readable form to query (or poll). For wikipedia.org, we have an additional <a href="https://github.com/wikimedia/mediawiki/blob/2da0f819371123048cfbd38ce1e1c4831a373a62/includes/DefaultSettings.php#L7989-L8031">RCFeed</a> plugin that broadcasts events to the <code>stream.wikimedia.org</code> service (<a href="https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams">docs</a>).</p>



<p>The service implements the HTTP Server-Sent Events protocol (<a href="https://en.wikipedia.org/wiki/Server-sent_events">SSE</a>). Most programming languages have an SSE client via a popular package. Most exciting to me, though, is the original SSE client: the <code>EventSource</code> API — built straight into the browser.<sup id="fnr1" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn1" title="Jump to footnote 1">[1]</a></sup> This makes <a href="https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Powered_By">cool demos</a> possible, getting started with only the following JavaScript:</p>


<pre class="wp-block-code"><span><code class="hljs language-javascript"><span class="hljs-keyword">new</span> EventSource(<span class="hljs-string">'https://stream.wikimedia.org/…'</span>);</code></span></pre>


<p>And from the command-line, with cURL:</p>


<pre class="wp-block-code"><span><code class="hljs language-bash">$ curl <span class="hljs-string">'https://stream.wikimedia.org/v2/stream/recentchange'</span>

event: message
id: …
data: {<span class="hljs-string">"<span class="hljs-variable">$schema</span>"</span>:…,<span class="hljs-string">"meta"</span>:…,<span class="hljs-string">"type"</span>:<span class="hljs-string">"edit"</span>,<span class="hljs-string">"title"</span>:…}

…</code></span></pre>
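<p>In the browser, <code>EventSource</code> takes care of parsing this wire format for you. For environments without it, the format shown in the cURL output is simple enough to parse by hand. The following is a minimal sketch that handles only the <code>event</code>, <code>id</code>, and <code>data</code> fields; the function name is my own, and it ignores parts of the protocol such as <code>retry</code> and reconnection.</p>

```javascript
// Sketch: parse a chunk of Server-Sent Events text into event objects.
// Assumes `text` contains only complete events, separated by blank lines.
function parseSseEvents(text) {
  const events = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let type = 'message'; // The default event type when none is given.
    let id = null;
    const data = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === '' || line.startsWith(':')) {
        continue; // Skip blank lines and comment lines.
      }
      const colon = line.indexOf(':');
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? '' : line.slice(colon + 1);
      if (value.startsWith(' ')) {
        value = value.slice(1); // A single leading space is not part of the value.
      }
      if (field === 'event') type = value;
      else if (field === 'id') id = value;
      else if (field === 'data') data.push(value);
    }
    if (data.length > 0) {
      // Multiple data lines are joined with newlines.
      events.push({ event: type, id, data: data.join('\n') });
    }
  }
  return events;
}
```

<p>Each resulting <code>data</code> string from the recent changes stream is a JSON document, so it can be passed to <code>JSON.parse()</code> to read fields such as <code>type</code> and <code>title</code>.</p>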


<h2 class="wp-block-heading" id="wordpress">WordPress</h2>



<p>WordPress played a major role in the rise of the <a href="https://en.wikipedia.org/wiki/Blog">blogosphere</a>. In particular, ping servers (and pingbacks<sup id="fnr2" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn2" title="Jump to footnote 2">[2]</a></sup>) helped the early blogging community with discovery. The idea: your website notifies a ping server over a standardized protocol. The ping server in turn notifies feed reader services (Feedbin, Feedly), aggregators (FeedBurner), podcast directories, search engines, and more.<sup id="fnr3" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn3" title="Jump to footnote 3">[3]</a></sup></p>



<p>Ping servers today implement the <code>weblogsCom</code> interface (<a href="https://web.archive.org/web/20041019083107/http://www.xmlrpc.com/weblogsCom">specification</a>), introduced in 2001 and based on the XML-RPC protocol.<sup id="fnr4" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn4" title="Jump to footnote 4">[4]</a></sup> The default ping server in WordPress is Automattic’s <a href="https://pingomatic.com/">Ping-O-Matic</a>, which in turn powers the <a href="https://developer.wordpress.com/docs/firehose/">WordPress.com&nbsp;Firehose</a>.</p>
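<p>To give an idea of what such a ping looks like on the wire, here is a sketch that builds the XML-RPC request body for the <code>weblogUpdates.ping</code> method, which takes the blog&#8217;s name and URL. The helper names are my own, and this is not how WordPress itself constructs the request; the body would be sent as an HTTP POST to the ping server.</p>

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Sketch: build the XML-RPC request body for a weblogUpdates.ping call.
// `blogName` and `blogUrl` are the two string parameters of the method.
function buildPingRequest(blogName, blogUrl) {
  return [
    '<?xml version="1.0"?>',
    '<methodCall>',
    '<methodName>weblogUpdates.ping</methodName>',
    '<params>',
    `<param><value><string>${escapeXml(blogName)}</string></value></param>`,
    `<param><value><string>${escapeXml(blogUrl)}</string></value></param>`,
    '</params>',
    '</methodCall>',
  ].join('\n');
}
```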



<p>This firehose is a Jabber/XMPP server at <code>xmpp.wordpress.com:8008</code>. It provides real-time events about blog posts published on any WordPress site, both on WordPress.com and on self-hosted ones.<sup id="fnr5" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn5" title="Jump to footnote 5">[5]</a></sup> The firehose is also available as an HTTP stream.</p>


<pre class="wp-block-code"><span><code class="hljs language-bash">$ curl -vi xmpp.wordpress.com:8008/posts.org.json <span class="hljs-comment"># self-hosted</span>
{ <span class="hljs-string">"published"</span>:<span class="hljs-string">"2022-06-05T21:26:09Z"</span>,
  <span class="hljs-string">"verb"</span>:<span class="hljs-string">"post"</span>,
  <span class="hljs-string">"generator"</span>:{…},
  <span class="hljs-string">"actor"</span>:{…},
  <span class="hljs-string">"target"</span>:{<span class="hljs-string">"objectType"</span>:<span class="hljs-string">"blog"</span>,…,},
  <span class="hljs-string">"object"</span>:{<span class="hljs-string">"objectType"</span>:<span class="hljs-string">"article"</span>,…}
}
{ … }

$ curl -vi xmpp.wordpress.com:8008/posts.json <span class="hljs-comment"># WordPress.com</span>
{ … }
</code></span></pre>


<h2 class="wp-block-heading" id="internet-archive">Internet Archive</h2>



<p>It might be surprising, but the Internet Archive does <em>not</em> try to index the entire Internet. This is in contrast to commercial search engines.</p>



<p>The Internet Archive consists of bulk datasets from curated sources (“collections”). Collections are often donated by other organizations, and go beyond capturing web pages. They can also include books, music,<sup id="fnr6" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn6" title="Jump to footnote 6">[6]</a></sup> and software.<sup id="fnr7" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn7" title="Jump to footnote 7">[7]</a></sup> Any captured web pages are additionally surfaced via the Wayback Machine interface.</p>



<p>Perhaps you’ve used the “<a href="https://archive.org/web/">Save&nbsp;Page Now</a>” feature, where you can manually submit URLs to capture. While also represented by <a href="https://archive.org/details/save-page-now">a collection</a>, these actually go to the Wayback Machine first, and appear in bulk as part of the collection later.</p>



<p>The <a href="https://en.wikipedia.org/wiki/Common_Crawl">Common Crawl</a> and <a href="https://archive.org/details/widecrawl&amp;tab=about">Wide Crawl</a> collections represent traditional crawlers. These start with a seed list, and go breadth-first to every site they find (within certain global and per-site depth limits). Such a crawl can take months to complete, and captures a portion of the web from a particular period in time, regardless of whether a page was indexed before. Other collections are narrower in focus, e.g. they regularly crawl a news site and capture any articles not previously indexed.</p>
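


<p>The breadth-first strategy with a depth limit can be sketched in a few lines. This is an illustration only, not the Archive’s crawler: the link graph is a hard-coded array standing in for fetched pages, all names are invented, and it applies a single global depth limit without a per-site one.</p>


<pre class="wp-block-code"><span><code class="hljs language-php">// Hypothetical link graph standing in for the web: page =&gt; pages it links to.
$links = [
    'seed' =&gt; ['a', 'b'],
    'a'    =&gt; ['c'],
    'b'    =&gt; ['c', 'd'],
    'c'    =&gt; ['e'],
    'd'    =&gt; [],
    'e'    =&gt; [],
];

// Breadth-first crawl from a seed list, stopping at a maximum depth.
function crawl(array $links, array $seeds, int $maxDepth): array {
    $seen = [];
    $queue = new SplQueue();
    foreach ($seeds as $seed) {
        $seen[$seed] = 0;
        $queue-&gt;enqueue($seed);
    }
    while (!$queue-&gt;isEmpty()) {
        $page = $queue-&gt;dequeue();
        $depth = $seen[$page];
        if ($depth &gt;= $maxDepth) {
            continue; // page is captured, but its links are not followed
        }
        foreach ($links[$page] ?? [] as $next) {
            if (!isset($seen[$next])) {
                $seen[$next] = $depth + 1;
                $queue-&gt;enqueue($next);
            }
        }
    }
    return $seen; // page =&gt; depth at which it was first found
}

print_r(crawl($links, ['seed'], 2));</code></span></pre>


<p>With a depth limit of 2, page <code>e</code> is never reached. This is the kind of deep link that a crawl like this misses.</p>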



<h3 class="wp-block-heading" id="wikipedia-collection">Wikipedia collection</h3>



<p>One such collection is <a href="https://archive.org/details/wikipediaoutlinks?sort=-publicdate">Wikipedia Outlinks</a>.<sup id="fnr8" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn8" title="Jump to footnote 8">[8]</a></sup> This collection is fed several times a day with bulk crawls of new URLs. The URLs are extracted from recently edited or created Wikipedia articles, as discovered via the events from <code>stream.wikimedia.org</code> (<a href="https://github.com/internetarchive/crawling-for-nomore404/">Source&nbsp;code: crawling-for-nomore404</a>).</p>


<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2022_ia_wikidiff.png" alt="en.wikipedia.org, revision by Krinkle, on 30 May 2022 at 21:03:30." class="wp-image-355" width="480" height="265"/></figure>
</div>


<p>Last month, I <a href="https://en.wikipedia.org/w/index.php?title=VodafoneZiggo&amp;diff=1090690932&amp;oldid=1085345575&amp;diffonly=1&amp;diffmode=visual">edited</a> the VodafoneZiggo article on Wikipedia. My edit added several new citations. The articles I cited were from several years ago, and most had already made their way into the Wayback Machine by other means. Among my citations was a 2010 article from an Irish news site (<code>rte.ie</code>). I searched for it on archive.org and no snapshots existed of that URL.</p>



<p>A day later I searched again, and <a href="https://web.archive.org/web/*/https://www.rte.ie/news/business/2010/0506/130676-vodafone/">there it was</a>!</p>



<div class="wp-block-group is-content-justification-center is-layout-flex wp-container-core-group-is-layout-64b26803 wp-block-group-is-layout-flex">
<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2022_ia_results_calendar.png" alt="web.archive.org found 1 result, captured at 30 May 2022 21:03:55." class="wp-image-358" width="369" height="180"/></figure>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2022_ia_results_collections.png" alt="This capture was collected by: Wikipedia Eventstream." class="wp-image-359" width="360" height="180"/></figure>
</div>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2022_ia_capture_about.png" alt="" class="wp-image-360" width="720" height="122"/></figure>



<p>I should note that, while the snapshot was uploaded a day later, the crawling occurred in real-time. I published my edit to Wikipedia on May 30th, at 21:03:30 UTC. The snapshot of the referenced source article was captured at 21:03:55 UTC. A mere 25 seconds later!</p>



<p>In addition to archiving citations for future use, Wikipedia also integrates with the Internet Archive in the present. The so-called <a href="https://en.wikipedia.org/wiki/User:InternetArchiveBot">InternetArchiveBot</a> (<a href="https://github.com/internetarchive/internetarchivebot">source code</a>) continuously crawls Wikipedia, looking for “dead” links. When it finds one, it searches the Wayback Machine for a matching snapshot, preferring one taken on or near the date that the citation was originally added to Wikipedia. This is important for online citations, as web pages may change over time.</p>
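


<p>The “on or near the date” preference boils down to a nearest-neighbour search over snapshot times. The sketch below is not the bot’s actual code. It assumes the snapshots are already known as 14-digit <code>YYYYMMDDhhmmss</code> timestamps (the form used in Wayback Machine URLs); the function name and the sample data are made up.</p>


<pre class="wp-block-code"><span><code class="hljs language-php">// Pick the snapshot closest in time to when the citation was added.
function nearestSnapshot(array $snapshots, string $citedAt): ?string {
    $utc = new DateTimeZone('UTC');
    $target = DateTimeImmutable::createFromFormat('YmdHis', $citedAt, $utc)-&gt;getTimestamp();
    $best = null;
    $bestDiff = PHP_INT_MAX;
    foreach ($snapshots as $snapshot) {
        $time = DateTimeImmutable::createFromFormat('YmdHis', $snapshot, $utc)-&gt;getTimestamp();
        $diff = abs($time - $target);
        if ($diff &lt; $bestDiff) {
            $best = $snapshot;
            $bestDiff = $diff;
        }
    }
    return $best; // null when there are no snapshots at all
}

// Made-up snapshot list for illustration.
$snapshots = ['20150101000000', '20180929120000', '20220604000000'];
echo nearestSnapshot($snapshots, '20180929000000'), "\n";</code></span></pre>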



<p>The bot then edits Wikipedia (<a href="https://en.wikipedia.org/w/index.php?title=Air_Force_Cross_(United_States)&amp;diff=prev&amp;oldid=1091533127&amp;diffmode=source">example</a>) to rescue the citation by filling in the archive link.</p>



<div class="wp-block-group is-content-justification-center is-layout-flex wp-container-core-group-is-layout-64b26803 wp-block-group-is-layout-flex">
<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2022_ia_rescue_diff.png" alt="Wikipedia.org, revision by InternetArchiveBot, on 4 June 2022. Rescuing 1 source. The source was originally cited on 29 September 2018. The added archive URL is also from 29 September 2018." class="wp-image-361" width="315" height="185"/></figure>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2022_ia_rescue_about.png" alt="web.archive.org, found 1 result, captured 29 September 2018. This capture was collected by: Wikipedia Eventstream." class="wp-image-362" width="347" height="180"/></figure>
</div>



<h3 class="wp-block-heading" id="wordpress-collection">WordPress collection</h3>



<p>The <a href="https://archive.org/details/NO404-WP?sort=-publicdate">NO404-WP collection</a> on archive.org works in a similar fashion. It is fed by a crawler that uses the WordPress Firehose (<a href="https://github.com/internetarchive/crawling-for-nomore404/">source code</a>). The firehose, as described above, is pinged by individual WordPress sites after publishing a new post.</p>



<p>Take, for example, <a href="https://chriswiegman.com/2022/05/why-hello-apple-im-glad-youre-back-on-your-feet/">this blog post by Chris</a>. According to the post metadata, it was published at 12:00:42 UTC. By 12:01:55, just over a minute later, it <a href="https://web.archive.org/web/20220506120155/https://chriswiegman.com/2022/05/why-hello-apple-im-glad-youre-back-on-your-feet/">was captured</a>.<sup id="fnr9" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2022/internet-archive-crawling/#fn9" title="Jump to footnote 9">[9]</a></sup></p>



<p>In addition to preserving blog posts, the NO404-WP collection goes a step further and also captures any new material your post links to. (Akin to Wikipedia citations!) For example, <a href="https://css-tricks.com/how-to-create-block-theme-patterns-in-wordpress-6-0/#aa-use-case-example-1-twenty-twenty-one">this css-tricks.com post</a> links to a file on GitHub inside the TT1 Blocks project. This deep link was not captured before and is unlikely to be picked up by regular crawling due to depth limits. It got <a href="https://web.archive.org/web/collections/2022*/https://github.com/WordPress/theme-experiments/blob/master/tt1-blocks/functions.php">captured</a> and uploaded to the NO404-WP collection a few days later.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2022_ia_results_collections2.png" alt="" class="wp-image-363" width="555" height="255"/></figure>



<h2 class="wp-block-heading" id="further-reading">Further reading</h2>



<ul class="wp-block-list">
<li><em><a href="https://www.forbes.com/sites/kalevleetaru/2016/01/18/the-internet-archive-turns-20-a-behind-the-scenes-look-at-archiving-the-web/">The Internet Archive Turns 20: Behind The Scenes</a></em>, Forbes (Kalev Leetaru), 2016.</li>



<li><em><a href="https://blog.archive.org/2013/10/25/fixing-broken-links/">Fixing Broken Links on the Internet</a></em>, blog.archive.org (Alexis Rossi), 2013.</li>



<li><em><a href="https://blog.archive.org/2019/10/23/the-wayback-machines-save-page-now-is-new-and-improved/">‘Save Page Now’ is New and Improved</a></em>, blog.archive.org (Mark Graham), 2019.</li>



<li><em><a href="https://blog.archive.org/2017/04/16/early-macintosh-emulation-comes-to-the-archive/">Early Macintosh Emulation Comes to the Archive</a></em>, blog.archive.org (Jason Scott), 2017.</li>



<li><em><a href="https://blog.archive.org/2019/10/13/2500-more-ms-dos-games-playable-at-the-archive/">2,500 More MS-DOS Games at the Archive</a></em>, blog.archive.org, 2019.</li>
</ul>

<hr><div class="footnotes" role="doc-endnotes">Footnotes:<ol><li id="fn1" role="doc-endnote">The “Server-sent events” technology was around as early as 2006, originating at Opera (<a href="https://dev.opera.com/blog/event-streaming-to-web-browsers/">announcement</a>, <a href="https://en.wikipedia.org/wiki/Server-sent_events#History">history</a>). It was among the first specifications to be drafted through WHATWG, which formed in 2004 after <a href="https://en.wikipedia.org/wiki/WHATWG#History">the W3C XHTML debacle</a>. <a href="#fnr1" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn2" role="doc-endnote">Pingback (<a href="https://wordpress.org/support/article/introduction-to-blogging/#pingbacks">Pingbacks explained</a>, <a href="https://en.wikipedia.org/wiki/Pingback">history</a>) provides direct peer-to-peer discovery between blogs when one post mentions or links to another post. By the way, the Pingback and Server-Sent Events specifications were both written by Ian Hickson. <a href="#fnr2" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn3" role="doc-endnote">Feedbin supports push notifications. While these could come from its periodic RSS crawling, it tries to deliver these in real-time where possible. It does this by <a href="https://blog.superfeedr.com/ping/api/infrastructure/realtime/ping-me-i-m-famous/">mapping pings from blogs</a> that notify Ping-O-Matic, to feed subscriptions. <a href="#fnr3" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn4" role="doc-endnote">The <code>weblogUpdates</code> spec for <a href="https://en.wikipedia.org/wiki/Ping_(blogging)">Ping servers</a> was written by Dave Winer in 2001, who took over Weblogs.com around that time (<a href="https://en.wikipedia.org/wiki/Weblogs.com">history</a>) and needed something <a href="http://oldweblogscomblog.scripting.com/2001/10/21">more scalable</a>. 
This, by the way, is the same <a href="https://en.wikipedia.org/wiki/Dave_Winer">Dave Winer</a> who developed the underlying XML-RPC protocol, the OPML format, and worked on RSS 2.0. <a href="#fnr4" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn5" role="doc-endnote">That is, unless the blog owner opts out by disabling the “search engine” and “ping” settings in WordPress Admin. <a href="#fnr5" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn6" role="doc-endnote">The <a href="https://archive.org/details/muziekweb">Muziekweb</a> collection is one that stores music rather than web pages. Muziekweb is a library in the Netherlands that lends physical CDs, via local libraries, to patrons. They also digitize their collection for long-term preservation. One cool application of this is that you can stream any album in full from a library computer. And… they mirror to the Internet Archive! You can <a href="https://archive.org/search.php?query=creator%3A%22Deep+Purple%22&amp;&amp;and&#91;&#93;=collection%3A%22muziekweb%22">search</a> for an artist, and <a href="https://archive.org/details/lp_the-best-of-edith-piaf_edith-piaf_0/disc1/02.06.+Non%2C+Je+Ne+Regrette+Rien+(No+Regrets).mp3">listen</a> online. For copyright reasons, most music is publicly limited to 30s samples. Through <a href="https://en.wikipedia.org/wiki/Controlled_digital_lending">Controlled digital lending</a>, however, you can access many more albums in full. Plus you can publicly stream any music in the public domain, under a free license, or pre-1972 <a href="https://archive.org/details/unlockedrecordings?tab=about">no longer commercially available</a>. 
<a href="#fnr6" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn7" role="doc-endnote">I find it particularly impressive that the Internet Archive also hosts platform emulators for the software it preserves, and that these platforms not only include game consoles but also Macintosh and MS-DOS, and that these emulators are then compiled via Emscripten to JavaScript and integrated right on the archive.org entry! For example, you can play the <a href="https://archive.org/details/PrinceOfPersiaMacintosh">original Prince of Persia</a> for Mac (via <code>pce-macplus.js</code>), the later <a href="https://archive.org/details/msdos_Prince_of_Persia_1990">color edition</a>, or <a href="https://archive.org/details/msdos_Wolfenstein_3D_1992">Wolfenstein 3D</a> for MS-DOS (via <code>js-dos</code> or <code>em-dosbox</code>), or check out Bill Atkinson’s <a href="https://archive.org/details/mac_Paint_2">1985 MacPaint</a>. <a href="#fnr7" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn8" role="doc-endnote">The “Wikipedia Outlinks” collection was originally populated via the <a href="https://archive.org/details/NO404-WKP?sort=-publicdate">NO404-WKP subcollection</a>, which used the <a href="https://wikitech.wikimedia.org/wiki/Irc.wikimedia.org">irc.wikimedia.org service</a> from 2013 to 2019. It was phased out in favour of the <a href="https://archive.org/details/wikipedia-eventstream?sort=-publicdate">wikipedia-eventstream subcollection</a>. <a href="#fnr8" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn9" role="doc-endnote">In practice, the <a href="https://archive.org/details/archiveteam_urls">ArchiveTeam URLs collection</a> tends to beat the NO404-WP collection and thus the latter doesn’t crawl it again. Perhaps the ArchiveTeam scripts also consume the WordPress Firehose? For many WordPress posts I checked, the URL is only indexed once, which is from “ArchiveTeam URLs” doing so within seconds of original publication. 
<a href="#fnr9" role="doc-backlink" title="Jump back">↩︎</a></li></ol></div><hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2022/internet-archive-crawling/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20How%20does%20Internet%20Archive%20know%3F&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2022%2Finternet-archive-crawling%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>📎 Krinkle Treasure Hunt</title>
		<link>https://timotijhof.net/posts/2021/treasure-hunt/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Fri, 04 Jun 2021 12:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[Linked]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2021/treasure-hunt</guid>

					<description><![CDATA[I miss the era of very Internet-y things, geocities-style scavenger hunts, with easter eggs and all. So, I devised a treasure hunt of my own! → Enter here]]></description>
										<content:encoded><![CDATA[
<p>I miss the era of very Internet-y things, geocities-style scavenger hunts, with easter eggs and all. So, I devised a treasure hunt of my own!</p>



<span id="more-329"></span>



<p><a href="https://treasure21.timotijhof.net/">→ Enter here</a></p>



<figure class="wp-block-image size-full no-filter"><a href="https://treasure21.timotijhof.net/"><img loading="lazy" decoding="async" width="960" height="540" src="https://timotijhof.net/wp-content/uploads/2021_treasure21_poster.png" alt="Find your way through the forrest. Happy hunting!" class="wp-image-336"/></a></figure>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2021/treasure-hunt/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20%F0%9F%93%8E%20Krinkle%20Treasure%20Hunt&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2021%2Ftreasure-hunt%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Profiling PHP in production at scale</title>
		<link>https://timotijhof.net/posts/2020/profiling-php-at-scale/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Fri, 11 Dec 2020 12:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2020/profiling-php-at-scale</guid>

					<description><![CDATA[At Wikipedia, we built an efficient sampling profiler for PHP, and use it to instrument live requests. The trace logs and flame graphs are powered by a simple setup that involves only free open-source software, and runs at low infrastructure cost. I’d like to demonstrate that profiling doesn’t have to be expensive, and can even…]]></description>
										<content:encoded><![CDATA[
<p>At Wikipedia, we built an efficient sampling profiler for PHP, and use it to instrument live requests. The trace logs and flame graphs are powered by a simple setup that involves only free open-source software, and runs at low infrastructure cost.</p>



<span id="more-313"></span>



<p>I’d like to demonstrate that profiling doesn’t have to be expensive, and can even be performant enough to run continually in production! The principles in this article should apply to most modern programming languages. We developed <a href="https://github.com/wikimedia/php-excimer/">Excimer</a>, a sampling profiler for PHP; and <a href="https://github.com/wikimedia/arc-lamp/">Arc Lamp</a> for processing stack traces and generating flame graphs.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2404" height="1001" src="https://timotijhof.net/wp-content/uploads/2020_profiling_figure1_flamegraph_intro.png" alt="Flame graph." class="wp-image-314"/><figcaption class="wp-element-caption">Figure 1: A daily flame graph, from <a href="https://performance.wikimedia.org/php-profiling/">performance.wikimedia.org</a>.</figcaption></figure>



<h2 class="wp-block-heading" id="exhibit-a-the-flame-graph">Exhibit A: The Flame Graph</h2>



<p>Our goal is to help developers understand the performance characteristics of their application through flame graphs. Flame graphs visually describe how and where an application spends its time. You may have seen them while using the browser’s developer tools, or after running an application via a special tool from the command-line.</p>



<p>Profilers often come with a cost – code may run much more slowly when a profiler is active. This cost is fine when investigating something locally or ad-hoc, but it’s not something we always want to apply to live requests.</p>



<p>To generate flame graphs, we sample stack traces from web servers that are serving live traffic. This is achieved through a sampling profiler. We then send the stack traces to a stream, which is then turned into a flame graph.</p>



<p>Our target was to add less than <strong>1 millisecond</strong> to user-facing web requests that complete within 50ms or 200ms, and add under 1% to long-running processes that run for several minutes. And so our journey begins, with the quest for an efficient sampling profiler.</p>



<h2 class="wp-block-heading" id="how-profiling-can-be-expensive">How profiling can be expensive</h2>



<h3 class="wp-block-heading" id="internal-entry-and-exit-hooks">Internal entry and exit hooks</h3>



<p>XHProf is a native extension for PHP. It intercepts the start and end of every function call, and may record function hierarchy, call count, memory usage, etc. When used as a debugger to trace an entire request, it can slow down your application by 3X (+200%).<sup id="fnr1" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2020/profiling-php-at-scale/#fn1" title="Jump to footnote 1">[1]</a></sup></p>



<p>It has a sampled mode in which its entry-exit hooks are reduced to no-ops most of the time, and otherwise records only a stack trace. But this could still run code <a href="https://phabricator.wikimedia.org/T176916#4293822">10-30% slower</a>. The time spent within these hooks for “no-op” cases was fairly small. But, the act of switching to and from such a hook has a cost as well. And, when we intercept every single function in an application, those costs quickly add up.</p>



<p>We also found that the mere presence of these entry-exit hooks prevented the PHP engine from using certain <a href="https://www.mediawiki.org/wiki/Excimer#Background">optimisations</a>. When evaluating performance, compare not only a plugin being used vs not, but also compare to a system with the plugin being entirely uninstalled!</p>



<p>We also looked at external ways to capture stack trace samples, using GDB, or <code>perf_events</code>.</p>



<h3 class="wp-block-heading" id="external-interrupts">External interrupts</h3>



<p>GDB unlocks the full power of the Linux kernel to halt a process in mid-air, break into it, run your code in its local state, and then get out to let the process resume – all without the process’ awareness.<sup id="fnr2" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2020/profiling-php-at-scale/#fn2" title="Jump to footnote 2">[2]</a></sup></p>



<p><a href="https://en.wikipedia.org/wiki/GNU_Debugger">GDB</a> does this through <code>ptrace</code>, which comes with a relatively high interrupt cost. But, the advantage of this approach is that there is no overhead when the profiling is inactive. Initial exploration showed that taking a single sample could delay the process by <a href="https://phabricator.wikimedia.org/T176916#4301180">a whole second</a> while GDB attached and detached itself. There was some room for improvement here (such as GDB preloading), but it seemed inevitable that the cost would be orders of magnitude too high.</p>



<h3 class="wp-block-heading" id="perf_events">perf_events</h3>



<p><code>perf_events</code> is a Linux tool that can inspect a process and read its current stack trace. As with GDB, when we’re not looking, the process runs as normal. <code>perf_events</code> takes samples relatively quickly, has growing ecosystem support, and its cost can be <a href="http://www.brendangregg.com/perf.html">greatly minimised</a>.</p>



<p>If your application runs as its own compiled program, such as when using C or Rust, then this solution might be ideal. But runtimes that use a <a href="https://en.wikipedia.org/wiki/Application_virtual_machine">virtual machine</a> (like PHP, Node.js, or Java) act as an intermediary process with their own way of managing an application’s call stack. All that <code>perf_events</code> would see is the time spent inside the runtime engine itself. This might tell you how internal operations like “assign_variable” work, but is not what we are after.<sup id="fnr3" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2020/profiling-php-at-scale/#fn3" title="Jump to footnote 3">[3]</a></sup><sup id="fnr4" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2020/profiling-php-at-scale/#fn4" title="Jump to footnote 4">[4]</a></sup></p>



<h2 class="wp-block-heading" id="introducing-excimer">Introducing: Excimer</h2>



<p>Excimer is a small C program, with a binding for PHP 7. Its binding can be used to collect sampled stack traces. It leverages two low-level concepts that I’ll briefly describe on their own: POSIX timers, and graceful interrupts.</p>



<h3 class="wp-block-heading" id="posix-timers">POSIX timers</h3>



<p>With a POSIX timer, we directly ask the operating system to notify us after a given amount of time has elapsed. It can notify us in one of several ways. The timer can deliver signal events to a particular process or thread (which we could poll for). Or, the timer can respond by spawning a new concurrent thread in the process, and run a callback there. This last option is known as <code>SIGEV_THREAD</code>.</p>



<h3 class="wp-block-heading" id="graceful-interrupts">Graceful interrupts</h3>



<p>There is a <code>vm_interrupt</code> global flag in the PHP engine that the virtual machine checks during code execution. It’s not a very precise feature, but it is checked at least once before the end of any userland function, which is enough for our purpose.</p>



<p>If during such a check the engine finds that the flag is raised (set to <code>1</code> instead of <code>0</code>), it resets the flag and runs any registered callbacks. The engine uses the same feature for enforcing request timeouts, and thus no overhead is added by using it to facilitate our sampling.</p>



<h2 class="wp-block-heading" id="at-last-we-can-start-sampling">At last, we can start sampling!</h2>



<p>When the Excimer profiler starts, it starts a little POSIX timer, with <code>SIGEV_THREAD</code> as the notification type. To give all code an equal chance of being sampled, the first interval is staggered by a random fraction of the sampling interval.</p>



<p>We’ll also give the timer the raw memory address where the <code>vm_interrupt</code> flag is located (you’ll understand why in a moment). The code to set up this timer is negligible and happens only once for a given web request. After that, the process is left to run as normal.</p>



<p>When the sampling interval comes around, the operating system spawns a new thread and runs Excimer’s timer handler. There isn’t a whole lot we can do from here since we’re in a thread alongside the PHP engine which is still running. We don’t know what the engine is up to. For example, we can’t safely and non-blockingly read the stack trace from here. Its memory may mutate at any time. What we do have is the raw address to the <code>vm_interrupt</code> flag, and we can boldly write a <code>1</code> there! No matter where the engine is at, that much is safe to do.</p>



<p>Not long after, PHP will reach one of its checkpoints and find the flag is raised. It resets the flag and makes a direct inline call to Excimer’s profiling code. Excimer simply reads out a copy of the stack trace, optionally flushing or sending it out, and then PHP resumes as normal.</p>



<p>If the process runs long enough to cover more than one sampling interval, the timer will notify us once more and the above cycle repeats.</p>
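


<p>The cycle above can be mimicked in plain PHP. The sketch below is a simulation and not how Excimer is implemented: a virtual clock stands in for real time, a boolean stands in for the <code>vm_interrupt</code> flag, and the function and variable names are invented for illustration. The point it shows is the division of labour: the “timer” only ever raises the flag, and the stack is only read at a checkpoint.</p>


<pre class="wp-block-code"><span><code class="hljs language-php">// $calls is a simulated workload: [function name, duration in ms], run in order.
function simulate(array $calls, int $period, int $firstTick): array {
    $clock = 0;              // virtual clock in ms
    $nextTick = $firstTick;  // a real profiler would stagger this randomly
    $interrupt = false;      // stand-in for the engine's vm_interrupt flag
    $samples = [];
    foreach ($calls as [$name, $duration]) {
        $clock += $duration; // the function "runs"
        // Timer side: all it may do is raise the flag.
        while ($nextTick &lt;= $clock) {
            $interrupt = true;
            $nextTick += $period;
        }
        // Engine side: checkpoint before the function returns.
        if ($interrupt) {
            $interrupt = false;        // reset the flag
            $samples[] = "main;$name"; // read the current stack
        }
    }
    return $samples;
}

print_r(simulate([['parse', 30], ['render', 90], ['save', 50]], 100, 40));</code></span></pre>


<p>Here <code>parse</code> finishes before the first tick and is never sampled, while the two longer calls each get one sample.</p>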



<h2 class="wp-block-heading" id="putting-it-all-together">Putting it all together</h2>



<p>It’s time to put our sampling profiler to use!</p>



<ul class="wp-block-list">
<li>Collect – start the profiler and set a flush destination.</li>



<li>Flush – send the traces someplace nice.</li>



<li>Flame graphs – combine the traces and generate flame graphs.</li>
</ul>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1730" height="586" src="https://timotijhof.net/wp-content/uploads/2020_profiling_figure2_arclamp.png" alt="" class="wp-image-320"/><figcaption class="wp-element-caption">Figure 2: Web servers send stack traces to a Redis stream. This is independently read into a rotated log file and periodically converted to a flame graph.</figcaption></figure>



<h3 class="wp-block-heading" id="collect">Collect</h3>



<p>The application can start the Excimer profiler with a sampling interval and flush callback.</p>


<pre class="wp-block-code"><span><code class="hljs language-php"><span class="hljs-keyword">static</span> $prof = <span class="hljs-keyword">new</span> ExcimerProfiler();
$prof-&gt;setPeriod(<span class="hljs-number">60</span>); <span class="hljs-comment">// seconds</span>
$prof-&gt;setFlushCallback(<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-params">($log)</span> </span>{ ArcLamp::flush($log); });
$prof-&gt;start();</code></span></pre>


<p>The above snippet is from Arc Lamp, as used on Wikipedia. This code would be placed in the early setup phase of your application. In PHP, this could also be placed in an <a href="https://www.php.net/manual/en/ini.core.php#ini.auto-prepend-file"><code>auto_prepend_file</code></a> that automatically applies to your web entry points, without needing any code or configuration inside the application.</p>



<h3 class="wp-block-heading" id="flush">Flush</h3>



<p>Next we need to flush these traces to a place where we can find them later. This place needs to be reachable from all web servers, accept concurrent input at low latencies, and have a fast failure mode. I subscribe to the <a href="https://mcfunley.com/choose-boring-technology">“boring technology”</a> ethos, and so if you have existing infrastructure in use for something like this, I’d start with that. (e.g. ZeroMQ, or rsyslog/Kafka.)</p>



<p>At Wikimedia Foundation, we chose Redis for this. We ingest about 3 million samples daily from a cluster of 150 Apache servers in any given data centre, using a 60s sample interval. These are all received by a single Redis instance.</p>
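


<p>These figures also hint at how busy the servers are. As a rough sketch, assume that every request is profiled and that, thanks to the random stagger, a request yields on average one sample per 60 seconds it spends running. The total request time per day is then the number of samples multiplied by the interval. The function name below is made up, and the result is only an estimate under those assumptions.</p>


<pre class="wp-block-code"><span><code class="hljs language-php">// Estimate the average number of requests in flight at any moment,
// assuming one sample represents $period seconds of request time.
function avgBusyWorkers(int $samplesPerDay, int $period): float {
    $secondsPerDay = 24 * 60 * 60;
    return $samplesPerDay * $period / $secondsPerDay;
}

$cluster = avgBusyWorkers(3000000, 60);
printf("cluster: %.0f, per server: %.1f\n", $cluster, $cluster / 150);</code></span></pre>


<p>Under these assumptions, that works out to roughly 2,000 requests in flight across the cluster, or about 14 per server.</p>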



<h3 class="wp-block-heading" id="flame-graphs">Flame Graphs</h3>



<p>Arc Lamp consumes the Redis stream and writes the trace logs in batches to locally rotated files. You can configure how to split and join these. For example, we split incoming samples by “web”, “api”, or “job queue” entry point; and join by the hour, and by full day.</p>
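


<p>Flame graph tools such as Brendan Gregg’s FlameGraph scripts read “folded” stacks: one line per unique stack, with frames joined by semicolons and followed by a sample count. Here is a minimal sketch of that aggregation step, assuming each sample is already a semicolon-joined string. The function name and frame names are invented, and this is not Arc Lamp’s actual code.</p>


<pre class="wp-block-code"><span><code class="hljs language-php">// Collapse individual samples into "stack count" lines.
function foldStacks(array $samples): string {
    $counts = array_count_values($samples); // identical stacks are counted together
    ksort($counts);                         // stable, readable output order
    $lines = '';
    foreach ($counts as $stack =&gt; $count) {
        $lines .= "$stack $count\n";
    }
    return $lines;
}

// Made-up samples, outermost frame first.
echo foldStacks([
    'main;handle;parse',
    'main;handle;render',
    'main;handle;parse',
    'main;cron',
]);</code></span></pre>


<p>Splitting by entry point or joining by hour then comes down to choosing which samples go into the same call.</p>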



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2378" height="681" src="https://timotijhof.net/wp-content/uploads/2020_profiling_figure3_flamegraph_summary.png" alt="" class="wp-image-322"/></figure>



<p>You can browse our daily flame graphs on <a href="https://performance.wikimedia.org/php-profiling/">performance.wikimedia.org</a>, or check out the <a href="https://github.com/wikimedia/arc-lamp/">Arc Lamp</a> and <a href="https://github.com/wikimedia/php-excimer/">Excimer</a> projects.</p>



<p><em>Thanks to: Tim Starling who single-handedly developed Excimer, Stas Malyshev for his insights on PHP internals, Kunal Mehta as Debian developer and fellow Wikimedian who packaged Excimer, and Ori Livneh who originally created Arc Lamp and got me into all this.</em></p>



<h2 class="wp-block-heading" id="further-reading">Further reading</h2>



<ul class="wp-block-list">
<li><em><a href="https://diff.wikimedia.org/2014/12/29/how-we-made-editing-wikipedia-twice-as-fast/">How editing Wikipedia became twice as fast on HHVM</a></em>, Ori Livneh, 2014.</li>



<li><em><a href="https://launchdarkly.com/blog/how-the-wikimedia-foundation-successfully-migrated-to-php7/">How Wikimedia Foundation successfully migrated to PHP 7</a></em>, Effie Mouzeli, 2019.</li>



<li><em><a href="https://techblog.wikimedia.org/2019/12/16/wikimediadebug-v2-is-here/#how-does-it-all-work">WikimediaDebug for PHP is here</a></em>, Timo Tijhof, 2019.</li>



<li><em><a href="https://nikic.github.io/2017/04/14/PHP-7-Virtual-machine.html">PHP Virtual Machine: vm_interrupt</a></em>, Nikita Popov, 2017.</li>



<li><a href="http://www.brendangregg.com/flamegraphs.html">Flame Graphs</a> by Brendan Gregg.</li>



<li><a href="https://en.wikipedia.org/wiki/Perf_%28Linux%29">perf_events (Linux)</a> on Wikipedia.</li>



<li><a href="http://www.phpinternalsbook.com/php7/extensions_design/hooks.html">PHP Internals Handbook: All about hooks</a>.</li>



<li><a href="https://man7.org/linux/man-pages/man2/timer_create.2.html">POSIX timer: notify options</a>, Linux manual pages.</li>



<li><a href="https://wikitech.wikimedia.org/wiki/Wikimedia_infrastructure">Wikimedia Foundation: Site infrastructure</a>.</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published in <a href="https://calendar.perfplanet.com/2020/profiling-php-in-production-at-scale/">Performance Calendar 2020</a>. Also published on <a href="https://techblog.wikimedia.org/2021/03/03/profiling-php-in-production-at-scale/">techblog.wikimedia.org</a>.</p>




<hr><div class="footnotes" role="doc-endnotes">Footnotes:<ol><li id="fn1" role="doc-endnote">We already used XHProf as a debugger for capturing complete and unsampled profiles over a single web request. The original <a href="https://github.com/phacility/xhprof">php-xhprof</a> targeted PHP 5. When we migrated to HHVM, we continued using its built-in port of XHProf. We since migrated to PHP 7 and use <a href="https://github.com/tideways/php-xhprof-extension">php-tideways</a>, which is a maintained alternative with PHP 7 support. The original xhprof has since published an <a href="https://github.com/phacility/xhprof/tree/dab44f76da5c8a0d4f1339f7d2ea2bc42408e8e9">experimental</a> branch with tentative PHP 7 support. <a href="#fnr1" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn2" role="doc-endnote">See also <em><a href="https://dom.as/2009/02/15/poor-mans-contention-profiling/">Poor man’s contention profiling</a></em> (Domas Mituzas, 2009), in which GDB is used to profile a MySQL server. <a href="#fnr2" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn3" role="doc-endnote">If the VM runtime includes a JIT compiler, then perf_events could be used still. With a <a href="https://en.wikipedia.org/wiki/Just-in-time_compilation">JIT compiler</a>, the runtime compiles your source code into machine code, which then becomes a native part of the VM’s process. The VM would call these unnamed chunks of machine code directly by their memory address. This is a bit like how “eval” can create functions in a scripting language. You then need a <code>perf.map</code> file so that <code>perf_events</code> can turn these unnamed addresses back into the names of classes and methods from which a chunk of code originated. This is known as symbol translation. There <a href="http://www.brendangregg.com/perf.html#JIT_Symbols">is support</a> for perf map files in Node.js and Java. 
<a href="#fnr3" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn4" role="doc-endnote">PHP 8.0 was <a href="https://www.php.net/releases/8.0/en.php">announced</a> last week, and includes <a href="https://wiki.php.net/rfc/jit">a new JIT</a> with perf.map support. I look forward to exploring this over the coming year! <a href="#fnr4" role="doc-backlink" title="Jump back">↩︎</a></li></ol></div><hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2020/profiling-php-at-scale/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Profiling%20PHP%20in%20production%20at%20scale&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2020%2Fprofiling-php-at-scale%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>📎 Interview on Uses This</title>
		<link>https://timotijhof.net/posts/2020/uses-this/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Wed, 07 Oct 2020 12:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[Linked]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2020/uses-this</guid>

					<description><![CDATA[Today, yours truly got to add his bit.]]></description>
										<content:encoded><![CDATA[
<p>Daniel’s <em>Uses This</em> interview series has been a long-time resident in my feed reader. The more than <a rel="noreferrer noopener" href="https://usesthis.com/categories/" target="_blank">1,000 interviews</a> feature everyone from the people behind <a rel="noreferrer noopener" href="https://usesthis.com/interviews/graham.linehan/" target="_blank">The IT Crowd</a>, <a rel="noreferrer noopener" href="https://usesthis.com/interviews/justin.frankel/" target="_blank">Winamp</a>, <a rel="noreferrer noopener" href="https://usesthis.com/interviews/joe.armstrong/" target="_blank">Erlang</a>, and <a rel="noreferrer noopener" href="https://en.wikipedia.org/wiki/Brian_Kernighan" target="_blank">Unix</a>; to some of my personal heroes such as <a rel="noreferrer noopener" href="https://usesthis.com/interviews/vi.hart/" target="_blank">Vi Hart</a>, <a rel="noreferrer noopener" href="https://usesthis.com/interviews/chris.coyier/" target="_blank">Chris Coyier</a>, <a rel="noreferrer noopener" href="https://usesthis.com/interviews/cassidy.williams/" target="_blank">Cassidy Williams</a>, <a rel="noreferrer noopener" href="https://usesthis.com/interviews/john.gruber/" target="_blank">John Gruber</a>, and <a rel="noreferrer noopener" href="https://usesthis.com/interviews/brendan.gregg/" target="_blank">Brendan Gregg</a>.</p>



<p>Today, yours truly got to add his bit.</p>



<p><a target="_blank" rel="noreferrer noopener" href="https://usesthis.com/interviews/timo.tijhof/">→ usesthis.com</a></p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2020/uses-this/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20%F0%9F%93%8E%20Interview%20on%20Uses%20This&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2020%2Fuses-this%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Should I substr(), substring(), or slice()?</title>
		<link>https://timotijhof.net/posts/2020/substr-substring-slice/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Sat, 26 Sep 2020 12:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[JavaScript]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2020/substr-substring-slice</guid>

					<description><![CDATA[What’s the deal with these string methods, and how are they different? String substr() str.substr(start[, length]) This method takes a start index, and optionally a number of characters to read from that start index with the default being to read until the end of the string. The start parameter may be a negative number, for…]]></description>
										<content:encoded><![CDATA[
<p>What’s the deal with these string methods, and how are they different?</p>



<span id="more-305"></span>



<h2 class="wp-block-heading" id="string-substr">String substr()</h2>



<pre class="wp-block-preformatted">str.substr(start[, length])</pre>



<p>This method takes a start index, and optionally a number of characters to read <em>from</em> that start index with the default being to read until the end of the string.</p>


<pre class="wp-block-code"><span><code class="hljs language-javascript"><span class="hljs-string">'foobar'</span>.substr(<span class="hljs-number">2</span>, <span class="hljs-number">3</span>); <span class="hljs-comment">// "oba"</span></code></span></pre>


<p>The <code>start</code> parameter may be a negative number, in which case it counts back from the end of the string.</p>



<p>Note that only the first parameter of <code>substr()</code> supports negative numbers. This is in contrast to most methods you may be familiar with that support negative offsets, such as <code>String#slice()</code> or <code>Array#slice()</code>. The second parameter may not be negative. In fact, it isn’t an end index at all. Instead, it is the (maximum) number of characters to return.</p>
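<p>As a small sketch of how the two parameters interact (assuming a standards-compliant engine; <code>lastChars</code> is a made-up helper for illustration):</p>

```javascript
// Hypothetical helper: the last `count` characters of `text`.
function lastChars(text, count) {
  // A negative start counts back from the end of the string.
  // The second argument is a length, not an end index.
  return text.substr(-count, count);
}

const tail = lastChars('foobar', 3); // "bar"
const middle = 'foobar'.substr(-4, 2); // "ob": start at index 2, read 2 characters
const negativeLength = 'foobar'.substr(2, -1); // "": a negative length yields nothing
```

<p>The last line is the one to watch: a negative second argument is not an offset from the end, it simply produces an empty string.</p>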



<p>But, in Internet Explorer 8 (and earlier IE versions), the <code>substr()</code> method deviates from the ECMAScript spec. Its <code>start</code> parameter doesn’t support negative numbers. Instead, these are <strong>silently ignored</strong> and treated as zero. (I noticed this in 2014, shortly before we gracefully disabled JavaScript for IE&nbsp;8 on Wikipedia.)</p>



<p>IE&nbsp;8:</p>


<pre class="wp-block-code"><span><code class="hljs language-javascript"><span class="hljs-string">'faux'</span>.substr( <span class="hljs-number">-1</span> ); <span class="hljs-comment">// "faux"</span></code></span></pre>


<p>Standard behaviour:</p>


<pre class="wp-block-code"><span><code class="hljs language-javascript"><span class="hljs-string">'faux'</span>.substr( <span class="hljs-number">-1</span> ); <span class="hljs-comment">// "x"</span></code></span></pre>


<p>And, the name and signature of <code>substr()</code> are deceptively similar to those of the <code>substring()</code> method.</p>



<h2 class="wp-block-heading" id="string-substring">String substring()</h2>



<pre class="wp-block-preformatted">str.substring(start[, end])</pre>



<p>This method takes a start index, and optionally an end index. At a glance, a very simple and low-level method. No relative lengths, negative offsets, or any other trickery. Right?</p>



<p>Behold! The two parameters <strong>automatically swap</strong> if <code>start</code> is larger than <code>end</code>.<sup id="fnr1" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2020/substr-substring-slice/#fn1" title="Jump to footnote 1">[1]</a></sup></p>


<pre class="wp-block-code"><span><code class="hljs language-javascript"><span class="hljs-string">'foobar'</span>.substring(<span class="hljs-number">1</span>, <span class="hljs-number">4</span>); <span class="hljs-comment">// "oob"</span>
<span class="hljs-string">'foobar'</span>.substring(<span class="hljs-number">4</span>, <span class="hljs-number">1</span>); <span class="hljs-comment">// "oob", also!</span></code></span></pre>


<p>Unexpected values such as <code>null</code>, <code>undefined</code>, or <code>NaN</code> are silently treated as zero. For <code>substring()</code> this also applies to negative numbers.</p>
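<p>A few illustrative cases, as a sketch (the variable names are mine; the results follow from the clamping and swapping described above):</p>

```javascript
const word = 'foobar';

const negativeStart = word.substring(-2); // "foobar": -2 is treated as 0
const nanStart = word.substring(NaN, 3); // "foo": NaN is treated as 0
const nullEnd = word.substring(2, null); // "fo": null becomes 0, then the arguments swap

// For contrast, slice() reads -2 as "two from the end".
const sliceNegative = word.slice(-2); // "ar"
```

<p>None of these throw or warn, which is exactly what makes a mistaken argument hard to spot.</p>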



<p>And, of course, the name and signature of <code>substring()</code> are deceptively similar to <code>substr()</code>.</p>



<h2 class="wp-block-heading" id="string-slice">String slice()</h2>



<pre class="wp-block-preformatted">str.slice(start[, end])</pre>



<p>This method takes a start index, and optionally an end index that defaults to the end of the string. Either parameter may be a negative number, which is interpreted as a relative offset from the end of the string.</p>



<p>I found no defects in browsers or JavaScript engines implementing this method. And it has been around since the <a target="_blank" rel="noreferrer noopener" href="https://developer.mozilla.org/en-US/docs/Archive/Web/JavaScript/New_in_JavaScript/1.2">beginning of time</a>.</p>



<p>Its only weakness is also its greatest strength — full support for negative numbers.</p>



<p>One might think this can be ignored for cases where you only intend to work with positive numbers. You’d be right, until you write code like the following:</p>


<pre class="wp-block-code"><span><code class="hljs language-javascript">start = str.indexOf(needle); <span class="hljs-comment">// returns -1 if needle not found.</span>
remainder = str.slice(start); <span class="hljs-comment">// oops, -1 means something else here!</span>
</code></span></pre>
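<p>One way to guard against this is to handle the “not found” case before slicing. A minimal sketch, where <code>remainderFrom</code> is a hypothetical helper and returning an empty string is just one possible choice:</p>

```javascript
// Hypothetical helper: everything in `text` from the first `needle` onward,
// or an empty string when `needle` does not occur.
function remainderFrom(text, needle) {
  const start = text.indexOf(needle);
  if (start === -1) {
    // Without this check, slice(-1) would return the last character.
    return '';
  }
  return text.slice(start);
}

const found = remainderFrom('foobar', 'bar'); // "bar"
const missing = remainderFrom('foobar', 'x'); // ""
const unguarded = 'foobar'.slice('foobar'.indexOf('x')); // "r"
```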


<p>The notion of negative offsets was confusing to me when I first learned it. But, over the years, I’ve come to appreciate it and it actually became second nature to think about offsets in this way. If you’re unfamiliar, see the examples below.</p>



<h2 class="wp-block-heading" id="conclusion">Conclusion</h2>



<p>Let’s compare these methods once more:</p>


<pre class="wp-block-code"><span><code class="hljs language-javascript">str = <span class="hljs-string">'foobarb…z'</span>;

<span class="hljs-comment">// Strip start "foo" &gt; "barb…z"</span>
str.slice(<span class="hljs-number">3</span>);
str.substring(<span class="hljs-number">3</span>);
str.substr(<span class="hljs-number">3</span>);

<span class="hljs-comment">// Strip end "z" &gt; "foobarb…"</span>
str.slice(<span class="hljs-number">0</span>, <span class="hljs-number">-1</span>);
str.substring(<span class="hljs-number">0</span>, str.length - <span class="hljs-number">1</span>);
str.substr(<span class="hljs-number">0</span>, str.length - <span class="hljs-number">1</span>);

<span class="hljs-comment">// Strip "foo" and "z" &gt; "barb…"</span>
str.slice(<span class="hljs-number">3</span>, <span class="hljs-number">-1</span>);
str.substring(<span class="hljs-number">3</span>, str.length - <span class="hljs-number">1</span>);
str.substr(<span class="hljs-number">3</span>, str.length - <span class="hljs-number">3</span> - <span class="hljs-number">1</span>); <span class="hljs-comment">// 👀</span>

<span class="hljs-comment">// Extract start &gt; "foo"</span>
str.slice(<span class="hljs-number">0</span>, <span class="hljs-number">3</span>);
str.substring(<span class="hljs-number">0</span>, <span class="hljs-number">3</span>);
str.substr(<span class="hljs-number">0</span>, <span class="hljs-number">3</span>);

<span class="hljs-comment">// Extract end &gt; "z"</span>
str.slice(<span class="hljs-number">-1</span>);
str.substring(str.length - <span class="hljs-number">1</span>);
str.substr(str.length - <span class="hljs-number">1</span>); <span class="hljs-comment">// Compat</span>
str.substr(<span class="hljs-number">-1</span>); <span class="hljs-comment">// Modern</span>

<span class="hljs-comment">// Extract 4 chars at &#091;3] &gt; "barb"</span>
str.slice(<span class="hljs-number">3</span>, <span class="hljs-number">3</span> + <span class="hljs-number">4</span>);
str.substring(<span class="hljs-number">3</span>, <span class="hljs-number">3</span> + <span class="hljs-number">4</span>);
str.substr(<span class="hljs-number">3</span>, <span class="hljs-number">4</span>); <span class="hljs-comment">// 👀</span>
</code></span></pre>


<p>None of these seem unreasonable, in isolation. It’s nice that <code>slice()</code> allows negative offsets. It’s nice that <code>substring()</code> may limit the damage of accidentally negative offsets. It’s nice that <code>substr()</code> allows extracting a specific number of characters without needing to add to the start index.</p>



<p>But having all three? That can incur a very real cost on development in the form of doubt, confusion, and — inevitably — mistakes. I don’t think any of these is worth that cost over some minute localised benefit.</p>



<p>I find that <code>substr()</code> and <code>substring()</code> cast doubt on surrounding code. I need to second-guess the author’s intentions when reviewing or debugging such code, which is wasteful even, or especially, when they (or I) use them correctly.</p>



<p><em>But what about unit tests?</em> Well, there’s sufficient overlap between the three that a couple of good tests may very well pass. It’s easy to forget to exercise every possible value for a parameter, especially one that is passed through to a built-in. We usually don’t question whether the built-in method works. The question is – did we use the right method?</p>
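<p>To make that overlap concrete, here is a sketch with two made-up implementations of “the first n characters”. Both function names are hypothetical:</p>

```javascript
// Two hypothetical implementations of "the first n characters".
function prefixWithSlice(text, n) {
  return text.slice(0, n);
}
function prefixWithSubstring(text, n) {
  return text.substring(0, n);
}

// A typical test input cannot tell them apart.
const agree = prefixWithSlice('foobar', 3) === prefixWithSubstring('foobar', 3); // true

// A negative n, perhaps from an unchecked calculation, can.
const viaSlice = prefixWithSlice('foobar', -1); // "fooba"
const viaSubstring = prefixWithSubstring('foobar', -1); // ""
```

<p>A test suite that only covers the first case would pass with either method, and say nothing about which behaviour the author intended for the second.</p>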



<p>The signature of <code>slice()</code> is ubiquitous and well understood. It is a de facto standard in technology, seen in virtually all programming languages. It is applied to strings, arrays, and sequences of all sorts. As such, that’s the one I tend to prefer.</p>



<p>But more important than which one you choose, I think, is the act of choosing itself. Eliminating the other methods from your work environment reduces cognitive overhead in development, with one less worry whilst reading code, and one less decision when writing it.<sup id="fnr2" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2020/substr-substring-slice/#fn2" title="Jump to footnote 2">[2]</a></sup></p>

<hr><div class="footnotes" role="doc-endnotes">Footnotes:<ol><li id="fn1" role="doc-endnote">This “argument swapping” behaviour in <code>substring()</code> has existed since the original JavaScript 1.0 as implemented in Netscape 2 (1996), and reverse-engineered by Microsoft in IE 3. The behaviour was <a href="https://web.archive.org/web/19971015223714/http://developer.netscape.com/library/documentation/communicator/jsguide/js1_2.htm" target="_blank" rel="noreferrer noopener">briefly removed</a> by Netscape 4 with <a href="https://developer.mozilla.org/en-US/docs/Archive/Web/JavaScript/New_in_JavaScript/1.2#Changed_functionality_in_JavaScript_1.2" target="_blank" rel="noreferrer noopener">JavaScript 1.2</a> in June 1997, but that same month the misfeature finished its <a href="https://www.ecma-international.org/archive/ecmascript/1996/index.html" target="_blank" rel="noreferrer noopener">fast-tracked standardisation</a> as part of <a href="https://ecma-international.org/publications-and-standards/standards/ecma-262/" target="_blank" rel="noreferrer noopener">ECMAScript 1</a>. Thus, the misfeature returned in 1998 with the release of Netscape 4.5 and JavaScript 1.3, which aligned itself with the new specification. <a href="#fnr1" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn2" role="doc-endnote">In 2014, I wrote <a rel="noreferrer noopener" href="https://gerrit.wikimedia.org/r/c/mediawiki/core/+/158108" target="_blank">a lengthy code review</a> about the string methods which, after much delay, I used as the basis for this article. <a href="#fnr2" role="doc-backlink" title="Jump back">↩︎</a></li></ol></div><hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2020/substr-substring-slice/">timotijhof.net</a>. 
<a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Should%20I%20substr%28%29%2C%20substring%28%29%2C%20or%20slice%28%29%3F&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2020%2Fsubstr-substring-slice%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Many dots, do not a query make</title>
		<link>https://timotijhof.net/posts/2020/many-dots-do-not/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Thu, 09 Jan 2020 12:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[Engineering stories]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2020/many-dots-do-not</guid>

					<description><![CDATA[How a long sequence of dots allowed a regex to reach its internal stack limit. Premise Wikipedia’s production error logs were reporting an increase in app crashes from the search results page. The internal Logstash error report looked as follows: [RuntimeException] Cannot consume query at offset 0 (need to go to 7296) at mediawiki/…/CirrusSearch: QueryStringRegexParser->nextToken…]]></description>
										<content:encoded><![CDATA[
<p>How a long sequence of dots allowed a regex to reach its internal stack limit.</p>



<span id="more-224"></span>



<h2 class="wp-block-heading" id="premise">Premise</h2>



<p>Wikipedia’s production error logs were reporting an increase in app crashes from the search results page. The internal Logstash error report looked as follows:</p>



<pre class="wp-block-preformatted">[RuntimeException]
Cannot consume query at offset 0 (need to go to 7296)

at mediawiki/…/CirrusSearch: QueryStringRegexParser->nextToken
at mediawiki/…/CirrusSearch: QueryStringRegexParser->parse
at mediawiki/…/CirrusSearch: SearchQueryBuilder::newFTSearchQueryBuilder</pre>



<p>What caused this?</p>



<h2 class="wp-block-heading" id="background">Background</h2>



<p>Wikipedia’s search experience is provided by the <a rel="noreferrer noopener" href="https://www.mediawiki.org/wiki/Extension:CirrusSearch" target="_blank">CirrusSearch</a> extension for MediaWiki. It is internally backed by an Elasticsearch cluster.</p>



<p>There are a number of custom operators supported in the search field, such as wildcards, excluded words, and things like <code>incategory:</code> and <code>intitle:</code>. These are parsed by the plugin’s middleware and turned into a structured query sent to the Elastic API.</p>



<p>While each error report had a different URL and search query, I noticed most of them had something in common: the search query consisted mostly of dots. For example:</p>



<pre class="wp-block-preformatted">https://de.wikipedia.org/w/?search=.................. (3000 dots)</pre>



<p>Such an odd query might not need to yield a useful response, but it is important that it not crash the application. Doing so leaves the user stranded with an unhelpful “Internal server error” page. It can also interfere with on-going deployments as raised error levels usually indicate that a recent software update caused a problem.</p>



<h2 class="wp-block-heading" id="investigation">Investigation</h2>



<p><a rel="noreferrer noopener" href="https://phabricator.wikimedia.org/p/dcausse/" target="_blank">David Causse</a> (Search Platform team) led the investigation (<a rel="noreferrer noopener" href="https://phabricator.wikimedia.org/T236419" target="_blank">task T236419</a>).</p>



<p>The RuntimeException comes from a safeguard in the parser for incoming search queries. The guard sits toward the end of the parsing code, and should never be reached. Reaching it indicates that something went wrong earlier. The problem was narrowed down to a failure executing the following regex:</p>


<pre class="wp-block-preformatted">/\G(?&lt;negated&gt;&#91;-!](?=&#91;\w]))?(?&lt;word&gt;(?:\\\\.|&#91;!-](?!")|&#91;^"!\pZ\pC-])+)/</pre>


<p>This regex looks complex, but it can actually be simplified to:</p>



<pre class="wp-block-preformatted">/(?:ab|c)+/</pre>



<p>This regex still triggers the problematic behaviour in PHP. It fails with a <code>PREG_JIT_STACKLIMIT_ERROR</code> when given a long string. Below is a reduced test case:</p>


<pre class="wp-block-code"><span><code class="hljs language-php">$ret = preg_match(<span class="hljs-string">'/(?:ab|c)+/'</span>, str_repeat(<span class="hljs-string">'c'</span>, <span class="hljs-number">8192</span>));
<span class="hljs-keyword">if</span> ($ret === <span class="hljs-keyword">false</span>) {
    <span class="hljs-keyword">print</span>(<span class="hljs-string">"failed with: "</span> . preg_last_error());
}</code></span></pre>


<ul class="wp-block-list">
<li>Fails when given 1365 contiguous <code>c</code> on PHP 7.0.</li>



<li>Fails with 2731 characters on PHP 7.2, PHP 7.1, and PHP 7.0.13.</li>



<li>Fails with 8192 characters on PHP 7.3. (Might be due to <a target="_blank" rel="noreferrer noopener" href="https://github.com/php/php-src/commit/bb2f1a683003559ada1c70166557bd7ac2845a11">php-src@bb2f1a6</a>).</li>
</ul>



<p>In the end, the fix we applied was to split the regex into two separate ones, remove the quantified non-capturing group, and loop at the PHP level instead (<a rel="noreferrer noopener" href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/546209" target="_blank">change 546209</a>).</p>



<p>The lesson learned here is that the code did not properly check the return value of <code>preg_match</code>. This is all the more important because the size allowed for the JIT stack changes between PHP versions.</p>



<p>For future reference, David concluded: The regex could be optimized to support more chars (~3 times more) by using atomic groups, like so <code>/(?&gt;ab|c)+/</code>.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2020/many-dots-do-not/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Many%20dots%2C%20do%20not%20a%20query%20make&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2020%2Fmany-dots-do-not%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>To throw or not to throw, that is the question</title>
		<link>https://timotijhof.net/posts/2019/to-throw-or-not/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Sun, 08 Dec 2019 12:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[Engineering stories]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2019/to-throw-or-not</guid>

					<description><![CDATA[Why does software accept invalid data? And, at what software layer should we reject it? Also, what are “namespaces” and “special pages” on Wikipedia? Premise One day, our server monitoring was reporting a high frequency of fatal errors from web servers. Over 10,000 an hour. The majority shared a single root cause – The program…]]></description>
										<content:encoded><![CDATA[
<p>Why does software accept invalid data? And, at what software layer should we reject it? Also, what are “namespaces” and “special pages” on Wikipedia?</p>



<span id="more-211"></span>



<h2 class="wp-block-heading" id="premise">Premise</h2>



<p>One day, our server monitoring was reporting a high frequency of fatal errors from web servers. Over 10,000 an hour. The majority shared a single root cause: the program attempted to find the discussion space for a page that didn’t support discussions.</p>



<p>Why was the program trying to do this? And how should the software behave when asked to do something it cannot?</p>



<h2 class="wp-block-heading" id="background">Background</h2>



<h3 class="wp-block-heading" id="namespaces-and-special-pages">Namespaces and Special pages</h3>



<p>The MediaWiki software that powers Wikipedia has a concept of titles and namespaces. Each article (or “wiki page”) has a title. And each title can belong to one of several namespaces.</p>



<p>The pages that contain the encyclopaedic content you’re familiar with exist under the Article namespace. These are accessed via URLs such as <code>/wiki/Some_subject</code>.</p>



<p>Each Article also has an associated wiki page under the so-called “Talk” namespace. For example, <code>Talk:Some_subject</code>. This is a place where conversations about the article take place. (Questions, concerns, suggestions, and other discussion threads.)</p>



<p>Beyond this, there are many more namespaces. “File” pages represent an uploaded multimedia file, “User” pages represent individual contributors and their profile pages, and so on. Each of these namespaces has an associated talk namespace as well (“File talk”, “User talk”, etc.).</p>



<p>Lastly, there is the “Special” namespace of pages. These do not represent things that can be created or edited by contributors. Instead, this space is reserved for software features. For example, the account sign up page is a “special” page (at <code>Special:Create_account</code>). These do not have a discussion space. That is, there is no “Special talk” namespace.</p>



<h3 class="wp-block-heading" id="specialcontributions">Special:Contributions</h3>



<p>The special page we’ll take a closer look at today is “User contributions” (at <code>Special:Contributions</code>). This is where you can see the contribution history of a specific editor. Besides the mandatory username field, there are date filters, and namespace filters. The namespace filter also allows one to search through any associated namespaces.</p>



<p>Because the “Special” namespace does not contain wiki pages, and thus no contributions, it is not listed in this dropdown menu.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1504" height="764" src="https://timotijhof.net/wp-content/uploads/2019_stories3_form.png" alt="The Special:Contributions form contains a &quot;Namespace&quot; dropdown menu with options such as &quot;Article&quot;, &quot;Talk&quot;, &quot;User&quot;, and &quot;File&quot;. It also has a checkbox for &quot;Include associated namespace&quot;." class="wp-image-212"/></figure>



<h2 class="wp-block-heading" id="the-problem">The Problem</h2>



<p>Some users browsed URLs to Special:Contributions with the namespace ID of “Special” selected. While this wasn’t an option in the user interface, the request handler did not reject it. After all, it <em>is</em> a valid namespace. Just one that contains no user contributions.</p>



<p>By itself, such a query would actually succeed, insofar as it simply yields no results. It works as well as could be expected.</p>



<p>Where it went wrong was when one also ticked the “Include associated namespace” checkbox.</p>



<p>This forced the software to filter the query to one of two possible namespace IDs. The ID of the “Special” namespace, and the ID of its associated namespace. Except, there is no associated namespace for Special! The code in charge of associating namespaces had no choice but to abort. The question it was asked demanded a specific answer, but it could not give any.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2019_stories3_error.png" alt="Users were shown an &quot;Internal error&quot; page, stating a fatal exception had occurred, with an Error Code next to it." class="wp-image-214" width="635" height="230"/></figure>



<p>The error report reads as follows (<a rel="noreferrer noopener" href="https://phabricator.wikimedia.org/T150324" target="_blank">task&nbsp;T150324</a>):</p>



<pre class="wp-block-preformatted">Exception: getAssociated is not valid for the Special namespace.

at Namespace.php: Namespace::isMethodValidFor()
at pagers/ContribsPager.php: Namespace::getAssociated()
at pagers/ContribsPager.php: ContribsPager-&gt;getNamespaceCond()
…
at MediaWiki.php: SpecialContributions-&gt;execute()
at index.php: MediaWiki-&gt;run()
</pre>



<h2 class="wp-block-heading" id="the-investigation">The Investigation</h2>



<h3 class="wp-block-heading" id="accepting-invalid-data">Accepting invalid data</h3>



<p>Do we need to change anything, or is the program already good enough?</p>



<p>There are no contributions under the Special namespace. And, there is also no talk space for discussions about these non-existent contributions. The desired outcome isn’t for there to be results, as there can’t be any.</p>



<p>But, we also can’t prevent our editors (or their apps) from asking for results. Perhaps an older app version did list “Special” as an option, or another system mistakenly opens the form the wrong way. Or, someone may be intentionally manipulating the system via its URL. It can happen. And when it does, the server has to respond in some way.</p>



<p>So far, the server was responding by crashing… If that happens a lot, alarm bells will ring about a potential outage being underway. When we crash without explanation, end-users (or developers working on an app) can’t tell what’s wrong. Were our servers malfunctioning? Or did the user do something wrong?</p>



<h3 class="wp-block-heading" id="rejecting-invalid-data">Rejecting invalid data</h3>



<p>I sometimes think about software as an onion. At its outer layer, anything can happen. We don’t control what end-users and external systems try to do. If we encounter invalid input, we generally prefer to respond clearly. For example, by explaining the nature of the problem so that users may correct it, and carry on.</p>



<p>At this outer layer, bad input is not unexpected and should not cause our software to crash. And, to avoid false alarms in the backend, we need to distinguish end-user mistakes from real bugs in our code. Ideally crashes only happen if there is a bug in the program. It may be worth measuring in the backend when an end-user mistake happens. (For example, it might help you understand that the user-interface is confusing to users.) But, such instrumentation should stand separate from the technical question of whether the system is in full working order.</p>



<h3 class="wp-block-heading" id="who-is-in-charge-and-who-is-responsible">Who is in charge, and who is responsible?</h3>



<p>Once past the outer layer, there are many more layers to our “onion”. Each layer gets closer to core business logic.</p>



<p>A question like “What are recent contributions by user X?” is subdivided into many small instructions and questions (or “functions”). One such function answers the question “<em>What is the talk namespace for a given title?</em>”. It would answer “Talk” for “Article”, and “File_talk” for “File”.</p>



<p>The “Include associated namespace” option on Special:Contributions uses that function.</p>



<p>If one of the contributions is for a page that has no discussion namespace, what should we do? Show no results at all? Skip that one edit and tell the user “1 edit was hidden”? Or show it anyway, but without the “talk” portion? This is a decision the inner layer cannot make. It only knows the small question being asked. It should not be aware of what the outer layer wants to do (sometimes known as “global state”). The outer layer has to decide how to handle this problem. If the outer layer believes this kind of edit should never show up under normal conditions, then it could show an error message. Something like “<em>Error: Unsupported namespace selection.</em>”</p>



<p>Alternatively, the conundrum can be avoided by structuring the program differently. The outer layer could ask a different question instead. A question that cannot fail. A question that leaves room for unexpected outcomes. Such as “<em>Does namespace X have a talk space?</em>”, instead of “<em>I need the talk space of X, what is it?</em>”. The outer layer then recognises that the question can be answered with “No”, and could then have logic for displaying those contributions in a different way.</p>
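


<p>The difference between the two questions can be sketched in a few lines. This is Python rather than MediaWiki’s PHP, and the namespace table as well as the names <code>get_associated</code>, <code>has_talk_space</code> and <code>namespaces_to_query</code> are hypothetical, made up for illustration. Falling back to the selected namespace alone is only one of the choices the outer layer could make.</p>



<pre class="wp-block-code"><span><code class="hljs language-python"># Hypothetical namespace table: maps a subject namespace to its talk namespace.
TALK_NAMESPACES = {
    "Article": "Talk",
    "User": "User_talk",
    "File": "File_talk",
}


def get_associated(namespace):
    """The demanding question: 'I need the talk space of X, what is it?'

    There is no honest answer for "Special", so all it can do is raise.
    """
    if namespace not in TALK_NAMESPACES:
        raise ValueError(f"get_associated is not valid for the {namespace} namespace.")
    return TALK_NAMESPACES[namespace]


def has_talk_space(namespace):
    """The question that cannot fail: 'Does namespace X have a talk space?'"""
    return namespace in TALK_NAMESPACES


def namespaces_to_query(namespace, include_associated):
    """Outer layer: decides what a missing talk space means for this feature."""
    selected = [namespace]
    if include_associated and has_talk_space(namespace):
        selected.append(get_associated(namespace))
    return selected</code></span></pre>



<p>With this shape, <code>namespaces_to_query("Special", True)</code> returns only <code>["Special"]</code>, so the query simply yields no results instead of an exception.</p>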
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2019/to-throw-or-not/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20To%20throw%20or%20not%20to%20throw%2C%20that%20is%20the%20question&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2019%2Fto-throw-or-not%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Tomorrow, may be sooner than you think</title>
		<link>https://timotijhof.net/posts/2019/tomorrow-may-be-sooner/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Sat, 07 Dec 2019 12:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[Engineering stories]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2019/tomorrow-may-be-sooner</guid>

					<description><![CDATA[These are short stories from bug hunts and incident investigations at Wikipedia. Impact After developers submit code to Gerrit, they eagerly await the result from Jenkins, an automated test runner. Every day during the 15 minute window before 5 PM in San Francisco, code changes submitted for code review would have mysteriously failing tests. Jenkins…]]></description>
										<content:encoded><![CDATA[
<p>These are short stories from bug hunts and incident investigations at Wikipedia.</p>



<span id="more-179"></span>



<h2 class="wp-block-heading" id="impact">Impact</h2>



<p>After developers submit code to Gerrit, they eagerly await the result from Jenkins, an automated test runner.</p>



<p>Every day during the 15 minute window before 5 PM in San Francisco, code changes submitted for code review would have mysteriously failing tests. Jenkins would wrongly inform developers that their proposed changes cause a problem with the MergeHistory feature of MediaWiki.</p>



<h2 class="wp-block-heading" id="background">Background</h2>



<p>The test in question assumed that it would finish by “<em>tomorrow</em>”. At first glance, it seems fair to assume that by tomorrow, a given test will have finished. We know our test suite generally only takes a few minutes to run (with a time limit of 30 minutes, to ensure tests report back even if they are stuck).</p>



<h2 class="wp-block-heading" id="investigation">Investigation</h2>



<p>Unfortunately, the <code>strtotime</code> utility function in PHP does not interpret “tomorrow” as “this time tomorrow”.</p>



<p>Rather, it takes it to mean “the start of tomorrow”. In other words, the next strike of midnight!</p>



<p>For example, on 14 August 23:59:59, <code>strtotime("tomorrow")</code> would evaluate to a timestamp merely one second into the future — 15 August 00:00:00.</p>



<p>This meant that whenever a test started running shortly before midnight, it would fail. The test server uses UTC as its timezone. As such, a test suite that started less than 15 minutes before 5 PM in San Francisco (which is midnight in UTC) would mysteriously fail!</p>
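


<p>The edge case can be reproduced outside PHP. The sketch below is Python, and <code>start_of_tomorrow</code> is a hypothetical stand-in that mimics how <code>strtotime("tomorrow")</code> behaves as described above; it is not PHP’s implementation. The year is arbitrary.</p>



<pre class="wp-block-code"><span><code class="hljs language-python">from datetime import datetime, timedelta


def start_of_tomorrow(now):
    """Mimic "tomorrow" as described above: the next strike of midnight."""
    next_day = now.date() + timedelta(days=1)
    return datetime.combine(next_day, datetime.min.time())


def this_time_tomorrow(now):
    """What the test assumed "tomorrow" meant: exactly 24 hours from now."""
    return now + timedelta(days=1)


# A test run that starts one second before midnight (server clock in UTC).
now = datetime(2018, 8, 14, 23, 59, 59)

print(start_of_tomorrow(now) - now)   # 0:00:01
print(this_time_tomorrow(now) - now)  # 1 day, 0:00:00</code></span></pre>



<p>Any deadline computed the first way leaves a test that starts at 23:59:59 only one second to finish, whereas the second way always leaves a full day.</p>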



<p>– <a rel="noopener" href="https://phabricator.wikimedia.org/T201976" target="_blank">Task T201976</a></p>



<p>– <a rel="noopener" href="https://gerrit.wikimedia.org/r/452873" target="_blank">Change 452873</a></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published in the <a href="https://phabricator.wikimedia.org/phame/post/view/119/production_excellence_3_september_2018/">September 2018 edition</a> of the <a href="https://phabricator.wikimedia.org/phame/blog/view/1/">Production Excellence</a> newsletter at Wikimedia. This article is an expanded version of that.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2019/tomorrow-may-be-sooner/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Tomorrow%2C%20may%20be%20sooner%20than%20you%20think&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2019%2Ftomorrow-may-be-sooner%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Missing partitions, disappearing audio players, and extreme packet loss</title>
		<link>https://timotijhof.net/posts/2019/wikipedia-stories-1/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Fri, 06 Dec 2019 12:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[Engineering stories]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2019/wikipedia-stories-1</guid>

					<description><![CDATA[These are short stories from bug hunts and incident investigations at Wikipedia. New database partition A user reported a timeout error for certain queries from the Public log viewer on commons.wikimedia.org. Database administrator Manuel Aróstegui investigated the underlying query and found that it was slow (and timing out) due to one of the database replicas…]]></description>
										<content:encoded><![CDATA[
<p>These are short stories from bug hunts and incident investigations at Wikipedia.</p>



<span id="more-161"></span>



<ul class="wp-block-list">
<li><a href="#new-database-partition">New database partition</a></li>



<li><a href="#mystery-of-disappearing-audio-players">Mystery of Disappearing Audio Players</a></li>



<li><a href="#losing-packets-on-the-way-to-logstash">Losing packets on the way to Logstash</a></li>
</ul>



<h2 class="wp-block-heading" id="new-database-partition">New database partition</h2>



<p>A user reported a timeout error for certain queries from the <a target="_blank" href="https://en.wikipedia.org/wiki/Special:Log" rel="noopener">Public log viewer</a> on commons.wikimedia.org.</p>



<p>Database administrator <a target="_blank" href="https://phabricator.wikimedia.org/p/Marostegui/" rel="noopener">Manuel Aróstegui</a> investigated the underlying query and found that it was slow (and timing out) due to one of the database replicas having an unpartitioned <code>logging</code> table.</p>



<h3 class="wp-block-heading" id="background">Background</h3>



<p>Our database servers carry labels that the MediaWiki application can ask for, alongside a SQL query. This allows replicas to be finely tuned to specific workloads, in particular when two optimisation strategies are mutually exclusive. The labelling system allows both strategies to be applied, on different database servers. MediaWiki then decides which one is most important for that query.</p>



<p>Partitioning the <a target="_blank" href="https://www.mediawiki.org/wiki/Manual:Logging_table" rel="noopener">MediaWiki <code>logging</code> table</a> is one such optimisation strategy. For queries in the Public logs that focus on actions by a specific user, we route the query to replicas where the <code>logging</code> table is partitioned by user ID. This is in addition to a regular index on the user ID column for that table, which we have on all replicas.</p>



<h3 class="wp-block-heading" id="action">Action</h3>



<p>As a first response, the faulty server was taken out of rotation. Re-partitioning was completed later that day.</p>



<p>– <a rel="noopener" href="https://phabricator.wikimedia.org/T199790" target="_blank">Task T199790</a></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="mystery-of-disappearing-audio-players">Mystery of Disappearing Audio Players</h2>



<p>Routine triaging of PHP errors led to discovery of the following:</p>


<pre class="wp-block-code"><span><code class="hljs language-plaintext">&#91;PHP Notice] Undefined index: 'c9ndx98du2.ogg'
at mediawiki/extensions/Score/includes/Score.php:L507</code></span></pre>


<h3 class="wp-block-heading" id="background-1">Background</h3>



<p>The <a target="_blank" href="https://www.mediawiki.org/wiki/Extension:Score" rel="noopener">Score extension</a> for MediaWiki provides a way to produce image and audio files from music notation (backed by LilyPond). The extension registers a wikitext tag that allows editors to create and embed music on Wikipedia pages.</p>



<p>The “Undefined index” warning from PHP happens when code tries to access a non-existent key from an associative array. For example: <code>$x = array( 'foo' => 1 ); return $x['bar'];</code>. When this happens, the PHP engine implicitly returns the <code>null</code> value. PHP also emits a notice to the error log channel. We feed that into Logstash and Kibana.</p>



<p>“PHP Notice” errors are not uncommon and can sometimes even cause (by accident) the correct behaviour. For example, if the code involves a condition like <code>if ($x['bar']) { … } else { … }</code>. Our error will produce the <code>null</code> value, which casts to <code>false</code>, and we proceed to the <code>else</code> branch. If the <code>bar</code> key is meant to be optional here, and if the <code>else</code> branch correctly handles the scenario for when it is not set, then this code might already behave correctly. A simple fix would then be to expand the condition to first assert that the key exists. Thus preventing the warning message, but otherwise behaving the same.</p>



<h3 class="wp-block-heading" id="action-1">Action</h3>



<p>Back to our investigation. The response was led by volunteer <a target="_blank" href="https://phabricator.wikimedia.org/p/Ebe123/" rel="noopener">@Ebe123</a>, who is also the lead maintainer of the Score extension.</p>



<p>First, we did some exploratory testing to see if there were any defects we could find with the feature. On the various Wikipedia articles we tested it on, the audio player seemed to work fine.</p>



<p>Back to the error we found on the backend, we traced it to the code responsible for adding the “duration” metadata (used by the audio player). The code for computing this duration stores it in an array, and later reads it. However, these two functions were not using the same logic to create their array key. As such, it was unable to find the duration, and did not add it to the audio player. While this is bad, it appeared to not affect the audio player. It worked and even displayed the correct duration in the interface!</p>



<p>Ebe123 wrote a patch that corrects the key string logic. The duration value is then correctly found in the array and passed on in the way the code originally intended.</p>



<p>During code review, we also looked at why this code existed in the first place (because the player appeared to work fine without it). The code was introduced several years ago in an attempt to fix a bug where the player loaded very slowly for some users. The story is that our multimedia framework needs the duration information before it can start playing back audio. And, for most file types, the framework is able to compute this on its own in the backend and hand it to the audio player ahead of time. However, the handler did not support computing durations for files with the <code>audio/ogg</code> MIME-type (which the Score extension uses).</p>



<p>When no duration is given ahead of time, web browsers have a fallback strategy. They attempt to download the track regardless, wait for it to fully arrive, then look at how many seconds it contains audio for, and use that as the duration value. This means the audio would not start playing until <em>after</em> it was fully downloaded. No streaming!</p>



<p>In our isolated testing we were playing relatively short audio clips using a high-bandwidth connection. Thus, the issue was not obvious to us.</p>



<p>We also found a separate bug report from a few months earlier where several users reported that when pressing “Play” the player would disappear for 5-20 seconds before audio starts playing.</p>



<p>It all makes sense now.</p>



<p>– <a rel="noopener" href="https://phabricator.wikimedia.org/T200835" target="_blank">Task T200835</a>, <a rel="noopener" href="https://phabricator.wikimedia.org/T192550" target="_blank">Task T192550</a></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="losing-packets-on-the-way-to-logstash">Losing packets on the way to Logstash</h2>



<p>I noticed that for recent bug reports with Error IDs, I was unable to find the associated error report in Logstash. I could also reproduce this for bugs I had reported myself.</p>



<h3 class="wp-block-heading" id="background-2">Background</h3>



<p>In the event of an internal server error, MediaWiki sends a detailed error report to Logstash. MediaWiki then displays an error page to the user, where it mentions the “Error ID”.</p>



<h3 class="wp-block-heading" id="action-2">Action</h3>



<p><a rel="noopener" href="https://tstarling.com/blog/" target="_blank">Tim&nbsp;Starling</a> (Platform Architect at Wikimedia) started investigating. He created a new Grafana dashboard and the culprit was quickly identified. Over 3000 UDP packets were being dropped at the Logstash servers, every second. That’s over 90% of its total packets – lost!</p>



<p>As a first mitigation, he rebooted the server, quadrupled the default receive buffer size (<code>net.core.rmem_default</code> in the Linux kernel) to 4MB, and rebooted it again.</p>



<div class="wp-block-group is-content-justification-space-between is-layout-flex wp-container-core-group-is-layout-e3bc7287 wp-block-group-is-layout-flex">
<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="939" height="493" src="https://timotijhof.net/wp-content/uploads/2018_augstories_1a_logstash_recv.png" alt="Rate of successful Logstash packet reception increased from 50 pps to 300 pps." class="wp-image-169" style="width:380px;height:200px"/></figure>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="931" height="490" src="https://timotijhof.net/wp-content/uploads/2018_augstories_1b_logstash_loss.png" alt="Rate of Logstash packet loss decreased from 1200 pps to 950 pps." class="wp-image-170" style="width:380px;height:200px"/></figure>
</div>



<p>The first reboot significantly improved throughput (from 10% success, to 25% success), but the receive buffer change didn’t have any positive effect and we were still dropping the remaining 75% of packets.</p>



<p>To recap, the buffer was now large enough to accommodate 3 seconds worth of messages, which should be enough margin for Logstash to process them. Short spikes aside, it’s unlikely that allowing more stalling would help, because new packets are constantly added to the buffer as well.</p>



<p><a href="https://phabricator.wikimedia.org/p/fgiunchedi/" target="_blank" rel="noopener">Filippo Giunchedi</a> (Site Reliability Engineering team) jumped in and noticed that the <code>workers.pipeline</code> setting was explicitly set to <code>1</code>, thus allowing Logstash to only use a single thread to process all the messages. This was configured several years earlier (<a href="https://github.com/wikimedia/puppet/commit/011aa76f0af62c3d5160c9f5e821108323cc3f16" target="_blank" rel="noopener">commit</a>) to work around a problem with the Logstash Multiline plugin. This plugin wasn’t thread-safe and would corrupt logs if activated across multiple threads.</p>



<p>Filippo determined we no longer needed this plugin, disabled it, and allowed the default <code>workers.pipeline</code> setting to take effect &#8211; which is to use the number of available CPU cores as the number of threads.</p>



<p>This, together with the 4MB receive buffer kernel setting, dropped the packet loss rate back to zero.</p>



<p>– <a rel="noopener" href="https://phabricator.wikimedia.org/T200960" target="_blank">Task T200960</a>, <a rel="noopener" href="https://grafana.wikimedia.org/d/000000561/logstash" target="_blank">Grafana&nbsp;dashboard: Logstash</a></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published in the <a href="https://lists.wikimedia.org/pipermail/wikitech-l/2018-August/090594.html">August 2018 edition</a> of the <a href="https://phabricator.wikimedia.org/phame/blog/view/1/">Production Excellence</a> newsletter at Wikimedia. This&nbsp;article is an expanded version of that.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2019/wikipedia-stories-1/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Missing%20partitions%2C%20disappearing%20audio%20players%2C%20and%20extreme%20packet%20loss&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2019%2Fwikipedia-stories-1%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Wikipedia&#8217;s JavaScript initialisation on a budget</title>
		<link>https://timotijhof.net/posts/2019/wikipedia-javascript-on-budget/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Wed, 18 Sep 2019 12:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2019/wikipedia-javascript-on-budget</guid>

					<description><![CDATA[This week saw the conclusion of a project that I’ve been shepherding on and off since September of last year. The goal was for the initialisation of our asynchronous JavaScript pipeline (at the time, 36 kilobytes in size) to fit within a budget of 28 KB. The above graph shows the transfer size over time.…]]></description>
										<content:encoded><![CDATA[
<p>This week saw the conclusion of a project that I’ve been shepherding on and off since September of last year. The goal was for the initialisation of our asynchronous JavaScript pipeline (at the time, 36 kilobytes in size) to fit within a budget of 28 KB.</p>



<span id="more-149"></span>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="1190" height="740" src="https://timotijhof.net/wp-content/uploads/2019_wikipedia_figure1_chart.png" alt="Chart showing a decline in Startup manifest size from 36.2 kilobytes in 2018 to just under 28 KB in September 2019" class="wp-image-150" style="width:595px;height:370px" title="From 36.2 KB to 27.2 KB"/></figure>



<p>The above graph shows the transfer size over time. Sizes are after compression (i.e. the net bandwidth cost as perceived from a&nbsp;browser).</p>



<p>In total, the year-long effort is saving 4.3 terabytes a day of data bandwidth for our users’ page views.</p>



<h2 class="wp-block-heading" id="how-we-did-it">How we did it</h2>



<p>The startup manifest is a difficult payload to optimise. The vast majority of its code isn’t functional logic that can be optimised by traditional means. Rather, it is almost entirely made of pure data. The data is auto-generated by ResourceLoader and represents the registry of module bundles. (<a href="https://www.mediawiki.org/wiki/ResourceLoader/Architecture">ResourceLoader</a> is the delivery system Wikipedia uses for its JavaScript, CSS, and interface text.)</p>



<p>This registry contains the metadata for all front-end features deployed on Wikipedia. It enumerates their name, currently deployed version, and their dependency relationships to other such bundles of loadable code.</p>



<p>I started by identifying code that was never used in practice (<a href="https://phabricator.wikimedia.org/T202154">task #202154</a>). This included picking up unfinished or forgotten software deprecations, and removing unused compatibility code for browsers that no longer passed our <a href="https://www.mediawiki.org/wiki/Compatibility#Browsers">Grade A</a> feature-test. I also wrote a <a href="https://wikitech.wikimedia.org/wiki/Performance/Guides/Frontend_performance_practices">document about Page load performance</a>. This document serves as reference material, enabling developers to understand the impact of various types of changes on one or more stages of the page load process.</p>



<h2 class="wp-block-heading" id="fewer-modules">Fewer modules</h2>



<p>Next was collaborating with the engineering teams here at Wikimedia Foundation and at Wikimedia Deutschland, to identify features that were using more modules than is necessary. For example, by bundling together parts of the same feature that are generally always downloaded together. Thus leading to fewer entry points to have metadata for in the ResourceLoader registry.</p>



<p>Some highlights:</p>



<ul class="wp-block-list">
<li>Editing product team (WMF):<br>The WikiEditor extension has 11 fewer modules now. Another 31 modules were removed in UploadWizard.</li>



<li>Language product team (WMF):<br>Combined 24 modules of the ContentTranslation software.</li>



<li>Reading product team (WMF):<br>Combined 25 modules in MobileFrontend.</li>



<li>Community Wishlist team (WMDE):<br>Removed 20 modules from the RevisionSlider and TwoColConflict features.</li>
</ul>



<p>Last but not least, there was the Wikidata client for Wikipedia. This was an epic journey of its own (<a href="https://phabricator.wikimedia.org/T203696">task&nbsp;#203696</a>). This feature originally had a whopping 248 distinct modules registered on Wikipedia page views. The magnificent efforts of Amir&nbsp;Sarabadani <strong>removed over 200 modules</strong>, bringing it down to 42 today.</p>



<p>The bar chart above shows small improvements throughout the year, all moving us closer to the goal. Two major drops stand out in particular. One is around two-thirds of the way, in the first week of August. This is when the aforementioned Wikidata improvement was deployed. The second drop is toward the end of the chart and happened this week – more about that below.</p>



<h2 class="wp-block-heading" id="less-metadata">Less metadata</h2>



<p>This week’s improvement was achieved by two holistic changes that organised the data in a smarter way overall.</p>



<p>First –&nbsp;The <a href="https://www.mediawiki.org/wiki/Extension:EventLogging">EventLogging</a> extension previously shipped its schema metadata as part of the startup manifest. Roan Kattouw (<a href="https://phabricator.wikimedia.org/p/Catrope/">@Catrope</a>) refactored this mechanism to instead bundle the schema metadata together with the JavaScript code of the EventLogging client. This means the startup footprint of EventLogging was reduced by over 90%. That’s 2KB less metadata in the critical path! It also means that going forward, the startup cost for EventLogging no longer grows with each new event instrumentation. This clever bundling is powered by ResourceLoader’s new <a href="https://www.mediawiki.org/wiki/ResourceLoader/Package_modules">Package files</a> feature. This feature was expedited in February 2019 in part because of its potential to reduce the number of modules in our registry. Package Files make it super easy to combine generated data with JavaScript code in a single module bundle.</p>



<p>Second – We shrunk the average size for each entry in the registry overall (<a href="https://phabricator.wikimedia.org/T229245">task #229245</a>). The startup manifest contains two pieces of data for each module: Its name, and its version ID. This version ID previously required 7 bytes of data. After thinking through the mathematical <a href="https://en.wikipedia.org/wiki/Birthday_problem">Birthday problem</a> in the context of ResourceLoader, we decided that the probability spectrum for our version IDs can be safely reduced from 78 billion down to “only” 60 million. For more details see <a href="https://github.com/wikimedia/mediawiki/commit/9f516f1d3b6ab6a4f1bb7e385c93e4d9bccb46d7#diff-57e85f8b8063990fa5b0e2d2f0d25f8e">the code comments</a>, but in summary it means we’re saving 2 bytes for each of the 1100 modules still in the registry. Thus reducing the payload by another 2-3 KB.</p>
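


<p>The two figures line up with IDs drawn from a 36-character alphabet (digits plus lowercase letters), going from 7 characters to 5. That alphabet is an assumption that happens to match the numbers, not something stated above, and the collision helper is the generic birthday-problem approximation rather than ResourceLoader’s own reasoning. A rough Python sketch:</p>



<pre class="wp-block-code"><span><code class="hljs language-python">import math

ALPHABET_SIZE = 36  # assumption: digits 0-9 plus letters a-z


def id_space(length):
    """Number of distinct IDs of the given length."""
    return ALPHABET_SIZE ** length


def collision_probability(samples, space):
    """Birthday-problem approximation: chance that at least two of
    `samples` uniformly random IDs out of `space` are identical."""
    return 1 - math.exp(-samples * (samples - 1) / (2 * space))


print(id_space(7))  # 78364164096
print(id_space(5))  # 60466176

# Bytes saved, before compression, by dropping 2 characters
# for each of 1100 modules.
print(2 * 1100)  # 2200</code></span></pre>



<p>Calling <code>collision_probability(n, id_space(5))</code> for a chosen <code>n</code> gives a feel for how many random IDs can coexist before a repeat becomes likely.</p>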



<p>Below is a close-up for the last few days (this is from synthetic monitoring, plotting the decompressed size):</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2224" height="938" src="https://timotijhof.net/wp-content/uploads/2019_wikipedia_figure2_synth.png" alt="Line graph showing a sudden drop in Startup JS size from 55.6KB to 52.8KB" class="wp-image-152" title="From 55.6KB to 52.8KB (decompressed)"/></figure>



<p>The change was detected in ResourceLoader’s synthetic monitoring. The above is captured from the <a href="https://grafana.wikimedia.org/d/BvWJlaDWk/startup-manifest-size?orgId=1&amp;from=1568439360000&amp;to=1568680200000">Startup manifest size&nbsp;dashboard</a> on our public Grafana instance, showing a <strong>2.8KB</strong> decrease in the uncompressed data stream.</p>



<p>With this week’s deployment, we’ve completed the goal of shrinking the startup manifest to under 28 KB. This cross-departmental and cross-organisational project reduced the startup manifest by <strong>9 KB</strong> overall (net bandwidth, after compression), from 36.2 kilobytes one year ago down to 27.2 KB today.</p>



<p>We have around 363,000 page views a minute in total on Wikipedia and sister projects. That’s 21.8M an hour, or 523 million every&nbsp;day (<a href="https://stats.wikimedia.org/v2/#/all-projects/reading/total-page-views/normal%7Cbar%7C2-year%7Cagent~user%7Cmonthly">User pageview&nbsp;stats</a>). This week’s deployment saves around 1.4 terabytes a day. In total, the year-long effort is saving 4.3 terabytes a day of bandwidth on our users’ page views.</p>
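


<p>The arithmetic behind these numbers, as a back-of-the-envelope Python sketch. It assumes binary units (1 KB = 1024 bytes, 1 TB = 1024<sup>4</sup> bytes) and that every page view transfers the startup manifest once. Both are simplifying assumptions, so the result only approximates the quoted figure.</p>



<pre class="wp-block-code"><span><code class="hljs language-python">PAGE_VIEWS_PER_MINUTE = 363_000
KB = 1024        # assumption: binary kilobyte
TB = 1024 ** 4   # assumption: binary terabyte

views_per_hour = PAGE_VIEWS_PER_MINUTE * 60
views_per_day = views_per_hour * 24

print(views_per_hour)  # 21780000
print(views_per_day)   # 522720000

# Year-long saving per page view, after compression: 36.2 KB down to 27.2 KB.
saved_bytes_per_view = round((36.2 - 27.2) * KB)
saved_tb_per_day = saved_bytes_per_view * views_per_day / TB

print(round(saved_tb_per_day, 2))  # 4.38</code></span></pre>



<p>Under these assumptions the year-long saving comes out at about 4.38 terabytes a day, in line with the roughly 4.3 terabytes quoted above.</p>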



<h2 class="wp-block-heading" id="whats-next">What’s next</h2>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="980" height="918" src="https://timotijhof.net/wp-content/uploads/2019_wikipedia_figure3_pie.png" alt="Percentage of bundle metadata size, by component. 26% is for MediaWiki core's bundles, 12% for ContentTranslation bundles, 7% for VisualEditor, 5% for Wikidata." class="wp-image-154" style="width:245px;height:230px" title="Percentage of bundle metadata size, by component"/></figure>
</div>


<p>It’s great to celebrate that Wikipedia’s startup payload now neatly fits into the target budget of 28 KB – chosen as the lowest multiple of 14KB we can fit within subsequent <a href="https://tylercipriani.com/blog/2016/09/25/the-14kb-in-the-tcp-initial-window/">bursts of Internet packets</a> to a web browser.</p>



<p>The challenge going forward will be to keep us there. Over the past year I’ve kept a very close eye (<a href="https://docs.google.com/document/d/1SESOADAH9phJTeLo4lqipAjYUMaLpGsQTAUqdgyZb4U/edit">spreadsheet</a>) on the startup manifest — to verify our progress, and to identify potential regressions. I’ve since automated this laborious process through a public <a href="https://grafana.wikimedia.org/d/BvWJlaDWk/startup-manifest-size">Grafana dashboard</a>.</p>



<p>We still have many more opportunities on that dashboard to improve bundling of our features, and (for Wikimedia’s Performance Team) to make it even easier to implement such bundling.</p>



<h2 class="wp-block-heading">Further reading</h2>



<ul class="wp-block-list">
<li><a href="https://performance.wikimedia.org/">Metrics &amp; Perf reports</a>, on performance.wikimedia.org</li>



<li><a href="https://www.mediawiki.org/wiki/ResourceLoader/Architecture">ResourceLoader Architecture</a>, on mediawiki.org.</li>



<li><a href="https://tylercipriani.com/blog/2016/09/25/the-14kb-in-the-tcp-initial-window/">The 14KB Initial Window</a>, by Tyler Cipriani.</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published on <a href="https://techblog.wikimedia.org/2019/09/19/wikipedias-javascript-initialisation-on-a-budget/">techblog.wikimedia.org</a>.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2019/wikipedia-javascript-on-budget/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Wikipedia%26%238217%3Bs%20JavaScript%20initialisation%20on%20a%20budget&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2019%2Fwikipedia-javascript-on-budget%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to protect yourself from npm</title>
		<link>https://timotijhof.net/posts/2019/protect-yourself-from-npm/</link>
					<comments>https://timotijhof.net/posts/2019/protect-yourself-from-npm/#comments</comments>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Thu, 12 Sep 2019 12:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Testing]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2019/protect-yourself-from-npm</guid>

					<description><![CDATA[What’s the worst that could happen after npm&#160;install? When you open an app or execute a program from the terminal, that program can do anything that you can do. In a nutshell: Imagine if your computer were to disappear in front of your eyes and re-appear in front of mine. Still open. Still unlocked. What…]]></description>
										<content:encoded><![CDATA[
<p>What’s the worst that could happen after <code>npm&nbsp;install</code>?</p>



<span id="more-117"></span>



<p>When you open an app or execute a program from the terminal, that program can do anything that you can do.</p>



<p><strong>In a nutshell</strong>: Imagine if your computer were to disappear in front of your eyes and re-appear in front of mine. Still open. Still unlocked. What could I do from this moment on? <em>That</em> is what an unknown program could do.</p>



<ol class="wp-block-list">
<li><a href="#what-is-atstake">What is at&nbsp;stake?</a></li>



<li><a href="#is-this-an-npmproblem">How does it compare to other package managers?</a></li>



<li><a href="#i-get-it-now-what-can-we-do-aboutit">What can you do about&nbsp;it?</a></li>
</ol>



<hr class="wp-block-separator has-alpha-channel-opacity"/>


<div class="wp-block-image">
<figure class="alignleft size-full is-resized"><img loading="lazy" decoding="async" width="1024" height="678" src="https://timotijhof.net/wp-content/uploads/2019_npm_BlueSkyCCTV13.jpg" alt="Surveillance cameras on a lamppost with a clear blue sky behind it." class="wp-image-121" style="width:400px;height:265px"/><figcaption class="wp-element-caption">Photo by&nbsp;<a href="https://commons.wikimedia.org/wiki/File:BlueSkyCCTV13.jpg#firstHeading" target="_blank" rel="noopener">Raysonho</a></figcaption></figure>
</div>


<p>Upon running <code>npm install</code>, you may be downloading and executing hundreds of programs.</p>



<p>Programs from nice people sometimes ask for your permission. This is because a developer chose to do so.</p>



<p>There may also be laws that could punish them if they get caught not doing so.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>What about programs whose authors choose differently? Well, such a program could do quite a bit.</p>



<ul class="wp-block-list">
<li>It could access any of your files, modify them, delete them, or upload them. This also applies to the internal files used by other applications.</li>



<li>It could install other programs in the background.</li>



<li>It could talk to other devices linked to your home network.</li>
</ul>



<h2 class="wp-block-heading" id="what-is-atstake">What is at&nbsp;stake</h2>



<p>Files you might not be thinking about:</p>



<ul class="wp-block-list">
<li>The cookies in your web browser.</li>



<li>Desktop applications. Chat history, password managers, todo lists, etc. They all use files to store the text and media you send or receive.</li>



<li>Digital media. Your photo albums, home videos, and voice memos.</li>



<li>SSH private keys, GPG key rings, and other access keys and encryption keys used by developers.</li>
</ul>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1024" height="630" src="https://timotijhof.net/wp-content/uploads/2019_npm_FamilyComputer.jpg" alt="A red face in a white rectangle made of nanoblocks, resting on a silver Apple keyboard." class="wp-image-129"/><figcaption class="wp-element-caption">Photo by&nbsp;<a href="https://commons.wikimedia.org/wiki/File:Family_Computer_(6914313766).jpg#firstHeading" target="_blank" rel="noopener">DaraKero_F</a> / CC BY&nbsp;2.0</figcaption></figure>



<h2 class="wp-block-heading" id="browser-cookies">Browser cookies</h2>



<p>Browser cookies make it so you’re immediately logged in when you open a new tab for Gmail or Twitter. An evil program can copy the browser’s cookies file and share it with the attacker.</p>



<p>The attacker could then read any e-mail you’ve ever received or sent that is stored there. They could also delete any of it. (Got a backup?) They can naturally access future e-mails as well, like the ones you get from “Forgot password” buttons. They could also hide any trace of these (e.g. with filter rules).</p>



<p>This affects any website you use. Social network? Access to any post or DM — regardless of privacy setting. Company e-mail, Google Drive? That too.</p>



<h2 class="wp-block-heading" id="sleeper-programs">Sleeper programs</h2>



<p>The evil program may configure itself to always start in the background when you open your laptop. A new friend for life!</p>



<p>It could also add local command-line programs that wrap the popular <code>sudo</code> and <code>ssh</code> commands, to make them do a little extra behind the scenes. Next time you run <code>sudo &lt;something&gt;</code> to perform an administrator action and enter your password—you may have given away full system access. Deploying some code? Running <code>ssh cloud.someplace.special</code> might let the attacker tailgate along with you, opening one shell for itself and another for you.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2048" height="1250" src="https://timotijhof.net/wp-content/uploads/2019_npm_LouisXIV.jpg" alt="Statue of King Louis XIV on a horse with a red blindfold over his eyes. Taken in Paris, France." class="wp-image-131"/><figcaption class="wp-element-caption">Photo by&nbsp;<a href="https://commons.wikimedia.org/wiki/File:Louis_XIV_with_a_red_mask,_Paris_20_August_2015.jpg#firstHeading" target="_blank" rel="noopener">BikerNormand</a> / CC BY-SA&nbsp;2.0</figcaption></figure>



<h2 class="wp-block-heading" id="local-webserver">Local web&nbsp;server</h2>



<p>These background programs could also affect you in a myriad of other ways. I won’t detail those today, except to mention they can keep a local web server running. Spotify and Zoom have been seen in the news doing <a href="https://medium.com/bugbountywriteup/zoom-zero-day-4-million-webcams-maybe-an-rce-just-get-them-to-visit-your-website-ac75c83f4ef5" target="_blank" rel="noopener">questionable things</a> with their local web servers.</p>



<h2 class="wp-block-heading" id="is-this-an-npmproblem">Is this an npm&nbsp;problem?</h2>



<p>Maybe. Technically these concerns apply to any method of executing unknown code. Running <code>npm install</code> isn’t very different from pasting a command like <code>curl url… | bash</code>. They both execute a downloaded program from your terminal. The difference is in user expectation.</p>



<p>Upon seeing the url and the <code>bash</code> invocation, you have a choice: Trust the publisher (the url), or trust the script (download, review, then decide whether to run). The result is generally predictable and without hidden dependencies.</p>



<h3 class="wp-block-heading" id="other-packagemanagers">Other package&nbsp;managers</h3>



<p>What about Debian (apt-get) or Homebrew? Like npm, code published there is unknown to most of us and hard to review. But, there is an important difference: Peer-review. These traditional repositories are curated by a central authority. You don’t have to trust the script or original authors of each package, so long as you trust the publishers and their curation process.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1024" height="737" src="https://timotijhof.net/wp-content/uploads/2019_npm_Jupiter.jpg" alt="Earth is small compared to Jupiter. Jupiter is roughly 11 times larger." class="wp-image-133"/><figcaption class="wp-element-caption">Image by NASA / Public&nbsp;domain</figcaption></figure>



<h3 class="wp-block-heading" id="the-scale-has-changed-thegame">The scale has changed the&nbsp;game</h3>



<p>What about PyPI or Packagist (Composer)? These are like npm. Anyone can publish anything. There is however a difference in scale. PyPI has 194K projects. Packagist is host to 237K packages with 0.5 billion downloads a month. npm has over 1.3 million packages and 30 <em>billion</em> downloads a month. This makes it a much more popular target. <a href="https://pypi.org">[1]</a> <a href="https://packagist.org/statistics">[2]</a> <a href="https://blog.npmjs.org/post/180868064080/this-year-in-javascript-2018-in-review-and-npms">[3]</a></p>



<h3 class="wp-block-heading" id="dependency-graphs">Dependency graphs</h3>



<p>There is also a difference in habit: PyPI packages have 7 dependencies on average, with typically 1 indirect dependency. And, I would expect most dependencies there to be from authors the user has trusted before. <a href="https://snyk.io/blog/how-much-do-we-really-know-about-how-packages-behave-on-the-npm-registry/">[4]</a> Snyk.io published in April that the average npm package has a whopping 86 dependencies, with 4+ levels of indirect dependencies. <a href="https://snyk.io/blog/how-much-do-we-really-know-about-how-packages-behave-on-the-npm-registry/">[4]</a></p>



<p>The ESLint package has 118 npm dependencies <a href="http://npm.broofa.com/?q=eslint@6.3.0">[5]</a>. Eleventy, a popular static site generator, requires 555 dependencies (<a href="http://npm.broofa.com/?q=@11ty/eleventy@0.9.0">Explore dependency graph</a>). Each one of these may run <a href="https://blog.alexwendland.com/2018-11-20-npm-install-scripts-intro/">arbitrary shell commands</a> from the terminal, both during the installation process and later when using the tool.</p>
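
<p>To get a feel for how many of those dependencies run code at install time, here is a minimal sketch. It assumes a newer lockfile format (version 2 or 3) than existed when this post was written, in which each entry under <code>packages</code> may carry a <code>hasInstallScript</code> flag. The function name and the example package names are made up for illustration.</p>

```javascript
// Sketch: list packages in a parsed package-lock.json that declare install scripts.
// Assumes lockfile version 2 or 3, where entries under "packages" may have
// a "hasInstallScript" flag. The function name is hypothetical.
function packagesWithInstallScripts(lockfile) {
  const packages = lockfile.packages || {};
  return Object.keys(packages)
    // The empty key "" describes the root project itself, so skip it.
    .filter((path) => path !== '')
    .filter((path) => packages[path].hasInstallScript === true)
    .sort();
}

// Hand-written lockfile fragment; the package names are invented.
const exampleLock = {
  lockfileVersion: 3,
  packages: {
    '': { name: 'my-project' },
    'node_modules/left-helper': { version: '1.0.0' },
    'node_modules/native-thing': { version: '2.1.0', hasInstallScript: true },
  },
};

// Logs an array with the single flagged path, node_modules/native-thing.
console.log(packagesWithInstallScripts(exampleLock));
```

<p>This only surfaces install-time hooks. A package without any install script can still do anything it likes once you run or import it.</p>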



<h2 class="wp-block-heading" id="i-get-it-now-what-can-we-do-aboutit">I get it. Now, what can we do about&nbsp;it?</h2>



<p>There isn’t a magic bullet to make everything perfectly safe. But, there are a number of things you can do to reduce risk.</p>



<h3 class="wp-block-heading" id="isolation">Isolation</h3>



<p>For the past year, I’ve been using disposable Docker containers as a way to reduce the risk of compromise. It has controls for network access, and for which directories can be exposed. Docker isn’t a perfect safety net by any means, but it’s a step in the right direction.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="577" src="https://timotijhof.net/wp-content/uploads/2019_npm_Servers-1024x577.jpg" alt="Close-up view of network cables and LED lights on servers inside a Wikimedia Foundation data center." class="wp-image-135"/><figcaption class="wp-element-caption">Image by <a href="https://commons.wikimedia.org/wiki/Category:Wikimedia_servers_in_Carrollton#/media/File:Wikimedia_Foundation_Servers_2015-88.jpg" target="_blank" rel="noopener">Victor Grigas</a> / CC BY-SA&nbsp;3.0</figcaption></figure>



<p>My base image uses Debian and comes with Node.js, npm, and a few other utilities (such as headless browsers, for automated tests). I use a bash script to launch a temporary container, based on that image. It runs as the unprivileged <code>nobody</code> user, and mounts only the current working directory.</p>



<p>From there, I would run <code>npm install</code> and such. The only thing it interacts with is the source code and local <code>node_modules</code> directory for that specific project. It isn’t given access to any other Git repos, desktop apps, browser cookies, or private documents. And, once that terminal tab is closed, the container is destroyed.</p>



<p>I’ve published the script I use at <a href="https://github.com/wikimedia/fresh#start-of-content">github.com/wikimedia/fresh</a>. I don’t recommend using it outside Wikimedia, however. Create your own instead. The repository explains <a href="https://github.com/wikimedia/fresh/blob/19.10.1/Tutorial.md">how it works</a>.</p>
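
<p>As a rough illustration of the idea (this is not the fresh script itself), the sketch below builds the arguments for a throwaway container with the Docker CLI. The image name <code>my-node-image</code>, the <code>/work</code> mount point, and the function name are placeholders I made up, and it assumes the image has a <code>nobody</code> user and <code>bash</code> installed.</p>

```javascript
// Sketch: build arguments for a disposable container, assuming the Docker CLI.
// The image name and the /work mount point are placeholders, not taken from fresh.
function dockerRunArgs(projectDir, image) {
  return [
    'run',
    '--rm',                            // remove the container when the shell exits
    '--interactive', '--tty',
    '--user', 'nobody',                // unprivileged user inside the container
    '--volume', `${projectDir}:/work`, // expose only this one directory
    '--workdir', '/work',
    image,
    'bash',
  ];
}

// Print the full command for a made-up project path.
console.log(['docker', ...dockerRunArgs('/home/me/my-project', 'my-node-image')].join(' '));
```

<p>In Node.js, the array could be passed to <code>child_process.spawn('docker', args, { stdio: 'inherit' })</code> to open the shell from a wrapper script.</p>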



<p>Other options for isolating your environment:</p>



<ul class="wp-block-list">
<li>Speed and flexibility: Use <code>systemd-nspawn</code> or <code>chroot</code>. This takes more work to set up, but provides a faster environment than Docker. In terms of security it is comparable to Docker. Read more about systemd-nspawn on <a href="https://wiki.archlinux.org/index.php/Systemd-nspawn">ArchWiki</a>.</li>



<li>Security and ease of use: Use a virtual machine (e.g. VirtualBox/Vagrant). This is more secure by default and offers a GUI for controlling what to expose. The downside is that VMs are significantly slower.</li>
</ul>



<h3 class="wp-block-heading" id="fewer-dependencies">Fewer dependencies</h3>



<p>Finally, you can reduce risk by reducing the number of packages you depend on in your projects (and then shrink-wrap them). Especially development dependencies, as these tend to be explicitly aimed at executing from the CLI.</p>



<p id="see-also">Question yourself and question others before introducing new dependencies. Perhaps even encourage maintainers of your favourite packages to <a href="https://github.com/qunitjs/qunit/issues/1342" target="_blank" rel="noreferrer noopener">Reduce the size of their dependency graph</a>!</p>



<h2 class="wp-block-heading">Further reading</h2>



<ul class="wp-block-list">
<li><a href="https://cgbystrom.com/posts/deconstructing-spotifys-builtin-http-server/" target="_blank" rel="noreferrer noopener">Deconstructing Spotify’s local server</a> by Carl Byström (2013).</li>



<li><a href="https://www.idontplaydarts.com/2016/04/detecting-curl-pipe-bash-server-side/">Detect piped curl on the server-side</a>, Phil at idontplaydarts.com (2016).</li>



<li><a rel="noopener" href="https://snyk.io/blog/malicious-code-found-in-npm-package-event-stream/" target="_blank">Malicious code on npm</a>, Danny Grander, Snyk.io (2018).</li>



<li><a rel="noopener" href="https://blog.alexwendland.com/2018-11-20-npm-install-scripts-intro/" target="_blank">npm Install Hook Scripts</a>, Alex Wendland (2018).</li>



<li><a rel="noopener" href="https://jakearchibald.com/2018/when-packages-go-bad/" target="_blank">When packages go bad</a>, Jake Archibald (2018).</li>



<li><a href="https://scribe.rip/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5">I’m harvesting passwords from your site</a>, David Gilbertson (2018).</li>



<li><a href="https://scribe.rip/npm-package-permissions-an-idea-441a02902d9b">An idea for improving npm package permissions</a>, David Gilbertson (2018).</li>



<li><a rel="noopener" href="https://blog.mozilla.org/security/2019/10/09/iterm2-critical-issue-moss-audit/" target="_blank">Shell vulnerability in iTerm 2</a>, Tom Ritter, Mozilla Security (2019).</li>



<li><a href="https://scribe.rip/12-strange-things-that-can-happen-after-installing-an-npm-package-45de7fbf39f0">Strange things after installing an NPM package</a>, Vladimir Tikhonov (2019).</li>



<li><a href="https://news.sophos.com/en-us/2019/07/15/apple-quietly-removes-zooms-hidden-web-server-from-macs/" target="_blank" rel="noreferrer noopener">Apple removes Zoom’s server</a>, John E Dunn, Naked Security (2019).</li>



<li><a href="https://scribe.rip/zoom-zero-day-4-million-webcams-maybe-an-rce-just-get-them-to-visit-your-website-ac75c83f4ef5" target="_blank" rel="noreferrer noopener">Zoom 0-day: 4+ Million Webcams</a>, Jonathan Leitschuh (2019).</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p><strong>Update (6 Aug 2020)</strong>: Check out <a href="https://sambleckley.com/writing/npm.html"><em>Worrying about the NPM ecosystem</em></a>, which takes a more scientific look at the problem.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p><strong>Update (2 September 2023)</strong>: Further reading I&#8217;ve accumulated on this topic:</p>



<ul class="wp-block-list">
<li><a href="https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2018/Participants/Tim_Starling">On cost of platform dependencies and services</a>, Tim Starling (2018).</li>



<li><a href="https://sambleckley.com/writing/npm.html">Worrying about the NPM ecosystem</a>, Sam&nbsp;Bleckley (2020).</li>



<li><a href="https://overreacted.io/npm-audit-broken-by-design/">npm audit: Broken by design</a>, Dan Abramov (2021).</li>



<li><a href="https://www.jackfranklin.co.uk/blog/check-in-your-node-dependencies/">Check-in your node dependencies</a>, Jack Franklin (2021).</li>



<li><a href="https://sizeof.cat/post/coa-npm-with-malicious-code/">Coa npm package got malicious code</a>, sizeof.cat (2021).</li>



<li><a href="https://socket.dev/blog/inside-node-modules">Inside Your node_modules Folder</a>, Feross Aboukhadijeh (2022).</li>



<li><a href="https://old.reddit.com/r/Python/comments/uwhzkj/i_think_the_ctx_package_on_pypi_has_been_hacked/">CTX package on PyPI has been hacked</a>, Reddit (2022).</li>



<li><a href="https://snarfed.org/2022-03-10_were-drowning-software-dependencies">We’re drowning in software dependencies</a>, Ryan Barrett (2022).</li>



<li><a href="https://blog.jim-nielsen.com/2023/software-crisis-dependencies/">“Out of the Software Crisis”: Dependencies</a>, Jim Nielsen (2023).</li>
</ul>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2019/protect-yourself-from-npm/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20How%20to%20protect%20yourself%20from%20npm&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2019%2Fprotect-yourself-from-npm%2F">Reply via email</a></p>]]></content:encoded>
					
					<wfw:commentRss>https://timotijhof.net/posts/2019/protect-yourself-from-npm/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Six years of BrowseHappy</title>
		<link>https://timotijhof.net/posts/2018/twitter-browsehappy/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Wed, 16 May 2018 12:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[Testing]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2018/twitter-browsehappy</guid>

					<description><![CDATA[Six years ago (in 2012), I was looking for a newsletter about browser releases. At the time, my motivation was to remember to regularly check and update the jQuery TestSwarm framework as needed for each new browser release. I found a simple overview at browsehappy.com, run by WordPress. Lacking RSS, I decided to simply check…]]></description>
										<content:encoded><![CDATA[
<p>Six years ago (in 2012), I was looking for a newsletter about browser releases. At the time, my motivation was to remember to regularly check and update the jQuery TestSwarm framework as needed for each new browser release. I found a simple overview at <a href="https://browsehappy.com/">browsehappy.com</a>, run by WordPress.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="2405" height="1002" src="https://timotijhof.net/wp-content/uploads/2018_browsehappy_swarm.png" alt="Screenshot of swarm.jquery.org" class="wp-image-97" style="width:480px;height:200px"/></figure>
</div>


<p>Lacking RSS, I decided to simply check it on a regular basis, and created <a href="https://twitter.com/browsehappy" target="_blank" rel="noreferrer noopener">@browsehappy</a> on Twitter for others also looking to follow browser releases. I decided to pair it with links to relevant blog posts and documentation. </p>


<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="1862" height="1163" src="https://timotijhof.net/wp-content/uploads/2018_browsehappy_feed.png" alt="" class="wp-image-106" style="width:285px;height:178px"/></figure>
</div>


<p>Then, one day, Chrome’s version number was missing on Browse Happy’s homepage. Browse Happy is open-sourced at <a href="https://github.com/WordPress/browsehappy" target="_blank" rel="noreferrer noopener">https://github.com/WordPress/browsehappy</a>, which helped me find that its data actually comes from Wikipedia! Specifically, it scraped markup from article infoboxes, and extracted the version with some string operations.</p>



<p>Those string operations made assumptions about the wiki’s internal templates, which no longer held up after some edits to the Google Chrome article on Wikipedia. These data issues repeated themselves a number of times…</p>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="311" height="220" src="https://timotijhof.net/wp-content/uploads/2018_browsehappy_wikidata.png" alt="" class="wp-image-111" style="width:156px;height:110px"/></figure>
</div>


<p>I <a href="https://github.com/WordPress/browsehappy/issues/37">helped them</a> to use <a href="https://wikidata.org" target="_blank" rel="noreferrer noopener">Wikidata.org</a> as the source for version numbers instead.</p>



<p>Many Wikipedia statements are now maintained on Wikidata, which are then queried and embedded directly in articles on Wikipedia.</p>



<p>Also… browser vendors have boosted their comm efforts a lot since 2012!</p>



<p>Opera started at <a href="https://blogs.opera.com/desktop/" target="_blank" rel="noreferrer noopener">blogs.opera.com/desktop</a><br>Edge started at <a href="https://blogs.windows.com/msedgedev/" target="_blank" rel="noreferrer noopener">blogs.windows.com/msedgedev</a><br>WebKit renewed their blog at <a href="https://webkit.org/blog/" target="_blank" rel="noreferrer noopener">webkit.org/blog</a><br>Mozilla and Chromium continue as always at <a href="https://hacks.mozilla.org" target="_blank" rel="noreferrer noopener">hacks.mozilla.org</a> and <a href="https://blog.chromium.org" target="_blank" rel="noreferrer noopener">blog.chromium.org</a>.</p>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img loading="lazy" decoding="async" width="192" height="192" src="https://timotijhof.net/wp-content/uploads/2018_browsehappy_wpn.png" alt="" class="wp-image-113" style="width:96px;height:96px"/></figure>
</div>


<p>After three years of moderating the feed I took a break, and&#8230; never got back to it. TestSwarm no longer has its own user-agent parser, and for web-dev interests, much better newsletters have sprung into existence. The main one for me is <a href="https://webplatform.news" target="_blank" rel="noreferrer noopener">webplatform.news</a>, by <a href="https://mastodon.social/@simevidas" target="_blank" rel="noreferrer noopener">@simevidas</a>.</p>



<p>Back to Browse Happy… As part of digital spring cleaning, I decided I shouldn’t be the owner of @browsehappy on Twitter, especially given it’s now dormant. I reached out to Automattic and transferred ownership.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Good bye <a href="https://twitter.com/browsehappy" target="_blank" rel="noreferrer noopener">@browsehappy</a>, and welcome <a href="https://twitter.com/Automattic" target="_blank" rel="noreferrer noopener">@Automattic</a>!</p>
</blockquote>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published on <a rel="noreferrer noopener" href="https://twitter.com/TimoTijhof/status/996770174140321792" target="_blank">twitter.com</a>.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2018/twitter-browsehappy/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Six%20years%20of%20BrowseHappy&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2018%2Ftwitter-browsehappy%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Measuring Wikipedia page load times</title>
		<link>https://timotijhof.net/posts/2018/measuring-wikipedia-page-load-times/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Tue, 09 Jan 2018 12:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2018/measuring-wikipedia-page-load-times</guid>

					<description><![CDATA[This post shows how we measure and interpret load times on Wikipedia. It also explains what real-user metrics are, and how percentiles work. Navigation Timing When a browser loads a page, the page can include program code (JavaScript). This program will run inside the browser, alongside the page. This makes it possible for a page…]]></description>
										<content:encoded><![CDATA[
<p>This post shows how we measure and interpret load times on Wikipedia. It also explains what real-user metrics are, and how percentiles work.</p>



<span id="more-50"></span>



<h2 class="wp-block-heading" id="navigation-timing">Navigation Timing</h2>



<p>When a browser loads a page, the page can include program code (JavaScript). This program will run inside the browser, alongside the page. This makes it possible for a page to become dynamic (more than static text and images). When you search on Wikipedia.org, the suggestions that appear are made with JavaScript.</p>



<p>Browsers allow JavaScript to access some internal systems. One such system is Navigation Timing, which tracks how long each step takes. For example:</p>



<ul class="wp-block-list">
<li>How long to establish a connection to the server?</li>



<li>When did the response from the server start arriving?</li>



<li>When did the browser finish loading the page?</li>
</ul>
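
<p>As a minimal sketch of reading these values, the function below takes a navigation timing entry and answers the three questions above. It assumes the Level 2 form of the API (<code>PerformanceNavigationTiming</code>), where times are in milliseconds relative to the start of the navigation. The function name and the sample numbers are made up.</p>

```javascript
// Sketch: derive three durations from a navigation timing entry.
// The properties read here are standard PerformanceNavigationTiming fields.
function summarizeNavigation(entry) {
  return {
    connect: entry.connectEnd - entry.connectStart, // time to establish the connection
    responseStart: entry.responseStart,             // when the response started arriving
    loadEventEnd: entry.loadEventEnd,               // when the browser finished loading
  };
}

// In a browser, once the load event has finished:
//   const [entry] = performance.getEntriesByType('navigation');
//   console.log(summarizeNavigation(entry));

// Stand-in entry with invented numbers, so the sketch also runs outside a browser.
const sampleEntry = { connectStart: 12, connectEnd: 48, responseStart: 330, loadEventEnd: 1250 };
console.log(summarizeNavigation(sampleEntry));
```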



<h2 class="wp-block-heading" id="where-to-measure-real-user-and-synthetic">Where to measure: Real-user and synthetic</h2>



<p>There are two ways to measure performance: Real user monitoring, and synthetic testing. Both play an important role in understanding performance, and in detecting changes.</p>



<p>Synthetic testing can give high confidence in change detection. To detect changes, we use an automated mechanism to continually load a page and extract a result (e.g. load time). When there is a difference between results, it likely means that our website changed. This assumes other factors in the test environment remained constant, such as network latency, operating system, browser version, and so on.</p>



<p>This is good for understanding relative change. But synthetic testing does not measure the performance as perceived by users. For that, we need to collect measurements from the user’s browser.</p>



<p>Our JavaScript code reads the measurements from Navigation Timing, and sends them back to Wikipedia.org. This is real-user monitoring.</p>



<h2 class="wp-block-heading" id="how-to-measure-percentiles">How to measure: Percentiles</h2>



<p>Imagine 9 users each send a request: 5 users get a result in 5ms, 3 users get a result in 70ms, and for one user the result took 560ms. The average is 88ms. But, the average does not match anyone’s real experience. Let’s explore percentiles!</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1910" height="334" src="https://timotijhof.net/wp-content/uploads/2018_measuring_figure_1_percentiles_intro.png" alt="Diagram showing 9 labels: 5ms, 5ms, 5ms, 5ms, 5ms, 70ms, 70ms, 70ms, and 560ms." class="wp-image-56"/></figure>



<p>The first number after the lower half (or middle) is the median (or <em>50th percentile</em>). Here, the median is 5ms. The first number after the lower 75% is 70ms (<em>75th percentile</em>). We can say that “for 75% of users, the service responded within 70ms”. That’s more useful.</p>
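
<p>The same example can be worked through in a few lines of code. This is a sketch that follows the “first number after the lower N%” definition used here. Other percentile definitions exist (for example with interpolation), and the function names are my own.</p>

```javascript
// Sketch: percentile as "the first value after the lower p% of sorted values".
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  // Clamp so that the 100th percentile returns the highest value.
  const index = Math.min(Math.floor((p / 100) * sorted.length), sorted.length - 1);
  return sorted[index];
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// The nine response times from the example, in milliseconds.
const timings = [5, 5, 5, 5, 5, 70, 70, 70, 560];

console.log(Math.round(mean(timings))); // 88
console.log(percentile(timings, 50));   // 5
console.log(percentile(timings, 75));   // 70
console.log(percentile(timings, 100));  // 560
```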



<p>When working on a service used by millions, we focus on the 99th percentile and the highest value (100th percentile). Using medians, or percentiles lower than 99%, would exclude many users. A problem with 1% of requests is a serious problem. To see why, it is important to understand that 1% of requests does not mean 1% of pageviews, or even 1% of users.</p>



<p>A typical Wikipedia pageview makes 20 requests to the server (1&nbsp;document, 3&nbsp;stylesheets, 4&nbsp;scripts, 12&nbsp;images). A typical user views 3&nbsp;pages during their session (on average).</p>



<p>This means our problem with 1% of requests could affect up to 20% of pageviews (<code>20 requests x 1% = 20% = ⅕</code>), and up to 60% of users (<code>3 pages x 20 requests x 1% = 60% ≈ ⅔</code>). Even worse, over a long period of time, it is most likely that every user will experience the problem at least once. This is like rolling dice in a game. With a roughly 17% (⅙) chance of rolling a six, if everyone keeps rolling, everyone should get a six eventually.</p>
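
<p>Multiplying the rate by the number of requests gives the worst case, where every affected request lands on a different pageview. As a sketch under a different assumption, namely that each request is affected independently (which the real traffic may not satisfy), the chance of hitting the problem at least once comes out a little lower. The function names are my own.</p>

```javascript
// Worst case: every affected request hits a different pageview or user.
function worstCaseShare(rate, requests) {
  return Math.min(1, rate * requests);
}

// Assuming independence: chance that at least one of the requests is affected.
function atLeastOnce(rate, requests) {
  return 1 - Math.pow(1 - rate, requests);
}

const asPercent = (fraction) => Math.round(fraction * 100);

console.log(asPercent(worstCaseShare(0.01, 20))); // 20 (per pageview)
console.log(asPercent(worstCaseShare(0.01, 60))); // 60 (per user session)
console.log(asPercent(atLeastOnce(0.01, 20)));    // 18 (per pageview)
console.log(asPercent(atLeastOnce(0.01, 60)));    // 45 (per user session)
```

<p>Either way, a problem with 1% of requests reaches far more than 1% of people.</p>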



<h2 class="wp-block-heading" id="real-user-variables">Real-user variables</h2>



<p>The previous section focussed on performance as measured inside our servers. These measurements start when our servers receive a request, and end once we have sent a response. This is <em>back-end</em> performance. In this context, our servers are the <em>back-end</em>, and the user’s device is the <em>front-end</em>.</p>



<p>It takes time for the request to travel from the user’s device to our systems (through cellular or WiFi radio waves, and through wires.) It also takes time for our response to travel back over similar networks to the user’s device. Once there, it takes even more time for the device’s operating system and browser to process and display the information. Measuring this is part of front-end performance.</p>



<p>Differences in back-end performance may affect all users. But, differences in front-end performance are influenced by factors we don’t control. Such as network quality, device hardware capability, browser, browser version, and more.</p>



<p>Even when we make no changes, the front-end measurements do change. Possible causes:</p>



<ul class="wp-block-list">
<li><strong>Network</strong>. ISPs and mobile network carriers can make changes that affect network performance. Existing users may switch carriers. New users may come online with a different mix of carriers than current users.</li>



<li><strong>Device</strong>. Operating system and browser vendors release upgrades that may affect page load performance. Existing users may switch browsers. New users may choose browsers or devices differently than current users.</li>



<li><strong>Content change</strong>. Especially for Wikipedia, the composition of an article may change at any moment.</li>



<li><strong>Content choice</strong>. Trends in news or social media may cause a shift towards different (kinds of) pages.</li>



<li><strong>Device choice</strong>. Users that own multiple devices may choose a different device to view the (same) content.</li>
</ul>



<p>The most likely cause for a sudden change in metrics is ourselves. Given our scale, the above factors usually change only for a small number of users at once. Or the change might happen slowly.</p>



<p>Yet, sometimes these external factors do cause a sudden change in metrics.</p>



<h2 class="wp-block-heading" id="case-in-point-mobile-safari-9">Case in point: Mobile Safari 9</h2>



<p>Shortly after Apple released iOS&nbsp;9 (in 2015), our global measurements were higher than before. We found this was due to Mobile Safari&nbsp;9 introducing support for Navigation Timing.</p>



<p>Before this event, our metrics only represented mobile users on Android. With iOS&nbsp;9, our data increased its scope to include Mobile&nbsp;Safari.</p>



<p>iOS&nbsp;9, or the networks of iOS&nbsp;9 users, were not significantly faster or slower than Android’s. The iOS upgrade affected our metrics because we now include an extra 15% of users – those on Mobile&nbsp;Safari.</p>



<p>Desktop latency is around 330ms, whereas mobile latency is around 520ms. Having more measurements from mobile skewed the global metrics toward that category.</p>
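<p>The direction of this effect can be illustrated with a small calculation. The sample counts below are hypothetical, and a weighted mean is only a simplification: our real metric is a percentile, which does not combine this way. The sketch merely shows how adding samples to the slower category pulls a combined number upward, even though neither category changed.</p>

```javascript
// Typical latencies from the text above, in milliseconds.
const desktopLatency = 330;
const mobileLatency = 520;

// Weighted mean of the two categories for given sample counts.
function blendedLatency( desktopCount, mobileCount ) {
  const total = desktopCount + mobileCount;
  return ( desktopCount * desktopLatency + mobileCount * mobileLatency ) / total;
}

// Hypothetical counts: 1000 samples before, then 150 extra mobile
// samples (15% more) once Mobile Safari starts reporting.
const before = blendedLatency( 700, 300 );
const after = blendedLatency( 700, 450 );

console.log( Math.round( before ), Math.round( after ) ); // 387 404
```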



<div class="wp-block-group is-content-justification-space-between is-layout-flex wp-container-core-group-is-layout-e3bc7287 wp-block-group-is-layout-flex"><div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2018_measuring_figure_2a_desktop.png" alt="Line graph for responseStart metric from desktop pageviews. Values range from 250ms to 450ms. Averaging around 330ms." class="wp-image-62" width="380" height="195"/></figure>
</div>


<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2018_measuring_figure_2b_mobile.png" alt="responseStart metric from mobile pageviews. Values range 350ms to 700ms. Averaging around 520ms." class="wp-image-63" width="380" height="193"/></figure>
</div>



<p>The above graphs plot the “75th percentile” of responseStart for desktop and mobile (from November 2015). We combine these metrics into one data point for each minute. The graphs show data for one month, and there is only enough space on the screen to have each point represent 3 hours. This works by taking the mean of the per-minute values within each 3-hour block. While this provides a rough impression, these graphs do not show the 75th percentile for November 2015. The next section explains why.</p>



<h2 class="wp-block-heading" id="average-of-percentiles">Average of percentiles</h2>



<p>Opinions vary on how bad it is to take the average of percentiles over time. But one thing is clear: The average of many 1-minute percentiles is not the percentile for those minutes. Every minute is different, and the number of values also varies each minute. To get the percentile for one hour, we need all values from that hour, not the percentile summary from each minute.</p>



<p>Below is an example with values from three minutes of time. Each value is the response time for one request. Within each minute, the values are sorted from low to high.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2050" height="1869" src="https://timotijhof.net/wp-content/uploads/2018_measuring_figure_3_percentiles.png" alt="Diagram with four sections. Section One is for the minute 08:00 to 08:01, it has nine values with the middle value of 5ms marked as the median. Section Two is for 08:01 to 08:02 and contains five values, the median is 560ms. Section Three is 08:02 to 08:03, contains five values, the median of Section Three is 70ms. The last section, Section Four, is the combined diagram from 08:00 to 08:03 showing all nineteen values. The median is 70ms." class="wp-image-74"/></figure>



<p>The average of the three separate medians is about 212ms. This is the result of <code>(5 + 560 + 70) / 3</code>. The&nbsp;actual median of these values combined is 70ms.</p>
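<p>The same comparison can be reproduced in a few lines. The individual values below are hypothetical; they are chosen only so that the counts and medians match those described for the diagram (nine, five, and five values, with medians of 5ms, 560ms, and 70ms).</p>

```javascript
// Median of a list of numbers: the middle value, or the mean of the two
// middle values when the count is even.
function median( values ) {
  const sorted = values.slice().sort( ( a, b ) => a - b );
  const mid = Math.floor( sorted.length / 2 );
  return sorted.length % 2 === 1 ?
    sorted[ mid ] :
    ( sorted[ mid - 1 ] + sorted[ mid ] ) / 2;
}

// Hypothetical response times in milliseconds, one list per minute.
const minutes = [
  [ 1, 2, 3, 4, 5, 6, 8, 80, 120 ], // 08:00, median 5
  [ 95, 400, 560, 700, 900 ],       // 08:01, median 560
  [ 4, 9, 70, 90, 300 ]             // 08:02, median 70
];

const medians = minutes.map( median );
const averageOfMedians = medians.reduce( ( sum, m ) => sum + m, 0 ) / medians.length;
const actualMedian = median( minutes.flat() );

console.log( medians );                       // [ 5, 560, 70 ]
console.log( averageOfMedians.toFixed( 1 ) ); // 211.7
console.log( actualMedian );                  // 70
```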



<h2 class="wp-block-heading" id="buckets">Buckets</h2>



<p>To compute the percentile over a large period, we must have all original values. But it’s not efficient to store data about every visit to Wikipedia for a long time. Nor could we quickly compute percentiles from that much data.</p>



<p>A different way of summarising data is by using buckets. We can create one bucket for each range of values. Then, when we process a time value, we only increment the counter for the bucket it falls into. A bucket used in this way is also called a <em>histogram bin</em>.</p>



<p>Let’s process the same example values as before, but this time using buckets.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2008" height="1852" src="https://timotijhof.net/wp-content/uploads/2018_measuring_figure_4a_buckets.png" alt="There are four buckets. Bucket A is for values below 11ms. Bucket B is for 11ms to 100ms. Bucket C is for 101ms to 1000ms. And Bucket D is for values above 1000ms. For each of the 19 values, we find the associated bucket and increase its counter." class="wp-image-77"/></figure>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://timotijhof.net/wp-content/uploads/2018_measuring_figure_4b_buckets_summary.png" alt="After processing all values, the counters are as follows. Bucket A holds 9, Bucket B holds 4, Bucket C holds 6, and Bucket D holds 0." class="wp-image-78" width="520" height="180"/></figure>



<p>Based on the total count (19), we know that the median (the 10th value) must be in bucket B, because bucket B holds the 10th to 13th values. Likewise, the 75th percentile (the 15th value) must be in bucket C, because bucket C holds the 14th to 19th values.</p>



<p>We cannot know the exact millisecond value of the median, but we know the median must be between 11ms and 100ms. (This matches our previous calculation, which produced 70ms.)</p>
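<p>A minimal sketch of this bucket approach follows. The bucket boundaries come from the figure; the 19 values are hypothetical (picked to reproduce the counters 9, 4, 6, and 0), and the helper names <code>record</code> and <code>bucketForRank</code> are made up for this example. It assumes values are whole milliseconds.</p>

```javascript
// Inclusive upper bounds in milliseconds. Bucket D has no upper bound.
const buckets = [
  { name: 'A', upTo: 10, count: 0 },
  { name: 'B', upTo: 100, count: 0 },
  { name: 'C', upTo: 1000, count: 0 },
  { name: 'D', upTo: Infinity, count: 0 }
];

// Increment the counter of the first bucket whose range fits the value.
// The value itself is not stored.
function record( value ) {
  buckets.find( ( bucket ) => bucket.upTo >= value ).count++;
}

// Find the bucket that holds the value at a given rank (1 = lowest).
function bucketForRank( rank ) {
  let seen = 0;
  for ( const bucket of buckets ) {
    seen += bucket.count;
    if ( seen >= rank ) {
      return bucket.name;
    }
  }
  return null;
}

// Hypothetical response times in milliseconds.
const values = [ 1, 2, 3, 4, 5, 6, 8, 80, 120, 95, 400, 560, 700, 900, 4, 9, 70, 90, 300 ];
values.forEach( record );

const total = values.length;
console.log( buckets.map( ( bucket ) => bucket.count ) );  // [ 9, 4, 6, 0 ]
console.log( bucketForRank( Math.ceil( total * 0.5 ) ) );  // B
console.log( bucketForRank( Math.ceil( total * 0.75 ) ) ); // C
```

<p>Only four counters are kept, no matter how many values are processed. That is what makes this cheap to store, at the cost of knowing only a range rather than an exact percentile.</p>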



<p>When we use exact percentiles, our goal is for that percentile to be a certain number. For example, if our 75th percentile today is 560ms, this means for 75% of users a response takes 560ms or less. Our goal could be to reduce the 75th percentile to below 500ms.</p>



<p>When using buckets, goals are defined differently. In our example, 6 out of 19 responses (32%) are above 100ms (buckets C and D), and 13 of 19 (68%) are within 100ms (buckets A and B). Our goal could be to reduce the percentage of responses above 100ms. Or the opposite, to increase the percentage of responses within 100ms.</p>
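<p>These percentages follow directly from the counters. A short sketch, using the bucket counts from the example (the <code>percent</code> helper is a name made up for illustration):</p>

```javascript
// Counters after processing the 19 example values.
const counts = { A: 9, B: 4, C: 6, D: 0 };

const total = counts.A + counts.B + counts.C + counts.D;
const within100 = counts.A + counts.B; // buckets up to 100ms
const above100 = counts.C + counts.D;  // buckets above 100ms

// Share of the whole, rounded to a whole percentage.
function percent( part, whole ) {
  return Math.round( ( part / whole ) * 100 );
}

console.log( percent( above100, total ) );  // 32
console.log( percent( within100, total ) ); // 68
```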



<h2 class="wp-block-heading" id="rise-of-mobile">Rise of mobile</h2>



<p>Traffic trends are generally moving towards mobile. In fact, April 2017 was the first month where Wikimedia mobile pageviews reached 50% of all Wikimedia pageviews. Since June 2017, mobile traffic has stayed above 50%.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="2254" height="636" src="https://timotijhof.net/wp-content/uploads/2018_measuring_figure_5_platforms.png" alt="Bar chart showing percentages of mobile and desktop pageviews for each month in 2017. They mostly swing equal at around 50%. Looking closely, we see mobile first reaches 51% in April. In May it was below 50% again. But for June and every month since then mobile has remained above 50%. The peak was in October 2017, where mobile accounted for 59% of pageviews. The last month in the graph, November 2017 shows 53% of mobile pageviews." class="wp-image-81"/></figure>



<p>Global changes like this have a big impact on our measurements. This is the kind of change that drives us to rethink how we measure performance, and (more importantly) what we monitor.</p>



<h2 class="wp-block-heading" id="further-reading">Further reading</h2>



<ul class="wp-block-list">
<li><a href="https://www.mediawiki.org/wiki/Wikimedia_Performance_Team">Wikimedia Performance Team</a> – overview of our projects, tools, and data.</li>



<li><a href="https://www.w3.org/TR/navigation-timing-2/">Navigation Timing Level 2</a>, the W3C specification.</li>



<li><a href="https://www.infoq.com/presentations/latency-response-time">How Not To Measure Latency</a>, a tech talk by Gil Tene.</li>



<li><a href="https://howdns.works/">How DNS Works</a>, a comic explaining how computers use domain names.</li>



<li><a href="https://hpbn.co/">High Performance Browser Networking</a>, by Ilya Grigorik.</li>



<li><a href="https://en.wikipedia.org/wiki/Domain_Name_System">“Domain Name System (DNS)”</a>, at Wikipedia.</li>



<li><a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">“Transmission Control Protocol (TCP)”</a>, at Wikipedia.</li>



<li><a href="https://en.wikipedia.org/wiki/HTTPS">“HTTPS”</a>, at Wikipedia.</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published on <a href="https://techblog.wikimedia.org/2018/01/09/measuring-wikipedia-page-load-times/" target="_blank" rel="noopener">techblog.wikimedia.org</a>. Also published in <a href="https://calendar.perfplanet.com/2018/measuring-wikipedia-page-load-times/" target="_blank" rel="noopener">Performance Calendar 2018</a>.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2018/measuring-wikipedia-page-load-times/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20Measuring%20Wikipedia%20page%20load%20times&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2018%2Fmeasuring-wikipedia-page-load-times%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>QUnit anti-patterns</title>
		<link>https://timotijhof.net/posts/2015/qunit-anti-patterns/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Fri, 13 Feb 2015 12:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Testing]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2015/qunit-anti-patterns</guid>

					<description><![CDATA[Today, I’d like to challenge the assert.ok and assert.not* methods. I believe they may’ve become an anti-pattern. assert.ok Using assert.ok() indicates one of two problems: The former necessitates improvement to the code being tested. The latter comes with two additional caveats: Common examples: assert.not Using assert.not*() indicates one of three problems: Common example: I’ve yet…]]></description>
										<content:encoded><![CDATA[
<p>Today, I’d like to challenge the <code>assert.ok</code> and <code>assert.not*</code> methods. I believe they may have become an anti-pattern.</p>



<span id="more-22"></span>



<h2 class="wp-block-heading" id="assertok">assert.ok</h2>



<p>Using <code>assert.ok()</code> indicates one of two problems:</p>



<ul class="wp-block-list">
<li>The software, or testing strategy, is unreliable. (Unsure what value to expect.)</li>



<li>The author is using it as a shortcut for a proper comparison.</li>
</ul>



<p>The former necessitates improvement to the code being tested. The latter comes with two additional caveats:</p>



<ol class="wp-block-list">
<li>Less debug information. (Inaccurate actual/expected diff.) Without an expected value provided, one can’t determine what’s wrong with the value.</li>



<li>Masking regressions. Even if the API being tested returns a proper boolean and <code>ok</code> is just a shortcut, the day the API breaks (e.g. returns a number, Promise, or other object) the test will not be able to catch this regression.</li>
</ol>



<p>Common examples:</p>


<pre class="wp-block-code"><span><code class="hljs language-javascript"><span class="hljs-comment">// Meh...</span>
assert.ok( result );
assert.ok( obj.fn );

<span class="hljs-comment">// Better.</span>
assert.equal( <span class="hljs-keyword">typeof</span> obj.fn, <span class="hljs-string">'function'</span> );
assert.strictEqual( result, <span class="hljs-literal">true</span> );</code></span></pre>
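<p>To illustrate the second caveat, here is a small sketch that runs without QUnit. The two helpers only mimic what the assertions check (truthiness for <code>assert.ok</code>, a <code>===</code> comparison for <code>assert.strictEqual</code>); they and the <code>isReady*</code> functions are hypothetical and exist only for this example.</p>

```javascript
// Stand-ins for the pass/fail condition of the two assertions:
// assert.ok( value ) passes for any truthy value,
// assert.strictEqual( actual, expected ) passes when actual === expected.
function passesOk( value ) {
  return Boolean( value );
}
function passesStrictEqual( actual, expected ) {
  return actual === expected;
}

// Hypothetical API that used to return a boolean...
function isReadyBefore() {
  return true;
}
// ...and, after a regression, returns a Promise instead.
function isReadyAfter() {
  return Promise.resolve( false );
}

console.log( passesOk( isReadyBefore() ), passesStrictEqual( isReadyBefore(), true ) ); // true true
console.log( passesOk( isReadyAfter() ), passesStrictEqual( isReadyAfter(), true ) );   // true false
```

<p>The truthiness check still passes after the regression, because a Promise object is always truthy. The strict comparison is the one that notices.</p>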


<h2 class="wp-block-heading" id="assertnot">assert.not</h2>



<p>Using <code>assert.not*()</code> indicates one of three problems:</p>



<ul class="wp-block-list">
<li>The software is unreliable. (Value is non-deterministic.)</li>



<li>The test uses an unreliable environment. (E.g. the input data is dynamic or variable, insufficient isolation or mocking.)</li>



<li>The author is using it as a shortcut for a proper comparison.</li>
</ul>



<p>Common example:</p>


<pre class="wp-block-code"><span><code class="hljs language-javascript"><span class="hljs-keyword">var</span> index = list.indexOf( item );

<span class="hljs-comment">// Meh...</span>
assert.notEqual( index, <span class="hljs-number">-1</span> );

<span class="hljs-comment">// Better.</span>
assert.equal( index, <span class="hljs-number">2</span> );

<span class="hljs-comment">// Even better?</span>
assert.propEqual( list, &#91;
  <span class="hljs-string">'foo'</span>,
  <span class="hljs-string">'bar'</span>,
  <span class="hljs-string">'baz'</span>,
] );</code></span></pre>


<p>I’ve yet to see the first use of these assert methods that wouldn’t be improved by writing it a different way. I admit there are limited scenarios where <a target="_blank" href="https://api.qunitjs.com/assert/notEqual" rel="noopener"><code>assert.notEqual</code></a> can’t be avoided in the short term, for example when the intent is to detect a difference between two unpredictable return values.</p>



<p>When calling a method such as <code>Math.random()</code> twice, one could use <code>notEqual</code> to assert the two return values differ. I still have my doubts about the value of such a test, though. It’ll certainly be annoying when it randomly does produce the same value twice and causes a test failure. In the interest of test coverage, my recommendation would be to instead assert that calling the method did not throw an exception, and perhaps assert the type and length of the return value, without comparing the string content.</p>
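<p>A sketch of that recommendation, runnable without QUnit. Both <code>generateId</code> (a made-up function that returns a random 8-character string) and the <code>shapeOf</code> helper are assumptions for the sake of the example.</p>

```javascript
// Hypothetical function under test: returns a random 8-character string.
function generateId() {
  return Math.random().toString( 36 ).slice( 2, 10 ).padEnd( 8, '0' );
}

// Describe the shape of a value without looking at its content.
function shapeOf( value ) {
  return { type: typeof value, length: value.length };
}

console.log( shapeOf( generateId() ) ); // { type: 'string', length: 8 }
```

<p>In a QUnit test, this could become <code>assert.propEqual( shapeOf( generateId() ), { type: 'string', length: 8 } )</code>, or two separate <code>assert.strictEqual</code> calls. Either way the expected value is fully known in advance, and the test fails if the call throws.</p>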



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published on <a href="https://codepen.io/Krinkle/post/qunit-anti-patterns" target="_blank" rel="noopener">codepen.io</a>.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2015/qunit-anti-patterns/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20QUnit%20anti-patterns&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2015%2Fqunit-anti-patterns%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>PhantomJS for CI (anno 2014)</title>
		<link>https://timotijhof.net/posts/2014/phantomjs-for-ci/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Fri, 03 Oct 2014 12:00:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Testing]]></category>
		<guid isPermaLink="false">https://timotijhof.net/posts/2014/phantomjs-for-ci</guid>

					<description><![CDATA[How did Apple create Safari, and what is PhantomJS? Safari In January 2003 Apple announced Safari, their new web browser for Mac. The Safari team had just spent 2002 building Safari atop KHTML and KJS, the KDE layout and javascript engines developed for Konqueror. The Safari team kept the codebase somewhat modular. This allowed Apple-branding…]]></description>
										<content:encoded><![CDATA[
<p>How did Apple create Safari, and what is PhantomJS?</p>



<span id="more-1"></span>



<h2 class="wp-block-heading">Safari</h2>



<p>In January 2003 Apple announced Safari, their new web browser for Mac.<sup id="fnr1" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2014/phantomjs-for-ci/#fn1" title="Jump to footnote 1">[1]</a></sup> The Safari team had just spent 2002 building Safari atop KHTML and KJS,<sup id="fnr2" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2014/phantomjs-for-ci/#fn2" title="Jump to footnote 2">[2]</a></sup> the KDE layout and JavaScript engines developed for Konqueror. The Safari team kept the codebase somewhat modular. This allowed Apple branding and other proprietary features to stay separate whilst also having a sustainable open-source project (WebKit) that is standalone and compilable into a fully functional GUI application. The Mac OS version of WebKit is composed of WebCore and JavaScriptCore – the frameworks that encapsulate the OS X ports of KHTML and KJS respectively. Apple developed the JavaScriptCore library previously for use in Sherlock.<sup id="fnr3" class="footnote"><a rel="footnote" role="doc-noteref" href="https://timotijhof.net/posts/2014/phantomjs-for-ci/#fn3" title="Jump to footnote 3">[3]</a></sup></p>



<h2 class="wp-block-heading">Chromium</h2>



<p>In 2008, Google introduced Chrome and started the open-source project Chromium. Chromium was composed of WebKit’s <strong>WebCore</strong> and the <strong>V8</strong> JavaScript engine (instead of JavaScriptCore). Google later forked WebCore into <strong>Blink</strong> in 2013, thus abandoning any upstream connection with WebKit.</p>



<p>While Chromium is a single code-base with bindings for multiple platforms, WebKit is not. Instead, WebKit is based around the concept of ports.</p>



<p>These ports are manually kept in sync. Some are maintained by third parties (i.e. not by webkit.org or Apple). Some ports are better than others. “WebKit”, as such, has also become an abstract API, rather than just a framework.</p>



<h2 class="wp-block-heading">WebKit</h2>



<p>A few popular ports:</p>



<ul class="wp-block-list"><li>Safari for Mac.</li><li>Mobile Safari for iOS.</li><li>Safari for Windows (abandoned).</li><li>QtWebKit (by Nokia; due to it being implemented atop Qt, it works on Mac/Linux/Windows).</li><li>Android browser (abandoned, uses Chromium now).</li><li>Chromium (abandoned, uses Blink now).</li><li>WebKitGTK+.</li></ul>



<p>WebKit itself doesn’t do much when it comes to network, GPU, JavaScript, or text rendering. Those are not “WebKit”. Each port binds those to something present in the OS &#8211; or another application layer. E.g. QtWebKit defers to Qt, which in turn binds to the platform.</p>



<h2 class="wp-block-heading">PhantomJS</h2>



<p>PhantomJS is a headless browser using the <strong>QtWebKit</strong> engine at its core.</p>



<p>The current release cycle of PhantomJS (1.9.x) is based on Qt 4.8.5, which bundles QtWebKit 2.2.4, which was branched off of upstream WebKit in May 2011. Due to the many layers in between, it will take a long time for PhantomJS to get anywhere near the feature-set of current Safari 8. PhantomJS by design is nothing like Safari but, if anything, it is probably like an alpha version (branched from SVN trunk) of Safari 4. Which is why, contrary to Safari 5.0, PhantomJS has only partial support for ES5.</p>



<p>Chromium has its abstraction layer at a higher level (platform independent). When run headless, it is exactly like an actual instance of Chrome on the same platform. When used in a virtual machine on a remote server, one doesn’t even need to be “headless”. We can use regular Chromium (under Xvfb). In theory the visual rendering through Xvfb and VM hypervisor could be different, however.</p>



<h2 class="wp-block-heading">Further reading</h2>



<ul class="wp-block-list"><li><a href="https://en.wikipedia.org/wiki/Konqueror">Konqueror</a> on Wikipedia</li><li><a href="https://en.wikipedia.org/wiki/Safari_(web_browser)">Safari</a> on Wikipedia</li><li><a href="https://en.wikipedia.org/wiki/WebKit">WebKit</a> on Wikipedia</li><li><a href="https://en.wikipedia.org/wiki/Sherlock_(software)">Sherlock</a> on Wikipedia</li><li><a href="https://en.wikipedia.org/wiki/V8_(JavaScript_engine)">V8</a> on Wikipedia</li><li><a href="https://en.wikipedia.org/wiki/Blink_(layout_engine)">Blink</a> on Wikipedia</li><li><a href="http://phantomjs.org/" target="_blank" rel="noopener">phantomjs.org</a></li><li><a href="https://www.paulirish.com/2013/webkit-for-developers/" target="_blank" rel="noopener">WebKit for Developers</a> by Paul Irish</li></ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p><strong>Update (September 2018)</strong>: I recently read <em><a rel="noopener" href="http://creativeselection.io/" target="_blank">Creative Selection</a></em>, which talks about the engineering choices behind Safari and iPhone, WebKit&#8217;s roots in KDE, how certain iPhone features came to be, and the role of Steve Jobs day-to-day. It is written by Ken Kocienda, an engineer who worked on both projects. The book was a quick read.</p>



<p>I previously read <em>Steve Jobs</em> by Walter Isaacson. The biography was interesting, but it didn’t cover much of Apple’s internal practices. <em>Creative Selection</em> covers this gap.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p class="has-small-font-size">Originally published on <a href="https://codepen.io/Krinkle/post/phantomjs-anno-2014" target="_blank" rel="noopener">codepen.io</a>.</p>

<hr><div class="footnotes" role="doc-endnotes">Footnotes:<ol><li id="fn1" role="doc-endnote"><a href="https://dot.kde.org/2003/01/08/apple-announces-new-safari-browser" target="_blank" rel="noopener">Apple Announces New Safari Browser</a> (2003), kde.org. <a href="#fnr1" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn2" role="doc-endnote"><a href="http://lists.kde.org/?l=kfm-devel&amp;m=104197092318639&amp;w=2" target="_blank" rel="noopener">Greetings from the Safari team</a> (2003), Don Melton (Apple), lists.kde.org. <a href="#fnr2" role="doc-backlink" title="Jump back">↩︎</a></li><li id="fn3" role="doc-endnote"><a href="https://web.archive.org/web/20070310215550/http://www.opendarwin.org/pipermail/kde-darwin/2002-June/000034.html" target="_blank" rel="noopener">JavaScriptCore: Apple&#8217;s JavaScript framework based on KJS</a> (2002), Maciej Stachowiak (Apple), opendarwin.org list. <a href="#fnr3" role="doc-backlink" title="Jump back">↩︎</a></li></ol></div><hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2014/phantomjs-for-ci/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20PhantomJS%20for%20CI%20%28anno%202014%29&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2014%2Fphantomjs-for-ci%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The word “rebuke”</title>
		<link>https://timotijhof.net/posts/2013/the-word-rebuke/</link>
		
		<dc:creator><![CDATA[Timo Tijhof]]></dc:creator>
		<pubDate>Wed, 18 Dec 2013 12:00:00 +0000</pubDate>
				<category><![CDATA[Minor]]></category>
		<category><![CDATA[English language]]></category>
		<guid isPermaLink="false">https://timotijhof.net/?p=27</guid>

					<description><![CDATA[re·buke verb express sharp disapproval or criticism of (someone) because of their behavior or actions “she had rebuked him for drinking too much“ “the judge publicly rebuked the jury“ noun an expression of sharp disapproval or criticism “he hadn’t meant it as a rebuke, but Neil flinched“ (from the&#160;Oxford English Dictionary) I ran into the…]]></description>
										<content:encoded><![CDATA[<blockquote class="wp-block-quote quote-snippet">
<p><strong>re·buke</strong></p>
<p><em>verb</em></p>
<ol>
<li>
<p>express sharp disapproval or criticism of (someone) because of their behavior or actions</p>
<p><em>“she had <strong>rebuked</strong> him for drinking too much”</em></p>
<p><em>“the judge publicly <strong>rebuked</strong> the jury”</em></p>
</li>
</ol>
<p><em>noun</em></p>
<ol>
<li>
<p>an expression of sharp disapproval or criticism</p>
<p><em>“he hadn’t meant it as a <strong>rebuke</strong>, but Neil flinched”</em></p>
</li>
</ol>
</blockquote>


<p>(from the&nbsp;<a target="_blank" href="http://www.oxforddictionaries.com/definition/english/rebuke" rel="noopener">Oxford English Dictionary</a>)</p>



<p>I ran into the word whilst watching an episode of&nbsp;<a target="_blank" href="https://en.wikipedia.org/wiki/Elementary_(TV_series)" rel="noopener">Elementary</a>.</p>



<p>The scene continued to feature more rich language.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>Holmes</em>: I’ve given further consideration to your rebuke regarding my capacity for niceness.</p>



<p><em>Watson</em>: I didn’t mean it as a rebuke. I was trying to have a conversation.</p>



<p><em>Holmes</em>: Either way, you have a point… There is unquestionably a certain social utility to being polite. To maintaining an awareness of other people’s sensitivities. To exhibiting all the traits that might commonly be grouped under the heading “nice”.</p>



<p><em>Watson</em>: I think you’ll be surprised how easy it is to earn that designation.</p>



<p><em>Holmes</em>: No. I am not a nice man. It’s important that you understand that.</p>



<p><em>[..]</em></p>



<p><em>Holmes</em>: There is not a warmer, kinder me waiting to be coaxed out into the light. I am&nbsp;acerbic. I can be cruel. It’s who I am; right to the bottom. I’m neither proud of this, nor ashamed of it. It simply is.</p>
</blockquote>



<p>Having lines like these is actually not uncommon for the Holmes character and is one of the reasons I enjoy the show so much. Short musings and rants containing rich language happen at regular intervals throughout the series’ episodes.</p>



<p>My compliments to the writers of the show for producing a showpiece for the English language. It is a pleasure to be reminded of these words and even more so to learn about new ones.</p>
<hr/><p>This post appeared on <a href="https://timotijhof.net/posts/2013/the-word-rebuke/">timotijhof.net</a>. <a target="_blank" href="mailto:hello@timotijhof.net?subject=RE%3A%20The%20word%20%E2%80%9Crebuke%E2%80%9D&body=%0A%0A%0APermalink%3A%20https%3A%2F%2Ftimotijhof.net%2Fposts%2F2013%2Fthe-word-rebuke%2F">Reply via email</a></p>]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
