Timo Tijhof

John Cleese on Creativity (Transcript)

Timo Tijhof — Sat, 14 Feb 2026 12:00:00 +0000

The below is transcribed from a 1991 talk by John Cleese titled Creativity in Management. I encourage you to watch the 30-minute recording on YouTube. The delivery is hilarious with great comedic timing that my transcript can’t begin to do justice. I edited the transcript for brevity, and added headings and links.

This speech was given by John Cleese to an international audience linked by satellite at the Grosvenor House Hotel London, 23rd January 1991.

What creativity isn’t

A couple of years ago I got very excited because a friend of mine, who runs the psychology department at Sussex University, Brian Bates, showed me some research on creativity done at Berkeley in the 70s by a brilliant psychologist called Donald MacKinnon, which seemed to confirm, in the most impressively scientific way, all the vague observations and intuitions that I’d had over the years. […]

The reason why it is futile for me to talk about creativity, is that it simply cannot be explained. It’s like Mozart’s music, or Van Gogh’s painting. It is literally inexplicable.

Freud, who analysed practically everything else, repeatedly denied that psychoanalysis could shed any light whatsoever on the mysteries of creativity. Brian Bates wrote to me recently: “Most of the best research on creativity was done in the 60s and 70s with a quite dramatic drop-off in quantity after then”, largely, I suspect, because researchers began to feel that they had reached the limits of what science could discover about it.

The only thing from the research that I could tell you about how to be creative, is the sort of childhood that you should have had, which is of limited help to you at this point of your lives.

However there is one negative thing that I can say, and it’s negative because it’s easier to say what creativity isn’t. A bit like the sculptor who, when asked how he had sculpted a very fine elephant, explained that he’d taken a big block of marble and then knocked away all the bits that didn’t look like an elephant. Now here’s the negative thing:

Creativity is not a talent. It is not a talent. It is a way of operating. […]

When I say “a way of operating”, what I mean is this: Creativity is not an ability that you either have or do not have. It is […] absolutely unrelated to IQ (provided you’re intelligent above a certain minimal level, that is).

MacKinnon showed in investigating scientists, architects, engineers, and writers, that those regarded by their peers as “most creative” were in no way whatsoever different in IQ from their less creative colleagues.

So in what way were they different?

Open and closed mode

MacKinnon showed that the most creative had simply acquired a facility for getting themselves into a particular mood, a way of operating, which allowed their natural creativity to function. MacKinnon described this particular facility as an ability to play. He described the most creative, when in this mood, as being childlike. They were able to play with ideas to explore them, not for any immediate practical purpose, but just for enjoyment. Play for its own sake.

I’m working at the moment with Dr. Robin Skynner on a successor to our psychiatry book Families and How to Survive Them. We’re comparing the ways in which psychologically healthy families function, and the ways in which the most successful corporations and organisations function. We became fascinated by the fact that we can usefully describe the way in which people function at work in terms of two modes: open and closed. Creativity is not possible in the closed mode. […]

By the closed mode I mean the mode that we are in most of the time when we’re at work. We have inside us a feeling that there’s lots to be done, and we have to get on with it if we’re gonna get through it all. It’s an active, probably slightly anxious, mode. Although the anxiety can be exciting and pleasurable. It’s a mode in which we’re probably a little impatient, if only with ourselves. It has a little tension in it, not much humour, it’s a mode in which we’re very purposeful, and it’s a mode in which we can get very stressed and even a bit manic, but not creative.

By contrast the open mode is a relaxed, expansive, less purposeful, mode in which we’re probably more contemplative, more inclined to humour (which always accompanies a wider perspective), and consequently more playful. It’s a mood in which curiosity for its own sake can operate, because we’re not under pressure to get a specific thing done quickly. We can play. And that is what allows natural creativity to surface. Let me give you an example of what I mean.

Discovery of penicillin

When Alexander Fleming had the thought that led to the discovery of penicillin, he must have been in the open mode. The previous day, he’d arranged a number of dishes so that culture would grow upon them. On the day in question, he glanced at the dishes, and he discovered that on one of them, no culture had appeared. If he’d been in the closed mode, he would have been so focused upon his need for dishes with cultures grown upon them, that when he saw that one dish was of no use to him for that purpose, he would quite simply have thrown it away.

Thank goodness, he was in the open mode, so he became curious about why the culture had not grown on this particular dish. That curiosity, as the world knows, led him […] to penicillin.

In the closed mode, an uncultured dish is an irrelevance. In the open mode, it’s a clue. One more example:

Hitchcock

One of Alfred Hitchcock’s regular co-writers has described working with him on screenplays. He says:

When we came up against a block, and our discussions became very heated and intense, Hitchcock would suddenly stop and tell a story that had nothing to do with the work at hand. At first, I was almost outraged.

I discovered that he did this intentionally. He mistrusted working under pressure. He would say “We’re pressing, we’re pressing, we’re working too hard. Relax, it will come.” And, of course it finally always did.

Implement in the closed mode

Let me make one thing quite clear. We need to be in the open mode when we’re pondering a problem. But, once we come up with a solution, we must then switch to the closed mode to implement it. Once we’ve made a decision we are efficient only if we go through with it decisively, undistracted by doubts about its correctness. For example, if you decide to leap a ravine, the moment just before takeoff is a bad time to start reviewing alternative strategies!

Review in the open mode

We should once again switch back to the open mode to review the feedback arising from our action, in order to decide whether the course that we have taken is successful […], or whether we should create an alternative plan to correct any error we’ve perceived, and then back into the closed mode again to implement that next stage. And so on.

To be at our most efficient, we need to be able to switch backwards and forwards between the two modes.

But here’s the problem: We too often get stuck in the closed mode. Under the pressures which are all too familiar to us, we tend to maintain tunnel vision at times when we really need to step back and contemplate the wider view.

This is particularly true of politicians. The main complaint about them, from their non-political colleagues, is that they become so addicted to the adrenaline that they get from reacting to events on an hour-by-hour basis, that they almost completely lose the desire or the ability to ponder problems in the open mode.

So, as I say: Creativity is not possible in the closed mode. […]

Conditions for the open mode

There are certain conditions which make it more likely that you’ll get into the open mode, and that something creative will occur. More likely. You can’t guarantee anything will occur. You might sit around for hours, as I did last Tuesday, and nothing, zilch, bupkis, not a sausage.

I can at least tell you how to get yourselves into the open mode. You need five things:

Space.

Time.

Time.

Confidence.

Humour.

[…]

Factor 1: Space

You can’t become playful, and therefore creative, if you’re under your usual pressures. To cope with them, you’ve got to be in the closed mode, right? You have to create some space for yourself away from those demands, and that means sealing yourself off.

You must make a quiet space for yourself, where you will be undisturbed.

Next: Time.

Factor 2: Time

It’s not enough to create space. You have to create your space for a specific period of time.

You have to know that your space will last until, exactly, say, 3:30, and that at that moment your normal life will start again.

It’s only by having a specific moment when your space starts, and an equally specific moment when your space stops, that you can seal yourself off from the everyday closed mode in which we all habitually operate.

Johan Huizinga

I’d never realised how vital this was, until I read a historical study of play, by a Dutch historian called Johan Huizinga. In it, he says:

Play is distinct from ordinary life. Both as to locality, and duration. This is its main characteristic. Its secludedness. Its limitedness.

Play begins and then, at a certain moment, it is over. Otherwise, it’s not play.

Oasis of Quiet — Not so fast

Combining the first two factors, we create an Oasis of Quiet, for ourselves, by setting boundaries of space, and of time. Now, creativity can happen, because play is possible, when we are separate from everyday life.

So, you’ve arranged to take no calls, you’ve closed your door, you’ve sat down somewhere comfortable. You take a couple of deep breaths and, if you’re anything like me, after you’ve pondered some problem that you want to turn into an opportunity for about 90 seconds, you find yourself thinking: Oh I forgot I’ve got to call Jim! I must tell Tina that I need the report on Wednesday and not Thursday, which means I must move my lunch with Joe, and […] I must pop out this afternoon to get Will’s birthday present, and those plants need watering, and none of my pencils are sharpened and… Right, I’ve got too much to do, so I’m going to start by sorting out my paper clips, then I shall make 27 phone calls, and I’ll do some thinking tomorrow, when I’ve got everything out of the way.

Because, it’s easier to do trivial things that are urgent, than it is to do important things that are not urgent, like thinking.

It’s also easier to do little things we know we can do, than to start on big things that we’re not so sure about.

So, when I say create an Oasis of Quiet, know that when you have, your mind will pretty soon start racing again, but you’re not going to take that very seriously. You just sit there, for a bit, tolerating the racing and the slight anxiety that comes with that, and after a time your mind will quieten down again.

Because it takes some time for your mind to quieten down, it’s absolutely no use arranging a space-time oasis lasting 30 minutes. Just as you’re getting quieter, and getting into the open mode, you’ll have to stop, and that is very deeply frustrating. You must allow yourself a good chunk of time. I’d suggest about an hour and a half. Then, after you’ve gotten to the open mode, you’ll have about an hour left for something to happen (if you’re lucky).

But, don’t put a whole morning aside. My experience is, after about an hour and a half, you need a break. So it’s far better to do an hour and a half now, and then an hour and a half next Thursday, and maybe an hour and a half a week after that, than to fix one four-hour session “now”.

There’s another reason, and that’s factor number three: Time.

Yes, I know we’ve just done Time, but that was half of creating our Oasis. Now, I’m going to tell you about how to use the Oasis you’ve created. Why do you still need time?

Factor 3: Time (really)

Let me tell you a story. I was always intrigued that one of my Monty Python colleagues, who seemed to me to be more talented than I was, never produced scripts as original as mine. And I watched for some time, and then I began to see why.

If he was faced with a problem, and fairly soon saw a solution, he was inclined to take it. Even though, I think he knew, the solution was not very original. Whereas if I was in the same situation, although I was sorely tempted to take the easy way out and finish by five o’clock, I just couldn’t. I’d sit there, with the problem, for another hour and a quarter, and by sticking to it, would in the end almost always come up with something more original. It was that simple. My work was more creative than his, simply because I was prepared to stick with the problem longer.

So imagine my excitement when I found that this was exactly what MacKinnon found in his research! He discovered that the “most creative” professionals always played with the problem for much longer, before they tried to resolve it because: they were prepared to tolerate that slight discomfort and anxiety, that we all experience when we haven’t solved a problem. You know what I mean?

If we have a problem and we need to solve it, until we do, we feel it inside us: a kind of internal agitation or tension or uncertainty that makes us just plain uncomfortable. And we want to get rid of that discomfort. So, in order to do so, we take a decision; not because we’re sure it’s the best decision, but because taking it will make us feel better.

Well, the most creative people have learned to tolerate that discomfort for much longer. So, just because they put in more pondering time, their solutions are more creative.

The people I find it hardest to be creative with, are people who need (all the time) to project an image of themselves as decisive, and, who feel that to create this image, they need to decide everything very quickly, and with a great show of confidence. This behaviour, I suggest sincerely, is the most effective way of strangling creativity at birth.

Please note, I’m not arguing against real decisiveness. I’m 100% in favour of taking a decision when it has to be taken, and then sticking to it while it’s being implemented. What I’m suggesting to you, is that before you take a decision, you should always ask yourself the question: When does this decision have to be taken? And having answered that, you defer the decision until then, in order to give yourself maximum pondering time, which will lead you to the most creative solution.

And if, while you’re pondering, somebody accuses you of indecision say: Look babycakes, I don’t have to decide till Tuesday and I’m not chickening out of my creative discomfort by taking a snap decision before then; that’s too easy!

To summarise, the third factor that facilitates creativity is Time: giving your mind as long as possible to come up with something original.

Factor 4: Confidence

The next factor, number four, is Confidence.

When you’re in your space-time Oasis (getting into the open mode) nothing will stop you being creative so effectively as the fear of making a mistake. If you think about play, you’ll see why.

To play, is to experiment “what happens if I do this”, “what would happen if we do that”. What is the very essence of playfulness, is an openness to anything that may happen; a feeling that whatever happens, it’s okay!

You cannot be playful if you’re frightened that moving in some direction will be “wrong”, something you “shouldn’t have done”. You’re either free to play, or you’re not.

As Alan Watts puts it: “You can’t be spontaneous within reason.”

You’ve got to risk saying things that are silly, and illogical, and wrong. The best way to get the confidence to do that, is to know that, while you’re being creative, nothing is wrong; there’s no such thing as a mistake, and any drivel may lead to the breakthrough.

Factor 5: Humour

Now the last factor, the fifth, Humour.

I happen to think the main evolutionary significance of humour, is that it gets us from the closed mode to the open mode quicker than anything else.

I think we all know that laughter brings relaxation, and that humour makes us playful. Yet, how many times have important discussions been held, where really original and creative ideas were desperately needed to solve important problems, but where humour was taboo, because the subject being discussed was “so serious”? This attitude seems to me to stem from a very basic misunderstanding of the difference between serious and solemn.

Serious does not mean solemn

A group of us could be sitting around after dinner, discussing matters that were extremely serious (like the education of our children, our marriages, the meaning of life, … not talking about the film) and we could be laughing, and that would not make what we were discussing one bit less serious.

Solemnity, on the other hand, I don’t know what it’s for. What is the point of it?

The two most beautiful memorial services that I’ve ever attended, both had a lot of humour. It freed us all, and made the services inspiring and cathartic. But solemnity? It serves pomposity. The self-important [people] always know, at some level of their consciousness, that their egotism is going to be punctured by humour. That’s why they see it as a threat, and so dishonestly pretend that their deficiency makes their views more substantial, when it only makes them feel bigger.

Humour is an essential part of spontaneity; an essential part of playfulness; an essential part of the creativity that we need to solve problems, no matter how serious they may be.

When you set up a space-time Oasis, giggle all you want!

And there are the five factors which you can arrange to make your lives more creative:

Space,

Time,

Time,

Confidence,

and Lord Jeffrey Archer.

Practicing the open mode

Pondering

Now you know how to get into the open mode, the only other requirement is that you keep your mind gently round the subject you ponder. You’ll daydream, of course, but you just keep bringing your mind back, like with meditation.

The extraordinary thing about creativity is: if you just keep your mind resting against the subject in a friendly but persistent way, sooner or later you will get a reward from your unconscious. Probably in the shower later, or at breakfast the next morning, but suddenly you are rewarded, out of the blue a new thought mysteriously appears. If you’ve put in the pondering time first.

Play requires trust

I think it’s easier to be creative if you’ve got other people to play with. I always find that if two or more of us throw ideas backwards and forwards, I get to more interesting and original places than I could ever have got to on my own.

But, there is a danger, a real danger: If there’s one person around you who makes you feel defensive, you lose the confidence to play, and it’s goodbye creativity. Always make sure your play-friends are people that you like and trust. Never say anything to squash them, either. Never say “No”, or “Wrong”, or “I don’t like that”. Always be positive, and build on what’s been said: “Would it be even better if …”, “I don’t quite understand that, can you just explain it again?”, “Go on!”, “What if …?”, “Let’s pretend …”.

Try to establish as free an atmosphere as possible.

Japanese meetings

Sometimes I wonder, if the success of the Japanese isn’t partly due to their instinctive understanding of how to use groups creatively. You know, Westerners are often amazed at the unstructured nature of Japanese meetings.

But maybe it’s that very lack of structure, that absence of time pressure, that frees them to solve problems so creatively. And how clever of the Japanese, sometimes to plan that unstructuredness by, for example, insisting that the first people to give their views are the most junior. So that they can speak freely, without the possibility of contradicting what’s already been said by somebody more important.

Connect two ideas in a new way

The very last thing that I can say about creativity is this: It’s like humour. In a joke, the laugh comes at a moment when you connect two different frameworks of reference in a new way.

For example there’s the old story about a woman, doing a survey into sexual attitudes, who stops an airline pilot and asks him when he last had sexual intercourse. He replies “1958”. Now, knowing airline pilots, the researcher is surprised and queries this. “Well”, says the pilot, “it’s only 21:10 now”.

We laugh at the moment of contact between two frameworks of reference: the way we express what year it is, and the 24-hour clock.

Having an idea, a new idea, is exactly the same thing. It’s connecting two separate ideas in a way that generates new meaning. Now, connecting different ideas isn’t difficult; you can connect cheese with motorcycles, or moral courage with light green, or bananas with international cooperation. You can get any computer to make a billion random connections for you, but these new connections or juxtapositions are significant only if they generate new meaning.

As you play, you can deliberately try inventing these random juxtapositions, and use your intuition to tell you whether any of them seem to have significance for you. That’s the bit the computer can’t do. It can produce millions of new connections, but it can’t tell which one of them smells interesting. Of course, you’ll produce some juxtapositions which are absolutely ridiculous. Absurd. Good for you!

Edward de Bono, who invented the notion of lateral thinking, specifically suggests in his book Po: Beyond Yes and No, that you can try loosening up your assumptions by playing with deliberately crazy connections. He calls such absurd ideas “intermediate impossibles”. He points out that the use of an intermediate impossible, is completely contrary to ordinary logical thinking, in which you have to be right at each stage. It doesn’t matter if the intermediate impossible is right or absurd, it can nevertheless be used as a stepping stone to another idea that is right. Another example of how when you’re playing, nothing is wrong.

If you really don’t know how to start, or if you’ve got stuck, start generating random connections and allow your intuition to tell you if one might lead somewhere interesting.

That really is all I can tell you, that won’t help you, to be creative. Everything.

How to kill creativity

[…] The important part. And that is: How to stop your subordinates becoming creative — which is the real threat.

Believe me no one appreciates better than I do what trouble creative people are, and how they stop decisive hard-nosed bastards like us from running businesses efficiently. We encourage someone to be creative, the next thing is they’re rocking the boat, coming up with ideas, and asking us questions.

If we don’t nip this kind of thing in the bud, we’ll have to start justifying our decisions by reasoned argument. And sharing information, the concealment of which gives us considerable advantages in our power struggle.

So, here’s how to stamp out creativity in the rest of the organisation, and get a bit of respect going.

Allow no humour

One: Allow subordinates no humour.

It threatens your self-importance, especially your omniscience. Treat all humour as frivolous or subversive. Because subversive is, of course, what humour will be in your setup, as it’s the only way that people can express their opposition, since if they express it openly you’re down on them like a ton of bricks.

So, let’s get this clear: Blame humour for the resistance that your way of working creates. Then, you don’t have to blame your way of working. This is important, and I mean that solemnly: your dignity is no laughing matter.

Undermine confidence

Second: Keeping ourselves feeling irreplaceable, involves cutting everybody else down to size.

Don’t miss an opportunity to undermine your employees’ confidence. A perfect opportunity comes when you’re reviewing work that they’ve done: Use your authority to zero in immediately on all the things you can find wrong.

Never, never, balance the negatives with positives. Only criticise, just as your school teachers did.

Always remember: Praise makes people uppity!

Demand urgency

Third: Demand that people should always be actively doing things.

If you catch anybody pondering, accuse them of laziness and/or indecision. This is to starve employees of thinking time, because that leads to creativity, and insurrection.

Demand urgency at all times. Use lots of fighting talk and war analogies. Establish a permanent atmosphere of stress, of breathless anxiety, and crisis.

In a phrase: Keep that mode closed!

Now, in this way, we no-nonsense types can be sure that the tiny, tiny, microscopic, quantity of creativity in our organisation will all be ours!

But, let your vigilance slip for one moment, and you could find yourself surrounded by happy, enthusiastic, and creative people whom you might never be able completely to control, ever again.

So be careful! Thank you, and good night.

This post appeared on timotijhof.net. Reply via email

📎 Unifying Wikipedia mobile and desktop domains

Timo Tijhof — Mon, 24 Nov 2025 14:30:00 +0000

Until now, when you visited a wiki (like en.wikipedia.org), the server responded in one of two ways: a desktop page, or a redirect to the equivalent mobile URL (like en.m.wikipedia.org). This mobile URL in turn served the mobile version of the page.

All wikis now serve mobile page views on the canonical domain, instead of via a redirect.

The change improved mobile response time by 20% worldwide, un-broke Commons SEO, and fixed a long-standing UX issue with opening shared links on desktop. Read more about this on the Wikimedia Blog:

→ techblog.wikimedia.org

YouTube in a feed reader is… better?

Timo Tijhof — Sat, 17 May 2025 13:00:00 +0000

Two months ago, I deleted my YouTube subscriptions. I now follow YouTube channels in my feed reader instead (I use the NetNewsWire app). How does that work? Is it better?

How to follow a channel

Copy link to the YouTube channel.
Paste into your feed reader.
That’s it!

On desktop, or on the mobile site, copy from the address bar when on any channel page, or from the share sheet, or copy a link to any channel in the search results (via right-click or long-press).

In the YouTube mobile app you can get the link via the “Copy link” button. Today, that sits in the unlabeled three-dotted “Share” menu.

Share from YouTube app.

Copy the address from a browser tab.

YouTube search result.

Reader experience

How does the feed reader experience compare to YouTube’s own “Subscriptions” page?

I used to triage the YouTube Subscriptions page by either clicking “Hide” on videos I’m not interested in, via the three-dot menu on YouTube, or by adding videos to a “Watch Later” playlist. This regularly breaks and causes discomfort in a number of ways. By using a feed reader, we get:

Fast and efficient triage. I now spend less time “managing” my YouTube stuff. I quickly swipe past videos I’m not interested in, each automatically marked as read. Videos to watch later, I star. Or, I watch ’em right there with fullscreen and picture-in-picture support! (Works even without the YouTube app!) If I want to do something with the video on YouTube, it’s one tap on the post title (or the big “Watch on YouTube” button), and e.g. add to any playlist, or stream to a Chromecast or Smart TV.

No sense of urgency. I am happy to no longer feel compelled to regularly open the YouTube app “just in case”, and am no longer urged to triage new videos from the YouTube Subscriptions page before they disappear. (YouTube deletes stuff there after a few weeks.) I can now trust that new uploads are reliably delivered, and never lost.

Reclaimed sense of agency. Native apps tend to make it hard to let you finish a thought when you open them, by presenting you with options or otherwise distracting you. Now, I only end up in the app via a specific video link from the feed reader. This means I have decided what to do, and the technology knows my intent, so the app opens and goes straight to that one video. There is no “Home” feed or “Shorts” page to navigate past. (In case you’re interested, I describe further down how to disable “Home” and “Shorts” in the YouTube mobile app more generally.)

Behind the scenes

This is all possible because YouTube implements two open standards: it provides feeds in the RSS format, and a discovery link that lets you follow the channel from its web page (without needing to know about or find the URL to an RSS file).
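As a rough illustration of what the discovery step involves, here is a minimal sketch of how a feed reader might find a feed from a pasted page URL: it scans the page's HTML for a link element marked as an alternate representation with a feed media type. The sample page, the example.org URL, and the names FeedLinkFinder and discover_feeds are made up for this sketch; real feed readers handle more cases (relative URLs, several feeds per page, other media types).

```python
from html.parser import HTMLParser
from typing import List

# Media types commonly used to advertise a feed in a page's <head>.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that point to a feed."""

    def __init__(self) -> None:
        super().__init__()
        self.feed_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # Attribute values can be None when an attribute has no value.
        rel_values = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel_values and media_type in FEED_TYPES and href:
            self.feed_urls.append(href)


def discover_feeds(html: str) -> List[str]:
    """Return the feed URLs advertised by an HTML page, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feed_urls


# Hypothetical page, standing in for what a channel page might contain.
SAMPLE_PAGE = """
<html><head>
<title>Example channel</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="RSS"
      href="https://example.org/feeds/videos.xml?channel_id=EXAMPLE">
</head><body></body></html>
"""

print(discover_feeds(SAMPLE_PAGE))
```

The stylesheet link is skipped because it is neither marked as an alternate nor given a feed media type, so only the feed URL is returned.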

Pet peeves of the app

Until recently, the main way I used YouTube (both via its website on desktop, and through its mobile app) was through the “Subscriptions” page.

“Home” page

What a delight!

I’ve always disabled watch history on my YouTube account. As of 2023, YouTube no longer offers non-personalised recommendations to logged-in users through the Home page. That means my YouTube “Home” page is now a clean landing page with nothing but a welcoming search bar.

It took YouTube ten years to decide this. I wonder if they thought the semi-personal recommendations were not useful (they seemed fine to me?), or whether YouTube is simply becoming more honest and bold in pushing their preferred economic transaction (use the platform in exchange for your consent to store and analyze your watch history, even if paying for Premium. If you disable watch history, they intentionally try to make it worse?). I don’t miss it, but I didn’t mind it either.

How to disable YouTube Shorts, for real!

Around the same time in 2023, YouTube decided to no longer let logged-in users access the endless Shorts feature via the YouTube app, unless you enable watch history. That’s been an absolute blessing. I miss nothing there.

Except perhaps the transparency. I would sometimes study what it serves to other people. Note that the endless Shorts feed is still available via the website when logged-out, so the generic version of this feature remains available there for anthropological research.

Perennial breaking of “Hide”

The “Hide” option on the subs page lets you maintain a list of videos from channels you follow. This UI feature on YouTube is buggy. It breaks all the time, and Google takes months to prioritize fixing it. I remember when YouTube Shorts was introduced and force-fed throughout the platform, the “Hide” button for Shorts on the subs page did nothing. Google probably didn’t intentionally launch Shorts with a broken “Hide” button. But, the lack of test coverage and lack of bug priority are a direct consequence of internal success metrics at YouTube — directing engineering teams toward what is valued and rewarded by management, and away from what is not.

Unreliable delivery

YouTube’s subscription system is famously unreliable. It is a decade-old meme at this point. Their system might report some “9s” after 99.9% internally, but it is expected that on a service used by millions, this bug affects everyone. People I talk to are affected multiple times a year. And, it doesn’t self-correct! Compare this to texting or emailing: When have you not received a message addressed to you? I don’t mean arrive late or miss the notification, but never arrive to your inbox? I suspect YouTube implements their subscription system such that new videos are individually added to a separate queue for each subscriber. And, if the stars don’t align for all millions of those one-shot attempts, there is no retry, and no on-demand detection or reconstruction. This is good enough for an algorithmic feed, but not for a personal subscription system.
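To put rough numbers on that suspicion, here is a back-of-the-envelope sketch. The 99.9% success rate is the figure mentioned above; the subscriber count, the number of followed channels, and the upload frequency are made-up inputs for illustration, not measurements of YouTube's system. It also assumes every delivery attempt fails independently and is never retried.

```python
def expected_misses(deliveries: int, success_rate: float) -> float:
    """Average number of deliveries lost when each attempt is one-shot."""
    return deliveries * (1.0 - success_rate)


def chance_of_any_miss(deliveries: int, success_rate: float) -> float:
    """Probability that at least one delivery is lost (independent attempts)."""
    return 1.0 - success_rate ** deliveries


SUCCESS_RATE = 0.999  # "99.9%" from the text; the real figure is unknown

# Hypothetical channel with one million subscribers: misses per upload.
per_upload = expected_misses(1_000_000, SUCCESS_RATE)

# Hypothetical viewer: follows 50 channels, each uploading once a week.
yearly_deliveries = 50 * 52
per_viewer_per_year = expected_misses(yearly_deliveries, SUCCESS_RATE)
any_miss = chance_of_any_miss(yearly_deliveries, SUCCESS_RATE)

print(f"missed deliveries per upload: about {per_upload:.0f}")
print(f"missed videos per viewer per year: about {per_viewer_per_year:.1f}")
print(f"chance a viewer misses at least one video in a year: {any_miss:.0%}")
```

Under these assumed inputs, a single upload goes missing for about a thousand subscribers, and a typical viewer loses a couple of videos a year, which is in line with people being affected multiple times a year.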

YouTube is not in the business of delivering what you expect or ask for (unlike Netflix, Apple TV, or linear television). It is in the business of eyeball retention, by serving up whatever is “good enough” to keep users in the app. Step one: Minimize your use of the app.

Update (18 June 2025): YouTube’s original RSS feeds contain only a linked title. Many feed reader services (like Feedbin, NewsBlur, FreshRSS, Tiny Tiny RSS, and Inoreader) detect YouTube feeds and enhance them by appending a video iframe and description text. I use NetNewsWire with Feedbin as sync service, which yields the screenshots above. While some feed reader apps do the same locally, NetNewsWire doesn’t yet (Feature request). It works for me because I combine it with Feedbin. Without that, the feed entries only have a title linking to YouTube.

This post appeared on timotijhof.net. Reply via email

Lockfiles for apps, not packages (still)

Timo Tijhof — Thu, 12 Sep 2024 23:00:00 +0000

TL;DR: My updated take is Lockfiles for Node.js apps, not for other projects.

When you run npm install, after you add or change a dependency in package.json, npm finds and selects the latest compatible version, downloads it, and replaces your package-lock.json file to describe what it found.

The npm install command does not consider lockfiles from upstream packages you depend on. This is not a bug. It’s by design. The npm publish command explicitly omits lockfiles from any package.^[1]

This and other factors led Sindre Sorhus (@sindresorhus), author of some of the most well-known and popular packages on npm, to adopt this policy in 2017:

Lockfiles for apps, but not for packages.

This was in response to npm enabling package-lock.json in the npm 5.0 release.

Lockfiles are useful

Over the past decade, I found lockfiles to really shine and be “worth it” when:

You maintain a Node.js-based application that you deploy as a finished product. Or;
You maintain a command-line application that developers should install globally on their workstation, via npm install -g.

When developing a Node.js-based service, you can commit a package-lock.json file alongside it. Combine this with a production deployment that runs npm ci (instead of npm install), and you can safely deploy changes (especially rollbacks, time-sensitive reverts after a faulty deployment) to your service without untimely updates to dependencies piggybacking as part of your deployment. There are other and better ways to accomplish this, but lockfiles are a decent start. In this case, I’d also run npm shrinkwrap, which renames the lockfile to npm-shrinkwrap.json. That clearly communicates that this lockfile is tied to your application’s deployment. But, any lockfile will do for this use case.

When installing a package globally, e.g. npm install -g fresnel, npm can consider an upstream lockfile. Such an upstream package must supply a shrinkwrap for npm to consider it. And, npm can only utilize it when installing the package standalone, i.e. globally. When developing an end-user application that you expect developers to install via npm install -g, by all means use a lockfile. Any lockfile that isn’t “shrinkwrapped” won’t be published by npm as part of your package, and thus cannot benefit installations.

Global dependencies

Back in the early 2010s, it was common to find projects that couldn’t locally pass linters and tests, because they assumed a different version of JSHint or ESLint than the one I had installed for another project I contribute to. These kinds of problems tormented many frontend developers, when they first dabbled in CLI and server-side scripting. They would have their project rely on globally installed tools and, invariably, on a specific (yet undocumented) version.

Over the past decade, the Node.js ecosystem has slowly learned its lesson. Packages now generally take care of their own dev tooling. In package.json, each package declares the relevant dev dependencies. We use "scripts" entries to execute commands like eslint, qunit, or grunt. This is especially convenient given that the commands of any dependency can be used directly in "scripts". You need not specify the path to node_modules or call npx here.^[2]

Benefits and costs

Most repositories containing a package.json file are either:

packages published to npm, for use as dependency in another project, or
projects that use Node.js tooling during development only — such as PHP, Ruby, Python, or C++ projects that may use tools like ESLint and QUnit for frontend testing. This includes Composer packages, WordPress plugins, and MediaWiki extensions.

Note that neither of these falls under the categories outlined earlier (Node.js services, and Node.js global tools), and thus neither has any use for a lockfile. However, as maintainer, a lockfile costs you busywork and support tickets, and the sunk cost pulls you further into justifying other equally fruitless busywork.

In Dutch we have the idiom “dweilen met de kraan open”, to mop while the tap is running. This perfectly captures the idea of a boondoggle and busy work more generally. (Image via Wikimedia Commons)

Security updates

Okay, so you’re working on a project or package where you ostensibly don’t need a package-lock.json file. Can this impact security?

For packages, we’ve already established that the lockfile can’t benefit your users. Hence, it does not delay or provide any protection from problematic updates. When they install your package, npm selects the latest compatible version of your dependencies. To pin a dependency, you have to pin it in package.json. This is best paired with a general reduction in risk by reducing your dependencies. Either way, a lockfile cannot help you.

Okay, what about you? Does it help you as maintainer?

For maintainers and contributors to your project, the first install downloads dependencies over the network, either way. Subsequent installs resolve versions against the online registry, then utilize the local npm cache, either way. Lockfiles accomplish nothing but a constant stream of patches (and conflicts) to said lockfile, to keep it identical to… how npm install leaves it. Also, notice what just happened. Yes, when you have a lockfile and run npm install, it changes. That’s because npm isn’t required to follow it. You could locally run npm ci, which does. However, assuming you semi-automatically update the lockfile regularly, what’s the difference? Have you ever not merged a patch that updates a lockfile to match npm install? Any issue captured by that would still be experienced by people using npm install, which is most people.

Pinning dependencies

Perhaps you have scars from a badly behaved dependency that broke compatibility in a semver-minor release. I know I do. It’s rare, but it happens. Lockfiles are an ineffective approach to pinning dependencies, though, as they aren’t applied in most cases, and get overwritten the very next time anyone runs npm install.

A more effective solution, even if you do utilize a lockfile, is to pin dependencies in package.json first.

I like to use the “overrides” key, to further separate these from my own dependencies.
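As a rough illustration of what pinning looks like, the Python sketch below rewrites a package.json-like structure so that caret and tilde ranges become exact versions, and pins one indirect dependency under npm’s “overrides” key. The package names and version numbers are placeholders for illustration, and the pin() helper is hypothetical: it only understands simple “^x.y.z” and “~x.y.z” specs.

```python
import json

# Hypothetical package.json contents; names and versions are for illustration only.
package = {
    "devDependencies": {
        "eslint": "^8.57.0",
        "qunit": "~2.20.0",
    },
}


def pin(spec: str) -> str:
    """Turn a caret or tilde range such as "^1.2.3" into the exact version "1.2.3"."""
    return spec.lstrip("^~")


# Pin direct dependencies to exact versions.
package["devDependencies"] = {
    name: pin(spec) for name, spec in package["devDependencies"].items()
}
# Pin an indirect dependency separately, under the "overrides" key.
package["overrides"] = {"micromatch": "4.0.5"}

print(json.dumps(package, indent=2))
```

Unlike a lockfile, both kinds of pin live in package.json, so they survive the next npm install.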

npm audit

npm audit is great, mostly, and works regardless of whether you commit a lockfile.

Dependency update notifications

Perhaps you use GitHub Dependabot, or Wikimedia LibUp. Whether for security, or for other reasons, it’s useful to learn about available software updates, right? Yes! And, the great thing is, these work even better on package.json — without lockfile.

GitHub scans for CVEs in indirect dependencies. It scans package.json too, and knows about affected packages and their downstream dependants. By not checking in your lockfile, it will inform you if, and only if, a change to package.json is needed. In most cases, a CVE or other bug is fixed in a patch release, and your package.json (or the one of the intermediary package) has a caret or tilde version that expands automatically to the newer version. By definition, if Dependabot only changed package-lock.json, then it didn’t need to be done^[3]. Whether you change the lockfile or not, anyone installing your project was already getting the update. The lockfile is ignored by npm install, and isn’t part of your package. The lockfile merely describes what npm install last did.

Suppose your project uses eslint and @typescript-eslint/parser, which has an indirect dependency on micromatch. Then, a CVE emerges. The intermediary package uses a tilde or caret version, and the patch release is compatible and in-range. With a lockfile, you’d get notified and “have to” merge a patch to update your lockfile. Without a lockfile, this is a non-event as npm install was already installing said update. Okay, that one was easy.

Suppose the intermediary dependency pinned micromatch to an exact version (or maybe the fix was outside its semver range). To get this update, you’ll need to upgrade @typescript-eslint/parser. And you can, because GitHub Dependabot scans your package.json, notifying you of package versions you rely on that have insecure dependencies. By removing the lockfile, it now only notifies you when your own dependencies are affected and/or when you have to use a newer version of your dependencies to obtain the update.
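The difference between these two scenarios comes down to whether the fixed release falls inside the declared version range. Below is a simplified Python sketch of that check. It is not npm’s actual resolver: it only handles exact, tilde, and caret specs with full x.y.z versions, and the version numbers in the example are placeholders.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split "x.y.z" into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def satisfies(version: str, spec: str) -> bool:
    """Simplified range check for exact, tilde (~) and caret (^) specs."""
    v = parse(version)
    if spec[0] not in "^~":
        # Exact pin: only that one version matches.
        return v == parse(spec)
    base = parse(spec[1:])
    if spec[0] == "~" or (base[0] == 0 and base[1] > 0):
        # ~x.y.z and ^0.y.z allow patch updates only.
        upper = (base[0], base[1] + 1, 0)
    elif base[0] > 0:
        # ^x.y.z allows minor and patch updates.
        upper = (base[0] + 1, 0, 0)
    else:
        # ^0.0.z allows only that exact patch version.
        upper = (0, 0, base[2] + 1)
    return base <= v < upper


# Hypothetical fix released as 4.0.8.
print(satisfies("4.0.8", "^4.0.2"))  # caret range: npm install already picks up the fix
print(satisfies("4.0.8", "4.0.2"))   # exact pin: the intermediary package must update first
```

In the first case a lockfile update changes nothing for anyone who installs your project. In the second case the only real fix is a change to a package.json, either yours or the intermediary’s.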

Adding a lockfile in this scenario only serves to invite noise and churn over already-solved issues. In the event of malicious activity and compromised packages, the company behind npmjs.com (Microsoft/GitHub) deletes those releases from the registry. This isn’t what npm audit or lockfiles are for.

We all care about security. I care about security. But, be wary of performative security, which can cost you valuable code review time, CI resources, and support tickets (from users who mistakenly think you must update your lockfile to help them, when actually they need to update their own).

Except when deploying Node.js apps, a lockfile brings you nothing but lost opportunities.

How we balance security and openness at Wikimedia

Timo Tijhof — Thu, 19 Oct 2023 20:00:00 +0000

How does an open philosophy jibe with best practices in performance and security? In short, we’re selective in our dependencies and audit our own upstream sources. Progressive enhancement not only makes for a fast and accessible site, I argue it’s also the cheaper choice in the long run!

Background

The Wikimedia Foundation is the non-profit that hosts Wikipedia and other free knowledge and open data projects. These projects are made possible by a global community who, together with the Foundation, comprise the “Wikimedia movement”. The Wikimedia movement is united by a vision: to bring about a world in which every single human being can freely share in the sum of all knowledge.

I’ve worked at the Wikimedia Foundation for over 10 years, first starting as a front-end developer and eventually as a part of the Performance Team.

The Wikimedia movement is rooted in the culture of freely licensed software. The MediaWiki application that Wikipedia runs on, and all other software developed at the Foundation, is open source. That includes the configuration and datacenter automation of our web servers, databases, and CDN service. The Wikimedia community and any other individual or organization may inspect, contribute to, reuse for themselves, or fork any aspect of the platform at any time. This philosophy is also the basis of long-standing security practices which support visibility and openness.

Security through visibility and trust

We live in an incredible world. Today, most online devices are powered by open source. Whether the data centers of video streaming giants and social media sites, or your smartphone, they likely run an open source operating system like Linux or a BSD derivative^[1]. The vast majority of websites are also built with open source tools, or run on open source platforms. When you build on existing software that is developed by another organization or community, this is called an “upstream”.

The Wikimedia Foundation relies heavily on upstream technology to power its platforms. This allows the organization to focus on its core mission of providing free knowledge to the world, rather than on developing and maintaining technology from scratch. Additionally, by collaborating with other open source projects, the Foundation is able to give back to the broader free software ecosystem and help advance the state of technology for everyone.

We’re notable for operating exclusively with upstreams that are also open source. This ensures our freedom principles (to freely inspect, modify, reuse, and fork) are not hindered by proprietary components.

New Wikimedia production software components or dependencies must pass certain fitness checks and a chain of trust for the software’s security and integrity. When the Wikimedia community creates software that is peer-reviewed during development, this trust follows implicitly from its public policies and standards. When adding a new third-party package or dependency (“upstream”), this chain needs to be established by other means.

The Wikimedia Foundation extends its chain to several credible upstream vendors and communities. For example, Debian, known for its Linux distribution, is host to the highly trusted and curated Debian package repository. When a package is present in the Debian repository, this signals trust, stability, and confidence to the industry. While we usually don’t audit source code of Debian packages, installing a Debian package may still require a concept review to validate and verify that the package actually intends to meet our scale, threat model, and performance requirements.

When considering PHP or JavaScript libraries from an anonymous and open registry like npm or Packagist, the Wikimedia Foundation audits the code as if it were its own. We keep on-going costs to a minimum by only adopting upstream packages in areas that solve non-trivial problems, have stable external requirements, and sit behind a module boundary. Dependencies should reduce cost, not increase it. In practice, we only consider packages with few or no transitive dependencies, written for a stable runtime.

As an added precaution, the Wikimedia Foundation prohibits networking to third-party services in its production realm. When deploying or installing the MediaWiki application, it does not download JavaScript or PHP packages from npm or Composer. Instead, upstream packages are downloaded as a file with an integrity hash, and are checked into Git. This approach implements the organization’s security requirements, allowing for transparent auditing, patch-ability, and independent offline deployment. It also helps with faster onboarding, consistent and reproducible development, and creates a natural place for auditing upstream changes during code review.
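To illustrate the idea of an integrity hash, here is a minimal Python sketch that checks downloaded bytes against a recorded digest. This is not Wikimedia’s actual tooling. The verify_integrity() helper and the “algorithm-hexdigest” string format are assumptions made for this example.

```python
import hashlib


def verify_integrity(data: bytes, expected: str) -> bool:
    """Check bytes against an "<algorithm>-<hex digest>" string, e.g. "sha384-ab12..."."""
    algorithm, _, digest = expected.partition("-")
    return hashlib.new(algorithm, data).hexdigest() == digest.lower()


# Stand-in for a downloaded upstream file.
payload = b"console.log('hello');\n"
# The hash as it would have been recorded when the file was first reviewed.
recorded = "sha384-" + hashlib.sha384(payload).hexdigest()

print(verify_integrity(payload, recorded))                   # unchanged file
print(verify_integrity(payload + b"// tampered", recorded))  # modified file
```

Because the file itself is checked into Git next to its recorded hash, a later change to either one shows up as an ordinary diff during code review.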

The most localized software

With over 300 language editions, Wikipedia might be among the most-translated literature in the world. Wikipedia editors usually write or translate articles manually, and in recent years, the ContentTranslation tool has helped editors do this more efficiently, producing over 1 million articles through this new tool alone.

The MediaWiki platform underneath it all recognizes and localizes its user interface in over 400 languages, including gender, pluralization rules (“10 new messages”), and sort order ICU collations. We contribute to the Unicode CLDR standard on behalf of Wikipedia’s language communities. These contributions flow downstream to other Unicode customers such as Linux, Apple, and Microsoft.

Languages like Arabic and Hebrew are written from right to left. CSSJanus takes stylesheets designed and developed for left-to-right languages like English, and automatically converts them into right-to-left layouts. We deploy the MediaWiki platform on a weekly basis. Each change to functionality is deployed to all supported languages from day 1, every time. CSSJanus is part of what makes this feasible and with little to no developer training.

Not all issues are that easy! During VisualEditor development, extensive effort went into localizing the bold and italic toolbar buttons. The familiar “B” and “I” buttons usually make place for an equivalent abbreviation, such as F (Fett) and K (Kursiv) in German, or a stylized “A” for language communities that have no accepted standard. But, early adoption of English-centric software led to “B” and “I” becoming the established and culturally familiar design pattern in some other languages as well. In Hebrew, Czech, and Malayalam “correcting” these with a translation actually created confusion.

No profit motive

Large corporations, driven by profit motives, regularly drop support for older devices and browsers. The Wikimedia Foundation, however, has an imperative to make information more accessible, not less.

How does the organization pull that off without the resources of a large corporation? Through equal parts being aggressively lean and aggressively uncompromising.

The organization saves development and testing costs by writing and deploying native JavaScript that targets only modern browsers. Through an approach inspired by BBC News’ cutting the mustard, the Foundation enables millions of people (1% of its 2 billion monthly readers) to access Wikipedia on older devices through a JavaScript-free experience. This is the same experience that all page views start at prior to the (optional) arrival of JavaScript code.

The Wikimedia Foundation’s development principles and browser support policy reflect this by emphasizing the importance of progressive enhancement.

Viewing Wikipedia through a web browser is the most common access method, but Wikipedia’s knowledge is consumed far beyond the canonical experience at Wikipedia.org. Wikipedia content goes everywhere. It’s distributed offline through Kiwix and IPFS, rendered in native apps like Apple Dictionary, and even shared peer-to-peer through USB sticks. What these environments have in common is that they may not involve JavaScript as they require high security and high privacy. This is made possible at no extra cost due to APIs offering complete content HTML-first, with CSS and embedded media based on ubiquitous and open formats only.

Summary

The Wikimedia Foundation prioritizes both security and openness. To achieve this balance, it implements a number of practices and policies that ensure that it protects both the freedoms and the privacy of its audience, all while sharing information transparently.

For example, the Foundation publishes a transparency report detailing its response to information and takedown requests twice per year. The Wikimedia Foundation’s Board positions are largely held by community members, and appointed by public election through anonymous and cryptographically-verifiable votes from any eligible Wikipedia account. Its Governance Wiki publishes the Foundation’s bylaws, board decisions, and meetings.

The Foundation participates in an ecosystem of organizations that collaborate on freely-licensed information and open-source software. Overall, the organization balances exceptional security and openness by implementing strong security practices, and providing transparency about their policies and procedures.

Originally published on OpenJS Foundation Blog.

Footnotes:

Note that Apple’s macOS and iOS are also Unix-like BSD derivatives, through their inheritance from the NeXTSTEP operating system, which continues to this day via the Darwin kernel. ↩︎

This post appeared on timotijhof.net. Reply via email

An Internet of PHP

Timo Tijhof — Mon, 04 Sep 2023 23:00:00 +0000

PHP is big. The trolls can proclaim its all-but-certain “death” until the cows come home, but no amount of heckling changes that the Internet runs on PHP. The evidence is overwhelming. What follows is a loosely organised collection of precisely that evidence.

Statistics
Anecdotes
At scale
What about my bubble?
Conclusion

Statistics

PHP as programming language of choice

From Language analysis by W3 Techs on the top 10 million websites worldwide:

PHP at 77.2%.
ASP at 6.9%.
Ruby at 5.4%.

Content management on PHP

The bulk of public sites build on PHP via a CMS. By market share, 8 of the 12 largest CMS packages are written in PHP. The below is from CMS usage by W3 Techs, where each percent represents 100,000 of the top 10 million sites. There’s a similar CMS report by BuiltWith that analyses a larger set of 78 million websites.

[PHP] WordPress ecosystem (63% of CMS-based sites, 43% of all sites)
[Ruby] Shopify
Wix
Squarespace
[PHP] Joomla ecosystem (3%)
[PHP] Drupal ecosystem (2%)
[PHP] Adobe Magento (2%)
[PHP] PrestaShop (1%)
[Python] Google Blogger
[PHP] Bitrix (1%)
[PHP] OpenCart (1%)
[PHP] TYPO3 (1%)

E-commerce on PHP

From BuiltWith’s report on online stores, as of Aug 2023:

WooCommerce for WordPress (24% of global market share)
Adobe Magento (7% of global market share)
OpenCart (2% global market share, 24% market share in Russia)
PrestaShop (2% global market share, 14% market share in France)
Shopware (1% global market share, 12% market share in Germany)

Anecdotes

Kinsta published a retort demonstrating that PHP is fast, lively, and popular:

Well, first off, it’s important to point out that there’s a big difference between “wanting” and “being”. People have been calling for the death of PHP […] as far back as 2011.

PHP 7.3 was pushing 2-3x the number of requests per second as PHP 5.6. And PHP 8.1 is even faster.

[…] Because of PHP’s popularity, it’s easy to find PHP developers. And not just PHP developers – but PHP developers with experience.

Matt Brown from Vimeo Engineering in It’s not legacy code — it’s PHP:

PHP hasn’t stopped innovating […]. A new wave of backend engineers planned how we might carve up 500,000 lines of PHP into a bunch of [services]. […] Ultimately none of the proposals took hold.

Vimeo had grown many times over in the ten years since 2004, and our PHP codebase along with it […]

Ars Technica tells us: PHP maintains an enormous lead. Ars published a version of the W3 Techs report that includes historical data.

Despite many infamous quirks, the server-side language seems here to stay. […]
Within that dataset, the story told is clear. […] PHP held a 72.5 percent share in 2010 and holds a 78.9 percent share as of today. […] There doesn’t appear to be any clear contender for PHP to worry about.

Lex Fridman put it as follows in an interview with Python-creator Guido van Rossum on his podcast (episode, timestamp):

Lex: “PHP probably still runs most of the back-end of the Internet.”
Guido: “Oh yeah, yeah. […]”

Daniel Stenberg’s annual Curl user survey (page 18) asks where people use curl. After curl’s own interface (78.4%), the most familiar curl binding is PHP. It has been, since the survey’s beginning in 2015. In 2023, 19.6% of curl survey respondents reported they use curl via PHP.

curl (CLI) 78.4%, php-curl 19.6%, pycurl 13%, […], node-libcurl 4.1%.

Ember.js famously originated from the Ruby community. But, as a frontend framework Ember can pair with any backend. The Ember Community Survey reports PHP as the third-most favoured among survey participants, after Ruby and Java.

The Ember survey also asked general industry questions. For example, 24% described their employer’s infrastructure as “self-hosted”, and not at a major cloud provider. This isn’t a representative survey per se, but may still be a surprise. Especially for folks who rely on social media and conference talks for their sense of what businesses do in the real world. It is more important than ever for companies to have a cloud exit strategy ready (NHS example). You can read how Basecamp’s cloud exit saves them millions of dollars a year.

PHP at scale

The stats cited above measure the number of distinct sites and companies. The vast majority of those build on PHP. But, all that says about their scale is that they’re somewhere in the top 10 million. Does that worry you? What’s in the top 500?

Laravel

Jack Ellis from Fathom Analytics in Does Laravel Scale? makes the case that you shouldn’t make choices based on handling millions of requests per second. You’re not likely to reach that, and will face many other bottlenecks. But, it turns out, PHP is one of the languages that does scale to that level.

When we started seeing incredible growth in our software, Fathom Analytics (which is built on Laravel), […] never had moments of “does the framework do enough requests per second?”. […]

I’ve worked with enterprise companies using Laravel to power their entire business, and companies such as Twitch, Disney, New York Times, WWE and Warner Bros are using Laravel for various projects they run. Laravel can handle your application at scale.

Matt Brown again, from Vimeo Engineering in It’s not legacy code:

I’m here to tell you that it can, and Vimeo’s continued success with PHP is proof that it’s a great tool for fast-moving companies in 2020.

Vimeo is also known as the developer of Psalm, a popular open-source static analysis tool for PHP.

From Keith Adams, Chief Architect at Slack Engineering in Taking PHP Seriously:

Slack uses PHP for most of its server-side application logic […].

the advantages of the PHP environment (reduced cost of bugs through fault isolation; safe concurrency; and high developer throughput) are more valuable than the problems […]

Let’s take another look at the W3 Techs report, and this time focus on the size of some single businesses. At the top, we have WordPress which of course powers Automattic’s WordPress.com. That’s 20 billion page views each month (Alexa rank 55 worldwide).

If we move further down the report, to entries with 0.1% market share, we find PHP systems that power massive websites. Yet, these are also the platform of choice for over 100,000 smaller websites.

#23 CMS: Moodle
#25 CMS: phpBB, e.g. Google’s Waze Community, ApacheFriends Forum, VideoLAN Forums.
#31 CMS: XenForo forums, e.g. ArsTechnica.com, MacRumors.com.
#33 CMS: Roundcube
#45 CMS: MediaWiki
#49 CMS: vBulletin forums
#53 CMS: IPS Community, e.g. MalwareBytes.com, BleepingComputer, and Squarespace.com Forums.

MediaWiki is the platform behind Wikipedia.org with 25 billion page views a month (Alexa #12). MediaWiki also powers Fandom with 2 billion page views a month (Similarweb #44), and WikiHow with 100 million monthly visitors (Alexa #215).

Other major Internet properties powered by PHP include Facebook (Alexa #7), Etsy (Alexa #66), Vimeo (Alexa #165), and Slack (Similarweb #362).

Etsy is interesting due to its high proportion of active sessions and dynamic content. This is unlike Wikipedia or WordPress, which can serve most page views from a static cache. This means that, despite a similar scale, Etsy’s PHP application is a lot more exposed to their high traffic.

Etsy is also where PHP-creator Rasmus Lerdorf is employed. He sometimes features snippets from Etsy’s codebase in his tech talks. (Geek side note: His 2021 Modern PHP talk explains how Etsy deploys with rsync, exactly like Wikipedia did for the past decade with Scap). Etsy’s engineering blog occasionally covers work on their modular PHP monolith, e.g. Plural localisation, or their detailed Etsy Site Performance reports:

Happily, this quarter we saw site-wide performance improvements, due to our upgrade to PHP7.

[…] we saw significant performance gains on all our pages.

What about my bubble?

One could critique the PHP community for not occupying much space in public discourse. Whether PHP core developers, or authors of PHP packages (like Laravel, Symfony, WordPress, Composer, and PHPUnit), or the average engineer using it in their day job… we’re not seen much in arguments on social media.

You also don’t see us give many conference talks prescribing formulas for a stack that will “definitely be better” for your company. If talks by fans of certain JavaScript frameworks are anything to go by, we should believe that most companies use their stack today, and that you should feel sorry if you still don’t. I don’t say that to judge JavaScript. What bothers me is prescriptive messaging without considering technical or business needs, without assessing what “better” means — better compared to what? It’s hard to compare the one thing you know.

The above isn’t to say JavaScript doesn’t have its place. Share your experience! Share your results (and the benchmarks behind them), what worked, what didn’t. Keep searching, keep innovating, keep sharing, and above all: keep pushing the human race forward. That’s free software!

One could question merits through the lost decade and critique on React, but… React holds a 3% market share. Add the smaller frameworks (Vue, Angular, Svelte) and we reach a sum of 5%. Similarly, Node.js as web server holds 3% market share. Does that mean over 90% missed out on This One Trick That Will Boost Your Business?

Lest we forget, this 5% represents 500,000 major websites. That’s huge. Node.js has its place and its strengths (real-time message streams). But, Node.js also has its weaknesses (blocking the main thread). And remember, market share doesn’t say much about scale. It could be powering several organisations in the top 1% (like MediaWiki), or the bottom 1%. Or, be WordPress and power both the top 1% and over 40 million other sites.

Conclusion

Companies young and old, small and big, might not be utilising the software stacks we hear talked about most in public spaces. This is especially true outside the bubble of personal projects and cash-burning startups.

Is PHP the most economic choice for growing and sustainable businesses today? Is it in the top three? Does language runtime matter at all when scaling up a business and team of people around it? We don’t know.

What we do know is that a great many businesses today build on PHP, and PHP has proven to be a sustainable option. It stands the test of time. That includes new companies like Fathom that turned profitable in just three years. Like the Fathom article said, most of us will never reach that scale. But, it’s comforting to know that PHP is a sustainable and economical option even at scale. Is it the only option? No, certainly not.

There are languages that are even faster (Rust), have an even larger community (Node.js), or have more mature compilers (Java); but those tend to trade away other values.

PHP hits a certain Goldilocks sweetspot. It is pretty fast, has a large community for productivity, features modern syntax, is actively developed, easy to learn, easy to scale, and has a capable standard library. It offers high and safe concurrency at scale, yet without async complexity or blocking a main thread. It also tends to carry low maintenance cost due to a stable platform, and through a community that values compatibility and low dependency count. You will have different needs at times, of course, but for this particular sweetspot, PHP stands among very few others. Which others? You tell me!

Browser adoption rates

Timo Tijhof — Thu, 16 Feb 2023 20:00:00 +0000

For two years in 2020 and 2021, I shared Wikipedia’s worldwide browser statistics on Mastodon under #browserstats. They looked a little something like this:

Wikipedia.org and sister projects, browserstats for May 2021:

49%: Chrome + Chrome Mobile

24.7%: Safari + Mobile Safari

5.2%: Firefox + Firefox Mobile

2.8%: Edge

2.5%: Samsung Internet

[…]

100% = 16.4 billion page views (not including bots)

As the data includes the browser’s major version, I wondered whether I could use this to follow the adoption rate through each browser’s release cycle. The short answer is… Yes! Here is what I found as of May 2021:

Firefox: 1 week (peaks ~87%, every 4 weeks).
Edge: 1 week (peaks ~97%, every 6 weeks).
Chrome: 2 weeks (peaks ~91%, every 6 weeks).
Safari: 1-2 months (peaks ~86%, yearly).
Chrome Mobile: 2 weeks (peaks ~80%, every 6 weeks).
Mobile Safari: 4 months (peaks ~92%, yearly).

For each browser family I identified the typical adoption “peak”, which is the highest percentage of clients having the same major version of that browser during the last six months. I then measured the time it takes for a given version to reach that peak. To discount noise (such as from early betas and fake user agents) I count from 2% to 90% relative to the browser’s own adoption peak.
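The measurement described above can be sketched in Python as follows. This is an illustration of the method, not the script used for these stats. The adoption_window() helper and the daily share values are made up for the example. With a family peak of 87%, the 2% and 90% thresholds work out to about 1.7% and 78% of that browser family’s traffic.

```python
from datetime import date


def adoption_window(daily_share, family_peak, low=0.02, high=0.90):
    """Return the (start, end) dates on which a version's share first reaches
    `low` and `high` of the browser family's adoption peak.

    Raises StopIteration if the series never reaches a threshold.
    """
    start = next(day for day, share in daily_share if share >= low * family_peak)
    end = next(day for day, share in daily_share if share >= high * family_peak)
    return start, end


# Made-up shares (percent of the browser family's traffic) for one major version.
series = [
    (date(2021, 3, 22), 0.4),
    (date(2021, 3, 23), 2.1),
    (date(2021, 3, 25), 31.0),
    (date(2021, 3, 28), 66.0),
    (date(2021, 3, 31), 79.5),
    (date(2021, 4, 6), 87.0),
]

start, end = adoption_window(series, family_peak=87.0)
print(start, end, (end - start).days)
```

Measuring against the family’s own peak, rather than against 100%, keeps browsers with a long tail of old versions comparable to those that update almost everyone.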

Firefox (desktop)

Release cadence: every 4 weeks.
Adoption peak: ~ 87%.
Adoption time: ~ 1 week.

from 1.7% to 78% (2-90% of peak):

v85: 26 Jan – 3 Feb.
v86: 23 Feb – 2 Mar.
v87: 23 Mar – 31 Mar.

Microsoft Edge

Release cadence: every 6 weeks.
Adoption peak: ~ 97%.
Adoption time: ~ 1 week.

from 1.9% to 87% (2-90% of peak):

v87: 19 Nov – 29 Nov.
v88: 21 Jan – 30 Jan.
v89: 4 Mar – 12 Mar.

As of August 2020, Edge aligns its schedule to Chromium releases.

Chrome (desktop)

Release cadence: every 6 weeks.
Adoption peak: ~ 91%.
Adoption time: ~ 2 weeks.

from 1.8% to 82% (2-90% of peak):

v86: 7 Oct – 18 Oct.
v87: (had a bumpy ride).
v88: 20 Jan – 6 Feb.
v89: 3 Mar – 19 Mar.

Safari (desktop)

Release cadence: every 12 months.
Adoption peak: ~ 86%.
Adoption time: 1-2 months.

from 1.7% to 77% (2-90% of peak):

v13: 14 Sep 2019 – 17 Nov 2019.
v14: 16 Sep 2020 – 25 Dec 2020.

Chrome Mobile

Release cadence: every 6 weeks.
Adoption peak: ~ 80%.
Adoption time: ~ 2 weeks.

from 1.6% to 72% (2-90% of peak):

v86: 7 Oct – 24 Oct.
v88: 20 Jan – 3 Feb.
v89: 3 Mar – 19 Mar.

Mobile Safari (iOS)

Release cadence: every 12 months.
Adoption peak: ~ 92%.
Adoption time: ~ 4 months.

from 1.8% to 82% (2-90% of peak):

iOS 13: 9 Sep 2019 – 12 Feb 2020.
iOS 14: 16 Sep 2020 – 31 Dec 2020.

HTTP/2 performance revisited

Timo Tijhof — Sun, 20 Nov 2022 06:00:00 +0000

Deploying HTTP/2 support to the Wikimedia CDN significantly changed how browsers negotiate and transfer data during the page load process. We found regressions in performance during the transition and are sharing the lessons we learned.

Hello, HTTP/2!

In 2016, the Wikimedia Foundation deployed HTTP/2 (or “H2”) support to our CDN. At the time, we used Nginx for TLS termination and two layers of Varnish for caching. We anticipated a possible speed-up as part of the transition, and also identified opportunities to leverage H2 in our architecture.

The HTTP/2 protocol was standardized through the IETF, with Google Chrome shipping support for the experimental SPDY protocol ahead of the standard. Brandon Black (SRE Traffic) led the deployment and had to make a choice between SPDY and H2. We launched with SPDY in 2015, as H2 support was still lacking in many browsers, and Nginx did not support having both. By May 2016, browser support had picked up and we switched to H2.

Goodbye domain sharding?

You can benefit more from HTTP/2 through domain consolidation. The following improvements were achieved by effectively undoing domain sharding:

Faster delivery of static CSS/JS assets. We changed ResourceLoader to no longer use a dedicated cookieless domain (“bits.wikimedia.org”), and folded our asset entrypoint back into the MediaWiki platform for faster requests local to a given wiki domain name (T107430).
Speed up mobile page loads, specifically mobile-device “m-dot” redirects. We consolidated the canonical and mobile domains behind the scenes, through DNS. This allows the browser to reuse and carry the same HTTP/2 connection over a cross-domain redirect (T124482).
Faster Geo service and faster localized fundraising banner rendering. The Geo service was moved from geoiplookup.wikimedia.org to /geoiplookup on each wiki. The service was later removed entirely, in favor of an even faster zero-roundtrip solution (0-RTT): an edge-injected cookie within the Wikimedia CDN (T100902, patch). This transfers the information directly alongside the pageview without the delay of a JavaScript payload requesting it after the fact.

Could HTTP/2 be slower than HTTP/1?

During the SPDY experiment, Peter Hedenskog noticed early on that SPDY and HTTP/2 have a very real risk of being slower than HTTP/1. We observed this through our synthetic testing infrastructure.

In HTTP/1, all resources are considered equal. When your browser navigates to an article, it creates a dedicated connection and starts downloading HTML from the server. The browser streams, parses, and renders in real-time as each chunk arrives. The browser creates additional connections to fetch stylesheets and images when it encounters references to them. For a typical article, MediaWiki’s stylesheets are notably smaller than the body content. This means, despite naturally being discovered from within (and thus after the start of) the HTML download, the CSS download generally finishes first, while chunks from the HTML continue to trickle in. This is good, because it means we can achieve the First Paint and Visually Complete milestones (above-the-fold) on page views before the HTML has fully downloaded in the background.

Page load over HTTP/1.

In HTTP/2, the browser assigns a bandwidth priority to each resource, and resources share a single connection. This is different from HTTP/1, where each resource has its own connection, with lower-level networks and routers dividing their bandwidth equally as two seemingly unrelated connections. During the time where HTML and CSS downloads overlap, HTTP/1 connections each enjoyed about half the available bandwidth. This was enough for the CSS to slip through without any apparent delay. With HTTP/2, we observed that Chrome was not getting any CSS response until after the HTML was mostly done.

Page load over SPDY.
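The difference can be illustrated with a toy model. The sketch below assumes a fixed bandwidth, ignores connection setup and TCP slow start, and uses invented sizes; the function name and its `mode` values are hypothetical. It only compares when the CSS finishes if two downloads split the bandwidth evenly versus if the HTML is served strictly first, which is what we effectively observed.

```javascript
// Toy model: when does the CSS download finish?
// Units: bytes, milliseconds, and bytes per millisecond.
// Assumes the CSS is discovered while the HTML is still downloading.
function cssFinishTime({ htmlBytes, cssBytes, bandwidth, discoveredAt, mode }) {
  // HTML bytes still to come at the moment the CSS is discovered.
  const htmlLeft = htmlBytes - bandwidth * discoveredAt;
  if (mode === 'html-first') {
    // Strict priority: the CSS waits until all of the HTML has arrived.
    return discoveredAt + (htmlLeft + cssBytes) / bandwidth;
  }
  // 'fair-share': both downloads get half the bandwidth while they overlap.
  if (cssBytes <= htmlLeft) {
    return discoveredAt + cssBytes / (bandwidth / 2);
  }
  // The HTML finishes first; the CSS then gets the full bandwidth.
  return discoveredAt + (2 * htmlLeft) / bandwidth + (cssBytes - htmlLeft) / bandwidth;
}

const page = { htmlBytes: 200000, cssBytes: 20000, bandwidth: 100, discoveredAt: 100 };
console.log(cssFinishTime({ ...page, mode: 'fair-share' })); // 500
console.log(cssFinishTime({ ...page, mode: 'html-first' })); // 2200
```

In this made-up scenario the stylesheet arrives after 0.5 seconds with an even split, but only after 2.2 seconds when it has to wait for the HTML.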

This HTTP/2 feature can solve a similar issue in reverse. If a webpage suffers from large amounts of JavaScript code and below-the-fold images being downloaded during the page load, under HTTP/1 those low-priority resources would compete for bandwidth and starve the critical HTML and CSS downloads. The HTTP/2 priority system allows the browser and server to agree, and give more bandwidth to the important resources first. In our case, a bug in Chrome caused CSS to effectively have a lower priority relative to HTML (chromium #586938).

First paint regression correlated with SPDY rollout. (Ori Livneh, T96848)

We confirmed the hypothesis by disabling SPDY support on the Wikimedia CDN for a week (T125979). After Chrome resolved the bug, we transitioned from SPDY to HTTP/2 (T166129, T193221). This transition saw improvements both to how web browsers give signals to the server, and the way Nginx handled those signals.

As it stands today, page load time is overall faster on HTTP/2, and the CSS once again often finishes before the HTML. Thus, we achieve the same great early First Paint and Visually Complete milestones that we were used to from HTTP/1. But, we do still see edge cases where HTTP/2 is sometimes not able to re-negotiate priorities quickly enough, causing CSS to needlessly be held back by HTML chunks that have already filled up the network pipes for that connection (chromium #849106, still unresolved as of this writing).

Lessons learned

These difficulties in controlling bandwidth prioritization taught us that domain consolidation isn’t a cure-all. We decided to keep operating our thumbnail service at upload.wikimedia.org through a dedicated IP and thus a dedicated connection, for now (T116132).

Browsers may reuse connections for multiple domains if an existing HTTPS connection carries a TLS certificate that lists the other domain among its Subject Alternative Names (SAN), even when this connection is for a domain that corresponds to a different IP address in DNS. Under certain conditions, this can lead to a surprising HTTP 404 error (T207340, mozilla #1363451, mozilla #1222136). Emanuele Rocca from SRE Traffic Team mitigated this by implementing HTTP 421 response codes in compliance with the spec. This way, visitors affected by non-compliant browsers and middleware will automatically recover and reconnect accordingly.
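The idea behind the 421 (Misdirected Request) response can be sketched as a small check. This is an illustration only, not how the Wikimedia CDN implements it; the function name and the example.org hostnames are placeholders.

```javascript
// Hypothetical sketch: decide whether a request that arrived on this
// connection is for a hostname this server is actually responsible for.
function statusForRequest(hostHeader, servedHosts) {
  // Strip an optional port and normalise case before comparing.
  const host = String(hostHeader || '').toLowerCase().replace(/:\d+$/, '');
  return servedHosts.has(host) ? 200 : 421; // 421 Misdirected Request
}

const servedHere = new Set(['upload.example.org']);
console.log(statusForRequest('upload.example.org', servedHere)); // 200
console.log(statusForRequest('en.example.org:443', servedHere)); // 421
```

A client that receives a 421 may retry the request over a different connection, which is what allows affected visitors to recover.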

How does Internet Archive know?

Timo Tijhof — Mon, 20 Jun 2022 19:30:00 +0000

The Internet Archive discovers in real-time when WordPress blogs publish a new post, and when Wikipedia articles reference new sources. How does that work?

Wikipedia

Wikipedia, and its sister projects such as Wiktionary and Wikidata, run on the MediaWiki open-source software. One of its core features is “Recent changes”. This enables the Wikipedia community to monitor site activity in real-time. We use it to facilitate anti-spam, counter-vandalism, machine learning, and many more quality and research efforts.

MediaWiki’s built-in REST API exposes this data in machine-readable form to query (or poll). For wikipedia.org, we have an additional RCFeed plugin that broadcasts events to the stream.wikimedia.org service (docs).

The service implements the HTTP Server-Sent Events protocol (SSE). Most programming languages have an SSE client via a popular package. Most exciting to me, though, is the original SSE client: the EventSource API — built straight into the browser.^[1] This makes cool demos possible, getting started with only the following JavaScript:

new EventSource('https://stream.wikimedia.org/…');

And from the command-line, with cURL:

$ curl 'https://stream.wikimedia.org/v2/stream/recentchange'

event: message
id: …
data: {"$schema":…,"meta":…,"type":"edit","title":…}

…
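In the browser, EventSource takes care of the framing for you. If you consume the raw stream yourself, each event is a block of "field: value" lines. Below is a minimal sketch of a parser for one such block. The function name is made up, the sample frame uses placeholder values, and the parser handles only the fields shown in the output above.

```javascript
// Parse one SSE frame (the lines between two blank lines) into an object.
function parseSseFrame(frame) {
  const event = { event: 'message', id: null, data: '' };
  const dataLines = [];
  for (const line of frame.split('\n')) {
    if (line === '' || line.startsWith(':')) continue; // blank or comment line
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // one optional leading space
    if (field === 'data') dataLines.push(value);
    else if (field === 'event') event.event = value;
    else if (field === 'id') event.id = value;
  }
  event.data = dataLines.join('\n'); // multiple data lines are joined by newlines
  return event;
}

// Placeholder frame; real events carry many more properties.
const frame = 'event: message\nid: 42\ndata: {"type":"edit","title":"Example"}';
const parsed = parseSseFrame(frame);
const change = JSON.parse(parsed.data);
console.log(parsed.event, change.type, change.title); // message edit Example
```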

WordPress

WordPress played a major role in the rise of the blogosphere. In particular, ping servers (and pingbacks^[2]), helped the early blogging community with discovery. The idea: your website notifies a ping server over a standardized protocol. The ping server in turn notifies feed reader services (Feedbin, Feedly), aggregators (FeedBurner), podcast directories, search engines, and more.^[3]

Ping servers today implement the weblogsCom interface (specification), introduced in 2001 and based on the XML-RPC protocol.^[4] The default ping server in WordPress is Automattic’s Ping-O-Matic, which in turn powers the WordPress.com Firehose.
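To give an idea of what such a ping looks like on the wire, here is a sketch that builds the XML-RPC request body for the `weblogUpdates.ping` method. The helper names and the example blog are made up; sending the body (an HTTP POST to the ping server's XML-RPC endpoint) is left out.

```javascript
// Escape the characters that are special in XML text nodes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build an XML-RPC body for weblogUpdates.ping(siteName, siteUrl).
function buildPingBody(siteName, siteUrl) {
  return [
    '<?xml version="1.0"?>',
    '<methodCall>',
    '  <methodName>weblogUpdates.ping</methodName>',
    '  <params>',
    `    <param><value><string>${escapeXml(siteName)}</string></value></param>`,
    `    <param><value><string>${escapeXml(siteUrl)}</string></value></param>`,
    '  </params>',
    '</methodCall>',
  ].join('\n');
}

const body = buildPingBody('Tom & Jerry Blog', 'https://blog.example/');
console.log(body);
```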

This firehose is a Jabber/XMPP server at xmpp.wordpress.com:8008. It provides events about blog posts published in real-time, from any WordPress site, both WordPress.com and self-hosted ones.^[5] The firehose is also available as an HTTP stream.

$ curl -vi xmpp.wordpress.com:8008/posts.org.json # self-hosted
{ "published":"2022-06-05T21:26:09Z",
  "verb":"post",
  "generator":{…},
  "actor":{…},
  "target":{"objectType":"blog",…,},
  "object":{"objectType":"article",…}
}
{ … }

$ curl -vi xmpp.wordpress.com:8008/posts.json # WordPress.com
{ … }

Internet Archive

It might be surprising, but the Internet Archive does not try to index the entire Internet. This is in contrast to commercial search engines.

The Internet Archive consists of bulk datasets from curated sources (“collections”). Collections are often donated by other organizations, and go beyond capturing web pages. They can also include books, music,^[6] and software.^[7] Any captured web pages are additionally surfaced via the Wayback Machine interface.

Perhaps you’ve used the “Save Page Now” feature, where you can manually submit URLs to capture. While also represented by a collection, these actually go to the Wayback Machine first, and appear in bulk as part of the collection later.

The Common Crawl and Wide Crawl collections represent traditional crawlers. These start with a seed list, and go breadth-first to every site they find (within a certain global and per-site depth limit). Such a crawl can take months to complete, and captures a portion of the web from a particular period in time, regardless of whether a page was indexed before. Other collections are narrower in focus, e.g. they regularly crawl a news site and capture any articles not previously indexed.

Wikipedia collection

One such collection is Wikipedia Outlinks.^[8] This collection is fed several times a day with bulk crawls of new URLs. The URLs are extracted from recently edited or created Wikipedia articles, as discovered via the events from stream.wikimedia.org (Source code: crawling-for-nomore404).

Last month, I edited the VodafoneZiggo article on Wikipedia. My edit added several new citations. The articles I cited were from several years ago, and most already made their way into the Wayback Machine by other means. Among my citations was a 2010 article from an Irish news site (rtl.ie). I searched for it on archive.org and no snapshots existed of that URL.

A day later I searched again, and there it was!

I should note that, while the snapshot was uploaded a day later, the crawling occurred in real-time. I published my edit to Wikipedia on May 30th, at 21:03:30 UTC. The snapshot of the referenced source article was captured at 21:03:55 UTC. A mere 25 seconds later!

In addition to archiving citations for future use, Wikipedia also integrates with the Internet Archive in the present. The so-called InternetArchiveBot (source code) continuously crawls Wikipedia, looking for “dead” links. When it finds one, it searches the Wayback Machine for a matching snapshot, preferring one taken on or near the date that the citation was originally added to Wikipedia. This is important for online citations, as web pages may change over time.
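The "on or near the date" preference boils down to a nearest-timestamp search. The sketch below is not InternetArchiveBot's code. It assumes you already have a list of snapshot timestamps in the Wayback Machine's YYYYMMDDhhmmss notation, and the function names and sample values are invented.

```javascript
// Convert a Wayback-style timestamp (YYYYMMDDhhmmss, UTC) to epoch milliseconds.
function waybackToMs(ts) {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!m) throw new Error('Unexpected timestamp: ' + ts);
  const [, y, mo, d, h, mi, s] = m.map(Number);
  return Date.UTC(y, mo - 1, d, h, mi, s);
}

// Pick the snapshot taken closest to the moment the citation was added.
function closestSnapshot(timestamps, citedAt) {
  const target = Date.parse(citedAt);
  let best = null;
  for (const ts of timestamps) {
    const distance = Math.abs(waybackToMs(ts) - target);
    if (best === null || distance < best.distance) {
      best = { ts, distance };
    }
  }
  return best && best.ts;
}

const snapshots = ['20100312080000', '20150101000000', '20220530210355'];
console.log(closestSnapshot(snapshots, '2010-04-01T00:00:00Z')); // 20100312080000
```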

The bot then edits Wikipedia (example) to rescue the citation by filling in the archive link.

WordPress collection

The NO404-WP collection on archive.org works in a similar fashion. It is fed by a crawler that uses the WordPress Firehose (source code). The firehose, as described above, is pinged by individual WordPress sites after publishing a new post.

For example, this blog post by Chris. According to the post metadata, it was published at 12:00:42 UTC. And by 12:01:55, one minute later, it was captured.^[9]

In addition to preserving blog posts, the NO404-WP collection goes a step further and also captures any new material your post links to. (Akin to Wikipedia citations!) For example, this css-tricks.com post links to a file on GitHub inside the TT1 Blocks project. This deep link was not captured before and is unlikely to be picked up by regular crawling due to depth limits. It got captured and uploaded to the NO404-WP collection a few days later.

📎 Krinkle Treasure Hunt

Timo Tijhof — Fri, 04 Jun 2021 12:00:00 +0000

I miss the era of very Internet-y things, geocities-style scavenger hunts, with easter eggs and all. So, I devised a treasure hunt of my own!

→ Enter here

This post appeared on timotijhof.net. Reply via email

Profiling PHP in production at scale

Timo Tijhof — Fri, 11 Dec 2020 12:00:00 +0000

At Wikipedia, we built an efficient sampling profiler for PHP, and use it to instrument live requests. The trace logs and flame graphs are powered by a simple setup that involves only free open-source software, and runs at low infrastructure cost.

I’d like to demonstrate that profiling doesn’t have to be expensive, and can even be performant enough to run continually in production! The principles in this article should apply to most modern programming languages. We developed Excimer, a sampling profiler for PHP; and Arc Lamp for processing stack traces and generating flame graphs.

Figure 1: A daily flame graph, from performance.wikimedia.org.

Exhibit A: The Flame Graph

Our goal is to help developers understand the performance characteristics of their application through flame graphs. Flame graphs visually describe how and where an application spends its time. You may have seen them while using the browser’s developer tools, or after running an application via a special tool from the command-line.

Profilers often come with a cost – code may run much more slowly when a profiler is active. This cost is fine when investigating something locally or ad-hoc, but it’s not something we always want to apply to live requests.

To generate flame graphs, we sample stack traces from web servers that are serving live traffic. This is achieved through a sampling profiler. We then send the stack traces to a stream, which is then turned into a flame graph.

Our target was to add less than 1 millisecond to user-facing web requests that complete within 50ms or 200ms, and add under 1% to long-running processes that run for several minutes. And so our journey begins, with the quest for an efficient sampling profiler.

How profiling can be expensive

Internal entry and exit hooks

XHProf is a native extension for PHP. It intercepts the start and end of every function call, and may record function hierarchy, call count, memory usage, etc. When used as a debugger to trace an entire request, it can slow down your application by 3X (+200%).^[1]

It has a sampled mode in which its entry-exit hooks are reduced to no-ops most of the time, and otherwise records only a stack trace. But this could still run code 10-30% slower. The time spent within these hooks for “no-op” cases was fairly small. But, the act of switching to and from such a hook has a cost as well. And, when we intercept every single function in an application, those costs quickly add up.

We also found that the mere presence of these entry-exit hooks prevented the PHP engine from using certain optimisations. When evaluating performance, compare not only a plugin being used vs not, but also compare to a system with the plugin being entirely uninstalled!

We also looked at external ways to capture stack trace samples, using GDB, or perf_events.

External interrupts

GDB unlocks the full power of the Linux kernel to halt a process in mid-air, break into it, run your code in its local state, and then get out to let the process resume – all without the process’ awareness.^[2]

GDB does this through ptrace, which comes with a relatively high interrupt cost. But, the advantage of this approach is that there is no overhead when the profiling is inactive. Initial exploration showed that taking a single sample could delay the process by a whole second while GDB attached and detached itself. There was some room for improvement here (such as GDB preloading), but it seemed inevitable that the cost would be orders of magnitude too high.

perf_events

perf_events is a Linux tool that can inspect a process and read its current stack trace. As with GDB, when we’re not looking, the process runs as normal. perf_events takes samples relatively quickly, has growing ecosystem support, and its cost can be greatly minimised.

If your application runs as its own compiled program, such as when using C or Rust, then this solution might be ideal. But, runtimes that use a virtual machine (like PHP, Node.js, or Java), act as an intermediary process with their own way of managing an application’s call stack. All that perf_events would see is the time spent inside the runtime engine itself. This might tell you how internal operations like “assign_variable” work, but is not what we are after.^[3]^[4]

Introducing: Excimer

Excimer is a small C program, with a binding for PHP 7. Its binding can be used to collect sampled stack traces. It leverages two low-level concepts that I’ll briefly describe on their own: POSIX timers, and graceful interrupts.

POSIX timers

With a POSIX timer, we directly ask the operating system to notify us after a given amount of time has elapsed. It can notify us in one of several ways. The timer can deliver signal events to a particular process or thread (which we could poll for). Or, the timer can respond by spawning a new concurrent thread in the process, and run a callback there. This last option is known as SIGEV_THREAD.

Graceful interrupts

There is a vm_interrupt global flag in the PHP engine that the virtual machine checks during code execution. It’s not a very precise feature, but it is checked at least once before the end of any userland function, which is enough for our purpose.

If during such a check the engine finds that the flag is raised (set to 1 instead of 0), it resets the flag and runs any registered callbacks. The engine uses the same feature for enforcing request timeouts, and thus no overhead is added by using it to facilitate our sampling.

At last, we can start sampling!

When the Excimer profiler starts, it starts a little POSIX timer, with SIGEV_THREAD as the notification type. To give all code an equal chance of being sampled, the first interval is staggered by a random fraction of the sampling interval.

We’ll also give the timer the raw memory address where the vm_interrupt flag is located (you’ll understand why in a moment). The code to set up this timer is negligible and happens only once for a given web request. After that, the process is left to run as normal.

When the sampling interval comes around, the operating system spawns a new thread and runs Excimer’s timer handler. There isn’t a whole lot we can do from here since we’re in a thread alongside the PHP engine which is still running. We don’t know what the engine is up to. For example, we can’t safely and non-blockingly read the stack trace from here. Its memory may mutate at any time. What we do have is the raw address to the vm_interrupt flag, and we can boldly write a 1 there! No matter where the engine is at, that much is safe to do.

Not long after, PHP will reach one of its checkpoints and find the flag is raised. It resets the flag and makes a direct inline call to Excimer’s profiling code. Excimer simply reads out a copy of the stack trace, optionally flushing or sending it out, and then PHP resumes as normal.

If the process runs long enough to cover more than one sampling interval, the timer will notify us once more and the above cycle repeats.
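The cycle above can be mimicked in a few lines. The sketch below is a toy simulation in JavaScript with a virtual clock, not Excimer's implementation: there is no real thread or timer, and the function name and the "steps" data structure are invented. It only shows the division of labour, where the timer does nothing but raise a flag and the engine takes the sample at its next checkpoint.

```javascript
// Toy simulation: `steps` are units of work, each with a duration and the
// call stack that is active while it runs.
function runWithSampling(steps, periodMs, firstDelayMs) {
  let now = 0;
  let nextDeadline = firstDelayMs; // staggered first interval
  let interruptFlag = 0;
  const samples = [];

  for (const step of steps) {
    // The "engine" runs the step. Meanwhile the timer may fire.
    now += step.durationMs;
    while (nextDeadline <= now) {
      interruptFlag = 1; // all the timer handler does: raise the flag
      nextDeadline += periodMs;
    }
    // Checkpoint: the engine looks at the flag before the step ends.
    if (interruptFlag === 1) {
      interruptFlag = 0; // reset, then take the sample inline
      samples.push({ at: now, stack: step.stack.slice() });
    }
  }
  return samples;
}

const steps = [
  { stack: ['main', 'parse'], durationMs: 30 },
  { stack: ['main', 'render'], durationMs: 50 },
  { stack: ['main', 'send'], durationMs: 10 },
];
// A real profiler would stagger with something like Math.random() * periodMs.
// A fixed value keeps this example reproducible.
const samples = runWithSampling(steps, 40, 25);
console.log(samples.map((s) => `${s.at}ms ${s.stack.join(';')}`));
```

With a 40ms period and a 25ms first delay, this records two samples: one at the end of "parse" and one at the end of "render". In this model, a timer that fires several times within one step still yields a single sample, because the flag is either raised or not.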

Putting it all together

It’s time to put our sampling profiler to use!

Collect – start the profiler and set a flush destination.
Flush – send the traces someplace nice.
Flame graphs – combine the traces and generate flame graphs.

Figure 2: Web servers send stack traces to a Redis stream. This is independently read into a rotated log file and periodically converted to a flame graph.

Collect

The application can start the Excimer profiler with a sampling interval and flush callback.

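static $prof; // hold on to the profiler for the duration of the request
$prof = new ExcimerProfiler();
$prof->setPeriod(60); // seconds
// Second argument: flush after this many collected samples.
$prof->setFlushCallback(function ($log) { ArcLamp::flush($log); }, 1);
$prof->start();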

The above snippet is from Arc Lamp, as used on Wikipedia. This code would be placed in the early setup phase of your application. In PHP, this could also be placed in an auto_prepend_file that automatically applies to your web entry points, without needing any code or configuration inside the application.

Flush

Next we need to flush these traces to a place where we can find them later. This place needs to be reachable from all web servers, accept concurrent input at low latencies, and have a fast failure mode. I subscribe to the “boring technology” ethos, and so if you have existing infrastructure in use for something like this, I’d start with that. (e.g. ZeroMQ, or rsyslog/Kafka.)

At Wikimedia Foundation, we chose Redis for this. We ingest about 3 million samples daily from a cluster of 150 Apache servers in any given data centre, using a 60s sample interval. These are all received by a single Redis instance.

Flame Graphs

Arc Lamp consumes the Redis stream and writes the trace logs in batches to locally rotated files. You can configure how to split and join these. For example, we split incoming samples by “web”, “api”, or “job queue” entry point; and join by the hour, and by full day.
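Joining samples for a flame graph is essentially counting identical stacks. The sketch below uses the "folded" notation common in flame graph tooling (frames joined by semicolons, followed by a count). Whether this matches Arc Lamp's log format exactly is an assumption here, and the function name and sample traces are invented.

```javascript
// Collapse sampled stack traces into "frame;frame;frame count" lines.
function foldStacks(traces) {
  const counts = new Map();
  for (const frames of traces) {
    const key = frames.join(';'); // root first, leaf last
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([stack, n]) => `${stack} ${n}`).sort();
}

const hourly = foldStacks([
  ['index.php', 'main', 'parse'],
  ['index.php', 'main', 'render'],
  ['index.php', 'main', 'parse'],
]);
console.log(hourly.join('\n'));
// index.php;main;parse 2
// index.php;main;render 1
```

Because the output is just stacks with counts, joining by the hour or by the day amounts to summing the counts of equal stacks.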

You can browse our daily flame graphs on performance.wikimedia.org, or check out the Arc Lamp and Excimer projects.

Thanks to: Tim Starling who single-handedly developed Excimer, Stas Malyshev for his insights on PHP internals, Kunal Mehta as Debian developer and fellow Wikimedian who packaged Excimer, and Ori Livneh who originally created Arc Lamp and got me into all this.

📎 Interview on Uses This

Timo Tijhof — Wed, 07 Oct 2020 12:00:00 +0000

Daniel’s Uses This interview series has been a long-time resident in my feed reader. The over 1,000 interviews feature everyone from the people behind The IT Crowd, Winamp, Erlang, and Unix; to some of my personal heroes such as Vi Hart, Chris Coyier, Cassidy Williams, John Gruber, and Brendan Gregg.

Today, yours truly got to add his bit.

→ usesthis.com

Should I substr(), substring(), or slice()?

Timo Tijhof — Sat, 26 Sep 2020 12:00:00 +0000

What’s the deal with these string methods, and how are they different?

String substr()

str.substr(start[, length])

This method takes a start index, and optionally a number of characters to read from that start index with the default being to read until the end of the string.

'foobar'.substr(2, 3); // "oba"

The start parameter may be a negative number, for starting relative from the end.

Note that only the first parameter of substr() supports negative numbers. This is in contrast to most methods you may be familiar with that support negative offsets, such as String#slice() or Array#slice(). The second parameter may not be negative. In fact, it isn’t an end index at all. Instead, it is the (maximum) number of characters to return.

But, in Internet Explorer 8 (and earlier IE versions), the substr() method deviates from the ECMAScript spec. Its start parameter doesn’t support negative numbers. Instead, these are silently ignored and treated as zero. (I noticed this in 2014, shortly before we gracefully disabled JavaScript for IE 8 on Wikipedia.)

IE 8:

'faux'.substr( -1 ); // "faux"

Standard behaviour:

'faux'.substr( -1 ); // "x"

And, the name and signature of substr() are deceptively similar to those of the substring() method.

String substring()

str.substring(start[, end])

This method takes a start index, and optionally an end index. At a glance, a very simple and low-level method. No relative lengths, negative offsets, or any other trickery. Right?

Behold! The two parameters automatically swap if start is larger than end.^[1]

'foobar'.substring(1, 4); // "oob"
'foobar'.substring(4, 1); // "oob", also!

Unexpected values such as null, undefined, or NaN are silently treated as zero. For substring() this also applies to negative numbers.
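A few quick checks of this clamping behaviour (the variable name is arbitrary):

```javascript
const word = 'foobar';

console.log(word.substring(-2));      // "foobar" (negative start is treated as 0)
console.log(word.substring(NaN, 3));  // "foo"
console.log(word.substring(null, 2)); // "fo"
console.log(word.substring(2, -1));   // "fo" (-1 becomes 0, then the arguments swap)
```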

And, of course, the name and signature of substring() are deceptively similar to substr().

String slice()

str.slice(start[, end])

This method takes a start index, and optionally an end index that defaults to the end of the string. Either parameter may be a negative number, which is interpreted as a relative offset from the end of the string.
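For instance, with an arbitrary sample string:

```javascript
const text = 'foobar';

console.log(text.slice(2));      // "obar"
console.log(text.slice(-3));     // "bar" (the last three characters)
console.log(text.slice(1, -1));  // "ooba" (drop the first and last character)
console.log(text.slice(-4, -2)); // "ob"
```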

I found no defects in browsers or JavaScript engines implementing this method. And it has been around since the beginning of time.

Its only weakness is also its greatest strength — full support for negative numbers.

One might think this can be ignored for cases where you only intend to work with positive numbers. You’d be right, until you write code like the following:

start = something.indexOf(needle); // returns -1 if needle not found.
remainder = str.slice(start); // oops, -1 means something else here!
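One way to guard against this is to handle the "not found" case explicitly before slicing. A minimal sketch, with a made-up helper name:

```javascript
// Return the part of `haystack` from `needle` onwards, or null if absent.
function fromNeedle(haystack, needle) {
  const start = haystack.indexOf(needle);
  if (start === -1) {
    return null; // not found: do not pass -1 on to slice()
  }
  return haystack.slice(start);
}

console.log(fromNeedle('key=value', '='));    // "=value"
console.log(fromNeedle('no-separator', '=')); // null

// Without the guard, -1 silently means "the last character":
console.log('no-separator'.slice('no-separator'.indexOf('='))); // "r"
```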

The notion of negative offsets was confusing to me when I first learned it. But, over the years, I’ve come to appreciate it and it actually became second nature to think about offsets in this way. If you’re unfamiliar, see the examples below.

Conclusion

Let’s compare these methods once more:

str = 'foobarb…z';

// Strip start "foo" > "barb…z"
str.slice(3);
str.substring(3);
str.substr(3);

// Strip end "z" > "foobarb…"
str.slice(0, -1);
str.substring(0, str.length - 1);
str.substr(0, str.length - 1);

// Strip "foo" and "z" > "barb…"
str.slice(3, -1);
str.substring(3, str.length - 1);
str.substr(3, str.length - 3 - 1); // 👀

// Extract start > "foo"
str.slice(0, 3);
str.substring(0, 3);
str.substr(0, 3);

// Extract end > "z"
str.slice(-1);
str.substring(str.length - 1);
str.substr(str.length - 1); // Compat
str.substr(-1); // Modern

// Extract 4 chars at [3] > "barb"
str.slice(3, 3 + 4);
str.substring(3, 3 + 4);
str.substr(3, 4); // 👀

None of these seem unreasonable, in isolation. It’s nice that slice() allows negative offsets. It’s nice that substring() may limit the damage of accidentally negative offsets. It’s nice that substr() allows extracting a specific number of characters without needing to add to the start index.

But having all three? That can incur a very real cost on development in the form of doubt, confusion, and — inevitably — mistakes. I don’t think any of these is worth that cost over some minute localised benefit.

I find substr() or substring() cast doubt on surrounding code. I need to second-guess the author’s intentions when reviewing or debugging such code. Which is wasteful even, or especially, when they (or I) use them correctly.

But what about unit tests? Well, there’s sufficient overlap between the three that a couple of good tests may very well pass. It’s easy to forget exercising every possible value for a parameter, especially one that is passed through to a built-in. We usually don’t question whether the built-in method works. The question is – did we use the right method?

This ubiquitous signature of slice() is well-understood. It is a de facto standard in technology, seen in virtually all programming languages. It is applied to strings, arrays, and sequences of all sorts. As such, that’s the one I tend to prefer.

But more important than which one you choose, I think, is the act of choosing itself. Eliminating the other methods from your work environment reduces cognitive overhead in development, with one less worry whilst reading code, and one less decision when writing it.^[2]
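If you use ESLint, one possible way to enforce such a choice is its built-in no-restricted-properties rule. The fragment below is a sketch that assumes you settled on slice(); the variable name and messages are arbitrary. Note that this rule matches on the property name alone, so it also flags a substr or substring property on objects that are not strings.

```javascript
// Sketch of a config object for an eslint.config.js file.
const noLegacySubstrings = {
  rules: {
    'no-restricted-properties': [
      'error',
      { property: 'substr', message: 'Use slice() instead.' },
      { property: 'substring', message: 'Use slice() instead.' },
    ],
  },
};

// In a real eslint.config.js you would export it, for example:
// module.exports = [noLegacySubstrings];
```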

Footnotes:

This “argument swapping” behaviour in substring() has existed since the original JavaScript 1.0 as implemented in Netscape 2 (1996), and reverse-engineered by Microsoft in IE 3. The behaviour was briefly removed by Netscape 4 with JavaScript 1.2 in June 1997, but that same month the misfeature finished its fast-tracked standardisation as part of ECMAScript 1. Thus, the misfeature returned in 1998 with the release of Netscape 4.5 and JavaScript 1.3, which aligned itself with the new specification. ↩︎
In 2014, I wrote a lengthy code review about the string methods which, after much delay, I used as the basis for this article. ↩︎

Many dots, do not a query make

Timo Tijhof — Thu, 09 Jan 2020 12:00:00 +0000

How a long sequence of dots allowed a regex to reach its internal stack limit.

Premise

Wikipedia’s production error logs were reporting an increase in app crashes from the search results page. The internal Logstash error report looked as follows:

[RuntimeException]
Cannot consume query at offset 0 (need to go to 7296)

at mediawiki/…/CirrusSearch: QueryStringRegexParser->nextToken
at mediawiki/…/CirrusSearch: QueryStringRegexParser->parse
at mediawiki/…/CirrusSearch: SearchQueryBuilder::newFTSearchQueryBuilder

What caused this?

Background

Wikipedia’s search experience is provided by the CirrusSearch extension for MediaWiki. It is internally backed by an Elasticsearch cluster.

There are a number of custom operators supported in the search field, such as wildcards, excluded words, and things like incategory: and intitle:. These are parsed by the plugin’s middleware and turned into a structured query sent to the Elastic API.

While each error report had a different URL and search query, I noticed most of them had something in common: the search query consisted mostly of dots. For example:

https://de.wikipedia.org/w/?search=.................. (3000 dots)

Such an odd query might not need to yield a useful response, but it is important that it not crash the application. Doing so leaves the user stranded with an unhelpful “Internal server error” page. It can also interfere with on-going deployments as raised error levels usually indicate that a recent software update caused a problem.

Investigation

David Causse (Search Platform team) led the investigation (task T236419).

The RuntimeException comes from a safeguard, in the parser for incoming search queries. The guard exists toward the end of the parsing code, and should never be reached. It is an indication that a problem appeared previously. The problem was narrowed down to a failure executing the following regex:

/\G(?<negated>[-!](?=[\w]))?(?<word>(?:\\\\.|[!-](?!")|[^"!\pZ\pC-])+)/

This regex looks complex, but it can actually be simplified to:

/(?:ab|c)+/

This regex still triggers the problematic behavior in PHP. It fails with a PREG_JIT_STACKLIMIT_ERROR, when given a long string. Below is a reduced test case:

$ret = preg_match('/(?:ab|c)+/', str_repeat('c', 8192));
if ($ret === false) {
    print("failed with: " . preg_last_error());
}

Fails when given 1365 contiguous c on PHP 7.0.
Fails with 2731 characters on PHP 7.2, PHP 7.1, and PHP 7.0.13.
Fails with 8192 characters on PHP 7.3. (Might be due to php-src@bb2f1a6).

In the end, the fix we applied was to split the regex into two separate ones, remove the quantified non-capturing group, and do the looping at the PHP level instead (change 546209).
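To illustrate what looping at the PHP level can look like, here is a minimal sketch based on the simplified /(?:ab|c)+/ pattern. It is not the code from change 546209, and the consumeUnits helper is a hypothetical name. Instead of asking the regex engine to repeat the group, it matches one unit per preg_match call and advances the offset in a plain PHP loop.

```php
<?php
// Hypothetical helper: consume a run of "ab" or "c" units one at a time,
// rather than letting the regex engine repeat the group itself.
// Returns the offset at which the run ends.
function consumeUnits(string $input, int $offset = 0): int {
    $length = strlen($input);
    while ($offset < $length) {
        // \G anchors the match at $offset; the pattern has no quantified group.
        $ret = preg_match('/\G(?:ab|c)/', $input, $m, 0, $offset);
        if ($ret === false) {
            throw new RuntimeException('Regex failed with code ' . preg_last_error());
        }
        if ($ret === 0) {
            break; // no further unit at this position
        }
        $offset += strlen($m[0]);
    }
    return $offset;
}
```

Because each call matches at most one short unit, the amount of work per call should no longer depend on the length of the input.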

The lesson learned here is that the code did not properly check the return value of preg_match. This matters even more because the size allowed for the JIT stack changes between PHP versions.
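A minimal sketch of such a check, assuming a hypothetical matchWord helper. preg_match returns 1 for a match, 0 for no match, and false when the engine itself fails, so false needs its own branch rather than being treated as "no match".

```php
<?php
// Hypothetical helper: returns the matched text, null when there is no match,
// and throws when the regex engine reports an error.
function matchWord(string $pattern, string $subject): ?string {
    $ret = preg_match($pattern, $subject, $matches);
    if ($ret === false) {
        // false means an engine error (for example PREG_JIT_STACKLIMIT_ERROR),
        // which is different from "the pattern did not match".
        throw new RuntimeException('preg_match failed with code ' . preg_last_error());
    }
    return $ret === 1 ? $matches[0] : null;
}
```

With this shape, a stack limit failure surfaces at the call site instead of much later in the parser.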

For future reference, David concluded: The regex could be optimized to support more characters (~3 times more) by using atomic groups, like so: /(?>ab|c)+/.

This post appeared on timotijhof.net. Reply via email

To throw or not to throw, that is the question

Timo Tijhof — Sun, 08 Dec 2019 12:00:00 +0000

Why does software accept invalid data? And, at what software layer should we reject it? Also, what are “namespaces” and “special pages” on Wikipedia?

Premise

One day, our server monitoring was reporting a high frequency of fatal errors from web servers. Over 10,000 an hour. The majority shared a single root cause: the program attempted to find the discussion space for a page that didn’t support discussions.

Why was the program trying to do this? And how should the software behave when asked to do something it cannot?

Background

Namespaces and Special pages

The MediaWiki software that powers Wikipedia has a concept of titles and namespaces. Each article (or “wiki page”) has a title. And each title can belong to one of several namespaces.

The pages that contain the encyclopaedic content you’re familiar with exist under the Article namespace. These are accessed via URLs such as /wiki/Some_subject.

Each Article also has an associated wiki page under the so-called “Talk” namespace. For example, Talk:Some_subject. This is a place where conversations about the article take place. (Questions, concerns, suggestions, and other discussion threads.)

Beyond this, there are many more namespaces. “File” pages represent an uploaded multimedia file, “User” pages represent individual contributors and their profile pages, and so on. Each of these namespaces has an associated talk namespace as well (“File talk”, “User talk”, etc.).

Lastly, there is the “Special” namespace of pages. These do not represent things that can be created or edited by contributors. Instead, this space is reserved for software features. For example, the account sign up page is a “special” page (at Special:Create_account). These do not have a discussion space. That is, there is no “Special talk” namespace.

Special:Contributions

The special page we’ll take a closer look at today is “User contributions” (at Special:Contributions). This is where you can see the contribution history of a specific editor. Besides the mandatory username field, there are date filters, and namespace filters. The namespace filter also allows one to search through any associated namespaces.

Because the “Special” namespace does not contain wiki pages, and thus no contributions, it is not listed in this dropdown menu.

The Problem

Some users browsed URLs to Special:Contributions with the namespace ID of “Special” selected. While this wasn’t an option in the user interface, the request handler did not reject it. After all, it is a valid namespace. Just one that contains no user contributions.

By itself, such a query would actually succeed, insofar as it simply yields no results. It works as well as could be expected.

Things went wrong when one also ticked the “Include associated namespace” checkbox.

This forced the software to filter the query to one of two possible namespace IDs. The ID of the “Special” namespace, and the ID of its associated namespace. Except, there is no associated namespace for Special! The code in charge of associating namespaces had no choice but to abort. The question it was asked demanded a specific answer, but it could not give any.

The error report reads as follows (task T150324):

Exception: getAssociated is not valid for the Special namespace.

at Namespace.php: Namespace::isMethodValidFor()
at pagers/ContribsPager.php: Namespace::getAssociated()
at pagers/ContribsPager.php: ContribsPager->getNamespaceCond()
…
at MediaWiki.php: SpecialContributions->execute()
at index.php: MediaWiki->run()

The Investigation

Accepting invalid data

Do we need to change anything, or is the program already good enough?

There are no contributions under the Special namespace. And, there is also no talk space for discussions about these non-existent contributions. The desired outcome isn’t for there to be results, as there can’t be any.

But, we also can’t prevent our editors (or their apps) from asking for results. Perhaps an older app version did list “Special” as an option, or another system mistakenly opens the form the wrong way. Or, someone may be intentionally manipulating the system via its URL. It can happen. And when it does, the server has to respond in some way.

So far, the server was responding by crashing… If that happens a lot, alarm bells will ring about a potential outage being underway. When we crash without explanation, end-users (or developers working on an app) can’t tell what’s wrong. Were our servers malfunctioning? Or did the user do something wrong?

Rejecting invalid data

I sometimes think about software as an onion. At its outer layer, anything can happen. We don’t control what end-users and external systems try to do. If we encounter invalid input, we generally prefer to respond clearly. For example, by explaining the nature of the problem so that users may correct it, and carry on.

At this outer layer, bad input is not unexpected and should not cause our software to crash. And, to avoid false alarms in the backend, we need to distinguish end-user mistakes from real bugs in our code. Ideally crashes only happen if there is a bug in the program. It may be worth measuring in the backend when an end-user mistake happens. (For example, it might help you understand that the user-interface is confusing to users.) But, such instrumentation should stand separate from the technical question of whether the system is in full working order.

Who is in charge, and who is responsible?

Once past the outer layer, there are many more layers to our “onion”. Each layer gets closer to core business logic.

A question like “What are recent contributions by user X?” is subdivided into many small instructions and questions (or “functions”). One such function answers “What is the talk namespace for a given title?”. This would answer “Talk” for “Article”, and “File_talk” for “File”.

The “Associated namespaces” option on Special:Contributions uses that function.

If one of the contributions is for a page that has no discussion namespace, what should we do? Show no results at all? Skip that one edit and tell the user “1 edit was hidden”? Or show it anyway, but without the “talk” portion? This is a decision the inner layer cannot make. It only knows the small question being asked. It should not be aware of what the outer layer wants to do (sometimes known as “global state”). The outer layer has to decide how to handle this problem. If the outer layer believes this kind of edit should never show up under normal conditions, then it could show an error message. Something like “Error: Unsupported namespace selection.”

Alternatively, the conundrum can be avoided by structuring the program differently. The outer layer could ask a different question instead. A question that cannot fail. A question that leaves room for unexpected outcomes. Such as “Does namespace X have a talk space?”, instead of “I need the talk space of X, what is it?”. The outer layer then recognises that the question can be answered with “No”, and could then have logic for displaying those contributions in a different way.
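The difference between the two questions can be sketched as follows. This is a simplified illustration, not MediaWiki’s actual code: the NamespaceRegistry class, its hasTalk and getTalk methods, and the namespacesToQuery function are all hypothetical names.

```php
<?php
// Hypothetical, simplified stand-in for the namespace logic.
final class NamespaceRegistry {
    // Subject namespace => talk namespace (null when there is none).
    private const TALK = [
        'Article' => 'Talk',
        'File' => 'File_talk',
        'User' => 'User_talk',
        'Special' => null,
    ];

    // A question that cannot fail: the answer may simply be "no".
    public static function hasTalk(string $ns): bool {
        return (self::TALK[$ns] ?? null) !== null;
    }

    // A question that demands an answer: throws when there is none.
    public static function getTalk(string $ns): string {
        if (!self::hasTalk($ns)) {
            throw new InvalidArgumentException("getTalk is not valid for the $ns namespace.");
        }
        return self::TALK[$ns];
    }
}

// Outer layer: it decides what to do when there is no talk space.
// Here it simply leaves the talk namespace out of the filter.
function namespacesToQuery(string $ns, bool $includeAssociated): array {
    $namespaces = [$ns];
    if ($includeAssociated && NamespaceRegistry::hasTalk($ns)) {
        $namespaces[] = NamespaceRegistry::getTalk($ns);
    }
    return $namespaces;
}
```

In this sketch, namespacesToQuery('Special', true) returns only the Special namespace instead of crashing. An outer layer that prefers to reject the request could show an error message at the same point instead.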

Tomorrow, may be sooner than you think

Timo Tijhof — Sat, 07 Dec 2019 12:00:00 +0000

These are short stories from bug hunts and incident investigations at Wikipedia.

Impact

After developers submit code to Gerrit, they eagerly await the result from Jenkins, an automated test runner.

Every day during the 15 minute window before 5 PM in San Francisco, code changes submitted for code review would have mysteriously failing tests. Jenkins would wrongly inform developers that their proposed changes cause a problem with the MergeHistory feature of MediaWiki.

Background

The test in question assumed that it would finish by “tomorrow”. At first glance, it seems fair to assume that by tomorrow, a given test will have finished. We know our test suite generally only takes a few minutes to run (with a time limit of 30 minutes, to ensure tests report back even if they are stuck).

Investigation

Unfortunately, the strtotime utility function in PHP does not interpret “tomorrow” as “this time tomorrow”.

Rather, it takes it to mean “the start of tomorrow”. In other words, the next strike of midnight!

For example, on 14 August 23:59:59, strtotime("tomorrow") would evaluate to a timestamp merely one second into the future — 15 August 00:00:00.

This meant that whenever a test started running shortly before midnight, it would fail. The test server uses UTC as its timezone. As such, a test suite that started less than 15 minutes before 5 PM in San Francisco (which is midnight in UTC) would mysteriously fail!
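The difference is easy to see with a fixed starting point. The sketch below passes a base timestamp as the second argument to strtotime, so that it does not depend on when it is run. The year 2018 is an arbitrary choice for illustration.

```php
<?php
date_default_timezone_set('UTC');

// One second before midnight (the year is an arbitrary choice).
$now = strtotime('2018-08-14 23:59:59');

// "tomorrow" means the start of the next day.
$startOfTomorrow = strtotime('tomorrow', $now);

// "+1 day" means the same time of day, one day later.
$sameTimeTomorrow = strtotime('+1 day', $now);

echo date('Y-m-d H:i:s', $startOfTomorrow), "\n";  // 2018-08-15 00:00:00
echo date('Y-m-d H:i:s', $sameTimeTomorrow), "\n"; // 2018-08-15 23:59:59
```

A deadline built with “+1 day” stays a full day away, no matter how close to midnight the test starts.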

– Task T201976

– Change 452873

Originally published in the September 2018 edition of the Production Excellence newsletter at Wikimedia. This article is an expanded version of that.

Missing partitions, disappearing audio players, and extreme packet loss

Timo Tijhof — Fri, 06 Dec 2019 12:00:00 +0000

These are short stories from bug hunts and incident investigations at Wikipedia.

New database partition
Mystery of Disappearing Audio Players
Losing packets on the way to Logstash

New database partition

A user reported a timeout error for certain queries from the Public log viewer on commons.wikimedia.org.

Database administrator Manuel Aróstegui investigated the underlying query and found that it was slow (and timing out) due to one of the database replicas having an unpartitioned logging table.

Background

Our database servers carry labels that the MediaWiki application can ask for, alongside a SQL query. This allows replicas to be finely tuned to specific workloads, in particular when two optimisation strategies are mutually exclusive. The labelling system allows both strategies to be applied, on different database servers. MediaWiki then decides which one is most important for that query.

Partitioning the MediaWiki logging table is one such optimisation strategy. For queries in the Public logs that focus on actions by a specific user, we route the query to replicas where the logging table is partitioned by user ID. This is in addition to a regular index on the user ID column for that table, which we have on all replicas.

Action

As a first response, the faulty server was taken out of rotation. Re-partitioning was completed later that day.

– Task T199790

Mystery of Disappearing Audio Players

Routine triaging of PHP errors led to discovery of the following:

[PHP Notice] Undefined index: 'c9ndx98du2.ogg'
at mediawiki/extensions/Score/includes/Score.php:L507

Background

The Score extension for MediaWiki provides a way to produce image and audio files from music notation (backed by LilyPond). The extension registers a wikitext tag that allows editors to create and embed music on Wikipedia pages.

The “Undefined index” warning from PHP happens when code tries to access a non-existent key from an associative array. For example: $x = array( 'foo' => 1 ); return $x['bar'];. When this happens, the PHP engine implicitly returns the null value. PHP also emits a notice to the error log channel. We feed that into Logstash and Kibana.

“PHP Notice” errors are not uncommon and can sometimes even cause (by accident) the correct behaviour. For example, if the code involves a condition like if ($x['bar']) { … } else { … }. Our error will produce the null value, which casts to false, and we proceed to the else branch. If the bar key is meant to be optional here, and if the else branch correctly handles the scenario for when it is not set, then this code might already behave correctly. A simple fix would then be to expand the condition to first assert that the key exists. Thus preventing the warning message, but otherwise behaving the same.
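As a minimal sketch of that kind of fix, assuming a hypothetical describeBar function with an optional bar key: the condition first asserts that the key exists, so the outcome stays the same but nothing is written to the error log. (PHP 8 and later report this situation as an “Undefined array key” warning rather than a notice.)

```php
<?php
// Hypothetical example: 'bar' is an optional key.
function describeBar(array $x): string {
    // isset() returns false for a missing key (and for null) without
    // emitting a notice, so the fallback path is reached silently.
    if (isset($x['bar']) && $x['bar']) {
        return 'bar is set and truthy';
    }
    return 'bar is missing or falsy';
}

echo describeBar(['foo' => 1]), "\n"; // bar is missing or falsy
echo describeBar(['bar' => 1]), "\n"; // bar is set and truthy
```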

Action

Back to our investigation. The response was led by volunteer @Ebe123, who is also the lead maintainer of the Score extension.

First, we did some exploratory testing to see if there were any defects we could find with the feature. On the various Wikipedia articles we tested it on, the audio player seemed to work fine.

Back to the error we found on the backend, we traced it to the code responsible for adding the “duration” metadata (used by the audio player). The code for computing this duration stores it in an array, and later reads it. However, these two functions were not using the same logic to create their array key. As such, it was unable to find the duration, and did not add it to the audio player. While this is bad, it appeared to not affect the audio player. It worked and even displayed the correct duration in the interface!

Ebe123 wrote a patch that corrects the key string logic. The duration value is then correctly found in the array and passed on in the way the code originally intended.

During code review, we also looked at why this code existed in the first place (because the player appeared to work fine without it). The code was introduced several years ago in an attempt to fix a bug where the player loaded very slowly for some users. The story is that our multimedia framework needs the duration information before it can start playing back audio. And, for most file types, the framework is able to compute this on its own in the backend and hand it to the audio player ahead of time. However, the handler did not support computing durations for files with the audio/ogg MIME-type (which the Score extension uses).

When no duration is given ahead of time, web browsers have a fallback strategy. They attempt to download the track regardless, wait for it to fully arrive, then look at how many seconds it contains audio for, and use that as the duration value. This means the audio would not start playing until after it was fully downloaded. No streaming!

In our isolated testing we were playing relatively short audio clips using a high-bandwidth connection. Thus, the issue was not obvious to us.

We also found a separate bug report from a few months earlier where several users reported that when pressing “Play” the player would disappear for 5-20 seconds before audio starts playing.

It all makes sense now.

– Task T200835, Task T192550

Losing packets on the way to Logstash

I noticed that for recent bug reports with Error IDs, I was unable to find the associated error report in Logstash. I could also reproduce this for bugs I had reported myself.

Background

In the event of an internal server error, MediaWiki sends a detailed error report to Logstash. MediaWiki then displays an error page to the user, where it mentions the “Error ID”.

Action

Tim Starling (Platform Architect at Wikimedia) started investigating. He created a new Grafana dashboard and the culprit was quickly identified. Over 3000 UDP packets were being dropped at the Logstash servers, every second. That’s over 90% of its total packets – lost!

As a first mitigation, he rebooted the server, quadrupled the default receive buffer size (net.core.rmem_default in the Linux kernel) to 4MB, and rebooted it again.

The first reboot significantly improved throughput (from 10% success, to 25% success), but the receive buffer change didn’t have any positive effect and we were still dropping the remaining 75% of packets.

To recap, the buffer was now large enough to accommodate 3 seconds worth of messages, which should be enough margin for Logstash to process it. Short spikes aside, it’s unlikely that allowing more stalling would help, because new packets are constantly added to the buffer as well.

Filippo Giunchedi (Site Reliability Engineering team) jumped in and noticed that the workers.pipeline setting was explicitly set to 1, thus allowing Logstash to only use a single thread to process all the messages. This was configured several years earlier (commit) to work around a problem with the Logstash Multiline plugin. This plugin wasn’t thread-safe and would corrupt logs if activated across multiple threads.

Filippo determined we no longer needed this plugin, disabled it, and allowed the default workers.pipeline setting to take effect – which is to use the number of available CPU cores as the number of threads.

This, together with the 4MB receive buffer kernel setting, dropped the packet loss rate back to zero.

– Task T200960, Grafana dashboard: Logstash

Originally published in the August 2018 edition of the Production Excellence newsletter at Wikimedia. This article is an expanded version of that.

Wikipedia’s JavaScript initialisation on a budget

Timo Tijhof — Wed, 18 Sep 2019 12:00:00 +0000

This week saw the conclusion of a project that I’ve been shepherding on and off since September of last year. The goal was for the initialisation of our asynchronous JavaScript pipeline (at the time, 36 kilobytes in size) to fit within a budget of 28 KB.

The above graph shows the transfer size over time. Sizes are after compression (i.e. the net bandwidth cost as perceived from a browser).

In total, the year-long effort is saving 4.3 terabytes a day of data bandwidth for our users’ page views.

How we did it

The startup manifest is a difficult payload to optimise. The vast majority of its code isn’t functional logic that can be optimised by traditional means. Rather, it is almost entirely made of pure data. The data is auto-generated by ResourceLoader and represents the registry of module bundles. (ResourceLoader is the delivery system Wikipedia uses for its JavaScript, CSS, and interface text.)

This registry contains the metadata for all front-end features deployed on Wikipedia. It enumerates their name, currently deployed version, and their dependency relationships to other such bundles of loadable code.

I started by identifying code that was never used in practice (task #202154). This included picking up unfinished or forgotten software deprecations, and removing unused compatibility code for browsers that no longer passed our Grade A feature-test. I also wrote a document about Page load performance. This document serves as reference material, enabling developers to understand the impact of various types of changes on one or more stages of the page load process.

Fewer modules

Next was collaborating with the engineering teams here at Wikimedia Foundation and at Wikimedia Deutschland, to identify features that were using more modules than is necessary. For example, by bundling together parts of the same feature that are generally always downloaded together. Thus leading to fewer entry points to have metadata for in the ResourceLoader registry.

Some highlights:

Editing product team (WMF):
The WikiEditor extension has 11 fewer modules now. Another 31 modules were removed in UploadWizard.
Language product team (WMF):
Combined 24 modules of the ContentTranslation software.
Reading product team (WMF):
Combined 25 modules in MobileFrontend.
Community Wishlist team (WMDE):
Removed 20 modules from the RevisionSlider and TwoColConflict features.

Last but not least, there was the Wikidata client for Wikipedia. This was an epic journey of its own (task #203696). This feature originally had a whopping 248 distinct modules registered on Wikipedia page views. The magnificent efforts of Amir Sarabadani removed over 200 modules, bringing it down to 42 today.

The bar chart above shows small improvements throughout the year, all moving us closer to the goal. Two major drops stand out in particular. One is around two-thirds of the way, in the first week of August. This is when the aforementioned Wikidata improvement was deployed. The second drop is toward the end of the chart and happened this week – more about that below.

Less metadata

This week’s improvement was achieved by two holistic changes that organised the data in a smarter way overall.

First – The EventLogging extension previously shipped its schema metadata as part of the startup manifest. Roan Kattouw (@Catrope) refactored this mechanism to instead bundle the schema metadata together with the JavaScript code of the EventLogging client. This means the startup footprint of EventLogging was reduced by over 90%. That’s 2KB less metadata in the critical path! It also means that going forward, the startup cost for EventLogging no longer grows with each new event instrumentation. This clever bundling is powered by ResourceLoader’s new Package files feature. This feature was expedited in February 2019 in part because of its potential to reduce the number of modules in our registry. Package Files make it super easy to combine generated data with JavaScript code in a single module bundle.

Second – We shrank the average size for each entry in the registry overall (task #229245). The startup manifest contains two pieces of data for each module: Its name, and its version ID. This version ID previously required 7 bytes of data. After thinking through the mathematical Birthday problem in context of ResourceLoader, we decided that the probability spectrum for our version IDs can be safely reduced from 78 billion down to “only” 60 million. For more details see the code comments, but in summary it means we’re saving 2 bytes for each of the 1100 modules still in the registry. Thus reducing the payload by another 2-3 KB.
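The numbers can be sanity-checked with a few lines of arithmetic. The quoted figures line up with a 36-character alphabet (36^7 is about 78 billion, 36^5 is about 60 million), but that alphabet is an assumption inferred from the numbers. The collisionProbability function is a hypothetical helper using the standard Birthday problem approximation, and the sample size of 10 is purely illustrative.

```php
<?php
// Assumption: version IDs are drawn from 36 possible characters (0-9, a-z).
$space7 = 36 ** 7; // 78364164096, about 78 billion
$space5 = 36 ** 5; // 60466176, about 60 million

// Birthday problem approximation: chance that at least two out of $n
// uniformly random IDs collide in a space of $size possible values.
function collisionProbability(int $n, int $size): float {
    return 1.0 - exp(-($n * ($n - 1)) / (2 * $size));
}

// Illustrative only: 10 random 5-character IDs competing for the same slot.
$p = collisionProbability(10, $space5);

// Two bytes saved for each of the 1100 modules in the registry.
$savedBytes = 2 * 1100; // 2200 bytes, a little over 2 KB
```

How many IDs can realistically collide depends on how ResourceLoader uses them, which is what the code comments mentioned above go into.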

Below is a close-up for the last few days (this is from synthetic monitoring, plotting the decompressed size):

The change was detected in ResourceLoader’s synthetic monitoring. The above is captured from the Startup manifest size dashboard on our public Grafana instance, showing a 2.8KB decrease in the uncompressed data stream.

With this week’s deployment, we’ve completed the goal of shrinking the startup manifest to under 28 KB. This cross-departmental and cross-organisational project reduced the startup manifest by 9 KB overall (net bandwidth, after compression), from 36.2 kilobytes one year ago, down to 27.2 KB today.

We have around 363,000 page views a minute in total on Wikipedia and sister projects. That’s 21.8M an hour, or 523 million every day (User pageview stats). This week’s deployment saves around 1.4 terabytes a day. In total, the year-long effort is saving 4.3 terabytes a day of bandwidth on our users’ page views.
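As a rough check of that arithmetic, the sketch below multiplies page views by the bytes saved per view. It assumes that 1 KB is 1024 bytes and 1 TB is 1024^4 bytes, and it plugs in 2.8 KB and 9 KB as the per-view savings. Both the units and those inputs are assumptions, and terabytesPerDay is a hypothetical helper.

```php
<?php
$viewsPerMinute = 363000;
$viewsPerDay = $viewsPerMinute * 60 * 24; // 522720000, about 523 million

// Assumption: binary units (1 KB = 1024 bytes, 1 TB = 1024^4 bytes).
function terabytesPerDay(float $savedKb, int $viewsPerDay): float {
    return $savedKb * 1024 * $viewsPerDay / (1024 ** 4);
}

printf("%.1f\n", terabytesPerDay(2.8, $viewsPerDay)); // 1.4
printf("%.1f\n", terabytesPerDay(9.0, $viewsPerDay)); // 4.4
```

Under these assumptions the results land in the neighbourhood of the 1.4 and 4.3 terabytes quoted above. The small difference is within what rounding of the inputs would explain.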

What’s next

It’s great to celebrate that Wikipedia’s startup payload now neatly fits into the target budget of 28 KB – chosen as the lowest multiple of 14KB we can fit within subsequent bursts of Internet packets to a web browser.

The challenge going forward will be to keep us there. Over the past year I’ve kept a very close eye (spreadsheet) on the startup manifest — to verify our progress, and to identify potential regressions. I’ve since automated this laborious process through a public Grafana dashboard.

We still have many more opportunities on that dashboard to improve bundling of our features, and (for Wikimedia’s Performance Team) to make it even easier to implement such bundling.

How to protect yourself from npm

Timo Tijhof — Thu, 12 Sep 2019 12:00:00 +0000

What’s the worst that could happen after npm install?

When you open an app or execute a program from the terminal, that program can do anything that you can do.

In a nutshell: Imagine if your computer were to disappear in front of your eyes and re-appear in front of mine. Still open. Still unlocked. What could I do from this moment on? That is what an unknown program could do.

What is at stake?
How does it compare to other package managers?
What can you do about it?

Upon running npm install, you may be downloading and executing hundreds of programs.

Programs from nice people sometimes ask for your permission. This is because a developer chose to do so.

There may also be laws that could punish them if they get caught not doing so.

What about programs whose authors choose differently? Well, such a program could do quite a bit.

It could access any of your files, modify them, delete them, or upload them. This also applies to the internal files used by other applications.
It could install other programs in the background.
It could talk to other devices linked to your home network.

What is at stake

Files you might not be thinking about:

The cookies in your web browser.
Desktop applications. Chat history, password managers, todo lists, etc. They all use files to store the text and media you send or receive.
Digital media. Your photo albums, home videos, and voice memos.
SSH private keys, GPG key rings, and other access keys and encryption keys used by developers.

Browser cookies

Browser cookies make it so you’re immediately logged-in when you open a new tab for Gmail, or Twitter. An evil program can copy the browser’s cookies file and share it with the attacker.

The attacker could then read any e-mail you’ve ever received or sent that is stored there. They could also delete any of it. (Got a backup?) They can naturally access future e-mails as well, like the ones you get from “Forgot password” buttons. They could also hide any trace of these (e.g. with filter rules).

This affects any website you use. Social network? Access to any post or DM — regardless of privacy setting. Company e-mail, Google Drive? That too.

Sleeper programs

The evil program may configure itself to always start in the background when you open your laptop. A new friend for life!

It could also add local command-line programs that wrap the popular sudo and ssh commands, to make them do a little extra behind the scenes. Next time you run sudo to perform an administrator action and enter your password—you may have given away full system access. Deploying some code? Running ssh cloud.someplace.special might let the attacker tailgate along with you, opening one shell for itself and another for you.

Local web server

These background programs could also affect you in a myriad of other ways. I won’t detail those today, except to mention they can keep a local web server running. Spotify and Zoom have been seen in the news doing questionable things with their local web servers.

Is this an npm problem?

Maybe. Technically these concerns apply to any method of executing unknown code. Running npm install isn’t very different from pasting a command like curl url… | bash. They both execute a downloaded program from your terminal. The difference is in user expectation.

Upon seeing the url and the bash invocation, you have a choice: Trust the publisher (the url), or trust the script (download, review, then decide whether to run). The result is generally predictable and without hidden dependencies.

Other package managers

What about Debian (apt-get) or Homebrew? Like npm, code published there is unknown to most of us and hard to review. But, there is an important difference: Peer-review. These traditional repositories are curated by a central authority. You don’t have to trust the script or original authors of each package, so long as you trust the publishers and their curation process.

The scale has changed the game

What about PyPI or Packagist (Composer)? These are like npm. Anyone can publish anything. There is however a difference in scale. PyPI has 194K projects. Packagist is host to 237K packages with 0.5 billion downloads a month. npm has over 1.3 million packages and 30 billion downloads a month. This makes it a much more popular target. [1] [2] [3]

Dependency graphs

There is also a difference in habit: PyPI packages have 7 dependencies on average, with typically 1 indirect dependency. And, I would expect most dependencies there to be from authors the user has trusted before. [4] Snyk.io published in April that the average npm package has a whopping 86 dependencies, with 4+ levels of indirect dependencies. [4]

The ESLint package has 118 npm dependencies [5]. Eleventy, a popular static site generator, requires 555 dependencies (Explore dependency graph). Each one of these may run arbitrary shell commands from the terminal, both during the installation process and later when using the tool.

I get it. Now, what can we do about it?

There isn’t a magic bullet to make everything perfectly safe. But, there are a number of things you can do to reduce risk.

Isolation

For the past year, I’ve been using disposable Docker containers as a way to reduce the risk of compromise. It has controls for network access, and for which directories can be exposed. Docker isn’t a perfect safety net by any means, but it’s a step in the right direction.

My base image uses Debian and comes with Node.js, npm, and a few other utilities (such as headless browsers, for automated tests). I use a bash script to launch a temporary container, based on that image. It runs as the unprivileged nobody user, and mounts only the current working directory.

From there, I would run npm install and such. The only thing it interacts with is the source code and local node_modules directory for that specific project. It isn’t given access to any other Git repos, desktop apps, browser cookies, or private documents. And, once that terminal tab is closed, the container is destroyed.

I’ve published the script I use at github.com/wikimedia/fresh. I don’t recommend using it outside Wikimedia, however. Create your own instead. The repository explains how it works.

Other options for isolating your environment:

Speed and flexibility: Use systemd-nspawn or chroot. This takes more work to set up, but provides a faster environment than Docker. In terms of security it is comparable to Docker. Read more about systemd-nspawn on ArchWiki.
Security and ease of use: Use a virtual machine (e.g. VirtualBox/Vagrant). This is more secure by default and offers a GUI for controlling what to expose. The downside is that VMs are significantly slower.

Fewer dependencies

Finally, you can reduce risk by reducing the number of packages you depend on in your projects (and then shrink-wrap them). Especially development dependencies, as these tend to be explicitly aimed at executing from the CLI.

Question yourself and question others before introducing new dependencies. Perhaps even encourage maintainers of your favourite packages to reduce the size of their dependency graph!

Six years of BrowseHappy

Timo Tijhof — Wed, 16 May 2018 12:00:00 +0000

Six years ago (in 2012), I was looking for a newsletter about browser releases. At the time, my motivation was to remember to regularly check and update the jQuery TestSwarm framework as needed for each new browser release. I found a simple overview at browsehappy.com, run by WordPress.

Lacking RSS, I decided to simply check it on a regular basis, and created @browsehappy on Twitter for others also looking to follow browser releases. I decided to pair it with links to relevant blog posts and documentation.

Then, one day, Chrome’s version number was missing on Browse Happy’s homepage. Browse Happy is open-sourced at https://github.com/WordPress/browsehappy, which helped me find that its data actually comes from Wikipedia! Specifically, it scraped markup from article infoboxes, and extracted the version with some string operations.

Those string operations made assumptions about the wiki’s internal templates, which no longer held up after some edits to the Google Chrome article on Wikipedia. These data issues repeated themselves a number of times…

I helped them to use Wikidata.org as the source for version numbers instead.

Many Wikipedia statements are now maintained on Wikidata, which are then queried and embedded directly in articles on Wikipedia.

Also… browser vendors have boosted their communication efforts a lot since 2012!

Opera started at blogs.opera.com/desktop
Edge started at blogs.windows.com/msedgedev
WebKit renewed their blog at webkit.org/blog
Mozilla and Chromium continue as always at hacks.mozilla.org and blog.chromium.org.

After three years of moderating the feed I took a break, and… never got back. TestSwarm no longer has its own user-agent parser, and for web-dev interests, much better newsletters have sprung into existence. The main one for me is webplatform.news, by @simevidas.

Back to Browse Happy… As part of digital spring cleaning, I decided I shouldn’t be the owner of @browsehappy on Twitter, especially given it’s now dormant. I’ve reached out to Automattic and transferred ownership.

Goodbye @browsehappy, and welcome @Automattic!

Originally published on twitter.com.

This post appeared on timotijhof.net. Reply via email

Measuring Wikipedia page load times

Timo Tijhof — Tue, 09 Jan 2018 12:00:00 +0000

This post shows how we measure and interpret load times on Wikipedia. It also explains what real-user metrics are, and how percentiles work.

Navigation Timing

When a browser loads a page, the page can include program code (JavaScript). This program will run inside the browser, alongside the page. This makes it possible for a page to become dynamic (more than static text and images). When you search on Wikipedia.org, the suggestions that appear are made with JavaScript.

Browsers allow JavaScript to access some internal systems. One such system is Navigation Timing, which tracks how long each step takes. For example:

How long to establish a connection to the server?
When did the response from the server start arriving?
When did the browser finish loading the page?
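
As a rough sketch of how a page could answer these questions, the function below takes a Navigation Timing entry and derives the three values. The field names (connectStart, connectEnd, responseStart, loadEventEnd) come from the standard PerformanceNavigationTiming interface. The function name summariseNavigation is made up for illustration and is not Wikipedia's actual code.

```javascript
// Hypothetical helper: turn a Navigation Timing entry into the three
// answers listed above. All values are in milliseconds, relative to the
// start of the navigation.
function summariseNavigation(entry) {
  return {
    // How long to establish a connection to the server?
    connectDuration: entry.connectEnd - entry.connectStart,
    // When did the response from the server start arriving?
    responseStart: entry.responseStart,
    // When did the browser finish loading the page?
    loadEventEnd: entry.loadEventEnd
  };
}

// In a browser, the entry for the current page can be read like this.
// Outside a browser the list is simply empty and nothing is logged.
const navigationEntries =
  typeof performance !== 'undefined' && typeof performance.getEntriesByType === 'function'
    ? performance.getEntriesByType('navigation')
    : [];
if (navigationEntries.length > 0) {
  console.log(summariseNavigation(navigationEntries[0]));
}
```

Note that loadEventEnd stays at 0 until the load event has finished, so a helper like this is best called after the page has loaded.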

Where to measure: Real-user and synthetic

There are two ways to measure performance: Real user monitoring, and synthetic testing. Both play an important role in understanding performance, and in detecting changes.

Synthetic testing can give high confidence in change detection. To detect changes, we use an automated mechanism to continually load a page and extract a result (e.g. load time). When there is a difference between results, it likely means that our website changed. This assumes that other factors in the test environment remained constant, such as network latency, operating system, and browser version.

This is good for understanding relative change. But synthetic testing does not measure the performance as perceived by users. For that, we need to collect measurements from the user’s browser.

Our JavaScript code reads the measurements from Navigation Timing, and sends them back to Wikipedia.org. This is real-user monitoring.
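
A minimal sketch of what such reporting code could look like, assuming the browser supports navigator.sendBeacon. The endpoint path and the function names are hypothetical; the post does not describe the real collection URL or payload format.

```javascript
// Hypothetical endpoint; the real collection URL is not given in this post.
const BEACON_URL = '/beacon/navtiming';

// Build a compact JSON payload from a Navigation Timing entry.
function buildPayload(entry) {
  return JSON.stringify({
    responseStart: Math.round(entry.responseStart),
    loadEventEnd: Math.round(entry.loadEventEnd)
  });
}

// Send the measurements without blocking the page. Returns false when
// the required browser APIs are unavailable (for example outside a browser)
// or when there is no navigation entry to report.
function reportNavigationTiming() {
  if (typeof navigator === 'undefined' || typeof navigator.sendBeacon !== 'function') {
    return false;
  }
  const [entry] = performance.getEntriesByType('navigation');
  if (!entry) {
    return false;
  }
  return navigator.sendBeacon(BEACON_URL, buildPayload(entry));
}
```

sendBeacon is used in this sketch because it queues the request in the background, so the measurement itself does not slow down the page it is measuring.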

How to measure: Percentiles

Imagine 9 users each send a request: 5 users get a result in 5ms, 3 users get a result in 70ms, and for one user the result took 560ms. The average is 88ms. But, the average does not match anyone’s real experience. Let’s explore percentiles!

The first number after the lower half (or middle) is the median (or 50th percentile). Here, the median is 5ms. The first number after the lower 75% is 70ms (75th percentile). We can say that “for 75% of users, the service responded within 70ms”. That’s more useful.
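
The numbers above can be reproduced with a few lines of code. This sketch uses the "nearest-rank" definition of a percentile, which is one of several common definitions and an assumption here; the function names are made up for illustration.

```javascript
// Nearest-rank percentile: the value at 1-based position ceil(p/100 * n)
// in the sorted list.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Arithmetic mean of a list of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// The nine response times from the example, in milliseconds.
const responseTimes = [5, 5, 5, 5, 5, 70, 70, 70, 560];

console.log(Math.round(mean(responseTimes))); // 88
console.log(percentile(responseTimes, 50));   // 5
console.log(percentile(responseTimes, 75));   // 70
console.log(percentile(responseTimes, 100));  // 560
```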

When working on a service used by millions, we focus on the 99th percentile and the highest value (100th percentile). Using medians, or percentiles lower than 99%, would exclude many users. A problem with 1% of requests is a serious problem. To understand why, it is important to understand that 1% of requests does not mean 1% of page views, or even 1% of users.

A typical Wikipedia pageview makes 20 requests to the server (1 document, 3 stylesheets, 4 scripts, 12 images). A typical user views 3 pages during their session (on average).

This means our problem with 1% of requests could affect up to 20% of pageviews (20 requests x 1% = 20% = ⅕), and up to 60% of users (3 pages x 20 objects x 1% = 60% ≈ ⅔). Even worse, over a long period of time, it is most likely that every user will experience the problem at least once. This is like rolling dice in a game. With a roughly 17% (⅙) chance of rolling a six, if everyone keeps rolling, everyone should get a six eventually.
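
Multiplying the per-request rate by the number of requests gives an upper bound. As a hedged refinement, the sketch below assumes every request is affected independently with the same probability, which real traffic may not satisfy. Under that assumption the shares come out somewhat lower, but the conclusion holds: a 1% problem reaches far more than 1% of pageviews and users.

```javascript
// Chance that at least one of `n` independent attempts hits a problem
// that affects each attempt with probability `p`.
function chanceOfAtLeastOne(p, n) {
  return 1 - Math.pow(1 - p, n);
}

const perPageview = chanceOfAtLeastOne(0.01, 20);    // 20 requests per pageview
const perSession = chanceOfAtLeastOne(0.01, 3 * 20); // 3 pageviews per session

console.log((perPageview * 100).toFixed(1) + '%'); // 18.2%
console.log((perSession * 100).toFixed(1) + '%');  // 45.3%

// The dice analogy: chance of at least one six within a number of rolls.
// The chance keeps growing towards 100% as the number of rolls increases.
for (const rolls of [1, 6, 12, 24]) {
  console.log(rolls, (chanceOfAtLeastOne(1 / 6, rolls) * 100).toFixed(1) + '%');
}
```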

Real-user variables

The previous section focussed on performance as measured inside our servers. These measurements start when our servers receive a request, and end once we have sent a response. This is back-end performance. In this context, our servers are the back-end, and the user’s device is the front-end.

It takes time for the request to travel from the user’s device to our systems (through cellular or WiFi radio waves, and through wires.) It also takes time for our response to travel back over similar networks to the user’s device. Once there, it takes even more time for the device’s operating system and browser to process and display the information. Measuring this is part of front-end performance.

Differences in back-end performance may affect all users. But differences in front-end performance are influenced by factors we don’t control, such as network quality, device hardware capability, browser, browser version, and more.

Even when we make no changes, the front-end measurements do change. Possible causes:

Network. ISPs and mobile network carriers can make changes that affect network performance. Existing users may switch carriers. New users may come online with a different distribution of carriers than current users.
Device. Operating system and browser vendors release upgrades that may affect page load performance. Existing users may switch browsers. New users may choose browsers or devices differently than current users.
Content change. Especially for Wikipedia, the composition of an article may change at any moment.
Content choice. Trends in news or social media may cause a shift towards different (kinds of) pages.
Device choice. Users that own multiple devices may choose a different device to view the (same) content.

The most likely cause for a sudden change in metrics is ourselves. Given our scale, the above factors usually change only for a small number of users at once. Or the change might happen slowly.

Yet, sometimes these external factors do cause a sudden change in metrics.

Case in point: Mobile Safari 9

Shortly after Apple released iOS 9 (in 2015), our global measurements were higher than before. We found this was due to Mobile Safari 9 introducing support for Navigation Timing.

Before this event, our metrics only represented mobile users on Android. With iOS 9, our data increased its scope to include Mobile Safari.

iOS 9, or the networks of iOS 9 users, were not significantly faster or slower than Android’s. The iOS upgrade affected our metrics because we now include an extra 15% of users – those on Mobile Safari.

Desktop latency is around 330ms, whereas mobile latency is around 520ms. Having more metrics from mobile skewed the global metrics toward that category.

The above graphs plot the “75th percentile” of responseStart for desktop and mobile (from November 2015). We combine these metrics into one data point for each minute. The above graphs show data for one month. There is only enough space on the screen to have each point represent 3 hours. This works by taking the mean average of the per-minute values within each 3 hour block. While this provides a rough impression, this graph does not show the 75th percentile for November 2015. The next section explains why.

Average of percentiles

Opinions vary on how bad it is to take the average of percentiles over time. But one thing is clear: The average of many 1-minute percentiles is not the percentile for those minutes. Every minute is different, and the number of values also varies each minute. To get the percentile for one hour, we need all values from that hour, not the percentile summary from each minute.

Below is an example with values from three minutes of time. Each value is the response time for one request. Within each minute, the values sort from low to high.

The average of the three separate medians is 211ms. This is the result of (5 + 560 + 70) / 3. The actual median of these values combined, is 70ms.
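
The effect is easy to reproduce. The individual values in the sketch below are made up for illustration: they are chosen so that the three per-minute medians are 5, 560 and 70, and so that there are 19 values in total, matching the numbers in the text. The median function uses the nearest-rank definition, which is an assumption.

```javascript
// Median as the nearest-rank 50th percentile of a list of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(sorted.length / 2) - 1];
}

// Hypothetical response times (ms) for three minutes. Each inner array
// holds the requests of one minute; note that the counts differ.
const minutes = [
  [2, 3, 4, 5, 5, 5, 70, 200, 560],
  [70, 560, 900],
  [5, 6, 8, 70, 80, 130, 300]
];

const medians = minutes.map(median);
const averageOfMedians = medians.reduce((a, b) => a + b, 0) / medians.length;
const trueMedian = median(minutes.flat());

console.log(medians);                      // [ 5, 560, 70 ]
console.log(Math.floor(averageOfMedians)); // 211
console.log(trueMedian);                   // 70
```

The minute with only three requests contributes a third of the averaged result, even though it holds fewer than a sixth of the values.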

Buckets

To compute the percentile over a large period, we must have all original values. But, it’s not efficient to store data about every visit to Wikipedia for a long time. We could not quickly compute percentiles either.

A different way of summarising data is by using buckets. We can create one bucket for each range of values. Then, when we process a time value, we only increment the counter for that bucket. When using a bucket in this way, it is also called a histogram bin.

Let’s process the same example values as before, but this time using buckets.

Based on the total count (19) we know that the median (10th value) must be in bucket B, because bucket B contains the 10th to 13th values. And that the 75th percentile (15th value) must be in bucket C because it contains the 14th to 19th values.

We cannot know the exact millisecond value of the median, but we know the median must be between 11ms and 100ms. (This matches our previous calculation, which produced 70ms.)

When we use exact percentiles, our goal is for that percentile to reach a certain number. For example, if our 75th percentile today is 560ms, this means for 75% of users a response takes 560ms or less. Our goal could be to reduce the 75th percentile to below 500ms.

When using buckets, goals are defined differently. In our example, 6 out of 19 responses (32%) are above 100ms (bucket C and D), and 13 of 19 (68%) are below 100ms (bucket A and B). Our goal could be to reduce the percentage of responses above 100ms. Or the opposite, to increase the percentage of responses within 100ms.
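
A minimal sketch of bucket counting follows. Only the 11ms to 100ms range of bucket B is stated in the text; the other bucket boundaries and the individual values are assumptions, chosen so that the counts agree with the text (19 values, 13 within 100ms, 6 above).

```javascript
// Hypothetical buckets, each defined by an inclusive upper bound in ms.
const buckets = [
  { name: 'A', upTo: 10, count: 0 },
  { name: 'B', upTo: 100, count: 0 },
  { name: 'C', upTo: 1000, count: 0 },
  { name: 'D', upTo: Infinity, count: 0 }
];

// Record one value by incrementing the counter of the first bucket it
// fits in. The value itself is not stored.
function record(value) {
  buckets.find((b) => value <= b.upTo).count++;
}

// Find the name of the bucket that holds the nearest-rank p-th percentile.
function bucketForPercentile(p) {
  const total = buckets.reduce((sum, b) => sum + b.count, 0);
  if (total === 0) {
    return null;
  }
  const rank = Math.ceil((p / 100) * total);
  let seen = 0;
  for (const b of buckets) {
    seen += b.count;
    if (seen >= rank) {
      return b.name;
    }
  }
  return null;
}

// Made-up response times (ms), 19 in total.
const observed = [2, 3, 4, 5, 5, 5, 70, 200, 560, 70, 560, 900, 5, 6, 8, 70, 80, 130, 300];
observed.forEach(record);

const totalCount = buckets.reduce((sum, b) => sum + b.count, 0);
const above100 = buckets[2].count + buckets[3].count;

console.log(buckets.map((b) => b.count));             // [ 9, 4, 6, 0 ]
console.log(bucketForPercentile(50));                 // B
console.log(bucketForPercentile(75));                 // C
console.log(Math.round((above100 / totalCount) * 100)); // 32
```

Storage stays constant no matter how many requests are processed: four counters here, instead of one stored value per request.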

Rise of mobile

Traffic trends are generally moving towards mobile. In fact, April 2017 was the first month where Wikimedia mobile pageviews reached 50% of all Wikimedia pageviews. And after June 2017, mobile traffic has stayed above 50%.

Global changes like this have a big impact on our measurements. This is the kind of change that drives us to rethink how we measure performance, and (more importantly) what we monitor.

QUnit anti-patterns

Timo Tijhof — Fri, 13 Feb 2015 12:00:00 +0000

Today, I’d like to challenge the assert.ok and assert.not* methods. I believe they may’ve become an anti-pattern.

assert.ok

Using assert.ok() indicates one of two problems:

The software, or testing strategy, is unreliable. (Unsure what value to expect.)
The author is using it as a shortcut for a proper comparison.

The former necessitates improvement to the code being tested. The latter comes with two additional caveats:

Less debug information. (Inaccurate actual/expected diff). Without an expected value provided, one can’t determine what’s wrong with the value.
Masking regressions. Even if the API being tested returns a proper boolean and ok is just a shortcut, the day the API breaks (e.g. returns a number, Promise, or other object) the test will not be able to catch this regression.

Common examples:

// Meh...
assert.ok( result );
assert.ok( obj.fn );

// Better.
assert.equal( typeof obj.fn, 'function' );
assert.strictEqual( result, true );
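
To illustrate the "masking regressions" caveat without depending on QUnit, here is a small sketch with stand-in functions that mimic the pass/fail rules of assert.ok (truthiness) and assert.strictEqual (the === operator). The isEnabled functions are hypothetical and exist only for this example.

```javascript
// Stand-in for assert.ok: passes when the value is truthy.
function passesOk( actual ) {
	return Boolean( actual );
}

// Stand-in for assert.strictEqual: passes on strict (===) equality.
function passesStrictEqual( actual, expected ) {
	return actual === expected;
}

// Hypothetical API that is meant to return a boolean...
function isEnabledBefore() {
	return true;
}

// ...and the same API after a regression that makes it return a number.
function isEnabledAfter() {
	return 1;
}

console.log( passesOk( isEnabledBefore() ) );                // true
console.log( passesOk( isEnabledAfter() ) );                 // true (regression goes unnoticed)
console.log( passesStrictEqual( isEnabledBefore(), true ) ); // true
console.log( passesStrictEqual( isEnabledAfter(), true ) );  // false (regression is caught)
```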

assert.not

Using assert.not*() indicates one of three problems:

The software is unreliable. (Value is non-deterministic.)
The test uses an unreliable environment. (E.g. the input data is dynamic or variable, insufficient isolation or mocking.)
The author is using it as a shortcut for a proper comparison.

Common example:

var index = list.indexOf( item );

// Meh...
assert.notEqual( index, -1 );

// Better.
assert.equal( index, 2 );

// Even better?
assert.propEqual( list, [
  'foo',
  'bar',
] );

I’ve yet to see the first use of these assert methods that wouldn’t be improved by writing it a different way. I admit there are limited scenarios where assert.notEqual can’t be avoided in the short term, for example when the intent is to detect a difference between two unpredictable return values.

When calling a method such as Math.random() twice, one could use notEqual to assert the two return values differ. I still have my doubts about the value of such a test, though. It’ll certainly be annoying when it randomly does produce the same value twice and causes a test failure. In the interest of test coverage, my recommendation would be to instead assert that calling the method did not throw an exception, and perhaps assert the type and length of the return value, without comparing the string content.

Originally published on codepen.io.

PhantomJS for CI (anno 2014)

Timo Tijhof — Fri, 03 Oct 2014 12:00:00 +0000

How did Apple create Safari, and what is PhantomJS?

Safari

In January 2003 Apple announced Safari, their new web browser for Mac.^[1] The Safari team had just spent 2002 building Safari atop KHTML and KJS,^[2] the KDE layout and JavaScript engines developed for Konqueror. The Safari team kept the codebase somewhat modular. This allowed Apple-branding and other proprietary features to stay separate whilst also having a sustainable open-source project (WebKit) that is standalone and compilable into a fully functional GUI application. The Mac OS version of WebKit is composed of WebCore and JavaScriptCore – the frameworks that encapsulate the OS X ports of KHTML and KJS respectively. Apple developed the JavaScriptCore library previously for use in Sherlock.^[3]

Chromium

In 2008, Google introduced Chrome and started the open-source project Chromium. Chromium was composed of WebKit’s WebCore and the V8 JavaScript engine (instead of JavaScriptCore). Google later forked WebCore into Blink in 2013, thus abandoning any upstream connection with WebKit.

While Chromium is a single code-base with bindings for multiple platforms, WebKit is not. Instead, WebKit is based around the concept of ports.

These ports are manually kept in sync. Some are maintained by third parties (i.e. not by webkit.org or Apple). Some ports are better than others. “WebKit”, as such, has also become an abstract API, rather than just a framework.

WebKit

A few popular ports:

Safari for Mac.
Mobile Safari for iOS.
Safari for Windows (abandoned).
QtWebKit (by Nokia; due to it being implemented atop Qt, it works on Mac/Linux/Windows).
Android browser (abandoned, uses Chromium now).
Chromium (abandoned, uses Blink now).
WebKitGTK+.

WebKit itself doesn’t do much when it comes to network, GPU, JavaScript, or text rendering. Those are not “WebKit”. Each port binds those to something present in the OS – or another application layer. E.g. QtWebKit defers to Qt, which in turn binds to the platform.

PhantomJS

PhantomJS is a headless browser using the QtWebKit engine at its core.

The current release cycle of PhantomJS (1.9.x) is based on Qt 4.8.5, which bundles QtWebKit 2.2.4, which was branched off of upstream WebKit in May 2011. Due to the many layers in between, it will take a long time for PhantomJS to get anywhere near the feature-set of current Safari 8. PhantomJS by design is nothing like Safari but, if anything, it is probably like an alpha version (branched from SVN trunk) of Safari 4. Which is why, contrary to Safari 5.0, PhantomJS has only partial support for ES5.

Chromium has its abstraction layer at a higher level (platform independent). When run headless, it is exactly like an actual instance of Chrome on the same platform. When used in a virtual machine on a remote server, one doesn’t even need to be “headless”. We can use regular Chromium (under Xvfb). In theory the visual rendering through Xvfb and VM hypervisor could be different, however.

The word “rebuke”

Timo Tijhof — Wed, 18 Dec 2013 12:00:00 +0000

re·buke

verb

express sharp disapproval or criticism of (someone) because of their behavior or actions

“she had rebuked him for drinking too much”

“the judge publicly rebuked the jury”

noun

an expression of sharp disapproval or criticism

“he hadn’t meant it as a rebuke, but Neil flinched”

(from the Oxford English Dictionary)

I ran into the word whilst watching an episode of Elementary.

The scene continued to feature more rich language.

Holmes: I’ve given further consideration to your rebuke regarding my capacity for niceness.

Watson: I didn’t mean it as a rebuke. I was trying to have a conversation.

Holmes: Either way, you have a point… There is unquestionably a certain social utility to being polite. To maintaining an awareness of other people’s sensitivities. To exhibiting all the traits that might commonly be grouped under the heading “nice”.

Watson: I think you’ll be surprised how easy it is to earn that designation.

Holmes: No. I am not a nice man. It’s important that you understand that.

[…]

Holmes: There is not a warmer, kinder me waiting to be coaxed out into the light. I am acerbic. I can be cruel. It’s who I am; right to the bottom. I’m neither proud of this, nor ashamed of it. It simply is.

Having lines like these is actually not uncommon for the Holmes character and is one of the reasons I enjoy the show so much. Short musings and rants containing rich language happen at regular intervals throughout the series’ episodes.

My compliments to the writers of the show for producing a showpiece for the English language. It is a pleasure to be reminded of these words and even more so to learn about new ones.

Timo Tijhof

John Cleese on Creativity (Transcript)

What creativity isn’t

Open and closed mode

Discovery of penicillin

Hitchcock

Implement in the closed mode

Review in the open mode

Conditions for the open mode

Factor 1: Space

Factor 2: Time

Johan Huizinga

Oasis of Quiet — Not so fast

Factor 3: Time (really)

Factor 4: Confidence

Factor 5: Humour

Serious does not mean solemn

Practicing the open mode

Pondering

Play requires trust

Japanese meetings

Connect two ideas in a new way

How to kill creativity

Allow no humour

Undermine confidence

Demand urgency

📎 Unifying Wikipedia mobile and desktop domains

YouTube in a feed reader is… better?

How to follow a channel

Reader experience

Behind the scenes

Pet peeves of the app

“Home” page

How to disable YouTube Shorts, for real!

Perennial breaking of “Hide”

Unreliable delivery

Lockfiles for apps, not packages (still)

Lockfiles are useful

Global dependencies

Benefits and costs

Security updates

Pinning dependencies

npm audit

Dependency update notifications

Further reading

How we balance security and openness at Wikimedia

Background

Security through visibility and trust

The most localized software

No profit motive

Summary

An Internet of PHP

Statistics

PHP as programming language of choice

Content management on PHP

E-commerce on PHP

Anecdotes

PHP at scale

What about my bubble?

Conclusion

Further reading

Browser adoption rates

Firefox (desktop)

Microsoft Edge

Chrome (desktop)

Safari (desktop)

Chrome Mobile

Mobile Safari (iOS)

See also

HTTP/2 performance revisited

Hello, HTTP/2!

Goodbye domain sharding?

Could HTTP/2 be slower than HTTP/1?

Lessons learned

Further reading

How does Internet Archive know?

Wikipedia

WordPress

Internet Archive

Wikipedia collection