And other insights from 7 years of anonymous Wikipedia edits by government employees
Note: the analysis code and source data for this post can be found here.
About five years ago, the CBC, along with other news outlets, ran a mildly cheeky story about federal employees who edit Wikipedia articles while on the job, suggesting it was a waste of taxpayer money.
What are public bureaucrats doing, it asked, making changes to articles about Rush and sex positions?
It was quickly followed by an op-ed in Policy Options defending the anonymous Wikipedians, arguing that it’s good that experts in government are sharing their knowledge and getting cozy with new technologies.
Both pieces were based on a small sampling of tweets by @gccaedits, a Twitter bot that tweets every time a Wikipedia article is changed anonymously from a government IP address. The bot, maintained by Nick Ruest, a librarian at York University, is one of many around the world using the anon software to track these kinds of Wikipedia edits. The idea is to keep an eye on any suspicious changes to the world’s greatest knowledge base from shady people in power.
But no one did a comprehensive analysis of all the anonymous changes detected by the bot… until now. With seven years of tweets, we can now see what those rascals in Ottawa want the world to believe.
First, some basic stats
Total number of article edits in the data: 18,666
Number of distinct articles edited: 9,402
Number of articles edited only once: 6,387
Earliest edit in the data: July 9, 2014, 13 years after Wikipedia launched.
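For the curious, here is a minimal sketch of how stats like these can be computed. It is not the notebook's actual code: the row layout (article title plus ISO timestamp), the sample rows and the function name `basic_stats` are all made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows, NOT the real data: (article title, ISO timestamp).
edits = [
    ("Article A", "2017-11-20T15:02:00"),
    ("Article A", "2017-11-21T09:30:00"),
    ("Article B", "2016-03-07T18:45:00"),
    ("Article C", "2015-01-12T13:10:00"),
]

def basic_stats(rows):
    """Return total edits, distinct articles, single-edit articles and earliest edit date."""
    per_article = Counter(title for title, _ in rows)
    earliest = min(datetime.fromisoformat(ts) for _, ts in rows)
    return {
        "total_edits": len(rows),
        "distinct_articles": len(per_article),
        "edited_once": sum(1 for n in per_article.values() if n == 1),
        "earliest_edit": earliest.date().isoformat(),
    }

stats = basic_stats(edits)
print(stats)
```

The same counts fall out of a pandas `value_counts()` if the tweets are already in a DataFrame; the standard library version is shown only to keep the sketch dependency-free.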
The most edited Wikipedia articles from government IPs
By far, the most edited article is about… curling. Specifically, the 2017 European Curling Championships. It was edited anonymously from a computer at Shared Services Canada 128 times over five months.
In fact, among the 30 most edited articles, eight are about curling, and all eight were traced to Shared Services.
Shared Services is also responsible for the majority of edits: 9,754 out of the 18,666 recorded in the data, or about 52 per cent. Not surprising, since this is the central IT department of the federal government, and more likely to have tech-savvy workers.
EDIT: Since this post was published, Christopher Nardi, a Parliament Hill reporter for the National Post, noted that Shared Services is the VPN provider for the government, and anyone on a federal VPN could be assigned a Shared Services IP address. I have not verified this.
National Defence and Environment Canada come distant second and third.
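Ranking agencies comes down to counting edits per agency and dividing by the total. The sketch below assumes each edit has already been labelled with an agency name; the sample list and the function name `rank_agencies` are invented, and only the 9,754 and 18,666 figures come from the data described in this post.

```python
from collections import Counter

# Hypothetical sample, NOT the real data: one agency label per recorded edit.
agencies = (
    ["Shared Services Canada"] * 5
    + ["National Defence"] * 3
    + ["Environment Canada"] * 2
)

def rank_agencies(labels):
    """Return (agency, edit count, percentage of all edits), most active first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(name, n, round(100 * n / total, 1)) for name, n in counts.most_common()]

ranking = rank_agencies(agencies)
for name, n, pct in ranking:
    print(f"{name}: {n} edits ({pct}%)")

# Share of edits attributed to Shared Services, using the figures quoted in the post.
shared_services_share = round(100 * 9754 / 18666, 1)
print(shared_services_share)
```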
You’d think the DND would mainly edit articles about military issues. They do, but among their three most edited articles are the 2017-18 season of Olympique de Marseille (76 edits) and bro-com director Judd Apatow (52 edits).
Here are the one or two most edited articles per agency (only those that received at least 10 edits):
What information is being edited
It’s hard to quantify what kinds of edits were done. The data doesn’t include what was edited, only a screenshot of the changes and a link to the page that documents the edits (known as the diff).
So one must peruse the 18,666 changes manually (basically scroll down the @gccaedits timeline) to get a sense. No, I didn’t look through them all, but I did browse a few that I thought would be more susceptible to vandalism, like major political figures and polarizing issues.
It’s incredibly tame. Most changes are small factual corrections or fixes to spelling, punctuation, and formatting. I found little evidence of attempts to make articles more or less favourable to their subject.
Which is exactly what a paper in the Canadian Public Administration journal found in 2020, according to the National Post:
I did find one glaring exception, which is shown at the top of this post: in 2018 someone at Shared Services changed the title of the article for former Conservative leader Andrew Scheer to “Trudeau’s Next Victim”.
The change lasted all of two minutes before reverting to the original.
This speaks volumes about the speed at which the Wikipedia community tends to the integrity of the website, which is seen by some scholars as a reliable source of information.
Another example: users at the CBC, my employer, changed the article about the CBC 16 times. Most changes were moving paragraphs around and tidying up a table of job cuts over time.
Here is a sampling of the diffs. Check out the Jupyter notebook with the full analysis code to see more.
https://en.wikipedia.org/w/index.php?diff=708650363&oldid=706291967
https://en.wikipedia.org/w/index.php?diff=708650446&oldid=708650363
https://en.wikipedia.org/w/index.php?diff=708650537&oldid=708650446
https://en.wikipedia.org/w/index.php?diff=708652710&oldid=708652262
https://en.wikipedia.org/w/index.php?diff=708652743&oldid=708652710
https://en.wikipedia.org/w/index.php?diff=708672887&oldid=708668612
https://en.wikipedia.org/w/index.php?diff=1046786639&oldid=1046338243
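Those diff links carry two revision IDs in their query string: `oldid` is the revision being compared from, and `diff` is the revision being compared to. A small sketch, using only Python's standard library and the first three links listed above, pulls them out and checks whether the diffs chain together, meaning each one starts at the revision where the previous one ended. The helper name `revision_ids` is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

# The first three diff links from the sampling above.
diff_urls = [
    "https://en.wikipedia.org/w/index.php?diff=708650363&oldid=706291967",
    "https://en.wikipedia.org/w/index.php?diff=708650446&oldid=708650363",
    "https://en.wikipedia.org/w/index.php?diff=708650537&oldid=708650446",
]

def revision_ids(url):
    """Return (old revision ID, new revision ID) parsed from a diff URL."""
    query = parse_qs(urlparse(url).query)
    return int(query["oldid"][0]), int(query["diff"][0])

pairs = [revision_ids(u) for u in diff_urls]

# True where a diff's new revision is the old revision of the diff that follows it.
chained = [pairs[i][1] == pairs[i + 1][0] for i in range(len(pairs) - 1)]

print(pairs)
print(chained)
```

When the links chain like this, the edits were very likely made one after another on the same article, though the IDs alone don't say who made them.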
Government edits have been decreasing with time
Maybe the novelty wore off. Maybe articles are so complete that there’s little left to change. At any rate, the number of edits from government IPs has been on a downward trend.
Covid really cut into the number of edits, as can be seen in the chart below. Either government editors lost the motivation, or they were working from home and were no longer under a government IP address.
Those two big peaks in March 2017 and April 2018? You guessed it: curling. These are the articles that got edited the most in those two months. All but one are about curling.
2017 European Curling Championships
2017 World Women’s Curling Championship
2018 World Mixed Doubles Curling Championship
Madhumila
2016 Humpty’s Champions Cup
2017 Tim Hortons Brier
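Finding peaks like these means bucketing edits by month, picking the busiest months, then counting articles within them. Here is a rough sketch of that idea; the sample rows and the function names `monthly_counts` and `top_articles_in_month` are invented, and the notebook may well do it differently.

```python
from collections import Counter

# Hypothetical sample rows, NOT the real data: (ISO date, article title).
edits = [
    ("2017-03-02", "Article A"),
    ("2017-03-05", "Article A"),
    ("2017-03-09", "Article B"),
    ("2017-04-11", "Article C"),
    ("2018-04-01", "Article D"),
    ("2018-04-03", "Article D"),
    ("2018-04-20", "Article D"),
]

def monthly_counts(rows):
    """Count edits per calendar month, keyed as 'YYYY-MM'."""
    return Counter(date[:7] for date, _ in rows)

def top_articles_in_month(rows, month, n=3):
    """Return the n most edited articles within one 'YYYY-MM' month."""
    titles = Counter(title for date, title in rows if date.startswith(month))
    return titles.most_common(n)

per_month = monthly_counts(edits)
peak_months = [month for month, _ in per_month.most_common(2)]

for month in peak_months:
    print(month, top_articles_in_month(edits, month, n=1))
```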
The wrangled data and Jupyter notebook with the analysis (in Python) can be found here. Take a look. Did you find anything interesting? Share it in the comments!