% So, why are we not using \appdxsection here?
%
% Sit down, my child, and you shall hear a tale.  A tale of terrible
% danger and of great deeds by heroes who fought that danger but were
% ultimately defeated.  This is not a story with a happy ending.  Your
% father and I debated for a long time about whether you were old
% enough to hear it.  We eventually decided you were still too young,
% but you sneaked into my computer and are now reading it anyway, you
% little rascal!  Fine.  If you're old enough to break into the liquor
% cabinet, you're old enough to get drunk, as the saying goes.  (No
% one ever actually said this.  I just made it up.  But I'll bet that
% by this time next year it's trending on Twitter.)
% 
% Anyway, here's the deal:
%
% If we just use "\appdxsection{Foo}\label{appdx:foo}" here, then the
% first appendix's name will be something like "Appendix G".  In other
% words, the section numbering that we've been using so far will
% continue right on in to the appendix letters.  If the last
% non-appendix section was Section 6, then the first appendix will get
% named for the 7th letter of the alphabet.
%
% That sucks, but we all know what the solution is, right?  Just do:
%
%   \setcounter{section}{0}
%
%   (And actually, you can even combine that with another command,
%   "\renewcommand{\thesection}{\Alph{section}}", so that subsections
%   within the appendix get lovely appendix-y names like "A.1", "A.2",
%   "A.2.1", etc.)
%
% Ah, but there's a problem:
% 
% Now if you say "Section \ref{appdx:foo}" anywhere in the document,
% the ref's *link* will actually point to the page where -- you
% guessed it -- Section 1 is!  So while the reference's text would
% look correct (saying "Appendix A" or whatever), if you click on it
% it mysteriously jumps to Section 1, and if you hover over it, in
% Evince or in any other PDF reader that supports preview popups, you
% see a popup showing Section 1's header.
%
% I don't have a solution for this.  Or rather, I finally decided to
% stop shaving the LaTeX yak and just do it manually, with a regular
% unnumbered section that has "Appendix A:" in its title explicitly.
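%
% (For the record, that manual route looks roughly like this -- a
% sketch, not tested in exactly this form; \phantomsection is from
% hyperref and keeps the table-of-contents link anchored in the right
% place:
%
%   \section*{Appendix A: Foo}
%   \phantomsection
%   \addcontentsline{toc}{section}{Appendix A: Foo}
%
% Cross-references are then written out by hand as "Appendix A"
% instead of going through \ref.)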
%
% Sometimes the dragon wins.

I decided to try out this lossless text-compression demonstration site by Fabrice Bellard. It uses GPT-2 natural language generation and prediction to achieve compression. As sample text, I used the first paragraph of Donald Trump’s recent rally speech in Tulsa, Oklahoma. (I figured if anything can compress well using predictive machine learning, surely Trump’s speech patterns can.)

Here’s the compressor site, with most of the input and all of the output showing:

compression page with both input and output displayed

The output looks like a short string of Chinese characters because the compressed text is represented as a series of Unicode characters (each encoding 15 bits of information), which makes the compression ratio displayed, 804/49, a bit misleading: each character on the bottom carries roughly twice as many bits as each character on the top, so 402/49 would be more accurate, and still quite impressive.
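To spell out that arithmetic (a rough estimate, assuming ordinary English text at about 8 bits per character): the input is 804 × 8 ≈ 6,400 bits, the output is 49 × 15 = 735 bits, so the honest ratio is somewhere around 8.7 to 1, close to the 402/49 ≈ 8.2 above rather than the 16.4 you would get by counting characters alone.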

Anyway, I naturally thought “Hmm! What would happen if I were to paste this presumably random Chinese output into Google Translate?”

I am a prisoner, and I am in a state of mind.

“I am a prisoner, and I am in a state of mind.”

Aren’t we all, Internet? Aren’t we all?

Have you noticed how Trump consistently says that “we can’t let the cure be worse than the *problem*”? (emphasis mine)

The usual stock phrase ends with the word “disease”. But Trump avoids the stock phrase, probably because he doesn’t want someone quoting it back at him sarcastically at the peak of the COVID-19 death toll. So in order to avoid reminding his listeners that it is, in fact, literally a disease we’re dealing with here, he twists a common saying.

Since Trump’s use of language is so frequently odd anyway, journalists rarely call out his misdirections or try to explain them. But even worse, they often cover for him. There was a particularly dramatic example of this recently:

On The Daily podcast with Michael Barbaro, New York Times journalist Maggie Haberman played audio of Trump saying “I don’t want the cure to be worse than the problem itself” (he always phrases it this way — he never says “disease” in that phrase) and then she did a really interesting thing. She repeated it back for the audience, but with the phrase corrected to its standard form:

“— in his words, the cure can’t be worse than the disease.”

(Here’s a transcript.)

Haberman wasn’t adding any information by rephrasing the President. She wasn’t summarizing a longer or more complex thing Trump said. She wasn’t providing needed context that the listener might not have. She just repeated Trump, with one important fix — and called her fixed version “his words”.

What is going on? It’s not a simple accident. The day before, Michael Barbaro himself did the same thing. He played audio of Trump using the same odd phrasing on a different occasion, and then Barbaro followed it up by similarly fixing the President’s words, albeit with “illness” instead of “disease”. (transcript here)

It’s as though the journalists know something is wrong, and instinctively want to fix it, so they generously clean up after the President, instead of simply pointing out how the President consistently mis-phrases a traditional saying. (Foreign journalists have noticed this tendency of American reporters to edit the President and thus mask what he’s actually saying.)

I’m not suggesting that reporters should indulge in speculation about the President’s motivations in behaving like this, even when those motivations are pretty clear. Instead, I’m suggesting that journalists should point out when something odd is going on — help the audience see patterns. As reporters, they’ve heard Trump use this strange phrasing multiple times; they know full well what is going on. But any given audience member might not have heard all those instances, and thus might not spot the pattern.

Instead of unconsciously correcting Trump, and thus normalizing him, just report on him and help people be aware of patterns. Listeners can come to their own conclusions about what the patterns mean, but no one is in a better position than journalists who cover Trump professionally to point out the patterns in the first place.

Don’t cover for.

Just cover.

(Note: See related Twitter threads here and here.)

Update (2019-11-25): Audrey Eschright has made a link roundup of “pieces I’ve been reading on the topic of modern free and open source software practices, licensing, and ethical concerns.” Thanks, Audrey! (Thanks also to Sumana Harihareswara, whose tweet alerted me to this fine development.)

Update (2019-10-23): Christie Koehler has written a great piece on this same topic: Open Source Licenses and the Ethical Use of Software. It’s much more in-depth than my treatment below; I highly recommend Christie’s post if you’re looking for a thorough examination of this trend.

I just wrote this in an email, and then realized it was basically already a blog post, so here it is. (Disclaimer: in this post, as on this blog generally, I’m speaking only for myself and not for my company or our clients.)

There’s been a lot of talk recently about creating software licenses that include an ethical-use-only clause. Here’s one example among many. There has even been talk about modifying some existing free software / open source software licenses to include such clauses. If I stopped to dig up source links for everything I’d never get this post done, but if you’re active in this field you’ve probably been seeing these conversations too. Feel free to supply links in the comments.

According to the current definition of free and open source software, such licenses would no longer be FOSS. Some people react to that by saying that maybe we need to update the definition of FOSS then, but that’s backwards — you can’t change a thing by changing what labels you call it by. The current definition of FOSS would still exist, and would still mean exactly what it means, whether one calls it “FOSS” or “broccoli” or “gezornenplatz”.

But even ignoring the nominalist arguments, I think these ethics-scoped licenses are, sadly, an unworkable idea on substantive grounds.

Aditya Mukerjee explained why very eloquently in this tweet thread, and you might want to read that first. I would add:

In practice, these kinds of clauses are time bombs that people either don’t hear ticking, in which case they get an unpleasant surprise later, or do hear ticking, in which case they avoid using any software under that license.

The conversations I’ve seen around these licenses seem to start from the position that all (ahem) reasonable people agree about what is ethical. But in fact there are serious and deep disagreements about what is ethical — even among people who would never have expected that they might disagree with each other, there are usually latent disagreements lurking. Here are a couple of examples, just to show how easy it is for this to happen:

1) Some people believe that copyright infringement is immoral. They think that copying without authorization, or at least doing so at scale, harms artists and other creators, and is thus unethical. Other people believe that putting restrictions on copying is inherently immoral — that no one should have a monopoly on the distribution of culture and information. (Note that this is wholly independent of attribution, of course — that’s a separate concern, and both sides here generally agree that misattribution is unethical because it is simply a type of fraud.)

So what happens when someone puts out a license with a clause saying that one may not use this software as part of a system that performs unauthorized copying? Sure, the license will mean what it means and will be variably enforceable depending on the jurisdiction. But what I’m getting at is that there is no consensus at all, especially among the kinds of people likely to be pondering these questions in the first place, about whether the restriction would be ethical.

This example, far from being contrived, actually touches on the proposed license referred to earlier. That license bases its “do no harm” clause on the Universal Declaration of Human Rights; see in particular clause 27(2), a clause that I do not agree is ethical and that, depending on how it is interpreted, may be in fundamental contradiction with free software licensing.

Next example…

2) Many vegetarians and vegans feel that killing animals for meat — and doing medical testing on animals, etc — is immoral. Most of those people live surrounded by meat-eaters, so they often don’t bring this up in conversation unless asked about it. But it’s only a matter of time before someone releases a license that prohibits the software from being used for any purpose that harms animals.

Oh wait, that already happened.

(To be fair, it looks like maybe that was really a click-through download EULA rather than the underlying software license, at least based on this archived page. It’s a little hard to tell — this was all around 2008, and the license is no longer easy to find on the Net. Which I think is likely to be the fate of most ethics-scoped licenses in the long run.)

Formally speaking, these kinds of ethical-use-only clauses violate both the Free Software Definition and the Open Source Definition. In the FSD, they prevent the software from being used “for any purpose”. In the OSD, they constitute a “field of use” restriction.

Now, you can make any license you want, and if you hire a good lawyer to do the drafting it may even be enforceable in some circumstances. But there is much less consensus around the world about what is “ethical” than many people wish. If this practice were normalized, we would quickly have software licenses that prohibit the software from being used in a system that encourages people to change or abandon their religion, or from being used to educate women, etc.

“Fine”, I hear you say. “I don’t have to use their software, then. But people who agree with my ethics will be free to use the software I release under licenses that enforce those ethics.” Except that no one will: the software won’t be adopted, except maybe by your friends. Anyone seriously thinking of using that software in production will run away as fast as they can from a license clause that opens them up to liability based on some judge’s interpretation of what constitutes a violation of someone else’s ethical guidelines. These licenses may look great on the runway, but they’ll never fly.

I think the FSD and the OSD (which are essentially the same idea expressed in different words) got it right the first time. Free software licenses accomplish some wonderful things, both for individual freedom and for non-monopolistic collaboration built around free-to-fork code. However, FOSS licenses can never provide a generally enforceable framework for ethical behavior. Attempts to make them do the latter not only fail (because the software won’t be widely adopted with non-FOSS license terms anyway) but also reduce the licenses’ effectiveness at doing what they were originally designed to do.

Thanks to user lamayonnaise in this Reddit thread, I was able to solve the problem described below, which I encountered when upgrading a Debian GNU/Linux box from old stable (9.x, a.k.a. “stretch”) to new stable (10.0, a.k.a. “buster”). I’ve also seen this when upgrading from ‘stable’ to ‘testing’ — presumably the solution below would work there too.

Here’s what the problem looks like — full transcript, out of consideration for search engine indexes:

root# apt-get dist-upgrade
Reading package lists... Done
Building dependency tree       
Reading state information... Done
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 guile-2.2-libs : Depends: libtinfo6 (>= 6) but it is not installed
 libedit2 : Depends: libtinfo6 (>= 6) but it is not installed
 libllvm7 : Depends: libtinfo6 (>= 6) but it is not installed
 libncurses6 : Depends: libtinfo6 (= 6.1+20181013-2) but it is not installed
 libreadline7 : Depends: libtinfo6 (>= 6) but it is not installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
root# 

Hmmm, that doesn’t look good. I tried following the advice given there, but it didn’t work:

root# apt --fix-broken install
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Correcting dependencies... Done
The following packages were automatically installed and are no longer required:
  guile-2.2-libs libncurses6 libpython3.7-minimal libsasl2-modules libzstd1
  mariadb-common python3.7-minimal
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libtinfo6
The following NEW packages will be installed:
  libtinfo6
0 upgraded, 1 newly installed, 0 to remove and 1326 not upgraded.
47 not fully installed or removed.
Need to get 0 B/325 kB of archives.
After this operation, 534 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
apt-listchanges: Can't set locale; make sure $LC_* and $LANG are correct!
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_ALL to default locale: No such file or directory
Setting up libpam0g:amd64 (1.3.1-5) ...
locale: Cannot set LC_ALL to default locale: No such file or directory
Checking for services that may need to be restarted...awk: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
Checking init scripts...
awk: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
dpkg: error processing package libpam0g:amd64 (--configure):
 subprocess installed post-installation script returned error exit status 127
Errors were encountered while processing:
 libpam0g:amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)
root# 

Okay, hmmm, what about trying the same but with apt-get instead of apt? Let’s see:

root# apt-get --fix-broken install
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Correcting dependencies... Done
The following packages were automatically installed and are no longer required:
  guile-2.2-libs libncurses6 libpython3.7-minimal libsasl2-modules libzstd1
  mariadb-common python3.7-minimal
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libtinfo6
The following NEW packages will be installed:
  libtinfo6
0 upgraded, 1 newly installed, 0 to remove and 1326 not upgraded.
47 not fully installed or removed.
Need to get 0 B/325 kB of archives.
After this operation, 534 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
apt-listchanges: Can't set locale; make sure $LC_* and $LANG are correct!
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_ALL to default locale: No such file or directory
Setting up libpam0g:amd64 (1.3.1-5) ...
locale: Cannot set LC_ALL to default locale: No such file or directory
Checking for services that may need to be restarted...awk: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
Checking init scripts...
awk: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
dpkg: error processing package libpam0g:amd64 (--configure):
 subprocess installed post-installation script returned error exit status 127
Errors were encountered while processing:
 libpam0g:amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)

Nope.

All right, then. Let’s do it manually:

I’m not sure it was necessary, but at this point I ensured the locale by checking that the uncommented line “en_US.UTF-8 UTF-8” was present in /etc/locale.gen, running the command locale-gen as root, logging out and logging back in, and confirming the locale with locale -a.
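For the record, that locale dance amounted to roughly the following, with the logout and login happening between the last two commands (shown in the same transcript style as above; the grep just confirms the line is present and uncommented):

root# grep '^en_US.UTF-8' /etc/locale.gen
root# locale-gen
root# locale -a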

Again, that locale dance may not have been necessary. What was necessary were the next steps:

Visit the Debian package pages for libtinfo6 and libpam0g, download the amd64 versions (using the sha256sum command to check the downloaded files against the SHA256 fingerprints listed at the bottom of each Debian package page), then install them manually:

root# dpkg -i libtinfo6_6.1+20181013-2_amd64.deb
root# dpkg -i libpam0g_1.3.1-5_amd64.deb
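For completeness, the checksum step mentioned above, which I ran before those two dpkg commands, is just sha256sum on the downloaded files; compare its output against the fingerprints on the package pages:

root# sha256sum libtinfo6_6.1+20181013-2_amd64.deb libpam0g_1.3.1-5_amd64.deb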

Those commands succeeded, and I confirmed that the packages were now installed:

root# apt-get install libtinfo6
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libtinfo6 is already the newest version (6.1+20181013-2).
libtinfo6 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 1325 not upgraded.
root# apt-get install libpam0g
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libpam0g is already the newest version (1.3.1-5).
0 upgraded, 0 newly installed, 0 to remove and 1325 not upgraded.
root# 

Now the box was in working order again, and I could finish the dist-upgrade:

root# apt-get dist-upgrade
[...zillions of lines of package names omitted...]
1325 upgraded, 390 newly installed, 19 to remove and 0 not upgraded.
Need to get 41.7 MB/1,155 MB of archives.
After this operation, 1,054 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
[...zillions of lines of success omitted...]

Portrait of Elizabeth Warren.

I’ve been “All In For Warren” for a while now. I expect a lot more people to join us after tonight’s debate :-), but just in case you’re still on the fence, here are four brief arguments Why Warren:

  • She’s making the other Democratic candidates better. She’s offering so much vision that the others are picking it up. The longer she stays in the race, the better the eventual nominee will be. (I think it will be her anyway, so this item is more of an insurance-policy argument.)
  • She has the right enemies. Seriously, ask yourself: can you name one enemy of Joe Biden’s? No, you can’t. When Joe Biden walks into a room, his goal is for everyone in that room to like him. That is not what we need in our next President. Elizabeth Warren has the enemies you’d hope she would have.
  • She understands what is needed, and she’s proposing to actually do it. Most candidates understand what is needed, but they don’t dare propose to actually do it, because they can’t afford to scare off the big-dollar donors. Elizabeth Warren decided not to pursue big-dollar donors from the beginning. That’s freed her up to offer up a spot-on diagnosis of how scaled-up capitalism has captured the state and made its values the state’s values, and she’s saying what needs to be done about that. She doesn’t mind offending the people who pushed us into unsustainable inequalities of wealth, power, and dignity.
  • If she’s campaigning for President, then we’ll probably have a better Senate too. The best presidential campaigns have coattails. Elizabeth Warren’s will be particularly long, because she’s offering so much for other candidates to grab on to.

Want to help? Come on in, the water’s fine!

Visual demonstration of Simpson's Paradox (adapted from https://en.wikipedia.org/wiki/File:Simpson%27s_paradox_continuous.svg)

Do any news organizations have a Numeracy Editor?

For fifteen years, the New York Times had a Public Editor, whose job was to visibly uphold journalistic ethics. The Public Editor would publicly discuss errors, biases, or gaps in the paper’s coverage. (Some other news organizations continue to have a public editor position, though I think it’s not widespread.)

I’d like to propose something narrower: a Numeracy Editor. The Numeracy Editor’s job would be to help reporters and columnists use numerical and statistical reasoning well.

I’ve been pondering this idea for a while, and finally decided to write about it after reading Vatsal G. Thakkar’s excellent NYT Op-Ed Bring Back the Stick Shift a couple of weeks ago. It’s a good piece, but at one point it veers into an unexpected non sequitur in an attempt to use statistics to support its argument:

Backup cameras, mandatory on all new cars as of last year, are intended to prevent accidents. Between 2008 and 2011, the percentage of new cars sold with backup cameras doubled, but the backup fatality rate declined by less than a third while backup injuries dropped only 8 percent.

The more you read that, the less it means. For a three-year period, the percentage of new cars sold with backup cameras doubled from whatever it was before — without knowing what it was before, this doesn’t tell us anything: the result of doubling a very minuscule percentage would still be a minuscule percentage, for example. Meanwhile, during that same three-year period, fatalities due to backups declined by some amount (less than a third) from whatever the rate was before — again, we don’t know. So does that decline represent a greater decline in backup fatalities than should be expected from whatever percentage of cars on the road newly have backup cameras? Or a smaller decline? There is no way to say. Also, we don’t know what percentage of the cars on the road are new cars, which is highly relevant here.

If the author was trying to say that fatalities should have declined more, this paragraph does not support that case, but it doesn’t support any other case either. It throws some statistics into the air, as if to see how the wind catches them, but they don’t connect to each other and they have no bearing on the question at hand. As my friend Tom put it, it’s just a “number casserole”.

I certainly don’t mean to pick on Thakkar — again, I liked the piece — or on the New York Times. This sort of thing happens in many publications; you can see it all the time, in the regular reporting just as much as in opinion editorials.

But given that this was the New York Times Op-Ed page — a forum that presumably takes quality control and editorial standards seriously — it’s worth asking: how did such a problematic paragraph make it through the filters? I think the answer is that there is no editor whose reputation and self-respect are on the line when numerical clunkers slip through. A few grammatical or spelling errors and someone’s job is in danger, but even glaring errors of statistical reasoning are currently costless.

I get that journalists and their editors tend to have backgrounds in language, political science, history, and other fields that don’t emphasize math. And that’s fine: this isn’t an “everyone should learn more math” argument. There are only a finite number of days in anyone’s life, there isn’t time to learn everything, and people make the choices they make for reasons. That’s exactly why a Numeracy Editor is needed: it would be her job to own this problem, and along the way help journalists learn the math they need. The writers would start to be more careful just knowing that someone is watching. A Numeracy Editor would have caught the problem in that Op-Ed right away, and once spotted, it’s easy to explain; the conversation with the author can take place before publication, as with any other kind of editing. Many errors of numerical or statistical reasoning are easy to understand once they’re pointed out (although there are also subtler cases, such as Simpson’s Paradox, that occur in real-life, policy-relevant situations and need to be watched for).

Unlike Public Editor, Numeracy Editor need not be a public-facing role. The main point is to help writers and other editors use math appropriately and to prevent mistakes. If the editor also wants to conduct a public discussion about using numbers and graphs in journalism, that would be a great public service too, but it’s a bonus. The role could do a lot of good purely behind the scenes.

Numeracy Editor should be an easier position to hire for than the broader role of Public Editor has been, because it doesn’t require nearly as much journalistic experience (the Numeracy Editor isn’t making hard judgement calls about how much anonymous sourcing is acceptable in a story, for example) and because the advice it provides would be less controversial.

Anyway, I don’t run a newspaper; all I have is this blog. I’d love to hear from anyone who works in or near journalism what they think of this idea.

(You can respond in a comment, or in this Twitter thread, or in this Identi.ca thread.)

I guess I’ll just write this as though I have reason to believe that the people who write headlines for the New York Times read my blog.

For the record: I’m a subscriber, and I think the Times does some terrific reporting and investigative journalism — when they’re at their best, there’s no one better. That makes the unforced errors all the more disappointing.

Look at the top of today’s edition’s front page:

Top of New York Times front page for 2018/10/03.

First note the caption beneath the big color photo on the left, which says:

A migrant caravan headed north Monday from Tapachula, Mexico, where members had stopped after crossing in from Guatemala.

Now all the way over on the right, note the bold headline at the top of the rightmost column:

Trump Escalates Use of Migrants As Election Ploy

Issuing Dark Warnings

Stoking Voters’ Anxiety With Baseless Tale of Ominous Caravan

If you take the headline at face value, and then look over at the photo, you would naturally come to the conclusion that the New York Times is contradicting itself on its own front page.

It turns out that the article under the headline is indeed about a baseless tale — just not one about the existence of the caravan itself, even though that’s what the headline would imply to any casual reader:

President Trump on Monday sharply intensified a Republican campaign to frame the midterm elections as a battle over immigration and race, issuing a dark and factually baseless warning that “unknown Middle Easterners” were marching toward the American border with Mexico.

[emphasis mine]

In twenty words of headline, there wasn’t some way to fit in something specific about the false claim?

How about this:

Trump Falsely Implies Terrorism Threat From Caravan

“Unknown Middle Easterners”

Stoking Voters’ Anxiety With Baseless Claim About Migrant Caravan

There, did it in 19 words, one fewer than the number they used for a misleading and less informative headline.

Yes, by the way, you know and I know and the New York Times knows that “Middle Easterner” doesn’t mean “terrorist”. But it’s perfectly clear what Trump is doing here and the NYT shouldn’t shy away from describing it accurately… in the headline.

(Entirely separately from the above, there’s the question of why the New York Times is running a giant color photograph of the migrants above the fold on its front page, for the second time in the past few days. These caravans have been going on since 2010; they’re larger and more organized the last couple of years, but they’re not new. As an independent news outlet, why let a politician’s talking points drive cover art choices in the first place?)

Self-censored page of 'Green Illusions', by Ozzie Zehner
image credit

A particularly insidious problem with online social media platforms is biased and overly-restrictive ban patterns. When enough people report someone as violating the site’s Terms Of Service, the site will usually accept the reports at face value, because there simply isn’t time to evaluate all of the source materials and apply sophisticated yet consistent judgement.

No matter how large the company, even if it’s Facebook, there will simply never be enough staff to evaluate ban requests well. The whole way these companies are profitable is by maintaining low staff-to-user ratios. If policing user-contributed content requires essentially arbitrary increases in staff size, that’s a losing proposition, and the companies understandably aren’t going to go there.

One possible solution is for the companies to make better use of the resource that does increase in proportion to user base — namely, users!

When user B reports user Q as violating the site’s ToS, what if the site’s next step were to randomly select one or more other users (who have also seen the same material user B saw) to sanity-check the request? User B doesn’t get to choose who they are, and user B would be anonymous to them — the others wouldn’t know who made the ban request, only what the basis for the request is, that is, what user B claimed about user Q. The site would also put their actual Terms of Service conveniently in front of the checkers, to make the process as easy as possible.

Now, some percentage of the checkers would ignore the request and just not bother. That’s okay, though: if that percentage is high, that tells you something right there. If user Q is really violating the site’s ToS in some offensive way, there ought to be at least a few other people besides user B who think so, and some of them would respond when asked and support B’s claim. The converse case, in which user Q is perhaps controversial but is not violating the ToS, does not necessarily need to be symmetrically addressed here because the default is not to ban: freedom of speech implies a bias toward permitting speech when the case for suppressing it is not convincing. However, in practice, if Q is controversial in that way then some of the checkers would be motivated to respond because they realize the situation and want to preserve Q’s ability to speak.

The system scales very naturally. If there aren’t enough other people who have read Q’s post available to confirm or reject the ban, then it is also not very urgent to evaluate the ban in the first place — not many people are seeing the material anyway. ToS violations matter most when they are being widely circulated, and that’s exactly when there will be lots of users available to confirm them.

If user B issues too many ban requests that are not supported by a majority of randomly-selected peers, then the site could gradually downgrade the priority of user B’s ban requests generally. In other words, a site can use crowd-sourced checking both to evaluate a specific ban request and to generally sort people who request bans in terms of their reliability. The best scores would belong to those who are conservative about reporting and who only do so when (say) they see an actual threat of violence or some other unambiguous violation of the ToS. The worst scores would belong to those who issue ban requests against any speech they don’t like. Users don’t necessarily need to be told what their score is; only the site needs to know that.
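Just to make the mechanics concrete, here is a minimal sketch, in Python, of how the crowd-check and the reliability score might fit together. It is purely illustrative: the class and method names, the five-checker sample size, the simple majority rule, and the ask() hook (however the site actually puts the question and the ToS in front of a checker) are all invented for this post.

import random
from collections import defaultdict

class BanRequestTriage:
    """Sketch of crowd-checked ban requests plus reporter-reliability scoring."""

    def __init__(self, num_checkers=5):
        self.num_checkers = num_checkers
        # reporter id -> [requests upheld by peers, requests made]
        self.history = defaultdict(lambda: [0, 0])

    def reliability(self, reporter):
        upheld, total = self.history[reporter]
        return upheld / total if total else 1.0  # unknown reporters start fully trusted

    def triage(self, reporter, claim, readers, ask):
        """Sanity-check one ban request.

        `readers` are users who have already seen the same material (a real
        system would exclude the reporter and the reported user); checkers
        see only the claim, never who reported it.  `ask(checker, claim)`
        returns True, False, or None for "didn't bother to respond".
        """
        checkers = random.sample(readers, min(self.num_checkers, len(readers)))
        votes = [ask(checker, claim) for checker in checkers]
        # The default is not to ban: silence counts the same as "no".
        upheld = sum(1 for v in votes if v) > len(votes) / 2 if votes else False

        self.history[reporter][1] += 1
        if upheld:
            self.history[reporter][0] += 1

        # Unreliable reporters get deprioritized, not silenced.
        return upheld, self.reliability(reporter)

# Toy usage: every randomly chosen checker ignores the request.
triage = BanRequestTriage()
print(triage.triage("user_B", "user Q threatened violence in post 123",
                    readers=["u1", "u2", "u3", "u4", "u5", "u6"],
                    ask=lambda checker, claim: None))   # -> (False, 0.0)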

(Of course, this whole mechanism depends on surveillance — on centralized tracking of who reads what. But let’s face it, that ship sailed long ago. While personally I’m not on Facebook, for that reason among many, lots of other people are. If they’re going to be surveilled, they should at least get some of the benefits!)

Perhaps users who consistently issue illegitimate ban requests should eventually be blocked from issuing further ban requests at all. This does not censor them nor interfere with their access to the site. They can still read and post all they want. The site is just informing them that the site doesn’t trust their judgement anymore when it comes to ban requests.

The main thing is (as I’ve written elsewhere) that right now there’s no cost for issuing unjustified ban requests. Users can do it as often as they want. For anyone seeking to no-platform someone else, it’s all upside and no downside. What is needed is to introduce some downside risk for attempts to silence.

Other ideas:

  • A site should look more skeptically at further ban requests against material that has already survived a rejected ban request, or that has been reported by someone with a poor ban-reliability score, because there would be a higher likelihood that those requests are also unjustified.

  • A lifetime (or per-year) limit on how many ban requests someone can issue.

  • Make ban requests publicly visible by default, with opt-out anonymity (that is, opt-in to be identified) for the requester.

Do you have other (hopefully better) ideas? I’d love to hear them in the comments.

If you think over-eager banning isn’t a real problem yet, remember that we have incomplete information as to how bad the problem actually is (though there is some evidence out there). By definition, you mostly don’t know what material you’ve been prevented from seeing.