Parsing PRISM: Gen. Keith Alexander did not claim “dozens of attacks” were prevented.

June 14th, 2013

Over and over we’ve read that Gen. Keith Alexander, the head of the NSA, claimed that its massive surveillance program has prevented “dozens” of terrorist attacks. Journalists are careful to report this claim as simply what Alexander said, not as a fact itself — we’re responsible journalists, far too wise in the ways of the world to believe something just because someone in the Administration said it! We know better than that.

Except that he didn’t say it. At least as far as I can tell — if anyone knows of a source for the claim other than the below, please let me know. So far, the only source I’m aware of is the exchange with Sen. Patrick Leahy referred to here.

What Gen. Alexander said was subtly but significantly different, and he’s probably not surprised to see it being misinterpreted in the NSA’s favor right now. We shouldn’t look to the NSA for a correction on this, but do note that Alexander was careful not to lie. No doubt he would lie, if he had to, but this time we did the work for him.

(Not to take undue credit: this discrepancy was pointed out to me by a friend who prefers to remain unattributed. Later a mutual friend pointed us to this post, which has the quotes and the analysis and the video link. I’m really just repeating what that post has already pointed out.)

First of all, Gen. Alexander never said “dozens of attacks”. The dozens he referred to were dozens of call records that contributed to the discovery or disruption of… something, something he calls “events” (apparently elsewhere he’s only talked about two actual attacks disrupted; I don’t have the source for that, but if you do please leave it in the comments).

Watch how this works:

Gen. Keith Alexander: “…it’s dozens of terrorist events that these have helped prevent.”

Sen. Patrick Leahy: “OK, so dozens? Now we collect millions and millions and millions of records through 215, but dozens of them have proved crucial, critical, is that right?”

Gen. Keith Alexander: “For both here and abroad, in disrupting or contributing to the disruption of terrorist attacks.”

Sen. Patrick Leahy: “Out of those millions, dozens have been critical?”

Gen. Keith Alexander: “That’s correct.”

Fascinating. He didn’t say “dozens of attacks”. He does, at first, after a long and clearly thoughtful pause (see the video below), say “dozens of events” once. What’s an “event”? If you disrupt a terrorist meeting, that’s an event. If you disrupt a terrorist eating dinner, is that an event? Maybe. I don’t know. But I do know that when someone in national security wants to defend their work, they use the word “attacks”. Attacks are what matter. When they use the much weaker word “events”, it is not an accident — it is because the stronger word is not available.

Sen. Leahy then gives him the opening to subtly switch the subject to the call records, rather than the events or attacks or whatever they are. Whether Leahy did that by accident or not I don’t know either. But Alexander gratefully takes Leahy’s pivot, to the extent of avoiding even having an explicit subject in his next two sentences — he just grabs Leahy’s antecedent like a life raft and rides it the rest of the way.

He never said dozens of attacks. He very carefully did not say dozens of attacks.

Satisfied that he didn’t say dozens of attacks?

Now let’s look at some headlines:

NSA: ‘Dozens of attacks’ prevented by snooping (The Register)

NSA chief: Surveillance has stopped dozens of potential attacks (Chicago Tribune)

NSA head: Surveillance program prevented dozens of terrorist attacks (Salon)

Alexander: Phone Collection Has Prevented ‘Dozens’ of Attacks (Democracy Now)

And just today I saw it in the New York Times too:

In a robust defense of the phone program, General Alexander said that it had been critical in helping to prevent “dozens of terrorist attacks” both in the United States and abroad…

Current score:

Experienced Washington NSA directors:   1  
Experienced Washington journalists:   0  

Here’s that video:

PRISM: The Problem with Collect-Then-Select.

June 12th, 2013

[Note: This post now uses the phrase "collect-then-select", instead of "collect-then-analyze", which wasn't quite as accurate. Other than that, and adding the references at the end, I've made no changes. There is a redirection in place from the old URL.]

One notion that keeps surfacing in the ongoing PRISM leak is that intelligence services have started collecting vast amounts of data just to store for potential later use under a specific warrant. In other words, they want to have it all easily at hand for when they’re actually investigating someone and need to discover that person’s contacts, social network, travel patterns, consumer habits, etc.

For the actual investigation, so the claim goes, they’ll obtain warrants as needed, even if the initial collection was unwarranted — in other words, the collection phase can skate by without a warrant, because even though they have the data they haven’t actually looked at it yet, so no one’s rights are being violated. Then later when they do look at it, they make sure they have a warrant.

This sounds sane, or at least like a good-faith attempt to abide by some kind of legal framework while still getting the job done… until you think about it:

A low-level systems administrator just leaked thousands of top-secret documents. How can they guarantee that your data is safe, even if it’s supposedly just being stored and not analyzed?

This point is understandably hard for intelligence services to acknowledge. No one wants to think about their system’s failure modes. But if you’re collecting and storing private data about millions of citizens, failure modes become not merely important, but a dominant consideration.

Legal protections are designed with failure modes in mind. We cannot guarantee that our systems operate as designed; we can at best hope. This is why “collect then select” is a problem. It’s not because the data is hurting anyone by sitting idly in a storage facility, unexamined by humans or machines. It’s because you can’t be sure it’s really idle. If a conscience-stricken 29-year-old can leak thousands of top-secret documents to a journalist, a more mercenary employee — or perhaps just one whose family is being threatened by some very interested party — can access your data and make it available to someone else. This risk is inherent in the centralized collection and storage of the data. By collecting it, the intelligence services have created another route of vulnerability for private information about you. I’m sure they’re doing their best to protect it, but in the long run, their best probably won’t be enough.

Anyway, as Moxie Marlinspike eloquently argues, we should all have something to hide.

References:

I’ve seen the “collect-then-select” notion described in many places. The three I was able to dig up after the fact are all from the New York Times:

Disclosures on N.S.A. Surveillance Put Awkward Light on Previous Denials:

“Right now we have a situation where the executive branch is getting a billion records a day, and we’re told they will not query that data except pursuant to very clear standards,” Mr. Sherman said. “But we don’t have the courts making sure that those standards are always followed.”

N.S.A. Chief Says Phone Record Logs Halted Terror Threats:

Analysts can look at the domestic calling data only if there is a reason to suspect it is “actually related to Al Qaeda or to Iran,” she said, adding: “The vast majority of the records in the database are never accessed and are deleted after a period of five years. To look at or use the content of a call, a court warrant must be obtained.”

ACLU Files Lawsuit Seeking to Stop the Collection of Domestic Phone Logs:

Timothy Edgar, a former civil liberties official on intelligence matters in the Bush and Obama administrations who worked on building safeguards into the phone log program, said the notion underlying the limits was that people’s privacy is not invaded by having their records collected, but only when a human examines them.

That same article goes on to make another important point about why collect-then-select is problematic:

Moreover, while use of the database is now limited to terrorism, history has shown that new government powers granted for one purpose often end up applied to others. An expanded search warrant authority justified by the Sept. 11 attacks, for example, was used far more often in routine investigations like suspected drug, fraud and tax offenses.

Epic botch of the PRISM story.

June 11th, 2013

[Update 2013-06-13: See Rick Perlstein's piece about this on The Nation's blog. Glenn Greenwald later responded here.]

Mark Jaquith’s post The PRISM Details Matter is spot-on. Glenn Greenwald has misunderstood a key technical fact, one that removes the most explosive charge in the whole scoop. And for some reason, Greenwald refuses to correct it.

The crucial question is:

Are online service companies giving the government fully automated access to their data, without any opportunity for review or intervention by company lawyers?

Greenwald essentially says yes, they are. Yet nothing leaked so far indicates that this is the case, and the companies all vehemently deny it. They say they have humans in the chain. The information leaked so far supports this claim or is at least consistent with it.

It looks like Greenwald & co simply misunderstood an NSA slide, most likely because they don’t have the technical background to know that “servers” is a generic word and doesn’t necessarily mean the same thing as “the main servers on which a company’s customer-facing services run”. The “servers” mentioned in the slide are just lockboxes used for secure data transfer. They have nothing to do with the process of deciding which requests to comply with — they’re just a means of securely & efficiently delivering information once a company has decided to do so.

As Jaquith emphasizes, this is not merely a pedantic point. This is central to the story, and as far as I can tell, Greenwald continues to misunderstand and thus misrepresent it. It’s an epic botch in an important story :-( .

An email I sent to some friends yesterday, about this exact same point:

  From: Karl Fogel
  To: <undisclosed recipients>
  Subject: Re: Cowards | Uncrunched
  Date: Mon, 10 Jun 2013 14:18:57 -0500
  
  One of the above wrote:
  >Since the topic has taken over part of my morning, thought I'd share: 
  >http://uncrunched.com/2013/06/07/cowards/ 
  
  I read this post when it came out, yeah.  I think it's mostly wrong.
  
  What is described here is just a delivery mechanism.  *If* you're a
  company that's complying with government requests for data (and not all
  requests are abusive or unreasonable) a lockbox is a perfectly sensible
  way to do it.
  
  Sure, the lockbox may run on a server that belongs to the company, but
  this is not the same as -- indeed, is *totally unrelated to* -- giving
  the government direct access to your servers, the servers that are
  related to the actual service you provide as part of your business,
  which is how far too many bloggers are portraying it.
  
  Grrrrr.
  
  Uncrunched quotes Claire Cain Miller approvingly:
  
    "While handing over data in response to a legitimate FISA request is a
    legal requirement, making it easier for the government to get the
    information is not."
  
  What?  This makes no sense.  The lockbox may or may not make it easier
  for the government, but it sure makes it easier for the *company* to
  securely hand over data while lowering the risk of some unauthorized
  third party gaining access.  If you're going to comply, might as well do
  it responsibly and without increasing the compliance burden on yourself.
  What the hell are the companies supposed to do?  Put the data on a
  CD-ROM and mail it to Fort Meade?
  
  It's not like there aren't legitimate things to complain about here.  I
  don't understand why Uncrunched is wasting time with non-problems.

Privacy Promises and Client-Side Betrayal.

June 9th, 2013

Message on phone screen promises to self destruct.  Really.

Some apps are making an impossible promise, one that these days might really matter to people. The promise is this:

“You can control the copies of data you send to other people.”

You can’t. It’s not even possible in principle. If an app promises that you can send people email messages, photos, audio recordings, or videos that will “self-destruct” or “can only be viewed for a limited time controlled by you” or “can only be viewed by people you approve”, just smile and back away slowly.

These promises all depend on client-side betrayal. That is, they depend on a device obeying commands from someone other than its owner. For example, a smartphone that does what the phone manufacturer — or mobile carrier — wants it to do, instead of what its owner asks.

Now, it’s true that some devices actually do practice client-side betrayal. This is one of the reasons I don’t own a Kindle.

But when you send someone a message or a photo, how can you be sure they’re using a device that will betray them? You can’t. You can’t count on their device serving your goals instead of theirs.

You might think apps only make such promises for platforms where they can count on the recipient’s device being of the betraying sort. But they can never know for sure that the necessary betrayal will occur as required. To start with, many of them run on Android devices, after all. The Android operating system is open source (admittedly on a very long release cycle, with various manufacturer-specific proprietary divergences along the way, but still, in the long run it is open source). A sufficiently motivated user could modify their Android device such that every frame written to the screen or every sound written to the speakers is recorded to the SD storage card. That video may have self-destructed as advertised, but that doesn’t really matter if there’s a perfectly good copy still sitting in permanent storage. Heck, while we’re at it, why even believe the deletion happened at all? An Android user could modify the OS-level delete system call to not actually delete, but rather move the file over to an easily accessible first-in-last-out garbage collection area from which recent items can be rescued if the user decides they’re interesting enough. (This is not so different from how most OS delete calls work already. Feeling safer yet?)
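To make the "delete that doesn't delete" idea concrete, here is a minimal sketch in Python. It is not how Android or any real operating system implements deletion; the class name `RetainingFileSystem` and the "stash" directory are hypothetical, invented purely for illustration. The point is only that whoever controls the layer underneath the app decides what "delete" means.

```python
import os
import shutil
import tempfile


class RetainingFileSystem:
    """Toy stand-in for an OS whose owner has redefined 'delete'.

    Hypothetical illustration only: no real OS exposes this class.
    """

    def __init__(self, stash_dir):
        self.stash_dir = stash_dir
        os.makedirs(stash_dir, exist_ok=True)
        self._counter = 0

    def delete(self, path):
        """Looks like a deletion to the caller, but keeps the file."""
        self._counter += 1
        kept_name = f"{self._counter:06d}-{os.path.basename(path)}"
        kept_path = os.path.join(self.stash_dir, kept_name)
        shutil.move(path, kept_path)  # moved aside, not destroyed
        return kept_path


# A "self-destructing" photo arrives and the app asks for it to be deleted.
workdir = tempfile.mkdtemp()
fs = RetainingFileSystem(os.path.join(workdir, "stash"))
photo = os.path.join(workdir, "snap.jpg")
with open(photo, "wb") as f:
    f.write(b"not really a jpeg")

kept_path = fs.delete(photo)  # what the app believes is a delete
gone_for_app = not os.path.exists(photo)
still_recoverable = os.path.exists(kept_path)
print(gone_for_app, still_recoverable)
```

From the app's point of view the file is gone, since the original path no longer exists. From the device owner's point of view nothing was lost.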

Everything I said about Android above applies to any open source operating system: GNU/Linux, Firefox Mobile OS, all of them.

If the user controls her operating system, then she controls her data, period. And open source means you do control your operating system. You don’t necessarily have to know how to program, you just have to know how to hire people who can program — just as you don’t have to know how internal combustion engines work to hire a mechanic to fix your car.

Open source means no client-side betrayal, at least for people who care enough to avoid it.

Anyway, even without open source operating systems, the recipient can still save it old school: just point another camera-enabled device at the screen and take a picture.

Who’s depending on client-side betrayal?

Full disclaimer: I first started noticing this trend through my work with OpenITP, but the opinions here are entirely my own. The users and developers OpenITP works with are people who need to be able to take privacy promises seriously; as a result, I’ve become more sensitive to those promises than I used to be. When someone makes a promise that conflicts with my technical understanding of how the digital world works, I start asking questions.

Here is a partial list of apps that have caused me to ask questions lately (emphasis added):

surespot:

“You can delete your message from the receivers phone.”

“Be confident sending private information and pictures. You have control over your messages, when you delete a sent message it will be removed from the receivers phone and images are not sharable unless you make them so.”

Priv.ly:

“Privly makes it possible for you to control your data after posting it across the internet. You can post to Facebook without allowing Facebook access to your communications, you can even unsend emails…”

Snapchat (from their blog):

“Deleting Snaps From the Recipient’s Device”

“After a snap has been opened, the temporary copy of it is deleted from the device’s storage. We try to make this happen immediately, sometimes it might take a minute or two. The files are deleted by sending a ‘delete’ instruction to the phone’s file system. This is the normal way that things are usually deleted on computers and phones — we don’t do anything special (like ‘wiping’).

Extra Details

While an unopened snap is being stored on the device, it’s not impossible to circumvent the Snapchat app and access the files directly. This isn’t something we support or encourage and in most cases it would involve jailbreaking or ‘rooting’ the phone and voiding its warranty. If you’re trying to save a snap, it would be easier (and safer) to just take a screenshot or take a picture with another camera.

Also, if you’ve ever tried to recover lost data after accidentally deleting a drive or maybe watched an episode of CSI, you might know that with the right forensic tools, it’s sometimes possible to retrieve data after it has been deleted. So… you know… keep that in mind before putting any state secrets in your selfies :)”

Wickr:

“sender-based control over who can read messages, where and for how long”

“[Wickr can] send text messages, videos, documents that self-destruct — all encrypted, and it exceeds NSA top-level encryption on the device before it goes out on network with a key that only you have.” (Founder Nico Sell quoted in Silicon Beat.)

Hush Box:

“Secure, self-destructing email.”

(I don’t know anything more about this one; there’s no further explanation on the page, beyond the above.)

Confusing the Threat Models

What these promises have in common is that they confuse two very different threat models. One is the scenario where you’re communicating with an ally — someone who, as far as you know, has no intent to do you harm, though they could do so by accidentally re-sharing something. The other is the scenario where you’re communicating with a stranger or with someone who might actively intend you harm.

The “Extra Details” section in the Snapchat marketing quote above is a particularly good example of this confusion. Somewhere between the first and second sentences, they subtly switch whom they’re addressing. The first sentence is clearly addressed to the sender:

“While an unopened snap is being stored on the device, it’s not impossible to circumvent the Snapchat app and access the files directly.”

Then suddenly they switch to talk to the recipient…

“This isn’t something we support or encourage and in most cases it would involve jailbreaking or ‘rooting’ the phone and voiding its warranty. If you’re trying to save a snap, it would be easier (and safer) to just take a screenshot or take a picture with another camera.”

…then just as suddenly they switch back, still without any explicit acknowledgement that they’re talking to two different parties with possibly different interests:

“Also, if you’ve ever tried to recover lost data after accidentally deleting a drive or maybe watched an episode of CSI, you might know that with the right forensic tools, it’s sometimes possible to retrieve data after it has been deleted. So… you know… keep that in mind before putting any state secrets in your selfies :)”

That middle portion, where they talk to the recipient, could be translated to “For our sake, please don’t interfere with the process of your device betraying you.” Recipients in the first threat model will cooperate; those in the second threat model won’t.

Ultimately, these apps aren’t really about security and privacy. They’re about convenience in situations where dependable privacy isn’t a requirement. An app that deletes a photo after showing it to your friend for six seconds is just a convenience for everyone involved. It makes things easier for both parties: Hah hah, look at this picture of me stuffing a hundred dollar bill into his sock while he pours a margarita down my throat! Wasn’t it a great vacation? Wish you were there. No need to worry about deleting this photo; it’ll take care of itself. See you Tuesday at the office.

That’s fine, if that’s all you wanted. But the vast majority of people who read the marketing around these apps will take them at their word. Wow, I can send an email that self destructs immediately after the recipient is done reading it? That’s great! It’s like attorney-client privilege without the expensive law degree. Where do I sign up?

People don’t think about threat models. They think about features and promises. If the app says it does X, they believe it does X.

The trouble is, problem recipients are not evenly distributed across all the pictures and emails and videos one sends. The problem recipients are concentrated in the sensitive items, because the temptation to be a problem recipient is highest exactly for the things a sender would most want deleted. Or as the great saying has it (attributed to both George Orwell and Paul Fussell): “What someone doesn’t want you to publish is journalism; all else is publicity.”

Most people who come to my front door are honest, but the lock on the door is not for them. Promises of client-side cooperation are pointless, from the sender’s point of view, if they are most likely to be circumvented by those most tempted to harm the sender in the first place.

One more example, just to drive the point home.

Some mass email services — “mass email” is not a euphemism for spam, by the way; these services are tremendously useful, helping legitimate organizations run their announcement email lists, etc — promise to tell you how many recipients have opened the email you sent.

“Whuh-huuuh-whaaat??” I thought to myself, when I first heard about this. How on earth can anyone else know whether I’ve opened up an email, let alone whether I’ve read it? My mailreading software does not send signals to third parties when I open messages. That would be an incredible betrayal of my trust.

And yet there are these services, promising exactly that. Here’s Mailchimp (the quotes below are from their Features page and their Reports Overview page):

MailChimp’s free reports tell you who’s opening, clicking, and coming back for more…

Our interactive graphs show you how many emails were delivered, how many people opened your email

Opens by Location: See where in the world your subscribers are located and track engagement by country.

A/B Split Testing People who A/B test their email campaigns get 11% better open rates and 17% more clicks…

It turns out — surprise! — that they’re depending on client-side betrayal, of course.

These days, most people are reading mail in their browser, using one of the online services like Hotmail, etc, or in some other network-enabled email client. And when those email clients get an email that includes an image, they will (in some cases) display images by default — even if the image content isn’t embedded inside the email, but rather is merely linked to from the mail and has to be fetched (at the time the message is opened) from somewhere out on the Internet.

So what these services do is include a tiny image, just one pixel large and, if possible, the same color as the message’s background color. But they don’t include that pixel directly in the mail. Instead, they keep it on their own servers, at a URL unique to that particular message. When the recipient opens the message, their mailreader fetches all the images, even the tiny & invisible ones, and it is by receiving the request for the image at that unique URL that the upstream service knows the mail has been opened.
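The mechanism just described can be sketched in a few lines of Python. This is a toy model, not Mailchimp's (or anyone's) actual implementation: the `OpenTracker` class, the `tracker.example.com` host, and the simulated mail client are all assumptions made up for illustration. It shows the two halves of the trick: a per-message image URL, and a server that treats a fetch of that URL as proof of an "open".

```python
import re
import uuid

# Hypothetical host, for illustration only.
TRACKING_HOST = "https://tracker.example.com"


class OpenTracker:
    """Toy sender-side service that counts 'opens' via image fetches."""

    def __init__(self):
        self.token_to_recipient = {}  # unique token -> recipient address
        self.opened = set()           # recipients whose client fetched the pixel

    def html_body(self, recipient, text):
        """Build a body that links to (rather than embeds) a 1x1 image."""
        token = uuid.uuid4().hex      # unique to this one message
        self.token_to_recipient[token] = recipient
        url = f"{TRACKING_HOST}/pixel/{token}.gif"
        return f'<p>{text}</p><img src="{url}" width="1" height="1" alt="">'

    def handle_request(self, url):
        """Called when a pixel URL is fetched; the fetch itself is the signal."""
        token = url.rsplit("/", 1)[-1].split(".")[0]
        recipient = self.token_to_recipient.get(token)
        if recipient is not None:
            self.opened.add(recipient)
        return recipient


def simulate_client(tracker, body, load_remote_images):
    """A mail client that either fetches linked images or declines to."""
    if load_remote_images:
        for url in re.findall(r'src="([^"]+)"', body):
            tracker.handle_request(url)


tracker = OpenTracker()
simulate_client(tracker, tracker.html_body("alice@example.com", "Hi"), True)
simulate_client(tracker, tracker.html_body("bob@example.com", "Hi"), False)
print(sorted(tracker.opened))  # only the image-loading client is counted
```

Note that the second recipient may well have read the message. The service simply has no way to know, because that recipient's client didn't make the request.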

Now, to their credit, some mailreading services — Google’s Gmail is one — turn off image display by default, precisely to avoid this betrayal. You have to turn it on explicitly, and when you change that setting they warn you about the privacy implications.

Possibly Mailchimp talks about that somewhere, though I didn’t see it if so. The part that I saw just tells senders that Mailchimp can determine the “open rate” for the emails they send out. If Mailchimp has statistics on what percentage of email users set images to display by default — thus making themselves vulnerable to at least one kind of client-side betrayal — that would be very interesting; I don’t know if they do.

It turns out there’s a history here.

I showed an early draft of this post to my friend Jeff Ubois, and he instantly thought of who had covered this ground already. There’s a 2009 book by Viktor Mayer-Schönberger called Delete: The Virtue of Forgetting in the Digital Age; I haven’t read it, but the discussion with Jeff made me wish I had. There’s also an article by US Federal District Judge James Rosenbaum called “In Defense of the delete Key”, published in the law journal The Green Bag, about how the Delete key doesn’t actually delete (as I touched on earlier).

Client-side betrayal can have consequences both social and legal. Phrases Jeff tossed out include “inadvertent waiver rule” and “spoliation of evidence”. The sender’s request to delete may still be considered an expression of intent, and that can be legally useful under certain circumstances. Pity about that photo appearing on the front page of the New York Times, but at least it won’t be admissible in court because you tried to ensure the recipients would delete it. That’s some comfort, anyway.

So here’s my promise to you:

If you send me something in digital form, you cannot count on me — or my devices — deleting all my copies of it unless I explicitly tell you they’re gone. The copies I receive from you will continue to exist as long as I want them to, and whether I share them with others is entirely a matter of social conventions and of honor, not of technical enforcement from your side. You also can’t be certain that I have not read an email you sent; you can be certain I have read it if I say I have, or if I reply to it.

But really, these are just the same promises the rest of the Internet makes. If someone thinks otherwise, it means they’re depending on client-side betrayal by your devices — but it’s up to you, not them, whether that betrayal happens.

Or as my friend Jim Blandy puts it: “It’s my computer, damn it!”

Beyond App Contests: Playing the Long Game

May 28th, 2013

[I recently found this unpublished draft sitting in my CivicCommons Tumblr account.  I'm not sure why I didn't post it back in January 2012, when I apparently wrote it, but anyway here it is now!]

Recently, during a discussion about civic apps contests, Abhi Nemani asked two sharp questions:

  1. How do you get meaningful ideas from city hall to entrepreneurs?
  2. What can a city do instead of an apps contest?

In response I brainstormed a bit, deliberately keeping the filter turned somewhere between “low” and “off”:

  1. A city could maintain a portal tracking requests for particular data sets, and advertise that portal to potential app developers, with the idea that data set releases get prioritized when enough people are specifically requesting them.  (Don’t some cities already do this on their data set pages, actually?)

    One problem with this is that many app developers do not like to signal their intentions, both because they want to keep competitive advantage and because they don’t like feeling they’re promising to do something on spec — they want the data set available, but they don’t want to look like they’ve made a commitment about it.  So the city can’t depend on people saying exactly what they want to do with the data set; the city just has to be willing to consider generic requests for the data set as meaning something.

  2. Cities are sometimes too driven by a need to quantify results, and have those results be immediate.

    What if, instead, a city just held a regular, recurring hackathon event, say 2-4 times a year, at which city technologists and local entrepreneurial hackers met unconference-style and did whatever came to mind?  The city keeps records of who attends.  Then, as apps come out over the next few months/years/decades/whatever, the city figures out which apps are popular — if it can’t do that, we’re worse off than I thought — and compares the apps’ authorships with past hackathon attendees.  When there is overlap, ask a few questions to make sure there’s some causal relationship, and when there is, consider it a policy triumph and don’t be shy about putting out a press release!

Come to think of it, (2) expresses the high-level principle I was really aiming at:

Cities should actively create an environment that encourages entrepreneurial hacktivity, and then try to measure the results a while afterwards, with actual usage stats as the basic measure of success (rather than relying on artificially-selected judges who have neither the time nor the expertise to evaluate apps in real-life circumstances).

“Usage stats” doesn’t have to just mean number of unique users who use a certain app per month or something.  The city could do surveys too. (For example, an app like Square makes a big difference to people who aren’t direct users, because it enables commercial transactions where there is no permanent storefront — so the buyers, who don’t run Square, should still be counted as beneficiaries.  This can be hard to measure well; I’m not sure what a general answer would be, but listening to the “buzz” from different communities is going to be part of it.)

None of this is to condemn apps contests — they can do a lot to kick-start local entrepreneurial energy.  But apps contests don’t by themselves set up a long-term, sustainable environment for civic hacktivity, in part because they’re always in the position of guessing future successes rather than highlighting and learning from existing successes.  They’re a seed, but it would be a mistake to think of the resultant apps as the crop.  The crop is rather an environment, in which city data output interacts dynamically with the community of people using it as input.

Chicago CTO John Tolva’s post “Open Data in Chicago: progress and direction” touches on this, saying of the recent “Apps for Metro Chicago” contest:

The apps were fantastic, but the real output of A4MC was the community of urbanists and coders that came together to create them. In addition to participating in new form of civic engagement, these folks also form the basis of what could be several new “civic startups” (more on which below). At hackdays generously hosted by partners and social events organized around the competition, the community really crystalized — an invaluable asset for the city.

… The overarching answer is not about technology at all, but about culture-change. Open data and its analysis are the basis of our permission to interject the following questions into policy debate: How can we quantify the subject-matter underlying a given decision? How can we parse the vital signs of our city to guide our policymaking?

That’s the long game.  Holding apps contests is fine — but long-term and data-driven followup are what really make the difference.

Ectoplasmic Security Is Important Too.

May 3rd, 2013

Rebekah Brooks' Ex-Body Guard Charged In Phone Hacking Probe

Hmmm, wouldn’t that be a “Spirit Guard”? (Or perhaps “Zombie Guard”?)

Compound words and hyphenation… two great tastes that go great together.

ANVC Scalar looks interesting, but isn’t quite open source yet.

April 11th, 2013

ANVC Scalar looks very promising:

Scalar is a free, open source authoring and publishing platform that’s designed to make it easy for authors to write long-form, born-digital scholarship online. Scalar enables users to assemble media from multiple sources and juxtapose them with their own writing in a variety of ways, with minimal technical expertise required. …

The feature list and the showcase look great. If this tool is even half as good as it seems to be, the world will be a better place.

There’s just one problem: the code isn’t open source.

I couldn’t find the source code linked to from their site. [Update: I eventually found a link to it, when I re-trawled the site one last time after having already written most of this post. The link is from their sign-up page, but the license stated there, the ECL-2.0, is not the same license as actually found in their source code snapshot. See below for details.] There’s a contact form, which one could use to ask them for the source code, but hmm, that’s not how these things are usually done. The only message I could find about development was this:

Development Roadmap: Scalar is in ongoing development. This spring 2013 beta release provides broad public access to the platform via the Scalar servers (click the orange “sign up” button to get started.) While many authors have experimented with Scalar during our alpha phase, we are eager to roll the platform out to even more users. We look forward to hearing from you.

It’s perfectly fine to be planning to be open source and just not have gotten there yet, of course (though it’s usually a much better strategy to just be open from day one instead of waiting for everything to be perfect before going public under an open source license — the advantages of open source are greater the sooner in the development process you open up). But what the Scalar site says is that the software is open source: present tense. That’s only meaningful if there is source code released publicly under an open source license.

I finally resorted to Google: search://github anvcscalar/ (after github+scalar didn’t get useful results), and found their Github repository at github.com/anvc/scalar:

Congratulations on discovering Scalar, the next generation in media-rich, scholarly electronic publishing!

If you just want to create a Scalar project, the easiest route is to work from our servers. You can register and learn more at http://scalar.usc.edu/scalar/ . Using the version of Scalar that is hosted on our servers guarantees that you are working on the most up-to-date version of the software. During our beta phase, updates will continue to happen with some frequency as features are added, user feedback is incorporated and Scalar continues to broaden the horizons of electronic publishing. If you are technically inclined and decide to host your own version of Scalar, you’re free to customize and modify it in any way, but it’s up to you to download, install and troubleshoot updates as they become available.

We are also very grateful for all feedback based on your experiences using Scalar. We are especially interested to know where and how you are using it, innovative or unexpected uses of Scalar, requests for features, opportunities for future development, potential press, archive or scholarly society partnerships, as well as reports on any bugs or difficulties you may experience. Learn more at http://scalar.usc.edu/scalar/

So, is the open source code to Scalar really here? Well… sort of and sort of not. There is a complete source tree, and an “INSTALL.txt” file whose instructions look like they would get that version of Scalar up and running. But there’s only one significant code commit, from 11 days ago (March 30th): the initial import of a source code snapshot, under their own custom license that does not appear to be an open source / free software license as recognized by the OSI and the FSF. The restrictions placed by the license are not onerous, but I’m not sure the indemnification clause is compatible with the Open Source Definition, and clause 4 disallows anonymous and pseudonymous redistribution of modified versions, which I believe is also incompatible (as well as being a bad idea):

4. Any files that have been modified must carry notices stating the nature of the change and the names of those who changed them.

There are other potential problems with the license, but I won’t go into them here. The point is, this is not an open source license. So, stacking up the situation:

  • You can’t find the code from their site, at least not from the expected places. But if you’re persistent and use a search engine, you can find it.

  • They’re not doing development in the open. Instead, they’ve dumped one code snapshot out to the public, and it’s not clear at what intervals they will put out the next ones. There’s no public forum for development discussion, nor is there any public bug tracker. (Alternatively: maybe these things all exist and I just couldn’t find them?)

  • The license is not an open source license, though this appears to be more by accident than design — they clearly do intend to be open source. The best way would just be to use a recognized open source license, because even if they fixed the issues in theirs, people would still have to learn yet another custom one-off license. (There’s no need for them to spell out the trademark protections as they do because a copyright license does not imply trademark permissions anyway.)

It’s the point about not being under an open source license that makes them “officially” not yet open source. But the other points are important too. While you can be technically open source while doing development in a closed manner, why would you want to?

None of the above denigrates their technical achievement so far: the software is pretty exciting, and I hope these issues get resolved soon, because it would be great to have a tool like this available as open source software. I took the time to write about Scalar both because the project looks so interesting, and because their current situation (open-source-wise) is not uncommon. We often see projects using the words “open source” without quite getting the tune. Fortunately, it’s easy to fix if they want to.

How easy is a Morozov-style takedown of Evgeny Morozov?

April 2nd, 2013

I just read Evgeny Morozov’s critique of Tim O’Reilly in the Baffler. It misses its mark pretty widely. I know Tim, and have worked with him on some of the things mentioned in the piece, and I don’t recognize the man Morozov thinks he’s found.

The article is profoundly intellectually ungenerous. If there are N ways to interpret something, Morozov picks the one that most matches his thesis, whether or not that’s the interpretation that makes the most sense in context. He also indulges in guilt-by-association and guilt-by-superficial-similarity. So what that Eric Raymond likes guns? So what if some things Tim says are similar to some things Ayn Rand said? Does that mean Tim O’Reilly is any closer to being a libertarian, second-amendment-quoting Randian? (He’s not — far from it. I’m not either, but some of my friends are. I wonder what Morozov would say.)

I’m increasingly disenchanted by Morozov’s apparent belief that he is a more careful and rigorous thinker than, well, everyone. This piece contained a great example of why. Morozov starts out by quoting O’Reilly:

Expanding on this notion of “algorithmic regulation,” O’Reilly reveals his inner technocrat:

I remember having a conversation with Nancy Pelosi not long after Google did their Panda search update, and it was in the context of SOPA/PIPA. . . . [Pelosi] said, “Well, you know, we have to satisfy the interests of the technology industry and the movie industry.” And I thought, “No, you don’t. You have to get the right answer.” So that’s the reason I mentioned Google Panda search update, when they downgraded a lot of people who were building these content farms and putting low quality content in order to get pageviews and clicks in order to make money and not satisfy the users. And I thought, “Gosh, what if Google had said, yeah, yeah, we have to sit down with Demand Media and satisfy their concerns, we have to make sure that at least 30 percent of the search results are crappy so that their business model is preserved.” You wouldn’t do that. You’d say, “No, we have to get it right!” And I feel like, we don’t actually have a government that actually understands that it has to be building a better platform that starts to manage things like that with the best outcome for the real users. [loud applause]

Here O’Reilly dismisses the entertainment industry as just “wrong,” essentially comparing them to spammers. But what makes Google an appropriate model here? While it has obligations to its shareholders, Google doesn’t owe anything to the sites in its index. Congress was never meant to work this way. SOPA and PIPA were bad laws with too much overreach, but to claim that the entertainment industry has no legitimate grievances against piracy seems bizarre.

Now, wait a second. Tim was spot-on. He’s pointing out the big problem of representative democracy: the distortion inherent in the transactional, seat-at-the-table model, the distortion that comes from having interest groups with deep pockets. They become, effectively, first-order constituents even though they’re not citizens. Tim reminded the listener that the explicit purposes of copyright law do not include pleasing any particular corporate actors — industry can be a means to an end, in these laws, but it’s not supposed to be an end in itself. If government can achieve the stated ends without making Demand Media happy, then it is free to do so. Of course, that’s understandably difficult for politicians in practice… But Morozov’s refutation isn’t about implementation details. It’s about the philosophy, the underlying purpose:

While it has obligations to its shareholders, Google doesn’t owe anything to the sites in its index. Congress was never meant to work this way. SOPA and PIPA were bad laws with too much overreach, but to claim that the entertainment industry has no legitimate grievances against piracy seems bizarre.

Read that carefully: Morozov is saying that, while Google is only strictly speaking responsible to its shareholders, Congress’s responsibility includes satisfying… industrial/corporate constituents? Not merely as a means, but actually as a first-order end??

Uh? Can he really mean that?

I assume he doesn’t, and that rather he’s just not thinking very carefully. Earlier in the essay, Morozov seemed just fine with the idea of “disrupting someone’s business model”. I guess he’s just not in favor of Tim O’Reilly being in favor of it.

(See how easy a Morozov-style takedown of Morozov is?)

Tim bounces a lot of big ideas around. Anyone sincerely looking for something to criticize could find something useful to say (and in many cases Tim would appreciate it, and even change his mind). Yet when Morozov gets close to one of these things, he shies away from making an effective criticism, and instead opts to make Tim’s ideas look bad through shallow, associative analysis, without saying outright what would be a better idea. Morozov provides no constructive analysis; he wants someone to be wrong, but he doesn’t particularly care what’s right. This is just Andrew Breitbart for intellectuals.

Big ideas have porous boundaries, but that isn’t the same as being meaningless. A good critic recognizes the useful big ideas, and after puncturing them helps define their boundaries better, or else counters with other ideas: puts something on the line, actually comes out and says something capable of being refuted. Morozov never takes that second step. He plants seeds of doubt, but takes no responsibility for the crop that results.

(Update: O’Reilly himself seems to have had a similar reaction to mine.)

Rants.org RSS feed fixed.

March 16th, 2013

The RSS feed for this site is now fixed. Thanks, Gunnar.

Someone designed that on purpose?

March 15th, 2013

At the bottom of this otherwise good article is a little circle with the label “604 Kudos” next to it (the number will be different by the time you see it):

sashmackinnon.com/what-its-like-to-die

If you move your mouse pointer into that circle, without clicking, the circle reacts and the text changes for a second or so to “Don’t move”… then a moment later the picture is a little heart and the kudo count has been incremented by one. You’ve apparently kudo’d this article even though you didn’t click!

Step 1

Step 2

Look ma, no click!

Step WTF

WTF? That’s so obviously wrong that I am at a loss to explain how any web designer could possibly have thought it was okay. If there is one user interface contract every user knows, it’s that “If you didn’t click, you didn’t do it”. Even without that bizarre “Don’t move” imperative, it would have been a bad idea; with the imperative, it’s an intentional bad idea. What are we going to see next? Click-through EULAs that don’t actually wait for you to click, but claim your agreement because you hovered your mouse pointer in the wrong place?

This isn’t just a dark road. It’s a dark and silly road. Designers, resist please.
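
For the programmers in the audience, here is a rough sketch of the contract I mean, written as a toy Python model. I haven't looked at how the real widget is implemented, so everything here (the KudosButton class and its hover, leave, and click methods) is hypothetical and purely for illustration. The point is only that hovering can change how the button looks, while the count changes on an explicit click and nothing else.

```python
from dataclasses import dataclass


@dataclass
class KudosButton:
    """Hypothetical model of a kudos widget; not the actual site's code."""

    count: int = 0
    given: bool = False
    hovering: bool = False

    def hover(self) -> None:
        # Hovering may change the button's appearance,
        # but it must never commit the action.
        self.hovering = True

    def leave(self) -> None:
        self.hovering = False

    def click(self) -> None:
        # Only an explicit click records a kudo, and only once per visitor.
        if not self.given:
            self.given = True
            self.count += 1


button = KudosButton(count=604)
button.hover()
button.leave()
print(button.count)  # hovering alone leaves the count unchanged

button.click()
button.click()
print(button.count)  # one click adds exactly one kudo; repeats are ignored
```

In this model a hover is a display state and a click is a commitment, and the two never share a code path.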