I’m taking a vacation from email for a few weeks, so please don’t worry if you don’t get a response — just resend in late August.
About fifty million years ago, I encountered a minor bug in the OpenOffice word processor. It was an easy fix, a menu layout problem or something like that, so I thought I’d have a go at patching it. Of course, the first step would be to build the latest development version of the code and see if the bug was still present.
Well, I got stopped on that step. I spent an entire day trying to build OpenOffice, and didn’t succeed. I don’t think I even came close, though it was hard to tell. I eventually concluded that to be an OpenOffice developer, you’d need to first get a Ph.D. in building OpenOffice, and gave up in frustration. It brought home to me the importance of making software easy for developers to build — especially in open source software, where you depend on developers who bring their own energy and who will quickly take that energy elsewhere if it is not rewarded.
Years later, the OpenOffice project forked — well, the actual story is a bit more complicated than that, but basically today there is LibreOffice and Apache OpenOffice. Both are active open source projects, and it’s fair to think of LibreOffice as one of two equally legitimate inheritors of the old OpenOffice mantle in the sense of development continuity. (Do search://apache openoffice libreoffice “document foundation”/ for the detailed story.)
I happened to be talking to some of the LibreOffice developers recently, and related my build experience from years ago, and how it had turned me off from ever considering OpenOffice development again, and from even considering LibreOffice development after the fork happened. The whole thing had left me scarred: buildability was such an obvious non-priority then that I didn’t see how a project could possibly ever get from there to something a normal mortal might build in finite time.
Wait, it’s gotten better, they said.
I expressed skepticism, but they swore it was true. Really?, I said. Okay, I’ll start from the top of the LibreOffice.org home page and see if I can find my way to useable build instructions, right now, right here, while we’re on the phone.
And you know what? They were right!
$ sudo apt-get update
$ sudo apt-get build-dep libreoffice
$ git clone git://anongit.freedesktop.org/libreoffice/core libreoffice
$ cd libreoffice
$ ./autogen.sh
$ make dev-install
The whole thing built. Without errors. I had working libreoffice debug binaries in six easy, well-documented steps.
That was amazing — it changed my mind about how much a project can improve its build experience if the developers really decide to prioritize it. (Disclaimer: I haven’t tried the same with Apache OpenOffice; it might well be equally easy.)
They asked me if as penance I’d fix another minor bug, since I wasn’t able to fix that menu bug all those years ago, and offered bugs.launchpad.net/ubuntu/+source/libreoffice/+bug/1141106 as the victim. This seemed like a completely fair request; I didn’t make any promises but I said I’d take a look. Sadly, I have to admit that I’m not going to fix it any time soon, only due to other commitments. It’s not a hard fix in theory, but verifying that it works everywhere could take some back-and-forth with various bug reporters and testers, since it’s a modification to run-time shell scripts, and right now I need to ruthlessly cut down on small-scale random commitments.
So as an apology for not fixing that bug, I wrote this blog post. Kudos to the LibreOffice team for having given such a complex piece of software such an easy build process. Although by not fixing bug 1141106 I guess I’m contradicting my own claim, still, I think that being so conveniently buildable must be a major ingredient in getting developers in the door, and that this pays off for the project in the long run.
For those friends I’m not seeing in Portland this year: sorry to miss you this time!
Going to OSCON is always enjoyable, I always learn new things, and it’s wonderful to catch up with old friends and meet new people… but one can’t do everything everywhere. I wasn’t scheduled to speak this year, and there’s just too much on my plate. So, I decided to skip it this once.
See you next year!
Every Fourth of July, the New York Times prints the entire Declaration of Independence of the United States on the back page of its main section, in facsimile and in text. I read the whole thing on the subway this morning, just to remind myself what they were thinking of.
I’m pretty sure they were not thinking of a country where the government classifies the extent and nature of its surveillance, and even lies about it when citizens and their representatives ask. The distinction between discussing the overall process of surveillance and discussing individual targets of surveillance is crucial. Edward Snowden informed us about the process; he has been careful not to leak the targets (unless you count revelations of a very general nature, such as that we spy on the governments of our allies). No terrorist knows more today about whether they’re being watched than they did before Snowden’s leaks. Anyone trying to blow something up would naturally assume, and behave as if, they were under surveillance already.
Can anyone point to any real harm to national security from Snowden’s leaks? I have yet to hear of any. The leaks merely informed the citizenry of what we should have been informed of all along. It’s not a question of whether the government should sometimes be able to eavesdrop, or about whether there is rigorous enough judicial review or oversight. It’s that whatever we’re going to do, the policies about when and how we do it are legitimate matters of public debate — and we can’t debate them if we don’t know them. This is about civilian control over the military and intelligence services. Snowden himself said this eloquently enough, as have many others, so I won’t belabor the point.
But there is one slightly different argument I’d like to respond to:
Some people say that, even if in some abstract sense it is right that this information should come out and be debated, Snowden was wrong to leak it because in doing so he violated his oath to guard the secrets he had been entrusted with.
But he had a conflict of oaths: on the one hand, he and those around him were sworn to uphold the Constitution; on the other hand, he’d made a promise to keep secrets secret. What is the right thing to do when you promise to keep a secret, and then the secret they tell you is that some people aren’t keeping their promises?
For those who still don’t feel that conflict as Snowden felt it, on this July Fourth I’d like to point out that George Washington was an officer of the British militia in the American colonies. Well before the American revolution of 1776, he led a military force acting on behalf of the British crown, defending first part and then all of the Virginia colony’s borders. I don’t know enough about colonial militias to know if holding those positions required swearing an oath of loyalty, but it seems likely that it did (the new United States Army itself instituted an oath of allegiance fairly early on during the Revolutionary War). In any case there is at least some conflict in serving in a country’s militia and then leading an army against that same country’s army. But by their nature revolutions involve broken promises. You can read the Declaration of Independence as one long justification for when and why they should be broken (seriously, take a look).
Oaths sometimes conflict with each other, and you don’t always find out how until it’s too late. Then you have to decide what to do. Edward Snowden did the right thing in a difficult situation, and the debate that has ensued is evidence of this.
(It’s interesting that people who fret that Snowden broke his oath don’t seem to get as worked up when people get divorced and thus, in many cases, break their marriage vows. Marriage isn’t about national security… but then again, neither were Snowden’s leaks.)
If you agree, please say so, preferably in public — on your blog, if you have one, or on Facebook, or on Twitter, or on the bumper of your car, or on the back of your laptop. It’s important. There are a lot of people right now, especially politicians who are worried either about being attacked on national security or about losing the trust of the intelligence community, who feel they have to condemn what Snowden did. In some cases they’re sincere; in other cases they sense which way the wind is blowing in their particular environment and they say what they need to say to keep their position. I don’t even blame them, but it’s important that they not be the only voices out there. Say you’re glad to have the information that Snowden leaked. Explain clearly why it’s important that the public be able to talk about these things. Don’t let anyone feel they’re alone in thinking this, and you won’t be alone either.
Happy Fourth of July.
My post about how a central claim of the PRISM story turns out not to be true has drawn a wide range of comments. There’s one particular kind of comment I’d like to address here: the idea that, even if what I said was true, it was a mere technical detail and is not important in the larger story. (Here’s one such response.)
If the original claim about PRISM had been true, it would have had major implications for how we understand power and the nature of our political world.
The idea that the importance of a fact (or an error) would correlate to the number and familiarity of the words required to explain it is wrong. Just because life would be easier for reporters and readers if that were the case doesn’t make it the case. Sometimes, a thing is important even though it requires new words or concepts in order to be explained. This is one of those times. The number of syllables involved in causing a misunderstanding has no relationship to the significance of that misunderstanding.
Remember George W. Bush’s famous 16 words? “The British government has learned that Saddam Hussein recently sought significant quantities of uranium from Africa.” Those were just 16 words. The importance of the lie is unrelated to its length.
Greenwald and MacAskill did not lie. They simply misunderstood something. But what they misunderstood was very, very important. The issue in the PRISM reporting apparently arose because Greenwald and MacAskill misunderstood the meaning of this label on an NSA slide:
“Collection directly from the servers of these U.S. Service Providers: Microsoft, Yahoo, Google, Facebook, PalTalk, AOL, Skype, YouTube, Apple”
When the leaks first appeared, only two things really stood out as news, especially to the tech community — news in the sense of being something we fundamentally didn’t know before: the massive scale of phone call logs collection, and the claim that the NSA could “directly and unilaterally seize the communications off the companies’ servers” (referring to online services companies, not phone companies).
The latter claim about “directly and unilaterally” seizing communications from company servers was the more shocking one. This is partly because it was about data, not just metadata. But it was also because it meant that people we thought we knew — in many cases, people we’d worked with — had been hiding something big, something that, unlike (say) receiving and acting on National Security Letters, we didn’t think the law required them to hide and that we would not expect could be successfully hidden for long. It meant that not only did the system not work the way we thought it worked, it wasn’t even built the way we thought it was built. The moment I first read that quote, I straightened in my chair. If this is true, I thought, then we’re living in a very different place from the one we imagined.
The quote is from Glenn Greenwald and Ewen MacAskill’s original article in The Guardian on June 6th:
…defenders of the FAA argued that a significant check on abuse would be the NSA’s inability to obtain electronic communications without the consent of the telecom and internet companies that control the data. But the Prism program renders that consent unnecessary, as it allows the agency to directly and unilaterally seize the communications off the companies’ servers.
Looking at that text, what would you think it means? (Go ahead and read it in context in the original article, just to be sure.)
The most natural interpretation of “directly” and “unilaterally” — really, the only interpretation the authors could expect, given the context — is that the NSA could get anything it wants directly from the servers of major online services companies, without asking the company first (hence “unilaterally”). In other words, the companies’ lawyers don’t have a chance to review the request and push back. It means a monopolar world in which even commercial services are essentially an arm of the government, instead of a multipolar world where, even though the government may be heavy-handed, there are still competing pressures, negotiations, and compromises — a world where the possibility of saying “no” still exists.
The scarier, monopolar interpretation was the one reacted to by the very next person the article quotes, Jameel Jaffer, the director of the ACLU’s Center for Democracy:
“It’s shocking enough just that the NSA is asking companies to do this… The NSA is part of the military. The military has been granted unprecedented access to civilian communications.
This is unprecedented militarisation of domestic communications infrastructure. That’s profoundly troubling to anyone who is concerned about that separation.”
Glenn Greenwald and Ewen MacAskill say nothing to correct Jaffer’s interpretation. They wrote the piece and chose to quote Jaffer’s reaction in the first place, so one can only assume that they did not correct his interpretation because they did not think he had misunderstood.
Eventually, I, along with many others looking at the primary source materials and other sources, realized that Greenwald and MacAskill had overstated this part of the case — the NSA does not have direct, unilateral access. It doesn’t have a secret route around company lawyers. It doesn’t have engineers planted in senior positions in every major online service company (or at least if it does, the documents leaked so far do not contain evidence of that).
Instead, what’s going on is, more or less, what we thought was going on, just with more abuse and less restraint on the government side. That’s serious, but it’s comprehensibly serious, not “Oops, I guess we live in the shadowy power of the Deep State after all” serious. The true situation is one that can still respond to popular pressure and political dissatisfaction.
I think it’s clear by now that what I (and Mark Jaquith and Rick Perlstein) wrote is right as far as the facts are concerned. For some other analyses that have come out since then supporting the claim that there is not direct and unilateral access, see Ashkan Soltani, and Declan McCullagh at CNET, and Hunter Walker’s piece at Talking Points Memo, especially the quote from Ben Adida of Mozilla, and the New York Times (despite the misleading title on the piece, its content confirms the less alarmist interpretation).
So why do I care?
Adapting a comment I made in reply to a reader of the earlier post:
The big picture is about understanding the true dynamics of the world we live in, so we can decide how to act and what is most important to focus on. The picture Greenwald originally painted is, more or less, one of government-dominated oligopoly in which basically all the big players sat down at the same table and agreed to play by the NSA’s rules. I don’t think that was an accurate picture. I see instead multiple power bases, with some degree of internal dissent within each organization (including even the NSA and the FISA courts, but much more so within the companies), and on important issues even open dissent between actors. Yes, there’s a lot of coercion and compromise, and there is no doubt that some companies hand over more than they should without asking enough questions — but they don’t all do that. Of course we shouldn’t be happy that the average person’s most immediate choice is which big protector(s) to grant conditional trust to. But as I said in response to someone else in a blog comment, it’s not like Russia and North Korea are the same thing (and the U.S. is neither). There are meaningful differences among surveillance states, and understanding the kind you live in is important if you’re trying to figure out which risks to take for what goals.
This is a more complex picture than the one Greenwald painted, but if it is a truer one, then the paths available for resisting a surveillance state are quite different than they would be in a more monolithic situation. Do you take to the streets, or do you file lawsuits? If the latter, then against whom, a company or the government? (I don’t mean to suggest these are the only options; they’re just examples.)
Hence the importance of people understanding that the government does not do unmediated “direct” and “unilateral” collection from the servers of all major private-sector online service companies. How realistic was that idea ever? What U.S. company, that originated as a mass-market services company and not as a government contractor, would agree to give government IT staff unfettered access to its live-data servers? The business risk would be incredible, the risk of public embarrassment incredible… the proposition just doesn’t make sense to me. It never passed the smell test.
People in the U.S. following this story are trying to figure out what kind of country they live in, because after all, there are countries where the companies wouldn’t have a choice about granting that kind of access. If Glenn Greenwald succeeds in persuading U.S. readers that they live in one of those countries, and he is wrong, then he will unintentionally help to erode the feeling of collective empowerment and of individual rights that is crucial for resisting further encroachment.
That’s why I care.
Addenda: One critic’s claim that I “repackaged” Mark Jaquith’s (very fine) post isn’t true. I wrote the bulk of my post before finding out about Jaquith’s; when I saw Jaquith’s, I thought it expressed the problem very well and I decided to point to it (and restructured my post accordingly). Also, though I am a Fellow at the New America Foundation, I had no awareness (until some commenters on my original post mentioned it) that NAF receives Gates or Schmidt money. Anyway, the funding for my work with NAF doesn’t come from those sources. Though the place the funding does come from won’t assuage those critics, since it’s the Open Internet Tools Project, which is largely funded by the U.S. State Department. To reiterate: the views expressed on this blog are my own and are not influenced by nor attributable to the New America Foundation, the Open Internet Tools Project, or any other organization. Finally, Mark Jaquith has updated his post to account for Greenwald’s response. I think Mark’s analysis of that response (search for the phrase “Update: Greenwald response”) is very good, and have nothing to add except a big +1.
Over and over we’ve read that Gen. Keith Alexander, the head of the NSA, claimed that its massive surveillance program has prevented “dozens” of terrorist attacks. Journalists are careful to report this claim as simply what Alexander said, not as a fact itself — we’re responsible journalists, far too wise in the ways of the world to believe something just because someone in the Administration said it! We know better than that.
Except that he didn’t say it. At least as far as I can tell — if anyone knows of a source for the claim other than the below, please let me know. So far, the only source I’m aware of is the exchange with Sen. Patrick Leahy referred to here.
What Gen. Alexander said was subtly but significantly different, and he’s probably not surprised to see it being misinterpreted in the NSA’s favor right now. We shouldn’t look to the NSA for a correction on this, but do note that Alexander was careful not to lie. No doubt he would lie, if he had to, but this time we did the work for him.
(Not to take undue credit: this discrepancy was pointed out to me by a friend who prefers to remain unattributed. Later a mutual friend pointed us to this post, which has the quotes and the analysis and the video link. I’m really just repeating what that post has already pointed out.)
First of all, Gen. Alexander never said “dozens of attacks”. The dozens he referred to were dozens of call records that contributed to the discovery or disruption of… something, something he calls “events” (apparently elsewhere he’s only talked about two actual attacks disrupted; I don’t have the source for that, but if you do please leave it in the comments).
Watch how this works:
Gen. Keith Alexander: “…it’s dozens of terrorist events that these have helped prevent.”
Sen. Patrick Leahy: “OK, so dozens? Now we collect millions and millions and millions of records through 215, but dozens of them have proved crucial, critical, is that right?”
Gen. Keith Alexander: “For both here and abroad, in disrupting or contributing to the disruption of terrorist attacks.”
Sen. Patrick Leahy: “Out of those millions, dozens have been critical?”
Gen. Keith Alexander: “That’s correct.”
Fascinating. He didn’t say “dozens of attacks”. He does, at first, after a long and clearly thoughtful pause (see the video below), say “dozens of events” once. What’s an “event”? If you disrupt a terrorist meeting, that’s an event. If you disrupt a terrorist eating dinner, is that an event? Maybe. I don’t know. But I do know that when someone in national security wants to defend their work, they use the word “attacks”. Attacks are what matter. When they use the much weaker word “events”, it is not an accident — it is because the stronger word is not available.
Sen. Leahy then gives him the opening to subtly switch the subject to the call records, rather than the events or attacks or whatever they are. Whether Leahy did that by accident or not I don’t know either. But Alexander gratefully takes Leahy’s pivot, to the extent of avoiding even having an explicit subject in his next two sentences — he just grabs Leahy’s antecedent like a life raft and rides it the rest of the way.
He never said dozens of attacks. He very carefully did not say dozens of attacks.
Satisfied that he didn’t say dozens of attacks?
Now let’s look at some headlines:
NSA: ‘Dozens of attacks’ prevented by snooping (The Register)
NSA chief: Surveillance has stopped dozens of potential attacks (Chicago Tribune)
Alexander: Phone Collection Has Prevented ‘Dozens’ of Attacks (Democracy Now)
And just today I saw it in the New York Times too:
In a robust defense of the phone program, General Alexander said that it had been critical in helping to prevent “dozens of terrorist attacks” both in the United States and abroad…
Experienced Washington NSA directors: 1
Experienced Washington Senators: N/A
Experienced Washington journalists: 0
Here’s that video:
[Note: This post now uses the phrase "collect-then-select", instead of "collect-then-analyze", which wasn't quite as accurate. Other than that, and adding the references at the end, I've made no changes. There is a redirection in place from the old URL.]
One notion that keeps surfacing in the ongoing PRISM leak is that intelligence services have started collecting vast amounts of data just to store for potential later use under a specific warrant. In other words, they want to have it all easily at hand for when they’re actually investigating someone and need to discover that person’s contacts, social network, travel patterns, consumer habits, etc.
For the actual investigation, so the claim goes, they’ll obtain warrants as needed, even if the initial collection was unwarranted — in other words, the collection phase can skate by without a warrant, because even though they have the data they haven’t actually looked at it yet, so no one’s rights are being violated. Then later when they do look at it, they make sure they have a warrant.
This sounds sane, or at least like a good-faith attempt to abide by some kind of legal framework while still getting the job done… until you think about it:
A low-level systems administrator just leaked thousands of top-secret documents. How can they guarantee that your data is safe, even if it’s supposedly just being stored and not analyzed?
This point is understandably hard for intelligence services to acknowledge. No one wants to think about their system’s failure modes. But if you’re collecting and storing private data about millions of citizens, failure modes become not merely important, but a dominant consideration.
Legal protections are designed with failure modes in mind. We cannot guarantee that our systems operate as designed; we can at best hope. This is why “collect then select” is a problem. It’s not because the data is hurting anyone by sitting idly in a storage facility, unexamined by humans or machines. It’s because you can’t be sure it’s really idle. If a conscience-stricken 29 year old can leak thousands of top-secret documents to a journalist, a more mercenary employee — or perhaps just one whose family is being threatened by some very interested party — can access your data and make it available to someone else. This risk is inherent in the centralized collection and storage of the data. By collecting it, the intelligence services have created another route of vulnerability for private information about you. I’m sure they’re doing their best to protect it, but in the long run, their best probably won’t be enough.
Anyway, as Moxie Marlinspike eloquently argues, we should all have something to hide.
I’ve seen the “collect-then-select” notion described in many places. The three I was able to dig up after the fact are all from the New York Times:
“Right now we have a situation where the executive branch is getting a billion records a day, and we’re told they will not query that data except pursuant to very clear standards,” Mr. Sherman said. “But we don’t have the courts making sure that those standards are always followed.”
Analysts can look at the domestic calling data only if there is a reason to suspect it is “actually related to Al Qaeda or to Iran,” she said, adding: “The vast majority of the records in the database are never accessed and are deleted after a period of five years. To look at or use the content of a call, a court warrant must be obtained.”
Timothy Edgar, a former civil liberties official on intelligence matters in the Bush and Obama administrations who worked on building safeguards into the phone log program, said the notion underlying the limits was that people’s privacy is not invaded by having their records collected, but only when a human examines them.
That same article goes on to make another important point about why collect-then-select is problematic:
Moreover, while use of the database is now limited to terrorism, history has shown that new government powers granted for one purpose often end up applied to others. An expanded search warrant authority justified by the Sept. 11 attacks, for example, was used far more often in routine investigations like suspected drug, fraud and tax offenses.
Mark Jaquith’s post The PRISM Details Matter is spot-on. Glenn Greenwald has misunderstood a key technical fact, one that removes the most explosive charge in the whole scoop. And for some reason, Greenwald refuses to correct it.
The crucial question is:
Are online service companies giving the government fully automated access to their data, without any opportunity for review or intervention by company lawyers?
Greenwald essentially says yes, they are. Yet nothing leaked so far indicates that this is the case, and the companies all vehemently deny it. They say they have humans in the chain. The information leaked so far supports this claim or is at least consistent with it.
It looks like Greenwald & co simply misunderstood an NSA slide, most likely because they don’t have the technical background to know that “servers” is a generic word and doesn’t necessarily mean the same thing as “the main servers on which a company’s customer-facing services run”. The “servers” mentioned in the slide are just lockboxes used for secure data transfer. They have nothing to do with the process of deciding which requests to comply with — they’re just a means of securely & efficiently delivering information once a company has decided to do so.
As Jaquith emphasizes, this is not merely a pedantic point. This is central to the story, and as far as I can tell, Greenwald continues to misunderstand and thus misrepresent it. It’s an epic botch in an important story :-(.
An email I sent to some friends yesterday, about this exact same point:
From: Karl Fogel
To: <undisclosed recipients>
Subject: Re: Cowards | Uncrunched
Date: Mon, 10 Jun 2013 14:18:57 -0500

One of the above wrote:
>Since the topic has taken over part of my morning, thought I'd share:
>http://uncrunched.com/2013/06/07/cowards/

I read this post when it came out, yeah. I think it's mostly wrong.

What is described here is just a delivery mechanism. *If* you're a company that's complying with government requests for data (and not all requests are abusive or unreasonable) a lockbox is a perfectly sensible way to do it. Sure, the lockbox may run on a server that belongs to the company, but this is not the same as -- indeed, is *totally unrelated to* -- giving the government direct access to your servers, the servers that are related to the actual service you provide as part of your business, which is how far too many bloggers are portraying it. Grrrrr.

Uncrunched quotes Claire Cain Miller approvingly:

"While handing over data in response to a legitimate FISA request is a legal requirement, making it easier for the government to get the information is not."

What? This makes no sense. The lockbox may or may not make it easier for the government, but it sure makes it easier for the *company* to securely hand over data while lowering the risk of some unauthorized third party gaining access. If you're going to comply, might as well do it responsibly and without increasing the compliance burden on yourself. What the hell are the companies supposed to do? Put the data on a CD-ROM and mail it to Fort Meade?

It's not like there aren't legitimate things to complain about here. I don't understand why Uncrunched is wasting time with non-problems.
Update 2014-05-08: It’s nice to see that the New York Times — and the U.S. Federal Trade Commission! — have caught up with this problem: Off the Record In a Chat App? Don’t Be Sure: Case Against Snapchat Tells Users to Beware. (Compare it with this rather less skeptical earlier story in the Times from February 2013.)
Update 2014-01-29: a shorter and less technical version of this post is now published on Slate / New America Foundation “Future Tense”: Privacy Apps Like Snapchat Make a Promise They Can’t Keep.
Some apps are making an impossible promise, one that these days might really matter to people. The promise is this:
“You can control the copies of data you send to other people.”
You can’t. It’s not even possible in principle. If an app promises that you can send people email messages, photos, audio recordings, or videos that will “self-destruct” or “can only be viewed for a limited time controlled by you” or “can only be viewed by people you approve”, just smile and back away slowly.
These promises all depend on client-side betrayal. That is, they depend on a device obeying commands from someone other than its owner. For example, a smartphone that does what the phone manufacturer — or mobile carrier — wants it to do, instead of what its owner asks.
Now, it’s true that some devices actually do practice client-side betrayal. This is one of the reasons I don’t own a Kindle.
But when you send someone a message or a photo, how can you be sure they’re using a device that will betray them? You can’t. You can’t count on their device serving your goals instead of theirs.
You might think apps only make such promises for platforms where they can count on the recipient’s device being of the betraying sort. But app designers can never know for sure that the necessary betrayal will occur as required. To start with, many of these apps run on Android devices. The Android operating system is open source (admittedly on a very long release cycle, with various manufacturer-specific proprietary divergences along the way, but still, in the long run it is open source). A sufficiently motivated user could modify their Android device such that every frame written to the screen or every sound written to the speakers is recorded to the SD storage card. That video may have self-destructed as advertised, but that doesn’t really matter if there’s a perfectly good copy still sitting in permanent storage. Heck, while we’re at it, why even believe the deletion happened at all? An Android user could modify the OS-level delete system call to not actually delete, but rather move the file over to an easily-accessible holding area from which recent items can be rescued if the user decides they’re interesting enough. (This is not so different from how most OS delete calls work already. Feeling safer yet?)
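To make the "delete that doesn't delete" idea concrete, here is a minimal sketch in Python. It is not Android code and not any real app's behavior; the function names `soft_delete` and `demo` and the "holding area" directory are hypothetical, invented for illustration. It only shows that whoever controls the delete routine decides what "delete" means.

```python
import shutil
import tempfile
from pathlib import Path


def soft_delete(path: Path, holding_area: Path) -> Path:
    """Pretend to delete `path`, but actually move it into `holding_area`.

    Hypothetical stand-in for a modified OS-level delete call.
    Returns the location of the rescued copy.
    """
    holding_area.mkdir(parents=True, exist_ok=True)
    rescued = holding_area / path.name
    # From the caller's point of view the file is gone from its old location...
    shutil.move(str(path), str(rescued))
    # ...but its contents are fully intact in the holding area.
    return rescued


def demo() -> tuple[bool, bytes]:
    """Show that a 'deleted' file vanishes from view while its bytes survive."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        snap = root / "snap.jpg"
        snap.write_bytes(b"self-destructing photo")
        rescued = soft_delete(snap, root / "holding")
        still_visible = snap.exists()
        return still_visible, rescued.read_bytes()
```

An app that asked for this file to be deleted would see it disappear from the original path, and would have no way of knowing that the recipient can still read every byte of it.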
Everything I said about Android above applies to any open source operating system: GNU/Linux, Firefox Mobile OS, all of them.
If users control their operating system, then they control their data, period. And open source means they do control their operating system. They don’t necessarily have to know how to program; they just have to know how to hire people who can program, just as they don’t have to know how internal combustion engines work to hire a mechanic to fix their car.
Open source means no client-side betrayal, at least for people who care enough to avoid it.
Anyway, even without open source operating systems, the recipient can still save it old school: just point another camera-enabled device at the screen and take a picture.
Who’s depending on client-side betrayal?
Full disclosure: I first started noticing this trend through my work with OpenITP, but the opinions here are entirely my own. The users and developers OpenITP works with are people who need to be able to take privacy promises seriously; as a result, I’ve become more sensitive to those promises than I used to be. When someone makes a promise that conflicts with my technical understanding of how the digital world works, I start asking questions.
Here is a partial list of apps that have caused me to ask questions lately (emphases added):
“You can delete your message from the receivers phone.”
“Be confident sending private information and pictures. You have control over your messages, when you delete a sent message it will be removed from the receivers phone and images are not sharable unless you make them so.”
“Privly makes it possible for you to control your data after posting it across the internet. You can post to Facebook without allowing Facebook access to your communications, you can even unsend emails…”
[Update: Priv.ly improved their language after this post was originally published; the claims on their web site seem to be much more accurate now.]
Snapchat (from their blog):
“Deleting Snaps From the Recipient’s Device”
“After a snap has been opened, the temporary copy of it is deleted from the device’s storage. We try to make this happen immediately, sometimes it might take a minute or two. The files are deleted by sending a ‘delete’ instruction to the phone’s file system. This is the normal way that things are usually deleted on computers and phones — we don’t do anything special (like ‘wiping’).
While an unopened snap is being stored on the device, it’s not impossible to circumvent the Snapchat app and access the files directly. This isn’t something we support or encourage and in most cases it would involve jailbreaking or ‘rooting’ the phone and voiding its warranty. If you’re trying to save a snap, it would be easier (and safer) to just take a screenshot or take a picture with another camera.
Also, if you’ve ever tried to recover lost data after accidentally deleting a drive or maybe watched an episode of CSI, you might know that with the right forensic tools, it’s sometimes possible to retrieve data after it has been deleted. So… you know… keep that in mind before putting any state secrets in your selfies :)”
“sender-based control over who can read messages, where and for how long”
“[Wickr can] send text messages, videos, documents that self-destruct — all encrypted, and it exceeds NSA top-level encryption on the device before it goes out on network with a key that only you have.” (Founder Nico Sell quoted in Silicon Beat.)
“Secure, self-destructing email.”
(I don’t know anything more about this one; there’s no further explanation on the page, beyond the above.)
“Self-Destruct any message, file, photo, video or voice recording with our timed Burn Notice.”
[Note: I added this one on 2013-09-04, long after this post originally appeared, because I ran across this article.]
The Confide app turned out to be a kind of Greatest Hits of impossible privacy promises:
“Messages disappear after they’re read, ensuring all of your communication remains private, confidential and always off the record.”
“…we alert you if a screenshot is attempted.”
“Get alerted when your message is read.”
“…allows you to speak freely, without a risk of what you say being forwarded on or permanently stored…”
“We alert you (and the recipient) if the recipient attempts to take a screen shot.”
And my personal favorite: “Notify me about Confide for Android.”
[Note: added this one on 2014-01-09, after this post originally appeared, thanks to this article.]
Confusing the Threat Models
What these promises have in common is that they confuse two very different threat models. One is the scenario where you’re communicating with an ally — someone who, as far as you know, has no intent to do you harm, though they could do so by accidentally re-sharing something. The other is the scenario where you’re communicating with a stranger or with someone who might actively intend you harm.
The “Extra Details” section in the Snapchat marketing quote above is a particularly good example of this confusion. Somewhere between the first and second sentences, they subtly switch whom they’re addressing. This first sentence is clearly to the sender:
“While an unopened snap is being stored on the device, it’s not impossible to circumvent the Snapchat app and access the files directly.”
Then suddenly they switch to talk to the recipient…
“This isn’t something we support or encourage and in most cases it would involve jailbreaking or ‘rooting’ the phone and voiding its warranty. If you’re trying to save a snap, it would be easier (and safer) to just take a screenshot or take a picture with another camera.”
…then just as suddenly they switch back, still without any explicit acknowledgement that they’re talking to two different parties with possibly different interests:
“Also, if you’ve ever tried to recover lost data after accidentally deleting a drive or maybe watched an episode of CSI, you might know that with the right forensic tools, it’s sometimes possible to retrieve data after it has been deleted. So… you know… keep that in mind before putting any state secrets in your selfies :)”
That middle portion, where they talk to the recipient, could be translated to “For our sake, please don’t interfere with the process of your device betraying you.” Recipients in the first threat model will cooperate; those in the second threat model won’t.
Ultimately, these apps aren’t really about security and privacy. They’re about convenience in situations where dependable privacy isn’t a requirement. An app that deletes a photo after showing it to your friend for six seconds is just a convenience for everyone involved. It makes things easier for both parties: Hah hah, look at this picture of me stuffing a hundred dollar bill into his sock while he pours a margarita down my throat! Wasn’t it a great vacation? Wish you were there. No need to worry about deleting this photo; it’ll take care of itself. See you Tuesday at the office.
That’s fine, if that’s all you wanted. But the vast majority of people who read the marketing around these apps will take them at their word. Wow, I can send an email that self destructs immediately after the recipient is done reading it? That’s great! It’s like attorney-client privilege without the expensive law degree. Where do I sign up?
People don’t think about threat models. They think about features and promises. If the app says it does X, they believe it does X.
The trouble is, problem recipients are not evenly distributed across all the pictures and emails and videos one sends. The problem recipients are concentrated in the sensitive items, because the temptation to be a problem recipient is highest exactly for the things a sender would most want deleted. Or as the great saying has it (attributed to both George Orwell and Paul Fussell): “What someone doesn’t want you to publish is journalism; all else is publicity.”
Most people who come to my front door are honest, but the lock on the door is not for them. Promises of client-side cooperation are pointless, from the sender’s point of view, if they are most likely to be circumvented by those most tempted to harm the sender in the first place.
One more example, just to drive the point home.
Some mass email services — “mass email” is not a euphemism for spam, by the way; these services are tremendously useful, helping legitimate organizations run their announcement email lists, etc — promise to tell you how many recipients have opened the email you sent.
“Whuh-huuuh-whaaat??” I thought to myself, when I first heard about this. How on earth can anyone else know whether I’ve opened up an email, let alone whether I’ve read it? My mailreading software does not send signals to third parties when I open messages. That would be an incredible betrayal of my trust.
MailChimp’s free reports tell you who’s opening, clicking, and coming back for more…
Our interactive graphs show you how many emails were delivered, how many people opened your email…
Opens by Location: See where in the world your subscribers are located and track engagement by country.
A/B Split Testing: People who A/B test their email campaigns get 11% better open rates and 17% more clicks…
It turns out — surprise! — that they’re depending on client-side betrayal, of course.
These days, most people are reading mail in their browser, using one of the online services like Hotmail, etc, or in some other network-enabled email client. And when those email clients get an email that includes an image, they will (in some cases) display images by default — even if the image content isn’t embedded inside the email, but rather is merely linked to from the mail and has to be fetched (at the time the message is opened) from somewhere out on the Internet.
So what these services do is include a tiny image, just one pixel large and, if possible, the same color as the message’s background color. But they don’t include that pixel directly in the mail. Instead, they keep it on their own servers, at a URL unique to that particular message. When the recipient opens the message, their mailreader fetches all the images, even the tiny & invisible ones, and it is by receiving the request for the image at that unique URL that the upstream service knows the mail has been opened.
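The mechanism described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not MailChimp's or anyone else's actual implementation: the class `OpenTracker`, its method names, and the `tracker.example` URL are all hypothetical, and no real network traffic is involved.

```python
import secrets


class OpenTracker:
    """Toy model of a mass-email service's open tracking (hypothetical)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._token_to_recipient: dict[str, str] = {}
        self.opened: set[str] = set()

    def pixel_tag(self, recipient: str) -> str:
        """Return HTML for a 1x1 image whose URL is unique to this message."""
        token = secrets.token_urlsafe(16)
        self._token_to_recipient[token] = recipient
        return f'<img src="{self.base_url}/{token}.gif" width="1" height="1" alt="">'

    def handle_request(self, url: str) -> None:
        """Record an open when a request for a pixel URL arrives at the server."""
        token = url.rsplit("/", 1)[-1].removesuffix(".gif")
        recipient = self._token_to_recipient.get(token)
        if recipient is not None:
            self.opened.add(recipient)


def demo() -> set[str]:
    """Alice's mailreader fetches remote images; Bob's does not."""
    tracker = OpenTracker("https://tracker.example/px")
    tag_for_alice = tracker.pixel_tag("alice@example.org")
    tracker.pixel_tag("bob@example.org")
    # Simulate Alice's client requesting the image named in her message.
    src = tag_for_alice.split('src="')[1].split('"')[0]
    tracker.handle_request(src)
    return tracker.opened
```

In this sketch only Alice ends up in the `opened` set. Bob may well have read the message too, but because his client never fetched the image, the service has no way to know.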
Some browser-based mailreading services have image display turned off by default, which avoids this betrayal. Google’s Gmail was one of them, until recently, but they have started transitioning to showing images by default, and their notice about the changeover explicitly warns users that this will allow some senders to know whether a mail has been opened. I’m not sure about the other services (if anyone knows, please leave a note in the comments).
Possibly Mailchimp talks about that somewhere, though I didn’t see it if so. The part that I saw just tells senders that Mailchimp can determine the “open rate” for the emails they send out. If Mailchimp has statistics on what percentage of email users set images to display by default — thus making themselves vulnerable to at least one kind of client-side betrayal — that would be very interesting; I don’t know if they do.
It turns out there’s a history here.
I showed an early draft of this post to my friend Jeff Ubois, and he instantly thought of who had covered this ground already. There’s a 2009 book by Viktor Mayer-Schönberger called Delete: The Virtue of Forgetting in the Digital Age; I haven’t read it, but the discussion with Jeff made me wish I had. There’s also an article by US Federal District Judge James Rosenbaum called “In Defense of the delete Key”, published in the law journal The Green Bag, about how the Delete key doesn’t actually delete (as I touched on earlier).
Client-side betrayal can have consequences both social and legal. Phrases Jeff tossed out include “inadvertent waiver rule” and “spoliation of evidence”. The sender’s request to delete may still be considered an expression of intent, and that can be legally useful under certain circumstances. Pity about that photo appearing on the front page of the New York Times, but at least it won’t be admissible in court because you tried to ensure the recipients would delete it. That’s some comfort, anyway.
So here’s my promise to you:
If you send me something in digital form, you cannot count on me — or my devices — deleting all my copies of it unless I explicitly tell you they’re gone. The copies I receive from you will continue to exist as long as I want them to, and whether I share them with others is entirely a matter of social conventions and of honor, not of technical enforcement from your side. You also can’t be certain that I have not read an email you sent; you can be certain I have read it if I say I have, or if I reply to it.
But really, these are just the same promises the rest of the Internet makes. If someone thinks otherwise, it means they’re depending on client-side betrayal by your devices — but it’s up to you, not them, whether that betrayal happens.
Or as my friend Jim Blandy puts it: “It’s my computer, damn it!”
[I recently found this unpublished draft sitting in my CivicCommons Tumblr account. I'm not sure why I didn't post it back in January 2012, when I apparently wrote it, but anyway here it is now!]
Recently, during a discussion about civic apps contests, Abhi Nemani asked two sharp questions:
- How do you get meaningful ideas from city hall to entrepreneurs?
- What can a city do instead of an apps contest?
In response I brainstormed a bit, deliberately keeping the filter turned somewhere between “low” and “off”:
- A city could maintain a portal tracking requests for particular data sets, and advertise that portal to potential app developers, with the idea that data set releases get prioritized when enough people are specifically requesting them. (Don’t some cities already do this on their data set pages, actually?)
One problem with this is that many app developers do not like to signal their intentions, both because they want to keep competitive advantage and because they don’t like feeling they’re promising to do something on spec — they want the data set available, but they don’t want to look like they’ve made a commitment about it. So the city can’t depend on people saying exactly what they want to do with the data set; the city just has to be willing to consider generic requests for the data set as meaning something.
- Cities are sometimes too driven by a need to quantify results, and have those results be immediate.
What if, instead, a city just held a regular, recurring hackathon event, say 2-4 times a year, at which city technologists and local entrepreneurial hackers met unconference-style and did whatever came to mind? The city keeps records of who attends. Then, as apps come out over the next few months/years/decades/whatever, the city figures out which apps are popular (if it can’t do that, we’re worse off than I thought) and compares the apps’ authorships with past hackathon attendees. When there is overlap, ask a few questions to make sure there’s some causal relationship, and when there is, consider it a policy triumph and don’t be shy about putting out a press release!
Come to think of it, the second idea above expresses the high-level principle I was really aiming at:
Cities should actively create an environment that encourages entrepreneurial hacktivity, and then try to measure the results a while afterwards, with actual usage stats as the basic measure of success (rather than relying on artificially-selected judges who have neither the time nor the expertise to evaluate apps in real-life circumstances).
“Usage stats” doesn’t have to just mean number of unique users who use a certain app per month or something. The city could do surveys too. (For example, an app like Square makes a big difference to people who aren’t direct users, because it enables commercial transactions where there is no permanent storefront — so the buyers, who don’t run Square, should still be counted as beneficiaries. This can be hard to measure well; I’m not sure what a general answer would be, but listening to the “buzz” from different communities is going to be part of it.)
None of this is to condemn apps contests; they can do a lot to kick-start local entrepreneurial energy. But apps contests don’t by themselves set up a long-term, sustainable environment for civic hacktivity, in part because they’re always in the position of guessing future successes rather than highlighting and learning from existing successes. They’re a seed, but it would be a mistake to think of the resultant apps as the crop. The crop is rather an environment, in which city data output interacts dynamically with the community of people using it as input.
Chicago CTO John Tolva’s post “Open Data in Chicago: progress and direction” touches on this, saying of the recent “Apps for Metro Chicago” contest:
The apps were fantastic, but the real output of A4MC was the community of urbanists and coders that came together to create them. In addition to participating in new form of civic engagement, these folks also form the basis of what could be several new “civic startups” (more on which below). At hackdays generously hosted by partners and social events organized around the competition, the community really crystalized — an invaluable asset for the city.
… The overarching answer is not about technology at all, but about culture-change. Open data and its analysis are the basis of our permission to interject the following questions into policy debate: How can we quantify the subject-matter underlying a given decision? How can we parse the vital signs of our city to guide our policymaking?
That’s the long game. Holding apps contests is fine — but long-term and data-driven followup are what really make the difference.