The folks at Framabook graciously sent me some copies of the print version of their French translation of my book (in French, “Produire Logiciels Libres”, in English, “Producing Open Source Software”). They also sent some questions for an online interview to accompany the release, and Olivier Rosseler translated my responses.
The French version of the interview is now up at www.framablog.org/index.php/post/2011/04/10/karl-fogel-interview. I'm posting the English original here, and thank them very much for asking such provocative questions.
From: Karl Fogel
To: Christophe Masutti, Alexis Kauffmann
Subject: Re: Interview french version POSS
Date: Fri, 11 Mar 2011 19:05:00 -0500

Christophe Masutti writes:

> Hi Karl, could you say a few words about yourself to our French-speaking
> readers?
>
> The French version of POSS has just been published, and your book was
> translated or is being translated into other languages. What are your
> feelings about all these remixes of your work, all made possible
> because you chose to put your book under a free licence?

My feelings are 100% positive. This has simply no downside for me. The translation makes the book accessible to more readers, and that's exactly what I want. I'm very grateful to all the translators.

> If you were to write a second version of POSS today, what would you
> change in it or add to it? By the way, do you plan on doing such a
> rewriting?

Well, in fact I am always adjusting it as open source practices change. The online version evolves steadily; maybe eventually we'll announce that some kind of official "version 2.0" has been reached, but really it's a continuous process.

For example, five or six years ago, it was more common for projects to run their own development infrastructure. People would set up a server, install a version control system, a bug tracker, a mailing list manager, maybe a wiki, and that would be where project development happened. But there's been a lot of consolidation since then. Nowadays, only the very largest and very smallest projects run their own infrastructure. The vast majority use one of the prebuilt hosting sites, like GitHub, Google Code Hosting, SourceForge, Launchpad, etc. Most open source developers have interacted with most of these sites by now.

So I've been updating the part of the book that talks about hosting infrastructure to talk more about using "canned hosting" sites like the above, instead of rolling your own.
People now recognize that running a hosting platform, with all its collaboration services, is a big operational challenge, and that outsourcing that job is pretty much required if you want to have time to get any work done on your project.

I've also updated the book to talk about new versions of open source licenses (like the GNU General Public License version 3, which came out after the book was first published), and I've adjusted some of the recommendations of particular software, since times have changed. For example, Git is much more mature now than it was when I first wrote the book.

> FLOSS is being produced pretty much the same way now as five years
> ago. But forges have appeared that differ from the SourceForge model.
> I'm thinking of Google Code, and especially GitHub. GitHub can be
> considered the "Facebook" of open source forges, in that it offers
> social networking features, and makes it possible to commit directly
> from one's browser. The notion of "fork" here is different from what
> we are used to. What do you think about all that?

Actually, I think the notion of forking has not changed -- there has been some terminological shift, perhaps, but no conceptual shift. When I look at the dynamics of how open source projects work, I don't see huge differences based on what forge the project is using.

GitHub has a terrific product, but they also have terrific marketing, and they've promoted this idea of projects inviting users to "fork me on GitHub", meaning essentially "make a copy of me that you can work with". But even though there is a limited technical sense in which a copy of a git-based project is in theory a "fork", in practice it is not a fork -- because the concept of a fork is fundamentally political, not technical.
To fork a project, in the old sense, meant to raise a flag saying "We think this project has been going in the wrong direction, and we are going to take a copy of it and develop it in the right direction -- everyone who agrees, come over and join us!" And then the two projects might compete for developer attention, and for users, and perhaps for money, and maybe eventually one would win out. Or sometimes they'd merge back together. Either way, the process was a political one: it was about gaining adherents.

That dynamic still exists, and it still happens all the time. So if we start to use the word "fork" to mean something else, that's fine, but it doesn't change anything about reality, it just changes the words we use to describe reality. GitHub started using "fork" to mean "create a workable copy".

Now, it's true that the copy has a nice ability to diverge and remerge with the original on which it was based -- this is a feature of Git and of all decentralized version control systems. And it's true that divergence and "remergence" are harder with centralized version control systems, like Subversion and CVS. But all these Git forks are not "forks" in the real sense. Most of the time, when a developer makes a Git copy and does some work in it, she is hoping that her work will eventually be merged back into the master copy. When I say "master" copy, I don't mean "master" in some technical sense, I mean it exactly in the political sense: the master copy is the copy that has the most users following it.

So I think these features of Git and of GitHub are great, and I enjoy using them, but there is nothing revolutionary going on here. There may be a terminology shift, but the actual dynamics of open source projects are the same: most developers make a big effort to get their changes into the core distribution, because they do not want the effort of maintaining their changes independently.
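The diverge-and-remerge cycle described above can be played out entirely on local disk. The sketch below is illustrative, not from the book: the repository paths, file names, and e-mail addresses are all invented, and it assumes only that a reasonably recent `git` command-line tool (2.28 or later, for `init -b`) is on the PATH.

```python
# A local sketch of "diverge and remerge": a copy does some work,
# and that work flows back into the master copy.
import os
import subprocess
import tempfile

def git(args, cwd):
    subprocess.run(["git"] + args, cwd=cwd, check=True,
                   capture_output=True, text=True)

def commit_file(repo, name, text, message):
    with open(os.path.join(repo, name), "w") as f:
        f.write(text)
    git(["add", name], repo)
    git(["commit", "-q", "-m", message], repo)

base = tempfile.mkdtemp()

# The "master" copy -- master in the political sense: the one people follow.
upstream = os.path.join(base, "upstream")
os.makedirs(upstream)
git(["init", "-q", "-b", "main"], upstream)
git(["config", "user.email", "dev@example.org"], upstream)
git(["config", "user.name", "Upstream Dev"], upstream)
commit_file(upstream, "README", "the original project\n", "initial commit")

# A "fork" in the GitHub sense: just a workable copy, not a political split.
fork = os.path.join(base, "fork")
git(["clone", "-q", upstream, fork], base)
git(["config", "user.email", "contrib@example.org"], fork)
git(["config", "user.name", "Contributor"], fork)

# The two copies diverge...
commit_file(fork, "NEWS", "work done in the copy\n", "fork's change")
commit_file(upstream, "CHANGES", "work done upstream\n", "upstream's change")

# ...and remerge: the copy's work is pulled back into the master copy.
git(["remote", "add", "fork", fork], upstream)
git(["fetch", "-q", "fork"], upstream)
git(["merge", "-q", "-m", "merge the fork's work", "fork/main"], upstream)

files = sorted(f for f in os.listdir(upstream) if f != ".git")
print(files)  # ['CHANGES', 'NEWS', 'README']
```

Note that the cheap part is exactly what Git automates -- the fetch and merge; the expensive part, persuading the master copy's maintainers to accept the work, is the political process the text describes.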
Even though Git somewhat reduces the overhead of maintaining an independent set of changes, it certainly does not reduce it so much that it is no longer a factor. Smart developers form communities and try to keep the codebase unified, because that's the best way to work. That is not going to change.

> In June 2010, Benjamin Mako Hill remarked in his "Free Software Needs
> Free Tools" article that hosting open source projects on proprietary
> platforms was kind of a problem. According to you, is this a major
> problem, a minor one, or is it no problem at all?
> http://mako.cc/writing/hill-free_tools.html

Well, I know Mako Hill, and like and respect him a great deal! I think I disagree with him on this question, though, for a couple of reasons.

First, we have to face reality. It is not possible to be a software developer today without using proprietary tools. Only by narrowing the definition of "platform" in an arbitrary way is it possible to fool ourselves into thinking that we are using exclusively free tools. For example, I could host my project at Launchpad, which is free software, but can I realistically write code without looking things up in Google's search engine, which is not free software? Of course not. Every good programmer uses Google, or some other proprietary search engine, daily. Google Search is part of the platform -- we cannot pretend otherwise.

But let's take the question further: When it comes to project hosting, what are the important freedoms? You are using a platform, and asking others to use it to collaborate with you, so ideally that platform would be free. That way, if you want to modify its behavior, you can do so; if someone wants to fork your project (in the old, grand sense), they can replicate the hosting infrastructure somewhere under their control if absolutely necessary.
Well, that's nice in theory, but frankly, if you had all the source code to (say) Google Code Hosting, under an open source license, you still would not be able to replicate Google Code Hosting. You'd need Google's operations team, their server farms... an entire infrastructure that has nothing to do with source code. Realistically, you cannot do it. You can fork the project, but generally you are not going to fork its hosting platform, because you don't have the resources. And since you can't run the service yourself, you also can't tweak the service to behave in the ways you want -- because the people who run the physical servers have to decide which tweaks are acceptable and which aren't. So in practice, you can't have either of these freedoms.

(Some hosting services do attempt to give their users as much freedom as possible. For example, Launchpad's code is open source, and they do accept patches from community members. But the company that hosts Launchpad still approves every patch that they incorporate, since they have to run the servers. I think SourceForge is about to try a similar arrangement, given their announcement of Allura yesterday.)

So, given this situation, what freedom is possible? What remains is the freedom to get your data in and out. In other words, the issue is really about APIs -- that is, "application programming interfaces", ways to move data to and from a service in a reliable, automatable way. If I can write a program to pull all of my project data out of one forge and move it to a different forge, that is a useful freedom. It means I am not locked in. It's not the only freedom we can think of; it's not even the ideal freedom. But it's the practical freedom we can have in a world in which running one's own servers has become prohibitively difficult.

I'm not saying I like this conclusion. I just think it is reality.
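The "data in and out" freedom boils down to a round trip through a neutral format. The sketch below is a toy illustration: the issue records and field names are invented, not any forge's real schema (real APIs, such as GitHub's `GET /repos/{owner}/{repo}/issues` endpoint, return far richer objects), but the portability principle is the same.

```python
import json

# Hypothetical, simplified issue records as forge A's API might return them.
forge_a_issues = [
    {"id": 1, "title": "Crash on startup", "state": "open"},
    {"id": 2, "title": "Typo in docs", "state": "closed"},
]

def export_issues(issues):
    """Serialize issues to a neutral dump that another forge could import."""
    return json.dumps(
        [{"title": i["title"], "state": i["state"]} for i in issues],
        indent=2,
    )

def import_issues(dump):
    """Re-create issue records from the neutral dump -- no lock-in."""
    return json.loads(dump)

dump = export_issues(forge_a_issues)
migrated = import_issues(dump)
print(len(migrated))  # 2
```

If a forge's API lets a program like this see every field you care about, you can leave whenever you want; if it doesn't, you are locked in regardless of what license the server code carries.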
The "hunter-gatherer" phase of open source is over; we have moved into the era of dependency on agricultural and urban infrastructure. You can't dig your own irrigation ditches; you can't build your own sewer system. It's too hard. But data portability means that if someone else is doing a bad job of those things, you can at least move to someplace that is doing a better job.

So I don't care very much that GitHub's platform is proprietary, for example. Of course I would prefer it to be entirely open source, but the fact that it is not does not seem like a huge problem. The thing I look at first, when I'm evaluating any forge-like service, is: how complete are their APIs? Can I get all my data off, if I need to? If they provide complete APIs, it means they are serious about maintaining the quality of the service, because they are not trying to lock in their users through anything other than quality of service.

> In France, high school and junior high students don't have computing
> classes. Do you think computing as a subject -- and not only as a tool
> for other subjects -- should be taught in schools?

Absolutely. The ability to understand data and symbolic processing is now very important. It's a form of literacy. You don't have to be a programmer, but you need to understand roughly how data works.

I had a conversation the other day that showed this gap in a very clear way. I was at the doctor, having some tests done. The test involved a video image of my heart beating (using an ultrasound device), and the entire sequence was recorded. It was amazing to see! So afterwards, I asked at the front desk if I could get the data. Those were my exact words: "Can I please get all the data from that echocardiogram?" The clerk's reply was that they could give me a sheet with low-resolution pictures. "Thanks, but I actually want the data," I replied. Yes, she said, that's what she was offering.
To her, the phrase "the data" did not have the very specific meaning it does to the data-literate. What I meant, of course, was that I wanted every single bit that they had recorded. That's what "all the data" means, right? It means you don't lose any information: it's a bit-for-bit copy. But she didn't have a definite concept of data. To her, data meant "something that I can recognize as being related to the thing requested". For me, it was informational and computational; for her, it was perceptual.

I realize this sounds harsh, but I really believe that is a form of illiteracy today. You have to recognize when you are getting real information versus fake information, and you have to understand the vast difference in potential between the two. If I go to another doctor, imagine the difference between me handing her a USB thumb drive with the complete video recording of my echocardiogram, and handing her some printouts with a few low-resolution still images of my heart. One of these is useful, while the other is utterly pointless.

Increasingly, companies that have a deep understanding of data -- of data about you -- have ways to use that data that are very profitable for them, but are not necessarily to your advantage. So computing classes, of some kind, are a form of defense against this, an immune response to a world in which possession and manipulation of data is increasingly a form of power. You can only understand how data can be used if you have tried to use it yourself.

So yes, computing classes... but not only as a defense :-). They're also a great opportunity for schools to do something collaborative. Too much of schooling is about individual learning. In fact, schools outlaw many forms of collaboration and call it cheating. But in computing classes, the most natural thing to do is have the students form open source projects, or participate in existing open source projects.
Of course, the majority of students will not be good at it and should not be forced to do it. This is true of any subject. But for those who find it a natural fit, optional computing classes are a great opportunity that they might not have had otherwise. So as a chance to expose people early to the pleasures of collaborative development, I think computing classes are important. They will have an amazing effect on a subset of students, just as (say) music classes do.

> Now one last question: what would be your advice to young programmers
> wishing to enter the FLOSS community? Please answer with just one
> sentence and not a whole book :-)

Find an open source project you like (preferably one you use already) and start participating; you'll never regret it.

Best,
-Karl