December 2010

National Inventory of Legal Materials

A project I’ve been working on for a few months (with support from O’Reilly Media) is now public, and I’m really excited about it — it’s a good cause, and also has the potential to become a very interesting piece of software.

First, the good cause part. From Carl Malamud’s post on O’Reilly Radar:

The Law.Gov movement identified 10 core principles for the dissemination of primary legal materials in the United States. If you find a jurisdiction that violates one of those principles, you can enter in a bug report. The code for the bug tracker is open source and we have a bug tracker for the bug tracker so you can help us get this ready for production.

The legal bug tracker is a classic open source story. We started with the Media Bugs code base developed by Scott Rosenberg and his team… the basic Media Bugs code base was extensively reworked and repurposed to be adapted for the National Inventory of Legal Materials. …

In other words, we turned this:

Screenshot of mediabugs.org

Into this:

Screenshot of bugs.resource.org

The transformation goes deeper than just appearance. The bug report fields themselves are different. For example, MediaBugs.org uses bug types like “misquotation”, “faulty statistics”, “error of omission”, etc:

Screenshot of mediabugs.org, reporting a bug with selection menu opened

Whereas the National Inventory of Legal Materials needed a checklist of potential violations: Does the jurisdiction charge for access to its laws? Is access vendor-neutral? and so forth:

Screenshot of bugs.resource.org, reporting a bug

You might think we had to change the database schema. But we didn’t!

Ben Brown’s PeoplePods codebase, on which both the MediaBugs and NILM Bug Tracker code are based, uses a clever trick: an indirection table (the ‘meta’ table) that allows PHP code to associate arbitrary fields with the central bug object, on the fly. In the code, you just start using the field, prefixed with “meta_”:

  <input type="text" class="text"
         name="meta_jurisdiction_contact_city"
         id="jurisdiction_contact_city"
         .../>

…and everything Just Works. The field springs into existence automagically — you can start using it everywhere, run SQL queries against it, etc.
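To make the indirection idea concrete, here is a rough sketch of the general pattern, written in Python with SQLite rather than PHP. It is not PeoplePods’ actual schema or code: the table layout, column names, and the `save_bug` and `get_meta` helpers are all assumptions made up for illustration, and the sample values are invented.

```python
import sqlite3

# Hypothetical schema: one fixed "bug" table plus a generic "meta" table
# that holds arbitrary name/value pairs attached to a bug.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bug (id INTEGER PRIMARY KEY, headline TEXT);
    CREATE TABLE meta (
        bug_id INTEGER REFERENCES bug(id),
        name   TEXT,
        value  TEXT,
        PRIMARY KEY (bug_id, name)
    );
""")

def save_bug(conn, headline, form_fields):
    """Store a bug; any form field named meta_<x> becomes a meta row named <x>."""
    cur = conn.execute("INSERT INTO bug (headline) VALUES (?)", (headline,))
    bug_id = cur.lastrowid
    for key, value in form_fields.items():
        if key.startswith("meta_"):
            conn.execute(
                "INSERT OR REPLACE INTO meta (bug_id, name, value) VALUES (?, ?, ?)",
                (bug_id, key[len("meta_"):], value),
            )
    return bug_id

def get_meta(conn, bug_id, name):
    """Return the stored value of a meta field, or None if it was never set."""
    row = conn.execute(
        "SELECT value FROM meta WHERE bug_id = ? AND name = ?", (bug_id, name)
    ).fetchone()
    return row[0] if row else None

# Made-up sample data: no schema change is needed for the new field.
bug_id = save_bug(
    conn,
    "Statutes behind a paywall",
    {"meta_jurisdiction_contact_city": "Springfield"},
)
print(get_meta(conn, bug_id, "jurisdiction_contact_city"))
```

The point of the pattern is that a new field costs one new row per bug, not a schema migration; the trade-off is that every value is stored as text and queries against it go through the meta table.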

Which brings me to the second part of this post.

If you’re a PHP programmer looking to help out on a good project, come talk to us. We can use a few more hands on deck, and you’d have to look far to find a better cause than helping to make primary legal materials accessible to the public. While the project’s first priority is to meet the needs of the National Inventory of Legal Materials (see the current bug and feature request list here), in the longer term, I think it might be possible to turn this codebase into a bug tracker generator: a system that allows people to quickly create and deploy a customized bug tracker.

Usually people address customized bug tracking needs in one of two ways: they make do with some existing tracker (e.g., filing non-software bugs in one of the many free software bug trackers), or they spend a lot of time and effort building a bug tracker from scratch or near-scratch. Neither solution is entirely satisfactory. In the first, users — often non-technical users — have to endure an interface that is clearly not attuned to the actual nature of the bug reports being filed. In the second, the burden of writing a tracker from scratch raises the organizational risk, so you end up spending a lot of time making sure the requirements have been spec’d out accurately, which of course is impossible.

This code base has the potential to drastically lower the cost of making a customized tracker. So far, I’ve simply forked the MediaBugs theme and made the customizations manually; the differences are not huge, and that was the fastest route to shipping. Eventually, I’d like to add an abstraction layer that encapsulates the customizations in a description file, which could either be generated via some kind of graphical administrative interface or just written out by hand:

{
  "bug": {
    "fields" :
      [
        { /* Stuff that goes on screen 1 of the submission process. */
          "summary": {
            "description" : "A summary of the bug.",
            "type" : "text"
          },
          "body": {
            "description" : "The main description body of the bug.",
            "type" : "text"
          },
          "contacted": {
            "description" : "Whether or not the bug source was contacted.",
            "type" : "boolean"
          },
          "response": {
            "description" : "If the source responded, what did they say?",
            "type" : "text",
            "display_if" : "contacted"
          },
          ...
        },
        { /* Stuff that goes on screen 2 of the submission process. */
          ...
        },
        ...
      ]
  },
  ...
}
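As a rough illustration of how a tracker might consume such a description, here is a minimal Python sketch that decides which fields to show on one screen. Nothing like this exists in the code base yet: the `visible_fields` function is hypothetical, and treating `display_if` as "show this field only when the named field has a truthy answer" is an assumption about what that key would mean.

```python
# One screen's worth of field definitions, mirroring the description above.
SCREEN_1 = {
    "summary": {"description": "A summary of the bug.", "type": "text"},
    "body": {"description": "The main description body of the bug.", "type": "text"},
    "contacted": {
        "description": "Whether or not the bug source was contacted.",
        "type": "boolean",
    },
    "response": {
        "description": "If the source responded, what did they say?",
        "type": "text",
        "display_if": "contacted",
    },
}

def visible_fields(screen, answers):
    """Return the field names to display, in definition order.

    A field with "display_if" is shown only when the field it names has a
    truthy value in `answers` (an assumed interpretation of that key).
    """
    visible = []
    for name, spec in screen.items():
        condition = spec.get("display_if")
        if condition is None or answers.get(condition):
            visible.append(name)
    return visible

print(visible_fields(SCREEN_1, {"contacted": False}))
print(visible_fields(SCREEN_1, {"contacted": True}))
```

With `contacted` false, the `response` field is left out; once it is true, all four fields are shown.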

Again, that vision is for the long term. Right now, what we need are some PHP programmers who want to help make the current code base better serve its purpose of tracking legal access bugs. Once we’ve done that, there will be two trackers (the original MediaBugs, and the new NILM Bugs) sitting on top of the same foundational code, and we can use the difference between them to triangulate on the right set of abstractions to make a true tracker generator.

Take a look at the project, especially the README, and then ask questions if you think you might be interested in helping. It’s all open source, and your commitment can be as large or as small as you want it to be.

bugs.resource.org

There’s about to be an outcry over the possibility that U.S. Internet service providers might start charging by the byte — so-called “pay as you go” Internet service. Before the hard-headed economic realists duke it out with the participatory democracy free-speech propeller heads (I’m in both camps, so I say all that with love), here’s a modest proposal:

Charge based on the square root of the number of bytes.

Sometimes we think it’s natural that people should pay based on how much of something they use. The exceptions to that are interesting: while we’re okay with the principle when it comes to food (perhaps because people all generally use the same amount, within a narrow range), many are not okay with it when it comes to medical care. We’re mostly okay with it for electricity and water, even though consumption can vary widely and both goods are really required infrastructure for life in a modern society.

We haven’t really decided how we feel about it for Internet usage. It has a certain appeal: why should your neighbor stream online videos all day, slowing down everyone and transferring orders of magnitude more data over shared pipes than you do, yet pay the same amount per month?

One counterargument is that you and your neighbor are rate-limited and can only transfer a certain number of bytes per second at a maximum anyway, so if you consider the monthly charge to be paying for that maximum, it’s up to each user whether they want to use the full capacity each month or not.

That argument’s not very convincing, though, because the system isn’t physically capable of supporting everyone at a maximum simultaneously anyway. It’s a theoretical capacity only; in practice, the pipes are a shared resource, and the ISPs deal with that reality every day. People who consistently use more than their fair share of that resource should face a disincentive.

A better argument might be: bidirectional Internet access is so important to participation in society that we should find ways to subsidize it. A world where the rich have access to all the online video they want while the poor have to make do with ASCII art is a losing proposition for everyone.

But then how to make sure there’s disincentive to over-consume?

Charging by the square root of the number of bytes transferred per unit of time means that each user’s costs rise with usage, but with a much flatter curve than simply charging a straight rate per byte. You pay more, and if you use orders of magnitude more you pay noticeably more — but you don’t pay orders of magnitude more. The teenager who wants to upload her own movie will be able to do so, but doing things like that often enough will merit some consideration from the user, which is what we want.
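To get a feel for how much flatter the curve is, here is a small Python sketch comparing a straight per-unit charge with a square-root charge. The rates and usage figures are arbitrary assumptions picked to make the arithmetic easy, not real ISP pricing.

```python
import math

def linear_charge(gigabytes, rate=1.0):
    """Straight per-unit pricing: cost grows in proportion to usage."""
    return rate * gigabytes

def sqrt_charge(gigabytes, rate=1.0):
    """Square-root pricing: cost grows with the square root of usage."""
    return rate * math.sqrt(gigabytes)

# Hypothetical monthly usage levels, each 100 times the previous one.
for gb in (1, 100, 10_000):
    print(f"{gb:>6} GB  linear: {linear_charge(gb):>8.0f}  sqrt: {sqrt_charge(gb):>5.0f}")
```

Under these assumed rates, someone who transfers 100 times as much data pays 100 times as much with linear pricing but only 10 times as much with square-root pricing; the heavy user still pays more, but the multiplier is the square root of the usage ratio.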

(Obviously it doesn’t have to be exactly the square root function; I just mean some well-defined function that flattens the curve and can be intuitively “felt” by users given enough experience. Square root’s probably a good one, I’d guess, but I haven’t actually worked through the data. What do I look like, some kind of hard-headed economic realist?)