
Weeks 43-44: International garbageman

Adam Greenfield on 3 November 2011

In teaching and public speaking, I pretty routinely make the point that every act of design involves choices that are deeply interested, in the sense that they necessarily serve someone’s needs before (or to the exclusion of) those of other parties.

This is not a particularly profound point, but you might be surprised how much pushback it generates. I, at least, am always a little taken aback by how defensive people can be in the face of this argument. It’s not a value judgement, after all, just an assertion that it’s more or less impossible to design from a stance of neutrality and objectivity.

The trouble is that neutrality and objectivity are precisely the qualities that people habitually tend to ascribe to machine-readable flows of information, especially if that information was recorded by sensors in the first place. Very often, you’ll hear folks refer to something unimpeachable called “the data,” as though it had been Inscribed by God In The Morning Of The World. This tendency is particularly endemic to technologists, but it affects most of us in one way or another.

Again, we’re not suggesting that being interested — having an interest — is somehow an ethically compromised position, underhanded or dishonest. We’re saying it’s inevitable.

Subjectivity and curation can be found at every stage of any design process that involves the collection and representation of information that aspires to convey some state of affairs in “the real world.” Even in a simple scenario involving the plotting of GIS data on a map, there are decisions about which readings to accept and which “obviously absurd” values to discard, what resolution to default to, and how to label the features depicted.
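To make that concrete, here is a minimal sketch of the kind of curation that happens before anything even reaches the map: discarding readings that fall outside a plausibility threshold. The bounding box, the sample points and the threshold logic are all invented for illustration; they aren’t drawn from our actual pipeline.

```python
# Discarding "obviously absurd" readings before they reach the map.
# Bounding box and sample points are hypothetical illustration values.
MIN_LAT, MAX_LAT = 41.85, 41.92     # rough box around Chicago's Loop
MIN_LON, MAX_LON = -87.66, -87.60

readings = [
    (41.8868, -87.6364),   # plausible: near Wacker Drive
    (0.0, 0.0),            # classic GPS failure mode: "null island"
    (41.8902, -87.6243),   # plausible
    (52.5200, 13.4050),    # clearly outside the study area
]

def is_plausible(lat, lon):
    """Accept only points that fall inside the study area."""
    return MIN_LAT <= lat <= MAX_LAT and MIN_LON <= lon <= MAX_LON

accepted = [p for p in readings if is_plausible(*p)]
discarded = [p for p in readings if not is_plausible(*p)]
print("accepted", len(accepted), "readings; discarded", len(discarded))
```

Even a filter this crude embodies a judgment call: someone decided where the edges of “plausible” lie.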

This isn’t an abstraction for us. In the course of two current projects, Urbanflow and Transitflow, we’re working with just such data, furnished to us by the City of Chicago. I thought you’d appreciate a glimpse at the complexity involved when we venture to turn GIS files into high-quality, pedestrian-friendly maps.

Here’s the GIS information, more or less raw:

This is Leah’s description of what you’re looking at:

1. The big blue thing that looks like the river is the river.

2. The grey polygons are the actual streets as defined by the curblines (data from City of Chicago).

3. The red lines are the City’s “Transportation” dataset. (Yes, they include a line in the river to represent the transportation option presented by the river).

3a. Re: Wacker Drive: the red lines represent Upper Wacker as the outer two lines (one per direction) and Lower Wacker as the inner two lines (one per direction).

3b. The red lines that jump north from Wacker represent Riverwalk.

4. The dark blue lines somewhat coinciding with the red Riverwalk lines are the city’s representation of Riverwalk in their collection of “open space” datasets.

5. The hot pink lines are subway/El lines.

6. The black Ss are subway stations.

7. The blue Cs are CTA bus stops.

8. The fainter green lines are the 2009 TIGER line files from the US Census for Cook County (they are a streets dataset). In theory these should completely coincide with the dark red lines — and mostly do, except where Wacker Drive and Riverwalk are concerned. The TIGER file does not distinguish direction of traffic (which makes sense, because the two directions are not separate pieces of pavement) and represents Lower Wacker as south of Upper Wacker… then represents Riverwalk as a continuous line north of both, and in the river.

9. For fun, the faint grey lines are building footprints, and the heavier green lines are the Pedway paths through those buildings.

If you’re wondering why there seem to be buildings drawn in the middle of some of the streets, notice that those apparent “buildings” coincide with subway stops: they’re the above-ground stations of the El.
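For anyone who wants a feel for how a composite view like this gets assembled, here is a minimal sketch that overlays a handful of the layers Leah describes, assuming geopandas and matplotlib. The filenames are hypothetical stand-ins, since the City of Chicago and TIGER releases use their own naming; the colors roughly follow the description above.

```python
# Overlaying several of the layers described above. Filenames are
# hypothetical placeholders; colors roughly follow Leah's description.
import geopandas as gpd
import matplotlib.pyplot as plt

layers = [
    ("building_footprints.shp", "lightgrey"),   # faint building footprints
    ("street_curblines.shp",    "grey"),        # streets as defined by curblines
    ("river.shp",               "steelblue"),   # the river
    ("tiger_cook_county.shp",   "lightgreen"),  # 2009 TIGER line files
    ("transportation.shp",      "red"),         # City "Transportation" dataset
    ("el_lines.shp",            "deeppink"),    # subway/El lines
]

fig, ax = plt.subplots(figsize=(10, 10))
for path, color in layers:
    gpd.read_file(path).plot(ax=ax, color=color, linewidth=0.8)

ax.set_axis_off()
plt.savefig("loop_layers.png", dpi=300)
```

Note that even the order of that list is an editorial decision: whatever is drawn last sits on top.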

There you have it. There’s The Data. If you want to do something with it — like, say, turn it into a map people can use to find their way around the city — you’re going to have to interpret it. And, not to put too fine a point on it, that means making choices that are going to privilege some needs and potential uses over others.

Example: one of the primary features you’re looking at in this image is the three-layer infrastructure sandwich called Wacker Drive. There are features of this structure that primarily serve the needs of automobility, and there are others that are more salient to pedestrian or transit-rider experience. Because we’re forthrightly designing services to support the latter set of uses, we’ll emphasize the spatial features that are relevant to that particular understanding of the city, while abstracting, minimizing or ghosting back everything that isn’t.

In this view, Leah points out that the lines representing the top deck of the structure, Upper Wacker Drive, have been displaced horizontally, almost as if the z dimension had been folded back onto the ground plane. (I’ve visited Chicago a half-dozen times in my life, but somehow managed to never quite get that “Upper” and “Lower” Wacker Drive were entirely literal — i.e. vertical — designations.) If we’re interested in designing a two-dimensional artifact that’s intended to help people understand the spatial organization of the city, one of the first things we’re going to have to do is figure out a way to convey its layered complexity. And that means reconstructing a comprehensible map view from data that looks like this.

As it happens, there are extant conventions that handle the representation of this kind of feature on a 2D map; you’ll see the plan view superimposed as a dotted outline, as a polygon of slightly lower opacity, or as a volumetric wireframe in false 3D. Any determination as to which (if any) of these strategies makes most sense in our context will be entirely driven by its ability to support pedestrian comprehension and use.
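As a rough illustration of two of those conventions, here is a toy matplotlib sketch that stacks an invented pair of decks standing in for Lower and Upper Wacker: the lower deck as a solid polygon, the upper deck superimposed as a dashed outline at reduced opacity. The coordinates are arbitrary; this is a sketch of the idiom, not of the actual geometry.

```python
# Two decks sharing a plan-view footprint: the lower drawn solid,
# the upper "ghosted back" as a dashed, semi-transparent overlay.
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

fig, ax = plt.subplots(figsize=(6, 3))

# Lower deck: solid fill at "ground" level in this toy view
ax.add_patch(Rectangle((0, 0), 10, 2, facecolor="lightgrey", edgecolor="black"))

# Upper deck: same footprint, dashed outline, reduced opacity
ax.add_patch(Rectangle((0, 0), 10, 2, facecolor="grey", alpha=0.25,
                       edgecolor="black", linestyle="--"))

ax.set_xlim(-1, 11)
ax.set_ylim(-1, 3)
ax.set_aspect("equal")
ax.set_axis_off()
plt.savefig("decks.png", dpi=150)
```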

Hopefully it’s evident from these examples that interest isn’t a dirty word. If you’re a driver, there will be other and better representations of Chicago available to you, and we’d be delighted if you availed yourself of them. Our explicit interest is in helping people get around on foot, or via public transit, and that’s the set of needs we’re optimizing for. Any artifact we produce will be simultaneously an accurate representation derived from this GIS data, and inherently and straightforwardly biased.

- Not entirely coincidentally, the dynamic of curation and selection in Leah’s work resonates with a lot of what I’ve been up to the last two weeks. We’re preparing a master list of open municipal data sets worldwide, and I’m pointman on the project, which has so far meant a dozen hours or so of reasonably miserable, head-down grunt work on my part.

The project has a few parts, the first of which has to do with helping ordinary people more readily find, understand and make use of open municipal information. (The feeds in question may themselves be machine-readable, but most of the sites indexing them are barely human-readable.)

The second part is to begin articulating standards for what a city or municipal governing body ought to be releasing. As I’ve begun wading through the thousands of datasets, streams and feeds available, the quality that just leaps out at me is the utter randomness and contingency of what’s on offer. There’s no consistency, no agreement whatsoever about what constitutes a minimum viable release on the part of a municipal administration.

So you get wonderfully idiosyncratic data sets — some of my favorites are Chicago’s Popular Fiction Titles at the Chicago Public Library, Boston’s list of all retail bakeries in the city, or Dublin’s Log of Litter Fines Issued, 2003-2005 — but barely anything that would permit apples-to-apples comparisons between places. Some of our effort, then, involves identifying the most useful things currently being offered by any city, as well as getting a grip on the most sensible units of analysis and of measurement to apply to them.

If what we’re ultimately interested in is supporting sensemaking, this is both a critical and an unduly neglected aspect of the provision of data. You don’t ordinarily get, bundled with your nifty new open data set, any way to contextualize what you’re seeing, to validate it against other views of the world. To take a purely hypothetical example: if a feed somewhere suggests that there were 270 bicyclist injuries involving cars in the city last year, and “only” 213 this year, you’d be likely to conclude that things were getting safer. But the statistic in and of itself neglects factors that are hugely relevant to any proper understanding of the situation: whether or not the legal definition of an “injury” has changed over that time period, say, or whether the administrative boundaries of the city have expanded or contracted.
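A toy calculation makes the point. The injury counts below are the ones from the hypothetical above; the trip counts are invented, purely to show how a shrinking denominator can flip the apparent conclusion.

```python
# Raw counts vs. a rate: same injury figures, very different story.
injuries = {"last year": 270, "this year": 213}

# Hypothetical: suppose daily cycling trips fell sharply between the two
# years (say, after a boundary change cut several neighborhoods loose).
daily_trips = {"last year": 40000, "this year": 25000}

for year in injuries:
    rate = injuries[year] / daily_trips[year] * 10000
    print(year + ":", injuries[year], "injuries,",
          round(rate, 1), "per 10,000 daily trips")

# last year: 67.5 per 10,000 trips; this year: 85.2. The count fell,
# but under these assumptions the per-trip risk actually rose.
```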

And a truly rigorous approach to data-driven sensemaking would at least try to correct for the pressure that exists on (institutional and individual) actors, at all times and places, to present pictures of the world that are favorable to their interests. Fans of The Wire will recognize this as “juking the stats.” This is what Wire creator David Simon had to say, in a 2009 interview with Bill Moyers:

You show me anything that depicts institutional progress in America — school test scores, crime stats, arrest reports, arrest stats, anything that a politician can run on, anything that somebody can get a promotion on. And as soon as you invent that statistical category, fifty people in that institution will be at work trying to figure out a way to make it look as if progress is actually occurring when actually no progress is.

This kind of thing is just an inherent risk at the collection end of any system designed to represent the real world and influence real-world decisions. These may be abstractions we’re discussing, but they’re abstractions with teeth; when the course of a career, even a life, will be inflected by what The Data says, there will always be abundant incentive for someone, somewhere to cook the numbers.

Given that, it strikes us that it would be useful to build in some mechanism that would allow users to backstop or reference what they’re being presented with against some externally-verifiable picture of reality. This is the most ambitious aspect of our project, and the bit we’re furthest from being able to address right now, but in the long term it’s potentially the most useful.

We’ll keep you updated, of course, as this initiative proceeds — I think we ought to have a shareable Google Doc up next week. For now, I want to thank everyone who’s already helped us by providing us with links to the open muni data resources you’re aware of. You’re helping define what best practices in this space will look like in the months and years to come.

- We’re delighted to see that a note of skepticism about the value and utility of QR codes in everyday circumstances has finally emerged in the technical community. As we’ve mentioned before, our gut take is that outside the East Asian context, relatively few people understand just what a QR code is, or how to use one effectively. Further, we expect this will be the case for some time yet to come, especially in cases where code-scanning functionality isn’t integrated directly into the camera feature of a mobile device.

About that East Asian context. Personally, I really dislike it when people describe the Japanese experience with QR as somehow “ahead of” the West, or use that experience as evidence that there’s anything natural or straightforward about using the codes. I lived in Tokyo during the period in which QR codes were first rolled out in Japan, and I can tell you firsthand that it matters when there’s essentially a single national media market, a hegemonic belief in the need for a particular technology, and a strong consensus around “educating” the audience in that technology’s use.

Absent these factors, we’d expect to see piecemeal adoption and haphazard, poorly thought-out deployment against a background of diverging technical standards — i.e., precisely the situation which has in fact emerged in the West. Given this set of circumstances, our suspicion is that only a relatively small minority of passersby will fully understand and be able to act on the call to action implicit in a QR code. We therefore consider belief in QR’s efficacy to be an article of faith, and any investment in the codes outside the context of a very compelling use case to be rather foolish.

Or these, anyway, have been our own articles of faith. Since we’d prefer to anchor our design practice in the verifiable and the empirical, we’re going to be undertaking a program of New York-based field research designed to confirm or disconfirm our suspicions. What we’d like to be able to come back to you with is some statistically significant finding as to the percentage of passersby in a spread of New York City locations who recognize, understand and are able to act on a QR code presented to them. Then we can have a more meaningful conversation about the wisdom of deploying them in advertising and other applications in the public way.
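For the statistically minded, here is a back-of-the-envelope sketch of the sample-size question behind a study like this: how many passersby we’d need to approach, per location, to pin down that percentage within a given margin of error, using the standard normal-approximation formula for a proportion. The margins and confidence level below are placeholders, not the study’s actual design parameters.

```python
# n = z^2 * p(1-p) / e^2, with p = 0.5 as the most conservative guess.
import math

def sample_size(margin_of_error, z=1.96, p=0.5):
    """Respondents needed to estimate a proportion within +/- margin_of_error."""
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

for e in (0.10, 0.05, 0.03):
    print("within +/-{:.0%} at 95% confidence: about {} people".format(
        e, sample_size(e)))
```

That works out to roughly 97, 385 and 1,068 respondents respectively, which is a useful reality check on how much street-corner interviewing “statistically significant” actually implies.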

- Logistics and housekeeping: At long last, J.D.’s finally getting his bad self on a plane and moving to New York City… which means all of us will be here and permanently based out of the Centre Street studio for the first time ever. (Yowza.) With the crew assembled, plans are afoot for the first annual Urbanscale family Thanksgiving at Dinosaur Bar-B-Que. And I’m giving a few public talks in NYC, this week and next — keep an eye on our Twitter account for details. See you next week.