A conversation with Dan Thomas and Suzanne Peck about open government

Series: Jon Udell's Interviews with Innovators
Date: 2006/06/23
MP3 link: here
Blog writeup: here

Jon Udell: Hi, this is Jon Udell. This week I spoke with Suzanne Peck, who is the CTO of Washington D.C., and Dan Thomas, who directs a program called DCStat. In the middle of June, DCStat began releasing a staged series of data feeds to the web. These feeds are in various XML formats, and they contain operational data about a variety of city services. The radical but refreshing idea behind all this is to make the delivery of taxpayer-funded services transparent and accountable.

Dan Thomas: Dan Thomas and Suzanne Peck from D.C.

Jon: Hi folks.

Suzanne Peck: Hi. How are you?

Jon: Excellent. Thank you.

Suzanne: All right, so here we are.

Jon: I would like to start with a 50,000-foot overview. There is some documentation about the Center for Innovation and Reform, and the particular project we're here to talk about today, which is...is open government the working title for this project, or what do you want to call it?

Suzanne: I think that is absolutely wonderful. I've not heard that before. So I think you are coining a phrase of art. Do you?

Jon: No. Not me. I read that in one of your documents.

Suzanne: Okay. Well, I think together we can coin a phrase of art. I've not heard that before and that is exactly what we're doing.

Jon: Okay. So to set the context here, what Dan told me about that got me really excited was the feed pump, as he calls it, that was just switched on, I guess about a week ago. Which is, in a series of stages, going to relay out -- to anyone who wants to use this data -- a lot of operational information coming out of the city's systems about, initially, service requests to the mayor's office about things like street repaving and graffiti removal and such. But then there's a whole set of other data feeds that are scheduled to come out, I've got the list here somewhere, a bunch of stuff about real estate...

Suzanne: We can walk through that.

Jon: Yeah. The operative assumption seems to be first of all that, well, this is everyone's data anyway so why shouldn't it be transparent and available? But beyond that, why shouldn't the performance of government be transparently measurable? Hence, now, the ability to dip into this data and generate reports about, well, how long is it taking to resolve these requests? How many are still open? It's an absolutely delightful concept and I'd like to hear a bit about the genesis of this project.

Suzanne: You've just given one of the most articulate rationales for what we're doing I've ever heard. Better than anything we've ever said. So you really have it, Jon. This whole concept I would say comes from three geneses. One is that if the city is going to take your tax dollars for services... if city municipalities exist to organize efficient citizen services, business services and visitor services, then they should be accountable for the efficiency and level of service with which they use those dollars.

Jon: What a radical concept!

Suzanne: Genesis II is that, historically, information about the efficiency and the level of services across an enterprise, a municipal enterprise, has been vertically available function by function, agency by agency. But that kind of data isn't real data, because real data doesn't occur vertically. Real data occurs horizontally. If as a citizen, for example, the things most interesting to you, that you are most focused on, are: am I getting the services I am paying for? Am I safe? Are my children well educated? The answers to those questions don't come from any particular vertical agency. The answers to those questions come horizontally in an integrated way across a number of agencies, each of whom provides a part of the answer to that question. Am I getting efficient city services? Am I safe? Are my children well educated? Are there economic opportunities for me and my family here? Are there good neighborhoods in which to live? And if those things are not here, how does my city go about making sure that I am on the road to getting them? Those are all happening horizontally.

Jon: And I think what's implicit when you say horizontal is that, previously, it may have been difficult even internally to correlate across data sets. But exposing this stuff is going to make it easier not only internally but also externally. To be able to make a correlation between, let's say, decaying physical infrastructure in a particular neighborhood and crime in that neighborhood.

Suzanne: Absolutely. You're exactly right. This data was in vertical silos, and it was in a variety of formats that could not easily be translated into a common standard format and combined or permutated in any number of ways -- and certainly not in real time.

The third leg of this was technology. We literally never could have even begun to think about it without tool sets like EAI. Here in the city we have 370 systems. We needed to have some mechanism that let us reach into those systems, translate formats on the fly using middleware, and then lay up all that data into data repositories, in formats -- XML being one we are principally using -- that could be easily retrieved. Tools like EAI, like the enterprise service bus, like search engines, like...

Dan: Business intelligence.

Suzanne: Yes.

Dan: Geographic Information Systems.

Suzanne: GIS. Right. All of those things had to come to maturity, Jon, and had to come to maturity together, before we were able to say, we have the capability in real time to transform those types of legacy data into any type of combination and permutation so that it answers questions about crime, about safety, about education, about economic opportunities. So, for example, to your exact point a minute ago. The city administrator, the first way he used it was to establish crime hot spots in the city and I think he did 39?

Dan: There were 14.

Suzanne: Oh, 14. Okay. So he did establish 14 hotspots in a 68-square-mile area of the city, and probably the very worst hotspot was the Washington D.C. equivalent of Chicago's Cabrini-Green, called Sursum Corda, a public housing development that was just like Chicago's Cabrini-Green. Now it's one thing to say okay, I'm going to improve the crime stats, I'm going to put a larger police presence there, and that's historically what people did. You'd put more policemen and wait and see what happened. And things would improve somewhat, but the second you didn't have that many police there, things just went back down again. This ability to integrate data in real time and make it geographically based allowed us to do anticipative patterning on the front end. And on the back end it allowed us to ask the question, why? Why are things happening? So we and the police certainly understood that if there are crack houses, if there are abandoned autos, if there are nuisance properties, if there are poor school test scores, if there are high unemployment rates, that all of those things together not only positively contribute to crime but, if you remove them all, have a negative effect on crime.

Jon: Oh sure. I think everyone will appreciate the power of multi-dimensional analysis being brought to bear on these cluster types of problems.

Suzanne: Now see, you just said in 10 words what took me 400.

Jon: The additional insight here...you mentioned Chicago, and I'm sure you're aware of Adrian Holovaty's ChicagoCrime.org, which was one of the early and most striking Google Maps mashups. He used a bunch of data similar to one of the feeds that you guys are slated to be publishing later this year, of police incidents. This was being published to the web but not in a very useful form. So it's my understanding that a great amount of the effort that went into his construction of that mashup was basically scraping web pages and doing a lot of reverse engineering of the data formats. And here you've just given it to me. I mean I'm sitting here and looking at a quick-and-dirty mashup I made with your data and Google Maps, and I'm at 14th Street Northeast, and I click a marker and up pops "Curb and gutter repair, 2504 14th Street, order, date, status". I don't really know anything about your internal information architecture. But that just makes the point that there can be independent application development that's going to drive this process out in a way that engages many, many more people, and leads to all sorts of interesting collaborations. And that is, to me, the most fascinating aspect of this.

Dan: I wanted to amplify something that you said. There's no question that ChicagoCrime.org was an inspiration to us. We had been building similar tools that had been made available inside the District to staff here, leaders, managers and staff in the various agencies. One of the things we did see is, we've always had the vision of providing this information out to the residents of the District of Columbia.

Jon: Yeah.

Dan: One of the first things we developed were some views, called analysis view and neighborhood view, and those were directed at city staff of varying levels of technical skill. Then we developed another product called mobile view that runs on wireless devices. That's deployed out now to folks in the Police Department, to the Department of Public Works, Department of Transportation, and so on. They can now wirelessly access the same information that we're beginning to publish out through these RSS feeds. The final view that we had envisioned when we started two years ago was to also create a resident's view.

The world is a very interesting place, and we wanted to...dare I say... experiment. We wanted to see if the internet community might pick this up if we were very proactive at making it easily accessible to build things on top of our data that we couldn't necessarily even conceive, that will leverage the power of the Web 2.0 environment.

Jon: Well, there's no question about that in my mind. I'm looking at a point on the map. "14th Street, Northeast, curb and gutter repair" is the annotation on the record that I put there. The reason that I was able to put it there is because your data set includes latitude and longitude coordinates. Talk me through the process of how this data point got to where it is.
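The kind of mashup Jon describes comes down to reading each record's coordinates and turning them into a labeled map marker. Here is a minimal Python sketch of that step. The interview does not give the actual DCStat schema, so the element names (`request`, `type`, `address`, `status`, `latitude`, `longitude`) and the coordinate values are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names and coordinates are invented for
# illustration; the real feed's schema is not described in the interview.
SAMPLE_XML = """<servicerequests>
  <request>
    <type>Curb and gutter repair</type>
    <address>2504 14th Street NE</address>
    <status>Open</status>
    <latitude>38.9227</latitude>
    <longitude>-76.9855</longitude>
  </request>
  <request>
    <type>Graffiti removal</type>
    <address>(no coordinates)</address>
    <status>Closed</status>
  </request>
</servicerequests>"""


def extract_markers(xml_text):
    """Return one marker dict per record that carries usable coordinates."""
    markers = []
    for req in ET.fromstring(xml_text).iter("request"):
        lat = req.findtext("latitude")
        lon = req.findtext("longitude")
        if lat is None or lon is None:
            continue  # a record without coordinates cannot be placed on a map
        label = ", ".join(
            [req.findtext("type"), req.findtext("address"), req.findtext("status")]
        )
        markers.append({"lat": float(lat), "lon": float(lon), "label": label})
    return markers


markers = extract_markers(SAMPLE_XML)
```

Each resulting dictionary could then be handed to whatever mapping library is in use. Records that lack coordinates are skipped rather than guessed at.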

Dan: Okay. Well, where it might start is that a telephone call from a resident comes in and...did you say it was a curb repair or something?

Jon: Yeah.

Dan: Okay. So, it could be in front of their home. It could be that they are walking down the street and notice the problem.

Jon: They can call through the phone or log into the web and do it?

Dan: Correct. We have both of those methods and in our Mobile View application -- the next version is just coming out -- people who are not on city staff will be able to create new service requests also from their wireless devices. It goes to what we call the Mayor's call center, a central point where we collect this information, and it goes into a proprietary system there called Hansen that effectively logs the ticket. From there, depending on the type of call, it is then sent out to the appropriate agency. They get a work order, and there's a service level agreement that is attached to the various types of services that the city provides. They range from two days or three days to fill a pothole to a certain amount of time to correct that curb problem. So when that's logged, our DCStat systems are plugged into that ticketing system as well as to numerous others around the city, and we receive that record within five minutes of it coming in to the city.

Jon: Okay.

Dan: And we begin to track the performance of that service request. An important thing to recognize about these feeds and service requests is that we do refresh those feeds hourly. So not only are we publishing this information for our residents, we're giving it to them at almost the same time that we're finding out. So they're in the loop from the beginning.
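Tracking a request against its service level agreement, as Dan describes, could be sketched roughly as follows. This is an illustration only. The SLA table is hypothetical (only the two-to-three-day pothole figure echoes the interview; the curb figure is invented), and the status names are assumptions rather than DCStat's actual vocabulary.

```python
from datetime import datetime, timedelta

# Hypothetical SLA table, in days. The pothole value echoes the "two days or
# three days" mentioned in the interview; the curb value is invented.
SLA_DAYS = {"Pothole": 3, "Curb and gutter repair": 30}


def request_status(service_type, opened, closed=None, now=None):
    """Classify a service request against its (assumed) SLA deadline."""
    now = now or datetime.now()
    due = opened + timedelta(days=SLA_DAYS[service_type])
    if closed is not None:
        return "closed on time" if closed <= due else "closed overdue"
    return "open" if now <= due else "open overdue"


# A pothole reported on June 13 and still unfilled on June 20 is past its
# three-day deadline of June 16.
status = request_status(
    "Pothole", opened=datetime(2006, 6, 13), now=datetime(2006, 6, 20)
)
```

Passing `now` explicitly keeps the function deterministic, which matters when recomputing statuses for historical files.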

Jon: Let's talk a little bit more about the organization of the data. I was able to spray these points onto a map, but I wasn't really sure what I had accomplished or what kind of analysis I was going to do. Partly because, true, XML is self-describing, but I really don't have any documentation that explains what these dates exactly mean. I saw a bunch of orders that were all dated on the same day as the feed but I can't imagine that they were all new orders that day...

Suzanne: They are all new orders.

Dan: Those are all new orders.

Jon: For that day?

Suzanne: Yep.

Dan: For that day.

Jon: Really? Okay.

Dan: Actually, let me correct myself there. We have new orders, but it's like a journal: there are also entries when the disposition of an order that came earlier changes, for example when it moves from open status to open overdue status. Those types of...

Suzanne: But it means activity. Those are all new, that day, activity of some kind. Either original activity or closure activity.
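Because each daily file is a journal of activity rather than a snapshot, a consumer has to replay the entries to find out where each order currently stands. The sketch below assumes every entry carries an order identifier, a timestamp and a status. Those field names and the sample entries are hypothetical, not taken from the actual feed.

```python
def current_state(journal_entries):
    """Replay journal entries in time order; the last entry per order wins."""
    state = {}
    # ISO 8601 timestamps sort correctly as plain strings.
    for entry in sorted(journal_entries, key=lambda e: e["timestamp"]):
        state[entry["order_id"]] = entry["status"]
    return state


# Hypothetical entries gathered from several daily files.
entries = [
    {"order_id": "A1", "timestamp": "2006-06-13T09:00", "status": "open"},
    {"order_id": "B2", "timestamp": "2006-06-19T10:30", "status": "open"},
    {"order_id": "A1", "timestamp": "2006-06-20T08:00", "status": "open overdue"},
    {"order_id": "B2", "timestamp": "2006-06-20T15:00", "status": "closed"},
]
state = current_state(entries)
```

Counting rows in a single day's file would therefore measure activity, not the number of new orders.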

Jon: Okay. I got it. Another question is that, when I first hit your data site, I got a big dump of seven or eight megabytes of stuff. I'm talking about the full XML files, not the RSS, but the full files with all the detail in them.

Dan: Right.

Jon: And it looks like it's running a couple hundred K every day after. Does that sound about right?

Dan: Yeah. There are actually three different products and it's important to distinguish between them.

Jon: Okay.

Dan: The RSS feed is intended for human consumption. It's intended for people to subscribe to, and it comes out both daily and weekly. Summarizations and statistical reports of what happened in the previous day or the previous week.

Jon: Right, I saw that and in fact I'm subscribed to that one.

Dan: That's one of the feeds. The second feed is the very large file you described, the multi-megabyte file, and that is a daily file with a lot of information that's in the XML. That's intended to be brought down and brought into say an Excel spreadsheet, or an Access database, for people to start doing summations of their own. They can go beyond what we've canned and packaged in our daily and weekly reporting...

Jon: Exactly, yeah.

Dan: Use their own information...

Jon: Yeah, I stuck it into an XQuery database, for example.

Dan: OK.
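For readers who want the spreadsheet route Dan mentions, one possible approach is to flatten the XML records into CSV, which Excel and Access can both import. This is a sketch under assumptions: the record tag `request` and the field names in the sample are hypothetical, and it only handles flat records with one level of child elements.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's element names are not given here.
SAMPLE_XML = """<servicerequests>
  <request><id>A1</id><type>Pothole</type><status>Open</status></request>
  <request><id>B2</id><type>Bulk trash removal</type><status>Closed</status></request>
</servicerequests>"""


def xml_to_csv(xml_text, record_tag="request"):
    """Flatten one-level XML records into CSV text with a header row."""
    records = [
        {child.tag: (child.text or "") for child in rec}
        for rec in ET.fromstring(xml_text).iter(record_tag)
    ]
    # Header is the union of field names, in first-seen order.
    fieldnames = list(dict.fromkeys(name for rec in records for name in rec))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # fields missing from a record become empty cells
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE_XML)
```

Writing `csv_text` to a file with a `.csv` extension is then enough for a spreadsheet to open it.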

Jon: What I noticed is that, though, I hadn't actually been there for about a week and then today the files were for the 19th and 20th of June, so I had the dailies, but the stuff between the 13th or 14th and the 19th wasn't there.

Dan: There's another link on our site that takes you to the archive, and that's where you'll pick up things that are not current information.

Jon: Well, it's actually the archive that I'm talking about. Right now, I'm on the archive page and under XML it has June 19 and June 20. Is it intended to be a full archive of all the...of everything, or...

Dan: We have a father-grandfather scheme; we're going to roll it up on a monthly basis...

Jon: OK.

Dan: And then in the current month we will have the month-to-date, but, to be honest, off the top of my head I don't remember the exact way that we're doing that. The point is that you'll be able to retrieve all the historic information as well.

Jon: OK.

Suzanne: Dan mentioned the specific data sets.

Jon: Yeah.

Suzanne: Can we talk for a minute about those data sets?

Jon: Yeah, absolutely.

Suzanne: Right. So, up now, is all of the service request data, the citizen service request information...

Jon: Yeah.

Suzanne: Which is service requests for things like bulk trash removal, downed trees. I, myself, call with great frequency about a tunnel into the city where occasionally there is homeless detritus, and I will call in to get that removed. So, that information is already there. On the 26th, Dan's going to put up vacant property information. On the 9th of July, he's going to put up MPD crime information. You can see a cadence of about every two weeks. So, future two week increments are going to be things like real estate, crime information, crime reports and arrests. Then we'll be putting up things like real estate sale information, property location information, liquor license information.

Jon: You've obviously had access internally to this stuff for quite some time...

Suzanne: Yes.

Jon: Apart from the raw data that's becoming available for people to use as they may choose to, what kinds of public-facing applications are on the web now that make use of this stuff as it sits behind the firewall?

Suzanne: The Department of Consumer and Regulatory Affairs has a whole set of business information that aids online application for small business licenses. The District of Columbia is a city of 90,000 small business people, and now they don't have to come to brick-and-mortar and stand in line for a day to get their business licenses. Our DMV application... when I came to the city, the true horror of the city's residents was having to procure or renew their driver's or vehicle license, it literally took an entire day. Now because all that DMV information is available forward-facing, it is a four-minute web application.

Jon: That's cool. Things were the same in Baltimore when I was there; I hope it's improved there as well. On top of the data, I was feeling the need for some, well, if you guys were supporting software developers officially, there would be, you know, documentation, sample applications, things like that to get people jump-started. What are your thoughts along those lines?

Dan: First, if you do go into the feeds, there's quite a bit of XML data in there, as you've discovered.

Jon: Mmmm-hmmm.

Dan: I should say that what we did and gave you access to is a soft launch in the sense that we are trying to work out all the little kinks because this is a rather new area for us.

Jon: Yeah.

Dan: But, we are putting together information about the data properties that we're publishing first. And there's some... what I'm hearing from you is it left you hungry for a little bit more.

Jon: Well, that's true, but it also made me think, well, I could easily infer the wrong thing. In other words, if I look in your data set and I see this number of records that apparently were ordered on this date and they're still not signed off, and this many are listed as overdue, I might or might not reach the correct conclusion.

Dan: Yes, and so we are preparing information about that so that you can do the correct interpretation. Some of the other secret nuggets that are down there that we're going to be acquainting people with include some keys that exist in these files that you'll see show up in the other data sets as well. Key information that has to do with, for example, the address, so you can begin to cross-walk this data spatially.

Jon: Yeah.

Dan: We've given a number of geographic aggregation identifiers, so if you live in a particular ward, you can aggregate things out in that manner. So we've provided a number of things, and we know some ways to apply them because we are using them internally...

Jon: Mmmm-hmmm.

Dan: So that is coming. In some ways you can't use it until at least the second data set shows up. But definitely there's some additional material coming.
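The two uses Dan outlines, rolling records up by ward and cross-walking data sets through a shared address key, might look something like this once a second data set is available. The field names `address_id` and `ward`, and all the sample records, are assumptions for illustration. The interview does not name the actual keys.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
service_requests = [
    {"address_id": 101, "ward": 5, "type": "Curb and gutter repair"},
    {"address_id": 102, "ward": 5, "type": "Graffiti removal"},
    {"address_id": 203, "ward": 6, "type": "Bulk trash removal"},
]
vacant_properties = [{"address_id": 102}, {"address_id": 999}]


def requests_per_ward(requests):
    """Aggregate service requests by their ward identifier."""
    return Counter(r["ward"] for r in requests)


def crosswalk(requests, other, key="address_id"):
    """Keep the requests whose key also appears in the other data set."""
    other_keys = {row[key] for row in other}
    return [r for r in requests if r[key] in other_keys]


per_ward = requests_per_ward(service_requests)
at_vacant = crosswalk(service_requests, vacant_properties)
```

Building a set of keys first keeps the join linear in the size of the two data sets, which helps with multi-megabyte daily files.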

Jon: Now there is an interesting potential down here, which I'm sure you've thought about, and it is that -- and I've always found this to be a fascinating thing -- there's this thing that happens around public data. When people discover that it actually is public, it frightens them. Like, well, I didn't really mean for it to be THAT public. For example, information about people's property taxes. The kinds of things that were always notionally public, but you had to schlep down to City Hall and dig through a pile of stuff to find it. So it wasn't something that was going to be showing up in somebody's spreadsheet as part of a comparison with a bunch of other things. Or voter registration. I think there was a case in, I'm going to say it was in the state of New York a couple of years ago, where people were shocked to discover that their party affiliations were discoverable on the web. "That's an invasion of privacy!" Well, no, they're defined to be public records, right? But the thing is, they were notionally public, but they were practically obscure...

Suzanne: Good observation...

Jon: So I'm looking through your stuff and thinking that if I receive a liquor license or a business registration, and all this public information about the type of my business, name, address, blah-blah-blah, is there, I mean, you know, one of the kinds of entrepreneurial opportunities that's going to arise is I'll be getting spammed by a lot of people who see that there's an opportunity to do something related to that. It's just going to happen and it's not something that should derail a project like this in any sense. But I wonder what you think about that?

Dan: Another kind of a difficult situation that I'm aware of along the line of what you're speaking about is in Florida, where they've gone in and been scanning all of the court records as a document management project and then made that information available on the web...

Jon: Mmmm-hmmm.

Dan: ...and, I don't know if you saw that, but there were a lot of things where people's names and social security numbers and so on were then exposed and hadn't been redacted before the information was made available. We are conscious of these particular kinds of things. So what we are doing, for example, with the service request, and perhaps that is one of the important points to make, is that this is the address where a city service was provided.

Jon: Right.

Dan: Not necessarily the person that called in the request.

Jon: Right.

Dan: There are a number of locales, and the district is one of them, that do publish tax information on the internet.

Jon: Right.

Dan: And, so, we're trying to balance the need to protect people's privacy, that is a very important aspect of this, but it's a world today where information is being locked up and -- have no doubt -- what we're hearing is actually, that many, many people do have our information. However, we don't know who they are, we don't know what they're doing with it. I think this is actually a very refreshing approach to that particular situation.

Jon: Oh, oh, it absolutely is. I'm just noticing that there's, on the one hand, there's a tremendously liberating and democratic aspect to all of this, but on the other hand, it's part of a growing cultural trend in the direction of transparency, and it's something that people are all going to be adjusting to over, well, certainly the course of the rest of our lives. And there will be unexpected, unanticipated consequences that are going to take people by surprise.

Dan: Yes.

Jon: But, you know, that's the nature of the world we live in and I'm not sure when you really think it through you'd want it to be any other way.

Dan: And the democracy point that you're making is really central to the motivation behind what we're doing here. We're moving, I believe, more towards a true democracy. Because this information has always been accessible, but it's actually typically accessible to folks that have the time and resources to go get it.

Jon: Exactly. Exactly right.

Dan: And that might be, let's say, in the case of even the press. There is the FOIA law that has enabled folks to go and litigate to collect this information, but then there's still an interpretation going on between the press themselves and the residents of the community. Well here, what we're doing, is putting it on a level playing field even for every person that can gain access to a computer.

Jon: Exactly. And when we get a few, I will call them "lead users" or "user innovators" in Eric von Hippel's terminology, active civilian types who seize on to this stuff and bring it into the political discourse, then when someone says, well, "I make the following claim about what I did or didn't do with the crime rate or the job rate over this period of time," someone can say, well, here's data, you know, that's your interpretation, this is mine, let's discuss the differences.

Dan: Exactly.

Jon: To be able to have a reference data set to ground that conversation is, it's tremendous, it's just tremendous to contemplate what that can be like.

Suzanne: You know Jon, now I just tested, I live in Fairfax County Virginia.

Jon: Yeah.

Suzanne: And so I just while you were talking I went on to Fairfax County Virginia Tax Records and put in my own address just to see, you know, what it is that's there and...

Jon: Yeah, what'd you find?

Suzanne: And so, there was, you can see there's my address, there's absolutely everything about my house including all of my assessment values in every tax year, all of the sales information about it.

Jon: Yup.

Suzanne: Everything about the building itself.

Jon: Yup.

Suzanne: The map, the structure size. But it's not only physical information, it is financial information.

Jon: Yup.

Suzanne: It's literally everything including, here's something that says neighborhood sales where I can go in, I just brought it up, and I can get the sales...

Dan: The comps.

Suzanne:...the comps of everything that is around me.

Jon: Mmmm-hmmm.

Suzanne: And so that's all...

Dan: Including driving directions to her front door.

Suzanne: Including driving directions, right. So you're dead-on when you say it's always been there, but it hasn't ever been this darn easy, and it's just amazingly easy...

Jon: Yeah.

Suzanne: So what is the reaction of people to seeing information that they always could have seen but never did? I think that's a fascinating question for you as a writer to ask.

Jon: Well, I know what the reaction is. It is, in many cases, shock, and a sense of privacy violation, which then gradually gives way to "Oh, well, yes, this is public data and I just never really confronted the reality of that."

Dan: One point of human behavior that I've observed with our systems that might be interesting to you is that invariably when folks get exposed to the systems where we've... The thing to understand about DCStat is that we've done broad integration across these data sets and pulled that together....

Jon: Mmmm-hmmm.

Dan: ...the first thing that they do is that they type in their own home address.

Jon: Mmmm-hmmm.

Dan: "What do they know about me?" And then they poke around and say, "Oh, OK, you know information about my service calls, you know my house value" and so on. Well, then the very next thing that they do is they say: "Well, what do you know about my neighbor?"

Jon: [Laughs]

Dan: And so they then do a little exploration there and then they start to say "Well, wait a second, what do you know about my neighborhood and how do I compare with folks in the next-door neighborhood."

Jon: Yeah.

Dan: So there's almost this Maslow's hierarchy, I don't know what to call it, but it's amazing, the pattern repeats over and over again.

Jon: Well, we should probably wrap up here. I want to say I really am just delighted to speak with you guys and to see that this is happening. I know that it's going to turn into all sorts of interesting, unexpected results.

Suzanne: One of the candidates, by the way... We are now in a mayoral election period. Our mayor, Anthony Williams, greatly beloved, who has brought the city back from a half-a-billion-dollar deficit to a billion-and-a-half-dollar surplus, has elected not to run for a third term, although the District allows you to be mayor for life if you would like; there are no term limits. So we are facing a September 12th primary. In this city the Democratic primary is tantamount to election, although there is a November general. One of the principal candidates now running for mayor says that DCStat is, and he says it in his short 10-minute speech out on the hustings, that DCStat is the mechanism by which he is going to bring accountability to every leader and manager in the city, and it is the principal mechanism by which he is going to govern his administration if he is elected. That's very powerful stuff.

Jon: It is. It is. And I will just be fascinated to see how this turns out.

Suzanne: Jon, you're ending this conversation the way, literally for eight years every single writer, reporter, editor has ever ended a conversation with us. Would you like to know how that is?

Jon: How's that?

Suzanne: "Uh, well, I really have to go now, I'd like to stay longer, but..." [laughs].

Jon: [laughs]

Suzanne: 'Cause we never get tired!

Jon: [laughs]

Suzanne: Anyway, it was really fun talking to you.

Jon: Thanks very much guys.