Jon Udell: Hi, this is Jon Udell, and today's podcast is an interview with Chris Gemignani. He and his brother Zach started a company called Juice Analytics. They're both practitioners of, and advocates for, a more agile way of visualizing and analyzing business data.
Chris Gemignani: You were serious about making this Jon's Radio.
Jon: [laughs] Yeah, that's funny, huh? So anyway, it's nice to finally sort-of meet. I was looking back at how our paths intersected and I guess the first time was when Paul Kedrosky did that bit of Flash visualization of the Tiger Woods' golf data. It struck a chord with me because I'm always annoyed at how poorly we visualize data, and in particular how we seldom make effective use of animation. So I thought it was cool that he did the Flash thing and then you came in with the Excel version of the exact same thing, which was interesting in two different ways. One of them, just that it was possible to do that. I'm not the only person who made the comment that, "Gee, I should know that it's possible to do that kind of thing in Excel, but I sure don't." The other interesting thing to me about it was how effectively the screencast that you made served as a teaching tool, so that when I went back and recreated what you had done, I did it just by following along in the screencast.
Chris: The Paul Kedrosky post was really a slow pitch into the sweet spot of what we do. One thing that is certainly missing from Excel is any sense of animation. While Paul Kedrosky's thing wasn't the most revealing use of animation, still it's better to have some ability to animate than none. And the other thing that's shocking about his post is, he did it in Flash but about 90% of the audience is not going to be able to follow along with something that's done in Flash. Excel really is the ubiquitous business tool for dealing with data.
Jon: Because you have the data as well at your disposal, which is, of course, the thing you don't get with just the animation.
Chris: Right.
Jon: I guess I would characterize what you and your brother are doing as kind of a boutique consultancy in the area of analytics. I really like the phrase "agile analytics" that you guys have come up with. It rings a lot of bells for me, and presumably for other people.
Chris: Both Zach and I have experience in a corporate environment. I worked in the credit card industry for quite a number of years. So we've seen a lot about how analytics and learning about customers takes place in the field and on the screens of people. I had done some stuff in my previous jobs where I was trying to do much richer data visualization of customer behavior, and that was exciting for both of us, and I think we found we had the opportunity to strike out on our own, doing analytics consulting for companies. There are a lot of companies now that are collecting a lot of customer data. I would hazard a guess that it's kind of exploded over the last ten years, as more stuff goes web-based and you've got click logs and all that kind of stuff to deal with. So people are collecting this stuff and the systems haven't really advanced to getting that much insight out of it.
Jon: It's pathetic. We're basically stuck with a handful of stock charts, and that's not really very helpful at all. I guess it just takes an interesting skill set. So you guys are quants on the one hand, and programmers on the other hand. I think there's a creative sensibility that's needed here because a lot of this is pretty much uncharted territory in terms of bringing alternative and more effective ways of visualizing data into the mainstream of business.
Chris: One thing that I think is true - you had a post just on your weblog a little while ago about how your superpower was carrying data across different boundaries. I forget exactly what you said.
Jon: You're in the same club.
Chris: In big businesses you have a lot of people who are specialists. They only work on one system, or they're analysts, but they don't do the database work. I've always been a generalist, myself, and tried to build a very wide toolset for doing things. And so we kind of bring that to bear. One of the things we do is we try and pull all of a company's data about its customers from all these different siloed systems into a single dataset and then, from there, we build these customer-level data sparklines. You talk about how it's pathetic how there hasn't been much innovation in how to view data. I think we are starting to see some new ways of viewing data become at least common knowledge. Stuff like tag clouds -- more and more people are coming into contact with those, or with treemaps.
Jon: There's a handful of new widgets on the scene.
Chris: Some, but we probably went for about 10 or 15 years without any real new interface widgets.
Jon: Yeah, exactly.
Chris: More consumer-facing applications are getting slick, Flash-based interfaces. I think that's going to eventually push up to enterprise apps.
Jon: Here's my theory. See if you agree about the benefit of delivering office-like technology through the web. I think people tend to focus on a lot of the obvious benefits having to do with total cost of ownership and ease of deployment and administration. Those are valid benefits, but people are right to point out that you're also sort of giving away a lot of the power of the fat-client applications. Though it's sort of a paradox, right? Because people never manage to scratch more than the surface of things like Excel, but there is lots of stuff under the hood there, as you've been demonstrating. Maybe it comes around from another direction in the sense that what we have when we do things through the web is an environment in which whatever it is that can be accomplished is almost, by definition, shareable. Things were shareable, and are shareable, in the Excel environment, but not in the same way. People need to be able to actually see the things that other people are doing and then riff off of them with their own things. Kind of like that little exchange that we had, actually a couple of times. Somebody did a tag cloud thing, I did a tag cloud thing, you did a tag cloud thing. We were riffing back and forth on it. In an environment where the web is providing the playground in which everyone can do that together, it could get us to a place where we see a lot of accelerated innovation. At least that's my hope.
Chris: That's one of the frustrations I have with Excel. It's kind of a missed opportunity. First of all, I want to emphasize how really ubiquitous it is in a corporate environment. You really will find people working in Excel all day long, and they also used to work in Word, but now they're sending emails. But Excel is still there at the center of what people are doing. Microsoft is rolling out Excel 2007, and I don't really think ... they're not making it easier to share, they're making it more powerful. But again, those powerful features aren't what people use. People need a simpler set, and one of the reasons Excel is so popular is it does for a non-programmer sort of what MySQL does for Linux-heads. It's just a place where you can store your data; you don't have to think about it, you can dump some data in there and it's really easy to manipulate. Another thing we've seen: often we find when you're trying to do analysis in the workplace environment, the problem is really getting hold of the data and finding the right way to look at the data. It's not a problem of statistical sensitivity, or things being really all that subtle once you actually get your hands on the data, but the problem is actually finding the right way of looking at the data.
Jon: Well that is a pretty subtle thing, don't you think? There are lots of relatively useless ways to chart numbers, and not a whole lot of ones that produce novel insight.
Chris: Yeah. We're doing a lot of, on the one hand, high-end exploration with what you can do with Excel charting, and some of it's pretty crazy. But also one of our frustrations with Excel is the built-in charts are...there's a lot that's been learned over the last 20 years about how to make a good chart, and make it easy for people to read, and the built-in charts are just not effective at communicating.
Jon: And so you don't mean in terms of different chart types than the ones that are available, you're just talking about the execution of the ones that are in the kit.
Chris: Yeah, the execution of the ones that are in the kit. I think there's a sense of, from Microsoft's part, that we'll give you a huge palette of pretty things to choose from. But there's not much that's there to narrow down what would be the right charts to be looking at, at this time, with this data.
Jon: So if you were the czar of Microsoft Excel, what would you focus on doing?
Chris: I think there may be room, with Excel, to split it into some different tools. Excel tries to wrap up all these different functionalities into one package. I think there is an ability to split out a version of Excel that is mainly useful for holding onto a chunk of data and doing basic formulas. A lot of the advanced functionality that they're putting into this next version, I don't think that average business folks are actually going to be able to use. It would be nice to see some sort of a concept of a lightweight Excel that didn't give as many options to people, but actually tried to push people in a good direction.
Jon: Seems to me that part of the problem we have, in general, with complicated software packages, is that people don't get much of an opportunity to observe other people using tools. It's occurred to me that that's maybe one of the most interesting uses of screencasting. Because, in general, the way that we, as human beings, learn new behavior is by emulation of other people who we see doing stuff. I've sometimes given this example of the ape and the termite stick. The first time an ape saw another ape use a termite stick to feed himself, it was like, "Wow! I'm going to get one of those, and that's a lesson I'm not going to forget." The way things are set up right now, we all sort of work in a fairly isolated way, which is to say, we're not looking over one another's shoulders very much. I've also thought about this in the context of pair programming, that a lot of that is literally about being able to observe the unconscious knowledge that the other person has. To absorb it by seeing it. To get ideas that would never have occurred to you and answer questions that you never would have asked, just by seeing things happen. Think about how 90 plus percent of the functionality of Office is below the tip of the iceberg, and no one sees it, and they particularly don't see it in contexts where they have a need and they see the solution. That's really how learning happens, how knowledge is acquired and sticks.
Chris: I think that's one thing we would like to do with our blog. There's been a lot of communication and conversation in the programming world, among geeks, about different ways of getting work done. You have all these different programming styles, and a lot of different methodologies for how to do work, and personal productivity, and a lot of rich conversations. A lot of people inside of business are just coming to this, and they really haven't been exposed to it very much. Agile analytics is a riff directly off of agile development methodology. I think it's good because there's been all this innovation in ways to do development. More and more, the business environment for average businesses is looking a little bit like a software development shop.
Jon: Yeah, it's been fascinating to watch you guys kind of open a window on ways of thinking about things, and ways of doing things that are not normally exposed. You know, it's tremendously interesting to me and, like you, I would envision that becoming more common behavior. That there's a lot of potential for collaboration and cooperation that's historically happened, to some extent, on the Internet. That's what interest groups have always done. But they've always done it in the medium of text, and particularly when we're talking about software behaviors and the analysis of data, and all the things that go on in those realms, text alone isn't the ideal medium to convey knowledge and technique. So, I really look toward a time in the not too distant future when it is relatively routine for people to do more of the kind of capturing and sharing that you've been doing. I'd be interested to hear what your experience has been with that, because it certainly is a whole new set of challenges in terms of mastering the techniques that make it possible to do that communication. What's been your experience with that?
Chris: When I was reading the blog when you were coining this term, right away I was like, "Yeah! It's clear that this is going to be something that's cool and easy to do." I find it's easier to do a screencast even though it can take a little bit longer to set up and produce. You know what's going to be out there is something they can follow along directly. Whereas it may take you two pages of words to describe the technique that you're trying to describe. Words are necessarily imprecise and so someone runs off the rails at a certain point. They don't necessarily know where they went wrong.
Jon: Yeah, and it's not an either/or thing. Nothing ever goes away, you just add on more stuff. It's really exhilarating but it's also kind of threatening to people. There's just a whack of stuff that, on one hand, it's possible to do and really empowering to get your hands around but, on the other hand, is fairly challenging to learn.
Chris: But these things will get easier over time. Writing a blog used to be a challenge, and now people just go to blogspot. I think that's true as long as you don't ask people to edit very much, and I pretty much don't edit what I produce for screencasts.
Jon: If you do short ones, like you've been doing.
Chris: Keep it short, and talk really fast.
Jon: It is a remarkable thing.
Chris: People don't know what they don't know. If you read 43 Folders, Merlin over there has been pushing a wonderful concept that all this email sucks, and people didn't know it sucks, until he started telling them. They might have had some sense it sucked but weren't aware of appropriate techniques for dealing with it. I think a little bit of what we hope for our site is to be a little bit like 43 Folders for the office set. There's a lot of stuff people do in Excel. I hate to keep banging on that, because that's not the only thing we do. There are other skills, too. But people spend all afternoon trying to get some data formatted in a certain way in Excel and if they had the right tools it would take them a minute.
Jon: Yeah, and not having occurred to them that such solutions exist, they just keep on doing the same old, repetitive, time-wasting thing. If you could break that cycle...
Chris: In offices throughout America this is happening today, at this very moment.
Jon: As we speak.
Chris: Someone is wasting their time.
Jon: I've thought sometimes about if you could recover the economic value of all the time that people spend when replying to emails, reformatting the quoted text, it would probably be the GDP of several small nations. So, to circle back to your credit card days, you had mentioned feeling kind of scarred by the OLAP wars. Sounded like an interesting thing to hear.
Chris: I know it sounds dramatic.
Jon: I'm sure it is, in some ways.
Chris: I'm sure I can make it dramatic, if I push hard enough. I guess what Zach and I feel is that there's kind of a current "best practice" for dealing with data in an organization and that is that you have all these production systems. And on the backend you build an enterprise data warehouse, which is this enormous data store that all this information flows into. Then on top of that you land some sort of OLAP supporting solution like Business Objects or Cognos. And we've seen this in action and have felt really unsatisfied by it. I was actually involved with one client, not my proudest moment; we were designing an in-house data-mart, and spent most of the year actually building this thing. Spec-ing out all the resources to do this. And this was a large, large client, so there were a lot of internal layers to deal with, and nothing happened all that fast. And so we were supposed to be delivering this. And right when it was delivered, the organization went through a re-org, and this whole data-mart -- maybe a million dollars was spent and that whole effort just fell on the ground, and there was no one there to catch it, because no one needed it anymore. And this actually happened twice at this organization, two times in a row. For one thing, when you have this concept, this sort of monolithic approach, it's a question of whether your technical cycle has to operate more quickly than your business cycle. So if you re-org every six months, and go in a new direction, which might be a perfectly valid thing to do, then you're going to find all these technical projects are going to wind up getting orphaned. That's a little bit of the motivation behind this agile analytics thing. Why get all bent out of shape trying to create a single version of the truth when you're kind of zigzagging anyway, and you don't know quite where you're going to end up?
Jon: I think that you're probably right, that the single version of truth is the illusory Holy Grail that drives a lot of those efforts. Is that a fair statement?
Chris: Yes. I think that an awful lot of emphasis is placed on that. I do think it's largely a false premise, because even if you have the same number, there's many more questions about how that number came to be. And everyone has to put a really rich context around, "What does this number actually mean? What does it mean to me and my job function?"
Jon: Yes, yes. This is a hard conversation to have in some circles because....
Chris: Absolutely.
Jon: Because someone always snaps back and says, "Wow, your bank account either is or isn't what it should be, right? And you think there's any room for error there?" No, I don't, but, you know... [laughing]
Chris: It's different when you're working for an organization of 20,000 people with a lot of different responsibilities. The operations people have a specific set of goals that are important to them, and the marketing people have another set of goals. So some of those people really flip out when you're reporting numbers, and one set of numbers doesn't tie to another set of numbers. But there's an awful lot of reasons for those individual perspectives.
Jon: So what's been your best way of getting at the context?
Chris: What we like to do is try and come at the problem from a bottom-up perspective. We don't solve all these problems, but try and start from the customer first and work your way toward a common understanding. First of all, this whole concept of coming back to the single version of the truth, creating a single understanding of the customer, for instance, in a company, it's a social undertaking, not a technical undertaking.
Jon: And what would you mean by that?
Chris: You don't have a single understanding of who your customers are just because you have a single database that stores your customers. What it means is that at this company, we understand how our best customers behave, and how our best customers use our service. And we understand how our worst customers behave and how they use our service. We have a clear feeling that allows me to say, when I see someone doing something, this is a good customer, that's a bad customer.
Jon: So this is still, in many cases, quantitative, but there are also, I'm guessing, lots of qualitative or more narrative ways of getting at these behaviors?
Chris: Let me cycle back to this approach. What we do is we grab all the data from all the different data silos we can from the organization and build a comprehensive view of what the customer is doing, and try to build that visually into a picture, such that their 10,000 customers will be creating 10,000 pictures that show those customers' behaviors over time. And then, where you go from there is you can flip through those and visually try and recognize what's the difference between someone who's good for my company and someone who's not.
Jon: And so this is a Tuftean small multiple sort of a deal, where....
Chris: We build a web interface in front of that, where you can look at a whole lot of small multiples, these little pictures, and then zoom in and look at someone in more detail.
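To make the small-multiples idea concrete, here is a minimal text-only sketch in Python. It is not the web interface described in the interview: the customer names, the weekly counts, and the `sparkline` helper are all made up for illustration, and it simply draws one tiny picture per customer so the rows can be compared by eye.

```python
# Eight block characters, from lowest to highest.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Draw a non-empty series as one row of blocks, scaled to its own maximum."""
    top = max(values) or 1  # avoid dividing by zero for an all-zero series
    return "".join(BARS[int(v / top * (len(BARS) - 1))] for v in values)

# Hypothetical data: lessons completed per week for two customers.
weekly_lessons = {
    "customer A": [2, 2, 2, 2, 2, 2],
    "customer B": [1, 0, 1, 0, 0, 14],
}

# One small picture per customer, stacked so the shapes can be compared.
for name, series in weekly_lessons.items():
    print(f"{name}  {sparkline(series)}")
```

Each row is scaled to its own maximum, which emphasizes the shape of a customer's behavior over time rather than its absolute level; a shared scale would be the better choice when levels need to be compared across customers.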
Jon: This is obviously hard to do just in the audio domain, but is it worth trying to characterize what those things are displaying?
Chris: There's kind of an unscrambling-the-omelet approach. Traditional reporting is based on aggregates, where you're saying, here's how many customers are doing this and here's how many who are doing that. But that's hiding behavior that's going on underneath those aggregates.
Jon: Yeah.
Chris: So one example we have is a client who has an online schooling business.
Jon: Okay.
Chris: Students could subscribe to take a course, and there might be 90 lessons in the course.
Jon: Okay.
Chris: And they would take the course over six months. They might take two lessons a week. One thing we found when you look at that visually is that you will see people that are falling behind in the course, that are supposed to be taking two or three lessons a week, and they're only taking one. Then you'll see as they get toward the end of the course, they get stressed, and you'll see them complete 30 lessons in a single day.
Jon: So you're saying if you just aggregated all of that, you wouldn't see those individual patterns?
Chris: No, they're compressed out. For those thirty completed lessons, you don't know that that's not actually someone doing the work, that's someone gaming the system or, you know, gaming themselves.
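The point about aggregates can be shown with a small Python sketch. The completion log below is invented for illustration (the student names, dates, and helper functions are assumptions, not the client's data): two students finish the same number of lessons, so a per-student total cannot tell them apart, while a per-day view shows that one of them did everything in a single day.

```python
from collections import Counter
from datetime import date

# Hypothetical lesson-completion log: one (student, completion_date) entry per lesson.
completions = (
    [("steady", date(2006, 1, d)) for d in range(1, 31)]  # one lesson a day for 30 days
    + [("crammer", date(2006, 1, 30))] * 30               # thirty lessons in one day
)

def lessons_per_student(log):
    """Aggregate view: total lessons completed by each student."""
    return Counter(student for student, _day in log)

def busiest_day(log):
    """Per-student view: the most lessons each student finished on any single day."""
    per_day = Counter(log)  # counts each (student, day) pair
    peak = {}
    for (student, _day), n in per_day.items():
        peak[student] = max(peak.get(student, 0), n)
    return peak

totals = lessons_per_student(completions)
peaks = busiest_day(completions)
print(totals)  # both students show the same total
print(peaks)   # only the per-day view separates them
```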
Jon: Right. So is one of the ways that this can be described as an agile methodology that it's meaningful to do these individual cuts through the data set for relatively small samples? You don't necessarily have to process everything? You can take slices and get....
Chris: You could down-sample this. I think when we talk about the agile analytics, one of the main things we mean by that is that any time you produce reports or you produce some sort of number, most of the time it's just going to generate more questions.
Jon: Right, so you want to get that iterative kind of thing going.
Chris: Can you build an analytics approach that supports iteration?
Jon: Yeah.
Chris: And a lot of the so-called traditional approaches don't really support iteration because the data is laid out in this one particular way. So when a number is low, are you going to actually be able to investigate that and figure out what's going on? To do that might take some kind of simulation or you're going to have to dig deeper or get down to the actual customer level and find out that a process is broken.
Jon: It sounds like, at the core of what you're talking about, then, are some relatively flexible data structures. What are those? What is the technology here that allows you to be more flexible?
Chris: So it comes down to how quickly can you take data out of all these systems and manipulate it and re-structure it? So technically on our side, we're using SAS in a lot of this stuff. But I think there are opportunities here for us in the future of analytics, to come up with ways of gluing some of this stuff together. So I don't really think that the solution is a more flexible set of programming tools, I think it's finding more ways to connect data in lightweight ways. We have another client that builds a kind of metadata warehouse on top of a bunch of existing systems. They have this sort of RDF thing: they extract data from a bunch of different silos, then they've built this sort of map that says what all the data they've extracted means.
Jon: Really? So they're doing actual RDF processing over that?
Chris: Yes. And it's really pretty interesting, because, first of all, it preserves all of the existing silos. It's a very lightweight approach. They're working in a very limited problem domain. So it's easy for them, once they've set up all the RDF, they don't have to adapt it all that much as they come into a completely new client.
Jon: So they're running RDF queries and transformations?
Chris: Yes.
Jon: Using what?
Chris: I don't know.
Jon: OK.
Chris: I actually planned on interviewing one of these guys for our blog because it's such a wonderful example of this kind of agile approach. They can roll something out that allows a consolidated view across a bunch of different silos in like a week's time or a couple of weeks' time instead of 12 months.
Jon: Yeah, that does sound pretty interesting. I'm sure I'll be hearing from all of the RDF folks, as I periodically do. They keep waiting for me to see the light and get on board, which I probably will someday.
Chris: Another thing that we recognize is that computational power is really remarkable these days. And so it's possible for me, with a desktop PC, to replicate a lot of what I was able to do at Citibank working on some kind of enormous, parallel AS/400.
Jon: Oh, yeah, yeah.
Chris: So I can process hundreds of gigs of data on the desktop.
Jon: Yeah, the amount that you can stick into memory is getting to be really interesting, too. I've recently been feeling like we're on the verge of a huge in-memory database renaissance.
Chris: For a lot of problem domains, if you're dealing with a business where you don't really have that much data, you have tens of thousands of customers and objects running around, but not tens of millions, then all these new potentialities are going to be opening up.
Jon: I think that's absolutely right. There are lots of ways in which we see this happening. I did this metadata search thing for the InfoWorld site, and we just turned it on the other day. The next day this guy wrote to me. He's got this Ajax-style mash-up that takes all the metadata into workstation memory and does lightning fast interactive search with type-ahead completion. The fact of the matter is everybody has some kind of a working set, and with more and more RAM available, more and more of us can put more and more of those working sets in there and just party on the data, right?
Chris: Right. Can I just stuff all that data into working memory, and if I can, it doesn't matter how inefficient I am. I can do any kind of crazy thing I want. So fewer and fewer people are going to be the Citibanks of the world with a hundred million customers.
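A rough Python sketch of the in-memory working-set idea follows. It is not the mash-up mentioned above, which ran in the browser; the titles, `build_index`, and `complete` are hypothetical names used only for illustration. The whole list is held in memory, sorted once, and each keystroke is answered with a binary search for the typed prefix.

```python
from bisect import bisect_left

def build_index(titles):
    """Lowercase and sort the titles once so prefix lookups can use binary search."""
    return sorted(t.lower() for t in titles)

def complete(index, prefix, limit=5):
    """Return up to `limit` entries in the sorted index that start with `prefix`."""
    prefix = prefix.lower()
    start = bisect_left(index, prefix)  # first entry >= prefix
    matches = []
    for entry in index[start:start + limit]:
        if not entry.startswith(prefix):
            break  # sorted order means no later entry can match
        matches.append(entry)
    return matches

# Hypothetical working set, small enough to live entirely in memory.
index = build_index([
    "Agile analytics",
    "Excel chart cleaner",
    "Excel animation",
    "RDF in the enterprise",
])
print(complete(index, "exc"))
```

With tens of thousands of entries rather than tens of millions, even a plain linear scan would likely feel instant, which is the sense in which inefficiency stops mattering once the data fits in memory.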
Jon: So what are you feeling about the prospects for making better use of animation to get at the time dimension? I mean, it seems so fundamental to me that our brains are hard-wired to detect motion. That's just what our pattern recognizers do, yet we make hardly any use of that in any of our representations or analyses of data. Where are the bottlenecks there? I think it must be a lot to do with tools and technologies, but also there are probably not enough ideas about what's possible to do or what would be a good way to do it.
Chris: I think the consumer shall lead us. You've probably seen the new Google Finance application where they kind of did Oliver Steele's...
Jon: Yes, I know, it's very much like that.
Chris: They did it big time.
Jon: But that's not really an animation. That's more just a range selection.
Chris: You can kind of slide it back and forth, I guess.
Jon: But that's not really, "Show me this stuff changing over time, or show me these things progressing through time in comparison to one another."
Chris: Yeah. Well, it's a missed opportunity for Excel because I'm pretty sure it's not in Excel 2007. But the ability to lay out a graph and then choose one of the dimensions that are behind that graph and say animate that dimension would be really nice.
Jon: Well, you've shown how to do it, but it's the hard way, right?
Chris: It's not that hard. Coming back around to Excel, again, Excel is fairly hard to extend. You can write these add-ins but you've got to deal with corporate security policies and all this kind of trouble. If you'll allow me to be Excel czar again for a sec.
Jon: Yeah, you're king for a day.
Chris: I think building out a lighter-weight, more minimal application with the ability to do extensions and kind of Greasemonkey style stuff that Firefox has shown us can be done. I think that's the way you make stuff happen. And if you have that, then you could, someone would figure out how to animate Excel charts. I mean, you know it's possible. And then, five million people would be able to install it and get that running.
Jon: Maybe it will be you.
Chris: Maybe it will be me. I don't know if you saw, I have a post very recently on an Excel chart cleaner.
Jon: I did see that.
Chris: A very important tool, I think, if you're into visualization, if you're into making stuff look good. A pretty easy-to-use tool that in more or less a single click lets you apply Tufte-style infographic quality to Excel charts.
Jon: Do you have before and after examples up there?
Chris: Screencasts coming. And that I think will really help to explain the concept, but it just makes life easier. If you can make something just really easier for people to use, then they'll use it.
Jon: Yeah. What we all imagine is that Apple Knowledge Navigator scenario, right? Where you have this intelligent assistant, and you say, "Get me the Brazilian rainforest data," and there it is. "Now, let's see that over this period of time." OK. We've absorbed from science fiction this notion of what things could be in a world where the boundaries that separate the data silos are completely broken down, and where these transformations are things that just naturally occur instead of people like you and me having to roll up our sleeves and spend, maybe not weeks but at least hours. So that the whole thing is just kind of frictionless, and then we really do get to do that highly iterative and interactive exploration of scenarios. And, gosh, we're still awfully far away from that, aren't we?
Chris: Yes, but Excel does provide a sort of data canvas that a non-technical person can use. It can have all these pages and it's essentially limitless. This really connects with people and people do understand that. I'll close out with one final comment on that. What I think is the problem is people don't really know what they need. So how they're using it is not a great guide for how they should be using it, or could be using it in the future. And the exciting thing that we've seen with Firefox is it's got an easy extension mechanism, you have all kinds of extensions being created, people try them out, it's lightweight to try them out, and the things that are good persist. You can just harvest the best of the extensions to build the next version of your application.
Jon: So why is it so different, then? Why is Firefox with Greasemonkey so different than Excel with macros? Which, by the way, are recordable. That's not possible in Firefox; someone still has to hack up the Greasemonkey code. So what is the difference there?
Chris: There's a big difference in mindsets. Excel is not particularly friendly toward add-ins, I don't feel. They're kind of a pain to create. I just had to create one, and Excel likes to warn you. Like, if you open a spreadsheet that has macros enabled in it, Excel does almost everything it can to get you to not open that file.
Jon: They've learned that lesson the hard way. [laughter]
Chris: It's kind of user-hostile and ... OK, there's some good reasons, but it's kind of... all right, well... [laughter]
Jon: Well, I guess we should wrap up.
Chris: OK.
Jon: OK. Thanks, Chris.