JU: Hi, this is Jon Udell and today's podcast is a conversation with Nathan McFarland and Benjamin Hill. Nathan's company, CastingWords, is a podcast transcription service that farms out the work to people all around the world using Amazon's Mechanical Turk system. Benjamin's project, Mycroft, packages up small tasks in ways that people can interact with on web pages. One of the taglines for Web 2.0 is "harnessing collective intelligence". These are clearly two leading examples of that idea, so I got Nathan and Benjamin together to compare notes.
JU: So the idea of this was that you guys are working in a similar area here. It's actually interesting timing, because I just finished up a screencast about the Sun Grid compute utility, the dollar per CPU hour deal, and on the other hand you guys are working the sort of human end of the equation, in different ways. So Nathan, I saw your talk at ETech, where you presented along with Amazon's Mechanical Turk talk, and then we had a chance to talk a little bit after that about the ins and outs of your project. So CastingWords is a podcast transcription service that farms out the work to people through MTurk, and then Benjamin, you got in touch with me about Mycroft, which is a word that has a number of meanings, including Sherlock Holmes' brother and I guess Firefox search plug-ins, but your Mycroft is another sort of distributed human intelligence kind of a project, but it's different from MTurk in ways that we want to discuss. Since I know a little bit more about what Nathan's project is about, maybe you can start and fill us in on the background for yours and start to sketch in how these are similar and also different.
BH: Absolutely, well I like that you mentioned the Sun Grid computing effort. Part of the original inspiration for Mycroft was that we were looking at what Sun's providing and how much they're charging for the storage space, and we thought "well maybe some peer to peer version of that could do better."
JU: So, by the way, they're actually not charging for storage, they're charging for compute.
BH: I thought Sun also had a dollar a gig a month going on.
JU: You know that's Amazon, right? That's Amazon's S3.
BH: Amazon has a new one, but Sun did it a while ago for backup, now Amazon is doing it too and everybody is getting in on the act.
JU: It's interesting because I just happened to be talking to those guys, and so your storage is transient and is used by the compute farm but you don't pay for it and you don't pay for the bandwidth getting it up and down, so you really are just paying for, in Sun's case, compute cycles, whereas Amazon's S3 you're just paying for storage. So everyone, all different angles are being tried here, it's quite interesting.
BH: Absolutely, and at HP they're still trying to get the word publicized. The thinking is that with computons, you've got the network, you've got the storage, and you've got the processing power, and then we're trying to think of what wouldn't just get its butt kicked as Moore's law continues to just double and double all of those. We came up with, well really human ability to do work is not going to double every time a chip speed doubles, there's still something intrinsic to what people can do that computers just can't. Barring any major amazing breakthrough in AI, that could be the limiting factor coming up, so we chose to start focusing on "what can people do that computers can't". Which I think is very much in common between our two projects. Then we decided to go small scale. What can you ask people to do without actually leaving whatever tasks they're already engaged in? That's when we decided to bundle it up looking like a banner ad, but instead of asking and trying to prey on users for moments of their attention, we try to ask them nicely if they'd care to contribute a little bit of brain power to solving a larger task.
JU: I have not seen one of these in action yet, but I've read through your materials and I gather that the idea is to present an extremely concise little puzzle that a human could solve easily that a machine couldn't. Examples would be "translate this phrase from one language to another" or "characterize this image" or there's a series of examples in the CommerceNet paper that I will also link to from my blog when I post. One of the things that I was thinking about was kind of the difference between the sorts of tasks that you're trying to package up in this highly interactive and, what's the word for it, you know these are like little micro tasks, right? Like you said, something somebody can devote just a few seconds of effort to sort of here and there as they have spare cycles. Which differs a lot from the kind of thing that, well I don't know in general about MTurk, but the kind of thing that Nathan's project is, and we talked about this a little bit at ETech, that you actually started out with a model where you thought people would just chip in sort of little bits and pieces, but in the end it's worked out better for you to give people more whole tasks, right, and give them more of a sense of completion.
NM: And these tasks are pretty large, I mean a podcast takes a little while to transcribe, but consistency and also worker satisfaction you'd have to have a really small chunk to make people very happy, I mean a Mycroft size chunk, because once you're transcribing, a minute is not much fun, when you transcribe a whole podcast though you learn something from it, you got a lot better worker satisfaction from that.
JU: Right, is there even a sense of, well I guess there's not, like if I go to LibriVox I know who read chapter three of The Call of the Wild.
NM: Yeah, we don't actually push that through to our customers. We know and track our workers and who did what, but the customers don't know that.
JU: Right.
NM: We sort of package our work pool and process management and all that stuff together, and so they just give us a podcast and we give them a transcript and it's supposed to be dead simple and it pretty much is.
JU: So there's not a reputation effect here, but yet on one hand you get paid and on the other hand you can get paid for doing something that seems like a complete task, that you can feel like you did something worthwhile, you did a good job. So, Benjamin, from your perspective, it's a different incentive structure for the kind of thing you're doing.
BH: Absolutely, and we've done a lot of investigating into different incentive models here. We've started to think of the whole knowledge workspace as divided on one axis along people's opinions all the way to things that are absolute. Like people's opinions would be "tag the image", absolute would be "OCR check this one line of text". There's one correct answer versus "it really matters what people think" and then along another axis we've got everything from easy where it just takes a few seconds of time, all the way up to very difficult where it takes a longer amount of time. You can imagine both Mycroft and CastingWords occupying different bubbled areas of that knowledge workspace, so actually we're going after different targets and I think both have a lot of value at that point.
JU: I think that's right, I'll just read through a couple of the kind of examples you give in your CommerceNet paper. Because they do give the flavor of it, "is this image inappropriate for children over seventeen", "annotate this image with descriptive text", "what is the text in this captcha image", "how much do you like the clip from this new pop song", "which hairstyle makes you trust this politician more". So actually a whole lot of market research survey kind of stuff would seem to fit really nicely into this model.
BH: Yep, actually that's absolutely correct. Market research is always looking to find how you can get better data based on location, based on a demographic, and that's what we've had the most success with so far. As we bring these puzzles out to the people, placing them in the site, we know what site the people are on and possibly what they're doing at the time. We might have a little bit of demographic information about them, and you do things like an IP address lookup, you might know where in the country they are, et cetera. So that sort of on demand market research was a big chunk of the pie, and then off to the other side people's feelings about ok they're on a blog, they don't really want to pull out their wallet to open the tip jar, micro payments aren't quite there yet, they're not quite easy enough, hopefully they will be soon, but if they don't want to quite pull out their wallet would they mind contributing a few seconds of their time towards a task instead, something that is potentially a higher pay out than clicks on a banner.
JU: So this is actually a quid pro quo for the use of the site?
BH: Yeah, we definitely were looking at that as an incentive model. I mean, salon.com you have to pull out your wallet to get to the premium content. Is it possible that if you do a few minutes of work you could get to the premium content?
JU: Like if I wash my dishes?
BH: Less attention grabbing alternatives to banner ads.
JU: Yeah, yeah. So, Nathan, from your perspective one of the things that I found fascinating about your project was the amount of infrastructure that you were able to use from Amazon. Not just the MTurk piece of it, but also the whole commerce engine and I gather you are even using S3 for transient storage. Is that right?
NM: That's true. We don't use Amazon e-commerce, but we use MTurk and S3. S3 has actually been really good. We keep all the mp3s there and we keep all the transcripts backed up there. It's been kind of nice, we've had one hardware failure and one machine move and since all of the external data is in the cloud and redundant and safe, these things have been basically transparent to our workers and customers.
JU: Yeah that's really cool. MTurk is giving you a framework for, I mean it's basically handling payment for you, right?
NM: Yeah, but there's an actual toolkit called e-commerce from Amazon and we don't actually use that toolkit.
JU: Oh, okay.
NM: It's a technical term and sort of, whatever, it's a brand. So we don't actually use that, but yes. MTurk handles all of the internal payments to the workers and all that stuff and all that's tracked and handled through them. It works pretty well; actually the whole thing has worked out pretty well.
JU: Yeah, it sounds like it. What I was particularly fascinated to hear in talking to you is that it sounds like you're in that enviable position where you're almost 100% devoted to the business logic.
NM: Yeah, it's nice, and that's exactly it.
JU: Yeah, yeah. It's having to do with tuning the sort of workflow and the algorithms around acceptance and verification of work and things like that, right?
NM: Yeah, we do all of that stuff and other than that we get customers and that's what we concentrate on and the actual working is outsourced and a lot of other stuff is outsourced. We've outsourced Google ad text and stuff like that to MTurk. We just do the stuff we're good at, basically.
JU: Yeah, that's a good place to be.
NM: It's a fun place to be too.
JU: Right. So from your perspective, Benjamin, the infrastructure is not a given. It's actually something that you're going to need to create because this is, like you've said, it's kind of a new banner ad like model, but something that doesn't really exist yet, right?
BH: Yeah, we hope to leverage some of the existing ad content delivery networks if they can do some of the interaction that we need as well, if they can support the sort of in page or personalized incentives that we're going after, but yeah there is absolutely a lot more to build. We think hopefully it will give us more of a sustainable competitive advantage in the long run if we're first to market with that, but we're also looking to partner to get a jump start on some of that. I am jealous of CastingWords' amazing amount of infrastructure already in place. It sounds like an absolutely awesome setup.
NM: Yeah, it's so much fun and so lucky to be in a place where this is possible, really.
BH: With the amount of outsourcing you guys have been able to achieve, how does the competitive landscape look, or are you really just continuing to capitalize on first to market there and cleaning up?
NM: It's a weird market, right? There are a lot of old school transcription firms out there, and there's not already a huge amount of podcasts getting transcribed actually. I mean, there's a market and we're hitting it pretty well, but it's like we do some podcasts for some doctors, medical podcasts, because these guys literally have an agreement with other transcription firms, but it's tough for them because that goes through the hospital and it's all that stuff and they don't want to do that. It's much easier for them as individuals to use us, so we're going after that kind of market. Our market is, it's interesting to see how deeply you are segmented, if it's transcriptions it's a huge market and we're a teeny little bit. For podcast transcriptions we're pretty big, but it's a little market.
BH: Okay.
JU: What are you finding out about the kinds of people who are becoming the regulars at doing this?
NM: Most of the regulars, actually it's interesting, most of the true regulars are making money from their podcasts somehow, already before they come to us. They think of this as a way of getting even more visitors or whatever, Google visibility basically. There's a small subset which are doing it straight for accessibility. We've got some; they're basically in public sector or talking about public sector problems, health, something, who knows. Disability is one of them, and they're doing it just for the accessibility, for people who can't hear, which is an interesting idea. We've got a couple customers like that.
JU: So those are your customers, but I'm actually in a way more curious about the workers. What you're finding about who the regulars are there, what are these people like, where are they coming from?
NM: On the workers side?
JU: Right, yeah, so who are these folks? What are you finding out about them and what are the prospects from just a general employment perspective. I mean it sounds ideal, somebody could devote a fairly well defined amount of time and earn a fairly well defined amount of money, it sounds fascinating.
NM: Yeah, I don't, well we've been structuring it more as a part-time, and I think almost all of our workers are people picking up a job here and there. A lot of them have indicated that they are working at the same time as they are working on our stuff at something else, some other job that doesn't require anything more than a physical presence for whatever reason. A lot of the other ones are stay at home mothers or something like that. This is just a subsidiary income for most of these workers, and there are a lot of them which is part of the reason I think. Our workflow isn't, we don't have so much that we can maintain a steady flow for the huge number of workers we have.
JU: Right. So far it's, I'm guessing, almost entirely English speaking transcription that's going on, but nonetheless I'm sure you have an international population of workers to some extent, right?
NM: Yeah, Amazon makes it a little tricky to do that just because of their payment mechanisms, but we have discovered that, in fact, we do have an international worker pool even though it's heavily biased towards the US because of the payment stuff. That's interesting; we have a pretty international English-speaking podcast transcription service. We get, you know, a few, well, pretty constant but not a large amount of them from Australia or the UK or whatever and our international workers certainly handle those things better than domestic would. Interesting that you can actually tell that if a British worker gets a British podcast, the quality comes out better. Not that we limit them per se, so most of the time it doesn't; it just needs more editing. It's all these little things that we discover as we go along that makes it so much fun, actually.
JU: On the Mycroft side of things, where are you guys in terms of development and deployment?
BH: Well, we are wrapping up the initial tests of Mycroft here at the information school at Berkeley, and continuing it onwards, we're talking with some potential partners. We're still looking for the right fit along those lines, and really I believe the first thing that we're going to get done is continue to gather up places that need a problem solved for them. Like we've been looking at Project Gutenberg and the initial task there of fixing the OCR, things along that are more communities that are already in existence and so once we've found our golden first one to try, we're going to concentrate on that, get a provable positive return on investment number, and then you know you've got your physical web setup at that point.
JU: So let's talk through the Gutenberg example for just a minute. So I guess the idea is, like you said, you have a sloppy OCR result that you want to clean up. In this case it's not, with CastingWords it's very much a workflow where the customer is evaluating the result, there might be a return loop where the thing is sent back for editing and there might be an editor involved, you know, but here I'm guessing it is just more of a statistical process, right? You're going to get the same fragment checked by a number of different people that really have no knowledge of one another, and then you'll just be looking for statistical convergence, is that basically the idea?
BH: That's what we started with, when we thought we'd be getting a lot choppier results, and that's definitely still part of it, but we've had absolutely amazing luck with, well first of all we run it through a dictionary check, so we just kick out the ones that fail your standard check, and that's the interesting merging of people and what computers are good at. So we're only giving the people things that the computer already thinks is broken.
JU: Okay.
BH: And then the people type in the line, and we've actually found a number of people we've tested it on really enjoy doing exactly that. There's the group of people that enjoy tagging the images, then there's the other group that really kind of get a kick knowing they've contributed something concrete, and we thought we would have to go through pretty heavy statistical analysis to find out which ones are correct, but even just doing a two-way or three-way match has been way better than expected.
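A minimal sketch of the two steps BH describes here: a dictionary check that hands people only the lines the computer already suspects, followed by a simple two-way or three-way match on what they type. The function names, the punctuation handling, and the `required_matches` parameter are assumptions made for illustration, not Mycroft's actual code.

```python
from collections import Counter

def lines_needing_review(ocr_lines, dictionary):
    """Return the OCR lines that contain at least one word not in the dictionary."""
    flagged = []
    for line in ocr_lines:
        # Strip surrounding punctuation and lowercase before the lookup (an assumed normalization).
        words = [w.strip(".,;:!?\"'()").lower() for w in line.split()]
        if any(w and w not in dictionary for w in words):
            flagged.append(line)
    return flagged

def consensus(submissions, required_matches=2):
    """Return the transcription that at least `required_matches` people typed, else None."""
    # Collapse runs of whitespace so trivially different entries still match.
    normalized = [" ".join(s.split()) for s in submissions]
    if not normalized:
        return None
    text, count = Counter(normalized).most_common(1)[0]
    return text if count >= required_matches else None
```

Setting `required_matches=3` would model the three-way match; anything that fails to reach agreement can simply go back into the pool for another person.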
JU: So how can you give me instant feedback that I've done something correctly?
BH: We can't do it the first time. We can tell if you're just messing around by seeing if it's anywhere close to what the computer thought it was, but we can't always give you instant feedback if you did it correctly, and we can give the next person, the confirmation person, instant feedback, so there's a lot of fascinating potential for overlapping those two methods.
JU: Okay, and obviously a huge potential for spam as well, so that's...
BH: Oh yeah, that's where we start getting into seeing if it's at all close to what the computer thought it was. How do you tell if it's actually a person attempting to answer this, or if there's a computer pretending to be a person answering the test that supposedly only people can do? It's just, you know, how much you need to loop it back on itself to filter the results. Then you start, say what CastingWords is talking about, the resource pool is amazingly huge for this. You have the options to set up those sorts of workflows that will self-filter over time.
JU: Right, I mean presumably you can run multiple algorithms in parallel and let them compete even, right? Just subset the thing out to...
BH: Absolutely, let the multiple algorithms fight and see who gets the best results. Meta moderation has proven so successful on a lot of discussion sites. Does this other person, are they contributing valid results or are they spamming the system? You just keep going until you get the results you need out of it. So we're still looking for, I mean we have a few ideas around where the need is as well as where somebody with the funding for that need is and bringing those two together, so we're investigating those.
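The "is it at all close to what the computer thought it was" test could be sketched as a plain string-similarity check against the OCR guess. This version uses Python's standard `difflib.SequenceMatcher`; the function name and the 0.6 threshold are illustrative assumptions, and the transcript does not say how Mycroft actually measures closeness.

```python
from difflib import SequenceMatcher

def plausible_answer(ocr_guess, submission, min_similarity=0.6):
    """Return True if the typed line is reasonably close to the OCR's own guess."""
    # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
    ratio = SequenceMatcher(None, ocr_guess.lower(), submission.lower()).ratio()
    return ratio >= min_similarity
```

A submission that fails this check is not necessarily spam, since the OCR guess may itself be badly wrong, so a check like this is better suited to flagging answers for extra confirmation than to rejecting them outright.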
JU: So what do you think is the sort of upper limit here if there is one, because you alluded to the idea that you wanted to just take just a few seconds of people's time, have them make a small contribution, have that be potentially a form of payment for access that otherwise would be requiring a monetary form of payment, but at the upper limit this is something that can on one hand engage people in game behavior, right? So that's always a huge incentive, I mean the ESP game I guess was a good example of that kind of thing, there were I think some people that just got addicted to it and wanted to do it a lot. So there's that aspect of it, and then there's also just the question of where on the scale of intellectual tasks, how far up the ladder could a system like this climb, right? Is it something that could and should try to move toward more complete tasks and more challenging tasks, or do you think it's best in the niche that you've carved out so far?
BH: Well I think the niche that we're going for initially is the right place to start, but absolutely. As you say, there are human skills that are more rarified, the ability to translate between English and some rare language that only a few people have. A few seconds of those people's time, applied to the correct problem, is very valuable. So there is no reason you shouldn't be going to your favorite blog and getting the choice, once we know a little more about you, would you like to answer five questions to tip this blogger? Or would you like to answer one question that we suspect you might be able to answer and value you more for that? Wherever we can continue to really let people capitalize on what they're good at, to keep the amount of time down to a reasonable level, that's what we're going after. As you get into more complex problems, there's that dividing line of when is it not a temporary distraction? When are you actually leaving whatever you were doing and working on something? That starts to blur the line a little bit with what Mechanical Turk is going after. So we'll see how that area works out.
JU: So that's a line that you'd rather not cross, at least at first, it sounds like.
BH: At least at first, I think we should go after the area that, as far as we can tell, nobody else is going to go after. We've got a lot to overcome in the way people perceive banner ads as these sneaky horrible things that if you actually click on it, it's going to download viruses instantly. We have to fight that perception and get people to trust us, and say yes we're actually working for a good cause or we are working for a cause funding whatever you are reading right now. Trust us, play with us for just a few seconds and we won't pop up any windows, we won't annoy you, we'll just value you for what you can do.
JU: Right. Now currently if people want to see this, I believe they have to sign up to try it, right? There's not an anonymous participation mode yet?
BH: Oh, absolutely anonymous, the majority of our users are anonymous. Actually, just to plug the system, if you go to, right now to, MycroftNetwork.com we've got an example banner ad on the front page that everybody thinks is just an example until they start to play with it and realize that it's not.
JU: Okay.
BH: We also have a number of blogger and site-owning friends of ours who posted these banners for the purposes of the experiments we ran during school. If you go to any of those blogs, you can see it in action. If you have a banner blocker of any sort of course, you will see an empty space where we would normally be, so that's something to watch out for.
JU: Right, and what's the implementation there in terms of someone that wants to provide one of these banner type things.
BH: Right now we've got a very academic "would you like to help us test our theories?" incentive model for putting it on your site, and a little bit of "would you like to contribute to Project Gutenberg", but mostly people that are hosting it like what we're exploring and like to help us test out our different methods. For us to grow, that is going to have to change. We are going to have to actually be able to pay people competitively with what they'd be paid for normal banner ads.
JU: So from a webmaster perspective, what am I actually sourcing into my page?
BH: Oh, it's as easy as placing Google AdWords into your page. They're a small fixed snippet of JavaScript that loads an iframe back to our site. We hope to migrate to Flash in the near future for a little more interactivity, but right now it's very static. In fact we've even had some people sign up and host banners that we've never met, because the whole thing's a self-service model at this point, they've registered, put it on their blog, and are happily connecting us with their users. We probably owe them at least a few beers.
JU: So the data is not necessarily centrally collected?
BH: The data is centrally collected, the iframe that looks like a banner on their site is coming from our Mycroft server and when you click submit on that you don't leave the site, but the submit data is sent to our site.
JU: Okay, okay. It sounds like you guys are kind of exploring different ends of the same continuum. What else do you see, either of you, going on in this space that is interesting, if anything?
BH: I'm really curious to see if Mechanical Turk is going to implement any sort of market-based model for pricing these tasks.
NM: You know, we're really curious about that too.
BH: Because the same way rent-a-coder will bid down jobs, if Mechanical Turk had a bidding period for jobs as well, I think it would very much change the dynamic.
NM: It would be interesting to see from multiple angles. We have a hard time getting a worker pool efficiently and still getting the throughput we want, right now, because of pricing mechanisms we think. We'd be willing to pay certain workers more and other workers less, but that's not an option right now.
JU: Oh.
NM: And so there's a lot of pricing issues at Mechanical Turk that could be very interesting to see where they're going and they seem to talk about all of them, but that's just, you don't know where they're actually going.
JU: Right, well the thing has only been up since, what, November or something like that?
NM: Yeah, we were one of the first external HITs that was on MTurk, and we put it up on Thanksgiving.
JU: Yeah, and so what have you heard from other MTurk users, if you have spoken to them, I'm sure you have.
NM: We've talked with a number of them and most of them tend to do very small tasks, maybe not quite as small as Mycroft tasks, but quite small tasks and take statistical approaches and they seem to be happy with the system in general. They seem to be doing very different work than we are.
JU: Why do you think that is?
NM: Wow that's interesting. I think a lot of it is actually the marketing from Amazon; at least their initial marketing is very much aimed at that kind of task. They try to attract requesters doing that kind of bulk stuff that you could statistically analyze and test.
BH: I was kind of wondering when they first came out with it if they didn't just test their A, what is it? A7 search results was a lot of the ones I saw originally.
NM: Oh it was, it was a lot of the ones for months, and the mapping stuff as well. I think, I mean I don't know, but my guess is that yeah they did it largely because they had a large number of internal customers that desperately needed something like this, and then from what I can tell from talking with all of the Amazon guys, their general approach is build the project even if it is just for internal and then once you've got a project you can decide whether it goes out or not.
JU: Right, well it is certainly fascinating. You mentioned A9 and also the mapping stuff, I mean from the minute that I saw those Amazon street levels, the block level views, yeah block views, I was thinking the notion that Amazon is going to directly employ people to go around and do this, and it's clearly such a distributable task, was questionable. So I can certainly see that that would be the kind of thing that they'd want to farm out this way.
NM: Yeah, I've actually been pretty surprised at the wide variety of tasks coming out of Amazon. They've got all kinds of stuff. So I think there's definitely a lot of internal need, and I think that's where this came from.
BH: I think it's fascinating as the tasks start to reach outside of the computer. Can you ask somebody to go get a shot of a celebrity for you, or really where do the limits lie.
NM: Yeah that's an interesting question. From what we've talked with other requesters, a lot of physical tasks don't have a really good result rate, but on the other hand we actually had, just for our personal interest, we needed to know something about a local building code and we couldn't find it and we put a task in and people actually managed to find the local building authority and find the right person and ask them on the phone. So some tasks definitely do work.
BH: Okay. At that point you're getting into Yahoo Answers and Google Answers. I'm curious to see how the market evolves, what Mechanical Turk goes for, the variable pricing mechanism, if it's easier if Mycroft can actually head somewhere with the micro task concept.
NM: Yeah, I think it's just a lot of fun to watch.
JU: I agree, and I'm sure for both of you guys it's a lot of fun to be playing the game right now. You're both in kind of an enviable situation, really, doing something that is quite novel. It must be fun, I would think.
NM: Oh it's so much fun, I really love getting up and working every day, even if I do it all day.
BH: It's nice to be in a place where we can say "yeah we care about what people can do because they're people" and really going after that. We hope to get that message out, that ok computers are getting smarter and faster and there's a lot more of them, but still people can do things that computers just plain old can't.
JU: This is a fascinating deal, and I wish you both the best of luck with it, it's going to be a good ride I think.
NM: Thank you.
BH: And thanks for having us on the podcast.
JU: Sure, okay, take care guys.