Getting Started

Table of contents

  1. Making things work on a small scale
    1. The challenge
    2. Incrementally grow the set of questions people can ask
    3. Incrementally build up knowledge required to answer questions well
    4. Curate the set of answerers
    5. Filter what users see
  2. Initial applications
    1. Criteria for good initial applications
    2. Potential applications
    3. A first application—proposal #1
    4. A first application—proposal #2
  3. Where do we go from here?

Making things work on a small scale

The challenge

Dialog markets are intended to produce high-quality conversations that help users resolve challenging problems. One might worry that there is a mismatch between the difficulty of this task and the capability of the proposed architecture, in particular at the very beginning. The capability of the system depends on the relevant skill of participants and the power of the mechanism. These two factors can be traded off against each other to some extent, but if the overall combination is below some threshold, the quality of dialogs will be low, and in particular lower than what would be required for people to be happy to pay in exchange. So, it is important to understand how we can ensure that the quality of conversations is sufficiently high.

To illustrate the challenge, imagine that a crowd made up of Mechanical Turk workers was faced with the task of producing helpful follow-up questions and advice to the question “How can I improve my sleep?”, without further assistance or filtering. The result would probably include some obvious follow-ups and ideas, but it wouldn’t be the kind of advice that people would pay for. The advice wouldn’t include much that people couldn’t easily think of themselves (given that Turkers are in general not experts on the topic, are probably no better than others at using Google, and are less motivated than the person who initiated the conversation). While there are some potential advantages from having multiple people think about the problem and from getting outside perspective, there are also significant costs. Turkers won’t know the asker’s context, there is overhead in learning about it, and conversing with a group could be incoherent and could introduce delays. Overall, such dialogs wouldn’t be worth much, most probably not enough to pay Turkers minimum wage.

In the next few sections, I will discuss some strategies that help overcome this challenge.

Incrementally grow the set of questions people can ask

The web-based system for answerers should be fully general, but for the (probably mobile) app for askers, it is probably better to incrementally expand coverage. Focusing first on a single application, then on a few applications, and only later expanding to arbitrary questions will simplify the problem and make reuse and automation easier.

I am imagining a sequence of apps, for example: (1) “How can I feel better right now?”, (2) “How can I sleep better?”, (3) “How can I improve my relationship?”, (4) “Should I go on vacation?”, etc. I will discuss better sequences of initial applications below. For now, I am more interested in general properties of this sequence of apps.

The apps will all be essentially the same app, each with a different skin. There is a single button that starts a dialog with the corresponding question and then shows a sequential dialog view, allowing the user to provide multiple-choice and free-form feedback.

Depending on the app, we might choose a different revenue model:

  • Optional tipping after the dialog is over (maybe using in-app purchases)
  • Pay to continue the conversation after n exchanges
  • First dialog free, fixed fee for future dialogs
  • Affiliate links for movie/book/product/vacation recommendations

For each app, we measure how long it takes for the dialogs to be of high quality. The hope is that this time goes down as the system is supported by better automation and gains some general-purpose domain knowledge on how to approach certain kinds of dialogs. (I discuss accumulation of knowledge below.) Because we tackle the easier, more concrete applications first, later applications will be intrinsically harder, which will counteract this effect to some extent. Still, I hope that there will be a point, once the base of answerers is fairly broad and many topics have been explored, at which it takes much less time for the system to do well on new question types. When that time span is short enough, we publish the fully general app that allows people to ask any question.

Incrementally build up knowledge required to answer questions well

Share procedural and declarative knowledge across dialogs via “objective” sub-dialogs

Answerers can create “objective” dialogs for questions such as:

  • “How should we approach self-care questions?”
  • “How should we approach career questions?”
  • “How should we approach dialogs over short time-scales?”
  • “What common causes of bad sleep are there?”
  • “What are the most promising treatments for sleep apnea?”

These dialogs can be referenced/included within other “subjective” dialogs, which constitute a much larger fraction of all dialogs. A small part of the reward from these subjective dialogs then flows to the contributors of the included objective dialogs. If an objective dialog includes other objective dialogs, a portion of the reward flows to those as well, and so on.
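As a rough sketch of how this reward flow could work, assume (hypothetically) that every dialog passes a fixed fraction of whatever it receives down to the objective dialogs it includes, split evenly among them. The `share` value, the even split, and the names `Dialog` and `distribute_reward` are illustrative assumptions, not part of the proposal. The sketch also assumes that inclusions never form a cycle.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dialog:
    name: str
    # Objective dialogs referenced/included by this dialog (assumed acyclic).
    includes: List["Dialog"] = field(default_factory=list)


def distribute_reward(
    dialog: Dialog,
    amount: float,
    share: float = 0.1,  # hypothetical fraction passed down at each level
    payouts: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Return {dialog name: reward} after passing `share` down recursively."""
    if payouts is None:
        payouts = {}
    passed_down = 0.0
    if dialog.includes:
        passed_down = amount * share
        per_child = passed_down / len(dialog.includes)
        for child in dialog.includes:
            distribute_reward(child, per_child, share, payouts)
    # The dialog's own contributors keep whatever was not passed down.
    payouts[dialog.name] = payouts.get(dialog.name, 0.0) + amount - passed_down
    return payouts


causes = Dialog("What common causes of bad sleep are there?")
approach = Dialog("How should we approach self-care questions?", includes=[causes])
subjective = Dialog("How can I improve my sleep? (one asker)", includes=[approach])

payouts = distribute_reward(subjective, 10.0)
for name, reward in payouts.items():
    print(f"{reward:6.2f}  {name}")
```

Because each level only redistributes what it received, the payouts always sum to the original reward, and deeply nested objective dialogs receive geometrically smaller amounts.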

I expect that there will be a division of labor, with domain experts mostly working on objective dialogs, which are very polished (maybe comparable to the best Wikipedia pages), and generalists doing some of the work on subjective dialogs, simply by following the prescriptions and processes outlined in the objective dialogs.

Adding references to relevant objective dialogs is a task that seems well-suited for automation.

Improve automated contributions over time via feedback mechanisms

Since questions are initially chosen from a relatively small set, and since most responses are multiple-choice, it should be straightforward for us (as the organization running the market) to implement basic automation. When a dialog ends, we will elicit information from both askers and other participants on how well things went, and on how much (they think) different parts contributed to the overall outcome. This will help us preferentially re-use contributions that were most helpful in the past, or ones that are predicted to be most helpful in a new situation based on past judgments. (I’ll discuss this approach in the section on reward distribution below.)

Curate the set of answerers

Seed with people who have skills appropriate to the task

To generate high-quality objective dialogs, we may want to seek out experts in the domains we focus on. To generate high-quality subjective dialogs, I imagine that we will seek out people who have shown interest in, or skill at, careful reasoning and analytical thinking, but who are not necessarily domain experts.

I don’t think the system can succeed if it is necessary to have domain experts run all dialogs. The required pay would be more than most people would be willing to pay. A key hypothesis underlying the dialog markets project is that a substantial part of the work can be done by groups of generalists as long as they have adequate reasoning skills and are equipped with expert-curated instructions. A further hypothesis is that a significant part of that work can be automated over time.

Maintain answerer quality using strict reward assignment

To maintain high answer quality as the system grows, we use a strict reward mechanism, so that work that is below the target quality receives $0 (or potentially even negative) reward. This is intended to quickly disincentivize participation from workers who might otherwise swamp the system with low-quality contributions.
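To make the incentive concrete, here is a minimal sketch that contrasts a strict threshold rule with a reward that is simply proportional to quality. The 0-to-1 quality scale, the target of 0.8, and the function names are assumptions made for illustration only.

```python
def strict_reward(quality, target, full_reward, below_target_reward=0.0):
    """Pay the full reward at or above the target quality.

    Below the target, pay `below_target_reward`, which is 0 by default and
    could be set to a negative number to model a penalty.
    """
    return full_reward if quality >= target else below_target_reward


def proportional_reward(quality, full_reward):
    """Baseline for comparison: reward scales linearly with quality."""
    return quality * full_reward


# A worker who submits twenty contributions of low quality (0.3 on a 0..1 scale).
low_quality_batch = [0.3] * 20

strict_total = sum(strict_reward(q, 0.8, 1.0) for q in low_quality_batch)
proportional_total = sum(proportional_reward(q, 1.0) for q in low_quality_batch)

print("strict:", strict_total)
print("proportional:", round(proportional_total, 2))
```

Under the proportional rule, volume alone still earns money, so flooding the system can pay off. Under the strict rule the same batch earns nothing, which is the disincentive described above.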

Filter what users see

By default, the sequential chat view (on mobile) shows only a small set of contributions to the original asker. The system will only send contributions to the asker if they are predicted to be helpful when evaluated later on (e.g. based on author, likes). At the very beginning, we may essentially manually curate what askers see. This allows the system to work well (from the asker’s perspective) even if a significant number of low-quality contributions are posted to the web-based system.
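A minimal sketch of such a filter follows, assuming a very simple predictor that blends an author's track record with the likes a contribution has received so far. The formula, the threshold, and all names (`Contribution`, `predicted_helpfulness`, `select_for_asker`) are hypothetical. A real system would presumably learn the predictor from later evaluations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contribution:
    text: str
    author_helpful_rate: float  # share of the author's past work judged helpful (0..1)
    likes: int
    views: int


def predicted_helpfulness(c: Contribution, prior_weight: int = 5) -> float:
    """Smoothed like rate that starts at the author's track record.

    With no views the score equals the author's rate. As views accumulate,
    the observed like rate dominates.
    """
    return (c.author_helpful_rate * prior_weight + c.likes) / (prior_weight + c.views)


def select_for_asker(
    contributions: List[Contribution], threshold: float = 0.6, max_shown: int = 3
) -> List[Contribution]:
    """Return the best-scoring contributions above the threshold, best first."""
    ranked = sorted(contributions, key=predicted_helpfulness, reverse=True)
    return [c for c in ranked if predicted_helpfulness(c) >= threshold][:max_shown]


posted = [
    Contribution("Well-liked follow-up question", 0.9, likes=8, views=10),
    Contribution("Off-topic remark", 0.2, likes=1, views=10),
    Contribution("New post from a reliable author", 0.7, likes=0, views=0),
]
shown = select_for_asker(posted)
print([c.text for c in shown])
```

The asker sees only the first and third contributions. The off-topic remark stays in the web-based system but never reaches the mobile chat view.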

Initial applications

Above, I outlined the strategy of selecting a single specific question as the initial application, then expanding to related questions one by one, until the system can learn to provide high-quality dialogs for new types of questions quickly. Here, I want to discuss what such a sequence could look like.

Criteria for good initial applications

We’d like to find an initial application that satisfies the following criteria:

  1. The problem is important to the target audience. If people don’t care about the problem, they will be less willing to try new approaches (e.g., they might not be willing to install an app), and will also be less likely to pay for helpful contributions.

  2. Our solution will be much better than the next-best solution. If this isn’t the case, there will be little incentive for people to switch over from whatever substitute they are currently using.

  3. The amount of information required from users is at the sweet spot between too little and too much. If there is very little information required (e.g. for “What is the largest known prime number?”), we can’t exercise our advantage compared to other Q&A sites. If there is too much context required, we can’t provide high-quality advice.

  4. Simple automation can go a long way. If each question requires a lot of custom human labor, we may still be able to produce a market for conversations, but the prices may be higher than people would be willing to pay and we wouldn’t be on a direct path towards building a market where automation can play a substantial part.

  5. We (as the system designers) understand the target audience well. This will make it much easier to build a system that is likely to be successful, as we can get instant feedback by using our mental models of the target audience.

  6. There is a natural progression to other topics. This is true for almost any question, but it is worth considering whether particular initial questions are likely to lead to better follow-up applications than others.

  7. We (as the system designers) are excited about the application and likely follow-up applications.

The following criteria matter somewhat:

  1. We understand the domain well. We don’t need to be domain experts ourselves, but we need to be able to judge how well the system is working, both from a user’s perspective and from a domain expert’s perspective. So, easy access to domain experts is a plus. In the ideal case, we can fill the role of the domain expert ourselves.

  2. People would be willing to pay for helpful advice. This isn’t strictly necessary for the first application, but should be the case for applications in the neighborhood of the first application. If it is unlikely that at least some people would be willing to pay, this may indicate that the problem isn’t important to people or that our solution isn’t good.

  3. People would want to use the system more than once. This reduces user acquisition costs. However, this will also cause people to expect the system to know about and take into account previous conversations, so this may make it more difficult to reliably live up to users’ expectations. For some single-use apps (such as “Should I buy or rent?”), we could allow users to go through a series of dialogs to decide (“Can I get a bank loan?”, “Are there any reasonable apartments I could afford to buy?”, etc.), making it effectively multi-use.

  4. The app doesn’t paint a misleading picture of the longer-term goals for dialog markets. In the beginning, it will take a lot of time to make an app work really well for any single question, so the first few questions will shape how potential collaborators view the project.

And here are some criteria that don’t matter for the initial application:

  1. There is a large target audience. The initial application only serves to seed the system, so doesn’t need a large user base per se, as long as there is a trajectory towards applications with larger audiences.

  2. There is a natural way to make it viral/social. This may matter for future applications, but not initially, where the main goal is to test whether our system can in fact produce high-quality interactions at low cost.

Potential applications

I brainstormed a list of about 300 potential applications/questions, and most of the applications had one of the following forms:

  • Should I/we do x?
  • How can I/we do x?
  • Who/what/where/[which x] should I/we y?

So, most of the ideas are about whether one ought to do something or how one ought to do it. In terms of topics, the most common ones were:

  • Health (e.g. How can I sleep better? How can I lose weight?)
  • Entertainment (e.g. What book should I read next? What should I do for fun today?)
  • Work and research (e.g. How can I be more productive? What company should I work for?)
  • Life planning (e.g. What career is right for me? When should I get married?)
  • Money (e.g. How can I earn $1000 on the side? How should I prepare for retirement?)
  • Social (e.g. How can I find a girl/boyfriend? How can I improve my long-distance relationship?)
  • General self-improvement (e.g. Are there any big mistakes I am making? What can I improve in my life?)
  • Products (e.g. What clothes should I wear? What mobile phone should I buy?)
  • Learning (e.g. How can I learn Spanish? What skills should I learn?)
  • Analysis and explanation (e.g. Why did Obama win the election? How do bicycles work?)

In the long run, I expect that many questions will come from the long tail and won’t have been asked in their exact form before. However, in the beginning it is worth focusing on specific questions, so that answer strategies can be reused across conversations.

I made a quick attempt to score potential applications based on (a) how important the problem is for the audience, (b) how completely I expect that we can solve it, and (c) how plausible it is that we can monetize it (through direct payments, affiliate fees, etc). This may be worth doing more systematically, and with better criteria; my brief stab at it resulted in the following list of applications that scored well in terms of a multiplicative combination of the three features:

  • Should I buy or rent?
  • How can I immigrate to {country}?
  • How can I be happier?
  • How can I find a girl/boyfriend?
  • How can I get laid?
  • What startup/company should I work for?
  • How can I make $1000?
  • How can I gain/lose weight?
  • How can I be healthier?
  • How can we improve the success probability of our company?
  • How can I sleep better tonight?
  • Why do I feel anxious/depressed?
  • Should I stay at my current job or change jobs?
  • How can I get a raise?
  • What doctor should I go to?
  • How can I have more energy?
  • (When) Should I/we get married?
  • (When) Should I/we have children?
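The multiplicative scoring mentioned above can be sketched as follows. The candidate names and all numbers are placeholders invented for illustration. They are not the scores behind the list, and the 0-to-1 scale is an assumption.

```python
def app_score(importance, solvability, monetizability):
    """Multiplicative combination: a weak factor drags the whole score down."""
    return importance * solvability * monetizability


# Placeholder candidates with made-up scores on a 0..1 scale.
candidates = {
    "Application A": (0.9, 0.9, 0.1),  # important and solvable, hard to monetize
    "Application B": (0.6, 0.6, 0.6),  # moderate on every criterion
}

by_product = sorted(candidates, key=lambda n: app_score(*candidates[n]), reverse=True)
by_sum = sorted(candidates, key=lambda n: sum(candidates[n]), reverse=True)

print("ranked by product:", by_product)
print("ranked by sum:    ", by_sum)
```

With a product, the balanced Application B ranks first, whereas a plain sum would favor Application A despite its near-zero monetization score. This is the main reason to multiply rather than add when every criterion is close to a requirement.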

A first application—proposal #1

Let’s consider the following question as a first application:

“How can I feel better right now?”

For this question, I am imagining that the dialog would, in its initial phase, go through a mostly automated flow chart (or checklist) of issues, comparable to this online tool, or a more short-term version of one of these self-care checklists. Towards the end, it could turn into more of an open conversation with an expert, in some ways comparable to talk therapy tools like 7 Cups of Tea, Kindly, or BlahTherapy.

Comparison to existing tools

In contrast to these existing tools:

  1. We interpolate more smoothly between the two modes. Within a single dialog, the possibility to diverge from the more automated “flowchart” path is always there; the user can always provide a manual response as opposed to one of the provided multiple-choice responses. Across dialogs, we learn over time how to automate bigger parts of these sorts of conversations.

  2. We put more emphasis on making things easy for the user. In particular, we only ask the user for information when we expect to need it and can’t obtain it in another way, and also try to take multiple choice as far as possible.

  3. We improve more over time, accumulating knowledge on how to have helpful conversations on this topic (in the form of “objective” sub-dialogs that human contributors take into account), and integrating that knowledge into algorithms (that learn from what human contributors do).

  4. We do want our users to pay, e.g. in the form of tips after the dialog is over, as a requirement to continue past some point, or in the form of optional pledges that incentivize more contributors to join the conversation.
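The first point, a flowchart that the user can leave at any time, can be sketched as a small decision tree in which any reply that is not one of the offered choices is handed off to human contributors. The node structure, the prompts, and the `step` function are hypothetical and only illustrate the control flow.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class Node:
    prompt: str
    # Multiple-choice label -> next node in the automated flowchart.
    choices: Dict[str, "Node"] = field(default_factory=dict)


def step(node: Node, reply: str) -> Tuple[str, Union[Node, str]]:
    """Advance the dialog by one user reply.

    Returns ("auto", next_node) when the reply matches an offered choice,
    and ("human", reply) when the user typed something else, so that the
    free-form text can be routed to contributors.
    """
    if reply in node.choices:
        return "auto", node.choices[reply]
    return "human", reply


# Hypothetical two-level flowchart.
rested = Node("Have you been outside today?")
tired = Node("Could you take a short break right now?")
root = Node("Did you sleep well last night?", {"Yes": rested, "No": tired})

mode, nxt = step(root, "No")
print(mode, "->", nxt.prompt)

mode2, payload = step(root, "I just had an argument with a friend")
print(mode2, "->", payload)
```

Logging which nodes trigger hand-offs, and with what free-form text, is one plausible source of data for deciding which new choices to add to the automated path over time.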

How good is this application?

How does this application fare with respect to the criteria for good initial applications that I outlined earlier? I’d say the problem is definitely important to the target audience and people would want to use it more than once if it worked. It’s very likely that there is a natural progression to other topics, and that the amount of information required from users is at the sweet spot between too little and too much. It’s probably the case that simple automation can go a long way and that our solution will be much better than the next-best one. I’m less sure that the app wouldn’t paint a misleading picture of the longer-term goals for dialog markets and that people would be willing to pay for helpful advice.

Potential issues

A potential issue with this application is that it is related to a wide range of other topics. This could prevent us from solving the problem to the extent that users expect us to solve it, at a price that users are willing to pay, and with the level of automation that I am imagining. I expect that the system will work fairly well in the beginning of dialogs (using decision trees of multiple-choice questions as a baseline strategy), and that it can work fairly well towards the end of dialogs (recovering one-on-one conversation), but I have some uncertainty about the intermediate stage that requires the crowd to choose next steps, guided by “objective” dialogs that provide instructions on how to perform well, and an incentive system that strongly encourages contributions that follow these instructions.

A second potential issue is that users will expect near-real-time interaction for this application due to its short time horizon. For longer-term questions such as “What career should I pursue?”, it is more acceptable for follow-ups to happen only occasionally, and with some delay in between. For this application, users probably expect quick back-and-forth. This is feasible using automation, but once the dialog switches to the mode where contributions come from the crowd, it may slow down. Depending on how much of a slowdown this is, and depending on how well we manage users’ expectations, this could lead to a frustrating experience.

A third potential issue is that, for this particular application, some of the benefit to the asker might be derived not from the content of the conversation, but from the mere fact that they are having a conversation with someone (or some system) that cares and listens. If the activity per se is responsible for a big part of the benefit, this application would not be ideal, since it could be different in kind from follow-up applications where the goal is more directly to think through a question and provide relevant information.

Example of a sequence of follow-up applications

The application above is related to many topics, including health, exercise, sleep, relationships, work, money, and life planning. This makes it difficult to say what a likely trajectory might be. Actual expansion plans will depend on empirical feedback, based on what sub-dialogs happen frequently and how well they score according to our desiderata for applications. Still, for the sake of concreteness, here is an example trajectory that includes a few potential next applications:

  • How can I feel better right now?
  • Should I seek professional help for mental health?
  • What kind of exercise should I do?
  • How can I make new friends?
  • How can I sleep better?
  • How can I improve my relationship?
  • How can I be healthier?
  • How can I be happier in general?
  • How can I earn more income?
  • How can I find purpose in my life?
  • What career is right for me?

A first application—proposal #2

Let’s consider another potential first application:

“How can I find a girlfriend/boyfriend?”

I’m more uncertain about how conversations would go for this application. I can imagine that contributors would want to think through ways to meet potential partners who might be a good fit, and how to increase the probability that such meetings will lead to a good relationship. To that end, the dialog might initially elicit some personal information, such as the age of the person asking, where they live, what they do for work/school, and what they are looking for in a partner. We might then try to figure out how we can help—for example, whether we should focus on finding better places to meet partners or whether we should focus on improving the asker’s chances given such meetings. Finally, we might work towards coming up with actionable steps in those areas.

Comparison to existing tools

The overall problem under discussion is that of moving a person from the state where they don’t have a girlfriend/boyfriend to the state where they do. Different existing solutions address different parts of this problem. Dating sites and apps (such as OkCupid, Match.com, Tinder) help a person meet and approach potential partners online. MeetUp and Facebook Events help with finding opportunities to meet potential partners offline. Various informational websites (such as WikiHow), blogs, podcasts, and books help with advice on how to increase one’s attractiveness and approach to dating in order to improve the chances that meetings will lead to a relationship. Forums and subreddits (such as r/dating_advice) may also help with that. There are also matchmakers (who help with finding dates), dating coaches (who help with appearance, conversation, etc.), and dating seminars.

Compared to dating sites and apps, MeetUp and Facebook, we wouldn’t try to organize the interaction of multiple people online, but rather focus on other parts of the problem. Compared to existing informational sites, blogs, podcasts, and books, our system would be much more interactive and we would try to do as much of the work as possible for users. We would require less initiative from users, and users wouldn’t need to start out with an accurate view of their situation for our approach to succeed. Compared to forums and subreddits, we would have quicker back-and-forth, better incentives for helpers, and we would improve more over time. Compared to matchmakers and dating coaches, our system would be cheaper, more anonymous, and there would be less friction in getting started.

How good is this application?

How does this application fare along the criteria I outlined earlier? It’s definitely the case that the problem is important to the target audience and that there is a natural progression to other topics. I expect that the app doesn’t paint a misleading picture of the longer-term goals for dialog markets. I’m not sure whether the amount of information required from users is at the sweet spot between too little and too much, whether simple automation can go a long way, whether our solution will be much better than others, and whether people would be willing to pay. By its nature, people wouldn’t use the system more than once if it worked very well.

Potential issues

My main uncertainty is about how far simple automation can take us, and about the consequences thereof for cost and quality of our solution. For some other applications (such as “How can I feel better right now?” and “Should I buy or rent?”), there are existing automated systems that are useful for at least some people. For this app, I don’t know of such existing systems. I also expect that the dialogs for this app will be substantially longer than for more self-delimited questions such as “Should I buy or rent?”. Whether this creates a problem depends on how much custom labor is required per dialog. It is easy to imagine that only a small fraction of these dialogs can be automated, even with plenty of training data. This will be the case if each asker eventually requires highly customized coaching, as opposed to relatively generic advice chosen from a limited set and adapted to the asker’s circumstances. For example, this could happen if askers find it difficult to accept advice, so that most of the dialog is not about figuring out what advice to give, but rather about how to communicate it in a way that is convincing to them, and if there is a lot of variation in what is convincing to different people.

There are also a variety of potential social/privacy issues associated with this application. For example, to most effectively help, it could be useful to know what the asker looks like, but this makes the asker more easily identifiable, which could raise privacy concerns given that the asker’s information is accessible in our semi-public (NDA-gated) market. This application could also attract contributors who are not primarily motivated to help, but rather are seeking dates, participating for entertainment value, trolling, etc. The privacy issue can be addressed to some extent by marking particular kinds of content as access-restricted, so that only a limited subset of trusted users can see it, but this seems like a complication that it would be best to avoid initially.

Example of a sequence of follow-up applications

As above, I expect that the actual series of follow-up applications will depend on what sub-questions occur frequently, and what nearby questions these sub-questions suggest. Still, here is an example of how things could go:

  • How can I find a girlfriend/boyfriend?
  • How can I be more attractive to potential partners?
  • Where can I meet potential partners?
  • How can I make friends?
  • What should we do for our first date?
  • What activity should I do with my significant other?
  • What activity should I do with my friends?
  • How can I improve my relationship with my significant other?
  • How can I improve my friendships?
  • Should I stay in my current relationship?
  • (When) Should we get married?
  • (When) Should we have children?

Where do we go from here?

At this point, we could think through more potential first applications, or we could think through the aforementioned ones in more detail. However, it may be more useful to gather empirical feedback. I suggest the following procedure:

  1. Pick one of the applications mentioned above, ideally one where you have some expertise or can gather some expertise within a relatively short amount of time.

  2. Create a website that automatically starts a chat between yourself and any given visitor. Make it easy for yourself to send multiple-choice questions to the visitor. Perhaps style the website so that it looks a bit less like a chat and a bit more like a multiple-choice quiz to get over users’ potential aversion to talking to strangers.

  3. To acquire users, run a few ads on Google Adwords or Bing for searches that correspond to the application.

  4. Gather dialogs until you get a sense for what they tend to be like. Then repeat with another candidate application.

I expect that this will be instructive. It will help you understand how much repetition there is between users, how willing people are to keep up such dialogs, and how much information needs to be elicited to provide helpful advice. As an additional step, you can try different payment schemes. Would people be willing to tip after the dialog is over? Would people be willing to pay to continue the dialog after some fixed number of exchanges?

The user experience won’t be identical to the true dialog market setting. The responses will be less well-informed, but perhaps more coherent, since they come from a single person. In the simple version outlined above, users won’t be able to access a tree-structured dialog that reflects the current state of problem solving. Nonetheless, the simulation of the user experience is close enough that you should be able to gain a fair amount of certainty that an application is viable without implementing the real dialog market just yet.

If you're excited about this project, get in touch. I'm starting a company to make it real.