# Andreas Stuhlmüller

I'm cofounder of Ought, a non-profit doing research on using machine learning to support deliberation. Previously, I was a researcher in Noah Goodman's Computation & Cognition lab at Stanford. And before that, I was a Ph.D. student in the Department of Brain and Cognitive Sciences at MIT, working in Josh Tenenbaum's Computational Cognitive Science group.

The question that drives my research is: How can we better solve our problems? Machines are potentially much better at answering hard questions *if* such questions can be posed formally. For example, given a formal model that describes how diseases cause symptoms and a formalized prior belief about disease rates, machines can reason precisely about what observed symptoms imply.

In its various aspects, this question touches on mathematics, computer science, psychology, and philosophy:

- What does a language look like that makes it easy to formalize our prior knowledge and that allows us to pose the questions we want to answer?
- Given a formal problem statement in such a language, what do algorithms look like that can efficiently find answers?
- How is knowledge structured and used in the human mind, and can this inform our approach to the first two questions?
- Any particular question derives from the more general question
*“What should be done?”*How can we understand more precisely what this question means?

Read more about my interests, publications and projects.

## Publications

Predicting Human Deliberative Judgments with Machine Learning
abstract
bibtex

Owain Evans, Andreas Stuhlmüller, Chris Cundy, Ryan Carey, Zachary Kenton, Thomas McGrath,
Andrew Schreiber (2018)

*FHI Technical Report.*

Machine Learning (ML) has been successful in automating a range of cognitive tasks that humans solve effortlessly and quickly. Yet many realworld tasks are difficult and slow: people solve them by an extended process that involves analytical reasoning, gathering external information, and discussing with collaborators. Examples include medical advice, judging a criminal trial, and providing personalized recommendations for rich content such as books or academic papers. There is great demand for automating tasks that require deliberative judgment. Current ML approaches can be unreliable: this is partly because such tasks are intrinsically difficult (even AI-complete) and partly because assembling datasets of deliberative judgments is expensive (each label might take hours of human work). We consider addressing this data problem by collecting fast judgments and using them to help predict deliberative (slow) judgments. Instead of having a human spend hours on a task, we might instead collect their judgment after 30 seconds or 10 minutes. These fast judgments are combined with a smaller quantity of slow judgments to provide training data. The resulting prediction problem is related to semi-supervised learning and collaborative filtering. We designed two tasks for the purpose of testing ML algorithms on predicting human deliberative judgments. One task involves Fermi estimation (back-of-the-envelope estimation) and the other involves judging the veracity of political statements. We collected a dataset of 25,000 judgments from more than 800 people. We define an ML prediction task for predicting deliberative judgments given a training set that also contains fast judgments. We tested a variety of baseline algorithms on this task. Unfortunately our dataset has serious limitations. Additional work is required to create a good testbed for predicting human deliberative judgments. This technical report explains the motivation for our project (which might be built on in future work) and explains how further work can avoid our mistakes. Our dataset and code is available at https://github.com/oughtinc/psj.

Evaluating Compositionality in Sentence Embeddings
abstract
bibtex

Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, Noah D. Goodman (2018)

*Proceedings of the 40th Annual Conference of the Cognitive Science Society.*

An important frontier in the quest for human-like AI is compositional semantics: how do we design systems that understand an infinite number of expressions built from a finite vocabulary? Recent research has attempted to solve this problem by using deep neural networks to learn vector space embeddings of sentences, which then serve as input to supervised learning problems like paraphrase detection and sentiment analysis. Here we focus on 'natural language inference' (NLI) as a critical test of a system's capacity for semantic compositionality. In the NLI task, sentence pairs are assigned one of three categories: entailment, contradiction, or neutral. We present a new set of NLI sentence pairs that cannot be solved using only word-level knowledge and instead require some degree of compositionality. We use state of the art sentence embeddings trained on NLI (InferSent, Conneau et al. (2017)), and find that performance on our new dataset is poor, indicating that the representations learned by this model fail to capture the needed compositionality. We analyze some of the decision rules learned by InferSent and find that they are largely driven by simple heuristics at the word level that are ecologically valid in the SNLI dataset on which InferSent is trained. Further, we find that augmenting the training dataset with our new dataset improves performance on a held-out test set without loss of performance on the SNLI test set. This highlights the importance of structured datasets in better understanding, as well as improving the performance of, AI systems.

Learning Physical Parameters from Dynamic Scenes
abstract
bibtex

Tomer D. Ullman, Andreas Stuhlmüller, Noah D. Goodman and Joshua B. Tenenbaum (2018)

*Cognitive Psychology*

Humans acquire their most basic physical concepts early in development, and continue to enrich and expand their intuitive physics throughout life as they are exposed to more and varied dynamical environments. We introduce a hierarchical Bayesian framework to explain how people can learn physical parameters at multiple levels. In contrast to previous Bayesian models of theory acquisition (Tenenbaum et al., 2011), we work with more expressive probabilistic program representations suitable for learning the forces and properties that govern how objects interact in dynamic scenes unfolding over time. We compare our model to human learners on a challenging task of estimating multiple physical parameters in novel microworlds given short movies. This task requires people to reason simultaneously about multiple interacting physical laws and properties. People are generally able to learn in this setting and are consistent in their judgments. Yet they also make systematic errors indicative of the approximations people might make in solving this computationally demanding problem with limited computational resources. We propose two approximations that complement the top-down Bayesian approach. One approximation model relies on a more bottom-up feature-based inference scheme. The second approximation combines the strengths of the bottom-up and top-down approaches, by taking the feature-based inference as its point of departure for a search in physical-parameter space.

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
abstract
bibtex

William Saunders, Girish Sastry, Andreas Stuhlmüller, Owain Evans (2018)

*Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018)*

AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven't yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human "in the loop" and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent's learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.

Dialog Markets
abstract
bibtex

Andreas Stuhlmüller (2016)

*Technical Report*

People frequently ask Google questions like “What career is right for me?” “How can I find a boyfriend?” or “How can I make more money?” While Google can answer self-contained questions like “What is the capital of Turkey?” it cannot answer vague questions, even though these tend to be the questions people care about the most. Answering these questions requires background information about the person who asked. Good advice is tailored to the particulars of the asker’s situation. A natural way to elicit such background is through a dialog in which follow-up questions are asked. If you ask a doctor “Why do I have stomach pain?” they won’t just give you an answer. They will ask clarifying questions: “When did it start?” — “Where is it?” — “Is it constant or intermittent?” And only then they suggest possible causes. In this set of notes, I propose dialog markets, a mechanism for creating high-quality conversations that resolve vague questions. A dialog starts when an individual asks a question and pledges a reward that will be used to pay for helpful contributions. Humans and machines then contribute follow-up questions and other responses. The core problem that the market mechanism needs to solve is how to distribute reward over contributions such that we incentivize conversations that are as valuable as possible for the asker. We can view this as a particular set of questions that we can delegate to the market: “How much should the asker pay for this contribution?” This view gives rise to the grounding problem for dialog markets: incentives will be aligned if answers to this meta-level question are good, but that in turn depends on aligned incentives. I outline skome approaches towards addressing this problem. Finally, I suggest an implementation plan: I discuss how to ensure that early dialogs have sufficiently high quality to get the project off the ground and name a few potential initial applications.

Modeling Agents with Probabilistic Programs
abstract
bibtex

Owain Evans, Andreas Stuhlmüller, John Salvatier, and Daniel Filan (electronic)

This book describes and implements models of rational agents for (PO)MDPs and Reinforcement Learning. One motivation is to create richer models of human planning, which capture human biases and bounded rationality. Agents are implemented as differentiable functional programs in a probabilistic programming language based on Javascript. Agents plan by recursively simulating their future selves or by simulating their opponents in multi-agent games. Our agents and environments run directly in the browser and are easy to modify and extend. The book assumes basic programming experience but is otherwise self-contained. It includes short introductions to "planning as inference", MDPs, POMDPs, inverse reinforcement learning, hyperbolic discounting, myopic planning, and multi-agent planning.

Agent-Agnostic Human-in-the-Loop Reinforcement Learning
abstract
bibtex

David Abel, John Salvatier, Andreas Stuhlmüller, and Owain Evans (2016)

*Future of Interactive Learning Machines Workshop at NIPS 2016*

Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. To this end, prior work has developed teaching protocols that enable agents to learn efficiently in complex environments. In many of these methods, the teacher’s guidance is tailored to agents with a particular representation or underlying learning scheme, offering effective but highly specialized teaching procedures. In this work, we introduce *protocol programs*, an agent-agnostic schema for Human-in-the-Loop Reinforcement Learning. Our goal is to incorporate the beneficial properties of a human teacher into Reinforcement Learning without making strong assumptions about the inner workings of the agent. We show how to represent existing approaches such as action pruning, reward shaping, and training in simulation as special cases of our schema, and conduct preliminary experiments evaluating the effectiveness of protocols in simple domains.

Learning the Preferences of Ignorant, Inconsistent Agents
abstract
bibtex

Owain Evans, Andreas Stuhlmüller, and Noah D. Goodman (2016)

*Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-2016)*

An important use of machine learning is to learn what people value. What posts or photos should a user be shown? Which jobs or activities would a person find rewarding? In each case, observations of people's past choices can inform our inferences about their likes and preferences. If we assume that choices are approximately optimal according to some utility function, we can treat preference inference as Bayesian inverse planning. That is, given a prior on utility functions and some observed choices, we invert an optimal decision-making process to infer a posterior distribution on utility functions. However, people often deviate from approximate optimality. They have false beliefs, their planning is sub-optimal, and their choices may be temporally inconsistent due to hyperbolic discounting and other biases. We demonstrate how to incorporate these deviations into algorithms for preference inference by constructing generative models of planning for agents who are subject to false beliefs and time inconsistency. We explore the inferences these models make about preferences, beliefs, and biases. We present a behavioral experiment in which human subjects perform preference inference given the same observations of choices as our model. Results show that human subjects (like our model) explain choices in terms of systematic deviations from optimal behavior and suggest that they take such deviations into account when inferring preferences.

C3: Lightweight Incrementalized MCMC for Probabilistic Programs using Continuations and Callsite Caching
abstract
bibtex

Daniel Ritchie, Andreas Stuhlmüller, and Noah D. Goodman (2016)

*Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS)*

Lightweight, source-to-source transformation approaches to implementing MCMC for probabilistic programming languages are popular for their simplicity, support of existing deterministic code, and ability to execute on existing fast runtimes. However, they are also slow, requiring a complete re-execution of the program on every Metropolis Hastings proposal. We present a new extension to the lightweight approach, C3, which enables efficient, incrementalized re-execution of MH proposals. C3 is based on two core ideas: transforming probabilistic programs into continuation passing style (CPS), and caching the results of function calls. We show that on several common models, C3 reduces proposal runtime by 20-100x, in some cases reducing runtime complexity from linear in model size to constant. We also demonstrate nearly an order of magnitude speedup on a complex inverse procedural modeling application.

Learning the Preferences of Bounded Agents
bibtex

Owain Evans, Andreas Stuhlmüller, and Noah D. Goodman (2015)

*Poster at the NIPS 2015 Workshop on Bounded Optimality*

Coarse-to-Fine Sequential Monte Carlo for Probabilistic Programs
abstract
bibtex

Andreas Stuhlmüller, Robert X.D. Hawkins, N. Siddharth, and Noah D. Goodman (2015)

*Technical Report. arXiv:1509.02962 [cs.AI]*

Many practical techniques for probabilistic inference require a sequence of distributions that interpolate between a tractable distribution and an intractable distribution of interest. Usually, the sequences used are simple, e.g., based on geometric averages between distributions. When models are expressed as probabilistic programs, the models themselves are highly structured objects that can be used to derive annealing sequences that are more sensitive to domain structure. We propose an algorithm for transforming probabilistic programs to coarse-to-fine programs which have the same marginal distribution as the original programs, but generate the data at increasing levels of detail, from coarse to fine. We apply this algorithm to an Ising model, its depth-from-disparity variation, and a factorial hidden Markov model. We show preliminary evidence that the use of coarse-to-fine models can make existing generic inference algorithms more efficient.

Why do you ask? Good questions provoke informative answers
abstract
bibtex

Robert X. D. Hawkins, Andreas Stuhlmüller, Judith Degen, and Noah D. Goodman (2015)

*Proceedings of the Thirty-Seventh Annual Conference of the Cognitive Science Society*

What makes a question useful? What makes an answer appropriate? In this paper, we formulate a family of increasingly sophisticated models of question-answer behavior within the Rational Speech Act framework. We compare these models based on three different pieces of evidence: first, we demonstrate how our answerer models capture a classic effect in psycholinguistics showing that an answerer’s level of informativeness varies with the inferred questioner goal, while keeping the question constant. Second, we jointly test the questioner and answerer components of our model based on empirical evidence from a question-answer reasoning game. Third, we examine a special case of this game to further distinguish among the questioner models. We find that sophisticated pragmatic reasoning is needed to account for some of the data. People can use questions to provide cues to the answerer about their interest, and can select answers that are informative about inferred interests.

Modeling Cognition with Probabilistic Programs: Representations and Algorithms
abstract
bibtex

Andreas Stuhlmüller (2015)*Ph.D. thesis, Massachusetts Institute of Technology*

This thesis develops probabilistic programming as a productive metaphor for understanding cognition, both with respect to mental representations and the manipulation of such representations.

In the first half of the thesis, I demonstrate the representational power of probabilistic programs in the domains of concept learning and social reasoning. I provide examples of richly structured concepts, defined in terms of systems of relations, subparts, and recursive embeddings, that are naturally expressed as programs and show initial experimental evidence that they match human generalization patterns. I then proceed to models of reasoning about reasoning, a domain where the expressive power of probabilistic programs is necessary to formalize our intuitive domain understanding due to the fact that, unlike previous formalisms, probabilistic programs allow conditioning to be *represented* a model, not just *applied to* a model. I illustrate this insight with programs that model nested reasoning in game theory, artificial intelligence, and linguistics.

In the second half, I develop three inference algorithms with the dual intent of showing how to efficiently compute the marginal distributions defined by probabilistic programs, and providing building blocks for process-level accounts of human cognition. First, I describe a Dynamic Programming algorithm for computing the marginal distribution of discrete probabilistic programs by compiling to systems of equations and show that it can make inference in models of "reasoning about reasoning" tractable by merging and reusing subcomputations. Second, I introduce the setting of amortized inference and show how learning inverse models lets us leverage samples generated by other inference algorithms to compile probabilistic models into fast recognition functions. Third, I develop a generic approach to coarse-to-fine inference in probabilistic programs and provide evidence that it can speed up inference in models with large state spaces that have appropriate hierarchical structure.

Finally, I substantiate the claim that probabilistic programming is a * productive* metaphor by outlining new research questions that have been opened up by this line of investigation.

The Design and Implementation of Probabilistic Programming Languages
abstract
bibtex

Noah D. Goodman and Andreas Stuhlmüller (electronic)

Probabilistic programming languages (PPLs) unify techniques for the formal description of computation and for the representation and use of uncertain knowledge. PPLs have seen recent interest from the artificial intelligence, programming languages, cognitive science, and natural languages communities. This book explains how to implement PPLs by lightweight embedding into a host language. We illustrate this by designing and implementing WebPPL, a small PPL embedded in Javascript. We show how to implement several algorithms for universal probabilistic inference, including priority-based enumeration with caching, particle filtering, and Markov chain Monte Carlo. We use program transformations to expose the information required by these algorithms, including continuations and stack addresses. We illustrate these ideas with examples drawn from semantic parsing, natural language pragmatics, and procedural graphics.

Learning physics from dynamical scenes
abstract
bibtex

Tomer D. Ullman, Andreas Stuhlmüller, Noah D. Goodman, and Joshua B. Tenenbaum (2014)

*Proceedings of the Thirty-Sixth Annual Conference of the Cognitive Science society*

Humans acquire their most basic physical concepts early in development, but continue to enrich and expand their intuitive physics throughout life as they are exposed to more and varied dynamical environments. We introduce a hierarchical Bayesian framework to explain how people can learn physical theories across multiple timescales and levels of abstraction. In contrast to previous Bayesian models of theory acquisition (Tenenbaum, Kemp, Griffiths, & Goodman, 2011), we work with more expressive *probabilistic program* representations suitable for learning the forces and properties that govern how objects interact in dynamic scenes unfolding over time. We compare our model and human learners on a challenging task of inferring novel physical laws in microworlds given short movies. People are generally able to perform this task and behave in line with model predictions. Yet they also make systematic errors suggestive of how a top-down Bayesian approach to learning might be complemented by a more bottomup feature-based approximate inference scheme, to best explain theory learning at an algorithmic level.

Learning Stochastic Inverses
abstract
bibtex

Andreas Stuhlmüller, Jessica Taylor, and Noah D. Goodman (2013)

*Advances in Neural Information Processing Systems (NIPS) 27*

We describe a class of algorithms for *amortized inference* in Bayesian networks. In this setting, we invest computation upfront to support rapid online inference for a wide range of queries. Our approach is based on learning an inverse factorization of a model's joint distribution: a factorization that turns observations into root nodes. Our algorithms accumulate information to estimate the local conditional distributions that constitute such a factorization. These *stochastic inverses* can be used to invert each of the computation steps leading to an observation, sampling *backwards* in order to quickly find a likely explanation. We show that estimated inverses converge asymptotically in number of (prior or posterior) training samples. To make use of inverses before convergence, we describe the *Inverse MCMC* algorithm, which uses stochastic inverses to make block proposals for a Metropolis-Hastings sampler. We explore the efficiency of this sampler for a variety of parameter regimes and Bayes nets.

Knowledge and implicature: Modeling language understanding as social cognition
abstract
bibtex

Noah D. Goodman and Andreas Stuhlmüller (2013)

*Topics in Cognitive Science*

Is language understanding a special case of social cognition? To help evaluate this view, we can formalize it as the *rational speech-act* theory: listeners assume that speakers choose their utterances approximately optimally, and listeners interpret an utterance by using Bayesian inference to "invert" this model of the speaker. We apply this framework to model scalar implicature ("some" implies "not all", and "N" implies "not more than N"). This model predicts an interaction between the speaker's knowledge state and the listener's interpretation. We test these predictions in two experiments, and find good fit between model predictions and human judgements.

Reasoning about Reasoning by Nested Conditioning: Modeling Theory of Mind with Probabilistic Programs
abstract
bibtex

Andreas Stuhlmüller and Noah D. Goodman (2013)

*Journal of Cognitive Systems Research*

A wide range of human reasoning patterns can be explained as conditioning in probabilistic models; however, conditioning has traditionally been viewed as an operation *applied to* such models, not *represented in* such models. We describe how probabilistic programs can explicitly represent conditioning as part of a model. This enables us to describe reasoning about others' reasoning using *nested* conditioning. Much of human reasoning is about the beliefs, desires, and intentions of other people; we use probabilistic programs to formalize these inferences in a way that captures the flexibility and inherent uncertainty of reasoning about other agents. We express examples from game theory, artificial intelligence, and linguistics as recursive probabilistic programs and illustrate how this representation language makes it easy to explore new directions in each of these fields. We discuss the algorithmic challenges posed by these kinds of models and describe how Dynamic Programming techniques can help address these challenges.

Learning Stochastic Inverses for Adaptive Inference in Probabilistic Programs
abstract
bibtex

Andreas Stuhlmüller and Noah D. Goodman (2012)

*Poster at the NIPS 2012 Workshop on Probabilistic Programming*

We describe an algorithm for adaptive inference in probabilistic programs. During sampling, the algorithm accumulates information about the local probability distributions that compose the program's overall distribution. We use this information to construct targeted samples: given a value for an intermediate expression, we stochastically invert each of the steps giving rise to this value, sampling {\em backwards} in order to assign values to random choices such that we get a likely parse of the intermediate value. We propose this algorithm for importance sampling and as a means of constructing blocked proposals for a Metropolis-Hastings sampler.

A Dynamic Programming Algorithm for Inference in Recursive Probabilistic Programs
abstract
bibtex

Andreas Stuhlmüller and Noah D. Goodman (2012)

*Second Statistical Relational AI workshop at UAI 2012 (StaRAI-12)*

We describe a dynamic programming algorithm for computing the marginal distribution of discrete probabilistic programs. This algorithm takes a functional interpreter for an arbitrary probabilistic programming language and turns it into an efficient marginalizer. Because direct caching of sub-distributions is impossible in the presence of recursion, we build a graph of dependencies between sub-distributions. This factored sum-product network makes (potentially cyclic) dependencies between subproblems explicit, and corresponds to a system of equations for the marginal distribution. We solve these equations by fixed-point iteration in topological order. We illustrate this algorithm on examples used in teaching probabilistic models, computational cognitive science research, and game theory.

Knowledge and implicature: Modeling language understanding as social cognition
abstract
bibtex

Noah D. Goodman and Andreas Stuhlmüller (2012)

*Proceedings of the Thirty-Fourth Annual Conference of the Cognitive Science Society*

**2012 Cognitive Science Society computational modeling prize for Language**

Is language understanding a special case of social cognition? To help evaluate this view, we can formalize it as the *rational speech-act* theory: listeners assume that speakers choose their utterances approximately optimally, and listeners interpret an utterance by using Bayesian inference to "invert" this model of the speaker. We apply this framework to model scalar implicature ("some" implies "not all", and "N" implies "not more than N"). This model predicts an interaction between the speaker's knowledge state and the listener's interpretation. We test these predictions in two experiments, and find good fit between model predictions and human judgements.

Why blame Bob? Probabilistic generative models, counterfactual reasoning, and blame attribution
abstract
bibtex

John McCoy*, Tomer Ullman*, Andreas Stuhlmüller, Tobias Gerstenberg, and Joshua B. Tenenbaum (2012)

*Proceedings of the Thirty-Fourth Annual Conference of the Cognitive Science Society*

We consider an approach to blame attribution based on counterfactual reasoning in probabilistic generative models. In this view, people intervene on each variable within their model and assign blame in proportion to how much a change to a variable would have improved the outcome. This approach raises two questions: First, what structure do people use to represent a given situation? Second, how do they choose what alternatives to consider when intervening on an event? We use a series of coin-tossing scenarios to compare empirical data to different models within the proposed framework. The results suggest that people sample their intervention values from a prior rather than deterministically switching the value of a variable. The results further suggest that people represent scenarios differently when asked to reason about their own blame attributions, compared with the blame attributions they believe others will assign.

Inducing Probabilistic Programs by Bayesian Program Merging
abstract
bibtex

Irvin Hwang, Andreas Stuhlmüller, and Noah D. Goodman (2011)

*Technical Report. arXiv:1110.5667 [cs.AI]*

This report outlines an approach to learning generative models from data. We express models as probabilistic programs, which allows us to capture abstract patterns within the examples. By choosing our language for programs to be an extension of the algebraic data type of the examples, we can begin with a program that generates all and only the examples. We then introduce greater abstraction, and hence generalization, incrementally to the extent that it improves the posterior probability of the examples given the program. Motivated by previous approaches to model merging and program induction, we search for such explanatory abstractions using program transformations. We consider two types of transformation: Abstraction merges common subexpressions within a program into new functions (a form of anti-unification). Deargumentation simplifies functions by reducing the number of arguments. We demonstrate that this approach finds key patterns in the domain of nested lists, including parameterized sub-functions and stochastic recursion.

Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
abstract
bibtex

David Wingate, Noah D. Goodman, Andreas Stuhlmüller, and Jeffrey M. Siskind (2011)

*Advances in Neural Information Processing Systems (NIPS) 24*

Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how \emph{nonstandard interpretations} of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals.

Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation
abstract
bibtex

David Wingate, Andreas Stuhlmüller, and Noah D. Goodman (2011)

*Proceedings of the 14th international conference on Artificial Intelligence and Statistics*

We describe a general method of transforming arbitrary programming languages into probabilistic programming languages with straightforward MCMC inference engines. Random choices in the program are "named" with information about their position in an execution trace; these names are used in conjunction with a database holding values of random variables to implement MCMC inference in the space of execution traces. We encode naming information using lightweight source-to-source compilers. Our method enables us to reuse existing infrastructure (compilers, profilers, etc.) with minimal additional code, implying fast models with low development overhead. We illustrate the technique on two languages, one functional and one imperative: Bher, a compiled version of the Church language which eliminates interpretive overhead of the original MIT-Church implementation, and Stochastic Matlab, a new open-source language.

Learning Structured Generative Concepts
abstract
bibtex

Andreas Stuhlmüller, Joshua B. Tenenbaum, and Noah D. Goodman (2010)

*Proceedings of the 32nd Annual Conference of the Cognitive Science Society*

Many real world concepts, such as "car", "house", and "tree", are more than simply a collection of features. These objects are richly structured, defined in terms of systems of relations, subparts, and recursive embeddings. We describe an approach to concept representation and learning that attempts to capture such structured objects. This approach builds on recent probabilistic approaches, viewing concepts as generative processes, and on recent rule-based approaches, constructing concepts inductively from a language of thought. Concepts are modeled as probabilistic programs that describe generative processes; these programs are described in a compositional language. In an exploratory concept learning experiment, we investigate human learning from sets of tree-like objects generated by processes that vary in their abstract structure, from simple prototypes to complex recursions. We compare human categorization judgements to predictions of the true generative process as well as a variety of exemplar-based heuristics.

Learning Compositional Representations
abstract
bibtex

Andreas Stuhlmüller (2009)

*Bachelor's thesis at the University of Osnabrück*

How could an agent learn and reason in a complexly structured, stochastic world? This problem is at the center of both artificial intelligence and psychology. One candidate answer to this question is that both learning and reasoning can be explained as probabilistic inference over a language-like hypothesis space for generative models. The goal of this thesis is to describe what makes this approach plausible and to demonstrate the learning of generative models from structured observations in a simple world consisting of stochastic, tree-generating programs. In order to make program inference feasible, I derive a new class of adaptive importance sampling algorithms for probabilistic programs that let us compute the likelihood of structured observations under a given generative model. Using this algorithm in combination with Markov Chain Monte Carlo methods, I quantitatively show that we can reliably infer generative models from observations in our demonstration world.

## Projects

I am one of the authors of WebPPL, a probabilistic programming language for the web.

I run Forest, a repository for generative models.

I contributed to MIT-Church, the original implementation of the Church probabilistic programming language, and to Bher, a lightweight compiler for Church.

I wrote Cosh, a Church implementation based on exact inference with aggressive sharing of subcomputations.

## Interests

### Probabilistic Programming Languages

Probabilistic programs express generative models, i.e., formal descriptions of processes that generate data. To the extent that we can mirror mechanisms “out there in the world” in a formal language, we can pose queries within this language and expect the answer to line up with what happens in the real world. For example, if we can accurately describe how genes give rise to proteins, this helps us in predicting which genetic changes lead to desirable results and which lead to harm.

Traditionally, Bayesian networks have been used to formalize and reason about such processes. While extremely useful, this formalism is limited in its expressive power compared to programming languages in general. Probabilistic programming languages like Church provide a substrate for computational modeling that brings together Bayesian statistics and Turing-complete programming languages.

Get started with the Probabilistic Models of Cognition tutorial.

### Universal Inference Algorithms

A useful system for probabilistic programming needs to provide more than the ability to formally describe mechanisms: It must answer questions about such mechanisms, i.e., infer the consequences of conditioning these mechanisms on propositions of interest. Using Bayesian terminology, a probabilistic program specifies a prior distribution over return values. When we observe a value for a particular variable within the program, the goal of inference is to compute the posterior distribution on the unobserved variables. For example, given a program expressing a prior over diseases and symptoms, we may condition this program on the values of observed symptoms and compute the posterior distribution on diseases.

The problem of efficiently computing the posterior distribution provides a rich set of research directions: How do existing Bayes net inference algorithms like Metropolis-Hastings, importance sampling, and belief propagation generalize to the setting of probabilistic programs? How can we improve upon existing algorithms by making use of additional structure exposed by probabilistic programs? How can compilation techniques like flow analysis and abstract interpretation help? How can the task of inference in a particular program be formalized itself and treated as a meta-inference problem?

Read The Design and Implementation of Probabilistic Programming Languages and learn how to build a probabilistic programming system with priority-based enumeration, particle filtering, and MCMC inference algorithms.

### Structure of Concepts

What is the structure of thought? The language of thought hypothesis proposes that thoughts are expressions in a mental language, and that thinking consists of syntactic manipulation of these expressions. The probabilistic language of thought hypothesis suggests that these representations have probabilistic meaning. If we treat concepts as stochastic programs and thinking as approximate Bayesian inference, can we predict human behavior in experimental settings better than using other approaches, e.g., prototype theory? If we learn more about the primitives and composition mechanisms used in the brain to represent abstract concepts, can this inform how we construct tools for reasoning?

Read my CogSci 2010 paper with Josh Tenenbaum and Noah Goodman on concept learning experiments and modeling.

### Decision Theory

When we attempt to solve any particular problem, we do so because we believe that, overall, this is what we should be doing; we believe that, taking everything we know into account, we *prefer* to solve this problem over doing anything else. This judgement can be mistaken: We can make errors in our reasoning, fail at introspection, and forget to consider consequences. Using our intuitions, we attempt to solve a particular decision problem—the problem that defines the correct answer to the question *what should be done?*—and our solution is only approximately correct.

In the long run, we want to improve at solving *this* decision problem. Autonomous intelligent systems, if ever built, need to contribute to solving this decision problem if the outcome is to be of moral worth. This seems to require that we improve upon our intuitive understanding of what this decision problem is. Currently, the most tractable approach to this seemingly impossible problem appears to be the study of decision theory and of related notions such as choice, control, and consequentialism.

## Elsewhere

github.com/stuhlmueller

Open source projects in JavaScript, Python and Scheme.

facebook.com/stuhlmueller

Shared photos and location updates.

scholar.google.com/citations?user=-PR7-K0AAAAJ

Publications as indexed by Google Scholar.