Thursday, September 25, 2008

Visualizing the Truth

Here's an idea I submitted to Google's 10 to the 100th Project:

8. Your idea's name (maximum 50 characters)
Visualizing the Truth

9. Please select a category that best describes your idea.
Everything else

10. What one sentence best describes your idea? (maximum 150 characters)
To improve decision making, we store knowledge as a massive Bayesian belief network, display it intuitively, and enable extensive what-if simulations.

11. Describe your idea in more depth. (maximum 300 words)
Our brains are amazing decision making devices.  For most problems involving a few variables, we can mentally simulate various outcomes before deciding, often with great results.  For larger problems we can also rely on instincts/emotions (i.e. pre-computed solutions based on lots of experience).  However, for the most complex issues, especially those critical decisions faced by government leaders, our human brains are not able to accumulate enough data or foresee enough outcomes.  The number of possibilities is too great.  (For example, banning DDT to save endangered birds seems reasonable, but then third-world farmers produce less food and many people die of hunger.  Ideally, we could predict such long-term consequences from the start.)

My idea is to augment human decision making with Bayesian networks in a way that scales with the exponential growth of computer hardware.  Bayesian networks enable us to represent knowledge intuitively as "beliefs about the world."  Arguably, they function similarly to the brain's neocortex.  Running on a large computer cluster, a massive Bayesian network could represent a repository of our society's knowledge.  Unlike text-based systems, the belief representation would be ideally suited for decision making.  This wiki-style belief network would be totally open to modifications by the global community (with abuse prevention) via a variety of input methods, including mobile devices.  Users could add variables/nodes, modify connections/probabilistic relationships or utility values, etc. based on their own experience.  (Software agents could offload some of this burden, e.g., by suggesting new connections based on inferred relationships.)  Crucially, the belief network could be viewed in visually beautiful ways (e.g., see IBM's Many Eyes project).  Users could perform extensive what-if scenarios.  The result would be a concise summary of our collective beliefs and a substrate for meaningful simulations.
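As a rough sketch of what a what-if query could look like, here is the DDT example from above as a tiny three-node network, answered by brute-force enumeration. The variable names, the network structure, and every probability are made-up illustration values (not data), and a real system would use a proper inference library rather than enumerating every possible world.

```python
from itertools import product

# Hypothetical toy network: ban -> crop_loss -> famine.
# All numbers below are invented for illustration only.
P_BAN = {True: 0.5, False: 0.5}
P_CROP_LOSS = {            # P(crop_loss | ban)
    True: {True: 0.6, False: 0.4},
    False: {True: 0.1, False: 0.9},
}
P_FAMINE = {               # P(famine | crop_loss)
    True: {True: 0.3, False: 0.7},
    False: {True: 0.05, False: 0.95},
}
NAMES = ("ban", "crop_loss", "famine")


def joint(ban, crop_loss, famine):
    """Joint probability of one complete assignment (chain rule)."""
    return P_BAN[ban] * P_CROP_LOSS[ban][crop_loss] * P_FAMINE[crop_loss][famine]


def query(target, evidence):
    """P(target=True | evidence), summing over every consistent world."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this world contradicts the evidence
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print("P(famine | ban)    =", query("famine", {"ban": True}))
    print("P(famine | no ban) =", query("famine", {"ban": False}))
    print("P(ban | famine)    =", query("ban", {"famine": True}))
```

The same function answers both the predictive question (what happens if we ban?) and the diagnostic one (given a famine, how likely is it that a ban was in place?), which is the property that makes the belief representation attractive for decision making.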

12. What problem or issue does your idea address? (maximum 150 words)
The problem is that hugely important decisions are made by fundamentally limited decision-making machines (i.e., human brains).  My idea provides a way for us to avoid the persuasive fallacies to which our brains are so susceptible and which have greatly hindered progress on large-scale decisions.  For example, as presidential candidates propose new policies during their campaigns, citizens could immediately run the ideas through the belief network and see the expected outcomes.  It would provide a simple, automatic way for the general public to cut through the BS.

13. If your idea were to become a reality, who would benefit the most and how? (maximum 150 words)
In general, this technology could be used by anyone making difficult decisions.  However, government policy makers are one of the most important target users.  Members of Congress could spend their time running highly informative what-if scenarios more quickly and effectively than via spoken language.  They could see immediate answers to questions like, "If we enforce this new policy, how does that affect key economic and environmental variables?  How does it affect more distant variables, like diplomatic relationships?  What are the expected probabilities of these outcomes?  How should we make this decision in order to maximize expected utility given conflicting goals?"  ("Utility" values could be determined by voting.)  Imagine permanent client installations as centerpieces of congressional meeting places.  Thus, the beneficiaries would include all citizens of any country whose government decides to utilize this technology.
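To make "maximize expected utility given conflicting goals" concrete, here is a minimal sketch. It assumes the network has already reported outcome probabilities for each policy and that voters have assigned a utility to each outcome; the policy names, outcome names, probabilities, and utilities are all hypothetical, and treating total utility as a simple sum over outcomes is a simplifying assumption.

```python
# Hypothetical P(outcome happens | policy), as the belief network might report.
OUTCOME_PROBS = {
    "enforce": {"economy_grows": 0.4, "emissions_fall": 0.8},
    "do_nothing": {"economy_grows": 0.6, "emissions_fall": 0.2},
}

# Hypothetical utilities, e.g. the average of public votes on each goal.
UTILITY = {"economy_grows": 10.0, "emissions_fall": 8.0}


def expected_utility(policy):
    """Sum of probability * utility over outcomes (additive utility assumed)."""
    return sum(p * UTILITY[outcome] for outcome, p in OUTCOME_PROBS[policy].items())


def best_policy():
    """The policy with the highest expected utility."""
    return max(OUTCOME_PROBS, key=expected_utility)


if __name__ == "__main__":
    for policy in OUTCOME_PROBS:
        print(policy, expected_utility(policy))
    print("best:", best_policy())
```

With these invented numbers the two goals pull in opposite directions, and the voted utilities are what settle the trade-off.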

14. What are the initial steps required to get this idea off the ground? (maximum 150 words)
We would need to design, implement, and test a prototype.  This involves connecting a scalable Bayesian network software library (freely available) with a website displaying the belief network in an attractive manner.  The Bayesian network could be hosted on a scalable compute platform, like Amazon's EC2.  There should be a dead-simple (fun!) way to input new data via the website and mobile clients.  Similarly, users should be able to perform simulations by making (temporary, client-side only) changes to the network and watching the results propagate through the belief network.  Besides the basic technologies involved (Bayesian networks, hosting, and user input methods), the success of the project greatly depends on the visual attractiveness of the website and how much fun it is to use.  Thus, it is critical to involve skilled graphic designers and possibly video game developers in the implementation.
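One possible way to keep what-if changes temporary and client-side is to layer a user's overrides on top of the shared parameters without ever writing to them. The sketch below uses Python's standard `collections.ChainMap` for that; the parameter names and values are hypothetical, and a real client would rerun full inference rather than the single hand-written formula used here.

```python
from collections import ChainMap

# Hypothetical shared parameters, as they might arrive from the server.
SHARED = {
    "p_crop_loss_given_ban": 0.6,
    "p_famine_given_crop_loss": 0.3,
    "p_famine_given_no_crop_loss": 0.05,
}


def p_famine_given_ban(params):
    """Propagate through the chain ban -> crop_loss -> famine."""
    p_loss = params["p_crop_loss_given_ban"]
    return (p_loss * params["p_famine_given_crop_loss"]
            + (1 - p_loss) * params["p_famine_given_no_crop_loss"])


def what_if(shared, **overrides):
    """A read-through view: overrides win, the shared dict is never modified."""
    return ChainMap(dict(overrides), shared)


if __name__ == "__main__":
    scenario = what_if(SHARED, p_crop_loss_given_ban=0.2)
    print("shared belief:  ", p_famine_given_ban(SHARED))
    print("what-if belief: ", p_famine_given_ban(scenario))
```

Discarding the scenario object discards the experiment, so nothing a user tries in a simulation leaks back into the community's network.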

15. Describe the optimal outcome should your idea be selected and successfully implemented. How would you measure it? (maximum 150 words)
The ideal outcome is that when facing complicated decisions, the proposed system would change our default decision making behavior, similarly to how Google search has changed our default memory recall behavior.  The effect would be pervasive but undoubtedly very diffuse; measuring the effect would be difficult.  One simple way might be through a public poll, e.g., "When faced with difficult decisions, do you: A) think about it for a while, B) decide on instinct, C) get advice from friends, D) consult"  Ideally, using the system would be both fun (encouraging massive participation) and highly informative.  The result is that our most important decisions would be based on more than just a few mental simulations, more than even instincts or emotions, but on the collective knowledge of humanity.


Joseph said...

If it were to take off and become widely used, do you expect such a system would be capable of giving useful information about its own influence? E.g., how would it answer this question: If this system started answering that candidate A would perform task x better than candidate B, how would it influence the election?

The idea itself is pretty ambitious. My biggest question relates to the acquisition of data. Personally, I don't believe in the various "knowledge engineering" projects. But a Bayesian network doesn't really work with declarative facts anyway. What kind of information would people be contributing? "In my experience, situation X results in situation Y with confidence P... unless situation Z also happened." Coming up with a way to make it fun is a formidable challenge.

Johnnyburn said...

Great idea, Tyler. Post again when it's selected for the top 100. I'll vote for you. Unless there's a better idea. :)

Tyler Streeter said...

That kind of recursive effect would definitely be very dangerous... it would potentially nullify all results. So it would be crucial to normalize out that effect. It seems that there are methods to normalize for similar situations. You definitely don't want a positive feedback loop, like, "Tom tells Susan it's supposed to rain, then, based on that knowledge, Susan tells Tom it's supposed to rain, so Tom increases his belief that it will rain, then tells Susan he's now more sure it will rain, so she increases her belief that it will rain, then...", which is why Bayesian networks include a way to count the evidence only once. I'll bet it would be possible to include a node representing the system itself and create causal connections representing its effect on the other variables.
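Here is a small illustration of why counting the evidence only once matters, using the Tom and Susan example. It is not how a Bayesian network library does this internally; it just tags each report with a hypothetical source id and applies Bayes' rule in odds form, once per distinct source. The prior and the likelihood ratio are made-up numbers.

```python
def update(prior, likelihood_ratios):
    """Bayes' rule in odds form: posterior odds = prior odds * each ratio."""
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


def belief(prior, reports):
    """reports: (source_id, likelihood_ratio) pairs; each source counts once."""
    unique = {}
    for source_id, ratio in reports:
        unique.setdefault(source_id, ratio)  # ignore echoes of a known source
    return update(prior, unique.values())


if __name__ == "__main__":
    # The same forecast, echoed back and forth three times.
    echoes = [("forecast", 4.0), ("forecast", 4.0), ("forecast", 4.0)]
    print("naive, echoes recounted:", update(0.2, [r for _, r in echoes]))
    print("each source counted once:", belief(0.2, echoes))
```

With the echoes recounted, belief in rain climbs toward certainty even though only one forecast ever existed; with sources deduplicated it settles where a single forecast should leave it.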

Concerning data acquisition... OK, maybe "fun" is too strong a word, but it should be as painless as possible. Why would people want to provide data? I guess the closest example is Wikipedia. Although most people probably wouldn't say they're having "fun" editing Wikipedia pages, they're still motivated to do it. (Most people just like being right...) The system I'm proposing would need to tap that same motivation.

There might be islands of knowledge that are unconnected to the mainland because they don't relate to anything else. Their probability values might even be inaccurate because they're only modified by biased sources. But, just like Wikipedia, as soon as these islands start to affect other bodies of knowledge, more people care about the integrity of the system and start voting on the relevant links.

As far as the data input format goes... it's critical that things are presented in an intuitive way. Too many if-then branches and it gets confusing. It seems possible to use only variable nodes and causal links. People could add new variable nodes and new causal links and vote on the link weights (i.e. the values of the conditional probability tables relating two variable nodes). For example, when the network is visually displayed, someone would double-click a causal link, set the values of the two variable nodes in question, and cast a vote for the conditional probability value. (Or, on a mobile phone app, you search for a variable by name, see a list of its incoming and outgoing causal links, select a link, set the variable values, and vote for the probability.)
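A minimal sketch of the voting idea, assuming binary variables and assuming that votes on a link are simply averaged (other aggregation rules are certainly possible). The `CausalLink` class, its method names, and the 0.5 default for entries nobody has voted on are all hypothetical.

```python
from statistics import mean


class CausalLink:
    """Hypothetical link parent -> child whose table is set by user votes."""

    def __init__(self, parent, child):
        self.parent = parent
        self.child = child
        # parent value -> list of voted values for P(child=True | parent value)
        self.votes = {True: [], False: []}

    def vote(self, parent_value, probability):
        """Record one user's vote for P(child=True | parent=parent_value)."""
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        self.votes[parent_value].append(probability)

    def cpt(self, default=0.5):
        """Average the votes per entry; fall back to default if none exist."""
        return {
            parent_value: mean(ballots) if ballots else default
            for parent_value, ballots in self.votes.items()
        }


if __name__ == "__main__":
    link = CausalLink("ddt_ban", "crop_loss")
    link.vote(True, 0.6)   # one user's experience
    link.vote(True, 0.8)   # another user's experience
    print(link.cpt())
```

Double-clicking a link in the visual display, or selecting it in the mobile app, would amount to one call to `vote` in this sketch.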