Mental Model

We have a bunch of events (say, Burglary, Alarm, and John Calling). We’ve run experiments and noted the probability of each event occurring.
This is great: we now have a way to tell how likely, say, the Alarm is to ring on any random day.

However, wouldn’t it be much more informative to compute the probability of the Alarm ringing given the current observation of, say, John’s phone call? Wouldn’t the calculated probability

“The Alarm is this likely to ring now that John just called”

be way more tailored to our current reality than a generic

“The Alarm is generally this likely to ring”?

Yup, it would.

This tailored probability assumes that the probability of the Alarm ringing is affected by John calling. That’s why we should stop thinking of the events as isolated entities and start thinking of them as ones that can affect the probability of one another.

We do this by building a network that shows, in a cause-and-effect way, how events (nodes) affect one another.

We start with the assumption that all nodes affect each other, and remove the connections between ones that hardly do.

This is a Bayesian Network.

What if we don't take the time to remove connections?

We’ll end up with redundant connections. The network will still be usable, but it’ll store and compute more probabilities than it needs to.
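To get a feel for the cost of redundant connections, here is a minimal sketch that counts how many probabilities have to be stored for the five-node alarm network this note goes on to build (B, E, A, J, M). It assumes every node is binary, and the node ordering used for the fully connected version is my own choice for illustration.

```python
# A binary node with k parents needs 2**k stored probabilities:
# one P(node occurs | parent combination) per combination of parents.
def stored_probabilities(parents):
    return sum(2 ** len(p) for p in parents.values())

# Pruned structure (each node -> its parents), as built later in this note.
sparse = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

# Unpruned alternative: every node depends on all nodes before it
# in the (assumed) ordering B, E, A, J, M.
order = ["B", "E", "A", "J", "M"]
dense = {node: order[:i] for i, node in enumerate(order)}

print(stored_probabilities(sparse))  # 10
print(stored_probabilities(dense))   # 31
```

The gap grows quickly with the number of nodes, since each extra parent doubles the size of a node's table.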

Building the Network

The network is composed of three things: the nodes (the events), the connections between them (the arrows), and how strong each connection is (the CPTs).

The Nodes

The nodes in this example are:

Burglary happens
Earthquake occurs
Alarm rings
John calls
Mary calls

This is the network with every node added.
The nodes in this example all have 2 possible observations: they can either occur or not occur.
They’re all binary.

NOTATION ALERT

For a given binary node (e.g., A), I write its +ve observation (occurred) as its lowercase version (e.g., $a$), and just add a "$\neg$" for its -ve observation (e.g., $\neg a$).

Define Connections

We ran experiments and figured out the following connections for each node:

B
- causes: A
E
- causes: A
A
- causes: J, M
J
- causes: nothing (yet)
M
- causes: nothing (yet)

This is the network with every connection added.
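The connection list above maps directly onto a small data structure. This is only a sketch of one possible representation; the names `children` and `parents_of` are made up for illustration.

```python
# Each node maps to the nodes it directly causes (its children).
children = {
    "B": ["A"],
    "E": ["A"],
    "A": ["J", "M"],
    "J": [],
    "M": [],
}

def parents_of(children):
    """Invert the child lists to get each node's parents."""
    parents = {node: [] for node in children}
    for node, kids in children.items():
        for kid in kids:
            parents[kid].append(node)
    return parents

parents = parents_of(children)
print(parents["A"])  # ['B', 'E']
print(parents["J"])  # ['A']
```

Keeping the parent lists around is handy because each node's CPT is indexed by its parents' observations.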

Measure Connections

“How much is each observation in node A caused by its parent observations?”

This is what the CPT of node A tells us.

EXAMPLE

The CPT of A contains the probability of its observations (a, ¬a) given all the possible combinations of its parents’ observations.

To make the diagram more compact

I only include entries for the +ve observation of a node (e.g., $a$). This is okay since each node has only 2 possible observations (e.g., $a$ or $\neg a$), so its -ve observation is its complement:
$P(\neg a) = 1 - P(a)$
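As a rough sketch, the CPT of A can be stored with only the +ve entries, and the -ve ones derived on demand. The numbers below are illustrative values I'm assuming for the example; they are not read off this note's diagrams. The function name `p_alarm` is also hypothetical.

```python
# P(a | B, E): one stored entry per combination of parent observations.
# Keys are (burglary occurred, earthquake occurred).
# The values are assumed, illustrative numbers.
cpt_a = {
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}

def p_alarm(a, b, e):
    """Return P(A = a | B = b, E = e), using the complement for the -ve case."""
    p_true = cpt_a[(b, e)]
    return p_true if a else 1.0 - p_true

print(p_alarm(True, True, False))             # 0.94
print(round(p_alarm(False, False, True), 3))  # 0.71
```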

To fill out each CPT, we use methods like Maximum Likelihood Estimation, which estimates each entry from how often the observation occurred in the recorded data.
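For binary nodes with fully recorded data, Maximum Likelihood Estimation boils down to counting. The sketch below estimates the CPT of A from a handful of records; the records are made up purely for illustration, and `estimate_cpt` is a hypothetical helper name.

```python
from collections import Counter

def estimate_cpt(samples):
    """Estimate P(a | b, e) by counting, from (b, e, a) boolean triples."""
    totals = Counter()  # how often each (b, e) combination was seen
    alarms = Counter()  # how often the alarm rang for that combination
    for b, e, a in samples:
        totals[(b, e)] += 1
        if a:
            alarms[(b, e)] += 1
    return {combo: alarms[combo] / n for combo, n in totals.items()}

# Hypothetical records of (burglary, earthquake, alarm).
samples = [
    (True, False, True),
    (True, False, True),
    (True, False, False),
    (False, False, False),
    (False, False, False),
    (False, True, True),
]
cpt = estimate_cpt(samples)
print(cpt[(False, False)])  # 0.0
print(cpt[(False, True)])   # 1.0
```

Note that parent combinations that never show up in the data (here, burglary and earthquake together) get no entry at all, and tiny samples give extreme estimates like 0.0 and 1.0, which is why real data sets need to be much larger.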

This is the network with every CPT added.

Using the Network

Now that the network is built, how is it used?

Remember that we mainly want it to compute the probability of the rest of the nodes, given a set of observations of some nodes.

This is called inference.

Predictive inference occurs when we predict the probability of a child observation occurring, given that a parent one has occurred.
Diagnostic inference occurs when we evaluate the probability of a parent observation being the one that caused a child one.
Predictive inference is done before the diagnostic one.

Inferring a node usually requires inferring its parent/children nodes first. This is repeated until we reach nodes that were observed.

We basically move from the node that we want to infer towards observed nodes.
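Before looking at how inference moves through the network step by step, it can help to see the brute-force version: list every possible combination of observations, keep the ones that agree with the evidence, and compare their probabilities. This is a minimal sketch of that idea (inference by enumeration), not the node-by-node procedure the diagrams describe. All CPT numbers are assumed, illustrative values, and the names `joint` and `query` are hypothetical.

```python
from itertools import product

# Assumed, illustrative CPT values (not taken from this note's diagrams).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(a | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(j | A)
P_M = {True: 0.70, False: 0.01}                      # P(m | A)

NODES = ["B", "E", "A", "J", "M"]

def pick(p_true, occurred):
    """P(observation), given the probability of the +ve observation."""
    return p_true if occurred else 1.0 - p_true

def joint(w):
    """P(B, E, A, J, M): the product of one CPT entry per node."""
    return (pick(P_B, w["B"]) * pick(P_E, w["E"])
            * pick(P_A[(w["B"], w["E"])], w["A"])
            * pick(P_J[w["A"]], w["J"])
            * pick(P_M[w["A"]], w["M"]))

def query(target, evidence):
    """P(target occurs | evidence), summing the joint over all combinations."""
    matching = wanted = 0.0
    for values in product([True, False], repeat=len(NODES)):
        w = dict(zip(NODES, values))
        if any(w[name] != value for name, value in evidence.items()):
            continue  # this combination contradicts the evidence
        p = joint(w)
        matching += p
        if w[target]:
            wanted += p
    return wanted / matching

print(query("A", {}))                      # the "general" P(a)
print(query("A", {"J": True}))             # P(a | j), higher than P(a)
print(query("B", {"J": True, "M": True}))  # P(b | j, m)
```

This works for a network of five nodes, but the number of combinations doubles with every added node, which is why larger networks rely on passing inferences between neighbouring nodes instead.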

How Predictive Inference is done

One Parent

Assume that R1 is a far ancestor of B and it was observed.
This is how $P(a \mid r_1)$ is computed.
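As a rough sketch of the computation, assuming R1 influences A only through B and that E is unaffected by R1, the evidence is first pushed down to B and then combined with A's CPT:

$$P(a \mid r_1) = \sum_{x \in \{b, \neg b\}} \sum_{y \in \{e, \neg e\}} P(a \mid x, y)\, P(x \mid r_1)\, P(y)$$

Here $P(x \mid r_1)$ is itself obtained the same way, one link at a time, along the chain from R1 down to B.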

Multiple Parents

Assume that R1 and R2 are far ancestors of B and E, respectively, and they were both observed.
This is how $P(a \mid r_1, r_2)$ is computed.
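A sketch of the formula, assuming R1 reaches A only through B, R2 reaches A only through E, and the two branches share no other connection:

$$P(a \mid r_1, r_2) = \sum_{x \in \{b, \neg b\}} \sum_{y \in \{e, \neg e\}} P(a \mid x, y)\, P(x \mid r_1)\, P(y \mid r_2)$$

Each parent is first updated by its own ancestor's observation, and A's CPT then weighs every combination of the two.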

How Diagnostic Inference is done

One child

Assume that L1 is a far descendant of J, and it was observed.
This is how $P(a \mid l_1)$ is computed.
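Diagnostic inference runs against the arrows, so it leans on Bayes' rule. A sketch, assuming L1 depends on A only through J:

$$P(a \mid l_1) = \frac{P(l_1 \mid a)\, P(a)}{P(l_1 \mid a)\, P(a) + P(l_1 \mid \neg a)\, P(\neg a)}, \qquad P(l_1 \mid a) = \sum_{x \in \{j, \neg j\}} P(l_1 \mid x)\, P(x \mid a)$$

$P(l_1 \mid \neg a)$ is computed the same way with $\neg a$ in place of $a$, and $P(l_1 \mid x)$ is built up one link at a time along the chain from J down to L1.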

Multiple children

Assume that L1 and L2 are far descendants of J and M, respectively, and they were both observed.
This is how $P(a \mid l_1, l_2)$ is computed.
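A sketch of the formula, assuming L1 depends on A only through J and L2 only through M, so that the two observations are independent of each other once A is known:

$$P(a \mid l_1, l_2) = \frac{P(l_1 \mid a)\, P(l_2 \mid a)\, P(a)}{\sum_{z \in \{a, \neg a\}} P(l_1 \mid z)\, P(l_2 \mid z)\, P(z)}$$

Each child branch contributes its own likelihood, and the denominator just rescales the result so that $P(a \mid l_1, l_2)$ and $P(\neg a \mid l_1, l_2)$ add up to 1.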

In the diagram below, each node outputs its inference given the current set of observations.

Before any specific observation combination is given, each node should output its inference given no evidence, i.e., the “general” probability of its possible observations.
(e.g., Node A outputs $P(a)$ and $P(\neg a)$)
However, once an observation combination is given, a node’s inference may change.
It only changes if the observation has an “active” path to it.

To find what nodes are affected by each observation, we need another diagram that shows us what active paths are there for every observed node.

How to create an Active Paths diagram

I like to imagine every observed node as an information source that tries its best to reach as many nodes as possible, but its movement must follow these rules:
(suppose that A is the observed node whose active paths we’re trying to draw)

Information can’t flow from a parent to another if neither their shared child nor any of its descendants is observed.
Information can’t flow from a sibling to another if their shared parent is observed.
Information can’t cross another observed node that sits in the middle of a chain (between its own parent and its own child).

Examples

(Example #1)
Say node B was observed. This will be the diagram, showing the active paths of B.

It affects every single node except its spouse (E, the other parent of A).

Where did the CPTs go?

I draw the very final diagram this way, since adding the CPTs would add too much visual clutter.

(Example #2)
Say node M was observed. This will be its active path.

It affects every single node; nothing blocks its path.

(Example #3)
Say both B and M were observed. These will be their active paths.

(Example #4)
Here’s an interesting one: say nodes A and M were observed. These will be their active paths.

The information stream of M can’t cross A (to B or E) or move to its sibling, J.

After-Thoughts

I’ve spent hours upon hours, days upon days, working on this. It’s exhausting.

Making a visualization that makes sense and captures what a Bayesian network is and how it operates was extremely challenging, and Asser, I don’t like how u pushed through a burnout and spent that much time completing it.

But I’m still glad u valued your thoughts that much =)
