Feb 2, 2026

Classifying Three Triangles with a Multi-Layer Perceptron

In the XOR post, a small multi-layer perceptron learned a function a single perceptron could not: XOR needs two boundaries, and stacking a hidden layer gave us those two boundaries.

That idea scales. If two lines can carve out the XOR region, then a handful of lines can carve out something much richer. Here we tackle a problem that looks hard but decomposes cleanly:

Given a point (x, y), output 1 if it lies inside any of three triangles, and 0 otherwise.

Three triangles on a Cartesian plane meeting at the origin

The three triangles all meet at the origin:

Right: (0,0), (4,0), (2,2)
Left: (0,0), (-4,0), (-2,2)
Bottom: (0,0), (-2,-2), (2,-2)

Every straight edge is a single linear decision — which side of a line a point is on — and that is exactly what one neuron computes.

The idea: edge → triangle → union

A perceptron draws one straight line and reports which side of it a point is on. A triangle is just the region where a point is on the inside of all three of its edges at once. And “inside any triangle” is the OR of the three triangle answers.
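Before any neurons are involved, the same edge → triangle → union logic can be written in plain Python. This is an illustrative sketch, not code from the post. The helper names `side`, `in_triangle` and `in_any` are hypothetical. For each edge, "inside" means the point is on the same side of that edge as the triangle's third vertex.

```python
# Hypothetical helpers illustrating edge -> triangle -> union in plain Python.
Point = tuple[float, float]

TRIANGLES: dict[str, tuple[Point, Point, Point]] = {
    "right": ((0, 0), (4, 0), (2, 2)),
    "left": ((0, 0), (-4, 0), (-2, 2)),
    "bottom": ((0, 0), (-2, -2), (2, -2)),
}


def side(p: Point, q: Point, pt: Point) -> float:
    """Signed cross product: its sign says which side of the line p->q the point pt is on."""
    return (q[0] - p[0]) * (pt[1] - p[1]) - (q[1] - p[1]) * (pt[0] - p[0])


def in_triangle(tri: tuple[Point, Point, Point], pt: Point) -> bool:
    a, b, c = tri
    # Each edge paired with its opposite vertex, which marks the inside.
    edges = [(a, b, c), (b, c, a), (c, a, b)]
    # AND: pt must share a side with the opposite vertex for all three edges
    # (a product of 0 means pt is on the line, which counts as inside, like ">= 0").
    return all(side(p, q, pt) * side(p, q, r) >= 0 for p, q, r in edges)


def in_any(pt: Point) -> bool:
    # OR over the three triangles.
    return any(in_triangle(tri, pt) for tri in TRIANGLES.values())


print([in_any(pt) for pt in [(2.0, 0.5), (-2.0, 1.0), (0.0, -1.0), (0.0, 1.0), (3.0, -1.0)]])
```

This prints `[True, True, True, False, False]`. The network below computes the same thing, with each `side` test replaced by a neuron.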

That gives us a three-layer decomposition, one layer per level of the logic:

Edge layer. One neuron per edge line. Each fires (1) when the point is on the inside of that line.
Triangle layer (3 neurons). Each computes the AND of its triangle’s three edges: inside iff all three edges say inside.
Output layer (1 neuron). The OR of the three triangle neurons.

Six lines, not nine

Three triangles with three edges each looks like nine edge neurons. But look again at where the triangles meet: they all touch the origin, and several edges lie on the same line.

The bottoms of the right and left triangles both sit on the x-axis, y = 0.
The right triangle’s left edge and the bottom triangle’s left edge are both the line x = y.
The left triangle’s right edge and the bottom triangle’s right edge are both the line x = -y.

Each of those pairs shares not just the line but also the side: both triangles that use it lie in the same half-plane. So one neuron can serve both. That collapses nine edges down to six distinct lines, and six edge neurons are all we need:

#	Line	Belongs to	Inside test
e0	`y = 0`	right + left bottoms	`y ≥ 0`
e1	`x + y = 4`	right upper edge	`-x - y + 4 ≥ 0`
e2	`x = y`	right left-edge + bottom left-edge	`x - y ≥ 0`
e3	`y = x + 4`	left upper edge	`x - y + 4 ≥ 0`
e4	`x = -y`	left right-edge + bottom right-edge	`-x - y ≥ 0`
e5	`y = -2`	bottom floor	`y + 2 ≥ 0`

The shared neurons simply fan out to two AND gates instead of one, which is exactly what the hero diagram at the top shows. Nothing here needs a fancy optimiser — every weight can be written down by hand, because AND and OR are themselves linearly separable and each edge is just a line.
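The reduction from nine edges to six can also be checked mechanically. This is a minimal sketch that assumes integer vertices. The helpers `inward_edge` and `distinct_edges` are hypothetical. For each edge they build the line a·x + b·y + c = 0 through the two endpoints, flip its sign so the triangle's third vertex lands on the positive side, and divide out the common factor. As a result, two edges collapse to one key only when they share both the line and the inside half.

```python
from math import gcd

# Triangle vertices from the post.
TRIANGLES = {
    "right": [(0, 0), (4, 0), (2, 2)],
    "left": [(0, 0), (-4, 0), (-2, 2)],
    "bottom": [(0, 0), (-2, -2), (2, -2)],
}


def inward_edge(p, q, r):
    """Integer (a, b, c) for the line through p and q, signed so vertex r gives a positive value."""
    a = p[1] - q[1]
    b = q[0] - p[0]
    c = -(a * p[0] + b * p[1])
    if a * r[0] + b * r[1] + c < 0:  # make the inside positive
        a, b, c = -a, -b, -c
    g = gcd(a, gcd(b, c))  # remove common factors so equal half-planes compare equal
    return (a // g, b // g, c // g)


def distinct_edges(triangles):
    owners = {}  # canonical half-plane -> names of the triangles that use it
    for name, (v0, v1, v2) in triangles.items():
        for p, q, r in [(v0, v1, v2), (v1, v2, v0), (v2, v0, v1)]:
            owners.setdefault(inward_edge(p, q, r), []).append(name)
    return owners


edges = distinct_edges(TRIANGLES)
for (a, b, c), names in edges.items():
    print(f"{a:+d}*x {b:+d}*y {c:+d} >= 0   used by {names}")
print(len(edges), "distinct half-planes")
```

For these three triangles it reports six half-planes, in the same order as e0..e5 in the table. Three of them are used twice: y ≥ 0 by right and left, x - y ≥ 0 by right and bottom, and -x - y ≥ 0 by left and bottom.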

Turning an edge into a neuron

Take the right triangle’s bottom edge, from (0,0) to (4,0). That is the line y = 0, and “inside” is above it, so the test is:

y \ge 0

A neuron computes w · [x, y] + b and passes it through an activation. If we use a step activation (1 when the sum is ≥ 0, else 0), then weights w = [0, 1] and bias b = 0 give exactly y ≥ 0. Fires above the x-axis, silent below.
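Here is neuron e0 on its own, as a small sketch that uses the post's step convention. It is tested on three points: one above the x-axis, one exactly on it, and one below it.

```python
import numpy as np


def step(z: np.ndarray) -> np.ndarray:
    # 1 where the weighted sum is >= 0, else 0 (same convention as the post)
    return (z >= 0).astype(np.float64)


w = np.array([0.0, 1.0])  # e0 ignores x and looks only at y
b = 0.0

points = np.array([[3.0, 1.0], [-1.0, 0.0], [2.0, -0.5]])
out = step(points @ w + b)
print(out)  # [1. 1. 0.]
```

The point on the axis fires because the test is ≥ 0, so points lying exactly on an edge count as inside.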

That is neuron e0. Every other row of the six-line table is derived the same way: write the edge test as a·x + b·y + c ≥ 0 with the inside made positive, then read off the weights w = [a, b] and the bias c. Stacked together, the six neurons form the weight matrix w1 (shape 2 × 6) and the bias row b1 (shape 1 × 6).
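The read-off can be done in code. In this sketch, `EDGE_TESTS` is a hypothetical name for the table's inside tests written as (a, b, c) triples. Slicing and transposing them gives arrays with the same values and shapes as `w1` and `b1` in the NumPy listing.

```python
import numpy as np

# (a, b, c) for each edge test a*x + b*y + c >= 0, copied from the six-line table (e0..e5).
EDGE_TESTS = [
    (0, 1, 0),    # e0: y >= 0
    (-1, -1, 4),  # e1: -x - y + 4 >= 0
    (1, -1, 0),   # e2: x - y >= 0
    (1, -1, 4),   # e3: x - y + 4 >= 0
    (-1, -1, 0),  # e4: -x - y >= 0
    (0, 1, 2),    # e5: y + 2 >= 0
]

coeffs = np.array(EDGE_TESTS, dtype=np.float64)  # shape (6, 3): one row per edge neuron
w1 = coeffs[:, :2].T                             # (2, 6): row 0 = x weights, row 1 = y weights
b1 = coeffs[:, 2][np.newaxis, :]                 # (1, 6): one bias per edge neuron

print(w1)
print(b1)
```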

AND and OR as neurons

Once the edge layer has produced its six 0/1 values, the rest is pure boolean logic, and AND and OR each fit in a single neuron. The key fact we lean on: those six values are exactly 0 or 1, never anything in between, because they come out of a step activation. That lets every bias be a plain integer: we just count how many inputs must be on.

AND of n inputs. Give each input weight 1, so the sum equals the number of inputs that are on. It should fire only when all n are on, i.e. when the sum reaches n. So subtract n:

h_1 + h_2 + h_3 - 3 \ge 0

All three on gives 3 - 3 = 0, which fires. Drop any input to 0 and the sum is 2 or less, so sum - 3 is negative and the neuron stays silent. For a 3-input AND the bias is simply -n = -3.
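The full 3-input truth table makes this concrete. In this short sketch, `and_neuron` is a hypothetical name for a single step neuron with weight 1 per input and bias -n.

```python
from itertools import product


def and_neuron(inputs: tuple[int, ...]) -> int:
    # weight 1 per input, bias -n, step activation: fires only when the sum reaches n
    return int(sum(inputs) - len(inputs) >= 0)


for h in product([0, 1], repeat=3):
    print(h, "->", and_neuron(h))
```

Only the last row, `(1, 1, 1)`, prints 1.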

OR of n inputs. Same weights, but now at least one input on is enough — the sum only has to reach 1. So subtract 1:

t_1 + t_2 + t_3 - 1 \ge 0

Any single 1 gives 1 - 1 = 0 and fires; all zeros give -1 and stay silent. The bias is -1, regardless of how many inputs there are.
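The "regardless of how many inputs" claim is easy to check exhaustively for small n. This sketch compares a hypothetical `or_neuron` (weight 1 per input, bias -1) with Python's built-in `any` on every 0/1 pattern of 1 to 6 inputs.

```python
from itertools import product


def or_neuron(inputs: tuple[int, ...]) -> int:
    # weight 1 per input, bias -1 no matter how many inputs there are
    return int(sum(inputs) - 1 >= 0)


# Compare against any() for every 0/1 input pattern, for 1 to 6 inputs.
for n in range(1, 7):
    assert all(or_neuron(t) == int(any(t)) for t in product([0, 1], repeat=n))
print("bias -1 matches OR for n = 1..6")
```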

No fractional thresholds like 2.5 or 0.5 are needed. Those only show up if you want a boundary that sits between two integers — useful when inputs can be any real number, but pointless here, where the inputs are already clean 0s and 1s.
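To see where a fractional threshold would matter, this sketch feeds a 3-input AND inputs that are not clean 0s and 1s. The "soft" values are made up for illustration; think of them as outputs from a smooth activation. The integer bias -3 rejects them even though all three are nearly on. The fractional bias -2.5 accepts them and still rejects a pattern with one input off.

```python
def fires(inputs: list[float], bias: float) -> int:
    # one neuron: weight 1 per input, step activation at >= 0
    return int(sum(inputs) + bias >= 0)


clean = [1.0, 1.0, 1.0]  # step outputs: exactly 0 or 1
soft = [0.9, 0.95, 0.9]  # hypothetical "almost on" values

print(fires(clean, -3.0), fires(soft, -3.0))            # integer bias: 1 0
print(fires(soft, -2.5), fires([1.0, 1.0, 0.0], -2.5))  # fractional bias: 1 0
```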

The network in NumPy

Because every weight is known, we can write the whole classifier as three matrix multiplications and a step function. No training loop required.

import numpy as np
import numpy.typing as npt

def step(x: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    return (x >= 0).astype(np.float64)

# Layer 1: one neuron per distinct edge line. Columns are e0..e5.
w1 = np.array(
    [
        #  e0  e1  e2  e3  e4  e5
        [0, -1, 1, 1, -1, 0],   # x weights
        [1, -1, -1, -1, -1, 1],  # y weights
    ],
    dtype=np.float64,
)
b1 = np.array([[0, 4, 0, 4, 0, 2]], dtype=np.float64)

# Layer 2: AND each triangle's 3 edges. Shared edges feed two columns.
#   right  = e0 & e1 & e2
#   left   = e0 & e3 & e4
#   bottom = e2 & e4 & e5
w2 = np.array(
    [
        # right  left  bottom
        [1, 1, 0],  # e0 -> right, left
        [1, 0, 0],  # e1 -> right
        [1, 0, 1],  # e2 -> right, bottom
        [0, 1, 0],  # e3 -> left
        [0, 1, 1],  # e4 -> left, bottom
        [0, 0, 1],  # e5 -> bottom
    ],
    dtype=np.float64,
)
b2 = np.array([[-3, -3, -3]], dtype=np.float64)

# Layer 3: OR the three triangles.
w3 = np.array([[1], [1], [1]], dtype=np.float64)
b3 = np.array([[-1]], dtype=np.float64)


def predict(points: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    edges = step(points @ w1 + b1)      # (N, 6)  which edge-halves are "inside"
    triangles = step(edges @ w2 + b2)   # (N, 3)  inside each triangle?
    return step(triangles @ w3 + b3)    # (N, 1)  inside any triangle?

The @ operator is matrix multiplication, just as in the XOR post. Here it lets every point in a batch flow through all three layers at once.
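This small sketch, which reuses the post's layer-1 weights, shows what batching buys. Multiplying the whole (N, 2) batch by `w1` gives the same numbers as looping over the points one at a time. The bias `b1`, with shape (1, 6), broadcasts across all N rows.

```python
import numpy as np

w1 = np.array([[0, -1, 1, 1, -1, 0],
               [1, -1, -1, -1, -1, 1]], dtype=np.float64)
b1 = np.array([[0, 4, 0, 4, 0, 2]], dtype=np.float64)

points = np.array([[2.0, 0.5], [-2.0, 1.0], [0.0, -1.0]])

batch = points @ w1 + b1                                  # (3, 2) @ (2, 6) + (1, 6) -> (3, 6)
one_by_one = np.stack([p @ w1 + b1[0] for p in points])   # each p @ w1 is a length-6 vector

print(batch.shape)                     # (3, 6)
print(np.allclose(batch, one_by_one))  # True
```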

Trying it out

Feed in a handful of points — some clearly inside a triangle, some clearly outside:

tests = np.array(
    [
        [2.0, 0.5],   # inside right
        [-2.0, 1.0],  # inside left
        [0.0, -1.0],  # inside bottom
        [0.0, 1.0],   # in the gap above the origin, outside all
        [3.0, -1.0],  # outside every triangle
        [-3.0, 0.5],  # inside left
    ]
)

print(predict(tests).ravel().astype(int))

The output matches the geometry exactly:

[1 1 1 0 0 1]

The four “inside” points return 1. The point at (0, 1), which sits in the wedge between the two upper triangles, and the point at (3, -1), off to the right of the bottom triangle, return 0.
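Six hand-picked points are a thin test. A stronger check, sketched here as an addition rather than part of the original post, compares the network against an independent inside-triangle test on 10,000 random points in the square [-5, 5] × [-5, 5]. The independent test uses barycentric coordinates. The seed and sample size are arbitrary choices.

```python
import numpy as np


def step(x):
    return (x >= 0).astype(np.float64)


# The post's hand-wired network, repeated so this block runs on its own.
w1 = np.array([[0, -1, 1, 1, -1, 0], [1, -1, -1, -1, -1, 1]], dtype=np.float64)
b1 = np.array([[0, 4, 0, 4, 0, 2]], dtype=np.float64)
w2 = np.array([[1, 1, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=np.float64)
b2 = np.array([[-3, -3, -3]], dtype=np.float64)
w3 = np.ones((3, 1))
b3 = np.array([[-1.0]])


def predict(points):
    return step(step(step(points @ w1 + b1) @ w2 + b2) @ w3 + b3)


def in_triangle(points, a, b, c):
    """Barycentric test: inside iff both weights v, w and 1 - v - w are >= 0."""
    a, b, c = (np.asarray(v, dtype=np.float64) for v in (a, b, c))
    v0, v1, v2 = b - a, c - a, points - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return (v >= 0) & (w >= 0) & (v + w <= 1)


triangles = [((0, 0), (4, 0), (2, 2)), ((0, 0), (-4, 0), (-2, 2)), ((0, 0), (-2, -2), (2, -2))]

rng = np.random.default_rng(0)
pts = rng.uniform(-5, 5, size=(10_000, 2))
truth = np.any([in_triangle(pts, *t) for t in triangles], axis=0)
net = predict(pts).ravel().astype(bool)

print("points checked:", len(pts), "disagreements:", int((truth != net).sum()))
```

With the post's weights, the disagreement count should be zero, since each triangle neuron is exactly the intersection of its three half-planes. Random points essentially never land exactly on an edge, which is the only place floating-point ties could make the two tests differ.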

Why bother with the hidden layers?

It is worth pausing on what each layer bought us.

The edge layer turns raw coordinates into six simple yes/no features. On its own, no single one of these lines separates “inside a triangle” from “outside” — a point above the x-axis could be in either upper triangle or neither.
The triangle layer combines edges into meaningful regions. This is the step a single perceptron can never take: a triangle is the intersection of three half-planes, and no single line can split its inside from its outside.
The output layer unions those regions.
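Here is one concrete way to see why a single neuron cannot do the whole job. The point (2, 0.5) is inside the right triangle. The point (-2, 1.5) is not one of the test points, but it is inside the left triangle: it passes y ≥ 0, x - y + 4 ≥ 0 and -x - y ≥ 0. Their midpoint is (0, 1), which is outside every triangle. For any single neuron with weights w and bias b:

w \cdot (0, 1) + b = \tfrac{1}{2}\,\big(w \cdot (2, 0.5) + b\big) + \tfrac{1}{2}\,\big(w \cdot (-2, 1.5) + b\big)

So if the neuron fires on both inside points (both sums ≥ 0), the midpoint's sum is also ≥ 0. The neuron must therefore wrongly fire on (0, 1).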

This mirrors the XOR lesson exactly, just scaled up. XOR needed two lines and one combination. The three triangles need six lines and two levels of combination — and noticing that the edges share lines is what got us from nine neurons down to six. The moment you can stack layers, “how many boundaries” and “how they combine” stop being a limitation and start being a design choice.

What if we didn’t know the triangles?

Notice what we actually did to build this network: we knew the three triangles, so we read the six edge lines straight off the diagram and typed the weights in by hand. Every number in w1, b1, w2, b2 came from geometry we could see.

Real problems don’t hand you the geometry. You get a pile of points, each already labelled 1 (inside something) or 0 (outside) — but nobody tells you there are three triangles, or where their edges are. Now you can’t type the weights in, because you don’t know them.

Training is how the network finds those weights on its own. The recipe is exactly the one from the XOR post:

Start with the same shape — 2 inputs → 6 → 3 → 1 — but fill the weights with small random numbers instead of the hand-picked ones.
Show it a labelled point, let it predict, and measure how wrong the prediction was.
Use backpropagation to nudge every weight a little in the direction that reduces that error.
Repeat over thousands of points, thousands of times.

If training goes well, the six first-layer neurons slowly drift until they sit on lines that carve the plane the right way, the middle layer settles into AND-like gates, and the output settles into an OR. The network can rediscover the same kind of structure we built by hand (six boundaries combined in two stages) without ever being told the triangles exist.

So the hand-wired version and the trained version are two ways to reach the same place. Hand-wiring shows that a solution of this shape exists and what it looks like. Training is what you use when you can see the labelled points but not the shape behind them — which is the situation in every real problem.

(One practical detail: training uses the smooth sigmoid activation from the XOR post rather than the hard step we used here, because backpropagation needs a slope to follow. The step function is flat everywhere except at 0, where it jumps and has no slope at all, so it gives gradient descent nothing to push against.)
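Below is a minimal NumPy sketch of that recipe. Several of its choices are assumptions, not taken from the post:

- Training points are sampled uniformly from [-4, 4] × [-3, 3] and labelled with the six inside tests from the table.
- Every layer uses a sigmoid.
- The loss is binary cross-entropy.
- Each update averages the gradient over the whole batch instead of showing one point at a time.

The learning rate, epoch count and seed are arbitrary. How close the result gets to the hand-wired solution depends on them and is not guaranteed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def label(points):
    # Ground truth from the six inside tests (used only to make training data).
    x, y = points[:, 0], points[:, 1]
    right = (y >= 0) & (-x - y + 4 >= 0) & (x - y >= 0)
    left = (y >= 0) & (x - y + 4 >= 0) & (-x - y >= 0)
    bottom = (x - y >= 0) & (-x - y >= 0) & (y + 2 >= 0)
    return (right | left | bottom).astype(np.float64).reshape(-1, 1)


rng = np.random.default_rng(0)
X = rng.uniform([-4.0, -3.0], [4.0, 3.0], size=(2000, 2))
Y = label(X)

# Same shape as the hand-wired network, 2 -> 6 -> 3 -> 1, but random starting weights.
sizes = [2, 6, 3, 1]
Ws = [rng.normal(0.0, 0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros((1, n)) for n in sizes[1:]]

lr, losses = 0.5, []
for epoch in range(5000):
    # Forward pass: keep every layer's activations for backprop.
    acts = [X]
    for W, b in zip(Ws, bs):
        acts.append(sigmoid(acts[-1] @ W + b))
    p = np.clip(acts[-1], 1e-7, 1 - 1e-7)
    losses.append(float(-np.mean(Y * np.log(p) + (1 - Y) * np.log(1 - p))))

    # Backward pass: for sigmoid + cross-entropy, dLoss/d(output pre-activation) = (p - y) / N.
    delta = (acts[-1] - Y) / len(X)
    for i in reversed(range(len(Ws))):
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0, keepdims=True)
        if i > 0:  # push the error back through layer i using its pre-update weights
            delta = (delta @ Ws[i].T) * acts[i] * (1 - acts[i])
        Ws[i] -= lr * grad_W
        bs[i] -= lr * grad_b

# Accuracy from the last forward pass (threshold the sigmoid output at 0.5).
accuracy = float(((acts[-1] >= 0.5) == Y).mean())
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy {accuracy:.3f}")
```

Swapping the sigmoid for the step function here would zero out every `acts[i] * (1 - acts[i])` factor. That is the "nothing to push against" problem in code form.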

So how many hidden neurons — 6 or 600?

Here is the honest catch. We used exactly 6 first-layer neurons because we counted 6 lines in the picture. But if you don’t know the shape — the whole point of training — you don’t know it’s 6. It could be 4 triangles, or a circle (which is really many short line segments), or something with no clean count at all. So how do you pick the number?

The short answer: you don’t compute it, you over-provide it and let training sort it out. Three facts make this work.

Too few genuinely fails. The first layer can only draw as many boundary lines as it has neurons. Three triangles need 6 lines, so a first layer of 3 neurons cannot represent the exact answer no matter how long you train: there aren’t enough lines to go around. This is underfitting, and it is a hard ceiling: the shape simply won’t fit. Train our problem with 3 first-layer neurons and you should expect it to plateau well short of the 6-neuron version, because at least three of the six edges have nowhere to live.

Extra neurons are cheap insurance. What if you guess 20 when 6 would do? Training still works. The spare neurons typically don’t corrupt the answer: they drift to lines that don’t matter, or the layers after them learn to ignore them (weights near zero). An unused neuron contributes nothing rather than something wrong. So the cost of guessing too high is mostly a bit of wasted computation, while the cost of guessing too low is a network that can’t solve the problem.
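The "weights near zero" case can be illustrated with the hand-wired network. This sketch is an illustration, not a training run. It appends 14 hypothetical spare edge neurons with random lines to the first layer, for 20 in total, and gives them zero weight into the triangle layer. The predictions stay identical, because a neuron whose outgoing weights are all zero cannot move any downstream sum. In a trained network the spare weights would typically end up small rather than exactly zero, so their influence would be small rather than exactly nil.

```python
import numpy as np


def step(x):
    return (x >= 0).astype(np.float64)


# Hand-wired weights from the post.
w1 = np.array([[0, -1, 1, 1, -1, 0], [1, -1, -1, -1, -1, 1]], dtype=np.float64)
b1 = np.array([[0, 4, 0, 4, 0, 2]], dtype=np.float64)
w2 = np.array([[1, 1, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=np.float64)
b2 = np.array([[-3, -3, -3]], dtype=np.float64)
w3, b3 = np.ones((3, 1)), np.array([[-1.0]])


def predict(points, w1, b1, w2):
    return step(step(step(points @ w1 + b1) @ w2 + b2) @ w3 + b3)


rng = np.random.default_rng(1)
extra = 14  # hypothetical spare neurons: 6 + 14 = 20 in the first layer
w1_big = np.hstack([w1, rng.normal(size=(2, extra))])  # spare neurons draw random lines
b1_big = np.hstack([b1, rng.normal(size=(1, extra))])
w2_big = np.vstack([w2, np.zeros((extra, 3))])         # ...but nothing downstream listens

pts = rng.uniform(-5, 5, size=(5000, 2))
same = np.array_equal(predict(pts, w1, b1, w2), predict(pts, w1_big, b1_big, w2_big))
print("identical predictions with 20 first-layer neurons:", same)
```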

That asymmetry decides the strategy. Because too-few is fatal and too-many is merely wasteful, you deliberately err high: pick a hidden size comfortably larger than you think the problem needs, train, and check the accuracy. If it is good, you had enough neurons. If it plateaus low, you were under capacity — make the layer bigger and retrain. You are tuning a dial by feel, not deriving the exact count.

So “6 or 600?” has a practical answer: for this toy problem 6 is exactly right and 600 would train fine but waste effort; for a problem whose shape you cannot see, you start generous, watch the accuracy, and adjust. The number of hidden neurons is not something you read off the data — it is a knob you set, and picking it well is a real part of designing a network.

Complete code

import numpy as np
import numpy.typing as npt


def step(x: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    return (x >= 0).astype(np.float64)


w1 = np.array(
    [
        [0, -1, 1, 1, -1, 0],
        [1, -1, -1, -1, -1, 1],
    ],
    dtype=np.float64,
)
b1 = np.array([[0, 4, 0, 4, 0, 2]], dtype=np.float64)

w2 = np.array(
    [
        [1, 1, 0],
        [1, 0, 0],
        [1, 0, 1],
        [0, 1, 0],
        [0, 1, 1],
        [0, 0, 1],
    ],
    dtype=np.float64,
)
b2 = np.array([[-3, -3, -3]], dtype=np.float64)

w3 = np.array([[1], [1], [1]], dtype=np.float64)
b3 = np.array([[-1]], dtype=np.float64)


def predict(points: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    edges = step(points @ w1 + b1)
    triangles = step(edges @ w2 + b2)
    return step(triangles @ w3 + b3)


tests = np.array(
    [
        [2.0, 0.5],
        [-2.0, 1.0],
        [0.0, -1.0],
        [0.0, 1.0],
        [3.0, -1.0],
        [-3.0, 0.5],
    ]
)

print(predict(tests).ravel().astype(int))