Vision might seem simple - things in the real world have size and shape and color, and your eyes and brain just detect them, right? It’s actually a lot more complicated than that! Many different configurations of the external world could send exactly the same pattern of light into your eyes, and your brain has to figure out which one is the most likely.

Here’s an example that you might have seen before, called the Necker cube:

If you saw that and knew it was a three-dimensional object, you could interpret it as either of the following:

In that example the ambiguity is obvious, but in real life everything you see is ambiguous; you just don’t realize it. For example, if you see someone wearing a green shirt, maybe they’re actually wearing a white shirt and someone is shining a perfectly shirt-shaped green light on them, but that explanation is so unlikely that you don’t even consider it.

A lot of clues come from the context - the situation or other things around. Take a look at this image:

If you saw only the row of letters, the thing in the middle would obviously be a “B”, and if you saw only the column of numbers, it would obviously be a “13”, but the symbol itself is ambiguous. The following comic plays on this effect:
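One way to picture how context settles an ambiguous symbol is as a small probability calculation: how well each reading fits the shape, multiplied by how expected that reading is given its surroundings. The sketch below is only an illustration of that idea, not a claim about how the brain actually computes it; the function name `posterior` and all of the numbers are made-up assumptions.

```python
def posterior(shape_fit, context_prior):
    """Weigh how well each reading fits the shape by how expected it is in context."""
    unnormalized = {label: shape_fit[label] * context_prior[label] for label in shape_fit}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}

# Assumption: the ambiguous shape fits "B" and "13" equally well.
shape_fit = {"B": 0.5, "13": 0.5}

# Assumption: made-up expectations for each context.
among_letters = {"B": 0.95, "13": 0.05}  # in a row of letters, a letter is expected
among_numbers = {"B": 0.05, "13": 0.95}  # in a column of numbers, a number is expected

letters_reading = posterior(shape_fit, among_letters)
numbers_reading = posterior(shape_fit, among_numbers)

# Same shape, different most-likely reading depending on the context.
print(max(letters_reading, key=letters_reading.get))
print(max(numbers_reading, key=numbers_reading.get))
```

Because the shape itself favors neither reading, the context alone decides the outcome, which is the effect described above.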

This effect doesn’t only occur in recognizing letters or objects - it can even affect how you perceive colors, as you can see in this video.

In fact, in some sense you can’t even really say that an object “is” a certain color, because what color it is depends on what light is shining on it and what’s around it; your brain tries to figure out these things and account for them. For example, in this picture, you can tell that there’s bright light shining on the top of the cube and a dark shadow on the front:

The brown-looking square on the top and the yellow-looking square on the front are actually exactly the same color in the image, but because of the contrast with the squares around them, they don’t look like it. (You might remember the picture of the dress that people were arguing about on the internet a few months ago - it’s the same idea.)
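The cube effect can be sketched with a very simplified model in which the light reaching your eye is the surface’s own lightness multiplied by the amount of light falling on it. If your brain assumes different lighting for the two squares, the same pixel value implies two different surfaces. The function name and the numbers below are illustrative assumptions, and real color perception is far more involved than one division.

```python
def inferred_surface_lightness(pixel_brightness, assumed_light_level):
    """Undo the assumed lighting to estimate how light the surface itself is."""
    return pixel_brightness / assumed_light_level

# Assumption: both squares have the same pixel brightness in the image.
pixel = 0.4

# Assumption: the top is taken to be in full light, the front in shadow (half the light).
top_square = inferred_surface_lightness(pixel, assumed_light_level=1.0)
front_square = inferred_surface_lightness(pixel, assumed_light_level=0.5)

# The front square is inferred to be a lighter surface than the top one,
# even though the pixels are identical.
print(top_square, front_square)
```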

Your brain automatically adjusts for a lot of things other than lighting, too. For example, in this picture:

The two yellow lines are drawn the same length in the picture, but the top one looks longer because your brain interprets it as being farther away.
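The size adjustment follows the same kind of logic: something that takes up a given amount of space in the image must be bigger in the world if it is farther away. Here is a rough sketch of that reasoning, with a hypothetical function name and made-up lengths and distances.

```python
def inferred_real_size(length_in_image, assumed_distance):
    """An object filling the same space in the image must be larger if it is farther away."""
    return length_in_image * assumed_distance

# Assumption: both lines are drawn 100 units long in the picture.
drawn_length = 100

# Assumption: the scene makes the top line seem twice as far away as the bottom one.
bottom_line = inferred_real_size(drawn_length, assumed_distance=10)
top_line = inferred_real_size(drawn_length, assumed_distance=20)

# The top line is inferred to be twice as long in the world.
print(bottom_line, top_line)
```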

Some of the things that you “see” are actually just your brain filling in what’s probably there. Your eyes are moving around all the time, and they’re actually making lots of tiny jumps (called saccades). While they’re in the middle of jumping from one focus to another, you can’t see anything, but your brain fills it in. You may have noticed this when you click a link on a webpage and are waiting for it to change, and then suddenly you notice it has changed even though you thought you were looking at it the whole time.

When you’re seeing something, it’s impossible to see certain properties of it without seeing others. For example, try to name the colors of the words in this picture:

It’s hard, isn’t it! You can’t see the color without also reading the word. This is called the Stroop effect. Similar effects happen for other properties too, and not just static ones. For example, in one experiment people watched a line on a computer screen grow from short to long and then had to judge how long it got; the more time it took to grow, the longer they judged it to be.

Context is also important in that when you look at a scene, you don’t just perceive the set of things in it, you perceive general features of it, and then combine them to make sense of the overall scene. If you look at this picture:

you can see that there are a lot of lines, and that they’re red and green and horizontal and vertical, but if you had to say whether any of the lines are both red and horizontal, it’s not immediately obvious. (This is why Where’s Waldo is hard.) It might seem uninteresting that this task is hard, but the interesting thing is that if you were perceiving each object separately, it would be easy.

A lot of your ability to turn light signals in your eyes into a perception of the world depends on practice. When people are born blind and later get surgery that gives them the ability to see, they can’t automatically look at a picture and tell what’s part of which object. We’re much better at recognizing things that we’re more familiar with - for example, here’s a picture of two different humans and two different monkeys:

To us, the two humans don’t look much alike at all, whereas the two monkeys look almost exactly the same. If we had spent our lives interacting with monkeys, the monkeys would look very different from each other too. Six-month-old babies haven’t spent that much time looking at humans, and they’re just as good at telling the two monkeys apart as they are at telling the two humans apart.

We’re not born knowing nothing about how to see, though. As soon as babies are born, they would rather look at things that are vaguely face-looking than things that aren’t, like the shape on the left as opposed to the one on the right in this picture:

Babies can even imitate some facial movements as soon as they’re born, like sticking out their tongue (here’s a video). This means that without ever having seen a face before, a baby can look at someone’s face and in some sense know that the other person has a face just like they do and that their face can move in the same ways.

Here are some websites with more cool optical illusions: