How to describe images in the Fediverse: What is a screen reader, and how does it work?

What is a screen reader, and how does it work?

A screen reader is a software application that blind or visually impaired people install on their devices, e.g. computers or smartphones, to help them experience what the screen shows.

What does a screen reader do?

Mainly, a screen reader reads the text on the screen out loud, much like a parent reads a book out loud to a child. It is basically a text-to-speech application. At least some screen readers can also send text to a Braille display, a hardware device that mechanically renders text as Braille script, one line at a time.

The user can navigate around the screen and choose which text should be read. It is also possible to navigate within a text, e.g. to skip back a few words to have a certain part of the text re-read, or to pause and resume the reading.

Some screen readers can identify the language of a text before they read it out, and then read the text out loud in the appropriate language.

When a screen reader encounters an image, it announces the image with, "Image," or, "Graphic," or something like that. If the image has an alt-text, then the screen reader reads out the alt-text immediately after announcing the image.
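The announcement order described above can be sketched as a toy model. This is only an illustration, not how any real screen reader is implemented; the names `Image` and `announce_image` and the default label `"Image"` are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Image:
    # alt_text is None when the author did not provide a description.
    alt_text: Optional[str] = None


def announce_image(image: Image, label: str = "Image") -> List[str]:
    """Return the phrases a hypothetical screen reader speaks, in order."""
    # The image is always announced first, e.g. "Image" or "Graphic".
    phrases = [label]
    # The alt-text, if there is one, follows immediately afterwards.
    if image.alt_text:
        phrases.append(image.alt_text)
    return phrases
```

In this model, an image without alt-text yields nothing but the bare announcement, so the user only learns that an image exists, not what it shows.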

What limitations do screen readers have?

Some websites and blog posts claim that screen readers still have an upper character limit for alt-texts, i.e. they cannot read out alt-texts that are longer than a certain number of characters. This is no longer true.

Screen readers that can identify a language and choose the right language in which to read a text out loud can only do so before they start reading. They cannot identify and switch languages mid-reading. In particular, they cannot identify and switch languages in the middle of an alt-text. This is a problem when a post includes an image with text in a different language than the post itself.
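The language limitation can be illustrated with a small sketch. It assumes, purely for illustration, that the voice is chosen once from the language of the first part of an alt-text and never changes afterwards; the `Segment` type and the `reading_plan` function are hypothetical and do not describe any particular screen reader.

```python
from typing import List, Tuple

# A segment is a (language code, text) pair, e.g. ("en", "A street sign").
Segment = Tuple[str, str]


def reading_plan(segments: List[Segment]) -> List[Segment]:
    """Return (voice, text) pairs for one alt-text.

    Assumption: the voice is fixed once, before reading starts, from the
    language of the first segment, and is kept for everything that follows.
    """
    if not segments:
        return []
    voice = segments[0][0]
    # Every segment is read with the same voice, whatever its real language.
    return [(voice, text) for _, text in segments]
```

For an English description followed by a German transcript, `reading_plan([("en", "A street sign that reads:"), ("de", "Einbahnstraße")])` pairs both parts with the `"en"` voice in this model, which is why the German text ends up being pronounced as if it were English.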

While screen readers can navigate through text, they cannot navigate through alt-text. Once they've started reading an alt-text, they can only either read through the whole alt-text or skip back to its beginning. They cannot pause while reading an alt-text, and they cannot skip back to some point within it. This is one reason why alt-text should not be longer than absolutely necessary.

What can't a screen reader do?

One thing that screen readers definitely cannot do is look at images and describe them. In particular, they cannot describe them accurately, no matter how obscure the contents of the image are, while taking into account the context in which the images are posted and following all the rules in this wiki. No AI out there is fully capable of this, especially not without a specific prompt.

Besides, an AI will take several seconds, or even half a minute or more, to put together an image description. A screen reader cannot pause for that long and expect its user to wait until the image description has been generated.

At most, some screen readers can use OCR to read text in an image if it's readable enough. But even that isn't 100% reliable.

Humans are the only really reliable source of accurate, context-dependent image descriptions and correct text transcripts.