"Nothing About Us Without Us", only it still is without them most of the time
Last edited: Fri, 27 Sep 2024 11:36:38 +0200
jupiter_rowland@hub.netzgemeinde.eu
When disabled Fediverse users demand participation in accessibility discussions, but there are no discussions in the first place, and they themselves don't even seem to be available to give accessibility feedback
"Nothing about us without us" is the catchphrase used by disabled accessibility activists who are trying to get everyone to get accessibility right. It means that non-disabled people should stop assuming what disabled people need. Instead, they should listen to what disabled people say they need and then give them what they need.
Just like accessibility in the digital realm in general, this is not targeted only at professional Web or UI developers. It is targeted at any and all social media users just as well.
However, this would be a great deal easier if it wasn't still "without them" all the time.
Lack of necessary feedback
Alt-text and image descriptions are one example and one major issue. How are we, the sighted Fediverse users, supposed to know what blind or visually-impaired users really need and where they need it if we never get any feedback? And we never get any feedback, especially not from blind or visually-impaired users.
Granted, only sighted users can call us out for an AI-generated alt-text that's complete rubbish because non-sighted users can't compare the alt-text with the image.
But non-sighted users could tell us whether they're sufficiently informed or not. They could tell us whether they're satisfied with an image description mentioning that something is there, or whether they need to be told what this something looks like. They could tell us which information in an image description is useful to them, which isn't, and what they'd suggest to improve its usefulness.
They could tell us whether certain information that's in the alt-text right now should better go elsewhere, like into the post. They could tell us whether extra information needed to understand a post or an image should be given right in the post that contains the image or through an external link. They could tell us whether they need more explanation on a certain topic displayed in an image, or whether there is too much explanation that they don't need. (Of course, they should take into consideration that some of us do not have a 500-character limit.)
Instead, we, the sighted users who are expected to describe our images, receive no feedback for our image descriptions at all. We're expected to know exactly what blind or visually-impaired users need, and we're expected to know it right off the bat without being told so by blind or visually-impaired users. It should be crystal-clear how this is impossible.
What are we supposed to do instead? Send all our image posts directly to one or two dozen people who we know are blind and ask for feedback? I'm pretty sure I'm not the only one who considers this very bad style, especially in the long run, and even then there's no guarantee of getting feedback.
So with no feedback, all we can do is guess what blind or visually-impaired users need.
Common alt-text guides are not helpful
Now you might wonder why all this is supposed to be such a big problem. After all, there are so many alt-text guides out there on the Web that tell us how to do it.
Yes, but here in the Fediverse, they're all half-useless.
The vast majority of them are written for static Web sites, whether scientific, technological or commercial. Some include blogs, again of the scientific, technological or commercial kind. The moment they start relying on captions and HTML code, you know you can toss them, because they translate to almost nothing in the Fediverse.
What few alt-text guides are written for social media are written for the huge corporate American silos. 𝕏, Facebook, Instagram, LinkedIn. They do not translate to the Fediverse which has its own rules and cultures, not to mention much higher character limits, if any.
Yes, there are one or two guides on how to write alt-text in the Fediverse. But they're always about Mastodon, only Mastodon and nothing but Mastodon. They're written for Mastodon's limitations, especially only 500 characters being available in the post itself versus a whopping 1,500 characters being available in the alt-text. And they're written with Mastodon's culture in mind which, in turn, is influenced by Mastodon's limitations.
Elsewhere in the Fediverse, outside Mastodon, you have many more possibilities. You have thousands of characters to use in your post, or no character limit to worry about at all. Granted, you don't have all the means at hand that you have on a static HTML Web site; even the few dozen (streams) users who can use HTML in social media posts don't have the same influence on the layout of their posts as Web designers have on Web sites. Still, you aren't bound to Mastodon's self-imposed limitations.
And yet, those Mastodon alt-text guides tell you you have to squeeze all information into the alt-text as if you don't have any room in the post. Which, unlike most Mastodon users, you do have.
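To make the constraint those guides silently assume concrete, here is a minimal sketch of how the character limits interact. The 500- and 1,500-character figures are Mastodon's defaults, not a Fediverse-wide rule, and the function below is purely illustrative, counting characters and nothing else:

```python
# Naive sketch: where could a description go under Mastodon's default limits?
# 500/1,500 are Mastodon's defaults; many other Fediverse platforms raise
# these limits or drop them entirely.
MASTODON_POST_LIMIT = 500   # characters per post
MASTODON_ALT_LIMIT = 1500   # characters per alt-text

def placement_options(description: str, post_text: str) -> list[str]:
    """Return the places a description would fit, by character count alone."""
    options = []
    if len(description) <= MASTODON_ALT_LIMIT:
        options.append("alt-text (fits Mastodon's limit)")
    if len(post_text) + len(description) <= MASTODON_POST_LIMIT:
        options.append("in the post, even on Mastodon")
    if not options:
        options.append("in the post, but only outside Mastodon's limits")
    return options
```

Of course, this only counts characters. It says nothing about which placement blind or visually-impaired readers actually prefer, which is precisely the feedback that's missing.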
It certainly doesn't help that the Fediverse's entire accessibility culture comes from Mastodon, concentrates on Mastodon and only takes Mastodon into consideration with all its limitations. Apparently, if you describe an image for the blind and the visually-impaired, you must describe everything in the alt-text. After all, according to the keepers of accessibility in the Fediverse, how could you possibly describe anything in a post with a 500-character limit?
In addition, every guide covers only its own standard cases. For example, an image description guide for static scientific Web sites only covers images that are typical for such sites: graphs, flowcharts, maybe a portrait picture. Everything else is an edge case that the guide doesn't cover.
There are even pictures that are edge cases for all guides, covered insufficiently or not at all by any of them. When I post an image, it's practically always such an edge case, and I can only guess what the right way to describe it might be.
Discussing Fediverse accessibility is necessary...
Even individual feedback on image descriptions, media descriptions, transcripts etc. is of limited use. If one user gives you feedback, you know what this one user needs. But you don't know what the general public with disabilities needs, and that is what actually matters. Another user might give you wholly different feedback; two different blind users are likely to give you two different opinions on the same image description.
What is needed so direly is open discussion about accessibility in the Fediverse. People gathering together, talking about accessibility, exchanging experiences, exchanging ideas, exchanging knowledge that others don't have. People with various disabilities and special requirements in the Fediverse need to join this discussion because "nothing about them without them", right? After all, it is about them.
And people from outside of Mastodon need to join, too. They are needed to give insights on what can be done on Pleroma and Akkoma, on Misskey, Firefish, Iceshrimp, Sharkey and Catodon, on Friendica, Hubzilla and (streams), on Lemmy, Mbin, PieFed and Sublinks and everywhere else. They are needed to combat the rampant Mastodon-centrism and keep reminding the Mastodon users that the Fediverse is more than Mastodon. They are needed to explain that the Fediverse outside of Mastodon offers many more possibilities than Mastodon that can be used for accessibility. They are needed so that solutions can be found that are not bound to Mastodon's restrictions. And they need to learn that accessibility in the Fediverse exists in the first place, because it's currently pretty much a topic that only exists on Mastodon.
There are so many things I'd personally like to be discussed and ideally brought to a consensus of sorts. For example:
- Explaining things in the alt-text versus explaining things in the post versus linking to external sites for explanations.
  - The first is the established Mastodon standard, but any information exclusively available in the alt-text is inaccessible to people who can't access alt-text, including due to physical disabilities.
  - The second is the most accessible, but it inflates the post, and it breaks with several Mastodon principles (probably over 500 characters, explanation not in the alt-text).
  - The third is the easiest way, but it's inconvenient because image and explanation are in different places.
- What if an image needs a very long and very detailed visual description, considering the nature of the image and the expected audience?
  - Describe the image only in the post (inflates the post, no image description in the alt-text, breaks with Mastodon principles, impossible on vanilla Mastodon)?
  - Describe it externally and link to the description (no image description anywhere near the image, description separated from the image, breaks with Mastodon principles, requires an external space to upload the description)?
  - Only give a description that's short enough for the alt-text regardless (insufficient description)?
  - Refrain from posting the image altogether?
- Seeing as all text in an image must always be transcribed verbatim, what if the text is unreadable for some reason, but whoever posts the image can source the text and transcribe it regardless?
  - Must it be transcribed because that's what the rule says?
  - Must it be transcribed so that even sighted people know what's written there?
  - Must it not be transcribed?
...but it's nigh-impossible
Alas, this won't happen. Ever. It won't happen because there is no place in the Fediverse where it could sensibly happen.
Now you might wonder what gives me that idea. Can't this just be done on Mastodon?
No, it can't. Yes, most participants would be on Mastodon. And Mastodon users who don't know anything else keep saying that Mastodon is sooo good for discussions.
But seriously, if you've experienced anything in the Fediverse that isn't purist microblogging like Mastodon, you've long since come to the realisation that when it comes to discussions with a certain number of participants, Mastodon is utter rubbish. It has no concept of conversations whatsoever. It's great as a soapbox. But it's outright horrible at holding a discussion together. How are you supposed to have a meaningful discussion with 30 people if you burn through most of your 500-character limit mentioning the other 29?
Also, Mastodon has another disadvantage: Almost all participants will be on Mastodon themselves. Most of them will not know anything about the Fediverse outside Mastodon. At least some will not even know that the Fediverse is more than just Mastodon. And that one poor sap from Friendica will constantly try to remind people that the Fediverse is not only Mastodon, but he'll be ignored because he doesn't always mention all participants in the thread. Mentioning everyone is not necessary on Friendica itself, so he isn't used to it; on Mastodon, however, it's pretty much essential.
Speaking of Friendica, it'd actually be the ideal place in the Fediverse for such discussions because users from almost all over the place could participate. Interaction between Mastodon users and Friendica forums is proven to work very well. A Friendica forum can be moderated, unlike a Guppe group. And posts and comments reach all members of a Friendica forum without mass-mentioning.
The difficulty here would be to get it going in the first place. Ideally, the forum would be set up and run by an experienced Friendica user. But accessibility is not nearly as much an issue on Friendica as it is on Mastodon, so the difficult part would be to find someone who sees the point in running a forum about it in the first place. A Mastodon user who does see the point, on the other hand, would have to get used to something that is a whole lot different from Mastodon while being a forum admin/mod.
Lastly, there is the Threadiverse, Lemmy first and foremost. But Lemmy has its own issues. For starters, its federation with the Fediverse outside the Threadiverse is patchy and not quite reliable, and the devs don't seem interested in non-Threadiverse federation. So everyone interested in the topic would need a Lemmy account, and many refuse to create a second Fediverse account for any purpose.
If it's on Lemmy, it will naturally attract Lemmy natives. But the vast majority of these have come from Reddit straight to Lemmy. Just like most Mastodon users know next to nothing about the Fediverse outside Mastodon, most Lemmy users know next to nothing about the Fediverse outside Lemmy. I am on Lemmy, and I've actually run into that wall. After all, they barely interact with the Fediverse outside Lemmy. As accessibility isn't an issue on Lemmy either, they know nothing about accessibility on top of knowing nothing about most of the Fediverse.
So instead of having meaningful discussions, you'll spend most of the time educating Lemmy users about the Fediverse outside Lemmy, about Mastodon culture, about accessibility and about why all this should even matter to people who aren't professional Web devs. And yes, you'll have to do it again and again for each newcomer who couldn't be bothered to read up on any of this in older threads.
In fact, I'm not even sure if any of the Threadiverse projects are accessible to blind or visually-impaired users in the first place.
Lastly, I've got some doubts that discussing accessibility in the Fediverse would even be possible if there were a perfectly appropriate place for it. This Fediverse neither gives advice on accessibility within itself beyond always linking to the same useless guides, nor does it give feedback on accessibility measures such as image descriptions.
People, disabled or not, seem to want perfect accessibility. But nobody wants to help others improve their contributions to accessibility in any way. It's easier and more convenient to expect things to happen by themselves.
AI superiority at describing images, not so alleged?
Last edited: Fri, 27 Sep 2024 11:36:16 +0200
jupiter_rowland@hub.netzgemeinde.eu
Could it be that AI can run circles around even me at describing images? And that the only ones whom my image descriptions satisfy are Mastodon's alt-text police?
I think I've reached a point at which I only describe my images for the alt-text police any longer. At which I keep ramping up my efforts, increasing my description quality and declaring all my previous image descriptions obsolete and hopelessly outdated, only to have an edge over those who try hard to enforce quality image descriptions all over the Fediverse and who might stumble upon one of my image posts in their federated timelines by chance.
For blind or visually-impaired people, my image descriptions ought to fall under "better than nothing" at best and even that only if they have the patience to have them read out in their entirety. But even my short descriptions in the alt-text are too long already, often surpassing the 1,000-character mark. And they're often devoid of text transcripts due to lack of space.
My full descriptions that go into the post are probably mostly ignored, also because nobody on Mastodon actually expects an image description anywhere that isn't alt-text. But on top of that, they're even longer. Five-digit character counts, image descriptions longer than dozens of Mastodon toots, are my standard. Necessarily so because I can't see it being possible to sufficiently describe the kind of images I post in significantly fewer characters, so I can't help it.
But it isn't only about the length. It also seems to be about quality. As @Robert Kingett, blind points out in this Mastodon post and this blog post linked in the same Mastodon post, blind or visually-impaired people generally prefer AI-written image descriptions over human-written image descriptions. Human-written image descriptions lack effort, they lack details, they lack just about everything. AI descriptions, in comparison, are highly detailed and informative. And I guess when they talk about human-written image descriptions, they mean all of them.
I can upgrade my description style as often as I want. I can try to make it more and more inclusive by changing the way I describe colours or dimensions as much as I want. I can spend days describing one image, explaining it, researching necessary details for the description and explanation. But from a blind or visually-impaired user's point of view, AI can apparently write circles around that in every way.
AI can apparently describe and even explain my own images about an absolutely extreme niche topic more accurately and in greater detail than I can. In all details that I describe and explain, with no exception, plus even more on top of that.
If I take two days to describe an image in over 60,000 characters, it's still sub-standard in terms of quality, informativeness and level of detail. AI takes only a few seconds to generate a few hundred characters which apparently describe and explain the self-same image at a higher quality, more informatively and at a higher level of detail. It may even be able not only to identify where exactly an image was created, even if that place is only a few days old, but also to explain that location to someone who doesn't know anything about virtual worlds in no more than 100 characters or so.
Whenever I have to describe an image, I always have to throw someone under the bus. I can't perfectly satisfy everyone at the same time. My detailed image descriptions are too long for many people, be it people with a short attention span, be it people with little time. But if I shortened them dramatically, I'd have to cut information to the disadvantage of not only neurodiverse people who need things explained in great detail, but also blind or visually-impaired users who want to explore a new and previously unknown world through only that one image, just like sighted people can let their eyes wander around the image.
Apparently, AI is fully capable of actually perfectly satisfying everyone all the same at the same time because it can convey more information with only a few hundred characters.
Sure, AI makes mistakes. But apparently, AI still makes fewer mistakes than I do.
#AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #AI #AIVsHuman #HumanVsAI
For blind or visually-impaired people, my image descriptions ought to fall under "better than nothing" at best and even that only if they have the patience to have them read out in their entirety. But even my short descriptions in the alt-text are too long already, often surpassing the 1,000-character mark. And they're often devoid of text transcripts due to lack of space.
My full descriptions that go into the post are probably mostly ignored, also because nobody on Mastodon actually expects an image description anywhere that isn't alt-text. But on top of that, they're even longer. Five-digit character counts, image descriptions longer than dozens of Mastodon toots, are my standard. Necessarily so because I can't see it being possible to sufficiently describe the kind of images I post in significantly fewer characters, so I can't help it.
But it isn't only about the length. It also seems to be about quality. As @Robert Kingett, blind points out in this Mastodon post and this blog post linked in the same Mastodon post, blind or visually-impaired people generally prefer AI-written image descriptions over human-written image descriptions. Human-written image descriptions lack effort, they lack details, they lack just about everything. AI descriptions, in comparison, are highly detailed and informative. And I guess when they talk about human-written image descriptions, they mean all of them.
I can upgrade my description style as often as I want. I can try to make it more and more inclusive by changing the way I describe colours or dimensions as much as I want. I can spend days describing one image, explaining it, researching necessary details for the description and explanation. But from a blind or visually-impaired user's point of view, AI can apparently write circles around that in every way.
AI can apparently describe and even explain my own images about an absolutely extreme niche topic more accurately and in greater detail than I can. In all details that I describe and explain, with no exception, plus even more on top of that.
If I take two days to describe an image in over 60,000 characters, it's still sub-standard in terms of quality, informativeness and level of detail. AI only takes a few seconds to generate a few hundred characters which apparently describe and explain the self-same image at a higher quality, more informatively and at a higher level of detail. It may even be able to not only identify where exactly an image was created, even if that place is only a few days old, but also explain that location to someone who doesn't know anything about virtual worlds within no more than 100 characters or so.
Whenever I have to describe an image, I always have to throw someone under the bus. I can't perfectly satisfy everyone all the same at the same time. My detailed image descriptions are too long for many people, be it people with a short attention span, be it people with little time. But if I shortened them dramatically, I'd have to cut information to the disadvantage of not only neurodiverse people who need things explained in great detail, but also blind or visually-impaired users who want to explore a new and previously unknown world through only that one image, just like sighted people can let their eyes wander around the image.
Apparently, AI is fully capable of actually perfectly satisfying everyone all the same at the same time because it can convey more information with only a few hundred characters.
Sure, AI makes mistakes. But apparently, AI still makes fewer mistakes than I do.
#AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #AI #AIVsHuman #HumanVsAI
Why descriptions for images from virtual worlds have to be so long and extensive
Last edited: Fri, 27 Sep 2024 11:35:50 +0200
jupiter_rowland@hub.netzgemeinde.eu
Whenever I describe a picture from a virtual world, the description grows far beyond everyone's wildest imaginations in size; here's why
I rarely post pictures from virtual worlds anymore. I'd really like to show them to Fediverse users, including those who know nothing about them. But I rarely do that anymore. Not in posts, not even in Hubzilla articles.
That's because pictures posted in the Fediverse need image descriptions. Useful and sufficiently informative image descriptions. And to my understanding, even Hubzilla articles are part of the Fediverse because they're part of Hubzilla. So the exact same rules apply to them that apply to posts. Including image descriptions being an absolute requirement.
And a useful and sufficiently informative image description for a picture from a virtual world has to be absolutely massive. In fact, it can't be done within Mastodon's limits. Not even the 1,500 characters offered for alt-text are enough. Not nearly.
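Just to illustrate the scale problem: a description that blows past these limits has to be chopped into a thread of post-sized pieces by hand. Here's a minimal Python sketch of that chopping. The limits are Mastodon's defaults; the sentence-based splitting heuristic is purely my own assumption for illustration, not anything any Fediverse software actually does.

```python
import re

MASTODON_ALT_TEXT_LIMIT = 1500  # Mastodon's alt-text limit
MASTODON_POST_LIMIT = 500       # Mastodon's default post limit

def split_description(text, limit=MASTODON_POST_LIMIT):
    """Split a long image description into chunks that each fit the
    given character limit, breaking at sentence ends where possible."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit gets hard-wrapped.
        while len(sentence) > limit:
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

A 40,000-character description would come out of this as a thread of eighty-odd posts, which gives you an idea of why Hubzilla articles without such limits are the more natural home for them.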
Over the last 12 or 13 months, I've developed my image-describing style, and it's still evolving. However, this also means my image descriptions get more and more detailed with more and more explanations, and so they tend to grow longer and longer.
My first attempt at writing a detailed, informative description for a picture from a virtual world was in November, 2022. It started at over 11,000 characters already and grew beyond 13,000 characters a bit later when I re-worked it and added a missing text transcript. Most recently, I've broken the 40,000-character barrier, also because I've raised my standards to describing pictures within pictures within a picture. I've taken over 13 hours to describe one single picture twice already.
I rarely get any feedback for my image descriptions. But I sometimes have to justify their length, especially to sighted Fediverse users who don't care for virtual worlds.
Sure, most people who come across my pictures don't care for virtual worlds at all. But most people who come across my pictures are fully sighted and don't require any image descriptions. It's still good manners to provide them.
And there may pretty well be people who are very excited about and interested in virtual worlds, especially if it's clear that these are actually existing, living, breathing virtual worlds and not some cryptobro's imagination. And they may want to know everything about these worlds. But they know nothing. They look at the pictures, but they can't figure out from looking at the pictures what these pictures show. Nothing that's in these pictures is really familiar to them.
So when describing a picture from a virtual world, one must never assume that anything in the picture is familiar to the on-looker. In most cases, it is not.
Also, one might say that only sighted people are interested in virtual worlds because virtual worlds are a very visual medium and next to impossible to navigate without eyesight. Still, blind or visually-impaired people may be just as fascinated by virtual worlds as sighted people. And they may be at least just as curious which means they may require even more description and explanation. They want to know what everything looks like, but since they can't see it for themselves, they have to be told.
All this is why pictures from virtual worlds require substantially more detailed and thus much, much longer descriptions than real-life photographs.
The medium

The wordiness of descriptions for images from virtual worlds starts with the medium. It's generally said that image descriptions must not start with "Picture of" or "Image of". Some even say that mentioning the medium, i.e. "Photograph of", is too much.
Unless it is not a digital photograph. And no, it isn't always a digital photograph.
It can just as well be a digitised analogue photograph, film grain and all. It can be a painting. It can be a sketch. It can be a graph. It can be a screenshot of a social media post. It can be a scanned newspaper page.
Or it can be a digital rendering.
Technically speaking, virtual world images are digital renderings. But just writing "digital rendering" isn't enough.
If I only wrote "digital rendering", people would think of spectacular, state-of-the-art, high-resolution digital art with ray-tracing and everything. Like stills from Cyberpunk 2077 for which the graphics settings were temporarily cranked up to levels at which the game becomes unplayable, just to show off. Or like promotional pictures from a Pixar film. Or like the stuff we did in POV-Ray back in the day, when the single-core CPU ran at full blast for half an hour, but the outcome was a gorgeous screen-sized 3-D picture.
But images from the virtual worlds I frequent are nothing like this. Ray-tracing isn't even an option. It's unavailable. It's technologically impossible. So there is no fancy ray-tracing with fully reflective surfaces and whatnot. But there are shaders with stuff like ambient occlusion.
So where other people may or may not write "photograph", I have to write something like "digital 3-D rendering created using shaders, but without ray-tracing".
The location

If you think that was wordy, think again. Mentioning the location is much worse. And mentioning the location is mandatory in this case.
I mean, it's considered good style to always write where a picture was taken unless, maybe, it was at someone's home, or the location of something is classified.
In real life, that's easy. And except for digital art, digitally generated graphs and pictures of text, almost all pictures in the Fediverse were taken in real life.
In real life, you can often get away with name-dropping. Most people know at least roughly what "Times Square" refers to. Or "Piccadilly Circus". Or "Monument Valley". Or "Stonehenge". There is no need to break down where these places are. It can be considered common knowledge.
In fact, you get away even more easily with name-dropping landmarks without telling where they are. White House. Empire State Building. Tower Bridge. Golden Gate Bridge. Mount Fuji. Eiffel Tower. Taj Mahal. Sydney Opera House which, admittedly, name-drops its rough location, just like the Hollywood Sign. All these are names that should ring a bell.
But you can't do that in virtual worlds. In no virtual world can you do that. Not even in Roblox which has twice as many users as Germany has citizens. Much less in worlds running on OpenSim, all of which combined are estimated to have fewer than 50,000 unique monthly users. Whatever "unique" means, considering that many users have more than one avatar in more than one of these worlds.
Such tiny user numbers mean that there are even more people who don't use these worlds, who therefore are completely unfamiliar with these worlds. Who, in fact, don't even know these worlds exist. I'm pretty sure there isn't a single paid Metaverse expert of any kind who has ever even heard of OpenSimulator. They know Horizons, they know The Sandbox, they know Decentraland, they know Rec Room, they know VRchat, they know Roblox and so forth, they may even be aware that Second Life is still around, but they've never in their lives heard of OpenSim. It's that obscure.
So imagine I just name-dropped...
What'd that tell you?
It'd tell you nothing. You wouldn't know what that is. I couldn't blame you. Right off the bat, I know only two other Fediverse users who definitely know that building because I was there with them. Maybe a few more have been there before. Definitely much fewer than 50. Likely fewer than 20. Out of millions.
Okay, let's add where it is.
Does that help?
No, it doesn't. If you don't know the Sendalonde Community Library, you don't know what and where Sendalonde is either. That place is only known for its spectacular library building.
And you've probably never heard of a real-life place with that name. Of course you haven't. That place isn't in real life.
So I'd have to add some more information.
What's the Discovery Grid? And what's a grid in this context, and why is it called a grid?
Well, then I have to get even wordier.
Nobody, absolutely nobody writes that much about a real-life location. Ever.
And still, while you know that I'm talking about a place in a virtual world and what that virtual world is based on, while this question is answered, it raises a new question: What is OpenSimulator?
I wouldn't blame you for asking that. Again, even Metaverse experts don't know OpenSimulator. I'm pretty sure that nobody in the Open Metaverse Interoperability Group, in the Open Metaverse Alliance and at the Open Metaverse Foundation has ever heard of OpenSim. The owners and operators of most existing virtual worlds have never heard of OpenSim except those of Second Life, Overte and maybe a few others. Most Second Life users, present and past, have never heard of OpenSim. Most users of most other virtual worlds, present and past, have never heard of OpenSim.
And billions of people out there believe that Zuckerberg has invented "The Metaverse", and that his virtual worlds are actually branded "Metaverse® ("Metaverse" is a registered trademark of Meta Platforms, Inc. All rights reserved.)" Hardly anyone knows that the term "metaverse" was coined by Neal Stephenson in his cyberpunk novel Snow Crash which, by the way, has inspired Philip Rosedale to create Second Life. And nobody knows that the term "metaverse" has been part of the regular OpenSim users' vocabulary since before 2010. Because nobody knows OpenSim.
And that's why I can't just name-drop "OpenSimulator" either. I have to explain even that.
That alone would be more than your typical cat picture alt-text.
But it'd create misconceptions, namely of OpenSim being another walled-garden, headset-only VR platform that has jumped upon the "Metaverse" bandwagon. Because that's what people know about virtual worlds, if anything. So that's what they automatically assume. And that's wrong.
I'd have to keep that from happening by telling people that OpenSim is as decentralised and federated as the Fediverse, only that it even predates Laconi.ca, not to mention Mastodon. Okay, and it only federates with itself and some of its own forks because OpenSim doesn't run on a standardised protocol, and nobody else has ever created anything compatible.
This is more than most alt-texts on Mastodon. Only this.
But it still leaves one question unanswered: "Discovery Grid? What's that? Why is it called a grid? What's a grid in this context?"
So I'd have to add yet another paragraph.
I'm well past 1,000 characters now. Other people paint entire pictures with words with that many characters. I need them only to explain where a picture was taken. But this should answer all immediate questions and make clear what kind of place the picture shows.
The main downside, apart from the length which for some Mastodon users is too long for a full image description already, is that this will be outdated, should the decision be made to move Sendalonde to another grid again.
And I haven't even started actually describing the image. Blind or visually-impaired users still don't know what it actually shows.
If this was a place in real life, I might get away with name-dropping the Sendalonde Community Library and briefly mention that there are some trees around it, and there's a body of water in the background. It'd be absolutely sufficient.
But such a virtual place is something that next to nobody is familiar with. Non-sighted people even less because they're even more unlikely to visit virtual worlds. That's a highly visual medium and usually not really inclusive for non-sighted users.
So if I only name-dropped the Sendalonde Community Library, mentioned where it is located and explained what OpenSim is, I wouldn't be done. There would be blind or visually-impaired people inquiring, "Okay, but what does it look like?" Ditto people with poor internet for whom the image doesn't load.
Sure they would. Because they honestly wouldn't know what it looks like. Because even the sighted users with poor internet have never seen it before. But they would want to know.
So I'd have to tell them. Not doing so would be openly ableist.
And no, one sentence isn't enough. This is a very large, highly complex, highly detailed building and not just a box with a doorway and a sign on it. Besides, remember that we're talking about a virtual world. Architecture in virtual worlds is not bound to the same limits and laws and standards and codes as in real life. Just about everything is possible. So absolutely nothing can ever be considered "a given" and therefore unnecessary to be mentioned.
Now, don't believe that blind or visually-impaired people will limit their "What does it look like?" to the centre-piece of the picture. If you mention something being there, they want to know what it looks like. Always. Regardless of whether or not they used to be sighted, they still don't know what whatever you've mentioned looks like specifically in a virtual world. And, again, it's likely that they don't know what it looks like at all.
Thus, if I mention it, I have to describe it. Always. All of it.
There are exactly two exceptions. One, if something is fully outside the borders of the image. Two, if something is fully covered up by something else. And I'm not even entirely sure about the latter case.
Sometimes, a visual description isn't even enough. Sometimes, I can mention that something is somewhere in the picture. I can describe what that something looks like in all details. But people still don't know what it is.
I can mention that there's an OpenSimWorld beacon standing somewhere. I can describe its looks with over 1,000 words and so much accuracy that an artist could make a fairly accurate drawing of it just from my description.
But people, the artist included, still would not know what an OpenSimWorld beacon is in the first place, nor what it's there for.
So I have to explain what an OpenSimWorld beacon is and what it does.
Before I can do that, I first have to explain what OpenSimWorld is. And that won't be possible with a short one-liner. OpenSimWorld is a very multi-purpose website. Explaining it will require a four-digit number of characters.
Only after I'm done explaining OpenSimWorld, I can start explaining the beacon. And the beacon is quite multi-functional itself. On top of that, I'll have to explain the concept of teleporting around in OpenSim, especially from grid to grid through the Hypergrid.
This is why I generally avoid having OSW beacons in my pictures.
Teleporters themselves aren't quite as bad, but they, too, require lots and lots of words. They have to be described. If there's a picture on them, maybe one that shows a preview of the chosen destination, that picture has to be described. All of a sudden, I have an entire second image to write a description for. And then I have to explain what that teleporter is, what it does, how it works, how it's operated. They don't know teleporters because there are no teleporters in real life.
At least I might not have to explain to them which destinations the teleporter can send an avatar to. The people who need all these descriptions and explanations won't have any use for this particular information because they don't even know the destinations in the first place. And describing and explaining each of these destinations, especially if they're over a hundred, might actually be beyond the scope of an image description, especially since these destinations usually aren't shown in the image itself.
Avatars, just like in-world objects and everything more or less similar, require detailed, extensive descriptions and explanations. People need to understand how avatars work in this kind of world, and of course, blind or visually-impaired people want to know what these avatars look like. Each and every last one of them. Again, how are they supposed to know otherwise?
I'm not quite sure whether or not it's smart to always give the names of all avatars in the image. Finding them out is easy, but when writing a description, especially for a party picture with dozens of avatars in it, associating the depictions of avatars in the image with identities has to be done right away, before even one of these avatars leaves the location.
One thing that needs to be explained right afterwards is how avatars are built. In the cases of Second Life and OpenSim, this means explaining that they usually aren't "monobloc" avatars that can't be modified in-world. Instead, they are modular, put together from lots of elements, usually starting with a mesh body that "replaces" the default system body normally rendered by the viewer, continuing with a skin texture, an eye texture and a shape with over 80 different parameters and ending with clothes and accessories. Of course, this requires an explanation on what "mesh" is, why it's special and when and why it was introduced.
OpenSim also supports script-controlled NPCs which require their own explanation, including that NPCs don't exist in Second Life, and how they work in OpenSim. Animesh exists both in Second Life and OpenSim and requires its own explanation again.
After these explanations, the actual visual description can begin. And it can and has to be every bit as extensive and detailed as for everything else in the picture.
The sex of an avatar does not have to be avoided in the description, at least not in Second Life and OpenSim. There, you basically only have two choices: masculine men and feminine women. Deviating from that is extremely difficult, so next to nobody does that. The few people who actually declare their avatars trans describe them as such in the profile. The only other exception are "women with a little extra". All other avatars can safely be assumed to be cis, and their visual sex can be used to describe them.
In virtual worlds, especially Second Life and OpenSim, there is no reason not to mention the skin tone either. A skin is just that: a skin. It can be replaced with just about any other skin on any avatar without changing anything else. It doesn't even have to be natural. It can be snow white, or it can be green, or it can be the grey of bare metal. In fact, in order to satisfy those who are really curious about virtual worlds, it's even necessary to mention if a skin is photo-realistic and has highlights and shades baked on.
Following that comes a description of what the avatar wears, including the hairstyle. This, too, should go into detail and mention things that are so common in real life that nobody would waste a thought about them, such as whether there are creases or crinkles on a piece of clothing at all, and if so, if they're actually part of the 3-D model or only painted on.
Needless to say that non-standard avatars, e.g. dragons, require the same amount of detail when describing them.
Now, only describing what an avatar looks like isn't enough. It's also necessary to describe what the avatar does, which means a detailed description of its posture and facial expression. Just about all human avatars in Second Life and OpenSim have support for facial expressions, even though they usually wear a neutral, non-descript one. But even that needs to be mentioned.
They say that if there's text somewhere in a picture, it has to be transcribed verbatim in the image description. However, there is no definite rule for text that is too small to be readable, partially obscured by something in front of it or only partially within the borders of the image.
Text not only appears in screenshots of social media posts, photographs of news articles and the like. It may appear in all kinds of photographs, and it may just as well appear in digital renderings from 3-D virtual worlds. It can be on posters, it can be on billboards, it can be on big and small signs, it can be on store marquees, it can be printed on people's clothes, it can be anywhere.
Again, the basic rule is: If there's text, it has to be transcribed.
Now you might say that transcribing illegible text is completely out of the question. It can't be read anyway, so it can't be transcribed either. Case closed.
Not so fast. It's true that this text can't be read in the picture. But that one picture is not necessarily the only source for the text in question. If the picture is a real-life photograph, the last resort would be to go back to where the picture was taken, look around more closely and transcribe the bits of text from there.
Granted, that's difficult if whatever a text was on is no longer there, e.g. if it was printed on a T-shirt. And yes, it's extra effort, too much effort if you're at home posting pictures which you've taken during your overseas vacation. Flying back there just to transcribe text is completely out of the question.
This is a non-issue for pictures from virtual worlds. In most cases, you can always go back to where you've taken a picture, take closer looks at signs and posters and so on, look behind trees or columns or whatever is standing in front of a sign and partly covering it and easily transcribe everything. Or you take the picture and write the description without even leaving first. You can stay there until you're done describing and transcribing everything.
At least Second Life and OpenSim also allow you to move your camera and therefore your vision independently from your avatar. That really makes it possible to take very close looks at just about everything, regardless of whether or not you can get close enough with your avatar.
There are only four cases in which in-world text does not have to be fully transcribed. One, it's incomplete in-world; in this case, transcribe what is there. Two, it's illegible in-world, for example due to a too low texture resolution or texture quality; that's bad luck. Three, it is fully obscured, either because it is fully covered by something else, or because it's on a surface completely facing away from the camera. And four, it isn't even within the borders of the image.
In all other cases, there is no reason not to transcribe text. The text being illegible in the picture isn't. In fact, that's rather a reason to transcribe it: Even sighted people need help figuring out what's written there. And people who are super-curious about virtual worlds and want to know everything about them will not stop at text.
Yeah, that's all tough, I know. And I can understand if you as the audience are trying to weasel out of having to read such a massive image description. You're trying to get me to not write that much. You're trying to find a situation in which writing so much is not justified, not necessary. Or better yet, enough situations that they become the majority, so that a full description ends up necessary only in extremely niche edge cases which you hope to never come across. You want to see that picture, but you want to see it without thousands or tens of thousands of words of description.
Let me tell you something: There is no such situation. There is no context in which such a huge image description wouldn't be necessary.
The picture could be part of a post of someone who has visited that place and wants to tell everyone about it. Even if the post itself has only got 200 characters.
The picture could be part of an announcement of an event that's planned to take place there.
The picture could be part of a post from that very event. Or about the event after it has happened.
The picture could be part of an interview with the owners.
The picture could be part of a post about famous locations in OpenSim.
The picture could be part of a post about the Discovery Grid.
The picture could be part of a post about OpenSim in general.
The picture could be part of a post or thread about 6 obscure virtual worlds that you've probably never heard of, and number 4 is really awesome.
The picture could be part of a post about virtual architecture.
The picture could be part of a post about the concept of virtual libraries or bookstores.
The picture could be part of a recommendation of cool OpenSim places to visit.
It doesn't matter. All these cases require the full image description with all its details. And so do all those which I haven't mentioned. There will always be someone coming across the post with the picture who needs the description.
See, I've learned something about the Fediverse. You can try to limit your target audience. But you can't limit your actual audience.
It'd be much easier for me if I could only post to people who know OpenSim and actually lock everyone else out. But I can't.
On the World-Wide Web, it's easy. If you write something niche, pretty much only people interested in that niche will see your content because only they will even look for content like yours. Content has to be actively dug out, but in doing so, you can pick what kind of content to dig out.
In the Fediverse, anyone will come across stuff that they know nothing about, whether they're interested in it or not. Even elaborate filtering of the personal timeline isn't fail-safe. And then there are local and federated timelines on which all kinds of stuff appear.
No matter how hard you try to only post to a specific audience, it is very likely that someone who knows nothing about your topic will see your post on the federated timeline on mastodon.social. It's rude to keep clueless casuals from following you, even though all they do is follow absolutely everyone because they need that background noise of uninteresting stuff on their personal timeline that they have on X due to The Algorithm. And it's impossible to keep people from boosting your posts to clueless casuals, whether these people are your own connections and familiar with your topic, or they've discovered your most recent post on their federated timeline.
You can't keep clueless casuals who need an extensive image description to understand your picture from coming across it. Neither can you keep blind or visually-impaired users who need an image description to even experience the picture in the first place from coming across it.
Neither, by the way, can you keep those who demand everyone always give a sufficient description for any image from coming across yours. And I'm pretty sure that some of them not only demand that from those whom they follow, but from those whose picture posts they come across on the local or federated timelines as well.
Sure, you can ignore them. You can block them. You can flip them the imaginary or actual bird. And then you can refuse to give a description altogether. Or you can put a short description into the alt-text which actually doesn't help at all. Sure, you can do that. But then you have to cope with having a Fediverse-wide reputation as an ableist swine.
The only alternative is to do it right and give those who need a sufficiently informative image description what they need. In the case of virtual worlds, as I've described, "sufficiently informative" starts at several thousand words.
And this is why pictures from virtual worlds always need extremely long image descriptions.
Set of hashtags to see if they're federated across the Fediverse:
#ImageDescription #ImageDescriptions #AltText #Accessibility #Inclusion #Inclusivity #OpenSim #OpenSimulator #SecondLife #Metaverse #VirtualWorlds
That's because pictures posted in the Fediverse need image descriptions. Useful and sufficiently informative image descriptions. And to my understanding, even Hubzilla articles are part of the Fediverse because they're part of Hubzilla. So the exact same rules apply to them that apply to posts. Including image descriptions being an absolute requirement.
And a useful and sufficiently informative image description for a picture from a virtual world has to be absolutely massive. In fact, it can't be done within Mastodon's limits. Not even the 1,500 characters offered for alt-text are enough. Not nearly.
Over the last 12 or 13 months, I've developed my image-describing style, and it's still evolving. However, this also means my image descriptions get more and more detailed with more and more explanations, and so they tend to grow longer and longer.
My first attempt at writing a detailed, informative description for a picture from a virtual world was in November, 2022. It started at over 11,000 characters already and grew beyond 13,000 characters a bit later when I re-worked it and added a missing text transcript. Most recently, I've broken the 40,000-character barrier, also because I've raised my standards to describing pictures within pictures within a picture. I've taken over 13 hours to describe one single picture twice already.
I rarely get any feedback for my image descriptions. But I sometimes have to justify their length, especially to sighted Fediverse users who don't care for virtual worlds.
Sure, most people who come across my pictures don't care for virtual worlds at all. But most people who come across my pictures are fully sighted and don't require any image descriptions. It's still good manners to provide them.
And there may pretty well be people who are very excited about and interested in virtual worlds, especially if it's clear that these are actually existing, living, breathing virtual worlds and not some cryptobro's imagination. And they may want to know everything about these worlds. But they know nothing. They look at the pictures, but they can't figure out from looking at the pictures what these pictures show. Nothing that's in these pictures is really familiar to them.
So when describing a picture from a virtual world, one must never assume that anything in the picture is familiar to the on-looker. In most cases, it is not.
Also, one might say that only sighted people are interested in virtual worlds because virtual worlds are a very visual medium and next to impossible to navigate without eyesight. Still, blind or visually-impaired people may be just as fascinated by virtual worlds as sighted people. And they may be at least just as curious which means they may require even more description and explanation. They want to know what everything looks like, but since they can't see it for themselves, they have to be told.
All this is why pictures from virtual worlds require substantially more detailed and thus much, much longer descriptions than real-life photographs.
The medium
The wordiness of descriptions for images from virtual worlds starts with the medium. It's generally said that image descriptions must not start with "Picture of" or "Image of". Some even say that mentioning the medium, i.e. "Photograph of", is too much.
Unless it isn't a digital photograph. And no, it isn't always a digital photograph.
It can just as well be a digitised analogue photograph, film grain and all. It can be a painting. It can be a sketch. It can be a graph. It can be a screenshot of a social media post. It can be a scanned newspaper page.
Or it can be a digital rendering.
Technically speaking, virtual world images are digital renderings. But just writing "digital rendering" isn't enough.
If I only wrote "digital rendering", people would think of spectacular, state-of-the-art, high-resolution digital art with ray-tracing and everything. Like stills from Cyberpunk 2077 for which the graphics settings were temporarily cranked up to levels at which the game becomes unplayable, just to show off. Or like promotional pictures from a Pixar film. Or like the stuff we did in POV-Ray back in the day, when the single-core CPU ran at full blast for half an hour, but the outcome was a gorgeous screen-sized 3-D picture.
But images from the virtual worlds I frequent are nothing like this. Ray-tracing isn't even an option. It's unavailable. It's technologically impossible. So there is no fancy ray-tracing with fully reflective surfaces and whatnot. But there are shaders with stuff like ambient occlusion.
So where other people may or may not write "photograph", I have to write something like "digital 3-D rendering created using shaders, but without ray-tracing".
The location
If you think that was wordy, think again. Mentioning the location is much worse. And mentioning the location is mandatory in this case.
I mean, it's considered good style to always write where a picture was taken unless, maybe, it was at someone's home, or the location of something is classified.
In real life, that's easy. And except for digital art, digitally generated graphs and pictures of text, almost all pictures in the Fediverse were taken in real life.
In real life, you can often get away with name-dropping. Most people know at least roughly what "Times Square" refers to. Or "Piccadilly Circus". Or "Monument Valley". Or "Stonehenge". There is no need to break down where these places are. It can be considered common knowledge.
In fact, you get away even more easily with name-dropping landmarks without telling where they are. White House. Empire State Building. Tower Bridge. Golden Gate Bridge. Mount Fuji. Eiffel Tower. Taj Mahal. Sydney Opera House which, admittedly, name-drops its rough location, just like the Hollywood Sign. All these are names that should ring a bell.
But you can't do that in virtual worlds. In no virtual world can you do that. Not even in Roblox which has twice as many users as Germany has citizens. Much less in worlds running on OpenSim, all of which combined are estimated to have fewer than 50,000 unique monthly users. Whatever "unique" means, considering that many users have more than one avatar in more than one of these worlds.
Such tiny user numbers mean that there are even more people who don't use these worlds, who therefore are completely unfamiliar with these worlds. Who, in fact, don't even know these worlds exist. I'm pretty sure there isn't a single paid Metaverse expert of any kind who has ever even heard of OpenSimulator. They know Horizons, they know The Sandbox, they know Decentraland, they know Rec Room, they know VRChat, they know Roblox and so forth, they may even be aware that Second Life is still around, but they've never in their lives heard of OpenSim. It's that obscure.
So imagine I just name-dropped...
[...] the Sendalonde Community Library.
What'd that tell you?
It'd tell you nothing. You wouldn't know what that is. I couldn't blame you. Right off the bat, I know only two other Fediverse users who definitely know that building because I was there with them. Maybe a few more have been there before. Definitely much fewer than 50. Likely fewer than 20. Out of millions.
Okay, let's add where it is.
[...] the Sendalonde Community Library in Sendalonde.
Does that help?
No, it doesn't. If you don't know the Sendalonde Community Library, you don't know what and where Sendalonde is either. That place is only known for its spectacular library building.
And you've probably never heard of a real-life place with that name. Of course you haven't. That place isn't in real life.
So I'd have to add some more information.
[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid.
What's the Discovery Grid? And what's a grid in this context, and why is it called a grid?
Well, then I have to get even wordier.
[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid which is a 3-D virtual world based on OpenSimulator.
Nobody, absolutely nobody writes that much about a real-life location. Ever.
And still, while you know that I'm talking about a place in a virtual world and what that virtual world is based on, while this question is answered, it raises a new question: What is OpenSimulator?
I wouldn't blame you for asking that. Again, even Metaverse experts don't know OpenSimulator. I'm pretty sure that nobody in the Open Metaverse Interoperability Group, in the Open Metaverse Alliance and at the Open Metaverse Foundation has ever heard of OpenSim. The owners and operators of most existing virtual worlds have never heard of OpenSim except those of Second Life, Overte and maybe a few others. Most Second Life users, present and past, have never heard of OpenSim. Most users of most other virtual worlds, present and past, have never heard of OpenSim.
And billions of people out there believe that Zuckerberg invented "The Metaverse", and that his virtual worlds are actually branded "Metaverse® ("Metaverse" is a registered trademark of Meta Platforms, Inc. All rights reserved.)" Hardly anyone knows that the term "metaverse" was coined by Neal Stephenson in his cyberpunk novel Snow Crash which, by the way, inspired Philip Rosedale to create Second Life. And nobody knows that the term "metaverse" has been part of regular OpenSim users' vocabulary since before 2010. Because nobody knows OpenSim.
And that's why I can't just name-drop "OpenSimulator" either. I have to explain even that.
[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid which is a 3-D virtual world based on OpenSimulator.
OpenSimulator (official website and wiki), OpenSim in short, is a free and open-source platform for 3-D virtual worlds that uses largely the same technology as the commercial virtual world Second Life.
That alone would be longer than your typical cat-picture alt-text.
But it'd create misconceptions, namely of OpenSim being another walled-garden, headset-only VR platform that has jumped upon the "Metaverse" bandwagon. Because that's what people know about virtual worlds, if anything. So that's what they automatically assume. And that's wrong.
I'd have to keep that from happening by telling people that OpenSim is as decentralised and federated as the Fediverse, only that it even predates Laconi.ca, not to mention Mastodon. Okay, and it only federates with itself and some of its own forks because OpenSim doesn't run on a standardised protocol, and nobody else has ever created anything compatible.
[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid which is a 3-D virtual world based on OpenSimulator.
OpenSimulator (official website and wiki), OpenSim in short, is a free and open-source platform for 3-D virtual worlds that uses largely the same technology as the commercial virtual world Second Life. It was launched as early as 2007, and most of it became a network of federated, interconnected worlds when the Hypergrid was introduced in 2008. It is accessed through client software running on desktop or laptop computers, so-called "viewers". It doesn't require a virtual reality headset, and it actually doesn't support virtual reality headsets.
This is longer than most alt-texts on Mastodon. Just this part alone.
But it still leaves one question unanswered: "Discovery Grid? What's that? Why is it called a grid? What's a grid in this context?"
So I'd have to add yet another paragraph.
[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid which is a 3-D virtual world based on OpenSimulator.
OpenSimulator (official website and wiki), OpenSim in short, is a free and open-source platform for 3-D virtual worlds that uses largely the same technology as the commercial virtual world Second Life. It was launched as early as 2007, and most of it became a network of federated, interconnected worlds when the Hypergrid was introduced in 2008. It is accessed through client software running on desktop or laptop computers, so-called "viewers". It doesn't require a virtual reality headset, and it actually doesn't support virtual reality headsets.
Just like Second Life's virtual world, worlds based on OpenSim are referred to as "grids" because they are divided into square fields of 256 by 256 metres, so-called "regions". These regions can be empty and inaccessible, or there can be a "simulator" or "sim" running in them. Only these sims count as the actual land area of a grid. It is possible to both look into neighbouring sims and move your avatar across sim borders unless access limitations prevent this.
I'm well past 1,000 characters now. Other people paint entire pictures with words with that many characters. I need them only to explain where a picture was taken. But this should answer all immediate questions and make clear what kind of place the picture shows.
The main downside, apart from the length, which some Mastodon users already consider too long for a full image description, is that this will be outdated, should the decision be made to move Sendalonde to another grid again.
And I haven't even started actually describing the image. Blind or visually-impaired users still don't know what it actually shows.
The actual content of the image
If this were a place in real life, I might get away with name-dropping the Sendalonde Community Library and briefly mentioning that there are some trees around it, and there's a body of water in the background. It'd be absolutely sufficient.
But such a virtual place is something that next to nobody is familiar with. Non-sighted people even less because they're even more unlikely to visit virtual worlds. That's a highly visual medium and usually not really inclusive for non-sighted users.
So if I only name-dropped the Sendalonde Community Library, mentioned where it is located and explained what OpenSim is, I wouldn't be done. There would be blind or visually-impaired people inquiring, "Okay, but what does it look like?" Ditto people with poor internet for whom the image doesn't load.
Sure they would. Because they honestly wouldn't know what it looks like. Because even the sighted users with poor internet have never seen it before. But they would want to know.
So I'd have to tell them. Not doing so would be openly ableist.
And no, one sentence isn't enough. This is a very large, highly complex, highly detailed building and not just a box with a doorway and a sign on it. Besides, remember that we're talking about a virtual world. Architecture in virtual worlds is not bound to the same limits and laws and standards and codes as in real life. Just about everything is possible. So absolutely nothing can ever be considered "a given" and therefore unnecessary to be mentioned.
Now, don't believe that blind or visually-impaired people will limit their "What does it look like?" to the centre-piece of the picture. If you mention something being there, they want to know what it looks like. Always. Regardless of whether or not they used to be sighted, they still don't know what whatever you've mentioned looks like specifically in a virtual world. And, again, it's likely that they don't know what it looks like at all.
Thus, if I mention it, I have to describe it. Always. All of it.
There are exactly two exceptions. One, if something is fully outside the borders of the image. Two, if something is fully covered up by something else. And I'm not even entirely sure about the latter case.
Sometimes, a visual description isn't even enough. Sometimes, I can mention that something is somewhere in the picture. I can describe what that something looks like in all details. But people still don't know what it is.
I can mention that there's an OpenSimWorld beacon standing somewhere. I can describe its looks with over 1,000 words and so much accuracy that an artist could make a fairly accurate drawing of it just from my description.
But people, the artist included, still would not know what an OpenSimWorld beacon is in the first place, nor what it's there for.
So I have to explain what an OpenSimWorld beacon is and what it does.
Before I can do that, I first have to explain what OpenSimWorld is. And that won't be possible with a short one-liner. OpenSimWorld is a very multi-purpose website. Explaining it will require a four-digit number of characters.
Only after I'm done explaining OpenSimWorld can I start explaining the beacon. And the beacon is quite multi-functional itself. On top of that, I'll have to explain the concept of teleporting around in OpenSim, especially from grid to grid through the Hypergrid.
This is why I generally avoid having OSW beacons in my pictures.
Teleporters themselves aren't quite as bad, but they, too, require lots and lots of words. They have to be described. If there's a picture on them, maybe one that shows a preview of the chosen destination, that picture has to be described. All of a sudden, I have an entire second image to write a description for. And then I have to explain what that teleporter is, what it does, how it works, how it's operated. They don't know teleporters because there are no teleporters in real life.
At least I might not have to explain to them which destinations the teleporter can send an avatar to. The people who need all these descriptions and explanations won't have any use for this particular information because they don't even know the destinations in the first place. And describing and explaining each of these destinations, especially if they're over a hundred, might actually be beyond the scope of an image description, especially since these destinations usually aren't shown in the image itself.
Avatars
Just like in-world objects, avatars and everything more or less similar require detailed, extensive descriptions and explanations. People need to understand how avatars work in this kind of world, and of course, blind or visually-impaired people want to know what these avatars look like. Each and every last one of them. Again, how are they supposed to know otherwise?
I'm not quite sure whether or not it's smart to always give the names of all avatars in the image. It's easy to find them out, but when writing a description, especially for a party picture with dozens of avatars in it, associating the avatars in the image with their identities has to be done right away, before even one of these avatars leaves the location.
One thing that needs to be explained right afterwards is how avatars are built. In the cases of Second Life and OpenSim, this means explaining that they usually aren't "monobloc" avatars that can't be modified in-world. Instead, they are modular, put together from lots of elements, usually starting with a mesh body that "replaces" the default system body normally rendered by the viewer, continuing with a skin texture, an eye texture and a shape with over 80 different parameters and ending with clothes and accessories. Of course, this requires an explanation of what "mesh" is, why it's special and when and why it was introduced.
OpenSim also supports script-controlled NPCs which require their own explanation, including that NPCs don't exist in Second Life, and how they work in OpenSim. Animesh exists both in Second Life and OpenSim and requires its own explanation again.
After these explanations, the actual visual description can begin. And it can and has to be every bit as extensive and detailed as for everything else in the picture.
The sex of an avatar does not have to be avoided in the description, at least not in Second Life and OpenSim. There, you basically only have two choices: masculine men and feminine women. Deviating from that is extremely difficult, so next to nobody does it. The few people who actually declare their avatars trans describe them as such in their profiles. The only other exception is "women with a little extra". All other avatars can safely be assumed to be cis, and their visual sex can be used to describe them.
In virtual worlds, especially Second Life and OpenSim, there is no reason not to mention the skin tone either. A skin is just that: a skin. It can be replaced with just about any other skin on any avatar without changing anything else. It doesn't even have to be natural. It can be snow white, or it can be green, or it can be the grey of bare metal. In fact, in order to satisfy those who are really curious about virtual worlds, it's even necessary to mention if a skin is photo-realistic and has highlights and shades baked on.
Following that comes a description of what the avatar wears, including the hairstyle. This, too, should go into detail and mention things that are so common in real life that nobody would waste a thought about them, such as whether there are creases or crinkles on a piece of clothing at all, and if so, if they're actually part of the 3-D model or only painted on.
Needless to say that non-standard avatars, e.g. dragons, require the same amount of detail when describing them.
Now, only describing what an avatar looks like isn't enough. It's also necessary to describe what the avatar does, which means a detailed description of its posture and facial expression. Just about all human avatars in Second Life and OpenSim support facial expressions, even though they usually wear a neutral, non-descript one. But even that needs to be mentioned.
Text transcripts
They say that if there's text somewhere in a picture, it has to be transcribed verbatim in the image description. However, there is no definite rule for text that is too small to be readable, partially obscured by something in front of it or only partially within the borders of the image.
Text not only appears in screenshots of social media posts, photographs of news articles and the like. It may appear in all kinds of photographs, and it may just as well appear in digital renderings from 3-D virtual worlds. It can be on posters, it can be on billboards, it can be on big and small signs, it can be on store marquees, it can be printed on people's clothes, it can be anywhere.
Again, the basic rule is: If there's text, it has to be transcribed.
Now you might say that transcribing illegible text is completely out of the question. It can't be read anyway, so it can't be transcribed either. Case closed.
Not so fast. It's true that this text can't be read in the picture. But that one picture is not necessarily the only source for the text in question. If the picture is a real-life photograph, the last resort would be to go back to where the picture was taken, look around more closely and transcribe the bits of text from there.
Granted, that's difficult if whatever a text was on is no longer there, e.g. if it was printed on a T-shirt. And yes, that's extra effort, too much of an effort if you're at home posting pictures which you've taken during your overseas vacation. Flying back there just to transcribe text is completely out of the question.
This is a non-issue for pictures from virtual worlds. In most cases, you can simply go back to where you've taken a picture, take closer looks at signs and posters and so on, look behind trees or columns or whatever is standing in front of a sign and partly covering it, and easily transcribe everything. Or you take the picture and write the description without even leaving first. You can stay there until you're done describing and transcribing everything.
At least Second Life and OpenSim also allow you to move your camera and therefore your vision independently from your avatar. That really makes it possible to take very close looks at just about everything, regardless of whether or not you can get close enough with your avatar.
There are only four cases in which in-world text does not have to be fully transcribed. One, it's incomplete in-world; in this case, transcribe what is there. Two, it's illegible in-world, for example because the texture resolution or quality is too low; that's bad luck. Three, it is fully obscured, either because it is fully covered by something else, or because it's on a surface completely facing away from the camera. And four, it isn't even within the borders of the image.
In all other cases, there is no reason not to transcribe text. The text being illegible in the picture isn't one. In fact, that's rather a reason to transcribe it: Even sighted people need help figuring out what's written there. And people who are super-curious about virtual worlds and want to know everything about them will not stop at text.
But why?
Yeah, that's all tough, I know. And I can understand if you as the audience are trying to weasel out of having to read such a massive image description. You're trying to get me to not write that much. You're trying to find a situation in which writing so much is not justified, not necessary. Or better yet, enough situations that they become the majority, so that a full description ends up only necessary in extremely niche edge cases that you hope to never come across. You want to see that picture, but you want to see it without thousands or tens of thousands of words of description.
Let me tell you something: There is no such situation. There is no context in which such a huge image description wouldn't be necessary.
The picture could be part of a post of someone who has visited that place and wants to tell everyone about it. Even if the post itself has only got 200 characters.
The picture could be part of an announcement of an event that's planned to take place there.
The picture could be part of a post from that very event. Or about the event after it has happened.
The picture could be part of an interview with the owners.
The picture could be part of a post about famous locations in OpenSim.
The picture could be part of a post about the Discovery Grid.
The picture could be part of a post about OpenSim in general.
The picture could be part of a post or thread about 6 obscure virtual worlds that you've probably never heard of, and number 4 is really awesome.
The picture could be part of a post about virtual architecture.
The picture could be part of a post about the concept of virtual libraries or bookstores.
The picture could be part of a recommendation of cool OpenSim places to visit.
It doesn't matter. All these cases require the full image description with all its details. And so do all those which I haven't mentioned. There will always be someone coming across the post with the picture who needs the description.
See, I've learned something about the Fediverse. You can try to limit your target audience. But you can't limit your actual audience.
It'd be much easier for me if I could only post to people who know OpenSim and actually lock everyone else out. But I can't.
On the World-Wide Web, it's easy. If you write something niche, pretty much only people interested in that niche will see your content because only they will even look for content like yours. Content has to be actively dug out, but in doing so, you can pick what kind of content to dig out.
In the Fediverse, anyone will come across stuff that they know nothing about, whether they're interested in it or not. Even elaborate filtering of the personal timeline isn't fail-safe. And then there are local and federated timelines on which all kinds of stuff appear.
No matter how hard you try to only post to a specific audience, it is very likely that someone who knows nothing about your topic will see your post on the federated timeline on mastodon.social. It's rude to keep clueless casuals from following you, even though all they do is follow absolutely everyone because they need that background noise of uninteresting stuff on their personal timeline that they have on X due to The Algorithm. And it's impossible to keep people from boosting your posts to clueless casuals, whether these people are your own connections and familiar with your topic, or they've discovered your most recent post on their federated timeline.
You can't keep clueless casuals who need an extensive image description to understand your picture from coming across it. Neither can you keep blind or visually-impaired users who need an image description to even experience the picture in the first place from coming across it.
Neither, by the way, can you keep those who demand everyone always give a sufficient description for any image from coming across yours. And I'm pretty sure that some of them not only demand that from those whom they follow, but from those whose picture posts they come across on the local or federated timelines as well.
Sure, you can ignore them. You can block them. You can flip them the imaginary or actual bird. And then you can refuse to give a description altogether. Or you can put a short description into the alt-text which actually doesn't help at all. Sure, you can do that. But then you have to cope with having a Fediverse-wide reputation as an ableist swine.
The only alternative is to do it right and give those who need a sufficiently informative image description what they need. In the case of virtual worlds, as I've described, "sufficiently informative" starts at several thousand words.
And this is why pictures from virtual worlds always need extremely long image descriptions.
Set of hashtags to see if they're federated across the Fediverse:
#ImageDescription #ImageDescriptions #AltText #Accessibility #Inclusion #Inclusivity #OpenSim #OpenSimulator #SecondLife #Metaverse #VirtualWorlds