Why descriptions for images from virtual worlds have to be so long and extensive

Jupiter Rowland

jupiter_rowland@hub.netzgemeinde.eu

Duothematic channel. Primary topic is virtual worlds/OpenSim, secondary topic is the Fediverse beyond Mastodon. This channel is NOT about real life!

2023-12-16 12:26:14

Whenever I describe a picture from a virtual world, the description grows far beyond everyone's wildest imaginations in size; here's why

I rarely post pictures from virtual worlds anymore. I'd really like to show them to Fediverse users, including those who know nothing about them. But I rarely do that anymore. Not in posts, not even in Hubzilla articles.

That's because pictures posted in the Fediverse need image descriptions. Useful and sufficiently informative image descriptions. And to my understanding, even Hubzilla articles are part of the Fediverse because they're part of Hubzilla. So the exact same rules apply to them that apply to posts. Including image descriptions being an absolute requirement.

And a useful and sufficiently informative image description for a picture from a virtual world has to be absolutely massive. In fact, it can't be done within Mastodon's limits. Not even the 1,500 characters offered for alt-text are enough. Not nearly.

Over the last 12 or 13 months, I've developed my image-describing style, and it's still evolving. However, this also means my image descriptions get more and more detailed with more and more explanations, and so they tend to grow longer and longer.

My first attempt at writing a detailed, informative description for a picture from a virtual world was in November 2022. It started at over 11,000 characters already and grew beyond 13,000 characters a bit later when I re-worked it and added a missing text transcript. Most recently, I've broken the 40,000-character barrier, also because I've raised my standards to describing pictures within pictures within a picture. Twice already, I've taken over 13 hours to describe one single picture.
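To put these numbers next to the 1,500-character alt-text limit mentioned above, here is a minimal sketch of the arithmetic. The function name `alt_texts_needed` is made up for illustration, and the lengths are only the rounded lower bounds given above, not exact counts.

```python
# Assumption: the 1,500-character alt-text limit quoted in the text above.
ALT_TEXT_LIMIT = 1500


def alt_texts_needed(description_length: int, limit: int = ALT_TEXT_LIMIT) -> int:
    """Return how many alt-text fields of `limit` characters a description would fill."""
    # Ceiling division on integers: any remainder needs one more field.
    return -(-description_length // limit)


# Rounded lower bounds from the text: first attempt, re-worked version, recent record.
for length in (11_000, 13_000, 40_000):
    print(f"{length} characters fill {alt_texts_needed(length)} alt-texts")
```

Even the first attempt would have needed several alt-texts stitched together, which is why the alt-text alone can't carry such a description.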

I rarely get any feedback for my image descriptions. But I sometimes have to justify their length, especially to sighted Fediverse users who don't care for virtual worlds.

Sure, most people who come across my pictures don't care for virtual worlds at all. But most people who come across my pictures are fully sighted and don't require any image descriptions. It's still good manners to provide them.

And there may pretty well be people who are very excited about and interested in virtual worlds, especially if it's clear that these are actually existing, living, breathing virtual worlds and not some cryptobro's imagination. And they may want to know everything about these worlds. But they know nothing. They look at the pictures, but they can't figure out from looking at the pictures what these pictures show. Nothing that's in these pictures is really familiar to them.

So when describing a picture from a virtual world, one must never assume that anything in the picture is familiar to the on-looker. In most cases, it is not.

Also, one might say that only sighted people are interested in virtual worlds because virtual worlds are a very visual medium and next to impossible to navigate without eyesight. Still, blind or visually-impaired people may be just as fascinated by virtual worlds as sighted people. And they may be at least as curious, which means they may require even more description and explanation. They want to know what everything looks like, but since they can't see it for themselves, they have to be told.

All this is why pictures from virtual worlds require substantially more detailed and thus much, much longer descriptions than real-life photographs.

The medium

The wordiness of descriptions for images from virtual worlds starts with the medium. It's generally said that image descriptions must not start with "Picture of" or "Image of". Some even say that mentioning the medium, i.e. "Photograph of", is too much.

Unless it is not a digital photograph. And no, it isn't always a digital photograph.

It can just as well be a digitised analogue photograph, film grain and all. It can be a painting. It can be a sketch. It can be a graph. It can be a screenshot of a social media post. It can be a scanned newspaper page.

Or it can be a digital rendering.

Technically speaking, virtual world images are digital renderings. But just writing "digital rendering" isn't enough.

If I only wrote "digital rendering", people would think of spectacular, state-of-the-art, high-resolution digital art with ray-tracing and everything. Like stills from Cyberpunk 2077 for which the graphics settings were temporarily cranked up to levels at which the game becomes unplayable, just to show off. Or like promotional pictures from a Pixar film. Or like the stuff we did in POV-Ray back in the day, when the single-core CPU ran at full blast for half an hour, but the outcome was a gorgeous screen-sized 3-D picture.

But images from the virtual worlds I frequent are nothing like this. Ray-tracing isn't even an option. It's unavailable. It's technologically impossible. So there is no fancy ray-tracing with fully reflective surfaces and whatnot. But there are shaders with stuff like ambient occlusion.

So where other people may or may not write "photograph", I have to write something like "digital 3-D rendering created using shaders, but without ray-tracing".

The location

If you think that was wordy, think again. Mentioning the location is much worse. And mentioning the location is mandatory in this case.

I mean, it's considered good style to always write where a picture was taken unless, maybe, it was at someone's home, or the location of something is classified.

In real life, that's easy. And except for digital art, digitally generated graphs and pictures of text, almost all pictures in the Fediverse were taken in real life.

In real life, you can often get away with name-dropping. Most people know at least roughly what "Times Square" refers to. Or "Piccadilly Circus". Or "Monument Valley". Or "Stonehenge". There is no need to break down where these places are. It can be considered common knowledge.

In fact, you get away even more easily with name-dropping landmarks without telling where they are. White House. Empire State Building. Tower Bridge. Golden Gate Bridge. Mount Fuji. Eiffel Tower. Taj Mahal. Sydney Opera House which, admittedly, name-drops its rough location, just like the Hollywood Sign. All these are names that should ring a bell.

But you can't do that in virtual worlds. In no virtual world can you do that. Not even in Roblox which has twice as many users as Germany has citizens. Much less in worlds running on OpenSim, all of which combined are estimated to have fewer than 50,000 unique monthly users. Whatever "unique" means, considering that many users have more than one avatar in more than one of these worlds.

Such tiny user numbers mean that there are even more people who don't use these worlds, who therefore are completely unfamiliar with these worlds. Who, in fact, don't even know these worlds exist. I'm pretty sure there isn't a single paid Metaverse expert of any kind who has ever even heard of OpenSimulator. They know Horizon Worlds, they know The Sandbox, they know Decentraland, they know Rec Room, they know VRChat, they know Roblox and so forth, they may even be aware that Second Life is still around, but they've never in their lives heard of OpenSim. It's that obscure.

So imagine I just name-dropped...

[...] the Sendalonde Community Library.

What'd that tell you?

It'd tell you nothing. You wouldn't know what that is. I couldn't blame you. Right off the bat, I know only two other Fediverse users who definitely know that building because I was there with them. Maybe a few more have been there before. Definitely much fewer than 50. Likely fewer than 20. Out of millions.

Okay, let's add where it is.

[...] the Sendalonde Community Library in Sendalonde.

Does that help?

No, it doesn't. If you don't know the Sendalonde Community Library, you don't know what and where Sendalonde is either. That place is only known for its spectacular library building.

And you've probably never heard of a real-life place with that name. Of course you haven't. That place isn't in real life.

So I'd have to add some more information.

[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid.

What's the Discovery Grid? And what's a grid in this context, and why is it called a grid?

Well, then I have to get even wordier.

[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid which is a 3-D virtual world based on OpenSimulator.

Nobody, absolutely nobody writes that much about a real-life location. Ever.

And still, while you know that I'm talking about a place in a virtual world and what that virtual world is based on, while this question is answered, it raises a new question: What is OpenSimulator?

I wouldn't blame you for asking that. Again, even Metaverse experts don't know OpenSimulator. I'm pretty sure that nobody in the Open Metaverse Interoperability Group, in the Open Metaverse Alliance and at the Open Metaverse Foundation has ever heard of OpenSim. The owners and operators of most existing virtual worlds have never heard of OpenSim except those of Second Life, Overte and maybe a few others. Most Second Life users, present and past, have never heard of OpenSim. Most users of most other virtual worlds, present and past, have never heard of OpenSim.

And billions of people out there believe that Zuckerberg invented "The Metaverse", and that his virtual worlds are actually branded "Metaverse® ("Metaverse" is a registered trademark of Meta Platforms, Inc. All rights reserved.)". Hardly anyone knows that the term "metaverse" was coined by Neal Stephenson in his cyberpunk novel Snow Crash which, by the way, inspired Philip Rosedale to create Second Life. And nobody knows that the term "metaverse" has been part of the regular OpenSim users' vocabulary since before 2010. Because nobody knows OpenSim.

And that's why I can't just name-drop "OpenSimulator" either. I have to explain even that.

[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid which is a 3-D virtual world based on OpenSimulator.

OpenSimulator (official website and wiki), OpenSim in short, is a free and open-source platform for 3-D virtual worlds that uses largely the same technology as the commercial virtual world Second Life.

That alone would be more than your typical cat picture alt-text.

But it'd create misconceptions, namely of OpenSim being another walled-garden, headset-only VR platform that has jumped upon the "Metaverse" bandwagon. Because that's what people know about virtual worlds, if anything. So that's what they automatically assume. And that's wrong.

I'd have to keep that from happening by telling people that OpenSim is as decentralised and federated as the Fediverse, only that it even predates Laconi.ca, not to mention Mastodon. Okay, and it only federates with itself and some of its own forks because OpenSim doesn't run on a standardised protocol, and nobody else has ever created anything compatible.

[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid which is a 3-D virtual world based on OpenSimulator.

OpenSimulator (official website and wiki), OpenSim in short, is a free and open-source platform for 3-D virtual worlds that uses largely the same technology as the commercial virtual world Second Life. It was launched as early as 2007, and most of it became a network of federated, interconnected worlds when the Hypergrid was introduced in 2008. It is accessed through client software running on desktop or laptop computers, so-called "viewers". It doesn't require a virtual reality headset, and it actually doesn't support virtual reality headsets.

This is more than most alt-texts on Mastodon. Only this.

But it still leaves one question unanswered: "Discovery Grid? What's that? Why is it called a grid? What's a grid in this context?"

So I'd have to add yet another paragraph.

[...] the Sendalonde Community Library in Sendalonde in the Discovery Grid which is a 3-D virtual world based on OpenSimulator.

OpenSimulator (official website and wiki), OpenSim in short, is a free and open-source platform for 3-D virtual worlds that uses largely the same technology as the commercial virtual world Second Life. It was launched as early as 2007, and most of it became a network of federated, interconnected worlds when the Hypergrid was introduced in 2008. It is accessed through client software running on desktop or laptop computers, so-called "viewers". It doesn't require a virtual reality headset, and it actually doesn't support virtual reality headsets.

Just like Second Life's virtual world, worlds based on OpenSim are referred to as "grids" because they are separated into square fields of 256 by 256 metres, so-called "regions". These regions can be empty and inaccessible, or there can be a "simulator" or "sim" running in them. Only these sims count as the actual land area of a grid. It is possible to both look into neighbouring sims and move your avatar across sim borders unless access limitations prevent this.
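For readers who think in code, the region layout described in the paragraph above boils down to simple arithmetic. The following is only an illustrative sketch, not OpenSim's actual API: it assumes a grid-wide position measured in metres from one corner of the grid, and the names `region_of` and `crosses_region_border` are made up for this example.

```python
REGION_SIZE_M = 256  # edge length of one square region in metres, as stated above


def region_of(x_m: float, y_m: float) -> tuple[int, int]:
    """Map a grid-wide position in metres to the (column, row) of its region.

    Assumption: positions are measured from one corner of the grid.
    """
    return int(x_m // REGION_SIZE_M), int(y_m // REGION_SIZE_M)


def crosses_region_border(start: tuple[float, float], end: tuple[float, float]) -> bool:
    """Tell whether moving from `start` to `end` ends up in a different region."""
    return region_of(*start) != region_of(*end)


print(region_of(600, 100))                           # third column, first row
print(crosses_region_border((250, 10), (260, 10)))   # steps over a border
```

Whether a sim is actually running in the region an avatar moves into, and whether access limitations apply, is outside the scope of this sketch.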

I'm well past 1,000 characters now. Other people paint entire pictures with words with that many characters. I need them only to explain where a picture was taken. But this should answer all immediate questions and make clear what kind of place the picture shows.

The main downside, apart from the length, which some Mastodon users already consider too much for a full image description, is that this will be outdated should the decision be made to move Sendalonde to another grid again.

And I haven't even started actually describing the image. Blind or visually-impaired users still don't know what it actually shows.

The actual content of the image

If this were a place in real life, I might get away with name-dropping the Sendalonde Community Library and briefly mentioning that there are some trees around it, and there's a body of water in the background. It'd be absolutely sufficient.

But such a virtual place is something that next to nobody is familiar with. Non-sighted people even less because they're even more unlikely to visit virtual worlds. That's a highly visual medium and usually not really inclusive for non-sighted users.

So if I only name-dropped the Sendalonde Community Library, mentioned where it is located and explained what OpenSim is, I wouldn't be done. There would be blind or visually-impaired people inquiring, "Okay, but what does it look like?" Ditto people with poor internet for whom the image doesn't load.

Sure they would. Because they honestly wouldn't know what it looks like. Because even the sighted users with poor internet have never seen it before. But they would want to know.

So I'd have to tell them. Not doing so would be openly ableist.

And no, one sentence isn't enough. This is a very large, highly complex, highly detailed building and not just a box with a doorway and a sign on it. Besides, remember that we're talking about a virtual world. Architecture in virtual worlds is not bound to the same limits and laws and standards and codes as in real life. Just about everything is possible. So absolutely nothing can ever be considered "a given" and therefore unnecessary to be mentioned.

Now, don't believe that blind or visually-impaired people will limit their "What does it look like?" to the centre-piece of the picture. If you mention something being there, they want to know what it looks like. Always. Regardless of whether or not they used to be sighted, they still don't know what whatever you've mentioned looks like specifically in a virtual world. And, again, it's likely that they don't know what it looks like at all.

Thus, if I mention it, I have to describe it. Always. All of it.

There are exactly two exceptions. One, if something is fully outside the borders of the image. Two, if something is fully covered up by something else. And I'm not even entirely sure about the latter case.

Sometimes, a visual description isn't even enough. Sometimes, I can mention that something is somewhere in the picture. I can describe what that something looks like in all details. But people still don't know what it is.

I can mention that there's an OpenSimWorld beacon standing somewhere. I can describe its looks with over 1,000 words and so much accuracy that an artist could make a fairly accurate drawing of it just from my description.

But people, the artist included, still would not know what an OpenSimWorld beacon is in the first place, nor what it's there for.

So I have to explain what an OpenSimWorld beacon is and what it does.

Before I can do that, I first have to explain what OpenSimWorld is. And that won't be possible with a short one-liner. OpenSimWorld is a very multi-purpose website. Explaining it will require a four-digit number of characters.

Only after I'm done explaining OpenSimWorld can I start explaining the beacon. And the beacon is quite multi-functional itself. On top of that, I'll have to explain the concept of teleporting around in OpenSim, especially from grid to grid through the Hypergrid.

This is why I generally avoid having OSW beacons in my pictures.

Teleporters themselves aren't quite as bad, but they, too, require lots and lots of words. They have to be described. If there's a picture on them, maybe one that shows a preview of the chosen destination, that picture has to be described. All of a sudden, I have an entire second image to write a description for. And then I have to explain what that teleporter is, what it does, how it works, how it's operated. They don't know teleporters because there are no teleporters in real life.

At least I might not have to explain to them which destinations the teleporter can send an avatar to. The people who need all these descriptions and explanations won't have any use for this particular information because they don't even know the destinations in the first place. And describing and explaining each of these destinations, especially if they're over a hundred, might actually be beyond the scope of an image description, especially since these destinations usually aren't shown in the image itself.

Avatars

Just like in-world objects, avatars and everything more or less similar require detailed, extensive descriptions and explanations. People need to understand how avatars work in this kind of world, and of course, blind or visually-impaired people want to know what these avatars look like. Each and every last one of them. Again, how are they supposed to know otherwise?

I'm not quite sure whether or not it's smart to always give the names of all avatars in the image. It's easy to find them out, but when writing a description especially for a party picture with dozens of avatars in it, associating the depictions of avatars in the image with identities has to be done right away before even one of these avatars leaves the location.

One thing that needs to be explained right afterwards is how avatars are built. In the cases of Second Life and OpenSim, this means explaining that they usually aren't "monobloc" avatars that can't be modified in-world. Instead, they are modular, put together from lots of elements, usually starting with a mesh body that "replaces" the default system body normally rendered by the viewer, continuing with a skin texture, an eye texture and a shape with over 80 different parameters and ending with clothes and accessories. Of course, this requires an explanation of what "mesh" is, why it's special and when and why it was introduced.

OpenSim also supports script-controlled NPCs which require their own explanation, including that NPCs don't exist in Second Life, and how they work in OpenSim. Animesh exists both in Second Life and OpenSim and requires its own explanation again.

After these explanations, the actual visual description can begin. And it can and has to be every bit as extensive and detailed as for everything else in the picture.

The sex of an avatar does not have to be avoided in the description, at least not in Second Life and OpenSim. There, you basically only have two choices: masculine men and feminine women. Deviating from that is extremely difficult, so next to nobody does that. The few people who actually declare their avatars trans describe them as such in the profile. The only other exception is "women with a little extra". All other avatars can safely be assumed to be cis, and their visual sex can be used to describe them.

In virtual worlds, especially Second Life and OpenSim, there is no reason not to mention the skin tone either. A skin is just that: a skin. It can be replaced with just about any other skin on any avatar without changing anything else. It doesn't even have to be natural. It can be snow white, or it can be green, or it can be the grey of bare metal. In fact, in order to satisfy those who are really curious about virtual worlds, it's even necessary to mention if a skin is photo-realistic and has highlights and shades baked on.

Following that comes a description of what the avatar wears, including the hairstyle. This, too, should go into detail and mention things that are so common in real life that nobody would waste a thought about them, such as whether there are creases or crinkles on a piece of clothing at all, and if so, if they're actually part of the 3-D model or only painted on.

Needless to say that non-standard avatars, e.g. dragons, require the same amount of detail when describing them.

Now, only describing what an avatar looks like isn't enough. It's also necessary to describe what the avatar does, which means a detailed description of its posture and facial expression. Just about all human avatars in Second Life and OpenSim support facial expressions, even though they usually wear a neutral, nondescript expression. But even that needs to be mentioned.

Text transcripts

They say that if there's text somewhere in a picture, it has to be transcribed verbatim in the image description. However, there is no definite rule for text that is too small to be readable, partially obscured by something in front of it or only partially within the borders of the image.

Text appears not only in screenshots of social media posts, photographs of news articles and the like. It may appear in all kinds of photographs, and it may just as well appear in digital renderings from 3-D virtual worlds. It can be on posters, it can be on billboards, it can be on big and small signs, it can be on store marquees, it can be printed on people's clothes, it can be anywhere.

Again, the basic rule is: If there's text, it has to be transcribed.

Now you might say that transcribing illegible text is completely out of the question. It can't be read anyway, so it can't be transcribed either. Case closed.

Not so fast. It's true that this text can't be read in the picture. But that one picture is not necessarily the only source for the text in question. If the picture is a real-life photograph, the last resort would be to go back to where the picture was taken, look around more closely and transcribe the bits of text from there.

Granted, that's difficult if whatever a text was on is no longer there, e.g. if it was printed on a T-shirt. And yes, that's extra effort, too much of an effort if you're at home posting pictures which you've taken during your overseas vacation. Flying back there just to transcribe text is completely out of the question.

This is a non-issue for pictures from virtual worlds. In most cases, you can go back to where you've taken a picture, take a closer look at signs and posters and so on, look behind trees or columns or whatever is standing in front of a sign and partly covering it, and easily transcribe everything. Or you take the picture and write the description without even leaving first. You can stay there until you're done describing and transcribing everything.

At least Second Life and OpenSim also allow you to move your camera and therefore your vision independently from your avatar. That really makes it possible to take very close looks at just about everything, regardless of whether or not you can get close enough with your avatar.

There are only four cases in which in-world text does not have to be fully transcribed. One, it's incomplete in-world; in this case, transcribe what is there. Two, it's illegible in-world, for example due to too low a texture resolution or texture quality; that's bad luck. Three, it is fully obscured, either because it is fully covered by something else, or because it's on a surface completely facing away from the camera. And four, it isn't even within the borders of the image.
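The four cases above can be written down as a small decision rule. This is only a sketch of my own reading of them: the class `InWorldText`, the function `transcription_duty` and the order in which the cases are checked are assumptions made for illustration. Note that `legible_in_picture` is recorded but deliberately never consulted, because illegibility in the picture alone is not one of the four cases.

```python
from dataclasses import dataclass


@dataclass
class InWorldText:
    """Hypothetical record of one piece of text that exists at the location."""
    complete_in_world: bool = True
    legible_in_world: bool = True
    fully_obscured: bool = False
    within_image_borders: bool = True
    legible_in_picture: bool = True  # intentionally ignored by the rule below


def transcription_duty(text: InWorldText) -> str:
    """Return how much of the text the description has to transcribe."""
    if not text.within_image_borders:
        return "nothing"            # case four: not in the image at all
    if text.fully_obscured:
        return "nothing"            # case three: covered or facing away
    if not text.legible_in_world:
        return "what is legible"    # case two: bad luck, e.g. poor texture
    if not text.complete_in_world:
        return "what is there"      # case one: incomplete in-world
    return "everything"             # all other cases


print(transcription_duty(InWorldText(legible_in_picture=False)))
```

A sign that is merely too small to read in the picture still lands in the last branch, since the camera can be moved closer in-world.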

In all other cases, there is no reason not to transcribe text. The text being illegible in the picture isn't a reason. In fact, it's rather a reason to transcribe it: Even sighted people need help figuring out what's written there. And people who are super-curious about virtual worlds and want to know everything about them will not stop at text.

But why?

Yeah, that's all tough, I know. And I can understand if you as the audience are trying to weasel your way out of having to read such a massive image description. You're trying to get me to not write that much. You're trying to find a situation in which writing so much is not justified, not necessary. Or better yet, enough situations that they become the majority, that a full description ends up only necessary in extremely niche edge cases that you hope to never come across. You want to see that picture, but you want to see it without thousands or tens of thousands of words of description.

Let me tell you something: There is no such situation. There is no context in which such a huge image description wouldn't be necessary.

The picture could be part of a post of someone who has visited that place and wants to tell everyone about it. Even if the post itself has only got 200 characters.

The picture could be part of an announcement of an event that's planned to take place there.

The picture could be part of a post from that very event. Or about the event after it has happened.

The picture could be part of an interview with the owners.

The picture could be part of a post about famous locations in OpenSim.

The picture could be part of a post about the Discovery Grid.

The picture could be part of a post about OpenSim in general.

The picture could be part of a post or thread about 6 obscure virtual worlds that you've probably never heard of, and number 4 is really awesome.

The picture could be part of a post about virtual architecture.

The picture could be part of a post about the concept of virtual libraries or bookstores.

The picture could be part of a recommendation of cool OpenSim places to visit.

It doesn't matter. All these cases require the full image description with all its details. And so do all those which I haven't mentioned. There will always be someone coming across the post with the picture who needs the description.

See, I've learned something about the Fediverse. You can try to limit your target audience. But you can't limit your actual audience.

It'd be much easier for me if I could only post to people who know OpenSim and actually lock everyone else out. But I can't.

On the World-Wide Web, it's easy. If you write something niche, pretty much only people interested in that niche will see your content because only they will even look for content like yours. Content has to be actively dug out, but in doing so, you can pick what kind of content to dig out.

In the Fediverse, anyone will come across stuff that they know nothing about, whether they're interested in it or not. Even elaborate filtering of the personal timeline isn't fail-safe. And then there are local and federated timelines on which all kinds of stuff appear.

No matter how hard you try to only post to a specific audience, it is very likely that someone who knows nothing about your topic will see your post on the federated timeline on mastodon.social. It's rude to keep clueless casuals from following you, even though all they do is follow absolutely everyone because they need that background noise of uninteresting stuff on their personal timeline that they have on X due to The Algorithm. And it's impossible to keep people from boosting your posts to clueless casuals, whether these people are your own connections and familiar with your topic, or they've discovered your most recent post on their federated timeline.

You can't keep clueless casuals who need an extensive image description to understand your picture from coming across it. Neither can you keep blind or visually-impaired users who need an image description to even experience the picture in the first place from coming across it.

Neither, by the way, can you keep those who demand everyone always give a sufficient description for any image from coming across yours. And I'm pretty sure that some of them not only demand that from those whom they follow, but from those whose picture posts they come across on the local or federated timelines as well.

Sure, you can ignore them. You can block them. You can flip them the imaginary or actual bird. And then you can refuse to give a description altogether. Or you can put a short description into the alt-text which actually doesn't help at all. Sure, you can do that. But then you have to cope with having a Fediverse-wide reputation as an ableist swine.

The only alternative is to do it right and give those who need a sufficiently informative image description what they need. In the case of virtual worlds, as I've described, "sufficiently informative" starts at several thousand words.

And this is why pictures from virtual worlds always need extremely long image descriptions.