By yuser. Cyrillic texts (Russian) not displayed correctly (wrong codepage)

wolverine2710 · October 25, 2013

Originally yuser posted this in the Nexus MO forum. So that this discussion will not get buried on the nexus forums I created the question here and will respond to it here.

Just installed Mod Organizer. It is much better than NMM, thanks. At the moment I have one issue: all Cyrillic (Russian) text, including readmes and savegame descriptions, appears to be shown with the wrong codepage. It looks like cp1252 or even iso8859-1 instead of the system codepage, cp1251. I couldn't find similar problems in this forum, the official forum, or the bugtracker.

How can I fix it? Is it possible to change the codepage, select another font, or something else?

PS: sorry for my English.

I don't really have an answer but perhaps/hopefully others here can help you further.

Am I correct to assume you are using a Russian version of Skyrim?

Can you perhaps show us what is going wrong with a screenshot?

Just upload it to one of those 'image' sites and provide the link here.

Tannin · October 27, 2013

It's basically a general limitation of character encoding.

It is simply not possible to reliably determine the encoding of a file unless it's Unicode and contains a BOM (Byte Order Mark).

The internal viewer of MO tries to detect that BOM. If it's missing, it assumes UTF-8 (a Unicode encoding that is often written without a BOM).
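The behaviour described here (look for a BOM, otherwise assume UTF-8) can be sketched in a few lines of Python. This is only an illustration of the idea, not MO's actual code; the helper name `decode_like_viewer` and the choice to replace undecodable bytes with U+FFFD are assumptions.

```python
import codecs

# Checked in this order on purpose: the UTF-32 LE BOM (FF FE 00 00)
# starts with the UTF-16 LE BOM (FF FE), so UTF-32 must be tried first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_like_viewer(data: bytes) -> str:
    """Decode by BOM if one is present, otherwise assume UTF-8.

    Hypothetical helper: it mirrors the behaviour described in the post,
    not any real Mod Organizer function.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # These codecs read the BOM, pick the byte order, and strip it.
            return data.decode(encoding)
    # No BOM: assume UTF-8. Bytes that are not valid UTF-8 become U+FFFD.
    return data.decode("utf-8", errors="replace")
```

With this logic a cp1251 readme has no BOM, is treated as UTF-8, and comes out as replacement characters, which matches the symptom reported in this thread.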

It currently doesn't support the ancient notion of country specific character sets at all.

The best approach here would be: If you have a file that doesn't show correctly in the MO window, send the author a mail that the 21st century says hi and looks forward to meeting him.

Yeah, I know this isn't realistic, we're going to live with DOS tech for a couple more decades, so I'll look into it.

yuser · October 26, 2013

wolverine2710, thanks for the reply.

Now I know that the wrong codepage in savegame descriptions is a long-standing problem with the Russian version of Skyrim, so this is not an MO problem at all.

And here is the codepage (font?) issue in the included readmes:

Posted Image

This file has codepage cp1251 (windows-1251, system codepage).

I'm using the Russian version of Skyrim, 1.9.0.32.

wolverine2710 · October 26, 2013

Now I know that the wrong codepage in savegame descriptions is a long-standing problem with the Russian version of Skyrim, so this is not an MO problem at all.

You used NMM before, if I have interpreted your post correctly. Did this issue also occur with NMM, or does it only occur since you switched to MO? If it only occurs with MO, then MO seems to have something to do with it. In that case please create a ticket for it in the issue tracker. You have to be logged in before you can add a 'bug report'.

Just googled a bit but can't find really related stuff. This link might or might not be useful.

yuser · October 26, 2013

The screenshot above just shows a text file. I can open it in an external viewer (e.g. notepad.exe) via the "File tree" tab in MO and see normal text. MO's internal viewer opens the file as UTF-8, while it is really CP1251. I tried recoding this file to UTF-8, and MO showed normal Russian text. So it seems this is not a problem; sorry for this useless topic. I think UTF-8 should be the one and only standard encoding, but Microsoft doesn't think so...
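For anyone who wants to apply the same workaround, the recoding step can be sketched in Python as follows. The function names are made up for this example, and the source encoding is assumed to be cp1251, as it was for the readme in this thread; for other files it has to be known in advance.

```python
from pathlib import Path


def recode_bytes(data: bytes, source_encoding: str = "cp1251") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def recode_file_to_utf8(path: Path, source_encoding: str = "cp1251") -> None:
    """Rewrite a text file in place as UTF-8 (no BOM).

    Works on raw bytes so that line endings are left exactly as they were.
    """
    path.write_bytes(recode_bytes(path.read_bytes(), source_encoding))
```

Calling `recode_file_to_utf8(Path("readme.txt"))` overwrites the file, so keeping a backup copy first is sensible.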

Thanks for replies!

Uhuru · October 26, 2013

No problem, this is not useless; it may help others with the same problem. I read this earlier but had no time to post then. I was going to suggest exactly what you tried, as I've seen this sort of thing in Windows before.

wolverine2710 · October 26, 2013

Like Uhuru said, it's absolutely not useless. It apparently has to do with MO's internal viewer, if I've interpreted everything correctly. Not every user with a Russian Skyrim might know how to work around it. Perhaps an enhancement request in the issue tracker is in order here.

Uhuru · October 26, 2013

I think it's more of a Windows limitation. When viewing text files in any Notepad-style viewer, you can view the text as Windows style (ANSI) or as Unicode (UTF-8). For most characters the two are the same, but for some characters, like Russian ones, choosing one gives readable text and the other gives rubbish, and for the next document it could be the other way round. Basically, some characters (some non-English ones) have different codes in the two encodings.

yuser · October 27, 2013

Added an enhancement request: https://issue.tannin.eu/tbg/modorganizer/issues/449

Xelopon · April 25, 2015

It's basically a general limitation of character encoding. It is simply not possible to reliably determine the encoding of a file unless it's unicode and contains a BOM (Byte Order Mark).

Maybe this is too sharp, but I think that's a bad (or lazy) answer.

Correct encoding detection without a BOM is not that hard.

If I can see that the encoding is incorrect, then a program can too.

Many text libraries and text editors can detect the correct encoding by checking the decoded characters against a list of valid characters. In other words, if the decoded output contains MANY unprintable symbols, the chosen encoding is wrong and another one should be tried.
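A minimal Python sketch of the heuristic described in this post might look like the following. It is not taken from any particular library or editor; the names `badness` and `guess_encoding`, the default candidate list, and the scoring rule (count replacement characters and control characters) are all assumptions made for the example.

```python
import unicodedata


def badness(text: str) -> int:
    """Count replacement characters and non-whitespace control characters."""
    return sum(
        1
        for ch in text
        if ch == "\ufffd"
        or (unicodedata.category(ch) == "Cc" and ch not in "\t\r\n")
    )


def guess_encoding(data: bytes, candidates=("utf-8", "cp1251")) -> str:
    """Return the candidate whose decoding looks least broken.

    Ties go to the earlier candidate, so the order of `candidates` matters.
    """
    return min(
        candidates,
        key=lambda enc: badness(data.decode(enc, errors="replace")),
    )
```

With only UTF-8 and one legacy codepage as candidates this tends to work, because cp1251 text is rarely valid UTF-8. Once several 8-bit codepages are in the list, they can all score zero and the result depends only on the candidate order.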

Edited April 25, 2015 by Xelopon

Tannin · April 27, 2015

Maybe this is too sharp, but I think that's a bad (or lazy) answer.
Correct encoding detection without a BOM is not that hard.
If I can see that the encoding is incorrect, then a program can too.
Many text libraries and text editors can detect the correct encoding by checking the decoded characters against a list of valid characters. In other words, if the decoded output contains MANY unprintable symbols, the chosen encoding is wrong and another one should be tried.

You're wrong and misinformed.

Just because your human brain can do something doesn't mean a computer program can. Humans are far better at pattern recognition than computers.

You can recognize that an encoding is wrong because you speak the language and the symbols don't make sense. Do you expect me to write a program that knows all languages, tests texts for syntactic and semantic correctness, account for typos, ascii art and the like to determine which encoding is most likely?

This is an unsolved problem, NO software known to me can do this reliably.

Also, you mention unprintable symbols but most 8-bit encodings have the unprintable symbols in the same byte-range and therefore they are NOT a reliable way to recognize character encoding.
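The point about unprintable symbols can be illustrated with a short Python sketch. The sample word and the list of encodings are chosen only for this example: the same six bytes decode without any control characters under several 8-bit encodings, so counting unprintables cannot tell them apart.

```python
import unicodedata

# "Привет" in cp1251: six bytes, all in the 0xC0-0xFF range.
SAMPLE = "Привет".encode("cp1251")


def control_chars(text: str) -> int:
    """Count characters in Unicode category Cc (control characters)."""
    return sum(1 for ch in text if unicodedata.category(ch) == "Cc")


# Decode the same bytes under four different 8-bit encodings.
decodings = {
    enc: SAMPLE.decode(enc)
    for enc in ("cp1251", "cp1252", "latin-1", "koi8-r")
}

for enc, text in decodings.items():
    print(f"{enc:8} {text!r} control chars: {control_chars(text)}")
```

Every line reports zero control characters, yet only the cp1251 result is the intended word. Telling the results apart needs knowledge of the language, which is the harder problem described above.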

If you're so smart, YOU write code to auto-discover character encoding reliably. I'd advise you to file a patent.
