
By yuser. Cyrillic text (Russian) not displayed correctly (wrong codepage)


Question

Posted

yuser originally posted this in the Nexus MO forum. So that the discussion does not get buried on the Nexus forums, I have recreated the question here and will respond to it here.

 

Just installed Mod Organizer. It is much better than NMM, thanks. At the moment I have one issue: all Cyrillic (Russian) text, including readmes and savegame descriptions, appears to be shown with the wrong codepage. It looks like cp1252 or even iso8859-1 is used instead of the system codepage cp1251. I couldn't find similar problems in this forum, the official forum, or the bug tracker.

 

How can I fix it? Is it possible to change the codepage, select another font, or something else?

 

PS: sorry for my English.

I don't really have an answer but perhaps/hopefully others here can help you further.

 

Am I correct to assume you are using a Russian Skyrim version?

Can you perhaps show us what is going wrong with a screenshot?

Just upload it to one of those 'image' sites and provide the link here.

10 answers to this question


  • 0
Posted

It's basically a general limitation of character encoding.

It is simply not possible to reliably determine the encoding of a file unless it's Unicode and contains a BOM (Byte Order Mark).

 

The internal viewer of MO tries to detect that BOM. If it's missing, it assumes UTF-8 (which is another Unicode encoding that is often produced without a BOM).

It currently doesn't support the ancient notion of country-specific character sets at all.

 

The best approach here would be: If you have a file that doesn't show correctly in the MO window, send the author a mail that the 21st century says hi and looks forward to meeting him.

 

Yeah, I know this isn't realistic, we're going to live with DOS tech for a couple more decades, so I'll look into it.
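
For readers who want to see the behaviour described in this post (look for a BOM, otherwise assume UTF-8), here is a minimal Python sketch. It is not MO's actual code; the function names `guess_encoding` and `decode_text` are hypothetical and exist only for illustration.

```python
import codecs

# BOMs are checked longest-first, because the UTF-32 LE BOM starts with
# the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Return a codec name based on the BOM, or 'utf-8' if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Assumption mirrored from the post: no BOM means UTF-8.
    return "utf-8"


def decode_text(data: bytes) -> str:
    """Decode with the guessed codec; undecodable bytes become U+FFFD."""
    return data.decode(guess_encoding(data), errors="replace")
```

With this logic, a cp1251 readme has no BOM, is treated as UTF-8, and comes out as replacement characters or other garbage, which matches what the original poster saw.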

  • 0
Posted

wolverine2710, thanks for the reply.

 

I have now learned that the wrong codepage in savegame descriptions is a long-standing problem with the Russian version of Skyrim. So this is not an MO problem at all.

 

And here is the codepage (font?) issue in the included readmes:

 

Posted Image

 

This file is encoded in cp1251 (windows-1251, the system codepage).

 

I'm using the Russian version of Skyrim, 1.9.0.32.

  • 0
Posted

I have now learned that the wrong codepage in savegame descriptions is a long-standing problem with the Russian version of Skyrim. So this is not an MO problem at all.

 

You used NMM before, if I have interpreted your post correctly. Did this issue also occur with NMM, or does it only occur now that you have switched to MO? If the latter, then MO seems to have something to do with it. In that case please create a ticket for it in the issue tracker. You have to be logged in before you can add a 'bug report'.

 

I just googled a bit but couldn't find anything really related. This link might or might not be useful.

  • 0
Posted

The screenshot above shows just a plain text file. I can open it in an external viewer (e.g. notepad.exe) via the "File tree" tab in MO and see normal text. MO's internal viewer opens the file as UTF-8 while it is really CP1251. I tried recoding this file to UTF-8, and MO then showed normal Russian text. So it seems this is not a problem, sorry for this useless topic. I think UTF-8 should be the one and only standard encoding, but Microsoft doesn't think so...

 

Thanks for replies!
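
The workaround described in this post (recoding the readme from CP1251 to UTF-8) can be done in any decent text editor. As a rough sketch, it could also be scripted like this in Python; the function name `recode_to_utf8` is hypothetical, and the source encoding is an assumption you have to supply yourself.

```python
from pathlib import Path


def recode_to_utf8(path, source_encoding="cp1251"):
    """Rewrite a text file in place as UTF-8 (no BOM).

    source_encoding is an assumption supplied by the caller; nothing here
    verifies that the file really uses that encoding.
    """
    p = Path(path)
    # Work on raw bytes so that line endings are preserved exactly.
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))
```

Note that this overwrites the file, so keep a copy of the original if you are not sure about the source encoding.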

  • 0
Posted

No problem, this is not useless; it may help others with the same problem. I read this earlier but had no time to post then. I was going to suggest exactly what you have tried, as I've seen this sort of thing in Windows before.

  • 0
Posted

Like Uhuru said, it's absolutely not useless. It apparently has to do with MO's internal viewer, if I've interpreted everything correctly. Not every user with a Russian Skyrim might know how to work around it. Perhaps an enhancement request in the issue tracker might be in order here.

  • 0
Posted

I think it's more of a Windows limitation. When viewing text files in any Notepad-style viewer, you can view the text in Windows style (the "ANSI" codepage, often loosely called ASCII) or as Unicode (UTF-8). For most characters the two are the same, but for some characters, such as Russian ones, choosing one gives readable text and the other gives rubbish, and for the next document it could be the opposite way round. Basically, some characters have different codes, but only some of them (certain non-English ones).

  • 0
Posted (edited)

It's basically a general limitation of character encoding. It is simply not possible to reliably determine the encoding of a file unless it's Unicode and contains a BOM (Byte Order Mark).

Maybe this is too harsh, but I think it's a bad (or lazy) answer.
Correct encoding detection without a BOM is not that hard.
If I can see that the encoding is incorrect, then a program can do it too.
Many text libraries and text editors can detect the correct encoding by checking the decoded characters against a list of valid characters. In other words, if the output after decoding contains MANY unprintable symbols, it means the decoding is incorrect and another encoding should be tried.
Edited by Xelopon
  • 0
Posted

Maybe this is too harsh, but I think it's a bad (or lazy) answer.

Correct encoding detection without a BOM is not that hard.

If I can see that the encoding is incorrect, then a program can do it too.

Many text libraries and text editors can detect the correct encoding by checking the decoded characters against a list of valid characters. In other words, if the output after decoding contains MANY unprintable symbols, it means the decoding is incorrect and another encoding should be tried.

You're wrong and misinformed.

Just because your human brain can do something doesn't mean a computer program can. Humans are far better at pattern recognition than computers.

 

You can recognize that an encoding is wrong because you speak the language and the symbols don't make sense. Do you expect me to write a program that knows all languages, tests texts for syntactic and semantic correctness, and accounts for typos, ASCII art and the like to determine which encoding is most likely?

This is an unsolved problem, NO software known to me can do this reliably.

 

Also, you mention unprintable symbols but most 8-bit encodings have the unprintable symbols in the same byte-range and therefore they are NOT a reliable way to recognize character encoding.

 

If you're so smart, YOU write code to auto-discover character encoding reliably. I'd advise you to file a patent.
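
To illustrate the point about 8-bit encodings made in this post, here is a small Python sketch. It shows that the same bytes decode without error, and into fully printable characters, under both cp1251 and cp1252, so a "count the unprintable symbols" check cannot tell them apart. It also shows one thing that does work reasonably well: strict UTF-8 decoding usually fails on legacy 8-bit text, so a viewer can try UTF-8 first and fall back to a codepage. The helper `utf8_or_fallback` and its `fallback` default are hypothetical, and the fallback codepage is still a guess, not a detection.

```python
def utf8_or_fallback(data: bytes, fallback: str = "cp1251") -> str:
    """Try strict UTF-8 first; otherwise decode with an assumed codepage."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # The fallback is an assumption (e.g. the system codepage),
        # not something that was detected from the bytes.
        return data.decode(fallback, errors="replace")


raw = "Привет".encode("cp1251")

as_cp1251 = raw.decode("cp1251")  # the intended Russian text
as_cp1252 = raw.decode("cp1252")  # accented Latin letters, no error

# Both results consist only of printable characters.
print(as_cp1251, as_cp1251.isprintable())
print(as_cp1252, as_cp1252.isprintable())

# Strict UTF-8 rejects these bytes, so the fallback is used.
print(utf8_or_fallback(raw))
```

Telling cp1251 from cp1252 (or any other 8-bit codepage) beyond this point requires language statistics or user input, which is the hard part the post refers to.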
