Sun 15 Apr 2007
Correctly displaying Russian MP3 ID3 tags in Muine
Posted by Markus Bertheau under computer, programming. 5 Comments
Although I’m from Germany, I live in Novosibirsk at the moment. Novosibirsk is in Russia, so I listen to Russian music. The player I use is Muine. Unfortunately, the artist and title information looks like this:
The reason is that Windows software that adds meta tags to music files uses the default Russian 8-bit encoding, CP1251. ID3 defines ISO-8859-1 as its only legacy 8-bit text encoding (ID3v2 can also mark text as Unicode, and only the newest version, ID3v2.4, adds UTF-8); CP1251 is not an option in any version. So Muine, following the standard, interprets the tags as ISO-8859-1. Let’s change that.
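To make the mismatch concrete, here is a minimal sketch of what a standard-conforming reader effectively does with such a tag: it treats every byte as an ISO-8859-1 character and converts it to UTF-8 for display. The function names put_utf8 and latin1_to_utf8 are made up for this illustration; they are not taken from Muine or libid3tag.

```c
#include <stddef.h>

/* Write the UTF-8 encoding of a code point below U+0800 to out and
 * return the number of bytes written (1 or 2). Hypothetical helper. */
static size_t put_utf8(unsigned cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    out[0] = (char)(0xC0 | (cp >> 6));
    out[1] = (char)(0x80 | (cp & 0x3F));
    return 2;
}

/* Interpret each input byte as an ISO-8859-1 character and store the
 * result as a NUL-terminated UTF-8 string. out needs 2 * len + 1 bytes. */
static void latin1_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        n += put_utf8(in[i], out + n);  /* Latin-1 byte == code point */
    out[n] = '\0';
}
```

Feeding it the four bytes 0xCA 0xE8 0xED 0xEE, which spell "Кино" in CP1251, yields "Êèíî": exactly the kind of accented-letter soup the player shows.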
I’m using Ubuntu 6.10. Let’s have a look at the muine sources:
~/src/deb$ sudo apt-get install build-essential
~/src/deb$ apt-get source muine
...
dpkg-source: extracting muine in muine-0.8.5
dpkg-source: unpacking muine_0.8.5.orig.tar.gz
dpkg-source: applying ./muine_0.8.5-1ubuntu4.diff.gz
~/src/deb$ cd muine-0.8.5/
~/src/deb/muine-0.8.5$ ls src/
...
AddWindowEntry.cs DndUtils.cs Metadata.cs SkipToWindow.cs
...
The file Metadata.cs looks like it’s responsible for the ID3 tags. Searching it for title shows the following lines:
// Properties :: Title (get;)
[DllImport ("libmuine")]
private static extern IntPtr metadata_get_title (IntPtr metadata);
The DllImport attribute says that the function declared on the next line, metadata_get_title, is implemented in the native library libmuine. Let’s look at that.
~/src/deb/muine-0.8.5$ ls libmuine/
...
gsequence.c metadata.c player-gst-0.8.c rb-cell-renderer-pixbuf.c
...
Searching for title in metadata.c gives us the following line:
metadata->title = get_mp3_comment_value (tag, ID3_FRAME_TITLE, 0);
Which leads us to get_mp3_comment_value. Let’s look at its definition:
get_mp3_comment_value (struct id3_tag *tag,
                       const char *field_name,
                       int index)
{
    ...
    frame = id3_tag_findframe (tag, field_name, 0);
    ...
    field = id3_frame_field (frame, 1);
    ...
    ucs4 = id3_field_getstrings (field, index);
    ...
    utf8 = id3_ucs4_utf8duplicate (ucs4);
    ...
}
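Judging by its name, the last call in that excerpt turns a string of UCS-4 code points (one 32-bit integer per character) into UTF-8. As a rough idea of what such a conversion involves for a single character, here is a standalone sketch; ucs4_to_utf8 is a hypothetical helper written for this post, not libid3tag’s implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write the UTF-8 encoding of one code point to out
 * (which must have room for 4 bytes) and return the number of bytes used.
 * Surrogates and values above U+10FFFF are not rejected in this sketch.
 */
static size_t ucs4_to_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* includes all Cyrillic letters */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, the Cyrillic letter П (U+041F) comes out as the two bytes 0xD0 0x9F. The point for this post is that the UTF-8 step only needs correct code points as input, so whatever goes wrong with the Russian tags must happen earlier in the chain.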
get_mp3_comment_value calls a lot of functions whose names start with id3_. They are not defined in metadata.c, nor anywhere else in the muine source code:
~/src/deb/muine-0.8.5$ grep -r id3_field_getstrings .
./libmuine/metadata.c: latin1 = id3_ucs4_latin1duplicate (id3_field_getstrings (field, 0));
./libmuine/metadata.c: ucs4 = id3_field_getstrings (field, index);
The only matches are calls to that function. metadata.c includes id3tag.h, which looks like what we need. Let’s download the source for the corresponding library:
~/src/deb/muine-0.8.5$ apt-cache search id3tag
libid3tag0 - ID3 tag reading library from the MAD project
libid3tag0-dev - ID3 tag reading library from the MAD project
mp3rename - Rename mp3 files based on id3tags
somaplayer - player audio for the soma suite
~/src/deb/muine-0.8.5$ cd ..
~/src/deb$ apt-get source libid3tag0
...
dpkg-source: extracting libid3tag in libid3tag-0.15.1b
dpkg-source: unpacking libid3tag_0.15.1b.orig.tar.gz
dpkg-source: applying ./libid3tag_0.15.1b-8.diff.gz
~/src/deb$ cd libid3tag-0.15.1b/
~/src/deb/libid3tag-0.15.1b$ grep -r id3_field_getstrings .
...
./field.c:id3_ucs4_t const *id3_field_getstrings(union id3_field const *field,
...
The function is defined in field.c. It reads from an array called stringlist, which is filled in the function id3_field_parse. That function in turn calls id3_parse_string, which extracts the string values of a field.
~/src/deb/libid3tag-0.15.1b$ grep -r id3_parse_string *
parse.c:id3_ucs4_t *id3_parse_string(id3_byte_t const **ptr, id3_length_t length,
parse.h:id3_ucs4_t *id3_parse_string(id3_byte_t const **, id3_length_t,
This function is defined in parse.c. For ISO-8859-1 fields it calls id3_latin1_deserialize.
~/src/deb/libid3tag-0.15.1b$ grep -r id3_latin1_deserialize *
latin1.c:id3_ucs4_t *id3_latin1_deserialize(id3_byte_t const **ptr, id3_length_t length)
...
id3_latin1_deserialize is defined in latin1.c. It calls id3_latin1_decode to convert the Latin-1 string to UCS-4, which in turn calls id3_latin1_decodechar to convert a single character. We’re there: we have found the place we have to change:
/*
 * NAME: latin1->decodechar()
 * DESCRIPTION: decode a (single) latin1 char into a single ucs4 char
 */
id3_length_t id3_latin1_decodechar(id3_latin1_t const *latin1,
                                   id3_ucs4_t *ucs4)
{
    *ucs4 = *latin1;
    return 1;
}
The function is very simple: the first 256 Unicode code points are identical to ISO-8859-1, so a direct assignment is all that is needed. For CP1251 things are different. Looking at the Wikipedia page for CP1251, we see that the letters of the Russian alphabet start at 0xC0 with the upper case letters, followed by the lower case letters up to 0xFF. Using gnome-character-map, we find that the corresponding Unicode code points are U+0410 through U+044F and that the letters are in the same order. Very convenient. Let’s change the function to return the correct Unicode values for the CP1251 letters 0xC0 through 0xFF:
id3_length_t id3_latin1_decodechar(id3_latin1_t const *latin1,
                                   id3_ucs4_t *ucs4)
{
    if (*latin1 >= 0xc0)
        *ucs4 = 0x410 + (*latin1 - 0xc0);
    else
        *ucs4 = *latin1;
    return 1;
}
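Two caveats about this quick hack. It covers the 64 letters А through я, but Ё and ё sit outside that range in CP1251 (at 0xA8 and 0xB8, mapping to U+0401 and U+0451). And because the mapping is applied unconditionally, genuine ISO-8859-1 tags with accented letters (German umlauts, for instance) will now come out as Cyrillic. A slightly fuller mapping might look like the following sketch; cp1251_to_ucs4 is a hypothetical standalone helper, not something that exists in libid3tag.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of libid3tag: map one CP1251 byte to a
 * Unicode code point. Only ASCII, the 64 letters at 0xC0..0xFF and the
 * two letters Ё/ё are handled. Every other byte is passed through
 * unchanged, which is not a correct CP1251 mapping for most of 0x80..0xBF.
 */
static uint32_t cp1251_to_ucs4(unsigned char byte)
{
    if (byte >= 0xC0)
        return 0x0410 + (uint32_t)(byte - 0xC0);  /* А..я, same order as Unicode */
    if (byte == 0xA8)
        return 0x0401;                            /* Ё */
    if (byte == 0xB8)
        return 0x0451;                            /* ё */
    return byte;
}
```

The arithmetic can be checked at both ends of the range: 0xC0 gives U+0410 (А) and 0xFF gives 0x0410 + 0x3F = U+044F (я).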
The Unicode encoding used here, UCS-4, simply stores each Unicode code point in a 32-bit integer, so we can assign the code point directly. Now on to compiling the changed libid3tag.
~/src/deb/libid3tag-0.15.1b$ sudo apt-get build-dep libid3tag0
...
~/src/deb/libid3tag-0.15.1b$ sudo apt-get install fakeroot
...
~/src/deb/libid3tag-0.15.1b$ fakeroot dpkg-buildpackage -uc -us
...
~/src/deb/libid3tag-0.15.1b$ sudo dpkg -i ../libid3tag0_0.15.1b-8_i386.deb
...
Now let’s delete the muine song database so that it re-reads the metadata.
~/src/deb/libid3tag-0.15.1b$ rm ~/.gnome2/muine/*
Start muine and import the music file:

Победа! :)
Update: Added install of build-essential at the beginning.
July 8th, 2007 at 21:06
But is there any way to easily just convert the tags so they would show up properly in all common linux music players?
July 10th, 2007 at 23:38
No doubt there is, I just don’t know it :)
July 11th, 2007 at 22:37
Good answer, I will continue looking.
December 13th, 2008 at 22:27
Thanks!
December 19th, 2008 at 19:50
Hello
As a fresh http://www.bluetwanger.de user i just want to say hi to everyone else who uses this forum :>