Add-on unicode paths

This page describes how to prevent common problems with non-Latin characters in XBMC or add-on paths.

Unicode paths
If you want to write an add-on that works with paths like 'd:\apps\éîäß\' or 'opt/xbmc/àí', first read http://docs.python.org/2/howto/unicode.html#tips-for-writing-unicode-aware-programs

After reading it you will know: "Software (Python) should only work with unicode strings internally, converting to a particular encoding on output (or input)." XBMC outputs UTF-8 encoded strings. Input can be unicode or UTF-8 encoded, but some functions reportedly don't work with unicode input parameters.

Therefore the simplest way to deal with non-Latin characters is to pass every parameter to XBMC as a UTF-8 encoded string and to convert XBMC's UTF-8 output back to unicode.
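The round trip described above can be sketched without XBMC itself. In this minimal example the byte string raw only stands in for a value returned by XBMC, and the helper names from_xbmc and to_xbmc are hypothetical; they are not part of XBMC's API.

```python
# -*- coding: utf-8 -*-
# Sketch only: from_xbmc/to_xbmc are hypothetical helper names, and
# `raw` merely simulates a UTF-8 encoded value returned by XBMC.

def from_xbmc(value):
    """Decode a UTF-8 encoded byte string to unicode."""
    return value.decode('utf-8')

def to_xbmc(text):
    """Encode unicode to a UTF-8 byte string before passing it to XBMC."""
    return text.encode('utf-8')

# The path 'd:\apps\éîäß\' as UTF-8 bytes.
raw = b'd:\\apps\\\xc3\xa9\xc3\xae\xc3\xa4\xc3\x9f\\'

text = from_xbmc(raw)     # work with unicode internally
back = to_xbmc(text)      # convert back only at the XBMC boundary
```

Note that the unicode path is shorter than the byte string, because each of the four accented characters takes two bytes in UTF-8.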

Addon path
The first path an add-on has to deal with is its own add-on path:

    path = addon.getAddonInfo('path').decode('utf-8')

XBMC's getAddonInfo returns a UTF-8 encoded string, and we decode it to unicode.

Browse dialog
    dialog = xbmcgui.Dialog()
    directory = dialog.browse(0, 'Title', 'pictures').decode('utf-8')

dialog.browse returns a UTF-8 encoded string that may contain non-Latin characters. Therefore decode it to unicode!

Path joins
    os.path.join(path, filename)

If path and filename are both unicode objects, everything works as expected. But what happens if filename is a UTF-8 encoded string that contains "öäü.jpg"?

Python always converts to unicode when it joins a string with a unicode object. To do so, it decodes the string with its default encoding (ASCII), which is equivalent to:

    os.path.join(path, filename.decode('ascii'))

Because öäü are not part of ASCII, you'll get a unicode exception (UnicodeDecodeError). That's why you must explicitly convert the string to unicode:

    os.path.join(path, filename.decode('utf-8'))
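The difference between the two decodes can be checked in isolation. The following is a small sketch with made-up example values for path and filename; it only demonstrates that decoding with ASCII fails while decoding with UTF-8 yields a unicode object that joins cleanly.

```python
# -*- coding: utf-8 -*-
import os.path

# Example values (assumptions for illustration only).
path = u'/opt/xbmc/\xe0\xed'                  # unicode: '/opt/xbmc/àí'
filename = b'\xc3\xb6\xc3\xa4\xc3\xbc.jpg'    # UTF-8 bytes: 'öäü.jpg'

# Decoding with ASCII fails because öäü are outside the ASCII range.
try:
    filename.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding explicitly with UTF-8 works, and the join stays unicode.
full = os.path.join(path, filename.decode('utf-8'))
```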

Logging
Don't use "print message", because if message contains non-Latin characters you'll get a unicode exception. Instead use:

    print message.encode('utf-8')

This requires message to be a unicode object!

smart_unicode and smart_utf8
Because you cannot safely decode a unicode object or encode a string, it makes sense to have a function that works with both unicode objects and strings:

    def smart_unicode(s):
        """credit : sfaxman"""
        if not s:
            return ''
        try:
            # First attempt: treat byte strings as UTF-8.
            if not isinstance(s, basestring):
                if hasattr(s, '__unicode__'):
                    s = unicode(s)
                else:
                    s = unicode(str(s), 'UTF-8')
            elif not isinstance(s, unicode):
                s = unicode(s, 'UTF-8')
        except:
            # Fallback: treat byte strings as ISO-8859-1.
            if not isinstance(s, basestring):
                if hasattr(s, '__unicode__'):
                    s = unicode(s)
                else:
                    s = unicode(str(s), 'ISO-8859-1')
            elif not isinstance(s, unicode):
                s = unicode(s, 'ISO-8859-1')
        return s

You can use smart_unicode to ensure that the return type is a unicode object!

    def smart_utf8(s):
        return smart_unicode(s).encode('utf-8')

And use smart_utf8 to pass parameters to XBMC.
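The fallback idea (try UTF-8 first, then ISO-8859-1) can also be tried outside XBMC. The following is a compact sketch of that idea and not a replacement for smart_unicode: the names to_text and to_utf8 are hypothetical, the code is written so that it runs on both Python 2 and Python 3, and unlike smart_unicode it does not special-case empty values or __unicode__ methods.

```python
# -*- coding: utf-8 -*-
# Sketch only: to_text/to_utf8 are hypothetical names that illustrate
# the "UTF-8 first, ISO-8859-1 as fallback" idea.

TEXT_TYPE = type(u'')   # unicode on Python 2, str on Python 3

def to_text(value):
    """Return unicode for unicode, byte string or other input."""
    if isinstance(value, TEXT_TYPE):
        return value                      # already unicode
    if not isinstance(value, bytes):
        return to_text(str(value))        # numbers, objects, ...
    try:
        return value.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8: assume ISO-8859-1, which never fails.
        return value.decode('iso-8859-1')

def to_utf8(value):
    """Return a UTF-8 encoded byte string for any input."""
    return to_text(value).encode('utf-8')
```

Because every byte sequence is valid ISO-8859-1, the fallback never raises, but it may produce wrong characters if the input was in yet another encoding.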

Logging
Instead of the above-mentioned "print message.encode('utf-8')", use:

    def log(msg, level=xbmc.LOGDEBUG):
        plugin = "My nice plugin"
        # Encode unicode objects; everything else is converted with str().
        if isinstance(msg, unicode):
            msg = msg.encode('utf-8')
        xbmc.log("[%s] %s" % (plugin, str(msg)), level)

The benefit is that msg can be anything from a string to a unicode object to a class instance. And you can use XBMC's debug levels to prevent spam in the XBMC.log.

Notification
    def show_notification(title, message, timeout=2000, image=""):
        if image == "":
            command = 'Notification(%s,%s,%s)' % (smart_utf8(title), smart_utf8(message), timeout)
        else:
            command = 'Notification(%s,%s,%s,%s)' % (smart_utf8(title), smart_utf8(message), timeout, smart_utf8(image))
        xbmc.executebuiltin(command)

The show_notification function uses the smart_utf8 function to ensure that every string parameter passed to XBMC is a UTF-8 encoded string. Therefore you can call show_notification with strings or unicode objects!

Windows
Windows' NTFS is unicode aware but Windows still uses codepages like cp-850 for Western Europe.

If you call Python file functions with string parameters, the strings are handled internally in the Windows codepage, which means that you cannot access a file with Greek characters in its name from an English Windows. But if you pass unicode objects to the file functions, everything works as expected!
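Why a codepage cannot hold every file name can be shown with Python's own codecs. This sketch uses cp850 only because it is the codepage named above; the function name fits_codepage and the file names are made up for illustration, and the codepage a given Windows installation actually uses may differ.

```python
# -*- coding: utf-8 -*-
# Sketch only: fits_codepage is a hypothetical helper; cp850 is used
# as an example of a Western European codepage.

greek_name = u'\u03b1\u03b2\u03b3.jpg'    # 'αβγ.jpg'
western_name = u'\xe9.jpg'                # 'é.jpg'

def fits_codepage(name, codepage='cp850'):
    """Return True if every character of name exists in the codepage."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False
```

Here fits_codepage(western_name) succeeds, while fits_codepage(greek_name) fails, because cp850 contains no Greek letters. A file name that does not fit into the codepage cannot be expressed as a codepage string at all.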

OpenElec
Due to missing codepage support in OpenElec, you must not pass unicode objects to Python file functions!

Instead always use UTF-8 encoded strings.

Conclusion
Since your add-on should work with all supported operating systems, use the following approach:

    try:
        file function with unicodes
    except:
        try:
            file function with UTF-8 encoded strings
        except:
            fatal error
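The approach above could be wrapped in a small helper. The following is a sketch under assumptions: the name call_with_path is hypothetical, it expects the path as a unicode object, and it retries only on UnicodeError, assuming that this is how a platform without codepage support rejects unicode paths. Any error raised by the second attempt is left to propagate, which corresponds to the "fatal error" case.

```python
# -*- coding: utf-8 -*-
import os

def call_with_path(func, path, *args):
    """Call a file function with a unicode path, falling back to UTF-8.

    Hypothetical helper: `func` is any file function that takes the
    path as its first argument, e.g. os.listdir or os.path.exists.
    """
    try:
        # First attempt: pass the unicode object.
        return func(path, *args)
    except UnicodeError:
        # Fallback: pass a UTF-8 encoded string instead.
        # If this fails as well, the exception propagates to the caller.
        return func(path.encode('utf-8'), *args)
```

A call such as call_with_path(os.listdir, directory) then works unchanged on platforms that prefer unicode and on platforms that require UTF-8 encoded strings.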