Unit Tests, Unicode and Special URI Characters

Unicode

Under NTFS, nearly all Unicode characters can be used in file names. Characters such as ÄÖÜ äöü ß, 新資料夾, Видео and 新增文字文件 may appear in file and directory names.

If your software handles user-created file names, you should always test it with names that contain Unicode characters from several different scripts.
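As a minimal sketch of such a test, the following Python snippet creates one file per sample name in a temporary directory and checks that the file system lists every name back unchanged. The helper names `nfc` and `round_trip_names` are made up for this illustration, and the names are compared after NFC normalization because some file systems return names in a different normalization form than the one they were given.

```python
import tempfile
import unicodedata
from pathlib import Path

# Sample names taken from the text above; extend the list with your own.
UNICODE_NAMES = ["ÄÖÜ äöü ß.txt", "新資料夾.txt", "Видео.txt", "新增文字文件.txt"]


def nfc(name: str) -> str:
    """Normalize a name to NFC so that comparisons ignore the normalization form."""
    return unicodedata.normalize("NFC", name)


def round_trip_names(names, directory: Path) -> set:
    """Create one small file per name and return the names the file system lists."""
    for name in names:
        (directory / name).write_text("test", encoding="utf-8")
    return {nfc(entry.name) for entry in directory.iterdir()}


with tempfile.TemporaryDirectory() as tmp:
    listed = round_trip_names(UNICODE_NAMES, Path(tmp))

# Any name that did not survive the round trip ends up in this set.
missing = {nfc(name) for name in UNICODE_NAMES} - listed
print("missing:", missing)
```

In a real unit test you would replace the plain `write_text` call with the code path of your own program that creates or reads the files.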

Special URI Characters

In addition, some characters have a special meaning in the URI specification. For my purposes, these are the reserved characters :/?#[]@!$&'()*+,;= and the four unreserved characters that are neither letters nor digits, -._~ (hyphen, period, underscore and tilde).

Because URIs and file names are often used interchangeably, you also need to test with file names that contain these special URI characters.
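To illustrate what happens to such a name on its way into a URI, here is a small Python sketch that uses `urllib.parse.quote` and `unquote` from the standard library. The sample name and the helper `encode_name` are assumptions made for this example. Note that not every reserved character is allowed in a file name on every system: `/` is a path separator everywhere, and Windows also rejects `:`, `?` and `*`.

```python
from urllib.parse import quote, unquote

RESERVED = ":/?#[]@!$&'()*+,;="   # reserved characters from the list above
UNRESERVED_MARKS = "-._~"          # unreserved, but neither letter nor digit


def encode_name(name: str) -> str:
    """Percent-encode a file name for use as a single URI path segment."""
    # safe="" makes quote() encode "/" as well; by default it is left alone.
    return quote(name, safe="")


name = "a&b=c#d+e (1).txt"          # hypothetical file name for illustration
encoded = encode_name(name)
print(encoded)                      # a%26b%3Dc%23d%2Be%20%281%29.txt

# Decoding exactly once must give back the original name.
assert unquote(encoded) == name
```

A test for your own program can follow the same pattern: feed it a name, let it build the URI, and check that decoding the result once yields the original name.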

I hope you are seeing all the Unicode and special characters on your screen as I intended them 😉

Filenames Which Look Like Encoded URI Characters

Furthermore, because file names are sometimes URI-encoded and decoded, you also need to test with file names that look like encoded special URI characters, for example %25%20.png or %23.png. The encoded space %20 and the encoded percent sign %25 are especially important.
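The reason these names are so useful is that they expose code that decodes once too often (or not at all). The following Python sketch, again based on `urllib.parse` and with variable names chosen only for this illustration, shows how a second, unwanted decoding step silently turns the name into a different one.

```python
from urllib.parse import quote, unquote

literal_name = "%25%20.png"               # a file that is literally named like this

encoded = quote(literal_name, safe="")    # correct encoding for a URI
decoded_once = unquote(encoded)           # correct: back to the literal name
decoded_twice = unquote(decoded_once)     # bug: one decoding step too many

print(encoded)         # %2525%2520.png
print(decoded_once)    # %25%20.png
print(decoded_twice)   # "% .png", which is a different file name

# The same mistake turns %23.png into #.png.
print(unquote("%23.png"))
```

With ordinary names such as `photo.png` all three values are identical, so a double-decoding bug goes unnoticed until a user stores a file with a percent sign in its name.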

Zip Archive and Tcl Script

To make your life easier, I have created a zip archive containing files whose file names and path names use a set of the special characters mentioned above; this set is usually enough for testing. All the files are the same small plain-text file, just named differently.

You can download the set of files here: TestFiles.zip

The tiny txt files in my zip may not suit your test case; maybe you need PNGs, JPEGs or something else. For that case, I’ve created a Tcl script that copies a file of yours to all the differently named files.

Download the script here: createTestFiles.zip

To run the script, you need tclkit, which you can download here.
You can find some more info about tclkit in another post.
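If you would rather not set up tclkit, the same idea can be sketched in a few lines of Python. This is not the Tcl script from the download, only an illustration of the approach; the function `copy_to_test_names` and the list `TEST_STEMS` are made up for this example and cover just a handful of names that are valid on both Windows and Unix-like systems.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical selection of test names (without extension); extend as needed.
TEST_STEMS = ["ÄÖÜ äöü ß", "新資料夾", "Видео", "%25%20", "%23", "a&b=c;d+e", "-._~"]


def copy_to_test_names(source: Path, target_dir: Path, stems=TEST_STEMS) -> list:
    """Copy `source` once per test name, keeping the source file's extension."""
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for stem in stems:
        target = target_dir / (stem + source.suffix)
        shutil.copyfile(source, target)
        created.append(target)
    return created


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder source file; point this at your own png, jpeg or other file.
    source = Path(tmp) / "sample.png"
    source.write_bytes(b"placeholder bytes, not a real image")

    copies = copy_to_test_names(source, Path(tmp) / "out")
    all_identical = all(c.read_bytes() == source.read_bytes() for c in copies)
    print(len(copies), "copies created, identical content:", all_identical)
```

To use it on real data, replace the placeholder source with the path to your own file and pass a target directory of your choice.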

Happy testing!