Unit Tests, Unicode and Special URI Characters

Unicode

Under NTFS, nearly all Unicode characters can be used in file names. Characters like ÄÖÜ äöü ß, 新資料夾, Видео, and 新增文字文件 may appear in file and directory names.

If you are dealing with software that handles user-created file names, you should always test your program with names containing Unicode characters from several different scripts.
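A minimal sketch of such a test in Python, in case it helps: it creates one small file per name in a temporary folder and checks that the operating system reports the same names back. The `.txt` suffix, the helper names, and the NFC normalization step are my assumptions; normalization is there because some file systems return names in decomposed form.

```python
import os
import tempfile
import unicodedata

# Sample names taken from the text above; the ".txt" suffix is an assumption.
UNICODE_NAMES = [
    "ÄÖÜ äöü ß.txt",
    "新資料夾.txt",
    "Видео.txt",
    "新增文字文件.txt",
]


def nfc(name):
    """Normalize to NFC, because some file systems store names decomposed."""
    return unicodedata.normalize("NFC", name)


def create_and_list(folder, names):
    """Create one small file per name and return the names the OS reports."""
    for name in names:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
            f.write("test")
    return sorted(nfc(entry) for entry in os.listdir(folder))


def test_unicode_names_round_trip():
    """Hypothetical test: every name written must show up in the listing."""
    with tempfile.TemporaryDirectory() as folder:
        listed = create_and_list(folder, UNICODE_NAMES)
        assert listed == sorted(nfc(name) for name in UNICODE_NAMES)
```

In a real test you would replace `create_and_list` with a call into your own program and compare what it reports for each file.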

Special URI Characters

There are also some characters that have a special role in the URI specification. For my purposes, these special characters are the reserved characters :/?#[]@!$&'()*+,;= and the unreserved characters that are neither letters nor digits: -._~

As URIs and file names are often used interchangeably, you also need to test with file names containing the special URI characters.
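To illustrate why this matters, here is a small Python sketch using the standard `urllib.parse` module. The helper names and the `http://example.com/files/` base are made up for the example. It shows that a name containing `#` gets cut off when it is pasted into a URI unencoded, while a percent-encoded name survives the round trip.

```python
from urllib.parse import quote, unquote, urlsplit

RESERVED = ":/?#[]@!$&'()*+,;="
UNRESERVED_MARKS = "-._~"

# Hypothetical base URI, used only for illustration.
BASE = "http://example.com/files/"


def name_to_uri(name, base=BASE):
    """Build a URI whose last path segment is the percent-encoded name."""
    return base + quote(name, safe="")


def uri_to_name(uri):
    """Recover the name from the last path segment of the URI."""
    segment = urlsplit(uri).path.rsplit("/", 1)[-1]
    return unquote(segment)


# Pasting the raw name into a URI loses everything after the '#',
# because the parser treats it as the start of the fragment.
naive = urlsplit(BASE + "a#b?.txt")
print(naive.path)      # /files/a
print(naive.fragment)  # b?.txt

# Encoding the name first keeps it intact.
tricky = RESERVED + UNRESERVED_MARKS + ".txt"
print(uri_to_name(name_to_uri(tricky)) == tricky)  # True
```

Note that this only exercises strings. Some of the reserved characters, such as `:`, `/`, `?` and `*`, cannot appear in a Windows file name at all, so they are relevant for URIs but not for files on NTFS.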

I hope that you are seeing all the Unicode and special characters on your screen as I intended them 😉

Filenames Which Look Like Encoded URI Characters

Furthermore, since file names sometimes get URI-encoded and -decoded, it is also necessary to test with file names that look like encoded special URI characters, such as %25%20.png or %23.png. The encoded space %20 and the encoded percent sign %25 are especially important.
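The typical bug these names uncover is decoding one time too many. A short Python sketch with the standard `urllib.parse` functions; the variable and function names are only illustrative:

```python
from urllib.parse import quote, unquote

# File names from the text above that look like encoded characters.
TRICKY_NAMES = ["%25%20.png", "%23.png"]


def encode_once(name):
    """Percent-encode a file name for use as a URI path segment."""
    return quote(name, safe="")


for name in TRICKY_NAMES:
    encoded = encode_once(name)
    once = unquote(encoded)   # correct: exactly one decode per encode
    twice = unquote(once)     # bug: one decode too many
    print(name, "->", encoded, "->", once, "->", twice)

# %25%20.png -> %2525%2520.png -> %25%20.png -> % .png
# %23.png -> %2523.png -> %23.png -> #.png
```

With ordinary file names the extra decode goes unnoticed, because decoding a string that contains no percent sign changes nothing. That is why these odd-looking names are worth having in the test set.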

Zip Archive and Tcl Script

To make your life easier, I have created a zip archive containing files whose file names and path names cover a set of the special characters mentioned above. This set is usually enough for testing. All the files are the same small plain text file, just named differently.

You can download the set of files here: TestFiles.zip

It may well be that you cannot use the tiny txt files from my zip for your test case. Maybe you need PNGs or JPEGs or whatever. For that case, I’ve created a Tcl script with which you can copy a file of yours to all the differently named files.

Download the script here: createTestFiles.zip

To run the script, you need tclkit, which you can download here.
You can find some more info about tclkit in another post.
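If you would rather not use Tcl, the same idea can be sketched in a few lines of Python. This is not my Tcl script, and the list of names below is just an illustrative selection, not the exact set in the zip archive. The function name `copy_to_test_names` is hypothetical. Characters that Windows forbids in file names, such as `:`, `/`, `?` and `*`, are left out on purpose.

```python
import shutil
from pathlib import Path

# Illustrative name stems (an assumption, not the contents of TestFiles.zip).
TEST_STEMS = [
    "ÄÖÜ äöü ß",
    "新資料夾",
    "Видео",
    "-._~",
    "!$&'()+,;=@[]",
    "%25%20",
    "%23",
]


def copy_to_test_names(source, target_dir, stems=TEST_STEMS):
    """Copy `source` once per stem into `target_dir`, keeping its extension."""
    source = Path(source)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for stem in stems:
        target = target_dir / (stem + source.suffix)
        shutil.copyfile(source, target)
        created.append(target)
    return created
```

Calling `copy_to_test_names("photo.png", "testfiles")` would then give you one copy of `photo.png` per stem in the `testfiles` folder.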

Happy testing!

Tclkit: A Tiny Full-Featured Scripting Language

Tcl/Tk

Tcl is a full-featured scripting language with a small footprint. It is available on many platforms. Tcl’s syntax looks a bit odd if you are coming from a C-like language. But in reality, it is pretty simple: Tcl is a small language and you’ll learn it fast.

Tk is a cross-platform GUI system that has been built for Tcl. Together, they form Tcl/Tk. Tk is used with many other languages (Perl, Python, Ruby, …) too.

Tclkit and Tclkitsh

Tclkit is Tcl, Tk, and several libraries packed together into one single executable. No installation is needed.

For Windows, tclkitsh.exe is also available; it contains only the command-line version of Tcl and the libraries as one single executable. It is among the first ten things I put onto a new computer.

Why I Use Tclkit

  • Deployment can hardly be easier. You just have to copy the single tclkit or tclkitsh executable and your script file. I’ve used this method with my zip speed test program. You can download tclkitsh.exe from there.
  • It is tiny. Tclkitsh.exe 8.5.9 is only 740 kB. Tclkit.exe,
    which contains the full Tk GUI, is only 1.3 MB.
  • Development with tclkit is fast. No compile-run cycles.
  • The GUI system is easy to handle. You don’t even need any visual
    tool to create your GUI.
  • The Windows cmd shell language is just terrible. That beast
    cannot even be called a language. If you don’t want to dive
    into PowerShell, Tclkit is a simple and perfect
    replacement for the Windows cmd shell.
  • Compared to Perl and Python, it is easier to learn,
    read and understand.
  • It is perfect for doing file operations in your build process.
  • It is perfect for controlling other executables,
    e.g. in your build process. In fact, Tcl was built
    with the main goal of being a language for controlling other executables.

Example 1: Check If Drive Is Available

I do my backups onto an external USB drive. The backup program is controlled via the Windows Task Scheduler. For security reasons, this drive is usually not connected. This means that, before the backup program can do its work, I have to connect the drive manually.

I’ve written a small Tcl/Tk program to check if the drive is available and to inform me that I should connect it if it’s not.

This is the Tcl/Tk script, with graphical interface and bells.