Unit Tests, Unicode and Special URI Characters

Unicode

Under NTFS, nearly all Unicode characters can be used in file names. Names such as ÄÖÜ äöü ß, 新資料夾, Видео or 新增文字文件 may therefore appear as file and directory names.

If you are dealing with software that handles user-created file names, you should always test your program with names containing Unicode characters from several different scripts.
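A minimal sketch of such a test in C#, assuming a writable temp directory. The class and method names (UnicodeFileNameCheck, RoundTrips) are only illustrative, and the sample names are the ones mentioned above. For each name it creates an empty file and checks that the directory listing returns exactly that name.

```csharp
using System;
using System.IO;
using System.Linq;

static class UnicodeFileNameCheck
{
    // Sample names taken from the examples above; extend as needed.
    public static readonly string[] SampleNames =
    {
        "ÄÖÜ äöü ß.txt", "新資料夾.txt", "Видео.txt", "新增文字文件.txt"
    };

    // Creates an empty file called `name` in a fresh temp directory and
    // reports whether the directory listing returns exactly that name.
    public static bool RoundTrips(string name)
    {
        string dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        try
        {
            File.WriteAllText(Path.Combine(dir, name), "");
            return Directory.GetFiles(dir)
                            .Select(p => Path.GetFileName(p))
                            .SequenceEqual(new[] { name });
        }
        finally
        {
            // Clean up even if the check itself throws.
            Directory.Delete(dir, recursive: true);
        }
    }
}
```

Whether every name round-trips depends on the file system: some file systems normalize Unicode, so a false result here is worth investigating and is not automatically a bug in your own code.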

Special URI Characters

There are also characters that have a special meaning in the URI specification. For my purposes, these are the reserved characters :/?#[]@!$&'()*+,;= and those unreserved characters that are neither letters nor digits: -._~

As URIs and file names are often used interchangeably, you also need to test with file names containing the special URI characters (as far as the file system permits them in a name).
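As a rough sketch of how such names could be generated in C#: the code below builds one candidate file name per special character and skips the characters the current platform rejects in file names (on Windows, for example, : / ? and * are not allowed). The names UriCharacterNames, CandidateNames and SurvivesEncoding are made up for this illustration.

```csharp
using System;
using System.IO;
using System.Linq;

static class UriCharacterNames
{
    // Reserved and unreserved special characters as listed above.
    public const string SpecialChars = ":/?#[]@!$&'()*+,;=-._~";

    // One candidate file name per special character, skipping characters
    // that the current platform does not allow in file names.
    public static string[] CandidateNames()
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        return SpecialChars.Where(c => !invalid.Contains(c))
                           .Select(c => "a" + c + "b.txt")
                           .ToArray();
    }

    // True if the name survives percent-encoding followed by decoding.
    public static bool SurvivesEncoding(string name) =>
        Uri.UnescapeDataString(Uri.EscapeDataString(name)) == name;
}
```

Feeding CandidateNames() through your own code path (instead of SurvivesEncoding) is where the real test happens; the helper only shows what a correct encode/decode pair should preserve.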

I hope that you are seeing all the Unicode and special characters on your screen as I intended them 😉

File Names That Look Like Encoded URI Characters

Furthermore, because file names sometimes get URI-encoded and -decoded, it is also necessary to test with file names that look like encoded special URI characters. I mean file names like %25%20.png or %23.png. The encoded space %20 and the encoded percent sign %25 are especially important, because decoding such a name once too often silently turns it into a different name.
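To illustrate what can go wrong, here is a small C# sketch (the class and method names are hypothetical). It contrasts a correct encode/decode round trip with a stray decode applied to a name that was never encoded.

```csharp
using System;

static class EncodedLookingNames
{
    // Encodes and then decodes exactly once, as a correct pipeline should.
    public static string RoundTrip(string rawName) =>
        Uri.UnescapeDataString(Uri.EscapeDataString(rawName));

    // Simulates the bug: a raw name that was never encoded gets decoded anyway.
    public static string StrayDecode(string rawName) =>
        Uri.UnescapeDataString(rawName);
}
```

For the raw file name %25%20.png, RoundTrip returns the name unchanged, while StrayDecode turns it into "% .png" (a percent sign, a space, then .png), which points to a different file.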

Zip Archive and Tcl Script

To make your life easier, I have created a zip archive containing files whose file names and path names cover a set of the special characters mentioned above; this set is usually enough for testing. All the files are the same small plain text file, just named differently.

You can download the set of files here: TestFiles.zip

It may well be that you cannot use the tiny txt files from my zip for your test case. Maybe you need PNGs, JPEGs or something else. For that case, I’ve created a Tcl script that copies a file of yours to all the differently named files.
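If you would rather stay in C#, the same idea can be sketched in a few lines. This is not the Tcl script, just an illustration; TestFileCopier and CopyToNames are invented names, and the list of base names is whatever set of special names you want to cover.

```csharp
using System.Collections.Generic;
using System.IO;

static class TestFileCopier
{
    // Copies `sourceFile` into `targetDir` once per name in `baseNames`,
    // keeping the source file's extension. Returns the created paths.
    public static List<string> CopyToNames(string sourceFile, string targetDir,
                                           IEnumerable<string> baseNames)
    {
        Directory.CreateDirectory(targetDir);
        string ext = Path.GetExtension(sourceFile);
        var created = new List<string>();
        foreach (string baseName in baseNames)
        {
            string target = Path.Combine(targetDir, baseName + ext);
            File.Copy(sourceFile, target, overwrite: true);
            created.Add(target);
        }
        return created;
    }
}
```

A call such as CopyToNames("sample.png", "out", new[] { "Видео", "%25%20", "a#b" }) would then produce out/Видео.png, out/%25%20.png and out/a#b.png.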

Download the script here: createTestFiles.zip

To run the script, you need tclkit, which you can download here.
You can find some more info about tclkit in another post.

Happy testing!

Randomness in Unit Tests

Most of us programmers write unit tests for at least some of our code. This is a good thing.

As I’ve seen in one of my latest projects, many of us have had the idea of using randomness in these unit tests. On the one hand, this is good: with randomness, a wider range of inputs gets tested.
On the other hand, this is bad: if the seed of the random number generator is not known, we cannot reproduce a failing test.
So we could end up with a nightly build that has failing tests, but no way to reproduce (and hence debug) them because of the random input.

The solution: use a seed that is based on the day, but not on hours, minutes, seconds or ticks. This way we get both advantages: regular testing with different input, and easy reproducibility for anyone who knows the day on which the test was run.
In C#, this looks as follows:

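// The seed depends only on the calendar date, so all runs on one day share it.
static readonly DateTime now = DateTime.Now;
static readonly int seed = (now.Year << 16) + (now.Month << 8) + now.Day;
static readonly Random randomGenerator = new Random(seed);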
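To replay a failing nightly run on a later day, it helps to log the seed and to be able to override it. The following is only a sketch of one way to do that; the environment variable TEST_SEED and the class TestRandom are assumptions made for this example, not part of any test framework.

```csharp
using System;

static class TestRandom
{
    // Same day-based seed formula as in the snippet above.
    public static int SeedFor(DateTime day) =>
        (day.Year << 16) + (day.Month << 8) + day.Day;

    // Uses the seed from the (hypothetical) TEST_SEED environment variable
    // if it is set to a valid integer, otherwise today's day-based seed.
    public static int ChooseSeed()
    {
        string value = Environment.GetEnvironmentVariable("TEST_SEED");
        return int.TryParse(value, out int seed) ? seed : SeedFor(DateTime.Today);
    }

    // Logs the seed so that a failing run can be replayed later.
    public static Random Create()
    {
        int seed = ChooseSeed();
        Console.WriteLine("Random seed for this test run: " + seed);
        return new Random(seed);
    }
}
```

With this in place, a test that failed last Tuesday can be rerun today by setting TEST_SEED to the value printed in Tuesday's build log, or by computing it with SeedFor for that date.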