Unit Tests, Unicode and Special URI Characters

Unicode

Under NTFS, nearly all unicode characters can be used in file names. Characters like ÄÖÜ äöü ß, 新資料夾, Видео, 新增文字文件 may appear in file and directory names.

If you are dealing with software that handles user-created file names, you should always test your program with names containing several different Unicode characters from different character sets.

Special URI Characters

Also, there are some characters that are special characters in the URI specification. For me, these special characters are the reserved characters :/?#[]@!$&'()*+,;= and the unreserved characters that are neither letters nor digits: -._~

As URIs and file names are often used interchangeably, you also need to test with file names containing the special URI characters.

I hope that you are seeing all the Unicode and special characters on your screen the way I intended them 😉

Filenames Which Look Like Encoded URI Characters

Furthermore, as file names get URI-encoded and -decoded in some cases, it is also necessary to test with file names that look like encoded special URI characters. I mean file names like %25%20.png or %23.png. Especially important are the encoded space %20 and the encoded percent %25.
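To illustrate why such names are dangerous, here is a minimal Java sketch that round-trips a few tricky names through java.net.URI. The class and method names are my own, and the names are written with Unicode escapes only to keep the source file ASCII-safe. It is an illustration, not the test setup used for the zip archive below.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriNameRoundTrip {

    /** Encodes a file name as the path of a relative URI. */
    static String encode(String fileName) {
        try {
            // The multi-argument URI constructors quote illegal characters
            // and always quote '%', so "%20" in a name becomes "%2520".
            return new URI(null, null, "/" + fileName, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Decodes exactly once and drops the leading slash again. */
    static String decode(String encodedPath) {
        return URI.create(encodedPath).getPath().substring(1);
    }

    public static void main(String[] args) {
        String[] names = {
            "%25%20.png",                            // looks encoded, but is literal
            "%23.png",
            "a b#c;d=e.txt",                         // space and special URI characters
            "\u00c4\u00d6\u00dc \u00e4\u00f6\u00fc \u00df.txt",  // German umlauts
            "\u65b0\u8cc7\u6599\u593e.txt",          // CJK characters
            "\u0412\u0438\u0434\u0435\u043e.txt"     // Cyrillic characters
        };
        for (String name : names) {
            String encoded = encode(name);
            boolean ok = decode(encoded).equals(name);
            System.out.println(encoded + " -> " + (ok ? "ok" : "MISMATCH"));
        }
        // Decoding a second time corrupts names that merely look encoded.
        System.out.println(decode("/%25%20.png")); // prints "% .png"
    }
}
```

A name survives only if it is encoded and decoded exactly the same number of times. That is what a test with %25%20.png is meant to uncover.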

Zip Archive and Tcl Script

To make your life easier, I have created a zip archive that contains various files with file names and path names with a set of the mentioned special characters that is usually enough for you to test. All the files are the same small plain text file, just named differently.

You can download the set of files here: TestFiles.zip

It may well be that for your test case you cannot use the tiny txt files in my zip. Maybe you need pngs or jpegs or whatever. For that case, I’ve created a Tcl script with which you can copy a file of yours to all the differently named files.

Download the script here: createTestFiles.zip

To run the script, you need tclkit, which you can download here.
You can find some more info about tclkit in another post.

Happy testing!

Tclkit: A Tiny Full Featured Scripting Language

Tcl/Tk

Tcl is a full featured script programming language with a small footprint. It is available on many platforms. Tcl’s syntax looks a bit odd if you are coming from a C-like language. But in reality, it is pretty simple … Tcl is a small language and you’ll learn it fast.

Tk is a cross-platform GUI system that has been built for Tcl. Together, they form Tcl/Tk. Tk is used with many other languages (Perl, Python, Ruby, …) too.

Tclkit and Tclkitsh

Tclkit is Tcl and Tk and several libraries altogether put into one single executable. No installation is needed.

For Windows, there is also tclkitsh.exe, which contains only the command line version of Tcl and the libraries as one single executable. It is among the first ten things I put onto a new computer.

Why I Use Tclkit

  • Deployment could hardly be easier. You just have to copy the single tclkit or tclkitsh executable and your script file. I’ve used this method with my zip speed test program. You can download tclkitsh.exe from there.

  • It is tiny. Tclkitsh.exe 8.5.9 is only 740 kB. Tclkit.exe,
    which contains the full Tk GUI, is only 1.3 MB.

  • Development with tclkit is fast. No compile-run cycles.
  • The GUI system is easy to handle. You don’t even need any visual
    tool to create your GUI.
  • The Windows cmd shell language is just terrible. That beast
    cannot even be called a language. If you don’t want to dive
    into PowerShell, Tclkit is a simple and perfect
    replacement for the Windows cmd shell.
  • Compared to Perl and Python, it is easier to learn,
    read and understand.
  • It is perfect for doing file operations in your build process.
  • It is perfect for controlling other executables,
    e.g. in your build process. In fact, Tcl has been built
    with the main target to be a language to control other executables.

Effektiv Tcl/Tk programmieren . (Programmer’s Choice)

Example 1: Check If Drive Is Available

I do my backups onto an external USB drive. The backup program is controlled via the Windows Task Scheduler. For security reasons, this drive is usually not connected. This means, before the backup program can do its work, I have to manually connect the drive.

I’ve written a small Tcl/Tk program to check if the drive is available and to inform me that I should connect it if it’s not.

This is the Tcl/Tk Script. With graphical interface and bells.


# Returns the list of drive letters that are currently available.
proc drives {} {
    set drives {}
    foreach drive [list a b c d e f g h i j k l m n o p q r s t u v w x y z] {
        if {[catch {file stat ${drive}: dummy}] == 0} {
            lappend drives $drive
        }
    }
    return $drives
}

# Exits if drive Z: is available, otherwise rings the bell twice.
proc checkDrive {} {
    if { [lsearch [drives] z] != -1 } {
        exit
    } else {
        bell; bell
    }
}

set a "The daily backup is about to be done. "
append a "Please connect the backup drive, make it "
append a "available as network drive Z: and click Ok."

checkDrive

# Create a label or message called .m
message .m -textvariable a -width 250

# Create a button called .hello
# strange: the width of the button is in another dimension
# than that of the message.
button .hello -text "Ok" -command { checkDrive } -default active -width 15

# This binds the return key to the pressing of the .hello button
bind . <Return> {.hello invoke}

# Pack the message and the button together.
pack .m .hello -padx 5 -pady 5

# Show the main window
wm deiconify .

When this tiny script is run, it checks if drive Z: is connected and if not, it shows the following message box. When the Ok button is hit, it checks again.

[Screenshot: the backup reminder message box]

Example 2: With SVN, Show All Changed Files

For a few days now, I have been working on a system that uses svn for version control. I have not yet found a good graphical svn client, aside from TkCVS. (Yes, TkCVS handles svn quite well, despite its name.)

One of my first problems was to find out which files are changed locally in several unconnected checked out directories. A task for which tclkitsh is suited very well.


# Returns true if path is a directory without any entries.
proc isEmptyDirectory { path } {
    set isED false
    if { [file isdirectory $path] } {
        set content [glob -nocomplain $path/*]
        if { [llength $content] == 0 } {
            set isED true
        }
    }
    return $isED
}

# Prints one line of svn status output, unless it is a build artifact
# or an empty directory.
proc handleEntry { dir entry } {
    set path [file join $dir [string range $entry 8 end]]
    if { ! [regexp -nocase {( bin|/bin| obj|/obj|\.user|\.suo)$} $entry]
         && ! [isEmptyDirectory $path] } {
        puts $entry
    }
}

# Runs svn status in dir and prints the relevant entries.
proc statDir { dir } {
    cd $dir
    puts ""
    puts $dir
    if [catch { set l [exec svn status . | findstr /V Pics ] } ] {
        # above, I'm piping the output of svn status . through
        # findstr. Just as an example.
        puts ""
    } else {
        # Convert backslashes to slashes, then split into lines.
        set l [string map {\\ /} $l]
        set l [split $l "\n"]
        foreach s $l {
            handleEntry $dir $s
        }
    }
}

while { 1 } {
    statDir C:/code/Client/WebClient
    statDir C:/code/Client/ServiceLayer
    statDir D:/src/TraceOfDeathApp
    statDir E:/src/AspServer
    puts "Press Return key to repeat."
    gets stdin
}

Java for C# Programmers, Part 1: Primitives

This is part 1 of a series of articles about Java for C# Programmers.

  • I strive to present a terse but complete depiction of the differences between Java and C#, from the point of view of an experienced C# programmer.
  • I do not strive to present a complete reference of the Java language here. A complete reference for Java can be found on http://docs.oracle.com/javase/tutorial/java/.
  • Gotcha: Facts that are really unexpected for a C# guy are labeled Gotcha.

How Come?

  • Currently, I’m doing a Java course. While learning, I’m writing down what I’ve learned, as a reference for other C# guys and myself.
  • Java is a programming language. As such, it is a developer tool. It has been public since 1995. A time-tested one, too 😉

Identifiers

In both languages, identifiers must not start with a digit and must not contain spaces. Aside from these rules, most characters are allowed, including German umlauts ÄÖÜ and currency symbols. Äöüß$$€ is an allowed identifier.
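The Java side of this rule can be checked at runtime with the Character class. The following is a small sketch with a hypothetical helper isJavaIdentifier. It ignores reserved words such as class or int, which are also not allowed as identifiers.

```java
final class IdentifierCheck {

    /** True if s has the shape of a Java identifier (keywords are not checked). */
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.codePointAt(0))) {
            return false;
        }
        return s.codePoints().skip(1).allMatch(cp -> Character.isJavaIdentifierPart(cp));
    }

    public static void main(String[] args) {
        // The example identifier from the text, written with Unicode escapes
        // so that this file stays ASCII-only.
        String umlautsAndCurrency = "\u00c4\u00f6\u00fc\u00df$$\u20ac";
        System.out.println(isJavaIdentifier(umlautsAndCurrency)); // true
        System.out.println(isJavaIdentifier("1abc"));             // false, starts with a digit
        System.out.println(isJavaIdentifier("my var"));           // false, contains a space
    }
}
```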

Naming Conventions

  • Classes and interfaces: First letter uppercase. Rest camel cased.
  • Packages: lowercased, no _ separators. package bananaboat;. When you have a lot of packages, subdivide names by dots. app.boats.bananaboat
  • Use pascal cased nouns for class names. Banana, BookDocument
  • Use adjectives for interface names, without leading I. Eatable, Printable
  • Methods: First letter lowercase. Rest camel cased. Verbs. getPrinter()
  • Variables: Like methods. myFruit. Use short names for short lived variables. int i.
  • Constants: All uppercased, _ as separator. MAX_WIDTH
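Put together, the conventions above look roughly like this. All names are made up for illustration. In a real project, the file would start with a package statement such as package app.boats.bananaboat; which is left out here so that the sketch runs as a single file.

```java
/** Interface: an adjective, without a leading I. */
interface Printable {
    String print();
}

/** Class: a Pascal-cased noun. */
final class BookDocument implements Printable {

    /** Constant: all uppercase, with _ as separator. */
    static final int MAX_WIDTH = 80;

    private final String title;

    BookDocument(String title) {
        this.title = title;
    }

    /** Method: a verb, first letter lowercase, rest camel cased. */
    String getTitle() {
        return title;
    }

    @Override
    public String print() {
        // Short name for a short-lived variable.
        int n = Math.min(title.length(), MAX_WIDTH);
        return title.substring(0, n);
    }

    public static void main(String[] args) {
        Printable myDocument = new BookDocument("Java for C# Programmers");
        System.out.println(myDocument.print()); // Java for C# Programmers
    }
}
```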

Primitive Data Types

It is all the same as in C#, aside from the following differences.

  • The unsigned integer types do not exist in Java.
  • Gotcha: This means byte is signed in Java.
  • The high precision floating point type decimal does not exist in Java.
  • bool is called boolean.
  • C#’s nullable bool? type supports ternary logic with the & and | operators. There is no equivalent in Java.
  • Pointers and tuples do not exist in Java.
  • In Java, the primitive types are not derived from object. (In C# they are, via Object -> ValueType -> primitive type.)
  • Gotcha: Java’s Date is a reference type but C#’s DateTime a value type.
  • There is no TimeSpan in Java.
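The signed byte is the difference most likely to bite when you port C# code that handles binary data. The following sketch shows the usual workaround. The class name and the helper toUnsigned are mine. Byte.toUnsignedInt is part of the JDK since Java 8.

```java
final class SignedByteDemo {

    /** Reads a Java byte as an unsigned value in the range 0..255, like C#'s byte. */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 200;                        // the narrowing cast wraps around
        System.out.println(b);                      // -56
        System.out.println(toUnsigned(b));          // 200
        System.out.println(Byte.toUnsignedInt(b));  // 200
    }
}
```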

Literals

Mostly the same as in C#.

Gotcha: In Java, integer literals starting with a zero are interpreted as octal values. Not so in C#.
int i = 077; // Java: i is 63 decimal

Gotcha: In the following piece of code, the longWithoutL constant is calculated in int arithmetic, overflows silently, and the wrapped result is then just assigned to the long. In C#, you’ll get a compile error when making such a mistake.

long longWithL = 1000*60*60*24*365L;
long longWithoutL = 1000*60*60*24*365;
System.out.println(longWithL);    // 31536000000
System.out.println(longWithoutL); // 1471228928
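If you want Java to complain about such an overflow at least at runtime, Math.multiplyExact (available since Java 8) throws an ArithmeticException instead of wrapping around. A minimal sketch, with class and method names of my own:

```java
final class LongLiteralDemo {

    /** True if a * b does not fit into an int. */
    static boolean overflowsInt(int a, int b) {
        try {
            Math.multiplyExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int millisPerDay = 1000 * 60 * 60 * 24;              // 86400000, still fits into an int
        System.out.println(overflowsInt(millisPerDay, 365)); // true
        // Starting the multiplication with a long keeps the whole expression in long.
        long millisPerYear = 1000L * 60 * 60 * 24 * 365;
        System.out.println(millisPerYear);                   // 31536000000
    }
}
```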

Additionally, binary notation for integer type values
is allowed.

int eleven = 0b1011;        // 11 decimal
int minuseleven= -0b1011;   // -11 decimal
byte minusone =         // Compile error. Byte is signed.  
         0b11111111;    //  You can use only 7 bits in 0b notation. 
byte b = -0b1;          // -1 decimal

Wrapper Classes / Boxing / Unboxing

In Java, the primitive types are not derived from Object. Probably because of this, you cannot use them in generics, cannot pass them by reference and they have no methods.
But for every primitive data type, there is a wrapper class which you can use in generics and which generally behaves like the primitive type.

  • They are called Byte, Short, Integer, Long, Float, Double, Boolean, Character, …
  • Automatic conversion to and from the primitive types works well.
  • You cannot use these wrapper classes to do a pass by reference for primitive types.
  • Some standard functions are implemented on these wrapper classes, e.g. toString().

Java’s Atomic Wrapper Classes

There is another type of wrapper classes available in Java. They are called Atomic Wrapper Classes.

  • AtomicBoolean, AtomicInteger, AtomicLong, and AtomicReference.
  • There is no automatic conversion to and from these types.
  • You can change their value and so they can be used for pass by reference of primitive types.
  • They are thread safe.
  • They implement a lot of functions in an atomic way, like incrementAndGet(), getAndAdd(int delta) and the like.
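The thread safety is easy to try out. In the following sketch (names are mine), several threads increment the same AtomicInteger. With a plain int field and ++, updates could get lost. With incrementAndGet(), the final count is always exact.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicCounterDemo {

    /** Lets several threads increment one shared counter and returns the final value. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementAndGet();  // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10000)); // 40000
    }
}
```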

Pass By Ref of Primitive Data Types

static void Test()
{
    AtomicInteger i = new AtomicInteger();
    i.set(3);
    passByRef1(i);
    System.out.println("i: " + i);      // i: 4

    int[] j = { 3 };
    passByRef2(j);
    System.out.println("j[0]: " + j[0]);    // j[0]: 4

    Integer k = 3;
    passByRefNotGood(k);
    System.out.println("k: " + k);      // k: 3   **GOTCHA**
}

static void passByRef1(AtomicInteger i)
{
    i.incrementAndGet();
}

static void passByRef2(int[] i)
{
    i[0]++;
}

static void passByRefNotGood(Integer i)
{
        i += 1;     // **GOTCHA** This does NOT increment the 
                    // outer Integer, but there's no compile error.
}

Unboxing of Numbers in Calculations

// C#:
int? a = null, c = 3;
int? b = c * a;        // b becomes null.

bool? d = false, e = null;
bool? f = d & e;       // f becomes false. Ternary logic.   
bool? g = d | e;       // g becomes null. Ternary logic.  


// Java: 
Integer a = null, c = 3;   
Integer b = c * a;      // NullPointerException happens. 

Boolean d = false, e = null;
Boolean f = d & e;      // NullPointerException.
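If you need the C# behaviour in Java, you have to write it yourself. The following sketch emulates the ternary logic of bool? with two hypothetical helper methods, and and or, that treat null as "unknown". It is one possible approach, not a standard Java API.

```java
final class TernaryLogic {

    /** Three-valued AND: false wins, otherwise null (unknown) is contagious. */
    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return false;
        }
        if (a == null || b == null) {
            return null;
        }
        return true;
    }

    /** Three-valued OR: true wins, otherwise null (unknown) is contagious. */
    static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
            return true;
        }
        if (a == null || b == null) {
            return null;
        }
        return false;
    }

    public static void main(String[] args) {
        Boolean d = false, e = null;
        System.out.println(and(d, e)); // false, like d & e in C#
        System.out.println(or(d, e));  // null, like d | e in C#
    }
}
```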

Notepad++: How to Make the Function List work with Tcl and Bash

In another post about Notepad++ I criticized that its function list feature does not work for Tcl and Bash scripts. Now I’ve got a solution. Here it is.

How to Make the Function List Work for Tcl

Open functionList.xml in an editor. The file is in %APPDATA%\notepad++ or in the installation directory of Notepad++.
Add this line to the section with all the other association-entries.

 <association langID="29" id="tcl_procedure"/>

Add this to the section <parsers>

<parser id="tcl_procedure" displayName="Tcl source" commentExpr="(#)">
    <function
            mainExpr="^[\t ]*((proc)[\s]+)[^\n]+\{"
            displayMode="$functionName">
        <functionName>
            <nameExpr expr="[\w: ]+ \{.*\}"/>
        </functionName>
    </function>
</parser>

Restart Notepad++. Credits for the solution go to Detlef of compgroups.net/comp.lang.tcl. I’ve copied his solution and shortened it a bit, and added support for Bash.
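To see what the two expressions of the Tcl parser pick up, you can try them outside of Notepad++. The sketch below runs them with java.util.regex. This only approximates Notepad++, which uses its own regex engine and also evaluates commentExpr. The class and method names are mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TclProcMatcher {

    // mainExpr and nameExpr of the tcl_procedure parser, with backslashes doubled for Java.
    private static final Pattern MAIN_EXPR =
            Pattern.compile("^[\\t ]*((proc)[\\s]+)[^\\n]+\\{", Pattern.MULTILINE);
    private static final Pattern NAME_EXPR = Pattern.compile("[\\w: ]+ \\{.*\\}");

    /** Returns what nameExpr matches inside every mainExpr match of the source text. */
    static List<String> functionNames(String source) {
        List<String> names = new ArrayList<>();
        Matcher main = MAIN_EXPR.matcher(source);
        while (main.find()) {
            Matcher name = NAME_EXPR.matcher(main.group());
            if (name.find()) {
                names.add(name.group());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "proc drives {} {\n"
                + "    set x 1\n"
                + "}\n"
                + "proc handleEntry { dir entry } {\n"
                + "}\n";
        for (String name : functionNames(source)) {
            System.out.println(name);
        }
        // prints:
        // proc drives {}
        // proc handleEntry { dir entry }
    }
}
```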

How to Make the Function List Work for Bash

In functionList.xml add this line to the section with all the other association-entries.

 <association langID="26" id="bash_function"/>

Add this to the section <parsers>

<parser id="bash_function" displayName="Bash" commentExpr="(#)">
    <function
            mainExpr="^[\t ]*((function)[\s]+)[^\n]+"
            displayMode="$functionName">
        <functionName>
            <nameExpr expr="[\w: ]+"/>
        </functionName>
    </function>
</parser>

Restart Notepad++.

Download

Or just download my functionList.xml with Tcl and Bash support and replace yours.


Update: As Olivier from comp.lang.tcl pointed out, my version does not work with Npp 6.4.3. I’ve rechecked this, he is right. My version does work at least with versions 6.4.5, 6.5 and 6.5.5, though.

Update 2: Thanks to Rasha Matt Blais for his improved version of the mainExpr for bash functions, though that one still misses some types of function definition. Here is my latest mainExpr string:

mainExpr="^[\t ]*((function)[\s]+)[^\n]+|[\w_]+[\s]*(\([\s]*\))"

I’ve also adapted the downloadable functionList.xml. And I’ve added the testfunc.bsh file, which contains some bash functions with which I’ve tested my mainExpr string.
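If you want to experiment with the mainExpr string outside of Notepad++, a quick check is possible with java.util.regex. This is only an approximation, because Notepad++ uses its own regex engine. The class name and the sample script are made up for this sketch and are not the contents of testfunc.bsh.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BashFunctionMatcher {

    // The latest mainExpr string, with backslashes doubled for Java.
    private static final Pattern MAIN_EXPR = Pattern.compile(
            "^[\\t ]*((function)[\\s]+)[^\\n]+|[\\w_]+[\\s]*(\\([\\s]*\\))",
            Pattern.MULTILINE);

    /** Returns every piece of the script that mainExpr matches. */
    static List<String> matches(String script) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIN_EXPR.matcher(script);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String script = "function foo {\n"
                + "  echo hi\n"
                + "}\n"
                + "bar() {\n"
                + "  echo $(date)\n"
                + "}\n"
                + "baz ()\n"
                + "{\n"
                + "  :\n"
                + "}\n";
        for (String match : matches(script)) {
            System.out.println("[" + match + "]");
        }
        // prints:
        // [function foo {]
        // [bar()]
        // [baz ()]
    }
}
```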

Why Zip Is a Better Archive Format than 7z

Scott Hanselman declared 7z to be a much better format than zip and zip to be dying.
Or at least, this is how I understand his writing:

The 7z format is fast becoming the compression format that choosey hardcore users choose.

Though I’ve got a lot of respect for Scott, I have to add some facts.

Zip is Much Faster Than 7z

[Chart: zip-7z-comparison.1]

In my humble opinion, Scott has overlooked that zip is much faster than 7z. I’ve done tests with both formats, using compression methods 1 and 5 for each. Some of the results are staggering.

Discussion of the Results

compressFull

[Chart: zip-7z-comparison.3]

In the compressFull tests, a data set is compressed and added to a new archive.  As you can see above, compressing with zip-1 is around 4 times faster than compressing with 7z-1 and seven times faster than compressing with 7z-5. The resulting archive is only 6% bigger than the 7z-1 archive and 40% bigger than the 7z-5 archive. In my opinion, the much greater speed speaks clearly for zip.

extractFull

In the extractFull tests, all the files in the archive created by compressFull are extracted locally. Here, the speeds are not too different, e.g. extracting from a zip-1 archive is only around 30% faster than from a 7z-5 archive. As the 7z archives are smaller, I rate this as a draw.

extractSome

[Chart: zip-7z-comparison.2]

In the extractSome tests, only a small number of the files are extracted from the archive. Extracting from a zip-1 archive is 50% faster than from a 7z-1 archive and an astonishing twenty times faster than from a 7z-5 archive. Because this seemed unbelievable, I repeated the tests many times, but the result stayed the same. Victory by knockout for zip.

Test Details

  • I’ve done the tests on my oldish Fujitsu Siemens Lifebook S with a Dual Core Processor, running Win 7.
  • For both archive formats I’ve used the commandline version of 7-zip 9.20, controlled by tclkitsh.exe and the tcl script given below. It is the current and stable release of 7-zip. Unchanged since 2010.
  • In 7-zip, compression method 5 is the default for both formats. Compression method 1 is the fastest compression method for both formats.
  • I’ve repeated each test several times. The standard deviation of the results has mostly been very small.
  • My data set has been a set of 2800 files with 192 MB uncompressed.
  • The files which are extracted in the extractSome test are 72 files with 68 kB uncompressed.

Reproduction of the Tests

You can reproduce the tests easily:

  1. Download my script zip-test.zip and tclkitsh.zip and extract both into the same directory.
  2. Open the file zip-test.tcl with a text editor and adapt the first three lines. In the first line, adapt the path to the 7z.exe file on your PC. In the second line, adapt the path to the directory you want to compress.
  3. Open a command window, cd to the directory where you’ve put zip-test.tcl and type in: tclkitsh zip-test.tcl
  4. Wait and don’t use the computer until the tests are finished.
  5. The results will be written on screen and at the same time appended to the file zip-test-results.txt.
  6. You should discard the first test, because for the first one, the speed is highly determined by the time you need to read data from the hard drive. In later runs, much of the data is in the OS’s drive buffer. So the first test is not comparable to the following ones. It does not measure compression speed, but hard drive speed. You’ll see that the first run (with zip-1) takes much longer than the following ones, even those with zip-5 or 7z-5.
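The idea of discarding the first run can also be shown in a few lines of Java. The sketch below is not my test script and does not call 7-zip. It uses the Deflater class of the JDK with levels 1 and 5 on made-up in-memory data, so its levels are only loosely comparable to the 7-zip methods, and there is no 7z counterpart at all. All names in it are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DeflateLevelBench {

    /** Compresses data with the given deflate level (1 = fastest, 9 = best). */
    static byte[] compress(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses data whose original length is known. */
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        try {
            int offset = 0;
            while (offset < originalLength && !inflater.finished()) {
                offset += inflater.inflate(result, offset, originalLength - offset);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return result;
    }

    /** Average compression time in nanoseconds. The first run is a discarded warm-up. */
    static long averageNanos(byte[] data, int level, int runs) {
        compress(data, level);  // warm-up run, not measured
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            compress(data, level);
            total += System.nanoTime() - start;
        }
        return total / runs;
    }

    /** Made-up, text-like sample data. */
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("line ").append(i).append(" of hypothetical sample data\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        for (int level : new int[] {1, 5}) {
            long micros = averageNanos(data, level, 5) / 1000;
            int size = compress(data, level).length;
            System.out.println("level " + level + ": " + size + " bytes, " + micros + " us");
        }
    }
}
```

The numbers depend on your machine and your data, so run it with data that resembles your own before drawing conclusions.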

Summary

  1. All operations on zip archives are much faster or as fast as the same operation on 7z archives.
  2. 7z archives are somewhat smaller than zip archives – but not much.
  3. My recommendation is:
    Use zip as archive type. If you are using the software 7-zip, do not use the default settings. Always use compression method 1.

What is your opinion? I’d love to hear from you.