By: Sujay V Sarma
File compression serves several purposes in an application, chiefly more
efficient storage and faster transmission of content. You may not want to
develop full-blown file compression software, only a smaller version of it. To
help .NET developers with this task, ICSharpCode provides a .NET assembly
called 'SharpZipLib'. This library supports the ZIP, GZIP, TAR and BZ2 (BZip2)
formats. The library is open source, so you can download the source code and
make your optimizations and fixes directly to the core, or lift just the
required components and plug them into your application. In the project for
this article, we use the binary SharpZipLib assembly to develop a tool similar
to popular tools like WinZip.
Our current code will process only ZIP files, but adding
support for other formats is as simple as setting a couple of parameters here
and there.
Understanding the project
Our project has six source files in all. SplashScreen, MainWindow,
CommentsSetting and CompressionSetting together handle the UI of the
application. FileSystem and Zipper are classes that do the background work. We
will not delve into the code of three of these files. SplashScreen is the
application's startup screen, left exactly as VS.NET 2005 writes it when you
add a form from the 'Splash Screen' template to the project.
The name, version and copyright information for this screen come directly from
the project's properties at runtime. The other two files, CommentsSetting and
CompressionSetting, display one dialog box each to let the user add or change
the ZIP file comment and the compression ratio, respectively. FileSystem is a
wrapper around the Path, File and Directory classes and adds some functionality
to them. Zipper is our interface with the SharpZipLib component: it wraps the
library's functions and performs exception handling.
Code flow
When the user runs the
application, the runtime shows the splash screen (SplashScreen.vb) and times it
out after 5 seconds. Then the MainWindow UI is displayed and the application
waits for user interaction through the menu system there. The user opens an
existing ZIP file or creates a new one to start off other functionality.
On menu-item selection, the MainWindow code will check the
relevance (and make adjustments if required) before passing it on to the
functions within Zipper. Once the operation completes (success or failure), the
results are shown on the UI. The UI also has a status bar, which is used to
display some progress information. Let's now take up the code in Zipper and
understand it.
Zipper.vb
At the start of this file, you will notice a 'Structure' (FileInZip) defined.
It is used by the last function in the file (ListContents, lines 393 to 451),
which reads a ZIP file and loads its directory into an array of FileInZip
structures. The caller of ListContents then uses this information to populate
the UI. The class has nine functions and one important public property
(ZipError). All consumers of the Zipper class must examine the value of
ZipError to determine error status.
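The idea behind ListContents, reading a ZIP file's directory into an array of records, can be illustrated outside SharpZipLib. What follows is only a minimal sketch in Python using its standard zipfile module, not the article's VB.NET code. The fields of FileInZip shown here are assumptions, since the members of the real structure are not listed in this article, and demo_archive is a hypothetical helper that builds a small ZIP in memory.

```python
import io
import zipfile
from dataclasses import dataclass


@dataclass
class FileInZip:
    """Hypothetical record; the real structure's fields are not shown in the article."""
    name: str
    size: int
    compressed_size: int
    crc: int


def list_contents(zip_source):
    """Read the ZIP directory and return one FileInZip per stored file."""
    with zipfile.ZipFile(zip_source) as zf:
        return [
            FileInZip(i.filename, i.file_size, i.compress_size, i.CRC)
            for i in zf.infolist()
            if not i.is_dir()
        ]


def demo_archive():
    """Build a small in-memory ZIP so the sketch runs without touching disk."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("docs/readme.txt", b"hello " * 100)
        zf.writestr("notes.txt", b"short")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for entry in list_contents(demo_archive()):
        print(entry.name, entry.size, entry.compressed_size)
```

A caller would loop over the returned list to fill the UI, much as the caller of ListContents does in the project.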
The called function itself signals that there is an error: functions with
Boolean returns will return 'False', and the others will return blank values.
Typical errors range from incorrect information supplied for a task to
corruption in the ZIP file.
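The convention described above (a function returns 'False' or a blank value, and the caller then reads ZipError) might be sketched as follows. This is an illustration in Python with the standard zipfile module, not the project's VB.NET class. The names zip_error, list_names and is_valid are hypothetical stand-ins, and the error text is whatever Python reports.

```python
import io
import zipfile


class Zipper:
    """Sketch of the 'return False and record the error' convention."""

    def __init__(self):
        self.zip_error = ""  # plays the role of the ZipError property

    def list_names(self, zip_source):
        """Return entry names, or a blank (empty) list on failure."""
        self.zip_error = ""
        try:
            with zipfile.ZipFile(zip_source) as zf:
                return zf.namelist()
        except (zipfile.BadZipFile, OSError) as exc:
            self.zip_error = str(exc)
            return []

    def is_valid(self, zip_source):
        """Boolean-returning operation: False signals that zip_error is set."""
        self.zip_error = ""
        try:
            with zipfile.ZipFile(zip_source) as zf:
                bad = zf.testzip()  # name of the first corrupt entry, or None
        except (zipfile.BadZipFile, OSError) as exc:
            self.zip_error = str(exc)
            return False
        if bad is not None:
            self.zip_error = "corrupt entry: " + bad
            return False
        return True


if __name__ == "__main__":
    z = Zipper()
    if not z.is_valid(io.BytesIO(b"not a zip file")):
        print("Error:", z.zip_error)
```

The design keeps exceptions inside the wrapper, at the cost that a caller who forgets to check the error value will silently continue.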
Getting started
Before attempting to follow this [...] We have provided the SharpZipLib [...]
You will notice that most of the code in this class manipulates path strings.
This is important because, while the program itself deals with absolute paths
in MS-DOS format (separated by '\'), the ZIP file stores files with relative
paths in UNIX format (separated by '/'). If path information is passed along
improperly, the resulting output is unusable. Let's take the actual zipping
(lines 45 to 73) and unzipping (lines 315 to 355) code and analyze them. The
zipping code is as follows.
zFile = New ZipOutputStream(File.Create(DestinationFileName))
With zFile
    .SetLevel(CompressionLevel)
    .SetMethod(ZipOutputStream.DEFLATED)
End With
' Read the whole source file into memory
inFile = File.OpenRead(SourceFilePath)
ReDim Buf(inFile.Length - 1)
inFile.Read(Buf, 0, Buf.Length)
zEntry = New ZipEntry(Path.GetFileName(SourceFilePath))
With zEntry
    .DateTime = Now
    .Size = inFile.Length
End With
inFile.Close()
' Compute the CRC of the data and store it in the entry
With oCrc32
    .Reset()
    .Update(Buf)
    zEntry.Crc = .Value
End With
' Write the entry to the ZIP file and close it
With zFile
    .PutNextEntry(zEntry)
    .Write(Buf, 0, Buf.Length)
    .Finish()
    .Close()
End With
The zipping operation is implemented as a Stream object (ZipOutputStream). So,
the first line above instantiates it by passing in the file stream that
File.Create returns for our destination ZIP file.
Quick hint: Similar functions exist within the BZip2, GZip and Tar classes.
Next, we need not concern ourselves with the nitty-gritty of actually
compressing each file. All we do is set the compression level using the simple
'SetLevel' method. It takes a parameter (CompressionLevel), which in our code
varies from 0 through 6, with 6 being the maximum compression we use (SetLevel
itself accepts levels up to 9). Then, we open the file to add to the ZIP with
File.OpenRead and read its contents into a Byte array (Buf). Now, we need to
compute its 32-bit CRC value. This is done very simply in line 64 by
'oCrc32.Update(Buf)'. The Crc32 class provided by the SharpZipLib library
handles this for us.
Finally, assign this CRC value to the Crc property of our ZipEntry object
(zEntry) and stream in the bytes using the ZipOutputStream's Write method.
Simple, isn't it?
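The three ideas in the zipping code (a relative entry name with '/' separators, a compression level, and a CRC-32 of the raw data) can be tried out in a few lines. The sketch below uses Python's standard zipfile and zlib modules rather than SharpZipLib, so none of these calls are the library's API. The helper names to_entry_name and zip_one, and the sample paths, are hypothetical.

```python
import io
import zipfile
import zlib
from pathlib import PureWindowsPath


def to_entry_name(abs_path, base_dir):
    """Turn an absolute Windows path into a relative, '/'-separated entry name."""
    rel = PureWindowsPath(abs_path).relative_to(PureWindowsPath(base_dir))
    return rel.as_posix()


def zip_one(dest, entry_name, data, level):
    """Write one deflated entry at the given level and return the data's CRC-32."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED, compresslevel=level) as zf:
        zf.writestr(entry_name, data)
    return crc


if __name__ == "__main__":
    name = to_entry_name(r"C:\work\docs\readme.txt", r"C:\work")
    buf = io.BytesIO()
    crc = zip_one(buf, name, b"hello " * 100, 6)
    with zipfile.ZipFile(buf) as zf:
        info = zf.getinfo(name)
        # The CRC stored in the ZIP directory should match the one we computed
        print(name, info.CRC == crc, info.compress_size, info.file_size)
```

Unlike the listing above, Python's writer computes and stores the CRC itself; the sketch computes it separately only to show that the two values agree.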
Our program lets the user change compression settings for any file in the zip. The drop down shows the current file selection for which this applies.
Now, let us move on to the unzip code. We are leaving out the
path manipulating code from the listing of lines 315—355 given below.
Dim Src As New ZipInputStream(File.OpenRead(ZipFilePath))
Do
    ' Fetch the next entry; Nothing means the end of the ZIP file
    theEntry = Src.GetNextEntry()
    If (theEntry Is Nothing) Then Exit Do
    efPath = theEntry.Name
    ...
    entryFileName = Path.GetFileName(efPath)
    If (entryFileName.Length > 0) Then
        If ( ... ) Then
            ...
            End If
            SW = File.Create(targetName)
            ' Copy the uncompressed data to the disk file, one buffer at a time
            Do
                Size = Src.Read(Data, 0, Data.Length)
                If (Size > 0) Then
                    SW.Write(Data, 0, Size)
                Else
                    Exit Do
                End If
            Loop
            SW.Close()
        End If
    End If
Loop
Just as writing to the ZIP file involves a ZipOutputStream, reading from it
requires a ZipInputStream. This too must be instantiated with an opened stream
on the ZIP file we want to extract from. Once it is open, we need to enumerate
through the entire ZIP file to locate the file we want; there is no direct way
to do this yet. This is achieved using the GetNextEntry method of the
ZipInputStream class.
Once the entry is found, we create a disk file of the same name (File.Create),
read the data through the ZipInputStream's Read method, and write it to the
disk file we created. Now, where exactly did we uncompress the information or
check its CRC values? Well, our ZipInputStream.Read method does this
automatically for us. If there are problems, the Try-Catch construct around
the above lines in our file will handle the exception.
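The same scan-then-copy loop can be sketched in Python with the standard zipfile module, as an illustration only and not as SharpZipLib code. The function name extract_one and the 4096-byte buffer size are assumptions. As with ZipInputStream.Read, decompression happens inside the read call, and Python's reader checks the CRC once an entry has been read to its end, raising an exception on a mismatch.

```python
import io
import os
import tempfile
import zipfile


def extract_one(zip_source, wanted, target_dir, chunk_size=4096):
    """Scan entries in order and copy the wanted one to disk in chunks.

    Returns the path written, or None if no entry has that name.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():  # walk the directory entry by entry
            if info.filename != wanted:
                continue
            # Keep only the file name, as Path.GetFileName does in the listing
            target = os.path.join(target_dir, os.path.basename(info.filename))
            with zf.open(info) as src, open(target, "wb") as out:
                while True:
                    data = src.read(chunk_size)
                    if not data:  # empty read: end of this entry
                        break
                    out.write(data)
            return target
    return None


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("docs/readme.txt", b"hello " * 1000)
    with tempfile.TemporaryDirectory() as folder:
        print(extract_one(buf, "docs/readme.txt", folder))
```

Taking only the base name of the entry also prevents an entry name containing directory parts from writing outside the target folder.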
The rest of the code in this file should be self-explanatory. But let us
discuss two key things before we move on. One, our ZipOperation function is
what actually calls both ZipAFile and Unzip (the functions containing the code
above). And ZipOperation takes a parameter, 'DoAndWait'.
This is useful when several files are being added to a ZIP file and it would
be quite time consuming to recompress the file every time. When 'True', this
parameter causes the file to be added to a temporary on-disk location without
creating the ZIP file itself. This also means that in a set of calls to the
ZipOperation function, the last one must have DoAndWait set to 'False', or the
ZIP file will never be created.
This brings us to the second key point in our program. We zip files by copying
them to a temporary location (the value of WorkingDirectory plus the ZIP
filename). Once all the files are there and DoAndWait is False, ZipAFolder is
called to zip the entire folder in one go. This saves a lot of computing power,
but it needs as much extra disk space as the files themselves for the duration
of the operation. So be careful when compressing large files.
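The staging approach can be sketched as follows, again in Python with the standard library rather than the project's VB.NET. The class name StagedZip and its methods are hypothetical. Only the do_and_wait flag mirrors the DoAndWait parameter described above: files are copied to a temporary folder, and the ZIP is written only when a call arrives with the flag set to False.

```python
import os
import shutil
import tempfile
import zipfile


class StagedZip:
    """Stage files in a temporary folder; build the ZIP only on the final call."""

    def __init__(self, zip_path):
        self.zip_path = zip_path
        self.staging = tempfile.mkdtemp(prefix="zipstage_")

    def add(self, source_path, do_and_wait):
        """Copy a file into staging; when do_and_wait is False, zip the folder."""
        shutil.copy(source_path, self.staging)
        if not do_and_wait:
            self._zip_folder()

    def _zip_folder(self):
        """Compress every staged file in one pass, then free the staging space."""
        with zipfile.ZipFile(self.zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for name in sorted(os.listdir(self.staging)):
                zf.write(os.path.join(self.staging, name), arcname=name)
        shutil.rmtree(self.staging)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        paths = []
        for name in ("a.txt", "b.txt"):
            path = os.path.join(folder, name)
            with open(path, "wb") as f:
                f.write(name.encode() * 10)
            paths.append(path)
        staged = StagedZip(os.path.join(folder, "out.zip"))
        staged.add(paths[0], True)   # staged only, no ZIP yet
        staged.add(paths[1], False)  # last call: the ZIP is created now
        print(os.path.exists(staged.zip_path))
```

The sketch makes the trade-off visible: each file is compressed once, but the staging folder holds a full copy of every file until the final call.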
Current bugs
There are a few 'bugs' in our current code as given. In some places, canceling
a dialog will cause an error and the ZIP file will close. In other places,
operations will appear to succeed when the circumstances actually make that
impossible (exceptions are not being handled at all). Some of these are
intentional, others accidental. We have also left some of the code above
unexplained, but the in-file comments should be sufficient to help you
understand what's happening. The code is there for you, so play with it, change
it or fix it. Have fun!
Source: PCQuest