Sinks: Write-only Streams of Data
Creating archives and writing files with zipsink
You can wrap any IO object that supports writing bytes (any type T that implements unsafe_write(::T, ::Ptr{UInt8}, ::UInt)) in a special ZIP archive writer with the zipsink function. The function returns an object that allows creating and writing files within the archive. You can then call open(sink, filename) on the returned object to create a new file in the archive and begin writing to it with standard IO functions.
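To make the unsafe_write requirement concrete, here is a minimal sketch of a write-only IO type written with Base only. ByteCounter is hypothetical and not part of ZipStreams: it discards its input and only counts bytes. Going by the requirement stated above, an object like this could in principle be handed to zipsink, for example to measure the size of an archive without storing it; that use is an assumption, not something this page demonstrates.

```julia
# Hypothetical write-only IO type (not part of ZipStreams): it discards the
# bytes it is given and only counts them.
mutable struct ByteCounter <: IO
    count::Int
    active::Bool
end
ByteCounter() = ByteCounter(0, true)

# The bulk-write primitive: write(io, ::String) and write(io, ::Vector{UInt8})
# typically end up calling this method with a pointer and a byte count.
function Base.unsafe_write(io::ByteCounter, p::Ptr{UInt8}, n::UInt)
    io.active || throw(ArgumentError("ByteCounter is closed"))
    io.count += Int(n)          # a real sink would copy n bytes starting at p
    return Int(n)
end

# Single-byte writes, defined explicitly so they do not rely on Base fallbacks.
function Base.write(io::ByteCounter, ::UInt8)
    io.active || throw(ArgumentError("ByteCounter is closed"))
    io.count += 1
    return 1
end

Base.isopen(io::ByteCounter) = io.active
Base.iswritable(io::ByteCounter) = io.active
Base.close(io::ByteCounter) = (io.active = false; nothing)

counter = ByteCounter()
write(counter, "Hello, Julia!")   # 13 bytes through unsafe_write
write(counter, UInt8[0x01, 0x02]) # 2 more bytes
write(counter, 0x03)              # 1 byte through the single-byte method
```

A counting IO is only one option; any type that forwards the n bytes at p somewhere (a socket, a pipe, a hash) follows the same shape.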
This example creates a new ZIP archive file on disk, creates a new file within the archive, writes data to the file, then closes the file and archive:
using ZipStreams
io = open("new-archive.zip", "w")
sink = zipsink(io)
f = open(sink, "hello.txt")
write(f, "Hello, Julia!")
close(f)
close(sink)
Convenience methods are included that create a new archive file on disk when zipsink is passed a file name instead of an IO object, and that accept a unary function as the first argument so that zipsink can be used with a do ... end block. The open(sink, filename) method can also be used with a do ... end block, as this example shows:
using ZipStreams
zipsink("new-archive.zip") do sink # create a new archive on disk, truncating any existing file
open(sink, "hello.txt") do f # create a new file in the archive
write(f, "Hello, Julia!")
end # automatically write a Data Descriptor to the archive and close the file
end # automatically write the Central Directory and close the archive
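The do-block form is ordinary Julia syntax: zipsink(fname) do sink ... end is the same call as zipsink(sink -> ..., fname). The following sketch uses only Base and a hypothetical with_file helper to show the try/finally shape that such convenience methods commonly follow. It illustrates the general pattern and is not ZipStreams' actual implementation.

```julia
# `f(args...) do x ... end` is syntax for `f(x -> ..., args...)`, so a function
# that takes a callable as its first argument supports do-block usage.

# Hypothetical helper (not ZipStreams API) in the same style as zipsink(f, fname):
# open the resource, hand it to f, and always release it afterwards.
function with_file(f::Function, path::AbstractString)
    io = open(path, "w")    # creates the file, or truncates an existing one
    try
        return f(io)        # whatever f returns is passed back to the caller
    finally
        close(io)           # runs even if f throws
    end
end

path = tempname()
nbytes = with_file(path) do io
    write(io, "Hello, Julia!")
end
# Equivalent call without do-block syntax:
with_file(io -> write(io, "Hello, Julia!"), path)
```

The finally clause is what guarantees cleanup when the block throws. The IO-taking variant of zipsink differs in one respect: it finalizes the archive but leaves the caller's IO object open.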
Note that the method that takes an IO object does not automatically close that object when the do block ends: the caller of that signature is responsible for the lifetime of the IO object. The IO object can be closed before the end of the do block by calling close on the sink. Additional writes to a closed sink will throw an ArgumentError, but closing an already-closed sink is a no-op, as these examples show:
using ZipStreams
io = IOBuffer()
zipsink(io) do sink
open(sink, "hello.txt") do f
write(f, "Hello, Julia!")
end
end
@assert isopen(io) == true
zipsink(io) do sink
open(sink, "goodbye.txt") do f
write(f, "Good bye, Julia!")
end
close(sink)
end
@assert isopen(io) == false
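The closing contract described above (writes after close throw ArgumentError, a second close does nothing) is a common pattern for writers that must finalize their output exactly once. The hypothetical ToySink below illustrates it in plain Julia; the string "END" stands in for the Central Directory, and the type is a sketch of the behavior, not how ZipStreams implements its sink.

```julia
# Hypothetical wrapper (not ZipStreams API) illustrating the contract above.
mutable struct ToySink
    io::IO
    closed::Bool
end
ToySink(io::IO) = ToySink(io, false)

function Base.write(s::ToySink, data)
    s.closed && throw(ArgumentError("cannot write to a closed sink"))
    return write(s.io, data)
end

function Base.close(s::ToySink)
    s.closed && return nothing   # closing twice is a no-op
    write(s.io, "END")           # stand-in for writing the Central Directory
    close(s.io)                  # close(sink) also closes the wrapped IO
    s.closed = true
    return nothing
end

Base.isopen(s::ToySink) = !s.closed

buf = IOBuffer()
sink = ToySink(buf)
write(sink, "data")
close(sink)
close(sink)   # no error, nothing written twice
```

Keeping a closed flag on the wrapper, rather than relying on the wrapped IO's state, is what makes the second close safe.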
Because the data are streamed to the archive, you can only have one file open for writing at a time in a given archive. If you try to open a new file before closing the previous file, a warning will be printed to the console and the previous file will automatically be closed. In addition, any file still open for writing when the archive is closed will automatically be closed before the archive is finalized, as this example demonstrates:
using ZipStreams
zipsink("new-archive.zip") do sink
f1 = open(sink, "hello.txt")
write(f1, "Hello, Julia!")
f2 = open(sink, "goodbye.txt") # issues a warning and closes f1 before opening f2
write(f2, "Good bye, Julia!")
end # automatically closes f2 before closing the archive
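The one-file-at-a-time rule follows from streaming: the bytes of one entry cannot be interleaved with another's. The sketch below models only the bookkeeping this implies, using hypothetical names (ToyArchive, open_entry!, finish!) that track entry names and nothing else; it is not the ZipStreams implementation.

```julia
# Hypothetical model (not ZipStreams API) of "only one open entry at a time".
mutable struct ToyArchive
    current::Union{Nothing,String}   # name of the entry open for writing, if any
    finished::Vector{String}         # entries closed so far, in order
end
ToyArchive() = ToyArchive(nothing, String[])

function close_entry!(a::ToyArchive)
    a.current === nothing && return nothing
    push!(a.finished, a.current)     # a real sink would write the Data Descriptor here
    a.current = nothing
    return nothing
end

function open_entry!(a::ToyArchive, name::String)
    if a.current !== nothing
        @warn "closing $(a.current) before opening $name"
        close_entry!(a)
    end
    a.current = name
    return a
end

function finish!(a::ToyArchive)
    close_entry!(a)                  # close any dangling entry first
    return copy(a.finished)          # a real sink would now write the Central Directory
end

archive = ToyArchive()
open_entry!(archive, "hello.txt")
open_entry!(archive, "goodbye.txt")  # logs a warning and closes hello.txt
```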
Writing files to an archive all at once with write_file
When you open a file for writing in a ZIP archive using open(sink, filename), the file is written in a streaming fashion, and a Data Descriptor is written after the file data when the file is closed. If you want to write an entire file to the archive at once, you can use the write_file(sink, filename, data) method instead. This method writes the file size and checksum information to the Local File Header rather than to a Data Descriptor. The advantage of this method is that files written this way are read back more efficiently by a zipsource: when streamed for reading, the Local File Header reports the correct file size. The disadvantages are that all of the data you want to write must be available at once, and that both the raw data and the compressed data must fit in memory. The following examples compare the two approaches:
using ZipStreams
zipsink("new-archive.zip") do sink
open(sink, "hello.txt") do f1
write(f1, "Hello, Julia!") # writes using a Data Descriptor
end
end
zipsource("new-archive.zip") do source
f = next_file(source) # works, but is slow to read because the stream has to be checked for a valid Data Descriptor with each read
@assert read(f, String) == "Hello, Julia!"
end
zipsink("new-archive.zip") do sink
text = "Hello, Julia!"
write_file(sink, "hello.txt", text) # writes without a Data Descriptor
end
zipsource("new-archive.zip") do source
f = next_file(source) # is more efficient to read because the file size is known a priori
@assert read(f, String) == "Hello, Julia!"
end
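The difference between the two write modes lives in the Local File Header. The sketch below builds one by hand following the layout in PKWARE's ZIP APPNOTE: 30 fixed bytes, general-purpose flag bit 3 when a Data Descriptor follows the data, and bit 11 for UTF-8 names, together with a bitwise CRC-32. All function names are hypothetical; the entry is stored uncompressed and timestamps are left at zero for brevity. This is an illustration of the format, not how ZipStreams is implemented.

```julia
# Hypothetical helpers (not ZipStreams API) following the APPNOTE layouts.

# Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum ZIP uses.
function crc32(data::AbstractVector{UInt8})
    crc = 0xffffffff
    for b in data
        crc ⊻= UInt32(b)
        for _ in 1:8
            crc = (crc & 0x00000001) != 0 ? (crc >> 1) ⊻ 0xedb88320 : crc >> 1
        end
    end
    return ~crc
end
crc32(s::String) = crc32(codeunits(s))

# Append an unsigned integer in little-endian byte order.
le!(buf::Vector{UInt8}, x::Union{UInt16,UInt32}) = append!(buf, reinterpret(UInt8, [htol(x)]))

# 30 fixed bytes followed by the file name. With streaming=true, general-purpose
# bit 3 is set and CRC/sizes are zero because they are not known yet.
function local_file_header(name::String, data::Vector{UInt8}; streaming::Bool)
    flags = 0x0800                                  # bit 11: name is UTF-8
    crc, csize, usize = crc32(data), UInt32(length(data)), UInt32(length(data))
    if streaming
        flags |= 0x0008                             # bit 3: Data Descriptor follows the data
        crc, csize, usize = 0x00000000, 0x00000000, 0x00000000
    end
    buf = UInt8[]
    le!(buf, 0x04034b50)                            # local file header signature
    le!(buf, 0x0014)                                # version needed to extract: 2.0
    le!(buf, flags)                                 # general-purpose bit flag
    le!(buf, 0x0000)                                # compression method: 0 = stored
    le!(buf, 0x0000); le!(buf, 0x0000)              # DOS time and date, left at zero here
    le!(buf, crc); le!(buf, csize); le!(buf, usize) # CRC-32, compressed and uncompressed size
    le!(buf, UInt16(ncodeunits(name)))              # file name length
    le!(buf, 0x0000)                                # extra field length
    append!(buf, codeunits(name))
    return buf
end

# Written after the file data in streaming mode: signature (optional per
# APPNOTE but widely written), CRC-32, compressed size, uncompressed size.
function data_descriptor(data::Vector{UInt8})
    buf = UInt8[]
    le!(buf, 0x08074b50)
    le!(buf, crc32(data)); le!(buf, UInt32(length(data))); le!(buf, UInt32(length(data)))
    return buf
end

data = Vector{UInt8}("Hello, Julia!")
all_at_once = vcat(local_file_header("hello.txt", data; streaming=false), data)
streamed = vcat(local_file_header("hello.txt", data; streaming=true), data, data_descriptor(data))
```

Both layouts carry the same information; the streamed one defers the CRC and sizes until after the data, so a reader learns the entry's size only once it reaches the descriptor.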
Creating directories in an archive
Directories within a ZIP archive are nothing more than files with zero length and a name that ends in a forward slash (/). If you try to make a file whose name ends in / using open or write_file, the method will throw an error. You can, however, make a directory by calling the mkdir and mkpath functions. They work similarly to Base.mkdir and Base.mkpath: the former throws an error if any parent directory does not exist, while the latter creates parent directories as needed. Here are examples of these two functions:
using ZipStreams
zipsink("new-archive.zip") do sink
try
f = open(sink, "file/") # fails because files cannot end in '/'
catch e
@error "exception caught" exception=e
end
mkdir(sink, "dir1/") # creates a directory called "dir1/" in the root of the archive
mkdir(sink, "dir1/dir2/") # creates "dir2/" as a subdirectory of "dir1/"
try
mkdir(sink, "dir3/dir4/") # fails because mkdir won't create parent directories
catch e
@error "exception caught" exception=e
end
mkpath(sink, "dir3/dir4/") # creates both "dir3/" and "dir3/dir4/"
mkdir(sink, "dir5") # The ending slash will be appended to directory names automatically
end
NOTE: Even on Windows computers, directory names in ZIP files always use the forward slash (/) as the directory separator. Backslash characters (\) are treated as literal characters in the directory or file name, so mkdir(sink, "dir\\file") will create a single directory named dir\file/ in the root of the archive, not a directory named dir containing a subdirectory named file.
The mkdir and mkpath methods return the number of bytes written to the archive, including the Local File Header required to define the directory but excluding the Central Directory Header data, which is written when the sink is closed.
The sink keeps track of which directories have been defined and skips creating directories that already exist, as this example demonstrates:
using ZipStreams
zipsink("new-archive.zip") do sink
a = mkdir(sink, "dir1/") # returns the number of bytes written to the archive
@assert a > 0
b = mkdir(sink, "dir1/")
@assert b == 0 # dir1 already exists, so nothing is written
c = mkpath(sink, "dir1/dir2") # dir1 already exists, so do not recreate it
d = mkpath(sink, "dir3/dir4") # dir3 has to be created along with dir4
@assert d > c # the second call creates two directories, so more bytes are written
end
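The sink's bookkeeping can be pictured as a set of directory names that have already been written. The hypothetical DirTracker below follows the mkdir/mkpath rules described above in plain Julia. For simplicity its functions return the number of directory entries created, whereas ZipStreams' mkdir and mkpath return the number of bytes written; it is a sketch of the rules, not the library's implementation.

```julia
# Hypothetical directory bookkeeping (not ZipStreams API).
struct DirTracker
    dirs::Set{String}   # directory entry names already written, e.g. "dir1/"
end
DirTracker() = DirTracker(Set{String}())

# "a//b/" -> ["a", "b"]; empty components (leading "/", "//") are dropped
components(path::AbstractString) = filter(!isempty, split(path, '/'))

# Create one directory; its parent must already exist.
function mkdir!(t::DirTracker, path::AbstractString)
    parts = components(path)
    isempty(parts) && return 0                   # the unnamed root: nothing to do
    parent = join(parts[1:end-1], '/')
    if !isempty(parent) && !(parent * "/" in t.dirs)
        throw(ArgumentError("parent directory $parent/ does not exist"))
    end
    name = join(parts, '/') * "/"
    name in t.dirs && return 0                   # already created: skip it
    push!(t.dirs, name)
    return 1
end

# Create every missing directory along the path, outermost first.
function mkpath!(t::DirTracker, path::AbstractString)
    parts = components(path)
    created = 0
    for i in eachindex(parts)
        created += mkdir!(t, join(parts[1:i], '/'))
    end
    return created
end

tracker = DirTracker()
mkdir!(tracker, "dir1/")
mkpath!(tracker, "dir3//dir4")
```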
Opening a new file in the sink whose name contains a directory path will throw an error if the parent directories do not exist. Passing the keyword argument make_path=true causes the method to create the parent path first, as if mkpath had been called:
using ZipStreams
zipsink("new-archive.zip") do sink
try
f = open(sink, "dir1/file") # fails because directory "dir1/" does not exist
catch e
@error "exception caught" exception=e
end
f = open(sink, "dir1/file"; make_path=true) # creates "dir1/" first
write(f, "Hello, Julia!") # write to the file as usual
close(f)
end
The relative directory names . and .. are interpreted as directories literally named . or .., not as relative paths. The root directory of the archive is unnamed, so attempts to create a directory named / are ignored. Unnamed subdirectories are likewise ignored (e.g., mkpath(sink, "dir1//dir2") does the same thing as mkpath(sink, "dir1/dir2")). Attempting to make a directory whose name appears to begin with a Windows drive specifier throws an error, even on a non-Windows OS, per section 4.4.17 of the APPNOTE document.
using ZipStreams
zipsink("new-archive.zip") do sink
@assert mkpath(sink, "/") == 0 # '/' at the beginning is ignored
mkpath(sink, "/dir1")
@assert mkpath(sink, "dir1") == 0 # already created with "/dir1"
mkpath(sink, "dir1/////dir2")
@assert mkpath(sink, "dir1/dir2") == 0 # already created with "dir1/////dir2"
try
mkpath(sink, "c:\\dir1") # fails because directory appears to start with a drive specifier
catch e
@error "exception caught" exception=e
end
try
mkpath(sink, "q:dir1") # fails for the same reason: a drive letter and colon are enough, even without a backslash
catch e
@error "exception caught" exception=e
end
try
mkpath(sink, "\\\\networkshare\\dir1") # fails because Windows network drives count as drive specifiers
catch e
@error "exception caught" exception=e
end
end
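Taken together, these naming rules amount to a small normalization step. The hypothetical dir_entry_name function below is one way to express them in plain Julia. The drive-specifier test (a letter followed by a colon, or a leading double backslash) is an assumption inferred from the examples above, not necessarily the exact check ZipStreams performs.

```julia
# Hypothetical normalization (not ZipStreams API) following the rules above.
function dir_entry_name(path::AbstractString)
    # Reject anything that looks like a Windows drive ("c:...") or UNC share ("\\server\...").
    if occursin(r"^[A-Za-z]:", path) || startswith(path, "\\\\")
        throw(ArgumentError("path appears to start with a drive specifier: $path"))
    end
    parts = filter(!isempty, split(path, '/'))   # drops a leading "/" and "//" runs
    isempty(parts) && return ""                  # the unnamed root: nothing to create
    return join(parts, '/') * "/"                # ".", ".." and '\' stay literal
end

dir_entry_name("dir1/////dir2")   # "dir1/dir2/"
```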
API
ZipStreams.ZipArchiveSink (Type)
ZipArchiveSink
A struct for appending to Zip archives.
Zip archives are optimized for appending to the end of the archive. This struct is used in tandem with library functions to keep track of what is appended to a Zip archive so that a proper Central Directory can be written at the end.
Users should not call the ZipArchiveSink constructor: instead, use the zipsink method to create a new streaming archive.
ZipStreams.zipsink (Function)
zipsink(fname; [keyword arguments]) -> ZipArchiveSink
zipsink(io; [keyword arguments]) -> ZipArchiveSink
zipsink(f, args...)
Open an IO stream of a Zip archive for writing data.
Positional arguments
- fname::AbstractString: The name of a Zip archive file to open for writing. The file will be created if it does not exist; if it does exist, it will be truncated before writing.
- io::IO: An IO object that can be written to. The object will be closed when you call close on the returned object.
- f<:Function: A unary function to which the opened stream will be passed. This method signature allows for do block usage. When called with this signature, the return value of f will be returned to the user.
Keyword arguments
- utf8::Bool=true: Encode file names and comments with UTF-8 encoding. If false, follow the Zip standard of treating text as IBM437-encoded.
- comment::AbstractString="": A comment to store with the Zip archive. This information is stored in plain text at the end of the archive and does not affect the Zip archive in any other way. The comment is always stored using IBM437 encoding.
Passing an IO object as the first argument uses the object as-is: data is written starting from the current position of the stream, overwriting anything already there, and the Central Directory is written when the sink is closed, without truncating any data that remain after it. This use of zipsink is recommended only for advanced users who need to write Zip archives to write-only streams (e.g., network pipes).
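The Central Directory and the archive comment are followed by, or stored in, the End of Central Directory record, the last structure in the file (APPNOTE section 4.3.16). The sketch below builds that record by hand with a hypothetical eocd_record function. To sidestep IBM437 conversion it only accepts printable ASCII comments, which IBM437 encodes with the same byte values, and it does not handle ZIP64 limits.

```julia
# Append an unsigned integer in little-endian byte order.
le!(buf::Vector{UInt8}, x::Union{UInt16,UInt32}) = append!(buf, reinterpret(UInt8, [htol(x)]))

# Hypothetical builder (not ZipStreams API) for the End of Central Directory
# record: 22 fixed bytes followed by the archive comment. No ZIP64 support.
function eocd_record(n_entries::Integer, cd_size::Integer, cd_offset::Integer,
                     comment::String="")
    # Printable ASCII has the same byte values in IBM437, so no transcoding is needed.
    all(c -> ' ' <= c <= '~', comment) ||
        throw(ArgumentError("this sketch only accepts printable ASCII comments"))
    buf = UInt8[]
    le!(buf, 0x06054b50)                   # end of central directory signature
    le!(buf, 0x0000)                       # number of this disk
    le!(buf, 0x0000)                       # disk where the central directory starts
    le!(buf, UInt16(n_entries))            # central directory entries on this disk
    le!(buf, UInt16(n_entries))            # total central directory entries
    le!(buf, UInt32(cd_size))              # size of the central directory in bytes
    le!(buf, UInt32(cd_offset))            # offset where the central directory starts
    le!(buf, UInt16(ncodeunits(comment)))  # comment length
    append!(buf, codeunits(comment))
    return buf
end

record = eocd_record(3, 150, 1000, "built by hand")
```

Because the comment sits at the very end of the file, it never shifts any offsets recorded earlier in the archive, which is consistent with the description of the comment keyword above.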
Base.Filesystem.mkdir (Method)
mkdir(archive, path; comment="")
Make a single directory within a ZIP archive.
Path elements in ZIP archives are separated by the forward slash character (/). Backslashes (\) and dots (. and ..) are treated as literal characters in the directory or file names. The final forward slash character will automatically be added to the directory name when this method is used.
If any parent directory in the path does not exist, an error will be thrown. Use mkpath to create the entire path at once, including parent paths. Empty directory names (//) will be ignored, as will directories that have already been created in the archive.
The comment string will be added to the archive's metadata for the directory. It does not affect the stored data in any way.
Returns the number of bytes written to the archive when creating the directory.
Base.Filesystem.mkpath (Method)
mkpath(archive, path; comment="")
Make a directory and all its parent directories in a ZIP archive.
Path elements in ZIP archives are separated by the forward slash character (/). Backslashes (\) and dots (. and ..) are treated as literal characters in the directory or file names. The final forward slash character will automatically be added to the directory name when this method is used.
If any parent directory in the path does not exist, it will be created automatically. Empty directory names (//) will be ignored, as will directories that have already been created in the archive.
The comment string will be added to the archive's metadata only for the last directory in the path. All other directories created by this method will have no comment. This does not affect the stored data in any way.
Returns the number of bytes written to the archive when creating the entire path.
Base.open (Method)
open(sink, fname; [keyword arguments]) -> IO
Create a file within a Zip archive and return a handle for writing.
Keyword arguments
- compression::Union{UInt16,Symbol} = :deflate: Can be one of :deflate, :store, or the associated codes defined by the Zip archive standard (0x0008 or 0x0000, respectively). Determines how the data is compressed when writing to the archive.
- level::Integer = -1: zlib compression level for the :deflate compression method, with higher values corresponding to better compression and slower compression speed. Valid values are -1 through 9, with -1 corresponding to the default level of 6. Ignored if compression == :store.
- utf8::Bool = true: If true, the file name and comment will be written to the archive metadata as UTF-8 strings, and a flag will be set in the metadata to instruct decompression programs to read these strings as such. If false, the default IBM437 encoding will be used. This does not affect the file data itself.
- comment::AbstractString = "": Comment metadata to add to the archive about the file. This does not affect the file data itself.
- make_path::Bool = false: If true, any directories in fname will be created first. If false and any directory in the path does not exist, an exception will be thrown.
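The compression and encoding keywords map onto fields of each entry's metadata. The sketch below uses hypothetical helper names to show that mapping as the list above describes it: method codes 0x0008 and 0x0000, level -1 meaning 6 and ignored for :store, and utf8 corresponding to general-purpose flag bit 11. The bit numbers come from the ZIP APPNOTE, not from this page, and none of these helpers are ZipStreams functions.

```julia
# Hypothetical helpers (not ZipStreams API) mirroring the keyword arguments above.
const COMPRESSION_CODES = Dict(:store => 0x0000, :deflate => 0x0008)

function compression_code(c::Symbol)
    haskey(COMPRESSION_CODES, c) || throw(ArgumentError("unsupported compression method :$c"))
    return COMPRESSION_CODES[c]
end

function compression_code(c::UInt16)
    c in values(COMPRESSION_CODES) || throw(ArgumentError("unsupported compression code $c"))
    return c
end

# -1 selects the zlib default (6); the level is ignored for stored entries.
function effective_level(level::Integer, code::UInt16)
    code == 0x0000 && return nothing
    -1 <= level <= 9 || throw(ArgumentError("level must be between -1 and 9, got $level"))
    return level == -1 ? 6 : Int(level)
end

# General-purpose flag bits from the ZIP APPNOTE: bit 11 marks UTF-8 names and
# comments; bit 3 marks that CRC and sizes follow the data in a Data Descriptor.
function general_purpose_flags(; utf8::Bool=true, data_descriptor::Bool=true)
    flags = 0x0000
    utf8 && (flags |= 0x0800)
    data_descriptor && (flags |= 0x0008)
    return flags
end

code = compression_code(:deflate)
level = effective_level(-1, code)
```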
The Zip archive specification does not clearly define what to do if multiple files in the Zip archive share the same name. This method will allow the user to create files with the same name in a single Zip archive, but other software may not behave as expected when reading the archive.
ZipStreams.write_file (Function)
write_file(sink, fname, data; [keyword arguments])
Archive data to a new file named fname in an archive sink all at once.
This is a convenience method that will create a new file in the archive with name fname and write all of data to that file. The data argument can be anything for which the method write(io, data) is defined.
Returns the number of bytes written to the archive.
Keyword arguments are the same as those accepted by open(::ZipArchiveSink, ::AbstractString).
This method reads data into a buffer before writing it to the archive. Both data and the buffered (potentially compressed) copy must be able to fit into memory simultaneously.
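The memory requirement follows from the strategy: the size must be known before the Local File Header is written, so the serialized data has to exist in full first. Below is a minimal sketch of that first step using only Base's IOBuffer and a hypothetical buffered_bytes helper; compression, which would produce a second in-memory copy, is left out.

```julia
# Hypothetical helper (not ZipStreams API): serialize any value that has a
# write(io, x) method into memory, so its size is known before a header is written.
function buffered_bytes(data)
    buf = IOBuffer()
    write(buf, data)   # same requirement as write_file: write(io, data) must exist
    return take!(buf)  # a full in-memory copy, alongside the original data
end

bytes = buffered_bytes("Hello, Julia!")
uncompressed_size = length(bytes)   # 13, known before anything is written to the archive
```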
ZipStreams.info (Method)
info(zipfile)
Return a ZipFileInformation object describing the file.