Archiving and Compression on Linux: An Introduction

Exploring the tar utility and standard compression tools on Linux

Introduction

Archiving is the process of combining multiple files into a single package called an archive. This archive can then be easily distributed to another machine, backed up in a repository somewhere, or simply kept on your own machine as a way to organize and clean up your file system. Archives are also an essential component of the Linux ecosystem, because all the software you install via your distribution’s package manager will initially be downloaded as a compressed archive from a remote repository. Therefore, working with archives is an important aspect of using a Linux-based operating system effectively.

This article will introduce the standard archiving tool used on Linux — which is the tar software utility — and demonstrate its usage on the command line to create and work with archives, or tarballs. The tar utility is also able to compress an archive via a compression tool. This usage pattern of firstly creating an archive and then compressing it using a compression tool is often adopted for distributing packages remotely.

Regular files being added to an archive with tar and then compressed with gzip.

Compressing an archive can reduce its storage space requirements and is an important step in preparing it for distribution. With these points in mind, this article will also introduce the major compression tools on Linux, and discuss their usage both with tar and individually on the command line. More specifically, we shall take a look at the gzip, bzip2, xz and zstd compression tools.

Tar was initially released in January 1979 and has since evolved into the standard file archiver on Linux. As of writing this article in November 2020, the latest stable version of tar was released on February 23, 2019 and therefore is still in active development to this day.

If tar is not on your system already, go ahead and install it via your Linux distribution’s package manager:

$ sudo pacman -S tar      # Arch
$ sudo apt install tar # Debian and Ubuntu
$ sudo dnf install tar # CentOS and Fedora

I won’t dive into the intricacies of how tar works under the hood too much in this article, but I will mention some important points to be aware of as we progress through the commands in subsequent sections.

There are quite a few compression tools available on Linux, with each one implementing a specific compression algorithm. The compression tools introduced in this article all have a very similar command line interface and usage pattern, which means that if you know how to use one of the tools, you can use them all.

Go ahead and install the gzip, bzip2, xz and zstd compression tools via your Linux distribution’s package manager:

$ sudo pacman -S gzip bzip2 xz zstd            # Arch
$ sudo apt install gzip bzip2 xz-utils zstd # Debian and Ubuntu
$ sudo dnf install gzip bzip2 xz zstd # Fedora

A later section in the article will describe each compression tool in more detail and present some useful command line options as well. For now, the upcoming table provides an overview of the four compression tools we just installed. Also take note of the Extension column, since by convention an archive compressed with a given tool via tar is named with that tool’s file extension:

An overview of the standard Linux compression tools.

Why do we need to know about so many compression tools? Well, there are a few reasons. Even if package managers adopt a modern compression tool such as zstd or xz, there will still be a plethora of packages and repositories out there compressed with the older gzip or bzip2. In order to decompress and access the files in these archives, the associated compression tool is required. In addition, you may find that an older compression tool such as gzip or bzip2 is sufficient for your needs.

Environment Setup

This section will demonstrate a couple of simple command line tools that will enable us to identify a file’s type, as well as output a file or directory’s size in a human-readable format. These tools come in handy for inspecting the various archive formats we’ll be working with. I’ll also provide instructions for setting up a working directory and acquiring some wallpaper files, for readers who wish to follow along with the article and use the same commands.

Feel free to skip this section if you want to go straight to the archiving and compression commands.

The file command outputs a file’s type and should already be installed on your Linux system. To invoke the tool, simply pass the name of a file you want to inspect, such as a JPEG image:

$ file Acrylic.jpg
# Output
Acrylic.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, progressive, precision 8, 5403x3602, components 3

We can use the file command to confirm the type of archives, compressed files and regular files on the system. Feel free to use it whenever you are unsure what type of file you’re working with.

The du command (Disk Usage) outputs the disk space usage of files and directories on a system. Similar in syntax to the file tool, simply specify the files and directories you wish to know the size of:

# Find size of two JPEG files
$ du Acrylic.jpg ColdWarm.jpg
# Output
4516 Acrylic.jpg
3512 ColdWarm.jpg

By default, du on most Linux systems reports sizes in units of 1024 bytes (4516 and 3512 in the example above). The -h flag outputs the file size in a more human-readable format, while the -c flag displays the total size of the files as well:

$ du -hc Acrylic.jpg ColdWarm.jpg
# Output
4.5M Acrylic.jpg
3.5M ColdWarm.jpg
7.9M total

File sizes are now displayed in megabytes and have an M appended to the output. The total size of the files is also displayed at the bottom of the list. In addition to files, directories can also be passed to du. Feel free to check out the manual pages and have a play with the tool before continuing.
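Passing a directory to du prints one line per nested entry by default. The sketch below shows the -s flag, which summarizes the directory into a single line instead. The scratch directory and the file names one.bin and two.bin are made up for this example and are not part of the wallpaper set used in the rest of the article.

```shell
set -eu

# Scratch directory with placeholder files (names are made up for this sketch).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/pictures"
dd if=/dev/urandom of="$workdir/pictures/one.bin" bs=1024 count=1024 2>/dev/null
dd if=/dev/urandom of="$workdir/pictures/two.bin" bs=1024 count=2048 2>/dev/null

# -s prints a single summary line for the directory instead of one line
# per entry; -h keeps the size human-readable.
summary=$(du -sh "$workdir/pictures")
echo "$summary"
```

The exact figure printed depends on the file system, so treat it as an estimate of disk usage rather than an exact byte count.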

Let’s go ahead and set up a working directory for us to try out the archiving and compression commands demonstrated in the subsequent sections:

# Create a directory in the home folder and enter it
$ mkdir ~/archive-demos
$ cd ~/archive-demos

From here, copy over some wallpapers from your /usr/share/backgrounds/ directory. If you do not have any wallpapers on your system, feel free to download the gnome-backgrounds package, which provides some nice wallpapers for your desktop. This package will require about 40MB of space on your system:

$ sudo pacman -S gnome-backgrounds

Then copy the backgrounds into your working directory:

$ cp -fr /usr/share/backgrounds/gnome ~/archive-demos

Now that we have a place to work and some files to work with, let’s take a look at some tar commands in the next section.

An Archiving and Compression Tutorial

Now that we understand what a file archiver and a compression tool are, let’s take a look at some commands that solve a few real-world scenarios.

When working with tar on the command line, you will often use the same command line options for achieving a specific task — such as creating and extracting archives, as well as listing files within an archive. These frequently used options are detailed here:

Frequently used options when using tar on the command line.

The tar utility also provides several options that enable filtering archives through specific compression tools. For example, to compress an archive with gzip, we would pass the z option along with the aforementioned cf options. The following table lists the filtering options available to tar that we shall use in this tutorial:

Tar options for filtering archives through the standard compression tools.

There are many more filtering options available to tar on the command line, so feel free to explore the manual pages after reading this article.
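One of those additional options is worth a quick sketch: GNU tar’s -a (--auto-compress) option, which chooses the compression tool from the archive’s file extension instead of requiring a dedicated filter letter. The example assumes GNU tar and uses a made-up docs/ directory. It also relies on GNU tar detecting the compression format by itself when reading an archive file, which is why the listing command needs no z option.

```shell
set -eu

# Scratch directory with a placeholder file (names are made up for this sketch).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir docs
printf 'hello\n' > docs/readme.txt

# -a picks the compressor from the .gz suffix, so no z option is needed.
tar caf docs.tar.gz docs/

# gzip -t checks that the result really is a valid gzip stream.
gzip -t docs.tar.gz

# When reading, GNU tar recognizes the compression format on its own.
tar tf docs.tar.gz
```

Changing the archive name to docs.tar.xz or docs.tar.zst should select xz or zstd in the same way, provided those tools are installed.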

Moving forward, let’s enter our working directory and create some archives of desktop wallpapers by executing these commands in the terminal:

$ tar cfv wallpapers.tar gnome/
$ tar czfv wallpapers.tar.gz gnome/ # gzip
$ tar cjfv wallpapers.tar.bz2 gnome/ # bzip2
$ tar cJfv wallpapers.tar.xz gnome/ # xz
$ tar cfv wallpapers.tar.zst gnome/ --zstd # zstd
$ ls

The time it takes to create the compressed archive will differ depending on the filter passed to tar — this is because each filter implements a different compression algorithm. On my machine, the xz filter took the most time to compress the files, whilst the zstd filter performed the fastest.
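Speed is only half of the trade-off, since the tools also differ in how small they make the archive. The following sketch compares output sizes on a small, highly repetitive sample file rather than on the wallpapers, so the numbers are illustrative only. The sample/ directory is made up for this example, and any tool that is not installed is skipped.

```shell
set -eu

# Scratch directory with a repetitive text file (names are made up).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir sample
i=0
while [ "$i" -lt 2000 ]; do
    printf 'line %s of a repetitive sample file\n' "$i"
    i=$((i + 1))
done > sample/data.txt

tar cf sample.tar sample/

# Each pair is tool:extension. -c writes the compressed data to standard
# output, which leaves sample.tar untouched for the next tool.
for pair in gzip:gz bzip2:bz2 xz:xz zstd:zst; do
    tool=${pair%%:*}
    ext=${pair##*:}
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" -c sample.tar > "sample.tar.$ext"
    else
        echo "$tool: not installed, skipping"
    fi
done

# Byte counts of the plain archive and every compressed variant.
wc -c sample.tar*
```

Already-compressed data such as JPEG images shrinks far less than this text sample, so expect much smaller differences with the wallpaper archives.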

Let’s list the files in the archives we’ve just created by using the -t option:

$ tar tfv wallpapers.tar
$ tar tzfv wallpapers.tar.gz # gzip
$ tar tjfv wallpapers.tar.bz2 # bzip2
$ tar tJfv wallpapers.tar.xz # xz
$ tar tfv wallpapers.tar.zst --zstd # zstd

Tar must first decompress the archive before retrieving the file names. Notice that xz’s decompression algorithm performs much faster than its compression algorithm.

It is also possible to list the individual files and directories within an archive:

$ tar tzfv wallpapers.tar.gz gnome/Acrylic.jpg gnome/Wood.jpg

If you intend to extract an archive, it is good practice to first list the archive’s files before performing the actual extract operation. For example, if you mistakenly downloaded a huge archive and immediately extracted all the files before checking its content, you could be in some trouble!
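One quick check to run on a listing is how many top-level entries the archive contains, because an archive without a single parent directory will scatter its files across the directory you extract into. The sketch below counts those entries. The helper name top_level_count and the two demo archives are made up for this example.

```shell
set -eu

# Scratch directory with two demo archives (names are made up).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir project
printf 'a\n' > project/one.txt
printf 'b\n' > project/two.txt

tar czf tidy.tar.gz project/                    # members share a parent directory
tar czf messy.tar.gz -C project one.txt two.txt # members sit at the top level

# Count the distinct first path components among the archive's members.
top_level_count() {
    tar tzf "$1" | cut -d/ -f1 | sort -u | wc -l
}

tidy_count=$(top_level_count tidy.tar.gz)
messy_count=$(top_level_count messy.tar.gz)
echo "tidy.tar.gz:  $tidy_count top-level entries"
echo "messy.tar.gz: $messy_count top-level entries"
```

A count of one suggests the archive extracts into its own directory, while a larger count is a hint to extract into a fresh directory instead.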

The Vim and Emacs text editors are also able to list the files contained in an archive. To list the files in one of our archives with Vim, run this command:

$ vim wallpapers.tar.xz
Listing the contents of the wallpapers.tar.xz archive with vim.

Vim’s syntax highlighting and search features might make life easier when browsing through an archive’s files.

Let’s clean up our working directory by placing all the archives we’ve created in a nested directory called example-01:

$ mkdir example-01
$ mv wallpapers.* example-01/

Thus far, the parent gnome/ directory has been added to our archives. What if we want our archive to only include image files? The -C option solves this problem as it allows us to move into a directory before performing an operation with tar:

$ tar cfv wallpapers.tar -C gnome/ $(ls gnome/)
$ tar czfv wallpapers.tar.gz -C gnome/ $(ls gnome) # gzip
$ tar cjfv wallpapers.tar.bz2 -C gnome/ $(ls gnome) # bzip2
$ tar cJfv wallpapers.tar.xz -C gnome/ $(ls gnome) # xz
$ tar cfv wallpapers.tar.zst -C gnome/ $(ls gnome) --zstd # zstd
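A caveat with the $(ls gnome) substitution is that the shell splits its output on whitespace, so a file name containing a space would be passed to tar as two separate names. The sketch below shows one possible alternative, which is to archive the directory’s contents as a single dot after -C. The images/ directory and its file names are made up for this example.

```shell
set -eu

# Scratch directory with one file name that contains a space (made up).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir images
printf 'x\n' > 'images/Blue Sky.jpg'
printf 'y\n' > images/Wood.jpg

# After -C changes into images/, "." refers to everything in that directory,
# so no file names pass through the shell's word splitting.
tar cf images.tar -C images/ .

# Members are stored with a leading "./" and without the images/ parent.
tar tf images.tar
```

Unlike the ls approach, the dot also picks up hidden files, and every member name gains a ./ prefix, so the two forms are close but not identical.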

Files can only be deleted from an uncompressed .tar archive with the --delete option. Therefore, we must first decompress an archive before attempting to delete any of its content:

# rename wallpapers.tar to prevent it being overwritten
$ mv wallpapers.tar wallpapers-original.tar
# decompress the .bz2 archive with -d flag
$ bzip2 -d wallpapers.tar.bz2
# delete Wood.jpg and Acrylic.jpg from the archive
$ tar f wallpapers.tar --delete Wood.jpg Acrylic.jpg
# compress the archive again with bzip2
$ bzip2 wallpapers.tar
# list content of the wallpapers.tar.bz2 archive
$ vim wallpapers.tar.bz2

Notice that we had to use the bzip2 program on the command line to decompress the wallpapers.tar.bz2 archive. We then deleted the files and compressed it with bzip2 once again. Conversely, files are appended to an archive via the -r option:

# decompress the .bz2 archive with -d flag
$ bzip2 -d wallpapers.tar.bz2
# add Wood.jpg and Acrylic.jpg back into the archive
$ tar rfv wallpapers.tar -C gnome/ Wood.jpg Acrylic.jpg
# compress the archive again with bzip2
$ bzip2 wallpapers.tar
# list content of the wallpapers.tar.bz2 archive
$ vim wallpapers.tar.bz2
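To see why the decompression step is needed, the sketch below first tries to append to a compressed archive directly and then falls back to the decompress, modify, recompress cycle. It assumes GNU tar, which rejects the direct attempt, and it uses gzip rather than bzip2 purely because gzip is more likely to be installed. The pics/ directory and its contents are made up for this example.

```shell
set -eu

# Scratch directory with placeholder image files (names are made up).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir pics
printf '1\n' > pics/Acrylic.jpg
printf '2\n' > pics/Road.jpg
printf '3\n' > pics/Wood.jpg
tar czf pics.tar.gz -C pics/ Acrylic.jpg Road.jpg

# Appending with a compression filter is expected to fail in GNU tar.
if tar rzf pics.tar.gz -C pics/ Wood.jpg 2>/dev/null; then
    refused=no
    echo "tar appended to the compressed archive"
else
    refused=yes
    echo "tar refused to append to the compressed archive"
fi

# The working route: decompress, append, compress again.
if [ "$refused" = yes ]; then
    gzip -d pics.tar.gz
    tar rf pics.tar -C pics/ Wood.jpg
    gzip pics.tar
fi

tar tzf pics.tar.gz
```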

To extract files from an archive we use tar’s -x option. If we extract a file into a directory with an identical file already there, that original file will be overwritten. It is therefore important to be aware of the location you plan to extract files to and the possibility of any files being overwritten. Let’s go ahead and extract one of the archives in our working directory:

$ tar xzfv wallpapers.tar.gz  # extract files in gzip archive
$ tar tzfv wallpapers.tar.gz # confirm files still exist in archive
$ ls # list working directory

All of our images were extracted into our working directory! This isn’t very clean; it would be useful if we could extract all the files into a separate folder. This is possible if we specify the --one-top-level option:

# remove the files we just extracted
$ rm *.jpg *.png *.xml
# create an empty directory
$ mkdir top
# extract files into the top/ directory and list
$ tar xzfv wallpapers.tar.gz --one-top-level=top
$ ls top

Our extracted files were all written into the top directory. If you don’t assign a directory name to the --one-top-level option, files will be extracted into a directory with the same name as the archive:

$ rm -fr top
$ tar xzfv wallpapers.tar.gz --one-top-level
$ ls wallpapers
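The overwrite warning from the start of this extraction walkthrough can also be handled by tar itself. As a minimal sketch, GNU tar’s --skip-old-files option leaves any file that already exists at the destination untouched. The example assumes GNU tar and uses made-up src/ and target/ directories.

```shell
set -eu

# Scratch directory with an archive and a conflicting local file (made up).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir src target
printf 'archived version\n' > src/notes.txt
tar czf notes.tar.gz -C src/ notes.txt
printf 'local edits\n' > target/notes.txt

# Without --skip-old-files, target/notes.txt would be replaced by the
# archived copy. With it, the existing file is left as it is.
tar xzf notes.tar.gz -C target/ --skip-old-files

cat target/notes.txt
```

GNU tar also offers --keep-old-files, which treats an existing file as an error instead of silently skipping it.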

To extract individual files and directories from the archive, simply append the items you’d like to extract to the end of the command:

$ tar xfv wallpapers.tar.zst --zstd Wood.jpg        # zstd
$ tar xJfv wallpapers.tar.xz Acrylic.jpg Road.jpg # xz
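Naming every file becomes tedious when you want a whole group of them. As a sketch, GNU tar’s --wildcards option lets a quoted pattern select the members to extract. The example assumes GNU tar and uses a small made-up archive instead of the wallpaper archives.

```shell
set -eu

# Scratch directory with a small mixed archive (names are made up).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir media out
printf '1\n' > media/Acrylic.jpg
printf '2\n' > media/Road.jpg
printf '3\n' > media/adwaita.xml
tar cf media.tar -C media/ Acrylic.jpg Road.jpg adwaita.xml

# The pattern is quoted so that tar, not the shell, expands it against the
# member names stored in the archive.
tar xf media.tar -C out/ --wildcards '*.jpg'

ls out
```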

We have built up enough knowledge to work with our own compressed archives in Linux. Let’s go ahead and clean up our working directory before moving on to the last example:

$ rm *.jpg
$ mkdir example-02
$ mv wallpapers.* wallpapers example-02/

Lastly, we will look at how to append a tarball to another tarball using the -A option:

# create two archives and list their contents
$ tar cfv foo.tar -C gnome/ Acrylic.jpg Road.jpg
$ tar cfv bar.tar -C gnome/ Dark_Ivy.jpg Wood.jpg
$ tar tfv foo.tar && tar tfv bar.tar
# append bar.tar to foo.tar and list the contents again
$ tar Afv foo.tar bar.tar
$ tar tfv foo.tar && tar tfv bar.tar
# compress archive with gzip
$ gzip foo.tar

Notice that tar doesn’t allow you to append compressed archives. You will need to decompress the compressed archives you wish to combine before performing the actual append operation. After appending bar.tar to foo.tar in the above example, foo.tar is compressed with gzip, producing foo.tar.gz.

After compressing a file with gzip, the original uncompressed archive is discarded. Specify the -k option (keep) to keep the uncompressed version on the filesystem.
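Here is a minimal sketch of the -k option in action, using a made-up data.tar archive. It assumes a gzip version recent enough to support -k.

```shell
set -eu

# Scratch directory with a small archive (names are made up).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'payload\n' > data.txt
tar cf data.tar data.txt

# -k keeps data.tar in place next to the new data.tar.gz.
gzip -k data.tar
ls data.tar data.tar.gz

# Decompressing to standard output reproduces the original archive exactly.
gzip -dc data.tar.gz | cmp - data.tar
```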

Let’s wrap up the tutorial by cleaning up the working directory one more time:

$ mkdir example-03
$ mv foo.tar.gz bar.tar example-03/

Feel free to experiment with some additional tar options presented in the following table:

Some other useful tar command line options.

I also encourage you to check out the tar manual pages, which will certainly be your best resource for finding new information about the tool:

$ man tar

A Closer Look at the Compression Tools

This section serves as a quick reference for readers wishing to familiarize themselves with the command line interface of our adopted compression tools.

Command Line Reference — gzip

Some useful options for the gzip program on the command line.
Some useful options for the bzip2 program on the command line.
Some useful options for the xz program on the command line.
Some useful options for the zstd program on the command line.

In Conclusion

This article provided an introduction to working with archives on Linux using the tar file archiver and some standard Linux compression tools — including gzip, bzip2, xz and zstd. I hope the article has provided enough information and hands-on examples such that you can now comfortably create and manage your own archives on Linux.

From here, I encourage you to read further into the manual pages of tar and the various compression tools available on Linux, as well as looking into the pros and cons that come with using the individual compression algorithms.

MSc. Programmer and fan of open source software.
