Saving Images to AWS S3 Scriptomagically

Whilst I’ve been messing around creating boot images, I’ve hit the problem of needing to archive off some large images for later use. Now I’ve finally got access to a high-bandwidth internet link, I can back stuff off to Amazon’s AWS S3 cloud in a reasonably timely fashion.

s3cmd does a great job of interfacing with AWS from a Linux CLI, but it is designed to deal with pre-created files, not streams generated on the fly. When you’re talking about multi-gigabyte files, it isn’t always an option to create a local archive file before pushing it to the remote storage location.

I’m used to using pigz, dd and ssh to copy files like this, and wanted to achieve something similar with S3, but there don’t seem to be many guides on doing so. I have, however, made it work on my Debian-based distro relatively easily.
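For reference, the dd/pigz/ssh pattern I mean looks something like the sketch below. The device name and hostname are placeholders, and the second half is a local dry run you can actually execute (gzip stands in if pigz isn’t installed – the output format is the same):

```shell
# Placeholder names: /dev/sdX is your source device, backuphost the remote box.
# The pattern itself (not run here, as it needs real hardware and a remote host):
#   dd if=/dev/sdX bs=4M | pigz -c | ssh backuphost 'cat > image.img.gz'

# A local dry run of the same idea, using a scratch file as the "device"
# and gzip as a stand-in if pigz isn't on the box:
head -c 1048576 /dev/zero > /tmp/ddpigz-src        # 1 MiB of sample data
command -v pigz >/dev/null 2>&1 && GZ=pigz || GZ=gzip
dd if=/tmp/ddpigz-src bs=64k 2>/dev/null | "$GZ" -c > /tmp/ddpigz-src.gz
gzip -dc /tmp/ddpigz-src.gz | cmp - /tmp/ddpigz-src && echo roundtrip-ok
```

The point of the dry run is just to prove the stream round-trips intact before you trust it with a real disk image.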


This is the tooling I combined


You need a recent version of s3cmd to make this work – v1.5.5 or above is apparently what supports stdin/stdout, which you’ll need.
As of writing, this can be obtained from the s3tools git repository @
You’ll need git and some Python bits and pieces, but building was straightforward in my case.

Before you start, make sure you set up s3cmd using the command s3cmd --configure
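The --configure step walks you through an interactive prompt and writes the answers to ~/.s3cfg. The important bits look roughly like this – the key values below are obviously placeholders, and your file will contain many more settings:

```ini
; ~/.s3cfg (written by s3cmd --configure) - key values are placeholders
[default]
access_key = AKIAXXXXXXXXXXXXXXXX
secret_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
use_https = True
```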


I use pigz, although you can use gzip to achieve the same thing. For those that don’t know, pigz is a multi-threaded implementation of gzip – it offers much better performance than gzip on modern multi-core systems.
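As a quick illustration of the drop-in compatibility – pigz output is plain gzip format, so gzip can decompress it. The sketch below falls back to gzip if pigz isn’t installed:

```shell
# -p sets pigz's thread count; nproc reports the number of CPUs.
# Falls back to plain gzip if pigz isn't on the box - the output
# format is identical either way.
if command -v pigz >/dev/null 2>&1; then GZ="pigz -p $(nproc)"; else GZ="gzip"; fi
echo "hello pigz" | $GZ -c > /tmp/pigz-demo.gz
gzip -dc /tmp/pigz-demo.gz
```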


tar is on pretty much every Linux system, and deals with directory contents in a way that gzip/pigz can’t.


The command I built is as follows:

tar cvf - --use-compress-prog=pigz /tmp/source/directory/* --recursion --exclude='*.vswp' --exclude='*.log' --exclude='*.hlog' | /path/to/s3cmd put - s3://bucket.location/BackupFolder/BackupFile.tar.gz --verbose --rr

I think it’s pretty self-explanatory, but I’ll run through the command anyway…

tar cvf = tar: create, verbose, next option is a file
- = stands for stdout in tar parlance
--use-compress-prog=pigz = self-explanatory, but you can probably swap this for any compression app which supports stdout.
/tmp/source/directory/* = the directory or mount point where your source files are coming from
--recursion = recurse through the directories to pick up all the files
--exclude='*.vswp' --exclude='*.log' --exclude='*.hlog' = exclude various file types (in this instance, I was backing up a broken VMFS storage array)
| = pipes the output into the next app
/path/to/s3cmd = the directory where s3cmd resides – in my instance, I’d installed the git repository version
put = send to S3; put works with a single file name.
- = use stdin as the source
s3://bucket.location/BackupFolder/BackupFile.tar.gz = the S3 bucket and path where you want the output stored
--verbose = verbose debugging output and status tracking
--rr = reduced redundancy storage – less expensive than full redundancy, and you can include/exclude this based on your needs.
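If you want to sanity-check the pipeline before pointing it at S3, you can dry-run it locally. In the sketch below, `cat > file` stands in for `s3cmd put -`, gzip stands in for pigz, and the paths are throwaway demo data:

```shell
# Stand-ins: 'cat > file' plays the part of 's3cmd put -', and gzip
# plays pigz. Everything under /tmp here is throwaway demo data.
mkdir -p /tmp/s3demo-src
echo "data" > /tmp/s3demo-src/keep.txt
echo "junk" > /tmp/s3demo-src/skip.log
tar cf - --use-compress-program=gzip --exclude='*.log' -C /tmp/s3demo-src . \
  | cat > /tmp/s3demo-backup.tar.gz

# The restore direction is just the pipe reversed - against real S3
# you'd replace the cat with an s3cmd download to stdout:
mkdir -p /tmp/s3demo-restore
cat /tmp/s3demo-backup.tar.gz | tar xzf - -C /tmp/s3demo-restore
ls /tmp/s3demo-restore        # keep.txt only; skip.log was excluded
```

This confirms both that the excludes behave as expected and that the archive coming off the pipe is actually restorable before you spend hours uploading it.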

The biggest problem with this is that you don’t really get an idea of how long a backup will take. s3cmd splits the file into chunks, but you don’t know how many chunks there are until the process has completed. I average around 6 MB/s, but a multi-gigabyte file can still take several hours to upload. Whilst I didn’t time it exactly, a 70GB file, compressed to 10GB, took around 90 minutes to send to S3.
You may want to leave your backup running in a screen session.

Weird goings-on with a Live CD and

I’ve been messing around trying to make a live CD with some transcoding/ripping utilities built in, to utilize some of the spare hardware I’ve got lying around. More on this later, but I’ve been reworking the guide @ with my own utilities and tools.

One problem I’ve been challenged with over the last couple of days is HandBrakeCLI bombing out with the message:

[email protected]:/mnt/Videos/Movies/dvdrip/91# HandBrakeCLI -i BHD.iso -o BHD.mkv --preset="High Profile"
[20:41:41] hb_init: starting libhb thread
HandBrake 0.9.9 (2014070200) – Linux x86_64 –
4 CPUs detected
Opening BHD.iso…
[20:41:41] hb_scan: path=BHD.iso, title_index=1
index_parse.c:191: indx_parse(): error opening BHD.iso/BDMV/index.bdmv
index_parse.c:191: indx_parse(): error opening BHD.iso/BDMV/BACKUP/index.bdmv
bluray.c:2341: nav_get_title_list(BHD.iso) failed
[20:41:42] bd: not a bd – trying as a stream/file instead
libdvdnav: Using dvdnav version 4.1.3
libdvdread: Missing symbols in, this shouldn’t happen !
libdvdread: Using libdvdcss version  for DVD access
Segmentation fault

This had been bugging me, as it worked before I converted the image to a live CD. I wondered if it was some kind of problem with the lack of ‘real’ disk space, or a lack of memory, or something like that, but nothing I could find would identify it.

Finally, I started looking into libdvdcss rather than HandBrake itself. I think what confused me is that the symbols error looks like a warning, especially given that there is a follow-on message which suggests libdvdcss is continuing. Anyway, eventually I ran an md5sum on the file to see if it matched the one on a non-live machine (a virtually identical build).

[email protected]:/# md5sum /usr/lib/x86_64-linux-gnu/
4702028ab20843fd5cb1e9ca4b720a72  /usr/lib/x86_64-linux-gnu/

N.b. is symlinked to in my current Debian sid based build.

On the donor machine:
[email protected]:/usr/lib# md5sum x86_64-linux-gnu/
c9b314d9ed2688223c427bc5e5a39e6f  x86_64-linux-gnu/

So I SCP’d the source file onto the live machine, checked the md5sum matched the donor machine (it did), and repeated the HandBrake job. Lo and behold, it worked! So I’ve restreamed the two files into the filesystem, and success – it just works.
So I don’t know if something funky happens when the image is created using the symlink, but it’s actually quite easy to fix once you understand the problem.

Hope this helps someone, and I’ll be back soon with more details about building a live image, then booting it using iPXE.