My Amazon S3 photo backup solution

Far, far away, someone, somewhere said:

“There are two kinds of people, those who back up their data and those who have never lost all their data.”

Luckily for me, I have never been a victim of a situation where I lost all of my data simply because I do backups regularly. I never do a full backup of my machine though. I can download an operating system in a few minutes, restore my system preferences with a single click, install all my frequently used apps using a single command, pull all of my projects from GitHub and listen to music on my Technics SL-1200 or stream it from Apple Music. The only thing that I keep backed up is my photo collection.

My backup strategy in a nutshell

Since May 2007, I have kept all of my photos in a well-organized collection, ordered chronologically by year and by session/event. I maintain the same habit for all of my pictures taken on my iPhone as well. It’s not an enormous amount of data (around 200GB), but the sentimental value it holds is immense.

No matter what, I always store this collection on two physical devices. It can be my computer’s hard drive, an external flash disk, a NAS server or a RAID array. Currently I use two totally average external hard drives by Seagate. I am the happy owner of a superb Sony α7R III that shoots 80-megabyte ARW files. Taking that into consideration, I’ve realised that I may run out of storage on these hard drives very quickly, but for now they do the job.

However, things happen! Disks fail, people rob, rivers flood, comets fall. In case any of that occurs, I need one more copy in the cloud. I have tested multiple solutions and services over the past few years, and finally, I feel that I have found something that is going to stick around. Although making a backup to a local hard drive is fairly easy and straightforward, cloud backups are way more complicated. Luckily, I am here to help you out.

What I consider to be a good cloud backup and things that I don’t care about

There are plenty of services that offer cloud storage for amateur and professional photographers: Dropbox, Google Drive, Box, OneDrive or Backblaze, just to name a few.

There are a few key things that I need to get out of my cloud backup solution: security first! There is a reasonable chance that my collection will grow over time, so auto-scaling and unlimited storage resources are another must-have. New services show up and vanish often, and I am really not interested in investing my time in solutions that may not be around tomorrow. Also, price is an obvious factor, of course.

The providers listed above usually offer tons of things that I simply don’t care about. I don’t need a fancy app with tons of bells and whistles. I don’t need constant live sync and seamless integration with my OS. It is a last resort backup—the file structure is probably never going to change. I will just add more stuff over time.

I am here today not to compare the available options or convince you to use one over the other. I spent years looking for a solution that suits my needs, and I would like to share it with you.

Say hello to AWS Simple Storage Service (S3)

AWS (Amazon Web Services) is a platform that offers a number of things that your business, or you as an individual, may need: from computational power and database storage, through content delivery networks, to machine learning and IoT (Internet of Things) products. A storage solution is one of the many services that AWS has to offer. It is well established and proven by a mile-long list of clients: Adobe, Airbnb, Netflix, NASA, SoundCloud, Canon, GoPro… The list goes on and on.

You may have heard the opinion that AWS is complicated to use. In reality, it is indeed complex, but being in a band doesn’t require playing all the instruments—just mastering a single one. Storage is what we need.

AWS has a variety of storage solutions in its product list, from basic options like Amazon Simple Storage Service (S3) to the AWS Snowmobile – a 45-foot long shipping container pulled by a truck for transferring extremely large amounts of data (up to 100PB). What we need is data stored in an S3 bucket and its smooth transition to the Glacier storage class using lifecycle policies. Allow me to explain.

What is S3 and how it works

Amazon S3 is a simple storage solution that offers a range of classes designed for specific use cases. For frequently used general storage, use S3 Standard. Infrequent Access works best for files that you don’t have to access very often but still need to keep accessible whenever you need them. For archiving purposes, Glacier Deep Archive is the best option. Each of these categories comes with pros and cons, and each of them suits different needs. The main differences between them are price and waiting time to access objects (photos in our case). For those who are curious, I would direct you to Marc Trimuschats’ presentation from the AWS Summit 2017, Deep Dive on Object Storage, which tells you everything you need to know.

Amazon S3 storage classes

Essentially, files stored in hot storage (S3 Standard) are accessible immediately, but they will cost you a fortune ($0.021–0.023 per GB per month). Cold storage (Glacier Deep Archive), on the other hand, is extremely cheap ($0.00099 per GB per month), but a file restoration can take from 1 minute up to 12 hours. You will also be charged for each GB retrieved from the cold storage class. The pricing may vary a bit depending on the region of your S3 “bucket”.
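
To put those numbers in perspective with my roughly 200GB collection: 200GB × $0.023 ≈ $4.60 per month in S3 Standard, versus 200GB × $0.00099 ≈ $0.20 per month in Glacier Deep Archive (before any request and retrieval charges).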

Privacy of files is something that we can easily control with S3. If you want to make a file public or private, no more than a single click is needed. Lifecycle policies help us create a set of rules that invisibly migrate files between storage classes. I utilised this feature to migrate every file imported into the Standard class over to Glacier the next day.

How to

I mentioned before that AWS is complicated to use, but I hope that this step-by-step guide can make things easier for you. The S3 storage may actually be one of the easiest-to-use services from the humongous number of products in the AWS portfolio.

Start by creating a free AWS account. This process requires you to add a credit card to your account and authorize it through a phone call you will receive from Amazon’s bot. It is worth mentioning that you are eligible to use the Free Tier, which gives you access to a subset of AWS features completely for free. You can end this process here, but I would strongly suggest looking at the IAM (Identity and Access Management) best practices. Personally, I use my “root” account only for billing purposes and user management. For using AWS services, I created an IAM user with sufficient permissions for my everyday tasks—security first. Read more about the recommended way of using the AWS platform in the AWS Identity and Access Management Documentation. The Getting Started with Amazon Web Services webinar is another helpful resource to start with.

“When you first create an AWS account, you begin with a single sign-in identity that has complete access to all AWS services and resources in the account. This identity is called the AWS account root user and is accessed by signing in with the email address and password that you used to create the account. We strongly recommend that you do not use the root user for your everyday tasks, even the administrative ones. Instead, adhere to the best practice of using the root user only to create your first IAM user. Then securely lock away the root user credentials and use them to perform only a few account and service management tasks.”
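
For reference, this is a minimal sketch of the kind of setup I mean, done with the AWS CLI rather than the web console. The user name and bucket name are hypothetical, and you may well want a tighter or broader policy:

```sh
# Create a dedicated IAM user for backups (hypothetical name).
aws iam create-user --user-name photo-backup

# A minimal policy that only allows listing the bucket and
# reading/writing objects in it (replace "my-photo-backup").
cat > photo-backup-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-photo-backup"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::my-photo-backup/*"
    }
  ]
}
EOF

aws iam put-user-policy \
  --user-name photo-backup \
  --policy-name photo-backup-s3 \
  --policy-document file://photo-backup-policy.json
```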

Our account is ready to use and secure. It is time to create the first storage “bucket” under the S3 section. Use a unique name for your bucket and choose a location. Don’t repeat my mistake: pick the cheapest region rather than the one closest to you. Make a wise decision at this point, because you won’t be able to change those details later on. After choosing a name and region, keep the remaining settings at their defaults and click the “Create bucket” button.

Create Amazon S3 bucket
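
If you prefer the command line over the console, the same step looks roughly like this with the AWS CLI (the bucket name is hypothetical and must be globally unique; pick whichever region works out cheapest for you):

```sh
# Create the bucket in the chosen region.
aws s3 mb s3://my-photo-backup --region us-east-1

# Confirm it exists.
aws s3 ls
```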

Now we need to configure lifecycle policies to automatically migrate files from the Standard storage class to Glacier Deep Archive. Ideally, the transition should happen as soon as possible. To set it up that way, click on the name of the bucket created in the previous step and navigate to Lifecycle rules under the Management tab. Click the “Create lifecycle rule” button to define a new rule. Give your rule a meaningful name and move on to the Transitions section. For the current version of your files, create a rule that moves them to Glacier after one day. We don’t need to tweak the settings for previous versions because we didn’t enable file versioning in the first place (you don’t need that for backups). Click through the Expiration tab, leaving it as it is (we really don’t want our files to be removed), and proceed to the last tab — Review. Make sure that you are happy with all the settings and save the rule. We are done!

Create Amazon S3 lifecycle policy
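
The same lifecycle rule can also be applied with the AWS CLI. A minimal sketch, assuming the hypothetical bucket name from the previous step and a transition straight to Glacier Deep Archive after one day (use "GLACIER" as the storage class if you prefer that tier):

```sh
# Lifecycle rule: move every current object to Deep Archive one day after upload.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-photos",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 1, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-photo-backup \
  --lifecycle-configuration file://lifecycle.json
```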

GUI or not

Although the S3 web interface is very user-friendly and fast, you may be interested in using a GUI (graphical user interface) tool to send files to your bucket. Luckily, there are a lot of tools out there that let you access your Simple Storage Service easily. As a macOS user, my personal preference is ForkLift 3. Transmit 5 is another app for the Apple system that has garnered a great reputation. Maybe Cyberduck? FileZilla Pro and S3 Browser could be good options for Windows users. Play around with the available options and let me know about your preferred way to interact with S3 objects.

Using Forklift 3 with Amazon S3
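
If you'd rather skip a GUI altogether, the AWS CLI handles uploads too. A minimal sketch of sending one session folder to the hypothetical bucket from earlier (the local path is just an example):

```sh
# Mirror a local session folder into the bucket, keeping the folder structure.
aws s3 sync "/Volumes/Photos/2019/2019.05.12 - Session" \
  "s3://my-photo-backup/2019/2019.05.12 - Session"
```

Add `--dryrun` to preview what would be uploaded before sending anything.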

Happy backing up

I am very happy with this solution, and it works well for me. I managed to reduce the cost of my digital backups from £8 per month to less than £1. I have a reliable and secure copy of my files, and a great system in place that hopefully will serve me for the long term. Let me know about your backup strategy in the comments below. If you have any questions or need further clarification on anything in this post, I am always eager to help. Happy backing up!

On 27 March 2019, Amazon announced Glacier Deep Archive, an even more cost-effective storage class that perfectly suits my needs.

Comments

  • joshuafuglsang

    Nice I'm glad I found this article, as I have been wondering about the viability of using S3 as a backup solution for some time.

    I just have some questions for implementation:
    1) Do you have any suggestions on the structure of buckets? For example, do you have a separate bucket for each year, or each month in a year, etc.?
    2) Is there any benefit of choosing servers geographically closest to you, or should I just choose the cheapest one (N. Virginia). It would seem that the point is never to access the data unless in an emergency. So the server proximity would be irrelevant?
    3) Do you have any trouble uploading large files, such as a video that is several gigabytes? Would aws cli serve to sync large files (`aws s3 sync . s3://whatever`)?
    4) Roughly how much data are you storing, is it terabytes? You said 200gb at the start, but is that still true with your a7r3? Is the system scaling to handle this extra data? What kind of upload speed do you have to handle this? I have lots of data ...

    Sorry for all the questions!

    • Pawel Grzybek

      Hi.

      I am glad that you found it useful. I am more than happy to answer your questions.

      1.
      I don't have any particularly complicated file structure. I keep everything in one bucket, split by years, and inside each year's directory I have directories with sessions. Like so:


      Photos backup
      - 2016
        - 2016.01.01 - Session 1
        - 2016.02.02 - Session 2
      - 2017
        - 2017.01.01 - Session 1
        - 2017.02.02 - Session 2

      2.
      Totally go for the cheapest one in this case. There is no reason why I picked my local one apart from habit. All my S3 instances are in London ¯\_(ツ)_/¯

      3.
      Cannot help with this one because I have never tried sending big files like this. Sorry.

      4.
      I currently have around 250GB. I add new folders every so often, and it scales really well for me. My session folders are not huge though, between 3GB and 15GB. It goes really quickly for me. I cannot give you a number, but I can give an approximate comparison — Dropbox upload is a million times slower than this. Speed of this solution is not a concern for me at all.

      I am more than happy to help further if you have more questions. Have a lovely day :)

  • mark

    Excellent article. I was attending a work course on AWS when it got me to thinking about using something like this as Cloud Storage for my RAWs and I came across your article.

    As I assign keywords to all my images at time of import into Lightroom - Is it possible to search for a particular image by Keyword in AWS ?

    Also I have my images placed in Folders which are contained inside a master folder. is it possible to import the folder structure or is it just the files ?

    thanks in advance.

    • Pawel Grzybek

      Hi.

      I am glad that my article helped you out. Unfortunately, I am not able to answer your questions, for multiple reasons: I use S3 purely for archiving, and I have no clue about Lightroom. I am a Capture One user.

      I am more than happy to help you with further questions if you have any.

      Have a nice day 🥑

    • Tamas

      Yes, you can create a folder structure. You can also add tags and additional properties to the uploaded objects, but I'm not sure you can use them for searching objects.

  • Tamas

    Thank you for sharing this. I'm about to do something very similar. At least regarding backing up raw or original photos.

  • Warren

    Are prices prorated, or do you get charged for a full month? For example, 1TB would cost $23/month in Standard and $4/month in Glacier. Do you end up paying $23 for the 1 day your photos sit in Standard storage?

    • Pawel Grzybek

      Hi.

      It is hard to understand your question, but I can assure you that I currently store about 300GB of data there, and my highest bill from AWS so far was £1.03 for a month. Hopefully this helps you to understand the costs a bit better.

      Thanks.

      • Warren

        Prorated means that you only pay for the time you use. So when you put files into storage, are you being charged for 1 day of Standard storage, or are you charged for a full month of Standard storage?

        • Pawel Grzybek

          The way I set it up is described above. All the files that I upload to standard S3 are kept like that for a month. After that time, they migrate to Glacier. You can customise it, though, and send them to Glacier almost instantly if that workflow suits you better.

          • Warren

            OK, thanks for getting back to me.

  • Acaminero New York

    Is S3 Glacier the best solution to store family photos?

    • Pawel Grzybek

      When it comes to family photo storage, privacy should be one of the main concerns, and Amazon Glacier is fantastic at security as long as it is correctly set up.

  • Dave

    This is awesome! Have been meaning to streamline my backup solution for aaaaages and finally I have something simple and robust. Thanks! 🙌

    • Pawel Grzybek

      I am glad that you like it. It still works amazingly well for me to this day. Good luck :)

  • Jayphen

    I'm a little confused — why is it that you upload to S3 & then have it transfer to Glacier? Is there some particular reason to do this rather than use something like Arq to transfer directly to Glacier?

    I think I understand after looking at it a bit more… it's not possible to directly back up to Glacier… I think!

    • Pawel Grzybek

      To be honest, I have no clue what Arq is, but my reason for doing it this way was ease of use and S3 integration with GUI apps that support the protocol. I use an app called ForkLift, but there are others like Transmit (super cool looking but too expensive for my needs).

      • Jayphen

        Arq is just an automated backup tool for macOS, which handles both backing up and restoring from S3/Glacier. After reading up on it more, it's still necessary to back up to S3 and then use a lifecycle policy to migrate it.

        Arq is worth checking out :) It would save you the effort of manually using Forklift, and it can do incremental backups hourly for you.

        • Pawel Grzybek

          This is interesting and definitely worth looking at for people keener on automation than me. I am kind of happy with my very manual process. I do it so rarely that automating it would be overkill in my case. For professional photographers it is a really fantastic tool!

  • blank

    Coming to this discussion after some research on Glacier Deep. One of the things you don't mention is if the pricing for S3 Standard is prorated (a comment below asks, but not necessarily in a clear way). If I understand correctly, if you upload a file to S3 standard, you pay for a month's worth of its storage, regardless of whether you lifecycle it to Glacier 1 day later or 30 days. Has this been your experience? Or are you charged a prorated cost for S3 Standard depending on how long the file "sits" there?

    Another point: it is apparently possible to upload directly to Glacier Deep (PUT costs more, but you presumably save the cost of S3 Standard). The flipside is that, apparently, it is command-line only. Have you tried this? etc :)

    • Pawel Grzybek

      Hi.

      That was the case for me. I paid for the first period at the Standard storage class rate, and then for Glacier once it changed class after that period.

      There is an option to create a Glacier / Deep Archive bucket and put items directly into it. As far as I know, it comes with some restrictions though. You cannot use any GUI clients, because the S3 protocol doesn't have access to Glacier objects. Back when I was setting this system up, it was a big restriction for me. Things may have changed since then, though.
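
      For anyone curious, a direct upload from the AWS CLI should look something along these lines (untested by me; the bucket name is the example one from the post, and the --storage-class flag is what skips the Standard class):

          # Upload a file straight into the Deep Archive storage class.
          aws s3 cp "DSC00001.ARW" \
            "s3://my-photo-backup/2019/2019.05.12 - Session/DSC00001.ARW" \
            --storage-class DEEP_ARCHIVE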

      Thanks for reading and good luck :)

      • Lev
        “There is an option to create a Glacier / Deep Archive bucket and put items directly to it.”

        Do you mean S3 Glacier service (which operates with Vaults and Archives), as opposed to a Glacier-class object in an S3 bucket? If so, you are correct, it doesn't have a GUI client and all interactions are strictly through API calls. Basically, Glacier class storage in S3 works by S3 itself interacting with S3 Glacier via API (https://stackoverflow.com/a....
        It means that S3 Glacier is a more specialised service and is probably meant for API integration, rather than direct access by users.

    • Lev

      About uploading to Glacier: I can confirm that CloudBerry Backup (which seems to be the most popular S3 Windows backup client and which is, thankfully, free for personal use) can upload directly to any class, including both Glacier and Glacier Deep Archive. I couldn't find any upload settings for S3 Browser, though, as it seems to upload to the bucket's default storage class. In any case, uploading directly to Glacier is not limited to the command line.

      I may be wrong, but in my understanding prorated means that you pay only for the time you store an object in S3. So in Pawel's case he will be billed for 1 day of Standard class storage and the rest will be for Glacier class.

  • Lev

    Great post! If I had found it earlier, it would have saved me some time reading AWS docs and watching lengthy YouTube videos :) Still, using IAM properly is something I haven't gotten into yet.

    One important thing I would add is that AWS has very flexible server-side (and client-side) encryption settings for protecting data at rest. You can either manage keys yourself by using client-side encryption and uploading already-encrypted data, or let AWS manage encryption server-side. The latter has at least two options: generate and use your own encryption keys via AWS KMS (Key Management Service), or let AWS manage the encryption automatically in the background without you needing to do anything (https://youtu.be/VC0k-noNwO....
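
    As an example, enabling default server-side encryption on an existing bucket from the AWS CLI looks roughly like this (the bucket name is just an example, and this picks the SSE-S3 / AES256 option rather than KMS):

        aws s3api put-bucket-encryption \
          --bucket my-photo-backup \
          --server-side-encryption-configuration \
          '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'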

    I'm not sure the server-side encryption is on by default, though; I think I had to specifically enable it in the bucket settings. But it's there, and it's as simple as turning it on. This way your girlfriend's photos are safe from anyone's eyes, even an employee's! :)

  • Grumpy Old Gamer

    Thanks for the helpful article. I was looking into AWS for work and likewise thought maybe I should expand my backups to use AWS rather than just a NAS and an on-site backup. However, one point to consider is that I got numerous warnings about how Glacier and Glacier Deep Archive are charged. Not only per request, but also per file: additional storage is required, and it is very, very hard to work out just what this means as far as $$ is concerned. Also, I read each class has a minimum storage duration and you will be charged if you delete earlier than that time, which for GDA is 6 months. I have over 100,000 photos I was going to simply replicate to S3 and then push to GDA. I am now thinking that is a very bad idea and that I should perhaps zip them up or something, but that creates additional problems around how many versions you keep, how often you update, when you delete, etc. I suggest that is why "simpler" options like Dropbox, Google Drive, etc. are popular, as you just buy a bucket of storage and can upload, add, change, etc. as much as you like without fear of racking up a large bill. If you know what you are doing and can understand the complicated charging model, then perhaps S3 & AWS might work out cheaper… but it could cost you a lot more as well!

    • Pawel Grzybek

      Hi. Thanks for reading.

      Pricing so far is working very well for me. I currently store 10+ years of photos taken on my iPhone and professional camera, and my monthly bill has never been more than £0.45.

      Good advice, though. It is worth doing the math before investing time and effort in setting up a system like this.

      Have a great day 👍👋

      • Lev

        Have you done any large volume (exceeding 10GB per month free tier limit) data retrievals so far? If so, roughly how much did it cost you?

        • Pawel Grzybek

          I have never done it before and hopefully I will never have to. I use Glacier Deep Archive for deep archive :) This is just a last resort backup option for me.

          • Dmitry

            You will be surprised how costly it is to get your backup back from S3. All offerings have the transfer out priced per GiB, and it is not cheap. E.g., assume you have 1TB in Glacier Deep Archive and a disaster hits, so you need to retrieve that 1TB back -- be prepared to pay at least $120 (1024GiB * (retrieval cost + data transfer out)). I learnt it the hard way by retrieving 5TB :)

            • Pawel Grzybek

              This is a very valid point. As said before, my fingers are crossed that I will never have to do it 🤞

              • Dmitry

                Does it mean that you don't have any assurance that the backups you are creating are actually good and can be relied upon? :) Personally, I would be uncomfortable keeping a "last resort" backup without at least an integrity check from time to time to ensure that it is not a dud. I think in your case it is easier -- you can just request a random file retrieval to confirm that that particular file is retrievable. In my case I had a binary encrypted blob of my local disk image, so backup verification was a really costly operation.

  • Wallace Heller

    Thank you for this article.

    I started using S3 a few years ago for both syncing across machines and operating systems and archival of documents, photos, music and video. I use it to host two static websites as well. There are easy to use GUIs (Transmit on Mac, Cloudberry Explorer on PC, Cyberduck, Filezilla, others both paid and "free") and the command line is easy to use with a wee bit of learning and patience. (It will be familiar to anyone who has worked with Linux.)

    Your reminder to set up users beyond root is timely. I have neglected this but understand its importance. I have never explored Glacier but will look into it. Seems simple enough. One thing I've begun doing is backing up photos and other files directly from Android phone to S3 buckets. Haven't found a way to automate this though.

    So called "power users" and IT folks are familiar with AWS, but it's a best kept secret for the rest of us that deserves to be shared.

    By the way, Happy New Year!

    • Pawel Grzybek

      Thanks for reading. New year is a great opportunity to revisit backup solutions. Happy 2020 to you as well 🎊

  • Ridac

    Thanks for the info!!!

    Just a quick note as I was reading through the pricing policy. If one considers this as an absolute backup, then it's fine, i.e. just dump data without needing to access it unless a disaster requires it:

    1) Retrieval time (for objects to become available for download) for Glacier and the Deep Archive class is 6/12 hours.

    2) Storage consideration: AWS will add a small amount of extra data for each object in Glacier/Deep Archive, around 32KB. This is not an issue for a small number of objects. It's a good idea to compress and package objects into a single file (per year, for example).

    3) Also consider the traffic, especially outbound; that will be chargeable.

    4) Consider using a US region to host your S3 bucket rather than one close to you; the US regions are the cheapest. The latency difference when uploading/downloading to these regions is really negligible, as we are not accessing those objects frequently anyway.

    https://aws.amazon.com/s3/pricing/

  • Yaniv Wainer

    Thanks for the great and simple tutorial @pawelgrzybek! You really demystified AWS S3.
    Can you go into greater detail on how to secure your S3 bucket? What are the IAM configurations you have in place for securing the photo backup?
    Also, I read online that you can enter 0 for the days when creating the transition to Glacier rule, and that way it transfers the objects immediately. Do you have any experience with that?

    • Pawel Grzybek

      Hi.

      I am glad you found it helpful.

      I kept this bucket private, accessible just to IAM users of my account (literally just myself). I remember doing some copy/pasting from Stack Overflow to set these things up back then, but now it is all achievable using the GUI (Permissions > Access control list).

      Personally, I didn't explore the option to change classes immediately after upload. I am not sure it would be a massive cost saving for me. I do not store enough data for it to make a big difference. It is good to know though :)

      Thanks again for reading and have a nice day 🥑

  • Jason Dixon

    Ah I love it! I also went down this road and never looked back (actually that is a lie, I did try a couple of other options later on to see if they compare, and ended up returning everything to S3 haha).

    Thanks for putting this information up here. I also remember being completely overwhelmed at first with the AWS offerings (and back then there was no snowmobile!). Loads of people will find this helpful!

    One thing I'd like to add, if/when wanting to take things further is to consider git-annex. It's something I've been using for a long time as well, with S3 as the storage backend, and it provides some things I've never seen anything else achieve elsewhere.

    Unfortunately it does add more complexity, as you'd be learning just the basics of git, and then the basics of git-annex on top of all the S3 stuff above (though it does have an automated sync tool which can make a lot of this point and click!). But that is the theme of this post! Huzzah!

    Ultimately the tool manages metadata and filenames, and treats them separate from their content. Which maybe sounds small or confusing, and it is at first. But once you get used to it it's huge!

    It means you can move files around in their structure, rename them, even copy them, and they'll only be stored in S3 once without alteration.

    But most importantly for me, it keeps track of your data locations, and lets you drop and retrieve the content of your files easily.

    For example; I can ask it where a photo exists, and it knows that I have a copy of a photo on S3, and another copy on an external HD and another on my laptop.

    Because there is a separation of files and their content, I can drop a file I don't need right now (the tool will confirm there is still a copy around at least before allowing it), and what I'll get in the end is essentially an empty file of the same name as a placeholder. So I can still see my files, their structure, and what I have available. But I don't have to have all the "content" of those files around at all times.

    Git itself can be stored and hosted cheaply through AWS, so the whole package can live there.

    The list of other things you can do is a mile long, but some others of note:

    * It can handle encryption, so everything you put into S3 is encrypted if you so choose, even to amazon itself.
    * It can add another layer of metadata on top of your files, managed in the tool itself. Then you can use that as search criteria in other commands, and even dynamically restructure the layout into temporary metadata-driven views (think tagging people in photos, then dynamically restructuring the layout so all photos of people are in folders of their name, then switching it back as it was as if nothing happened).
    * It can add multiple "backends" that include S3 but don't have to be. You can even store data in different cloud services and access them all from one place if you like. Moving them between services as you see fit (they all have free plans up to like 5gb right? :P ).
    * Because it uses git as the backbone for syncing all the metadata and location info, you have all the knowledge completely offline. So you could check where a file lives, for instance, on a laptop offline on a beach somewhere, if you so desired. :P

    Sorry that became a large ramble, and I ended up removing a number of other features I also use for space, haha.

    Back to the regularly scheduled awesome tutorial!

    • Pawel Grzybek

      Oh wow! This comment deserves to be an article by itself. Thanks for sharing. It is pretty amazing how you have extended the simple idea that I described in my post.

      • Raja Vetsa

        Agree with your comments. This comment itself is an article.

    • Raja Vetsa

      great

    • sterling

      I'm trying to address this problem of metadata not being passed with uploads. Specifically, the date the photo was taken/created. Do you know of any less complicated solutions that will pass all metadata with photos to S3?

  • skarnl

    Thanks for this explanation - it helped me set up my own photo backup in AWS.

    • Pawel Grzybek

      I am glad it helped you out. Have a fab day!

  • Raja Vetsa

    Love it! Thanks for the detailed explanation.

  • Errol Heywood

    Very useful article! I currently use an external drive + Time Machine backup for my photos, which go back over 20 years, but my external drive is suddenly giving me write issues. So, time to think of getting a big bucket in the cloud. Carbonite has been there for ages, but most of these solutions use Amazon cloud services anyway, so why not just go straight there? Your article is making that possible, thanks!

    • Pawel Grzybek

      I am glad that my article helped you out.

  • John G

    This is just what I was looking for. My SSD that had all of my newborn's photos nearly died this past weekend and now I've been tasked with finding a better storage solution. First thought was disks + fireproof safe, but now I'm thinking S3 is better. You make it sound much easier than AWS' documentation! I'll certainly report back if it isn't as great as I hoped.

  • Craig

    Great blog! Thanks Pawel for taking the time to post and respond... lots of options and tools to check out!

    • Pawel Grzybek

      I am glad that you liked it Craig 🙌

  • Elana

    I'm trying to set this up to have a backup of my external drives of photos (that are not currently mounted to my desktop computer), but I think the interface has changed and I'm getting lost in the lifecycle rules section since it looks different on mine! I just want to archive a few entire drives (and hope I never need to access them!) to deep storage (understanding that I'll have to pay if I need to access them).

    • Pawel Grzybek

      My apologies. This article is five years old at this point, and it is probably a good time to revisit it and update the screenshots. Thank you for pointing this out, and expect an updated guide very soon.

    • Pawel Grzybek

      OK, all done and updated. Hopefully it will make your life easier.
