Google Drive is a great collaborative tool for businesses of all sizes, especially since the enterprise 'Google Workspace' platform was expanded recently. It is relatively secure, easy to use, and offers robust functionality. Yet, as with any storage solution, you need a redundancy plan. As Schofield’s 2nd Law of Computing states, your data doesn’t really exist unless you have two copies of it.
One of the key requirements of SOC 2 compliance is availability. Along with performance monitoring and disaster recovery plans, this includes sufficient data backups. This article will outline a quick and easy way to host your own Google Drive Backup solution, all using open-source tools and simple bash scripts.
Google Drive uses 256-bit encryption for files in transit and 128-bit encryption for data at rest, supports 2FA, and includes built-in malware and phishing protection. While this offers some peace of mind, it far from negates the need for automated backups, especially given Google Drive's lack of ransomware safeguards. What happens when an employee (accidentally or purposefully) deletes a directory with all the important documentation your technology team has spent months curating? Just because Google makes an effort to protect your data doesn't mean you should rely solely on Google to keep it safe.
There are multiple third-party services that offer paid solutions to back up and manage your Google Drive data, but they won’t be as cost-effective or as configurable as doing it yourself. This guide will show you how to set up a free, “set it and forget it” solution to automatically back up your Google Drive data.
To follow along with this guide, you will need the following:
1) A machine to run the scheduled Rclone commands. This can be any Unix-based machine on which you can configure a scheduled job to run a Bash script.
2) An AWS S3 bucket (create one here).
- Here is an example of the S3 permissions required for Rclone:
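As a rough illustration, the sketch below writes an IAM policy to a file using the permission set listed in the Rclone S3 documentation. The bucket name and file name are hypothetical placeholders, and this version assumes the policy is attached directly to the IAM user that Rclone authenticates as.

```shell
# Hypothetical names; replace with your own bucket and preferred file name.
BUCKET_NAME="my-drive-backups"
POLICY_FILE="rclone-s3-policy.json"

# Write an identity policy granting the object and bucket permissions
# that the Rclone S3 documentation lists for writing to a bucket.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::${BUCKET_NAME}",
        "arn:aws:s3:::${BUCKET_NAME}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "arn:aws:s3:::*"
    }
  ]
}
EOF
echo "Wrote $POLICY_FILE"
```

The resulting file can be pasted into the IAM console or attached with the AWS CLI.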
To get started, install and configure Rclone on the machine which will run the scheduled job.
- Install Rclone by running this command:
- Start Rclone config:
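The two steps above can be sketched as follows. The install line is the script published on rclone.org; wrapping each step in a small function is only a convenience of this sketch, and the function names are made up for illustration.

```shell
# Install Rclone with the official install script from rclone.org.
install_rclone() {
    curl -fsSL https://rclone.org/install.sh | sudo bash
}

# Launch the interactive configuration session.
start_rclone_config() {
    rclone config
}

# Run these in order on the backup machine:
#   install_rclone
#   start_rclone_config
```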
This will start an interactive configuration session to build your initial Rclone config file.
Keep in mind: If there is any step you feel unsure about, don’t worry, you can always edit the Rclone config file later. This is just the initial setup.
Next, set up the Rclone remote for the S3 bucket:
→ Select ‘n’ to create a new remote
→ Name the s3 remote
→ Select ‘4’ for the Amazon S3 compliant storage option (option numbers can differ between Rclone versions, so check the on-screen list)
→ Select ‘1’ to choose AWS as the storage provider
→ Select ‘1’ to manually enter your AWS Credentials during this config. Keep in mind this will store your AWS credentials in plaintext within your config file. Don’t worry, as you will be encrypting this config file later on. If you want to load the AWS config dynamically, select ‘2’ to get the AWS credentials from the environment.
→ Select your AWS region
→ Option endpoint : you can leave this blank
→ Option location_constraint : leave this blank since you won’t be creating any buckets with the Rclone job
→ Option acl : leave this blank to use the default (private) ACL for uploaded objects
→ Option server_side_encryption : select ‘3’ for aws:kms if you use AWS’s key management for encrypting your S3 bucket, otherwise choose ‘2’ for AES256, which uses keys managed by S3. Either option tells S3 to encrypt your backups at rest in the bucket.
→ Option sse_kms_key_id : if you selected ‘3’ for the server_side_encryption option, you must provide the ARN of the KMS key.
→ Option storage_class : select ‘1’ for default option.
→ You can now select ‘n’ to decline Advanced Config.
Congratulations! Your s3 remote is now configured.
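For reference, the interactive answers above correspond roughly to a single non-interactive `rclone config create` call. The sketch below assumes a recent Rclone release that accepts `key=value` pairs, AWS credentials taken from the environment (the ‘2’ choice in the credentials step), and AES256 server-side encryption. The remote name and region are hypothetical placeholders.

```shell
# Hypothetical values; adjust to your setup.
S3_REMOTE="s3-backup"
AWS_REGION="us-east-1"

# Create the S3 remote without the interactive wizard.
create_s3_remote() {
    rclone config create "$S3_REMOTE" s3 \
        provider=AWS \
        env_auth=true \
        region="$AWS_REGION" \
        server_side_encryption=AES256
}

# Call once on the backup machine:
#   create_s3_remote
```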
Now you need to set up the remote for your Google Drive(s). The biggest decision you need to make is whether you want to use the default OAuth token-based authentication or your Google Service Account Credentials. Here are some reasons you might want to authenticate with Google Service Account Credentials instead of a basic OAuth token:
- OAuth tokens will eventually expire, creating headaches down the road.
- Google Service Account Credential authentication will allow you to back up multiple shared (team) drives at once. With this method, you only need to configure one Rclone remote to manage all the shared drives your given user has access to. Using a basic OAuth token, you will have to configure a separate Rclone remote for every drive you want to back up.
If you choose to use Google Service Account Credentials, follow this guide to create them. Your Service Account Credentials should look something like this:
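A key file downloaded from the Google Cloud console is a JSON document. The sketch below writes a placeholder file with the usual field names (every value is fake) and adds a small, hypothetical helper that checks a key file for the fields you would expect to see before pointing Rclone at it.

```shell
# Placeholder key file showing the expected shape; all values are fake.
cat > service-account.example.json <<'EOF'
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "0000000000000000000000000000000000000000",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "rclone-backup@your-project-id.iam.gserviceaccount.com",
  "client_id": "000000000000000000000",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/rclone-backup%40your-project-id.iam.gserviceaccount.com"
}
EOF

# Check that a key file ($1) contains the fields typically present.
check_service_account_key() {
    for field in type private_key client_email token_uri; do
        grep -q "\"$field\"" "$1" || { echo "missing field: $field" >&2; return 1; }
    done
    echo "ok: $1"
}

check_service_account_key service-account.example.json
```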
Now that that’s settled, proceed to configure the Google Drive remote!
→ Run command ‘rclone config’
→ Select ‘n’ to create new remote
→ Select ‘16’ for “Google Drive”
→ Option client_id : you can leave this blank
→ Option client_secret : you can also leave this blank
→ Option scope : select ‘2’ for “Read-only access to file metadata and file contents.”
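→ Option root_folder_id : enter the ID of the folder Rclone should treat as the root, or leave this blank to use the root of the drive. The folder ID is the last segment of the folder's URL in the Google Drive web interface.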
→ Option service_account_file :
- If you chose to use the Google Service Account, enter the path of the JSON file containing your Service Account credentials. Keep in mind you can always edit this later if you are unsure. You can also paste the actual JSON string of the Service Account Credentials in the config file since you'll be encrypting it.
- If you want to use basic OAuth, just leave this blank
- If you chose to authenticate via Service Account Credentials, you’ll want to configure this as a shared drive in the next step (this will attempt to validate your Service Account Credentials).
→ Question “Edit advanced config?” : Select ‘n’
→ Question “Use auto config?” :
- Select ‘y’ if you want to use an OAuth token. This will automatically open a browser window in which you need to explicitly grant Rclone permissions to your Google Drive Account.
- Select ‘n’ if you are running this configuration on a headless/remote machine. Copy and paste the generated link in a browser window on your local machine. This will prompt you to log in and authorize Rclone to access your Google Drive Account.
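For the root_folder_id option in the steps above, the value you need is the last path segment of the folder's URL. A minimal sketch of pulling it out of a copied link, using a made-up placeholder ID and a hypothetical helper name:

```shell
# Hypothetical folder URL; the ID is a made-up placeholder.
FOLDER_URL="https://drive.google.com/drive/folders/1AbCdEfGhIjKlMnOpQrStUvWxYz012345?usp=sharing"

# Print the folder ID contained in a Google Drive folder URL ($1).
folder_id_from_url() {
    fid="${1##*/folders/}"   # drop everything up to and including "/folders/"
    fid="${fid%%\?*}"        # drop a trailing query string, if any
    printf '%s\n' "$fid"
}

folder_id_from_url "$FOLDER_URL"   # prints 1AbCdEfGhIjKlMnOpQrStUvWxYz012345
```

The printed value is what you would enter at the root_folder_id prompt.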
If neither of the two use cases discussed here meets your requirements, Rclone offers a wide range of configuration options for Google Drive, which can be found here.
We strongly recommend encrypting your config file, as it could contain sensitive information. To do so, run ‘rclone config’ and select ‘s’ to set a configuration password. Store this password in a safe and secure place!
After encryption, to view your config file, use the command ‘rclone config show’ and enter your password.
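Once the config file is encrypted, an unattended job needs some way to supply the password. Rclone reads it from the `RCLONE_CONFIG_PASS` environment variable. The sketch below assumes the password is kept in a file readable only by the job's user; the path and function name are hypothetical.

```shell
# Hypothetical location of the password file; protect it with: chmod 600 <file>
RCLONE_PASS_FILE="${RCLONE_PASS_FILE:-$HOME/.config/rclone/config.pass}"

# Export the config password so later rclone calls do not prompt for it.
load_rclone_password() {
    if [ ! -r "$RCLONE_PASS_FILE" ]; then
        echo "cannot read $RCLONE_PASS_FILE" >&2
        return 1
    fi
    RCLONE_CONFIG_PASS="$(cat "$RCLONE_PASS_FILE")"
    export RCLONE_CONFIG_PASS
}
```

Keeping the password in a file next to the config trades some protection for convenience. Rclone's `--password-command` flag is an alternative if you would rather fetch the password from a secrets manager.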
Script for using Google Service Account Credential authentication (multiple shared drives):
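A minimal sketch of such a script is shown below; it is not necessarily the exact script we run. It assumes a Google Drive remote named `gdrive`, an S3 remote named `s3-backup`, a bucket named `my-drive-backups`, and a hand-maintained list of shared drive IDs, all of which are hypothetical placeholders. Each run copies into a prefix named after the current date, which is one way to keep multiple daily backups side by side.

```shell
#!/usr/bin/env bash
# Sketch: copy several shared drives into date-stamped S3 prefixes.
# All names below are hypothetical placeholders.
GDRIVE_REMOTE="gdrive"
S3_REMOTE="s3-backup"
S3_BUCKET="my-drive-backups"

# One "name=shared_drive_id" pair per line.
# `rclone backend drives gdrive:` lists the IDs your account can see.
SHARED_DRIVES="engineering=0AAAAAAAAAAAAAAAAAA
marketing=0BBBBBBBBBBBBBBBBBB"

# Print the destination path for a drive name ($1), e.g. remote:bucket/2024-01-02/name
backup_destination() {
    printf '%s:%s/%s/%s\n' "$S3_REMOTE" "$S3_BUCKET" "$(date +%F)" "$1"
}

# Copy every listed shared drive; return non-zero if any copy failed.
backup_shared_drives() {
    status=0
    while IFS='=' read -r name drive_id; do
        [ -n "$name" ] || continue
        echo "Backing up shared drive '$name'"
        rclone copy "$GDRIVE_REMOTE:" "$(backup_destination "$name")" \
            --drive-team-drive "$drive_id" </dev/null || status=1
    done <<EOF
$SHARED_DRIVES
EOF
    return "$status"
}

# Last line of the real script:
#   backup_shared_drives
```

The `--drive-team-drive` flag overrides the shared drive for a single run, which is what lets one remote serve every drive in the list.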
Script for using regular OAuth authentication (single drive):
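For the single-drive case, a minimal sketch might look like the following. The remote names and bucket name are hypothetical placeholders, and the date-stamped prefix is an assumption of this sketch rather than something Rclone requires.

```shell
#!/usr/bin/env bash
# Sketch: copy one Google Drive remote into a date-stamped S3 prefix.
# All names below are hypothetical placeholders.
GDRIVE_REMOTE="gdrive"
S3_REMOTE="s3-backup"
S3_BUCKET="my-drive-backups"

# Copy the whole drive to remote:bucket/YYYY-MM-DD
backup_single_drive() {
    dest="$S3_REMOTE:$S3_BUCKET/$(date +%F)"
    echo "Copying $GDRIVE_REMOTE: to $dest"
    rclone copy "$GDRIVE_REMOTE:" "$dest"
}

# Last line of the real script:
#   backup_single_drive
```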
We use Rundeck to automate many tasks such as new builds, server provisioning, and general helper scripts. Therefore, this was the obvious home for us when deciding where to run this scheduled job. However, you can also just configure a simple cron job:
When running a traditional cron job, you'll need to tell the job how to find your AWS credentials. One way to accomplish this is by specifying the location of the AWS CLI credentials file in the crontab itself. The same crontab entry can also write the script output to its own log file.
The following cron job will run every night at 3am:
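A sketch of such an entry, built from the shell so it can be reviewed before it is installed. The script, credentials, and log paths are hypothetical, and the `AWS_SHARED_CREDENTIALS_FILE` variable only matters if the S3 remote was configured to read credentials from the environment.

```shell
# Hypothetical paths; adjust to your machine.
BACKUP_SCRIPT="/opt/backup/gdrive-backup.sh"
AWS_CREDS_FILE="/home/backup/.aws/credentials"
LOG_FILE="/var/log/gdrive-backup.log"

# Build the crontab line: minute 0, hour 3, every day.
cron_line() {
    printf '0 3 * * * AWS_SHARED_CREDENTIALS_FILE=%s %s >> %s 2>&1\n' \
        "$AWS_CREDS_FILE" "$BACKUP_SCRIPT" "$LOG_FILE"
}

# Append the line to the current user's crontab, keeping existing entries.
install_cron_line() {
    { crontab -l 2>/dev/null; cron_line; } | crontab -
}

# Preview the entry; run install_cron_line once you are happy with it.
cron_line
```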
Crontab expression generators like this one can be helpful in determining the correct cron expression.
Important caveats:
- Since you encrypted the Rclone config file, which the Rclone service needs read access to, you will need to supply the password any time you restart the Rclone service.
- While you can use Rclone with many different cloud storage providers, this guide is written specifically for backing up multiple Google Drives within the same account to one AWS S3 bucket.
- Our solution uses the ‘copy’ Rclone command instead of ‘sync’ so we can have multiple daily backups, which are managed by an S3 lifecycle policy. Follow this guide to set up a policy which will dictate how many daily backups your S3 bucket will retain. If you only want to maintain one backup, you can just as easily use ‘sync’ instead of ‘copy’.
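To illustrate the lifecycle policy mentioned in the last caveat, the sketch below writes a rule that expires objects a fixed number of days after they are created and defines a helper that applies it with the AWS CLI. The bucket name, rule ID, and 30-day retention are assumptions chosen for the example.

```shell
# Hypothetical values; pick a retention period that suits your needs.
S3_BUCKET="my-drive-backups"
RETENTION_DAYS=30
LIFECYCLE_FILE="lifecycle.json"

# Rule: expire every object in the bucket RETENTION_DAYS after creation.
cat > "$LIFECYCLE_FILE" <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-drive-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": ${RETENTION_DAYS} }
    }
  ]
}
EOF

# Apply the rule to the bucket using the AWS CLI.
apply_lifecycle_policy() {
    aws s3api put-bucket-lifecycle-configuration \
        --bucket "$S3_BUCKET" \
        --lifecycle-configuration "file://$LIFECYCLE_FILE"
}

# Run once after reviewing the JSON file:
#   apply_lifecycle_policy
```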
Some helpful commands:
- Show location of Rclone config file:
- Show the contents of the Rclone config file:
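The two commands above are `rclone config file` and `rclone config show`. The wrapper function in this sketch is hypothetical and only there so both can be run together.

```shell
# Print where the Rclone config file lives, then its contents.
show_rclone_config_info() {
    rclone config file   # path of the config file in use
    rclone config show   # contents; prompts for the password if encrypted
}

# Usage:
#   show_rclone_config_info
```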
Thank you for reading! Congratulations, you now have a solid redundancy plan in place for your Google Drive(s)!