Transferring Large Data Sets and VM Images to Google Cloud Platform
Sample Use Case 1 – You have VMs on-site that you need to back up and store in the cloud. Traditional on-prem backups cost you an arm and a leg!
- You can upload an entire VM image to a storage bucket using gsutil.
- Your costs fall into three types: storage costs, network costs (ingress is free, egress is charged), and operations costs (for example, PUT operations on the bucket).
- If you are looking to import boot disk images to Compute Engine, that’s a slightly different ball game. There’s a new service from GCP for importing boot (virtual) disks.
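As a minimal sketch of the upload step, the snippet below assembles a `gsutil cp` command from Python. The helper `build_upload_command`, the bucket name `my-backup-bucket`, and the image path are hypothetical placeholders. The snippet only prints the command; running it for real assumes gsutil is installed and authenticated on your machine.

```python
import shlex


def build_upload_command(local_path, bucket, prefix=""):
    """Return the argv list for copying one local file into a bucket."""
    destination = f"gs://{bucket}/{prefix}".rstrip("/") + "/"
    return ["gsutil", "cp", local_path, destination]


# Hypothetical file and bucket names, for illustration only.
cmd = build_upload_command("backups/web01.vmdk", "my-backup-bucket", "vm-images")
print(shlex.join(cmd))
```

To actually run the upload, you could pass `cmd` to `subprocess.run(cmd, check=True)`.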
GSUtil – The bucket data transfer workhorse
Although gsutil is a command-line tool, that in no way means it is single-threaded or limited. Here are some of its key features.
- Multi-threaded and multi-process. Useful when transferring a large number of files.
- Parallel composite uploads. Splits large files, transfers chunks in parallel, and composes at destination.
- Retry. Applies to transient network failures and HTTP 429 and 5xx error codes.
- Resumability. Resumes the transfer after an error.
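To make the split, parallel transfer, compose, and retry ideas concrete, here is a small self-contained simulation. It is not gsutil's implementation: the uploader is a fake that fails once per chunk, `TransientError` stands in for a dropped connection or an HTTP 429/5xx response, and all function names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class TransientError(Exception):
    """Stands in for a dropped connection or an HTTP 429/5xx response."""


def split(data, chunk_size):
    """Cut the payload into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def with_retry(func, attempts=5, base_delay=0.01):
    """Call func, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


def make_flaky_uploader(failures_per_chunk=1):
    """Build a fake uploader that fails a fixed number of times per chunk."""
    remaining = {}

    def upload(index, chunk):
        left = remaining.setdefault(index, failures_per_chunk)
        if left > 0:
            remaining[index] = left - 1
            raise TransientError(f"chunk {index} failed")
        return chunk  # a real uploader would return an object reference

    return upload


def parallel_composite_upload(data, chunk_size, upload, workers=4):
    """Upload chunks concurrently, then compose them in original order."""
    chunks = split(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(with_retry, lambda i=i, c=c: upload(i, c))
            for i, c in enumerate(chunks)
        ]
        parts = [f.result() for f in futures]
    return b"".join(parts)


payload = bytes(range(256)) * 40  # 10,240 bytes of sample data
result = parallel_composite_upload(payload, 1024, make_flaky_uploader())
print(result == payload)
```

Collecting results in submission order, rather than completion order, is what keeps the composed object identical to the source even though chunks finish at different times.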
Estimating costs
- When you move data to Google Cloud Storage, you incur no ingress traffic charges. The gsutil tool and the Cloud Storage Transfer Service are both offered at no charge.
- After your data is transferred, you pay for Cloud Storage usage based on storage, network, custom metadata and operations.
- Cost differs between storage classes, so choose the right storage class for your use case.
- If all you are doing is Coldline storage (retrieval about once per year), you would be paying approximately $0.007 – $0.014 per GB/month. Nearline storage (retrieval about once per month) costs $0.01 – $0.02 per GB/month.
See the GCP network pricing page for the most up-to-date pricing details.
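The per-GB figures above translate into a monthly bill with simple multiplication. The sketch below uses only the price ranges quoted in this article, which may be out of date, and covers storage alone; network, operations, and retrieval charges are not included. The 2,000 GB data set is a made-up example.

```python
# Per-GB monthly price ranges quoted in the text above (USD); real prices
# vary by location and change over time.
PRICE_PER_GB_MONTH = {
    "coldline": (0.007, 0.014),
    "nearline": (0.01, 0.02),
}


def monthly_storage_cost(size_gb, storage_class):
    """Return the (low, high) monthly storage cost in USD."""
    low, high = PRICE_PER_GB_MONTH[storage_class]
    return size_gb * low, size_gb * high


# Hypothetical 2,000 GB backup set.
for storage_class in PRICE_PER_GB_MONTH:
    low, high = monthly_storage_cost(2000, storage_class)
    print(f"{storage_class}: ${low:.2f} to ${high:.2f} per month")
```

At these rates, 2,000 GB comes to roughly $14 to $28 per month in Coldline and $20 to $40 per month in Nearline.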
Basic Questions to get started
- What is the approximate size of the data you need to transfer? Where does it currently reside?
- What network bandwidth is available from the data location?
- Frequency of Transfers – Do you need to transfer your data once, or periodically?
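Data size and bandwidth together give a rough transfer time, which often decides whether an online transfer is practical. The sketch below is a back-of-the-envelope estimate; the `efficiency` factor of 0.8 is an assumption standing in for protocol overhead and link sharing, and the 1,000 GB over 100 Mbps figures are made up.

```python
def transfer_time_hours(size_gb, bandwidth_mbps, efficiency=0.8):
    """Estimate hours to move size_gb over a link of bandwidth_mbps.

    efficiency is an assumed fraction of the link that is actually usable.
    """
    size_megabits = size_gb * 1000 * 8  # decimal GB -> megabits
    seconds = size_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


# Hypothetical example: 1,000 GB over a 100 Mbps link.
hours = transfer_time_hours(size_gb=1000, bandwidth_mbps=100)
print(f"{hours:.1f} hours")
```

With these inputs the estimate is a little under 28 hours, so a one-time transfer of this size fits comfortably in a weekend.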
Bucket Lifecycle Questions
- Are you going to be doing a one-time transfer or incremental transfers?
- Incremental transfers constitute uploads (ingress traffic), which is free. However, updates are PUT operations, and operations cost money.
- For Coldline or Nearline, how soon would you need your data retrieved (Recovery Time Objective, Recovery Point Objective)?
Bucket Location and Backup Questions
- Do you need your bucket to be in the same region?
- Do you need a backup of the storage bucket?
Other GCP Services?
Cloud Storage is optimized to work with other GCP services such as BigQuery and Cloud Dataflow, making it easy for you to perform cloud-based data engineering and analysis with a broader GCP architecture.
What is Google Transfer Service?
gsutil fits most use cases. If your data is already in GCS, AWS, or Azure, you can use Google Transfer Service for online-to-online data transfer.