Most applications need to store data, e.g. media to be streamed or sensor data from devices.
Different applications and workloads require different storage and database solutions.
- structured
- unstructured
- transactional
- relational
- Cloud Storage (like AWS S3)
- Cloud SQL
- Spanner
- Firestore
- Bigtable
(1) Cloud Storage
Object Storage
- a file and folder hierarchy (file storage) or
- as chunks of a disk (block storage)
- binary form of the actual data itself
- relevant associated meta-data (such as date created, author, resource type, and permissions)
- globally unique identifier. These unique keys are in the form of URLs, which means object storage interacts well with web technologies.
- video
- pictures
- audio recordings
Cloud Storage:
- Service that offers developers and IT organizations durable and highly available object storage
- Google’s object storage product
- Allows customers to store any amount of data, and to retrieve it as often as needed
- Fully managed scalable service
Cloud Storage Uses
- serving website content
- storing data for archival and disaster recovery
- distributing large data objects to end users via Direct Download
- online content such as videos and photos
- backup and archived data
- storage of intermediate results in processing workflows
Buckets
A bucket needs:
- globally unique name
- specific geographic location for where it should be stored
- An ideal location for a bucket is where latency is minimized. For example, if most of our users are in Europe, we probably want to pick a European location, so either a specific Google Cloud region in Europe, or else the EU multi-region
- With object versioning:
- Cloud Storage will keep a detailed history of modifications (overwrites or deletes) of all objects contained in that bucket
- We can list the archived versions of an object, restore an object to an older state, or permanently delete a version of an object, as needed
- Without object versioning:
- by default new versions will always overwrite older versions
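The difference between the two modes can be sketched with a toy in-memory model. This is only an illustration of the behaviour described above, not the Cloud Storage API; `ToyBucket` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBucket:
    """In-memory stand-in for a bucket (hypothetical, not the real API)."""
    versioning: bool = False
    live: dict = field(default_factory=dict)      # object name -> current content
    archived: dict = field(default_factory=dict)  # object name -> older contents

    def write(self, name: str, data: bytes) -> None:
        # With versioning on, the overwritten content is kept as an archived version.
        if self.versioning and name in self.live:
            self.archived.setdefault(name, []).append(self.live[name])
        self.live[name] = data

    def versions(self, name: str) -> list:
        # Oldest first, with the live version last.
        return self.archived.get(name, []) + [self.live[name]]

    def restore(self, name: str, index: int) -> None:
        # Restoring is modelled as writing an old version back as the live one.
        self.write(name, self.versions(name)[index])


versioned = ToyBucket(versioning=True)
versioned.write("report.txt", b"v1")
versioned.write("report.txt", b"v2")
print(versioned.versions("report.txt"))  # both versions are still listed

plain = ToyBucket(versioning=False)
plain.write("report.txt", b"v1")
plain.write("report.txt", b"v2")
print(plain.versions("report.txt"))      # only the latest write survives
```

In the versioned bucket, `restore("report.txt", 0)` would bring the first version back as the live one while keeping the full history.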
Access Control
There are a couple of options to control user access to objects and buckets:
- For most purposes, IAM is sufficient. Roles are inherited from project to bucket to object.
- If we need finer control, we can create access control lists. Each access control list consists of two pieces of information:
- scope, which defines who can access and perform an action. This can be a specific user or group of users
- permission, which defines what actions can be performed, like read or write
- With lifecycle management rules, for example, we could tell Cloud Storage to delete objects older than 365 days, to delete objects created before January 1, 2013, or to keep only the 3 most recent versions of each object in a bucket that has versioning enabled
- Having this control ensures that we’re not paying for more than we actually need
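The three example rules above can be sketched as data plus a small evaluator. The key names (`age`, `createdBefore`, `numNewerVersions`) follow the Cloud Storage lifecycle configuration format as I recall it and should be treated as an assumption. The evaluator itself is a hypothetical local model, not something the service exposes.

```python
from datetime import date

# Rules in the shape of a lifecycle configuration (key names are an assumption).
RULES = [
    {"action": {"type": "Delete"}, "condition": {"age": 365}},
    {"action": {"type": "Delete"}, "condition": {"createdBefore": "2013-01-01"}},
    {"action": {"type": "Delete"}, "condition": {"numNewerVersions": 3}},
]


def condition_holds(condition, created, today, newer_versions):
    """Return True when every test present in one rule's condition passes."""
    checks = []
    if "age" in condition:
        checks.append((today - created).days >= condition["age"])
    if "createdBefore" in condition:
        checks.append(created < date.fromisoformat(condition["createdBefore"]))
    if "numNewerVersions" in condition:
        checks.append(newer_versions >= condition["numNewerVersions"])
    return bool(checks) and all(checks)


def should_delete(created, today, newer_versions=0):
    """An object version is deleted if any single rule matches it."""
    return any(
        rule["action"]["type"] == "Delete"
        and condition_holds(rule["condition"], created, today, newer_versions)
        for rule in RULES
    )


today = date(2024, 1, 1)
print(should_delete(date(2023, 12, 1), today))                    # recent live object
print(should_delete(date(2012, 6, 1), today))                     # created before 2013
print(should_delete(date(2023, 12, 1), today, newer_versions=3))  # fourth-newest version
```

Note that a threshold of 3 newer versions leaves the three most recent versions (those with 0, 1, or 2 newer versions) untouched.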
Storage classes and data transfer
- Standard storage
- considered best for frequently accessed or hot data
- great for data that's stored for only brief periods of time
- Nearline storage
- Best for storing infrequently accessed data, like reading or modifying data on average once a month or less
- Examples may include data backups, long term multimedia content, or data archiving.
- Coldline storage
- A low cost option for storing infrequently accessed data.
- However, compared to Nearline storage, Coldline storage is meant for data that is read or modified at most once every 90 days.
- Archive storage
- The lowest cost option used ideally for data archiving, online backup and disaster recovery
- It's the best choice for data that we plan to access less than once a year, because it has higher costs for data access and operations and a 365-day minimum storage duration
- unlimited storage
- no minimum object size requirement
- worldwide accessibility and locations
- low latency and high durability
- a uniform experience which extends to security tools and APIs
- geo-redundancy if data is stored in a multi-region or dual-region. This means placing physical servers in geographically diverse data centers to protect against catastrophic events and natural disasters, and load balancing traffic for optimal performance.
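The access-frequency guidance for the four storage classes can be condensed into a small helper. This is a rough sketch: the 30, 90, and 365 day thresholds come from the descriptions above, while the function name and the exact boundary handling are assumptions. Real decisions should also weigh retrieval costs and minimum storage durations.

```python
def suggest_storage_class(days_between_accesses: float) -> str:
    """Map an expected access interval (in days) to a storage class name."""
    if days_between_accesses < 30:
        return "STANDARD"   # frequently accessed ("hot") data
    if days_between_accesses < 90:
        return "NEARLINE"   # roughly once a month or less
    if days_between_accesses < 365:
        return "COLDLINE"   # at most once every 90 days
    return "ARCHIVE"        # less than once a year


for interval in (1, 45, 120, 700):
    print(interval, suggest_storage_class(interval))
```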
Autoclass
- moves data that is not accessed to colder storage classes to reduce storage costs
- moves data that is accessed to standard storage to optimize future accesses
Data Encryption
Data Transfer into Google Cloud Storage
- Online Transfer
- by using gcloud storage, which is the Cloud Storage command from the Cloud SDK
- by using a drag and drop option in the Cloud console if accessed through the Google Chrome web browser
- Storage Transfer Service
- enables us to import large amounts of online data into Cloud Storage quickly and cost effectively
- if we have to upload terabytes or even petabytes of data
- Lets us schedule and manage batch transfers to Cloud Storage from:
- another Cloud provider
- a different Cloud Storage region
- an HTTPS endpoint
- Transfer Appliance
- A rackable, high capacity storage server that we lease from Google Cloud
- We connect it to our network, load it with data, and then ship it to an upload facility where the data is uploaded to Cloud Storage
- We can transfer up to a petabyte of data on a single appliance
- Moving data in internally from other Google Cloud services, because Cloud Storage is tightly integrated with other Google Cloud products and services. For example, we can:
- import and export tables to and from both BigQuery and Cloud SQL
- store App Engine logs, files for backups, and objects used by App Engine applications like images
- store instance startup scripts, Compute Engine images, and objects used by Compute Engine applications
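To see why an offline option such as Transfer Appliance can make sense at the petabyte scale mentioned above, here is a back-of-the-envelope estimate of online transfer time. The link speeds are illustrative assumptions, and the calculation ignores protocol overhead unless an `efficiency` factor is supplied.

```python
def transfer_days(size_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to push size_bytes over a link at the given utilisation."""
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # decimal petabyte, in bytes

print(round(transfer_days(PETABYTE, 1e9), 1))   # 1 Gbps link -> 92.6 days
print(round(transfer_days(PETABYTE, 1e10), 1))  # 10 Gbps link -> 9.3 days
```

Even on a fully utilised 1 Gbps link, a petabyte takes about three months, which is the kind of situation where shipping an appliance becomes attractive.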
Provisioning Cloud Storage Bucket
(2) Cloud SQL
- MySQL
- PostgreSQL
- SQL Server
- applying patches and updates
- managing backups
- configuring replications
- Doesn't require any software installation or maintenance
- Can scale up to 128 processor cores, 864 GB of RAM, and 64 TB of storage.
- Supports automatic replication scenarios, such as from:
- Cloud SQL primary instance
- External primary instance
- External MySQL instances
- Supports managed backups, so backed-up data is securely stored and accessible if a restore is required. The cost of an instance covers seven backups
- Encrypts customer data when on Google’s internal networks and when stored in database tables, temporary files, and backups
- Includes a network firewall, which controls network access to each database instance
- Cloud SQL can be used with App Engine using standard drivers like Connector/J for Java or MySQLdb for Python.
- Compute Engine instances can be authorized to access Cloud SQL instances, and the Cloud SQL instance can be configured to be in the same zone as our virtual machine
- Cloud SQL also supports other applications and tools, like:
- SQL Workbench
- Toad
- other external applications using standard MySQL drivers
Provisioning Cloud SQL Instance
- Database engine:
- MySQL
- PostgreSQL
- SQL Server
- Instance ID - arbitrary string e.g. blog-db
- Root user password: arbitrary string (There's no need to obscure the password because we use mechanisms to connect that aren't open access to everyone)
- Choose a Cloud SQL edition:
- Edition type:
- Enterprise
- Enterprise Plus
- Choose edition preset:
- Sandbox
- Development
- Production
- Choose region - This should be the same region and zone into which we launched the Compute Engine VM instance. The best performance is achieved by placing the client and the database close to each other.
- Choose zonal availability
- Single zone - In case of outage, no failover. Not recommended for production.
- Multiple zones (Highly available) - Automatic failover to another zone within your selected region. Recommended for production instances. Increases cost.
- Select Primary zone
- see its Public IP address (e.g. 35.204.71.237)
- Add User Account
- username
- password
- set Connections
- Networking >> Add a Network
- Choose between Private IP connection and a Public IP connection
- set Name
- Network: <external_IP_of_VM_Instance>/32 (If chosen Public IP connection then use instance's external IP address)
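The `/32` suffix in the network entry means "exactly this one address". A minimal sketch with Python's standard `ipaddress` module shows what such an entry admits. The VM address below is a hypothetical value from the documentation range and the helper name is made up.

```python
import ipaddress


def single_host_network(ip: str) -> str:
    """Build a CIDR entry that matches only the given address."""
    address = ipaddress.ip_address(ip)
    # max_prefixlen is 32 for IPv4, so this yields "<ip>/32".
    return f"{address}/{address.max_prefixlen}"


vm_external_ip = "203.0.113.10"  # hypothetical external IP of the VM instance
entry = single_host_network(vm_external_ip)
network = ipaddress.ip_network(entry)

print(entry)                                            # value for the Network field
print(network.num_addresses)                            # how many addresses it covers
print(ipaddress.ip_address("203.0.113.11") in network)  # a neighbouring IP is not admitted
```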
(3) Spanner
- Fully managed relational database service that scales horizontally, is strongly consistent, and speaks SQL
- Service that powers Google’s $80 billion business (Google’s own mission-critical applications and services)
- Especially suited for applications that require:
- SQL relational database management system with joins and secondary indexes
- built-in high availability
- strong global consistency
- high numbers of input and output operations per second (tens of thousands of reads and writes per second or more)
The horizontal scaling approach, sometimes referred to as "scaling out," entails adding more machines to further distribute the load of the database and increase overall storage and/or processing power. [A Guide To Horizontal Vs Vertical Scaling | MongoDB]
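A generic way to picture scaling out is to spread keys over a growing number of machines and watch the per-machine load drop. The sketch below uses simple hash-based placement purely as an illustration. It is not a description of how Spanner distributes data, and all names in it are hypothetical.

```python
import hashlib
from collections import Counter


def node_for(key: str, node_count: int) -> int:
    """Pick a node for a key with a stable hash (illustrative placement only)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def load_per_node(keys, node_count):
    """Count how many keys land on each node."""
    return Counter(node_for(key, node_count) for key in keys)


keys = [f"user-{i}" for i in range(10_000)]
for nodes in (1, 2, 4):
    busiest = max(load_per_node(keys, nodes).values())
    print(f"{nodes} node(s): busiest node holds {busiest} keys")
```

Adding machines lowers the share each one has to serve, which is the core idea behind "scaling out" as opposed to buying a bigger single machine.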
(4) Firestore
- individual, specific documents or
- all the documents in a collection that match our query parameters
- automatic multi-region data replication
- strong consistency guarantees
- atomic batch operations
- real transaction support
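The two retrieval styles listed for Firestore (one specific document, or every document in a collection that matches query parameters) can be pictured with a toy in-memory collection. This is not the Firestore client library. The document IDs, fields, and function names are made up for illustration.

```python
# A collection modelled as a dict of document ID -> document fields.
users = {
    "alice": {"city": "Zurich", "plan": "pro"},
    "bob": {"city": "Berlin", "plan": "free"},
    "carol": {"city": "Zurich", "plan": "free"},
}


def get_document(collection, doc_id):
    """Retrieve one specific document by its ID (None if it does not exist)."""
    return collection.get(doc_id)


def query(collection, **equals):
    """Retrieve all documents whose fields equal the given parameters."""
    return {
        doc_id: doc
        for doc_id, doc in collection.items()
        if all(doc.get(name) == value for name, value in equals.items())
    }


print(get_document(users, "bob"))
print(sorted(query(users, city="Zurich")))
```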
(5) Bigtable
- Google's NoSQL big data database service
- The same database that powers many core Google services, including Search, Analytics, Maps, and Gmail
- Designed to handle massive workloads at consistent low latency and high throughput, so it's a great choice for both operational and analytical applications, including Internet of Things, user analytics, and financial data analysis.
- We work with more than 1 TB of semi-structured or structured data
- Data is fast with high throughput, or it’s rapidly changing
- We work with NoSQL data. This usually means transactions where strong relational semantics are not required
- Data is a time-series or has natural semantic ordering
- We work with big data, running asynchronous batch or synchronous real-time processing on the data
- We are running machine learning algorithms on the data
- Managed VMs
- HBase REST Server
- Java Server using the HBase client
- Dataflow Streaming
- Spark Streaming
- Storm
- Hadoop MapReduce
- Dataflow
- Spark