Summary

Here is a summary of the essential steps to consider when working with big data and machine learning on Google Cloud:

  • Recognize the data-to-AI life cycle on Google Cloud
  • Identify the connection between data engineering and machine learning
  • Identify the options available to build ML models on Google Cloud
  • Determine the major features and benefits of Vertex AI

Key Knowledge and Tips

  • TPUs (Tensor Processing Units) are a Google hardware innovation that tailors the architecture to the computation needs of a specific domain, such as matrix multiplication in machine learning
  • Archive storage is the Cloud Storage class best suited to data that is accessed less than once a year, such as online backups and disaster recovery data
  • Compute Engine, Google Kubernetes Engine, App Engine, and Cloud Functions represent Compute services
  • Cloud Storage, Cloud Bigtable, Cloud SQL, Cloud Spanner, and Firestore represent Database and storage services
  • Pub/Sub, Dataflow, Dataproc, and Cloud Data Fusion align to the Ingestion and process stage of the data-to-AI workflow
  • AutoML, Vertex AI Workbench, and TensorFlow align to the Machine learning stage of the data-to-AI workflow
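
The product groupings in the list above can be kept as a small lookup table for quick review. This is only a study aid, sketched under assumptions: the names PRODUCT_CATEGORIES and category_of are hypothetical, the table contains only the products named in this summary, and it does not call any Google Cloud API.

```python
from typing import Optional

# Hypothetical lookup table; it mirrors only the groupings listed in this summary.
PRODUCT_CATEGORIES = {
    "Compute": [
        "Compute Engine", "Google Kubernetes Engine", "App Engine", "Cloud Functions",
    ],
    "Database and storage": [
        "Cloud Storage", "Cloud Bigtable", "Cloud SQL", "Cloud Spanner", "Firestore",
    ],
    "Ingestion and process": [
        "Pub/Sub", "Dataflow", "Dataproc", "Cloud Data Fusion",
    ],
    "Machine learning": [
        "AutoML", "Vertex AI Workbench", "TensorFlow",
    ],
}


def category_of(product: str) -> Optional[str]:
    """Return the category a product is listed under, or None if it is not listed."""
    for category, products in PRODUCT_CATEGORIES.items():
        if product in products:
            return category
    return None


if __name__ == "__main__":
    for name in ("Dataflow", "Cloud Spanner", "AutoML"):
        print(name, "->", category_of(name))
```

A product that does not appear in this summary returns None instead of a guessed category.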

We offer practical recommendations in this report. If you have any projects related to big data and machine learning on Google Cloud, please feel free to contact us.