Managing ML datasets with Vertex AI

Many enterprises want to use data to make meaningful predictions that can bolster their business or help them venture into new markets. This often requires custom machine learning models, something not every business knows how to create or use. This is where Vertex AI can help. Vertex AI provides tools for every step of the machine learning workflow: managing datasets, training models in different ways, evaluating and deploying them, and making predictions. It also supports varying levels of ML expertise, so you don’t need to be an ML expert to use Vertex AI.

Types of data you can use in Vertex AI

Datasets are the first step of the machine learning lifecycle: to get started you need data, and lots of it. Vertex AI currently supports managed datasets for four data types: image, tabular, text, and video.

Image

Image datasets let you do:

To ensure your model performs well in production, use training images similar to what your users will send. For example, if users are likely to send low-quality images, be sure to have blurry and low-resolution images in your dataset. Don’t forget to include different angles, backgrounds, and resolutions. We recommend at least 1,000 images per label (the item you want to identify), but you can get started with as few as 10 per label. In general, the more examples you provide, the better your model will be.
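Before importing, it can help to check how many images you have for each label against the two figures mentioned above. The following is a minimal sketch using only the Python standard library; the function name, status strings, and sample labels are hypothetical and not part of Vertex AI.

```python
from collections import Counter

MIN_PER_LABEL = 10            # enough to get started, per the guidance above
RECOMMENDED_PER_LABEL = 1000  # recommended target, per the guidance above


def label_coverage(labels):
    """Return {label: (count, status)} for an iterable with one label per image."""
    report = {}
    for label, count in Counter(labels).items():
        if count < MIN_PER_LABEL:
            status = "too few"
        elif count < RECOMMENDED_PER_LABEL:
            status = "ok to start"
        else:
            status = "recommended"
        report[label] = (count, status)
    return report


if __name__ == "__main__":
    # Hypothetical labels, one entry per training image.
    sample = ["daisy"] * 1200 + ["tulip"] * 40 + ["rose"] * 3
    for label, (count, status) in sorted(label_coverage(sample).items()):
        print(f"{label}: {count} images ({status})")
```

Labels reported as "too few" are good candidates for collecting more photos before you train.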

Tabular

Tabular datasets enable you to do:

Tabular datasets support hundreds of columns and millions of rows.

Text

With text datasets, you can do:

Video

Video datasets enable:

Creating and managing datasets in Vertex AI

Now that we’ve covered the different types of data you can use, let’s shift to creating and managing those datasets. In the Cloud Console, go to the Vertex AI dashboard page, click Datasets, then click Create.

Say you want to classify items within a set of photos. Create an image dataset and select image classification. You can upload files directly from your computer; they are stored in Cloud Storage. Each image then needs a label (the item you want to identify). If you already have labels, you can use the Import File option to import a CSV that lists your image URLs and their labels. If your data is not labeled and you would like human help to label it, you can use the Vertex AI data labeling service. Otherwise, once the files are uploaded, you can create labels and assign them to the images yourself. You can also analyze the images in the dataset, including the number of images per label and a few other properties.
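If you already have labels, the import file can also be generated and used from code rather than the console. The sketch below assumes the simplest import layout, one Cloud Storage image URI and one label per row, and assumes the `google-cloud-aiplatform` Python SDK for the creation step. The bucket paths, labels, and helper names are hypothetical, and the dataset is created in whatever project and region your environment is configured for.

```python
import csv
import io


def build_import_csv(labeled_images):
    """Render (gcs_uri, label) pairs as CSV text, one image per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for uri, label in labeled_images:
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a Cloud Storage URI, got {uri!r}")
        writer.writerow([uri, label])
    return buffer.getvalue()


def create_image_dataset(display_name, import_csv_uri):
    """Create an image classification dataset from an uploaded import file.

    Assumes the google-cloud-aiplatform package is installed and credentials
    are configured. The import is local to this function so the CSV helper
    above works without either.
    """
    from google.cloud import aiplatform

    return aiplatform.ImageDataset.create(
        display_name=display_name,
        gcs_source=import_csv_uri,
        import_schema_uri=(
            aiplatform.schema.dataset.ioformat.image.single_label_classification
        ),
    )


if __name__ == "__main__":
    # Hypothetical image locations and labels.
    rows = [
        ("gs://my-bucket/flowers/daisy_001.jpg", "daisy"),
        ("gs://my-bucket/flowers/tulip_001.jpg", "tulip"),
    ]
    print(build_import_csv(rows), end="")
```

You would upload the generated CSV to Cloud Storage yourself and then pass its `gs://` path to `create_image_dataset`.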

Depending on the type of data you use, your options might vary slightly. For example, if you want to use tabular data, you could upload a CSV file from your computer, use one from Cloud Storage, or select a table from BigQuery directly. Once you select the table, the data is available for analysis.
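The same choice between a Cloud Storage CSV and a BigQuery table shows up when creating a tabular dataset from code. The sketch below assumes the `google-cloud-aiplatform` Python SDK, whose `TabularDataset.create` accepts either a `gcs_source` or a `bq_source`. The helper names and example URIs are hypothetical, and uploading a file from your own computer is not covered here.

```python
def tabular_source_kwargs(source_uri):
    """Map a source URI to the keyword argument the SDK expects for it."""
    if source_uri.startswith("gs://"):
        return {"gcs_source": source_uri}  # CSV file in Cloud Storage
    if source_uri.startswith("bq://"):
        return {"bq_source": source_uri}   # BigQuery table
    raise ValueError("source must start with gs:// or bq://")


def create_tabular_dataset(display_name, source_uri):
    """Create a tabular dataset from a Cloud Storage CSV or a BigQuery table.

    Assumes the google-cloud-aiplatform package is installed and credentials
    are configured; the import is local so the helper above has no dependency.
    """
    from google.cloud import aiplatform

    return aiplatform.TabularDataset.create(
        display_name=display_name,
        **tabular_source_kwargs(source_uri),
    )


if __name__ == "__main__":
    # Hypothetical sources, shown only to illustrate the two URI styles.
    print(tabular_source_kwargs("gs://my-bucket/sales/2023.csv"))
    print(tabular_source_kwargs("bq://my-project.sales.orders"))
```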
