As part of our ongoing series on cost management for observability data in Google Cloud, we are going to share four steps for getting the most out of your logs while on a budget. While we will focus on optimizing your costs within Google Cloud, we have found that this approach also works for customers with infrastructure and logs on-premises and in other clouds.

Step 1: Analyze your current spending on logging tools

To get started, create an itemized list of what volume of data is going where and what it costs. We will start with the billing report and the obvious line items, including those under Operations Tools/Cloud Logging:

  • Log Volume – the cost to write log data to disk once
  • Log Storage Volume – the cost to retain logs for more than 30 days 

If you’re using tools outside Cloud Logging, you’ll also need to include any costs related to these solutions. Here’s a list to get you started:

  • Log vendor and hardware costs — what are you paying to observability vendors? If you’re running your own logging solution, you’ll want to include the cost of compute and disk.
  • If you export logs within Google Cloud, include Cloud Storage and BigQuery costs
  • Processing costs — consider the costs for Kafka, Pub/Sub or Dataflow to process logs. Network egress charges may apply if you’re moving logs outside Google Cloud.
  • Engineering resources dedicated to managing your logging tools across your enterprise often are significant too!

Step 2: Eliminate waste — don’t pay for logs you don’t need

While not all costs scale directly with volume, reducing your log volume is often the best way to reduce spend. Even if your vendor contract locks you into a fixed price for a period of time, you may still have pipeline costs, such as Kafka, Pub/Sub or Dataflow, that shrink when you stop sending wasteful logs.

Finding chatty logs in Google Cloud

The easiest way to understand which sources are generating the highest volume of logs within Google Cloud is to start with our pre-built dashboards in Cloud Monitoring. To access the available dashboards:

  1. Go to Monitoring -> Dashboards
  2. Select “Sample Library” -> “Logging”

This blog post has some specific recommendations for optimizing logs for GKE and GCE using prebuilt dashboards.

As a second option, you can use Metrics Explorer and system metrics to analyze the volume of logs. For example, type “log bytes ingested” into the filter. This specific metric corresponds to the Cloud Logging “Log Volume” charge. There are many ways to filter this data. To get a big picture, we often start with grouping by both “resource_type” and “project_id”. 

To narrow down the resource type in a particular project, add a “project_id” filter. Under Advanced Options, click Aligner and select “sum”. Sort by volume to see the resources with the highest log volume.
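The same grouping and sorting can be sketched in a few lines of Python, which is handy if you export the metric data for offline analysis. This is only an illustration: the input format, the function name `top_log_sources` and the sample values are all hypothetical, and in practice the numbers would come from the `logging.googleapis.com/billing/bytes_ingested` metric.

```python
from collections import defaultdict


def top_log_sources(points, project_id=None):
    """Sum ingested bytes per (project_id, resource_type), largest first.

    `points` is a hypothetical list of dicts, one per exported sample.
    Pass `project_id` to narrow the result to a single project.
    """
    totals = defaultdict(int)
    for point in points:
        if project_id is not None and point["project_id"] != project_id:
            continue
        totals[(point["project_id"], point["resource_type"])] += point["bytes"]
    # Sort by volume so the chattiest sources come first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample values, for illustration only.
SAMPLE_POINTS = [
    {"project_id": "proj-a", "resource_type": "k8s_container", "bytes": 700},
    {"project_id": "proj-a", "resource_type": "gce_instance", "bytes": 200},
    {"project_id": "proj-b", "resource_type": "http_load_balancer", "bytes": 500},
    {"project_id": "proj-a", "resource_type": "k8s_container", "bytes": 300},
]

if __name__ == "__main__":
    for (project, resource), total in top_log_sources(SAMPLE_POINTS):
        print(project, resource, total)
```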

While these rich metrics are great for understanding volumes, you’ll probably want to eventually look at the logs to see whether they’re critical to your observability strategy. In Logs Explorer, the log fields on the left side help you understand volumes and filter logs from a resource type.

Reducing log volume with the Log Router

Now that we understand what types of logs are expensive, we can use the Log Router and our sink definitions to reduce these volumes. Your strategy will depend on your observability goals, but here are some general tools we’ve found to work well.

The most obvious way to reduce your log volume is not to send the same logs to multiple storage destinations. One common example of this is when a central security team uses an aggregated log sink to centralize their audit logs but individual projects still ingest these logs. Instead, use exclusion filters on the _Default log sink and any other log sinks in each project to avoid these logs. Exclusion filters also work on log sinks to BigQuery, Pub/Sub, or Cloud Storage.
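As a rough sketch of what such an exclusion might look like, the snippet below builds a filter that matches Data Access audit logs, on the assumption that those are the logs your security team already centralizes. The helper name and the exclusion name are hypothetical; the `name`, `description` and `filter` fields follow the shape of a log sink exclusion, but verify the filter against your own logs before applying it.

```python
def audit_log_exclusion_filter(log_ids):
    """Build a Logging query language filter matching any of the given log IDs."""
    if not log_ids:
        raise ValueError("at least one log ID is required")
    return " OR ".join(f'log_id("{log_id}")' for log_id in log_ids)


# Hypothetical exclusion for a project's _Default sink. Every entry that
# matches the filter is dropped by that sink only; other sinks, such as
# the central aggregated sink, are unaffected.
exclusion = {
    "name": "exclude-centralized-audit-logs",  # hypothetical name
    "description": "Data Access audit logs are already stored centrally",
    "filter": audit_log_exclusion_filter(
        ["cloudaudit.googleapis.com/data_access"]  # assumption: adjust to your logs
    ),
}

if __name__ == "__main__":
    print(exclusion["filter"])
```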

Similarly, if you’re paying to store logs in an external log management tool, you don’t have to save these same logs to Cloud Logging. We recommend keeping a small set of system logs from GCP services such as GKE in Cloud Logging in case you need assistance from GCP support, but what you store is up to you, and you can still export the rest to the destination of your choice!

Another powerful tool to reduce log volume is to sample a percentage of chatty logs. This can be particularly useful with 2XX load balancer logs, for example. Because sampling discards data, we recommend you design a sampling strategy based on your usage, security and compliance requirements and document it clearly.
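One way to express sampling is an exclusion filter that uses the Logging query language `sample()` function. The sketch below assumes you want to keep roughly 10% of successful load balancer requests; the helper name, the base filter and the fraction are illustrative assumptions, not recommendations.

```python
def sampled_exclusion_filter(base_filter, keep_fraction):
    """Build an exclusion filter that drops all but `keep_fraction` of matches.

    An exclusion removes whatever its filter matches, so sample() is given
    the fraction to drop rather than the fraction to keep.
    """
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must be between 0 and 1 (exclusive)")
    drop_fraction = 1.0 - keep_fraction
    return f"({base_filter}) AND sample(insertId, {drop_fraction:g})"


# Assumption: successful (2XX) requests logged by an HTTP(S) load balancer.
LB_2XX_FILTER = (
    'resource.type="http_load_balancer" '
    "AND httpRequest.status>=200 AND httpRequest.status<300"
)

if __name__ == "__main__":
    print(sampled_exclusion_filter(LB_2XX_FILTER, keep_fraction=0.1))
```

Error responses (4XX and 5XX) do not match the base filter, so they are never sampled away.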

Step 3: Optimize costs over the lifecycle of your logs

Another option to reduce costs is to avoid storing logs for longer than you need them. Cloud Logging charges for retention based on the log volume retained each month. There’s no need to switch between hot and cold storage in Cloud Logging; doubling the default amount of retention only increases the cost by 2%. You can change your custom log retention at any time.
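The 2% figure follows from simple arithmetic. As a rough sketch, assume prices of $0.50 per GiB for ingestion and $0.01 per GiB per month for retention beyond the default 30 days. Both prices are assumptions used for illustration, so check the current pricing page before relying on them.

```python
# Assumed prices in USD, for illustration only; check current pricing.
INGEST_PRICE_PER_GIB = 0.50
RETENTION_PRICE_PER_GIB_MONTH = 0.01


def retention_cost_increase(extra_months,
                            ingest_price=INGEST_PRICE_PER_GIB,
                            retention_price=RETENTION_PRICE_PER_GIB_MONTH):
    """Relative cost increase per GiB from keeping logs `extra_months` longer.

    Each GiB is charged once at ingestion, then once per extra month retained.
    """
    if extra_months < 0:
        raise ValueError("extra_months must be non-negative")
    return retention_price * extra_months / ingest_price


if __name__ == "__main__":
    # Doubling the default retention adds one extra month of storage.
    print(f"{retention_cost_increase(1):.0%}")
```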

If you are storing your logs outside of Cloud Logging, it is a good idea to compare the cost to retain logs and make a decision. 

Step 4: Set up alerts to avoid surprise bills

Once you are confident that the volume of logs being routed through log sinks fits in your budget, set up alerts so that you can detect any spikes before you get a large bill. To alert based on the volume of logs ingested into Cloud Logging:

  1. Go to the Logs-based metrics page. Scroll down to the bottom of the page and click the three dots on “billing/bytes_ingested” under System-defined metrics. 
  2. Click “Create alert from metric”.
  3. Optionally, add filters, for example on resource_id or project_id.
  4. Select the logs-based metric for the alert policy.
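If you prefer to manage alerts as code, the steps above roughly correspond to an alert policy definition like the one sketched below. The field names follow the Cloud Monitoring alert policy format, but the function name, threshold, alignment period and the `global` resource type in the filter are assumptions to adapt and verify for your environment.

```python
import json

METRIC_TYPE = "logging.googleapis.com/billing/bytes_ingested"


def build_log_volume_alert_policy(threshold_bytes, project_id=None):
    """Return an alert policy body (as a dict) for hourly ingested log volume."""
    # Assumption: the metric is reported against the "global" resource type.
    metric_filter = f'metric.type = "{METRIC_TYPE}" AND resource.type = "global"'
    if project_id is not None:
        # Optional filter, mirroring step 3 above.
        metric_filter += f' AND resource.labels.project_id = "{project_id}"'
    return {
        "displayName": "Log ingestion volume spike",  # hypothetical name
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "Hourly ingested log bytes above threshold",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold_bytes,
                    "duration": "0s",
                    "aggregations": [
                        {
                            # Sum the bytes ingested in each one-hour window.
                            "alignmentPeriod": "3600s",
                            "perSeriesAligner": "ALIGN_SUM",
                        }
                    ],
                },
            }
        ],
    }


if __name__ == "__main__":
    # Example: alert when more than 50 GiB is ingested within an hour.
    print(json.dumps(build_log_volume_alert_policy(50 * 1024**3), indent=2))
```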

You can also set up similar alerts on the volume for log sinks to Pub/Sub, BigQuery or Cloud Storage.

Conclusion

One final way to stretch your observability budget is to use more Cloud Operations. We are always working to bring our customers the most value possible for their budget. Our latest feature, Log Analytics, is one example: it adds querying capabilities and also makes the same data available for analytics, reducing the need for data silos. Many small customers can operate entirely on our free tier. Larger customers have expressed their appreciation for the scalable Log Router functionality, available at no extra charge, that would otherwise require an expensive event store to process data. So it’s no surprise that a 2022 IDC report showed that more than half of respondents surveyed stated that managing and monitoring tools from public cloud platforms provide more value compared to third-party tools.