Are you familiar with the four golden signals of Site Reliability Engineering (SRE): latency, traffic, errors, and saturation? Whether you’re a developer or an operator, you’ve likely been responsible for collecting, storing, or analyzing the data associated with these concepts. Much of this data is captured in application and infrastructure logs, which provide a rich history of what is happening behind the scenes in your workloads.
Getting insights from your logs to track those four golden signals can quickly become unruly as your application scales, making it harder for your developers and operations teams to identify when and where errors are occurring. If you don't set up your monitoring and logging systems correctly, your Mean Time to Recovery (MTTR) from service-impacting events can suffer.
The operational excellence section of the Cloud Architecture Framework provides Google Cloud's guidance on what to consider when setting up your logging, monitoring, and alerting systems. Google Cloud also provides managed services, as part of its operations suite, that automate the collection, storage, and analysis of data for the four golden signals. Cloud Error Reporting is one such service.
Error Reporting – Speed up your MTTR with zero effort
Error Reporting automatically captures exceptions from applications written in Go, Java, Node.js, PHP, Python, Ruby, and .NET whose logs are ingested by Cloud Logging, aggregates them, and then notifies you of their existence. The service intelligently groups related errors together and makes them available in a dedicated dashboard. The dashboard displays the details of each exception, including a histogram of occurrences, a list of affected versions, the request URL, and links to the request log, so you can get to the affected resource immediately, with just one click!
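To give a feel for what grouping means, here is a toy Python sketch that buckets exceptions by their type and the innermost Python function in which they were raised, so errors with different messages but the same origin land in one group. This is a hypothetical simplification for illustration only; it is not Error Reporting's actual grouping algorithm, and the names `group_key`, `parse_order`, and `group_errors` are made up.

```python
from __future__ import annotations

import traceback
from collections import defaultdict


def group_key(exc: BaseException) -> tuple[str, str]:
    """Hypothetical grouping key: exception type + innermost Python function.

    A toy stand-in for the idea of grouping; NOT Error Reporting's algorithm.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    innermost = frames[-1].name if frames else "<unknown>"
    return (type(exc).__name__, innermost)


def parse_order(raw):
    # Hypothetical application code that fails on bad input.
    return int(raw)


def group_errors(inputs) -> dict[tuple[str, str], list[str]]:
    """Run parse_order on each input and bucket the failures by group_key."""
    buckets: dict[tuple[str, str], list[str]] = defaultdict(list)
    for raw in inputs:
        try:
            parse_order(raw)
        except Exception as exc:
            buckets[group_key(exc)].append(str(exc))
    return buckets


groups = group_errors(["12", "abc", None, "x1", ""])
for key, messages in groups.items():
    print(key, len(messages))
```

Here the three `ValueError`s share one group even though their messages differ, and the `TypeError` from `None` gets a group of its own.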
How can Error Reporting help your organization today?
Error Reporting helps focus your most valuable resource, developer attention, on the likely sources of the exceptions that are impacting your workloads. With notifications and embedded links, your team can resolve exceptions quickly, before they affect your customers and your bottom line.
What do you have to do to enable Error Reporting?
Error Reporting is enabled automatically as soon as logs containing error events, such as stack traces, are ingested into Cloud Logging, or when you use the Error Reporting API to report exceptions from your service directly.
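If you report errors through the API rather than through logs, the request body is a `ReportedErrorEvent`. The minimal Python sketch below builds such a body from a caught exception using only the standard library. It assumes the v1beta1 `projects.events.report` REST method; the project ID `my-project`, service name `checkout`, version `1.4.2`, and helper `build_error_event` are hypothetical. Actually sending the request requires an authenticated HTTP client (for example, the `google-cloud-error-reporting` client library), which is omitted here.

```python
import json
import traceback
from datetime import datetime, timezone

# Assumed REST endpoint for the Error Reporting API (v1beta1 events:report).
REPORT_URL = (
    "https://clouderrorreporting.googleapis.com/v1beta1/"
    "projects/{project_id}/events:report"
)


def build_error_event(exc: BaseException, service: str, version: str) -> dict:
    """Build a ReportedErrorEvent body; the message carries the full stack trace."""
    stack_trace = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return {
        # RFC 3339 timestamp in UTC.
        "eventTime": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "serviceContext": {"service": service, "version": version},
        "message": stack_trace,
    }


try:
    1 / 0
except ZeroDivisionError as exc:
    event = build_error_event(exc, service="checkout", version="1.4.2")

url = REPORT_URL.format(project_id="my-project")
print("POST", url)
print(json.dumps(event, indent=2))
```

Putting the whole traceback in `message` is what lets the service treat the event as an exception rather than a plain text error.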
When you use Google Kubernetes Engine (GKE) or Google Cloud's serverless offerings, application logs written to stdout or stderr appear in Cloud Logging automatically, so Error Reporting starts analyzing them right away. To capture logs from applications running on Compute Engine VMs, you need to install the Ops Agent. Once the agent sends your application logs to Cloud Logging, exceptions flow through to Error Reporting.
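On GKE and the serverless platforms, one common pattern is to write each error to stderr as a single-line JSON object, so that Cloud Logging stores it as one structured entry. The sketch below assumes that the logging agent maps the `severity` field to the entry's severity and that Error Reporting looks for a stack trace in the `message` field; `serviceContext` is optional metadata. The function name `log_exception` and service name `checkout` are hypothetical.

```python
import json
import sys
import traceback


def log_exception(exc: BaseException, service: str) -> str:
    """Write one exception to stderr as a single-line JSON log entry."""
    entry = {
        "severity": "ERROR",
        # Full traceback text, where Error Reporting is assumed to look for it.
        "message": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        "serviceContext": {"service": service},
    }
    # json.dumps escapes the traceback's newlines, so the entry stays on one
    # line instead of being split into one log entry per traceback line.
    line = json.dumps(entry)
    print(line, file=sys.stderr, flush=True)
    return line


try:
    {}["missing"]
except KeyError as exc:
    line = log_exception(exc, service="checkout")
```

A plain multi-line traceback printed to stderr can end up split across several log entries, which is the main reason to prefer one JSON object per line.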
Get started today
To view available error events, visit the Error Reporting page in the Google Cloud Console. You can find it in the left navigation panel or by searching in the search bar at the top of the console.
If you have any questions or want to start a discussion with other Error Reporting users, visit the Cloud Operations section of the Google Cloud Community and post a discussion topic.