A lack of search, filtering, traceability, and clarity in logs often leads to prolonged downtime, inefficient troubleshooting, and missed insights, all of which can have cascading impacts on system performance and user experience.
Effective logging and deep analysis are foundational for maintaining and understanding complex cloud infrastructures. Each log captures critical facets of system behavior, and when combined with rigorous analysis, these logs transform into invaluable insights. Through the capabilities of AWS’s CloudWatch, S3, and Athena, our focus shifts from merely documenting events to meticulously interpreting their significance. This guide is tailored to steer you through this essential integration, merging the precision of logging with the granularity of analysis.
This guide will walk you through leveraging AWS tools, ensuring your logs are not just records but key instruments for system optimization and troubleshooting.
This part may be tedious, but it is important. Here, we will set up all the AWS resources we need to create and export logs.
NOTE: We are going to use Python scripts to make the resource setup faster, but you can definitely use the AWS Console UI if you prefer. This guide assumes you can at least run Python scripts. IMPORTANT: If you deploy your application on AWS and properly set up logging, there is no need to push your logs to CloudWatch using the code in step 6. All your print and logging output will automatically end up in CloudWatch, so just make sure it is formatted consistently.
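As a rough illustration of what "formatted consistently" can look like for an app already running on AWS, here is a minimal sketch using Python's standard logging module. It is not part of the setup scripts below; the logger name and format string are assumptions for demonstration, and the delimiter simply mirrors the ":: "-separated layout we build by hand later in this guide.

import logging

# Minimal sketch: when the app runs on ECS/EC2 with the CloudWatch agent,
# anything written to stdout/stderr lands in CloudWatch, so a consistent
# format string is all you need. The '::' delimiter mirrors the format
# used by the create_log helper later in this guide.
logging.basicConfig(
    level=logging.INFO,
    format=':: %(levelname)s :: %(funcName)s :: %(message)s',
)

logger = logging.getLogger(__name__)
logger.info('sample log')  # at module level this prints ':: INFO :: <module> :: sample log'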
Install the necessary library: boto3.
pip install boto3
Import the libraries crucial for the following steps.
import boto3
import json
import uuid
import pickle
from datetime import datetime, timedelta
from time import sleep
S3 bucket names are globally unique. To avoid collisions, we use a little trick: generate a UUID and cache it via pickle. This creates a unique bucket name that stays the same even if you rerun the program, as long as you do not delete the .pkl file.
try:
    BUCKET_UUID = pickle.load(open('uuid.pkl', 'rb'))
except FileNotFoundError:
    BUCKET_UUID = uuid.uuid4()
    pickle.dump(BUCKET_UUID, open('uuid.pkl', 'wb'))

S3_BUCKET_NAME = f'sample-log-export-bucket-{BUCKET_UUID}'
################################################################################
# IMPORTANT! For this demo, we are attaching FullAccess to CloudWatch
# and S3 to the IAM user, but in production, keep in mind to only give
# permissions to the exact resources and actions that your application needs
################################################################################

# Replace these values with your AWS credentials
# make sure the IAM user for the following credentials has the
# CloudWatchLogsFullAccess and AmazonS3FullAccess policies attached
AWS_ACCESS_KEY_ID = 'your-iam-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-iam-access-secret-key'
AWS_REGION = 'us-east-1'

LOG_GROUP_NAME = 'sample-log-group'
LOG_STREAM_NAME = 'sample-log-stream'
Now we will create the CloudWatch log group and log stream where we will push the logs. If your app already runs on AWS, you most likely have this set up, and all you need to do to adapt this guide is create the S3 bucket and export the CloudWatch logs. Again, you can do this using the AWS Console UI.
# Create the log group
logs_client = boto3.client(
    'logs',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION,
)

try:
    response = logs_client.create_log_group(logGroupName=LOG_GROUP_NAME)
    print(f'Log group {LOG_GROUP_NAME} created successfully.')
    response = logs_client.put_retention_policy(
        logGroupName=LOG_GROUP_NAME,
        retentionInDays=1
    )
except logs_client.exceptions.ResourceAlreadyExistsException:
    print(f'Log group {LOG_GROUP_NAME} already exists.')
except Exception as e:
    print(f'An error occurred: {e}')

# Create the log stream
try:
    response = logs_client.create_log_stream(
        logGroupName=LOG_GROUP_NAME,
        logStreamName=LOG_STREAM_NAME
    )
except logs_client.exceptions.ResourceAlreadyExistsException:
    print(f'Log stream {LOG_STREAM_NAME} already exists.')
except Exception as e:
    print(f'An error occurred: {e}')
Format logs to ensure consistency, making them easier to parse and analyze later.
def create_log(level, function_name, message, user, data):
    log = f':: {level} :: {function_name} :: {message} :: {user} :: {json.dumps(data)}'

    # NOTE: when running applications on ECS or EC2 with CloudWatch agents,
    # the output of Python's print and logger functions automatically goes
    # to CloudWatch
    print(log)

    # NOTE: this part is only needed because we are not running the app
    # inside AWS, so for simplicity we push the log directly using the
    # CloudWatch API. For apps running on AWS, the print(log) above
    # produces the same logs as the put_log_events below, so
    # put_log_events is not necessary
    logs_client.put_log_events(
        logGroupName=LOG_GROUP_NAME,
        logStreamName=LOG_STREAM_NAME,
        logEvents=[{
            'timestamp': round(datetime.timestamp(datetime.now())) * 1000,
            'message': log
        }]
    )
# Specify log data; this is a dict with any key-value pairs you want. It
# just gives additional information to the logs such as variable content
# or object values
log_data = {
    "parameter_1": 1,
    "parameter_2": 2,
    "endpoint": "https://api.endpoint.sample.log/test"
}

# This helps to identify which function invoked logging
caller_function = 'sample_function'

# This is like the title or short description of the log
log_message = 'sample log'

# Log level could be INFO, WARNING, ERROR
log_level = 'INFO'

# For applications, it is important to track actions by user
user_id = '1'

# Create sample info logs with different parameters, so we can query them
# later with a filter by parameter
for i in range(1, 10):
    log_data['parameter_1'] = i % 2
    # this format is important since all logs will follow it so we can
    # parse them later
    create_log(
        level=log_level,
        function_name=caller_function,
        message=log_message,
        user=user_id,
        data=log_data
    )

# Create a sample error log
log_level = 'ERROR'
log_data['error'] = "sample_error traceback"
create_log(
    level=log_level,
    function_name=caller_function,
    message=log_message,
    user=user_id,
    data=log_data
)
Again, this part can be done through the AWS Console UI if you prefer. We are doing it in code here to make it easy to replicate by simply running the script.
# Create the S3 client (same credentials as the logs client)
s3 = boto3.client(
    's3',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION,
)

try:
    response = s3.create_bucket(
        Bucket=S3_BUCKET_NAME,
    )
    print(f'Bucket {S3_BUCKET_NAME} created successfully.')
except s3.exceptions.BucketAlreadyOwnedByYou:
    print(f'Bucket {S3_BUCKET_NAME} already exists and you own it.')
except s3.exceptions.BucketAlreadyExists:
    print(f'Bucket {S3_BUCKET_NAME} already exists, but is not necessarily owned by you.')
except Exception as e:
    print(f'An error occurred: {e}')
These permissions are required to allow CloudWatch to export logs to the bucket. Note that this policy is not strict since we are just testing; in production, make sure you specify conditions that only allow specific accounts.
# Add permission for CloudWatch to export logs to the bucket created above
try:
    s3.put_bucket_policy(
        Bucket=S3_BUCKET_NAME,
        Policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {"Service": f"logs.{AWS_REGION}.amazonaws.com"},
                    "Action": "s3:PutObject",
                    "Resource": f"arn:aws:s3:::{S3_BUCKET_NAME}/*"
                },
                {
                    "Effect": "Allow",
                    "Principal": {"Service": f"logs.{AWS_REGION}.amazonaws.com"},
                    "Action": "s3:GetBucketAcl",
                    "Resource": f"arn:aws:s3:::{S3_BUCKET_NAME}"
                },
            ]
        })
    )
    print(f"Bucket policy updated successfully for bucket {S3_BUCKET_NAME}.")
except Exception as e:
    print(f"An error occurred: {e}")
After this, we get to the fun part. Again, you can also do this export via the AWS Console.
try:
    print('giving some time for CloudWatch logs to be ready for export')
    sleep(90)  # give CloudWatch some time to process the logs

    dt_now = datetime.now()
    ts_yesterday = round(datetime.timestamp(dt_now - timedelta(days=1))) * 1000
    ts_tomorrow = round(datetime.timestamp(dt_now + timedelta(days=1))) * 1000

    response = logs_client.create_export_task(
        taskName='SampleExportTaskToS3',
        logGroupName=LOG_GROUP_NAME,
        logStreamNamePrefix=LOG_STREAM_NAME,
        fromTime=ts_yesterday,
        to=ts_tomorrow,
        destination=S3_BUCKET_NAME,
    )
    print(f"Export task initiated. Task ID: {response['taskId']}")
except Exception as e:
    print(f"An error occurred: {e}")
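Export tasks are asynchronous and can take a minute or two to finish. If you want the script to wait before moving on, a simple polling loop over describe_export_tasks works; this is optional and only a sketch, and it assumes response still holds the create_export_task result from above.

# Optional: poll the export task until it finishes before querying the bucket.
# Sketch only; assumes `response` holds the create_export_task result above.
task_id = response['taskId']
while True:
    tasks = logs_client.describe_export_tasks(taskId=task_id)
    status = tasks['exportTasks'][0]['status']['code']
    print(f'Export task {task_id} status: {status}')
    if status in ('COMPLETED', 'CANCELLED', 'FAILED'):
        break
    sleep(10)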
This is what we have been waiting for: all the tedious steps above were necessary so we can get to the actual options for analyzing logs.
Athena treats your logs as a table. With the right schema, you can use SQL-like queries to extract crucial information, identify trends, debug issues, and even set up alerts for specific log patterns. Ensure your logs are consistently formatted to maximize the potential of Athena's querying capabilities.
Tip: Leverage AWS's built-in documentation and tutorials to understand how to use Athena with your specific log format. They provide extensive guides and examples for numerous use cases.
CREATE EXTERNAL TABLE cloudwatch_logs (
    log_timestamp STRING,
    log_level STRING,
    caller_function STRING,
    log_message STRING,
    user_id STRING,
    log_data STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'input.regex' = '^(\\S+)\\s+::\\s+(.*?)\\s+::\\s+(.*?)\\s+::\\s+(.*?)\\s+::\\s+(.*?)\\s+::\\s+([\\{,\\[].*[\\},\\]])$'
)
LOCATION 's3://<S3_BUCKET_NAME>/exportedlogs/'
select * from cloudwatch_logs
select log_level, count(*) from cloudwatch_logs group by log_level;
select * from cloudwatch_logs where user_id = '1'
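The queries above can be run directly in the Athena console, but if you prefer to keep everything in the same Python script, the sketch below uses boto3's Athena client. The database name ('default') and the athena-results/ output prefix are assumptions you would adjust; Athena always needs an S3 location where it can write query results, and the IAM user would also need Athena permissions beyond the two policies attached earlier.

# A minimal sketch for running one of the Athena queries from Python.
# 'default' and the results prefix below are assumptions; replace as needed.
athena = boto3.client(
    'athena',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION,
)

start = athena.start_query_execution(
    QueryString='SELECT log_level, count(*) FROM cloudwatch_logs GROUP BY log_level',
    QueryExecutionContext={'Database': 'default'},
    ResultConfiguration={'OutputLocation': f's3://{S3_BUCKET_NAME}/athena-results/'},
)

query_id = start['QueryExecutionId']
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    sleep(2)

if state == 'SUCCEEDED':
    rows = athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']
    for row in rows:  # the first row returned is the header row
        print([col.get('VarCharValue') for col in row['Data']])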
Another powerful technique is to parse the content of the log data dict. If you check the create_log function above, you can pass a dict to add more data to the logs.
WITH dataset AS (
    SELECT log_data AS blob
    FROM cloudwatch_logs
    WHERE log_timestamp IS NOT NULL
)
SELECT
    json_extract(blob, '$.parameter_1') AS parameter_1,
    json_extract(blob, '$.parameter_2') AS parameter_2,
    json_extract(blob, '$.error') AS err
FROM dataset
If you want to go deeper, then you can even query by the content of the log data object:
WITH dataset AS (
    SELECT log_data AS blob
    FROM cloudwatch_logs
    WHERE log_timestamp IS NOT NULL
)
SELECT
    json_extract(blob, '$.parameter_1') AS parameter_1,
    json_extract(blob, '$.parameter_2') AS parameter_2,
    json_extract(blob, '$.error') AS err
FROM dataset
WHERE json_extract_scalar(blob, '$.parameter_1') = '0'
We have now elevated our logging and analytical capabilities to a whole new level by utilizing AWS tools like CloudWatch, S3, and Athena. We sincerely hope that this guide provided valuable insights and that you've garnered new knowledge to enhance your operational expertise. Remember, every log has a story; with the right tools and techniques, you're better equipped to understand it.