A lack of search, filtering, traceability, and clarity in logs often leads to prolonged downtime, inefficient troubleshooting, and missed insights, all of which can have cascading impacts on system performance and user experience.
Effective logging and deep analysis are foundational for maintaining and understanding complex cloud infrastructures. Each log captures critical facets of system behavior, and when combined with rigorous analysis, these logs transform into invaluable insights. Through the capabilities of AWS’s CloudWatch, S3, and Athena, our focus shifts from merely documenting events to meticulously interpreting their significance. This guide is tailored to steer you through this essential integration, merging the precision of logging with the granularity of analysis.
This guide will walk you through leveraging AWS tools, ensuring your logs are not just records but key instruments for system optimization and troubleshooting.
This part may be tedious, but it is important. Here, we will set up all the AWS resources we need to create and export logs.
NOTE: We are going to use Python scripts to make the resource setup faster, but you can definitely use the AWS Console UI if you prefer. This guide assumes you can at least run Python scripts. IMPORTANT: If you deploy your application on AWS and properly set up logging, there is no need to push your logs to CloudWatch using the code in step 6. All your print and logging output will automatically end up in CloudWatch, so just make sure it is formatted consistently.
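As a rough illustration of what "formatted consistently" can look like for an app already running on AWS, here is a minimal sketch using Python's standard logging module. It is not part of the setup scripts below; the logger name and format string are assumptions for demonstration, and the delimiter simply mirrors the ":: "-separated layout we build by hand later in this guide.

import logging

# Minimal sketch: when the app runs on ECS/EC2 with the CloudWatch agent,
# anything written to stdout/stderr lands in CloudWatch, so a consistent
# format string is all you need. The '::' delimiter mirrors the format
# used by the create_log helper later in this guide.
logging.basicConfig(
    level=logging.INFO,
    format=':: %(levelname)s :: %(funcName)s :: %(message)s',
)

logger = logging.getLogger(__name__)
logger.info('sample log')  # at module level this prints ':: INFO :: <module> :: sample log'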
Install the necessary library: boto3.
pip install boto3
Import the libraries crucial for the following steps.
import boto3
import json
import uuid
import pickle
from datetime import datetime, timedelta
from time import sleep
S3 bucket names are globally unique. To avoid collisions, we use a little trick: generate a UUID and cache it via pickle. This creates a unique bucket name that stays the same even if you rerun the program, as long as you do not delete the .pkl file.
try:
    BUCKET_UUID = pickle.load(open('uuid.pkl', 'rb'))
except FileNotFoundError:
    BUCKET_UUID = uuid.uuid4()
    pickle.dump(BUCKET_UUID, open('uuid.pkl', 'wb'))

S3_BUCKET_NAME = f'sample-log-export-bucket-{BUCKET_UUID}'
################################################################################
# IMPORTANT! For this demo, we are attaching FullAccess to CloudWatch
# and S3 to the IAM user, but in production, keep in mind to only give
# permissions to the exact resources and actions that your application needs
################################################################################

# Replace these values with your AWS credentials
# make sure the IAM user for the following credentials has the
# CloudWatchLogsFullAccess and AmazonS3FullAccess policies attached
AWS_ACCESS_KEY_ID = 'your-iam-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-iam-access-secret-key'
AWS_REGION = 'us-east-1'

LOG_GROUP_NAME = 'sample-log-group'
LOG_STREAM_NAME = 'sample-log-stream'
Now we will create the CloudWatch log group and log stream where we will push the logs. If your app already runs on AWS, you most likely have this set up, and all you need to do to adapt this guide is create the S3 bucket and export the CloudWatch logs. Again, you can do this using the AWS Console UI.
# Create the log group
logs_client = boto3.client(
    'logs',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION,
)

try:
    response = logs_client.create_log_group(logGroupName=LOG_GROUP_NAME)
    print(f'Log group {LOG_GROUP_NAME} created successfully.')
    response = logs_client.put_retention_policy(
        logGroupName=LOG_GROUP_NAME,
        retentionInDays=1
    )
except logs_client.exceptions.ResourceAlreadyExistsException:
    print(f'Log group {LOG_GROUP_NAME} already exists.')
except Exception as e:
    print(f'An error occurred: {e}')

# Create the log stream
try:
    response = logs_client.create_log_stream(
        logGroupName=LOG_GROUP_NAME,
        logStreamName=LOG_STREAM_NAME
    )
except logs_client.exceptions.ResourceAlreadyExistsException:
    print(f'Log stream {LOG_STREAM_NAME} already exists.')
except Exception as e:
    print(f'An error occurred: {e}')
Format logs to ensure consistency, making them easier to parse and analyze later.
def create_log(level, function_name, message, user, data):
    log = f':: {level} :: {function_name} :: {message} :: {user} :: {json.dumps(data)}'

    # NOTE: when running applications on ECS or EC2 with CloudWatch agents,
    # the output of Python's print and logger functions automatically goes
    # to CloudWatch
    print(log)

    # NOTE: this part is only needed because we are not running the app
    # inside AWS, so for simplicity we push the log directly using the
    # CloudWatch API. For apps running on AWS, the print(log) above
    # produces the same logs as the put_log_events below, so
    # put_log_events is not necessary
    logs_client.put_log_events(
        logGroupName=LOG_GROUP_NAME,
        logStreamName=LOG_STREAM_NAME,
        logEvents=[{
            'timestamp': round(datetime.timestamp(datetime.now())) * 1000,
            'message': log
        }]
    )
# Specify log data; this is a dict with any key-value pairs you want. It
# just gives additional information to the logs such as variable content
# or object values
log_data = {
    "parameter_1": 1,
    "parameter_2": 2,
    "endpoint": "https://api.endpoint.sample.log/test"
}

# This helps to identify which function invoked logging
caller_function = 'sample_function'

# This is like the title or short description of the log
log_message = 'sample log'

# Log level could be INFO, WARNING, ERROR
log_level = 'INFO'

# For applications, it is important to track actions by user
user_id = '1'

# Create sample info logs with different parameters, so we can query them
# later with a filter by parameter
for i in range(1, 10):
    log_data['parameter_1'] = i % 2
    # this format is important since all logs will follow it so we can
    # parse them later
    create_log(
        level=log_level,
        function_name=caller_function,
        message=log_message,
        user=user_id,
        data=log_data
    )

# Create a sample error log
log_level = 'ERROR'
log_data['error'] = "sample_error traceback"
create_log(
    level=log_level,
    function_name=caller_function,
    message=log_message,
    user=user_id,
    data=log_data
)
Again, this part can be done through the AWS Console UI if you prefer. We are doing it in code here to make it easy to replicate by simply running the script.
# Create the S3 client (same credentials as the logs client)
s3 = boto3.client(
    's3',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION,
)

try:
    response = s3.create_bucket(
        Bucket=S3_BUCKET_NAME,
    )
    print(f'Bucket {S3_BUCKET_NAME} created successfully.')
except s3.exceptions.BucketAlreadyOwnedByYou:
    print(f'Bucket {S3_BUCKET_NAME} already exists and you own it.')
except s3.exceptions.BucketAlreadyExists:
    print(f'Bucket {S3_BUCKET_NAME} already exists, but is not necessarily owned by you.')
except Exception as e:
    print(f'An error occurred: {e}')
These permissions are required to allow CloudWatch to export logs to the bucket. Note that this policy is not strict since we are just testing; in production, make sure you specify conditions that only allow specific accounts.
# Add permission for CloudWatch to export logs to the bucket created above
try:
    s3.put_bucket_policy(
        Bucket=S3_BUCKET_NAME,
        Policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {"Service": f"logs.{AWS_REGION}.amazonaws.com"},
                    "Action": "s3:PutObject",
                    "Resource": f"arn:aws:s3:::{S3_BUCKET_NAME}/*"
                },
                {
                    "Effect": "Allow",
                    "Principal": {"Service": f"logs.{AWS_REGION}.amazonaws.com"},
                    "Action": "s3:GetBucketAcl",
                    "Resource": f"arn:aws:s3:::{S3_BUCKET_NAME}"
                },
            ]
        })
    )
    print(f"Bucket policy updated successfully for bucket {S3_BUCKET_NAME}.")
except Exception as e:
    print(f"An error occurred: {e}")
After this, we get to the fun part. Again, you can also do this export via the AWS Console.
try:
    print('giving some time for CloudWatch logs to be ready for export')
    sleep(90)  # give CloudWatch some time to process the logs

    dt_now = datetime.now()
    ts_yesterday = round(datetime.timestamp(dt_now - timedelta(days=1))) * 1000
    ts_tomorrow = round(datetime.timestamp(dt_now + timedelta(days=1))) * 1000

    response = logs_client.create_export_task(
        taskName='SampleExportTaskToS3',
        logGroupName=LOG_GROUP_NAME,
        logStreamNamePrefix=LOG_STREAM_NAME,
        fromTime=ts_yesterday,
        to=ts_tomorrow,
        destination=S3_BUCKET_NAME,
    )
    print(f"Export task initiated. Task ID: {response['taskId']}")
except Exception as e:
    print(f"An error occurred: {e}")
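Export tasks are asynchronous and can take a minute or two to finish. If you want the script to wait before moving on, a simple polling loop over describe_export_tasks works; this is optional and only a sketch, and it assumes response still holds the create_export_task result from above.

# Optional: poll the export task until it finishes before querying the bucket.
# Sketch only; assumes `response` holds the create_export_task result above.
task_id = response['taskId']
while True:
    tasks = logs_client.describe_export_tasks(taskId=task_id)
    status = tasks['exportTasks'][0]['status']['code']
    print(f'Export task {task_id} status: {status}')
    if status in ('COMPLETED', 'CANCELLED', 'FAILED'):
        break
    sleep(10)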
This is what we have been waiting for: all the tedious steps above were necessary so we can get to the actual options for analyzing logs.
Athena treats your logs as a table. With the right schema, you can use SQL-like queries to extract crucial information, identify trends, debug issues, and even set up alerts for specific log patterns. Ensure your logs are consistently formatted to maximize the potential of Athena's querying capabilities.
Tip: Leverage AWS's built-in documentation and tutorials to understand how to use Athena with your specific log format. They provide extensive guides and examples for numerous use cases.
CREATE EXTERNAL TABLE cloudwatch_logs (
    log_timestamp STRING,
    log_level STRING,
    caller_function STRING,
    log_message STRING,
    user_id STRING,
    log_data STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'input.regex' = '^(\\S+)\\s+::\\s+(.*?)\\s+::\\s+(.*?)\\s+::\\s+(.*?)\\s+::\\s+(.*?)\\s+::\\s+([\\{,\\[].*[\\},\\]])$'
)
LOCATION 's3://<S3_BUCKET_NAME>/exportedlogs/'
select * from cloudwatch_logs
select log_level, count(*) from cloudwatch_logs group by log_level;
select * from cloudwatch_logs where user_id = '1'
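The queries above can be run directly in the Athena console, but if you prefer to keep everything in the same Python script, the sketch below uses boto3's Athena client. The database name ('default') and the athena-results/ output prefix are assumptions you would adjust; Athena always needs an S3 location where it can write query results, and the IAM user would also need Athena permissions beyond the two policies attached earlier.

# A minimal sketch for running one of the Athena queries from Python.
# 'default' and the results prefix below are assumptions; replace as needed.
athena = boto3.client(
    'athena',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION,
)

start = athena.start_query_execution(
    QueryString='SELECT log_level, count(*) FROM cloudwatch_logs GROUP BY log_level',
    QueryExecutionContext={'Database': 'default'},
    ResultConfiguration={'OutputLocation': f's3://{S3_BUCKET_NAME}/athena-results/'},
)

query_id = start['QueryExecutionId']
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    sleep(2)

if state == 'SUCCEEDED':
    rows = athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']
    for row in rows:  # the first row returned is the header row
        print([col.get('VarCharValue') for col in row['Data']])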
Another powerful technique is to parse the content of the log data dict. If you check the create_log function above, you can pass a dict to add more data to the logs.
WITH dataset AS (
    SELECT log_data AS blob
    FROM cloudwatch_logs
    WHERE log_timestamp IS NOT NULL
)
SELECT
    json_extract(blob, '$.parameter_1') AS parameter_1,
    json_extract(blob, '$.parameter_2') AS parameter_2,
    json_extract(blob, '$.error') AS err
FROM dataset
If you want to go deeper, then you can even query by the content of the log data object:
WITH dataset AS (
    SELECT log_data AS blob
    FROM cloudwatch_logs
    WHERE log_timestamp IS NOT NULL
)
SELECT
    json_extract(blob, '$.parameter_1') AS parameter_1,
    json_extract(blob, '$.parameter_2') AS parameter_2,
    json_extract(blob, '$.error') AS err
FROM dataset
WHERE json_extract_scalar(blob, '$.parameter_1') = '0'
We have now elevated our logging and analytical capabilities to a whole new level by utilizing AWS tools like CloudWatch, S3, and Athena. We sincerely hope that this guide provided valuable insights and that you've garnered new knowledge to enhance your operational expertise. Remember, every log has a story; with the right tools and techniques, you're better equipped to understand it.