Protecting Big Data in AWS

Published Apr 02, 2020. Last updated Nov 25, 2020.

Big data is here to stay, but so are increasingly strict privacy regulations like the General Data Protection Regulation (GDPR). Failure to comply with data regulations can result in hefty penalties and loss of brand authority. To keep both your reputation and your data safe, you need to take the appropriate security measures.

In this article, we’ll cover which aspects of security you’re responsible for when using AWS services, along with some best practices to make sure your data is as well protected as possible.

AWS Shared Security Responsibility

The AWS shared security responsibility model states that AWS is responsible for securing its infrastructure and you are responsible for the rest. Your responsibility includes access and authentication policies, data security, and accountability for external networks, applications, and third-party integrations.

This might seem overwhelming, but AWS provides tools to help you manage these responsibilities, as well as extensive documentation. Some features of these tools are enabled by default, and you simply need to make sure not to disable them unless you have an alternative in place. Other features require you to correctly configure your settings and apply the appropriate policies or integrations.

Best Practices

The specific measures you need to take will depend on the services you’re using, your access requirements, and specific limitations imposed by your data, but the following best practices should apply to most configurations.

Data Classification
Classifying your data is the first step in keeping it protected. The process helps you determine which data is at greatest risk, what the impact of loss or corruption would be, and which measures strike the right balance between security and workflow agility. Once your data has been categorized, taking into account any regulatory restrictions that apply, you can begin applying security measures.

These measures typically fall into two categories: prevention, such as firewalls and authentication, and detection, such as regular auditing and active monitoring. Whatever the classification, you should use a mix of both, although lower-priority data requires less strict controls.
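As a rough illustration of tying classification to controls, the sketch below maps a sensitivity tier to a baseline of preventive and detective measures. The tier names, the `Controls` fields, and the review intervals are all assumptions made for this example, not AWS settings or recommendations.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical classification tiers, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass(frozen=True)
class Controls:
    encrypt_at_rest: bool     # prevention
    require_mfa: bool         # prevention
    audit_interval_days: int  # detection: how often access is reviewed
    alert_on_access: bool     # detection: real-time alerting


# Assumed baseline: every tier mixes prevention and detection,
# and stricter tiers tighten both.
BASELINE = {
    Sensitivity.PUBLIC: Controls(True, False, 180, False),
    Sensitivity.INTERNAL: Controls(True, False, 90, False),
    Sensitivity.CONFIDENTIAL: Controls(True, True, 30, True),
    Sensitivity.RESTRICTED: Controls(True, True, 7, True),
}


def controls_for(level: Sensitivity, regulated: bool = False) -> Controls:
    """Return baseline controls; regulated data is treated as at least CONFIDENTIAL."""
    if regulated and level < Sensitivity.CONFIDENTIAL:
        level = Sensitivity.CONFIDENTIAL
    return BASELINE[level]
```

Calling `controls_for(Sensitivity.INTERNAL, regulated=True)` returns the stricter CONFIDENTIAL baseline, which is one way to let regulatory restrictions override a lower business classification.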

Alongside any measures you’re using, make sure to back up your data, as no security system is perfect. AWS Backup, a centralized tool for managing backups across AWS services, is one option. Others are the features built into individual services, such as EBS snapshots, or third-party tools.

Monitoring and Logging
When monitoring your data, pay careful attention to which users and services are accessing it, where they’re accessing it from, and what actions they take during access. Doing so will help you determine whether access is legitimate or malicious. Automating monitoring and setting alerts for key events is essential to detecting incidents, as it is nearly impossible to manually monitor any but the simplest of systems.

AWS CloudTrail is one tool you can use. It tracks user activity and API usage, whether through the AWS Management Console, SDKs, the CLI, or other services, and continuously logs that activity for analysis and auditing. Combining it with GuardDuty adds a further layer of protection. GuardDuty is a managed service that uses machine learning, pattern-based signatures, and external threat intelligence sources to analyze and alert on VPC flow logs, CloudTrail logs, and DNS logs.
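To make the "who, from where, and doing what" questions concrete, here is a minimal sketch that scans the JSON text of a CloudTrail log file and flags records worth a closer look. The field names (`Records`, `eventName`, `sourceIPAddress`, `userIdentity`, `errorCode`) follow the CloudTrail record format, but the trusted networks, the list of sensitive events, and the `flag_records` helper itself are assumptions chosen for illustration. It is not a substitute for GuardDuty.

```python
import ipaddress
import json

# Assumptions for illustration: networks you trust and events worth alerting on.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]
SENSITIVE_EVENTS = {"DeleteBucket", "PutBucketPolicy", "StopLogging", "DeleteTrail"}


def from_trusted_network(source: str) -> bool:
    """True if the source is an IP address inside a trusted network.

    Calls made by AWS services are recorded with a DNS name rather than an IP;
    those (and missing values) are treated as trusted here, a simplification.
    """
    try:
        address = ipaddress.ip_address(source)
    except ValueError:
        return True
    return any(address in network for network in TRUSTED_NETWORKS)


def flag_records(log_text: str) -> list:
    """Return a short finding for each record that deserves a closer look."""
    findings = []
    for record in json.loads(log_text).get("Records", []):
        reasons = []
        if record.get("eventName") in SENSITIVE_EVENTS:
            reasons.append("sensitive action")
        if not from_trusted_network(record.get("sourceIPAddress", "")):
            reasons.append("untrusted source")
        error = record.get("errorCode", "")
        if "AccessDenied" in error or "Unauthorized" in error:
            reasons.append("denied request")
        if reasons:
            findings.append({
                "time": record.get("eventTime"),
                "event": record.get("eventName"),
                "who": record.get("userIdentity", {}).get("arn"),
                "reasons": reasons,
            })
    return findings
```

In practice the findings would feed an alerting channel rather than be returned as a list; the point is only that each record already carries the identity, origin, and action needed for automated triage.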

Identity and Access Management (IAM)
AWS IAM tools allow you to manage group or role access to your AWS services and resources, and to deny a user access until they’ve set up multi-factor authentication (MFA). Through IAM, you can separate database administration tasks (management flow) from application data access (application flow), limiting how data can be accessed and by whom.

When creating access and permissions policies, it is best to use role or group policies which you then apply to individual users rather than granting permissions individually, and to apply the principle of least privilege whenever possible. Enforcing strong password policies and regularly rotating keys will reduce the risk of compromised credentials being used against you.
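A least-privilege group policy with an MFA requirement might look like the sketch below, which builds the policy document as a Python dictionary. The bucket name, prefix, statement IDs, and the `read_only_policy` helper are hypothetical; `aws:MultiFactorAuthPresent` is a real IAM condition key, but a blanket deny like this one is a simplification that would also block a user from enrolling an MFA device, so treat it as a starting point rather than a finished policy.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document granting read access to one S3 prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Least privilege: a single action on a single prefix.
                "Sid": "ReadReportObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # Deny everything when the request was not authenticated with MFA.
                "Sid": "DenyAllWithoutMfa",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            },
        ],
    }


# Hypothetical bucket and prefix, serialized as it would be attached to a group.
policy_json = json.dumps(read_only_policy("example-data-lake", "reports/"), indent=2)
```

Attaching a document like this to a group, and then adding users to the group, keeps permissions in one place instead of scattered across individual users.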

If you are unable to use IAM due to your specific services or configuration, a secrets management system like AWS Secrets Manager can be a good alternative.

Network Isolation and Security-Zone Modelling
Hosting your databases in a Virtual Private Cloud (VPC), or requiring that they be accessed through one, keeps traffic within your network and allows you to use private IP addresses, both of which greatly restrict attackers’ potential access points.

Within your network, you can use security groups to provide micro-segmentation for application stack components or use swim lane isolation, accomplished via IAM controls and network flow controls, to group microservices together and ensure that services are only able to access the data they need.

Another option is zone modeling, in which data is grouped into zones according to priority. Higher-priority zones are layered beneath lower-priority ones, so reaching high-priority data requires passing multiple layers of authentication. This is accomplished by enforcing network flow control through a combination of network Access Control Lists (ACLs) and IAM policies. Alternatively, you can use a network security solution to extend the controls offered natively by AWS.
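The layering idea can be sketched independently of any AWS API. In the toy model below, the zone names and both helper functions are assumptions for illustration: traffic may only move within a zone or between adjacent zones, so nothing in the outermost zone can reach the innermost one directly.

```python
# Hypothetical zones, ordered from outermost (lowest priority) to innermost.
ZONES = ["public", "application", "restricted"]


def is_flow_allowed(source: str, destination: str) -> bool:
    """Allow traffic within a zone or between adjacent zones; deny layer skipping."""
    return abs(ZONES.index(source) - ZONES.index(destination)) <= 1


def layers_to_cross(destination: str) -> int:
    """Number of zone boundaries, and so authentication layers, an outside caller crosses."""
    return ZONES.index(destination) + 1
```

With these definitions, `is_flow_allowed("public", "restricted")` is false while `is_flow_allowed("application", "restricted")` is true. In a real deployment the same rule would be expressed as network ACL entries and IAM policies rather than as application code.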

Encryption
You should be using encryption whenever possible, both at rest and in transit.
AWS provides the Key Management Service (AWS KMS), which lets you create and manage customer master keys (CMKs). KMS integrates with AWS databases and can be used to manage and secure your data with AES-256 encryption. CMKs can also be used for client-side encryption if your configuration supports it, but be mindful that doing so might limit native database functions.
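One way to enforce both kinds of encryption, assuming the data sits in an S3 bucket, is a bucket policy that rejects unencrypted connections and uploads that do not request KMS encryption. The bucket name, statement IDs, and helper function below are hypothetical; `aws:SecureTransport` and `s3:x-amz-server-side-encryption` are real condition keys.

```python
def encryption_enforcing_policy(bucket: str) -> dict:
    """Build an S3 bucket policy requiring TLS in transit and SSE-KMS at rest."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # In transit: refuse any request that did not arrive over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # At rest: refuse uploads that do not ask for KMS encryption.
                "Sid": "DenyUploadsWithoutKms",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            },
        ],
    }
```

Note that the second statement also rejects uploads that omit the encryption header entirely, so clients relying only on a bucket's default encryption setting would need to send the header explicitly.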

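If you are unable to encrypt data due to database or application restrictions, consider using tokenization, which replaces sensitive values with stand-in tokens that are meaningless outside the system that issued them. Tokenization allows the use of one-time-use tokens and is less complex to manage than end-to-end encryption.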
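The sketch below shows the idea with an in-memory vault that issues one-time-use tokens. The `TokenVault` class is hypothetical and deliberately minimal: a real tokenization service would keep the mapping in a hardened, access-controlled store instead of a Python dictionary.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a tokenization service (illustration only)."""

    def __init__(self) -> None:
        self._values = {}  # token -> original sensitive value

    def tokenize(self, value: str) -> str:
        """Store the value and return a random token that reveals nothing about it."""
        token = "tok_" + secrets.token_urlsafe(16)
        self._values[token] = value
        return token

    def redeem(self, token: str) -> str:
        """Return the original value and invalidate the token (one-time use).

        Raises KeyError if the token is unknown or was already redeemed.
        """
        return self._values.pop(token)
```

Downstream systems store and pass around only the token, so a leak from those systems exposes nothing sensitive. Only the vault can map a token back to the original value, and here only once.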

Wrap Up

Storing your data in the cloud can be significantly more secure than keeping it on-premises, but only if you use the available security tools correctly and consistently. By implementing these best practices, continuously monitoring your systems, and periodically auditing your settings for configuration creep, you can rest easier knowing that your data is as secure as possible.



Eddie Segal

Data Scientist

A flavoured data scientist, with 5 years of experience in data, half in data science. Recently engaged in developing Deep Learning based systems, with Python (TensorFlow).
