
Expanding the Serverless War Chest With AWS EFS

Sarjeel Yusuf
Jul 3 · 8 min read

On the 16th of June, AWS added the much-coveted Amazon EFS and AWS Lambda integration to the serverless arsenal, extending the already expanding feature set of AWS Lambda. AWS has been continuously improving AWS Lambda since it was first released in 2014, marking the beginning of the serverless wave in cloud computing. It must be noted that Amazon EFS is not a new service: it has been generally available since 2016 and has mostly been used in conjunction with AWS’s container services.

Now that Amazon EFS is integrated with AWS Lambda, those developing in the cloud can tackle more use cases and perform data operations more conveniently. Amazon EFS is AWS’s file-sharing service that allows users to manage file shares resembling those on traditional networks. It can be mounted on various compute machines, both on-premises and in the cloud, with AWS Lambda being the newest addition to the list. This opens up use cases that were not possible with AWS Lambda’s previous feature set, mainly because of the Lambda function’s limit of 512MB of /tmp directory storage.

However, with the new Amazon EFS and AWS Lambda integration, developers can overcome this barrier. By now gaining the ability to work with larger file stores, new use cases and functionalities are unlocked. This also increases AWS Lambda’s desirability, pushing for overall serverless adoption.

The Problem and the Maneuver

The core issue is that the /tmp directory has a hard limit of 512MB. That means developers building their AWS Lambda function code must always be aware of this limit and of how they are using the directory. After all, the /tmp directory is meant for temporary storage. Once the serverless worker node is torn down, the data within /tmp is no longer available.

This is where the concept of “stateless” computing arises, as the state of the function is kept off the worker node, which can be considered the server instance. The data is usually stored in other fully managed AWS storage resources such as Amazon S3 and Amazon DynamoDB. The AWS Lambda function then accesses this data in its next invocation, pulling the needed “state” or data from these external stores.

Therefore, developers usually work around the internal storage limit by using the file streaming facilities of their programming language of choice to read, process and write files on external storage resources, without storing the entire file in the AWS Lambda function itself. This approach is well established, especially when Amazon S3 is the external store. As a result, two questions come to mind. The first is: what was the need for adding the Amazon EFS integration if Amazon S3 was already available? The second is: which use cases are now achievable that were not achievable before the Amazon EFS integration was available?
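To make the streaming workaround concrete, here is a minimal Python sketch. It is an illustration, not the approach of any particular application: the function name digest_stream and the 1 MiB chunk size are assumptions, and an in-memory io.BytesIO object stands in for the remote file. In a real function the stream could be any object with a read(size) method, for instance the body returned by an S3 get_object call.

```python
import hashlib
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def digest_stream(stream, chunk_size=CHUNK_SIZE):
    """Return (sha256 hex digest, total bytes) while holding one chunk in memory."""
    hasher = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        hasher.update(chunk)
        total += len(chunk)
    return hasher.hexdigest(), total


# Stand-in for a remote object: 3 MiB of data plus a few extra bytes.
payload = b"x" * (3 * CHUNK_SIZE) + b"tail"
checksum, size = digest_stream(io.BytesIO(payload))
print(size, checksum)
```

Because only one chunk is held at a time, neither memory nor /tmp has to hold the whole file, which is exactly why this pattern sidesteps the 512MB limit.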

The second question becomes sharper when best practices and anti-patterns for building serverless applications are considered. In most cases, developers familiar with AWS Lambda will not hit this limit, because relying on large local storage is often thought of as an anti-pattern. Nevertheless, there may be a niche set of use cases that need more file storage than /tmp offers, which is now available with Amazon EFS.

A Step Beyond S3

Amazon S3 provides simple object storage, whereas Amazon EFS is an elastic file storage system that automatically scales to the storage you need. Amazon EFS is mainly used to serve the needs of SaaS and content management systems, whereas Amazon S3 is mostly used to hold objects and serve static websites.

Moreover, Amazon EFS is faster than Amazon S3, achieving lower latency and higher IOPS. As a result, Amazon EFS is well suited to large quantities of data, such as datasets fed to machine learning algorithms. Amazon EFS allows concurrent access from the various instances connected to it via access points, making it possible to process and analyze large amounts of data seamlessly. This would not be as conveniently achievable with Amazon S3.

Moreover, considering the pricing structure of the two services, Amazon EFS could prove to be more cost-effective. On the face of it, Amazon S3 is the cheaper option, judging by the listed storage prices. However, once the pricing structure of each service is broken down and the use cases are taken into account, Amazon EFS can turn out to be the better option. This is especially so in the Bursting Throughput mode, where the developer does not incur costs related to bandwidth and requests. The charge incurred is a fixed 0.30 USD per GB per month in the Standard storage tier of Amazon EFS, in the Bursting Throughput mode.

Additionally, staying on the topic of cost, Amazon EFS charges per usage. This is a familiar cost structure for those already building applications with AWS Lambda functions, since the pay-as-you-go model is a pillar of the serverless concept. The pricing structure can be leveraged further by using the Lifecycle Management functionality to shift less frequently accessed files to a cheaper storage tier, possibly reducing costs by up to 85%.
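A rough back-of-the-envelope calculation shows how these two figures interact. The sketch below uses only the 0.30 USD per GB per month rate and the "up to 85%" saving quoted above. The function name, the 100 GB volume and the 80% share of rarely accessed files are hypothetical values chosen for illustration, and the 85% is treated as a best case rather than a guaranteed rate.

```python
STANDARD_USD_PER_GB_MONTH = 0.30  # Standard tier rate quoted in the article
MAX_LIFECYCLE_SAVING = 0.85       # "up to 85%" from the article; a best case


def monthly_storage_cost(total_gb, infrequent_fraction=0.0):
    """Estimate the monthly storage cost in USD.

    infrequent_fraction is the share of the data (0 to 1) that Lifecycle
    Management has moved to the cheaper tier.
    """
    if not 0.0 <= infrequent_fraction <= 1.0:
        raise ValueError("infrequent_fraction must be between 0 and 1")
    cheaper_rate = STANDARD_USD_PER_GB_MONTH * (1 - MAX_LIFECYCLE_SAVING)
    frequent_gb = total_gb * (1 - infrequent_fraction)
    infrequent_gb = total_gb * infrequent_fraction
    return frequent_gb * STANDARD_USD_PER_GB_MONTH + infrequent_gb * cheaper_rate


all_standard = monthly_storage_cost(100)
with_lifecycle = monthly_storage_cost(100, infrequent_fraction=0.8)
print(f"{all_standard:.2f} USD vs {with_lifecycle:.2f} USD per month")
```

Under these assumptions, 100 GB costs 30.00 USD per month on the Standard tier alone and roughly 9.60 USD once 80% of it qualifies for the cheaper tier. Real bills depend on region, current pricing and access patterns.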

Considering the cost-effectiveness, the ease with which Amazon EFS handles high-load applications, and the service’s ability to handle large data sizes, the use cases for EFS become apparent.

Unlocked Use Cases

Machine Learning

There are, however, performance issues to take into consideration. One is the latency of reads and writes, and the other is the cold start problem. Remember, if a Lambda invocation requires the setup of a worker node, some latency is incurred, known as a cold start. The way to get around this is by enabling Provisioned Concurrency.
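As a sketch of what enabling Provisioned Concurrency can look like in Python, the helper below wraps the put_provisioned_concurrency_config call of the boto3 Lambda client. The helper name, the function name "model-inference", the alias "live" and the count of 5 are all hypothetical. The RecordingClient class is a stand-in used here so that the example runs without AWS credentials, and its return value is simplified and does not mirror the real response.

```python
def enable_provisioned_concurrency(lambda_client, function_name, qualifier, instances):
    """Ask Lambda to keep `instances` execution environments initialized."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,  # a published version or alias, not $LATEST
        ProvisionedConcurrentExecutions=instances,
    )


class RecordingClient:
    """Stand-in that records the request instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def put_provisioned_concurrency_config(self, **kwargs):
        self.calls.append(kwargs)
        return {"Status": "IN_PROGRESS"}  # simplified, not the full response


client = RecordingClient()
response = enable_provisioned_concurrency(client, "model-inference", "live", 5)
print(client.calls[0])
```

In a deployed setup the stand-in would be replaced by boto3.client("lambda"). Note that Provisioned Concurrency is billed while it is enabled, so the instance count is a trade-off between latency and cost.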

File Sharing

Sharing data between various concurrently running instances may not be the best pattern to adopt, but it does prove useful in various use cases. For example, different AWS Lambda functions can simultaneously access different blocks of a data set to perform black-box testing and write results. The same EFS file system can then also be accessed to continuously improve a prediction model in real time, with processing conducted on another AWS Lambda function or even an Amazon EC2 instance.
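The block-wise access described above could be sketched as follows. Everything here is an assumption made for illustration: the helper names, the file names and the byte-count "analysis" are placeholders, and a temporary directory stands in for the EFS mount path that a Lambda function would see under /mnt.

```python
import os
import tempfile


def block_range(file_size, worker_index, worker_count):
    """Return the (start, end) byte offsets that one worker should read."""
    block = -(-file_size // worker_count)  # ceiling division
    start = min(worker_index * block, file_size)
    end = min(start + block, file_size)
    return start, end


def process_block(mount_path, dataset_name, worker_index, worker_count):
    """Read one block of the shared data set and write a per-worker result file."""
    dataset_path = os.path.join(mount_path, dataset_name)
    start, end = block_range(os.path.getsize(dataset_path), worker_index, worker_count)
    with open(dataset_path, "rb") as dataset:
        dataset.seek(start)
        data = dataset.read(end - start)
    result_path = os.path.join(mount_path, f"result-{worker_index}.txt")
    with open(result_path, "w") as result:
        result.write(str(len(data)))  # placeholder analysis: count the bytes read
    return result_path


# A temporary directory stands in for the shared EFS mount path.
with tempfile.TemporaryDirectory() as mount:
    with open(os.path.join(mount, "dataset.bin"), "wb") as f:
        f.write(b"a" * 1000)
    counts = []
    for i in range(3):  # in practice each index would be a separate invocation
        with open(process_block(mount, "dataset.bin", i, 3)) as result_file:
            counts.append(int(result_file.read()))
print(counts)
```

Each worker writes to its own result file, so no two invocations write to the same file at once. That keeps the sketch free of file locking, which a design with shared output files would have to address.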

Of course, the architecture is easier said than done. Different file permissions and communication infrastructure would need to be taken into consideration. Moreover, as discussed, read/write latencies also need to be taken into account. Nonetheless, these use cases are now achievable.

New Infrastructures

As a result, those integrating their AWS Lambda functions with Amazon EFS can expect an infrastructure like the one shown in the diagram below:

[Diagram: AWS Lambda functions integrated with Amazon EFS]

As already discussed, the AWS Lambda functions run in the same private subnets as the mount targets of the Amazon EFS file system they have mounted. It should also be remembered that Amazon EFS is a regional service, so accessing a file system requires adding mount targets to the VPC. These can be added to the private subnets in which the AWS Lambda function will run. The mount targets will most likely have security groups, which can be leveraged to allow access from other compute resources residing in the same VPC.
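The wiring described above can be expressed as function configuration. The sketch below only builds the settings as a plain dictionary, using the VpcConfig and FileSystemConfigs keys that the Lambda API accepts when a function's configuration is updated. The helper name is hypothetical, and every identifier and ARN in the usage lines is a made-up placeholder.

```python
def build_efs_function_config(function_name, access_point_arn, local_mount_path,
                              subnet_ids, security_group_ids):
    """Build Lambda configuration settings that attach an EFS access point."""
    if not local_mount_path.startswith("/mnt/"):
        raise ValueError("the local mount path must start with /mnt/")
    return {
        "FunctionName": function_name,
        "VpcConfig": {
            # Subnets that contain (or can reach) the EFS mount targets.
            "SubnetIds": list(subnet_ids),
            # Must be allowed in by the mount targets' security groups.
            "SecurityGroupIds": list(security_group_ids),
        },
        "FileSystemConfigs": [
            {"Arn": access_point_arn, "LocalMountPath": local_mount_path},
        ],
    }


# All values below are placeholders, not real resources.
config = build_efs_function_config(
    function_name="dataset-worker",
    access_point_arn="arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-EXAMPLE",
    local_mount_path="/mnt/data",
    subnet_ids=["subnet-EXAMPLE1", "subnet-EXAMPLE2"],
    security_group_ids=["sg-EXAMPLE"],
)
print(config["FileSystemConfigs"])
```

With boto3, a dictionary of this shape could be passed as keyword arguments to the Lambda client's update_function_configuration call. The function's execution role would also need permission to mount the file system, which this sketch leaves out.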

Conclusion

Originally published at https://blog.thundra.io.

Thundra

Full Observability For AWS Lambda

Written by Sarjeel Yusuf

A software engineer passionate about AWS and everything serverless. Can be found talking about technologies, philosophy and laughing at the comedies of life.
