Startups face unique challenges, from financing to user engagement to hiring. One of the most important but often underestimated factors in their success is cloud infrastructure.
I’ve worked with companies at every stage of growth on their AWS environments, and it’s typically startups or organizations new to the cloud that benefit most from strong devops experience. Following best practices early, when setting up or migrating to Amazon Web Services, will save precious time and money. For startups with a short window of opportunity, this can be the difference between success and failure. Unfortunately, I have seen poorly planned, inefficient infrastructure eat away at the last dollars of investment, with the problem caught too late to stop the bleeding.
While many companies face similar constraints, startups tend to face at least these critical issues:
- Running lean: Burn rate and runway are always top of mind. Getting the most done in the shortest amount of time for the least amount of money is your priority.
- No systems expertise: The founding partners I’ve worked with often come from some combination of business operations, marketing, software development or product. When an MBA and a coder start an app company, there is typically nobody on the team who knows how to properly set up the web platform.
- Planning for scale: Most new ventures want to onboard thousands of new users a day, and that’s a challenge for the simple database-and-a-web-server prototype many startups build their MVP on. Even at smaller growth rates, scale quickly becomes an issue: going from 20 beta users to 1,000 after launch is a problem if you haven’t thought ahead on the systems side.
- Time pressure: I know, every company has time pressure. Startups, however, face the added intensity of knowing their money may run out or their slim window of market opportunity may close.
What Are You Optimizing For?
Start by clarifying your technical priorities to determine which cloud efficiencies are most important. In most environments it’s hard to optimize for all the advantages AWS offers. Often, there is a narrow focus on cost, development time and scale.
Like everyone else, you want the architecture that is quickest to build, cheapest to run and most scalable, but achieving all three is the cloud computing version of “Fast, Good, Cheap: Pick Two.” If it’s the cheapest, it probably won’t scale well. If it scales well, it probably won’t be built overnight.
Primary Factors in Building Cloud Systems:
- Cost
- Time to Market
- Scale
The dimensions that lead to the best solution for your situation, however, are not always this obvious. Your decision should also account for the skills on your development team, the system administration expertise in your organization (or lack thereof), website traffic patterns and developer efficiency. Weighting these factors will ultimately lead to lower total costs, happier and more productive developers, and the capacity for explosive growth.
Other Important Factors:
- Available Systems Experience and Developer Skill Sets
- Website Traffic Patterns
- Developer Efficiency
So with everything AWS has to offer, where do you start?
- Know the tools: make deployment easy and emphasize user experience.
- Know cloud economics: save money based on your unique organizational requirements.
- Know how to automate: let developers focus on features and leverage existing products.
Or find someone who does.
Unless you understand the services well, I can’t emphasize “find someone who does” enough. Improvising and then getting locked into the architecture of your prototype almost always has a high cost later in actual dollars or wasted productivity. Bringing devops help in as soon as possible will be invaluable as your team and needs grow.
In fact, whether or not your team has any ops skills is perhaps the most important factor in choosing your environment. If you have no devops talent available, that should limit your choices. You don’t want to find yourself with a system that’s complex, hard to maintain and requires specialized skills to fix or upgrade when it breaks. You want your team adding features, not fixing servers.
Following these principles will have a lasting, compounding effect on your project’s time to market and growth.
KNOW THE TOOLS
The goal is to learn how to evaluate AWS products against your specific resources and capabilities. This is not necessarily a one-time process; you may repeat it several times as your team and product evolve.
Below are a few example use cases. They aren’t intended to cover every scenario; these basic ones simply illustrate the available options.
Dynamic website with predictable traffic
100K users visit your site per day, primarily from the U.S. The site is managed through a CMS used to contribute articles, tags and links to the database. You also store user accounts, settings and comments.
This would traditionally live on multiple web servers running a LAMP (Linux, Apache, MySQL and PHP/Python) or MEAN (Mongo/NoSQL, Express, Angular/Backbone/React, Node.js) stack. You might have multiple data stores (one for content, one for user data) with a caching layer like Redis.
In AWS, you could simplify with Elastic Beanstalk and RDS. Elastic Beanstalk gives you a fully managed web application environment with load balancing and deployment built in, plus a web UI to manage instance quantity and size. RDS fully manages your databases, including automated backups. One way to eliminate a separate cache and reduce maintenance is to create read replicas to handle frequent reads, such as user lookups during account login.
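One way to picture the replica setup is an application-side router that sends writes to the primary and spreads reads across replicas. RDS gives the primary and each read replica their own endpoint, but how queries are routed between them is up to your application or driver. This is a minimal sketch; the class and the endpoint hostnames are hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Route SQL statements to a primary or to read replicas (hypothetical endpoints)."""

    READ_PREFIXES = ("SELECT", "SHOW")

    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through replicas round-robin so reads are spread evenly.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith(self.READ_PREFIXES)
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no replica exists, go to the primary.
        return self.primary


router = ReadWriteRouter(
    primary="app-db.example.internal",
    replicas=["app-db-replica-1.example.internal", "app-db-replica-2.example.internal"],
)
print(router.endpoint_for("SELECT id FROM users WHERE email = %s"))
print(router.endpoint_for("UPDATE users SET last_login = NOW() WHERE id = %s"))
```

Read replicas are updated asynchronously, so a read issued immediately after a write may return stale data. Reads that must see the latest write, such as confirming a just-changed password, are safer sent to the primary.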
Finally, because your traffic is mostly from the U.S., you will have far fewer users in the middle of the night. You can set up scheduled auto scaling to add instances in the morning and remove them at night to save money.
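A rough way to size the savings is to compare instance-hours with and without a schedule. The figures here (six instances by day, two overnight, a 16-hour daytime window and an assumed $0.047/hour rate) are illustrative assumptions, not measurements.

```python
def daily_instance_hours(schedule):
    """Sum instance-hours over a day given (hours, instance_count) pairs."""
    return sum(hours * count for hours, count in schedule)


HOURLY_RATE = 0.047  # assumed on-demand price per instance-hour

always_on = [(24, 6)]          # six instances around the clock
scheduled = [(16, 6), (8, 2)]  # six during the U.S. day, two overnight

flat_cost = daily_instance_hours(always_on) * HOURLY_RATE
scheduled_cost = daily_instance_hours(scheduled) * HOURLY_RATE
savings = 1 - scheduled_cost / flat_cost

print(f"always on: ${flat_cost:.2f}/day, scheduled: ${scheduled_cost:.2f}/day")
print(f"savings: {savings:.0%}")
```

With these assumptions the schedule cuts daily instance-hours from 144 to 112, about 22%. In AWS the schedule itself is configured as scheduled actions on the Auto Scaling group, which Elastic Beanstalk also exposes.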
You won’t have as many choices of server and language versions as you would on servers you manage yourself. You also may not be able to do as much about page load times or deployment speed, which can be slower. If those are acceptable tradeoffs, this is an environment that does not require a systems expert.
Mostly static site with steady, heavy, international traffic 24/7
In a modern cloud architecture, an intricate web server adds little value in this kind of scenario. Serverless Lambda functions are too slow and overkill for serving static files. Elastic Beanstalk is unnecessary because your traffic is consistent and does not require much backend logic.
Knowing how AWS works gives you an obvious pattern: basic web server + S3 (media storage) + CloudFront (edge caching).
Just knowing that these cloud patterns exist allows you to eliminate entire areas of maintenance.
Lightweight B2B product with minimal users
Using a traditional model, the first instinct would be to set up a couple of redundant web servers and a database. You need to keep these up and running, so this project quickly expands to supporting pieces like load balancers and alerts on disk space, memory and downtime.
A serverless alternative is to put API Gateway in front of Lambda functions, leaving no servers to keep running. The deployment model is not as smooth, and you have to manage versions of API Gateway and the Lambda functions manually, but this is an acceptable tradeoff for virtually no developer friction in getting features out. Adding more moving parts here would cost more and slow you down at this early stage. Even as a prototype or development environment, this is easier than standing up individual instances.
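For a sense of how little code the serverless version needs, here is a minimal Python Lambda handler for an API Gateway REST API using Lambda proxy integration. That integration passes the HTTP request in as `event` and expects a dict with `statusCode`, `headers` and `body` back. The account lookup route and the in-memory `ACCOUNTS` table are hypothetical stand-ins for a real data store.

```python
import json

# Hypothetical stand-in for a real data store such as DynamoDB or RDS.
ACCOUNTS = {"acme": {"name": "Acme Corp", "plan": "basic"}}


def _response(status, payload):
    """Shape a response the way API Gateway proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    if event.get("httpMethod") != "GET":
        return _response(405, {"error": "method not allowed"})
    # pathParameters is None when the request carries no path variables.
    params = event.get("pathParameters") or {}
    account = ACCOUNTS.get(params.get("account_id"))
    if account is None:
        return _response(404, {"error": "account not found"})
    return _response(200, account)


if __name__ == "__main__":
    fake_event = {"httpMethod": "GET", "pathParameters": {"account_id": "acme"}}
    print(handler(fake_event, None))
```

Because the handler is a plain function, it can be exercised locally with a fake event like the one above before anything is deployed.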
KNOW CLOUD ECONOMICS
With so many possibilities for building your app, it’s important to know not only the built-in savings but also the hidden savings that come from a deeper knowledge of your cloud provider. In some cases, simply restructuring existing pieces is cheaper, while in others migrating to a different technology is worth the time and learning curve.
Here are a few common real-world examples to illustrate, although this list could be much longer.
Instance Cost: How many instances do you really need?
When load balancing, the debate is often about many small servers versus a few large ones.
Are 4 smaller t2.medium web servers cheaper than 3 larger c4.large instances?
Technically, yes (4 x $0.047/hour = $0.188/hour vs. 3 x $0.10/hour = $0.30/hour). However, if you are running a CPU- or network-intensive application, or if your application suffers from the extra load on the remaining instances when you take a server out of rotation behind your load balancer, the cheaper option could actually cost you more. Engineering time spent managing extra load, lost users and downtime during maintenance are all costly.
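Here is the same arithmetic, extended with one question the hourly price hides: how much capacity remains when one instance is out of rotation. This sketch uses the prices quoted in the text and a common 730-hour-month approximation; it deliberately ignores differences in CPU and network performance between instance types, which is exactly the caveat above.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation (8,760 hours / 12)


@dataclass
class Fleet:
    instance_type: str
    count: int
    hourly_price: float  # on-demand USD per instance-hour, as quoted in the text

    def hourly_cost(self):
        return self.count * self.hourly_price

    def monthly_cost(self):
        return self.hourly_cost() * HOURS_PER_MONTH

    def capacity_during_maintenance(self):
        """Fraction of fleet capacity left with one instance out of rotation."""
        return (self.count - 1) / self.count


for fleet in (Fleet("t2.medium", 4, 0.047), Fleet("c4.large", 3, 0.10)):
    print(
        f"{fleet.count} x {fleet.instance_type}: "
        f"${fleet.hourly_cost():.3f}/hour, ${fleet.monthly_cost():.2f}/month, "
        f"{fleet.capacity_during_maintenance():.0%} capacity during maintenance"
    )
```

The smaller fleet keeps 75% of its instances in service during maintenance versus about 67% for the larger one, but each c4.large may do more sustained work than a t2.medium, so measured throughput per instance, not instance count, should drive the final decision.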
Instance allocation has to be considered from more than just the cost-per-hour angle.
Traffic Management: Can the app use auto scaling or a CDN?
This one is harder to get right until your traffic stabilizes, but knowing your traffic patterns can lead to huge savings. Combining Auto Scaling with the right mix of On-Demand and Reserved Instances takes more effort, but depending on your workloads it can be well worth the setup cost.
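One way to choose between On-Demand and Reserved Instances for a given server is a break-even check: a reservation is billed for every hour of its term whether or not the instance runs, so it only pays off above a certain utilization. The reserved rate below is a hypothetical placeholder; the real rate depends on instance type, region, term and payment option.

```python
def break_even_utilization(on_demand_hourly, reserved_effective_hourly):
    """Fraction of hours an instance must run for a reservation to pay off."""
    return reserved_effective_hourly / on_demand_hourly


def cheaper_option(hours_used_per_month, on_demand_hourly,
                   reserved_effective_hourly, hours_in_month=730):
    on_demand = hours_used_per_month * on_demand_hourly
    # A reservation is paid for every hour of the month, used or not.
    reserved = hours_in_month * reserved_effective_hourly
    return "reserved" if reserved < on_demand else "on-demand"


ON_DEMAND = 0.10  # c4.large rate quoted above
RESERVED = 0.065  # hypothetical effective hourly rate for a 1-year reservation

print(f"break-even: {break_even_utilization(ON_DEMAND, RESERVED):.0%} of hours")
print(cheaper_option(730, ON_DEMAND, RESERVED))  # always-on baseline server
print(cheaper_option(200, ON_DEMAND, RESERVED))  # daytime-only peak server
```

This is why a common pattern is to reserve the always-on baseline and let Auto Scaling add On-Demand instances for peaks.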
As mentioned, a solution like CloudFront almost always reduces page load time. It can also be cheaper: if offloading work to the edge of a CDN lets you scale down your origin infrastructure enough, the savings outweigh the cost of the CDN.
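To estimate whether a CDN lets you shrink the origin, work from the cache hit ratio: only cache misses reach your servers. The peak traffic, hit ratio and per-server capacity below are hypothetical; substitute numbers from your own metrics.

```python
import math


def origin_servers_needed(peak_rps, cache_hit_ratio, rps_per_server, min_servers=2):
    """Servers needed behind a CDN when only cache misses reach the origin."""
    origin_rps = peak_rps * (1 - cache_hit_ratio)
    # Keep a floor of servers for redundancy even if origin traffic is tiny.
    return max(min_servers, math.ceil(origin_rps / rps_per_server))


PEAK_RPS = 2000       # hypothetical peak requests per second
PER_SERVER_RPS = 250  # hypothetical sustainable load per web server

print(origin_servers_needed(PEAK_RPS, 0.0, PER_SERVER_RPS))   # no CDN
print(origin_servers_needed(PEAK_RPS, 0.85, PER_SERVER_RPS))  # 85% hit ratio
```

The savings are real only if the servers you retire cost more than CloudFront’s own request and data transfer charges, which is the comparison described above.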
Bandwidth: Are you serving data from the cheapest source?
Serving static media like images from S3 is not always the right solution, but it can be if you can:
- Remove web servers and EBS volumes that are serving static files
- Save storage costs
- Shift traffic from your load balancers to a cheaper path
Many people do not realize that it costs over 4x as much to store data on EBS volumes attached to servers as it does in S3:
$0.10 per GB-month of provisioned storage on SSD (gp2) volumes vs. $0.023 per GB-month for S3 Standard storage (first 50 TB).
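The ratio above can be checked with a quick calculation using the two list prices quoted in the text (prices vary by region and change over time). One detail worth modeling: EBS bills for the provisioned volume size, while S3 bills for the data actually stored, so an oversized volume widens the gap. The 300 GB and 500 GB figures are hypothetical.

```python
EBS_GP2_PER_GB_MONTH = 0.10       # provisioned gp2 SSD, price quoted above
S3_STANDARD_PER_GB_MONTH = 0.023  # S3 Standard, first 50 TB, price quoted above


def monthly_storage_costs(data_gb, provisioned_ebs_gb):
    """Return (ebs_cost, s3_cost) in USD per month for the same data."""
    if provisioned_ebs_gb < data_gb:
        raise ValueError("EBS volume must be at least as large as the data")
    ebs = provisioned_ebs_gb * EBS_GP2_PER_GB_MONTH  # billed on volume size
    s3 = data_gb * S3_STANDARD_PER_GB_MONTH          # billed on bytes stored
    return ebs, s3


ebs, s3 = monthly_storage_costs(data_gb=300, provisioned_ebs_gb=500)
print(f"EBS: ${ebs:.2f}/month, S3: ${s3:.2f}/month, ratio {ebs / s3:.1f}x")
```

This compares storage only; S3 also charges per request and for data transfer, which belongs in the bandwidth side of the comparison.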
Another commonly overlooked and easy improvement is to serve compressed files where possible. You can do this with CloudFront and various web servers.
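If you serve assets from S3, one approach is to compress text files before upload and store them with a `Content-Encoding: gzip` header so browsers decompress them transparently. CloudFront can also compress eligible files automatically when compression is enabled on a cache behavior. This sketch only prepares the bytes and headers; the set of compressible types and the sample CSS are assumptions, and the upload step is left out.

```python
import gzip

# Text formats compress well; already-compressed formats (JPEG, PNG, MP4) do not.
COMPRESSIBLE_TYPES = {"text/html", "text/css", "application/javascript",
                      "application/json", "image/svg+xml"}


def prepare_asset(body, content_type):
    """Return (payload, headers) for a static asset, gzipping text types."""
    headers = {"Content-Type": content_type}
    if content_type in COMPRESSIBLE_TYPES:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return body, headers


css = b".button { color: #333; padding: 4px 8px; }\n" * 200
payload, headers = prepare_asset(css, "text/css")
print(len(css), "->", len(payload), "bytes", headers)
```

With boto3, these headers correspond to the `ContentType` and `ContentEncoding` parameters of `put_object`.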
Data Storage: Are you archiving and deleting data?
Storing data in S3 without moving old, unused files to Glacier leaves money on the table. Likewise, database and volume snapshots have to be actively managed through the end of their lifecycle.
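S3 can do the archiving for you with a lifecycle rule. The dict below follows the structure of an S3 lifecycle configuration (usable with boto3’s `put_bucket_lifecycle_configuration` or, saved as JSON, with `aws s3api put-bucket-lifecycle-configuration`); the `uploads/` prefix and the 90- and 365-day thresholds are hypothetical. For snapshots, Amazon Data Lifecycle Manager can apply EBS retention policies and RDS automated backups have a configurable retention period; the second function only sketches the retention check itself, over hypothetical snapshot records.

```python
import json
from datetime import datetime, timedelta, timezone


def archive_rule(prefix, glacier_after_days, expire_after_days):
    """One S3 lifecycle rule: transition objects to Glacier, later delete them."""
    return {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after_days},
    }


def snapshots_to_delete(snapshots, keep_days, now=None):
    """IDs of snapshots created before the retention window.

    `snapshots` is a list of dicts with hypothetical "id" and "created" keys
    (timezone-aware datetimes) that you would assemble yourself.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    return [s["id"] for s in snapshots if s["created"] < cutoff]


# Hypothetical policy: uploads move to Glacier after 90 days, are deleted after a year.
lifecycle = {"Rules": [archive_rule("uploads/", 90, 365)]}
print(json.dumps(lifecycle, indent=2))
```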
KNOW HOW TO AUTOMATE
It is extremely common to take an informal approach to systems in early-stage technical environments, particularly for deployment and configuration. There is nothing inherently wrong with this while you are prototyping on the way to stability. But living with rsync for deployment or manually configured instances is not a long-term solution. To avoid accidental breakage, and the headaches that come with managing a growing number of team members and services, follow the Infrastructure as Code methodology.
Your dev team needs an easy way to get code to production or test environments. They need to quickly understand how it works when making improvements and training new hires. Spending hours figuring out undocumented or manual steps is too common and a huge waste of time. Incorporate automation into your organization from the very beginning.
The strongest recommendation I can give here is to use configuration management as you build. When you install packages, deploy code and update configurations this way, your tasks are self-documenting at no extra cost. Scaling is often a matter of applying a playbook to more instances.
The next strongest recommendation is to adopt a deployment method early as well, preferably CI/CD if you can implement solid test coverage. CodeDeploy is integrated with AWS, but even a repeatable, idempotent set of tasks using the AWS CLI is better than one-off custom commands.
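Idempotence is what makes those tasks safe to rerun: each one checks the current state and acts only when something differs, so applying it a second time changes nothing. Configuration management tools build on this idea. The function below is a hypothetical, stripped-down illustration in plain Python, not any particular tool’s API, and the config file and setting are made up.

```python
import tempfile
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` appears in the file at `path`, appending it if missing.

    Returns True if the file changed, False if it was already in the desired
    state, which is what makes rerunning the task safe.
    """
    file = Path(path)
    text = file.read_text() if file.exists() else ""
    if line in text.splitlines():
        return False  # already converged: do nothing
    if text and not text.endswith("\n"):
        text += "\n"  # avoid gluing the new line onto the last one
    file.write_text(text + line + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "app.conf"  # hypothetical config file
    print(ensure_line(conf, "max_connections = 200"))  # True: file created
    print(ensure_line(conf, "max_connections = 200"))  # False: nothing to do
```

The changed/unchanged return value is also what lets a deployment report exactly which hosts were modified on a given run.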
Optimally, all system components, such as web servers, databases and load balancers, are automated. The ability to replicate entire environments is essential for integration testing, developer sandboxes, migrations and feature demos. Consider CloudFormation or Terraform for this.
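Replicating an environment comes down to deploying the same template with different parameters. This sketch builds a minimal CloudFormation template as a Python dict containing one S3 bucket tagged by environment; the `AssetsBucket` resource, the `EnvName` parameter and its allowed values are hypothetical. Saved as `template.json`, it could be deployed once per environment with `aws cloudformation deploy --template-file template.json --stack-name myapp-staging --parameter-overrides EnvName=staging`.

```python
import json


def environment_template():
    """Minimal CloudFormation template; deploy one stack per environment."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Per-environment resources (illustrative)",
        "Parameters": {
            "EnvName": {
                "Type": "String",
                "AllowedValues": ["dev", "staging", "production"],
            }
        },
        "Resources": {
            "AssetsBucket": {
                "Type": "AWS::S3::Bucket",
                # Tag every resource with its environment for cost tracking.
                "Properties": {
                    "Tags": [{"Key": "env", "Value": {"Ref": "EnvName"}}]
                },
            }
        },
        "Outputs": {
            # Ref on an S3 bucket resolves to the bucket name.
            "AssetsBucketName": {"Value": {"Ref": "AssetsBucket"}}
        },
    }


print(json.dumps(environment_template(), indent=2))
```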
Common Tools for Infrastructure and Code Deployment:
- CloudFormation: templated AWS resource stacks
- Terraform: infrastructure as code
- OpsWorks: configuration management
- CodeDeploy: code deployment
Optimize for Simplicity Until You Can Afford Not To
The main question is whether you need and can support a more sophisticated infrastructure to meet your goals. Those goals may be a better user experience or faster product iteration. If you have access to qualified engineers, then shaving 300ms off page loads to increase user retention by a few percent is worth it. You can build out the right stack to support it at the cost of more complexity. Until then, the right balance between business needs and infrastructure maintenance is the primary factor.
Early in the growth process, questions about the future of your environment can be hard to answer. Even so, the sooner you start thinking about why and how to optimize, the healthier your development process and technical foundation will be. Best practices in AWS systems and infrastructure are a core component of a successful startup. Investing in the right expertise will pay valuable dividends over the life of your business.