<![CDATA[DevOps Blog]]>

https://blog.davestern.com/

<![CDATA[10 Cloud and Web Security Questions You Should Be Asking]]>

https://blog.davestern.com/10-cloud-and-web-security-questions-you-should-be-asking/
Mon, 04 Jun 2018 17:40:48 GMT

If you run a SaaS business or mission critical software, the security of your web platform is crucial to your success. You may not, however, know the questions to ask your technical staff or outsourced providers. Knowing the answers to these questions can highlight vulnerabilities you need to address before they become a nightmare scenario.

This list and associated recommendations are not comprehensive, but should be used as the starting point to a larger assessment of your valuable technical assets. If you don't have complete answers to these questions from your team, bring in a security professional to analyze and address the issues before they are exploited.

1. Are your development or staging environments leaking data or credentials?

Developers often copy actual production data to a development environment to test new features. This isn't necessarily bad practice depending on the sensitivity of the data, but it can be if PII (Personally Identifiable Information) is involved. They also often use the same usernames and passwords to connect to the database, a caching system or other resources. The problem is that development environments are not usually as secure as production and make for an easier target to attack. Solutions like mock data, a subset of the production database and different credentials should be used.

2. How does your application get authorization to connect to other systems?

Web servers often need to use credentials to interact with data from another system like the database. They can also be used for automated activity on a third-party service like an API. Those credentials should not be in your code or checked in to your version control system. They should be encrypted and ideally should be temporary with short expirations.
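As a rough illustration of keeping credentials out of your code, here is a minimal Python sketch that reads them from the environment at startup. The variable names `APP_DB_USER` and `APP_DB_PASSWORD` are hypothetical, and in a real setup the values would be injected by a secrets manager or your deployment tooling rather than typed in by hand.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not set in the environment."""


def load_db_credentials(environ=None):
    """Read database credentials from environment variables, not source code.

    APP_DB_USER and APP_DB_PASSWORD are hypothetical names for illustration.
    """
    env = os.environ if environ is None else environ
    creds = {}
    for key in ("APP_DB_USER", "APP_DB_PASSWORD"):
        value = env.get(key)
        if not value:
            # Fail fast instead of falling back to a hardcoded default.
            raise MissingCredentialError(f"{key} is not set")
        creds[key] = value
    return creds
```

Failing loudly when a value is missing is a deliberate choice here: a hardcoded fallback is exactly the kind of credential that ends up in version control.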

3. Are any critical services available on public IP addresses?

Nothing in your network should be on the public internet unless absolutely necessary, especially your database. These belong in a private network with tightly restricted access.

4. Do you use HTTP instead of HTTPS anywhere that is publicly available?

For the resources that are available publicly like load balancers, web servers and the CDN (Content Delivery Network), HTTP should be disabled or automatically redirected to HTTPS.
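The redirect itself is usually a one-line setting on the load balancer or web server, but the logic is simple enough to sketch. This Python function is only an illustration of the rule, assuming default ports, and the example URL in the usage note is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def https_redirect_target(url):
    """Return the HTTPS URL to redirect to, or None if the URL is already HTTPS.

    Assumes default ports (80 and 443); an explicit port in the URL would
    need extra handling that this sketch leaves out.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        return None
    # Keep host, path, query and fragment; only the scheme changes.
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))
```

For example, `https_redirect_target("http://example.com/login?next=/home")` gives back the same address with an `https` scheme, which is what you would send in the `Location` header of a 301 response.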

5. If any component of your infrastructure were breached, how much access would an attacker have from that system?

For each individual system on your network, like servers, databases, monitoring hosts and bastion hosts, you should know what else could potentially be compromised. Network segmentation limits the damage. Many attacks start by exploiting a small crack in the armor, then spread from the inside, so separating access internally is important.

6. Are you limiting outbound as well as inbound traffic?

Modern networks don't assume attacks will come from the outside. Good security also includes limiting what can leave your servers or networks. Malware and other types of attacks try to spread by contacting local servers or "phone home" to a master server and get updates. Control exactly what your internal resources can reach to firewall any malicious activity.
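In practice this is enforced with security group or firewall rules, but the idea is a default-deny allowlist. Here is a small Python sketch of that check; the networks in `ALLOWED_EGRESS` are hypothetical placeholders, not a recommendation.

```python
import ipaddress

# Hypothetical allowlist: one internal subnet and one external service address.
ALLOWED_EGRESS = [
    ipaddress.ip_network("10.0.0.0/16"),
    ipaddress.ip_network("203.0.113.10/32"),
]


def egress_allowed(destination, allowed=ALLOWED_EGRESS):
    """Default-deny: permit outbound traffic only to explicitly listed networks."""
    address = ipaddress.ip_address(destination)
    return any(address in network for network in allowed)
```

Anything not on the list, including a malware command server, is simply unreachable from the inside.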

7. Do you have an exit process for technical staff and contractors?

When a user leaves your company, especially privileged administrators or developers, disabling their access thoroughly is an important step. Even if they leave on good terms, their credentials can still get in the hands of the bad guys. Old, unused logins and passwords are easy targets. Create a process that serves as a central inventory of the services used and as a checklist for offboarding. Similarly, you should have a process for onboarding new hires and keep both updated.

8. Do you have clearly defined privilege levels for everyone working on your web application?

In the mad rush to get products to market, we often just give everyone the "Full Admin Access" IAM role out of convenience. For trusted team members, this is sometimes needed for working on highly interconnected projects. For contractors or other users, this can end badly. Whether by intent or ignorance, giving too much permission opens the door for catastrophic problems. Create well-named groups of access and assign each user to the appropriate group. Even senior administrators and developers should be working with limited privileges unless escalation is necessary. Privileges to review include user roles, SSH keys, IAM policies, DB permissions, etc.

9. Have you ensured your employees and contractors are encrypting their drives, locking their devices and using Multi-Factor Authentication?

All it takes is one stolen laptop or eavesdropped wifi connection to ruin your company's day. Make sure all your employees, including the non-technical staff, practice good security. My blog post on securing your digital life is a good place to start.

10. Have you checked all your S3 buckets to make sure they are private?

Specific to AWS, S3 is an incredible resource and a dangerous vulnerability if left open. Many major recent data breaches occurred because of unsecured S3 buckets. Make sure all your buckets are private unless they require public access for an explicit purpose like hosting a public website. Even then, using CloudFront and an Origin Access Identity (OAI) with public access disabled is a better option.
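One way to audit this is to look at each bucket's public access block settings. The Python sketch below assumes you have already collected those settings into a dictionary (for example with the AWS CLI or an SDK); the four flag names come from S3's public access block configuration, while the function and the bucket names in the usage note are made up for illustration.

```python
# The four flags in an S3 public access block configuration.
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def buckets_missing_public_block(bucket_settings):
    """Return names of buckets where any public-access-block flag is not True.

    bucket_settings maps bucket name -> dict of the four flags. Fetching the
    settings from AWS is out of scope for this sketch.
    """
    return sorted(
        name
        for name, flags in bucket_settings.items()
        if not all(flags.get(flag) is True for flag in REQUIRED_FLAGS)
    )
```

Passing in settings for a hypothetical `media-assets` bucket with all four flags on and an `old-exports` bucket with one flag off would return only `old-exports`, which is the bucket to investigate.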

Next Steps

Regularly assessing the state of your infrastructure will not only help keep your data safe, but it will also improve overall operational efficiency. Make sure you have a qualified, up-to-date specialist reviewing these and related questions, making actionable recommendations and educating your team.

<![CDATA[AWS for Startups (and everyone else): Optimization Factors]]>

https://blog.davestern.com/aws-for-startups-and-everyone-else-optimization-factors/
Tue, 06 Jun 2017 15:21:42 GMT

Startups face unique challenges from financing to user engagement to hiring and more. One of the most important but often underestimated elements is cloud infrastructure as a critical component for success.

I've worked with companies in every stage of growth on their AWS environment and it's typically startups or organizations new to the cloud that can benefit the most from strong devops experience. Following best practices early in the setup or migration to Amazon Web Services will save precious time and money. For startups with a short window of opportunity, this can be the difference between success and failure. Unfortunately, I have seen poorly planned, inefficient infrastructure eat away at the last dollars of investment, too late to stop the bleeding.

While many companies face similar constraints, startups tend to face at least these critical issues.

  • Running lean: Burn rate and runway are always top of mind. Getting the most done in the shortest amount of time for the least amount of money is your priority.
  • No systems expertise: The founding partners I've worked with are often some combination of business operations, marketing, software development or product. When an MBA and a coder start an app company, there is typically nobody on the team who knows how to properly set up the web platform.
  • Planning for scale: Most new ventures want to onboard thousands of new users a day, and that's a challenge for the simple database-and-a-web-server prototype many startups build their MVP on. But even at smaller growth rates, scale quickly becomes an issue. Even going from 20 beta users to 1000 after launch is a problem if you haven't thought ahead on the systems side.
  • Time Pressure: I know, every company has time pressure. Startups, however, have the added intensity of knowing their money will be gone or their slim window of market opportunity will close.

What are you Optimizing For?

Start by clarifying your technical priorities to determine which cloud efficiencies are most important. In most environments it's hard to optimize for all the advantages AWS offers. Often, there is a narrow focus on cost, development time and scale.

Like everyone else, you want the quickest to build, cheapest to run, most scalable architecture, although achieving all three is something akin to the ["Fast, Good, Cheap. Pick Two"](https://en.wikipedia.org/wiki/Project_management_triangle#.22Pick_any_two.22) of cloud computing. If it's the cheapest, it's probably not going to scale well. If it scales well, it's probably not going to be built overnight.

Primary Factors in building cloud systems:

  • Cost
  • Time to Market
  • Scale

The dimensions that are going to lead you to the solution that fits your situation best, however, are not always this obvious. Primary factors should also be based on the skills on your development team, the system administration expertise in your organization (or lack thereof), website traffic patterns and developer efficiency. Weighting these in your decision will ultimately lead to cheaper total costs, developer happiness and productivity, and the capacity for explosive expansion.

Other Important Factors:

  • Available Systems Experience and Developer Skill Sets
  • Website Traffic Patterns
  • Developer Efficiency

So with everything AWS has to offer, where do you start?

  • Know the tools: Make deployment easy and emphasize user experience
  • Know cloud economics: Save money based on your unique organizational requirements
  • Know how to automate: Let developers focus on features and leverage existing products

Or find someone who does.

Unless you understand the services well, I can't emphasize "find someone who does" enough. Improvising and then getting locked into the architecture of your prototype almost always has a high cost later in actual dollars or wasted productivity. Bringing devops help in as soon as possible will be invaluable as your team and needs grow.

In fact, whether or not your team has any ops skill is perhaps the most important factor in choosing your environment. If you have no devops talent available, then it should limit your choices. You don't want to find yourself with a system that's complex, hard to maintain and requires specialized skills to fix or upgrade when it breaks. You want your team adding features, not fixing servers.

Following these principles will have a lasting, exponential effect on the time to market and growth of your project.


The goal is to learn how to evaluate the products to fit your specific resources and capabilities because it's not necessarily a one-time process. You may be doing this multiple times as your team and product evolve.

Below are a few example use cases. These aren't intended to cover every scenario, just some of the more basic ones simply to illustrate available options.

Dynamic website with predictable traffic

100K users visit your site per day, primarily from the U.S. Editors use a CMS to contribute articles, tags and links to the database. You also store user accounts, settings, and comments.

This would traditionally live on multiple web servers running a [LAMP](https://en.wikipedia.org/wiki/LAMP_(software_bundle)) (Linux, Apache, MySQL and PHP/Python) or [MEAN](https://en.wikipedia.org/wiki/MEAN_(software_bundle)) (Mongo/NoSQL, Express, Angular/Backbone/React, Node.js) stack. You might have multiple data stores (one for content, one for user data) with a caching layer like [Redis](https://redis.io/).

In AWS, you could simplify with [Elastic Beanstalk](https://aws.amazon.com/elasticbeanstalk/) and [RDS](https://aws.amazon.com/rds/). Elastic Beanstalk gives you a fully managed web application environment with load balancing and deployment built-in. You also have a web UI to manage instance quantity and size. RDS fully manages your databases with automated backups. One way to eliminate a separate cache and reduce maintenance is to create database replicas to handle your frequent reads like user authorization for account login.

Finally, because your traffic is mostly U.S., you will have far fewer users in the middle of the night. You can set up [auto scaling](http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.managing.as.html) to automatically add instances in the morning and remove them at night to save money.

You won't have as many options for your server and language versions as you normally would with this setup. You also may not be able to do as much about page load times or deployment speeds, which can be slower. If those are acceptable tradeoffs, this is an environment that does not require a systems expert.

Mostly static site with steady, heavy, international traffic 24/7

Your site is mostly visual media with a large audience from around the world. It uses a simple database to catalog content, but serving the static video, picture, audio and javascript files is the most resource-intensive part. Page load times are critically important.

In a modern cloud architecture, an intricate web server is pretty useless for this kind of scenario. Serverless Lambda functions are too slow and overkill for serving static files. Elastic Beanstalk is unnecessary because your traffic is consistent and does not require much backend logic.

Knowing how AWS works gives you an obvious pattern: basic web server + [S3](https://aws.amazon.com/s3/) (media storage) + [CloudFront](https://aws.amazon.com/cloudfront/) (edge caching).

Your web server could easily be API Gateway or Elastic Beanstalk here, but you could also use nginx or apache instances in a basic configuration. All your media, javascript, css and font files go into S3, Amazon's object storage. CloudFront is the CDN. On the first request from a region, CloudFront gets the object from S3 and caches it locally. On subsequent requests, users from France, for example, will get the object from Ireland until it expires, which is much faster than a retrieval from the U.S.

Just knowing that these cloud patterns exist allows you to eliminate entire areas of maintenance.

Lightweight B2B product with minimal users

You're building a pricing search engine targeting a niche market of a million potential businesses. The site is heavy on client-side Javascript UI with search filters and the backend source data updates once per day.

Using a traditional model, the first instinct would be to set up a couple of redundant web servers and a database. You need to keep these up and running, so this project quickly expands to supporting pieces like load balancers and alerts on disk space, memory and downtime.

If the backend is in a common language like Node.js or Python, this is a perfect scenario for [API Gateway + Lambda](http://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started.html): a truly "serverless" environment with virtually no systems management required. You can even easily integrate [DynamoDB](https://aws.amazon.com/dynamodb/), a fully managed NoSQL database, as your data store. The front-end javascript is served from S3. You don't have to monitor anything.

The deployment model is not as smooth and you have to manage versions of the API Gateway and the Lambda functions manually, but this is an acceptable tradeoff for virtually no developer friction in getting features out. Adding more moving parts here would cost more and slow you down at this early stage. Even as a prototype or development environment, this is easier than standing up individual instances.


With so many possibilities for building your app, it's important to know not only the built-in savings but also the hidden values that come with a deeper knowledge of your cloud provider. In some cases, simply restructuring existing pieces is cheaper, while in others migrating to a different technology is worth the time and learning curve.

Here are a few common real-world examples to illustrate, although this list could be much longer.

Instance Cost: How many instances do you really need?

Often when load balancing the debate is about many small servers versus a few large ones.

Are 4 smaller t2.medium web servers cheaper than 3 bigger c4.large?

Technically, yes (4 x $.047/hour = $.188/hour vs. 3 x $.10/hour = $.30/hour). If you are running a CPU or network intensive application, however, or if your application suffers when you take a server out of rotation behind your load balancer because of the extra load on the remaining instances, this could actually cost you. Engineering time to manage extra load, lost users or downtime due to maintenance is costly.

Instance allocation has to be considered from more than just the cost-per-hour angle.
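To make the per-hour comparison concrete, here is the arithmetic from above as a short Python sketch. The prices are the ones quoted in this post and will have changed since; the 730 hours per month is my own assumption for turning hourly figures into a monthly one.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def fleet_cost_per_hour(instance_count, price_per_hour):
    """Hourly cost of a fleet of identical instances."""
    return instance_count * price_per_hour


small_fleet = fleet_cost_per_hour(4, 0.047)  # 4 x t2.medium
large_fleet = fleet_cost_per_hour(3, 0.10)   # 3 x c4.large
monthly_difference = (large_fleet - small_fleet) * HOURS_PER_MONTH

print(f"4 x t2.medium: ${small_fleet:.3f}/hour")
print(f"3 x c4.large:  ${large_fleet:.3f}/hour")
print(f"Difference:    ${monthly_difference:.2f}/month")
```

The monthly difference is the number to weigh against engineering time and lost users, which is why the hourly rate alone rarely settles the question.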

Traffic Management: Can the app use auto scaling or a CDN?

This one can be harder until a point of stability is reached, but knowing traffic patterns can lead to huge savings. Using [Auto Scaling](https://aws.amazon.com/autoscaling/), [On-demand or Reserved Instances](https://aws.amazon.com/ec2/pricing/) can be more involved, but worth the cost to set up depending on workloads.

As mentioned, a solution like CloudFront almost always reduces web page load time. It can also be cheaper. If offloading work to the edge of a CDN allows you to scale down your internal network enough, the savings are worth it.

Bandwidth: Are you serving data from the cheapest source?

Serving static media like images from S3 is not always the right solution, but it can be if you can:

  • Remove web servers and EBS volumes that are serving static files
  • Save storage costs
  • Shift traffic from your load balancers to a cheaper path

Many people do not realize that it costs over 4X as much to store data on servers with EBS volumes as it does in S3:
$0.10 per GB-month of provisioned storage on SSD (gp2) volumes vs. $0.023 per GB for standard storage, first 50TB.
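Here is that comparison as a quick Python calculation, using the two prices quoted above (which will have drifted since this was written). The 500 GB figure in the usage note is just an example size.

```python
EBS_GP2_PER_GB_MONTH = 0.10       # provisioned SSD (gp2), as quoted above
S3_STANDARD_PER_GB_MONTH = 0.023  # standard storage, first 50TB, as quoted above


def monthly_storage_cost(gigabytes, price_per_gb_month):
    """Monthly storage cost in dollars for a given size and per-GB price."""
    return gigabytes * price_per_gb_month


def ebs_to_s3_ratio():
    """How many times more expensive EBS is than S3 per GB-month."""
    return EBS_GP2_PER_GB_MONTH / S3_STANDARD_PER_GB_MONTH
```

At 500 GB that works out to about $50 a month on EBS versus about $11.50 in S3. Keep in mind that EBS bills for the size you provision, so a half-empty volume makes the gap wider than the ratio suggests.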

Another commonly overlooked and easy improvement is to serve compressed files where possible. You can [do this with CloudFront](http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html) and [various web servers](https://en.wikipedia.org/wiki/HTTP_compression).

Data Storage: Are you archiving and deleting data?

Storing data in S3 and not sending old, unused files to Glacier is leaving money on the table. Likewise, database and volume snapshots have to be actively managed to the end of their lifecycle.
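For S3, archiving and deleting can be automated with a lifecycle rule on the bucket. The Python sketch below builds one such rule in the shape the S3 lifecycle configuration uses; the `logs/` prefix and the 90 and 365 day thresholds are hypothetical values, and applying the rule to a bucket is left out.

```python
import json


def archive_then_expire_rule(prefix, archive_after_days, delete_after_days):
    """Build one lifecycle rule: move objects to Glacier, then delete them."""
    if delete_after_days <= archive_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": delete_after_days},
    }


# Hypothetical policy: archive logs after 90 days, delete them after a year.
lifecycle_configuration = {"Rules": [archive_then_expire_rule("logs/", 90, 365)]}
print(json.dumps(lifecycle_configuration, indent=2))
```

Once a rule like this is in place, nobody has to remember to clean up, which is the whole point.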


It is extremely common to take an informal approach to systems in early stage technical environments, particularly with deployment and configuration. There is nothing inherently wrong with this if you are prototyping to eventually find stability. Living with rsync for deployment or manually configuring instances, however, is not a long-term solution. To avoid accidental breakage and the potential headaches that come with managing an increasing number of team members and services, follow the [Infrastructure as Code](https://martinfowler.com/bliki/InfrastructureAsCode.html) methodology.

Your dev team needs an easy way to get code to production or test environments. They need to quickly understand how it works when making improvements and training new hires. Spending hours figuring out undocumented or manual steps is too common and a huge waste of time. Incorporate automation into your organization from the very beginning.

The strongest recommendation I can give here is to employ configuration management as you build. When you install packages, deploy code and update configurations this way, you get self-documenting tasks without any extra work. Scaling is often a matter of applying a playbook to multiple instances.

The next strongest recommendation is to adopt a deployment method early as well, preferably using CI/CD if you can implement solid test coverage. [CodeDeploy](https://aws.amazon.com/codedeploy/) is integrated, but even a repeatable, idempotent set of tasks using the [AWS CLI tool](https://aws.amazon.com/cli/) is better than custom commands.

Optimally, all system components like web servers, databases and load balancers are automated. The need to replicate entire environments is essential for [integration testing](https://en.wikipedia.org/wiki/Integration_testing), developer sandboxes, migrations and feature demos. Consider [CloudFormation](https://aws.amazon.com/cloudformation/) and [Terraform](https://www.terraform.io/) for this.

Common Tools for Infrastructure and Code Deployment:

Optimize for Simplicity Until You Can Afford Not To

The main question is whether you need and can support a more sophisticated infrastructure to meet your goals. Those goals may be a better user experience or faster product iteration. If you have access to qualified engineers, then shaving 300ms off page loads to increase user retention by a few percent is worth it. You can build out the right stack to support it at the cost of more complexity. Until then, the right balance between business needs and infrastructure maintenance is the primary factor.

Early in the growth process, questions about the future of your environment can be hard to answer. Even so, the sooner you start thinking about why and how to optimize, the healthier your development process and technical foundation will be. Best practices in AWS systems and infrastructure are a core component of a successful startup. Investing in the right expertise will pay valuable dividends over the life of your business.

<![CDATA[AWS: Latency Routing with API Gateway and haproxy]]>

https://blog.davestern.com/aws-latency-routing-with-api-gateway-and-haproxy/
Wed, 18 Jan 2017 23:32:34 GMT

I recently had the pleasure of working with the great team at Chameleon to migrate to AWS Lambda and API Gateway. The primary goal was to geographically route users to the closest region for the lowest latency possible. One important requirement was that the files served to client websites must be real-time and not cached so customers could see their changes immediately.

We tried this using only API Gateway and CloudFront, but AWS limitations with custom domains made that impossible. According to posts on the AWS Forums, these are likely to be resolved in the near future, but we needed an alternate strategy. You can go straight to our haproxy Solution or read on to learn more about why this was necessary.

The Goal: Faster Delivery

Chameleon allows customers to build and optimize product tours without code. The targeted tours run inside the customer app or site and are optimized for customer conversion and retention.

Growing quickly and committed to the fastest possible user experience, they wanted to move to a scalable AWS solution to deliver client-side Javascript. We settled on a serverless architecture of API Gateway and Lambda because it wouldn't require maintenance of EC2 instances or Elastic Beanstalk.

Using claudia.js to deploy the Node.js function to Lambda and wire up API Gateway is a straightforward task. I extended this via a shell script to deploy to multiple regions and all was well. We had their Node function running in multiple regions with built-in management of Lambda versions and API Gateway stages.

The Problem: Global Latency

Unfortunately, at present AWS provides no mechanism to route to API Gateway in the region closest to the user.

The crux of the problem is that API Gateway is SSL only and requires the HTTP host header and the Server Name Indication (SNI) to match the hostname requested. Since you can only use a custom domain name in one region, each API region must have a unique custom domain name. The primary hostname used to load balance will mismatch the region chosen for many, if not all, users.

https://fast.trychameleon.com is the URL that serves the Javascript.

When the browser sends the request, it sends the Host header:

> Host: fast.trychameleon.com

It also sends the SNI with the same hostname. Even if we use fast.trychameleon.com in one region as the custom domain name, we can't use it again and the requests will fail in the other regions with mismatched host names.

In A Perfect World: Latency-Based DNS Routing

Here's an illustration of a solution that does not currently work (I'll explain why in a moment).

Ideally, we'd like to do something simple like this:

[Diagram: fast.trychameleon.com Route53/API Gateway]

If we could reuse the same custom domain, we would add one in each region for fast.trychameleon.com. The SSL cert for this hostname or a wildcard *.trychameleon.com would work in this scenario.

We would also set up a CNAME with latency routing for every region in Route53, where fast.trychameleon.com has a value of the CloudFront domain assigned to each API Gateway, like random1.cloudfront.net.

The DNS records might look something like this:

[Screenshot: fast.trychameleon.com latency DNS records]

Route53 would check latency and return an IP for the closest API Gateway Region. That region would have fast.trychameleon.com as a custom domain name, the Host header and SNI would match, and API Gateway would return content from the endpoint.

Why Routing Directly to API Gateway Doesn't Work

When you create a custom domain name for API Gateway, AWS creates a CloudFront distribution with the API in that region as an origin.

AWS does not allow a domain name to be associated with more than one CloudFront distribution.

Going back to our original scenario, you have to set different custom domain names for each of your API Gateway regions. If you try to add a domain name you've already used, you'll get this error:

The domain name you provided is already associated with an existing CloudFront distribution. Remove the domain name from the existing CloudFront distribution or use a different domain name. If you own this domain name and are not using it on an existing CloudFront distribution, please contact support.

Your browser will always send the Host header and SNI you requested, fast.trychameleon.com. So no matter what custom name you give your custom domains, if they are not the primary domain the customer is requesting, you will get a 403 Forbidden from CloudFront or an SSL handshake error:

< HTTP/1.1 403 Forbidden
< Date: Thu, 03 Nov 2016 22:57:43 GMT
< Content-Type: application/json
< Content-Length: 23
< Connection: close
< x-amzn-ErrorType: ForbiddenException
< x-amzn-RequestId: 1234-5678-9101112-1314
< X-Cache: Error from cloudfront
< Via: 1.1 12345678.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: 1234-5678==
* Closing connection 0

Other Attempts

We considered some other possibilities, but none met our criteria.

  • Create a CloudFront Distribution with the API Gateways in each region as origins

    CloudFront will choose the first origin that matches the behavior precedence. So you will always get a response from the same region, not the closest one. And you still have the SNI/host mismatch issue.

  • Redirect the user.

    Set up an endpoint that is cached by CloudFront and returns a 302 redirect to the URL of the API closest to the user. The cost is at least one more DNS lookup and an extra roundtrip.

  • Use one API as the origin to a new CloudFront distribution.

    We get the advantage of geolocation at the edges, but it requires caching responses.

None of these are ideal for speed and live data. So we arrived at a different solution.

The Solution

Ultimately we concluded that a proxy could send the right header and SNI information at the lowest cost. haproxy is ideal as a fast and highly memory- and CPU-efficient option.

The upside is that we get the benefits of geolocation and are guaranteed to serve the current version of customer data. The downside is that we aren't serverless anymore and have to maintain some instances with failover. This was an acceptable tradeoff to try and achieve responses under 200ms.

Here's our current architecture that follows the AWS complex DNS failover model:

[Diagram: fast.trychameleon.com using haproxy]

This only shows 2 regions but could be expanded to as many as needed. Here's how it works for a user in New York (us-east-1):

  1. fast.trychameleon.com is the initial DNS lookup.
  2. Route53 returns the resource record for the closest of the two latency routed hostnames: proxy-us-east-1-fast.trychameleon.com
  3. The next DNS lookup for proxy-us-east-1-fast.trychameleon.com will return one of the equally weighted haproxy instances at either proxy1-us-east-1-fast.trychameleon.com or proxy2-us-east-1-fast.trychameleon.com. The DNS records look like this:

[Screenshot: proxy-us-east-1-fast.trychameleon.com Route53 records]

Notice the health check for the instance value, which is critical to failover.

  4. proxy1-us-east-1-fast.trychameleon.com is a CNAME associated with the haproxy instance. That's where the browser will send the request.

If one haproxy instance is down, the health checks will cause DNS to failover to another instance in the region. If all haproxy instances are down in a region, the DNS latency check will failover to another region.
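The lookup and failover behavior described above can be modeled in a few lines of Python. This is only a simulation of the decision Route53 makes, not how Route53 is configured: the latencies and health flags are made up, and the us-west-2 region and its hostname are hypothetical stand-ins for the second region.

```python
import random

# Hypothetical topology following the naming scheme above. Latencies and
# health states are invented for illustration.
REGIONS = {
    "us-east-1": {
        "latency_ms": 20,
        "proxies": {
            "proxy1-us-east-1-fast.trychameleon.com": True,
            "proxy2-us-east-1-fast.trychameleon.com": True,
        },
    },
    "us-west-2": {
        "latency_ms": 80,
        "proxies": {
            "proxy1-us-west-2-fast.trychameleon.com": True,
        },
    },
}


def resolve(regions, rng=random):
    """Pick the lowest-latency region with a healthy proxy, then one healthy proxy.

    Returns None when no proxy anywhere passes its health check.
    """
    candidates = []
    for name, region in regions.items():
        healthy = sorted(host for host, ok in region["proxies"].items() if ok)
        if healthy:  # regions with no healthy proxy are skipped entirely
            candidates.append((region["latency_ms"], name, healthy))
    if not candidates:
        return None
    _, _, healthy = min(candidates)
    # Equal weighting between the healthy proxies in the chosen region.
    return rng.choice(healthy)
```

Flip a health flag to `False` and call `resolve(REGIONS)` again to see traffic move to the other proxy, then to the other region once a whole region is unhealthy.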

Each haproxy instance in a region is configured to send requests to the same backend, and to modify the headers accordingly. This snippet from the haproxy.cfg ansible template sends the correct information to the API Gateway:

frontend https
    bind *:443 ssl crt /path/to/ssl/certificate.pem
    mode http
    http-request deny if !{ ssl_fc }
    option forwardfor
    default_backend api_gateway

backend api_gateway
    mode http
    option httpclose

    http-request replace-value Host .* {{ backend }}
    server apig1 {{ backend }} ssl sni str({{ backend }})

{{ backend }} is dynamically replaced by Ansible when the config is deployed. The last two lines of the backend section send the API Gateway the custom domain name it expects, so {{ backend }} becomes us-east-1-fast.trychameleon.com in this case:

    http-request replace-value Host .* us-east-1-fast.trychameleon.com
    server apig1 us-east-1-fast.trychameleon.com ssl sni str(us-east-1-fast.trychameleon.com)

Important note: Keep all components under the same parent domain so that wildcard SSL certs will work. All of these hosts are in the trychameleon.com domain.

With a basic Lambda function, we can now achieve responses under 200ms and in many cases under 50ms.

How AWS Can Fix It

If AWS allowed the same CNAME in multiple API Gateway regions, we could use the original latency-based routing outlined in the first section above. According to AWS forum threads like this one, it's on the roadmap.

Either way, hopefully this post has solutions that you can use in your systems, particularly if you are forced to use SSL for haproxy backend servers that require SNI.

<![CDATA[A Quick Guide To Securing Your Digital Life]]>

https://blog.davestern.com/a-quick-guide-to-securing-your-digital-life/
Tue, 22 Apr 2014 23:38:24 GMT

If you've heard all the latest news about stolen passwords and identity theft, you probably want to feel more secure, but you don't know where to start. This post is intended to give you a quick step-by-step guide to securing and backing up your valuable data. While this is not a comprehensive tutorial, I have added explanations to help clarify some of the suggestions. I'll link to tutorials wherever possible but in the interest of a concise guide, the main intent is to tell you what you need to do and only get you started on how to do it. This is also heavily biased towards Mac users.

Let's quickly look at what we want to protect and the types of risk:

  • Financial information that can be used to steal your money: bank accounts and passwords, credit card numbers, balances, passcodes, security questions.
  • Personal data that can be used to steal your identity (and therefore your money): social security number, home address, birthdate, mother's maiden name, former addresses, phone number. Consider how your bank verifies your identity when you call them with a question.
  • Private data that can be used to embarrass you, blackmail you or cripple your business: Personal or intimate email and chat conversations, confidential business data or transactions (like details of a negotiation), intellectual property (like a novel in the works), legal information.
  • Social Media that can be used to hurt or ruin your reputation: Posting to your Facebook or Twitter account posing as you.
  • Data Access that can be used to delete all your photos, email, work, music and anything important to you.

Here's how to start protecting yourself against all that nastiness. Just follow the steps below and read the explanations if you'd like more detail.

Secure Your Physical Devices

Steps to follow:

  1. Set a passcode on every mobile device and be smart about it:
    - No easy codes like four numbers repeated: 1111, 2222, etc.
    - No patterns like 2580 (look at it on your phone)
  2. Encrypt the hard drive of all your computers.
    - Mac Users: Enable FileVault full disk encryption. Store the recovery key somewhere safe away from your computer.
  3. Require a password to log in to your computer on startup and after waking from sleep. When FileVault is turned on, automatic login is disabled.


If your phone, tablet or computer is lost or stolen, the data is vulnerable. Unencrypted drives can be removed and read easily, so start by using passcodes (mobile) and encryption (laptops and desktops) to secure the hardware. It's important to require a password to wake your computer from sleep since it can be stolen while powered on.
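
If you want to sanity-check a passcode against the rules in step 1, here is a rough Python sketch. It only looks for the weaknesses mentioned above (repeated digits and straight lines on the keypad like 2580) plus simple counting sequences like 1234, which is my own addition. The function name is hypothetical, and passing this check does not make a passcode strong.

```python
# Row and column of each digit on a standard phone keypad.
KEYPAD = {
    "1": (0, 0), "2": (0, 1), "3": (0, 2),
    "4": (1, 0), "5": (1, 1), "6": (1, 2),
    "7": (2, 0), "8": (2, 1), "9": (2, 2),
    "0": (3, 1),
}


def is_weak_passcode(code: str) -> bool:
    """Return True for passcodes that are obviously easy to guess.

    Anything shorter than four digits or containing non-digits is
    treated as weak.
    """
    if not (code.isascii() and code.isdigit()) or len(code) < 4:
        return True

    # Repeated digits (1111) or counting up or down (1234, 4321).
    digits = [int(c) for c in code]
    differences = {b - a for a, b in zip(digits, digits[1:])}
    if differences in ({0}, {1}, {-1}):
        return True

    # Straight line on the keypad: every move is the same step (2580).
    steps = {
        (KEYPAD[b][0] - KEYPAD[a][0], KEYPAD[b][1] - KEYPAD[a][1])
        for a, b in zip(code, code[1:])
    }
    return len(steps) == 1


for candidate in ("1111", "2580", "1234", "7294"):
    print(candidate, "weak" if is_weak_passcode(candidate) else "ok")
```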

Lock Down All Your Email Accounts

Steps to follow:

  1. Change your email password to be a long string with random characters.
  2. Turn on 2-step verification in Gmail.
  3. If your email service doesn't offer two-factor authentication, consider switching, or use a very strong password and SSL/TLS to encrypt your connections to email.


The first thing hackers will do is try to get access to your email so they can send Forgot Password? requests to all the sites of which you are a member. This happened to a prominent Wired editor and it was hell.

You will hear a lot about Multi-factor Authentication (MFA) aka Two-factor Authentication (TFA) aka 2-step verification. All of these mean the same thing: another code (a "factor") is required in addition to your password to log in to your email.

When MFA is enabled, after you enter your password the site will request a special code. That code is either provided by an app like Google Authenticator or sent via SMS to your phone. So if an attacker gets your password, they would also need your phone (unlocked) in order to use the app or receive the SMS message, enter the second code and log in to your account.
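
For the curious, here is roughly where the app's code comes from. Authenticator apps like Google Authenticator generally use the time-based one-time password scheme (TOTP, RFC 6238): your phone and the site share a secret when you scan the setup QR code, and both sides derive a short code from that secret and the current time. The Python sketch below uses only the standard library. The function name and defaults are my own choices, and the secret is the published RFC test value, not one you should ever use.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None,
         digits: int = 6, step: int = 30) -> str:
    """Compute a time-based one-time code from a base32 shared secret.

    `at` is a Unix timestamp; it defaults to the current time.
    The secret must be correctly padded base32.
    """
    key = base64.b32decode(secret_b32.upper())
    now = time.time() if at is None else at
    # The counter is the number of `step`-second windows since 1970.
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick an offset.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# Test secret from RFC 6238, base32-encoded for the function above.
demo_secret = base64.b32encode(b"12345678901234567890").decode("ascii")
print(totp(demo_secret))
```

Because the code depends on the clock, it changes every 30 seconds, which is why a stolen password alone is not enough to log in.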

Email is the one password that you might want to actually remember (instead of using a random string as recommended later). If you do, make it a long password with a mixture of numbers, letters and special characters. If you have MFA enabled, you'll want to be able to get into your email if you don't have your password manager available.

Set up a Password Manager and Change To Stronger Passwords

Steps to follow:

  1. Buy a password manager like 1Password and follow the help guides. A good initial strategy is to start storing the passwords as you go about your normal routine so you can get comfortable using the app.
  2. Learn how to sync your passwords to your mobile device and to the cloud so they are available to you when away from your main computer.
  3. Once you have your main passwords recorded and synced to multiple devices, and you are used to working the new app, start changing your passwords to long, random strings.
  4. Use the password manager to store random answers to security questions. Don't answer with real information anymore, just random words that you can look up.
  5. When you change your passwords, turn on Multi-factor Authentication for sites that support it like Apple ID, Dropbox, Facebook, and of course Gmail. There's a great Lifehacker article with a more comprehensive list.


Let me emphasize again that human beings are exceedingly terrible at choosing passwords, even when they choose relatively strong passwords.

Any password you can think of has a 60-90% chance of being cracked. Let a computer make one for you; computers are much better at it.

Good password managers store passwords as you log into sites, help you generate strong passwords, and only decrypt the data on your computer so nothing is exposed "in the cloud".
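
Your password manager will do this for you, but to show what "let a computer make one" means, here is a minimal Python sketch using the standard library's `secrets` module, which is designed for security-sensitive randomness. The function names, the 24-character default and the 12-character minimum are my own assumptions, not rules from any particular password manager.

```python
import math
import secrets
import string

# Letters, digits and punctuation: 94 possible characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24) -> str:
    """Return a random password drawn uniformly from ALPHABET."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int) -> float:
    """Bits of randomness in a password of this length from ALPHABET."""
    return length * math.log2(len(ALPHABET))


password = generate_password()
print(password)
print(round(entropy_bits(len(password))), "bits of entropy")
```

Some sites reject certain punctuation characters. In that case, trimming the alphabet and making the password a little longer keeps it just as hard to guess.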

While I strongly advise using a password manager, I will add the warning that they do introduce some extra work. This is offset by the convenience of only remembering one password and just clicking to log in, but managing and syncing passwords as they change takes a bit more time. If you choose not to use a password manager, at least consider using "password tiers": one password for your email, one password for all financial sites, one password for all social media, one password for everything else. This is not a great strategy, but if someone gets your Facebook password, at least they can't log in to your bank account with it.

Hat tip to the very wise @gscottolson for his advice on using password managers effectively.

Back Up Your Data Locally and Remotely

Steps to follow:

  1. Buy a cheap, portable external USB drive of at least 1 TB.
  2. When you first use the drive with Time Machine, be sure to encrypt it. This will require a password every time you use it, but if you lose the drive, the data is secure.
  3. For remote backup, use a service like Backblaze, Mozy, Carbonite, or Crashplan.


You need to back up your data regularly. I prefer to use a local, offline backup (Time Machine) and an online backup service. Local backups are very useful because they are fast for both backup and restore. Online backup services are great because if something happens to both your computer and your local backup, you can restore from the remote service. Consider an unfortunate event like a fire or burglary where all the devices in your home are destroyed or stolen. With remote backups, you can buy a new computer, log in to the service and download all your data within a relatively short period of time.

Additionally, Time Machine and most remote services back up your entire drive (or at least all the important data, skipping things you don't need like applications), so you don't need to think about what to back up. It's all just available to you.

Best Practices

Steps to follow:

  1. Don't send anything private over email: passwords, credit cards, SSN, etc.
  2. Don't post anything anywhere unless you are comfortable with it being publicly visible for a long, long time.
  3. Earlier I recommended using random words for security questions. I also recommend choosing a particular random word that you always use when asked for your mother's maiden name. If it's compromised, you can always change it.
  4. While I strongly recommend using a password manager and random text strings for passwords, if you are going to remember any of your passwords, remember the ones for your email (with MFA enabled), your password manager (obviously critical) and your backup drive. If anything were to happen to all your devices, you could at least get into your email and start sending your own Forgot Password requests to get back into your accounts.


Don't ever assume privacy online. Don't send or post anything you want to keep private. Gmail gives you tons of space and never deletes anything by default. Public Facebook, Twitter and Instagram posts are routinely stored by other companies. You also don't know how long recipients keep their email or what they do with it.

Email is not guaranteed to be encrypted as it crosses the internet. Emails and chat messages travel through many networks that may be able to see what you send on the way to its destination. Don't send anything you don't want sniffed along the way.

In general, any time you are asked for information like mother's maiden name, ask if you can provide a passcode instead. If not, use a phrase that you can change later. Information like this is discoverable by various methods, but it's very difficult to figure out a random word you made up.

If you have to send a password or information to someone, send it via a separate, encrypted medium with no context. Don't email it with the subject "Here's my credit card number". Better yet, just call them.

Let me emphasize this point again: Anything you put online lives forever. Remember that when you post anything about you, your friends, or your kids. It will be searchable for decades to come.

Bottom Line

Following the steps in this guide will greatly reduce your chances of being a target of data or identity theft. They will also ensure that should something happen to your data, you have options to recover it. I sincerely hope this guide makes your life easier and more secure. Please leave any feedback in the comments below.