Asset Delivery and Analytics

15 Jan

It's easier than ever to design and run web-scale platforms at a fraction of what they would have cost decades ago; however, designing for scale with new technologies sometimes requires a mindset shift to unleash the full potential of the cloud.

In this article, we'll discuss some core architectural challenges in designing an asset distribution platform with a basic analytics backend.

Let's start with a fictitious scenario:

A gaming company, FooBar Mobile, creates white-label mobile games that are resold to businesses. The games are distributed to end users inside mobile apps and progressive web apps (PWAs). Currently, all games are bundled and shipped to the reseller; however, as the game portfolio has grown, these bundles have become larger and larger.
FooBar Mobile wants to explore a solution in which assets are delivered to gamers just-in-time, reducing the initial installation size of the game launcher.

A platform's requirements can stem from business or technical concerns. It's always worth critiquing requirements as part of the design and development process: requirements rarely come with a clear priority, and focus can drift toward particular requirements depending on the experience of the designer and implementer. For this platform, the requirements are as follows:

A - The platform SHOULD have near-100% uptime (any outage will cause game launch failures for players, costing resellers money)

B - The platform MUST record analytical data on the usage of game assets (examples include Game Name, Game Version, SDK Version)

C - The API to fetch assets MUST provide a lookup endpoint, allowing for future backend refactoring without having to update client libraries/SDKs

D - The API MUST include endpoints to query data for analytical and operational decision-making
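As a rough illustration of how requirements B and D relate, the sketch below models each asset request as a record holding the fields named in B, then aggregates those records the way a query endpoint for D might. The `AssetRequest` type, the two aggregation functions, and the sample game names are hypothetical, and an in-memory list stands in for whatever data store the platform would actually use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRequest:
    """One analytics record per asset request (hypothetical schema for requirement B)."""
    game_name: str
    game_version: str
    sdk_version: str


def usage_by_game_version(events: list[AssetRequest]) -> dict[tuple[str, str], int]:
    """Count requests per (game, version), as a query endpoint for requirement D might."""
    return dict(Counter((e.game_name, e.game_version) for e in events))


def usage_by_sdk_version(events: list[AssetRequest]) -> dict[str, int]:
    """Count requests per SDK version, e.g. to judge when an old SDK can be retired."""
    return dict(Counter(e.sdk_version for e in events))


# Hypothetical sample data; an in-memory list stands in for the analytics store.
events = [
    AssetRequest("space-race", "1.2.0", "3.1"),
    AssetRequest("space-race", "1.2.0", "3.0"),
    AssetRequest("puzzle-pop", "0.9.1", "3.1"),
]
print(usage_by_game_version(events))
print(usage_by_sdk_version(events))
```

Keeping the raw per-request fields (rather than only pre-computed totals) leaves room to answer questions that were not anticipated when the schema was designed.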

The Initial Design

With the background and requirements set, we can start on a design. While it might be tempting to jump straight into implementing an MVP, it's well worth taking a step back and looking more closely at what is actually required.

Before putting too much effort into design constraints, note down the simplest solution, so you can identify friction or ambiguity in the defined requirements.

By now, you've probably thought of a few components that could meet the given requirements using traditional API design techniques. The design might look something like the following.

In this design, the mobile app first makes a request to the API at https://api.foobar.com/v1/<game>/<version>, optionally including additional parameters, such as the user agent, for analytics. The API then records the request in a database for analytical purposes (e.g. by incrementing a counter) and returns a JSON payload containing a CDN URL (for example, https://assets.foobar.com/v1/<game>/<version>/assets.bundle).
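The request flow described above can be sketched as a single handler. This is a minimal, framework-free illustration assuming the path layout and CDN URL from the examples in the text; `handle_lookup`, `request_counts`, and keying counts by user agent are hypothetical choices, and a `Counter` stands in for the analytics database. Note that the analytics write sits directly in the request path.

```python
import json
from collections import Counter

# CDN base URL taken from the example in the text.
CDN_BASE = "https://assets.foobar.com/v1"

# Hypothetical in-memory stand-in for the analytics database.
request_counts: Counter = Counter()


def handle_lookup(path: str, user_agent: str = "unknown") -> tuple[int, str]:
    """Handle GET /v1/<game>/<version>: record the request, then return the CDN URL.

    Returns an (HTTP status, JSON body) pair.
    """
    parts = path.strip("/").split("/")
    # Expect exactly ["v1", game, version] with non-empty game and version.
    if len(parts) != 3 or parts[0] != "v1" or not all(parts[1:]):
        return 404, json.dumps({"error": "not found"})
    _, game, version = parts

    # The analytics write happens before the response is returned, so if this
    # store is down or slow, the game launch is affected as well.
    request_counts[(game, version, user_agent)] += 1

    # The client only ever sees this URL, so the asset location can change
    # without updating client libraries/SDKs (requirement C).
    body = {"url": f"{CDN_BASE}/{game}/{version}/assets.bundle"}
    return 200, json.dumps(body)


status, body = handle_lookup("/v1/space-race/1.2.0", user_agent="FooBarSDK/3.1")
print(status, body)
```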

The flow described above satisfies the following requirements:

B - The platform MUST record analytical data on the usage of game assets (examples include Game Name, Game Version, SDK Version)

C - The API to fetch assets MUST provide a lookup endpoint, allowing for future backend refactoring without having to update client libraries/SDKs

However, without significant engineering effort, the above design will quickly struggle to meet perhaps the most critical requirement:

A - The platform SHOULD have near-100% uptime (any outage will cause game launch failures for players, costing resellers money)

Let's consider some of the ways requests could be interrupted with this approach:

The API goes down - a bad code push, a faulty load balancer node, a failed instance, or a regional outage can disrupt user access
The DB goes down - with RDS Proxy, failover downtime can be minimal; however, every request still depends on the DB, because the analytical information is saved as part of serving the request
Failure to scale - as the API is now in the critical path, both the API and the DB must scale with demand for games to load successfully
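One way to quantify why having both the API and the database in the critical path hurts: assuming failures are independent, the availability of a chain of components that must all be up is the product of their individual availabilities. The figures below are illustrative placeholders, not published SLAs or measurements for any AWS service.

```python
from math import prod

# Illustrative availabilities only (hypothetical), not published SLAs.
components = {
    "api": 0.999,
    "database": 0.999,
    "cdn": 0.9999,
}


def serial_availability(availabilities: list[float]) -> float:
    """Every component must be up for a game launch to succeed (independent failures)."""
    return prod(availabilities)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of unavailability in a 365-day year."""
    return (1 - availability) * 365 * 24 * 60


overall = serial_availability(list(components.values()))
print(f"overall availability: {overall:.4%}")
print(f"expected downtime: {downtime_minutes_per_year(overall):.0f} minutes/year")
```

The combined figure is always lower than that of the weakest component, so every extra service placed in the critical path makes near-100% uptime harder to reach.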

Common engineering solutions to these issues include:

Canary releases (validating the deployment and gradually shifting traffic to it)
Multi-AZ and multi-region deployments
Autoscaling
Serverless, auto-scaling databases (e.g. DynamoDB or Aurora Serverless v2)
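To make the first item more concrete, the sketch below shows the basic mechanics of a canary release: a small, deterministic slice of traffic goes to the new version, and the rollout is promoted or rolled back based on the canary's observed error rate. The function names, the 100-bucket scheme, the 5% weight, and the 1% error-rate threshold are all hypothetical; in practice, managed tooling such as weighted routing in a load balancer or deployment service would usually perform these steps.

```python
import hashlib


def routes_to_canary(request_id: str, canary_percent: int) -> bool:
    """Deterministically place a request in one of 100 buckets; low buckets go to the canary."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent


def canary_decision(errors: int, requests: int, max_error_rate: float = 0.01) -> str:
    """Promote the new version only if its error rate stays within the (hypothetical) threshold."""
    if requests == 0:
        return "wait"  # no canary traffic observed yet
    return "promote" if errors / requests <= max_error_rate else "rollback"


ids = [f"req-{i}" for i in range(10_000)]
canary_share = sum(routes_to_canary(r, canary_percent=5) for r in ids) / len(ids)
print(f"canary share: {canary_share:.1%}")
print(canary_decision(errors=3, requests=500))
print(canary_decision(errors=20, requests=500))
```

Hashing an identifier, rather than picking randomly, keeps a given request (or, if a user ID is hashed instead, a given player) on the same version for the duration of the canary.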

All of these solutions help meet the availability requirement, but they will undoubtedly tie up several engineers and require a significant budget to achieve truly near-100% uptime. Here is an example of such a solution, which we will discuss further at the end of the article.

[Figure: Cross-regional deployments can quickly become complex, especially as requirements change]

Marty Sweet

T-Shaped Full Stack Cloud Developer with 5 AWS Certifications (SA, SysOps, Dev, SA Pro, DevOps). Always looking for a challenging problem to solve! 10 years of hands-on technical experience in a breadth of IT domains, including networking, Linux, programming, virtualisation and web applications. Strong academic background with a best-in-class BSc degree in Computer Science with Industrial Year from the University of Reading.

https://www.linkedin.com/in/martysweet/