suns system design

How domain data is processed.

Group ID

When a domain is symmetrical on its own (like a, for palindrome), it is in a group with just itself. When a domain is symmetrical with another domain (like e, for mirrornames), it shares a group with that domain.

In either case, we create a group ID that incorporates:

The algorithm for this is defined in CalculateV1() in symval/internal/service/groupid/groupid.go:

// CalculateV1 generates a group ID by hashing owner + all hostnames
// The result is formatted as: idversion:type:base64(sha256(owner)):base64(sha256(sort(hostnames))).
func (s *Service) CalculateV1(owner, gtype string, hostnames []string) (string, error) {
  // ...
}
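
As a rough illustration of the format in the comment above, here is a minimal sketch, not the real CalculateV1. The name sketchGroupID, the "v1" version tag, the padded standard base64 encoding, and the newline separator used before hashing the hostnames are all assumptions; the actual choices live in groupid.go.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// idVersion is a hypothetical version tag, assumed for this sketch.
const idVersion = "v1"

// hashB64 returns base64(sha256(s)). Padded standard base64 is an assumption.
func hashB64(s string) string {
	sum := sha256.Sum256([]byte(s))
	return base64.StdEncoding.EncodeToString(sum[:])
}

// sketchGroupID builds
// idversion:type:base64(sha256(owner)):base64(sha256(sort(hostnames))).
func sketchGroupID(owner, gtype string, hostnames []string) (string, error) {
	if owner == "" || gtype == "" || len(hostnames) == 0 {
		return "", errors.New("owner, type, and at least one hostname are required")
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]string(nil), hostnames...)
	sort.Strings(sorted)
	// Joining with "\n" is an assumption; it cannot appear in a hostname.
	hostHash := hashB64(strings.Join(sorted, "\n"))
	return strings.Join([]string{idVersion, gtype, hashB64(owner), hostHash}, ":"), nil
}

func main() {
	a, _ := sketchGroupID("alice", "e", []string{"b.example", "a.example"})
	b, _ := sketchGroupID("alice", "e", []string{"a.example", "b.example"})
	fmt.Println(a == b) // hostname order does not affect the ID
}
```

Sorting before hashing is what lets both domains in a two-domain group arrive at the same ID, whichever one is submitted first.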

Implementation notes:

Dynamo design

Conceptually, we need to track:

These are stored in DynamoDB.

Data access patterns

We plan to show all owners on the homepage with a list of all their domain groups. This means that every web request for the homepage will read every key in the table.

Later we might paginate this, or show owners’ domain groups on owner-specific pages.

Data storage implementation

Because we need to read every key in the table, it’s more efficient to update a JSON file in S3 on every change to Dynamo than it is to read data out of Dynamo more than once.

Each change is written to DynamoDB, then sent through DynamoDB Streams to a Lambda worker, which builds a JSON file and uploads it to S3.

The Lambda worker may read the entire Dynamo table every time, or read the current file from S3, apply just the stream change, and write the result back to S3, depending on how much each option costs. Streams are “at least once”, so the builder must be able to tell whether a record has already been applied. The Lambda worker is not concurrent, but this is OK because Dynamo provides our backpressure. It should write to a temp key and then PUT to the final key; S3 overwrites are atomic and strongly consistent, so readers won’t see a partial file. In effect, this uses Dynamo as a way to accept concurrent writes, sort of like a queue.
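
One way the builder could tolerate “at least once” delivery is to make applying a change idempotent. The sketch below is an assumption-laden illustration, not the actual builder. The Record, Change, and Snapshot types are hypothetical, and it relies on the monotonic Rev field described under Concurrency to recognise a redelivered event.

```go
package main

import "fmt"

// Key identifies one row; both fields are hypothetical names.
type Key struct {
	GroupID  string
	Hostname string
}

// Record is a hypothetical row shape; only Rev is named in this document.
type Record struct {
	Key Key
	Rev int64
}

// Change is a simplified stand-in for one DynamoDB Streams event.
type Change struct {
	Remove bool // true for a REMOVE event, false for INSERT or MODIFY
	Rec    Record
}

// Snapshot is the in-memory form of the JSON file.
type Snapshot map[Key]Record

// Apply folds one change into the snapshot and reports whether the
// snapshot changed. A redelivered event is a no-op: an upsert whose Rev
// is not newer than the stored one is skipped, and removing a missing
// key does nothing. This sketch keeps no tombstones, so it relies on the
// stream delivering changes for a given item in order.
func (s Snapshot) Apply(c Change) bool {
	cur, ok := s[c.Rec.Key]
	if c.Remove {
		if !ok {
			return false
		}
		delete(s, c.Rec.Key)
		return true
	}
	if ok && cur.Rev >= c.Rec.Rev {
		return false
	}
	s[c.Rec.Key] = c.Rec
	return true
}

func main() {
	snap := Snapshot{}
	ins := Change{Rec: Record{Key: Key{"group-1", "a.example"}, Rev: 1}}
	fmt.Println(snap.Apply(ins)) // first delivery changes the snapshot
	fmt.Println(snap.Apply(ins)) // duplicate delivery is ignored
}
```

When Apply reports no change, the builder can skip the S3 upload entirely.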

Nothing but the builder Lambda ever reads from Dynamo; the web client and the scheduled validation Lambda just read from the JSON file in S3. This means we aren’t worried about primary/secondary keys in Dynamo.

The Dynamo key needs to identify a record by owner + hostname + type + group ID. We could make a new PK (partition key) concatenating that information, with no SK (sort key). But the group ID already incorporates the type and a hash of the owner, so we can simplify this by using the group ID as the PK and the hostname as the SK.
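
The key layout described above can be sketched as follows. The ItemKey type and the keysForGroup helper are hypothetical names for illustration, and the group ID in main is a truncated placeholder rather than a real value.

```go
package main

import "fmt"

// ItemKey is the primary key for one domain row (hypothetical type).
type ItemKey struct {
	PK string // partition key: the group ID
	SK string // sort key: the hostname
}

// keysForGroup returns one key per hostname, all sharing the same partition.
func keysForGroup(groupID string, hostnames []string) []ItemKey {
	keys := make([]ItemKey, 0, len(hostnames))
	for _, h := range hostnames {
		keys = append(keys, ItemKey{PK: groupID, SK: h})
	}
	return keys
}

func main() {
	// "v1:e:..." stands in for a real group ID.
	for _, k := range keysForGroup("v1:e:...", []string{"a.example", "b.example"}) {
		fmt.Printf("PK=%s SK=%s\n", k.PK, k.SK)
	}
}
```

Both domains of a two-domain group land in the same partition, distinguished only by hostname.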

Concurrency

User requests go through the httpapi Lambda which saves data to DynamoDB.

DynamoDB streams events to a streamer Lambda which saves each changed record to a JSON file in S3. This Lambda has reservedConcurrentExecutions: 1 to only allow one to run at a time, which acts as a lock on writes to the JSON file. This Lambda is the only writer to the JSON file.

There is a reattestbatch Lambda that runs every day and re-attests every record in the JSON file, updating Dynamo with a new validation time or deleting records that fail attestation. (There is a grace period to prevent intermittent errors from removing records that are actually valid.)

Aside from the Lambdas, the browser retrieves the JSON file when a user visits the website.

We use a monotonic Rev field in our data model to prevent concurrency bugs. When we are making changes to a record in Dynamo, we use ConditionExpression: rev = :snapshotRev to ensure that the change fails if an update has been made to Dynamo since our last snapshot of the table.

Dynamo storage costs

Invalid state

I want to make invalid state unrepresentable as much as possible.

Unrepresentable invalid state:

Representable invalid state

DNS claims

Each domain requires a special TXT record, _suns.

These records provide claims that the domain is part of the group, but they don’t verify the claims.
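
The lookup step could be sketched roughly as below. It assumes the record lives at the name "_suns." followed by the domain, which this document does not confirm, and it returns the raw TXT strings without interpreting them. The TXTLookup type and lookupClaims function are hypothetical; the standard library's net.LookupTXT has a matching signature and could be passed in for real queries.

```go
package main

import (
	"errors"
	"fmt"
)

// TXTLookup resolves TXT records for a name. net.LookupTXT fits this type.
type TXTLookup func(name string) ([]string, error)

// lookupClaims fetches the raw claim strings for one domain. It does not
// verify them; that is left to the later checking and validation steps.
func lookupClaims(lookup TXTLookup, domain string) ([]string, error) {
	if domain == "" {
		return nil, errors.New("domain is required")
	}
	// The "_suns." prefix placement is an assumption.
	name := "_suns." + domain
	records, err := lookup(name)
	if err != nil {
		return nil, fmt.Errorf("looking up %s: %w", name, err)
	}
	return records, nil
}

func main() {
	// A stub resolver keeps the example runnable without network access.
	// The returned value is a placeholder, not a real record format.
	stub := func(name string) ([]string, error) {
		return []string{"placeholder-claim for " + name}, nil
	}
	claims, err := lookupClaims(stub, "example.com")
	fmt.Println(claims, err)
}
```

Taking the resolver as a parameter keeps the lookup testable and leaves verification as a separate concern.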

Consistency checking

Consistency checking confirms that

Group validation

Group validation takes DNS claims which have been individually looked up and consistency checked, and validates that they make sense as a whole.

Attestation

Attestation ties all of these together.

Basically a “Lookup + Validate” flow.