Statistics in a Large Scale Dropwizard Application

I have been spending time thinking about how a basic statistics package would need to change as load and system complexity increase.

I don’t want to “just” query Postgres for every statistics request: that approach won’t scale, and it introduces a future bottleneck (for me, a gotcha just waiting to happen).

So I have been learning how Redis can be used as the basis for storing statistics.

The diagram shows three steps:

  • Query Postgres once to get some baseline data into Redis. This can be represented as a command (see the sketch after this list).
  • Write a job which runs on a schedule and processes the waiting updates, which are stored in Redis. The updates are “fed” by the regular Dropwizard (CRUD) controller(s): whenever a widget is added or removed, the controller sends a message to the job by writing to a Redis message hash. A semaphore guards the hash so the writers and the job cannot deadlock.
    • The second part of the job (run only when the semaphore is clear) reads from the message hash, updates the statistics hash, and removes each processed message hash key. Sketches of both halves follow the list.
  • The final part of the puzzle is a regular statistics page, which uses Redis as its data source (also sketched below).
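To make the first step concrete, here is a minimal sketch of the bootstrap command’s core. The widgets table, its widget_type column, the connection details, and the stats:widgets hash name are all hypothetical; a real Dropwizard command would extend ConfiguredCommand and read these from the configuration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import redis.clients.jedis.Jedis;

// One-off bootstrap: copy baseline counts from Postgres into a Redis hash.
public class SeedStatisticsCommand {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/app", "app", "secret");
             Jedis redis = new Jedis("localhost", 6379);
             Statement stmt = db.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT widget_type, COUNT(*) FROM widgets GROUP BY widget_type")) {
            while (rs.next()) {
                // stats:widgets is the hash the statistics page will read from
                redis.hset("stats:widgets", rs.getString(1), String.valueOf(rs.getLong(2)));
            }
        }
    }
}
```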
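On the write side, the CRUD controllers only need to record each change in the message hash. A hypothetical helper (the stats:pending hash name and the type:delta message encoding are my own choices, not fixed by the design):

```java
import java.util.UUID;
import redis.clients.jedis.Jedis;

// Called from the CRUD controllers whenever a widget is created or deleted.
public class StatisticsPublisher {
    private final Jedis redis;

    public StatisticsPublisher(Jedis redis) {
        this.redis = redis;
    }

    public void widgetAdded(String widgetType)   { push(widgetType, 1); }
    public void widgetRemoved(String widgetType) { push(widgetType, -1); }

    private void push(String widgetType, int delta) {
        // Each pending update gets its own field in the message hash, so the
        // job can delete it individually once it has been processed.
        redis.hset("stats:pending", UUID.randomUUID().toString(), widgetType + ":" + delta);
    }
}
```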
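The scheduled job then drains the message hash into the statistics hash. Here the “semaphore” is sketched as a Redis lock key set with NX and an expiry, so a crashed run releases the lock automatically rather than blocking the queue (key names, again, are hypothetical):

```java
import java.util.Map;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

// Scheduled job: apply pending updates to the statistics hash.
public class StatisticsJob implements Runnable {
    private final Jedis redis;

    public StatisticsJob(Jedis redis) {
        this.redis = redis;
    }

    @Override
    public void run() {
        // The "semaphore": only one run may process at a time. The 30s expiry
        // means a crashed run cannot hold the lock (and deadlock the queue).
        if (!"OK".equals(redis.set("stats:lock", "1", SetParams.setParams().nx().ex(30)))) {
            return;
        }
        try {
            for (Map.Entry<String, String> msg : redis.hgetAll("stats:pending").entrySet()) {
                String[] parts = msg.getValue().split(":");  // "widgetType:delta"
                redis.hincrBy("stats:widgets", parts[0], Long.parseLong(parts[1]));
                redis.hdel("stats:pending", msg.getKey());   // remove the processed key
            }
        } finally {
            redis.del("stats:lock");
        }
    }
}
```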
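Finally, the statistics page can be an ordinary Dropwizard (JAX-RS) resource that reads only from Redis and never touches Postgres. A minimal sketch, assuming the same stats:widgets hash:

```java
import java.util.Map;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import redis.clients.jedis.Jedis;

@Path("/statistics")
@Produces(MediaType.APPLICATION_JSON)
public class StatisticsResource {
    private final Jedis redis;

    public StatisticsResource(Jedis redis) {
        this.redis = redis;
    }

    @GET
    public Map<String, String> widgetCounts() {
        // The whole page is a single HGETALL; Postgres is never queried.
        return redis.hgetAll("stats:widgets");
    }
}
```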

[Diagram: design]

The advantages of using Redis include less load on the database, the ability to use Redis as a message queue (for resiliency), and access to other Redis features (like publish/subscribe).
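As one example of the latter, the job could publish a notification after applying updates, and a dashboard could subscribe instead of polling. A hypothetical sketch (the stats:updated channel is my own invention):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class StatsNotifications {
    // Producer side: the job calls this after applying a batch of updates.
    public static void announce(Jedis redis) {
        redis.publish("stats:updated", String.valueOf(System.currentTimeMillis()));
    }

    // Consumer side: blocks, invoking onMessage for each notification.
    public static void listen() {
        try (Jedis subscriber = new Jedis("localhost", 6379)) {
            subscriber.subscribe(new JedisPubSub() {
                @Override
                public void onMessage(String channel, String message) {
                    System.out.println("statistics changed at " + message);
                }
            }, "stats:updated");
        }
    }
}
```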

There is more to be done, but implementing this design would provide a good foundation for statistics.