I attended the recent OpenCRVS technical committee meeting. It was interesting to hear where the project is and where it’s heading. Unfortunately I couldn’t stick around to ask the questions that came up for me during the presentation, so I’m writing them down here instead.
Elasticsearch / Kibana usage. Is the project using the free license tier for these? Have you hit any limitations of the free license? If so, have you considered any alternatives, or is ES / Kibana serving the needs of the project for now?
Infrastructure migrations. During the presentation some smaller and larger migration needs were discussed (alternative orchestrators, updating dependencies, etc.). This got me thinking: what exactly is the responsibility of the OpenCRVS project itself? Is it just the software, or is the project also responsible for deployments and managing them?
If OpenCRVS is also responsible for some or all deployments, do you have service level agreements for those? Are zero-downtime deployments necessary, or can these migrations be done with downtime? Zero-downtime deployments and migrations can be quite complicated to achieve, depending on the changes involved.
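To illustrate why zero-downtime migrations take extra care: a common approach is the expand/contract pattern, where a schema change is split into backwards-compatible steps so that old and new application versions can run side by side during a rollout. A minimal sketch with SQLite (the table and column names are made up for illustration):

```python
import sqlite3

# Hypothetical starting point: a single `name` column we want to split.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (name) VALUES ('Ada Lovelace')")

# Step 1 (expand): add new nullable columns instead of renaming in place,
# so the old application version keeps working while the new one rolls out.
conn.execute("ALTER TABLE person ADD COLUMN first_name TEXT")
conn.execute("ALTER TABLE person ADD COLUMN last_name TEXT")

# Step 2 (backfill): populate the new columns from the old one.
for pid, name in conn.execute("SELECT id, name FROM person").fetchall():
    first, _, last = name.partition(" ")
    conn.execute(
        "UPDATE person SET first_name = ?, last_name = ? WHERE id = ?",
        (first, last, pid),
    )

# Step 3 (contract) happens only after every running instance has switched
# to the new columns: stop writing, then drop, the old `name` column.
row = conn.execute("SELECT name, first_name, last_name FROM person").fetchone()
print(row)  # → ('Ada Lovelace', 'Ada', 'Lovelace')
```

A downtime-tolerant migration can collapse all three steps into one deploy; the zero-downtime version has to sequence them across releases, which is where the extra complexity comes from.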
Quality gates and performance testing. I noticed performance testing was only mentioned in connection with the release quality gate. Is there no performance testing (automated or otherwise) in the earlier quality checks? Is the release-gate testing automated or manual? Could a subset of it be automated for some sort of continuous testing on the development branch?
My reasoning for continuous performance testing is that release-phase performance testing often makes it hard to figure out when or where a performance regression was introduced, at least if there are many changes between releases. There’s also usually a lot going on when a new release is being prepared, so an unexpected performance regression to debug adds unnecessary stress to that process.
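A continuous performance check on the development branch doesn’t have to be a full load test; even a coarse threshold check in CI can catch large regressions close to the commit that caused them. A toy sketch (the operation under test and the latency budget are placeholders, not anything from the project):

```python
import time

def handle_request() -> None:
    """Placeholder for whatever operation you want to guard (an API call, a query...)."""
    sum(range(10_000))

def p95_latency_ms(fn, runs: int = 50) -> float:
    """Time `fn` repeatedly and return roughly the 95th-percentile latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[min(int(len(samples) * 0.95), len(samples) - 1)]

# Fail the build if we blow a (made-up) latency budget; the budget needs
# enough headroom that normal CI jitter doesn't cause flaky failures.
BUDGET_MS = 50.0
latency = p95_latency_ms(handle_request)
assert latency < BUDGET_MS, f"p95 latency {latency:.1f} ms exceeds budget {BUDGET_MS} ms"
```

The point is less the numbers than the bisection property: a gate like this on every merge narrows a regression down to one change, instead of one release.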
There was also an open question about release cadence during the meeting. My personal preference in most projects has been to release as often as possible. This makes the process more familiar and makes it easier to recognize the parts that are always repeated in the same way. Those parts can then hopefully be automated, making future releases even easier and removing some chances of human error.
Kubernetes proof of concept. When an alternative to Docker Swarm was investigated, were any other container orchestrators considered? Is the plan to use / support the base distribution of Kubernetes or something else? I think offering an alternative to Docker Swarm is a good idea, and Kubernetes is probably the best-known orchestration solution out there. It’s also a fairly complex system that offers quite low-level building blocks. You can build neat systems on top of it, but it takes work to get going AND to keep it going. Kubernetes versions are released at a steady pace, which brings API deprecations and removals; there’s a lot to track just to keep your deployments up to date.
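On the deprecation point, a concrete example: workload APIs like `extensions/v1beta1` Deployments were removed in Kubernetes 1.16 in favour of `apps/v1`, so manifests written against an older cluster simply stop applying after an upgrade. A toy check that scans manifest data for a few known-removed apiVersions (the mapping is a small hand-picked sample, not an exhaustive list; real tooling tracks the full deprecation schedule):

```python
# A few (apiVersion, kind) pairs removed in Kubernetes 1.16, mapped to their
# replacements. Hand-picked examples only -- not a complete deprecation list.
REMOVED_API_VERSIONS = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
    ("extensions/v1beta1", "ReplicaSet"): "apps/v1",
}

def find_deprecated(manifests: list) -> list:
    """Return one warning per manifest that uses a removed apiVersion."""
    warnings = []
    for m in manifests:
        key = (m.get("apiVersion"), m.get("kind"))
        if key in REMOVED_API_VERSIONS:
            warnings.append(
                f"{m['kind']} uses {m['apiVersion']}; migrate to {REMOVED_API_VERSIONS[key]}"
            )
    return warnings

manifests = [
    {"apiVersion": "extensions/v1beta1", "kind": "Deployment", "metadata": {"name": "web"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "api"}},
]
print(find_deprecated(manifests))
# → ['Deployment uses extensions/v1beta1; migrate to apps/v1']
```

This kind of churn is exactly the ongoing maintenance cost I mean: the manifests don’t break because of anything you changed, only because the platform moved underneath them.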
Kafka / RabbitMQ and queues in general were mentioned as one solution for handling asynchronous events between the microservices of OpenCRVS. Is the team familiar with Kafka? Having been a Kafka consumer at best (I have very little experience with it), my understanding is that it’s another fairly complex system to maintain and operate properly. I was mainly wondering whether the added complexity of Kafka is worth it. What are the main issues Kafka would be deployed to solve? Obviously my understanding of the OpenCRVS project is fairly rudimentary; my personal preference just tends to be simple solutions for as long as they work, switching to more complex ones only as necessary.
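To make the “start simple” preference concrete: for many asynchronous hand-offs between services, a plain work queue with an acknowledge step already goes a long way, and it’s worth naming which missing property (durable log, replay, partitioned ordering, multiple independent consumer groups, ...) actually forces the move to Kafka. A minimal in-process sketch using Python’s standard library (the event shape here is entirely made up, not the OpenCRVS event model):

```python
import queue
import threading

events: "queue.Queue" = queue.Queue()
processed = []

def worker() -> None:
    # Pull events until the shutdown sentinel; task_done() acts as the "ack".
    while True:
        event = events.get()
        if event is None:
            events.task_done()
            break
        processed.append(f"handled {event['type']} for record {event['record_id']}")
        events.task_done()

t = threading.Thread(target=worker)
t.start()

# A producer service just enqueues and moves on -- the hand-off is asynchronous.
events.put({"type": "record_registered", "record_id": 1})
events.put({"type": "record_registered", "record_id": 2})
events.put(None)  # shutdown sentinel
events.join()
t.join()
print(processed)
# → ['handled record_registered for record 1', 'handled record_registered for record 2']
```

An in-process queue obviously isn’t a production answer for cross-service messaging, but the same shape with a managed broker (or even a database table used as a queue) may cover the need without taking on Kafka operations.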
These are the questions I wrote down during the meeting. I’m sorry I couldn’t stay to discuss them, but I already had another meeting ongoing when we got to the Q&A portion.