As the data analytics space evolves, delivering analytics solutions at speed with single-touch deployment becomes both more important and more challenging. At a Google I/O Extended event, I gave a quick overview of the difference between DataOps and DevOps, along with some tips for implementing DataOps.
While there is a lot that can be said on the subject, here are three of the tips discussed in the talk.
Modularized Architecture
Building analytics solutions is inherently an exploratory exercise; it is near impossible to detail at the outset what the solution should look like. Business domain experts look at the curated data set, explore what insights can be gained, and refine which metrics should be captured. All of this requires the ability to build small iterations of the analytics solution at speed.
To enable this, separation of concerns is essential when designing solutions: structure the architecture so that individual components can be updated and deployed in stages, rather than releasing the whole data pipeline in one big bang. This allows quicker and more frequent releases, which leads to faster feedback and iteration.
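To make this concrete, here is a minimal Python sketch of what separation of concerns can look like in a pipeline. The stage names and the toy record shape are my own assumptions for illustration, not from the talk; the point is only that each stage has a narrow contract and can be versioned and released on its own.

```python
from typing import Iterable

Record = dict  # toy record shape for illustration


def ingest(source_rows: Iterable[Record]) -> list[Record]:
    """Stage 1: pull raw rows; knows nothing about downstream metrics."""
    return [dict(r) for r in source_rows]


def curate(raw: list[Record]) -> list[Record]:
    """Stage 2: clean and standardise; can be redeployed without touching ingest."""
    return [r for r in raw if r.get("amount") is not None]


def aggregate(curated: list[Record]) -> dict:
    """Stage 3: compute metrics; the stage iterated on most with domain experts."""
    return {"total_amount": sum(r["amount"] for r in curated)}


def run_pipeline(source_rows: Iterable[Record]) -> dict:
    # Stages are composed only at the edge, so any one of them can be
    # updated and released alone as long as its contract holds.
    return aggregate(curate(ingest(source_rows)))
```

Because the stages only meet at `run_pipeline`, changing how metrics are aggregated never forces a redeploy of ingestion, which is what makes small, frequent iterations practical.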
Infrastructure as code
The industry is still grappling to find a single platform that does both the data lake and the data warehouse well, and a large number of integrations are needed to ingest data from a variety of platforms and tools. All of this means analytics solutions end up spanning multiple platforms.
Provisioning, updating and maintaining these platforms can be a challenge. If they can all be managed services, that alleviates a lot of the maintenance effort. If not, all infrastructure management and deployment should be handled as infrastructure as code, so that provisioning, deploying and maintaining platforms is largely automated.
This reduces platform errors caused by manual configuration mistakes and makes it easier to test platform changes before deployment.
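As one illustration of the idea, here is a small sketch using Pulumi's Python SDK to declare a raw landing bucket and a curated BigQuery dataset on Google Cloud. The resource names, the GCP choice, and the Pulumi tooling are my assumptions for the example, not anything prescribed in the talk; the same pattern applies with any IaC tool.

```python
import pulumi
from pulumi_gcp import bigquery, storage

# Raw landing zone for ingested files. Declared, not hand-configured,
# so every environment gets the identical bucket settings.
raw_bucket = storage.Bucket(
    "raw-landing",
    location="US",
    uniform_bucket_level_access=True,
)

# Curated dataset that analysts query. A change here is a code review
# and a pipeline run, not a console click.
curated = bigquery.Dataset(
    "curated-dataset",
    dataset_id="curated",
    location="US",
)

pulumi.export("raw_bucket_name", raw_bucket.name)
```

Running this through the normal deployment pipeline means infrastructure changes get the same review, testing and rollback path as application code.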
Automated tests
The confidence to deploy quickly requires the ability to test solutions just as quickly. Automated tests need to be built at multiple levels, starting from job-level tests, then pipeline stages, and then across the whole data pipeline, depending on the use case.
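Continuing the earlier sketch, here is what those levels can look like with pytest. This assumes the stage functions live in a module called `pipeline` (a hypothetical name for this example): the first two tests are stage-level, the last one exercises the pipeline end to end.

```python
# Assumes the stage functions from the earlier sketch live in a
# hypothetical module called pipeline.
from pipeline import aggregate, curate, run_pipeline


def test_curate_drops_rows_without_amount():
    # Stage-level test: curation rules in isolation.
    raw = [{"amount": 10}, {"amount": None}, {"id": 1}]
    assert curate(raw) == [{"amount": 10}]


def test_aggregate_sums_amounts():
    # Stage-level test: metric logic in isolation.
    assert aggregate([{"amount": 10}, {"amount": 5}]) == {"total_amount": 15}


def test_pipeline_end_to_end():
    # Pipeline-level test: raw rows in, metrics out.
    assert run_pipeline([{"amount": 3}, {"amount": None}]) == {"total_amount": 3}
```

Stage-level tests run in milliseconds and catch most regressions, while a handful of end-to-end tests guard the contracts between stages, which is what makes deploying any single component on its own feel safe.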