2018-06-11

Towards an end-to-end analytics product

The last six months went by surprisingly fast. Although there were not many ad hoc analytics and reporting tasks from clients, I became more involved in solution design work for our current and future products. Let me call them analytics as a product and reporting as a product. This is valuable experience for me as I progress in my career as a data scientist.

I always like to ask myself (and talk to the relevant people when possible) before starting a new analytics task: 1) who wants the analysis/report, 2) what the deliverables are, and 3) what they really want to know/understand.

When it comes to a bigger task, a project that aims at a complete product, things become more complex. Apart from the schedule and budget, which I will not discuss here, I have to consider the following.

1. What is the business value it will bring? The answer may be short but take a long time to reach. To stay in line with the business goals, we want to know which stage our business is at and what we have promised to current or upcoming clients. We want to be able to show clearly that this project will add to our promised advantages in the market within an affordable amount of time and effort.

2. Based on the business value, what is the minimum viable product, and what are the premium features on top of it? This is the part I used to focus on. In meetings, I usually concentrated on what the clients might want and what was unnecessary. Now I think about these within a bigger framework and try to weight the different parts by the business value they may bring.
  The whole analytics/reporting product may have a modularised and/or layered structure. We may have a kernel module that is the minimum viable product, with more features available to clients who pay more, so that we can offer different plans, such as free/silver/platinum, to different clients.
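To make the layered-plan idea concrete, here is a minimal sketch in Python. The plan names follow the free/silver/platinum example above, while the feature names and the mapping between features and plans are purely hypothetical, chosen for illustration only.

```python
from enum import IntEnum


class Plan(IntEnum):
    """Hypothetical subscription tiers, ordered from lowest to highest."""
    FREE = 0
    SILVER = 1
    PLATINUM = 2


# Hypothetical features, each mapped to the lowest plan that unlocks it.
FEATURE_MIN_PLAN = {
    "summary_report": Plan.FREE,           # the kernel / minimum viable product
    "interactive_dashboard": Plan.SILVER,  # premium layer
    "api_access": Plan.PLATINUM,           # top layer
}


def available_features(plan):
    """Return the sorted names of the features a client on `plan` can use."""
    return sorted(
        name for name, minimum in FEATURE_MIN_PLAN.items() if plan >= minimum
    )


print(available_features(Plan.SILVER))
```

Keeping the mapping in one table means that moving a feature between plans is a one-line change, which may help when the business value of a feature is reassessed.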

3. Back to the product itself, we want to design not only a static analytics result or report, but also a dynamic process with pipelines and interactions. This extends the three questions above: the "people who want the analysis/report", the "deliverables" and "what do they really want to know".

3.1 When we consider the people who want the analytics product, we need to build a model of their behaviour and design towards it.
  Do they just want to receive an Excel file with tables in it? Or do they want to open a web page and see a dashboard with interactive tools, where they can get the latest information with a single click? Do we allow them to write to part of our database, so that they can approve/reject the changes we suggest? How are they likely to use such features?
  Do we design this product only for external clients, or also for internal developers who may use some of the analytics results? Maybe we need to create both external and internal APIs for different uses and usage patterns.

3.2 When we consider the deliverables, we need to provide not only the final report/dashboard/API calls, but also the infrastructure and pipeline of processes behind them.
  For example, we may need to build helper tables in the database and a process to refresh them daily or hourly with data from the production database. We need a strategy and mechanism to cope with situations where nothing is refreshed, where the refresh process fails, and where the server is down. When data from the production database is dirty or missing, how do we deal with it? When there is a change request to alter the values of some records, how do we ensure that the changed values are queried instead of the old ones, how do we make rolling back possible, and how do we ensure nothing is broken? Some tiny decisions require very careful consideration and a good understanding of the business.
  We need to design the tables so that different clients can read and write different parts independently, without conflicts. When the results are fed to other developers in the company, we need to communicate enough to make sure that the results are used only where they are needed, without breaking things elsewhere.
  As for the methods used in the actual analytics work, including machine learning methods, statistical tests and performance measurement, we need to demonstrate that they are appropriate and accurate, or at least that they outperform the other methods currently used in the industry. This is mostly required internally, but sometimes it can be a selling point for marketing.
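One possible answer to the change-request questions raised in 3.2 is to leave the refreshed data untouched and store approved changes in a separate override table, with a view that readers query. The sketch below uses Python's built-in sqlite3 module; the table, view and function names (score_snapshot, score_override, score_current and so on) are hypothetical, and this is an illustration under those assumptions rather than a description of any real system.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical helper tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Snapshot refreshed from production; never edited by hand.
        CREATE TABLE score_snapshot (student_id INTEGER PRIMARY KEY, score REAL);
        -- Approved change requests; at most one active row per student.
        CREATE TABLE score_override (
            student_id INTEGER, score REAL, active INTEGER DEFAULT 1
        );
        -- Readers query this view, so an active override wins over the snapshot.
        CREATE VIEW score_current AS
        SELECT s.student_id, COALESCE(o.score, s.score) AS score
        FROM score_snapshot AS s
        LEFT JOIN score_override AS o
          ON o.student_id = s.student_id AND o.active = 1;
    """)
    with conn:
        conn.executemany(
            "INSERT INTO score_snapshot VALUES (?, ?)", [(1, 55.0), (2, 80.0)]
        )
    return conn


def apply_change(conn, student_id, new_score):
    """Deactivate any earlier override, then record the new one, atomically."""
    with conn:
        conn.execute(
            "UPDATE score_override SET active = 0 WHERE student_id = ?",
            (student_id,),
        )
        conn.execute(
            "INSERT INTO score_override (student_id, score) VALUES (?, ?)",
            (student_id, new_score),
        )


def roll_back(conn, student_id):
    """Deactivate all overrides so the snapshot value is served again."""
    with conn:
        conn.execute(
            "UPDATE score_override SET active = 0 WHERE student_id = ?",
            (student_id,),
        )


def current_scores(conn):
    """Return {student_id: score} as readers of the view would see it."""
    rows = conn.execute(
        "SELECT student_id, score FROM score_current ORDER BY student_id"
    )
    return dict(rows)
```

Because overrides are only deactivated and never deleted, the override table also serves as a simple history of the change requests. Note that in this sketch rolling back returns to the snapshot value, not to the previous override.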

3.3 "What do they really want to know"
  Usually this does not become clear until after several rounds of communication, sometimes not until the work is already halfway done. However, we want to clarify this question as early as possible, and ideally settle it at the solution design stage.
  For example, when they say they want to understand how students are performing in homework, there is a chance that in the end they only care about the most recent week for each student.
  So a precise description is always needed, because different questions need different queries even when they sound similar. The same applies to data processing, machine learning and statistical methods: answering different questions needs different methods or strategies.
  The design of the end deliverables should also address this question by presenting the most informative and relevant figures and charts to the clients, possibly with interactivity that lets them navigate, search, roll up, drill down, and perhaps approve/reject what we suggest, all depending on what they really want to know.
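The homework example in 3.3 can be illustrated with a small sketch. The records, student names and function names below are made up for illustration. The two functions answer two questions that sound similar, "how are students performing overall?" and "how did each student do in their most recent week?", and with this data they rank the students differently.

```python
from collections import defaultdict

# Hypothetical homework records: (student, week number, score).
RECORDS = [
    ("amy", 1, 60), ("amy", 2, 70), ("amy", 3, 95),
    ("ben", 1, 90), ("ben", 2, 80),
]


def average_per_student(records):
    """Mean score over all weeks, per student."""
    scores = defaultdict(list)
    for student, _week, score in records:
        scores[student].append(score)
    return {student: sum(vals) / len(vals) for student, vals in scores.items()}


def latest_week_per_student(records):
    """Score from each student's own most recent week."""
    latest = {}  # student -> (week, score) for the latest week seen so far
    for student, week, score in records:
        if student not in latest or week > latest[student][0]:
            latest[student] = (week, score)
    return {student: score for student, (_week, score) in latest.items()}


print(average_per_student(RECORDS))      # ben is ahead on the overall average
print(latest_week_per_student(RECORDS))  # amy is ahead in the most recent week
```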

This is what I have learned recently. I may not implement all of it in the near future, but it is worth keeping these ideas in mind.
