The term "data mining", as introduced in lectures and the literature, is not so straightforward to many people. Depending on how it is told, the explanation can come across as too geeky or nerdy.
Going by the words themselves, "data mining" can be any activity that tries to find useful information in a bunch of data. (This is not to say the term is lexically wrong, but it is a little odd: coal mining is mining for coal and gold mining is mining for gold, whereas data mining is definitely not mining for data. Rather, it is mining data for information and insights.)
When we look into a set of shopping data and spot that somebody suddenly started buying more expensive yogurt than before, and kept doing so for a couple of weeks, we have apparently found something interesting. Perhaps this person has just come into some money. Another guess is that this person had always put up with poor-tasting yogurt and has finally found a real treat. We want to know which of the two possibilities is right, so we look further into the shopping records. Aha, this person also bought a few other new, better-quality brands. Now we are pretty sure that this guy just got some money to spend, and is quite likely to accept a few big deals we want to offer.
This could be a great example of data mining. No technical skills or knowledge of statistics are really needed, just an analytical mind, a good eye for abnormal patterns, and perhaps also attention to alternative explanations. Sometimes people call this "attention to detail", which in my opinion should refer to something else.
And all of this is still in the realm of postulation and falsification, or, put another way, a game of guess and prove. It needs a human understanding of human affairs, which machines are not yet good at on their own, at least not at the moment. Whatever technical infrastructure and skills are applied later, for automation or for scalability, are built on top of these pioneering human thoughts.
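To make the automation point concrete, here is a minimal sketch, assuming weekly purchase records are already available per customer, of how the two checks in the yogurt story might be scripted once a human has decided what to look for. The function names, the thresholds (`baseline_weeks`, `recent_weeks`, `ratio`) and the sample data are all hypothetical; the code only flags the pattern, and deciding which explanation is right remains the analyst's job.

```python
from statistics import mean


def flag_price_shift(prices, baseline_weeks=4, recent_weeks=2, ratio=1.3):
    """Return True if the recent average price is at least `ratio` times the baseline.

    prices: weekly average unit price paid for one product, oldest first.
    The window sizes and ratio are illustrative assumptions set by a human analyst.
    """
    if len(prices) < baseline_weeks + recent_weeks:
        return False  # not enough history to compare two windows
    # Baseline window: the weeks just before the recent window.
    baseline = mean(prices[-(baseline_weeks + recent_weeks):-recent_weeks])
    # Recent window: the last few weeks.
    recent = mean(prices[-recent_weeks:])
    return recent >= baseline * ratio


def new_brands(purchases, cutoff_week):
    """Return brands bought from `cutoff_week` onward that were never bought before it.

    purchases: iterable of (week, brand) pairs (a hypothetical record format).
    """
    before = {brand for week, brand in purchases if week < cutoff_week}
    after = {brand for week, brand in purchases if week >= cutoff_week}
    return after - before


# Hypothetical data for one shopper.
yogurt_prices = [1.20, 1.25, 1.15, 1.20, 2.10, 2.30]
history = [(1, "A"), (2, "A"), (5, "B"), (6, "C"), (6, "A")]

if flag_price_shift(yogurt_prices):
    print("Price shift detected; new brands since week 5:", sorted(new_brands(history, 5)))
```

Note that every choice here (which product to watch, how long a "couple of weeks" is, what counts as "more expensive") was made by a person before any code ran.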
Machine learning is another term often mentioned together with data mining. Depending on how complex the data mining job is, machine learning can be a core part of it, helping us complete the computation and model-fitting tasks. The machine needs us to feed it good, relevant data and to tell it how to measure the results. Without humans telling the machine those things, nothing good can be expected. So here is my conclusion: machine learning is a job for machines, but data mining is still human work.