From Data to Knowledge: #BigData and #DataMining

The increasing digitization of our activities, the constantly growing capacity to store digital data, and the resulting accumulation of data of all kinds have created a new sector of activity whose purpose is the analysis of large quantities of data. New approaches, new methods and new knowledge are emerging, and ultimately, no doubt, new ways of thinking and working. This very large amount of data (big data) and its processing (data mining) affect sectors as different as the economy and marketing, but also research and knowledge.

The economic, scientific and ethical implications of this data are significant. The sector is constantly evolving, and its frequent and rapid changes do not make analysis easy. However, a solid understanding of data is necessary in order to understand what data mining is.

1 – What is data mining?             


Data mining means exploring very large amounts of data. Its purpose is to extract knowledge from large quantities of data by automatic or semi-automatic methods. It is also referred to as data drilling or Knowledge Discovery from Data (KDD).
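
To make "extracting knowledge automatically" a little more concrete, here is a minimal sketch of one classic data mining task: counting which items frequently appear together. The baskets, the function name `frequent_pairs` and the `min_support` threshold are all hypothetical and chosen only for illustration; real tools work on far larger volumes and use more refined algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one shopping basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs found together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=2)
print(pairs)
```

On this toy data, only the pair (bread, milk) appears in at least two baskets. That small piece of "knowledge" was nowhere stated explicitly in the raw records.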


  • How and why are such quantities of new data generated? Every minute, 149,519 e-mails are sent worldwide, 3.3 million posts are published on Facebook, 3.8 million queries are made on Google, 65k photos are uploaded to Instagram, 448k tweets are sent, 1,400 posts are published via WordPress, 500 videos are uploaded to YouTube and, last but not least, 29 million messages are sent via WhatsApp. These numbers can make one’s head spin, but the important thing to note is that humans aren’t the only producers of data: machines also contribute through their SIM cards, their sensors, and so on.
  • What to do with these data? While the contemporary phenomenon of data accumulation is easy to understand, it is perhaps more difficult to perceive in what way these data are changing the world. It depends on how one is able to process them. Science, IT and the medical sector rely heavily on statistics, on counting, and so on. Once a set of data can be processed exhaustively, and cross-referencing and sorting can be carried out on a scale scarcely imaginable a few decades ago, the analyses of our environment themselves change and multiply. In short, data is a tool for management, decision support and evaluation in every sector, and the raw material of the information that allows us to understand a phenomenon, a reality.


2 – Value of Data


While IT organizations are best placed to grasp the market potential of data accumulation and processing, this is not the case everywhere: the idea that data is the new oil is making its way more slowly than one might have imagined.

  • What is the market value of the data? The data built up through a variety of IT operations is a valuable asset that companies are not always aware of or do not always use. Even if they do not necessarily know how to exploit the data themselves, they hold resources that are not yet profitable for them. Gathering and using this data is a key issue for companies. Big data is a real source of marketing opportunities.
  • Data that must be protected and that is complex to exploit: Personal data poses many problems for the researchers who specialize in analyzing it. First, they point to the need to better protect it and to ensure its preservation. Moreover, processing it in a way that produces interesting results requires very specialized skills.


3 – Data mining and targeted marketing 


One of the most significant applications of data mining is undoubtedly the renewal of marketing, because data mining allows companies to reach consumers very precisely by establishing precise and reliable profiles of their interests, purchasing habits, standard of living, etc. Moreover, there is no need to go through a complicated search process: every Internet user leaves enough traces when surfing, tweeting or publishing on Facebook to make profiling possible, most of the time without the user’s knowledge…

  • A new space for social science research: Viewed from another angle, this accumulated data is a gold mine for researchers. Some behavioral researchers have looked at the attitudes of Internet users on dating sites. In addition to finding that the data they use is more reliable than data obtained by meeting individuals (it is easier to lie to an investigator than to a machine…), they can carry out analyses that are not politically correct but are very informative!


4 – Data mining as a forecasting tool

Data mining is also a tool that extends what can be done with probability calculations. Because it makes it possible to cross-reference large volumes of data and, above all, to apply these calculations to many different fields, it now appears capable of making forecasts. In addition, data mining for forecasting offers the opportunity to turn the numerous sources of time-series data, both internal and external, available to the business decision-maker into actionable strategies that can directly impact profitability. Deciding what to make, when to make it and for whom is a complex process. Understanding which factors drive demand, and how these factors interact with production processes and change over time, is key to deriving value in this context. Today some scientists do not hesitate to announce that they will soon be able to predict the future. All this thanks to data!
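
As a rough illustration of forecasting from time-series data, the sketch below applies simple exponential smoothing, one of the most basic forecasting techniques, to a made-up demand series. The function name, the `alpha` parameter value and the figures are assumptions for the example, not a method described in this article; real forecasting tools also account for trend, seasonality and external factors.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    alpha close to 1 gives more weight to recent observations,
    alpha close to 0 gives more weight to the older history.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # start from the first observation
    for value in series[1:]:
        # Blend the new observation with the previous smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly demand figures, for illustration only.
monthly_demand = [100, 110, 105, 120]
forecast = exponential_smoothing_forecast(monthly_demand, alpha=0.5)
print(forecast)  # 112.5
```

The forecast for the next month is a weighted blend of past values. Choosing `alpha` is a trade-off between reacting quickly to recent changes and smoothing out noise.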

  • Probabilities and predictions: Today, predictive statistics tackle all sorts of issues: natural disasters, health, delinquency, climate… Statistical tools are numerous and are combined to improve outcomes, such as when using “random checks”. Even more fascinating, software is capable of improving itself and accumulating ever more data to boost its performance… In the meantime, it is possible to rely on these analyses to try to avoid the flu or to get vaccinated wisely.
  • Anticipating or preventing crimes: If the idea that software could predict crimes and misdemeanors reminds one of Spielberg’s film “Minority Report”, reality has now caught up with fiction: the PredPol (predictive policing) software makes it possible to estimate, better than other techniques or human analysis, the places where crime is likely to occur, and consequently to better position police patrols and other preventive measures.
  • Preventing fraud: Data mining also opens up new prospects for fighting fraud and “scams” in the insurance sector. Here again, it is a matter of better targeting the controls, and apparently it works: the technique gives very clear results. In more than half of cases, when an inspector carries out a targeted control on the basis of data mining, the control pays off. Insurance companies also apply this type of analysis to detect scams.
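
To give an idea of what "targeting the controls" can mean in practice, here is a minimal sketch that flags claims whose amount is unusually far from the average, so that an inspector can look at those first. The z-score rule, the threshold of 2.0 and the claim amounts are assumptions made for the example; the article does not say which techniques insurers or inspectors actually use, and real systems combine many more signals.

```python
from statistics import mean, stdev


def flag_outliers(amounts, threshold=2.0):
    """Return the indices of amounts whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation, needs 2+ values
    if sigma == 0:
        return []  # all amounts are identical, nothing stands out
    return [
        i for i, amount in enumerate(amounts)
        if abs(amount - mu) / sigma > threshold
    ]


# Hypothetical claim amounts; the last one is deliberately unusual.
claims = [200, 220, 210, 190, 205, 215, 195, 2000]
to_check = flag_outliers(claims)
print(to_check)  # [7]
```

Only the last claim is flagged here. A flag is a reason to check, not proof of fraud: the inspector’s control is still what confirms or clears the case.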