It seems that everywhere you turn, you read or hear about some business rising to new heights of profitability by using machine learning or A.I. The news is full of restaurants driving more revenue with recommender engines, gyms retaining their members with churn predictors, automotive repair shops reducing time to repair using A.I. systems, and sales reps doubling and even tripling their success ratios.
As a business owner, you know that machine learning and A.I. are the tools that will drive revenue for businesses in the near future, and that many businesses are already enjoying the rewards. But is machine learning right for you? Can your business reap the rewards of A.I. now? This short article will help you decide.
Question 1: Do You Have Data?
“Data is the new oil.” You’ve heard this expression many times. While it is certainly true that data is valuable, the expression refers more to data’s driving force on machine learning than its intrinsic value. You cannot drive a car without gas (and if you own a Tesla, you cannot drive a Tesla without a charged battery). And, you cannot do machine learning without data. Data is “the new oil” because it drives the most powerful engine humans have ever created: Machine Learning and A.I.
So, the first thing you need to ask yourself is: “Do I have data?” Your data may be housed in several locations. You may have data in a traditional database, or several databases. You may have data stored on a third-party integrated system, such as Salesforce. You may have data stored on spreadsheets. Start by taking inventory of all the data you have stored, across all of the systems you use.
As you identify your data sources, bear in mind the “grade” of your data. Just as oil has various grades, so, too, does data. The best data is “clean data.” Clean data refers to data that requires a minimal amount of preprocessing; it can be used right out of the database. Clean data is free of duplicates; free of errors in data entry; and free of missing values. The most time-consuming part of any machine learning project is the preprocessing of the data. Your goal, then, is to locate and identify clean data.
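As a rough illustration of what checking for clean data can look like, here is a minimal sketch that counts exact duplicate rows and missing values in a small set of records. The function name, the field names, and the sample orders are all hypothetical and invented for this example. Catching data entry errors usually takes business-specific rules that this sketch does not attempt.

```python
from collections import Counter


def data_quality_report(rows, required_fields):
    """Count exact duplicate rows and missing values in a list of dict records."""
    # A row counts as a duplicate when an identical row has already been seen.
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in row_counts.values())

    # A value counts as missing when the field is absent, None, or blank.
    missing = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1

    return {"rows": len(rows), "duplicates": duplicates, "missing": missing}


# Hypothetical order records, invented for illustration only.
orders = [
    {"order_id": "1001", "customer": "Ana", "total": "24.50"},
    {"order_id": "1002", "customer": "", "total": "18.00"},     # blank customer
    {"order_id": "1001", "customer": "Ana", "total": "24.50"},  # exact duplicate
    {"order_id": "1003", "customer": "Raj", "total": None},     # missing total
]

report = data_quality_report(orders, ["order_id", "customer", "total"])
print(report)
```

The fewer duplicates and missing values a report like this turns up, the less preprocessing the data is likely to need.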
The next best grade of data is “big data.” Big data refers to vast quantities of like data. For example, a restaurant may have a third-party ordering system in which all customer orders are entered. A busy restaurant with several locations that has been in business for many years may have vast quantities of order data. That’s good oil!
I advise avoiding data that does not fit into one of these two grades: clean data or big data. And, as I am sure you surmised, the absolute best data is “clean big data.” Clean big data will enable you to create more powerful and accurate models. So, if you have clean big data, you are off to the races!
Question 2: Do You Have a Question?
An automobile is a great way to get from point A to point B. But an automobile is pretty much worthless if point B has not been defined. Machine learning and A.I. are great ways to solve a problem, and to answer questions, as long as there is a clearly defined problem to solve or question to answer.
There are two main types of question a machine learning model can answer: 1) What number will it be?; 2) What category will it be?
Machine learning has often been referred to as “applied statistics” because of its inherent ability to crunch large amounts of data in many dimensions. In fact, many of the algorithms that modern-day machine learning models use are hundreds of years old. Of course, 250 years ago, a problem that a machine learning model can solve in seconds today might have occupied a team of 20 expert-level statisticians for six months, at great monetary expense. Yes, from six months and 20 people to only a few seconds.
Businesses are leveraging this power to answer numerical questions involving time, money, and quantities, such as:
“How long will this project take?”
“What is my revenue forecast 6-12 months from now?”
“How many supplies will I need to reorder next month?”
And, of course: “What will the price of XYZ stock be tomorrow?”
(One of the early, well-known uses of machine learning was the prediction of stock prices in the 1990s, and it is still widely used today by many hedge funds.)
Machine learning’s statistical prowess at analyzing multiple dimensions translates into its amazing ability to categorize. By exploring patterns over vast quantities of data and in multiple dimensions, machine learning can see consistent patterns that humans would otherwise overlook.
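To make the “what number will it be?” type of question concrete, here is a minimal sketch that fits a straight trend line to past monthly usage and extends it one month ahead. The usage figures are made up for illustration, and a real forecast would likely use more data and a richer model than a single straight line.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: units of supplies used in each of the last six months.
months = [1, 2, 3, 4, 5, 6]
units_used = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, units_used)

# Extend the trend to month 7 to estimate next month's reorder quantity.
forecast = slope * 7 + intercept
print(f"Estimated units needed next month: {forecast:.0f}")
```

The same pattern applies to the other numerical questions above: past observations go in, and a predicted number comes out.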
This type of pattern recognition helps businesses answer complex questions such as:
“Into what segments do my customers fit?”
“Will this person likely cancel their membership in the next 6 months?”
“What products or services will this person most likely buy?”
Pattern recognition is a major strength of machine learning and is redefining how businesses reach and sell to their target audience.
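To make the “what category will it be?” type of question concrete, here is a minimal sketch of a churn predictor. It uses a deliberately simple technique, a nearest-centroid classifier, which labels a new member according to which group's average profile they most resemble. The features (gym visits per month and complaints filed) and every data point are hypothetical. Production churn models typically use many more features and more capable algorithms.

```python
import math


def centroid(points):
    """Return the average point of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(examples):
    """Group training examples by label and compute one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical members: (visits per month, complaints filed) and what happened.
history = [
    ((12, 0), "stays"),
    ((10, 1), "stays"),
    ((14, 0), "stays"),
    ((2, 3), "cancels"),
    ((1, 2), "cancels"),
    ((3, 4), "cancels"),
]

model = train(history)
print(predict(model, (11, 1)))
print(predict(model, (2, 2)))
```

One design caveat: distance-based methods like this one are sensitive to the scale of each feature, so real projects usually normalize the features first.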
Who Wags Who?
These two questions are the foundation for any machine learning and A.I. project. Everything else, including budget, timeline, and resources, is a detail and usually a known quantity (see my article on The 5 Phases of Every Machine Learning Project). And, while both questions must be answered before you start a machine learning project, this still raises the question: “Who wags who?”
Do you start by examining the data (a process that data scientists call Exploratory Data Analysis, or EDA) to determine what questions can even be answered? Or do you start with the question, and find data to help answer the question?
The answer is that you can start with either the data or the question, but you need to understand the ramifications of each starting point. When starting with the data, it is important not to search for a question based solely on the data available. This is like creating a product and only later looking for a target market. Understand, too, that the dataset can reveal many insights that go unseen in the initial exploratory data analysis. Also, it is very typical for a company to complement its datasets with related third-party datasets.
When starting with the question, be leery of answering the question based on how the dataset is shaped. This may sound a bit counterintuitive since machine learning looks for patterns in the data to answer questions. However, initial insights from the data that may seem “correct” intuitively may not be an optimal solution, and further analysis and modeling may reveal better solutions that are not so obvious. The danger here is that domain experts may bias results toward what their experience suggests, which can limit what the machine learning model is able to uncover. Often, machine learning models propose solutions that don’t seem right to humans, and later, when implemented, turn out to be exceedingly accurate.
Stay ahead of the curve, and get started down the path of machine learning now. SerpicoDEV offers a no-charge consultation to help identify your data sources and help you craft the right questions to ask. Take the first step and contact us.