Machine Learning — Auto ML vs Traditional methods
Comparing SAP Analytics Cloud Smart Predict, Smart Discovery and Python
The Artificial Intelligence Index 2017 report shows various trends in the AI space. The job openings chart above shows the increase in demand for using machine learning algorithms and various other aspects of AI to solve business problems.
There are some great tools and frameworks available that allow any business to start using advanced analytics and machine learning on top of its business data and to use the value levers of the models to see business results. In the past, most of this meant using tools like R, Python, and others to do statistical analysis. Several tools are now available that simplify statistical analysis and model development. These tools accelerate the process of running experiments, use a combination of algorithms, and provide a competitive advantage to companies that may not have all the data science skills in house.
In my earlier post, I wrote about the Smart Discovery feature of SAP Analytics Cloud (SAC), which allows automated machine learning analysis to be done on any kind of data.
In this blog, I am going to compare and contrast the different modeling techniques and use my recruitment dataset example to evaluate the pros and cons of each technique.
1. Python-based analysis: This is a coding-based approach that provides the most flexibility but is also the most complex of the three. In the recruitment example, I used the scikit-learn random forest regression model and compared it to a logistic regression model to predict the outcome for “# of days to hire”.
2. SAC Smart Predict based analysis: Smart Predict is a coding-free, wizard-based approach to statistical modeling. In this case, I uploaded my recruitment data Excel file and created two regression models, one with all features and one with the most relevant features. You can find more details on how to use Smart Predict here. Compared to the Python method, this was a much faster approach, but I had to rely on the built-in regression algorithm(s) available. By running multiple iterations of my model with different feature sets and train/test splits, I was able to compare and contrast the overall metrics across different models easily.
3. SAC Smart Discovery based analysis: Described in my previous blog, this feature allows you to explore data, gather insights, identify anomalies and run what-if simulations. As shown below, Smart Discovery is also a coding-free option. Compared to the previous two methods, it abstracts away a lot of the algorithms behind the scenes and provides the business user with a visual output of the key insights, influencers, unexpected values and simulations. This is a powerful feature for identifying data patterns and evaluating the critical features that drive your KPIs.
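To make the coding-based option more concrete, here is a minimal sketch of what the Python workflow could look like with scikit-learn. It is not the code or the data from my analysis: the dataset is synthetic, the column names and the relationship between them are made up for illustration, and a plain linear regression is swapped in as the comparison model because the target here is a continuous number of days.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the recruitment data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "job_classification": rng.choice(["engineering", "sales", "support"], n),
    "agency_name": rng.choice(["agency_a", "agency_b"], n),
    "country": rng.choice(["US", "DE", "IN"], n),
    "promotion_status": rng.choice(["internal", "external"], n),
})
# Made-up relationship so the models have a signal to learn.
days = (
    30
    + 15 * (df["job_classification"] == "engineering")
    + 10 * (df["agency_name"] == "agency_b")
    + rng.normal(0, 5, n)
)
df["days_to_hire"] = days.round().clip(lower=1)

# One-hot encode the categorical columns and hold out 25% of rows for testing.
X = pd.get_dummies(df.drop(columns="days_to_hire"))
y = df["days_to_hire"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit both models on the same split and compare their error in days.
models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "linear_regression": LinearRegression(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {scores[name]:.2f} days")
```

Changing `test_size` or dropping columns from `X` before the split is the manual equivalent of rerunning a Smart Predict model with a different split or feature set.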
Comparing feature rankings
All three techniques provided similar results. I was able to see that the “Job classification”, “Agency Name”, “Country” and “Promotion Status” columns are critical factors that impact my overall KPI of “# of days to hire”.
All the modeling techniques shown above have their pros and cons. It’s usually a tradeoff between the kind of analysis that you want to perform, the skill set that is available, and the control you need over the various stages of the machine learning model building process. Nevertheless, the Smart Discovery and Smart Predict features are powerful solutions that put machine learning capabilities in the hands of a data analyst or citizen data scientist.
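On the Python side, one way to get a comparable feature ranking is to read the importances from a fitted random forest and sum them back to the original columns. The sketch below assumes scikit-learn and pandas, and again uses synthetic data with hypothetical column names (including a deliberately irrelevant `office_floor` column), so the ranking it prints reflects the made-up signal and not the results of my analysis.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: only two of the four columns actually drive the target.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "job_classification": rng.choice(["engineering", "sales", "support"], n),
    "agency_name": rng.choice(["agency_a", "agency_b"], n),
    "country": rng.choice(["US", "DE", "IN"], n),
    "office_floor": rng.choice(["1", "2", "3"], n),  # deliberately irrelevant
})
y = (
    30
    + 15 * (df["job_classification"] == "engineering")
    + 8 * (df["agency_name"] == "agency_b")
    + rng.normal(0, 3, n)
)

# One-hot encode with "=" as separator so dummy names split cleanly later.
X = pd.get_dummies(df, prefix_sep="=")
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Sum the importances of the dummy columns back to their source column.
importance = pd.Series(model.feature_importances_, index=X.columns)
ranking = (
    importance.groupby(lambda col: col.split("=")[0])
    .sum()
    .sort_values(ascending=False)
)
print(ranking)
```

Impurity-based importances like these are only one way to rank features, so some differences between a ranking produced this way and the influencers reported by a coding-free tool would not be surprising.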