Why automation is changing data science for everyone

Michael O'Connell is chief analytics officer at TIBCO Software.

A key goal of visual analytics and data science is to identify actionable insights that impact business processes – to grow revenue, improve productivity or mitigate risk. Automated AI, or more specifically automated machine learning (AutoML) for data science, can help with this goal. AutoML can dramatically increase the productivity of data scientists by automating the more mundane tasks and freeing up time for innovation. When it is transparent about what it does, AutoML can also guide and educate users on how to get the most out of their data and data science environment, while enforcing best practices.

The role and function of data scientists is on the rise. Data scientists have become the ultimate hackers; they do what it takes to get the job done. This can include designing and deploying end-to-end systems for model training and inference, both for batch jobs running on a schedule or a trigger and for real-time event processing. Such end-to-end systems typically include data access and federation, caching strategies, feature engineering, machine learning and model ops. Model ops can include containerising models, adding RESTful interfaces and deploying into operational systems – in hybrid, and sometimes multi-cloud, environments.

Crucially, what data scientists require more than anything is to become more productive. AutoML helps with this by assisting analysts with data preparation, data cleaning, feature selection, feature engineering and modelling, with explainability. AutoML digital assistance is now starting to be extended to data science platforms that scale across hybrid cloud environments with deployment into event-based architectures. 

Ideally, AutoML systems should generate automatic flows which are editable, and informative with regard to how the software works. This should include surfacing the steps or nodes in the workflow, and how they are created and configured for the analysis. The generated flows should be, and can be, an educational experience for the data scientist in how to use the software optimally. An AutoML system is also a way to enforce best practices, both for the experienced, professional data scientist, and for the less experienced practitioner. So, as the user moves through a data science pipeline, the environment helps to connect, clean and prepare data, and to engineer features for model building. And the system should ideally provide guidance on things like hold-out validation sets, feature and model combinations and model explainability.
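To make the idea of hold-out validation over feature and model combinations concrete, here is a minimal sketch in plain Python. It is not how any particular AutoML product works: the synthetic data, the two toy models and the names make_data, fit_mean, fit_linear and search are all assumptions made for illustration. Each candidate is fitted on the training rows only and scored on rows it has never seen.

```python
import itertools
import random
import statistics


def make_data(n=200, seed=0):
    """Synthetic rows (illustrative only): y depends on x1; x2 is pure noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
        rows.append({"x1": x1, "x2": x2, "y": 3.0 * x1 + rng.gauss(0, 1)})
    return rows


def fit_mean(train, feature):
    """Baseline model: ignore the feature and predict the training mean."""
    mean_y = statistics.fmean(r["y"] for r in train)
    return lambda row: mean_y


def fit_linear(train, feature):
    """One-feature least-squares line fitted on the training rows."""
    xs = [r[feature] for r in train]
    ys = [r["y"] for r in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda row: slope * row[feature] + intercept


def mse(model, rows):
    """Mean squared error of a fitted model on the given rows."""
    return statistics.fmean((model(r) - r["y"]) ** 2 for r in rows)


def search(rows, features, fitters, holdout_fraction=0.25, seed=0):
    """Try every feature/model pair; rank by error on the hold-out set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    train, holdout = shuffled[:cut], shuffled[cut:]  # hold-out is never used for fitting
    results = []
    for feature, (name, fit) in itertools.product(features, fitters.items()):
        model = fit(train, feature)
        results.append((mse(model, holdout), feature, name))
    return sorted(results)  # lowest hold-out error first


leaderboard = search(
    make_data(), ["x1", "x2"], {"mean": fit_mean, "linear": fit_linear}
)
best_error, best_feature, best_model = leaderboard[0]
for error, feature, name in leaderboard:
    print(f"{name:>6} on {feature}: hold-out MSE = {error:.2f}")
```

Printing the full leaderboard, rather than only the winner, is the point of the sketch: the user can see which combinations were tried and how each performed on unseen data.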

A word of caution – we are not saying that the goal is the complete automation of everything in data science, as has been advocated elsewhere. The goal is not to produce an environment of total automation where pushing a big red button means ‘job done.’ Rather, the goal is for AutoML to act as a digital assistant to the practitioner: automating the more mundane tasks, educating the user and enforcing good scientific practices.

This ideal AutoML software system helps business analysts, data scientists and developers by removing complexity and accelerating deployment to live production environments. These capabilities are starting to shift the conversation between business analysts, data scientists, developers and business executives to focus on addressing the problems at hand with the best solutions available. Automating the mundane frees up time for developing innovative approaches to growing revenue, reducing risk and removing unnecessary costs.

Automated AI for all

The large number of stakeholders in a data science project makes it a challenge to simplify the process. For example, a system that moves from a business analyst for dataviz, to a data scientist for training and deployment, involves workflows for cleaning the data, engineering the features and building the models that create the predictions – in batch jobs and on streaming data in operational systems.

Productivity gains come from automatically generating these workflows for tasks such as data preparation, feature engineering, feature selection and modelling. Automating processes from preparation to model tuning produces transparent, editable workflows which can move more quickly to production-ready versions in operational systems.
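One way to picture a transparent, editable workflow is as an ordered list of named steps that can be listed, reordered and extended before it is run. The sketch below is a toy under stated assumptions, not a real AutoML API: Step, generate_workflow, describe and run are hypothetical names, and the "generated" steps are hard-coded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One node in the workflow: a name, a row transformation, and a note."""
    name: str
    func: Callable[[list], list]
    note: str = ""


def generate_workflow():
    """Hypothetical stand-in for an auto-generated data prep flow."""
    return [
        Step(
            "drop_missing",
            lambda rows: [r for r in rows if None not in r.values()],
            "remove rows with missing values",
        ),
        Step(
            "add_ratio",
            lambda rows: [{**r, "ratio": r["a"] / r["b"]} for r in rows],
            "engineered feature a / b",
        ),
    ]


def describe(workflow):
    """Surface every step so the user can see what was generated."""
    return [f"{i}. {s.name}: {s.note}" for i, s in enumerate(workflow, 1)]


def run(workflow, rows):
    """Apply the steps in order."""
    for step in workflow:
        rows = step.func(rows)
    return rows


rows = [
    {"a": 1.0, "b": 2.0},
    {"a": None, "b": 1.0},  # dropped by drop_missing
    {"a": 3.0, "b": 0.0},   # would break add_ratio without the edit below
]

workflow = generate_workflow()

# The user edits the generated flow: guard against division by zero.
workflow.insert(
    1,
    Step(
        "drop_zero_b",
        lambda rows: [r for r in rows if r["b"] != 0],
        "avoid division by zero in add_ratio",
    ),
)

print("\n".join(describe(workflow)))
result = run(workflow, rows)
print(result)
```

Because the flow is ordinary data, the same list that was inspected and edited here is the one that runs, which is what makes it practical to carry a generated flow forward towards production.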

When a data scientist creates a predictive model, it can be a great deal of work to develop the many different data prep / data science workflows required. When these are automatically generated, there can be significant time savings, more accurate models and enforced best practices throughout. 

Productivity gains and smarter outputs

Automated data prep and machine learning can create considerable productivity gains for business analysts and data scientists. With the different stages of the workflow automated, from business analyst, to data scientist, to production, models can be created, tuned and deployed into cloud-native production environments.

To address more complex issues, machine learning models are becoming easier to deploy and connect to data feeds to support faster and smarter decisions in real time. It is not about creating a black box. Whether the desired outcome is helping financial services more accurately detect fraud, or monitoring oil field output, analysts, scientists and developers are using automated workflows for insights to build smarter models at a faster pace.

One key area of value in data science is in making accurate predictions in live operations environments. Just as physical automated production lines created the modern industrial age – think car factory robots – so data science automation is driving the digital industrial age by enabling analytics to be quickly applied to different domains by experts who are no longer forced to do the grunt work.

Through automation, data science can move faster to solve real-world problems, while providing measurable benefits for everyone in the value chain.
