25.01.2023

Azure Auto ML: The Solution for Faster Machine Learning Development

Use Azure Automated Machine Learning to make predictions about office workloads and to expand the CS Smart Workspace.

Are you tired of spending countless hours and resources choosing features and cleaning data for your machine learning models? Do you wish there were an easier and more efficient way to get your models production-ready? Look no further! In this article, we will show you how Auto ML can revolutionize your machine learning workflow and significantly reduce the time and expertise needed to develop high-quality models. With Auto ML, even users with limited domain knowledge can quickly and easily create top-performing models. Don’t miss out on this opportunity to streamline your machine learning process and achieve better results faster. 

What is Azure Auto ML? Azure Machine Learning is a Microsoft Azure service that supports users at any level of programming or data science expertise and keeps the focus on the problem-solving process. It democratizes ML development with an intuitive GUI that lets you build ML solutions quickly. It can not only enhance productivity and creativity within your organization but also save time for experts, enabling your organization to experiment with data quickly and in an agile way.

 

1. Solution Elements and Data Basis

We started by collecting data from the Smart Workspace about past utilization of office space. Using Auto ML to analyze this data, we hope to predict future utilization as accurately as possible. That will allow end users to make informed decisions about whether to come to the office on a given day, based on the prediction. 

What is the Smart Workspace? CS Smart Workspace is an AI-based assistance system for promoting meaningful personal interactions. The system lets employees plan in-person work days. It offers transparency about your colleagues’ schedules, the overall utilization of areas, and your own calendar. In addition, workspaces can be booked. The system helps employees search for a free workspace by letting them check room availability on their smartphones. Social components promote on-site interactions. All in all, the CS Smart Workspace solution helps companies provide more demand-oriented workspaces for their employees.  

For more information about this project, take a look at our website.

 

2. ML Essentials and Definition

Traditional machine learning model development is resource-intensive. It requires significant domain knowledge and time to produce and compare dozens of models with decent performance. Automated machine learning accelerates the process of creating production-ready ML models, with great ease and efficiency.  

Microsoft offers a cloud service within Azure Machine Learning to accelerate and manage the machine learning project lifecycle. It helps data scientists, analysts, and developers build ML models with high scale, efficiency, and productivity, all while maintaining model quality. (Source: Microsoft)

We used Auto ML to advance our Smart Workspace project based on these 5 widely used machine learning phases:

  1. Task definition: Define the objective of your machine-learning-based prediction as precisely as possible.
  2. Data preparation: Decide what data you need to solve this problem, find features, improve your data, check significance, manipulate, and if necessary, normalize your data.
  3. Train and validate model: Train your model with different algorithms using the library of your choice and validation parameters of your choice.
  4. Deploy model: Deploy your models to a containerized service and integrate them into existing infrastructure.
  5. Monitor & manage lifecycle: Analyze your prediction service's performance and initiate retraining or model correction if necessary.

 

3. Solution Design and Implementation

In the next section, we will go through the previously mentioned ML phases used for our Smart Workspace prediction task.

 

Task Definition

We started by defining a task to predict the utilization of our meeting rooms and workspaces across our offices over the course of the year. By predicting the utilization of office desks, organizations can better manage the allocation and utilization of office space and resources, ensuring that desks are not left unused or overbooked. We also hope to eliminate unnecessary or underused desks to free up space for other purposes. This could lead to increased efficiency and productivity. Finally, a personalized and convenient experience for employees should enable them to book desks in advance and ensure that they have a place to work when they come in to the office.

Data Preparation

We used our Smart Workspace solution, explained above, as our data source. The illustration below shows meeting rooms in one of our offices as an example; the small dots in each room indicate the overall number of seats as well as whether they are occupied (red) or free (green).

By analyzing the data, we see that the data set consists of a series of observations recorded at regular intervals over a period of time. The task is to use this data to predict a future value based on past trends and patterns. The variables of interest in the data set include time-related factors such as the date, hour, and minute of each observation, so we will perform a time series analysis on our data set.

Building ML models with time series data is often tedious and complex, with many factors to consider. When working with time series data, it is important to apply appropriate preprocessing and feature engineering techniques in order to extract meaningful insights and build accurate models. That can include time-based indexing techniques such as date shifting and resampling, as well as decomposing the data into its seasonal, trend, and residual components. Feature engineering can also help create additional features that capture relevant patterns and trends in the data, such as lagging and rolling window statistics. Applying these techniques helps you better understand and model your time series data, leading to more accurate predictions and informed decisions.
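As a minimal sketch of the lag and rolling window features mentioned above, the following standard-library Python builds both for a short series. The function name, the parameters, and the sample values are assumptions made for illustration; they are not taken from the Smart Workspace data or pipeline.

```python
from statistics import mean


def add_lag_and_rolling_features(values, lag=1, window=3):
    """Build one feature row per observation from a list of readings.

    Rows where the lag or the rolling window would reach before the start
    of the series get None, so no future information leaks into a feature.
    """
    rows = []
    for i, value in enumerate(values):
        # Lag feature: the value observed `lag` steps earlier.
        lagged = values[i - lag] if i >= lag else None
        # Rolling mean over the `window` observations before index i.
        rolling = mean(values[i - window:i]) if i >= window else None
        rows.append({"value": value, "lag": lagged, "rolling_mean": rolling})
    return rows


# Hypothetical hourly occupied-seat counts, for illustration only.
occupancy = [4, 6, 8, 10, 12]
features = add_lag_and_rolling_features(occupancy, lag=1, window=3)
```

The rolling mean deliberately excludes the current observation, so each feature row only uses information that would already be known at prediction time.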

These steps were applied by transforming the data with Python. Finally, the data was visualized, and a decomposition was applied to find out whether our data has seasonality or a trend. The results show that the seasonality is strong; residuals can be disregarded for the multivariate time series analysis, since a second model will focus exclusively on statistical outliers.
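To illustrate what such a decomposition does, here is a simplified additive decomposition in plain Python. It is a sketch under stated assumptions, not the code used in the project: it assumes an odd seasonal period, at least two full periods of data, and an additive model (value = trend + seasonal + residual). The weekly pattern and the sample series are made up for the example.

```python
from statistics import mean


def decompose_additive(values, period):
    """Split a series into trend, seasonal, and residual parts (additive).

    Assumes an odd `period` and at least two full periods of data.
    Trend and residual are None at the edges where the window does not fit.
    """
    half = period // 2
    n = len(values)

    # Trend: centered moving average over one full period.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(values[i - half:i + half + 1])

    # Seasonal: average detrended value for each position in the period.
    by_phase = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            by_phase[i % period].append(values[i] - trend[i])
    phase_means = [mean(p) for p in by_phase]
    offset = mean(phase_means)  # center the pattern around zero
    seasonal = [phase_means[i % period] - offset for i in range(n)]

    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[i] is None else values[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual


# Hypothetical daily occupancy with a weekly pattern, for illustration only.
weekly_pattern = [3, 5, 6, 5, 1, -10, -10]  # deviation per weekday, sums to 0
series = [20 + weekly_pattern[d % 7] for d in range(28)]
trend, seasonal, residual = decompose_additive(series, period=7)
```

For this synthetic series, the trend is flat, the seasonal component reproduces the weekly pattern, and the residuals are zero. On real data, large residuals point to observations that neither trend nor seasonality explains.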

Our time series data for the training data shows no clear trend, so we will apply the Auto ML features to search for interesting patterns.

 

Create Auto ML Run

To get started with Auto ML, we first create a Machine Learning Studio Workspace, which is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning. The workspace keeps a history of all the training runs, including logs, metrics, output, and a snapshot of your scripts. You can use this information to determine which training run produces the best model. Moreover, it offers “Pro Code” capabilities through the Azure Machine Learning Python SDK as well as No-Code/Low-Code ML processes within the Azure Machine Learning Studio. In our case, we chose the latter. Read more about the configuration options in Microsoft’s Configuration documentation.

Next, we start a run with the previously prepared data and choose the task “time series forecasting” from the options shown in Figure 1. For our time series forecast, we must select a time column. Auto ML also preselects an appropriate metric for measuring our model’s performance.

Root mean squared error (RMSE) is a common metric for evaluating time series forecasting models. It measures the deviation of the predicted values from the true values and penalizes large errors more heavily than small ones, which matters for forecasting tasks where accuracy is critical. RMSE is easy to interpret because it is expressed in the same units as the original data, and it is well understood and widely used by practitioners. Because RMSE depends on the scale of the data, however, it cannot be compared directly across series with different scales or units; a normalized variant, which divides the RMSE by the range of the data, makes such comparisons possible. The performance metric can also be changed by selecting “View additional configuration settings.”
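The following sketch shows how RMSE is computed, together with a normalized variant. Dividing by the range of the actual values is one common normalization and is assumed here for illustration; the function names and sample values are likewise hypothetical.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def normalized_rmse(actual, predicted):
    """RMSE divided by the range of the actual values (unitless)."""
    return rmse(actual, predicted) / (max(actual) - min(actual))


# Hypothetical occupied-seat counts, for illustration only.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
error = rmse(actual, predicted)
relative_error = normalized_rmse(actual, predicted)
```

Here `error` is expressed in seats, while `relative_error` is a unitless fraction of the observed range, which is what makes it comparable across rooms or offices of different sizes.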

Running automated ML experiments remotely requires computational targets. Azure Machine Learning Compute is a managed computing infrastructure that can be used to create single-node or multi-node computing. For more information on setting and using compute targets for model training, see Microsoft’s documentation. 

Examine Results

Once the automated Machine Learning (Auto ML) run is finished, we can examine the results from all the trained models. A big advantage of Auto ML is that it automatically sorts the results according to the best performance achieved. 

The best-performing model is automatically listed at the top of the Models table, but you can also look at each model individually to see if another model might be better suited to your needs. In our case, the metric is the normalized root mean squared error (NRMSE): the lower the value, the better the model performs, because a smaller error means the predicted values are closer to the true values. As a rough rule of thumb, a normalized value below 0.5 is considered good, but the acceptable value depends on the specific requirements of your application.

(Source: Microsoft)

Deploy

Finally, to deploy a trained model for consumption, we have two options when choosing our model, as shown in Figure 3:

  1. Deploy directly in Azure
  2. Download the models in .h5 format, tweak and deploy anywhere

In our case, the Voting Ensemble algorithm showed the best performance results, and we picked the first option to consume the model directly.
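For numeric forecasts, a voting ensemble is commonly understood as a weighted average of the predictions of several models. The sketch below illustrates that idea only; it is not Azure's implementation, and the function name, weights, and forecast values are assumptions.

```python
def voting_ensemble_predict(model_predictions, weights=None):
    """Combine per-model forecasts into one forecast by weighted average."""
    n_models = len(model_predictions)
    if weights is None:
        weights = [1.0] * n_models  # plain average
    total = sum(weights)
    horizon = len(model_predictions[0])
    return [
        sum(w * preds[t] for w, preds in zip(weights, model_predictions)) / total
        for t in range(horizon)
    ]


# Hypothetical forecasts from three models for the next three hours.
forecasts = [
    [10.0, 12.0, 14.0],
    [12.0, 12.0, 16.0],
    [14.0, 12.0, 12.0],
]
combined = voting_ensemble_predict(forecasts)
weighted = voting_ensemble_predict(forecasts, weights=[2.0, 1.0, 1.0])
```

Averaging tends to cancel out the individual errors of the member models, which is one reason an ensemble can outperform each of its members.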

Consume

Once the model is deployed via Azure, the Machine Learning Studio creates a consumable REST endpoint, which we embedded into the Smart Workspace bot. That allows users to retrieve predictions about future office occupancy in Teams, like the example in Figure 5 below.
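As a rough sketch of how a client such as a bot might call a REST scoring endpoint, the following code prepares a JSON POST request with the Python standard library. The URL, the API key, the bearer-token header, and the payload shape are placeholders and assumptions; the actual input schema and authentication are defined by the deployment.

```python
import json
from urllib import request


def build_scoring_request(endpoint_url, api_key, timestamps):
    """Prepare (but do not send) a POST request for a scoring endpoint."""
    # Assumed payload shape; the real schema is defined by the deployment.
    payload = {"data": [{"timestamp": ts} for ts in timestamps]}
    return request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
req = build_scoring_request(
    "https://example.invalid/score",
    "<api-key>",
    ["2023-01-26T09:00:00", "2023-01-26T10:00:00"],
)
# Sending the request would look like: request.urlopen(req).read()
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any network call is made.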

We identified four important advantages of integrating Auto ML into the machine learning project lifecycle:

  • Increased efficiency: Auto ML can automate and optimize many of the tasks involved in building and tuning machine learning models, reducing the time and effort required to develop high-quality models.
  • Improved model performance: Auto ML can search through a wide range of model architectures and hyperparameter settings to find the best-performing model for a given data set, which improves model performance.
  • Reduced expertise requirements: Auto ML can make it easier for users with limited machine learning expertise to build and deploy high-quality models, since it can handle many of the technical details automatically.
  • Enhanced reproducibility: Auto ML can provide a record of the steps taken to build and tune a model, making it easier to reproduce the results and compare different models.

With Auto ML, you can accelerate and simplify AI at scale. Once high-quality data is available, you can quickly identify the model that best fits your data. Data scientists can use the Python SDK and a code-first approach to accelerate their ML workflow, while users without a data science background can use the web user interface.

Our Artificial Intelligence Experience Report, published jointly with the Technical University of Darmstadt, provides insights and describes how AI can successfully contribute to the digital transformation once certain hurdles have been overcome. In researching the report, we interviewed specialists and executives from various companies about the use of AI.

Authors

Ramón Roales-Welsch

Business Lead Data & AI

Janina Wörz

Consultant

Daniel Fernau

Senior Consultant