# What Is Structured Output Prediction?

Last Updated on August 28, 2020

Multi-output regression involves predicting two or more numerical variables.

Unlike normal regression where a single value is predicted for each sample, multi-output regression requires specialized machine learning algorithms that support outputting multiple variables for each prediction.

Deep learning neural networks are an example of an algorithm that natively supports multi-output regression problems. Neural network models for multi-output regression tasks can be easily defined and evaluated using the Keras deep learning library.

In this tutorial, you will discover how to develop deep learning models for multi-output regression.

After completing this tutorial, you will know:

- Multi-output regression is a predictive modeling task that involves two or more numerical output variables.
- Neural network models can be configured for multi-output regression tasks.
- How to evaluate a neural network for multi-output regression and make a prediction for new data.

Let’s get started.

Deep Learning Models for Multi-Output Regression. Photo by Christian Collins, some rights reserved.

## Tutorial Overview

This tutorial is divided into three parts; they are:

1. Multi-Output Regression
2. Neural Networks for Multi-Outputs
3. Neural Network for Multi-Output Regression

## Multi-Output Regression

Regression is a predictive modeling task that involves predicting a numerical output given some input.

It is different from classification tasks that involve predicting a class label.

Typically, a regression task involves predicting a single numeric value. Some tasks, however, require predicting more than one numeric value. These tasks are referred to as multiple-output regression, or multi-output regression for short.

In multi-output regression, two or more outputs are required for each input sample, and the outputs are required simultaneously. The assumption is that the outputs are a function of the inputs.

We can create a synthetic multi-output regression dataset using the make_regression() function in the scikit-learn library.

Our dataset will have 1,000 samples with 10 input features, five of which will be relevant to the outputs and five of which will be uninformative. The dataset will have three numeric outputs for each sample.

The complete example of creating and summarizing the synthetic multi-output regression dataset is listed below.

```python
# example of a multi-output regression problem
from sklearn.datasets import make_regression
# create dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
# summarize shape
print(X.shape, y.shape)
```

Running the example creates the dataset and summarizes the shape of the input and output elements.

We can see that, as expected, there are 1,000 samples, each with 10 input features and three output features.

Next, let’s look at how we can develop neural network models for multiple-output regression tasks.
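Before moving on, it can help to look at the scale of the three targets, since the error scores reported later are in the same units. The sketch below is an illustrative addition rather than part of the original listing: it regenerates the same dataset and prints the mean and standard deviation of each output column. The name `target_stats` is introduced here only for illustration.

```python
# illustrative sketch: summarize the scale of each target column
from sklearn.datasets import make_regression

# same synthetic dataset as in the example above
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
# one (mean, std) pair per output variable
target_stats = [(y[:, i].mean(), y[:, i].std()) for i in range(y.shape[1])]
for i, (mu, sigma) in enumerate(target_stats):
    print('target %d: mean=%.3f, std=%.3f' % (i, mu, sigma))
```

A mean absolute error is easier to judge when compared against these standard deviations than when read in isolation.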

## Neural Networks for Multi-Outputs

Many machine learning algorithms support multi-output regression natively.

Popular examples are decision trees and ensembles of decision trees. A limitation of decision trees for multi-output regression is that the relationships between inputs and outputs can be blocky or highly structured based on the training data.

Neural network models also support multi-output regression and have the benefit of learning a continuous function that can model a more graceful relationship between changes in input and output.
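To make the "blocky" behavior concrete, here is a minimal sketch using scikit-learn's DecisionTreeRegressor on the same synthetic dataset. It is an illustrative addition, and the shallow `max_depth=3` setting is an assumption chosen only to make the effect easy to see: a tree predicts one constant vector per leaf, so the number of distinct predictions can never exceed the number of leaves.

```python
# illustrative sketch: a decision tree handles multiple outputs natively,
# but predicts one constant vector per leaf
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
# max_depth=3 is an arbitrary choice for illustration (at most 8 leaves)
tree = DecisionTreeRegressor(max_depth=3, random_state=1)
tree.fit(X, y)
# a single prediction contains all three outputs
yhat = tree.predict(X[:1])
print(yhat.shape)
# count the distinct output vectors produced across all 1,000 rows
distinct = np.unique(tree.predict(X), axis=0)
print(distinct.shape[0], tree.get_n_leaves())
```

Deeper trees produce finer blocks, but the predicted surface remains piecewise constant.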

Multi-output regression can be supported directly by neural networks simply by specifying the number of target variables in the problem as the number of nodes in the output layer. For example, a task that has three output variables will require a neural network with three nodes in the output layer, each with the linear (default) activation function.

We can demonstrate this using the Keras deep learning library.

We will define a multilayer perceptron (MLP) model for the multi-output regression task defined in the previous section.

Each sample has 10 inputs and three outputs. Therefore, the network requires an input layer that expects 10 inputs, specified via the “input_dim” argument in the first hidden layer, and three nodes in the output layer.

We will use the popular ReLU activation function in the hidden layer. The hidden layer has 20 nodes, which were chosen after some trial and error. We will fit the model using mean absolute error (MAE) loss and the Adam version of stochastic gradient descent.

The definition of the network for the multi-output regression task is listed below.

```python
...
# define the model
model = Sequential()
model.add(Dense(20, input_dim=10, kernel_initializer='he_uniform', activation='relu'))
model.add(Dense(3))
model.compile(loss='mae', optimizer='adam')
```
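To see what this architecture computes, the following NumPy sketch reproduces the forward pass of a network with the same shape: 10 inputs, 20 ReLU hidden nodes, and three linear output nodes. It is an illustration only. The weights are random rather than trained, and the names `W1`, `b1`, `W2`, `b2`, and `forward` are hypothetical and do not correspond to anything in Keras.

```python
# illustrative sketch: forward pass of a 10-20-3 MLP with random weights
import numpy as np

rng = np.random.default_rng(1)
n_inputs, n_hidden, n_outputs = 10, 20, 3
# hidden layer weights and biases
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
# output layer weights and biases
W2 = rng.normal(size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def forward(X):
    # ReLU activation in the hidden layer
    hidden = np.maximum(0.0, X @ W1 + b1)
    # linear activation in the output layer: one value per target
    return hidden @ W2 + b2

batch = rng.normal(size=(4, n_inputs))
out = forward(batch)
print(out.shape)
# total number of trainable values in a network of this shape
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)
```

Each row of the result holds three numbers, one per output variable, which is all that is needed for the network to produce the outputs simultaneously.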

You may want to adapt this model for your own multi-output regression task. To make that easier, we can create a function that defines and returns the model, where the number of input variables and the number of output variables are provided as arguments.

```python
# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs))
    model.compile(loss='mae', optimizer='adam')
    return model
```

Now that we are familiar with how to define an MLP for multi-output regression, let’s explore how this model can be evaluated.

## Neural Network for Multi-Output Regression

If the dataset is small, it is good practice to evaluate neural network models repeatedly on the same dataset and report the mean performance across the repeats.

This is because of the stochastic nature of the learning algorithm.

Additionally, it is good practice to use k-fold cross-validation instead of train/test splits of a dataset to get an unbiased estimate of model performance when making predictions on new data. Again, this applies only if there is not too much data and the process can be completed in a reasonable time.

Taking this into account, we will evaluate the MLP model on the multi-output regression task using repeated k-fold cross-validation with 10 folds and three repeats.

For each fold, the model is defined, fit, and evaluated. The scores are collected and can be summarized by reporting the mean and standard deviation.
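As a quick check on what this procedure involves, the sketch below (an illustrative addition) counts the splits that RepeatedKFold generates for 1,000 samples with 10 folds and three repeats. A placeholder array of zeros stands in for the real inputs, since only the number of rows matters for splitting.

```python
# illustrative sketch: how many models repeated k-fold will fit
import numpy as np
from sklearn.model_selection import RepeatedKFold

# placeholder inputs with the same shape as the tutorial dataset
X = np.zeros((1000, 10))
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
splits = list(cv.split(X))
# 10 folds x 3 repeats
print(len(splits))
# size of the train and test portions in the first split
train_ix, test_ix = splits[0]
print(len(train_ix), len(test_ix))
```

This means 30 separate models are trained, each on 900 rows and scored on the remaining 100, which is why the procedure is only practical for modest dataset sizes.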

The evaluate_model() function below takes the dataset, evaluates the model, and returns a list of evaluation scores, in this case, MAE scores.

```python
# evaluate a model using repeated k-fold cross-validation
def evaluate_model(X, y):
    results = list()
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    # define evaluation procedure
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    # enumerate folds
    for train_ix, test_ix in cv.split(X):
        # prepare data
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]
        # define model
        model = get_model(n_inputs, n_outputs)
        # fit model
        model.fit(X_train, y_train, verbose=0, epochs=100)
        # evaluate model on test set
        mae = model.evaluate(X_test, y_test, verbose=0)
        # store result
        print('>%.3f' % mae)
        results.append(mae)
    return results
```
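Note that each fold yields a single MAE value even though there are three outputs. The NumPy sketch below shows one way such a number can arise, under the assumption that the absolute errors are averaged over both the samples and the output columns. The tiny arrays are made-up values for illustration only.

```python
# illustrative sketch: a single MAE score for a multi-output problem
# assumption: errors are averaged over samples and over output columns
import numpy as np

# made-up targets and predictions: 2 samples, 3 outputs each
y_true = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y_pred = np.array([[1.5, 2.0, 2.0], [4.0, 7.0, 6.0]])
abs_err = np.abs(y_true - y_pred)
# one MAE per output variable
per_output_mae = abs_err.mean(axis=0)
# one overall MAE across every sample and output
overall_mae = abs_err.mean()
print(per_output_mae)
print('%.3f' % overall_mae)
```

If one output is on a much larger scale than the others, it will dominate an overall score like this, so the per-output values can be worth inspecting as well.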

We can then load our dataset and evaluate the model and report the mean performance.

Tying this together, the complete example is listed below.

```python
# mlp for multi-output regression
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.model_selection import RepeatedKFold
from keras.models import Sequential
from keras.layers import Dense

# get the dataset
def get_dataset():
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
    return X, y

# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs))
    model.compile(loss='mae', optimizer='adam')
    return model

# evaluate a model using repeated k-fold cross-validation
def evaluate_model(X, y):
    results = list()
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    # define evaluation procedure
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    # enumerate folds
    for train_ix, test_ix in cv.split(X):
        # prepare data
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]
        # define model
        model = get_model(n_inputs, n_outputs)
        # fit model
        model.fit(X_train, y_train, verbose=0, epochs=100)
        # evaluate model on test set
        mae = model.evaluate(X_test, y_test, verbose=0)
        # store result
        print('>%.3f' % mae)
        results.append(mae)
    return results

# load dataset
X, y = get_dataset()
# evaluate model
results = evaluate_model(X, y)
# summarize performance
print('MAE: %.3f (%.3f)' % (mean(results), std(results)))
```

Running the example reports the MAE for each fold and each repeat, to give an idea of the evaluation progress.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

At the end, the mean and standard deviation of the MAE are reported. In this case, the model is shown to achieve an MAE of about 8.184.

You can use this code as a template for evaluating MLP models on your own multi-output regression tasks. The number of nodes and layers in the model can easily be adapted and tailored to the complexity of your dataset.

```
...
>8.054
>7.562
>9.026
>8.541
>6.744
MAE: 8.184 (1.032)
```

Once a model configuration is chosen, we can use it to fit a final model on all available data and make a prediction for new data.

The example below demonstrates this by first fitting the MLP model on the entire multi-output regression dataset, then calling the predict() function on the fitted model in order to make a prediction for a new row of data.

```python
# use mlp for prediction on multi-output regression
from numpy import asarray
from sklearn.datasets import make_regression
from keras.models import Sequential
from keras.layers import Dense

# get the dataset
def get_dataset():
    X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=3, random_state=2)
    return X, y

# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs, kernel_initializer='he_uniform'))
    model.compile(loss='mae', optimizer='adam')
    return model

# load dataset
X, y = get_dataset()
n_inputs, n_outputs = X.shape[1], y.shape[1]
# get model
model = get_model(n_inputs, n_outputs)
# fit the model on all data
model.fit(X, y, verbose=0, epochs=100)
# make a prediction for new data
row = [-0.99859353,2.19284309,-0.42632569,-0.21043258,-1.13655612,-0.55671602,-0.63169045,-0.87625098,-0.99445578,-0.3677487]
newX = asarray([row])
yhat = model.predict(newX)
print('Predicted: %s' % yhat[0])
```

Running the example fits the model and makes a prediction for a new row.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

As expected, the prediction contains three output variables required for the multi-output regression task.

```
Predicted: [-152.22713 -78.04891 -91.97194]
```
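One detail of the prediction code is worth spelling out: the new row is wrapped in a list before being converted to an array, because the model expects a two-dimensional input with one row per sample. The sketch below is an illustrative addition that shows the resulting shapes using NumPy only. The `yhat` array is not produced by a model here; it simply reuses the values printed above to show why `yhat[0]` is indexed.

```python
# illustrative sketch: shapes involved in a single-row prediction
from numpy import asarray

row = [-0.99859353, 2.19284309, -0.42632569, -0.21043258, -1.13655612,
       -0.55671602, -0.63169045, -0.87625098, -0.99445578, -0.3677487]
# wrapping the row in a list gives (n_samples, n_features) = (1, 10)
newX = asarray([row])
print(newX.shape)
# stand-in for a model output, copied from the printed result above
yhat = asarray([[-152.22713, -78.04891, -91.97194]])
# one row of predictions per input row, so index 0 selects the single row
print(yhat.shape, yhat[0].shape)
```

Passing several rows at once works the same way: an input of shape (n, 10) yields predictions of shape (n, 3).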

## Summary

In this tutorial, you discovered how to develop deep learning models for multi-output regression.

Specifically, you learned:

- Multi-output regression is a predictive modeling task that involves two or more numerical output variables.
- Neural network models can be configured for multi-output regression tasks.
- How to evaluate a neural network for multi-output regression and make a prediction for new data.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.
