from sklearn.model_selection import KFold  # k-fold cross validation
```
%% Cell type:markdown id: tags:
## Exercise 2
Train a model on the train data and predict on the test data. Use the code shown in the slides. Do this using one of the following models: *LogisticRegression, KNeighborsClassifier, RandomForestClassifier*.
**Use the X_train data to train the model and X_test data to test the performance. Which model works the best?**
%% Cell type:code id: tags:
``` python
# Your code
```
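%% Cell type:markdown id: tags:
A minimal sketch of one possible approach, assuming X_train, X_test, y_train and y_test come from the earlier train/test split; the chosen models and their settings are illustrative, not the required answer.
%% Cell type:code id: tags:
``` python
# fit each candidate model on the training data and report its accuracy on the test data
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

for clf in [LogisticRegression(max_iter=1000), KNeighborsClassifier(), RandomForestClassifier()]:
    clf.fit(X_train, y_train)
    print('{0}: {1:.2f}'.format(type(clf).__name__, clf.score(X_test, y_test)))
```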
%% Cell type:markdown id: tags:
## Exercise 3
Let's do the same for the neural network. Play around with the different values: add new layers, change the number of nodes, etc. How well does the model perform?
%% Cell type:code id: tags:
``` python
## Change code
## define model
model = keras.Sequential()
# define the input layer
input_shape = X_train.shape[1]
model.add(keras.Input(shape=(input_shape,)))
# add a hidden layer to the model (15 nodes with relu activation)
model.add(layers.Dense(15, activation='relu'))
# add output layer (1 node with sigmoid activation)
model.add(layers.Dense(1, activation='sigmoid'))
# compile the model: specify the loss, metrics and optimizer (binary cross-entropy for the single sigmoid output)
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
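# -- a minimal sketch of training and evaluating the compiled model; the epoch and
# -- batch-size values are illustrative, and y_train, X_test and y_test are assumed
# -- to come from the earlier train/test split
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy: {0:.2f}'.format(accuracy))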
```
%% Cell type:markdown id: tags:
For optimization of the hyperparameters we use cross-validation. This also gives you some idea of the robustness of your model. In the following cells we will apply this technique to the ML and DL models.
%% Cell type:markdown id: tags:
## Exercise 4
**Fill in the number of cross-validation folds you want to use and the type of scoring. The different scoring parameters you can use can be found here: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter**
%% Cell type:code id: tags:
``` python
# You can play around with these values
n_cv = 5  # number of cross validations
score = 'accuracy'  # scoring type
cv = KFold(n_cv, shuffle=True, random_state=random_state)  # define the type of cross validation
```
%% Cell type:code id: tags:
``` python
# loop through all the different model types and use cross validation (defined in the previous cell) to roughly determine their performance
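# -- a minimal sketch, assuming the three candidate models from Exercise 2 and the
# -- X_train/y_train split; the exact model list may differ from the original notebook
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

models = {'LogisticRegression': LogisticRegression(max_iter=1000),
          'KNeighborsClassifier': KNeighborsClassifier(),
          'RandomForestClassifier': RandomForestClassifier(random_state=random_state)}

for name, clf in models.items():
    scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring=score)
    print('{0}: {1:.2f} (+/- {2:.2f})'.format(name, scores.mean(), scores.std()))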
```
%% Cell type:markdown id: tags:
Normally, based on these performances you would choose one or two models to optimise. However, for illustration purposes, we will first optimise a Random Forest. We will use the easiest approach, namely *Grid Search*, to find the best hyperparameters.
%% Cell type:code id: tags:
``` python
from sklearn.model_selection import GridSearchCV  # grid search using cross validation
```
%% Cell type:code id: tags:
``` python
# define the grid of hyperparameter options to try for the random forest
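# -- a minimal sketch; the parameter values and GridSearchCV settings below are
# -- illustrative assumptions, not the original notebook's exact grid
from sklearn.ensemble import RandomForestClassifier

param_grid = {'n_estimators': [50, 100, 200],
              'max_depth': [None, 5, 10],
              'min_samples_split': [2, 5]}

# search over all combinations, reusing the cross-validation and scoring defined earlier
grid_rf = GridSearchCV(RandomForestClassifier(random_state=random_state), param_grid, cv=cv, scoring=score)
grid_rf.fit(X_train, y_train)
print('Best parameters: {}'.format(grid_rf.best_params_))

# evaluate the tuned model on the held-out test set
score_test = grid_rf.score(X_test, y_test)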
print('\nPerformance on test set:\nScore: {0:.2f}'.format(score_test))
```
%% Cell type:markdown id: tags:
## Question
Using the code example above, improve the best-performing ML model you found earlier. Check the sklearn documentation for the hyperparameters that can be tuned for your model.
%% Cell type:code id: tags:
``` python
# Your code
```
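%% Cell type:markdown id: tags:
As a starting point, here is a minimal sketch assuming KNeighborsClassifier came out best in Exercise 2; the parameter values are illustrative, and the relevant hyperparameters will differ for other models.
%% Cell type:code id: tags:
``` python
from sklearn.neighbors import KNeighborsClassifier

# illustrative grid of KNN hyperparameters; check the sklearn docs for the full list
param_grid = {'n_neighbors': [3, 5, 7, 11],
              'weights': ['uniform', 'distance']}

grid_knn = GridSearchCV(KNeighborsClassifier(), param_grid, cv=cv, scoring=score)
grid_knn.fit(X_train, y_train)
print(grid_knn.best_params_, grid_knn.score(X_test, y_test))
```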
%% Cell type:markdown id: tags:
## Neural networks
The following code shows how to optimise a neural network. This works in basically the same way, but first we have to enable GridSearchCV to create different neural networks. For this we define the function *create_model*:
%% Cell type:code id: tags:
``` python
# using the KerasClassifier wrapper we can use the Keras model the same way as a sklearn model
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier  # so we can use neural networks in sklearn

# update create_model for tuning
def create_model(hidden_layers=1, nodes=5, activation='relu', optimizer='rmsprop'):  # define the parameters and their defaults
    # define model
    model = keras.Sequential()
    # define the input layer
    input_shape = X.shape[1]
    model.add(keras.Input(shape=(input_shape,)))
    # add the hidden layers to the model (nodes per layer with the chosen activation)
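    # -- sketch of how this cell plausibly continues; the hidden-layer loop, compile
    # -- settings, wrapper arguments and parameter grid below are illustrative
    # -- assumptions, not the original notebook's exact code
    for _ in range(hidden_layers):
        model.add(layers.Dense(nodes, activation=activation))
    # add output layer (1 node with sigmoid activation)
    model.add(layers.Dense(1, activation='sigmoid'))
    # compile the model: specify the loss, metrics and optimizer
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# wrap the Keras model so GridSearchCV can treat it like a sklearn estimator
nn = KerasClassifier(build_fn=create_model, epochs=50, batch_size=32, verbose=0)

# grid of hyperparameter options to try (illustrative values)
param_grid = {'hidden_layers': [1, 2],
              'nodes': [5, 15],
              'optimizer': ['rmsprop', 'adam']}

# run the grid search, reusing the cross-validation and scoring defined earlier
grid_nn = GridSearchCV(nn, param_grid, cv=cv, scoring=score)
grid_nn.fit(X_train, y_train)
print('Best parameters: {}'.format(grid_nn.best_params_))

# evaluate the tuned network on the held-out test set
score_test = grid_nn.score(X_test, y_test)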
print('\nPerformance on test set:\nScore: {0:.2f}'.format(score_test))
```
%% Cell type:markdown id: tags:
## Question
**Try to improve the values for your DL model**
- Include different values for the hyperparameters
- Replace GridSearchCV with another form of search (tip: use the BayesSearchCV from https://scikit-optimize.github.io/stable/auto_examples/sklearn-gridsearchcv-replacement.html). Install the package using the following code:
%% Cell type:code id: tags:
``` python
# install package to use BayesSearchCV
!pip install scikit-optimize
```
%% Cell type:code id: tags:
``` python
# Your code
```
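%% Cell type:markdown id: tags:
A minimal sketch of the BayesSearchCV replacement, assuming the KerasClassifier wrapper and *create_model* from the cell above; the search spaces and number of iterations are illustrative assumptions.
%% Cell type:code id: tags:
``` python
from skopt import BayesSearchCV              # drop-in replacement for GridSearchCV
from skopt.space import Integer, Categorical

nn = KerasClassifier(build_fn=create_model, epochs=50, batch_size=32, verbose=0)

# search spaces instead of a fixed grid; BayesSearchCV samples promising combinations
search_spaces = {'hidden_layers': Integer(1, 3),
                 'nodes': Integer(5, 30),
                 'optimizer': Categorical(['rmsprop', 'adam'])}

bayes = BayesSearchCV(nn, search_spaces, n_iter=20, cv=cv, scoring=score, random_state=random_state)
bayes.fit(X_train, y_train)
print(bayes.best_params_, bayes.score(X_test, y_test))
```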
%% Cell type:markdown id: tags:
## Final task
Save your model so you can validate its performance next week. First run the cell with the pipeline that gave you the best results, then run the following cell: