Kaggle Grupo Bimbo Neural Network Implementation

I learnt about neural networks during my machine learning lectures, and for our course project we had to solve a problem on Kaggle, a platform that hosts real-world machine learning problems.

Before going through this example, I suggest you read this blog post (http://iamtrask.github.io/2015/07/12/basic-python-network/) on implementing a small neural network.


In machine learning and cognitive science, an artificial neural network (ANN) is a network inspired by biological neural networks (the central nervous systems of animals, in particular the brain) that is used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are typically specified using three things.


The problem we had to solve is this one (https://www.kaggle.com/c/grupo-bimbo-inventory-demand). The problem description is as follows:

Maximize sales and minimize returns of bakery goods

Planning a celebration is a balancing act of preparing just enough food to go around without being stuck eating the same leftovers for the next week. The key is anticipating how many guests will come. Grupo Bimbo must weigh similar considerations as it strives to meet daily consumer demand for fresh bakery products on the shelves of over 1 million stores along its 45,000 routes across Mexico.

Currently, daily inventory calculations are performed by direct delivery sales employees who must single-handedly predict the forces of supply, demand, and hunger based on their personal experiences with each store. With some breads carrying a one week shelf life, the acceptable margin for error is small.

In this competition, Grupo Bimbo invites Kagglers to develop a model to accurately forecast inventory demand based on historical sales data. Doing so will make sure consumers of its over 100 bakery products aren’t staring at empty shelves, while also reducing the amount spent on refunds to store owners with surplus product unfit for sale.


The implementation can be found here: https://github.com/ArtisanHub/GrupoBimboKaggle/blob/master/NeuralNetwork/KaggleNeuralNetwork.py

Though we tried several approaches, I will only talk about the neural network implementation here.

Neural networks are a widely used prediction technique for large datasets. For Kaggle, we implemented a neural network using the well-known approach of a sequential model of layers. We used a Python library called Keras to implement it. Keras is capable of running on top of either TensorFlow or Theano. TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

Our implementation was a three-layer neural network. The figure below shows the basic design.

Figure: Basic design for the three-layer neural network

When implementing a neural network with Keras, we need to set the parameters for each layer. We can create a model object and add layers to it the way we want. For each layer we have to specify three compulsory things:

  1. Number of neurons
  2. Init
    Name of the initialization function for the weights of the layer, or alternatively, a Theano function to use for weights initialization. This parameter is only relevant if you don’t pass a weights argument.
  3. Activation
    Name of the activation function to use, or alternatively, an elementwise Theano function.

           Number of neurons   Init      Activation
  Layer 1  12                  uniform   linear
  Layer 2  8                   uniform   linear
  Layer 3  1                   uniform   linear

We used a linear activation function because Grupo Bimbo is a numerical value prediction (regression) problem, where we get real-valued results for “Demanda_uni_equil”.

The network has 12 input neurons because the dataset has 12 columns. We used 8 hidden neurons for the middle layer and a single output neuron for the predicted value.
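To make the layer sizes concrete, here is a minimal sketch in plain Python (not Keras) of what a 12-8-1 network with linear activations computes for one input row. It assumes every row of the table is a fully connected layer fed by the 12 input columns. The `scale` of the uniform initialisation, the zero biases, and the example row are hypothetical values chosen only for illustration.

```python
import random

# Layer sizes taken from the table above. Treating every row as a fully
# connected layer fed by the 12 input columns is an assumption of this sketch.
N_INPUTS = 12
LAYER_SIZES = [12, 8, 1]


def init_layer(n_in, n_out, rng, scale=0.05):
    """Create one dense layer with weights drawn uniformly from [-scale, scale].

    The value of scale is a hypothetical choice, and biases start at zero.
    """
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_network(rng):
    """Chain the layers so each one takes the previous layer's outputs."""
    layers = []
    n_in = N_INPUTS
    for n_out in LAYER_SIZES:
        layers.append(init_layer(n_in, n_out, rng))
        n_in = n_out
    return layers


def forward(layers, row):
    """Propagate one input row through the network with linear activations."""
    values = row
    for weights, biases in layers:
        # Linear activation: the output is the weighted sum plus the bias.
        values = [sum(w * v for w, v in zip(ws, values)) + b
                  for ws, b in zip(weights, biases)]
    return values[0]  # the single output neuron


def count_parameters(layers):
    """Count trainable weights and biases across all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


rng = random.Random(0)
network = build_network(rng)
example_row = [1.0] * N_INPUTS  # hypothetical, already-numeric input row
prediction = forward(network, example_row)
n_params = count_parameters(network)
```

Under these assumptions the network has 269 trainable parameters (156 + 104 + 9), which is small. The long training times come from the number of rows, not from the size of the model.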

Before training the model we need to configure the learning process, which is done via the compile() method. We used the following configurations. (Note that these are the two compulsory ones.)

  • Loss
    This is also called the objective function. You can either pass the name of an existing objective, or pass a Theano/TensorFlow symbolic function that takes two arguments (the true values and the predictions) and returns a scalar for each data point. We used mae (mean absolute error) for our implementation.
  • Optimizer
    As the optimizer we used Adam, which is provided by the library. Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
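The two settings can be sketched in plain Python to show what they compute. This is an illustration, not the library's code: `mean_absolute_error` is the loss, and `adam_step` is one Adam update for a single scalar parameter using the default hyperparameters from the Adam paper. The toy numbers are made up.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all data points."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def adam_step(param, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    m and v are running estimates of the first and second moments of the
    gradient; t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy values, not taken from the competition data.
actual = [3.0, 0.0, 5.0, 2.0]
predicted = [2.5, 0.5, 4.0, 2.0]
mae = mean_absolute_error(actual, predicted)

# First update of a parameter that starts at 1.0 with gradient 4.0.
weight, m, v = adam_step(param=1.0, grad=4.0, m=0.0, v=0.0, t=1)
```

On the first step the bias correction makes the update roughly `lr * sign(grad)`, so the parameter moves by about 0.001 whatever the size of the gradient.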

Training a network means finding the best set of weights to make predictions. In training, we fit the compiled model to the training dataset. This is done using the fit() method provided by Keras. Here we define the number of iterations over the training data; the weights are adjusted towards better values on each iteration. Next we evaluate the trained model on the test dataset. This gives us an idea of how well the model performs on data it has not seen. After the evaluation we can decide whether to keep the model or modify it accordingly.
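The fit, evaluate and predict cycle can be illustrated with a much smaller model. The sketch below is plain Python, not Keras: it trains a single weight on toy data with plain subgradient descent on the MAE (not Adam), adjusting the weight once per epoch. The function names, the learning rate and the data are all hypothetical.

```python
def mae(weight, xs, ys):
    """Mean absolute error of the one-weight model y = weight * x."""
    return sum(abs(weight * x - y) for x, y in zip(xs, ys)) / len(xs)


def mae_gradient(weight, xs, ys):
    """Subgradient of the MAE with respect to the weight."""
    def sign(value):
        return (value > 0) - (value < 0)
    return sum(sign(weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, epochs, learning_rate=0.1):
    """Adjust the weight once per epoch and record the loss after each pass."""
    weight = 0.0
    history = []
    for _ in range(epochs):
        weight -= learning_rate * mae_gradient(weight, xs, ys)
        history.append(mae(weight, xs, ys))
    return weight, history


# Toy data following y = 3x; not taken from the competition dataset.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 6.0, 9.0, 12.0]
test_x, test_y = [5.0, 6.0], [15.0, 18.0]

weight, history = fit(train_x, train_y, epochs=20)
test_loss = mae(weight, test_x, test_y)   # evaluation on held-out data
prediction = weight * 7.0                 # prediction for an unseen input
```

The `history` list plays the role of the per-epoch loss that Keras reports during training. Comparing its last value with `test_loss` is the kind of check that tells us whether to keep the model or change it.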

The accuracy of the trained model can be calculated using the evaluate() method provided by the library.

Finally, we predicted the Demanda_uni_equil values for each instance of the test dataset using the predict() method provided by the library.

The main reason for implementing a neural network was adaptive learning: the ability to learn how to do tasks based on the data given for training or initial experience. The other reason was self-organisation: an artificial neural network can create its own organisation or representation of the information it receives during learning time. Even though it was slow to train, we could get better results than with the other techniques we tried.

As I mentioned earlier, this was not the best solution for this problem. When implementing the neural network we knew the main limitation was going to be time. So initially we trained the model using only two weeks of data and tested it on the remaining weeks, and managed to get an accuracy of 20%. We used the built-in method provided by the Keras library to evaluate the model. Training on two weeks took around 30 hours. We managed to get an accuracy of 40% by using 5 weeks for training and the rest for testing. Training on 5 weeks took almost 3 days. Since this process was taking a lot of time, we limited the network to a single hidden layer. Although we implemented the neural network, we couldn’t train the model on the complete dataset because of how long training takes.
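Splitting the data by week, as described above, could look roughly like the sketch below. The column name `Semana`, the week numbers 3 to 9, and the choice of the earliest weeks for training are assumptions made for illustration. The rows are a made-up miniature dataset, and `split_by_week` is a hypothetical helper, not part of the linked implementation.

```python
def split_by_week(rows, train_weeks, week_column="Semana"):
    """Put rows from the chosen weeks in the training set, the rest in the test set."""
    train_weeks = set(train_weeks)
    train = [row for row in rows if row[week_column] in train_weeks]
    test = [row for row in rows if row[week_column] not in train_weeks]
    return train, test


# Hypothetical miniature dataset: one row per week, weeks 3 to 9.
rows = [{"Semana": week, "Demanda_uni_equil": week * 2}
        for week in range(3, 10)]

# First experiment: train on two weeks, test on the rest.
two_week_train, two_week_test = split_by_week(rows, train_weeks=[3, 4])

# Second experiment: train on five weeks, test on the rest.
five_week_train, five_week_test = split_by_week(rows, train_weeks=[3, 4, 5, 6, 7])
```

Keeping whole weeks together means the model is always tested on weeks it never saw during training.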
