So, the moral of today's blog is that we escape one challenge only to get caught up in newer issues. :-) That is how human evolution happened too, right??
Was the postscript of the previous post copy-pasted here by mistake? When we deal with aspects of truth that are so overwhelming, is it not natural to repeat, re-emphasise and reiterate?
Paradox and contradiction are present in all aspects of our lives, so it is quite easy to understand a few nuances of AI/ML if we relate them to this grand truth. I am continuing the same theme as the last post here, albeit with a different topic.
So let's deal with the "activation" function, which is a "basic" concept in neural networks.
This concept has been familiar in the machine learning era too, ever since researchers started experimenting with neural networks as early as 1943! Before going down memory lane, let me give some contextual understanding so that switching gears will be smoother.
The essential difference between traditional machine learning models - like linear regression (which uses a linear equation) and logistic regression (which uses the sigmoid function) - and a neural network is what is referred to as "hidden layers". A neural network uses hidden layers, while the traditional ML models were not sophisticated enough to use them. If there is just one hidden layer, positioned between the input layer and the output layer, we call it a "shallow" neural network. A hidden layer may contain one or many neurons. (Well, it would be a very rare situation to have just one neuron in a hidden layer, but it is still technically feasible if practically required.)
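To make this concrete, here is a minimal sketch of such a "shallow" network, assuming the Keras library is available; the layer sizes (4 inputs, 8 hidden neurons, 1 output) are arbitrary numbers picked purely for illustration.

```python
# A "shallow" neural network: exactly one hidden layer sitting between
# the input layer and the output layer. Sizes are illustrative only.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(4,)),               # input layer: 4 features
    keras.layers.Dense(8, activation="relu"),     # the single hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
model.summary()
```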
Essentially, what do such hidden layers do?
A hidden layer takes input values from the previous layer, processes them and passes its output on to the next layer. As simple as that.
Processing here essentially means two steps: (1) a computing operation (referred to as calculating the Z value, a weighted sum of the inputs plus a bias) and (2) transforming that value (with the help of an activation function).
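Here is a rough sketch of those two steps for a single hidden neuron, using NumPy; every number below (the inputs, weights and bias) is made up purely for illustration.

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # values coming in from the previous layer
W = np.array([0.4, 0.1, -0.7])   # this neuron's weights
b = 0.2                          # this neuron's bias

z = np.dot(W, x) + b             # step 1: compute the Z value (weighted sum + bias)
a = max(0.0, z)                  # step 2: transform Z with an activation (ReLU here)
print(z, a)
```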
The funny thing about activation is that we have multiple choices based on the requirements: the purpose of the model, the volume and pattern of the data, the complexity of the design, and also aspects like budget and infrastructure availability. These days the default activation function is ReLU (an acronym for Rectified Linear Unit), but there are many other smart activation functions available too.
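Just to show a few of those choices side by side, here is a small, non-exhaustive sketch of three well-known activation functions applied to the same sample values.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)        # keeps positives, zeroes out negatives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes any value into (0, 1)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z), sigmoid(z), np.tanh(z))  # tanh squashes into (-1, 1)
```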
Now let me zero in on the intent of this post - paradox, isn't it? When we say there are multiple options today, the paradox is that we don't use the sigmoid function or the linear function as the activation in hidden layers. Both of them are relevant only for output layers, and sigmoid was used in hidden layers only until ReLU became popular. (A linear function was never even attempted in a hidden layer, since it won't "transform" the output: stacked linear layers collapse into a single linear layer, so why create hidden layers at all?)
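Here is a tiny demonstration of that last point, with made-up random weights: two stacked layers whose "activation" is just the identity (i.e., linear) compute exactly the same mapping as one single linear layer, so the hidden layer adds nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                                 # a made-up input
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # "hidden" layer, linear activation
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # output layer

two_layers = W2 @ (W1 @ x + b1) + b2                   # hidden layer, then output layer

# The very same mapping as ONE layer with combined weights and bias:
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(two_layers, one_layer))              # True: the hidden layer added nothing
```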
I wanted to go deeper into the history of these activation functions, but I don't want to make this post any longer since I have hit the bull's eye already. Do you see the paradox? The original functions that started off machine learning are outdated and have become useless inside the hidden layers of modern neural networks. In the next post let me explain the history of various activation functions, and after that I would like to jump into another sequel to lament about the philosophical aspect of this "activation" and what the purpose of "activation" is in the grand scheme of things.
By the way, with my limited exposure to AI/ML, I am convinced that this subject is not for those who have a distaste for philosophy. Come on, if we don't see the connection between the most materialistic things and the abstract aspects of life, I am sorry, we are missing life.
So let's jump to philosophy after exploring history, OK? Stay tuned.