Thursday, May 29, 2025

AI/ML ----> Paradox - "part 2" - Neural Network Activations

So, the moral of today's blog is that we get out of old challenges only to get caught in newer issues. :-)  That is how human evolution happened too, right??

Did the postscript of the previous post get copy-pasted here by mistake? When we deal with an aspect of truth that is so overwhelming, is it not natural to repeat, re-emphasize and reiterate?

Paradox and contradiction are present in all aspects of our lives - so it is quite easy to understand a few nuances of AI/ML if we relate them to this grand truth. I am continuing the theme of the last post here, albeit with a different topic.

So let's deal with the "activation" function, which is a "basic" concept in neural networks.

This concept has been familiar ever since researchers started experimenting with neural networks - as early as 1943! Before going down memory lane, let me give some contextual understanding so that switching gears will be smoother.

The essential difference between traditional machine learning models - like linear regression (which uses a linear equation) and logistic regression (which uses the sigmoid function) - and a neural network is what is referred to as "hidden layers". A neural network uses hidden layers, while the traditional ML models were not sophisticated enough to use them. If there is just one hidden layer, positioned between the input layer and the output layer, we call it a "shallow" neural network. A hidden layer may contain one or many neurons (it would be a very rare situation to have just one neuron in a hidden layer, but it is still technically feasible if practically required).

Essentially, what do such hidden layers do?

The hidden layers take input values from the previous layer, process them and pass their output on to the next layer. As simple as that.

Processing here essentially means two components: (1) a computing operation (referred to as calculating the Z value - a weighted sum of the inputs plus a bias) and (2) transforming that value with the help of an activation function.
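
To make these two components concrete, here is a minimal NumPy sketch of my own (the numbers and the layer size are made up, not taken from any course material):

    import numpy as np

    # One hidden layer with 4 neurons processing a 3-dimensional input x.
    x = np.array([0.5, -1.2, 3.0])        # input coming from the previous layer
    W = np.random.randn(4, 3) * 0.01      # weights of the hidden layer
    b = np.zeros(4)                       # biases

    Z = W @ x + b                         # (1) computing operation: weighted sum plus bias
    A = np.maximum(0, Z)                  # (2) transformation: ReLU activation
    print(A)                              # this is what gets passed to the next layer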

The funny thing about activation is that we have multiple choices based on the requirements - the purpose of the model, the volume and pattern of the data, the complexity of the design, and also aspects like budget and infrastructure availability. These days the default activation function is ReLU (an acronym for Rectified Linear Unit), but there are many other smart activations available too.
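
Just to illustrate how differently these choices behave, here is a tiny comparison of my own (the sample values are arbitrary; which activation is "right" depends entirely on the model and the data):

    import numpy as np

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # some sample Z values

    relu = np.maximum(0, z)                # zero for negatives, identity for positives
    sigmoid = 1 / (1 + np.exp(-z))         # squashes everything into (0, 1)
    tanh = np.tanh(z)                      # squashes everything into (-1, 1)

    print("relu   :", relu)
    print("sigmoid:", np.round(sigmoid, 3))
    print("tanh   :", np.round(tanh, 3))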

Now let me zero in on the intent of this post - the paradox. While there are multiple options today, the paradox is that we don't use the sigmoid function or the linear function as activations in hidden layers. Both of them are relevant mainly for output layers; the sigmoid was used in hidden layers only until ReLU became popular. (The linear function was never seriously used in hidden layers, since it won't "transform" the output - why create hidden layers then? See the small demonstration below.)
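
A quick way to convince yourself about the linear case: two stacked linear layers collapse algebraically into a single linear layer, so the hidden layer adds nothing. A small sketch with random made-up weights:

    import numpy as np

    np.random.seed(0)
    x = np.random.randn(3)
    W1, b1 = np.random.randn(4, 3), np.random.randn(4)
    W2, b2 = np.random.randn(2, 4), np.random.randn(2)

    two_layers = W2 @ (W1 @ x + b1) + b2              # linear "activation" in between
    one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)        # a single equivalent linear layer

    print(np.allclose(two_layers, one_layer))         # True - the hidden layer added nothing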

I wanted to go deeper into the history of these activation functions, but I don't want to make this post any longer since I have hit the bull's eye already. Do you see the paradox? The original functions that started off machine learning are outdated and have become nearly useless inside modern neural networks. In the next post let me explain the history of various activation functions, and after that I would like to jump into another sequel to lament about the philosophical aspect of this "activation" and what the purpose of "activation" is in the grand scheme of things.

By the way, with my limited exposure to AI/ML, I am convinced that this subject is not for those who have a distaste for philosophy. Come on, if you don't see the connection between the most materialistic things and the abstract aspects of life, I am sorry, you are missing out on life.

So let's jump to philosophy after exploring history, OK? Stay tuned.

Wednesday, May 28, 2025

AI / ML ---> Amusing Paradox

"One of my PhD students from Stanford, many years after he'd already graduated from Stanford, once said to me that while he was studying at Stanford, he learned about bias and variance and felt like he got it, he understood it. But that subsequently, after many years of work experience in a few different companies, he realized that bias and variance is one of those concepts that takes a short time to learn, but takes a lifetime to master. Those were his exact words. Bias and variance is one of those very powerful ideas"  - Andrew N.G

Let me talk about one amusing concept in ML/AI learning today - the trade-off between bias and variance.

Essentially, building these machine learning / AI models consists of three steps: create a model, train it, and test/validate it before actual release. Out of the entire data set, a portion is always reserved for testing / validation - generally referred to as the dev set. That's the fundamental knowledge you will need to go through this post.
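
For those who like to see it in code, here is a minimal sketch of such a split using scikit-learn (the data here is completely made up):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.randn(1000, 5)                # 1000 hypothetical samples, 5 features
    y = np.random.randint(0, 2, size=1000)      # hypothetical binary labels

    # Reserve 20% of the data as the dev (validation) set; the rest is for training.
    X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=42)
    print(X_train.shape, X_dev.shape)           # (800, 5) (200, 5)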

During training we may face an issue of "under-fitting", or bias, which means that the model is not doing so great even on the training data. It could be due to overly simplistic assumptions, or to not considering the key features / input parameters. Understandably, when we train the model this comes to light very quickly, pointing to the need to refine the model and make it more accurate at the training stage itself.

Most of the time, a model that is quite successful during training may not produce equally good results during validation / testing. This is the other side of the coin - the variance problem, referred to as "over-fitting". It may be an isolated issue, or a consequence of too much bias-treatment during the training stage.
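
In practice the diagnosis boils down to comparing two numbers: the training error and the dev error. A rough, purely illustrative sketch (the function name and the thresholds are my own invention, not a standard recipe):

    def diagnose(train_error, dev_error, acceptable_gap=0.05):
        # High training error -> the model can't even fit what it has seen (bias).
        if train_error > acceptable_gap:
            print("High bias: under-fitting the training data.")
        # Large gap between dev and training error -> memorising, not generalising (variance).
        if dev_error - train_error > acceptable_gap:
            print("High variance: over-fitting; much worse on the dev set.")

    diagnose(train_error=0.01, dev_error=0.15)   # prints the high-variance message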

To give an analogy, a child pampered too much at home (the treatment we give a model for the "bias" problem) may end up not so well-behaved in public, isn't it? We may consider the data set reserved for testing / validation as the equivalent of going public (or of future performance), and that goes for a toss if we try to handle the bias problem too aggressively.

Historically, in the pre-neural-network days, this was always referred to as the "bias-variance trade-off" when we were using linear regression, simple decision trees and K-means style algorithms. For example, in the case of linear regression, if we add a lot of input features we can reduce bias substantially, but it will reflect adversely when we test the model with a new set of data, since the model gets too "attached" to the training data set. Similarly, when we use a higher-order polynomial in a regression, the bias problem can be effectively reduced, but it will show up very badly during testing (see the sketch below).
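
Here is a small sketch of that polynomial effect using scikit-learn (the data is synthetic and the degrees are arbitrary choices of mine; the point is only the pattern of the errors):

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    X = np.sort(rng.uniform(-3, 3, 60)).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)   # a noisy curve
    X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.3, random_state=0)

    for degree in (1, 4, 15):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_train, y_train)
        print(f"degree={degree:2d}",
              f"train MSE={mean_squared_error(y_train, model.predict(X_train)):.3f}",
              f"dev MSE={mean_squared_error(y_dev, model.predict(X_dev)):.3f}")
    # Typically degree 1 under-fits (high bias), while degree 15 fits the training
    # data almost perfectly but does much worse on the dev set (high variance).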

With the deep learning era, where we have complex, deeper neural networks coupled with large volumes of data, are we better off now? We still have the same issues that existed earlier, but the way we handle them has changed, with better tools in hand.

To begin with, it has been shown time and again that a deep neural network can handle the bias problem effectively, without making the variance issue worse, WHEN we use an appropriate amount of regularization. While building a neural network we don't get into the trade-off type of situation as much any more - for instance, the high-level thumb rule is "solve the bias problem with a bigger / more complex neural network" and thereafter "use more data to tackle the variance problem".

What is stated above is a bit simplistic, but we have many more tools in hand - hyper-parameter tuning, batch normalization, dropout, early stopping - so we are better equipped to handle the bias-variance issues. There is of course a cost associated with this luxury - a larger network means more cost and the need for better and larger IT infrastructure. Similarly, on the side of having more data, it may not always be possible to get large data sets for all kinds of situations.
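
To give a flavour of that toolbox, here is a rough Keras sketch (the layer sizes, dropout rates and regularization strength are arbitrary choices of mine, not a recommendation):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu",
                              kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.3),       # randomly silences 30% of units during training
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                                  restore_best_weights=True)
    # model.fit(X_train, y_train, validation_data=(X_dev, y_dev),
    #           epochs=100, callbacks=[early_stop])   # assuming a split like the one above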

So, the moral of today's blog is that we get out of old challenges only to get caught in newer issues. :-)  That is how human evolution happened too, right??

Just musing..... 

 Suren

Sunday, May 25, 2025

AI a.k.a Agandam

A year back, just to cope with official pressure (?) on up-skilling, I took up the Azure Fundamentals course from Microsoft and, as a natural sequel, I also took up Azure AI Fundamentals. Amongst the array of courses offered by MS, why did I choose AI?

Mainly because I felt AI alone could capture my imagination - the faculty that has kept me going until now, in spite of growing older year after year.

Yep... without having a clue about what AI is capable of, I picked a single criterion for getting into it - it sounded creative from my layman's view.

Well, as I look back at the past year, after the MS AI Fundamentals course I took up one course after another on Coursera (which has a tie-up with Stanford University), where I continually picked up conceptual knowledge over the past 7-8 months. Every single concept of AI/ML I learnt left me only with awe and admiration.

Today, as I look at the brief journey I have completed so far, I find myself a lot more humbled by my learning than feeling the "fullness" of what I learnt. Maybe because I had a very scientific approach to my learning? Lemme explain...

As I was learning the concepts of AI from the soft-spoken and pleasant-faced Andrew Ng on Coursera, I was also exploring specific nuances in more detail with the help of ChatGPT and other popular models, trying my hand at Python with a feverish aim to "understand" the code even if not write it myself, and taking assistance from an overseas nephew who is an AI engineer and had the patience to clear up my silly doubts. Above all, I had the guts to watch Andrej Karpathy's YouTube videos, which literally left my jaw hanging open. I did not understand everything he spoke about, but I was drawn helplessly towards his "Zero to Hero" series, which showed me the peak of what I am trying to attempt with my baby steps.

Maybe this kind of varied approach to learning made the entire experience so far quite a joy, instilling in me a craving to march further and farther... and when the time came to take some decisive steps towards my dreams, I was helplessly drawn back to blogging.

Yes, around 17-18 years back, when I first started blogging with a senior cousin, he rightfully titled it "Jolly Musings", and then I launched this very blog for myself by the end of 2008 to continue my tantrums about existence - so I titled it appropriately "AGANDAM", which in Tamil means vastness.

I was dishing out random thoughts, writing poems and film reviews, and giving my pearls of wisdom on the little things of life, but as I see those posts now, I can see it was rudderless. It contained a lot of energy and sparks of intelligence, but it was not going anywhere tangible. I sort of stopped writing three years back, but today I was reminded to restart my blogging - to share my delight in my AI learning, shout out my bliss moments and move towards the next plane of my "new" learning endeavour.

Here I am, writing this at 1.45 AM on Shivratri (a day before the new moon), which is meant to be the darkest night of the entire month. Yes... let the blog restart like a spark in this dark night and glow bigger and bigger with a sure-footed march ahead!!

I have re-christened the blog as "AI alias Agandam", which is just an extension of what I used to brag about with my erstwhile name "Agandam". Well, the scope is going to be narrowed down now - nay - rather, it is going to be more focussed.

Krishnaarpanam - As always