Sunday, June 15, 2025

AI / ML : Celebrating a Tiny Success

So, here I am, feeling blissful after completing an ML certificate course on Coursera, ready and eager to relax with a long drive in the evening (after all, it's a Sunday).

During the last week, I had invested some time studying the challenges in data quality, and I was brooding a bit over why Andrew Ng did not cover these things in the other ML / deep learning courses that I have been taking up quite systematically from the same academy.

For me, it was a pleasant surprise that some of these data quality issues were addressed in this course, which is relatively small compared to the other Coursera courses I had recently completed, yet most enriching.

First things first: this course is not about algorithms, Python and its omnipotent libraries like PyTorch / TensorFlow, or any other theoretical framework of AI/ML. It is about practical wisdom for implementing ML projects, and fittingly, the assessments at the end of both modules ask only questions on practical decisions.

I should say I was reluctant to start this course, even though I had finished my earlier course on deep learning / neural networks in the last week of April 2025 itself. I had been putting this module off with the thought that it might turn out to be superficial and uninteresting. On the contrary, the entire week turned out to be quite gripping; the course kept me hooked and provided a lot of practical tips and tested processes.

Should a person like me, having no domain knowledge, take up this course as the first one in AI/ML? My answer would be a clear NO.

I quite sincerely feel that I would not have understood this course as well as I did without the time invested in understanding AI / ML in a structured manner over the past several months. As the course title itself says, "Structuring" ML Projects, this course essentially helps us structure our overall understanding.

Well, do you like the cherry on top of the ice cream in a fruit salad?

I like it very much, and I liked this one too.....!!

Thursday, June 12, 2025

AI / ML --> How ChatGPT Works (Courtesy : Andrej Karpathy)


This is a summary of a YouTube video by none other than one of the AI legends of current times, Andrej Karpathy. This video has been a huge inspiration for me in recent times to go deeper and wider into AI/ML.

I felt like posting this today for two reasons. The first is the compassion in my heart for all those people who don't have the time and patience to go through the video themselves (yes, the YouTube video runs for 3 hours and 30 minutes). The second reason I will save for the end.

Well, I know a shorter version of this topic is available from Andrej himself too. I also know that these days LLMs can give a nicer summary than the one below. However, I am compelled to post this one, since I still exist and my craving to share is still intact.

*********************************************************************************

As an introduction, Andrej explains that this video is not about ChatGPT alone; it is also about LLMs in general. He also makes it quite clear in the opening statement that the video is meant for a general audience and that no technical knowledge is needed to understand it. So here is how it goes.

Stages of Development of an LLM like ChatGPT

Typically, there are 3 stages of development in any such LLM that acts as a platform for users to interact with directly.

Stage 1 : Pre-training (data preparation & neural training) - High-quality documents on diversified subjects are downloaded from the internet and pre-processed in a structured manner. For example, all URLs embedded in the original text are deleted, duplication filters are applied, and PII is detected and removed. The entire dataset is converted into unique symbols (tokenized) and eventually crunched by processing those loads of data.
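To get a feel for what tokenization means in practice, here is a minimal sketch using the open-source tiktoken library; this is purely illustrative, not the actual pipeline of any particular model:

```python
# A minimal sketch of tokenization using the tiktoken library (pip install tiktoken).
# Purely illustrative; the real pre-training pipeline is far more elaborate.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an encoding used by several OpenAI models

text = "Tokenization turns raw text into a sequence of integer symbols."
tokens = enc.encode(text)

print(tokens)               # a list of integers, one per token
print(enc.decode(tokens))   # decoding recovers the original text
print(len(text), "characters became", len(tokens), "tokens")
```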

Neural Training - In this stage, the network is trained to predict the next token in a sequence probabilistically. This is done repeatedly to get the best predictions of word sequences; each update happens on a batch of tokens, but it is repeated across the entire dataset in parallel to improve the model overall.
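For the technically curious, here is a toy sketch of that next-token training objective in PyTorch. Everything here (the model shape, the sizes, the random "corpus") is my own illustrative assumption; real pre-training uses transformer architectures at vastly larger scale:

```python
# Toy sketch of next-token prediction; sizes and names are illustrative only.
import torch
import torch.nn as nn

vocab_size, embed_dim, context = 1000, 64, 8
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),          # token ids -> vectors
    nn.Flatten(),                                 # concatenate the context window
    nn.Linear(context * embed_dim, vocab_size),   # logits over the vocabulary
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random "corpus": batches of context windows and the token that follows each window.
inputs = torch.randint(0, vocab_size, (32, context))
targets = torch.randint(0, vocab_size, (32,))

for step in range(100):                 # repeated over the whole corpus in reality
    logits = model(inputs)              # forward pass: scores for the next token
    loss = loss_fn(logits, targets)     # how wrong is the predicted distribution?
    optimizer.zero_grad()
    loss.backward()                     # backward pass
    optimizer.step()                    # nudge weights to improve the predictions
```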

What gets created at the end of neural training is referred to as the base model. It is still not ready for the public to interact with directly.

Stage 2 : Post-training - Turning the base model into an instruct (assistant) model needs post-training. We need the involvement of human beings for this: a pool of people creates a "dataset", essentially Q & A pairs, which is used to make the base model more intelligent and user friendly.

Relatively speaking, post-training takes much less time. The people involved in this process are called "human labellers", and they give the human touch that we all sense when we interact with ChatGPT. These people are normally well educated and experienced, and they also uphold ethical standards while developing responses to the hypothetical questions. We can understand that they cannot create every possible Q & A, but the datasets convey a "persona", and the model learns to generalize from them, thanks to the neural training already given to the base model.

Responses given by ChatGPT are just statistical imitations / simulations of the human labellers, not anything magical.

Stage 3 : Reinforcement Training - As in Stage 2, humans are involved here too. Why we need this stage can be understood with a textbook analogy. Any academic textbook offers different layers of learning. The main body of expository text is similar to the first stage of training ChatGPT. The worked illustrations and interactive questions within each chapter are like Stage 2 explained above. At the end of each chapter, we have questions WITHOUT worked solutions (perhaps the final answers are given in the last few pages). Learning by practising on such problems, with only the final answer to check against, is essentially what reinforcement training achieves.

For example, if we ask an LLM to tell a joke, its outputs are reviewed by humans, who rank them from best to worst. In parallel, a reward model (which is a separate neural network) is asked for its own scoring, say on a scale of 1-9. We compare the scores from both sources, and the reward model is updated based on the human ranking, so that humans need not be involved in every single joke-rating exercise. The reward model is thus nudged at the end of each iteration to move towards the human scores.
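A minimal sketch of that "nudging" idea, assuming a toy reward model and made-up human scores; this is just the gist of fitting a reward model to human judgements, not the actual RLHF recipe:

```python
# Toy sketch: nudging a reward model towards human scores (illustrative only).
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Pretend embeddings of candidate jokes, with human scores on a 1-9 scale.
joke_embeddings = torch.randn(5, 16)
human_scores = torch.tensor([[8.0], [3.0], [6.0], [2.0], [9.0]])

for step in range(200):
    predicted = reward_model(joke_embeddings)                # the model's own scores
    loss = nn.functional.mse_loss(predicted, human_scores)   # gap vs human judgement
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # each iteration nudges the reward model towards the humans
```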

In fact, as explained above, what is used here is RLHF (Reinforcement Learning from Human Feedback) rather than reinforcement learning in its strict, by-the-definition sense.

How do ChatGPT and other LLMs work?

When a user asks a specific question, the chatbot effectively draws on the patterns created by the human labellers, and even when no matching example exists, it is capable of imitating the training information and providing the best possible response. It goes for an internet search if needed. The responses provided by ChatGPT may look very personalized and comprehensive at the same time, but the reality is that it is just generating a series of tokens.

We can test this by asking the same question repeatedly: the chatbot will reword or modify the response each time without changing its core. It is so eloquent thanks to the tonnes of data it has not just access to but has actually been trained on!

To summarize, responses generated by ChatGPT and other LLMs are just statistical imitations / simulations of the human labellers, not anything magical.

Myths about ChatGPT and other LLMs

(1)  Hallucination

ChatGPT (or any other LLM) will rarely accept that it does not know. It draws on its training data and tries to give a response somehow. This effect is called "hallucination". We can see it at work by asking a question with the special instruction "Do not use any tools": now it cannot fall back on the internet or any other external source of data, and a well-trained model will admit its ignorance.

Mitigation strategy 1 : In fact, some models provide methods to enrich their knowledge by allowing the user to add new information.

Mitigation strategy 2 : In situations where we know the LLM does not know, we can provide contextual data along with our questions. It is smart enough to use it, and in fact it becomes more powerful with contextual information in hand.

(2) Knowledge of self 

"Who are you"  is a very dumb question to ask to ChatGPT or any other LLM since it is possible to add it in training data or it can be a hardcoded response in some models Afterall the model is just a “token tumbler” & has no memory / personality of its own ; 

Mitigation strategy : Use ChatGPT to learn more about the things you don't know. There is no point trying to outsmart it into revealing its source! (Well, there are a few tools, Perplexity for instance, which are essentially built on top of LLMs but go to the extent of also giving references for their responses; this feature is provided to gain more credibility with users.)

(3) Questions asking for arithmetic calculations

We need to remember that an LLM operates on just a one-dimensional sequence of tokens, and any calculation is done over that stream of tokens. For an arithmetic problem, it is natural for the LLM to build up the response step by step, one token after the other. If we insist on getting the calculated value first and the detailed steps afterwards, it becomes quite a complex thing for ChatGPT, considering the limited amount of computation available per token at each interaction with the user. So it is better to allow step-wise responses for all arithmetic calculations.

Mitigation Strategy : A better way to ask ChatGPT is to state the arithmetic word problem and then type "use code". The result will be more accurate and reliable, since it uses Python arithmetic instead of the mental arithmetic of the language model ("the model needs tokens to think"). By using "code", it hands the work to a tool where the program is executed, and it just brings the result back to the interactive screen.

(4) As a subset of the earlier point, ChatGPT is not good at counting. Ask "how many dots are in the below ........" and don't be surprised if the answer is wrong. It tries to count dots that may have been split across different tokens. When we say "use code", it counts them exactly using Python.
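You can see the root cause yourself with a tokenizer. The sketch below (using tiktoken, purely for illustration) shows how a run of dots collapses into a few opaque tokens, while plain Python counts exactly:

```python
# Why counting is hard for an LLM but trivial for code (tiktoken for illustration).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
dots = "." * 37

print(len(enc.encode(dots)))   # the dots collapse into only a handful of tokens
print(dots.count("."))         # Python counts exactly: 37
```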

Well, Andrej gives an example: it was quite a popular joke that ChatGPT could not correctly count the number of 'r's in the word "strawberry" until recently. He remarks that it no longer makes that mistake; it was perhaps fixed by the ChatGPT team.

To summarize, ChatGPT and other LLMs can be effectively used if we understand them better and use them wisely.

*********************************************************************************

So that was a quick summary of Andrej's video, and I hope you don't miss the details and references that he provides in the video itself.

Btw, the second reason for this post is that today is my birthday, and this post is just an expression of my bliss about today.

 

Regards // Suren

Sunday, June 8, 2025

AI / ML --> Musings on Data Quality (Part 2)

So, let's first look at a couple of unavoidable issues in data quality. Not every project will face these challenges, but where they exist, they need to be recognized and handled properly.

The first one in this category is "class imbalance". This typically happens in classification problems where there is a natural imbalance in the distribution of the expected outcomes. For example, consider a model being developed for cancer detection: the number of people diagnosed with cancer will naturally be far smaller than the number of those who do not have the disease. In this kind of situation, when we know that less than 2% of the total population can have a rare disease, we need to be doubly careful when the trained model shows a 98% (or even 99%) success rate, since a model that always predicts "no disease" would score just as well. A large weight has to be attached to wrong diagnoses so that the metrics behave in the expected manner, and a thorough review of both false positives and false negatives is needed. This kind of challenge is also quite common in multi-class classification models.
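As a concrete illustration of attaching a larger weight to the rare class, here is a minimal scikit-learn sketch on synthetic data; the dataset and numbers are made up, only the class_weight idea matters:

```python
# A minimal sketch of weighting the rare class with scikit-learn (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight='balanced' penalizes mistakes on the 2% minority class much more.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# Accuracy alone would look great even for a useless model; inspect per-class metrics.
print(classification_report(y_test, model.predict(X_test)))
```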

Next is the concept of "drift", which was referred to in the earlier post when "sampling bias" was explained. Particularly when the project life cycle is long, and when the model is used in a scenario subject to many updates and changes, keeping the drift factor in mind is very important for data scientists. Drift can be of various types: it can happen to the data (example: in a medical dataset, patients now come from a new region with different average body weights, but the diagnosis rules haven't changed), to the labels (example: credit card fraud increases post-COVID, so the % of fraud cases rises, but the fraud patterns are still detectable), or to the concept itself (example: a spam filter becomes outdated as spammers change tactics over time). Needless to say, the last one is the most dangerous type of drift.
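A simple, commonly used way to watch for data drift is to compare feature distributions between training time and live traffic. Here is a small sketch using a two-sample Kolmogorov-Smirnov test on synthetic data; the numbers and the 0.01 threshold are illustrative:

```python
# Sketch: detect data drift by comparing a feature's distribution at training time
# with its distribution in live data (two-sample Kolmogorov-Smirnov test).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=70.0, scale=10.0, size=2000)  # body weight at training time
live_feature = rng.normal(loc=78.0, scale=10.0, size=2000)   # patients from a new region

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"Drift suspected (KS statistic={stat:.3f}, p={p_value:.2e})")
```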

Now, there is one more possible slip for the data science team: ensuring the granularity of the data, in other words, the level of detail or resolution at which data is captured or processed. Wrong granularity occurs when data is too coarse (aggregated too much), too fine-grained (minor individual events or outliers taken into consideration), when input data and target labels are at mismatched levels, or when multiple tables or sources are joined wrongly. This is one quality challenge that falls squarely within the scope of the data science team; it is something they can completely avoid by careful planning and seamless execution during the data processing stages.

Now, let's come to the last data quality challenge, referred to as "low signal", or a low signal-to-noise ratio (SNR). This one hurts AI/ML performance silently. If the features of the dataset carry too little useful signal (predictive power) compared to random noise, the model may do one of three things: overfit the noise, fail to generalize, or produce meaningless predictions. It is interesting to note that the outcomes vary, so the earlier this is detected, the more effective the remediation; otherwise we proceed on a wrong diagnosis, doing all sorts of irrelevant things to mitigate the issue. It is not uncommon for the development team to be asked to fine-tune the model and chase the algorithms instead of doing the right things.

So what are the right things to do when it comes to low signal? Right at the feature selection stage, it is important to remove irrelevant and redundant features. Data scientists can also look at combining and transforming features to create better ones, or use embedding techniques to represent sparse features. Regularization is one technique adopted during model development to attack this issue (as it will silence the noisy features). There are also tools available, like autoencoders, which can extract latent structure and discard weak noise in the data.
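To make this concrete, here is a small sketch, on synthetic data, of screening features for predictive signal with mutual information and letting L1 regularization silence the noisy ones; all parameters are illustrative:

```python
# Sketch: screen features for signal, then let L1 regularization zero out the noise.
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

# 20 features, of which only 3 carry signal; the rest are pure noise.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)

scores = mutual_info_classif(X, y, random_state=0)
print("Per-feature signal:", scores.round(3))   # noise features score near zero

# The L1 penalty drives coefficients of low-signal features towards exactly zero.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("Features kept:", (model.coef_ != 0).sum())
```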

To summarize, data science is an exciting area of AI/ML. There is a popular proverb, "The proof of the pudding is in the eating", which I now realize is a half-truth. It should be preceded by another statement: "The guarantee of the pudding is in the making"!! The main advantage data science has over pudding-making is that we have a wide range of tools and remediation measures to mitigate the challenges of poor data quality even if we miss them earlier. The subject has evolved so much that it offers both preventive and remedial processes for each of these challenges. Relatively speaking, the proof of the pudding is more of an unavoidable result!!

Friday, June 6, 2025

AI / ML ---> Musings on Data Quality

This post is focused on data quality, which, as we saw in the previous post, is one of the critical components of an AI/ML model.

Even in its nascent stage, machine learning was considered an intersection of data science and software engineering. As machine learning matured and paved the way for artificial intelligence, the importance of data science has only increased. In fact, for LLMs and generative AI technologies, data science is a more crucial aspect than ever before.

With the super efficiency brought to building AI models these days, with deep neural networks and well-tested scripts and functions, it is quite an irony that data quality continues to be a challenge.

Let's look at the various aspects that impact data quality.

First and foremost, there are a few foundational challenges over which data scientists need to take enormous care before processing the data. A key challenge is to ensure that the data is fairly representative of the real-world scenario. Failing this is referred to as sampling bias: deliberately or inadvertently, the data may not reflect real-life data (the data the users will eventually feed the model in the production environment). We will deal with the concept of data drift separately (which can arise from the time lag between development and deployment), but the major causes here are skewness and sampling errors.

Another foundational challenge is missing or incomplete data. A thorough review of the dataset for the completeness of its various fields is vital for all types of supervised learning, which depend heavily on structured data. There are multiple techniques to handle this, from simple (for example, deleting the incomplete row, or filling values using domain-specific rules) to complex (imputation or model-agnostic handling), depending on the size of the data and the time available to the data scientists.
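A minimal sketch of both ends of that spectrum, assuming pandas and scikit-learn; the toy dataframe stands in for real data:

```python
# Sketch: simple handling (drop rows) vs complex handling (imputation).
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 33],
                   "income": [50_000, 62_000, np.nan, 48_000]})

dropped = df.dropna()                       # simple: delete incomplete rows
print(dropped)

imputer = SimpleImputer(strategy="median")  # complex: impute from the column's distribution
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(filled)
```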

Data duplication refers to the situation where the same data appears more than once in the dataset due to oversight. It could be a case of exact duplicates or near duplicates; sometimes the duplication is across the training set and the dev set. When we end up in this kind of scenario, we are misled about the model's performance, and the credibility of the metrics is at stake during the development process.
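Here is a small pandas sketch of catching both problems, exact duplicates within the training set and rows duplicated across the training and dev splits; the toy rows are my own:

```python
# Sketch: exact duplicates within a split, and leakage-style duplicates across splits.
import pandas as pd

train = pd.DataFrame({"text": ["good movie", "bad plot", "good movie"],
                      "label": [1, 0, 1]})
dev = pd.DataFrame({"text": ["bad plot", "great cast"], "label": [0, 1]})

train = train.drop_duplicates()                       # remove exact duplicates in train

overlap = pd.merge(train, dev, on=["text", "label"])  # rows present in BOTH splits
print("Rows duplicated across train and dev:")
print(overlap)
```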

All three of the above arise basically at the data gathering stage and may be due to human error. However, there are more challenges due to human error, which can be tricky and occur at various stages: data gathering, human labelling, and data processing. Let's look at them one by one in the remainder of this post.

For all kinds of supervised models, labels (the actual outcomes) are critical for evaluating the model's predicted outcomes. To get the best quality of labels, human labelling is normally employed, and even in the age of LLMs the importance of human labellers is not diminished. One mischievous kind of challenge is noisy labels, which is a broad topic by itself. Labels can be incorrect, inconsistent, or weak (when they are generated by rule-based heuristics). There can also be sensor or transcription errors. Without setting this right, starting off model training would be a crude joke.

Another subset of this is annotation inconsistency, which is quite common in classification problems. A particular ambiguous case may be decided one way by one human labeller while another labeller decides otherwise. When a team of people works on labelling, this issue is in a way unavoidable unless there are clear rules defining each data element. Secondary verification or group discussions can also help substantially bridge the ambiguities.

Like the data leakages we saw earlier, we can also have label leakage. This can be a serious situation, because it means that the labels (actual outcomes) have seeped into the input parameters, either through the design of the dataset or wilfully by some team member to show better performance on training data. Data scientists should take extra care to review all input parameters closely and ensure that no signal or influence from the output is present among the inputs.
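One crude but useful review is to check whether any input feature is suspiciously correlated with the label. Here is a sketch on synthetic data; the feature names and the 0.95 threshold are my own assumptions:

```python
# Sketch: sniff out label leakage; a feature almost perfectly correlated with the
# target deserves a close manual review.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
df = pd.DataFrame({
    "honest_feature": rng.normal(size=500),
    "leaky_feature": y + rng.normal(scale=0.01, size=500),  # target seeped into an input
    "label": y,
})

correlations = df.drop(columns="label").corrwith(df["label"]).abs()
print(correlations[correlations > 0.95])  # flags 'leaky_feature'
```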

There are two more categories of data quality challenges, which can be grouped as (a) "unavoidable" in some cases but still needing to be handled, and (b) the quality focus of the data science team, which I will cover in the next post. In particular, I am keen to give more space to one quality-focus challenge called "low signal", which was an eye-opener for me in understanding and appreciating the role of data scientists.

Tuesday, June 3, 2025

AI / ML - Neurons - Biological vs Artificial

  • Forward pass gives structure.

  • Activation functions give expressiveness.

  • Backward pass enables learning.

  • Optimization algorithms guide learning.

  • Data brings meaning. (The tiny sketch below ties all five together.)
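A tiny PyTorch sketch that ties the five bullets together; all names, sizes and data here are illustrative:

```python
# Forward pass, activation, backward pass, optimization and data in one loop.
import torch
import torch.nn as nn

model = nn.Sequential(          # forward pass gives structure
    nn.Linear(4, 8),
    nn.ReLU(),                  # activation functions give expressiveness
    nn.Linear(8, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # optimization guides learning

X = torch.randn(64, 4)          # data brings meaning (a random stand-in here)
y = X.sum(dim=1, keepdim=True)

for _ in range(200):
    loss = nn.functional.mse_loss(model(X), y)
    optimizer.zero_grad()
    loss.backward()             # backward pass enables learning
    optimizer.step()
print(f"final loss: {loss.item():.4f}")
```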

Life goes on with the priorities in our hands and whatever is urgent, doesn't it?

Sunday, June 1, 2025

AI / ML - History of Activations

In one of my favorite films, the Tamil movie Aezhaam Arivu (which means "the 7th sense"), which deals with the greatness of the 5th-century spiritual master Bodhi Dharma, there is a wonderful dialogue:

" We started losing our science when we started ignoring our history and heritage"

***********************************************************************************

In 1943, the "step" function was used in the earliest neural networks just to enable "fire" or "don't fire", in other words, "give output or keep quiet based on the computation made" (no deeper details in this post, please). You can appreciate that this is the most simplistic way of looking at things, but the researchers of those days were sincerely mimicking their limited understanding of the human brain, in which they had observed that only a few neurons are activated at any point of time.

Understandably, this milestone let the initial neural networks carry data forward through the network, what is referred to as "forward propagation". However, the step function lacked the ability to support the reverse calculation automatically, which is critical for optimizing the output (its derivative is zero everywhere it is defined). Can you believe that the AI experts used to work out derivatives (calculus) manually to handle backpropagation, since the "step" function did not enable it? It was cumbersome, but they had no choice.

During the 1970s-80s, a breakthrough was achieved by using the sigmoid function in basic neural networks (also referred to as shallow networks, which did not have many layers). Being smooth and differentiable, it helped optimization once the concept of gradient descent (a method to do the reverse calculation / backward propagation automatically) was developed.

However, since, as we know, the sigmoid function returns a value between 0 and 1 (with tiny gradients at its extremes), there were challenges of "vanishing gradients" during backpropagation, and the whole idea of optimization would get stuck. Around 1980, a better activation, the tanh function, started getting used. It gives an output in the range -1 to 1 (zero-centred), which substantially alleviated the earlier challenge. Still, for very large or very small input values, optimization continued to suffer.

After quite a while, around 2010, which also marked the advent of the "deep learning era" (supported by deep neural networks allowing multiple hidden layers, and later large language models), the ReLU activation function arrived (proud to say that one of the authors of the key paper on it was Vinod Nair, though he lives in Canada). This activation is very simple (output the input value or zero, whichever is higher), and its computational cost is minimal. It took away almost all the woes of optimization and was a huge shot in the arm for deep learning. Even today, though there are many other sophisticated activation functions available, when in doubt or unsure, the developer community goes for ReLU as a safe bet for hidden-layer activation. Is it the best one available today? It still has the issue of sometimes returning zeros (the "dying ReLU" problem).

During 2011-15, a very smart variation of ReLU, named "Leaky ReLU", was introduced. It returns either the input value or a small fraction of it (typically 0.01 times the input, for negative values), to address the issue of zero outputs: it no longer returns zero but a tiny value instead, keeping backpropagation going with non-zero values!

In parallel, the softmax function was introduced in deep neural networks out of the need to return a probabilistic output over a chosen set of values (typically in the output layer of a classifier). We should be clear that this was more out of need than any logical step in the progression described so far.

After 2015, we have had Swish, Mish, GELU and so on, which have made smoother activations possible for ultra-deep networks... and this is not an exhaustive list of activation functions.
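For reference, here is how these activations look written out in plain NumPy (GELU is shown in its common tanh approximation; all of this is standard textbook material, not tied to any specific framework):

```python
# The activations from this history, in plain NumPy.
import numpy as np

def step(x):    return np.where(x >= 0, 1.0, 0.0)   # 1943: fire / don't fire
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))     # smooth, output in (0, 1)
def tanh(x):    return np.tanh(x)                   # zero-centred, output in (-1, 1)
def relu(x):    return np.maximum(0.0, x)           # ~2010: cheap and effective
def leaky_relu(x, alpha=0.01):                      # no more dead zeros
    return np.where(x > 0, x, alpha * x)
def swish(x):   return x * sigmoid(x)               # post-2015 smooth activation
def gelu(x):                                        # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (step, sigmoid, tanh, relu, leaky_relu, swish, gelu):
    print(f"{fn.__name__:>10}: {np.round(fn(x), 3)}")
```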

Well, my constant companion ChatGPT gave me a nice mnemonic to remember this history, for people over 40 years of age (when, obviously, one's own neurons start dying quite rapidly):

*************************************************************************

 "Some Teachers Run Like Super Geniuses"

S = Step & Sigmoid

T = Tanh

R = ReLU

L = Leaky ReLU

S = SWISH

G = GELU

**************************************************************************

 Another Memory Anchor - 

First they 'stepped' (binary), then made it smooth (Sigmoid), then centered it (tanh), then said 'let's forget curves, just cut' (ReLU), fixed the 'dying neurons' (Leaky ReLU), and finally started 'smart, curvy activations' (SWISH, GELU).

Thursday, May 29, 2025

AI/ML ----> Paradox - "part 2" - Neural Network Activations

So, the moral of today's blog is that we get out of challenges only to get caught in newer issues. :-)  That is how human evolution happened too, right??

Was the postscript of the previous post copy-pasted here by mistake? When we deal with an aspect of truth so overwhelming, is it not natural to repeat, re-emphasize and reiterate?

Paradox and contradiction are in all aspects of our lives, so it is quite easy to understand a few nuances of AI/ML if we relate them to this grand truth. I am continuing the theme of the last post here, albeit with a different topic.

So let's deal with the "activation" function, which is a "basic" concept in neural networks.

This concept has been a familiar one since the machine learning era; researchers started experimenting with neural networks as early as 1943! Before going down memory lane, let me give some context so that switching gears will be smoother.

The essential difference between traditional machine learning models, like linear regression (which uses a linear equation) and logistic regression (which uses the sigmoid function), and a neural network is what are referred to as "hidden layers". A neural network uses hidden layers, while the traditional ML models were not sophisticated enough to use them. If there is just one hidden layer, positioned between the input layer and the output layer, we call it a "shallow" neural network. A hidden layer may contain one or many neurons (it would be a very rare situation to have just one neuron in a hidden layer, but it is technically feasible if practically required).

Essentially, what do such hidden layers do?

The hidden layers take an input value from the previous layer, process it, and pass the output on to the next layer. As simple as that.

Processing here essentially means two components: (1) a computing operation (referred to as calculating the Z value, a weighted sum plus a bias) and (2) transforming that value (with the help of an activation function).
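Spelled out in NumPy, one hidden layer really is just these two steps; the sizes and random weights below are illustrative:

```python
# What one hidden layer does: compute Z, then transform it.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([0.5, -1.2, 3.0])      # input from the previous layer (3 values)
W = np.random.randn(4, 3)           # 4 neurons in this hidden layer, 3 inputs each
b = np.zeros(4)

z = W @ x + b                       # (1) computing operation: the Z value
a = relu(z)                         # (2) transformation: the activation function
print(a)                            # output passed on to the next layer
```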

The funny thing about activation is that we have multiple choices based on the requirements: the purpose of the model, the volume and pattern of the data, the complexity of the design, and also aspects like budget and infrastructure availability. These days the default activation function is ReLU (an acronym for Rectified Linear Unit), but many other smart activations are available too.

Now let me zero in on the intent of this post: the paradox. When we say there are multiple options today, the paradox is that we don't use the sigmoid function or the linear function as the activation in hidden layers. Both of them are relevant only for output layers, and sigmoid was used in hidden layers only until ReLU was made public. (The linear function was never even attempted in hidden layers, since it won't "transform" the output; a stack of linear layers collapses into one linear function, so why create hidden layers at all?)

I wanted to go deeper into the history of these activation functions, but I don't want to make this post any longer since I have hit the bull's eye already. Do you see the paradox? The original functions that started off machine learning are outdated and have become useless within the hidden layers of modern neural networks. In the next post, let me explain the history of the various activation functions, and after that I would like to jump into another sequel to muse on the philosophical aspect of this "activation" and its purpose in the grand scheme of things.

By the way, with my limited exposure to AI/ML, I am convinced that this subject is not for those who have a distaste for philosophy. Come on, if we don't see the connection between the most materialistic things and the abstract aspects of life, I am sorry, we are missing life.

So let's jump to philosophy after exploring history, OK? Stay tuned.

Wednesday, May 28, 2025

AI / ML ---> Amusing Paradox

"One of my PhD students from Stanford, many years after he'd already graduated from Stanford, once said to me that while he was studying at Stanford, he learned about bias and variance and felt like he got it, he understood it. But that subsequently, after many years of work experience in a few different companies, he realized that bias and variance is one of those concepts that takes a short time to learn, but takes a lifetime to master. Those were his exact words. Bias and variance is one of those very powerful ideas"  - Andrew N.G

Let me talk about one amusing concept in ML/AI learning today: the trade-off between bias and variance.

Essentially, all these machine learning / AI models involve 3 steps: create a model, train it, and test/validate it before actual release. Out of the entire dataset, a portion is always reserved for testing / validation, generally referred to as the dev set. That's the fundamental knowledge you will need to go through this post.

During training, we may face an issue of "under-fitting", or high bias, which means the model is not doing well even on the training data. It could be due to overly simplistic assumptions, or to not considering the key features / input parameters. Understandably, this comes to light very quickly when we train the model, pointing to the need to refine the model and make it more accurate at the training stage itself.

Very often, a model quite successful during training may not produce results as good during validation / testing. This is the other side of the coin: the high-variance problem, referred to as "over-fitting". It may be an isolated issue, or a consequence of too much bias-treatment given during the training stage.

To give an analogy, a child pampered too much at home (over-treating the model for the "bias" problem) may end up not so well behaved in public, isn't it? We may consider the dataset reserved for testing / validation as the equivalent of going public (or future performance), and that goes for a toss if we try to handle the bias problem too aggressively.

Historically, in the pre-neural-network days, this was always referred to as the "bias-variance trade-off", when we were using linear regression, simple decision trees, and k-means algorithms. For example, in the case of linear regression, if we add a lot of input features we can reduce bias substantially, but we will find it reflecting adversely when we test the model on a new set of data, since the model gets too "attached" to the training dataset. Similarly, when we use a higher-order polynomial in regression, the bias problem can be effectively reduced, but it shows up very badly during testing.
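This is easy to demonstrate. The sketch below fits polynomials of degree 1, 3 and 15 to a small synthetic dataset; the degree-1 fit underfits (high bias) and the degree-15 fit memorizes the training points (high variance). All numbers are illustrative:

```python
# Sketch of the classic trade-off: low-degree underfits, high-degree overfits.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=40)
X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}: "
          f"train MSE={mean_squared_error(y_train, model.predict(X_train)):.3f}, "
          f"test MSE={mean_squared_error(y_test, model.predict(X_test)):.3f}")
```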

In the deep learning era, with complex, deeper neural networks coupled with large volumes of data, are we better off now? We still have the same issues as before, but the way we handle them has changed, with better tools in hand.

To begin with, it has been proved time and again that a deep neural network can handle the bias problem effectively without worsening the variance issue in any manner, WHEN we use an appropriate regularization factor. While building neural networks we don't get into the trade-off type of situation any more; the high-level rule of thumb is "solve the bias problem with a bigger neural network" and thereafter "use more data to tackle the variance problem".

What is stated above is a bit simplistic, but with more tools in hand (hyperparameter tuning, batch normalization, dropout, early stopping) we are better equipped to handle bias-variance issues. There is of course a cost to this luxury: a larger network means more expense and the need for better and larger IT infrastructure. Similarly, on the data side, it may not always be possible to get large datasets for every kind of situation.
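Here is what several of those tools look like together in one small Keras model; the layer sizes, rates and data are purely illustrative:

```python
# Sketch: batch normalization, dropout, L2 regularization and early stopping in Keras.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),      # randomly silence neurons to fight overfitting
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.randn(1000, 20)
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Stop as soon as validation loss stops improving, restoring the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop], verbose=0)
```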

So, the moral of today's blog is that we get out of challenges only to get caught in newer issues. :-)  That is how human evolution happened too, right??

Just musing..... 

 Suren

Sunday, May 25, 2025

AI a.k.a Agandam

A year back, just to cope with official pressure (?) on up-skilling, I took up the Azure Fundamentals course from Microsoft, and as a natural sequel, I also took up Azure AI Fundamentals. Amongst the array of courses offered by MS, why did I choose AI?

Mainly because I felt AI alone could capture my imagination, the faculty that has kept me going until now in spite of growing older year after year.

Yep... without having a clue about what AI is capable of, I picked a single criterion for choosing to get into it: it sounded creative from my layman's view.

Well, as I look back at the past year, after the MS AI Fundamentals course I was taking up one course after another on Coursera (the platform with a tie-up with Stanford University), where I continually picked up conceptual knowledge over the past 7-8 months. Every single AI/ML concept I learnt left me with nothing but awe and admiration.

Today, as I look at the brief journey I have completed so far, I find myself a lot more humbled by my learning than feeling the "fullness" of what I learnt. Maybe because I had a very scientific approach to my learning? Lemme explain...

As I was understanding the concepts of AI from the soft-spoken and pleasant-faced Andrew Ng, an expert teaching on Coursera, I was also exploring specific nuances in more detail with the help of ChatGPT and other popular models. I was trying my hand at Python with a feverish aim to at least "understand" the code, if not write it myself. I was taking assistance from an overseas nephew who is an AI engineer and who also had the patience to educate me on my silly doubts. Above all, I had the guts to watch Andrej Karpathy's YouTube videos, which literally left my jaw hanging open. I did not understand everything he spoke about, but I was drawn helplessly towards his "Zero to Hero" series, which showed me the peak of what I am trying to attempt with my baby steps.

Maybe this varied approach to learning has made the entire experience quite a joy so far, instilling in me a craving to march further and farther... and when the time came to take some decisive steps towards my dreams, I was helplessly drawn back to blogging.

Yes, around 17-18 years back, when I first started blogging with a senior cousin, he rightfully titled the blog "Jolly Musings". Then, by the end of 2008, I launched this very blog for myself to continue my tantrums about existence, so I titled it appropriately "AGANDAM", which in Tamil means vastness.

I was dishing out random thoughts, writing poems and film reviews, giving my pearls of wisdom on the little things of life, but as I see those posts now, I can see it was rudderless. It contained a lot of energy and sparks of intelligence, but it was not going anywhere tangible. Well, I sort of stopped writing 3 years back, but today I was reminded to restart my blogging: to share my delight in my AI learning, shout out my bliss moments, and move towards the next plane of my "new" learning endeavour.

Here I am, writing this at 1.45 AM on Shivratri (the day before the new moon), which is meant to be the darkest night of the entire month. Yes... let the blog restart like a spark in this dark night and glow bigger and bigger with a sure-footed march ahead!!

I have re-christened the blog "AI alias Agandam", which is just an extension of what I used to brag about with its erstwhile name, "Agandam". Well, the scope is going to be narrowed now; nay, rather, it is going to be more focussed.

Krishnaarpanam - As always