This is a summary of a YouTube video by none other than one of the AI legends of our times, Andrej Karpathy. This video has been a huge inspiration for me in recent times to go deeper and wider into AI/ML.
I felt like posting this today for two reasons. The first is the compassion in my heart for all those people who don't have the time or patience to go through the video themselves (yes, the YouTube video runs for 3 hours and 30 minutes). The second reason I will save for the end.
Well, I know a shorter version of this topic is available from Andrej himself too. I also know that these days LLMs can produce a nicer summary than the one below. However, I am compelled to post this one since I still exist and my craving to share is still intact.
*********************************************************************************
As an introduction, Andrej explains that this video is not about ChatGPT alone - it is about LLMs in general. He also makes it quite clear in the opening statement that the video is meant for a general audience and that no technical knowledge is needed to understand it. So here is how it goes.
Stages of Development of an LLM like ChatGPT
Typically there are 3 stages in the development of any such LLM that acts as a platform for users to interact with directly.
Stage 1 : Pre-training (data preparation & neural training) - High-quality documents on diverse subjects are downloaded from the internet and pre-processed in a structured manner. For example, URLs embedded in the original text are removed, duplicates are filtered out, and PII is detected and removed. The entire data set is then converted into sequences of unique symbols (tokenized) before the model crunches through these loads of data.
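To make "tokenized" a little more concrete, here is a minimal sketch using the open-source tiktoken library. The specific encoding is only an example; the actual tokenizer and vocabulary differ from model to model.

import tiktoken

# A minimal sketch of tokenization. "cl100k_base" is just an example encoding;
# each model family uses its own tokenizer and vocabulary.
enc = tiktoken.get_encoding("cl100k_base")

text = "Large language models read text as tokens, not characters."
tokens = enc.encode(text)

print(tokens)              # a list of integer token ids
print(len(tokens))         # usually far fewer tokens than characters
print(enc.decode(tokens))  # decoding reproduces the original text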
Neural Training - In this stage the network is trained to predict the probability of the next token in a sequence. This is done repeatedly to get the best possible sequences of words - each update happens on a window of tokens, but it is repeated across the entire data set in parallel to improve the model's predictions overall.
What gets created at the end of neural training is referred to as the base model. It is still not ready for the public to start interacting with directly.
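A toy sketch of what this next-token prediction step looks like in PyTorch. The "model" here is a stand-in (a real LLM is a deep transformer) and the data is random; it only illustrates the shape of the training loop.

import torch
import torch.nn.functional as F

vocab_size, context_len, batch_size = 1000, 8, 4

# Stand-in model: embedding + linear layer instead of a real transformer.
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Toy "dataset": random token ids. Real pre-training uses trillions of web tokens.
data = torch.randint(0, vocab_size, (batch_size, context_len + 1))
inputs, targets = data[:, :-1], data[:, 1:]   # predict the next token at each position

logits = model(inputs)                        # (batch, context, vocab) scores
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                               # nudge weights so the true next token becomes more likely
optimizer.step()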
Stage 2 : Post-training - Turning the base model into an instruct (assistant) model needs post-training. We need the involvement of human beings for this - a pool of people creates a "data set", which is essentially questions and ideal answers, used to train the base model to become more helpful and user friendly.
Relatively speaking, post-training takes much less time, and the people involved in this process are called "human labellers". They give the human touch that we all sense when we interact with ChatGPT. These people are normally well educated and experienced, and they also ensure ethical standards while writing the responses to the hypothetical questions. They obviously cannot create every possible question and answer, but the data sets carry a "persona", and the model can generalize from them thanks to the neural training already done on the base model.
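To make the idea of these human-written data sets concrete, a post-training example is essentially a small conversation written out by a labeller, which the base model is then trained to imitate. The format below is only an illustrative assumption; real datasets differ between labs.

# A hypothetical post-training (supervised fine-tuning) example.
# Human labellers write the ideal assistant response; the field names here
# are illustrative only, not any lab's actual schema.
sft_example = {
    "messages": [
        {"role": "system",    "content": "You are a helpful, honest assistant."},
        {"role": "user",      "content": "Why is the sky blue?"},
        {"role": "assistant", "content": "Sunlight is scattered by air molecules, "
                                         "and shorter (blue) wavelengths scatter the most, "
                                         "so the sky appears blue to our eyes."},
    ]
}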
Responses given by ChatGPT are just statistical imitations / simulations of the human labellers and nothing magical.
Stage 3 : Reinforcement Training - As in stage 2, humans are involved here too. Why we need this stage can be understood with the analogy of a textbook. There are different layers of learning in any academic textbook. The sheer volume of explanatory text is similar to the first stage of training of ChatGPT. The illustrations and worked examples within each chapter are like Stage 2 explained above. At the end of each chapter, we have questions WITHOUT answers (perhaps the final answers are given in the last few pages). That kind of learning - working out the answer yourself - is essentially what Reinforcement Training achieves.
For example, if we ask an LLM to tell a joke, the outputs it generates are reviewed by humans, who rank which joke is best. In parallel, a reward model (which is a separate neural network) is asked to score the same jokes, say on a scale of 1-9. We compare the scores from both sources, and the reward model is updated based on the human ranking - so that humans need not be involved in every single rating exercise. The reward model is thus nudged at the end of each iteration and moves towards the human scores.
In fact, as explained above, for this kind of task we use RLHF (Reinforcement Learning from Human Feedback) rather than plain Reinforcement Learning in its strict definition.
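A very rough sketch of that nudging step: the reward model scores each joke, the scores are compared against the human ranking, and the reward model's weights are adjusted so its ordering moves towards the human one. Everything below is a toy illustration, not actual RLHF training code.

import torch
import torch.nn.functional as F

# Pretend features of 5 candidate jokes (in reality these come from the LLM).
joke_features = torch.randn(5, 16)
human_ranking = [2, 0, 4, 1, 3]               # index of the best joke first, as judged by humans

reward_model = torch.nn.Linear(16, 1)          # stand-in for the separate reward network
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-3)

scores = reward_model(joke_features).squeeze(-1)

# Pairwise loss: for every adjacent pair in the ranking,
# the human-preferred joke should get the higher score.
loss = torch.tensor(0.0)
for better, worse in zip(human_ranking, human_ranking[1:]):
    loss = loss + F.softplus(scores[worse] - scores[better])

loss.backward()
optimizer.step()   # the reward model is "nudged" towards the human ordering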
How do ChatGPT and other LLMs work?
When a user asks a specific question, the chatbot effectively checks whether something similar exists in the data set created by the human labellers, and even if it does not, it is capable of imitating that training information and producing the best possible response. It goes for an internet search only if needed. The responses provided by ChatGPT may look very personalized and comprehensive at the same time, but the reality is that it is just generating a series of tokens.
We can test this by asking the same question repeatedly: the chatbot will reword or modify the response each time without changing its core content. It is so eloquent because of the tonnes of data it not only has access to but has also been trained on!
To summarize, responses generated by ChatGPT and other LLMs are just statistical imitations / simulations of the human labellers and nothing magical.
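The reason the wording changes on every retry is that the next token is sampled from a probability distribution rather than picked deterministically. A toy illustration of that sampling step (the candidate words and probabilities are made up):

import random

# Made-up distribution over candidate next words. Because the next token is
# *sampled*, repeated runs produce different wordings of the same answer.
next_token_probs = {
    "sure": 0.40,
    "certainly": 0.30,
    "absolutely": 0.20,
    "ok": 0.10,
}

for run in range(3):
    words, weights = zip(*next_token_probs.items())
    choice = random.choices(words, weights=weights, k=1)[0]
    print(f"run {run + 1}: starts with '{choice}' ...")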
Myths about ChatGPT and other LLMs
(1) Hallucination
ChatGPT (or any other LLM) will rarely accept that it does not know something. It draws on its training data and tries to give a response somehow. This effect is called "hallucination". We can probe it by asking a question with the special instruction "Do not use any tools" - now it cannot use the internet or any other source of data and is more likely to admit its ignorance.
Mitigation strategy 1 : Some of the models provide ways to enrich their knowledge by letting the user add new information on top of the training data.
Mitigation strategy 2 : In situations where we know the LLM does not know, we can provide contextual data along with our question. The model is smart enough to use it, and in fact becomes much more reliable when it has contextual information to work with.
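A hedged sketch of what "providing contextual data" can look like with the OpenAI Python client; the model name, the memo text, and the instructions are placeholders chosen for illustration.

# Sketch of mitigation strategy 2: paste the relevant context into the prompt
# so the model answers from the supplied text instead of guessing.
# Requires the openai package and an API key; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

context = "Internal memo: the Q3 release of project Foo is planned for 15 November."
question = "When is the Q3 release of project Foo planned?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer only from the context provided."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)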
(2) Knowledge of self
"Who are you" is a very dumb question to ask to ChatGPT or any other LLM since it is possible to add it in training data or it can be a hardcoded response in some models Afterall the model is just a “token tumbler” & has no memory / personality of its own ;
Mitigation strategy : Make use of Chat GPT to know more about things that you don't know and learn from it. No point trying to be smart to understand the source ! (Well there are few LLMs - Perplexity for instance - which essentially is built on Chat GPT but goes to the extend of giving references for its responses also . This is being provided to gain more credibility with users)
(3) Questions requiring arithmetic calculation
We need to remember that an LLM operates on just a one-dimensional sequence of tokens, and any calculation is done over that stream of tokens. For an arithmetic problem, it is natural for the LLM to work out the response step by step, one token after the other. If we insist on getting the calculated value first and the detailed steps afterwards, it becomes quite hard for ChatGPT, considering the limited amount of computation available per token at each interaction with the user. So it is better to accept step-wise responses for all arithmetic calculations.
Mitigation strategy : A better way to ask ChatGPT, after giving it an arithmetic word problem, is to type "use code". The result will be more accurate and reliable since it uses Python arithmetic instead of the mental arithmetic of the language model ("the model needs tokens to think"). With "use code", another part of the system executes the program and just brings the result back to the interactive screen.
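What "use code" amounts to is that the model writes a short Python snippet and the answer comes from actually executing it, rather than from the model predicting digits token by token. Roughly along these lines (a hypothetical word problem, not the real tool-use plumbing):

# Roughly what happens when we say "use code": instead of predicting the digits
# of the answer token by token, the model emits a snippet like this and the
# numeric result comes from actually running it.
price_per_unit = 13.75
units = 48
discount = 0.12

total = price_per_unit * units * (1 - discount)
print(f"Total after discount: {total:.2f}")   # Python does the arithmetic, not the language model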
(4) As a subset of the earlier point, ChatGPT is not good at counting. Ask "how many dots are in the below ……..", and don't be surprised if the answer is wrong. It is trying to count dots that may have been split across different tokens. When we say "use code", it will count them with a Python loop instead.
Andrej gives an example of a once-popular joke that ChatGPT could not correctly count the number of 'r's in the word "strawberry" until recently. He remarks that it no longer makes that mistake - perhaps it got fixed by the ChatGPT team.
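The same "use code" trick fixes counting, since Python sees individual characters rather than tokens. For example, counting the letter 'r' in "strawberry", or dots in a string:

# Counting characters with code side-steps the tokenization problem entirely:
# Python operates on individual characters, not tokens.
word = "strawberry"
r_count = sum(1 for ch in word if ch == "r")
print(r_count)            # 3

dots = "........"
print(dots.count("."))    # 8 - counting dots the same way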
To summarize, ChatGPT and other LLMs can be effectively used if we understand them better and use them wisely.
*********************************************************************************
So that was a quick summary of Andrej's video, and I hope you don't miss all the details and references that he provides in it.
By the way, the second reason for this post is that today is my birthday, and this post is just an expression of my bliss about today.
Regards // Suren