Difference between revisions of "Darwin2049/chatgpt4 version03"

From arguably.io

Revision as of 19:32, 3 August 2023

OPENAI.png

ChatGPT 4. The software company OpenAI recently released an improved version of ChatGPT 3 for public access. It is an easier-to-use version of the GPT-4 system. The result has been a massive upsurge in use worldwide. Within two months of its March 14th 2023 release it had achieved a user base of over 100 million users worldwide.

This was the fastest uptake of any software capability in history. Its rapid adoption into daily usage speaks very clearly to the fact that it has been recognized as a qualitative technological advance over what has gone before.

What is well recognized about ChatGPT 4 (CG4) is that it is an improved version of the earlier ChatGPT 3. User reports corroborate that it is significantly more capable. It has, however, shown certain unexpected irregularities, a notable one being characterized as "hallucinating" when producing responses.

In what follows we posit that new risks will become increasingly evident with the adoption of this new technology. Our observations suggest that we should expect to see risks that are systemic, malicious or theoretical. Combinations of these are possible as well.

By way of engaging in this topic we posit also that several basic factors obtain at all times; these are that:

  • That CG4 is inherently a dual-use technology should always be kept clearly in mind. The prospect that totally unforeseen and unexpected uses will emerge must be accepted as the new reality. Like any new capability it will be applied toward positive and benign outcomes, but we should not be surprised when adverse and negative uses emerge. Hopes of only positive outcomes are hopelessly unrealistic.
  • CG4 can at best be viewed as a cognitive prosthetic (CP). It possesses no sense of self, sentience, intentionality or emotion. Yet it has demonstrated itself to be a more powerful tool for expression or problem solving, possibly the next step forward above and beyond the invention of symbolic representation through writing.
  • CG4 has shown novel and unexpected emergent capabilities. These instances of emergent behavior typically were not foreseen by CG4's designers. Although the designers take note of this development, they do not support the position that sentience, intentionality, consciousness or self reference is present. The lights certainly are on, but the doors and windows are still open and there is still nobody home.
  • CG4 operates from within a von Neumann embodiment. As currently accessed, it has been developed and is hosted on classical computing equipment. Despite this fact it has shown a stunning ability to provide context-sensitive and meaningful responses to a broad range of queries. However, its fundamental modeling approach has shown itself to be intrinsically prone to actions that can intercept or subvert its intended agreeable functionality. To what extent efforts to solve this "alignment" problem will succeed remains to be seen.
  • CGX-Quantum will completely eclipse current incarnations of CG4. Quantum computing devices are already demonstrating their ability to solve problems in seconds or minutes that classical von Neumann computing machines would take decades, centuries or even millennia to solve.
  • Phase Shift. Classical physics describes how states of matter possess different properties depending upon their energy state or environment. Thus on the surface of the earth we can experience the gas of the atmosphere, the wetness of water and the solidity of an iceberg.

These all consist of the same elements: H2O. Yet properties found in one state, or phase, bear little or no resemblance to those in the subsequent state. We should expect to see an evolution comparable to what we witness when a gas condenses to a liquid state, and then when that liquid solidifies into an object that can be wielded in one's hand. When systems such as DeepMind or CG4 and its derivatives are re-embodied in a quantum computing environment, heretofore unimaginable capabilities will become the norm.

In what follows we pose several questions related to risk. Because of the inherent novelty of what a quantum computing environment might make possible, the following discussion limits itself to what is currently known.

Therefore, the risks that we express concern about include:

* interfacing/access: how will different groups interact with, respond to and be affected by it; might access modalities available to one group have positive or negative implications for other groups;

* political/competitive: how might different groups or actors gain or lose relative advantage; also, how might it be used as a tool of control;

* evolutionary/stratification: might new classifications of social categories emerge; were phenotypical bifurcations to emerge, how would they manifest themselves;

* epistemological/ethical relativism: how to reconcile ethical issues within a society, between societies; more specifically, might it provide solutions or results that are acceptable to the one group but unacceptable to the other group;

CG4 has demonstrated capabilities that represent a significant leap forward in overall capability and versatility beyond what has gone before. Any attempt to assess prospective risks suggests reviewing recent impressions again at a later date, as more reporting and insights come to light.

CG4 has already demonstrated that new and unforeseen risks are tangible; in some instances novel and unforeseen capabilities have been reported.

It is with this in mind that we attempt here to offer an initial profile or picture of the risks that we should expect to see with its broader use.

In order to address risks our plan is to describe or summarize:

* impressions: what has emerged so far; a few initial impressions are listed; we then list a few of the more recent impressions and warnings;

* theory of operation: we present a very condensed summary of how CG4 performs its actions;
* risks that are:
** systemic: these are inherent as a natural process of ongoing technological and sociological advance;
** malicious: who the known categories of actors are and how they might use this new capability;
** theoretical: possible new uses that might heretofore not have been possible;
* notes, references: we list a few notable portrayals of qualitative technological or scientific leaps;

Overview. Early on, some observers were favorable while voicing moderate caution; others were guarded, expressing anything from caution to serious fear. Less directly, there were voices that ranged from caution to alarm about what lies ahead. Some early sentiment was expressed by:

* Dr. Jordan Peterson. During a live discussion before an audience Dr. Peterson made reference to his experience of CG4. His remarks indicated surprise and shock at its sophistication. He expected that it would possibly be very revolutionary.

* Dr. Alan Thompson. Thompson regularly produces video summaries of recent events in the area of artificial intelligence. He frequently demonstrates the capabilities of CG4 via an avatar that he has named "Leta".

* Brian Roemmele. Brian Roemmele has been credited with the concept of what are called super prompts. These are ways of communicating with CG4 using significantly more effective ways of retrieving relevant results.

* Dr. deGrasse Tyson. In a recent episode of the Valuetainment YouTube channel, Patrick Bet-David discussed CG4 with Dr. deGrasse Tyson. One of the salient observations brought forward was that the state of artificial intelligence at this moment, as embodied in the CG4 system, is not something to be feared. Rather, his stance was that it has not thus far demonstrated the ability to reason about aspects of reality in ways that no human has yet done.

He cites a hypothetical example of himself whereby he takes a vacation. While on the vacation he experiences an engaging meeting with someone who inspires him. The result is that he comes up with totally new insights and ideas. He then posits that CG4 will have been frozen in its ability to offer insights comparable to those that he was able to offer prior to his vacation. So the CG4 capability would have to "play catch up". By extension it will always be behind what humans are capable of.

Caution and concern were expressed by:

* Dr. B Weinstein.

* Dr. J. Bach.

* Dr. Karp (CEO - Palantir).

Caution to alarm has been expressed by:

* E. Yudkowsky.

* E. Musk. Mr. Musk recently enlisted 1100 other prominent personalities and researchers to sign a petition to be presented to the US Congress regarding the risks and dangers of artificial intelligence.

Indirect alarm has been expressed by:

* R. Gertz.

* Dr. G. Allison.

* Amb. (ret.) M. Pillsbury.

* B. Gen (ret.) R. Spalding III.

* Dr. A. Chan & Hr. Ridley.

* Kai-Fu Lee.

* E. Li.


By way of summarization some observers say that CG4:

  • is based upon and is a refinement of its predecessor, the ChatGPT 3.5 system;
  • has been developed using the generative pre-trained transformer (GPT) model;
  • has been trained on a very large data set including textual material that can be found on the internet;
  • according to unconfirmed rumors, has over 1 trillion parameters;
  • is capable of sustaining conversational interaction using text based input provided by a user;
  • can provide contextually relevant and consistent responses;
  • can link topics in a chronologically consistent manner and refer back to them in current prompt requests;
  • is a Large Language Model that uses prediction as the basis of its actions;
  • uses deep learning neural networks and very large training data sets;
  • uses a SaaS model, like Google Search, YouTube or Morningstar Financial;

Some Early Impressions

  • possesses no consciousness, sentience, intentionality, motivation or self reflectivity;
  • is a narrow artificial intelligence;
  • is available to a worldwide 24/7 audience;
  • can debug, write and correct code and provide explanatory documentation for it;
  • can explain its responses;
  • can write music and poems;
  • can translate English text to other languages;
  • can summarize convoluted documents or stories;
  • can score at the 90% level on the SAT, Bar and Medical Exams;
  • can provide answers to homework;
  • can critique and improve its own responses;
  • can provide explanations to difficult abstract questions;
  • can calibrate its response style to resemble known news presenters or narrators;
  • can provide convincingly accurate responses to Turing Test questions;

As awareness of how extensive CG4's capabilities are came to light, several common impressions were articulated. Some of those that seemed to resonate were favorable, but in many instances impressions were less so. Following are a few of those that can be seen leading many discussions of the system and its capabilities.

Favorable.

Convincingly human: has demonstrated performance that suggests that it can pass the Turing Test;

Possible AGI precursor: CG4 derivative such as a CG5 could exhibit artificial general intelligence (AGI) capability;

Emergent Capabilities: recent experiments with multi-agent systems demonstrate unexpected skills;

Language Skills: is capable of responding in one hundred languages;

Real World: is capable of reasoning about spatial relationships, performing mathematical reasoning;

Concerns.

knowledge gaps: inability to provide meaningful or intelligent responses on certain topics;

deception: might be capable of evading human control, replicating itself and devising an independent agenda to pursue;

intentionality: possibility of agenda actions being hazardous or inimical to human welfare;

economic disruption: places jobs at risk because it can now perform some tasks previously defined within a job description;

emergence: unforeseen, possibly latent capabilities;

“hallucinations”: solutions or answers not grounded in the real world;

Contemporaneous with the impressions that led many discussions were expressions of concern that this new capability brought inherent risks. In discussions of risk three main categories emerged that received much public attention. Some discussions focused on possible new ways that it could be used for malicious purposes. Other discussions focused on more theoretical risks, in other words, things that might become possible when using this tool.

As with any new technological development, there were necessarily other risks that did not fall into either category, i.e. deliberate malicious use or possible or imagined uses that could represent either a benefit or a risk to society or various elements of society.

These might be considered systemic risks: risks that arise innately as a result of the use or adoption of a new technology. A case in point might be the risk of traffic accidents when automobiles began to proliferate. Prior to their presence there was no systematized and government sanctioned method of traffic management and control.

One had to face the risk of dealing with what were often very chaotic traffic conditions. Only after unregulated traffic behavior became recognized did various civil authorities impose controls on how automobile operators could operate.

Going further, as private ownership of automobiles increased, vehicle identification and registration became common practice. Further still, automobile operators became obliged to demonstrate basic operating competence and to pass exams that verified it.

The impetus to regulate how a new technology is used recurs in most cases where that technology can be used positively or negatively. Operating an aircraft requires considerable academic and practical, hands-on training.

After completing the minimum training that the civil authorities demand, a prospective pilot can apply for a pilot's license. We can see the same thing in the case of operators of heavy equipment such as long haul trucks, road repair vehicles and comparable specialized equipment.

Anyone familiar with recent events in both the US and in various European countries will be aware that private vehicles have been used with malicious intent resulting in severe injury and death to innocent bystanders or pedestrians. We further recognize the fact that even though powered vehicles such as cars or trucks require licensing and usage restrictions they have still been repurposed to be used as weapons.

Recent Reactions. Since the most recent artificial intelligence systems swept over the public consciousness, sentiment has begun to crystallize. There have been three primary paths that sentiment has taken. These include: the voice of caution, the voice of action and the voice of preemption.

Columbus00.jpg

The Voice of Caution. Elon Musk and eleven hundred knowledgeable industry observers or participants signed a petition to the US Congress urging caution and regulation of the rapidly advancing areas of artificial intelligence. They voiced concern that the risks were very high for the creation and dissemination of false or otherwise misleading information that can incite various elements of society to action.

They also expressed concern that the rapid pace of the adoption of artificial intelligence tools can very quickly lead to job losses across a broad cross section of currently employed individuals. This topic is showing itself to be dynamic and fast changing. Therefore it merits regular review for the most recent insights and developments.

{20230718: SOMEHOW THE WORDING IS NOT QUITE EXPRESSING THE RISKS OF DOING NOTHING WHEN FACED WITH A RELENTLESS ADVERSARY SUCH AS THE CCP. MAKE THIS PART MORE FOCUSED IN SAYING THAT GOING UP AGAINST THE CCP AFTER HAVING STOPPED AI PROGRESS WOULD BE LIKE THE SANTA MARIA GOING UP AGAINST THE GERALD FORD AIRCRAFT CARRIER...

MOREOVER WORSE THAT THE CCP WILL COME AT THE WEST LIKE KING KONG GOING AFTER AN ANT HILL... )


The Voice of Action. In recent weeks and months there have been sources signaling that the use of this new technology will make a substantial beneficial impact on their activities and outcomes. Therefore they wish to see it advance as quickly as possible. Not doing so would place the advances made in the West and the US at risk.

This could mean foreclosing on having the most capable system possible to use when dealing with external threats such as those that can be posed by the CCP.

The Voice of Preemption. Those familiar with geopolitics and national security hold the position that any form of pause would be suicide for the US because known competitors and adversaries will race ahead to advance their artificial intelligence capabilities at warp speed, in the process obviating anything that the leading companies in the West might accomplish. They argue that any kind of pause cannot even be contemplated.

Ford00.jpg

Some voices go so far as to point out that deep learning systems require large data sets to be trained on. They point to the fact that the PRC has a population of 1.37 billion people. A recent report indicates that in 2023 there were 827 million WeChat users in the PRC.

Further, the PRC makes use of data collection systems such as WeChat and Tencent. In each case these are systems that capture the messaging information from hundreds of millions of PRC residents on any given day. They also capture a comparably large amount of financial and related information each and every day.

WeChat. According to OBERLO.COM the PRC boasts 827 million users of WeChat.

Tencent. According to recent reporting by WebTribunal, WeChat has over one billion users.

Surveillance. According to reporting from Comparitech, the PRC has over 600,000,000 surveillance cameras within a country of 1.37 billion people.

It is well known that, as a result, the People's Republic of China (PRC) has an almost bottomless sea of data with which to work when it wishes to train its deep learning systems. Furthermore it has no legislation that focuses on privacy.

The relevant agencies charged with developing these artificial intelligence capabilities benefit from a totally unrestricted volume of current training data.

This positions the government of the PRC to chart a pathway forward in its efforts to develop the most advanced and sophisticated artificial intelligence systems on the planet. And in very short time frames.

If viewed from the national security perspective then it is clear that an adversary with the capability to advance the breadth of capability and depth of sophistication of an artificial intelligence tool such as ChatGPT4 or DeepMind will have an overarching advantage over all other powers.

This must be viewed in the context of the Western democracies, which in all cases are bound by public opinion and fundamental legal restrictions or roadblocks.


The Voice of Urgency. A studied scrutiny of the available reports suggests that there is not yet a very clear awareness of how quantum computing will be used in the area of artificial intelligence.

Simply put, very little attention seems to be focused on the advent of quantum computing and how it will impact artificial intelligence progress and capabilities.

Kingkong00.jpg

What has already been made very clear is that even with the very limited quantum computing capabilities currently available, these systems prove themselves to be orders of magnitude faster in solving difficult problems than even the most powerful classical supercomputer ensembles.

If we take a short step into the near term future then we might be obliged to attempt to assimilate and rationalize developments happening on a daily basis. Any or even all of which can have transformative implications.

The upshot is that as quantum computing becomes more prevalent the field of deep learning will take another forward "quantum leap", literally, which will put those who possess it at an incalculable advantage. It will be like trying to go after King Kong with a Spad S. firing BB pellets.

The central problem in these most recent developments arises because of the well recognized human inability to process changes that happen in a nonlinear fashion.

If change is introduced in a relatively linear fashion at a slow to moderate pace then most humans are able to adapt and accommodate the change. However, if change happens geometrically, as we see in the areas of deep learning, then it is much more difficult to adapt.


''''' DOWN TO RIGHT ABOUT HERE... THIS ALL SHOULD BE MOVED AHEAD OF THE THEORY OF OPERATION


}

CG4 – Theory of Operation: CG4 is a narrow artificial intelligence system. It is based upon what is known as a Generative Pre-trained Transformer. According to Wikipedia: Generative pre-trained transformers (GPT) are a type of Large Language Model (LLM) and a prominent framework for generative artificial intelligence.

The first GPT was introduced in 2018 by the American artificial intelligence (AI) organization OpenAI.

GPT models are artificial neural networks that are based on the transformer architecture, pretrained on large data sets of unlabeled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

Generative pre-trained language models are fundamentally prediction algorithms. They attempt to predict the next token or element of an input from the previous element or some prior elements. An illustrative video describes how the prediction process works. Google Search, for example, attempts to predict what a person is about to type.

Generative pre-trained language models attempt to do the same thing, but they require a very large corpus of language to work with in order to arrive at a high probability that they have made the right prediction.
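The prediction idea can be illustrated with a deliberately tiny sketch. It is not how CG4 works internally; it only counts which word most often follows another in a sample text, much as a search box suggests a likely continuation. The function names and the sample corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical toy corpus; a real model is pre-trained on vastly more text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

With so little text the predictor knows almost nothing, which is one way to see why a very large corpus is needed before predictions become reliable.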

Fundamentals. Starting with the basics here is a link to a video that explains how a neural network learns.

From 3Blue1Brown:

Neural Network Basics

Gradient Descent

Back Propagation, intuitively, what is going on?

Back Propagation Theory
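As a rough companion to the videos listed above, the following sketch shows gradient descent in its simplest form: a single weight is adjusted step by step to reduce the mean squared error on a handful of points. It is an illustration under simplifying assumptions (one weight, no hidden layers, hypothetical data and learning rate), not a description of how CG4 was trained.

```python
def fit_slope(xs, ys, learning_rate=0.01, steps=500):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of (1/n) * sum((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Move the weight a small step against the gradient.
        w -= learning_rate * grad
    return w

# Hypothetical data generated from y = 2x, so the fitted weight should approach 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = fit_slope(xs, ys)
print(round(w, 4))
```

Back propagation extends the same idea to networks with many layers by applying the chain rule to obtain the gradient for every weight.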

CG4 – What is it:

GooglePredict.jpg

Large Language Models attempt to predict the next token, or word fragment, from an input text. In part one the narrator describes how an input is transformed using a neural network to predict an output. In the case of language models the prediction process attempts to predict what should come next based upon the word or token that has just been processed. However, in order to generate accurate predictions, very large bodies of text are required to pre-train the model.

Part One. In this video the narrator describes how words are used to predict subsequent words in an input text.

Part Two. Here, the narrator expands on how the transformer network is constructed by combining the next word network with the attention network to create context vectors that use various weightings to attempt to arrive at a meaningful result.

Note: this is a more detailed explanation of how a transformer is constructed and details how each term in an input text is encoded using a context vector; the narrator then explains how the set of context vectors that the attention network associates with each word or token is passed to the next word prediction network to attempt to match the input with the closest matching output text.
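The weighting step described in the note can be sketched as scaled dot-product attention, the form used in the "Attention is all you need" transformer. The sketch below makes a simplifying assumption: the same small vectors serve as queries, keys and values, whereas a real transformer first applies learned projection matrices. The token vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a context vector: a weighted mix of the values."""
    dim = len(keys[0])
    contexts = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        context = [sum(w * v[j] for w, v in zip(weights, values))
                   for j in range(len(values[0]))]
        contexts.append(context)
    return contexts

# Hypothetical 2-dimensional vectors standing in for three tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts = attention(tokens, tokens, tokens)
for vector in contexts:
    print([round(x, 3) for x in vector])
```

Each output row is one context vector: a blend of all token vectors in which the tokens most similar to the current one receive the largest weights.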

Transformer.png

Generative pre-trained transformers are implemented using a deep learning neural network topology. This means that they have an input layer, a set of hidden layers and an output layer. With more hidden layers the capability of the deep learning system increases. Currently the number of hidden layers in CG4 is not known but is speculated to be very large. A generic example of how hidden layers are implemented can be seen as follows.
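A minimal sketch of the input layer, hidden layer and output layer arrangement is given here. The weights are hypothetical hand-picked numbers rather than trained values, and the ReLU activation is an assumption made for illustration; CG4's actual layers are far larger and are not publicly documented.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs per unit, plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Simple non-linearity: negative values become zero."""
    return [max(0.0, v) for v in values]

def forward(inputs, layers):
    """Pass the inputs through every layer; all but the last apply ReLU."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 1.0]], [0.1]),                    # output layer
]
output = forward([2.0, 1.0], layers)
print(output)
```

Adding more entries to `layers` adds more hidden layers, which is the sense in which a deep network is "deep".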

The Generative Pre-trained Transformer accepts some text as input. It then attempts to predict the next word based upon this input in order to generate an output. It has been trained on a massive corpus of text on which it bases its predictions. The basics of how tokenization is done can be found here.


Tokenization is the process of splitting the input text into words or word fragments (tokens) and mapping each one to an identifier together with its position in the input text. The training step enables a deep neural network to learn language structures and patterns. The neural network will then be fine tuned for improved performance. In the case of CG4 the size of the corpus of text that was used for training has not been revealed; the model itself is rumored to have over one trillion parameters.
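To make the idea of tokens and positions concrete, here is a toy word-level tokenizer. It is only a sketch: production systems such as the GPT family split text into sub-word fragments (for example with byte pair encoding) rather than whole words, and the vocabulary, function names and unknown-token value below are hypothetical.

```python
def build_vocab(corpus):
    """Assign each distinct word an integer id in order of first appearance."""
    vocab = {}
    for word in corpus.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab, unknown_id=-1):
    """Return (position, token id) pairs for each word in the text."""
    return [(position, vocab.get(word, unknown_id))
            for position, word in enumerate(text.lower().split())]

vocab = build_vocab("the cat sat on the mat")
pairs = tokenize("the mat sat", vocab)
print(vocab)   # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4}
print(pairs)   # [(0, 0), (1, 4), (2, 2)]
```

The network never sees the words themselves, only these ids and positions, which are then converted into vectors.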

Tokens00.png

They perform their magic by accepting text as input and assigning several parameters to each token that is created. A token can be a whole word or part of a word. One such parameter is the position of the word or word fragment. The Graphics in Five Minutes channel provides a very concise description of how words are converted to tokens and then how tokens are used to make predictions.

* Transformers (basics, BERT, GPT)[1] This is a lengthy and very detailed explanation of the BERT and GPT transformer models for those interested in specific details.

* Words and Tokens This video provides a general and basic explanation of how words or tokens are predicted using the large language model.

* Context Vectors, Prediction and Attention. In this video the narrator expands upon how words and tokens are mapped into input text positions and gives an excellent description of how words are assigned probabilities; based upon the probability of word frequency an expectation can be computed that predicts what the next word will be.
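The step from word frequencies to a predicted next word can be sketched as follows. The counts are hypothetical, and a real model derives its probabilities from a neural network rather than from a small table; the sketch only shows how frequencies become probabilities and how a next word can then be chosen.

```python
import random
from collections import Counter

def next_word_distribution(counts):
    """Convert raw follower counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def sample_next(distribution, rng):
    """Pick a next word at random, weighted by its probability."""
    words = list(distribution)
    weights = [distribution[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical counts of words seen after "the cat sat on the".
counts = Counter({"mat": 6, "sofa": 3, "roof": 1})
dist = next_word_distribution(counts)
most_likely = max(dist, key=dist.get)
rng = random.Random(0)  # seeded so repeated runs make the same choice
word = sample_next(dist, rng)
print(dist, most_likely, word)
```

Always taking `most_likely` gives repetitive text, while sampling allows less frequent words to appear occasionally; which strategy CG4 uses in any given setting is not stated in the sources above.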

DeepLearning.jpg

image source: IBM. Hidden Layers

ChatGPT 4 is a Large Language Model system. Informal assessments suggest that it has over one trillion parameters, but these suspicions have not been confirmed. If this speculation is true then CG4 will be the largest large language model to date.

According to Wikipedia: A Large Language Model (LLM - Wikipedia) is a Language Model consisting of a Neural Network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using Self-Supervised Learning or Semi-Supervised Learning. LLMs emerged around 2018 and perform well at a wide variety of tasks.

This has shifted the focus of Natural Language Processing research away from the previous paradigm of training specialized supervised models for specific tasks.
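To give a feel for what "parameters" means in the definition above, the sketch below counts the weights and biases of a small fully connected network and estimates the storage a model would need at a given number of bytes per parameter. The layer sizes and the 2-bytes-per-parameter figure are assumptions for illustration, and the one trillion figure is the unconfirmed rumor, not a known value.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def memory_terabytes(parameter_count, bytes_per_parameter=2):
    """Approximate storage for the parameters alone, in decimal terabytes."""
    return parameter_count * bytes_per_parameter / 1e12

# Hypothetical small network: 784 inputs, 128 hidden units, 10 outputs.
small = dense_parameter_count([784, 128, 10])
print(small)  # 101770

# The rumored (unconfirmed) one trillion parameters at 2 bytes each.
rumored = memory_terabytes(1_000_000_000_000)
print(rumored)  # 2.0
```

Even this crude estimate shows why models of that rumored size cannot run on a single ordinary computer.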

It uses what is known as the Transformer Model. The Turing site offers useful insight as well into how the transformer model constructs a response from an input. Because the topic is highly technical we leave it to the interested reader to examine the detailed processing steps.

The transformer model is a neural network that learns context and understanding as a result of sequential data analysis. The mechanics of how a transformer model works are beyond the technical scope of this summary but a good summary can be found here.

If we use the associated diagram as a reference model then we can see that when we migrate to a deep learning model with a large number of hidden layers the capability of the deep learning neural network escalates. If we examine closely the facial images at the bottom of the diagram then we can see that there are a number of faces.

Included in the diagram is a blow up of a selected feature from one of the faces. In this case it comes from the image of George Washington. If we are using a deep learning system with billions to hundreds of billions of parameters then we should expect the deep learning model to possess the most exquisite ability to perform extremely fine detail recognition tasks, which is in fact exactly what happens.

We can see in this diagram the main processing steps that take place in the transformer. The two main processing cycles include encoding processing and decoding processing. As this is a fairly technical discussion we will defer examination of the internal processing actions for a later iteration.


Transformer00.png

The following four references offer an overview of what basic steps are taken to train and fine tune a GPT system.

"Attention is all you need" Transformer model: processing

Training and Inferencing a Neural Network

Fine Tuning GPT

General Fine Tuning

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ RESUME RIGHT HERE $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

CG4 – What is it: CG4 is a narrow artificial intelligence system, it is based upon what is known as a Generative Pre-trained Transformer. According to Wikipedia: Generative pre-trained transformers (GPT) are a type of Large Language Model (LLM) and a prominent framework for generative artificial intelligence. The first GPT was introduced in 2018 by the American artificial intelligence (AI) organization OpenAI.

GPT models are artificial neural networks that are based on the transformer architecture, pretrained on large data sets of unlabeled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs. Generative Pre-Trained Language models are fundamentally prediction algorithms. They attempt to predict a next token or element from an input from the previous or some prior element. Illustrative video describing how the prediction process works. Google Search is attempting to predict what a person is about to type. Generative Pre-Trained language models are attempting to do the same thing. But they require a very large corpus of langue to work with in order to arrive at a high probability that they have made the right prediction.

Large Language Models are are attempting to predict the next token, or word fragment from an input text. In part one the narrator describes how an input is transformed using a neural network to predict an output. In the case of language models the prediction process is attempting to predict what should come next based upon the word or token that has just been processed. However in order to generate accurate predictions very large bodies of text are required to pre-train the model.

• Part One. In this video the narrator describes how words are used to predict subsequent words in an input text.

• Part Two. Here, the narrator expands on how the transformer network is constructed by combining the next word network with the attention network to create context vectors that use various weightings to attempt to arrive at a meaningful result. Note: this is a more detailed explanation of how a transformer is constructed and details how each term in an input text is encoded using a context vector; the narrator then explains how the attention network uses the set of context vectors associated with each word or token are passed to the next word prediction network to attempt to match the input with the closest matching output text. Generative pre-trained transformers are implemented using a deep learning neural network topology. This means that they have an input layer, a set of hidden layers and an output layer. With more hidden layers the ability of the deep learning system increases. Currently the number of hidden layers in CG4 is not known but speculated to be very large. A generic example of how hidden layers are implemented can be seen as follows.

The Generative Pre-trained Transformer accepts some text as input. It then attempts to predict the next word based upon this input in order to generate an output. It has been trained on a massive corpus of text, on which it bases its predictions. The basics of how tokenization is done can be found here.

Tokenization is the process of splitting the input text into words or word fragments (tokens) and mapping each one to an identifier, together with its position in the input text. The training step enables a deep neural network to learn language structures and patterns. The neural network is then fine-tuned for improved performance. In the case of CG4 neither the size of the training corpus nor the size of the model has been revealed; the model is rumored to have over one trillion parameters.
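Generating an output by repeatedly predicting the next word can be sketched as a simple loop. The table `NEXT_TOKEN_PROBS` below is a hypothetical stand-in for a trained model, and the sketch looks only at the last token, whereas a real transformer conditions on the whole input.

```python
# Hypothetical next-token table standing in for a trained model.
NEXT_TOKEN_PROBS = {
    "once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 1.0},
    "a": {"time": 0.7, "hill": 0.3},
    "time": {"<end>": 1.0},
}

def generate(prompt_tokens, max_new_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:
            break  # the toy table has no prediction for this token
        best = max(candidates, key=candidates.get)
        if best == "<end>":
            break  # the model signals that the text is complete
        tokens.append(best)
    return tokens

print(" ".join(generate(["once"])))  # once upon a time
```

Always taking the most probable token is only one strategy; production systems commonly sample from the probabilities instead, which is one reason the same prompt can yield different responses.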
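A minimal sketch of tokenization follows, assuming whitespace-separated words as tokens. Real systems such as the GPT family use subword schemes that can split one word into several fragments; the vocabulary, the `<unk>` placeholder and the function names here are illustrative assumptions.

```python
def build_vocabulary(corpus_words):
    """Assign each distinct token an integer ID, in order of first appearance."""
    vocabulary = {"<unk>": 0}  # ID 0 is reserved for tokens never seen before
    for word in corpus_words:
        vocabulary.setdefault(word, len(vocabulary))
    return vocabulary

def tokenize(text, vocabulary):
    """Return (position, token, token_id) for each whitespace-separated token."""
    return [(position, word, vocabulary.get(word, vocabulary["<unk>"]))
            for position, word in enumerate(text.lower().split())]

vocabulary = build_vocabulary("the cat sat on the mat".split())
for row in tokenize("The cat sat on the sofa", vocabulary):
    print(row)
# (0, 'the', 1), (1, 'cat', 2), (2, 'sat', 3), (3, 'on', 4), (4, 'the', 1), (5, 'sofa', 0)
```

The model never sees the characters themselves, only the IDs and positions; "sofa" maps to the unknown ID because it was absent from the toy vocabulary.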

They perform their magic by accepting text as input and assigning several parameters to each token that is created. A token can be a whole word or part of a word. One of these parameters is the position of the word or word fragment in the input. The Graphics in Five Minutes channel provides a very concise description of how words are converted to tokens and then how tokens are used to make predictions.

  • Transformers (basics, BERT, GPT)[1] This is a lengthy and very detailed explanation of the BERT and GPT transformer models for those interested in specific details.
  • Words and Tokens. This video provides a general and basic explanation of how words or tokens are predicted using the large language model.
  • Context Vectors, Prediction and Attention. In this video the narrator expands upon how words and tokens are mapped into input text positions. It is an excellent description of how words are assigned probabilities: based upon word frequency, an expectation can be computed that predicts what the next word will be.
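The context vectors and attention weightings discussed in these videos can be sketched with scaled dot-product attention, the weighting scheme used in transformer models. This is a simplified sketch under stated assumptions: the two-dimensional embeddings are made up, and the same vectors are reused as queries, keys and values, whereas a real transformer first applies learned projections to each.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    context_vectors = []
    for query in queries:
        # Score this token against every token, then normalize the scores.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        # Each context vector is a weighted average of all value vectors.
        context = [sum(w * value[i] for w, value in zip(weights, values))
                   for i in range(len(values[0]))]
        context_vectors.append(context)
    return context_vectors

# Three tokens with made-up 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts = attention(embeddings, embeddings, embeddings)
for vector in contexts:
    print([round(x, 3) for x in vector])
```

Each output row blends information from all three tokens, weighted by how similar they are to the token being processed; this is the sense in which a context vector carries context.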

image source: IBM. Hidden Layers

Chat GPT4 is a Large Language Model system. Informal assessments suggest that it has over one trillion parameters, but these suspicions have not been confirmed. If this speculation is true then CG4 would be among the largest large language models to date. According to Wikipedia: A Large Language Model (LLM - Wikipedia) is a Language Model consisting of a Neural Network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using Self-Supervised Learning or Semi-Supervised Learning. LLMs emerged around 2018 and perform well at a wide variety of tasks. This has shifted the focus of Natural Language Processing research away from the previous paradigm of training specialized supervised models for specific tasks.
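To make the term "parameters" concrete, the sketch below counts the weights and biases of a small fully connected network. It is a simplified illustration: the layer sizes are arbitrary, and a transformer has further kinds of parameters (for example in its attention and embedding components) that are not modeled here.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the two layers
        total += n_out         # one bias per unit in the receiving layer
    return total

# Input of 4 units, two hidden layers of 8 units, output of 3 units.
print(count_parameters([4, 8, 8, 3]))  # 139
```

Parameters are the numbers the network learns during training; they are distinct from the training data, which is measured in words or tokens.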

It uses what is known as the Transformer Model. The Turing site also offers useful insight into how the transformer model constructs a response from an input. Because the topic is highly technical we leave it to the interested reader to examine the detailed processing steps.

The transformer model is a neural network that learns context and meaning by analyzing sequential data. The mechanics of how a transformer model works are beyond the technical scope of this summary, but a good summary can be found here.

If we use the associated diagram as a reference model, we can see that as we move to a deep learning model with a large number of hidden layers, the ability of the deep learning neural network escalates. The bottom of the diagram shows a number of facial images, together with a blow-up of a selected feature from one of the faces; in this case it comes from the image of George Washington. With a deep learning system of billions to hundreds of billions of parameters, we should expect the model to possess an exquisite ability to discern extremely fine detail in recognition tasks, which is in fact exactly what happens.

We can see in this diagram the main processing steps that take place in the transformer. The two main processing cycles are encoding and decoding. As this is a fairly technical discussion we will defer examination of the internal processing actions for a later iteration.

The following four references offer an overview of the basic steps taken to train and fine-tune a GPT system.

"Attention is all you need" Transformer model: processing

Training and Inferencing a Neural Network

Fine Tuning GPT

General Fine Tuning

An Overview. If we step back for a moment and summarize what some observers have had to say about this new capability, then we might tentatively start by noting that it:
• is based upon and is a refinement of its predecessor, the Chat GPT 3.5 system;
• has been developed using the generative pre-trained transformer (GPT) model;
• has been trained on a very large data set including textual material that can be found on the internet; unconfirmed rumors suggest that it has 1 trillion parameters;
• is capable of sustaining conversational interaction using text based input provided by a user;
• can provide contextually relevant and consistent responses;
• can link topics in a chronologically consistent manner and refer back to them in current prompt requests;
• is a Large Language Model that uses prediction as the basis of its actions;
• uses deep learning neural networks and very large training data sets;
• uses a SaaS (software as a service) model, like Google Search, YouTube or Morningstar Financial.


{20230729: THIS ELEMENT REQUIRES A FEW MORE PASSES; AT MINIMUM THEY SHOULD INCLUDE:

DEMARCATE THE MAJOR COMPONENTS INTO SELF CONTAINED SECTIONS; SECTIONS SHOULD LOGICALLY FOLLOW, ONE AFTER THE OTHER; DO A SANITY CHECK ON EACH SECTION; THIS MEANS THAT EACH SECTION SHOULD SUMMARIZE WHAT ITS PURPOSE IN LIFE IS; IT SHOULD PICK UP FROM THE PREVIOUS SECTION AND "STATE ITS CASE"; VERIFY THAT THE EXAMPLES PROVIDED (WRITERS STRIKE, LUDDITES, HUMANS) SPEAK TO THE SPECIFIC ACTUAL PROBLEM AS PRESENTED; IN THE RISKS SECTION TIE THE EXAMPLES TOGETHER A BIT MORE CLOSELY - SPECIFICALLY THE PAPER PRESENTED BY THE LEADERS OF O.AI AND D.M... ADD A CRUCIAL RECENTLY IDENTIFIED RISK OF EMERGENCE AND ITS OPACITY; VERIFY THAT ALL IMAGES HAVE SOURCING INFORMATION; IN THE DISCUSSION AND SYNTHESIS SECTION CHARACTERIZE EACH IN TERMS OF POSITIVE/NEGATIVE IMPACT/RISK; ADD A CONCLUSION SECTION THAT SPOTLIGHTS THE IMPACT OF QUANTUM COMPUTING ON ALL OF THE ABOVE; ONCE THIS HAS BEEN TIGHTENED AND "CLOSED UP" CONNECT IT TO "QUESTIONS-PART-2.0": WHY DOES DISCOURSE STOP WITH INTERFACE QUESTIONS? WHAT ABOUT EVOLUTION, POLITICAL AND EPISTEMOLOGICAL?}