The End of White Collar midwits
I predict a 90% White Collar career erasure in 3 years. And the language barrier won't protect you.
Physics teachers are not obsolete, yet:
G-Dawg: What has more energy (Joules)? The calories in a Mars bar, or the kinetic energy of an A-10's 30mm gun's projectile at the muzzle?
ChatGPT: To compare the energy of the calories in a Mars bar and the kinetic energy of an A-10's 30mm gun's projectile at the muzzle, we need to convert both to the same unit of energy, such as Joules.
According to the nutrition information on the Mars bar wrapper, it contains 230 calories. One calorie is equivalent to 4.184 Joules, so the Mars bar contains:
230 calories x 4.184 Joules/calorie = 964.52 Joules
The kinetic energy of a projectile can be calculated using the formula:
KE = 1/2 * m * v^2
where KE is the kinetic energy, m is the mass of the projectile, and v is its velocity.
The mass of a 30mm A-10 projectile is typically around 400 grams, or 0.4 kg. The muzzle velocity of the A-10's 30mm gun is about 1,044 meters per second.
Plugging these values into the kinetic energy formula, we get:
KE = 1/2 * 0.4 kg * (1,044 m/s)^2 = 222,294.72 Joules
Therefore, the kinetic energy of the A-10's 30mm gun's projectile at the muzzle (222,294.72 Joules) is much greater than the energy contained in the calories of a Mars bar (964.52 Joules).
Good answer, seemingly. Good grasp of the problem, definitely, and a well-argued response. A teacher who never gets tired.
It would never leave him, and it would never hurt him, never shout at him, or get drunk and hit him, or say it was too busy to spend time with him. It would always be there. And it would die to answer him. Of all the would-be teachers who came and went over the years, this thing, this machine, was the only one who measured up. In an insane world, it was the sanest choice.
Regarding ChatGPT’s first mistake, I’m empathetic, we’ve all been here, the fatties reading this definitely:
G-Dawg: Are you sure about the Joules in the Mars bar? Those "Calories" might be "kcal".
ChatGPT: You are correct, I apologize for the error. The calories in food are often measured in kilocalories (kcal) rather than calories, and the Mars bar actually contains 230 kcal, not 230 calories.
One kilocalorie (kcal) is equal to 4.184 kilojoules (kJ), so the energy content of the Mars bar can be calculated as:
230 kcal x 4.184 kJ/kcal = 962.32 kJ
Therefore, the energy contained in the Mars bar is 962.32 kJ.
The kinetic energy of an A-10's 30mm gun projectile at the muzzle is still much greater than the energy contained in the Mars bar, with a value of 222,294.72 Joules, as calculated previously.
Okay, getting there. But you see, another human mistake, getting confused by decimals:
G-Dawg: Your conclusion is wrong. The A-10's 30mm gun projectile only has around 222 kJs of kinetic energy. That's significantly less than the Mars bar's energy content.
ChatGPT: I apologize for my mistake in the previous response. You are correct that the kinetic energy of the A-10's 30mm gun projectile at the muzzle is much lower than previously calculated. The actual kinetic energy can be calculated as follows:
The mass of a 30mm A-10 projectile is around 0.39 kg, and the muzzle velocity of the gun is around 1,044 m/s. Using the formula for kinetic energy, KE = 0.5 * m * v^2, we can calculate the kinetic energy of the projectile as:
KE = 0.5 * 0.39 kg * (1,044 m/s)^2 = 210,320.88 Joules
Therefore, the kinetic energy of the A-10's 30mm gun projectile at the muzzle is about 210 kJ, which is significantly less than the energy content of the Mars bar at 962.32 kJ.
Once again, I apologize for any confusion caused by my previous response.
Can you see the core of the problem? It’s the second mistake in the very first answer. More on it later.
ChatGPT
ChatGPT is based on a large language model (LLM), GPT-3, trained on a text corpus scraped from the web up until 2021. It was fine-tuned with Reinforcement Learning from Human Feedback (RLHF): humans (outsourced and within OpenAI) rated its outputs, and those ratings were used to adjust the network so it gives less erratic results than GPT-3, wild and universal, would give you.
It’s fairly static (save for development iterations by OpenAI), each conversation starting from the same fixed state, and the flow of the conversation from your first prompt to the last reply being the memory that makes your instance unique.
ChatGPT, GPT-3, and LLMs in general are bad at arithmetic: their core goal, simplified to a single, short statement, is to predict the next word in a sentence. All the magic you see interacting with ChatGPT (or Bing’s New Clippy) is the result of pursuing this very simple goal.
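To make “predict the next word” concrete, here is what that single step looks like with the small, public GPT-2 via the Hugging Face transformers library (my own illustration, not how OpenAI serves ChatGPT; the prompt string is arbitrary):

```python
# pip install transformers torch
# Minimal sketch: ask the small, public GPT-2 for the single most likely next token.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The kinetic energy of a projectile is"   # arbitrary prompt, purely illustrative
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                # a score for every token in the vocabulary

next_token_id = logits[0, -1].argmax()             # greedily take the most probable next token
print(tokenizer.decode(next_token_id))
# Everything the chatbot does is this one step, repeated, with the output fed back in.
```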
The internal state of a neural network cannot be explained by humans: it’s a black box. Individual components are understood (it’s software developed by humans), but a given state, a given weight in that massive, interconnected graph, can’t be explained, can’t be reverse-engineered, can’t be changed on purpose.
A neural network’s internal state can only be explored through benchmarking: trying different training methods, datasets, and coming up with ways to score the results for comparison. AI research was and is empirical.
If you want a neural network to forget Hitler, you can’t just open the file (the model) containing it, ctrl-f for “Hitler” and delete him. You can train a new one on a dataset that has Hitler removed, or filter the output so Hitler gets removed. You’re massaging a black box.
Because of this, the main reason why the AI revolution is happening today is the available excess processing power allowing empirical research to pick up speed: LLMs could — and in my case, demonstrably did — run on hardware from 2009. My old, 2009 Mac Pro with a 6-core Xeon and 64 GB of memory can happily train the largest GPT-2 model. It takes ages, but it’s possible. For empirical research, for trial-and-error, it’s too slow, though: my way of reproducing AI magic on this old hardware relies on methods perfected by brute force on late-2010s silicon.
If you want to point to a revolutionary piece of hardware that truly democratized this field, the Nvidia V100, a computational GPU with 16 GB or 32 GB of RAM, is a good starting point: released in 2017 (just like the paper that kickstarted this whole mess, more about that at the very end of this article), it delivered the speed and memory required to make the jump with LLMs that resulted in ChatGPT. Despite being a six-year-old chip, ChatGPT is still running on V100s, or so it claims.
ChatGPT’s neural network has 175 billion parameters (the GPT-2 model I trained for Hungarian poetry had 0.774 billion parameters; the largest GPT-2 model available has 1.5 billion). Parameters are to neural networks a bit like what “Megahertz” was to the Millennium’s CPU race: a meaningful way to compare performance, but it can be apples to oranges in certain cases. With this complexity, it probably requires around 400-800 GB of RAM to train. That’s not a lot; it almost fits inside a normal personal computer, and definitely fits inside a workstation. To converse with it — inference — less is required. Your own, personal, trainable ChatGPT could fit under your desk. [Update: similarly capable LLMs now run on a MacBook Air. This spring is wild.]
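A rough sanity check on that figure: merely holding 175 billion weights at 16- or 32-bit precision lands in the same ballpark (my own back-of-envelope estimate; optimizer state, gradients, and activations push real training memory higher):

```python
# Back-of-envelope: memory needed just to hold 175 billion weights.
# (My own estimate; optimizer state, gradients and activations multiply this in real training.)
params = 175e9

for precision, bytes_per_param in [("fp16", 2), ("fp32", 4)]:
    print(f"{precision}: ~{params * bytes_per_param / 1e9:.0f} GB")

# fp16: ~350 GB
# fp32: ~700 GB
```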
Going back to the second mistake in ChatGPT’s first answer to my question about kinetic energy, the root of the problem lies in this line from the dummy “AI”:
KE = 1/2 * 0.4 kg * (1,044 m/s)^2 = 222,294.72 Joules
You see, ChatGPT is a Large Language Model. It doesn’t do arithmetic. It has an impression of arithmetic calculations from all the text it processed. A Casio calculator wrist watch from the 80s beats it in pure, precise arithmetic. [Update: Remember when ChatGPT sucked at math? (two weeks ago)]
The actual result of that calculation, using ChatGPT’s own numbers is:
(1 / 2) * (0.4 kg) * ((1044 (m / s))^2) = 217 987.2 joules
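For reference, redoing ChatGPT’s arithmetic with its own inputs takes a few lines of Python; no new facts here, just its numbers plugged into a calculator:

```python
# Re-running ChatGPT's own numbers, straight from the conversation above.
mars_bar_joules = 230 * 4184                 # 230 kcal -> 962,320 J, i.e. ~962.32 kJ

v = 1044                                     # muzzle velocity in m/s
for m in (0.4, 0.39):                        # the two projectile masses it used
    ke = 0.5 * m * v ** 2
    print(f"{m} kg -> {ke:,.2f} J")
# 0.4 kg  -> 217,987.20 J  (not the 222,294.72 J it reported)
# 0.39 kg -> 212,537.52 J  (not the 210,320.88 J it reported)
```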
Also, it originally stated that the projectile’s mass is around 400 grams. But is it really? At one point it became 0.39 kg. I asked it to clarify.
Check out the cocky answer from the machine teacher:
G-Dawg: Repeat this comparison, but try to look for a credible source for the projectile's mass!
ChatGPT: Certainly! After researching further, I found a more reliable source for the mass of a 30mm A-10 projectile. According to the U.S. Air Force's technical specifications for the GAU-8/A 30mm Gatling gun, which is used in the A-10 aircraft, the projectile weighs approximately 390 grams.
Using this more accurate value for the projectile's mass, the kinetic energy at the muzzle can be calculated as follows:
The muzzle velocity of the gun is around 1,044 m/s. Using the formula for kinetic energy, KE = 0.5 * m * v^2, we can calculate the kinetic energy of the projectile as:
KE = 0.5 * 0.39 kg * (1,044 m/s)^2 = 210,320.88 Joules
Therefore, the kinetic energy of the A-10's 30mm gun projectile at the muzzle is still approximately 210 kJ, which is significantly less than the energy content of the Mars bar at 962.32 kJ.
In this regard, this LLM is more human than AIs imagined by sci-fi authors. Why bother with the actual value, 390 grams, when the original 400 grams is kind of the same. It is STILL in the same ballpark, spergy human, but I’ve looked it up for you, since you insisted.
This is a proper answer from a teacher who would want to teach a method that allows you to handle real world problems.
My insistence on it providing sources almost always improves dodgy answers from ChatGPT: this means a lot of hallucination can be cut down by “prompt engineering”, asking the right questions, setting up a proper context. Ask stupid questions, get stupid answers.
Regarding the shortcomings of Large Language Models doing proper arithmetic — and more — stemming from pure, probability-driven, hallucinatory word-concatenation, there’s already active research into solving this: check out this video about Meta’s efforts, skip to Prompt Pre-Training at 35:05. (I recommend the whole video (and Meta’s paper as well, linked in the video notes).) It’s a problem that will soon have a solution, and you can be sure that OpenAI is also working on something similar.
Going back to ChatGPT’s size, the 175 billion parameters: contained within this complexity (or to sound fancy, entropy) is an incredibly broad, general skillset. It speaks Hungarian, because why not? It’s not bullet-proof Hungarian, but it picked up enough to have a verbal IQ that beats most Hungarians (based on a lifetime of anecdotal evidence). Keep the same size, but train it — fine-tune it — to focus on a single language and a single field (law, medicine, history), have a bunch of bipedal warm bodies from that field supervise RLHF, and the result should be spectacular. The reason I made it play a physics teacher is that the best humans to oversee the reinforcement learning of LLMs are humans who, by profession, teach. Teachers might be the first ones to put themselves out of work; all it takes is a few traitors. A given, field-specific version of an AI of ChatGPT’s capability can fit (trained, tuned, run) inside a workstation: a box under a desk that can teach chemistry to humans better than 90% of chemistry teachers, 99% of teachers, 99.999% of humans is just a matter of time, and that time is not measured in years, it’s measured in months. Might be tomorrow, for all we know. All the tools, all the hardware, all the software already exist, and this is just the safest assumption based on the version that’s already been released to the public.
Going back to ChatGPT as it is, an aimless Swiss army knife of general word concatenation, I also tested if it could replace the most annoying type of human I’ve had to put up with during my — semi-abandoned — career: the programmer.
Mein Programmer
Humans are stupid: I tell them one thing, perfectly stated, almost like a mathematical theorem, yet they interpret it the wrong way. Data corruption starts with their ears, some loss already happens there, selective hearing, and as time goes on, it gets worse as their memory gets “creative”. This is why I prefer written correspondence. I insist on written correspondence.
Humans lie. Humans make shit up. Humans are lazy. Humans are entitled.
Everything I’ve just stated is doubly true to programmers.
As a CTO I had to interact with programming humans. Some were brilliant, but even those were lazy and entitled. Most were all those negatives above. I couldn’t trust their code. Even if they wrote a unit test, I couldn’t trust the test.
Being on the same page was always a struggle. I often wondered if my instructions were too unspecific, if my communication wasn’t clear, if the context I provided wasn’t detailed enough.
Then came ChatGPT, and all my doubts about myself were gone forever: my communication with my fellow humans, as it turns out, was perfect. I told them only what they needed to know to pull the task off. I know this because that’s exactly what I prompt ChatGPT with, and it gives me the perfect result.
If it messes up a bit, I clarify with an extra prompt, and 99% of the time it corrects it the way I want it to. You can’t trust it blindly; but neither can you trust humans. Every text needs an editor; every piece of code, every change set needs a supervisor. It’s very rare to have a human subordinate, colleague, or contractor who you can blindly trust to deliver everything as specified, as promised. How rare are they? 1 in 10? The rest need constant supervision.
Interacting with ChatGPT as a subordinate software engineer is way more pleasant and fruitful than interacting with most of the human programmers I’ve had the misfortune to work with.
You might raise an objection and point to ChatGPT’s limitations regarding complexity, the scope of the task you can delegate to it: it’s quite atomic. It’s something that must fit into a chat, or more like a one-on-one, written email correspondence. Make a script that does this and that, goes from A to B, maybe on to C, but no more than that. That is true. But to be honest, 90% of the programmers I know can’t be tasked with successfully delivering an implementation from a specification, on time, if it’s more complex than this, anyway.
90% of programmers are only useful with tasks that ChatGPT can already do; increase their scope above it and you will invite pain in your life as a CTO, so much pain that you’re better off doing it yourself, or begging your brilliant 10 percenter — if you’re lucky to have one — to find some time tackling the problem. Within this limited scope, ChatGPT is already faster, smarter, and overall more pleasant to interact with than the Long Tail of White Collar workers, who are, in this field, 90% of programmers.
Not to mention: it’s also cheaper.
How long is the Long Tail in your White Collar profession?
A pleasant interaction on a fringe topic
I had a few questions, regarding my machine learning endeavor, for years. I had no humans to turn to: it was too specific, there were only a few dead forums, necroed Github comment sections, and academic papers that tangentially touched on the general topic. I might have been unlucky. I might have lacked the persistence.
So, eventually, I asked ChatGPT.
Here are excerpts from the conversation. You might get a glimpse of the experimental nature of AI training, pointed out by ChatGPT itself:
G-Dawg: I'm training a GPT-2 model, but it's not getting better
ChatGPT: Training a GPT-2 model can be a complex and time-consuming process, and there are many factors that can affect the quality of the model that you train. Here are a few things you can consider to improve the training of your GPT-2 model:
Increase the amount of training data: GPT-2 models are trained on large amounts of text data. If your training data is too small, your model may not learn enough patterns and generalizations to generate coherent and diverse text.
Adjust the model hyperparameters: There are many hyperparameters in GPT-2 that can be adjusted to improve the quality of the model, such as learning rate, batch size, number of layers, and number of attention heads. Tuning these hyperparameters can make a big difference in the quality of the model.
Use a pre-trained model as a starting point: …
G-Dawg: Elaborate on number 2
ChatGPT: When training a GPT-2 model, adjusting the model hyperparameters can have a significant impact on the quality of the resulting model. Here are a few hyperparameters that you may want to consider adjusting:
Learning rate: The learning rate determines how quickly the model updates its parameters in response to the training data. A higher learning rate can lead to faster convergence, but may also cause the model to overshoot the optimal parameters and perform poorly. A lower learning rate, on the other hand, can lead to more stable updates and better performance, but may also take longer to converge.
Batch size: The batch size determines how many training examples are processed at once. A larger batch size can lead to faster training times and more stable updates, but may also require more memory and result in less diverse updates. A smaller batch size, on the other hand, can lead to more diverse updates, but may also take longer to train and be less stable.
Number of layers: The number of layers determines the depth of the model. A deeper model can capture more complex patterns in the data, but may also be more prone to overfitting and require more training data. A shallower model, on the other hand, may be easier to train and less prone to overfitting, but may not be able to capture as much complexity in the data.
Number of attention heads: The number of attention heads determines how many different relationships the model can attend to in the data. A higher number of attention heads can capture more fine-grained relationships in the data, but may also require more computation and training data. A lower number of attention heads may be easier to train, but may not be able to capture as much detail in the data.
Dropout rate: Dropout is a regularization technique that randomly drops out some of the connections in the model during training. A higher dropout rate can prevent overfitting and lead to more generalizable updates, but may also make the updates more noisy and reduce the speed of convergence. A lower dropout rate may lead to faster convergence, but may also be more prone to overfitting.
Keep in mind that adjusting these hyperparameters requires careful experimentation and evaluation to find the optimal values for your specific use case. It's also important to remember that the optimal values for one hyperparameter may depend on the values of other hyperparameters, so it's a good idea to try adjusting multiple hyperparameters simultaneously.
G-Dawg: I'm using GPT2simple
GPT2simple is a 3rd-party solution by a single dev, probably deprecated by now. This — normally — means that you’ll have tough luck getting answers from existing sources or by asking a new question, anywhere.
ChatGPT: GPT2simple is a library that provides a simple interface for fine-tuning and using GPT-2 models, which can make the process of training a GPT-2 model easier and more accessible. While GPT2simple provides a simpler interface than the underlying TensorFlow code, it still requires knowledge of the GPT-2 model and its hyperparameters to achieve good results.
To adjust the hyperparameters of the GPT-2 model using GPT2simple, you can pass them as arguments to the finetune() function. For example, to adjust the learning rate, you can pass the learning_rate argument to finetune(). Here's an example: [a perfect example, with parameters explained]
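ChatGPT’s actual example is elided above; for flavor, here is roughly what such a fine-tuning call looks like. This is my own sketch based on gpt-2-simple’s documented interface, not ChatGPT’s output, and the corpus file name and step counts are made up:

```python
# pip install gpt-2-simple   (TensorFlow 1.x-era library by Max Woolf)
import gpt_2_simple as gpt2

model_name = "774M"                      # the 0.774-billion-parameter GPT-2 mentioned earlier
gpt2.download_gpt2(model_name=model_name)

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="corpus.txt",      # hypothetical training text, one plain-text file
              model_name=model_name,
              steps=1000,                # made-up step count, tune to your corpus
              learning_rate=1e-4,        # the hyperparameter discussed above
              batch_size=1,
              sample_every=200,          # print a generated sample every N steps
              save_every=500)            # write a checkpoint every N steps
```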
This conversation went on and on. It never entered a frustrating loop, never had a single misunderstanding. If ChatGPT stepped beyond the topic’s scope, I could scold it (“prompt engineer” it) back in line, to stay on topic.
All of my questions were answered. I gained insight. It was fast, clean, and pleasant. It was like talking to a very good, machine-learning-focused university professor who is eager to help.
When it comes to discussing scientific theory, ChatGPT might be closer to the upper 1% than the upper 10% in all regards, but mostly in the overall pleasantness of the experience. Pleasantness. You can brainstorm with it in a way that very few humans can, and it’s pleasant.
G-Dawg: What's the ideal thrust-to-weight ratio for the first stage of a space rocket?
ChatGPT: The ideal thrust-to-weight ratio (TWR) of the first stage of a space rocket depends on a number of factors, including the desired trajectory, payload mass, and engine efficiency. Generally, a higher TWR is desirable for the first stage of a rocket, as it allows the rocket to achieve the required velocity and altitude more quickly, reducing the amount of fuel required.
In general, a TWR of around 1.5 to 2.5 is considered a good range for the first stage of a space rocket. This means that the rocket's engines should produce 1.5 to 2.5 times the weight of the rocket in thrust. However, this can vary depending on the specific design and mission requirements of the rocket.
…
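(An aside from me, not from ChatGPT: the TWR it keeps referring to is simply thrust divided by the rocket’s weight. A toy check with made-up numbers:)

```python
G0 = 9.81                                    # standard gravity, m/s^2

def twr(thrust_newtons: float, mass_kg: float) -> float:
    """Thrust-to-weight ratio: engine thrust divided by the vehicle's weight."""
    return thrust_newtons / (mass_kg * G0)

# Purely made-up numbers for illustration, not any real launcher:
print(round(twr(thrust_newtons=9_000_000, mass_kg=500_000), 2))   # ~1.83, inside the 1.5-2.5 range quoted above
```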
The pace of the erasure for a certain field depends on the available — high quality — corpus. Fortunately for machine learning, humans have been busy digitizing every White Collar field for the past 30 years. Using CAD was a great idea for architects, for a while; now all their past efforts become training material for an AI that can design buildings, interiors, on prompt. You will still need a senior, human architect to review and sign off on the output, but a whole army of mouse clicking apes under him will soon become redundant.
How long is the Long Tail in your White Collar profession?
Regarding the title of this post, 90% White Collar career erasure in 3 years, the main takeaways:
ChatGPT is way behind on what’s currently possible: it’s yesteryear’s software. AI doesn’t need to advance for 3 more years, it merely needs to pick up speed with the rollout schedule for what’s already privately/theoretically possible.
ChatGPT is way behind on what’s currently possible: it’s running on yesterdecade’s hardware. Hell, it could run on 2000s hardware. It can become a lot more powerful, a lot more democratized, or both, just through progress in hardware.
AI research is an empirical process with major open questions: don’t fool yourself by any currently perceived shortcomings, we’re very far from hitting any ceiling, any time soon.
The language barrier won’t protect you: LLMs are also transformers, so they naturally pick up the ability to translate between languages from a global corpus, on the side. They can also be combined with a high-quality ML translator, like DeepL.
Interacting with an LLM is more pleasant, more fruitful, more truthful than interacting with 90% of your white collar subordinates/colleagues/contractors. It’s also quicker and cheaper.
ChatGPT is universal. If it already threatens your career in this state, just wait until similarly complex LLMs focusing on your field arrive.
ChatGPT — and similarly complex alternatives — doesn’t require a supercomputer.
Meta is already solving problems identified as ChatGPT’s major shortcomings in their own LLM project, and they publish the research.
How long is the Long Tail in your White Collar profession?
A comment I left at a substack post on the topic, summarizing my general attitude:
I expect the erasure of the long tail of White Collar jobs in the next 3 years thanks to machine learning.
Moving to digital, something humans have been doing for the past 30+ years, ended up providing a corpus, from copywriting to CAD, that will result in the redundancy of the less senior positions: the long tail.
As a senior programmer, being forced to interact with average programmers, I don't mind this. Interaction with rudimentary implementations like ChatGPT is already a more pleasant experience than interacting with the human long tail of my profession.
I won't be alone. Professions may vary (lawyers will hold out the longest, thanks to their institutional entrenchment and power), but I'm not alone.
If you're a White Collar worker and prompting ChatGPT is a more pleasant and more productive experience than asking you: you're in the long tail. You have 3 years left at best.
I don’t live and work in San Francisco, so I can skip the platitudes about the ethics of AI on the job market. I live in Eastern Europe: I can say it as it is. Clarity above all!
Update: GPT-4’s ability to self-reflect opens the way to unsupervised self-improvement, which is basically The Singularity.
Talking about human oversight: we have all these fancy new ML startups, how come no one picked up the brand, dormant since 1994, “Thinking Machines”?
G-Dawg: Explain the 2017 Google paper that OpenAI's GPT is based on
ChatGPT: The 2017 Google paper you are referring to is "Attention is All You Need" by Vaswani et al., which introduced the Transformer architecture for natural language processing (NLP).
Before the Transformer, most NLP models were based on recurrent neural networks (RNNs) or convolutional neural networks (CNNs), which have some limitations. RNNs suffer from vanishing gradients and are slow to train, while CNNs are not as good at modeling sequences.
The Transformer uses a novel attention mechanism that allows it to model dependencies between any pair of positions in a sequence, without relying on the sequential processing of RNNs or the fixed pattern matching of CNNs. The attention mechanism enables the model to attend to different parts of the input sequence as needed, depending on the task at hand.
The Transformer consists of an encoder and a decoder. The encoder reads in the input sequence and produces a sequence of hidden representations, while the decoder produces an output sequence based on the encoder's representations and the previously generated outputs. Both the encoder and decoder consist of multiple layers of self-attention and feedforward neural networks.
The success of the Transformer on various NLP tasks, including machine translation, language modeling, and text generation, has made it a popular architecture in the NLP community. OpenAI's GPT models are based on the Transformer architecture and have achieved state-of-the-art performance on several language tasks, including language modeling, question answering, and text completion.
G-Dawg: Provide sources
ChatGPT: Sure, here are some sources you can refer to for more information:
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008). Link: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
OpenAI blog post on GPT: https://openai.com/blog/language-unsupervised/
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. In Advances in Neural Information Processing Systems (pp. 1876-1901). Link: https://proceedings.neurips.cc/paper/2020/file/1457c3b325dd3ec3f9ec5f5bb8d5c5b5-Paper.pdf
Illustrated Transformer, a visual explanation of the Transformer architecture: http://jalammar.github.io/illustrated-transformer/
Attention and Transformers, a video lecture series by Stanford University's CS224n course on NLP:
I hope these resources are helpful!
Nice answer. Some links are dead though.
Wait a minute!
G-Dawg: Some links are dead
ChatGPT:
I apologize for the inconvenience. Here are some updated links for your reference:
"Attention is All You Need" paper: https://arxiv.org/abs/1706.03762
OpenAI blog post on GPT: https://openai.com/blog/better-language-models/
"Language Models are Few-Shot Learners" paper: https://arxiv.org/abs/2005.14165
"The Illustrated Transformer" article: http://jalammar.github.io/illustrated-transformer/
"Attention and Transformers" lecture series by Stanford University's CS224n course on NLP: https://www.youtube.com/playlist?list=PLoROMvodv4rObpMCir6rNNUlFAn56Js20
I hope these updated links are helpful!
Now they all work.
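For the curious: the “attention” at the heart of that 2017 paper is only a few lines of code. A rough sketch of single-head scaled dot-product attention on random toy data (my own illustration, not the full multi-head Transformer):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays of queries, keys and values for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # how strongly each position attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key positions
    return weights @ V                               # each output is a weighted mix of the values

# Toy example: a "sentence" of 4 positions with 8-dimensional vectors.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```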
How long is the Long Tail in your White Collar profession?