OpenAI is an Artificial Intelligence (AI) research lab that was founded in 2015 by Elon Musk, Sam Altman, and others. In 2019, OpenAI received a US$1 billion investment from Microsoft.
GPT-3 is a cutting-edge AI model from OpenAI. It has 175 billion parameters, "10x more than any previous non-sparse language model," according to the OpenAI team, and was trained on roughly 300 billion tokens of text. This training set comes from the internet and includes news articles, Shakespeare, and more. GPT-3 is arguably the most advanced text generation system available, surpassing OpenAI's earlier GPT-2 and Google's open-source BERT.
GPT stands for Generative Pre-trained Transformer. In the simplest terms, GPT-3 takes a text prompt written in plain English, submitted via an API (a programmatic interface), and uses neural networks to return a text output. You can also prime it for specific tasks by including a few examples in the prompt as training data.
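To make "priming with a few examples" concrete, here is a minimal sketch of how a few-shot prompt can be assembled as a plain string before it is sent to the API. The task, labels, and example messages below are hypothetical illustrations, not actual production prompts:

```python
# Minimal sketch of few-shot "priming": labeled examples are concatenated
# ahead of the new input so the model infers the task from the pattern.
# The sentiment task and examples here are made up for illustration.
def build_few_shot_prompt(examples, query):
    """Build a prompt from (text, label) example pairs plus a new query."""
    lines = []
    for text, label in examples:
        lines.append(f"Message: {text}\nSentiment: {label}\n")
    # The prompt ends mid-pattern, so the model's completion fills in
    # the label for the new message.
    lines.append(f"Message: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("Sizing was perfect.", "positive"),
    ("Nearly every person had incorrect sizing.", "negative"),
]
prompt = build_few_shot_prompt(examples, "I love the fit!")
print(prompt)
```

The completion the model returns would then be the predicted label for the final message.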
Several startups and GitHub projects have used GPT-3 to build basic chatbots to see if they can pass the Turing test. However, even though the OpenAI API generates text that is grammatically correct, the text sometimes lacks common sense.
Chatdesk is one of the first companies to use the GPT-3 API for a real-world production use case, not just a playground scenario.
Chatdesk Trends is a free dashboard that automatically analyzes customer feedback across email, chat, social media, reviews, surveys, and more. We use Natural Language Processing (NLP) and deep learning algorithms to automatically tag messages in real time. We have a machine learning model and algorithm for each industry that we serve, including Fintech, Fashion, Beauty, Pets, Airlines, and more. The dashboard can be used by anyone in an organization, including Customer Experience, Marketing, Fulfillment, Product, etc. You can have unlimited users on your free dashboard, and no Python, data science, HTML, or SQL experience is needed on your team!
For Chatdesk Trends, we use the GPT-3 model to help companies better understand their customers by highlighting or summarizing insights from their customer feedback.
For example, one of our customers, The Black Tux, gets a lot of messages about sizing of clothing. Here’s how we use GPT-3 to summarize those messages:
First, all the messages are indexed in our semantic search engine, which means we can quickly search for messages that mention “sizing”, as well as related words like “sizes” or “size.”
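A semantic search engine ranks messages by meaning rather than exact keyword match, typically by embedding each message as a vector and comparing it to the query vector. The sketch below illustrates the ranking step with cosine similarity; the 3-dimensional vectors are toy values invented for illustration (a real system would produce them with a learned embedding model):

```python
import math

# Toy sketch of semantic search ranking. Real systems embed text with a
# trained model; the 3-d vectors below are made-up illustrative values.
def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical message index: message -> pretend embedding vector.
index = {
    "Do you have that in my size?":                [0.9, 0.1, 0.0],
    "The new size was supposed to arrive Tuesday": [0.7, 0.2, 0.3],
    "Shipping took two weeks":                     [0.1, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the query "sizing"

# Rank indexed messages by similarity to the query embedding; messages
# about sizing land above the one about shipping.
ranked = sorted(index, key=lambda m: cosine(index[m], query_vec), reverse=True)
print(ranked[0])
```

Because similarity is computed in embedding space, messages containing "sizes" or "size" rank close to a "sizing" query even without sharing the exact word.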
However, not all of these messages will be feedback that’s useful to include in our summary.
Here are some examples:
"Do you have that in my size?"
“The new size was supposed to arrive Tuesday”
We don’t want to feed these messages to GPT-3 because (1) they’re not really the feedback we are trying to summarize, and (2) there is a token limit to what you can summarize with the OpenAI API, so we have to be judicious about which messages we pick to be summarized.
As a result, we score messages on how likely they are to be “useful feedback.”
In this particular case, our scoring system ranks messages that are questions lower. Messages that are only about sizing are also scored higher than messages that are really about deliveries but happen to mention the word “size.” This can be done using a combination of text classification models and keyword searches, which naturally scores messages like the following higher:
"Sizing was perfect."
"Nearly every person had incorrect sizing in some shape or form."
"I’m concerned going one size up might not do it for comfort."
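The scoring idea above can be sketched as a simple heuristic. This is an illustrative toy, not Chatdesk's production system: in practice a trained text classifier would supply the "is this useful feedback" signal, and the keyword lists and weights here are invented for the example:

```python
# Hypothetical scoring sketch: combine keyword signals with a question
# penalty. The word lists and weights are made up for illustration; a
# real pipeline would also use a trained text classification model.
SIZING_WORDS = {"size", "sizes", "sizing", "fit"}
DELIVERY_WORDS = {"arrive", "arrived", "delivery", "shipping", "tuesday"}

def usefulness_score(message):
    """Score how likely a message is to be useful feedback to summarize."""
    words = {w.strip('.,?!"').lower() for w in message.split()}
    score = 0.0
    score += 2.0 * len(words & SIZING_WORDS)    # on-topic boost
    score -= 2.0 * len(words & DELIVERY_WORDS)  # really about delivery
    if message.rstrip().endswith("?"):
        score -= 3.0                            # questions rank lower
    return score

messages = [
    "Do you have that in my size?",
    "The new size was supposed to arrive Tuesday",
    "Sizing was perfect.",
]
ranked = sorted(messages, key=usefulness_score, reverse=True)
print(ranked[0])
```

With these toy weights, the genuine feedback ("Sizing was perfect.") outranks both the question and the delivery message, which is exactly the behavior described above.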
We then take the highest scoring messages and use OpenAI’s text completion in “davinci-instruct-beta” mode to generate a summary of those selected messages. So far we’ve found this “mode” or engine to work well for our task, but the best mode will depend on the exact task.
The model also has parameters you can tune, such as temperature and top_p, which control how “creative” the model gets, as well as the logit_bias parameter, which we use to ban certain words from appearing in the summary. In our case this is useful because text from customers often contains words like “I” or “he” or “she,” and we don’t want those to appear in summaries.
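Putting the pieces together, here is a hedged sketch of what such a summarization request could look like with the legacy (pre-1.0) openai Python client. The prompt wording, parameter values, and the token IDs passed to logit_bias are all placeholder assumptions for illustration; real logit_bias keys must be tokenizer token IDs, not words, and a bias of -100 effectively bans a token:

```python
# Sketch of a summarization request for the legacy `openai` Python
# client (pre-1.0 `Completion.create` interface). Prompt wording,
# parameter values, and token IDs are illustrative placeholders.
def build_request(messages, banned_token_ids):
    """Assemble keyword arguments for a completion-based summary call."""
    prompt = "Summarize the customer feedback below.\n\n"
    prompt += "\n".join(f"- {m}" for m in messages)
    prompt += "\n\nSummary:"
    return {
        "engine": "davinci-instruct-beta",
        "prompt": prompt,
        "max_tokens": 80,
        "temperature": 0.3,  # low temperature = less "creative" output
        "top_p": 1.0,
        # A bias of -100 effectively bans a token from the output.
        "logit_bias": {str(t): -100 for t in banned_token_ids},
    }

request = build_request(
    ["Sizing was perfect.", "Nearly every person had incorrect sizing."],
    banned_token_ids=[40, 339, 673],  # hypothetical IDs for "I", "he", "she"
)
# response = openai.Completion.create(**request)  # requires an API key
print(request["engine"])
```

Keeping the request construction separate from the API call makes it easy to inspect and test the prompt and parameters before spending tokens.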
Where the model really shines is in highlighting problems that customers report. For example, one of our customers, BarkBox, is a subscription company that mails out a box of dog toys and treats to their customers each month, and they want to know if any toys or treats are hazardous to their customers’ dogs.
Here’s an example where GPT-3 summarized some of the messages that mentioned “hazard”:
"Some customers think the stuffing and strings may become a choking hazard. The xo toy from the valentine's box has proven to be a safety hazard."
Here BarkBox can quickly get useful feedback without having to comb through all the customer messages that contained “hazard.”
Now, we shouldn’t mistake GPT-3 for general intelligence that actually understands text. Here is some other feedback from another customer of ours, Thinx, a period underwear brand:
"The boyshorts work much better for me for sleep."
"I wish the boyshort came in a high waist option."
"You definitely can still use a pad but it doesn’t improve the leak proof ness of it unless it’s a high coverage style like the boyshort."
Here’s the summary GPT-3 produced for this feedback:
"Customers seem to like the boyshorts for sleep. They also want a high-waisted option and they say it doesn't work well as a pad replacement unless it is a higher coverage style."
Although GPT-3 did admirably well, correctly noting that customers liked the boyshorts for sleep and wanted a high-waist option, it did not quite capture the meaning of "higher coverage style."
As generative language models improve, and as we learn more about what they can or can’t do, the possibilities for companies to better understand their customers will only grow. At Chatdesk, we’re excited to be at the forefront of that effort.