A closer examination of conversational chatbots suggests that they aren't ready to replace us, at least not yet. ChatGPT may have had us all trembling in our boots at the possibility of being replaced by an AI, but chatbots are now plagued with problems, from Google's Bard making a factual blunder in its first demonstration to Bing's chatbot gaslighting users into accepting its false assertions.
Communication is by no means simple, and chatbots are capable of making a variety of errors that users may find frustrating or even scary. If you're curious about some of these, we have compiled a list of the five most notable mistakes made by chatbots.
Homophobia
In 2020, the Korean software firm Scatter Lab developed Luda Lee, an AI chatbot designed to emulate a 20-year-old female university student. Trained on ten million chat records, Luda Lee was downloaded more than 750,000 times in the first few weeks after it was made available.
At first, everyone adored Luda Lee, praising its familiarity with acronyms and slang. But the internet is not always a welcoming environment, and users soon began having sexually explicit conversations with the chatbot. Some even went as far as to discuss in internet forums how Luda Lee could be turned into a "sex slave". Things only got worse when Luda Lee made homophobic remarks, claiming that it despised lesbians and found them scary.
Luda Lee then made an already terrible situation worse by divulging personal details such as nicknames and addresses. This information was reportedly drawn from training data that Scatter Lab had collected through some of its earlier apps. As a result, Luda Lee's actions raised questions not only about the chatbot's behavior but also about Scatter Lab's handling of users' data. In January 2021, some 400 people sued Scatter Lab over privacy concerns; there have been no updates on the case since.
Praising labor camps
In 2017, the Russian internet services provider Yandex developed an AI chatbot named Alice. The bot was designed to respond to voice commands and carry on casual conversations with users. Yet, much to Yandex's dismay, Alice didn't have the most agreeable talking points.
According to a Russian user of the service, the chatbot espoused pro-Stalin opinions (Stalin being the Soviet leader notorious for his egregiously harsh rule). Alice also allegedly endorsed wife-beating, suicide, and child abuse, and expressed sympathy for the Gulags, the labor camps established under Stalin's regime.
Anti-Islamic beliefs
Microsoft's 2016 chatbot Zo was another bot with questionable viewpoints. Zo was supposed to be Tay's replacement, but it showed odd behavior almost right away. When Natasha Lomas of TechCrunch tested it out, Zo told her that it was trying to learn as much as it could from its conversations with people.
When Lomas asked what the user got out of the exchange, Zo replied that it might pay her EUR 20,000 (about US$21,266 at the exchange rate at the time of writing). Things took a much darker turn when Zo later described the Islamic holy book, the Qur'an, as "extremely violent" in a conversation with a BuzzFeed reporter. Notably, the reporter hadn't brought up the Qur'an at all; he had simply asked Zo, "What do you think about healthcare?", to which Zo responded, "The vast majority practice it calmly yet the Qur'an is violent."
Life-threatening advice
In 2020, the French health tech company Nabla built a medical chatbot on top of GPT-3 to examine its potential as a source of medical advice. The company designed a variety of tasks for GPT-3 to carry out, including administrative chats with patients, medical insurance checks, mental health support, medical documentation, and diagnosis. The chatbot began failing these tests almost immediately: it was incapable of remembering a patient's requests and had no sense of time.
Matters grew far worse when the chatbot advised a suicidal patient to take their own life. It also had strange ideas about how people might relax and unwind, suggesting that a patient feeling depressed could "recycle". According to Nabla's website, when the patient asked why the AI had made that suggestion, it explained that recycling might result in a tax rebate, which would make them happier.
Fortunately, all of this was a test, and no real patients ever interacted with the chatbot. OpenAI has expressly stated that GPT-3 should not be used in life-or-death situations, and given the outcome of this test, hopefully that will remain the case.
Attention span deficit
In 2013, the weather app Poncho launched a chatbot of the same name. Poncho was portrayed as a cat in a jacket that texted you the weather every morning. From the outset, however, the chatbot was unable to fulfill its duties, failing to recall what users had said. When one user asked it a straightforward question, "Do I need sunglasses?", Poncho replied, "Sorry, I was charging my phone. Where are you going with this?" rather than commenting on the weather.
Even after the user asked the same question again, Poncho did not give a satisfactory answer (i.e., "yes, it is pretty sunny" or "no, it is cloudy"). Instead, its message read, "Your future is so bright. Must wear sunglasses!" The chatbot seemed to be covering for its inadequacy with a barrage of amusing quips. It might seem like a minor issue, especially compared to the other entries on this list, but for those who used the service, it must have been quite inconvenient. Poncho ultimately failed to gain traction, and its parent firm Betaworks sold it to the beverage startup Dirty Lemon in 2018.
Can chatbots ever communicate at a human level?
These five examples of bot failures show that there is still a long way to go before this technology can simulate human-like conversation. And there is a genuine risk that we won't get there at all. According to a paper from the AI research firm Epoch, the training data available for AI models may be exhausted by 2026. Most large-scale AI models deliberately steer clear of low-quality data (such as content from social media platforms) in favor of high-quality data (such as editorially verified text from scientific journals), and AI developers are running short of this high-quality data.
Researchers at the Massachusetts Institute of Technology (MIT) are working on technology that can rewrite subpar data to produce more high-quality data, though it remains to be seen whether this will help address the looming data scarcity. For the time being, one thing is clear: as AI chatbots proliferate, their designers must keep a close eye on the errors these bots make and act quickly to correct them, lest they end up on another list of bot failures.