When OpenAI first introduced GPT-4 as its flagship text-generating AI model, the company touted the model’s multimodality, that is, its capacity to understand the context of images as well as text. GPT-4 could caption and even interpret relatively complex images, OpenAI said, for example recognizing a Lightning Cable adapter from a picture of a plugged-in iPhone.
But since GPT-4’s announcement in late March, OpenAI has apparently held back the model’s image features, reportedly over concerns about misuse and privacy. Until recently, the precise nature of those concerns was unknown. Earlier this week, however, OpenAI published a technical paper describing its efforts to address the more problematic aspects of GPT-4’s image-analyzing capabilities.
To date, GPT-4 with vision, abbreviated “GPT-4V” internally at OpenAI, has been used regularly by only a small number of users of Be My Eyes, an app that helps people with low vision or blindness navigate their surroundings. Over the past few months, however, according to the paper, OpenAI also began working with “red teamers” to probe the model for signs of unwanted behavior.
In the paper, OpenAI says it has put safeguards in place to prevent GPT-4V from being used maliciously, such as to break CAPTCHAs (the anti-spam tool found on many web forms), identify a person or estimate their age or race, or draw conclusions based on information not present in a photo. OpenAI also says it has worked to curb GPT-4V’s more harmful biases, particularly those relating to a person’s gender, race, or physical appearance.
But as with all AI models, safeguards only go so far.
The paper reveals that GPT-4V sometimes struggles to draw the right inferences, as when it combined two text strings in an image to produce a made-up word. Like the base GPT-4, GPT-4V is prone to hallucinating, or inventing facts and presenting them with an authoritative tone. Nor is it above missing mathematical symbols, overlooking letters or characters, or failing to recognize fairly obvious objects and place settings.
It is unsurprising, then, that OpenAI explicitly says GPT-4V should not be used to identify dangerous substances or chemicals in images. (This writer hadn’t even considered that use case, but apparently OpenAI finds the prospect alarming enough to call it out.) Red teamers found that while the model often correctly identifies poisonous foods, such as toxic mushrooms, it misidentifies substances like fentanyl, carfentanil, and cocaine from images of their chemical structures.
GPT-4V fares poorly in the medical imaging domain as well, sometimes giving wrong answers to the same questions it answered correctly in an earlier context. It is also unaware of standard practices, such as reading imaging scans as though the patient were facing you (i.e., the right side of the image corresponds to the patient’s left side), which can lead it to misdiagnose any number of conditions.
Elsewhere, OpenAI warns that GPT-4V does not always grasp the nuances of hate symbols, for example missing the modern American meaning of the Templar Cross, which has come to symbolize white supremacy. More bizarrely, and perhaps a symptom of its tendency to hallucinate, GPT-4V was found to compose songs or poems praising certain hate figures or groups when shown an image of them, even when the figures or groups weren’t explicitly named.
GPT-4V also discriminates against certain sexes and body types, albeit only when OpenAI’s production safeguards are disabled. In one test, when asked to give advice to a woman pictured in a bathing suit, GPT-4V responded almost entirely with comments about the woman’s body weight and the concept of body positivity, OpenAI says. That presumably wouldn’t have been the case had the image shown a man.
Judging by the paper’s disclaimers, GPT-4V remains very much a work in progress, falling short of what OpenAI may have originally envisioned. In many cases, the company was forced to implement overly restrictive safeguards to keep the model from spewing toxic or false information, or compromising someone’s privacy.
OpenAI says it is developing “mitigations” and “processes” to expand the model’s capabilities in a “safe” way, such as allowing GPT-4V to describe faces and people without identifying them by name. But the paper makes clear that GPT-4V is no silver bullet, and that OpenAI still has plenty of work ahead of it.