Human Capabilities that Surpass AI: Recognizing Blossoms and Petals

Large language models rely exclusively on digital data and cannot (for now) acquire knowledge through direct physical experience such as sight, smell, or touch.

AI Struggles with Comprehending Flora: Human Advantage in Identifying Flowers

In a groundbreaking study published in Nature Human Behaviour, researchers have highlighted a significant gap in the understanding of physical concepts by large language models (LLMs) such as ChatGPT. The study, led by Qihui Xu, a postdoctoral researcher in psychology at Ohio State University, found that while LLMs can synthesize language and visual data to form object representations, they lack the sensory-embodied understanding found in humans.

The study involved four AI models, including OpenAI's GPT-3.5 and GPT-4, and Google's PaLM and Gemini. These models were compared to human participants in terms of their conceptual understanding of 4,442 words. The findings revealed a limitation in AI's ability to understand physical concepts, particularly those that involve sensory experiences.
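For readers curious about the mechanics, a comparison like this depends on eliciting norm-style ratings from each model, word by word. The sketch below shows one plausible way to collect such a rating from GPT-4 via OpenAI's chat API; the prompt wording, the 0-5 scale, and the rate_word helper are illustrative assumptions, not the study's actual protocol.

```python
# A hedged sketch of eliciting a Lancaster-style rating from an LLM.
# The prompt wording, the 0-5 scale, and the parsing are assumptions;
# the study's actual elicitation protocol is not reproduced here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rate_word(word: str, question: str) -> float:
    """Ask the model to rate `word` on a norm-style question (0-5 scale)."""
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"{question}\nWord: {word}\n"
                "Reply with a single number from 0 to 5 and nothing else."
            ),
        }],
    )
    return float(reply.choices[0].message.content.strip())

# Hypothetical olfactory question in the style of the Lancaster Norms
print(rate_word("flower", "To what extent do you experience this concept by smelling?"))
```

Repeating such a query across thousands of words and several perceptual dimensions yields a model-generated rating table that can then be set against the human norms.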

Humans, for instance, rely on multidimensional sensory experience to understand objects such as flowers: color, shape, texture, and emotional or cultural significance all contribute. LLMs, by contrast, draw predominantly on semantic and linguistic information and lack direct sensorimotor experience.

Empirical studies using behavioural and neuroimaging analyses have confirmed this disparity. While LLMs and multimodal LLMs show higher consistency with human judgments than visual-only models, there remain notable differences in which object dimensions they rely on for classification and similarity judgments. This suggests that AI does not yet fully replicate human perception of physical concepts like flowers.

The implications for AI-human interaction are significant. LLMs can assist and augment human understanding with vast linguistic knowledge and pattern recognition, but they may lack an intuitive grasp of sensory-rich, embodied experience. This gap means that while AI can communicate effectively about objects and support tasks involving conceptual knowledge, humans remain essential in contexts requiring multisensory integration, emotional nuance, and cultural depth.

Future development of AI systems might focus on instruction fine-tuning and enhanced multimodal training to better align AI object understanding with human cognitive dimensions, potentially improving collaboration where physical concept comprehension is crucial. According to Xu, AI may one day represent physical concepts better through sensorimotor data or robotics.

The study compared the AI models' outputs to two standard sets of psycholinguistic norms: the Glasgow Norms and the Lancaster Norms. The Glasgow Norms pose questions about the emotional arousal and imagery evoked by particular words, such as "flower." The Lancaster Norms pose questions about the sensory perceptions and bodily actions associated with those words, for example, how strongly a flower is experienced through smell or through actions of the torso.
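To make that comparison concrete, the sketch below scores a model's ratings against human norms with a rank correlation. The four-word sample and every rating value are invented for illustration; the study itself used 4,442 words and the published Glasgow and Lancaster norms, and its actual analysis is not reproduced here.

```python
# A minimal sketch of scoring model ratings against human norms.
# All words and rating values below are invented for illustration;
# the study used 4,442 words and the published Glasgow and Lancaster norms.
from scipy.stats import spearmanr

# Hypothetical human Lancaster-style smell ratings (0-5 scale)
human_smell = {"flower": 4.6, "thunder": 0.4, "bread": 4.2, "idea": 0.2}

# Hypothetical ratings elicited from a model with the same question
model_smell = {"flower": 4.1, "thunder": 0.9, "bread": 3.8, "idea": 0.5}

words = sorted(human_smell)
rho, p = spearmanr([human_smell[w] for w in words],
                   [model_smell[w] for w in words])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f}) over {len(words)} words")
```

A high rank correlation on a dimension like smell would indicate that the model orders words much as humans do, even without any olfactory experience of its own.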

In conclusion, while AI models like ChatGPT have made impressive strides in understanding physical concepts such as flowers by synthesizing language and visual data, their conceptualizations remain largely semantic rather than sensory-embodied like humans'. Recognizing this difference is vital for leveraging AI effectively in human-centered applications and for advancing AI toward more human-like cognition. Xu concluded that the human experience is far richer than words alone can hold.

  1. The study, involving four AI models (GPT-3.5 and GPT-4 from OpenAI; PaLM and Gemini from Google), revealed a significant disparity between AI's and humans' understanding of physical concepts, especially those involving sensory experiences.
  2. Humans perceive objects such as flowers through multidimensional sensory experiences like color, shape, texture, and emotional or cultural significance, while AI predominantly utilizes semantic and linguistic information, lacking direct sensory-motor experience.
  3. Despite showing higher consistency with human judgments than visual-only models, AI fails to fully replicate human perception of physical concepts like flowers, demonstrating a reliance on different dimensions for classification and similarity judgments.
  4. Xu, the study's leader, suggested that future development of AI systems may involve instruction fine-tuning and enhanced multimodal training to align AI object understanding more closely with human cognitive dimensions, potentially improving collaboration in contexts requiring physical concept comprehension.
  5. The study compared the AI models' outcomes to standard psycholinguistic ratings, such as the Glasgow Norms and the Lancaster Norms, which focused on emotional arousal, imagery, sensory perceptions, and bodily actions related to certain words, like a flower.
