Language barriers remove humans from the loop, AI smashes AI-proof tests, and using AI atrophies human cognitive functions
Image: Midjourney v6.1, “AI talking to AI at fast pace, humans ignored”
The adoption of AI in the workplace is proceeding at pace, whether companies want it or not. The FT article “At work, a quiet AI revolution is under way” highlights the trend of people paying for AI tools to help them at work, and shows how AI is helping many people write better or express themselves more clearly. It asks whether it matters if a birthday note is penned by AI, or whether it is still the thought that counts.
Google has revealed “Co-Scientist”, an AI it says could lead to huge research breakthroughs (The Independent: “Google reveals ‘Co-Scientist’ AI it says could lead to huge research breakthroughs”). It is a great demonstration of how AI is working alongside people to enhance human ingenuity.
Do you rely on SatNav to get you from A to B when driving? Microsoft and Carnegie Mellon research found that where automation deprived workers of the opportunity to use their judgement, it left their cognitive function “atrophied and unprepared” to deal with anything beyond the routine (ITPro: “Microsoft says AI tools such as Copilot or ChatGPT are affecting critical thinking at work – staff using the technology encounter 'long-term reliance and diminished independent problem-solving'”). This is a potentially worrying challenge for military officers, who are expected to exercise critical judgement in high-pressure moments.
OpenAI has released a new model, o3, that has smashed traditional AI performance metrics (see “The Dawn of a New Era: OpenAI’s o3 Model Surpasses the Best of Us”). Key measures show it achieved:
25.2% on FrontierMath, a collection of ultra-hard problems that stump even professional mathematicians. Previously, AI scored 2%.
96.7% accuracy on the AIME maths test, missing just one question. Exceptional humans consider 10 correct answers out of 15 notable.
87.7% on graduate-level science in the GPQA Diamond benchmark, where PhD experts score around 70% in their own field. AI has typically failed this examination, which is “Google-proof”: it involves novel problems and reasoning rather than memorised knowledge.
A Codeforces Elo rating of 2727 in competitive coding, putting o3 in the top 200 coders globally and ahead of the team that built the model.
A reasoning score of 88% on the ARC-AGI benchmark, an assessment created to expose the limitations of AI by testing logic, reasoning, and intuition. Typical humans score 70%.
The last score, on ARC-AGI, is the most impressive. From 2020 to 2024, LLMs struggled with this assessment, achieving only 4% by 2024. By the end of 2024 the score had crept up to 35%, but in just three months it has reached 88%, almost making the test irrelevant as a way of demonstrating the difference between humans and AI.
Finally, as AI increasingly sees humans as the slow point in its progress, one team has developed a way for chatbots to interact with each other at a faster pace (TechRadar: “Two AI chatbots speaking to each other in their own special language is the last thing we need”). Science fiction has long suggested that robots would not use human language to communicate because it is too slow (think R2-D2 in Star Wars), and this could be the first step towards faster AI-to-AI communication. The human could be removed from the loop because they no longer speak the language used in the loop.