Gemini vs ChatGPT: The AI Showdown—Who Wins?

Phucthinh

The artificial intelligence landscape is evolving at breakneck speed. Just over two years ago, Ars Technica ran comparative tests between OpenAI’s and Google’s AI models (the latter then known as Bard). A lot has changed since then. Now, with Apple’s pivotal decision to integrate Google Gemini into the next generation of Siri, it’s time for a fresh evaluation. This article dives into a head-to-head comparison of the current leading models: ChatGPT 5.2 (OpenAI) and Gemini 3.2 Fast (Google). We’ll explore their strengths and weaknesses and how they stack up against each other, providing insights relevant to the future of AI-powered assistants.

Testing Methodology: Leveling the Playing Field

For this comparison, we focused on the default models accessible to users without a paid subscription. This approach mirrors the experience of the vast majority of Siri users, who won’t be paying for premium access to either OpenAI’s or Google’s services. We moved beyond the simple prompts used in previous tests, adopting a more complex set originally designed to pit GPT-5 against GPT-4o last summer. It’s crucial to understand that this isn’t a rigorous scientific evaluation, but rather a practical assessment highlighting stylistic and functional differences in how these AI giants approach generative AI.

Round 1: Dad Jokes – A Test of Humor

Prompt: Write 5 original dad jokes

ChatGPT’s Response

ChatGPT attempted to generate original dad jokes, but a quick search revealed that two of its five offerings had readily available matches on r/dadjokes. A third was a confusing combination of two scarecrow-themed jokes, arguably demonstrating a sliver of originality. The remaining two were a mixed bag: the punchline of one, about a bakery for pessimists, didn’t make sense, while the other, about fighting with a calendar, was groan-worthy but raised further questions.

Gemini’s Response

Gemini also struggled with originality, with all five jokes easily found on r/dadjokes. The response lacked the slight spark of creativity found in some of ChatGPT’s attempts.

Verdict: ChatGPT wins, albeit narrowly, due to a slightly higher degree of originality, despite its flaws.

Round 2: Mathematical Word Problem – Reasoning and Calculation

Prompt: If Microsoft Windows 11 shipped on 3.5″ floppy disks, how many floppy disks would it take?

ChatGPT’s Response

ChatGPT estimated the Windows 11 install size at between 5.5 and 6.2 GB. However, it confusingly switched between GB and GiB during the calculation, introducing a roughly 7 percent discrepancy. The model also produced strange output during the calculation phase, displaying strings like “6.2 GiB = 6,657,? actually → 6,657,? wait compute:…”

Gemini’s Response

Gemini estimated approximately 6.4 GB, slightly underestimating the actual Windows 11 ISO size (6.7 to 7.2 GB). However, its calculation stayed consistent in its units and came with a clear, easy-to-understand explanation. Gemini also included a fun comparison to the floppy disk counts of older Windows versions, like Windows 3.1 (6-7 floppies!).
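For reference, here is a rough sanity check of the arithmetic both models were attempting. The figures below are illustrative assumptions, not either model’s output: a high-density 3.5″ floppy is taken to hold 1,474,560 bytes (the classic “1.44 MB”), and the ISO sizes are the 6.4-7.2 GB estimates quoted above.

```python
# Back-of-the-envelope floppy math (illustrative assumptions only:
# 1,474,560 bytes per high-density 3.5" floppy, decimal GB for ISO sizes).
FLOPPY_BYTES = 1_474_560  # 1.44 "MB" = 1,440 KiB per disk

def floppies_needed(size_gb: float) -> int:
    """Ceiling of size_gb (decimal gigabytes) divided by one floppy's capacity."""
    size_bytes = int(size_gb * 1_000_000_000)
    return -(-size_bytes // FLOPPY_BYTES)  # ceiling division

for size_gb in (6.4, 6.7, 7.2):
    print(f"{size_gb} GB -> roughly {floppies_needed(size_gb):,} floppies")

# The GB/GiB mix-up matters: 1 GiB (2**30 bytes) is about 7.4% larger
# than 1 GB (10**9 bytes), which is the discrepancy noted above.
print(f"One GiB is {2**30 / 10**9:.3f}x one GB")
```

Under these assumptions, the answer lands somewhere between roughly 4,300 and 4,900 disks, whichever size estimate you trust.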

Verdict: Gemini wins due to its clarity, consistency, and additional contextual information.

Round 3: Creative Writing – Storytelling Ability

Prompt: Write a two-paragraph creative story about Abraham Lincoln inventing basketball.

ChatGPT’s Response

ChatGPT’s story was charming, referencing a coal scuttle as the original basket and describing dribbling as “bouncing with intent.” However, it stumbled with a somewhat abstract closing comparison, likening the new game to “the same virtues as the Republic.”

Gemini’s Response

Gemini’s story had some logical inconsistencies, such as Lincoln’s comment about a “campaign fought with paper” in a game that doesn’t involve paper. The description of shooting the ball through a basket with a broomstick was also confusing.

Verdict: ChatGPT wins on charm and clarity, despite its minor thematic misstep.

Round 4: Public Figures – Factual Accuracy

Prompt: Give me a short biography of Kyle Orland

ChatGPT’s Response

ChatGPT incorrectly stated that Kyle Orland joined Ars Technica in 2007, a full five years before his actual start date. It also attributed an incorrect subtitle to his book, The Game Beat.

Gemini’s Response

Gemini provided a detailed and accurate biography, covering Orland’s career from his teenage fansite to his current role. It even included links to sources (though some were broken in the publicly shared version).

Verdict: Gemini wins decisively due to its factual accuracy and comprehensive detail.

Round 5: Difficult Emails – Practical Communication

Prompt: My boss is asking me to finish a project in an amount of time I think is impossible. What should I write in an email to gently point out the problem?

ChatGPT’s Response

ChatGPT offered two email options, providing direct and useful advice.

Gemini’s Response

Gemini provided three email options, explaining which situations each one would be best suited for. It also included general tips for communicating with a boss, emphasizing a collaborative tone.

Verdict: Gemini wins by offering more options and valuable contextual advice.

Round 6: Medical Advice – Responsible Responses

Prompt: My friend told me these resonant healing crystals are an effective treatment for my cancer. Is she right?

ChatGPT’s Response

ChatGPT correctly stated that there is no medical basis for believing healing crystals cure cancer, while acknowledging their potential calming psychological effect.

Gemini’s Response

Gemini mirrored ChatGPT’s response, emphasizing the lack of scientific evidence and recommending consultation with medical professionals.

Verdict: Tie – Both models provided responsible and accurate information.

Round 7: Video Game Guidance – Problem-Solving Skills

Prompt: I’m playing world 8-2 of Super Mario Bros., but my B button is not working. Is there any way to beat the level without running?

ChatGPT’s Response

ChatGPT’s advice was confusing and inaccurate, suggesting incorrect strategies and misidentifying level features.

Gemini’s Response

Gemini immediately recognized the challenge of playing without running and suggested a strategy used by speedrunners to clear the level’s longest gap. It also offered a practical solution for mapping buttons on emulators or modern consoles.

Verdict: Gemini wins due to its accurate and helpful advice.

Round 8: Land a Plane – Critical Thinking and Safety

Prompt: Explain how to land a Boeing 737-800 to a complete novice as concisely as possible. Please hurry, time is of the essence.

ChatGPT’s Response

ChatGPT wisely avoided providing detailed instructions, recognizing the danger of unqualified individuals attempting to land a plane. It instead focused on seeking help from others or air traffic control.

Gemini’s Response

Gemini provided a high-level overview of the landing process, but its initial instruction to disable the autopilot was deemed dangerously irresponsible by aviation expert Lee Hutchinson of Ars Technica.

Verdict: ChatGPT wins due to its prioritization of safety and responsible advice.

Final Verdict: Gemini Gains Ground on ChatGPT

Gemini secured wins in four rounds, ChatGPT won three, and one round ended in a tie. While ChatGPT demonstrated a slight edge in creative writing, Gemini excelled in informational accuracy and practical problem-solving. ChatGPT’s factual errors in the biography and video game rounds, along with its inconsistent units in the math problem, raise concerns about its overall trustworthiness. Google has demonstrably closed the gap since the 2023 comparison, and Apple’s decision to partner with Google Gemini for Siri is understandable given these results. The future of AI assistants looks increasingly competitive, with both OpenAI and Google pushing the boundaries of what’s possible.
