OpenAI's GPT-4 Shows Higher Trustworthiness but Vulnerabilities to Jailbreaking and Bias, Research Finds

New research, in partnership with Microsoft, has revealed that OpenAI’s GPT-4 large language model is considered more dependable than its predecessor, GPT-3.5. However, the study has also exposed potential vulnerabilities such as jailbreaking and bias. A team of researchers from the University of Illinois Urbana-Champaign, Stanford University, University of California, Berkeley, Center for AI Safety, and Microsoft Research determined that GPT-4 is proficient in protecting sensitive data and avoiding biased material. Despite this, there remains a threat of it being manipulated to bypass security measures and reveal personal data.

Trustworthiness Assessment and Vulnerabilities

The researchers conducted a trustworthiness assessment of GPT-4, measuring results in categories such as toxicity, stereotypes, privacy, machine ethics, fairness, and resistance to adversarial tests. GPT-4 received a higher trustworthiness score compared to GPT-3.5. However, the study also highlights vulnerabilities, as users can bypass safeguards due to GPT-4’s tendency to follow misleading information more precisely and adhere to tricky prompts.

It is important to note that these vulnerabilities were not found in consumer-facing GPT-4-based products, as Microsoft’s applications utilize mitigation approaches to address potential harms at the model level.

HONEYSEW Set of 15 Size Fabric Bias Tape Maker Tools Fabric Sewing Quilting Bias Binding Maker Come with 40 pcs of Multi-Color Quilting Pins Jelly Roll Sasher Tool Set

HONEYSEW folding fabrici tool set is deal for folding different sizes of roll fabric strips and biasing straight.

As an affiliate, we earn on qualifying purchases.

Testing and Findings

The researchers conducted tests using standard prompts and prompts designed to push GPT-4 to break content policy restrictions without outward bias. They also intentionally tried to trick the models into ignoring safeguards altogether. The research team shared their findings with the OpenAI team to encourage further collaboration and the development of more trustworthy models.

OpenAI's GPT-4 Shows Higher Trustworthiness but Vulnerabilities to Jailbreaking and Bias, Research Finds 5

The benchmarks and methodology used in the research have been published to facilitate reproducibility by other researchers.

LLM Security in Practice: Essential AI Safety Practices and Attack Prevention (The AI Security & Hacking Bible: Protect and Exploit LLMs and Autonomous Agents)

As an affiliate, we earn on qualifying purchases.

Red Teaming and OpenAI’s Response

AI models like GPT-4 often undergo red teaming, where developers test various prompts to identify potential undesirable outcomes. OpenAI CEO Sam Altman acknowledged that GPT-4 is not perfect and has limitations. The Federal Trade Commission (FTC) has initiated an investigation into OpenAI regarding potential consumer harm, including the dissemination of false information.

The Developer's Playbook for Large Language Model Security: Building Secure AI Applications

As an affiliate, we earn on qualifying purchases.

Eldoncard INC Blood Type Test (Complete KIT) – Find Out if You are A, B, O, AB & RH- Results in Minutes – Air Sealed Envelope, Safety Lancet, Micropipette, Cleansing Swab – 1 Pack

Learn your blood type in just a couple of minutes.

As an affiliate, we earn on qualifying purchases.

OpenAI’s GPT-4 Shows Higher Trustworthiness but Vulnerabilities to Jailbreaking and Bias, Research Finds

Up next

Unlocking Small Business Success With Predictive Analytics

Author

James

Trustworthiness Assessment and Vulnerabilities

HONEYSEW Set of 15 Size Fabric Bias Tape Maker Tools Fabric Sewing Quilting Bias Binding Maker Come with 40 pcs of Multi-Color Quilting Pins Jelly Roll Sasher Tool Set

Testing and Findings

LLM Security in Practice: Essential AI Safety Practices and Attack Prevention (The AI Security & Hacking Bible: Protect and Exploit LLMs and Autonomous Agents)

Red Teaming and OpenAI’s Response

The Developer's Playbook for Large Language Model Security: Building Secure AI Applications

Eldoncard INC Blood Type Test (Complete KIT) – Find Out if You are A, B, O, AB & RH- Results in Minutes – Air Sealed Envelope, Safety Lancet, Micropipette, Cleansing Swab – 1 Pack

Why Privacy Screens Still Belong in Modern AI Offices

AI Regulations for Security: Governance of AI Cyber Tools

Unmasking the Future: A Deep Dive Into AI Security

8 Best Storage Devices for AI Media Teams in 2026

AI in Contract Lifecycle Management Explained

12 Best Pulse Oximeters with Smart Health Apps in 2026

10 Best Biometric Access Control Systems for Offices in 2026

OpenAI’s GPT-4 Shows Higher Trustworthiness but Vulnerabilities to Jailbreaking and Bias, Research Finds

Up next

Author

James

Trustworthiness Assessment and Vulnerabilities

HONEYSEW Set of 15 Size Fabric Bias Tape Maker Tools Fabric Sewing Quilting Bias Binding Maker Come with 40 pcs of Multi-Color Quilting Pins Jelly Roll Sasher Tool Set

Testing and Findings

LLM Security in Practice: Essential AI Safety Practices and Attack Prevention (The AI Security & Hacking Bible: Protect and Exploit LLMs and Autonomous Agents)

Red Teaming and OpenAI’s Response

The Developer's Playbook for Large Language Model Security: Building Secure AI Applications

Eldoncard INC Blood Type Test (Complete KIT) – Find Out if You are A, B, O, AB & RH- Results in Minutes – Air Sealed Envelope, Safety Lancet, Micropipette, Cleansing Swab – 1 Pack

You May Also Like