Securing tomorrow by preparing for cyber threats to your AI today

Anne-Sophie Peron Verloove

Manager, Cybersecurity & Privacy, PwC Switzerland

Email

As artificial intelligence (AI) and machine learning (ML) technologies usage in corporate environments evolves, the associated cyber threat landscape transforms in lockstep. These advancements not only introduce new applications but also new vulnerabilities, necessitating security strategies that are innovative and adaptable. Emerging challenges such as data poisoning, prompt injection, excessive agency, and model theft present growing risks to organisations, driven by the increasing sophistication and integration of AI systems. Addressing these sophisticated threats is no longer optional; it has become an essential priority for maintaining the integrity and security of corporate AI systems, ensuring they can reliably support organisational goals.

New threats in the AI landscape

Artificial intelligence is reshaping the cyber threat landscape in two distinct ways. Firstly, it intensifies traditional threats, making them faster, more scalable, and more tailored through enhanced intelligence, automation, and real-time adaptability. Established attack methods such as phishing, social engineering, and malware distribution are now powered by AI’s ability to personalise content, identify vulnerabilities with greater precision, and execute attacks at a scale and speed previously unattainable. These developments significantly increase the potency and evasiveness of legacy threats, making them harder to detect and defend against.

Data poisoning

Data poisoning threatens AI systems by inserting manipulated training data that compromises their integrity, reducing model accuracy and potentially leading to erroneous decisions. Research from ETH Zurich shows that even a minor fraction, such as 0.1 percent of poisoned data, can significantly impact a language model's effectiveness. This subtle threat arises because tainted data blends seamlessly with legitimate inputs, making it challenging to detect.

For example, a fraud detection system may be compromised to overlook suspicious activities, endangering its function. This underscores the necessity for robust analysis and validation of data inputs to pre-emptively identify and neutralise threats.

While data poisoning can occur as a one-time attack during initial data ingestion, it more commonly presents as an ongoing threat—especially in systems that retrain regularly. In such cases, attackers may gradually introduce poisoned data over time, leading to model drift and degraded performance.

Addressing data poisoning involves a two-pronged strategy: implementing stringent data validation processes and employing advanced monitoring techniques to detect anomalies early. Cleaning and restoring compromised data presents significant challenges, as approaches like machine unlearning often only succeed in temporarily obscuring the issue, rather than fully eliminating it. Therefore, developing proactive measures for prompt intervention when contamination is detected and ensuring comprehensive safeguards is critical in maintaining the reliability and security of AI systems against this nuanced threat.

Prompt injection

Prompt injection targets natural language processing (NLP) models by inserting malicious inputs into their instructions, potentially leading to unintended behaviours such as divulging sensitive information or executing incorrect actions. NLP is a branch of AI that enables computers to understand and generate human language, making it essential for systems like chatbots and virtual assistants. The difficulty with prompt injection once again stems from the subtlety of the malicious inputs which often resemble legitimate ones, complicating detection efforts.

For instance, chatbots might inadvertently share private user data when manipulated with cleverly crafted prompts, highlighting the need for robust security protocols.

Examples of detection protocols include input sanitisation filters that flag suspicious patterns, anomaly detection systems that monitor unexpected model outputs, and prompt auditing tools that log and analyse user interactions for signs of manipulation. These protocols can be configured to trigger alerts or block responses when predefined risk thresholds are met.

Because prompt injection directly targets the NLP layer, it is primarily the NLP model that must be monitored and maintained. However, depending on the system architecture, broader AI components may also require safeguards to prevent cascading effects.

Addressing prompt injection requires implementing rigorous input validation to ensure only genuine interactions affect system behavior. Security measures must be adaptable to counter dynamic changes in prompts, and ongoing monitoring is essential to detect anomalies in model actions swiftly. Ensuring these defenses are agile and responsive is crucial to maintaining the integrity and reliability of NLP models in the face of evolving threats.

Excessive agency

Excessive agency in AI systems refers to scenarios where models possess a high level of autonomy, potentially making crucial decisions that were not fully anticipated or controlled. While autonomy can significantly enhance efficiency and performance, it risks leading to unintended and sometimes adverse outcomes if systems operate beyond desired limits without sufficient oversight.

For instance, automated trading systems in financial markets can execute trades at speeds beyond human capability, sometimes triggering rapid and extreme market fluctuations as seen in events like flash crashes. In the healthcare sector, AI systems might occasionally prioritise certain data patterns that deviate from established medical guidelines, recommending inappropriate treatments and complicating patient care. These examples demonstrate the importance of applying such considerations across different industries to ensure AI supports, rather than complicates, operational goals.

Classic IT controls such as Role-Based Access Control (RBAC) can play a key role in limiting excessive agency by restricting what AI agents are allowed to access or execute. Whether AI operates under human or non-human identities also matters: assigning AI agents non-human identities with scoped permissions can help prevent unintended actions and improve traceability.

Agentic models, such as MCP from Anthropic, which break down tasks into sequences and delegate subtasks to other agents, introduce further risks. If one step in the chain is misconfigured or manipulated, the entire process may go off course, leading to unintended or even harmful outcomes. This highlights the need for layered safeguards and oversight mechanisms tailored to agentic architectures.

Addressing excessive agency involves designing AI systems with robust checks and balances, ensuring decision-making processes remain transparent, and maintaining adequate human oversight. This alignment with human intentions and values ensures that AI-driven decisions contribute positively to their intended applications, reinforcing their reliability and safety in diverse environments.

Model theft

Model theft involves the unauthorised extraction and replication of AI models, leading to compromised intellectual property and competitive disadvantages. Methods such as API exploitation, reverse engineering, and inadequate security protocols can allow attackers to duplicate a model's functionality without investing in original research and development.

For example, AI models deployed via APIs, such as those used for language translation, can be systematically queried by malicious actors or competitors to recreate their decision patterns, effectively stealing the innovative model. This not only undermines the original creator's investment but also hinders broader innovation and industry progress.

To combat model theft, it is crucial to implement robust security protocols, such as API rate limiting to prevent exploitation, embedding watermarking techniques to verify ownership, and using strong encryption with access controls to safeguard sensitive AI architectures. Continuous monitoring of access patterns to detect anomalies ensures that intellectual property remains secure, helping companies maintain their competitive edge and protect their technological advancements.

Securing AI systems: a strategic imperative for organisations

In conclusion

Artificial intelligence is no longer a peripheral innovation—it is a core enabler of business transformation. Yet, as AI systems become more autonomous, interconnected, and embedded in decision-making processes, they introduce a new class of cyber threats that demand both strategic foresight and operational discipline.

Addressing risks such as data poisoning, prompt injection, excessive agency, and model theft requires more than technical fixes. It calls for a security posture that is AI-aware by design, combining governance, architecture, and culture.

This means:

Embedding identity and access management tailored to AI agents, including scoped permissions and traceability for non-human identities.
Securing MLOps pipelines to ensure integrity across the model lifecycle—from training to deployment and monitoring.
Implementing continuous anomaly detection to flag deviations in model behavior, especially in systems that retrain dynamically.
Establishing governance frameworks that align with ethical standards and regulatory expectations, ensuring AI decisions remain transparent and accountable.
Enforcing data classification and labelling protocols to restrict the types of data AI systems are allowed to process. This helps reduce exposure to unclassified or potentially inaccurate data, while also limiting the risk of sensitive information leakage through prompt injection. These controls should be embedded directly into the architecture of AI systems, reflecting a broader security-by-design philosophy.
Strengthening data lifecycle management, ensuring that data used by AI systems is protected from ingestion through to archival or deletion. This includes robust encryption, access controls, and audit trails to prevent unauthorised use or leakage.
Embedding data protection principles—such as data minimisation, purpose limitation, and lawful processing—into every stage of AI development and deployment. This not only reduces the attack surface but also ensures compliance with privacy regulations such as GDPR, helping organisations avoid reputational and legal risks.
Investing in training programs that raise awareness of AI-specific threats among employees, empowering human oversight as a critical safeguard.

Many organisations still struggle to protect the data that fuels their AI systems—leaving them exposed not only to internal misconfigurations but also to external attacks that exploit weak data governance. Without strong controls around data access, labelling, protection, and lifecycle management, even the most advanced AI models remain vulnerable.

PwC supports organisations in designing and deploying these controls, conducting AI-specific risk assessments, and building resilient environments where AI can operate securely. By integrating cybersecurity into the fabric of AI systems, companies can unlock innovation while protecting their assets, reputation, and long-term competitiveness.