Risks of Chatbot Adoption: Protecting AI Language Models from Data Leakage, Poisoning, and Attacks

Artificial Intelligence is going to revolutionize the world. We are already seeing the adoption of chatbots, which can enhance both internal business processes and the way businesses deliver value to their customers. However, it is important to understand that adopting these tools does not come without new risks. In this blog post, we will discuss some of the biggest risks businesses face when adopting tools like chatbots.

Risk 1: Data Leakage and Privacy Concerns

Natural language models are pre-trained on vast amounts of data from various sources, including websites, articles, and user-generated content. If sensitive information is inadvertently embedded in that data, it can lead to data leakage or privacy concerns when the model generates text based on it.

Data leakage occurs when sensitive or confidential data is exposed to or accessed by unauthorized parties during the training or deployment of machine learning models. This can happen for various reasons, such as a lack of proper security measures, coding errors, or intentional malicious activity. Data leakage can compromise the privacy and security of the data, leading to potential legal and financial consequences for businesses. It can also lead to biased or inaccurate AI models, as the leaked data may contain information that is not representative of the larger population.

Data Leakage in the Wild

In late March of 2023, OpenAI alerted ChatGPT users to a flaw that enabled some users to view portions of other users’ conversations with the chatbot. OpenAI confirmed that a vulnerability in the open-source redis-py library was the cause of the data leak and that, “During a nine-hour window on March 20, 2023, another ChatGPT user may have inadvertently seen your billing information when clicking on their own ‘Manage Subscription’ page,” according to an article posted on HelpNetSecurity. The article went on to say that OpenAI uses “Redis to cache user information in their server, Redis Cluster to distribute this load over multiple Redis instances, and the redis-py library to interface with Redis from their Python server, which runs with Asyncio.”
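To make the caching pattern described in that article a bit more concrete, below is a minimal sketch of caching per-user data in Redis from an asyncio Python service using redis-py. This is not OpenAI’s actual code; the key scheme, TTL, and payload are illustrative assumptions.

```python
# Minimal sketch of the caching pattern described above: per-user data cached in
# Redis from an asyncio Python service via redis-py. Not OpenAI's actual code;
# the key scheme, TTL, and payload are illustrative assumptions.
import asyncio
import json

import redis.asyncio as redis  # redis-py's asyncio interface


async def get_user_profile(r: redis.Redis, user_id: str) -> dict:
    cache_key = f"user:{user_id}:profile"  # hypothetical key scheme

    cached = await r.get(cache_key)
    if cached is not None:
        # A flaw on this read path (e.g., returning data cached for the wrong
        # user or connection) is the kind of bug that can leak one user's data
        # to another.
        return json.loads(cached)

    profile = {"user_id": user_id, "plan": "pro"}  # stand-in for a real database lookup
    await r.set(cache_key, json.dumps(profile), ex=300)  # cache for five minutes
    return profile


async def main() -> None:
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)
    print(await get_user_profile(r, "alice"))
    await r.close()


if __name__ == "__main__":
    asyncio.run(main())
```

The incident is a reminder that even a well-tested caching layer sits squarely in the path of sensitive data, so library upgrades and connection handling deserve the same scrutiny as application code.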

Earlier this month, three incidents of data leakage occurred at Samsung as a result of using ChatGPT. Dark Reading described the first incident as “involving an engineer who passed buggy source code from a semiconductor database into ChatGPT, with a prompt to the chatbot to fix the errors. In the second instance, an employee wanting to optimize code for identifying defects in certain Samsung equipment pasted that code into ChatGPT. The third leak resulted when an employee asked ChatGPT to generate the minutes of an internal meeting at Samsung.” Samsung has since responded by limiting ChatGPT usage internally and preventing employees from submitting prompts to ChatGPT larger than 1,024 bytes.

Recommendations for Mitigation

  • Access controls should be implemented to restrict access to sensitive data to authorized personnel only. This is accomplished through user authentication, authorization, and privilege management. A recent Fox Business story introduced a new tool called LLM Shield that helps companies ensure confidential and sensitive information cannot be uploaded to tools like ChatGPT. Essentially, “administrators can set guardrails for what type of data a company wants to protect. LLM Shield then warns users whenever they are about to send sensitive data, obfuscates details so the content is useful but not legible by humans, and [stops] users from sending messages with keywords indicating the presence of sensitive data.” You can learn more about this tool by visiting their website. A simple sketch of this kind of guardrail appears after this list.
  • Use data encryption techniques to protect data while it’s stored or transmitted. Encryption ensures that data is unreadable without the appropriate decryption key, making it difficult for unauthorized individuals to access sensitive information.
  • Implement data handling procedures so data is protected throughout the entire lifecycle, from collection to deletion. This includes proper storage, backup, and disposal procedures.
  • Regular monitoring and auditing of AI models can help identify any potential data leakage or security breaches. This is done through automated monitoring tools or manual checks.
  • Regular testing and updating of AI models can help identify and fix any vulnerabilities or weaknesses that may lead to data leakage. This includes testing for security flaws, bugs, and issues with data handling and encryption. Regular updates should also be made to keep AI models up-to-date with the latest security standards and best practices.
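As a complement to the list above, here is a minimal sketch of the kind of outbound-prompt guardrail described in the first bullet: scan text for patterns that suggest sensitive data, then redact or block it before it reaches an external chatbot. The patterns and policy below are illustrative assumptions, not LLM Shield’s actual rules.

```python
# Sketch of a simple outbound-prompt guardrail: flag and redact likely-sensitive
# content before it is sent to an external chatbot. Patterns and policy are
# illustrative assumptions only.
import re

SENSITIVE_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "internal_marker": re.compile(r"\b(confidential|proprietary|internal only)\b", re.IGNORECASE),
}


def check_prompt(prompt: str) -> tuple[bool, str]:
    """Return (allowed, redacted_prompt) for an outbound chatbot prompt."""
    redacted = prompt
    matched = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(redacted):
            matched.append(label)
            redacted = pattern.sub(f"[REDACTED:{label}]", redacted)
    allowed = not matched  # simple policy: block anything that matched a pattern
    return allowed, redacted


allowed, safe_prompt = check_prompt("Fix this query for customer SSN 123-45-6789")
if not allowed:
    print("Blocked; redacted version:", safe_prompt)
```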

Risk 2: Data Poisoning

Data poisoning refers to the intentional corruption of an AI model’s training data, leading to a compromised model with skewed predictions or behaviors. Attackers can inject malicious data into the training dataset, causing the model to learn incorrect patterns or biases. This vulnerability can result in flawed decision-making, security breaches, or a loss of trust in the AI system.

I recently read a study entitled “TrojanPuzzle: Covertly Poisoning Code-Suggestion Models” that discussed the potential for an adversary to inject training data crafted to maliciously affect the induced system’s output. With tools like OpenAI’s Codex models and GitHub Copilot, this could be a huge risk for organizations leveraging code-suggestion models. While basic attempts at poisoning data are detectable by static analysis tools that can remove such malicious inputs from the training set, the study shows that there are more sophisticated approaches that allow malicious actors to go undetected.

The technique, coined TROJANPUZZLE, works by injecting malicious code into the training data in a way that is difficult to detect. The malicious code is hidden in a puzzle, which the code-suggestion model must solve in order to generate the malicious payload. The attack works by first creating a puzzle composed of two parts: a harmless part and a malicious part. The harmless part is used to lure the code-suggestion model into solving the puzzle, while the malicious part is hidden and only revealed after the harmless part has been solved. Once the code-suggestion model has solved the puzzle, it is able to generate the malicious payload, which can be anything the attacker wants, such as a backdoor, a denial-of-service attack, or a data exfiltration attack.

Recommendations for Mitigation

  • Carefully examine and sanitize the training data used to build machine learning models. This involves identifying potential sources of malicious data and removing them from the dataset.
  • Implementing anomaly detection algorithms to detect unusual patterns or outliers in the training data can help identify potential instances of data poisoning, allowing for early intervention before the model is deployed in production (see the brief sketch after this list).
  • Creating models that are more robust to adversarial attacks can help to mitigate the effects of data poisoning. This can include techniques like adding noise to the training data, using ensembles of models, or incorporating adversarial training.
  • Regularly retraining machine learning models with updated and sanitized datasets can help to prevent data poisoning attacks. This can also help to improve the accuracy and performance of the model over time.
  • Incorporating human oversight into the machine learning process can help to catch potential instances of data poisoning that automated methods may miss. This includes manual inspection of training data, review of model outputs, and monitoring for unexpected changes in performance.
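To illustrate the anomaly-detection idea mentioned above, here is a brief sketch that flags outlying training samples for human review before a model is (re)trained. It uses scikit-learn’s IsolationForest over deliberately simplistic features; a real pipeline would use much richer representations of the training examples, and the injected “poisoned” sample is purely illustrative.

```python
# Sketch: flag outlying training samples for review before (re)training.
# Features are deliberately simplistic; real pipelines would use richer
# representations. The "poisoned" sample is an illustrative stand-in.
import numpy as np
from sklearn.ensemble import IsolationForest


def featurize(samples: list[str]) -> np.ndarray:
    # Toy features: length, digit ratio, and punctuation ratio of each sample.
    feats = []
    for s in samples:
        n = max(len(s), 1)
        feats.append([
            len(s),
            sum(c.isdigit() for c in s) / n,
            sum(not c.isalnum() and not c.isspace() for c in s) / n,
        ])
    return np.array(feats)


training_samples = [
    "def add(a, b): return a + b",
    "def sub(a, b): return a - b",
    "def mul(a, b): return a * b",
    "exec(__import__('base64').b64decode('aW1wb3J0IG9z'))",  # injected outlier
]

detector = IsolationForest(contamination=0.25, random_state=0)
labels = detector.fit_predict(featurize(training_samples))  # -1 marks outliers

for sample, label in zip(training_samples, labels):
    if label == -1:
        print("Flag for human review:", sample)
```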

Risk 3: Model Inversion and Membership Inference Attacks

Model Inversion Attacks

Model inversion attacks attempt to reconstruct input data from model predictions, potentially revealing sensitive information about individual data points. The attack works by feeding the model a set of input data and then observing the model’s output. With this information, the attacker can infer the values of the input data that were used to generate the output.

For example, if a model is trained to classify images of cats and dogs, an attacker could use a model inversion attack to infer the values of the pixels in an image that were used to classify the image as a cat or a dog. This information can then be used to identify the objects in the image or to reconstruct the original image.

Model inversion attacks are a serious threat to the privacy of users of machine learning models. They can infer sensitive information about users, such as their medical history, financial information, or location. As a result, it is important to take steps to protect machine learning models from model inversion attacks.

Here is a great walk-thru of exactly how a model inversion attack works. The post demonstrates the approach given in a notebook found in the PySyft repository.
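For a feel of the mechanics without opening the notebook, here is a highly simplified sketch of gradient-based model inversion: starting from a blank input, repeatedly nudge it so that a classifier becomes confident it belongs to a target class, which approximates what the model “thinks” that class looks like. The tiny, untrained PyTorch model and input shape below are assumptions for illustration and stand in for a real trained classifier.

```python
# Highly simplified sketch of gradient-based model inversion. The tiny model
# stands in for a real trained classifier; shapes and hyperparameters are
# illustrative assumptions.
import torch
import torch.nn as nn

# Stand-in for a trained image classifier (e.g., cats vs. dogs).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()

target_class = 0  # e.g., "cat"
x = torch.zeros(1, 1, 28, 28, requires_grad=True)  # start from a blank "image"
optimizer = torch.optim.Adam([x], lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    logits = model(x)
    # Push the model toward high confidence in the target class; x gradually
    # becomes the model's idea of what that class looks like.
    loss = nn.functional.cross_entropy(logits, torch.tensor([target_class]))
    loss.backward()
    optimizer.step()
    x.data.clamp_(0.0, 1.0)  # keep pixel values in a valid range

print("Reconstructed input range:", x.min().item(), "to", x.max().item())
```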

Membership Inference Attacks

Membership inference attacks determine whether a specific data point was part of the training set, which can expose private user information or leak intellectual property. The attack queries the model with a set of data samples, including both those that were used to train the model and those that were not. The attacker then observes the model’s output for each sample and uses this information to infer whether the sample was used to train the model.

For example, if a model is trained to classify images of cats and dogs, an attacker could use a membership inference attack to infer whether a particular image was used to train the model. The attacker would do this by querying the model with a set of images, including both cats and dogs, and observing the model’s output for each image. If the model’s output for a given image is noticeably more confident, behavior typical of samples a model has effectively memorized, the attacker can infer that the image was likely part of the training set.

Membership inference attacks are a serious threat to the privacy of users of machine learning models. They can be leveraged to infer sensitive information about users, such as their medical history, financial information, or location.
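Here is a minimal sketch of one common variant, a confidence-threshold membership inference attack: query the model with candidate samples and guess “member” whenever the model is unusually confident about its prediction, since models tend to be more confident on data they were trained on. The model, synthetic data, and threshold below are illustrative assumptions.

```python
# Sketch of a confidence-threshold membership inference attack. The model,
# synthetic data, and threshold are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training ("member") data and held-out ("non-member") data.
X_train = rng.normal(size=(200, 10))
y_train = (X_train[:, 0] > 0).astype(int)
X_holdout = rng.normal(size=(200, 10))

model = LogisticRegression().fit(X_train, y_train)


def guess_membership(samples: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    # Guess "member" when the model's confidence on its predicted class is high.
    confidence = model.predict_proba(samples).max(axis=1)
    return confidence >= threshold


print("Fraction guessed as members (training data):", guess_membership(X_train).mean())
print("Fraction guessed as members (held-out data):", guess_membership(X_holdout).mean())
```

The gap between the two fractions is what the attacker exploits; heavily overfit models show a much larger gap than the simple model in this sketch.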

Recommendations for Mitigation

  • Differential privacy is a technique that adds carefully calibrated noise to the output of a machine learning model, making it very difficult for an attacker to infer any individual’s data from that output (a brief sketch of this idea follows this list).
  • The training process for a machine learning model should be secure. This will prevent attackers from injecting malicious data into the training data.
  • Use a secure inference process. The inference process needs to be secure to prevent attackers from inferring sensitive information from the model’s output.
  • Design the model to prevent attackers from inferring sensitive information from the model’s parameters or structure.
  • Deploy the model in a secure environment to prevent attackers from accessing the model or its data.
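As a small illustration of the first bullet above, here is a sketch of the Laplace mechanism, one standard way to apply differential privacy: add calibrated noise to an aggregate statistic before releasing it, so that no single individual’s record can be confidently inferred from the output. The epsilon value and the counting-query setting are illustrative assumptions.

```python
# Sketch of the Laplace mechanism for differential privacy: release a noisy
# count instead of the exact one. Epsilon and the query are illustrative.
import numpy as np

rng = np.random.default_rng()


def dp_count(values: np.ndarray, epsilon: float = 1.0) -> float:
    """Release a count with Laplace noise; a counting query has sensitivity 1."""
    true_count = float(np.sum(values))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)  # scale = sensitivity / epsilon
    return true_count + noise


# e.g., how many users in a batch triggered a particular model behavior
flags = np.array([1, 0, 1, 1, 0, 0, 1, 0])
print("Noisy count released to analysts:", dp_count(flags, epsilon=0.5))
```

Lower epsilon means more noise and stronger privacy; the same idea, applied during training rather than at query time, underlies libraries such as TensorFlow Privacy and Opacus.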

The adoption of chatbots and other AI language models such as ChatGPT can greatly enhance business processes and customer experiences, but it also comes with new risks and challenges. One major risk is data leakage and the privacy concerns that come with it, which, as discussed, can compromise the security and accuracy of AI models. Another is data poisoning, where malicious actors intentionally corrupt an AI model’s training data, ultimately leading to flawed decision-making and security breaches. Finally, model inversion and membership inference attacks can reveal sensitive information about users.

To mitigate these risks, businesses should implement access controls, use modern and secure data encryption techniques, establish sound data handling procedures, monitor and test their models regularly, and incorporate human oversight into the machine learning process. Differential privacy and a secure deployment environment can further protect machine learning models from these threats. It is crucial that businesses stay vigilant and proactive as they continue to adopt and integrate AI technologies into their operations.
