ChatGPT User Data Exposure
Description
Created by US startup OpenAI, ChatGPT appeared in November 2022 and was quickly seized upon by users amazed at its ability to answer difficult questions, write sonnets or code, and provide information on contentious issues.
ChatGPT has even passed medical and legal exams set for human students, scoring high marks. But the technology also comes with many risks as its learning system and similar competitor models are integrated into commercial applications.
According to a Bloomberg report, a recent bug gave some users of OpenAI's ChatGPT access to the titles of other users' previous chats with the AI chatbot. The glitch, which came to light on March 20, 2023, allowed certain users to see brief descriptions of other users' conversations in the chat history sidebar, prompting the company to temporarily shut down the chatbot. "It's also possible that the first message of a newly-created conversation was visible in someone else's chat history if both users were active around the same time," the company said.
The bug, it further added, originated in the redis-py library, leading to a scenario where canceled requests could cause connections to be corrupted and return unexpected data from the database cache, in this case, information belonging to an unrelated user.
While the problem has since been addressed, OpenAI noted that the issue may have had more implications elsewhere, potentially revealing payment-related information of 1.2% of the ChatGPT Plus subscribers on March 20 between 1-10 a.m. PT. This included another active user's first and last name, email address, payment address, credit card number's last four digits (only), and credit card expiration date. It emphasized that full credit card numbers were not exposed.
OpenAI temporarily shut down ChatGPT on Monday in response to the bug, but it was brought back online later that night. As of this writing the chat history sidebar has been replaced with a message noting that “History is temporarily unavailable” and that the company is “working to restore this feature as soon as possible.”
The last update on OpenAI’s status page from 10:54 PM ET on Monday notes that service has been restored, but that the company is still working to bring back past conversation histories for all users. However, according to Altman, users won’t have access to chats they created from 4 AM ET to 1 PM ET on Monday, March 20th.
Update March 22nd, 4:45 PM ET: Added confirmation from Sam Altman that the chat history issue was due to a bug in a piece of open-source software and that users wouldn’t be able to access their chat history from a few hours on Monday.
Technical details:
The bug was discovered in the Redis client open-source library, redis-py. Here’s how the bug worked:
- We use Redis to cache user information on our servers so we don’t need to check our database for every request.
- We use Redis Cluster to distribute this load over multiple Redis instances.
- We use the redis-py library to interface with Redis from our Python server, which runs with Asyncio.
- The library maintains a shared pool of connections between the server and the cluster and recycles a connection to be used for another request once done.
- When using Asyncio, requests and responses with redis-py behave as two queues: the caller pushes a request onto the incoming queue, later pops the matching response from the outgoing queue, and then returns the connection to the pool.
- If a request is canceled after it has been pushed onto the incoming queue but before its response has been popped from the outgoing queue, the bug appears: the connection becomes corrupted, and the next response dequeued for an unrelated request can contain data left behind on the connection.
- In most cases, it results in an unrecoverable server error, and the user will have to try their request again.
- But in some cases, the corrupted data matches the data type the requester was expecting, so what gets returned from the cache appears valid, even if it belongs to another user.
- At 1 a.m. Pacific time on Monday, March 20, we inadvertently introduced a change to our server that caused a spike in Redis request cancellations. This created a small probability for each connection to return bad data.
- This bug only appeared in the Asyncio redis-py client for Redis Cluster and has now been fixed.
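The failure mode above can be sketched with a toy asyncio model (hypothetical names, not redis-py's actual API): because a pipelined connection matches requests to responses only by FIFO order, a request cancelled between "sent" and "read" leaves a stale response on the connection, and the next caller to reuse it reads someone else's data.

```python
import asyncio
from collections import deque

class Connection:
    """Toy model of a pipelined cache connection: responses come back
    in the same order requests were sent, so a caller that writes a
    request must also read the matching response."""

    def __init__(self, cache):
        self.cache = cache          # shared key -> value store
        self.responses = deque()    # outgoing queue of pending responses

    async def get(self, key):
        self.responses.append(self.cache[key])  # request "sent"
        await asyncio.sleep(0)                  # cancellation point
        return self.responses.popleft()         # response "read"

async def main():
    cache = {"user:alice": "alice@example.com",
             "user:bob": "bob@example.com"}
    conn = Connection(cache)

    # Alice's request is cancelled after it is enqueued but before its
    # response is read, leaving a stale entry on the connection.
    task = asyncio.create_task(conn.get("user:alice"))
    await asyncio.sleep(0)   # let the request be enqueued
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    # Bob reuses the pooled connection and pops Alice's stale response.
    return await conn.get("user:bob")

print(asyncio.run(main()))  # -> alice@example.com
```

Here the "corrupted" data happens to be the same type Bob expected (a string), so it looks like a valid cache hit, which is exactly the dangerous case described above; a type mismatch would instead surface as a server error.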
Actions taken
- Added redundant checks to ensure the data returned by the Redis cache matches the requesting user.
- Programmatically examined logs to make sure that all messages are only available to the correct user.
- Correlated several data sources to identify the affected users and notify them.
- Improved logging to identify when this is happening and fully confirm it has stopped.
- Improved the robustness and scale of the Redis cluster to reduce the likelihood of connection errors at extreme load.
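The first mitigation listed, a redundant ownership check, can be sketched as follows (helper names are hypothetical, not OpenAI's actual code): tag each cached record with the user it belongs to, and verify the tag against the requesting user before trusting anything the cache returns.

```python
# In-memory stand-in for the Redis cache.
cache = {}

def cache_put(user_id, key, value):
    # Store the value together with an explicit owner tag.
    cache[key] = {"owner": user_id, "value": value}

def cache_get(user_id, key):
    record = cache.get(key)
    if record is None:
        return None                      # cache miss: fall back to the database
    if record["owner"] != user_id:
        # A corrupted connection could hand back another user's record;
        # treat it as a miss rather than serving cross-user data.
        return None
    return record["value"]

cache_put("alice", "profile:alice", {"email": "alice@example.com"})
assert cache_get("alice", "profile:alice") == {"email": "alice@example.com"}
assert cache_get("bob", "profile:alice") is None  # ownership mismatch rejected
```

The check is redundant by design: even if the transport layer misbehaves again, a mismatched owner tag turns a potential data leak into a harmless cache miss.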