One Way to Fight Misinformation
This illustration photo shows a phone displaying a picture of rescuers working on a residential building destroyed after a missile strike in Dnipro on Jan. 16, 2023, with the WarOnFakes.com website displaying a fake video of the same residential building shown in the background, taken on Feb. 14, 2023. (Olivier Douliery/AFP via Getty Images)
By Vishnu Pendyala
10/11/2024 | Updated: 10/11/2024

Commentary

The much-touted California state AI safety bill, SB 1047, considered the most stringent in the nation, has been vetoed by the California governor. Baby steps that pick some of the low-hanging fruit in the AI regulation landscape are better than giant leaps. One such step would be to mandate the traceability of data. California should take the lead on this, just as it did on data privacy.

Misinformation has wreaked havoc around the war in Ukraine. The problem is compounded by the advent of Generative Artificial Intelligence (AI), with deepfakes a major part of it. Governments naturally want to fight misinformation and fraud. Mandating traceability of data, also called provenance, is an important step in that direction. It is a more comprehensive solution than merely requiring watermarks in the metadata of AI-generated photos, as proposed in California's AB 3211, which some companies are supporting.

Data is the new oil that is running a substantial amount of machinery in our daily lives, particularly in the form of AI. It is, therefore, important that the integrity of data be preserved. The traceability factor that data provenance assures is an important aspect of integrity. It can help in detecting anomalies and errors in data, just like tracing money can help with the integrity of the economy.

Provenance helps create trust in digital items found online. There is some indication that provenance is effective in reducing users’ vulnerability to misinformation. It is an important ethical measure that deters the misuse of digital items such as photographs. Provenance should include documenting the method used for generating the data. Mandating that the lineage, the transformations, and the context of those transformations are maintained along with the data is likely to reduce the deliberate creation and use of fake content such as the deepfakes used in scams.
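For readers who want a concrete picture, here is a minimal Python sketch of what keeping lineage "along with the data" could look like. It is an illustration under stated assumptions, not an implementation of any particular standard or product: the names ProvenanceStep and ProvenanceLog, the field names, and the sample byte strings are all hypothetical. Each step records the method, the context, and a hash of the content, and also stores the hash of the previous step, so rewriting any earlier entry breaks the chain.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceStep:
    # All field names are hypothetical, chosen only for this illustration.
    method: str          # how the content was produced or transformed
    context: str         # why or where the transformation happened
    content_hash: str    # fingerprint of the content after this step
    prev_step_hash: str  # fingerprint of the previous step ("" for the first)

    def step_hash(self) -> str:
        """Fingerprint of this whole record, used to link the next step."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return sha256_hex(payload)


@dataclass
class ProvenanceLog:
    steps: list = field(default_factory=list)

    def record(self, content: bytes, method: str, context: str) -> None:
        """Append a step that is linked to the step before it."""
        prev = self.steps[-1].step_hash() if self.steps else ""
        self.steps.append(
            ProvenanceStep(method, context, sha256_hex(content), prev)
        )

    def verify(self, final_content: bytes) -> bool:
        """Check that the chain is unbroken and ends at final_content."""
        prev = ""
        for step in self.steps:
            if step.prev_step_hash != prev:
                return False  # an earlier record was altered or removed
            prev = step.step_hash()
        return bool(self.steps) and (
            self.steps[-1].content_hash == sha256_hex(final_content)
        )


# Hypothetical example: a photo that is captured and then cropped.
log = ProvenanceLog()
original = b"raw photo bytes"
log.record(original, method="camera capture", context="uploaded by photographer")
cropped = b"cropped photo bytes"
log.record(cropped, method="crop", context="resized for mobile layout")

print(log.verify(cropped))              # True
print(log.verify(b"swapped-in bytes"))  # False
```

A sketch like this only detects tampering within the log itself. A real system would also need signatures or a trusted registry so that the whole log cannot simply be replaced, which is outside the scope of this illustration.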

I have been a victim of misinformation in multiple ways and multiple times, such as on anonymous websites. Although most reviews of my teaching on the website RateMyProfessors.com are positive, the few that are false still hurt. Rigorous research studies have found anonymous websites such as RateMyProfessors.com to be inaccurate and biased. Still, the website proclaims, “The law protects Rate My Professors from legal responsibility for the content submitted by our users.” It is not clear if the Federal Trade Commission rule banning fake reviews will have any impact on the website, but mandating data provenance may.

As the research states, on such websites, “there is no guarantee that a ‘student reviewer’ is even a student,” implying a lack of provenance, and the information there is “unsuitable for use in a decision making process.” Still, many students use it for decision making, and the government seems not to want to do anything about it, at least until someone starts a website along the lines of “rate my lawmaker” or opinion pieces like this one make a difference.

Enforcing traceability of the information on websites may not eliminate misinformation, but it can effectively reduce it. Scams targeting young adults are increasing on social media. I reported at least one that I encountered to Facebook, but to no avail. The content moderators did not think that it was an attempt at scamming, probably because the conversation on Facebook Messenger was in a vernacular language. Such scam attempts can be attributed at least partly to the lack of traceability.

There are more compelling reasons for mandating data provenance. It is touted as the “secret weapon” to protect businesses from fraud and promises to be a new beginning in cybersecurity. It facilitates investigating data breaches and narrowing down the specific information that was breached. Copyright has become a big debate with the advent of Generative AI. Data lineage can establish ownership and help resolve disputes around intellectual property. It is, of course, a great resource in forensic analysis.

Tracking the origins, movements, and transformations of data can enable the assessment of any biases in the data and provide an indication of its reliability. Reproducibility is important in many research and decision-making setups. Data provenance partly ensures the reproducibility of experiments and of decision-making processes, and it can help in the recovery of data lost due to various catastrophes.

Provenance is essential for the data ecosystem in the age of Generative AI. Mandating it will enhance the reliability, accountability, and trustworthiness of data, the systems built on top of it, and the decisions derived from them. The good news is that the technical community is increasingly realizing the importance of data provenance and effectively working toward ensuring it through standards, initiatives, and technologies such as the Data Fabric. It is time its necessity got the attention of lawmakers. Until then, industries should volunteer to take the lead.

Opinions expressed are Vishnu’s and not those of his employer or any other entity that he is affiliated with.

Vishnu S. Pendyala, Ph.D., teaches machine learning and other data science courses at San Jose State University. He is an ACM distinguished speaker and book author, and he has over two decades of experience in the software industry.

©2023-2024 California Insider All Rights Reserved. California Insider is a part of Epoch Media Group.