In recent years, advancements in artificial intelligence (AI) have shown enormous potential to transform technologies, practices and interactions on a global scale. The adoption of AI for content creation, data analysis, decision-making and other labor-intensive jobs is growing at a rapid rate, prompting many to question its social impacts.
Will these changes benefit some more than others? Will they advantage or disadvantage students? Can AI systems innovate on their own, or are they limited by their inputs?
Martin Hilbert and Marit MacArthur, experts from the UC Davis College of Letters and Science, answered these and other important questions at a recent Team Research Forum.
Role of AI in education and research
In the AI-skeptical piece titled “On the dangers of stochastic parrots: Can language models be too big?” Bender et al. make the point “that most language technology is built to serve the needs of those who already have the most privileges in society.” Building on this point, researchers posit that AI-enabled tools such as ChatGPT are designed to serve experts, not novices, which raises the question of AI’s social impact in education.
Critical editing holds the key
According to MacArthur, general-purpose AI tools have been designed for experts and well-educated workers to build more expertise and increase efficiency. They are not designed for novices or for education, where the core focus is training students to develop expertise by challenging and guiding them.
A Temple University study shows that “the usage of generative AI tools, such as ChatGPT or GitHub Copilot, poses a unique problem, as they were not developed solely for an educational purpose and their inclusion in the classroom may have unknown effects. Many researchers have begun to explore the potential negative effects the inclusion of these models into education may bring, including bias and fairness of these models, over-reliance, explainability, and trust.”
MacArthur added that AI tools for tutoring show more promise, but they must be designed and tested by educators, and protect student privacy. Also, “universities should not be charged high fees for such tools, since generative AI could not be developed in the first place without the training data of fluent prose written by college graduates, which it needs more of to keep developing,” she said.
MacArthur believes that the future of writing and programming is critical editing –– a skill requiring humans to be able to assess and tweak the output of AI for their purposes. An AI ghostwriter cannot quite understand the purpose, audiences, guidelines for the genre and the real-world local context, because it’s not embedded in the physical, social world.
There’s also the risk of the fluency fallacy –– the erroneous attribution of accuracy, expertise and authority to a given text or speaker based on the use of stylishly polished, grammatically correct, idiomatic syntax and vocabulary.
Teaching the foundational skills so that students can prompt AI well and evaluate the output intelligently for their purposes has now become more important than ever.
“Experts are good at this prompt writing, not novices,” said MacArthur, who strongly advocates for AI literacy: a set of skills that enables individuals to communicate and collaborate effectively with AI, understand its ethical implications and critically reflect on its use for increased productivity.
AI in research productivity and student feedback
Recent research demonstrates that AI feedback on student writing in K-12 contexts is comparable to human feedback when certain criteria are used. According to MacArthur, however, heavy reliance on AI feedback could result in inaccurate assessments of students’ needs: teachers might lose the opportunity to get to know their students well, understand their thought processes, give customized feedback and build the human relationships that help teachers improve, which is also important for training future teachers.
Compared to higher education, the situation may be slightly different in K-12 public schools with large classes, where “AI feedback could possibly relieve teachers of the burden of laborious feedback, but if it’s used to replace teachers completely then that would be very problematic,” MacArthur said.
According to Hilbert, data from randomized controlled studies show that AI can increase productivity in coding-related tasks by 55%, even for senior professionals. In addition, AI copilots have become very important for scaling up, for example, teaching thousands of students or running online courses on platforms like Coursera with 70,000 students.
Research shows that a student with a human personal tutor can improve their performance by two standard deviations, roughly a grade improvement from C minus to A minus. Fine-tuning ChatGPT to act as a personalized tutor is one way to address the challenge of providing meaningful student feedback in large classroom settings.
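As one hypothetical illustration of what such fine-tuning might involve, the sketch below prepares a handful of invented tutoring dialogues in the JSONL chat format used for fine-tuning OpenAI chat models. The dialogues, file name and tutor instructions are assumptions made purely for illustration, not part of any project described at the forum.

```python
# A minimal, hypothetical sketch of preparing tutoring dialogues for
# fine-tuning a chat model. The examples below are invented; a real dataset
# would come from educator-vetted tutoring sessions.
import json

SYSTEM_PROMPT = (
    "You are a patient writing tutor. Ask guiding questions, point out one "
    "issue at a time, and never write the student's text for them."
)

examples = [
    {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Is my thesis statement clear? 'Social media is bad.'"},
            {"role": "assistant", "content": "It states an opinion, but it is very broad. "
             "Which specific effect of social media do you want to argue about, and for whom?"},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Can you just rewrite my paragraph so it sounds better?"},
            {"role": "assistant", "content": "I can't rewrite it for you, but I can help you revise it. "
             "Read your first sentence aloud: what is its main point?"},
        ]
    },
]

# Write the training file in the JSONL format expected for chat fine-tuning.
with open("tutor_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# The file could then be uploaded and used to start a fine-tuning job, e.g.
# with the openai Python client:
#   client.files.create(file=open("tutor_finetune.jsonl", "rb"), purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=<file id>, model="gpt-3.5-turbo")
```

The design choice that matters here is the system instruction: it steers the tutor toward guiding questions rather than doing the work for the student, which is the distinction MacArthur draws between tutoring and ghostwriting.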
MacArthur argued that, although AI feedback may democratize access to tutoring, it could also democratize access to cheating, robbing students of the opportunity to learn, and teaching the difference between the two is crucial.
Common AI biases and challenges
According to Hilbert, variables such as age, gender, ethnic background, religious belief and income can all introduce bias into an AI system. However, he said that “it’s much easier, feasible and practical to take this bias out of a machine than out of the brain.”
This is because a biological neural network like the human brain can actively consider only a small number of variables at a time, so it has to summarize them, which often leads to ingrained bias. In contrast, an AI system’s bias can be removed from the results by building the models so that they do not use those bias-inducing variables in their computation, or at least so that their outcomes are equally probable with respect to them.
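To make the idea concrete, the following minimal sketch, written in Python with synthetic data and invented column names, illustrates both mitigations Hilbert describes: leaving the protected variable out of the model’s inputs, and checking that outcomes are roughly equally probable across groups. It is an illustration of the approach, not a production fairness audit.

```python
# Minimal sketch of (1) dropping a protected attribute from a model's inputs
# and (2) checking demographic parity of its predictions. Data is synthetic
# and column names are invented for illustration only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "income": rng.normal(50, 15, n),
    "debt_ratio": rng.uniform(0, 1, n),
    "gender": rng.integers(0, 2, n),   # protected attribute
})
# Synthetic label: approval depends only on the financial variables here.
df["approved"] = ((df["income"] > 45) & (df["debt_ratio"] < 0.6)).astype(int)

# (1) Exclude the protected attribute from the features used for prediction.
features = ["income", "debt_ratio"]    # "gender" deliberately left out
model = LogisticRegression(max_iter=1000).fit(df[features], df["approved"])
df["predicted"] = model.predict(df[features])

# (2) Demographic-parity check: compare predicted approval rates across groups.
rates = df.groupby("gender")["predicted"].mean()
print(rates)
print("parity gap:", abs(rates.iloc[0] - rates.iloc[1]))
```

Note that simply excluding the protected variable does not by itself guarantee parity, since other features can act as proxies for it, which is why the second check, comparing outcome rates across groups, matters.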
Hilbert is optimistic about such a strategy because it only requires modifying the software, but he cautions that an agreement is first needed on which potential bias variables should be left out of the models. The obvious ones are the handful of variables protected by law, such as gender and race.
Going beyond these, he thinks that reaching such an agreement might be challenging because every variable contributes to an AI system’s accuracy, and leaving some out could sacrifice accuracy, which in turn could cut into a company’s profits. Without government regulation, companies might not be willing to make that trade-off.
MacArthur agreed that removing bias from an AI system is technically easier, but she also pointed out that unless regulations mandate it, that is difficult to achieve, especially in the US. She noted that, on the ground, the engineers developing AI are focused on releasing software as quickly as possible with the highest accuracy and on maximizing the company’s profits, leaving little room for inspecting bias and the resulting discrimination.
Both experts underscored the importance of diversified training data in reducing such risks of bias and underrepresentation.
Novelty and hallucinations in AI
Recent studies show that AI can improve human decision-making by increasing the novelty of its solutions. Hilbert highlighted the inventive role AI can play by citing how AlphaGo, the AI system developed by Google DeepMind, beat the world’s best human player at the game of Go with its famous move 37, which human experts initially questioned but later lauded for its innovation.
He also thinks that AI-generated data, including adversarial data, can be effectively used to train other AI systems, advancing innovation and predictive analysis, especially in healthcare. He cited the example of how Google developed a pair of interactive patient-doctor AI models that trained on each other’s data, producing an AI system that not only showed improved diagnostic accuracy but also performed as well as a primary care physician, even surpassing physicians on empathy, making it hard to distinguish from a human caregiver.
MacArthur also believes that AI will be extremely helpful in healthcare, but she emphasizes the importance of human expertise in order to keep using AI intelligently.
On the topic of hallucination (incorrect or misleading responses generated by AI models), Hilbert argued that it is more a feature than a bug: the system is designed to generate the most probable outcome most of the time, but when information is incomplete it can hallucinate and output a less probable one, much like the way we sometimes hallucinate in our dreams.
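A toy sketch of this point: generation samples from a probability distribution over possible continuations, so when the context provides little evidence and the distribution is flat, a fluent but less probable (and possibly wrong) answer can be drawn. The tokens and probabilities below are invented for illustration.

```python
# Toy illustration (invented numbers) of why sampling can "hallucinate":
# with strong evidence one continuation dominates, but with incomplete
# information the distribution flattens and a wrong answer may be sampled.
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.array(logits) / temperature
    z = z - z.max()                 # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

tokens = ["1889", "1989", "1869", "unknown"]

confident = softmax([9.0, 2.0, 1.5, 0.5])   # context strongly supports one answer
uncertain = softmax([2.2, 2.0, 1.9, 1.8])   # context is ambiguous or incomplete

rng = np.random.default_rng(0)
for name, p in [("confident", confident), ("uncertain", uncertain)]:
    sample = rng.choice(tokens, p=p)
    print(name, dict(zip(tokens, p.round(2))), "->", sample)
```

In the ambiguous case, every option carries meaningful probability, so the fluent output gives no hint that the underlying evidence was thin, which is exactly the fluency-fallacy risk MacArthur describes.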
MacArthur, however, pointed out that depending on the context, the results of a hallucination can be good or bad; in healthcare especially, hallucination is undesirable because it could lead to unexpected results.
Hilbert also stressed the importance of human agency in giving goals to the AI. The technical term in machine learning is a reward function or, inversely, a loss function.
“Machine learning based AI cannot give itself goals; humans do. Even for so-called ‘unsupervised’ learning,” said Hilbert. “Those humans are the ones ultimately responsible for the AI’s behavior.”
He noted that defining adequate goals is tricky. For example, the first version of the YouTube recommender algorithm was called the “watch-time maximization algorithm”; it could instead have been a “watch-time maximization without making children addicted algorithm,” but it wasn’t.
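As a hypothetical sketch of how the chosen goal gets encoded in a reward function, the example below contrasts a pure watch-time objective with one that adds a guardrail. The session fields and the penalty are invented to illustrate Hilbert’s point; they do not describe YouTube’s actual system.

```python
# Hypothetical sketch: the goal a system optimizes is whatever its human
# designers write into the reward function. Fields and penalty values are
# invented purely for illustration.
from dataclasses import dataclass

@dataclass
class Session:
    watch_minutes: float
    is_minor: bool
    daily_minutes_so_far: float   # crude proxy for compulsive use

def reward_watch_time(s: Session) -> float:
    # "Watch-time maximization": the only goal the system is given.
    return s.watch_minutes

def reward_with_guardrail(s: Session) -> float:
    # Same goal, plus a penalty when recommendations push minors past a cap.
    penalty = 5.0 if (s.is_minor and s.daily_minutes_so_far > 120) else 0.0
    return s.watch_minutes - penalty

s = Session(watch_minutes=30, is_minor=True, daily_minutes_so_far=150)
print(reward_watch_time(s), reward_with_guardrail(s))   # 30.0 vs 25.0
```

The two functions differ by a single term, yet they would push a recommender toward very different behavior, which is why Hilbert places responsibility for an AI’s conduct with the humans who define its goals.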
“For one, humans do not agree on what we want AI to do. Additionally, the complex adaptive systems managed by AI are rattled with innumerable known- and unknown-unknowns,” said Hilbert. “Finding the right goals is known as the AI alignment problem, and it will likely keep humankind busy for the decades to come.”
This article originally appeared on the UC Davis Office of Research website.