Scientists Tested Natural Language Models To Predict Human Language Judgments

Natural language models can be used to test computational hypotheses about how people understand language. A group of scientists from Columbia University in New York, led by Tal Golan and Matthew Siegelman, assessed the model-human consistency of several language models using a novel experimental approach: controversial sentence pairs. For each controversial sentence pair, two language models disagree about which sentence is more likely to occur in natural text.

Working with nine language models (including n-gram, recurrent neural network, and transformer models), the researchers generated hundreds of such controversial sentence pairs, either by selecting sentences from a corpus or by synthetically optimizing sentence pairs to be highly controversial. Human volunteers then judged which of the two sentences in each pair was more likely. Controversial sentence pairs proved effective at exposing model failures and at identifying the models that aligned most closely with human judgments. GPT-2 was the most human-consistent model tested. However, the experiments also revealed significant shortcomings in its alignment with human perception.
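To make the idea of selecting controversial pairs from a corpus more concrete, here is a minimal Python sketch. It is not the authors' code: the scoring rule (the weaker of the two opposing preferences), the function names, and the toy log-probabilities are all assumptions made for illustration.

```python
from itertools import combinations

def disagreement(lp_a, lp_b, s1, s2):
    """Return a positive score when models A and B prefer different sentences.

    lp_a and lp_b map each sentence to its log-probability under one model.
    The score is the smaller of the two opposing preference margins
    (an assumed scoring rule, chosen here only for illustration).
    """
    diff_a = lp_a[s1] - lp_a[s2]
    diff_b = lp_b[s1] - lp_b[s2]
    if diff_a * diff_b >= 0:
        # Both models prefer the same sentence (or one is indifferent).
        return 0.0
    return min(abs(diff_a), abs(diff_b))

def most_controversial(lp_a, lp_b, sentences, k=1):
    """Return the k sentence pairs with the highest disagreement score."""
    scored = [
        (disagreement(lp_a, lp_b, s1, s2), s1, s2)
        for s1, s2 in combinations(sentences, 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Toy log-probabilities, invented for this example.
sentences = ["a", "b", "c"]
lp_model_a = {"a": -10.0, "b": -12.0, "c": -11.0}
lp_model_b = {"a": -12.0, "b": -10.0, "c": -11.0}

top = most_controversial(lp_model_a, lp_model_b, sentences, k=1)
print(top)
```

In this toy corpus, model A prefers sentence "a" over "b" while model B prefers "b" over "a", so that pair receives the highest score.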

Natural Language Models

The researchers tested nine models from three classes: n-gram models, recurrent neural networks, and transformers. The n-gram models were trained using open-source code from the Natural Language Toolkit. The recurrent neural networks were trained using architectures and optimization procedures available in PyTorch. The transformers were implemented with the open-source HuggingFace library. Judgments were collected from 100 native English speakers who took part in an online experiment. For each pair, participants were asked to decide which of the two sentences they would be "more likely to encounter in the world, as either speech or written text" and to rate their confidence in their response on a 3-point scale.

Although the model ranking in this study is consistent with earlier work, GPT-2's severe failure to predict human responses to natural versus synthetic controversial pairs shows that GPT-2 does not adequately emulate the computations people use to process even short sentences. This result is not entirely surprising, because GPT-2 is an off-the-shelf machine learning model that was not designed with human psycholinguistic and physiological properties in mind. Despite the considerable inconsistency with human judgments observed here, a recent study of GPT-2 found that it could explain almost all of the variation in how people responded to natural sentences.

Natural And Synthetic Sentence Pairs

The researchers divided 90 sentence pairs into ten sets of nine pairs each and gave each set to a different group of ten participants. To assess model-human alignment, they calculated the percentage of trials in which the model and the participant agreed on which sentence was more likely. For randomly sampled natural sentence pairs, all nine language models predicted human choices better than chance (50% accuracy). Between-model differences were tested statistically with a simple Wilcoxon signed-rank test across the ten participant groups, which accounts for both participants and sentence pairs as random factors.
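As a rough illustration of the agreement measure described above, the following Python sketch computes the fraction of trials in which a model's preferred sentence matches the participant's choice. The function names are hypothetical and the log-probabilities and choices are invented; none of this is taken from the study's own code or data.

```python
def model_choice(logp_first, logp_second):
    """Return 0 if the model rates the first sentence as at least as probable, else 1."""
    return 0 if logp_first >= logp_second else 1

def agreement_rate(trials):
    """Fraction of trials in which the model's choice matches the human's.

    Each trial is a tuple (logp_first, logp_second, human_choice),
    where human_choice is 0 (first sentence) or 1 (second sentence).
    """
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(model_choice(a, b) == human for a, b, human in trials)
    return hits / len(trials)

# Invented trials: model log-probabilities for each sentence, plus the human choice.
trials = [
    (-20.1, -25.3, 0),
    (-31.0, -28.4, 1),
    (-18.7, -19.2, 1),
    (-40.2, -35.9, 1),
]

rate = agreement_rate(trials)
print(f"model-human agreement: {rate:.0%}")
```

Computing such a rate for two models within each of the ten participant groups would yield the kind of paired values that a signed-rank test (for example, `scipy.stats.wilcoxon`) compares.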

The researchers also developed a procedure for synthesizing controversial sentence pairs, in which naturally occurring sentences serve both as initializations for the synthetic sentences and as reference points that guide the synthesis. Starting from a natural sentence, the procedure repeatedly replaces words with words from a predefined vocabulary so that the synthetic sentence becomes less probable according to one language model while remaining at least as probable as the natural sentence according to another model.
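The word-replacement idea can be sketched in a few lines of Python. This is a simplified greedy search under stated assumptions, not the authors' optimization procedure: the two "models" are toy unigram scorers with invented log-probabilities, and `synthesize`, `unigram_model`, and the vocabulary are hypothetical names introduced for this example.

```python
def unigram_model(table):
    """Build a toy scorer that sums per-word log-probabilities (illustrative only)."""
    return lambda words: sum(table[w] for w in words)

def synthesize(natural, vocab, logp_accept, logp_reject, max_steps=10):
    """Greedily replace single words to lower logp_reject.

    The candidate must stay at least as probable as the natural sentence
    under logp_accept, which acts as the reference point.
    """
    words = natural.split()
    floor = logp_accept(words)      # reference score from the natural sentence
    current = logp_reject(words)
    for _ in range(max_steps):
        best = None
        for i in range(len(words)):
            for w in vocab:
                if w == words[i]:
                    continue
                candidate = words[:i] + [w] + words[i + 1:]
                if logp_accept(candidate) < floor:
                    continue        # the accepting model would rate it lower
                score = logp_reject(candidate)
                if score < current and (best is None or score < best[0]):
                    best = (score, candidate)
        if best is None:
            break                   # no single replacement helps any more
        current, words = best
    return " ".join(words)

# Invented log-probabilities: the two toy models differ only on the word "of".
accept_table = {"the": -1.0, "cat": -3.0, "sat": -3.0, "mat": -3.0, "of": -1.0}
reject_table = {"the": -1.0, "cat": -3.0, "sat": -3.0, "mat": -3.0, "of": -6.0}
logp_accept = unigram_model(accept_table)
logp_reject = unigram_model(reject_table)

natural = "the cat sat"
synthetic = synthesize(natural, list(accept_table), logp_accept, logp_reject)
print(natural, "->", synthetic)
```

With these toy scorers the search drifts toward a degenerate sentence that one model still rates highly, a small-scale analogue of how synthetic sentences can look unnatural to people while remaining probable to a model.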

Human participants judged ten controversial synthetic-sentence pairs for each model pair. The researchers then assessed how well each model predicted human choices across all of the controversial synthetic-sentence pairs in which it was one of the two models targeted.


The experiments showed that:

  • Controversial sentence pairs can be generated for natural language processing models in more than one way: by selecting pairs of sentences from a corpus, or by modifying natural sentences so that the models make conflicting predictions.
  • Controversial sentence pairs make it possible to efficiently compare models that otherwise appear equally consistent with human judgments.
  • All of the tested natural language processing model classes mistakenly assign high probability to some non-natural sentences: a natural sentence can be modified so that its probability according to a given model does not decrease, yet human judges find the modified sentence much less likely.
  • This way of comparing and testing models may offer new insight into which types of models align best with human language perception and which types of models should be developed in the future.

About The Authors

Suleman Shah

Suleman Shah is a researcher and freelance writer. As a researcher, he has worked with MNS University of Agriculture, Multan (Pakistan) and Texas A&M University (USA). He regularly writes science articles and blogs for the science news website immersse.com and the open access publishers OA Publishing London and Scientific Times. He likes to keep up with scientific developments and to translate them into everyday language for readers. His primary research focus is plant sciences, a field to which he has contributed by publishing his research in scientific journals and presenting his work at many conferences. Shah graduated from the University of Agriculture Faisalabad (Pakistan) and started his professional career with Jaffer Agro Services and later with the Agriculture Department of the Government of Pakistan. His research interests drew him to continue his career in plant sciences research, so he started his Ph.D. in Soil Science at MNS University of Agriculture Multan (Pakistan). Later, he started working as a visiting scholar with Texas A&M University (USA). Shah's experience with large Open Access publishers such as Springer, Frontiers, and MDPI strengthened his belief in Open Access as a way of removing barriers between researchers and the readers of their research. Shah believes that Open Access is revolutionizing the publication process and benefiting research in all fields.
