2022-2023 Undergraduate Catalog

  • About This Catalog
  • 2022-2023 Academic Calendar
  • Degree Programs
  • Minor Programs
  • Colleges/Programs by College
  • University Admissions

Freshman Admission: General Statement

  • Alternative Freshman Admissions Programs
  • International Admissions
  • Admission of Veterans
  • Re-Enrollment Admissions
  • Transfer and Second Baccalaureate Degree Admission: General Statement
  • Financial Information
  • Academic Standards, Policies, and Procedures
  • Campus Life/Student Services
  • Faculty/Personnel/Advisory Boards
  • Catalog Home
  • All Catalogs


Kean offers a smarter approach to education that focuses on giving you the skills and real-world experience to launch the career of your dreams. Applicants for admission to Kean University are considered in terms of prior achievement and future promise. The Office of Admissions uses alternate indicators to evaluate college preparedness, including but not limited to courses taken, grade point average, class rank, and earned college credits.

Apply, Visit, or Request Information today.

Important Dates and Deadlines:

  • December 1: Preferred Application Deadline (Spring semester)
  • January 1: Early Action Deadline
  • March 1: Preferred FAFSA (Free Application for Federal Student Aid) Deadline
  • April 30: Preferred Application Deadline (Fall semester); rolling basis thereafter
  • May 1: National Decision Day - Preferred Tuition Deposit Deadline
  • May 1: Educational Opportunity Fund (EOF) Program Application Deadline (Fall semester admission only)

Freshman Application Instructions

International Application Instructions



Transfer Application Instructions

Thank you for your interest in Kean University. Below you will find application instructions for transfer students. Please note, freshmen with advanced standing who have earned college credits after high school graduation are required to apply as transfer students.

If you are an international applicant, please visit our International Application Instructions page.

Applicants: Important Dates and Deadlines

  • Fall Priority Deadline: June 1
  • Fall Deadline: August 1 (Michael Graves College: Graphic Design, June 30)
  • Spring Priority Deadline: November 1
  • Spring Deadline: December 1

Applicants with 30 credits or more are required to submit:

  • Official transcripts from every college and post-secondary school attended.

Transfer Applicants with fewer than 30 credits are required to submit:

  • Transcripts from every college and post-secondary school attended, and 
  • Official high school transcripts.

A complete application consists of the following items*:

  • A submitted  online application .
  • Official transcripts as referenced above from every college and post-secondary school attended must be sent electronically to the attention of the Office of Transfer Admissions at [email protected] . If electronic copies of transcripts are unavailable, official (sealed envelope) transcripts can be sent by mail to the Office of Transfer Admissions located in Kean Hall at 1000 Morris Avenue, Union, New Jersey 07083. 
  • A non-refundable $75 application fee. Application fees may be paid through the online application by credit card or debit card. Alternatively, the fee may be paid by check or money order made payable to Kean University and sent to the address below.
  • International students (i.e. those who require an I-20 Form, including those on F-1 Visas to study in the U.S.) must also complete the International Student Application Supplement.

* Please review the additional application requirements for consideration into the following programs:

  • Architecture
  • Art Education

Further, please note the following:

  • The current deadline for programs in the Michael Graves College (Architecture and Design programs) is June 30.
  • Incomplete applications and submitted applications received without transcripts will not be processed for decision.
  • Transfer credit will only be awarded from institutions listed on the application at the point of admission. Credits from institutions not listed will not be eligible for transfer.
  • Deliberate omission of an institution attended to conceal poor performance will result in withdrawal of an admission offer.

Mailing Instructions:

Office of Transfer Admissions, Kean Hall, 1000 Morris Avenue, Union, New Jersey 07083

Coming Back to Kean? (Readmission)

Once a Cougar, always a Cougar!  We invite you to finish what you started!  If you were admitted and enrolled at Kean, but never completed your degree requirements, then apply for Readmission . Get started by connecting with one of our admissions counselors.

  • For detailed information on readmission, please visit Kean.edu/offices/admissions/readmission-kean .

Applicants for Second Baccalaureate Degree

Transfer applicants who already hold a baccalaureate degree from an accredited four-year U.S. college/university or the foreign equivalent will need to complete the same Transfer Student Application Instructions listed above. Please also carefully read the information below.

Second Baccalaureate Degree Requirements

  • You must have a Baccalaureate degree from an accredited four-year U.S. college/university or the equivalent from a foreign university. The equivalency of the foreign degree must be verified by an NACES accredited evaluation agency (see above in Application Instructions).
  • Your GPA from your baccalaureate degree-granting institution must meet the minimum GPA requirement for the specific major desired. Minimum required GPAs range from 2.0 to 3.0, depending on the major.
  • You will also be subject to any other requirements or special procedures in effect for admission to the major program you select (e.g. portfolios for design majors).

Please Note:

  • Not all major programs offer second degrees. For example, second degrees in Education majors are not offered; interested students should instead change their application type from Transfer to Post-Baccalaureate Teaching Certification in the Academic Intent section of the online application.
  • You cannot work on your first and second baccalaureate degrees simultaneously. 

Second Baccalaureate Degree Completion Requirements

  • You must complete all academic major requirements and all "cognate" or "additional required courses" that support the major and any lower-level prerequisites for those courses, if applicable.
  • You will be granted Advanced Standing Credits for all Free Electives and General Education (GE) requirements - except for GE courses that are prerequisites, if applicable.
  • Transfer courses may be awarded toward the Second Degree major; however, you must complete a minimum of 32 credits, including at least half of the major requirements, at Kean after admission to the Second Degree program. 
  • Once accepted to the Second Degree program, all coursework must be completed at Kean University.

Additional Resources

  • Transfer Credit Evaluation
  • Transfer- Planning to Become a Teacher

Admissions Appeal Request

Requests for reconsideration of the Kean University Undergraduate Admissions Application must be based upon new academic information that was not available at the time of the original decision and that shows significant improvement compared to the credentials previously submitted.

If you would like to appeal your admissions decision, please email   [email protected] .

Kean University


Academic Programs

  • Accounting*
  • Art & Design
  • Communication
  • Criminal Justice*
  • Engineering
  • Finance/Accounting
  • Government/Political Science
  • Health Science
  • Liberal Arts
  • Performing Arts
  • Psychology*
  • R.N. to BSN *
  • Social Science
  • Visual Arts

Student experience

  • Co-op/Internship Opportunities
  • Disability Services
  • Hispanic Serving Institution (HSI)
  • Intramural/Club Sports
  • LGBTQIA Services
  • Military/Veteran Services
  • Night Class Offerings
  • On-Campus Housing
  • Online Programs
  • PTK Transfer Friendly University
  • Professional Equity Theatre
  • ROTC Program
  • Study Abroad
  • Undergraduate Research
  • Veteran Fee Waiver
  • Virtual Learning

Application information

Find out about requirements, fees, and deadlines

Kean University offers a smarter approach to education that focuses on rewarding careers, fulfilling lives, and lifelong learning. Ready to apply as a freshman? Learn more about application instructions and important deadlines at kean.edu/admissions.

Kean University is nationally recognized for the seamless way we welcome transfer students who are ready to pursue four-year degrees. Ready to apply as a transfer student? Learn more about application instructions and important deadlines at kean.edu/admissions.


Additional Information


Admissions office

1000 Morris Avenue, Union, NJ 07083, United States of America

[email protected]

Phone number

(908) 737-7100

For first-year students

Admissions website: www.kean.edu/offices/admissions

Financial aid website: www.kean.edu/offices/financial-aid

For transfer students

Undocumented or DACA students: www.kean.edu/offices/registrars-office/new-jersey-dream-act


PrepScholar SAT

Kean University Requirements for Admission


What are Kean University's admission requirements? While there are a lot of pieces that go into a college application, you should focus on only a few critical things:

  • GPA requirements
  • Testing requirements, including SAT and ACT requirements
  • Application requirements

In this guide we'll cover what you need to get into Kean University and build a strong application.

School location: Union, NJ

Admissions Rate: 82.6%

If you want to get in, the first thing to look at is the acceptance rate. This tells you how competitive the school is and how serious their requirements are.

The acceptance rate at Kean University is 82.6% . For every 100 applicants, 83 are admitted.


This means the school is lightly selective . The school will have its expected requirements for GPA and SAT/ACT scores. If you meet their requirements, you're almost certain to get an offer of admission. But if you don't meet Kean University's requirements, you'll be one of the unlucky few people who get rejected.


We can help. PrepScholar Admissions is the world's best admissions consulting service. We combine world-class admissions counselors with our data-driven, proprietary admissions strategies . We've overseen thousands of students get into their top choice schools , from state colleges to the Ivy League.

We know what kinds of students colleges want to admit. We want to get you admitted to your dream schools.

Learn more about PrepScholar Admissions to maximize your chance of getting in.

Get Into Your Top Choice School

Kean University GPA Requirements

Many schools specify a minimum GPA requirement, but this is often just the bare minimum to submit an application without immediately getting rejected.

The GPA requirement that really matters is the GPA you need for a real chance of getting in. For this, we look at the school's average GPA for its current students.

Average GPA: 3.2

The average GPA at Kean University is 3.2 .


(Most schools use a weighted GPA out of 4.0, though some report an unweighted GPA.)

With a GPA of 3.2, Kean University accepts below-average students . It's OK to be a B-average student, with some A's mixed in. It'd be best to avoid C's and D's, since application readers might doubt whether you can handle the stress of college academics.

SAT and ACT Requirements

Each school has different requirements for standardized testing. Only a few schools require the SAT or ACT, but many consider your scores if you choose to submit them.

Kean University hasn't explicitly named a policy on SAT/ACT requirements, but because it has published average SAT or ACT scores (we'll cover this next), it's likely test flexible. Typically, these schools say, "if you feel your SAT or ACT score represents you well as a student, submit them. Otherwise, don't."

Despite this policy, the truth is that most students still take the SAT or ACT, and most applicants to Kean University will submit their scores. If you don't submit scores, you'll have one fewer dimension to show that you're worthy of being admitted, compared to other students. We therefore recommend that you consider taking the SAT or ACT, and doing well.

Kean University SAT Requirements

Many schools say they have no SAT score cutoff, but the truth is that there is a hidden SAT requirement. This is based on the school's average score.

Average SAT: 990

The average SAT score composite at Kean University is a 990 on the 1600 SAT scale.

This score makes Kean University Lightly Competitive for SAT test scores.


Kean University SAT Score Analysis (New 1600 SAT)

The 25th percentile SAT score is 910, and the 75th percentile SAT score is 1150. In other words, a 910 on the SAT places you below average, while a 1150 will move you up to above average .

Here's the breakdown of SAT scores by section:

SAT Score Choice Policy

The Score Choice policy at your school is an important part of your testing strategy.

Kean University ACT Requirements

Just like for the SAT, Kean University likely doesn't have a hard ACT cutoff, but if you score too low, your application will get tossed in the trash.

Average ACT: 20

The average ACT score at Kean University is 20. This score makes Kean University Moderately Competitive for ACT scores.

The 25th percentile ACT score is 16, and the 75th percentile ACT score is 25.

ACT Score Sending Policy

If you're taking the ACT as opposed to the SAT, you have a huge advantage in how you send scores, and this dramatically affects your testing strategy.

Here it is: when you send ACT scores to colleges, you have absolute control over which tests you send. You could take 10 tests, and only send your highest one. This is unlike the SAT, where many schools require you to send all your tests ever taken.

This means that you have more chances than you think to improve your ACT score. To aim for the school's ACT requirement of 16 and above, you should take the ACT as many times as you can. When you have the final score that you're happy with, you can then send only that score to all your schools.

ACT Superscore Policy

By and large, most colleges do not superscore the ACT. (Superscore means that the school takes your best section scores from all the test dates you submit, and then combines them into the best possible composite score). Thus, most schools will just take your highest ACT score from a single sitting.
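To make the superscore arithmetic concrete, here is a small sketch with hypothetical section scores (not Kean data):

```python
# Hypothetical example: computing an ACT superscore from two test sittings.
# A superscore takes the best score in each section across sittings, then
# averages those section bests into a new composite (rounded).
sittings = [
    {"English": 18, "Math": 23, "Reading": 19, "Science": 17},  # composite 19
    {"English": 22, "Math": 19, "Reading": 21, "Science": 18},  # composite 20
]

best_per_section = {
    section: max(sitting[section] for sitting in sittings)
    for section in sittings[0]
}  # {'English': 22, 'Math': 23, 'Reading': 21, 'Science': 18}

superscore = round(sum(best_per_section.values()) / len(best_per_section))
print(superscore)  # 21, higher than either single-sitting composite
```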

We weren't able to find the school's exact ACT policy, which most likely means that it does not Superscore. Regardless, you can choose your single best ACT score to send in to Kean University, so you should prep until you reach our recommended target ACT score of 16.


Download our free guide on the top 5 strategies you must be using to improve your score. This guide was written by Harvard graduates and ACT perfect scorers. If you apply the strategies in this guide, you'll study smarter and make huge score improvements.

Free eBook: 5 Tips to 4+ Points on the ACT

SAT/ACT Writing Section Requirements

Currently, only the ACT has an optional essay section that all students can take. The SAT used to also have an optional Essay section, but since June 2021, this has been discontinued unless you are taking the test as part of school-day testing in a few states. Because of this, no school requires the SAT Essay or ACT Writing section, but some schools do recommend certain students submit their results if they have them.

Kean University considers the SAT Essay/ACT Writing section optional and may not include it as part of their admissions consideration. You don't need to worry too much about Writing for this school, but other schools you're applying to may require it.

Final Admissions Verdict


Because this school is lightly selective, you have a great shot at getting in, as long as you don't fall well below average . Aim for a 910 SAT or a 16 ACT or higher, and you'll almost certainly get an offer of admission. As long as you meet the rest of the application requirements below, you'll be a shoo-in.

But if you score below our recommended target score, you may be one of the very few unlucky people to get rejected.

Admissions Calculator

Here's our custom admissions calculator. Plug in your numbers to see what your chances of getting in are.

  • 80-100%: Safety school: Strong chance of getting in
  • 50-80%: More likely than not getting in
  • 20-50%: Lower but still good chance of getting in
  • 5-20%: Reach school: Unlikely to get in, but still have a shot
  • 0-5%: Hard reach school: Very difficult to get in

How would your chances improve with a better score?

Take your current SAT score and add 160 points (or take your ACT score and add 4 points) to the calculator above. See how much your chances improve?

At PrepScholar, we've created the leading online SAT/ACT prep program . We guarantee an improvement of 160 SAT points or 4 ACT points on your score, or your money back.

Here's a summary of why we're so much more effective than other prep programs:

  • PrepScholar customizes your prep to your strengths and weaknesses . You don't waste time working on areas you already know, so you get more results in less time.
  • We guide you through your program step-by-step so that you're never confused about what you should be studying. Focus all your time learning, not worrying about what to learn.
  • Our team is made of national SAT/ACT experts . PrepScholar's founders are Harvard graduates and SAT perfect scorers . You'll be studying using the strategies that actually worked for them.
  • We've gotten tremendous results with thousands of students across the country. Read about our score results and reviews from our happy customers .

There's a lot more to PrepScholar that makes it the best SAT/ACT prep program. Click to learn more about our program , or sign up for our 5-day free trial to check out PrepScholar for yourself:


Application Requirements

Every school requires an application with the bare essentials - high school transcript and GPA, application form, and other core information. Many schools, as explained above, also require SAT and ACT scores, as well as letters of recommendation, application essays, and interviews. We'll cover the exact requirements of Kean University here.

Application Requirements Overview

  • Common Application: Accepted
  • Electronic Application: Available
  • Essay or Personal Statement: Required for all freshmen
  • Letters of Recommendation: 2
  • Interview: Not required
  • Application Fee: $75
  • Fee Waiver Available: Yes

Testing Requirements

  • SAT or ACT: Considered if submitted
  • SAT Essay or ACT Writing: Optional
  • SAT Subject Tests
  • Scores Due in Office: August 15

Coursework Requirements

  • Subject: Required years
  • Foreign Language
  • Social Studies
  • Electives: 5

Deadlines and Early Admissions

  • Offered: Yes; Deadline: August 15; Notification: November 1
  • Offered: Yes; Deadline: December 1; Notification: January 1

Admissions Office Information

  • Address: 1000 Morris Avenue, Union, NJ 07083
  • Phone: (908) 737-KEAN (5326)
  • Fax: (908) 737-3415
  • Email: [email protected]

Other Schools For You

If you're interested in Kean University, you'll probably be interested in these schools as well. We've divided them into 3 categories depending on how hard they are to get into, relative to Kean University.


Reach Schools: Harder to Get Into

These schools have higher average SAT scores than Kean University. If you improve your SAT score, you'll be competitive for these schools.


Same Level: Equally Hard to Get Into

If you're competitive for Kean University, these schools will offer you a similar chance of admission.


Safety Schools: Easier to Get Into

If you're currently competitive for Kean University, you should have no problem getting into these schools. If Kean University is currently out of your reach, you might already be competitive for these schools.

Data on this page is sourced from Peterson's Databases © 2023 (Peterson's LLC. All rights reserved.) as well as additional publicly available sources.

If You Liked Our Advice...

Our experts have written hundreds of useful articles on improving your SAT score and getting into college. You'll definitely find something useful here.

Subscribe to our newsletter to get FREE strategies and guides sent to your email. Learn how to ace the SAT with exclusive tips and insights that we share with our private newsletter subscribers.

You should definitely follow us on social media . You'll get updates on our latest articles right on your feed. Follow us on all of our social networks:


Kean University: Direct admissions details

December 26, 2023


Common App Direct Admissions is a program that offers college admission to qualified students. Participating colleges set a minimum, qualifying GPA for students in their home state. Common App then identifies students who meet those requirements using their Common App responses.

Some colleges may remove certain application requirements for Common App Direct Admissions recipients. If a college has any exceptions in their direct admissions offer, we will list them in this FAQ.

To accept your Common App Direct Admissions offer, all you need to do is submit your application to Kean University for free.


Direct Admissions requirements

  • Personal essay
  • Teacher evaluations: 0 required, 10 optional

Program or major exceptions 

While this offer will guarantee your admission to this college, it does not guarantee you will be admitted to the following programs:

  • Advertising (B.F.A.)
  • Architectural Studies (B.A.)
  • Biology - Teacher Education (B.A.)
  • Biology - STEM  Teacher Education - 5yr. (B.S./M.A.)
  • Biomedicine - STEM 4yr. (B.S.)
  • Biotechnology/Molecular Biology - STEM 5yr. (B.S./M.S.)
  • Chemistry - STEM Teacher Education - 5yr. (B.S./M.A.)
  • Computational Science & Engineering - STEM 5yr. (B.S./M.S.)
  • Fine Arts - Art Education (B.A.)
  • Graphic Design - (B.F.A.)
  • Industrial Design (B.I.D.)
  • Interior Design (B.F.A.)
  • Mathematics - STEM Teacher Education - 5yr. (B.S./M.A.)
  • Music (B.A.)
  • Music Education (B.M.)
  • Music Performance (B.M.)
  • Studio Art (B.F.A.)
  • Theatre (B.A.)
  • Theatre Design & Technology (B.F.A.)
  • Theatre Performance (B.F.A.)
  • Theatre Performance - Musical Theatre Option (B.F.A.)

College-specific exceptions

Enrollment is contingent upon conviction and disciplinary history inquiry.

Fee waiver instructions  

To ensure your application to this college is free, please indicate that you’d like to use a school-specific fee waiver. 

  • In the college’s application questions, find the “Do you intend to use one of these school-specific fee waivers?” question. 
  • Respond to this question with the “Common App Direct Admissions recipient” option. 

The college will verify your eligibility status after you submit your application. 

Additional resources 

  • Explore colleges profile – Learn more about what this college has to offer
  • Financial aid – More information about this college’s financial aid offerings 
  • Direct Admissions FAQ – More information about the Common App Direct Admissions program


How to Apply to Kean University

Application requirements

  • High School GPA: Required
  • High School Rank: Considered but not required
  • High School Transcript: Required
  • College Prep Courses: Required
  • SAT/ACT: Considered but not required
  • Recommendations: Considered but not required


Start Here Go Anywhere

Wenzhou-Kean University now offers 18 undergraduate programs and Kean University offers more than 50 undergraduate programs and more than 60 graduate options for study.

Kean University is a public university founded in 1855 in New Jersey. It enjoys a high reputation in the United States. Kean is a world-class, vibrant and diverse university.

100% of professional courses are brought in from Kean University, using original American and international textbooks.

Faculty members come from 35 countries and regions around the world, and most have international teaching backgrounds and doctoral degrees.

Accounting (B.S.), Finance (B.S.), Economics (B.S.), Global Business (B.S.), Management (Business Analytics Option) (B.S.), Management (Supply Chain and Information Management Option) (B.S.), Marketing (B.S.), Accounting (M.S.), Global Management (M.B.A.)

English (B.A.), Communication (B.A.), Psychology (B.A.), Psychology (M.A.)

Chemistry (B.S.), Environmental Science (B.S.), Biology (Cell and Molecular Biology Option) (B.S.), Mathematical Sciences (Data Analytics Option) (B.A.), Computer Science (B.S.), Computer Information Systems (M.S.), Biotechnology Science (M.S.)

Architectural Studies (B.A.), Industrial Design (B.I.D.), Interior Design (B.F.A.), Architecture (M.Arch.)

Instruction and Curriculum (M.A.), Educational Administration (M.A.), Educational Leadership (Ed.D.)


88 Daxue Road, Ouhai District, Wenzhou, Zhejiang Province, China

Postal code: 325060

+86 577 5587 0000

[email protected]


Applying to Kean University


Acceptance Rate

Kean University's acceptance rate is 82.60% for 2023 admission. A total of 10,485 students applied and 8,661 were admitted to the school. Judging by the acceptance rate, it is relatively easy (higher than the national average) to get into Kean.

Kean's acceptance rate places 16th out of 52 New Jersey colleges with competitive admission .

The median SAT score is 1,020 and the ACT score is 18 at Kean. Kean's SAT score ranks 21st out of 22 New Jersey colleges that consider SAT score for admission .

To apply to Kean, a personal statement (or essay) is required, and recommendations are considered but not required. SAT and ACT scores are not required but are considered for admission. In addition, an English proficiency test score is required.

Admission Statistics

For the academic year 2022-23, the acceptance rate of Kean University is 82.60% and the yield (also known as enrollment rate) is 22.72%. 4,061 men and 6,424 women applied to Kean and 3,196 men and 5,465 women students were accepted.

Among them, 813 men and 1,155 women were enrolled in the school (Fall 2022). The following table and chart show the admission statistics including the number of applicants, acceptance rate, and yield at Kean.

Data source: IPEDS (Integrated Postsecondary Education Data System) (Last update: December 11, 2023)

Kean University 2023 Acceptance Rate By Gender chart
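As a quick sanity check, the acceptance rate and yield quoted above follow directly from the applicant, admit, and enrollment counts; a minimal sketch using the figures reported in this section:

```python
# Recomputing Kean's Fall 2022 acceptance rate and yield from the counts above.
applied  = 4_061 + 6_424   # men + women applicants = 10,485
admitted = 3_196 + 5_465   # men + women admitted   =  8,661
enrolled =   813 + 1_155   # men + women enrolled   =  1,968

acceptance_rate = admitted / applied    # about 0.826
yield_rate      = enrolled / admitted   # about 0.227
print(f"acceptance: {acceptance_rate:.1%}, yield: {yield_rate:.1%}")  # 82.6%, 22.7%
```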

SAT and ACT Score Distribution

In 2023, 252 students (13% of enrolled) have submitted their SAT scores and 28 students (1%) have submitted their ACT scores for seeking degrees at Kean.

The median SAT score is 1,020 with 510 SAT EBRW and 510 SAT Math median scores. The 75th percentile SAT score of EBRW (Evidence-Based Reading / Writing) is 580 and the 25th percentile is 450. The SAT Math 75th percentile score is 580 and 25th percentile is 450.

The median ACT composite score is 18 with a 75th percentile score of 25 and a 25th percentile score of 16 at Kean.

The SAT and ACT scores at Kean University are similar to the average score compared to similar colleges (SAT: 1,065, ACT: 21 - public doctoral / research university).

Kean University 2023 SAT and ACT Score Chart

First-year Enrollment by Gender and Enrolled Type

For the academic year 2022-23, total 1,968 first-year students enrolled in Kean. The proportion of full-time students is 96.75% and part-time students is 3.25%.

By gender, the proportion of men students is 41.31% and women students is 58.69%.

The following table shows the first-year students by gender and enrolled type at Kean.

Application Requirements

Applicants are required to submit a high school GPA, high school record (or transcript), proof of completion of a college preparatory program, an English proficiency test score, and a personal statement or essay.

The SAT and ACT test scores are not required, but are considered in the admission process at Kean. The English proficiency test is required.

The next table summarizes the application requirements for applying to Kean University.

For more admission information such as minimum GPA and deadlines, see its online application page .

General Admissions Information

The application fee required to apply to Kean University is $75 for both undergraduate and graduate programs. The following table provides general admission information for applying to Kean.

Kean University Admission Requirements 2022


Kean University Admissions

It has an acceptance rate of around 69%. This figure translates into the fact that out of 100 applicants willing to take admission at the school, 69 are admitted. Notably, the SAT scores of admitted students lie in the range of 920-1100, the ACT scores lie in the range of 17-23, and the average GPA hovers around 3.0. Applicants must keep in mind that the application deadline for Kean University is August 15, 2020. Kean University follows a simple application and admission process that includes a total of 4 steps, which are as follows:

Step 1: Submitting the application. A candidate is first required to apply to the university as part of the admission procedure at Kean University. To apply, interested candidates can either call the university at (908) 737-5326 or apply online. Those willing to take admission can also directly visit the university's administration at 1000 Morris Avenue, Union, New Jersey 07083, or contact the admissions office by phone.

Step 2: Acceptance by KU. Once candidates have submitted their form and deposited the enrollment fee, they are required to submit certain documents to successfully receive an admission decision. For Kean University, candidates have to submit the following documents:

  • Transcripts: As an important part of the evaluation process, candidates have to submit the necessary high school and university transcripts through Parchment (or a similar platform).
  • Essay: Also called the personal statement, it offers applicants a unique way to showcase their achievements and personality. Candidates are encouraged to include personal experiences and anecdotes.
  • Official Test Scores: Candidates can make a stronger claim for securing a seat at Kean University by submitting their official SAT or ACT scores. This is not a mandatory requirement, and not all students submit their scores.

The university may request additional documents or information from applicants once the reviewing process is over.

Step 3: Confirm your attendance. The third step is for candidates to confirm their attendance at KU. Candidates will be required to complete the Financial Check-In process and select their housing. Review and verification of the Summary of Accounts and Financial Aid (scholarships, fees, and tuition costs), choosing a lodging option, selecting a payment plan, and making the first payment are some of the processes included. Candidates will also be required to complete the Math and English assessments before proceeding to register for classes.

Step 4: Register for classes. Registration for courses is the final step once all the processes mentioned above are completed.

SAT Scores you need to get in

SAT Range: The composite score range which the institution consistently accepts, below which admission should be considered a reach.

Applicants Submitting SAT: Percentage of students submitting SAT scores.


Average SAT Score

While making admission decisions, Kean University also takes applicants' SAT scores into consideration. Among other things, candidates need to know that SAT scores are compulsory for admission consideration at the university, and the average SAT score of students who get into the university is 1010. However, even applicants with SAT scores of 920 or below could be accepted in some cases. As many as 88% of applicants to the university submit their SAT scores for admission consideration.

ACT Scores you need to get in

ACT Range: The composite score range which the institution consistently accepts, below which admission should be considered a reach.

Applicants Submitting ACT: Percentage of students submitting ACT scores.


Average ACT Score

As for the ACT scores candidates need to get accepted to Kean University, the latest data suggest that an average ACT score of 20 is required. Most candidates accepted by Kean University have composite ACT scores in the range of 17-23, and a minimum ACT composite score of 17 is necessary to be considered for admission. ACT scores are also compulsory for admission consideration. As per the latest information, about 8% of applicants submit their ACT scores for admission to the university.

Estimated GPA Requirements & Average GPA

Grade Point Average (GPA): The average high school GPA of admitted students.

A high GPA is required for a candidate to be accepted at Kean University. Generally, a good GPA helps an applicant stand out from the crowd and improves their chances of securing admission to a college or university of their choice. Candidates must have average high school grades with a GPA of at least 2.50; of note, the average high school GPA of applicants who get into KU is 3.0. Candidates applying to Kean University should also have a good idea of the admission requirements, including the necessary documents and all the information that is considered while making admission decisions. Students must also submit several important documents to the university, such as the secondary school GPA, secondary school record, and admission test scores (SAT/ACT). Furthermore, submission of other documents and details, including completion of a college-preparatory program, is also recommended for admission.

Admission Requirements

What Really Matters When Applying

  • High School GPA
  • High School Rank: Neither required nor recommended
  • High School Transcript
  • College Prep Courses: Recommended
  • Recommendations

Kean University has a wide range of online and on-campus programs tailor-made to help students pursue their dreams in virtually any field. The requirements differ by type of applicant, that is, undergraduate (freshman, international, transfer), graduate, and online programs.

Apart from important transcripts, students may be asked to submit ACT/SAT scores, academic placement test scores, and other documents such as essays, letters of recommendation, etc.

The application fee for any program is $75 (non-refundable).

Undergraduate Admission Requirements

Applicants need to submit the requirements below, which vary by major and type of program.

For Freshmen students

Admission to most programs is SAT-optional for freshmen who can demonstrate excellent academic performance through their transcripts. This condition stands irrespective of their standardized test scores. You need to include Kean's College Entrance Examination Board (CEEB) number, 2517, when submitting your scores.

The following items are required for eligibility:

It is mandatory to submit your official high school transcript.

You need to submit your recent score of ACT/SAT. The university will consider the highest-level score of the Evidence-Based Reading and Writing and Math sections on your SAT and the highest composite score of the ACT.

Recommended (not compulsory) to submit:

Two (2) letters of recommendation from either a teacher, counselor, employer, and/or coach, etc.

A personal essay stating your educational and professional career goals.

List of your high school activities and/or work experiences, indicating any leadership positions held.

SAT-optional requirements : If you have a record of a high level of academic success in school, you may apply to the university through this window. Applicants are required to take the reading, writing, and math placement tests.

Applicants with a high school GPA of 3.0 or higher and a minimum of 16 college preparatory courses get the option of not submitting their SAT scores during the enrollment process. However, the following applicants need to submit their test scores:

NJCSTM majors (honors program)

Home-school students

International students

Merit Scholarship applicants

GED applicants
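The SAT-optional rule above amounts to a simple eligibility check: a 3.0+ GPA and at least 16 college preparatory courses, unless the applicant falls into one of the categories listed above. A hypothetical sketch (the category labels are illustrative, not Kean's official codes):

```python
# Hypothetical sketch of the SAT-optional eligibility rule described above.
MUST_SUBMIT_SCORES = {"NJCSTM", "home-school", "international", "merit-scholarship", "GED"}

def sat_optional_eligible(gpa: float, prep_courses: int, categories: set[str]) -> bool:
    if categories & MUST_SUBMIT_SCORES:       # these applicants always submit test scores
        return False
    return gpa >= 3.0 and prep_courses >= 16  # the published GPA and coursework thresholds

print(sat_optional_eligible(3.4, 17, set()))              # True
print(sat_optional_eligible(3.4, 17, {"international"}))  # False
```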

SAT test-optional applicants must still submit the following:

Admission requirements for Architecture, Design, Theatre, Music, and Fine Arts programs require additional documents. For more details, please visit the official link .

International students must fulfill additional requirements along with the standard ones. The benefit of applying from the SAT-optional window is not applicable to international students. Some of the additional requirements for International students are mentioned below:

Official high school transcripts evaluated and verified by the National Association of Credential Evaluation Services (NACES).

Standardized Testing Scores from the SAT or ACT. The university will consider the highest-level score of the Evidence-Based Reading and Writing and Math sections on your SAT and the highest composite score of the ACT.

English Language Proficiency Test scores from TOEFL (minimum: Internet-based: 79; Paper-based: 550) or IELTS (minimum: 6.0)

The applicants can get an exemption from the English Proficiency test by submitting documentation as evidence of completion of secondary schooling from an institution where English is the language of instruction.

Optional: Personal Statement, as specified by the program. It should include their academic and professional goals and how and why Kean University fits into their aspirations. Also, 2-3 Letters of Recommendation, as specified by the program.

Graduate Admission Requirements

The graduate programs at Kean University embrace dozens of disciplines, including Criminal Justice, MBA, Educational Administration, Fine Arts, Speech-Language Pathology, Architecture, and many more. Each program has certain program-specific requirements, for which you need to visit the official university website.

The common requirements for Graduate Admission are as follows:

Completed application form.

Bachelor's degree (or foreign equivalent) from an accredited college or university

Cumulative GPA of 3.0 or higher (candidates with under a 3.0 will be considered based on the strength of the overall application).

Official copies of transcripts of previous institutions attended (this includes transcripts that reflect transfer credit). Transcripts must include proof of all courses, grades, and degrees. You can also include summer coursework, study abroad, or transfer coursework (if any).

Professional Resume or CV

Program-specific requirements

These include:

Personal Statement, as specified by the program. It should include their academic and professional goals and how and why Kean University fits into their aspirations. 

2-3 Letters of Recommendation, as specified by the program.

Documented observational service hours.

Copy of license/certificates.

Standardized Test Scores.

For international students, test scores of TOEFL or IELTS.

A completed supplemental application providing additional information.

The following programs will require both a  Centralized Application Service (CAS) application along with Kean University CAS Supplemental application :

Occupational Therapy (OTD) and Occupational Therapy (M.S)

Physical Therapy (DPT)

School and Clinical Psychology (Psy.D)

Speech-Language Pathology (M.A.)

Applicants are advised to be careful about using the correct test codes when submitting test scores.

An English proficiency examination is required of all students who have a bachelor’s degree from an institution outside the U.S. in a country where English is not the principal language.

The minimum required TOEFL score is 79 (Internet-based) or 550 (paper-based), and the minimum IELTS score is 6.5.

The applicants can get an exemption from the English Proficiency test by submitting documentation as evidence of completion of a bachelor’s or master’s degree from an accredited U.S. college or university where English is the language of instruction.

Admission Deadlines

Application Deadline: Deadline for application submissions. Please contact the school for more details.

Application Fee: Application fees may vary by program and may be waived for certain students. Please check with the school.

  • Early Decision Deadline
  • Early Action Deadline
  • Offer Action Deadline
  • Offers Early Decision
  • Application Website: apply.kean.edu
  • Accepts Common App
  • Accepts Coalition App

Apart from submitting the documents and other required information, candidates should also pay an application fee of $75. The university's application deadline depends on several factors. Fall applications, the most popular among students, usually open in September and may run through April; the September deadline usually pertains to early decision, while the final deadline is in April. Notably, to be eligible for scholarships, students might have to apply before the early deadline, which is why applying early is always recommended.

Acceptance Rate and Admission Statistics



Credits Accepted

Dual Credit

Credit for Life Experiences

The majority of universities in the US incentivize students by helping them earn university credit through certain courses offered in high school. There are many such programs at students' disposal; some of the best known include AP credit, CLEP (the College-Level Examination Program), and Dual Enrollment (also known as Dual Credit). However, there is no common policy; colleges each follow their own separate policies regarding these credits. Using these credits has multiple perks, such as a higher GPA, a shorter time to finish a degree, and increased chances of completing a degree. To pursue AP and CLEP credits, students are instructed to visit collegeboard.org and request that their scores be submitted to the university of their choice.





Open access | Published: 03 June 2024

Applying large language models for automated essay scoring for non-native Japanese

Wenchao Li & Haitao Liu

Humanities and Social Sciences Communications, volume 11, Article number: 723 (2024)

Subject: Language and linguistics

Recent advancements in artificial intelligence (AI) have led to an increased use of large language models (LLMs) for language assessment tasks such as automated essay scoring (AES), automated listening tests, and automated oral proficiency assessments. The application of LLMs for AES in the context of non-native Japanese, however, remains limited. This study explores the potential of LLM-based AES by comparing the efficiency of different models, i.e. two conventional machine training technology-based methods (Jess and JWriter), two LLMs (GPT and BERT), and one Japanese local LLM (Open-Calm large model). To conduct the evaluation, a dataset consisting of 1400 story-writing scripts authored by learners with 12 different first languages was used. Statistical analysis revealed that GPT-4 outperforms Jess and JWriter, BERT, and the Japanese language-specific trained Open-Calm large model in terms of annotation accuracy and predicting learning levels. Furthermore, by comparing 18 different models that utilize various prompts, the study emphasized the significance of prompts in achieving accurate and reliable evaluations using LLMs.


Conventional machine learning technology in AES

AES has experienced significant growth with the advancement of machine learning technologies in recent decades. In the earlier stages of AES development, conventional machine learning-based approaches were commonly used. These approaches involved the following procedures: a) feeding the machine with a dataset. In this step, a dataset of essays is provided to the machine learning system. The dataset serves as the basis for training the model and establishing patterns and correlations between linguistic features and human ratings. b) the machine learning model is trained using linguistic features that best represent human ratings and can effectively discriminate learners’ writing proficiency. These features include lexical richness (Lu, 2012; Kyle and Crossley, 2015; Kyle et al. 2021), syntactic complexity (Lu, 2010; Liu, 2008), and text cohesion (Crossley and McNamara, 2016), among others. Conventional machine learning approaches in AES require human intervention, such as manual correction and annotation of essays. This human involvement was necessary to create a labeled dataset for training the model. Several AES systems have been developed using conventional machine learning technologies. These include the Intelligent Essay Assessor (Landauer et al. 2003), the e-rater engine by Educational Testing Service (Attali and Burstein, 2006; Burstein, 2003), MyAccess with the IntelliMetric scoring engine by Vantage Learning (Elliot, 2003), and the Bayesian Essay Test Scoring system (Rudner and Liang, 2002). These systems have played a significant role in automating the essay scoring process and providing quick and consistent feedback to learners. However, as touched upon earlier, conventional machine learning approaches rely on predetermined linguistic features and often require manual intervention, making them less flexible and potentially limiting their generalizability to different contexts.
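As a minimal sketch of this conventional pipeline (the features and training data below are illustrative toys, not the feature sets used by the systems cited in this section), hand-crafted linguistic features can be fed to a simple regressor fitted against human scores:

```python
# Minimal sketch of a feature-based AES pipeline: hand-crafted linguistic
# features regressed against human ratings. Features and data are toy examples.
from sklearn.linear_model import LinearRegression

def extract_features(essay: str) -> list[float]:
    tokens = essay.split()
    sentences = [s for s in essay.split(".") if s.strip()]
    return [
        len(tokens),                              # text length
        len(set(tokens)) / max(len(tokens), 1),   # type-token ratio (lexical richness proxy)
        len(tokens) / max(len(sentences), 1),     # mean sentence length (syntactic proxy)
    ]

# Human-rated training essays (toy data standing in for an annotated corpus).
train_essays = ["A short essay .", "A somewhat longer essay with more varied vocabulary ."]
train_scores = [2.0, 4.0]

model = LinearRegression()
model.fit([extract_features(e) for e in train_essays], train_scores)

predicted = model.predict([extract_features("An unseen essay to be scored automatically .")])[0]
print(round(float(predicted), 2))
```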

In the context of the Japanese language, conventional machine learning-incorporated AES tools include Jess (Ishioka and Kameda, 2006 ) and JWriter (Lee and Hasebe, 2017 ). Jess assesses essays by deducting points from the perfect score, utilizing the Mainichi Daily News newspaper as a database. The evaluation criteria employed by Jess encompass various aspects, such as rhetorical elements (e.g., reading comprehension, vocabulary diversity, percentage of complex words, and percentage of passive sentences), organizational structures (e.g., forward and reverse connection structures), and content analysis (e.g., latent semantic indexing). JWriter employs linear regression analysis to assign weights to various measurement indices, such as average sentence length and total number of characters. These weights are then combined to derive the overall score. A pilot study involving the Jess model was conducted on 1320 essays at different proficiency levels, including primary, intermediate, and advanced. However, the results indicated that the Jess model failed to significantly distinguish between these essay levels. Out of the 16 measures used, four measures, namely median sentence length, median clause length, median number of phrases, and maximum number of phrases, did not show statistically significant differences between the levels. Additionally, two measures exhibited between-level differences but lacked linear progression: the number of attributives declined words and the Kanji/kana ratio. On the other hand, the remaining measures, including maximum sentence length, maximum clause length, number of attributive conjugated words, maximum number of consecutive infinitive forms, maximum number of conjunctive-particle clauses, k characteristic value, percentage of big words, and percentage of passive sentences, demonstrated statistically significant between-level differences and displayed linear progression.

Both Jess and JWriter exhibit notable limitations, including the manual selection of feature parameters and weights, which can introduce biases into the scoring process. The reliance on human annotators to label non-native language essays also introduces potential noise and variability in the scoring. Furthermore, an important concern is the possibility of system manipulation and cheating by learners who are aware of the regression equation utilized by the models (Hirao et al. 2020 ). These limitations emphasize the need for further advancements in AES systems to address these challenges.

Deep learning technology in AES

Deep learning has emerged as one of the approaches for improving the accuracy and effectiveness of AES. Deep learning-based AES methods utilize artificial neural networks that mimic the human brain’s functioning through layered algorithms and computational units. Unlike conventional machine learning, deep learning autonomously learns from the environment and past errors without human intervention. This enables deep learning models to establish nonlinear correlations, resulting in higher accuracy. Recent advancements in deep learning have led to the development of transformers, which are particularly effective in learning text representations. Noteworthy examples include bidirectional encoder representations from transformers (BERT) (Devlin et al. 2019 ) and the generative pretrained transformer (GPT) (OpenAI).

BERT is a linguistic representation model that utilizes a transformer architecture and is trained on two tasks: masked linguistic modeling and next-sentence prediction (Hirao et al. 2020 ; Vaswani et al. 2017 ). In the context of AES, BERT follows specific procedures, as illustrated in Fig. 1 : (a) the tokenized prompts and essays are taken as input; (b) special tokens, such as [CLS] and [SEP], are added to mark the beginning and separation of prompts and essays; (c) the transformer encoder processes the prompt and essay sequences, resulting in hidden layer sequences; (d) the hidden layers corresponding to the [CLS] tokens (T[CLS]) represent distributed representations of the prompts and essays; and (e) a multilayer perceptron uses these distributed representations as input to obtain the final score (Hirao et al. 2020 ).

Figure 1: AES system with BERT (Hirao et al. 2020).
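A minimal sketch of steps (a) through (e) using the Hugging Face transformers library is shown below; the pretrained Japanese BERT checkpoint and the regression head are assumptions for illustration, not the exact configuration used by Hirao et al. (2020):

```python
# Sketch of BERT-based AES: tokenize prompt + essay, encode with BERT, take the
# [CLS] hidden state, and map it to a score with a small MLP head (untrained here;
# it would be fine-tuned on human-rated essays). Model choice is an assumption.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "cl-tohoku/bert-base-japanese"  # assumed Japanese BERT; needs fugashi + ipadic
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)

score_head = torch.nn.Sequential(            # multilayer perceptron over T[CLS]
    torch.nn.Linear(encoder.config.hidden_size, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 1),
)

prompt = "お題:休日の過ごし方"
essay = "私は休みの日に本を読みます。"
inputs = tokenizer(prompt, essay, return_tensors="pt", truncation=True)  # adds [CLS]/[SEP]

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_size)
    cls_vec = hidden[:, 0, :]                     # distributed representation at [CLS]
    score = score_head(cls_vec)                   # scalar essay score

print(score.item())
```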

The training of BERT using a substantial amount of sentence data through the Masked Language Model (MLM) allows it to capture contextual information within the hidden layers. Consequently, BERT is expected to be capable of identifying artificial essays as invalid and assigning them lower scores (Mizumoto and Eguchi, 2023 ). In the context of AES for nonnative Japanese learners, Hirao et al. ( 2020 ) combined the long short-term memory (LSTM) model proposed by Hochreiter and Schmidhuber ( 1997 ) with BERT to develop a tailored automated Essay Scoring System. The findings of their study revealed that the BERT model outperformed both the conventional machine learning approach utilizing character-type features such as “kanji” and “hiragana”, as well as the standalone LSTM model. Takeuchi et al. ( 2021 ) presented an approach to Japanese AES that eliminates the requirement for pre-scored essays by relying solely on reference texts or a model answer for the essay task. They investigated multiple similarity evaluation methods, including frequency of morphemes, idf values calculated on Wikipedia, LSI, LDA, word-embedding vectors, and document vectors produced by BERT. The experimental findings revealed that the method utilizing the frequency of morphemes with idf values exhibited the strongest correlation with human-annotated scores across different essay tasks. The utilization of BERT in AES encounters several limitations. Firstly, essays often exceed the model’s maximum length limit. Second, only score labels are available for training, which restricts access to additional information.
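The reference-based similarity idea described above (morpheme frequencies weighted by idf) can be sketched roughly as follows; the whitespace tokenizer and the texts are placeholders, since a real Japanese setup would segment morphemes with a tool such as MeCab rather than splitting on spaces:

```python
# Rough sketch of reference-based scoring via tf-idf cosine similarity, in the
# spirit of the morpheme-frequency + idf method described above. Whitespace
# tokenization stands in for proper Japanese morphological analysis.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference_text = "model answer text for the essay task"   # placeholder reference answer
learner_essays = [
    "learner essay one about the task",
    "a second learner essay on a different topic",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([reference_text] + learner_essays)

# Similarity of each learner essay to the reference; higher means closer to the model answer.
similarities = cosine_similarity(matrix[0], matrix[1:])[0]
print(similarities)
```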

Mizumoto and Eguchi (2023) were pioneers in employing the GPT model for AES in non-native English writing. Their study focused on evaluating the accuracy and reliability of AES using the GPT-3 text-davinci-003 model, analyzing a dataset of 12,100 essays from the corpus of nonnative written English (TOEFL11). The findings indicated that AES utilizing the GPT-3 model exhibited a certain degree of accuracy and reliability, and the authors suggest that GPT-3-based AES systems hold the potential to support human ratings. However, applying the GPT model to AES presents a unique natural language processing (NLP) task that involves considerations such as nonnative language proficiency, the influence of the learner's first language on the output in the target language, and identifying the linguistic features that best indicate writing quality in a specific language. These linguistic features may differ morphologically or syntactically from those present in the learners' first language, as observed in (1)–(3).

(1) Isolating (Chinese)

我-送了-他-一本-书

Wǒ-sòngle-tā-yī běn-shū

1sg-give.past-him-one.cl-book

"I gave him a book."

(2) Agglutinative (Japanese)

彼-に-本-を-あげ-まし-た

Kare-ni-hon-o-age-mashi-ta

3sg-dat-book-acc-give-hon-past

(3) Inflectional (English)

give, give-s, gave, given, giving

Additionally, the morphological agglutination and subject-object-verb (SOV) order in Japanese, along with its idiomatic expressions, pose additional challenges for applying language models in AES tasks (4).

(4)

足-が 棒-に なり-ました

Ashi-ga bō-ni nari-mashita

leg-nom stick-dat become-past

"My leg became like a stick (I am extremely tired)."

The example sentence demonstrates the morpho-syntactic structure of Japanese and the presence of an idiomatic expression. In this sentence, the verb "なる" (naru), meaning "to become", appears at the end of the sentence. The verb stem "なり" (nari) is followed by morphemes indicating honorification ("ます" - masu) and tense ("た" - ta), showcasing agglutination. While the sentence can be literally translated as "my leg became like a stick", it carries an idiomatic interpretation that implies "I am extremely tired".

To address these challenges, CyberAgent Inc. (2023) has developed the Open-Calm series of language models specifically designed for Japanese. Open-Calm consists of pre-trained models available in various sizes, such as Small, Medium, Large, and 7b. Figure 2 depicts the fundamental structure of the Open-Calm model. A key feature of this architecture is the incorporation of the LoRA adapter and the GPT-NeoX framework, which can enhance its language processing capabilities.

Fig. 2: GPT-NeoX model architecture (Okgetheng and Takeuchi 2024).

In a recent study, Okgetheng and Takeuchi (2024) assessed the efficacy of Open-Calm language models in grading Japanese essays. The research utilized a dataset of approximately 300 essays, which were annotated by native Japanese educators. The findings demonstrate the considerable potential of Open-Calm language models for automated Japanese essay scoring. Specifically, among the Open-Calm family, the Open-Calm Large model (referred to as OCLL) exhibited the highest performance. However, it is important to note that, at the time of writing, the Open-Calm Large model is not offered as a hosted service; users must deploy and operate the environment for OCLL themselves, on a PC equipped with at least an NVIDIA GeForce RTX 3060 (8 or 12 GB VRAM).

In summary, while the potential of LLMs in automated scoring of nonnative Japanese essays has been demonstrated in two studies—BERT-driven AES (Hirao et al. 2020 ) and OCLL-based AES (Okgetheng and Takeuchi, 2024 )—the number of research efforts in this area remains limited.

Another significant challenge in applying LLMs to AES lies in prompt engineering and ensuring its reliability and effectiveness (Brown et al. 2020; Rae et al. 2021; Zhang et al. 2021). Various prompting strategies have been proposed. Chain-of-thought (CoT) prompting relies on manually crafting diverse and effective reasoning examples, whereas the zero-shot CoT approach (Kojima et al. 2022) instead appends a simple reasoning trigger to the prompt. Because manual example crafting is error-prone, Zhang et al. (2021) introduced an automatic CoT prompting method called Auto-CoT, which demonstrates matching or superior performance compared to the manual CoT paradigm. Another prompting framework is tree of thoughts, which enables a model to self-evaluate its progress at intermediate stages of problem solving through deliberate reasoning (Yao et al. 2023).

Beyond linguistic studies, there has been a noticeable increase in the number of foreign workers in Japan and Japanese learners worldwide (Ministry of Health, Labor, and Welfare of Japan, 2022 ; Japan Foundation, 2021 ). However, existing assessment methods, such as the Japanese Language Proficiency Test (JLPT), J-CAT, and TTBJ Footnote 1 , primarily focus on reading, listening, vocabulary, and grammar skills, neglecting the evaluation of writing proficiency. As the number of workers and language learners continues to grow, there is a rising demand for an efficient AES system that can reduce costs and time for raters and be utilized for employment, examinations, and self-study purposes.

This study aims to explore the potential of LLM-based AES by comparing the effectiveness of five models: two LLMs (GPT Footnote 2 and BERT), one Japanese local LLM (OCLL), and two conventional machine learning-based methods (linguistic feature-based scoring tools - Jess and JWriter).

The research questions addressed in this study are as follows:

To what extent do the LLM-driven AES and linguistic feature-based AES, when used as automated tools to support human rating, accurately reflect test takers’ actual performance?

What influence does the prompt have on the accuracy and performance of LLM-based AES methods?

The subsequent sections of the manuscript cover the methodology, including the assessment measures for nonnative Japanese writing proficiency, criteria for prompts, and the dataset. The evaluation section focuses on the analysis of annotations and rating scores generated by LLM-driven and linguistic feature-based AES methods.

Methodology

The dataset utilized in this study was obtained from the International Corpus of Japanese as a Second Language (I-JAS) Footnote 3. The corpus contains data from 1000 participants representing 12 different first languages. For the study, the participants were given a story-writing task on a personal computer. They were required to write two stories based on the 4-panel illustrations titled "Picnic" and "The key" (see Appendix A). Background information for the participants was provided by the corpus, including their Japanese language proficiency levels assessed through two online tests: J-CAT and SPOT. These tests evaluated their reading, listening, vocabulary, and grammar abilities. The learners' proficiency levels were categorized into six levels aligned with the Common European Framework of Reference for Languages (CEFR) and the Reference Framework for Japanese Language Education (RFJLE): A1, A2, B1, B2, C1, and C2. According to Lee et al. (2015), there is a high level of agreement (r = 0.86) between the J-CAT and SPOT assessments, indicating that the proficiency certifications provided by J-CAT are consistent with those of SPOT. However, it is important to note that the scores of J-CAT and SPOT do not have a one-to-one correspondence. In this study, the J-CAT scores were used as a benchmark to differentiate learners of different proficiency levels. A total of 1400 essays were utilized, representing the beginner (aligned with A1), A2, B1, B2, C1, and C2 levels based on the J-CAT scores. Table 1 provides information about the learners' proficiency levels and their corresponding J-CAT and SPOT scores.

A dataset comprising a total of 1400 essays from the story-writing tasks was collected. Among these, 714 essays were utilized to evaluate the reliability of the LLM-based AES method, while the remaining 686 essays were designated as development data to assess the LLM-based AES's capability to distinguish participants with varying proficiency levels. The GPT-4 API was used in this study. A detailed explanation of the prompt-assessment criteria is provided in Section Prompt. All essays were sent to the model for measurement and scoring.

Measures of writing proficiency for nonnative Japanese

Japanese exhibits a morphologically agglutinative structure where morphemes are attached to the word stem to convey grammatical functions such as tense, aspect, voice, and honorifics, e.g. (5).

食べ-させ-られ-まし-た-か

tabe-sase-rare-mashi-ta-ka

eat(stem)-causative-passive-honorification-past-question marker

Japanese employs nine case particles to indicate grammatical functions, including the nominative case particle が (ga), the accusative case particle を (o), the genitive case particle の (no), the dative case particle に (ni), the locative/instrumental case particle で (de), the ablative case particle から (kara), the directional case particle へ (e), and the comitative case particle と (to). The agglutinative nature of the language, combined with the case particle system, provides an efficient means of distinguishing between active and passive voice, either through morphemes or case particles, e.g. 食べる taberu "eat (conclusive)" (active voice); 食べられる taberareru "eat (passive, conclusive)" (passive voice). In the active voice, "パンを食べる" (pan o taberu) translates to "to eat bread". In the passive voice, it becomes "パンが食べられた" (pan ga taberareta), which means "(the) bread was eaten". Additionally, it is important to note that different conjugations of the same lemma are counted as one type in order to ensure a comprehensive assessment of the language features; for example, 食べる taberu "eat (conclusive)", 食べている tabeteiru "eat (progressive)", and 食べた tabeta "eat (past)" are counted as a single type.
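As a toy illustration of this counting convention (the lemma mapping below is a hypothetical stand-in; in practice lemmas would be obtained from a morphological analyzer such as MeCab):

# Different conjugations of the same lemma are counted as a single type.
lemma_of = {"食べる": "食べる", "食べた": "食べる", "食べている": "食べる"}  # hypothetical mapping

tokens = ["食べる", "食べた", "食べている"]
types = {lemma_of.get(token, token) for token in tokens}
print(len(tokens), len(types))  # 3 tokens, 1 type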

To incorporate these features, previous research (Suzuki, 1999 ; Watanabe et al. 1988 ; Ishioka, 2001 ; Ishioka and Kameda, 2006 ; Hirao et al. 2020 ) has identified complexity, fluency, and accuracy as crucial factors for evaluating writing quality. These criteria are assessed through various aspects, including lexical richness (lexical density, diversity, and sophistication), syntactic complexity, and cohesion (Kyle et al. 2021 ; Mizumoto and Eguchi, 2023 ; Ure, 1971 ; Halliday, 1985 ; Barkaoui and Hadidi, 2020 ; Zenker and Kyle, 2021 ; Kim et al. 2018 ; Lu, 2017 ; Ortega, 2015 ). Therefore, this study proposes five scoring categories: lexical richness, syntactic complexity, cohesion, content elaboration, and grammatical accuracy. A total of 16 measures were employed to capture these categories. The calculation process and specific details of these measures can be found in Table 2 .

T-unit, first introduced by Hunt (1966), is a measure used for evaluating speech and composition. It serves as an indicator of syntactic development and represents the shortest unit into which a piece of discourse can be divided without leaving any sentence fragments. In the context of Japanese language assessment, Sakoda and Hosoi (2020) utilized the T-unit as the basic unit to assess the accuracy and complexity of Japanese learners' speaking and storytelling. The calculation of T-units in Japanese follows these principles:

A single main clause constitutes 1 T-unit, regardless of the presence or absence of dependent clauses, e.g. (6).

ケンとマリはピクニックに行きました (main clause): 1 T-unit. ("Ken and Mari went on a picnic.")

If a sentence contains a main clause along with subclauses, each subclause is considered part of the same T-unit, e.g. (7).

天気が良かったので (subclause)、ケンとマリはピクニックに行きました (main clause): 1 T-unit. ("Because the weather was good, Ken and Mari went on a picnic.")

In the case of coordinate clauses, where multiple clauses are connected, each coordinated clause is counted separately. Thus, a sentence with coordinate clauses may have 2 T-units or more, e.g. (8).

ケンは地図で場所を探して (coordinate clause)、マリはサンドイッチを作りました (coordinate clause): 2 T-units. ("Ken looked for the place on the map, and Mari made sandwiches.")

Lexical diversity refers to the range of words used within a text (Engber, 1995; Kyle et al. 2021) and is considered a useful measure of the breadth of vocabulary in Ln production (Jarvis, 2013a, 2013b).

The type/token ratio (TTR) is widely recognized as a straightforward measure for calculating lexical diversity and has been employed in numerous studies. These studies have demonstrated a strong correlation between TTR and other methods of measuring lexical diversity (e.g., Bentz et al. 2016 ; Čech and Miroslav, 2018 ; Çöltekin and Taraka, 2018 ). TTR is computed by considering both the number of unique words (types) and the total number of words (tokens) in a given text. Given that the length of learners’ writing texts can vary, this study employs the moving average type-token ratio (MATTR) to mitigate the influence of text length. MATTR is calculated using a 50-word moving window. Initially, a TTR is determined for words 1–50 in an essay, followed by words 2–51, 3–52, and so on until the end of the essay is reached (Díez-Ortega and Kyle, 2023 ). The final MATTR scores were obtained by averaging the TTR scores for all 50-word windows. The following formula was employed to derive MATTR:

\({\rm{MATTR}}({\rm{W}})=\frac{{\sum }_{{\rm{i}}=1}^{{\rm{N}}-{\rm{W}}+1}{{\rm{F}}}_{{\rm{i}}}}{{\rm{W}}({\rm{N}}-{\rm{W}}+1)}\)

Here, N refers to the number of tokens in the text, and W is the window size in tokens (W < N; 50 words in this study). \({F}_{i}\) is the number of types in the i-th window. \({\rm{MATTR}}({\rm{W}})\) is the mean of the type-token ratios (TTRs), based on word forms, across all windows. It is expected that individuals with higher language proficiency will produce texts with greater lexical diversity, as indicated by higher MATTR scores.
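As an illustration, the moving-window computation can be sketched in a few lines of Python, assuming the essay has already been segmented into a list of word tokens:

def mattr(tokens, window=50):
    """Moving-average type-token ratio with a fixed-size window."""
    if len(tokens) < window:
        # fall back to a plain TTR for texts shorter than the window
        return len(set(tokens)) / len(tokens)
    ttrs = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ttrs) / len(ttrs)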

Lexical density was captured by the ratio of the number of lexical words to the total number of words (Lu, 2012). Lexical sophistication refers to the utilization of advanced vocabulary, often evaluated through word frequency indices (Crossley et al. 2013; Haberman, 2008; Kyle and Crossley, 2015; Laufer and Nation, 1995; Lu, 2012; Read, 2000). In writing, lexical sophistication can be interpreted as vocabulary breadth, which entails the appropriate usage of vocabulary items across various lexico-grammatical contexts and registers (Garner et al. 2019; Kim et al. 2018; Kyle et al. 2018). In Japanese specifically, words are considered lexically sophisticated if they are not included in the "Japanese Education Vocabulary List Ver 1.0". Footnote 4 Consequently, lexical sophistication was calculated as the number of sophisticated word types relative to the total number of words per essay. Furthermore, it has been suggested that, in Japanese writing, sentences should ideally be no longer than 40 to 50 characters, as this promotes readability. Therefore, the median and maximum sentence length can be considered useful indices for assessment (Ishioka and Kameda, 2006).
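A minimal sketch of this ratio is shown below; jev_vocabulary is a placeholder for the set of items in the Japanese Education Vocabulary List, which must be loaded separately.

def lexical_sophistication(tokens, jev_vocabulary):
    """Proportion of sophisticated word types (absent from the JEV list) to total tokens."""
    sophisticated_types = {t for t in set(tokens) if t not in jev_vocabulary}
    return len(sophisticated_types) / len(tokens)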

Syntactic complexity was assessed based on several measures, including the mean length of clauses, verb phrases per T-unit, clauses per T-unit, dependent clauses per T-unit, complex nominals per clause, adverbial clauses per clause, coordinate phrases per clause, and mean dependency distance (MDD). The MDD reflects the distance between the governor and dependent positions in a sentence. A larger dependency distance indicates a higher cognitive load and greater complexity in syntactic processing (Liu, 2008; Liu et al. 2017). The MDD has been established as an efficient metric for measuring syntactic complexity (Jiang, Quyang, and Liu, 2019; Li and Yan, 2021). To calculate the MDD, the position numbers of the governor and the dependent are subtracted, assuming that words in a sentence are assigned positions in linear order, such as W1 … Wi … Wn. In any dependency relationship between words Wa and Wb, Wa is the governor and Wb is the dependent. The MDD of the entire sentence is obtained by averaging the absolute values of governor position minus dependent position:

MDD = \(\frac{1}{n}{\sum }_{i=1}^{n}|{\rm{D}}{{\rm{D}}}_{i}|\)

In this formula, \(n\) represents the number of words in the sentence, and \({DD}_{i}\) is the dependency distance of the \(i\)-th dependency relationship of the sentence. For example, for the sentence "Mary-ga John-ni keshigomu-o watashita" [Mary-top John-dat eraser-acc give-past] ("Mary gave John an eraser"), the MDD is 2. Table 3 provides the CSV file used as a prompt for GPT-4.
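A sketch of this computation is given below, assuming the sentence has already been dependency-parsed into (governor position, dependent position) pairs; the example arcs are toy values rather than the parse of the sentence above.

def mean_dependency_distance(arcs):
    """Average absolute distance between governor and dependent positions."""
    return sum(abs(governor - dependent) for governor, dependent in arcs) / len(arcs)

# toy arcs for a five-word sentence (1-based word positions)
print(mean_dependency_distance([(5, 1), (5, 2), (4, 3), (5, 4)]))  # 2.25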

Cohesion (semantic similarity) and content elaboration aim to capture the ideas presented in the test taker's essay. Cohesion was assessed using three measures: synonym overlap/paragraph (topic), synonym overlap/paragraph (keywords), and word2vec cosine similarity. Content elaboration and development were measured as the number of metadiscourse markers (types) divided by the number of words. To capture content closely, this study proposed a novel distance-based representation that encodes the cosine distance between the vector of the learner's essay and the vector of the essay task (topic and keywords). The learner's essay is decoded into a word sequence and aligned to the essay task's topic and keywords for log-likelihood measurement. The cosine distance yields the content elaboration score for the learner's essay. The equation for cosine similarity between the target and reference vectors is shown in (11), where L = (L1, …, Ln) and N = (N1, …, Nn) are the vectors representing the learner's essay and the task's topic and keywords, respectively. The content elaboration distance between L and N was calculated as follows:

\(\cos \left(\theta \right)=\frac{{\rm{L}}\,\cdot\, {\rm{N}}}{\left|{\rm{L}}\right|{\rm{|N|}}}=\frac{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}{N}_{i}}{\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}^{2}}\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{N}_{i}^{2}}}\)

A high similarity value indicates a small distance between the essay and the task vectors, which in turn suggests a high level of proficiency in content elaboration.
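The cosine similarity in (11) can be computed directly once the learner essay and the task topic/keywords have been embedded as vectors (for example, as averaged word2vec vectors); the vectors below are placeholders.

import numpy as np

def content_elaboration_similarity(learner_vec, task_vec):
    """Cosine similarity between the learner-essay vector and the task topic/keyword vector."""
    l = np.asarray(learner_vec, dtype=float)
    n = np.asarray(task_vec, dtype=float)
    return float(np.dot(l, n) / (np.linalg.norm(l) * np.linalg.norm(n)))

print(content_elaboration_similarity([0.2, 0.7, 0.1], [0.3, 0.6, 0.2]))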

To evaluate the effectiveness of the proposed measures in distinguishing different proficiency levels among nonnative Japanese speakers’ writing, we conducted a multi-faceted Rasch measurement analysis (Linacre, 1994 ). This approach applies measurement models to thoroughly analyze various factors that can influence test outcomes, including test takers’ proficiency, item difficulty, and rater severity, among others. The underlying principles and functionality of multi-faceted Rasch measurement are illustrated in (12).

\(\log \left(\frac{{P}_{{nijk}}}{{P}_{{nij}(k-1)}}\right)={B}_{n}-{D}_{i}-{C}_{j}-{F}_{k}\)

Equation (12) defines the logarithmic transformation of the probability ratio \({P}_{{nijk}}/{P}_{{nij}(k-1)}\) as a function of multiple parameters. Here, n represents the test taker, i denotes a writing proficiency measure, j corresponds to the human rater, and k represents the proficiency score. The parameter B n signifies the proficiency level of test taker n (where n ranges from 1 to N). D i represents the difficulty parameter of test item i (where i ranges from 1 to L), while C j represents the severity of rater j (where j ranges from 1 to J). Additionally, F k represents the step difficulty for a test taker to move from score k−1 to k. P nijk refers to the probability of rater j assigning score k to test taker n for test item i, and P nij(k−1) is the corresponding probability of score k−1. Each facet within the test is treated as an independent parameter and estimated within the same reference framework. To evaluate the consistency of scores obtained through both human and computer analysis, we utilized the Infit mean-square statistic. This statistic is a chi-square measure divided by the degrees of freedom and is weighted with information; it is particularly sensitive to unexpected patterns in responses to items near a person's proficiency level (Linacre, 2002). Fit statistics are assessed against predefined thresholds for acceptable fit. For the Infit MNSQ, which has a mean of 1.00, different thresholds have been suggested: some propose stricter thresholds ranging from 0.7 to 1.3 (Bond et al. 2021), while others suggest more lenient thresholds ranging from 0.5 to 1.5 (Eckes, 2009). In this study, we adopted the criterion of 0.70–1.30 for the Infit MNSQ.

We can now assess the effectiveness of the 16 proposed measures, organized under five criteria, in distinguishing various levels of writing proficiency among non-native Japanese speakers. To conduct this evaluation, we utilized the development dataset from the I-JAS corpus, as described in Section Dataset. Table 4 provides a measurement report that presents the performance details of the 16 measures under consideration. The measure separation was found to be 4.02, indicating a clear differentiation among the measures. The reliability index for the measure separation was 0.891, suggesting consistency in the measurement. Similarly, the person separation reliability index was 0.802, indicating the accuracy of the assessment in distinguishing between individuals. All 16 measures demonstrated Infit mean squares within a reasonable range, from 0.76 to 1.28. The Synonym overlap/paragraph (topic) measure exhibited a relatively high Outfit mean square of 1.46, although its Infit mean square falls within the acceptable range. The standard errors for the measures ranged from 0.13 to 0.28, indicating the precision of the estimates.

Table 5 further illustrates the weights assigned to the different linguistic measures for score prediction, with higher weights indicating stronger correlations between those measures and higher scores. The following measures exhibited higher weights than the others: the moving average type-token ratio per essay (0.0391); mean dependency distance (0.0388); complex nominals per T-unit, calculated by dividing the number of complex nominals by the number of T-units (0.0379); mean length of clause, calculated by dividing the number of words by the number of clauses (0.0374); coordinate phrase rate, calculated by dividing the number of coordinate phrases by the number of clauses (0.0325); and grammatical error rate, representing the number of errors per essay (0.0322).

Criteria (output indicator)

The criteria used to evaluate the writing ability in this study were based on CEFR, which follows a six-point scale ranging from A1 to C2. To assess the quality of Japanese writing, the scoring criteria from Table 6 were utilized. These criteria were derived from the IELTS writing standards and served as assessment guidelines and prompts for the written output.

A prompt is a question or detailed instruction provided to the model to obtain a proper response. After several pilot experiments, we decided to provide the measures (Section Measures of writing proficiency for nonnative Japanese) as the input prompt and use the criteria (Section Criteria (output indicator)) as the output indicator. Regarding the prompt language, given that the LLM was tasked with rating Japanese essays, would a prompt written in Japanese work better? Footnote 5 We conducted experiments comparing the performance of GPT-4 using both English and Japanese prompts. Additionally, we utilized the Japanese local model OCLL with Japanese prompts. Multiple trials were conducted using the same sample. Regardless of the prompt language used, we consistently obtained the same grading results with GPT-4, which assigned a grade of B1 to the writing sample. This suggested that GPT-4 is reliable and capable of producing consistent ratings regardless of the prompt language. On the other hand, when we used Japanese prompts with the Japanese local model OCLL, we encountered inconsistent grading results: out of 10 attempts, only 6 yielded consistent grading results (B1), while the remaining 4 showed different outcomes, including A1 and B2 grades. These findings indicated that the language of the prompt was not the determining factor for reliable AES. Instead, the size of the training data and the model parameters played crucial roles in achieving consistent and reliable AES results for the language model.

The following is the utilized prompt, which details all measures and requires the LLM to score the essays using holistic and trait scores.

Please evaluate Japanese essays written by Japanese learners and assign a score to each essay on a six-point scale, ranging from A1, A2, B1, B2, C1 to C2. Additionally, please provide trait scores and display the calculation process for each trait score. The scoring should be based on the following criteria:

Moving average type-token ratio.

Number of lexical words (token) divided by the total number of words per essay.

Number of sophisticated word types divided by the total number of words per essay.

Mean length of clause.

Verb phrases per T-unit.

Clauses per T-unit.

Dependent clauses per T-unit.

Complex nominals per clause.

Adverbial clauses per clause.

Coordinate phrases per clause.

Mean dependency distance.

Synonym overlap paragraph (topic and keywords).

Word2vec cosine similarity.

Connectives per essay.

Conjunctions per essay.

Number of metadiscourse markers (types) divided by the total number of words.

Number of errors per essay.

Japanese essay text

出かける前に二人が地図を見ている間に、サンドイッチを入れたバスケットに犬が入ってしまいました。それに気づかずに二人は楽しそうに出かけて行きました。やがて突然犬がバスケットから飛び出し、二人は驚きました。バスケットの中を見ると、食べ物はすべて犬に食べられていて、二人は困ってしまいました。(ID_JJJ01_SW1)

("Before they left, while the two of them were looking at the map, the dog got into the basket that held the sandwiches. Without noticing, the two set off happily. Eventually the dog suddenly jumped out of the basket, and the two were surprised. When they looked inside the basket, all the food had been eaten by the dog, and the two were at a loss.")

The score of the example above was B1. Figure 3 provides an example of holistic and trait scores provided by GPT-4 (with a prompt indicating all measures) via Bing Footnote 6 .

Fig. 3: Example of GPT-4 AES and feedback (with a prompt indicating all measures).
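For illustration, a prompt of this form can be submitted through the GPT-4 chat API roughly as follows. This is a sketch assuming the OpenAI Python client; rubric_prompt stands for the instruction text above and essay_text for the learner essay, and the exact request code used in the study is not reproduced here.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

rubric_prompt = "Please evaluate Japanese essays written by Japanese learners ..."  # full instruction above
essay_text = "出かける前に二人が地図を見ている間に、..."  # learner essay (ID_JJJ01_SW1)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": rubric_prompt},
        {"role": "user", "content": essay_text},
    ],
    temperature=0,  # deterministic output helps rating consistency
)
print(response.choices[0].message.content)  # holistic grade plus trait scores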

Statistical analysis

The aim of this study is to investigate the potential use of LLM for nonnative Japanese AES. It seeks to compare the scoring outcomes obtained from feature-based AES tools, which rely on conventional machine learning technology (i.e. Jess, JWriter), with those generated by AI-driven AES tools utilizing deep learning technology (BERT, GPT, OCLL). To assess the reliability of a computer-assisted annotation tool, the study initially established human-human agreement as the benchmark measure. Subsequently, the performance of the LLM-based method was evaluated by comparing it to human-human agreement.

To assess annotation agreement, the study employed standard measures such as precision, recall, and F-score (Brants 2000; Lu 2010), along with the quadratically weighted kappa (QWK), to evaluate the consistency and agreement in the annotation process. Let A and B represent two human annotators whose annotations are compared. Precision, recall, and the F-score are defined in equations (13) to (15).

\({\rm{Recall}}(A,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,A}\)

\({\rm{Precision}}(A,\,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,B}\)

The F-score is the harmonic mean of recall and precision:

\({\rm{F}}-{\rm{score}}=\frac{2* ({\rm{Precision}}* {\rm{Recall}})}{{\rm{Precision}}+{\rm{Recall}}}\)

The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, which occurs if either precision or recall is zero.

In accordance with Taghipour and Ng ( 2016 ), the calculation of QWK involves two steps:

Step 1: Construct a weight matrix W as follows:

\({W}_{{ij}}=\frac{{(i-j)}^{2}}{{(N-1)}^{2}}\)

Here, i represents the annotation made by the tool, j represents the annotation made by a human rater, and N denotes the total number of possible annotation categories. Matrix O is subsequently computed, where \({O}_{i,j}\) is the number of essays that received annotation i from the tool and annotation j from the human annotator. E refers to the expected count matrix, which is normalized so that the sum of its elements matches the sum of the elements in O.

Step 2: With matrices O and E, the QWK is obtained as follows:

\({\rm{K}}=1-\frac{{\sum }_{i,j}{W}_{i,j}\,{O}_{i,j}}{{\sum }_{i,j}{W}_{i,j}\,{E}_{i,j}}\)
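In practice, the same statistic can be obtained with scikit-learn's quadratically weighted kappa once the ordinal CEFR labels are mapped to integers; the ratings below are hypothetical.

from sklearn.metrics import cohen_kappa_score

levels = ["A1", "A2", "B1", "B2", "C1", "C2"]
to_int = {level: index for index, level in enumerate(levels)}

human_ratings = ["A2", "B1", "B1", "B2", "C1"]  # hypothetical human labels
tool_ratings = ["A2", "B1", "B2", "B2", "C1"]   # hypothetical automated labels

qwk = cohen_kappa_score(
    [to_int[r] for r in human_ratings],
    [to_int[r] for r in tool_ratings],
    weights="quadratic",
)
print(qwk)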

The value of the quadratic weighted kappa increases as the level of agreement improves. Further, to assess the accuracy of LLM scoring, the proportional reductive mean square error (PRMSE) was employed. The PRMSE approach takes into account the variability observed in human ratings to estimate the rater error, which is then subtracted from the variance of the human labels. This calculation provides an overall measure of agreement between the automated scores and true scores (Haberman et al. 2015 ; Loukina et al. 2020 ; Taghipour and Ng, 2016 ). The computation of PRMSE involves the following steps:

Step 1: Calculate the mean squared errors (MSEs) for the scoring outcomes of the computer-assisted tool (MSE tool) and the human scoring outcomes (MSE human).

Step 2: Determine the PRMSE by comparing the MSE of the computer-assisted tool (MSE tool) with the MSE from human raters (MSE human), using the following formula:

\({\rm{PRMSE}}=1-\frac{{{\rm{MSE}}}_{{\rm{tool}}}}{{{\rm{MSE}}}_{{\rm{human}}}}=1-\frac{{\sum }_{i=1}^{n}{({y}_{i}-{\hat{y}}_{i})}^{2}}{{\sum }_{i=1}^{n}{({y}_{i}-\bar{y})}^{2}}\)

In the numerator, \({\hat{y}}_{i}\) represents the score predicted by the LLM-driven AES system for sample i, so \({y}_{i}-{\hat{y}}_{i}\) is the deviation of that prediction from the human score \({y}_{i}\). In the denominator, \({y}_{i}-\bar{y}\) is the deviation of the human score for sample i from the mean of all human scores, so the denominator corresponds to the variance of the human ratings. The PRMSE is then calculated by subtracting the ratio of MSE tool to MSE human from 1. PRMSE falls within the range of 0 to 1, with larger values indicating smaller errors in the LLM's scoring relative to the variability of the human ratings. In other words, a higher PRMSE implies that the LLM's scoring more accurately predicts the true scores (Loukina et al. 2020). The interpretation of kappa values, which range from −1 to 1, is based on the work of Landis and Koch (1977). Specifically, the following categories are assigned to different ranges of kappa values: −1 indicates complete inconsistency, 0 indicates random agreement, 0.00–0.20 indicates a slight level of agreement, 0.21–0.40 a fair level, 0.41–0.60 a moderate level, 0.61–0.80 a substantial level, and 0.81–1.00 an almost perfect level of agreement. All statistical analyses were executed using Python scripts.
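A simplified Python sketch of the PRMSE computation is given below; it implements the ratio in the formula above directly and omits the additional rater-error adjustment described by Haberman et al. (2015).

import numpy as np

def prmse(human_scores, tool_scores):
    """1 - MSE(tool) / variance of the human scores."""
    y = np.asarray(human_scores, dtype=float)
    y_hat = np.asarray(tool_scores, dtype=float)
    mse_tool = np.mean((y - y_hat) ** 2)
    mse_human = np.mean((y - y.mean()) ** 2)
    return 1 - mse_tool / mse_human

print(prmse([1, 2, 3, 4, 5], [1, 2, 3, 4, 4]))  # toy scores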

Results and discussion

Annotation reliability of the LLM

This section focuses on assessing the reliability of the LLM’s annotation and scoring capabilities. To evaluate the reliability, several tests were conducted simultaneously, aiming to achieve the following objectives:

Assess the LLM’s ability to differentiate between test takers with varying levels of writing proficiency.

Determine the level of agreement between the annotations and scoring performed by the LLM and those done by human raters.

The evaluation of the results encompassed several metrics, including: precision, recall, F-Score, quadratically-weighted kappa, proportional reduction of mean squared error, Pearson correlation, and multi-faceted Rasch measurement.

Inter-annotator agreement (human–human annotator agreement)

We started with an agreement test between the two human annotators. Two trained annotators were recruited to annotate the measures on the writing task data. A total of 714 scripts was used as the test data. Each analysis lasted 300–360 min. Inter-annotator agreement was evaluated using the standard measures of precision, recall, F-score, and QWK. Table 7 presents the inter-annotator agreement for the various indicators. As shown, the inter-annotator agreement was fairly high, with F-scores ranging from 1.0 for sentence and word counts to 0.666 for grammatical errors.

The findings from the QWK analysis provided further confirmation of the inter-annotator agreement. The QWK values covered a range from 0.950 ( p  = 0.000) for sentence and word number to 0.695 for synonym overlap number (keyword) and grammatical errors ( p  = 0.001).

Agreement of annotation outcomes between human and LLM

To evaluate the consistency between the human annotators and the LLM annotators (BERT, GPT, OCLL) across the indices, the same test was conducted. The results of the inter-annotator agreement (F-score) between LLM and human annotation are provided in Appendix B-D. The F-scores ranged from 0.706 for grammatical error counts (OCLL-human) to a perfect 1.000 for sentences, clauses, T-units, and words (GPT-human). These findings were further supported by the QWK analysis, which showed agreement levels ranging from 0.807 (p = 0.001) for metadiscourse markers (OCLL-human) to 0.962 (p = 0.000) for words (GPT-human). The findings demonstrated that the LLM annotation achieved a high level of accuracy in identifying measurement units and counts.

Reliability of LLM-driven AES’s scoring and discriminating proficiency levels

This section examines the reliability of the LLM-driven AES scoring through a comparison of the scoring outcomes produced by human raters and the LLM ( Reliability of LLM-driven AES scoring ). It also assesses the effectiveness of the LLM-based AES system in differentiating participants with varying proficiency levels ( Reliability of LLM-driven AES discriminating proficiency levels ).

Reliability of LLM-driven AES scoring

Table 8 summarizes the QWK coefficient analysis between the scores computed by the human raters and by GPT-4 for the individual essays from I-JAS Footnote 7. As shown, the QWK of all measures ranged from k = 0.644 for word2vec cosine similarity to k = 0.819 for lexical density (number of lexical words (tokens)/number of words per essay). Table 9 further presents the Pearson correlations between the 16 writing proficiency measures scored by human raters and by GPT-4 for the individual essays. The correlations ranged from 0.672 for syntactic complexity to 0.734 for grammatical accuracy. The correlations between the writing proficiency scores assigned by human raters and the BERT-based AES system ranged from 0.661 for syntactic complexity to 0.713 for grammatical accuracy, and those between human raters and the OCLL-based AES system ranged from 0.654 for cohesion to 0.721 for grammatical accuracy. These findings indicate an alignment between the assessments made by human raters and both the BERT-based and OCLL-based AES systems across various aspects of writing proficiency.

Reliability of LLM-driven AES discriminating proficiency levels

After validating the reliability of the LLM’s annotation and scoring, the subsequent objective was to evaluate its ability to distinguish between various proficiency levels. For this analysis, a dataset of 686 individual essays was utilized. Table 10 presents a sample of the results, summarizing the means, standard deviations, and the outcomes of the one-way ANOVAs based on the measures assessed by the GPT-4 model. A post hoc multiple comparison test, specifically the Bonferroni test, was conducted to identify any potential differences between pairs of levels.

As the results reveal, seven measures showed linear upward or downward progression across the three proficiency levels. These are marked in bold in Table 10 and comprise one measure of lexical richness, i.e. MATTR (lexical diversity); four measures of syntactic complexity, i.e. MDD (mean dependency distance), MLC (mean length of clause), CNT (complex nominals per T-unit), and CPC (coordinate phrase rate); one cohesion measure, i.e. word2vec cosine similarity; and one accuracy measure, i.e. GER (grammatical error rate). Regarding the ability of the sixteen measures to distinguish adjacent proficiency levels, the Bonferroni tests indicated that statistically significant differences exist between the primary level and the intermediate level for MLC and GER. One measure of lexical richness (LD), four measures of syntactic complexity (VPT, CT, DCT, ACC), two measures of cohesion (SOPT, SOPK), and one measure of content elaboration (IMM) exhibited statistically significant differences between proficiency levels, although these differences did not follow a linear progression between adjacent proficiency levels. No significant difference was observed in lexical sophistication between proficiency levels.

To summarize, our study aimed to evaluate the reliability and differentiation capabilities of the LLM-driven AES method. For the first objective, we assessed the LLM’s ability to differentiate between test takers with varying levels of writing proficiency using precision, recall, F-score, and quadratically weighted kappa. Regarding the second objective, we compared the scoring outcomes generated by human raters and the LLM to determine the level of agreement, employing quadratically weighted kappa and Pearson correlations across the 16 writing proficiency measures for the individual essays. The results confirmed the feasibility of using the LLM for annotation and scoring in AES for nonnative Japanese. Research Question 1 has thus been addressed.

Comparison of BERT-, GPT-, OCLL-based AES, and linguistic-feature-based computation methods

This section compares the effectiveness of five AES methods for nonnative Japanese writing: the LLM-driven approaches utilizing BERT, GPT, and OCLL, and the linguistic feature-based approaches using Jess and JWriter. The comparison was conducted by comparing the ratings obtained from each approach with human ratings. All ratings were derived from the dataset introduced in Section Dataset. To facilitate the comparison, the agreement between the automated methods and human ratings was assessed using QWK and PRMSE. The performance of each approach is summarized in Table 11.

The QWK coefficient values indicate that LLMs (GPT, BERT, OCLL) and human rating outcomes demonstrated higher agreement compared to feature-based AES methods (Jess and JWriter) in assessing writing proficiency criteria, including lexical richness, syntactic complexity, content, and grammatical accuracy. Among the LLMs, the GPT-4 driven AES and human rating outcomes showed the highest agreement in all criteria, except for syntactic complexity. The PRMSE values suggest that the GPT-based method outperformed linguistic feature-based methods and other LLM-based approaches. Moreover, an interesting finding emerged during the study: the agreement coefficient between GPT-4 and human scoring was even higher than the agreement between different human raters themselves. This discovery highlights the advantage of GPT-based AES over human rating. Ratings involve a series of processes, including reading the learners’ writing, evaluating the content and language, and assigning scores. Within this chain of processes, various biases can be introduced, stemming from factors such as rater biases, test design, and rating scales. These biases can impact the consistency and objectivity of human ratings. GPT-based AES may benefit from its ability to apply consistent and objective evaluation criteria. By prompting the GPT model with detailed writing scoring rubrics and linguistic features, potential biases in human ratings can be mitigated. The model follows a predefined set of guidelines and does not possess the same subjective biases that human raters may exhibit. This standardization in the evaluation process contributes to the higher agreement observed between GPT-4 and human scoring. Section Prompt strategy of the study delves further into the role of prompts in the application of LLMs to AES. It explores how the choice and implementation of prompts can impact the performance and reliability of LLM-based AES methods. Furthermore, it is important to acknowledge the strengths of the local model, i.e. the Japanese local model OCLL, which excels in processing certain idiomatic expressions. Nevertheless, our analysis indicated that GPT-4 surpasses local models in AES. This superior performance can be attributed to the larger parameter size of GPT-4, estimated to be between 500 billion and 1 trillion, which exceeds the sizes of both BERT and the local model OCLL.

Prompt strategy

In the context of prompt strategy, Mizumoto and Eguchi ( 2023 ) conducted a study where they applied the GPT-3 model to automatically score English essays in the TOEFL test. They found that the accuracy of the GPT model alone was moderate to fair. However, when they incorporated linguistic measures such as cohesion, syntactic complexity, and lexical features alongside the GPT model, the accuracy significantly improved. This highlights the importance of prompt engineering and providing the model with specific instructions to enhance its performance. In this study, a similar approach was taken to optimize the performance of LLMs. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. Model 1 was used as the baseline, representing GPT-4 without any additional prompting. Model 2, on the other hand, involved GPT-4 prompted with 16 measures that included scoring criteria, efficient linguistic features for writing assessment, and detailed measurement units and calculation formulas. The remaining models (Models 3 to 18) utilized GPT-4 prompted with individual measures. The performance of these 18 different models was assessed using the output indicators described in Section Criteria (output indicator) . By comparing the performances of these models, the study aimed to understand the impact of prompt engineering on the accuracy and effectiveness of GPT-4 in AES tasks.

Based on the PRMSE scores presented in Fig. 4 , it was observed that Model 1, representing GPT-4 without any additional prompting, achieved a fair level of performance. However, Model 2, which utilized GPT-4 prompted with all measures, outperformed all other models in terms of PRMSE score, achieving a score of 0.681. These results indicate that the inclusion of specific measures and prompts significantly enhanced the performance of GPT-4 in AES. Among the measures, syntactic complexity was found to play a particularly significant role in improving the accuracy of GPT-4 in assessing writing quality. Following that, lexical diversity emerged as another important factor contributing to the model’s effectiveness. The study suggests that a well-prompted GPT-4 can serve as a valuable tool to support human assessors in evaluating writing quality. By utilizing GPT-4 as an automated scoring tool, the evaluation biases associated with human raters can be minimized. This has the potential to empower teachers by allowing them to focus on designing writing tasks and guiding writing strategies, while leveraging the capabilities of GPT-4 for efficient and reliable scoring.

Fig. 4: PRMSE scores of the 18 AES models.

This study aimed to investigate two main research questions: the feasibility of utilizing LLMs for AES and the impact of prompt engineering on the application of LLMs in AES.

To address the first objective, the study compared the effectiveness of five different models: GPT, BERT, the Japanese local LLM (OCLL), and two conventional machine learning-based AES tools (Jess and JWriter). The PRMSE values indicated that the GPT-4-based method outperformed other LLMs (BERT, OCLL) and linguistic feature-based computational methods (Jess and JWriter) across various writing proficiency criteria. Furthermore, the agreement coefficient between GPT-4 and human scoring surpassed the agreement among human raters themselves, highlighting the potential of using the GPT-4 tool to enhance AES by reducing biases and subjectivity, saving time, labor, and cost, and providing valuable feedback for self-study. Regarding the second goal, the role of prompt design was investigated by comparing 18 models, including a baseline model, a model prompted with all measures, and 16 models prompted with one measure at a time. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. The PRMSE scores of the models showed that GPT-4 prompted with all measures achieved the best performance, surpassing the baseline and other models.

In conclusion, this study has demonstrated the potential of LLMs in supporting human rating in assessments. By incorporating automation, we can save time and resources while reducing biases and subjectivity inherent in human rating processes. Automated language assessments offer the advantage of accessibility, providing equal opportunities and economic feasibility for individuals who lack access to traditional assessment centers or necessary resources. LLM-based language assessments provide valuable feedback and support to learners, aiding in the enhancement of their language proficiency and the achievement of their goals. This personalized feedback can cater to individual learner needs, facilitating a more tailored and effective language-learning experience.

There are three important areas that merit further exploration. First, prompt engineering requires attention to ensure optimal performance of LLM-based AES across different language types. This study revealed that GPT-4, when prompted with all measures, outperformed models prompted with fewer measures. Therefore, investigating and refining prompt strategies can enhance the effectiveness of LLMs in automated language assessments. Second, it is crucial to explore the application of LLMs in second-language assessment and learning for oral proficiency, as well as their potential in under-resourced languages. Recent advancements in self-supervised machine learning techniques have significantly improved automatic speech recognition (ASR) systems, opening up new possibilities for creating reliable ASR systems, particularly for under-resourced languages with limited data. However, challenges persist in the field of ASR. First, ASR assumes correct word pronunciation for automatic pronunciation evaluation, which proves challenging for learners in the early stages of language acquisition due to diverse accents influenced by their native languages. Accurately segmenting short words becomes problematic in such cases. Second, developing precise audio-text transcriptions for languages with non-native accented speech poses a formidable task. Last, assessing oral proficiency levels involves capturing various linguistic features, including fluency, pronunciation, accuracy, and complexity, which are not easily captured by current NLP technology.

Data availability

The dataset utilized was obtained from the International Corpus of Japanese as a Second Language (I-JAS), available at https://www2.ninjal.ac.jp/jll/lsaj/ihome2.html .

Notes

Footnote 1: J-CAT and TTBJ are two computerized adaptive tests used to assess Japanese language proficiency; SPOT is a specific component of the TTBJ test. J-CAT: https://www.j-cat2.org/html/ja/pages/interpret.html ; SPOT: https://ttbj.cegloc.tsukuba.ac.jp/p1.html#SPOT .

Footnote 2: The study utilized a prompt-based GPT-4 model, developed by OpenAI, which has an impressive architecture with 1.8 trillion parameters across 120 layers. GPT-4 was trained on a vast dataset of 13 trillion tokens, using two stages: initial training on internet text datasets to predict the next token, and subsequent fine-tuning through reinforcement learning from human feedback.

Footnote 3: https://www2.ninjal.ac.jp/jll/lsaj/ihome2-en.html .

Footnote 4: http://jhlee.sakura.ne.jp/JEV/ by the Japanese Learning Dictionary Support Group, 2015.

Footnote 5: We express our sincere gratitude to the reviewer for bringing this matter to our attention.

Footnote 6: On February 7, 2023, Microsoft began rolling out a major overhaul to Bing that included a new chatbot feature based on OpenAI's GPT-4 (Bing.com).

Footnote 7: Appendix E-F present the analysis results of the QWK coefficient between the scores computed by the human raters and the BERT and OCLL models.

References

Attali Y, Burstein J (2006) Automated essay scoring with e-rater® V.2. J. Technol. Learn. Assess. 4

Barkaoui K, Hadidi A (2020) Assessing Change in English Second Language Writing Performance (1st ed.). Routledge, New York. https://doi.org/10.4324/9781003092346

Bentz C, Tatyana R, Koplenig A, Tanja S (2016) A comparison between morphological complexity. measures: Typological data vs. language corpora. In Proceedings of the workshop on computational linguistics for linguistic complexity (CL4LC), 142–153. Osaka, Japan: The COLING 2016 Organizing Committee

Bond TG, Yan Z, Heene M (2021) Applying the Rasch model: Fundamental measurement in the human sciences (4th ed). Routledge

Brants T (2000) Inter-annotator agreement for a German newspaper corpus. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00), Athens, Greece, 31 May-2 June, European Language Resources Association

Brown TB, Mann B, Ryder N, et al. (2020) Language models are few-shot learners. Advances in Neural Information Processing Systems, Online, 6–12 December, Curran Associates, Inc., Red Hook, NY

Burstein J (2003) The E-rater scoring engine: Automated essay scoring with natural language processing. In Shermis MD and Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Čech R, Miroslav K (2018) Morphological richness of text. In Masako F, Václav C (ed) Taming the corpus: From inflection and lexis to interpretation, 63–77. Cham, Switzerland: Springer Nature

Çöltekin Ç, Taraka, R (2018) Exploiting Universal Dependencies treebanks for measuring morphosyntactic complexity. In Aleksandrs B, Christian B (ed), Proceedings of first workshop on measuring language complexity, 1–7. Torun, Poland

Crossley SA, Cobb T, McNamara DS (2013) Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications. System 41:965–981. https://doi.org/10.1016/j.system.2013.08.002

Crossley SA, McNamara DS (2016) Say more and be more coherent: How text elaboration and cohesion can increase writing quality. J. Writ. Res. 7:351–370

CyberAgent Inc (2023) Open-Calm series of Japanese language models. Retrieved from: https://www.cyberagent.co.jp/news/detail/id=28817

Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, Minnesota, 2–7 June, pp. 4171–4186. Association for Computational Linguistics

Diez-Ortega M, Kyle K (2023) Measuring the development of lexical richness of L2 Spanish: a longitudinal learner corpus study. Studies in Second Language Acquisition 1-31

Eckes T (2009) On common ground? How raters perceive scoring criteria in oral proficiency testing. In Brown A, Hill K (ed) Language testing and evaluation 13: Tasks and criteria in performance assessment (pp. 43–73). Peter Lang Publishing

Elliot S (2003) IntelliMetric: from here to validity. In: Shermis MD, Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Engber CA (1995) The relationship of lexical proficiency to the quality of ESL compositions. J. Second Lang. Writ. 4:139–155

Garner J, Crossley SA, Kyle K (2019) N-gram measures and L2 writing proficiency. System 80:176–187. https://doi.org/10.1016/j.system.2018.12.001

Haberman SJ (2008) When can subscores have value? J. Educat. Behav. Stat., 33:204–229

Haberman SJ, Yao L, Sinharay S (2015) Prediction of true test scores from observed item scores and ancillary data. Brit. J. Math. Stat. Psychol. 68:363–385

Halliday MAK (1985) Spoken and Written Language. Deakin University Press, Melbourne, Australia

Hirao R, Arai M, Shimanaka H et al. (2020) Automated essay scoring system for nonnative Japanese learners. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pp. 1250–1257. European Language Resources Association

Hunt KW (1966) Recent Measures in Syntactic Development. Elementary English, 43(7), 732–739. http://www.jstor.org/stable/41386067

Ishioka T (2001) About e-rater, a computer-based automatic scoring system for essays [Konpyūta ni yoru essei no jidō saiten shisutemu e − rater ni tsuite]. University Entrance Examination. Forum [Daigaku nyūshi fōramu] 24:71–76

Hochreiter S, Schmidhuber J (1997) Long short- term memory. Neural Comput. 9(8):1735–1780

Ishioka T, Kameda M (2006) Automated Japanese essay scoring system based on articles written by experts. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17–18 July 2006, pp. 233-240. Association for Computational Linguistics, USA

Japan Foundation (2021) Retrieved from: https://www.jpf.gp.jp/j/project/japanese/survey/result/dl/survey2021/all.pdf

Jarvis S (2013a) Defining and measuring lexical diversity. In Jarvis S, Daller M (ed) Vocabulary knowledge: Human ratings and automated measures (Vol. 47, pp. 13–44). John Benjamins. https://doi.org/10.1075/sibil.47.03ch1

Jarvis S (2013b) Capturing the diversity in lexical diversity. Lang. Learn. 63:87–106. https://doi.org/10.1111/j.1467-9922.2012.00739.x

Jiang J, Quyang J, Liu H (2019) Interlanguage: A perspective of quantitative linguistic typology. Lang. Sci. 74:85–97

Kim M, Crossley SA, Kyle K (2018) Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. Mod. Lang. J. 102(1):120–141. https://doi.org/10.1111/modl.12447

Kojima T, Gu S, Reid M et al. (2022) Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, New Orleans, LA, 29 November-1 December, Curran Associates, Inc., Red Hook, NY

Kyle K, Crossley SA (2015) Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Q 49:757–786

Kyle K, Crossley SA, Berger CM (2018) The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behav. Res. Methods 50:1030–1046. https://doi.org/10.3758/s13428-017-0924-4

Kyle K, Crossley SA, Jarvis S (2021) Assessing the validity of lexical diversity using direct judgements. Lang. Assess. Q. 18:154–170. https://doi.org/10.1080/15434303.2020.1844205

Landauer TK, Laham D, Foltz PW (2003) Automated essay scoring and annotation of essays with the Intelligent Essay Assessor. In Shermis MD, Burstein JC (ed), Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 159–174

Laufer B, Nation P (1995) Vocabulary size and use: Lexical richness in L2 written production. Appl. Linguist. 16:307–322. https://doi.org/10.1093/applin/16.3.307

Lee J, Hasebe Y (2017) jWriter Learner Text Evaluator, URL: https://jreadability.net/jwriter/

Lee J, Kobayashi N, Sakai T, Sakota K (2015) A Comparison of SPOT and J-CAT Based on Test Analysis [Tesuto bunseki ni motozuku ‘SPOT’ to ‘J-CAT’ no hikaku]. Research on the Acquisition of Second Language Japanese [Dainigengo to shite no nihongo no shūtoku kenkyū] (18) 53–69

Li W, Yan J (2021) Probability distribution of dependency distance based on a Treebank of. Japanese EFL Learners’ Interlanguage. J. Quant. Linguist. 28(2):172–186. https://doi.org/10.1080/09296174.2020.1754611

Linacre JM (2002) Optimizing rating scale category effectiveness. J. Appl. Meas. 3(1):85–106

Linacre JM (1994) Constructing measurement with a Many-Facet Rasch Model. In Wilson M (ed) Objective measurement: Theory into practice, Volume 2 (pp. 129–144). Norwood, NJ: Ablex

Liu H (2008) Dependency distance as a metric of language comprehension difficulty. J. Cognitive Sci. 9:159–191

Liu H, Xu C, Liang J (2017) Dependency distance: A new perspective on syntactic patterns in natural languages. Phys. Life Rev. 21:171–193. https://doi.org/10.1016/j.plrev.2017.03.002

Loukina A, Madnani N, Cahill A, et al. (2020) Using PRMSE to evaluate automated scoring systems in the presence of label noise. Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, Seattle, WA, USA (held online), 10 July, pp. 18–29. Association for Computational Linguistics

Lu X (2010) Automatic analysis of syntactic complexity in second language writing. Int. J. Corpus Linguist. 15:474–496

Lu X (2012) The relationship of lexical richness to the quality of ESL learners’ oral narratives. Mod. Lang. J. 96:190–208

Lu X (2017) Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Lang. Test. 34:493–511

Lu X, Hu R (2022) Sense-aware lexical sophistication indices and their relationship to second language writing quality. Behav. Res. Methods 54:1444–1460. https://doi.org/10.3758/s13428-021-01675-6

Ministry of Health, Labor, and Welfare of Japan (2022) Retrieved from: https://www.mhlw.go.jp/stf/newpage_30367.html

Mizumoto A, Eguchi M (2023) Exploring the potential of using an AI language model for automated essay scoring. Res. Methods Appl. Linguist. 3:100050

Okgetheng B, Takeuchi K (2024) Estimating Japanese essay grading scores with large language models. Proceedings of the 30th Annual Conference of the Language Processing Society in Japan, March 2024

Ortega L (2015) Second language learning explained? SLA across 10 contemporary theories. In VanPatten B, Williams J (ed) Theories in Second Language Acquisition: An Introduction

Rae JW, Borgeaud S, Cai T, et al. (2021) Scaling Language Models: Methods, Analysis & Insights from Training Gopher. ArXiv, abs/2112.11446

Read J (2000) Assessing vocabulary. Cambridge University Press. https://doi.org/10.1017/CBO9780511732942

Rudner LM, Liang T (2002) Automated essay scoring using Bayes’ theorem. J. Technol. Learn. Assess. 1(2)

Sakoda K, Hosoi Y (2020) Accuracy and complexity of Japanese Language usage by SLA learners in different learning environments based on the analysis of I-JAS, a learners’ corpus of Japanese as L2. Math. Linguist. 32(7):403–418. https://doi.org/10.24701/mathling.32.7_403

Suzuki N (1999) Summary of survey results regarding comprehensive essay questions. Final report of “Joint Research on Comprehensive Examinations for the Aim of Evaluating Applicability to Each Specialized Field of Universities” for 1996–2000 [shōronbun sōgō mondai ni kansuru chōsa kekka no gaiyō. Heisei 8 - Heisei 12-nendo ‘daigaku no kaku senmon bun’ya e no tekisei no hyōka o mokuteki to suru sōgō shiken no arikata ni kansuru kyōdō kenkyū’ saishū hōkoku-sho]. University Entrance Examination Center Research and Development Department [Daigaku nyūshi sentā kenkyū kaihatsubu], 21–32

Taghipour K, Ng HT (2016) A neural approach to automated essay scoring. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 1–5 November, pp. 1882–1891. Association for Computational Linguistics

Takeuchi K, Ohno M, Motojin K, Taguchi M, Inada Y, Iizuka M, Abo T, Ueda H (2021) Development of essay scoring methods based on reference texts with construction of research-available Japanese essay data. IPSJ J. 62(9):1586–1604

Ure J (1971) Lexical density: A computational technique and some findings. In Coulthard M (ed) Talking about Text. English Language Research, University of Birmingham, Birmingham, England

Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention is all you need. Advances in Neural Information Processing Systems, Long Beach, CA, 4–7 December, pp. 5998–6008, Curran Associates, Inc., Red Hook, NY

Watanabe H, Taira Y, Inoue Y (1988) Analysis of essay evaluation data [Shōronbun hyōka dēta no kaiseki]. Bulletin of the Faculty of Education, University of Tokyo [Tōkyōdaigaku kyōiku gakubu kiyō], Vol. 28, 143–164

Yao S, Yu D, Zhao J, et al. (2023) Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36

Zenker F, Kyle K (2021) Investigating minimum text lengths for lexical diversity indices. Assess. Writ. 47:100505. https://doi.org/10.1016/j.asw.2020.100505

Zhang Y, Warstadt A, Li X, et al. (2021) When do you need billions of words of pretraining data? Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, pp. 1112-1125. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.90

Funding

This research was funded by the National Foundation of Social Sciences (grant no. 22BYY186), awarded to Wenchao Li.

Author information

Authors and affiliations

Department of Japanese Studies, Zhejiang University, Hangzhou, China

Department of Linguistics and Applied Linguistics, Zhejiang University, Hangzhou, China

Contributions

Wenchao Li was responsible for conceptualization, validation, formal analysis, investigation, data curation, visualization, and drafting the manuscript. Haitao Liu was responsible for supervision.

Corresponding author

Correspondence to Wenchao Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material file #1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Li, W., Liu, H. Applying large language models for automated essay scoring for non-native Japanese. Humanit Soc Sci Commun 11, 723 (2024). https://doi.org/10.1057/s41599-024-03209-9

Received: 02 February 2024

Accepted: 16 May 2024

Published: 03 June 2024

DOI: https://doi.org/10.1057/s41599-024-03209-9

