Using Student Evaluations to Assess Teaching Effectiveness

The purpose of this guide is to provide both faculty and administrators with information about the utility of Student Evaluations of Teaching (SETs) and the biases associated with them.

Use SET Ratings to:

Estimate overall teaching effectiveness. Provide general feedback that instructors can use to make positive changes in their teaching practice.

Demonstrate that an instructor uses teaching practices positively associated with student achievement. Pay special attention to items that correspond to the following rating categories:

Preparation and organization of the course
Clarity and ability to be understood
Pursuance and/or accomplishment of course objectives
Motivation of students to do their best and requiring high standards of performance
Stimulation of student interest in the course or subject matter
Encouragement of questions/discussion and openness to the opinions of others
Presentation skills
Knowledge of subject matter

Study potential biases at the department and campus level, as well as validity and reliability of the instruments themselves, so that the instruments can be improved over time.

Avoid Using SET Ratings to:

Infer relationships between teaching effectiveness and student learning. There is little to no evidence that SET scores are aligned with how much students learn or how successful students will be in future courses.

Demonstrate incremental improvements in teaching. SET ratings are relatively stable from semester to semester, and only the faculty members who start out with comparatively low scores are likely to show significant improvement in their scores over time.

Compare faculty members and courses. A growing body of literature suggests that SET ratings are affected by a multitude of factors, including ones outside the control of faculty. These may or may not be salient factors within your department, but it is nonetheless worthwhile to be aware of these potential biases and trends when reviewing SET scores:

Student perceptions of faculty personality (via verbal and non-verbal behaviors) confound students’ ratings of instruction, such that students may be rating likability over teaching quality.
Upper-division students value different aspects of teaching than do their lower-division counterparts, even within the same discipline, so they may rate the same instructor differently.
Because male students tend to give lower ratings than female students, courses that have an uneven sex ratio cannot be reasonably compared to ones that have an equal or opposite sex ratio.
Science and engineering students tend to give lower ratings than social science, humanities, and fine arts students (regardless of gender), making comparisons between disciplines, and between courses for majors and courses for non-majors, unreasonable.
Students who perceive faculty as grading leniently may give higher ratings and students who perceive faculty as grading stringently may give lower ratings.
Students who expect to receive better grades may give higher ratings and those who expect to receive lower grades may give lower ratings.
Even though there is a positive association between perceptions of rigor (based upon workload, difficulty of material, time, or effort) and student learning, students may give lower ratings for courses they perceive as rigorous.
Students may give lower ratings to older faculty than younger faculty, even when instructor experience is held constant.
Students may give lower ratings to minority faculty than non-minority faculty.
Students may give lower ratings to female faculty than male faculty.
Students may give lower ratings to large classes than small classes (there is little to no correlation between size and ratings for mid-sized classes).

Remember, SETs Provide a Single Source of Evidence
Just as you would not want your students to construct arguments, draw conclusions, or evaluate scenarios based upon a single source of evidence, you would not want your colleagues to evaluate teaching based upon a single source of evidence. Multiple measures are therefore needed to describe and evaluate someone’s teaching, especially for the purposes of tenure, promotion, and merit pay. These other measures can include:

Review of course materials (formative and summative)
Classroom observations performed by peers (formative) and senior faculty (formative and summative)
Teaching portfolios
Student comments collected in student focus groups or interviews
Student comments on SETs and mid-semester evaluations
Unsolicited comments made by students
Samples of students’ work, preferably with instructor feedback, completed rubrics, etc.
Samples of written communication with students
Records of student achievement after leaving the course and/or institution
Teaching philosophy statements
Statements/records of the instructors’ activities and achievements regarding advising students, mentoring future/new faculty, and the scholarship of teaching and learning (SoTL)
Statements of the instructors’ short- and long-term teaching goals
Self-evaluations and reflections, including changes instituted following evaluations, faculty development, and research on teaching

How the Center for Teaching and Learning Can Assist

Consult with faculty about SET ratings and teaching strategies. See Tips for Using Student Evaluations to Help Students Learn for more information.
Perform formative classroom observations
Administer student focus groups
Assist in developing mid-semester evaluations
Work with faculty, chairs, or deans to identify and develop methods for evaluating teaching, such as peer review programs and teaching portfolios
Assist in reviewing student evaluation ratings and comments

References and Resources

Berk, R.A. (2005). Survey of 12 strategies to measure teaching effectiveness. International Journal of Teaching and Learning in Higher Education, 17(1), 48-62. Retrieved from: http://www.isetl.org/ijtlhe/pdf/IJTLHE8.pdf

Chism, N.V.N. (2007). Peer Review of Teaching: A Sourcebook (2nd ed.). Bolton, MA: Anker Publishing.

Feldman, K.A. (1998). Identifying exemplary teachers and teaching: evidence from student ratings. In K.A. Feldman & M.B. Paulsen (Eds.), Teaching and Learning in the College Classroom. Needham Heights, MA: Simon & Schuster.

Murray, H.G. (2007). Low inference teaching behaviors and college teaching effectiveness: recent developments and controversies. In R.P. Perry and J.C. Smart (Eds.), The Scholarship of Teaching and Learning in Higher Education: An Evidence-Based Perspective. Dordrecht, Netherlands: Springer. Retrieved from: https://link.springer.com/chapter/10.1007%2F1-4020-5742-3_6

Onwuegbuzie, A.J., Daniel, L.G., & Collins, K.M. (2009). A meta-validation model for assessing the score-validity of student teaching evaluations. Quality and Quantity, 43(2), 197-209.

Seldin, P. (1993). The use and abuse of student ratings of professors. Chronicle of Higher Education, 21, 40.

Authored by Sarah Lang (May, 2010)

Revised by Sarah Lang (October, 2011)

Revised by Terri Tarr (April, 2020)

Recommended Books

Developing a Comprehensive Faculty Evaluation System - Raoul A. Arreola
Call Number: LB2333 .A77 2007
ISBN: 1933371110
Publication Date: 2006-10-15

Peer Review of Teaching - Nancy Van Note Chism; W. J. McKeachie (Foreword by); Grady W. Chism (Contribution by)
Call Number: LB2333 .C49 2007
ISBN: 1933371218
Publication Date: 2007-06-04

The Teaching Portfolio - Peter Seldin; J. Elizabeth Miller; Clement A. Seldin; Wilbert McKeachie (Foreword by)
Call Number: LB2333 .S46 2010
ISBN: 0470538090
Publication Date: 2010-08-30

Faculty Evaluation from Faculty Focus

As part of the educational assessment process, faculty evaluation attempts to assess and quantify the effectiveness of teaching professionals. Turn to Faculty Focus for tips and techniques.
