1. Introduction
2. Method
   2.1 Instruments used
3. Exploration and selection of norm data
   3.1 Age
   3.2 Sex
   3.3 Level of education
   3.4 Nationality
   3.5 Labour market position
   3.6 Work sector/industry
   3.7 Completion time
   3.8 Response variation
   3.9 Human response
4. Analysis
   4.1 Raw scores
   4.2 Correlations
   4.3 Reliability
   4.4 Construct validity / factor analysis
5. Norms
   5.1 Labour force
   5.2 Group differences
6. Conclusion
7. References

Introduction

This document provides insight into the psychometric properties of the Big Five Personality Test of 123test. This test, developed by 123test B.V., is an operationalisation of the Big Five personality theory.

The test measures the five main dimensions of personality and the 30 underlying facets. It is a scientific instrument with a high degree of reliability, high validity, and a representative, recently assembled norm group.

Information on reliability, validity and norm groups is presented in this document. It also discusses how the dimensions of this test vary as a function of level of education, sex and age.

Method

Since November 1, 2019, more than 500,000 responses of the Big Five Personality Test have been recorded on www.123test.com. By analyzing the data of these anonymous respondents, we can form a good picture of this instrument.

Instruments used

The Big Five Personality Test is free to use at https://www.123test.com/personality-test/. The Dutch equivalent of this questionnaire can be found at https://www.123test.com/nl/persoonlijkheidstest/ and can also be used free of charge.

Exploration and selection of norm data

In order to explore the gathered data, this chapter examines a number of background variables of the respondent in more detail. The selection criteria used for the final dataset are indicated for each component.

The complete dataset consists of 490,689 respondents. Based on criteria such as age, sex, educational level, nationality, labour market position, completion time and response variation, a final subset is selected. The analyses are run on this subset and the final norm is calculated from it.

Age

An age group of 18 to 67 years has been chosen, because this group best represents the working population of the Western world.

Sex

The sex of all respondents is known because this background question was mandatory. Notably, more women than men completed the test. Both sexes are included in the dataset, because together they best represent the labour population of the Western world.

Level of education

Because the Big Five Personality Test was developed specifically for people with an average to higher level of education, only a selection of education levels is included in the dataset. The blue-shaded education levels in the diagram are included.

Nationality

The number of nationalities represented in the original dataset is very large: 217 countries, dependencies and territories were each represented by more than 10 respondents.

Because the Big Five Personality Test was developed for the English-speaking market of the Western world, a selection of countries was made. Countries included in the final dataset are shaded blue.

Labour market position

The respondent was asked about his/her labour market position. Only the labour market positions Salaried employment, Self-employed/Freelancer and Officially unemployed were used in the dataset, because this group best represents the labour population of the Western world.

Work sector/industry

The respondent was asked to indicate in which working sector he/she works. A choice could be made from the 23 work sectors used in the model of EurOccupations (Wageindicator.org 2009). The distribution gives no reason to correct for this.

Completion time

The time taken to complete a questionnaire is a good indication of how seriously a respondent has completed it. Only responses with a completion time between 5 and 45 minutes are included in the final dataset.

Response Variation

Looking at a respondent’s response variation is a good way to determine how seriously the respondent has completed the questionnaire. It was decided to only include a response variation of 5 in the final dataset. A response variation of 5 means that a respondent has used all the answer options of the Likert-5 scale at least once over all 120 items.
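The response variation criterion can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the two shortened example respondents are hypothetical and are not taken from the 123test pipeline.

```python
def response_variation(answers):
    """Number of distinct Likert-5 answer options a respondent used."""
    return len(set(answers))


def passes_variation_filter(answers, required=5):
    # Keep a respondent only if all five answer options occur at least once.
    return response_variation(answers) == required


# Hypothetical respondents, shortened to 10 items for readability
# (the real questionnaire has 120 items).
varied = [1, 2, 3, 4, 5, 3, 2, 4, 1, 5]
flat = [3, 3, 3, 4, 3, 3, 4, 3, 3, 3]

print(passes_variation_filter(varied))  # True
print(passes_variation_filter(flat))    # False
```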

Human response

Online questionnaires can suffer from crawlers and bots that fill in the questionnaires automatically. A consistency measure makes it possible to exclude inconsistent responses from the dataset.

The psychometric synonyms consistency measure (Meade and Craig 2012) has been used to identify artificial and random responses. It is calculated by first selecting all item pairs that correlate > .60 across the entire dataset; in this dataset, 9 item pairs are selected. Next, the psychometric synonym score is calculated for each respondent. This score equals the within-person correlation across the selected item pairs.

The cut-off value of 0.2 used by Meade & Craig (2012) was used to filter artificial responses and responses with a random response pattern from the dataset. In the histogram below, the deleted responses are shaded gray.
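The two steps of the psychometric synonyms measure can be sketched as follows. This is a minimal illustration, not the code used for this study: the function names and the tiny 6-item dataset are hypothetical. Dropping responses whose score is undefined (no within-person variance) and using a "greater than or equal" comparison at the cut-off are assumptions as well.

```python
from itertools import combinations, product
from math import sqrt


def pearson(x, y):
    """Pearson correlation; None when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def select_synonym_pairs(data, threshold=0.60):
    """Step 1: item pairs whose correlation across all respondents exceeds the threshold."""
    n_items = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_items)]
    pairs = []
    for i, j in combinations(range(n_items), 2):
        r = pearson(columns[i], columns[j])
        if r is not None and r > threshold:
            pairs.append((i, j))
    return pairs


def synonym_score(row, pairs):
    """Step 2: within-person correlation between the two halves of the pairs."""
    first = [row[i] for i, _ in pairs]
    second = [row[j] for _, j in pairs]
    return pearson(first, second)


def keep_response(row, pairs, cutoff=0.2):
    # Assumption: an undefined score is treated as a failed check.
    score = synonym_score(row, pairs)
    return score is not None and score >= cutoff


# Hypothetical dataset: items (0,1), (2,3) and (4,5) are answered identically.
data = [[a, a, b, b, c, c] for a, b, c in product([1, 5], repeat=3)]
pairs = select_synonym_pairs(data)

consistent = [1, 2, 5, 4, 3, 3]  # answers synonym items alike
careless = [1, 5, 5, 1, 3, 3]    # answers synonym items in opposite ways

print(pairs)
print(keep_response(consistent, pairs), keep_response(careless, pairs))
```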

Analysis

The final dataset includes 15,107 respondents.

Raw scores

In this chapter the raw scores of all the facets and factors are presented. X-axes have been omitted because of possible unwanted reuse of the norm data.

Factors

The histograms of the raw factor scores all show an approximately normal distribution.

Correlations

Correlations between factors

The five factors generally show minimal correlations. Natural reactions shows clear negative correlations with Conscientiousness and Extraversion.

Reliability

Cronbach’s alpha (Cronbach and Shavelson 2004) is a measure of the reliability of psychometric tests or questionnaires. The value of alpha is an estimate for the lower limit of reliability of the test in question.
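As an illustration, Cronbach's alpha can be computed from a respondents-by-items score matrix with the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). The sketch below assumes sample variances and uses a made-up 4-item scale; the function name and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 4-item scale answered by five respondents.
scores = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 5, 4, 5],
]

print(round(cronbach_alpha(scores), 3))
```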

Factors

An often-used criterion for instruments used in advisory situations is that Cronbach's alpha should not be lower than .60. Values higher than .80 are assessed as ‘good’.

On average across the five factors, the reliability coefficient is 0.88, which may be considered very high.

Factors	Item count	Cronbach’s Alpha
Openness to experience	24	0.81564
Conscientiousness	24	0.90888
Extraversion	24	0.89249
Agreeableness	24	0.86265
Natural reactions	24	0.91921

If an item does not correlate sufficiently with the other items of the same factor, it damages the reliability of said factor. Below is shown what happens to the Cronbach’s Alpha of a factor when one of the 24 items is removed.

If item deleted	O	C	E	A	N
1	0.806	0.904	0.885	0.859	0.916
2	0.808	0.905	0.884	0.857	0.914
3	0.809	0.907	0.885	0.859	0.914
4	0.809	0.906	0.887	0.857	0.912
5	0.800	0.906	0.884	0.855	0.916
6	0.806	0.906	0.884	0.855	0.915
7	0.805	0.904	0.887	0.854	0.917
8	0.804	0.905	0.885	0.858	0.917
9	0.811	0.906	0.888	0.857	0.913
10	0.811	0.907	0.888	0.855	0.914
11	0.814	0.908	0.888	0.855	0.913
12	0.814	0.905	0.888	0.855	0.916
13	0.814	0.905	0.889	0.861	0.917
14	0.811	0.906	0.890	0.859	0.918
15	0.814	0.904	0.890	0.853	0.919
16	0.807	0.907	0.894	0.856	0.919
17	0.807	0.904	0.889	0.856	0.918
18	0.806	0.904	0.889	0.866	0.920
19	0.806	0.903	0.898	0.865	0.920
20	0.805	0.905	0.897	0.861	0.918
21	0.809	0.907	0.887	0.859	0.913
22	0.817	0.905	0.886	0.857	0.914
23	0.813	0.905	0.888	0.855	0.914
24	0.812	0.903	0.888	0.857	0.916
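The "alpha if item deleted" figures in the table above can be reproduced in principle by recomputing alpha once per item with that item left out. The sketch below is a hypothetical illustration on a made-up 4-item scale in which the last item is unrelated to the others; the function names and data are assumptions.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha of the remaining items after dropping each item in turn."""
    k = len(rows[0])
    result = []
    for drop in range(k):
        reduced = [[v for j, v in enumerate(row) if j != drop] for row in rows]
        result.append(cronbach_alpha(reduced))
    return result


# Hypothetical data: items 0-2 move together, item 3 does not.
scores = [
    [1, 1, 2, 3],
    [2, 2, 2, 5],
    [3, 3, 4, 1],
    [4, 5, 4, 4],
    [5, 5, 5, 2],
]

full = cronbach_alpha(scores)
per_item = alpha_if_item_deleted(scores)
for index, value in enumerate(per_item):
    marker = "raises alpha" if value > full else "lowers alpha"
    print(f"item {index}: {value:.3f} ({marker})")
```

An item whose removal raises alpha, like the last item here, is a candidate for revision because it correlates poorly with the rest of the scale.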

Facets

The average Cronbach’s Alpha of the 30 facets is 0.753, which is a good performance considering that each facet scale consists of only four items.

Factors	Facets	Item count	Cronbach’s Alpha
Openness to experience	Facet: Imagination	4	0.77087
	Facet: Artistic interests	4	0.73223
	Facet: Depth of emotions	4	0.65264
	Facet: Willingness to experiment	4	0.66261
	Facet: Intellectual curiosity	4	0.69076
	Facet: Tolerance for diversity	4	0.49487
Conscientiousness	Facet: Sense of competence	4	0.72658
	Facet: Orderliness	4	0.83003
	Facet: Sense of responsibility	4	0.69670
	Facet: Achievement striving	4	0.75943
	Facet: Self-discipline	4	0.74301
	Facet: Deliberateness	4	0.86425
Extraversion	Facet: Warmth	4	0.80829
	Facet: Gregariousness	4	0.81566
	Facet: Assertiveness	4	0.86924
	Facet: Activity level	4	0.71323
	Facet: Excitement seeking	4	0.65720
	Facet: Positive emotions	4	0.81739
Agreeableness	Facet: Trust in others	4	0.84768
	Facet: Sincerity	4	0.74710
	Facet: Altruism	4	0.73092
	Facet: Compliance	4	0.66146
	Facet: Modesty	4	0.73948
	Facet: Sympathy	4	0.72971
Natural reactions	Facet: Anxiety	4	0.82595
	Facet: Angry hostility	4	0.86720
	Facet: Moodiness/Contentment	4	0.85947
	Facet: Self-consciousness	4	0.70584
	Facet: Self-indulgence	4	0.76216
	Facet: Sensitivity to stress	4	0.79752
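The reported facet average can be checked directly from the table above. The sketch below copies the 30 alpha values from the table; applying the .60 rule of thumb to individual facets is an assumption made here for illustration, since the text states that criterion in the context of the factors.

```python
# Cronbach's alpha per facet, copied from the table above.
facet_alphas = {
    "Imagination": 0.77087, "Artistic interests": 0.73223,
    "Depth of emotions": 0.65264, "Willingness to experiment": 0.66261,
    "Intellectual curiosity": 0.69076, "Tolerance for diversity": 0.49487,
    "Sense of competence": 0.72658, "Orderliness": 0.83003,
    "Sense of responsibility": 0.69670, "Achievement striving": 0.75943,
    "Self-discipline": 0.74301, "Deliberateness": 0.86425,
    "Warmth": 0.80829, "Gregariousness": 0.81566,
    "Assertiveness": 0.86924, "Activity level": 0.71323,
    "Excitement seeking": 0.65720, "Positive emotions": 0.81739,
    "Trust in others": 0.84768, "Sincerity": 0.74710,
    "Altruism": 0.73092, "Compliance": 0.66146,
    "Modesty": 0.73948, "Sympathy": 0.72971,
    "Anxiety": 0.82595, "Angry hostility": 0.86720,
    "Moodiness/Contentment": 0.85947, "Self-consciousness": 0.70584,
    "Self-indulgence": 0.76216, "Sensitivity to stress": 0.79752,
}

mean_alpha = sum(facet_alphas.values()) / len(facet_alphas)
# Assumption: the .60 rule of thumb is applied per facet.
below_threshold = [name for name, a in facet_alphas.items() if a < 0.60]

print(round(mean_alpha, 3))  # 0.753
print(below_threshold)       # ['Tolerance for diversity']
```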

Construct Validity: Factor Analysis

Screeplot

In factor analysis, a screeplot or eigenvalue diagram is a graph in which the eigenvalues of the extracted components are plotted in order of decreasing magnitude.

In the table below you can see that there are 5 clear components (PC) with an eigenvalue > 1.0. This corresponds with well-known scientific literature which states that personality contains 5 components.
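The "eigenvalue > 1.0" rule can be sketched as a simple count over the sorted eigenvalues. The eigenvalues below are made-up numbers chosen only to illustrate the rule; they are not the eigenvalues of this dataset, and the function names are hypothetical.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Eigenvalues above the threshold, in decreasing order."""
    ordered = sorted(eigenvalues, reverse=True)
    return [ev for ev in ordered if ev > threshold]


def explained_variance(eigenvalues):
    """Share of total variance per component, in decreasing order."""
    total = sum(eigenvalues)
    return [ev / total for ev in sorted(eigenvalues, reverse=True)]


# Hypothetical eigenvalues (NOT taken from the 123test data).
example = [0.7, 6.1, 2.3, 0.9, 3.4, 1.6, 4.2, 0.8]

kept = retained_components(example)
print(len(kept))  # 5
print([round(share, 2) for share in explained_variance(example)])
```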

Principal Components Analysis

Principal component analysis is a multivariate method of analysis in statistics to describe a large amount of data with a smaller number of relevant quantities, the main components or principal components.

The table below shows the results of a PCA with varimax rotation. The 30 facets can clearly be reduced to the five components to which they belong according to the theoretical model of the Big Five. The dominant factor Extraversion attracts a lot of variance, especially in the form of negative loadings of Natural reactions facets. Only a few facets have a higher primary loading on another component.

All in all, the analysis shows a very recognizable and satisfactory picture.

Factor	Code	Facet	RC1	RC2	RC3	RC4	RC5
Extraversion	E1	Facet: Warmth	0.828
	E2	Facet: Gregariousness	0.832
	E3	Facet: Assertiveness	0.481	0.513
	E4	Facet: Activity level	0.436	0.493
	E5	Facet: Excitement seeking	0.574
	E6	Facet: Positive emotions	0.69
Conscientiousness	C1	Facet: Sense of competence		0.801
	C2	Facet: Orderliness		0.578
	C3	Facet: Sense of responsibility		0.529
	C4	Facet: Achievement striving		0.727
	C5	Facet: Self-discipline		0.784
	C6	Facet: Deliberateness		0.432		-0.574
Agreeableness	A1	Facet: Trust in others	0.454		0.407
	A2	Facet: Sincerity			0.65
	A3	Facet: Altruism			0.793
	A4	Facet: Compliance			0.595
	A5	Facet: Modesty			0.556
	A6	Facet: Sympathy			0.706
Natural reactions	N1	Facet: Anxiety				0.69
	N2	Facet: Angry hostility				0.708
	N3	Facet: Moodiness/Contentment				0.552
	N4	Facet: Self-consciousness	-0.729
	N5	Facet: Self-indulgence				0.503
	N6	Facet: Sensitivity to stress				0.638
Openness to experience	O1	Facet: Imagination					0.587
	O2	Facet: Artistic interests					0.689
	O3	Facet: Depth of emotions			0.701
	O4	Facet: Willingness to experiment					0.543
	O5	Facet: Intellectual curiosity					0.746
	O6	Facet: Tolerance for diversity					0.595
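To see which facets load primarily on a component other than their own, the loadings in the table above can be scanned for the largest absolute value per facet. The sketch below copies the loadings from the table. The assignment of loadings to columns RC1 to RC5 follows the tab positions in the table, and the mapping of factors to components (Extraversion = RC1, Conscientiousness = RC2, Agreeableness = RC3, Natural reactions = RC4, Openness = RC5) is an assumption inferred from where most facets of each factor load.

```python
# Loadings copied from the table above: facet code -> {component: loading}.
loadings = {
    "E1": {"RC1": 0.828}, "E2": {"RC1": 0.832},
    "E3": {"RC1": 0.481, "RC2": 0.513},
    "E4": {"RC1": 0.436, "RC2": 0.493},
    "E5": {"RC1": 0.574}, "E6": {"RC1": 0.69},
    "C1": {"RC2": 0.801}, "C2": {"RC2": 0.578}, "C3": {"RC2": 0.529},
    "C4": {"RC2": 0.727}, "C5": {"RC2": 0.784},
    "C6": {"RC2": 0.432, "RC4": -0.574},
    "A1": {"RC1": 0.454, "RC3": 0.407},
    "A2": {"RC3": 0.65}, "A3": {"RC3": 0.793}, "A4": {"RC3": 0.595},
    "A5": {"RC3": 0.556}, "A6": {"RC3": 0.706},
    "N1": {"RC4": 0.69}, "N2": {"RC4": 0.708}, "N3": {"RC4": 0.552},
    "N4": {"RC1": -0.729}, "N5": {"RC4": 0.503}, "N6": {"RC4": 0.638},
    "O1": {"RC5": 0.587}, "O2": {"RC5": 0.689}, "O3": {"RC3": 0.701},
    "O4": {"RC5": 0.543}, "O5": {"RC5": 0.746}, "O6": {"RC5": 0.595},
}

# Assumed mapping from factor letter to its "own" component.
expected = {"E": "RC1", "C": "RC2", "A": "RC3", "N": "RC4", "O": "RC5"}


def primary_component(facet_loadings):
    """Component with the largest absolute loading for one facet."""
    return max(facet_loadings, key=lambda rc: abs(facet_loadings[rc]))


off_factor = [
    code for code, row in loadings.items()
    if primary_component(row) != expected[code[0]]
]
print(off_factor)  # ['E3', 'E4', 'C6', 'A1', 'N4', 'O3']
```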

Norms

Labour force

For a correct norm group, the dataset must properly reflect the intended group of users, in this case the Western world labour force. Because a dataset almost never has the same composition as the intended user group, weighting is used.

The dataset is weighted according to the distribution in the table below.

Criterion	Groups	Population
Sex	Female	50.50%
	Male	49.50%
Education	Average education	62.60%
	Higher education	37.40%
Age	15-24	17.40%
	25-44	45.50%
	45-64	37.10%
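One common way to weight a sample towards a known population distribution is to give each group the weight "population share divided by sample share". The sketch below applies this to the sex criterion only. The document does not state which weighting procedure was used, so this is an assumption, and the sample counts and function name are hypothetical. Weighting on sex, education and age together would need a joint scheme such as cell weighting or raking.

```python
# Population shares for sex, taken from the table above.
population_share = {"Female": 0.505, "Male": 0.495}

# Hypothetical sample counts (NOT the counts of this dataset).
sample_counts = {"Female": 600, "Male": 400}


def poststratification_weights(counts, target_share):
    """Weight per group = target population share / observed sample share."""
    n = sum(counts.values())
    return {group: target_share[group] / (counts[group] / n) for group in counts}


weights = poststratification_weights(sample_counts, population_share)

# After weighting, the sample reproduces the population shares.
n_total = sum(sample_counts.values())
weighted_share = {
    group: sample_counts[group] * weights[group] / n_total
    for group in sample_counts
}

print({group: round(w, 4) for group, w in weights.items()})
print({group: round(s, 3) for group, s in weighted_share.items()})
```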

A much-used standard for norm groups in ‘advisory’ situations is that the norm group should consist of >200 respondents. For recruitment and selection purposes this is >400. This dataset includes 15,107 respondents and therefore very clearly meets this standard.

Group differences

If there are significant differences between relevant groups within a norm group, this could and should be corrected by using separate norm groups.

For comparing group averages and determining effect size, Cohen’s d is used (Cohen 1992). Effect sizes close to zero are small; effect sizes larger than 0.8 or smaller than -0.8 are often considered large.
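Cohen's d expresses the difference between two group means in units of standard deviation. The sketch below uses the pooled standard deviation, which is a common choice. The document does not state which variant was used, so this is an assumption, and the function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d (group_a minus group_b) using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical raw scores for two groups.
group_a = [3, 4, 5, 6, 7]
group_b = [2, 3, 4, 5, 6]

d = cohens_d(group_a, group_b)
print(round(d, 3))
print("large effect" if abs(d) > 0.8 else "not a large effect")
```

The sign of d depends on the order of the groups, which is why the tables state the direction of the comparison in the column header.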

Sex

To determine whether norms are needed for specific groups, group differences between the sexes have been examined. The results of these analyses are shown below.

Effect size

Factor	Cohen’s D (Male-Female)
Openness to experience	0.128
Conscientiousness	0.121
Extraversion	-0.024
Agreeableness	0.543
Natural reactions	0.252

Given that no effect sizes beyond -0.8 or 0.8 have been found, it can be concluded that the use of a single norm for the sexes is justified.

Age

In order to determine whether norms are needed for specific groups, group differences between age groups were taken into account. The results of these analyses are shown below.

Effect size

Factor	Cohen’s D (Older-Younger)
Openness to experience	0.156
Conscientiousness	-0.724
Extraversion	-0.151
Agreeableness	-0.552
Natural reactions	0.705

Given that no effect sizes beyond -0.8 or 0.8 have been found, it can be concluded that the use of a single norm for age is justified.

Level of education

In order to determine whether norms are needed for specific groups, group differences between education levels have been examined. The results of these analyses are shown below.

Effect size

Factor	Cohen’s D (University-Highschool)
Openness to experience	-0.330
Conscientiousness	-0.410
Extraversion	-0.242
Agreeableness	-0.312
Natural reactions	0.334

Given that no effect sizes beyond -0.8 or 0.8 have been found, it can be concluded that the use of a single norm for education level is justified.

Conclusion

The results of this study show that the Big Five Personality Test of 123test is a reliable and valid instrument with a solid norm. It is suited to Western world respondents aged between 18 and 67 years with an average to higher educational level, for self-analysis, career guidance or other professional settings.

Reliability

The results of this study show that the Big Five Personality Test of 123test scores well to very well on the reliability coefficients commonly used in science.

Validity

The results of this study show that the Big Five Personality Test of 123test shows good construct validity of the measured constructs.

Norms

The results of this study show that the Big Five Personality Test of 123test has a good norm that shows no large differences between groups: no effect size for sex, age or level of education exceeds 0.8 in absolute value.

References

Cohen, Jacob. 1992. “A Power Primer.” Psychological Bulletin 112 (1). American Psychological Association: 155.

Cronbach, Lee J., and Richard J. Shavelson. 2004. “My Current Thoughts on Coefficient Alpha and Successor Procedures.” Educational and Psychological Measurement 64 (3): 391–418.

Meade, Adam W, and S Bartholomew Craig. 2012. “Identifying Careless Responses in Survey Data.” Psychological Methods 17 (3). American Psychological Association: 437.

Wageindicator.org. 2009. “EurOccupations.”

Technical documentation Personality test

Bart Dekker MSc. Dr. Edwin van Thiel
