Please Don't Put Me in a Box

Why I vehemently distrust all personality assessments

May 20, 2024

Years ago, I was facilitating a teambuilding workshop for a software engineering group. The standard for the company was to use DiSC as a way to assess and scaffold a workshop, and then discuss the results and bring them into some other activities to help teams find ways of working that were appropriate for the “makeup” of the group.

While doing the recommended activity of sorting the room based on DiSC styles (as in, each person stood in the corner of the room or on the wall that corresponded to their respective style), I saw faces change. The best way I can describe it is that everyone suddenly had the curly Grinch smile on their faces — a pleased, patently mischievous grin. I voiced my observation and asked folks what they were experiencing. One woman on the team simply said, “THIS is why none of us get along.”

At that moment, I was done ever recommending DiSC to another team. Some of my colleagues challenged me when I relayed this story. The team recognized their differences! They were working toward a better understanding of each other! You just have to keep using it, they will eventually come to see how they can work together! But I knew from the continued conversations with this team that all the experience did was give them reasons to see each other as different. They held on to the pictures of their colleagues standing on the opposite side of the room, and they were justified in all of the stories they told themselves about why they didn’t like each other.

At this point you might be thinking, DiSC is like that, but Enneagram (or Myers-Briggs, or Colors, or…) isn’t. And trust me, I have explored all of these questions. And I can’t deny that I know quite a few people who have said that doing personality assessments alongside their team helped them understand how to communicate with each other better, etc. But once I started digging into the origins, evidence (or lack thereof), biases, and uses of these assessments, I was increasingly convinced that these tools are doing more harm than good.

Important note: Several assessments are used in the medical and psychological fields to assist in diagnosis. These tend to be much more robust, well-researched and maintained. I’m limiting this article to popular assessments that are accessible to the public.

Origins & Evidence

Myers-Briggs was one of the earlier assessments developed, in about 1943. Prior to that, personality assessments were mostly in the realm of pseudo-science, claiming to identify personal traits based on bumps in the skull or particular physical traits. During World War I, the U.S. Army developed the Woodworth Personal Data Sheet to assess the likelihood of shell shock among draftees. There isn’t a lot of information about the accuracy or specific development of this test, but it is considered to be a jumping-off point for many assessments that followed.

The folks who created the Myers-Briggs test, Katharine Briggs and her daughter Isabel, did it to investigate the unlikely connection between Isabel and her fiancé, Clarence Myers. They then “validated” the questions by matching them to people they knew well. From the get-go, Myers-Briggs used Jungian psychology as a basis and asserted that your type on each of four dimensions (perception, judgment, source of energy, and orientation to the outer world) was set from birth.

While Jung is still discussed a lot in certain circles, his ideas (and in particular the baseline ideas that were used to create the Myers-Briggs test) are not scientifically sound. The collection of assumptions at the basis of the test is staggering: that there is any empirical evidence to support Jung’s theories, that the spectrums described are truly spectrums, that there are in fact “true types” in personalities (studies have shown that most people who take the test twice get different results), that preferences can accurately predict personality type, etc. This paper is the absolute best summary of all of this and even goes so far as to say that collective buy-in to Myers-Briggs testing is an indicator of a post-truth world.

Stepping back for a moment from specific tests, all of them can be categorized as “self-report inventories,” meaning that the results are based on individuals responding to questions meant to assess their own qualities. Right off the bat, we can see how this would be limiting. Even a well-meaning person taking the assessment can be subject to a variety of biases or fantasies in their self-concept. It’s also incredibly easy for someone to report answers that reflect how they see themselves ideally, rather than how they are. Mood, context, and level of self-awareness all impact the responses.

When the results come in, there is another opportunity for inaccuracy. Confirmation bias is common, where someone will latch onto the elements in the test results or descriptions they already believed about themselves, and use them to justify or confirm those characteristics. Pair that with the Barnum effect, where people are likely to believe descriptions about themselves are incredibly accurate, even though they were written to be so vague that anyone could relate to them. Finally, there’s self-serving bias. In this scenario, someone could take the more pleasant elements of a description and use them to validate their own goodness but attribute anything that didn’t resonate or felt like a negative quality to the inaccuracy of the test itself (or some other external factor).

Most companies that manage these assessments will tout their commitment to research and assert that they are accurate and science-backed. Don’t be fooled. What they are researching and measuring is something like whether or not people feel the tests are accurate, or the reliability (whether the test gives consistent results). What they can’t measure is the validity: whether or not the assessment actually measures something real.

That brings us to DiSC, which was developed based on the ideas of William Moulton Marston but turned into an assessment in 1956 by Walter Clarke. This tool was specifically built for the workplace, asserting that it would help businesses choose employees and (somehow) improve performance. It uses four primary traits to measure how emotions are displayed in someone’s environment: dominance, influence, steadiness, and compliance.

Enneagram seems to be the latest hot personality test, and I’ve had many people tell me it’s “more accurate” or somehow just better than the others. So let’s dig into that. The Enneagram Institute describes the origin of the test as “a modern synthesis of a number of ancient wisdom traditions.” Solid start. The site goes on to describe how the creator, Oscar Ichazo, studied many different cultures, religions and traditions to develop the Enneagram.

“Ichazo saw the Enneagram as a way of examining specifics about the structure of the human soul and particularly about the ways in which actual soul qualities of Essence become distorted, or contracted into states of ego.”

No disrespect to any spiritual tradition, but for me that tells me everything I need to know about the scientific rigor given to this assessment. I feel pretty comfortable categorizing it as pseudo-science, as do lots of other social scientists and psychologists. (Post-publication edit: Someone on LinkedIn wisely pointed out that Enneagram doesn’t purport to be a science-backed tool, so therefore it’s not so much pseudo-science as it is non-science. I appreciated this take and it gives me a bit more respect for the tool, which many people said has helped them on their spiritual and personal journeys. How it’s allowed to be used in workplaces, I’m still not sure.)

CliftonStrengths aims to help you uncover your core strengths and delivers a report of either your top five strengths or a full ranking of all 34. Because it’s so focused on helping you assess what you’re good at, and doesn’t much emphasize your relationships with others or have you identify with a “type,” I’ve always found it to be the least nefarious. However, there isn’t any evidence for this approach either, and it has the same problems the others do in terms of validity.

Hogan is the funniest in my opinion, because they have built their marketing around how inaccurate other assessments are, and how much better theirs is. They do this essentially by saying that accuracy increases if you take a combination of assessments. Spoiler: it’s just marketing.

There are lots of others. There is Colors, which is outlined so perfectly in this article about how the Swedish people were swindled in 2014 by its creator. It uses the same concepts from the originator of DiSC (who also, oddly enough, created Wonder Woman!?). I won’t outline all the others, but you’ll find that they have mostly the same backstory and issues.

Lastly, any assertion that your style will never change is asinine. The one that asserts this claim the most is Myers-Briggs (btw - if you are a glutton for punishment, read that linked article - it’s a painful set of blatant contradictions with no logical explanation). There are a lot of reasons why this is a specious claim, but my go-to is to just think of all the developments we’ve made in understanding trauma in the past century. Are we really going to argue that someone with deep childhood trauma who does intense self-healing will always have the same personality as when they were six?

Some assessments acknowledge that the results will change depending on the context in which you take the assessment. For example, if you take it thinking about work scenarios vs. thinking about home life scenarios, your results will be different. And your mood can affect your results too. While I appreciate the acknowledgment that styles can change based on context (anyone who has done survey administration knows how much moods and contexts can change the results), I have to wonder: if the results can change so much based on these different contexts, what good are they? Do they really tell you anything?

So, how can we do the least harm?

Here’s where I say some things that make you think I’m a hypocrite. But stay with me. Despite everything I said above, I love tarot cards. I do readings regularly for myself when I’ve got sticky things I’m trying to work out, and I even have a tattoo of a tarot card on my arm. Do I believe they have some magical power or any basis in science? Nope. But they do provide a fun change in perspective when I’m feeling stuck. Beyond that, I’m not as into astrology as a lot of people, but I do enjoy reading horoscopes for fun and often find resonance with my sun and rising signs. And if someone asks me for my DiSC or Myers-Briggs or Enneagram, I know the answer. (Though, that’s mostly because previous jobs have made me take the tests.)

I’m a big fan of people looking for insights into themselves and building their self-awareness in any way they can. I would never tell someone not to take a personality assessment as an experimental mirror to get some external information that might help them understand themselves. Just like tarot cards, tools like this are meant to be relatable and recognizable in their descriptions. So if you resonate powerfully with how an Enneagram 2 is described (my type, btw), then by nature of that resonance (and hopefully a good amount of reflection on it), you might learn some helpful things about yourself. Hopefully you also pair it with a number of other tools for self-awareness, including talking to actual people who you interact with, and seeking therapy.

All that said, I have major problems with these assessments when either or both of these are involved: identification (with that style or quality) and comparison.

If you begin to identify with a particular type, you inadvertently limit your self-concept and development as a human. There are so many camps of thinkers regarding how the self develops, how identity develops, etc., but I’m a firm believer that flexibility is key to mental health, learning and growing. I have met a lot of people who are so convinced that they are introverts (for example) that they use that identity to justify a lot of behaviors that hold them back in one way or another. This is not self-awareness; it is identification with a shallow, prescribed, generic definition of a quality. And by reducing ourselves to these qualities instead of doing the work to truly grow our self-awareness, we are doing ourselves a disservice.

The other danger is in exactly what I experienced with that team years ago. When we use a personality assessment as a way to rationalize our differences (or even similarities) with others, we construct artificial rooms in our minds where some people are allowed in and others aren’t. This construct can be incredibly limiting to our ability to find connection and common ground with people.

Finally, these assessments, when used at the team level, can let managers and leaders off the hook regarding the role they play in creating healthy team environments. This quote from an HBR article is a really nice summary:

“Research suggests that many beliefs held by HR professionals about personality screening run counter to scientific evidence. And management scholars worry that fixating on personality as the primary source of conflict at work can cause managers to overlook the crucial role they play in creating the enabling conditions for teams to succeed—whatever their composition.”

What to stay far away from

The most egregious use of personality assessments is in hiring, or any decision-making, really. The idea that we could have such an accurate description of a role and such an accurate understanding of the styles of the existing team that we could prescribe what style would thrive best in that role and with the team is, in a word, absurd. Given what we know about the accuracy of these tests (and frankly, the accuracy of most role definitions), this is a recipe for not only leading yourself away from really great candidates but also likely exhibiting a whole lot of bias (read: sexism, racism, ableism, classism, etc.) in your process. Stick to good old interviewing. (To their credit, the Myers & Briggs Foundation has recommended against using the test for hiring or job assignment decisions, but unfortunately, that hasn’t stopped many companies.)

I will never again use a personality assessment as a part of teambuilding. I have colleagues who build this into their programs, and with exceptionally expert facilitation plus a group of participants who are reasonably self-aware already, I suppose it could be beneficial. I believe that is the exception, not the rule.

The Crux

Hopefully, if any of this information is new to you, you’ll at least come to personality assessments with a little more discernment than previously. But if you’re like me, you might be asking, How the heck are these still around!? I believe the answer lies somewhere between our desire to reduce and simplify complexity and our desire for accessible tools to help us know ourselves better. These are basic human desires, and it’s understandable why we would latch onto the promise of tools that seem to help us get there. I’ll tackle this topic of complexity reduction in a future article because it’s a particular passion of mine. For now, hopefully we all think twice before we put ourselves or each other in boxes.

Zach Holloway

First, this: "Mood, context, and level of self-awareness all impact the responses." Nothing to add there, but "mmmmhmmmm!"

Second, I'm fascinated by the history of these tests, so thanks for dropping all the links. I can't help but think about something like the DiSC as an example of how much we want to create certainty or rationale amongst a world of chaos. If I'm an employer/manager and I can use a test to 1) forecast who might be a good "fit" or 2) "define" a person's working style or personality based on the results of that, it's much easier to explain away conflict in the workplace, someone leaving a role, or an employee "just not working out". It could certainly be a shortcut around the more difficult work of growing in understanding of people on their own terms and through day-to-day interaction.

Thanks for sharing your thoughts here—will definitely be noodling on this for a bit!
