One last time on the email validations (and phone numbers, and addresses…)

How many of you have sat in meetings, or argued in other ways, about how to validate email addresses correctly?

User flow with the drop-off points added

TLDR: the simple answer is DON’T do it; it is not worth the effort. There is only one way: send an email and see whether the user clicks a link in it. That is it.

And now the longer version. In this article, I use the word “validation” in the sense of blocking progress on a screen, or not saving data to a database, until a validation rule is satisfied.

During my decade and a half in the software development industry, I have had more than my fair share of this argument (and you can add postal codes and phone numbers to the same problem, or entire delivery addresses for that matter!). Early in my career I committed a huge amount of time to implementing the fanciest format validations, sometimes regex-based, sometimes code-based, and double-entry fields with copy-paste blocked, on different platforms (Silverlight, ASP.NET, React, NodeJS, you name it), only to realize that it is a futile and, in the end, useless effort without any impact.

Let’s get to the root of the problem: why do you want email validation in the first place? There are a couple of reasons; let’s take them one by one:

To know it is an address I can send email to

You can come up with any fancy local validation, but it will never prove you can deliver an email to that address until you try. For example, it might be a perfectly valid company address of someone whose account was just deactivated because they left the company.

And just look at RFC 2822 (which obsoletes RFC 822, and has itself since been obsoleted by RFC 5322) to see how complicated the address format is that will still work: https://datatracker.ietf.org/doc/html/rfc2822. You don’t want to replicate this, and in some industries you can’t just reuse any open-source library (e.g. in banking, just ask IT security for their opinion).

To know I can reach the customer through this email

Same problem: local checks will never establish that you can communicate with the intended customer. A typo can still produce a valid email address, but you won’t reach the right customer, or perhaps any mailbox at all.

So to truly establish whether an email address is working and correct, you need to send a secret to that address and ask for it back in some way, so that the customer proves they own and can access that email account.
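A minimal Python sketch of that idea, assuming an in-memory dict stands in for your database table and that the token ends up in a verification link you build elsewhere. The names `issue_token` and `verify_token` and the one-day lifetime are illustrative assumptions, not any particular framework’s API.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_TTL_SECONDS = 24 * 60 * 60  # assumption: links stay valid for one day

# email -> (sha256 of the token, issue timestamp); stands in for a DB table
TokenStore = Dict[str, Tuple[str, float]]


def _hash(token: str) -> str:
    # Keep only a hash so a leaked row cannot be replayed as a valid link.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token(store: TokenStore, email: str, now: Optional[float] = None) -> str:
    """Create a random secret for `email`; the caller embeds it in a link."""
    token = secrets.token_urlsafe(32)
    store[email] = (_hash(token), time.time() if now is None else now)
    return token


def verify_token(store: TokenStore, email: str, token: str,
                 now: Optional[float] = None) -> bool:
    """True only for a matching, unexpired token; each token works once."""
    record = store.get(email)
    if record is None:
        return False
    stored_hash, issued_at = record
    current = time.time() if now is None else now
    if current - issued_at > TOKEN_TTL_SECONDS:
        del store[email]  # expired: the user has to request a new email
        return False
    if not hmac.compare_digest(stored_hash, _hash(token)):
        return False
    del store[email]  # consume the token so the link works only once
    return True
```

In a web app the token would typically travel as a query parameter of a hypothetical `/verify` route; only when `verify_token` returns True do you mark the address as confirmed. No format check on the address is needed anywhere in this flow.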

But wait, isn’t sending a message too costly?

For most security, GDPR and other regulatory purposes you have to establish the identity of the user anyway, and whether they actually have rights to that account (for example, you can’t just subscribe random addresses to all your newsletters).

Nowadays setting up or accessing an SMTP server and sending an email is trivial on your chosen platform; the cost is limited to some computational power and the like. As you would never trust frontend validation (I hope you don’t!), you would repeat the validation on the backend, which costs you computational time anyway while not eliminating the need to send the email itself.
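To show how little is involved, here is a minimal sketch using Python’s standard `smtplib` and `email.message` modules. It assumes an SMTP server that supports STARTTLS and login; host, credentials, sender and wording are placeholders. Note that nothing here inspects the recipient’s format: the server accepts it, the message bounces, or the user never clicks, and each outcome tells you more than a regex would.

```python
import smtplib
from email.message import EmailMessage


def build_verification_email(sender: str, recipient: str, link: str) -> EmailMessage:
    """Compose a plain-text message carrying the verification link."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Please confirm your email address"
    msg.set_content(
        "Click the link below to confirm this address:\n\n"
        f"{link}\n\n"
        "If you did not sign up, you can ignore this email."
    )
    return msg


def send(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Deliver via an SMTP server with STARTTLS (assumed to be available)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Some rejections surface immediately as `smtplib.SMTPRecipientsRefused`, but most undeliverable addresses only show up later as bounces, which is one more reason the click on the link is the real signal.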

And compare that with how often you do this anyway when, to secure the account, you also provide an OTP (one-time password) solution: it uses the same logic, just on almost every login, on top of the first verification.
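The article does not say which OTP scheme it has in mind. One widely used option is TOTP (RFC 6238), which builds on HOTP (RFC 4226); the sketch below uses only the standard library to show that it is, again, a shared secret the user proves they possess. In production you would more likely reach for a vetted library, and provisioning the secret (QR code etc.) is out of scope here.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP where the counter is the current time step."""
    t = time.time() if now is None else now
    return hotp(secret, int(t // step), digits)


def verify_totp(secret: bytes, code: str, now: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes from `window` steps either side to absorb clock drift."""
    t = time.time() if now is None else now
    for i in range(-window, window + 1):
        moment = t + i * step
        if moment < 0:
            continue  # no negative time steps
        expected = totp(secret, moment, step, digits)
        if hmac.compare_digest(expected.encode(), code.strip().encode()):
            return True
    return False
```

With the RFC 6238 test secret `b"12345678901234567890"` and `now=59`, `totp(..., digits=8)` yields the published test value `94287082`.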

User flow without pesky validations

Just compare the two diagrams: how much complication is added, while gaining nothing more than reminding customers of a typo in one of their input fields, or angering them because they cannot use their otherwise valid email address.

And you still have not solved several other problems:

  1. Are you sure the address belongs to a single person? Think of mailing lists, or multiple users sharing a password → you can handle this in an EULA (end-user licence agreement) or other contractual clauses, depending on your industry, to hold the user accountable for damages caused through shared passwords.
  2. Are you sure the address is unique and does not lead to the same person or mailbox as another one? Think of aliases (not entirely the same as mailing lists). Most people don’t know it, but many email providers implement an implicit alias mechanism: put a + sign before the @ sign with anything in between, and the email still gets delivered to the same mailbox, so you now have an unlimited number of aliases (have fun with trial accesses bound to email addresses) → some of the validations you create won’t catch this, so isn’t it better to focus on solving a potentially damaging problem? And if you identify your customers by other data (in finance, for example, regulations set out how to do that, and they never include the email address), you have again made the validations redundant or useless.
  3. Are you sure the email address is not temporary? Think of burner emails: similar to burner phones, they can even self-destruct after a set period of time, which further complicates detecting multiple addresses that belong to the same person.
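If plus-addressed aliases are the damaging problem for you (e.g. repeated trial sign-ups), a cheaper approach than format validation is to compute a comparison key and flag possible duplicates. This is a sketch under a loud assumption: the provider treats `local+tag@domain` as `local@domain` and ignores case. Several providers, Gmail among them, behave this way, but it is not universal, so the key is used only for flagging, never to reject or rewrite the address the user entered. `canonical_email` and `possible_duplicates` are hypothetical names.

```python
from typing import List


def canonical_email(address: str) -> str:
    """Comparison key that collapses plus-addressed aliases.

    Assumption: the provider ignores "+tag" and letter case. Use the result
    only to flag possible duplicates, never to block or rewrite input.
    """
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local:
        return address.strip().lower()  # cannot split it; compare as-is
    local = local.split("+", 1)[0]  # drop everything after the first "+"
    return f"{local}@{domain}".lower()


def possible_duplicates(existing: List[str], candidate: str) -> List[str]:
    """Stored addresses that share the candidate's comparison key."""
    key = canonical_email(candidate)
    return [e for e in existing if canonical_email(e) == key]
```

For example, `possible_duplicates(["jane.doe+a@example.com"], "Jane.Doe+b@example.com")` flags the stored address, and your business rules decide what that means for the trial.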

Surely we need to validate phone numbers — no, we don’t

I can’t even count anymore how many clients wanted me to validate phone numbers and outright reject non-conforming values by not letting the customer or admin save them into the system. And every time, a couple of weeks later, they asked whether we could make just one more kind of number valid, again and again, until it led up to the question: oh, can we support international numbers?

Or the worst waste of time: tracing down why a single customer could not finish registration on an ecommerce site while living in the serviced area, with all data correct, except that he had a foreign mobile number because he had recently relocated. The shop lost a customer who, without phone validation in the first place, could easily have been served.

And what do you do even in a single market? Hungary, for example, has landlines, 3 major mobile networks and a few barely known but still existing smaller ones, plus people who moved from other countries and decided to keep their original numbers, or who moved their numbers between service providers.

The same rule applies: send a secret to the phone number and ask the customer to enter it back (you would not send the SMS yourself but use a SaaS (software as a service) provider, wouldn’t you?). Now you have established that they have access to the phone, not that they are the only ones with access, so again, cover yourself with contractual clauses or other means if you want to use it for personal identification. Still, a phone serves that purpose better than email, as phones are more personal items nowadays.
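For SMS the same pattern usually becomes a short numeric code, because users type it in by hand. Short codes are easier to guess, so an attempt limit matters. A sketch assuming a five-minute lifetime, three attempts, and a hypothetical `send_sms(phone, text)` callable that wraps whichever SaaS provider you use; note that the phone number itself is passed through without any format check.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

CODE_TTL_SECONDS = 5 * 60  # assumption: codes are valid for five minutes
MAX_ATTEMPTS = 3           # assumption: three guesses per code


@dataclass
class PendingCode:
    code: str
    issued_at: float
    attempts: int = 0


def start_phone_check(pending: Dict[str, PendingCode], phone: str,
                      send_sms: Callable[[str, str], None],
                      now: Optional[float] = None) -> None:
    """Generate a 6-digit code and hand it to the (hypothetical) SMS sender."""
    code = f"{secrets.randbelow(10**6):06d}"
    pending[phone] = PendingCode(code, time.time() if now is None else now)
    send_sms(phone, f"Your verification code is {code}")


def confirm_phone(pending: Dict[str, PendingCode], phone: str, entered: str,
                  now: Optional[float] = None) -> bool:
    """True if the code matches in time; too many guesses void the code."""
    entry = pending.get(phone)
    if entry is None:
        return False
    current = time.time() if now is None else now
    if current - entry.issued_at > CODE_TTL_SECONDS or entry.attempts >= MAX_ATTEMPTS:
        del pending[phone]  # expired or exhausted: a new SMS is needed
        return False
    entry.attempts += 1
    if hmac.compare_digest(entry.code.encode(), entered.strip().encode()):
        del pending[phone]  # success consumes the code
        return True
    return False
```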

However, here price comes back into the argument: sending an SMS costs considerably more than sending an email, so coming up with good strategies to lower the number of times you have to do it per customer is necessary (that could be a blog post of its own). Still, strict validation of the number in static, local code is not going to help much.

For example, you can obviously limit the length, because a Hungarian phone number has only 9 digits now, doesn’t it? Then, say, 2 years from now we run out of numbers, or for some other reason 10-digit numbers come to market, and you have to upgrade all your more-than-2-year-old code to let a significant portion of your customer base into your business.
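Instead of a hard-coded length rule, a minimal sketch of “store the simplest form, warn instead of reject” could look like the following. The Hungarian defaults (country code 36, domestic prefix 06) are assumptions you would swap for your own market, and the 7 to 15 digit range is an assumption loosely based on E.164’s 15-digit maximum; it only produces a hint, never a rejection.

```python
import re
from typing import List, Tuple


def normalize_phone(raw: str, country_code: str = "36",
                    trunk_prefix: str = "06") -> Tuple[str, List[str]]:
    """Return (digits in international form, warnings). Never rejects input."""
    warnings: List[str] = []
    cleaned = raw.strip()
    digits = re.sub(r"\D", "", cleaned)  # keep digits only
    if cleaned.startswith("+"):
        international = digits                      # "+36 20 ..." form
    elif digits.startswith("00"):
        international = digits[2:]                  # "0036 20 ..." form
    elif trunk_prefix and digits.startswith(trunk_prefix):
        international = country_code + digits[len(trunk_prefix):]  # "06 20 ..."
    else:
        international = digits
        warnings.append("No country prefix recognised; stored digits as entered.")
    if not 7 <= len(international) <= 15:
        warnings.append("Unusual length; please double-check the number.")
    return international, warnings
```

Both `"+36 (20) 123-4567"` and `"06 20 123 4567"` end up stored as `"36201234567"`; a 10-digit subscriber number in the future would simply pass through, at worst with a warning.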

Instead, you can choose to guide customers toward cheaper channels and use SMS only as a fallback when necessary (for example in the 3DS verification process of card payments), using push notifications or email where possible. Or, when it is not a fallback and the user still wants SMS for some reason, factor it into their cost or bill. But don’t waste energy on validating numbers in frontend and backend code: you will have more follow-ups, production support and sleepless hours than those possibly unnecessary SMSs cost.
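The cheaper-channel-first idea could be sketched like this, assuming you already know which channels a customer has verified. `Contact`, `pick_channel` and the `sms_required` flag (for flows such as 3DS where SMS is mandated) are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    has_push_token: bool = False
    verified_email: Optional[str] = None
    verified_phone: Optional[str] = None
    prefers_sms: bool = False  # the user explicitly asked for SMS (billable)


def pick_channel(c: Contact, sms_required: bool = False) -> Optional[str]:
    """Cheapest channel first; SMS only when mandated, requested or last resort."""
    if c.verified_phone and (sms_required or c.prefers_sms):
        return "sms"
    if c.has_push_token:
        return "push"
    if c.verified_email:
        return "email"
    if c.verified_phone:
        return "sms"  # fallback when nothing cheaper is available
    return None  # no verified channel at all
```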

Okay, but sending real mail is even more costly: true, but have fun validating addresses anyway

Just two interesting things most people don’t know about postal services: first, they are heroes who serve areas nobody else would; second, addresses mostly do not make sense to computers.

Outside a few neat areas like Manhattan, where the grid is almost perfect, streets are parallel and blocks are regular, most areas on the planet are irregular and do not follow an easily computerised logic. Most places were laid out long before algorithms were applied to them, and their addresses are meant to help humans do their job.

A simple request from a client: whatever city the customer enters for the delivery, let’s autofill the postal code based on it. Oh, and it must be a mandatory drop-down. That is not going to work: postal codes and city (or, more often, village) names are in a many-to-many relationship. Most capitals and larger cities have multiple districts with multiple postal codes, while in rural areas several villages usually share the same postal code.

We would not need both on an address if they were so tightly linked. So you can help by suggesting one or MORE postal codes for a city, or one or more cities for a postal code, but you can never be entirely sure without access to huge databases of postal addresses, which is not feasible for, say, a bank.
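Suggesting rather than enforcing can be as simple as a two-way index over (city, postal code) pairs. The place names and codes below are made up for illustration; real data would come from your postal service’s published list, and the result should only pre-fill options the user can still override.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical sample data: one city with several codes, one code shared
# by several villages, mirroring the many-to-many relationship.
PAIRS = [
    ("Bigtown", "1011"), ("Bigtown", "1012"), ("Bigtown", "1013"),
    ("Smallvillage", "9999"), ("Othervillage", "9999"),
]


def build_index(pairs: Iterable[Tuple[str, str]]):
    """Index the pairs in both directions; nothing is assumed to be 1:1."""
    by_city: Dict[str, Set[str]] = defaultdict(set)
    by_code: Dict[str, Set[str]] = defaultdict(set)
    for city, code in pairs:
        by_city[city.casefold()].add(code)
        by_code[code].add(city)
    return by_city, by_code


def suggest_codes(by_city: Dict[str, Set[str]], city: str) -> List[str]:
    """Zero, one or MORE candidate codes; an empty list is not an error."""
    return sorted(by_city.get(city.strip().casefold(), set()))


def suggest_cities(by_code: Dict[str, Set[str]], code: str) -> List[str]:
    return sorted(by_code.get(code.strip(), set()))
```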

And where does the waste come into this?

Waste appears in many forms around format validation of complicated addresses, such as:

  • Arguing in a meeting about how to do the best validation → I am sure no one in those meetings has read the entire RFC documents or knows all possible address formats off the top of their head. Just look around at how many people are in the meeting and how long it goes on: that is usually a hefty price compared to a single engineer’s 1–2 days of coding effort.
  • How much time it takes to trace issues down to false negatives and false positives in the validations → especially in a loop where the engineer gets a ticket saying “a registration did not work, please check” without any data about what was entered (this is especially hard IF you can only log the data after it reaches the backend, and no, logging from the frontend to the backend is not the way to solve this).
  • How many times you have to add an exception, add a new rule or rework the code implementing the validation → every time, that is the opportunity cost of another feature that could have been delivered instead of fixing something that does not actually affect your top and bottom lines.

Summary

  • All you want is to ensure you can reach the right customer at the given email address, and for this, always send verification emails (or SMS) with a secret in them!
  • Don’t strictly validate entered email addresses, phone numbers or postal addresses (e.g. don’t block progress in your user journey)!
  • Instead, show warnings if you want to help your customers or employees spot potential mistakes, but don’t force them to conform, especially with phone number formats and postal addresses!
  • Format phone numbers only for display, and store them in the simplest form (e.g. just digits in international format).
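Showing warnings instead of errors could look like this: a function that returns hints for the UI and never decides whether the form may be submitted. The typo map is a hypothetical starting point, and the single “@” check is deliberately only a hint, because RFC-valid addresses with a quoted local part can contain more than one “@”.

```python
from typing import List

# Hypothetical typo map; grow it from your own support tickets.
COMMON_DOMAIN_TYPOS = {
    "gmial.com": "gmail.com",
    "gmai.com": "gmail.com",
    "hotmial.com": "hotmail.com",
}


def email_warnings(address: str) -> List[str]:
    """Hints to show next to the field. Never used to block submission."""
    warnings: List[str] = []
    value = address.strip()
    if value.count("@") != 1:
        warnings.append("This does not look like a typical email address.")
        return warnings
    domain = value.rsplit("@", 1)[1].lower()
    if "." not in domain:
        warnings.append("The part after @ usually contains a dot.")
    suggestion = COMMON_DOMAIN_TYPOS.get(domain)
    if suggestion:
        warnings.append(f"Did you mean @{suggestion}?")
    return warnings
```

The form always submits; the verification email remains the only real check.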

This way you avoid the waste around hard validation of email, phone and address input fields: meeting time, tracing and investigating problems, and adding exceptions or otherwise modifying the code later.

Another great source on the topic:

I Knew How To Validate An Email Address Until I Read The RFC