Byte Order Mark U+FEFF
- Code point: U+FEFF
- Decimal: 65279
- CSS escape: \FEFF
- JavaScript escape: \uFEFF
- URL encoding: %EF%BB%BF
- UTF-8 bytes: EF BB BF
- Category: Format (Cf)
- Block: Arabic Presentation Forms-B
The byte order mark (BOM, U+FEFF) is an invisible character that appears at the very beginning of a text file to signal the file's character encoding and byte order. It is one of the most notorious invisible characters in software development because it causes bugs that are genuinely invisible: your file looks correct in every text editor, your code has no syntax errors, but something mysteriously breaks.
If you are a developer who has spent hours debugging a JSON parsing error, a PHP "headers already sent" warning, a shell script that won't run, or a CSV import that fails on the first column, there is a reasonable chance the BOM was the culprit.
What BOM Actually Does
BOM serves two legitimate purposes:
1. Byte Order Detection (UTF-16)
UTF-16 stores text as 16-bit (2-byte) code units. There are two ways to order the bytes within each unit: big-endian (most significant byte first) and little-endian (least significant byte first). The BOM tells software which order the file uses:
- FE FF (big-endian): the BOM character stored as-is.
- FF FE (little-endian): the BOM character with bytes reversed.
This is the original and legitimate purpose of BOM. Without it, a UTF-16 reader has to guess the byte order.
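As a small illustration of the byte-order detection described above, the Python sketch below encodes the same text in both orders and lets the generic utf-16 codec choose the order from the BOM. The sample string and variable names are illustrative only.

```python
import codecs

text = "Hi"

# Prepend the BOM that matches each byte order.
big_endian = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
little_endian = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(big_endian.hex(" "))     # fe ff 00 48 00 69
print(little_endian.hex(" "))  # ff fe 48 00 69 00

# The generic "utf-16" codec reads the BOM, picks the byte order,
# and drops the BOM from the decoded text.
assert big_endian.decode("utf-16") == text
assert little_endian.decode("utf-16") == text
```

Both byte sequences decode to the same string because the reader learns the order from the first two bytes instead of guessing.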
2. UTF-8 Encoding Signature
In UTF-8, there is no byte-order ambiguity (UTF-8 has a fixed byte order). But some software, especially on Windows, places the three bytes EF BB BF at the start of UTF-8 files as a signature to indicate "this file is UTF-8." This is where the trouble starts.
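To make the signature concrete, here is a minimal Python sketch. Python's name for "UTF-8 with BOM" is the utf-8-sig codec; the sample text is arbitrary.

```python
import codecs

plain = "hello".encode("utf-8")
signed = "hello".encode("utf-8-sig")  # UTF-8 with the EF BB BF signature

assert signed == codecs.BOM_UTF8 + plain
assert signed[:3] == b"\xef\xbb\xbf"

# Decoding as plain UTF-8 keeps the BOM as an invisible U+FEFF character.
assert signed.decode("utf-8") == "\ufeffhello"

# Decoding as utf-8-sig strips it, and also accepts input that has no BOM.
assert signed.decode("utf-8-sig") == "hello"
assert plain.decode("utf-8-sig") == "hello"
```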
Why BOM Causes Bugs
When a UTF-8 BOM is present, the file starts with three invisible bytes (EF BB BF) before any visible content. Most text editors hide these bytes completely. You see your file starting with the first visible character. But to software reading the raw bytes, the file starts with unexpected data.
JSON
Strict JSON parsers expect the input to begin with a JSON value (typically { or [), optionally preceded by whitespace. Finding the byte EF instead causes a parse error.
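The failure can be reproduced with Python's standard json module. This is a sketch under one assumption: the file has been read as bytes and decoded by hand, which is the situation where the BOM survives into the string. The helper names parse_plain and parse_bom_tolerant are made up for this example.

```python
import json

raw = b'\xef\xbb\xbf{"ok": true}'  # UTF-8 BOM followed by valid JSON


def parse_plain(data: bytes):
    # Plain UTF-8 decoding leaves U+FEFF at the start of the string.
    return json.loads(data.decode("utf-8"))


def parse_bom_tolerant(data: bytes):
    # utf-8-sig drops a leading BOM if one is present.
    return json.loads(data.decode("utf-8-sig"))


try:
    parse_plain(raw)
    failed = False
except json.JSONDecodeError:
    failed = True

assert failed
assert parse_bom_tolerant(raw) == {"ok": True}
```

Parsers differ in how forgiving they are, so it is safer to strip the BOM before parsing than to rely on a particular library tolerating it.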
PHP
If a PHP file starts with a BOM, the BOM bytes sit before the opening <?php tag, so PHP sends them as HTTP response body before your code runs. Once output has started, you can no longer set headers, start sessions, or send redirects.
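Shell Scripts
A script's shebang line (for example #!/bin/bash) is only recognized when #! are the first two bytes of the file. With a BOM in front of it, the system no longer sees the interpreter line, so the script may fail to start or report a confusing error on its first line.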
CSV Import
The first column header in a BOM-prefixed CSV file appears to have three extra invisible bytes. Column name matching, header parsing, and data mapping all fail silently or produce wrong results.
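A short Python sketch of the header mismatch, using the standard csv module. The column names and the read_rows helper are invented for illustration.

```python
import csv
import io

raw = b"\xef\xbb\xbfname,email\r\nAda,ada@example.com\r\n"


def read_rows(data: bytes, encoding: str):
    # Decode the bytes, then parse them as CSV with a header row.
    return list(csv.DictReader(io.StringIO(data.decode(encoding), newline="")))


broken = read_rows(raw, "utf-8")      # BOM stays glued to the first header
fixed = read_rows(raw, "utf-8-sig")   # BOM is stripped while decoding

assert "name" not in broken[0]        # the key is actually "\ufeffname"
assert "\ufeffname" in broken[0]
assert fixed[0]["name"] == "Ada"
```

Only the first column is affected, which is why the other columns keep working and the problem is easy to miss.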
XML
Common Uses (When BOM Is Present)
- Windows Notepad. Notepad historically saved UTF-8 files with BOM by default. This is the single biggest source of BOM-related bugs.
- Microsoft Excel. Excel often adds BOM when saving CSV files, causing issues when those CSVs are imported by other software.
- Some IDEs and text editors. Older versions of various editors saved with BOM by default.
- File conversion tools. Converting between encodings can introduce BOM.
- Copy-paste from certain sources. Pasting text that originated from a BOM-prefixed file can carry the BOM along.
How to Detect BOM
In Our Tool
Paste your file contents into the Invisible Character Viewer on the homepage. BOM will be highlighted at the very beginning of the text with the label "BOM" and code point U+FEFF.
On the Command Line
In Code
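One way to check for a BOM programmatically is to compare the first bytes of a file against the known signatures. The sketch below uses Python's codecs constants; detect_bom and file_has_bom are illustrative names, not part of any library.

```python
import codecs
from typing import Optional

# UTF-32 signatures are checked before UTF-16 because
# FF FE 00 00 (UTF-32LE) also starts with FF FE (UTF-16LE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for signature, name in _BOMS:
        if data.startswith(signature):
            return name
    return None


def file_has_bom(path: str) -> Optional[str]:
    """Read only the first four bytes of the file and check them."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

For example, detect_bom(b"\xef\xbb\xbf{}") returns "UTF-8" and detect_bom(b"{}") returns None. A UTF-16LE file whose first character after the BOM is U+0000 would be misreported as UTF-32LE, which is rare in practice but worth knowing.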
How to Remove BOM
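A minimal Python sketch for stripping a UTF-8 BOM, assuming the file is small enough to read into memory. The function names are illustrative, and the file is rewritten in place, so try it on a copy first.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading EF BB BF, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if the file was changed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False  # no BOM, leave the file untouched
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

Working on raw bytes avoids re-encoding the rest of the file, so everything after the BOM is preserved exactly.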
How to Type
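The character has no key of its own, so in source code it is normally written as an escape sequence. As a sketch, these are two equivalent ways to produce it in Python, along with checks of the properties listed on this page.

```python
import unicodedata

bom = "\ufeff"       # escape sequence
same = chr(0xFEFF)   # built from the code point

assert bom == same
assert ord(bom) == 65279
assert unicodedata.category(bom) == "Cf"
assert unicodedata.name(bom) == "ZERO WIDTH NO-BREAK SPACE"
assert bom.encode("utf-8") == b"\xef\xbb\xbf"
```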
Technical Details
- Unicode category: Cf (Format).
- Width: Zero. No visible glyph.
- Historical note: U+FEFF was originally defined as both a byte order mark AND a zero-width no-break space. Unicode 3.2 (2002) deprecated the inline use and introduced U+2060 (Word Joiner) as the replacement for zero-width no-break space behavior.
- UTF-8 representation: EF BB BF (3 bytes).
- UTF-16BE representation: FE FF (2 bytes).
- UTF-16LE representation: FF FE (2 bytes).
- UTF-32BE representation: 00 00 FE FF (4 bytes).
- UTF-32LE representation: FF FE 00 00 (4 bytes).
Should You Ever Use BOM?
For UTF-16: Yes. BOM is essential for UTF-16 files because it resolves the byte-order ambiguity.
For UTF-8: Almost never. The Unicode Consortium says: "Use of a BOM is neither required nor recommended for UTF-8." The only exception is software that specifically requires a BOM to identify UTF-8 files (some older Windows tools). For web, development, and data exchange, always use UTF-8 without BOM.
For new files: Save as UTF-8 without BOM. This is now the default in most modern text editors, but always verify your editor's settings.
The BOM Debugging Checklist
If you suspect BOM is causing an issue in your project, work through this checklist:
- Paste the file contents into our Invisible Character Viewer. If BOM is present, it will be highlighted as the very first character.
- Run file filename on the command line. It will report something like "UTF-8 Unicode (with BOM)" if BOM is present (the exact wording varies between versions of file).
- Check your editor's encoding setting. Make sure it says "UTF-8" and NOT "UTF-8 with BOM."
- Check all files in the chain. If you are importing a CSV that was exported from Excel, the BOM may be in the CSV, not in your code.
- Check your .gitattributes. Combined with a custom clean filter, it can strip BOM automatically on commit.
- Check for BOM in environment files. .env files with BOM can cause environment variable parsing to fail silently. The first variable in the file will have three invisible bytes prepended to its name.
That last point is especially insidious. If your DATABASE_URL environment variable is not being recognized, check whether the .env file starts with BOM. The variable name becomes \xEF\xBB\xBFDATABASE_URL which does not match the string DATABASE_URL.
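The mismatch can be shown with a toy parser. parse_env below is a hypothetical, minimal KEY=VALUE reader written only for this illustration; real dotenv libraries differ and some strip the BOM themselves. In a decoded Python string the three bytes appear as the single character \ufeff.

```python
def parse_env(text: str) -> dict:
    """Hypothetical minimal KEY=VALUE parser, for illustration only."""
    env = {}
    for line in text.splitlines():
        line = line.strip()  # strips whitespace, but not U+FEFF
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


raw = b"\xef\xbb\xbfDATABASE_URL=postgres://localhost/app\nDEBUG=1\n"

broken = parse_env(raw.decode("utf-8"))
fixed = parse_env(raw.decode("utf-8-sig"))

assert "DATABASE_URL" not in broken        # stored as "\ufeffDATABASE_URL"
assert "DEBUG" in broken                   # later variables are unaffected
assert fixed["DATABASE_URL"] == "postgres://localhost/app"
```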
Frequently Asked Questions
What is a byte order mark (BOM)?
Why does BOM cause bugs?
How do I remove a BOM from my file?
Should I use UTF-8 with or without BOM?
What is the difference between BOM and Word Joiner?