Byte Order Mark U+FEFF


Codepoint: U+FEFF
Decimal: 65279
HTML: &#xFEFF; or &#65279;
CSS: \FEFF
JS: \uFEFF
URL: %EF%BB%BF
UTF-8: EF BB BF
Category: Format (Cf)
Block: Arabic Presentation Forms-B

The byte order mark (BOM, U+FEFF) is an invisible character that appears at the very beginning of a text file to signal the file's character encoding and byte order. It is one of the most notorious invisible characters in software development because it causes bugs that are genuinely invisible: your file looks correct in every text editor, your code has no syntax errors, but something mysteriously breaks.

If you are a developer who has spent hours debugging a JSON parsing error, a PHP "headers already sent" warning, a shell script that won't run, or a CSV import that fails on the first column, there is a reasonable chance the BOM was the culprit.

What BOM Actually Does

BOM serves two legitimate purposes:

1. Byte Order Detection (UTF-16)

UTF-16 stores text in 16-bit (2-byte) code units; characters outside the Basic Multilingual Plane take two code units. There are two ways to order the bytes within each code unit: big-endian (most significant byte first) and little-endian (least significant byte first). The BOM tells software which order the file uses:

  • FE FF (big-endian): the BOM character stored as-is.
  • FF FE (little-endian): the BOM character with bytes reversed.

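This is the original and legitimate purpose of BOM. Without it, a UTF-16 reader must rely on external metadata (such as an explicit UTF-16BE or UTF-16LE label), on the Unicode Standard's default of big-endian, or on heuristics.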
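A minimal Python sketch of the check a UTF-16 reader performs. The helper name utf16_byte_order is hypothetical; codecs.BOM_UTF16_BE and codecs.BOM_UTF16_LE are standard-library constants holding FE FF and FF FE. It only looks for UTF-16 BOMs, so it is not a general encoding detector.

```python
import codecs


def utf16_byte_order(data: bytes):
    """Return the byte order signaled by a leading UTF-16 BOM, or None.

    Hypothetical helper for illustration. It only checks UTF-16 BOMs, so a
    UTF-32LE BOM (FF FE 00 00) would also be reported as little-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little-endian"
    return None


# The text "Hi" encoded in each byte order, each prefixed with its BOM
big = b"\xfe\xff\x00H\x00i"
little = b"\xff\xfeH\x00i\x00"

print(utf16_byte_order(big), utf16_byte_order(little))  # big-endian little-endian

# Python's generic "utf-16" codec reads the BOM, picks the byte order, and drops the BOM
print(big.decode("utf-16"), little.decode("utf-16"))  # Hi Hi
```

Python's own "utf-16" codec performs the same detection when decoding, which is why both byte strings decode to the same text.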

2. UTF-8 Encoding Signature

UTF-8 encodes text as a sequence of single bytes, so there is no byte-order ambiguity to resolve. But some software, especially on Windows, places the three bytes EF BB BF (U+FEFF encoded in UTF-8) at the start of UTF-8 files as a signature meaning "this file is UTF-8." This is where the trouble starts.
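To see the signature concretely, this Python snippet encodes U+FEFF as UTF-8 and round-trips text through the standard "utf-8-sig" codec, which writes the signature on encode and removes it on decode. Plain "utf-8" decoding, by contrast, keeps it as an invisible character.

```python
import codecs

# U+FEFF encoded as UTF-8 is the three-byte signature EF BB BF
print("\ufeff".encode("utf-8").hex(" "))  # ef bb bf
print(codecs.BOM_UTF8 == b"\xef\xbb\xbf")  # True

# The "utf-8-sig" codec prepends the signature when encoding
signed = "hello".encode("utf-8-sig")
print(signed)  # b'\xef\xbb\xbfhello'

# Plain "utf-8" decoding keeps the signature as an invisible U+FEFF character;
# "utf-8-sig" decoding removes it
print(repr(signed.decode("utf-8")))      # '\ufeffhello'
print(repr(signed.decode("utf-8-sig")))  # 'hello'
```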

Why BOM Causes Bugs

When a UTF-8 BOM is present, the file starts with three invisible bytes (EF BB BF) before any visible content. Most text editors hide these bytes completely. You see your file starting with the first visible character. But to software reading the raw bytes, the file starts with unexpected data.

JSON

text
// File content (as you see it in your editor):
{"name": "test"}
// Actual bytes in the file:
EF BB BF 7B 22 6E 61 6D 65 ...
         ^-- your JSON starts here, not at the beginning
// Result: "Unexpected token" or "Invalid JSON"

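Strict JSON parsers expect the text to begin with a value such as { or [, optionally preceded by whitespace. U+FEFF is not JSON whitespace, so finding EF BB BF (or the decoded U+FEFF) instead causes a parse error. RFC 8259 forbids adding a BOM to networked JSON but allows parsers to ignore one, so behavior varies between libraries.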
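A small Python illustration of the failure and the usual fix, assuming the file's bytes were decoded to a string before parsing. try_parse is a hypothetical helper. In current Python versions, passing the raw bytes straight to json.loads also works, because it detects a leading BOM itself; the failure shows up when the text was decoded as plain UTF-8 first.

```python
import json

# Bytes of a JSON file saved with a UTF-8 BOM
raw = b'\xef\xbb\xbf{"name": "test"}'


def try_parse(text: str):
    """Hypothetical helper: return the parsed value, or None if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        print("parse error:", err.msg)
        return None


# Decoding as plain UTF-8 leaves U+FEFF in front of "{", so parsing fails
print(try_parse(raw.decode("utf-8")))

# Decoding with "utf-8-sig" strips the BOM, so parsing succeeds
print(try_parse(raw.decode("utf-8-sig")))  # {'name': 'test'}
```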

PHP

php
// If your PHP file has a BOM, the three bytes sit before the opening <?php tag,
// so PHP sends them as literal output before your code runs. This causes:
// "Warning: Cannot modify header information - headers already sent"

PHP sends the BOM bytes as the start of the HTTP response body before your code runs. Once body output has been sent, headers can no longer be changed, so header(), session_start(), and redirects fail unless output buffering is enabled.

Shell Scripts

bash
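#!/bin/bash
# If this file has a BOM, the first line actually becomes:
# \xEF\xBB\xBF#!/bin/bash
# The kernel only treats a line as a shebang when the file starts with the bytes #!,
# so the interpreter line is not recognized.
# Result: errors such as "command not found", or the script runs under the wrong shell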

CSV Import

If the file is decoded as plain UTF-8, the BOM becomes an invisible U+FEFF glued to the first column header, so a header "id" is read as "\ufeffid". Column-name matching, header parsing, and data mapping for that column then fail silently or produce wrong results.
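A Python sketch of the CSV case using the standard csv module on an in-memory, made-up export. Decoding with "utf-8-sig" instead of "utf-8" is one common fix; read_rows is a hypothetical helper.

```python
import csv
import io

# Hypothetical CSV export that starts with a UTF-8 BOM
raw = b"\xef\xbb\xbfid,name\n1,Ada\n"


def read_rows(encoding: str):
    """Decode the bytes with the given codec and parse them as CSV rows."""
    return list(csv.DictReader(io.StringIO(raw.decode(encoding))))


plain = read_rows("utf-8")
print(list(plain[0]))    # ['\ufeffid', 'name']  (the BOM is glued to the first header)
print("id" in plain[0])  # False, so row["id"] would raise KeyError

clean = read_rows("utf-8-sig")
print(clean[0]["id"])    # 1
```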

XML

xml
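<?xml version="1.0" encoding="UTF-8"?>
<!-- With a BOM, the three bytes appear before the XML declaration -->
<!-- XML 1.0 allows a BOM here, and conforming parsers accept it in the raw bytes, -->
<!-- but if the text was decoded with the BOM left in, some parsers throw "Content is not allowed in prolog" -->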

Common Sources of BOM

  • Windows Notepad. Notepad historically saved UTF-8 files with BOM by default; more recent Windows versions default to UTF-8 without BOM. It has long been one of the most common sources of BOM-related bugs.
  • Microsoft Excel. Excel's "CSV UTF-8 (Comma delimited)" save format writes a BOM, which causes issues when those CSVs are imported by other software.
  • Some IDEs and text editors. Older versions of various editors saved with BOM by default.
  • File conversion tools. Converting between encodings can introduce BOM.
  • Copy-paste from certain sources. Pasting text that originated from a BOM-prefixed file can carry the BOM along.

How to Detect BOM

In Our Tool

Paste your file contents into the Invisible Character Viewer on the homepage. BOM will be highlighted at the very beginning of the text with the label "BOM" and code point U+FEFF.

On the Command Line

bash
# Check if a file starts with a UTF-8 BOM
head -c 3 filename | xxd | grep 'efbb bf'
# Check using the file command
file filename
# Output includes "(with BOM)" if a BOM is present
# Find all files with BOM in a directory (LC_ALL=C makes grep -P match raw bytes)
LC_ALL=C grep -rlP '^\xEF\xBB\xBF' .

In Code

javascript
// JavaScript: check if a string starts with BOM
text.charCodeAt(0) === 0xFEFF
// JavaScript: strip BOM from string
if (text.charCodeAt(0) === 0xFEFF) {
  text = text.substring(1);
}

python
# Python: read file and strip BOM
with open('file.txt', encoding='utf-8-sig') as f:
    content = f.read()
# 'utf-8-sig' automatically strips BOM if present
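For scanning a whole project from Python, roughly what the grep command above does, a byte-level check is enough because a UTF-8 BOM is always the first three bytes of the file. The helper names has_utf8_bom and find_bom_files, and the demo file names, are illustrative assumptions.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """True if the file's first three bytes are EF BB BF."""
    with path.open("rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def find_bom_files(root) -> list:
    """Recursively list regular files under root that start with a UTF-8 BOM."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and has_utf8_bom(p))


# Demo on a throwaway directory (file names are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "with_bom.json").write_bytes(b'\xef\xbb\xbf{"a": 1}')
    Path(tmp, "no_bom.json").write_bytes(b'{"a": 1}')
    Path(tmp, "sub").mkdir()
    Path(tmp, "sub", "settings.env").write_bytes(b"\xef\xbb\xbfDATABASE_URL=x\n")
    found = [p.relative_to(tmp).as_posix() for p in find_bom_files(tmp)]

print(found)  # ['sub/settings.env', 'with_bom.json']
```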

How to Remove BOM

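Tool | Method
VS Code | Click the encoding in the status bar > "Save with Encoding" > "UTF-8" (not "UTF-8 with BOM")
Notepad++ | Encoding > Convert to UTF-8, then save
Sublime Text | File > Save with Encoding > UTF-8
Vim | :set nobomb then :w
Command line (GNU sed, e.g. Linux) | sed -i '1s/^\xEF\xBB\xBF//' filename (the BSD sed that ships with macOS needs different syntax)
PowerShell | Read the file and write it back with a BOM-less encoding, e.g. Set-Content -Encoding utf8NoBOM in PowerShell 7+ (in Windows PowerShell 5.1, -Encoding UTF8 writes a BOM)
Python | Open with encoding='utf-8-sig', write with encoding='utf-8'
Git | Define a clean filter in Git config and attach it to files with a filter= rule in .gitattributes, so BOMs are stripped on commit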
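A byte-level alternative to the Python row in the table, sketched under the assumption that files fit comfortably in memory. Working on bytes rather than text means the rest of the file, including Windows CRLF line endings, is written back unchanged; reading and writing in text mode with 'utf-8-sig' and 'utf-8' can translate newlines. strip_utf8_bom is a hypothetical helper name.

```python
import codecs
import tempfile
from pathlib import Path


def strip_utf8_bom(path) -> bool:
    """Remove a leading UTF-8 BOM from the file in place.

    Works on raw bytes, so line endings and the rest of the content are left
    untouched. Returns True if a BOM was removed.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


# Demo on a temporary file containing a BOM and Windows line endings
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "data.csv")
    target.write_bytes(b"\xef\xbb\xbfid,name\r\n1,Ada\r\n")
    first = strip_utf8_bom(target)   # True: BOM removed
    second = strip_utf8_bom(target)  # False: nothing left to strip
    result = target.read_bytes()

print(first, second, result)  # True False b'id,name\r\n1,Ada\r\n'
```

Calling it twice is harmless, which makes it safe to run over a whole directory as part of a cleanup script.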

How to Type

Platform | Method
Any | You almost never want to type BOM manually; text editors add it when saving.
HTML | &#65279; or &#xFEFF; (but you should not add BOM to HTML).
JavaScript | \uFEFF in strings.
Linux | Ctrl+Shift+U, type FEFF, press Enter (in GTK or IBus-based applications).

Technical Details

  • Unicode category: Cf (Format).
  • Width: Zero. No visible glyph.
  • Historical note: U+FEFF was originally defined as both a byte order mark AND a zero-width no-break space. Unicode 3.2 (2002) deprecated the inline use and introduced U+2060 (Word Joiner) as the replacement for zero-width no-break space behavior.
  • UTF-8 representation: EF BB BF (3 bytes).
  • UTF-16BE representation: FE FF (2 bytes).
  • UTF-16LE representation: FF FE (2 bytes).
  • UTF-32BE representation: 00 00 FE FF (4 bytes).
  • UTF-32LE representation: FF FE 00 00 (4 bytes).
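The representations listed above are enough to identify an encoding from its first bytes. One detail matters when coding this: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so longer signatures have to be checked first. A hedged Python sketch follows; detect_bom is a hypothetical name, and a UTF-16LE file whose first character is U+0000 would be misread as UTF-32LE, which is rare in practice.

```python
import codecs

# Check longer signatures first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so testing UTF-16LE first would misidentify it.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
]


def detect_bom(data: bytes):
    """Return (encoding name, BOM length in bytes) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# Encode U+FEFF followed by "A" in each form and detect it again
for codec in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    encoded = "\ufeffA".encode(codec)
    print(codec, encoded.hex(" "), detect_bom(encoded))
```

The returned length tells the caller how many bytes to skip before decoding the rest of the data.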

Should You Ever Use BOM?

For UTF-16: Yes. BOM is essential for UTF-16 files because it resolves the byte-order ambiguity.

For UTF-8: Almost never. The Unicode Consortium says: "Use of a BOM is neither required nor recommended for UTF-8." The only exception is software that specifically requires a BOM to identify UTF-8 files (some older Windows tools). For web, development, and data exchange, always use UTF-8 without BOM.

For new files: Save as UTF-8 without BOM. This is now the default in most modern text editors, but always verify your editor's settings.

The BOM Debugging Checklist

If you suspect BOM is causing an issue in your project, work through this checklist:

  1. Paste the file contents into our Invisible Character Viewer. If BOM is present, it will be highlighted as the very first character.
  2. Run file filename on the command line. Its output includes "(with BOM)" if BOM is present.
  3. Check your editor's encoding setting. Make sure it says "UTF-8" and NOT "UTF-8 with BOM."
  4. Check all files in the chain. If you are importing a CSV that was exported from Excel, the BOM may be in the CSV, not in your code.
  5. Check your .gitattributes. You can attach a clean filter to files there to strip BOM automatically on commit.
  6. Check for BOM in environment files. .env files with BOM can cause environment variable parsing to fail silently. The first variable in the file will have three invisible bytes prepended to its name.

That last point is especially insidious. If your DATABASE_URL environment variable is not being recognized, check whether the .env file starts with BOM. The variable name becomes \xEF\xBB\xBFDATABASE_URL which does not match the string DATABASE_URL.
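To make the mismatch concrete, here is a Python sketch with a deliberately minimal, hypothetical .env parser (parse_env); real dotenv libraries differ, and some may strip a BOM on their own. Note that str.strip() does not remove U+FEFF, because Python does not classify it as whitespace.

```python
def parse_env(text: str) -> dict:
    """Hypothetical minimal KEY=VALUE parser, not any real dotenv library."""
    env = {}
    for line in text.splitlines():
        line = line.strip()  # does NOT remove U+FEFF
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


raw = b"\xef\xbb\xbfDATABASE_URL=postgres://localhost/app\nDEBUG=1\n"

naive = parse_env(raw.decode("utf-8"))
print(list(naive))              # ['\ufeffDATABASE_URL', 'DEBUG']
print("DATABASE_URL" in naive)  # False

fixed = parse_env(raw.decode("utf-8-sig"))
print("DATABASE_URL" in fixed)  # True
```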

Frequently Asked Questions

What is a byte order mark (BOM)?
A BOM (U+FEFF) is an invisible character placed at the very beginning of a text file to indicate the character encoding and byte order. In UTF-8, it appears as the three bytes EF BB BF. In UTF-16, it indicates whether the file uses big-endian or little-endian byte order.
Why does BOM cause bugs?
BOM is invisible in most text editors but is real data at the start of the file. Many JSON parsers, CSV importers, and XML processors, as well as shell script interpreters and PHP, expect files to start with actual content, not invisible characters. When they meet a BOM, parsing fails or the BOM leaks into the data.
How do I remove a BOM from my file?
In VS Code: click the encoding in the status bar and choose 'Save with Encoding' then 'UTF-8' (without BOM). In Notepad++: Encoding > Convert to UTF-8, then save. On the command line with GNU sed: sed -i '1s/^\xEF\xBB\xBF//' filename. Or use our Invisible Character Viewer to detect it.
Should I use UTF-8 with or without BOM?
Without BOM for almost all use cases. The Unicode Consortium does not recommend BOM in UTF-8 files. BOM is only necessary for UTF-16 files to indicate byte order. Most modern software assumes UTF-8, making BOM unnecessary and often harmful.
What is the difference between BOM and Word Joiner?
U+FEFF was originally defined as both a byte order mark (at the start of files) and a zero-width no-break space (inline in text). Unicode 3.2 split these roles: U+FEFF is now only for BOM, and U+2060 (Word Joiner) is the correct character for inline no-break behavior.
