Most Windows text files use "ANSI", "OEM", "Unicode" or "UTF-8" encoding. What Windows terminology calls "ANSI encodings" are usually single-byte Windows code pages, such as Windows-1252, that are closely related to the ISO/IEC 8859 encodings (i.e. "ANSI" in the Microsoft Notepad menus really means the system code page, a non-Unicode legacy encoding), except in locales such as Chinese, Japanese and Korean, which require double-byte character sets. ANSI encodings were traditionally used as the default system code pages within Windows, before the transition to Unicode. By contrast, OEM encodings, also known as DOS code pages, were defined by IBM for use in the original IBM PC text mode display system. They typically include graphical and line-drawing characters common in DOS applications. "Unicode"-encoded Windows text files contain text in UTF-16, a Unicode Transformation Format. Such files normally begin with a byte order mark (BOM), which communicates the endianness of the file content. Although UTF-8 does not suffer from endianness problems, many Windows programs (e.g. Notepad) prepend a BOM to the contents of UTF-8-encoded files, to differentiate UTF-8 from other 8-bit encodings.
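As an illustration of how a byte order mark can be used to tell these encodings apart, the following Python sketch inspects the first bytes of a file's content. The function names `sniff_bom` and `decode_text` are hypothetical, the fallback to code page 1252 is only an assumption standing in for "the system ANSI code page", and UTF-32 marks are deliberately ignored.

```python
import codecs

# Byte order marks mapped to Python codec names.
# Assumption: only UTF-8 and UTF-16 are considered; UTF-32 is ignored.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return (codec_name, bom_length), or (None, 0) if no known BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_text(data: bytes, fallback: str = "cp1252") -> str:
    """Decode using the BOM if present, else a legacy single-byte code page."""
    name, skip = sniff_bom(data)
    if name is None:
        # No BOM: assume an "ANSI" code page (cp1252 is just an example).
        return data.decode(fallback)
    # Strip the BOM bytes before decoding with the detected codec.
    return data[skip:].decode(name)


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "héllo".encode("utf-16-le")
    print(sniff_bom(sample))
    print(decode_text(sample))
```

A file without a BOM cannot be identified this way, which is why the sketch has to guess a fallback encoding; real detection of legacy code pages generally needs outside knowledge of the file's origin.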
Because of their simplicity, text files are commonly used for storage of information. They avoid some of the problems encountered with other file formats, such as endianness, padding bytes, or differences in the number of bytes in a machine word. Further, when data corruption occurs in a text file, it is often easier to recover and continue processing the remaining contents. A disadvantage of text files is that they usually have a low entropy, meaning that the information occupies more storage than is strictly necessary.
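The trade-off can be made concrete with a small Python sketch that stores the same three integers as decimal text and as fixed-width binary. The sample values, the choice of 32-bit unsigned integers, and the helper name `bits_per_byte` are assumptions made for illustration only.

```python
import math
import struct
from collections import Counter


def bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


values = [1234567890, 987654321, 42]  # arbitrary sample data

# Text form: decimal digits separated by newlines; no byte order to agree on.
as_text = "\n".join(str(v) for v in values).encode("ascii")

# Binary form: three 32-bit unsigned integers. The writer must pick a byte
# order ("<" little-endian or ">" big-endian) and the reader must match it.
as_little = struct.pack("<3I", *values)
as_big = struct.pack(">3I", *values)

if __name__ == "__main__":
    print(len(as_text), len(as_little))
    print(round(bits_per_byte(as_text), 2))
    print(as_little == as_big)
```

Here the text form takes 23 bytes against 12 for the binary form, and because it draws on only eleven distinct byte values (ten digits and a newline) it carries well under 8 bits of information per byte. The binary form, in turn, is only readable if both sides agree on the byte order.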