Determine Encoding Type [message #133033] Wed, 17 August 2005 07:16
mrinalkumar01
Messages: 6
Registered: April 2005
Junior Member
Hi,

How can I determine the encoding type (ASCII/ISO-8859-1/UTF-8) of a particular file?

Thanks
Re: Determine Encoding Type [message #134628 is a reply to message #133033] Thu, 25 August 2005 17:13
andrew again
Messages: 2577
Registered: March 2000
Senior Member
Windows supports big-endian and little-endian Unicode (UTF-16) files (WordPad > Save As). Such files begin with a 2-byte byte order mark (BOM) that indicates the byte order: FE FF for big-endian, FF FE for little-endian.
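
A minimal Python sketch of checking for such a marker, in case it helps. The function name detect_bom is made up for illustration. Besides the two 2-byte UTF-16 marks mentioned above, it also looks for the UTF-8 and UTF-32 marks, which is an addition on my part.

```python
import codecs
from typing import Optional


def detect_bom(data: bytes) -> Optional[str]:
    """Return a Python codec name if data starts with a known byte order mark."""
    # The UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark,
    # so the longer marks have to be tested first.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return None  # no marker found; the encoding is still unknown
```

To use it on a file, read the first few bytes in binary mode, e.g. detect_bom(open(path, "rb").read(4)), where path is whatever file you want to inspect. A result of None only means there is no marker, not that the file is ASCII.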

If there is no marker in the file then I don't see how you could interpret the contents with certainty (assuming the file extension doesn't tell you). The "file" utility on Unix makes a best guess at the content of a file, so you can try that. If the file is 7-bit ASCII, you won't find any bytes with values above 127, whereas you would in something like ISO-8859-1. ISO-8859-1 and ISO-8859-15 have no printable characters in the range 128-159 (it holds only rarely used control codes), whereas Windows (cp1252) text can have printable characters in that range. Without a byte order mark in the file or knowledge of the codepage it was written in, I don't think you can tell for sure. Some operating systems store additional attributes about a file, which could include something like its codepage.
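
The byte-range checks above can be sketched in Python roughly as follows. This is only a heuristic under stated assumptions, not a reliable detector. The function name guess_encoding and the returned labels are invented for illustration, and the UTF-8 step (trying a strict decode) is an extra check that is not described in the paragraph above.

```python
def guess_encoding(data: bytes) -> str:
    """Best-effort guess from byte values alone; the answer can be wrong."""
    # 7-bit ASCII: no byte has a value above 127.
    if all(b < 0x80 for b in data):
        return "ascii"

    # Assumption: text that decodes cleanly as UTF-8 and contains
    # high bytes is most likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Bytes in 128-159 are printable in cp1252 but only control codes
    # in ISO-8859-1 and ISO-8859-15, so their presence hints at cp1252.
    if any(0x80 <= b <= 0x9F for b in data):
        return "cp1252 (likely)"

    # Nothing distinguishes the ISO-8859 variants at this point.
    return "iso-8859-1 or another 8-bit codepage"
```

For example, guess_encoding(open(path, "rb").read()) on a file containing only plain English text returns "ascii". The last two outcomes are guesses: the bytes alone cannot tell ISO-8859-1 from ISO-8859-15 or from most other single-byte codepages.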