Converting Character Encodings: Single Files and Batches

Posted on 2017-05-25 Edited on 2022-04-24 In external

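Prerequisites

For now, this assumes you already know the encoding of the source file.
Python may have functions that guess the encoding, but they are not guaranteed to be right when the text is short.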
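
As a rough illustration of why guessing is unreliable, the sketch below simply tries a list of candidate encodings and reports every one that decodes the bytes without error. The function name `plausible_encodings` and the candidate list are assumptions made for this example, not an existing library API, and this is not a real detector. A short byte string can decode cleanly under several encodings, which is exactly the ambiguity described above.

```python
def plausible_encodings(data, candidates=('utf-8', 'shift-jis', 'euc-jp', 'gbk')):
    """Return every candidate encoding that decodes `data` without error.

    `plausible_encodings` is a hypothetical helper for illustration only.
    """
    matches = []
    for encoding in candidates:
        try:
            data.decode(encoding)      # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            continue                   # this candidate is ruled out
        matches.append(encoding)
    return matches


if __name__ == '__main__':
    sample = '日本語'.encode('shift-jis')
    # More than one name may be printed: a clean decode is not proof of the encoding.
    print(plausible_encodings(sample))
```

A candidate that fails can be ruled out with certainty, but a candidate that succeeds is only plausible, so the result still has to be checked by eye.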

Single file

In Emacs

First, get the file to display correctly

M-x revert-buffer-with-coding-system [ENTER] gbk [ENTER]

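This method can also be used to guess a file's encoding: encodings tend to be tied to a region, so there are only a few likely candidates to try.

China

gbk
gb2312

Japan

shift-jis
euc-jp

Standard

utf-8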

Then set the encoding used to save the file

M-x set-buffer-file-coding-system [ENTER] utf-8 [ENTER]

Python

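name = 'old.txt'       # placeholder: path of the old file
newname = 'new.txt'    # placeholder: path of the new file
with open(name, 'rb') as file:          # open the old file as raw bytes
    content = file.read()               # read the old file
newcontent = content.decode('shift-jis').encode('utf-8')  # convert the encoding
with open(newname, 'wb') as newfile:    # create the new file
    newfile.write(newcontent)           # write the new file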
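
The same steps can be wrapped in a small reusable function so the encodings are no longer hard-coded. This is a minimal sketch: the name `convert_file` and its parameters are made up for this example, and it assumes the whole file fits in memory.

```python
def convert_file(src, dst, from_encoding, to_encoding='utf-8'):
    """Re-encode the file at `src` and write the result to `dst`.

    `convert_file` is a hypothetical helper, not part of any library.
    """
    with open(src, 'rb') as f:
        raw = f.read()
    # Decode before opening `dst`, so a wrong `from_encoding` raises
    # UnicodeDecodeError without leaving an empty output file behind.
    text = raw.decode(from_encoding)
    with open(dst, 'wb') as f:
        f.write(text.encode(to_encoding))


if __name__ == '__main__':
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, 'old.txt')
        new = os.path.join(tmp, 'new.txt')
        with open(old, 'wb') as f:
            f.write('日本語のテキスト'.encode('shift-jis'))
        convert_file(old, new, 'shift-jis')
        with open(new, 'rb') as f:
            print(f.read().decode('utf-8'))
```

Reading and writing in binary mode keeps the line endings of the source file untouched; only the character encoding changes.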

shell

iconv -f GBK -t UTF-8 file1 -o file2

The encoding names appear to be case-insensitive.

Run

iconv -l

to list the supported encodings.
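
Python's codec names are similarly forgiving: as far as I know, lookups ignore case and treat hyphens and underscores alike. The sketch below uses the standard `codecs.lookup` function to check whether two spellings resolve to the same codec; the helper name `same_codec` is made up for this example.

```python
import codecs


def same_codec(name_a, name_b):
    """Return True if both spellings resolve to the same Python codec.

    `same_codec` is a hypothetical helper; `codecs.lookup` raises
    LookupError for a name Python does not know.
    """
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


if __name__ == '__main__':
    print(same_codec('GBK', 'gbk'))
    print(same_codec('Shift-JIS', 'shift_jis'))
    print(same_codec('gbk', 'gb2312'))
```

Note that gbk and gb2312 are separate codecs in Python even though GBK is a superset of GB2312, so the last comparison is not expected to match.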

Batch

Python

# -*- coding: utf-8 -*-
# __author__ = "chougousui"
import os
import sys

source = sys.argv[1]
# source = '/home/chougousui/prog/temp/sr'
rootdir = os.path.abspath(source)
paths = []
for parent, dirnames, filenames in os.walk(rootdir):
    for dirname in dirnames:
        paths.append(os.path.join(parent, dirname))   # add the parent folder's subdirectories to the list
    for filename in filenames:
        paths.append(os.path.join(parent, filename))  # add the parent folder's files to the list
for name in paths:
    print(name)                 # show the list
print()                         # blank line
newroot = rootdir + '_edited'
os.makedirs(newroot, exist_ok=True)  # the output root must exist even when there are no subdirectories
for name in paths:
    newname = os.path.join(newroot, os.path.relpath(name, rootdir))
    # build the path under the new parent folder, copying the sub-file (or sub-folder) name
    if os.path.isdir(name):     # if it is a directory
        if not os.path.exists(newname):
            os.makedirs(newname)  # the directory usually does not exist yet, so create it
            print('created new dir: ' + newname)
    else:                       # if it is a file
        with open(name, 'rb') as file:        # open the old file as raw bytes
            content = file.read()             # read the old file
        newcontent = content.decode('shift-jis').encode('utf-8')  # convert the encoding
        with open(newname, 'wb') as newfile:  # create the new file with the same name
            newfile.write(newcontent)         # write the new file
        print(newname)          # report the result
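
One weakness of the script above is that it stops at the first file that is not valid Shift-JIS (an image, say, or a file already in UTF-8). The following is a hedged sketch of one way to handle that: it mirrors the tree and skips files that fail to decode, returning them so they can be inspected afterwards. The function name `convert_tree` and its parameters are assumptions for this example, and `outdir` is assumed to lie outside `rootdir`.

```python
import os


def convert_tree(rootdir, outdir, from_encoding='shift-jis', to_encoding='utf-8'):
    """Mirror `rootdir` into `outdir`, re-encoding every file that decodes cleanly.

    Returns the list of source paths that could not be decoded and were skipped.
    `convert_tree` is a hypothetical helper for illustration.
    """
    skipped = []
    for parent, dirnames, filenames in os.walk(rootdir):
        # Recreate the current directory at the same relative position.
        relative = os.path.relpath(parent, rootdir)
        target_dir = os.path.normpath(os.path.join(outdir, relative))
        os.makedirs(target_dir, exist_ok=True)
        for filename in filenames:
            src = os.path.join(parent, filename)
            with open(src, 'rb') as f:
                raw = f.read()
            try:
                text = raw.decode(from_encoding)
            except UnicodeDecodeError:
                skipped.append(src)    # probably binary, or in another encoding
                continue
            with open(os.path.join(target_dir, filename), 'wb') as f:
                f.write(text.encode(to_encoding))
    return skipped
```

Calling `convert_tree('sr', 'sr_edited')` would follow the same `_edited` naming as the script above. A file that happens to decode as Shift-JIS without actually being Shift-JIS is still converted, so the skipped list catches only the obvious failures.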

shell

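For now, a loop command can handle the conversion when the folder contains only files; folders that contain subfolders have not been handled yet.
Work in progress.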