Reformatting the .tokens file generated by ANTLR (C# version) (updated 2008-07-21)

Posted by RednaxelaFX on 2013-2-4 20:20:31


Related links:
Reformatting the .tokens file generated by ANTLR (Ruby version)
Reformatting the .tokens file generated by ANTLR (C++ version)

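Since I've already written the Ruby and C++ versions, I might as well put some love into a C# version too and see how it turns out.
As expected, the code length falls between the C++ and Ruby versions. The biggest advantage over the C++ version is that you don't have to manage memory yourself. Then again, memory usage is not much of a concern in this scenario.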

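Actually this code isn't much longer than the Ruby one; it's just that my implementation defines a struct, which takes up several lines. Using a regular expression, as the Ruby version does, would make the struct unnecessary.
Note the sorting part: with a lambda expression, the C# code is every bit as concise as the Ruby code. It takes a single statement, and the parameter types don't even need to be spelled out because the compiler infers them. Since the lambda body is a single expression, even the surrounding braces can be dropped. Nice = =
Unlike in C++, there is no need to define and name a custom functor just to use it once...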
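To make the comparison concrete, here is a minimal sketch of the same sort written three ways: a named IComparer<T> class (roughly what a C++ functor requires), a C# 2.0 anonymous delegate, and the C# 3.0 lambda. The Token, TokenValueComparer and SortStyles names are made up for illustration and are not part of reformat.cs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for TokenNameValuePair, for illustration only.
sealed class Token {
    public readonly string Name;
    public readonly int Value;
    public Token(string name, int value) { Name = name; Value = value; }
}

// Style 1: a named comparer class, similar in spirit to a C++ functor.
sealed class TokenValueComparer : IComparer<Token> {
    public int Compare(Token x, Token y) { return x.Value.CompareTo(y.Value); }
}

static class SortStyles {
    public static void WithNamedComparer(List<Token> tokens) {
        tokens.Sort(new TokenValueComparer());
    }

    // Style 2: C# 2.0 anonymous delegate; parameter types are written out.
    public static void WithAnonymousDelegate(List<Token> tokens) {
        tokens.Sort(delegate(Token x, Token y) { return x.Value.CompareTo(y.Value); });
    }

    // Style 3: C# 3.0 lambda; parameter types are inferred and the
    // single-expression body needs no braces.
    public static void WithLambda(List<Token> tokens) {
        tokens.Sort((x, y) => x.Value.CompareTo(y.Value));
    }
}
```

All three leave the list ordered by Value; only the amount of ceremony differs.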

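Well, I'm somewhat more familiar with C# than with Ruby, so for now I still write C# faster. But I believe that, as long as I'm not chasing execution speed, it won't be long before I write Ruby faster than C#. Sigh, the perks of dynamic languages.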

reformat.cs: (without regex)
using System;
using System.IO;
using System.Collections.Generic;

// Immutable name/value pair for one line of the .tokens file.
struct TokenNameValuePair {
    private string m_name;
    private int m_value;

    public TokenNameValuePair(string name, int value) {
        this.m_name = name;
        this.m_value = value;
    }

    public string Name { get { return this.m_name; } }
    public int Value { get { return this.m_value; } }
}

sealed class ReformatTokensFile {
    private const string USAGE = "Usage: reformat ";

    static void Reformat(string infilename, string outfilename) {
        List<TokenNameValuePair> lines = new List<TokenNameValuePair>();

        // Read the "name=value" lines one at a time.
        using (StreamReader reader = File.OpenText(infilename)) {
            string line = reader.ReadLine();
            while (null != line) {
                string[] parts = line.Split('=');
                string name = parts[0];
                int value = Convert.ToInt32(parts[1]);
                lines.Add(new TokenNameValuePair(name, value));

                line = reader.ReadLine();
            }
        }

        // Sort by token value.
        lines.Sort((first, second) => first.Value.CompareTo(second.Value));

        // Write each pair back out as "value=name".
        using (StreamWriter writer = File.CreateText(outfilename)) {
            foreach (TokenNameValuePair pair in lines) {
                writer.WriteLine("{0}={1}", pair.Value, pair.Name);
            }
        }
    }

    static void Main(string[] args) {
        if (2 != args.Length) {
            Console.WriteLine(USAGE);
            return;
        }

        string infilename = args[0];
        string outfilename = args[1];
        if (!File.Exists(infilename)) {
            Console.WriteLine("Invalid input file name.");
            Console.WriteLine(USAGE);
            return;
        }

        Reformat(infilename, outfilename);
    }
}
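Note the lambda expression passed to lines.Sort().
Come to think of it, if you insist on "reducing the line count", the read loop inside the using statement that opens the input file could also be written like this:
foreach (string line in reader.ReadToEnd().Split('\n')) {
    if (line.Equals(string.Empty)) continue;
    string[] parts = line.Split('=');
    lines.Add(new TokenNameValuePair(parts[0], Convert.ToInt32(parts[1])));
}
Fewer lines, but no significant performance benefit: it reads the whole file in at once, so with little memory and a large file it could be more than the machine can take = =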
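For what it's worth, on .NET Framework 4.0 or later (which postdates this post) File.ReadLines gives a similarly short loop without pulling the whole file into memory, because it enumerates lines lazily. The sketch below assumes that runtime; TokenReader and ReadPairs are hypothetical names, and KeyValuePair stands in for the TokenNameValuePair struct to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TokenReader {
    // Lazily yields (name, value) pairs from a .tokens file.
    // Assumes every non-empty line has the form "name=value".
    public static IEnumerable<KeyValuePair<string, int>> ReadPairs(string path) {
        // File.ReadLines streams the file line by line (.NET 4.0+).
        foreach (string line in File.ReadLines(path)) {
            if (line.Length == 0) continue;
            string[] parts = line.Split('=');
            yield return new KeyValuePair<string, int>(parts[0], Convert.ToInt32(parts[1]));
        }
    }
}
```

A caller would still collect the pairs into a list before sorting, so memory use is bounded by the parsed pairs rather than by the raw file text plus the split array.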

Now let's look at the version that uses regular expressions and see whether the line count drops noticeably.
reformat.cs: (with regex)
using System;
using System.IO;
using System.Collections.Generic;
using System.Text.RegularExpressions;

sealed class ReformatTokensFile {
    private const string USAGE = "Usage: reformat ";

    static void Reformat(string infilename, string outfilename) {
        List<string> lines = new List<string>();
        // Rewrites "name=value" as "value=name".
        Regex revert = new Regex(@"^([^=]+)=([0-9]+)$");

        using (StreamReader reader = File.OpenText(infilename)) {
            string line = reader.ReadLine();
            while (null != line) {
                lines.Add(revert.Replace(line, "$2=$1"));
                line = reader.ReadLine();
            }
        }

        // Sort by the leading number; note that every comparison runs two matches.
        Regex leadingNumber = new Regex(@"^[0-9]+");
        lines.Sort((first, second) =>
            Convert.ToInt32(leadingNumber.Match(first).Value).CompareTo(
            Convert.ToInt32(leadingNumber.Match(second).Value))
        );

        using (StreamWriter writer = File.CreateText(outfilename)) {
            foreach (string line in lines) {
                writer.WriteLine(line);
            }
        }
    }

    static void Main(string[] args) {
        if (2 != args.Length) {
            Console.WriteLine(USAGE);
            return;
        }

        string infilename = args[0];
        string outfilename = args[1];
        if (!File.Exists(infilename)) {
            Console.WriteLine("Invalid input file name.");
            Console.WriteLine(USAGE);
            return;
        }

        Reformat(infilename, outfilename);
    }
}
Note the multi-line lambda expression passed to lines.Sort().

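Of course, it has to be said that using regular expressions this way wastes a lot of time. In the Ruby version I wrote it like that purely because it was convenient, but in C# one ought to give at least some thought to runtime efficiency, right?
Matching with a regular expression creates quite a few temporary string objects by itself, so when capturing parentheses are not needed, it is best to avoid them.
Using the leadingNumber regular expression inside the lambda passed to Sort() means that every comparison during the sort has to run two regular expression matches. Once the list gets a bit larger, the time spent here becomes noticeable. On reflection, I lean toward not using regular expressions here (in the C# version) at all, because Split() is enough for this scenario.
(Update: so in the Ruby version I also stopped doing regular expression matching during the sort.)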
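One way to keep the string-based approach while avoiding the repeated matching is to parse each key once up front and sort on the stored integer. The following is only a sketch of that idea, not the code from either version above; KeyedSort and SortByLeadingNumber are hypothetical names, and it assumes every line already has the form "value=name".

```csharp
using System;
using System.Collections.Generic;

static class KeyedSort {
    // Sorts "value=name" lines by their leading integer, parsing each key
    // exactly once instead of twice per comparison.
    public static List<string> SortByLeadingNumber(IEnumerable<string> lines) {
        var keyed = new List<KeyValuePair<int, string>>();
        foreach (string line in lines) {
            int eq = line.IndexOf('=');  // assumed to be present on every line
            int key = Convert.ToInt32(line.Substring(0, eq));
            keyed.Add(new KeyValuePair<int, string>(key, line));
        }

        // Comparisons now touch only the precomputed integers.
        keyed.Sort((a, b) => a.Key.CompareTo(b.Key));

        var sorted = new List<string>(keyed.Count);
        foreach (var pair in keyed) sorted.Add(pair.Value);
        return sorted;
    }
}
```

The parsing cost becomes linear in the number of lines, while the sort itself still performs on the order of n log n comparisons, each of which is now a plain integer compare.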

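-- LINQ update:
That said, none of the code above really takes full advantage of C# 3.0. So here is the first version rewritten with C# 3.0 features such as var (implicitly typed locals), anonymous types, and LINQ: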
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;

sealed class ReformatTokensFile {
    private const string USAGE = "Usage: reformat ";

    static void Reformat(string infilename, string outfilename) {
        // The sample object exists only so that T can be inferred.
        var lines = MakeList(new { Name = string.Empty, Value = 0 });

        using (var reader = File.OpenText(infilename)) {
            var line = reader.ReadLine();
            while (null != line) {
                var parts = line.Split('=');
                var name = parts[0];
                var value = Convert.ToInt32(parts[1]);
                lines.Add(new { Name = name, Value = value });

                line = reader.ReadLine();
            }
        }

        using (var writer = File.CreateText(outfilename)) {
            foreach (var pair in from l in lines
                                 orderby l.Value
                                 select l)
                writer.WriteLine("{0}={1}", pair.Value, pair.Name);
        }
    }

    // Creates an empty List<T>, with T inferred from the argument's type.
    public static List<T> MakeList<T>(T itemOftype) {
        return new List<T>();
    }

    static void Main(string[] args) {
        if (2 != args.Length) {
            Console.WriteLine(USAGE);
            return;
        }

        var infilename = args[0];
        var outfilename = args[1];
        if (!File.Exists(infilename)) {
            Console.WriteLine("Invalid input file name.");
            Console.WriteLine(USAGE);
            return;
        }

        Reformat(infilename, outfilename);
    }
}
As for being more concise, well, it certainly is...

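Note how this code uses the MakeList<T> helper method to create a List<T> whose element type is an anonymous type: the type has no name you could write out, so the compiler infers T from the sample object passed in.
This way of using generics can't be done in Java (not that there is any need for it, since Java generics are type-erased anyway).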
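As a side note, the helper is not the only way to end up with a list of an anonymous type. The sketch below gets the same result from LINQ's Select and ToList, which infer the element type in the same manner. It works on lines already held in memory rather than on a file, and AnonymousListDemo and its Reformat method are hypothetical names, not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AnonymousListDemo {
    // Turns "name=value" lines into "value=name" lines sorted by value.
    public static List<string> Reformat(IEnumerable<string> inputLines) {
        // ToList() produces a List<T> of the anonymous type; T is inferred
        // from the Select projection, so no MakeList-style helper is needed.
        var pairs = inputLines
            .Select(line => line.Split('='))
            .Select(parts => new { Name = parts[0], Value = Convert.ToInt32(parts[1]) })
            .ToList();

        return pairs
            .OrderBy(p => p.Value)
            .Select(p => string.Format("{0}={1}", p.Value, p.Name))
            .ToList();
    }
}
```

The MakeList<T> approach remains useful when the list has to exist before any element does, as in the read loop of the version above.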
