Sunday, May 12, 2013

Select Most Frequent Value Then Filter by Lower Value if Top Frequency Is Not Unique Using LINQ

In this post I parse and query some data saved in a flat file. First, create a text file named data.txt and paste the lines below into it. The output goes to a second text file, result.txt (the code creates it if it does not exist).
20,3,2,2,20,1,4,5,5,1
22,800,600,500,22,0,400,900,0,300,700,300,100,200,400,18
19,200,700,400,700,500,500,900,800,800,400,300,700,22,19
500,900,400,0,500,100,900,0,0,600,600,200,700,900,0,99

For each line, I want the number that occurs most often; if several numbers tie for the highest frequency, take the smallest of them. For example, given 1,2,3,3,1 the result should be 1: both 3 and 1 occur twice, but we want a single value, so the lower one wins. Let's take a look at the code. I am using a custom ForEach extension method on IEnumerable, but you can do the same with a foreach loop.
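 using System;
 using System.Collections.Generic;
 using System.IO;
 using System.Linq;

 static class Extensions
 {
     // Custom ForEach extension for IEnumerable<T> (not part of LINQ).
     public static void ForEach<T>(this IEnumerable<T> source, Action<T> action) { foreach (var item in source) action(item); }
 }

 class Program
 {
     static void Main(string[] args)
     {
         var lines = File.ReadLines("data.txt");
         var result = new List<int>();
         lines.Select(x => x.Split(',')
                            .GroupBy(i => int.Parse(i)) // parse so ties are broken numerically, not as strings
                            .Select(g => new { g.Key, Count = g.Count() }))
              .ForEach(v =>
              {
                  var frequency = v.Max(h => h.Count); // highest occurrence count on this line
                  result.Add(v.Where(n => n.Count == frequency).OrderBy(i => i.Key).First().Key);
              });
         File.WriteAllText("result.txt", string.Join("\r\n", result));
     }
 }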
The final output is:
1
0
700
0
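
As mentioned, the same selection can be done with a plain foreach loop instead of a custom extension method. Below is a minimal sketch of that variant. The names FrequencyPicker, MostFrequentLowest and Demo are hypothetical, chosen only for illustration, and the sample lines are hard-coded rather than read from data.txt. It sorts the groups by count (descending) and then by value (ascending), which should be equivalent to the Max/Where/OrderBy combination, and it parses the values as integers so that ties are compared numerically.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FrequencyPicker // hypothetical helper, not from the original post
{
    // Returns the most frequent number in a comma-separated line;
    // ties are broken by taking the numerically smallest value.
    public static int MostFrequentLowest(string line)
    {
        return line.Split(',')
                   .Select(s => int.Parse(s))
                   .GroupBy(n => n)
                   .OrderByDescending(g => g.Count()) // highest frequency first
                   .ThenBy(g => g.Key)                // then the smallest value
                   .First()
                   .Key;
    }
}

static class Demo
{
    static void Main()
    {
        var result = new List<int>();
        // Sample lines are assumptions for the demo; swap in File.ReadLines("data.txt").
        foreach (var line in new[] { "1,2,3,3,1", "10,9,9,10" })
        {
            result.Add(FrequencyPicker.MostFrequentLowest(line));
        }
        Console.WriteLine(string.Join(",", result)); // prints 1,9
    }
}
```

The second sample line shows why parsing matters: compared as strings, "10" sorts before "9", so a string-keyed version would pick 10 instead of 9.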
