Cameron Fletcher

Random thoughts and discussions on the things that interest me

Sentence and Word Analysis #2

This is part two of my post on sentence and word analysis. In part one I discussed my motives for analysing the RSS feed in question. In this post I build on my initial findings and present the C# and SQL code that I used.

I have continued to run the RSS reader periodically and now have 284 job descriptions to analyse. I went through the initial results, identified the words and phrases that were irrelevant, and placed them in a keywords table so that I could strip them from my results. This was quite a lengthy process as there were nearly a thousand of them to exclude. Because the keywords I was looking for appear in so many different permutations (C# .NET, C#.NET and so on), it was evident that I would need to look within the top 100 words/phrases, rather than the top 30, to pick out the ones I was interested in. I decided to leave in keywords that relate to job skills in addition to computer languages.
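The exclusion step described above can be sketched in a few lines. This is an illustrative Python version, not the SQL in the download: the function name `top_keywords`, the case-insensitive matching and the sample counts are all assumptions made for the example.

```python
from collections import Counter


def top_keywords(counts, excluded, n=100):
    """Drop excluded words/phrases, then return the n most frequent.

    `counts` maps a word or phrase to its number of occurrences.
    `excluded` plays the role of the keywords table of irrelevant terms.
    Matching is case-insensitive here, which is an assumption.
    """
    excluded_lower = {term.lower() for term in excluded}
    kept = Counter({
        phrase: count
        for phrase, count in counts.items()
        if phrase.lower() not in excluded_lower
    })
    return kept.most_common(n)


# Invented sample counts, purely for illustration.
sample = Counter({"experience": 40, "C#": 30, "SQL": 20, "Strong": 10})
print(top_keywords(sample, {"experience", "strong"}, n=2))
```

Keeping the exclusions in their own collection, rather than hard-coding them into the query, means the list can grow towards a thousand entries without the ranking code changing.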

The top 100 keywords/skills from the analysis of 284 job descriptions are shown below. The Rank column gives the number of occurrences of each word or phrase. The analysis took 9.5 minutes to run.

#  Word Rank   #  Word Rank   #  Word Rank
1   C# 301   35   CSS 29   69   structured 16
2   SQL 206   36   E-commerce 29   70   Unix 16
3   .NET 203   37   ASP 26   71   Website 16
4   Server 173   38   C# .NET 26   72   will work 16
5   ASP.NET 129   39   CRM 26   73   automated 15
6   SQL Server 122   40   Equities 26   74   Datawarehouse 15
7   SharePoint 79   41   RAD 26   75   Derivative 15
8   Office 78   42   SQL Server 2005 26   76   desk 15
9   Test 70   43   VBA 25   77   Equity 15
10   C++ 69   44   Winforms 25   78   International 15
11   banking 63   45   C#. 23   79   MOSS 15
12   Java 59   46   C#.NET 23   80   OLAP 15
13   London 59   47   Fixed Income 23   81   VB6 15
14   Front Office 55   48   framework 23   82   ASAP 14
15   XML 52   49   Quant 22   83   Back End 14
16   Windows 47   50  2 21   84   Basic 14
17   Oracle 45   51   Visual Studio 21   85   business req. 14
18   tools 45   52   GUI 20   86   comm. skills 14
19   database 44   53   VB.NET 20   87   document 14
20   Excel 43   54   Web based 20   88   experienced C# 14
21   FX 43   55   Access 19   89   functional 14
22   MS 41   56   Cash 18   90   VB 14
23   HTML 40   57   digital 18   91   .NET Framework 13
24   C# ASP.NET 39   58   Finance 18   92   .NET 3.5 13
25   life cycle 38   59   AJAX 17   93   ASP.NET C# 13
26   C# Developer 36   60   Biztalk 17   94   ASP.Net Developer 13
27   Reporting 36   61   Excel VBA 17   95   C# ASP.net SQL 13
28   analyst 34   62   media 17   96   degree 13
29   JavaScript 33   63   Security 17   97   MS SQL 13
30  3.5 32   64   ASP.Net SQL 16   98   Rates FX 13
31   agile 31   65   CMS 16   99   Reporting Services 13
32   architecture 31   66   credit derivatives 16   100   Siebel 13
33   communication 31   67   Silverlight 16      
34   .NET developer 29   68   Sophis 16      

A link to a backup of the database may be found here: jobs.zip (392.41 kb)
You will need to restore this into SQL Server before the .NET code (below) will work.

A link to the .NET code (C#) is here: RssReader.zip (3.48 kb)
You will need to modify the App.Config file to point to your RSS feed and database.

To run the analysis on the sentences in the database, execute the 'analyse' stored procedure. Once that has finished executing, select from the 'analysis_results' view to see the results.

Posted: Jun 09 2009, 14:21 by flet0496 | Comments (1)
Filed under: .NET

Sentence and Word Analysis #1

I was not put forward for a job recently because I had Windows Forms experience and not WinForms experience stated on my CV. The same agency also said I had not been singled out as they were looking for someone who had worked with MVC, unlike me as I had only worked with the Model View Controller framework.

I have come to understand that I have to not only write my CV to appeal to prospective employers who should know what I’m talking about but also for the multiple layers of incompetent individuals through which my CV must make its journey before arriving before someone who can actually read. I have come to refer to these individuals as the jam layer because whilst I am the icing on the cake (and most of the time, the cake itself - that is, after all, what I get paid for) they are the jam in-between because they have jam for brains.

So, I decided to analyse the results from a jobserve.com search for .NET roles based in the UK and identify the top used keywords and phrases so that I could litter my CV with them in the hope that someone with jam for brains would identify the correlation, even if they don't understand what that means.

To achieve this I completed the following steps:

  1. I wrote an RSS reader in C# that read my RSS feed (for .NET jobs based in UK) that I'd set up through jobserve.com.
  2. The RSS reader then iterated through each posting and called a stored procedure in my SQL database that added the content of the job posting to a sentences table as a string.
  3. I then had a stored procedure that split the string into words and added them to a words table along with details of the sentence they were in and their position within that sentence.
  4. The more complex bit was looping through the words in each sentence concatenating from one to ten consecutive words together throughout the sentence and placing the resultant string into an analysis table.
  5. I then performed a simple GROUP BY query, eliminating the conjunctions (and, or, as, etc.), to retrieve the results.
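Steps 3 to 5 can be sketched compactly. This is an illustrative Python version rather than the stored procedures in the download: the function names, the whitespace-only splitting, the case-sensitive counting and the three-word stop list are assumptions, since the post only names "and, or, as, etc."

```python
from collections import Counter

# Hypothetical stop list; the real list of excluded words is not given in the post.
STOP_WORDS = {"and", "or", "as"}
MAX_PHRASE_LEN = 10  # the post concatenates from one to ten consecutive words


def split_words(sentence):
    """Step 3: split on whitespace, pairing each word with its position."""
    return list(enumerate(sentence.split()))


def phrases(sentence, max_len=MAX_PHRASE_LEN):
    """Step 4: yield every run of 1..max_len consecutive words."""
    words = [word for _, word in split_words(sentence)]
    for start in range(len(words)):
        for length in range(1, max_len + 1):
            if start + length > len(words):
                break  # ran off the end of the sentence
            yield " ".join(words[start:start + length])


def rank_phrases(sentences):
    """Step 5: count each phrase, skipping phrases that are just a stop word."""
    counts = Counter()
    for sentence in sentences:
        for phrase in phrases(sentence):
            if phrase.lower() not in STOP_WORDS:
                counts[phrase] += 1
    return counts


if __name__ == "__main__":
    postings = ["C# and SQL Server", "SQL Server developer"]  # invented sample text
    for phrase, count in rank_phrases(postings).most_common(5):
        print(count, phrase)
```

Note that a sentence of n words produces up to 10n phrases, which goes some way to explaining why the analysis table grows quickly as more postings are added.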

This took me a couple of hours. As jobserve.com only allows you to receive the last 24 hours' worth of job postings via RSS, I can only present the top 30 ranked words/phrases from 75 job postings (below). The Rank column gives the number of occurrences of that word or phrase within the 75 postings.

#  Word Rank   #  Word Rank   #  Word Rank
1  experience 114   11  skills 44   21  Server 36
2  C# 83   12  team 44   22  test 36
3  developer 75   13  experience of 43   23  ASP.NET 35
4  development 73   14  Risk 43   24  candidate 33
5  Strong 56   15  SQL 43   25  Web 33
6  knowledge 51   16  role 42   26  Applications 31
7  working 51   17  Business 39   27  work 29
8  Investment 49   18  client 39   28  SQL server 28
9  .NET 45   19  based 38   29  contract 27
10  trading 45   20  knowledge of 36   30  CV 27

Annoyingly, these results are probably of more use to anyone wishing to write an appealing CV profile about themselves. For my purposes I probably need a larger dataset to work on, I reckon about 500-600 job postings, and a more granular analysis of wordsets, i.e. grouping words into technical, competency based, business area, etc. I also want to look at maybe the top 100 from specific wordsets rather than just the top 30 generic words.
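As a rough idea of what the wordset grouping could look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the wordset names follow the categories mentioned above, but the membership of each set and the function name `top_by_wordset` are invented.

```python
from collections import defaultdict

# Hypothetical wordsets; which words belong where is invented for this example.
WORDSETS = {
    "technical": {"c#", "sql", ".net", "asp.net"},
    "competency": {"knowledge", "skills", "team"},
    "business area": {"investment", "trading", "risk"},
}


def top_by_wordset(counts, wordsets=WORDSETS, n=100):
    """Group counted words by wordset and keep the n most frequent in each.

    Words that belong to no wordset are left out of the result.
    """
    grouped = defaultdict(list)
    for phrase, count in counts.items():
        for name, members in wordsets.items():
            if phrase.lower() in members:
                grouped[name].append((phrase, count))
    return {
        name: sorted(rows, key=lambda row: row[1], reverse=True)[:n]
        for name, rows in grouped.items()
    }
```

Fed a few rows from the table above, for example `{"C#": 83, "SQL": 43, "trading": 45, "candidate": 33}`, this puts C# and SQL under "technical", trading under "business area", and drops "candidate" because it is in none of the sets.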

Part two of this post may be found here.

Posted: May 30 2009, 11:59 by flet0496 | Comments (2)
Filed under: .NET