Sentence and Word Analysis #2
This is part two of my post on sentence and word analysis. In part one I discussed my motives for analysing the RSS feed in question. In this post I shall build upon my initial findings and present the C# and SQL code that I used to do so.
I have continued to run the RSS reader periodically and now have 284 job descriptions to analyse. I went through the initial results, identified the words and sentences that were irrelevant and placed them in a keywords table so that I could strip them from my results. This was quite a lengthy process as there were nearly a thousand of them to exclude. Because the keywords I was looking for appear in so many different permutations, it was evident that I would need to look within the top 100 words/phrases to find the ones I was interested in. I decided to leave in keywords that relate to job skills as well as computer languages.
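The exclusion step can be sketched roughly as follows. This is an illustrative Python version, not the C# and SQL from the download; the function name `top_keywords`, the sample counts and the sample exclusion list are all made up for the example, and the case-insensitive matching is an assumption.

```python
from collections import Counter

def top_keywords(counts, excluded, limit=100):
    """Return the `limit` highest-ranked phrases that are not in the exclusion set."""
    # Assumption: exclusions are matched case-insensitively.
    excluded_lower = {phrase.lower() for phrase in excluded}
    kept = Counter({
        phrase: n for phrase, n in counts.items()
        if phrase.lower() not in excluded_lower
    })
    return kept.most_common(limit)

# Illustrative counts only; these are not figures from the real analysis.
counts = Counter({"experience": 9, "C#": 7, "knowledge of": 5, "SQL Server": 4, "role": 3})
# Stand-in for the keywords table of irrelevant words and phrases.
excluded = {"experience", "knowledge of", "role"}

print(top_keywords(counts, excluded, limit=2))
```

Keeping the exclusions in a separate set (or table) means the raw counts never have to be recomputed when a new irrelevant phrase is spotted.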
The top 100 keyword/skills results from analysis of 284 job descriptions. The analysis took 9.5 minutes to run.
| # | Word | Rank | # | Word | Rank | # | Word | Rank |
|---|------|------|---|------|------|---|------|------|
| 1 | C# | 301 | 35 | CSS | 29 | 69 | structured | 16 |
| 2 | SQL | 206 | 36 | E-commerce | 29 | 70 | Unix | 16 |
| 3 | .NET | 203 | 37 | ASP | 26 | 71 | Website | 16 |
| 4 | Server | 173 | 38 | C# .NET | 26 | 72 | will work | 16 |
| 5 | ASP.NET | 129 | 39 | CRM | 26 | 73 | automated | 15 |
| 6 | SQL Server | 122 | 40 | Equities | 26 | 74 | Datawarehouse | 15 |
| 7 | SharePoint | 79 | 41 | RAD | 26 | 75 | Derivative | 15 |
| 8 | Office | 78 | 42 | SQL Server 2005 | 26 | 76 | desk | 15 |
| 9 | Test | 70 | 43 | VBA | 25 | 77 | Equity | 15 |
| 10 | C++ | 69 | 44 | Winforms | 25 | 78 | International | 15 |
| 11 | banking | 63 | 45 | C#. | 23 | 79 | MOSS | 15 |
| 12 | Java | 59 | 46 | C#.NET | 23 | 80 | OLAP | 15 |
| 13 | London | 59 | 47 | Fixed Income | 23 | 81 | VB6 | 15 |
| 14 | Front Office | 55 | 48 | framework | 23 | 82 | ASAP | 14 |
| 15 | XML | 52 | 49 | Quant | 22 | 83 | Back End | 14 |
| 16 | Windows | 47 | 50 | 2 | 21 | 84 | Basic | 14 |
| 17 | Oracle | 45 | 51 | Visual Studio | 21 | 85 | business req. | 14 |
| 18 | tools | 45 | 52 | GUI | 20 | 86 | comm. skills | 14 |
| 19 | database | 44 | 53 | VB.NET | 20 | 87 | document | 14 |
| 20 | Excel | 43 | 54 | Web based | 20 | 88 | experienced C# | 14 |
| 21 | FX | 43 | 55 | Access | 19 | 89 | functional | 14 |
| 22 | MS | 41 | 56 | Cash | 18 | 90 | VB | 14 |
| 23 | HTML | 40 | 57 | digital | 18 | 91 | .NET Framework | 13 |
| 24 | C# ASP.NET | 39 | 58 | Finance | 18 | 92 | .NET 3.5 | 13 |
| 25 | life cycle | 38 | 59 | AJAX | 17 | 93 | ASP.NET C# | 13 |
| 26 | C# Developer | 36 | 60 | Biztalk | 17 | 94 | ASP.Net Developer | 13 |
| 27 | Reporting | 36 | 61 | Excel VBA | 17 | 95 | C# ASP.net SQL | 13 |
| 28 | analyst | 34 | 62 | media | 17 | 96 | degree | 13 |
| 29 | JavaScript | 33 | 63 | Security | 17 | 97 | MS SQL | 13 |
| 30 | 3.5 | 32 | 64 | ASP.Net SQL | 16 | 98 | Rates FX | 13 |
| 31 | agile | 31 | 65 | CMS | 16 | 99 | Reporting Services | 13 |
| 32 | architecture | 31 | 66 | credit derivatives | 16 | 100 | Siebel | 13 |
| 33 | communication | 31 | 67 | Silverlight | 16 | | | |
| 34 | .NET developer | 29 | 68 | Sophis | 16 | | | |
A link to a backup of the database may be found here: jobs.zip (392.41 kb)
You will need to restore this into SQL Server before the .NET code (below) will work.
A link to the .NET code (C#) is here: RssReader.zip (3.48 kb)
You will need to modify the App.Config file to point to your RSS feed and database.
To run the analysis on the sentences in the database you'll need to execute the 'analyse' stored procedure. Once that has finished executing, select from the 'analysis_results' view to see the results.
Sentence and Word Analysis #1
I was not put forward for a job recently because I had Windows Forms experience and not WinForms experience stated on my CV. The same agency also said I had not been singled out as they were looking for someone who had worked with MVC, unlike me as I had only worked with the Model View Controller framework.
I have come to understand that I have to write my CV not only to appeal to prospective employers, who should know what I’m talking about, but also for the multiple layers of incompetent individuals through which my CV must make its journey before arriving before someone who can actually read. I have come to refer to these individuals as the jam layer because whilst I am the icing on the cake (and most of the time, the cake itself - that is, after all, what I get paid for) they are the jam in-between because they have jam for brains.
So, I decided to analyse the results from a jobserve.com search for .NET roles based in the UK and identify the top used keywords and phrases so that I could litter my CV with them in the hope that someone with jam for brains would identify the correlation, even if they don't understand what that means.
To achieve this I completed the following steps:
- I wrote an RSS reader in C# that read my RSS feed (for .NET jobs based in UK) that I'd set up through jobserve.com.
- The RSS reader then iterated through each posting and called a stored procedure in my SQL database that added the content of the job posting to a sentences table as a string.
- I then had a stored procedure that split the string into words and added them to a words table along with details of the sentence they were in and their position within that sentence.
- The more complex bit was looping through the words in each sentence concatenating from one to ten consecutive words together throughout the sentence and placing the resultant string into an analysis table.
- I then performed a simple GROUP BY query, eliminating the conjunctions (and, or, as, etc.), to retrieve the results.
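The word-splitting, concatenation and grouping steps above can be sketched roughly as follows. This is an illustrative Python version, not the C# reader and stored procedures actually used; the names `ngrams`, `rank_phrases` and `STOP_WORDS` are hypothetical, the stop list contains only the three examples named above, and splitting on whitespace is an assumption about how the real procedure tokenises a posting.

```python
from collections import Counter

# Hypothetical stop list; the post only names "and", "or", "as" as examples.
STOP_WORDS = {"and", "or", "as"}

def ngrams(words, max_len=10):
    """Yield every run of 1..max_len consecutive words as a single string."""
    for start in range(len(words)):
        for length in range(1, max_len + 1):
            if start + length > len(words):
                break  # the run would go past the end of the sentence
            yield " ".join(words[start:start + length])

def rank_phrases(postings, max_len=10):
    """Count how often each word or phrase occurs across all postings."""
    counts = Counter()
    for text in postings:
        words = text.split()  # assumption: whitespace tokenisation
        for phrase in ngrams(words, max_len):
            # Drop entries that are nothing but a stop word.
            if phrase.lower() not in STOP_WORDS:
                counts[phrase] += 1
    return counts

# Two made-up postings, just to exercise the functions.
postings = [
    "C# and SQL Server developer",
    "SQL Server and C# experience",
]
ranked = rank_phrases(postings, max_len=3)
print(ranked.most_common(5))
```

Note that this sketch is case-sensitive and leaves punctuation attached to words, so "C#." and "C#" would be counted separately.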
This took me a couple of hours. As jobserve.com only allows you to receive the last 24 hours' worth of job postings via RSS, I can only present the top 30 ranked words/phrases from 75 job postings (below). The rank column gives the number of occurrences of that word or phrase within the 75 postings.
| # | Word | Rank | # | Word | Rank | # | Word | Rank |
|---|------|------|---|------|------|---|------|------|
| 1 | experience | 114 | 11 | skills | 44 | 21 | Server | 36 |
| 2 | C# | 83 | 12 | team | 44 | 22 | test | 36 |
| 3 | developer | 75 | 13 | experience of | 43 | 23 | ASP.NET | 35 |
| 4 | development | 73 | 14 | Risk | 43 | 24 | candidate | 33 |
| 5 | Strong | 56 | 15 | SQL | 43 | 25 | Web | 33 |
| 6 | knowledge | 51 | 16 | role | 42 | 26 | Applications | 31 |
| 7 | working | 51 | 17 | Business | 39 | 27 | work | 29 |
| 8 | Investment | 49 | 18 | client | 39 | 28 | SQL server | 28 |
| 9 | .NET | 45 | 19 | based | 38 | 29 | contract | 27 |
| 10 | trading | 45 | 20 | knowledge of | 36 | 30 | CV | 27 |
Annoyingly, these results are probably of more use to anyone wishing to write an appealing CV profile about themselves. For my purposes I probably need a larger dataset to work on (I reckon about 500-600 job postings) and a more granular analysis of wordsets, i.e. grouping words into technical, competency based, business area, etc. I also want to look at maybe the top 100 from specific wordsets rather than just the top 30 generic words.
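One way the wordset idea might look, as a minimal Python sketch rather than anything that has been built: the category names come from the paragraph above, but which words belong to which wordset is purely an assumption, as are the names `WORDSETS` and `group_by_wordset`. The sample counts are taken from the table above.

```python
from collections import defaultdict

# Hypothetical wordsets; the membership of each set is illustrative only.
WORDSETS = {
    "technical": {"c#", "sql", ".net", "asp.net", "sql server"},
    "competency": {"strong", "knowledge", "skills", "team"},
    "business area": {"investment", "trading", "risk"},
}

def group_by_wordset(ranked, limit=100):
    """Split (phrase, count) pairs into per-wordset lists, keeping the top `limit` of each."""
    grouped = defaultdict(list)
    # Highest counts first, so each wordset fills up with its top entries.
    for phrase, count in sorted(ranked, key=lambda pair: pair[1], reverse=True):
        for name, members in WORDSETS.items():
            if phrase.lower() in members and len(grouped[name]) < limit:
                grouped[name].append((phrase, count))
    return dict(grouped)

# A few rows from the top 30 table above.
ranked = [("C#", 83), ("Strong", 56), ("Investment", 49), (".NET", 45), ("trading", 45)]
groups = group_by_wordset(ranked, limit=2)
for name, entries in groups.items():
    print(name, entries)
```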
Part two of this post may be found here.