Automatically scan online job postings for keywords commonly related to certain job titles. In this example, we see what skills are commonly required of a “Web Developer”.
I’ve been playing with Groovy and HTTPBuilder lately, using them to get JSON from open web services to use in mashups.
I also started thinking about jobs (to my current employer: no, I’m not looking for a new one). Wouldn’t it be useful to know which skills are in demand in the current job market? For example, being a Java developer at heart, should I be learning Ant or Maven? What about EJB3 or Spring? How about .NET – do many jobs ask for both?
Below, I’ve written a simple Groovy script to get job postings from the job site AuthenticJobs.com (simply because they had a public API with no hurdles to jump). With those postings, the program searches each job description for keywords and sees which ones come out on top. This is by no means scientific, but it could make a useful tool for anyone in the job market, especially a newcomer such as a graduate.
First, we must set up a new HTTP request:
    @Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.5.0-RC2')
    import groovyx.net.http.HTTPBuilder
    import static groovyx.net.http.Method.GET
    import static groovyx.net.http.ContentType.TEXT

    @Grab(group='net.sf.json-lib', module='json-lib', version='2.3', classifier='jdk15')
    import net.sf.json.groovy.*

    def http = new HTTPBuilder('http://www.authenticjobs.com')
Notice the use of the Grab annotation. This is a nice feature within Groovy called Grape, which connects to the internet and downloads all the dependencies for the project. In this case, we need a JSON library, and the HTTPBuilder library.
Now we will issue an HTTP request to the AuthenticJobs web service. We will ask for a list of fifteen matches, with keywords “Web Developer”, in JSON format:
    def data = null
    http.request(GET, TEXT) {
        uri.path = '/api/'
        uri.query = [api_key: "++++hidden++++", method: "aj.jobs.search",
                     keywords: "web developer", perpage: "15", format: "json"]
        // On success, keep the raw response body so it can be parsed as JSON later
        response.success = { resp, unformatted ->
            data = unformatted.text
        }
    }
Now let’s parse the response text as JSON so we can extract the job descriptions:
    def json = new JsonSlurper().parseText(data)
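The processing further down relies on only a small part of the response: a `listings` object holding a `listing` array whose entries each have a `description`. The sketch below parses a hand-written sample of that shape. The sample payload is made up for illustration and is not a real API response. It also uses Groovy's built-in `groovy.json.JsonSlurper` (available from Groovy 1.8) rather than json-lib, purely so that it runs without downloading anything.

```groovy
import groovy.json.JsonSlurper

// Hand-written sample, NOT a real API response: only the fields used in this post
def sample = '''
{
  "listings": {
    "listing": [
      { "description": "<p>About us</p>\\r\\n<li>Strong CSS and JavaScript</li>" },
      { "description": "<li>Experience with PHP</li>" }
    ]
  }
}
'''

def parsed = new JsonSlurper().parseText(sample)

// Same navigation as in the script: listings -> listing -> description
def descriptions = parsed.listings.listing.collect { it.description }

println descriptions.size()
println descriptions[1]
```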
Let’s define some variables which we will use when processing the JSON. You will see that we match on list items (bullet points). It seems reasonable to assume that most of the “skills” we wish to extract will be mentioned in bullet-pointed lists.
    def words = [:]                      // This is where we store the keywords
    def match = "<li>([^<]+)</li>"       // Where to look within the description
    def filter = ["and","to","in","of",  // Common words to remove
                  "a","with","the","one",
                  "at","it","for","as","on",
                  "be", "or", "is"]
    def junk = [",","—","."]             // Some junk to filter out
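To see what the `match` pattern captures, here is a small standalone check against a made-up line of HTML. The input line is an assumption for illustration; the pattern is the one defined above. It also shows that indexing the matcher with `[0][1]` gives only the first bullet on a line, whereas iterating over the matcher visits every bullet.

```groovy
def match = "<li>([^<]+)</li>"          // same pattern as in the script

// Made-up input line, for illustration only
def line = "<li>strong css skills</li><li>php</li>"

def matcher = (line =~ match)
println matcher[0][0]                   // the whole first match, tags included
println matcher[0][1]                   // first capture group: text of the first bullet

// Iterating the matcher yields [wholeMatch, group1] for each bullet found
def allBullets = (line =~ match).collect { it[1] }
println allBullets
```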
Finally, the processing below:
    json.listings.listing.each { job ->
        // Keep only the lines of the description that contain a bullet point
        job.description.toLowerCase().split("\r\n").findAll { it =~ match }.each {
            def matcher = (it =~ match)
            String allWords = matcher[0][1]                       // Text of the first bullet on the line
            junk.each { allWords = allWords.replace(it, "") }     // Remove junk chars
            allWords.split().each { word ->                       // Split into individual words
                words[word] = (words[word] == null) ? 1 : words[word] + 1
            }
        }
    }
    filter.each { words.remove(it) }                              // Take out common words
    words.sort { a, b -> b.value <=> a.value }.each { k, v ->     // Sort in descending order of frequency
        print " (${k}, ${v}) "
    }
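If you want to try the counting logic without calling the API, the same steps can be sketched against hard-coded descriptions. The helper name `countKeywords`, the two sample descriptions, and the shortened stop-word and junk lists are all made up for this illustration. Unlike the script above, this variant counts every bullet on a line rather than only the first.

```groovy
// Hypothetical helper: counts the words found inside <li> bullets
def countKeywords(List<String> descriptions, List<String> stopWords) {
    def counts = [:].withDefault { 0 }          // unseen words start at zero
    def junk = [",", "."]                       // shortened junk list for the sketch
    descriptions.each { description ->
        description.toLowerCase().split("\r\n").each { line ->
            // Visit every bullet on the line; it[1] is the captured bullet text
            (line =~ "<li>([^<]+)</li>").collect { it[1] }.each { bullet ->
                def cleaned = bullet
                junk.each { j -> cleaned = cleaned.replace(j, "") }
                cleaned.split().each { word -> counts[word] += 1 }
            }
        }
    }
    stopWords.each { counts.remove(it) }        // drop common words
    counts.sort { a, b -> b.value <=> a.value } // highest count first
}

// Made-up descriptions, for illustration only
def sample = [
    "<p>We are hiring</p>\r\n<li>Experience with CSS and JavaScript.</li>\r\n<li>CSS layout experience</li>",
    "<li>JavaScript, PHP</li>"
]
def result = countKeywords(sample, ["and", "with"])
println result
```

Passing the descriptions and stop words as parameters keeps the helper independent of the web service, which makes it easy to experiment with different filters.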
Here is my output from running the above for the string “Web Developer”:
(experience, 28) (web, 14) (knowledge, 12) (javascript, 11) (design, 11)
(development, 10) (css, 8) (must, 8) (strong, 7) (media, 7) (have, 7)
(working, 7) (php, 6) (skills, 6) (ability, 6) (using, 6) (sites, 6)
(html, 5) (understanding, 5) (xhtml, 4) (familiarity, 4) (other, 4)
(interaction, 4) ((we, 4) (this, 4) (has, 4) (user, 4) (work, 4)
(use, 4) (accessibility, 3) (seo, 3) (plus, 3) (communication, 3)
(new, 3) (technologies, 3) (creative, 3) (clean, 3) (email, 3)
(interface, 3) (jquery, 3) (creating, 3) (website, 3) (their, 3)
(building, 3) (photoshop, 3) (cross-browser, 3) (ajax, 3)
(excellent, 3) (able, 3) (programming, 3) (please, 3) (will, 3)
(front-end, 2) (development;, 2) (least, 2) (tools, 2)
(adobe, 2) (fluency, 2) (best, 2) (practices, 2) (visual, 2) (verbal, 2)
(interpersonal, 2) (desire, 2) (learn, 2) (projects, 2)
(code, 2) (wordpress, 2)
Let’s analyse some of what we have found above:
- Clear skill requirements are JavaScript, CSS, PHP and XHTML
- It is good to see that the industry rates accessibility as a must-have
- This could be a tenuous observation, but experience (28 mentions) comes well above skills and ability combined (6 mentions each)
- Soft skills have found their way onto the list as much as hard skills. Indeed, the days of pure computer geek-ery are over.
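The comparison between experience, skills and ability is easy to check from the printed counts. The map below copies a few values by hand from the output above; the `shortlist` of skills is an arbitrary choice for illustration, showing one way to ignore noise words and look only at the technologies you care about.

```groovy
// Counts copied by hand from the output listed earlier
def counts = [experience: 28, javascript: 11, css: 8, php: 6,
              skills: 6, ability: 6, html: 5, xhtml: 4, jquery: 3]

// "experience" versus "skills" and "ability" combined
def combined = counts.skills + counts.ability
println "experience: ${counts.experience}, skills + ability: ${combined}"

// Keep only the shortlisted terms, highest count first
def shortlist = ["php", "javascript", "css", "jquery"]
def selected = counts.findAll { it.key in shortlist }
def ranked = selected.sort { a, b -> b.value <=> a.value }
println ranked
```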
Once again I want to point out the unscientific nature of this experiment.
AuthenticJobs ask users of their API to limit the number of requests to something reasonable, so caching would probably be a good strategy to use here, though I have left it out for brevity. I invite people to take what I have written here and use it as they see fit.
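As a rough idea of what such caching could look like, the sketch below keeps the raw response text in a local file and only calls a fetch closure when the file is missing or older than a chosen age. The helper name `cachedText`, the temporary file, and the one-hour limit are all assumptions for illustration; in the script above, the fetch closure would wrap the `http.request` call.

```groovy
// Hypothetical helper: return the cached text if it is fresh, otherwise fetch and store it
def cachedText(File cacheFile, long maxAgeMillis, Closure<String> fetch) {
    def fresh = cacheFile.exists() &&
                (System.currentTimeMillis() - cacheFile.lastModified()) < maxAgeMillis
    if (!fresh) {
        cacheFile.text = fetch()      // only hit the web service when needed
    }
    cacheFile.text
}

// Usage with a stand-in fetch closure (no network involved here)
def calls = 0
def cacheFile = File.createTempFile("aj-jobs", ".json")
cacheFile.delete()                    // start with an empty cache

def oneHour = 60 * 60 * 1000L
def first  = cachedText(cacheFile, oneHour) { calls++; '{"listings":{}}' }
def second = cachedText(cacheFile, oneHour) { calls++; '{"listings":{}}' }
println "fetch was called ${calls} time(s)"
```

A file-based cache is about the simplest option for a script that is run by hand now and then; a longer maximum age would reduce requests further at the cost of staler results.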
Hope you find it useful!
