Qlik Sense for analytics

I recently started using Qlik Sense for dashboard creation and thought I would share my thoughts on it. It’s been a while since I used something like this as part of my day-to-day work. Having been more on the ad-hoc analytics side of the business before, I’m more used to building my own tools and visualizations with whatever seems most appropriate for the task. However, now that there is a big focus on reporting too, the company has been using Qlik Sense because it can be deployed easily, is simple to use and is fast.

Qlik Sense is free to download from this link. It isn’t available on a Mac, only on Windows, which is a bit of a shame.

Qlik Sense is a great tool for exploring data sets, creating a few visualizations and perhaps even making a quick mock-up of a dashboard idea (which you may keep in Qlik Sense or consider later migrating to QlikView or something else that’s a little less rigid on design rules). Qlik Sense would be a great tool for business users, as it offers light analytical capabilities and is easy to use. I have no doubt that almost anyone could pick it up.

I won’t discuss the data load side here. I will only say that the newest version of Qlik Sense introduces the smart data load, which is more of a visual tool for data loading.

The general feel of Qlik Sense is very intuitive, smart and simple. Below is an image of the design area. The charts that you can use are available in the left-hand navigation bar, and once you have picked one you choose your measure(s) and dimension(s).

sense2

Compared to something like Tableau, my personal opinion is that Qlik Sense is a lot more modern and visually appealing. Things like colors are customizable and you can also color by expression. Further, you can do set analysis, the same as in QlikView, which can be useful.

However, in terms of design, you are limited in Qlik Sense, as it is fairly rigid about where you can place elements, and sizing is sometimes frustrating. The grid on the UI forms the limits of where an element can begin and end, which sometimes means buttons are too big or too small. A table that just could fit in the mind’s eye… well, it just doesn’t.

MapsSense

But let’s not let the downsides detract from the fact that this is a great tool. It is one that I use and will continue to use, because I see that it has a place in a company that wants business insights to be made available to those outside the general domain of data science/analytics.

Other fun features in Qlik Sense are maps, which I will discuss in more detail in a couple of weeks. I’ll keep posting on Qlik Sense, so stay tuned.

Compare files in TextWrangler

Being able to compare two text files is a useful tool for data analytics, especially in the early data munging phase. In TextWrangler, this is really easy and fast to do. I recently used this tool to check file formatting. One other example is if you have two files with source code and you would like to see differences. 

In the menu, select ‘Search’ > ‘Find differences’. Select the two files that you would like to compare and then click compare.

Alternatively, highlight the two files you would like to compare, right click and select ‘Compare’.

You will then see the two files side by side, along with an overview underneath. Here you can see the non-matching lines.
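If you are away from TextWrangler, the same kind of line-by-line comparison can be approximated in a script. This is only a minimal sketch using Python's standard difflib module, not how TextWrangler itself works; the function name diff_lines and the file names are hypothetical and only there for illustration.

```python
import difflib


def diff_lines(old_text, new_text, old_name="file_a.txt", new_name="file_b.txt"):
    """Return the unified diff between two pieces of text as a list of lines.

    The default file names are placeholders used only to label the output.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",  # the input lines carry no newline, so add none to headers
        )
    )


if __name__ == "__main__":
    first = "id,country\n1,France\n2,Spain"
    second = "id,country\n1,France\n2,Portugal"
    # Lines starting with '-' exist only in the first text, '+' only in the second.
    for line in diff_lines(first, second):
        print(line)
```

To compare real files, read each one with open(path).read() and pass the two strings in. Identical inputs give an empty list, which makes this handy for a quick formatting check.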

Compare TextWrangler

Intro to Terminal on a Mac

Using Terminal on a Mac is quite useful if you are working in data analytics. It can be used to run scripts, access databases and manage files. Everything is at your fingertips: you do not need to click around, and it streamlines certain tasks. However, it does take some getting used to, so here are some quick tips for starting out with the command line.

You can access Terminal by going to /Applications/Utilities/Terminal

ls – this command will show the files and folders within the current directory. You can also use ls -l to view the directory content as a detailed list (permissions, owner, size and modification date).

cd – used to change directory (e.g. cd Documents/myfolder)

open – to open a file, for example ‘open myfile.text’ will open the file myfile.text.

history – used to call command history.

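rm – remove a file. If you want to remove a number of files, e.g. cat1, cat2, cat3… just type rm cat* and they will all be deleted. Be careful: this deletes every file whose name starts with cat, and the files do not go to the Trash.

ctrl-c – stop a command that is running. (ctrl-z only suspends the command; it can be resumed with fg.)

man – manual command. Using this command, you can call up the manual for a particular command, e.g. man python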

python manual terminal

Similar to R, in the Terminal window you can use the up arrow to scroll through recent commands.

say – with this command you can make your Mac talk. Try the command ‘say “more terminal tips to follow soon”’

…as it says, more terminal tips to follow soon. I will discuss database connection, file operations, python programming and more from the command line.

“rworldmap”

This week I wanted to make a simple map to show countries that have (or have not) responded to a questionnaire. The R package “rworldmap” seemed to be the best fit for what I wanted to achieve. This package allows you to join your data to a world map, using countries or other geographic regions. In the past I have used GIS software for tasks such as this (MapInfo, ArcGIS, QGIS etc) but I think that nowadays, using specific software for geographic oriented tasks is not as efficient.

So here is a quick intro to simple plotting with the “rworldmap” package. I will add some more information on this package at a later date, introducing some more complex capabilities. 

I created a .csv file with three columns: ‘Country’, ‘company’ and a categorical field (‘Response’), which indicates whether they have or have not responded with a simple ‘yes’ or ‘no’.

Install the package as follows:

install.packages("rworldmap")

Load it into the workspace using:

library("rworldmap")

To call the list of countries available in the “rworldmap” package you can use the following:

data(countryExData)

countries <- countryExData[ ,2]

To get the country ISO code, you can use:

ISO_code <- countryExData[ ,1]

You can also use the following to extract the larger regions:

EPI_regions <- countryExData[ ,3]

GEO_regions <- countryExData[ ,4]

The data was loaded as a .csv file using:

country_response <- read.csv(file="Countries_Response.csv")

Then to join to the rworldmap data using the country name field:

join_country_res <- joinCountryData2Map(country_response, joinCode="NAME", nameJoinColumn="Country")

To map the results you can use the following:

mapCountryData(join_country_res, nameColumnToPlot="Response")

Image

You can see that the countries that do not have colours are either missing from my dataset or have a different name, which cannot be matched with the names from the R package. For this reason, it is better practice to use the country ISO code as the join field, as you may otherwise miss some countries (due to different naming). I added an ISO code column to the data set, reloaded it into the dataframe and rejoined the data, this time using:

join_country_ISO <- joinCountryData2Map(country_response, joinCode="ISO3", nameJoinColumn="ISO")

In my dataset it made very little difference; however, fewer match errors were returned when I ran the join, and you can see below that a couple of extra countries are now coloured.

Image

Two Islands

tug of war

Marketeers and Intelligence analysts. We are like the French and the English. Like oil and water, sliding over one another without ever actually having to mix. But this is one relationship that should be airtight. If you want a successful business then these two need to be working in perfect harmony, but they usually aren’t.

It’s no secret that there is often little love lost between marketing and analytics. It makes me sad when I see people tweeting things like “Big Data People need to GET OFF MY LAWN”. This shows the frustration that marketeers face when nosey data analysts come poking their fingers at their strategy. The data analyst can seem like the hair in their soup, the anchovy on their pizza and the milk in their coffee…we’re ruining everything. However, this is a deep problem that can affect the very core of a company’s strategy and success. One that, if not addressed, could have serious consequences for performance.

The marketeer and the analyst ‘know’ their customer in very different ways. While the analyst is crunching numbers behind a digital facade, the marketeer is talking to customers, hearing their complaints and ‘feeling’ things that could never penetrate the cold hard confines of the very factual database environment.

Data analysts are not always the most sociable people. Sometimes we need to get up from behind RStudio and go and talk to someone from marketing. Why not spend the day with them so you know what they do? Sit uncomfortably close to them, so you can peek at their screen and see what it is that they do. The data cannot tell you everything, and in a world where we are swamped by gigantic volumes of data, the marketeer can give guidance and focus. It’s possible to find a whole range of exciting correlations, but chasing them without direction is a danger, and it is here that human intuition is of value.

These differences could be addressed at the very beginnings of project realisation. The contact point between marketing and intelligence analytics usually comes somewhere in the middle of a roadmap. I propose that it should come at the beginning.

Okay marketeers. Where am I on your journey? I’d like to be at the gate when you leave the house so that we can travel together. It will be fun. We can plan the best route together, get to know each other and help each other through whatever comes our way. Usually though I’m the sucker you hitch a ride with when you’re lost out on the open road and want to go back home. I’m saving your back, but my load is heavier. You’re not a nice passenger either. You’ve become grumpy because things didn’t turn out how you wanted and now you are resentful that I am helping you out.

If only we’d got together at the gate…

This fuzzy approach to data analytics is one that I see a lot. The two sides almost resenting each other. One for not being asked and the other for asking. Yet the power of these two combined in a supporting and innovative way enriches a business. Many companies have themselves to blame for this alienation. By separating the two, it’s little wonder they can’t work efficiently together. They group us together in specialised teams; the ‘marketing’ team, the ‘retail’ team, the ‘customer intelligence’ team.

So scrap the ‘BI team’ or the ‘Marketing team’. Work on a project basis instead for optimal cooperation. Flexible work spaces are a great way of improving this relationship. Instead of having islands of analytics and marketing, switch it around so they sit together according to the goals that they are working on. This way they are together at every stage in the process, which can only mean intelligent insights based on both data and marketing intuition… or excessive squabbling.

We are all here for the same thing so let’s make peace not war.

Effective A/B Testing

A/B testing is the method of pitting scenario_control against scenario_variant and seeing if the latter leads to an uplift in conversion. It is a big buzz in the e-commerce world. A lot of people are talking about it, but do they understand how to make it work…truly work? I spent a number of years working in a front-end environment where A/B testing was at the forefront of the company’s web optimisation techniques.

But how effective is it? How can you be sure that what you are seeing isn’t a false positive? Is A/B testing a useful tool or a fool’s paradise?

There has been a fair amount written on the topic of A/B testing and how to make it effective in terms of carrying out the actual experiment phase.  I will give a small overview of what I feel are the more helpful ideas and approaches offered by other authors. Myself, I really want to highlight how data analytics can be used in order to build your starting hypotheses. As a data scientist, I feel that there is a lot that can be done in the initial phase in order to form a sound business case for running an experiment.

In general, the most important factors are a strong hypothesis, an effective sample size and adequate time (often it’s a battle between time needed and time available).

Back it up with data…

What is it that you hope to achieve with this? Do you want to increase conversion in terms of revenue, likes, memberships etc? Be clear from the start about why you are running an experiment and use the data that is available in the company in order to build a small business case first.

Using the insights from the data could be a great starting place. It eliminates/complements the “I have an idea…let’s just try it” approach, grounding the experiment in some real firm evidence for the proposed variant.

The bigger the better…

The statistical power of an A/B experiment increases as the sample size grows. So, it’s important that there are enough users in your experiment. If traffic to your site is low then there is a very serious risk that any experiment you do decide to run will be a waste of time. A small sample size means that there is a greater chance that random variation, rather than the variant itself, explains any difference you see.

The larger the sample size, the more statistical power the experiment will have. Statistical power is essentially the probability that a variant will come back as a positive result when it is, in reality, truly positive.
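To make this concrete, here is a minimal sketch of a power calculation, assuming a two-sided z-test comparing two conversion rates (a common textbook approximation, and not necessarily the method used by any particular testing tool). The function name sample_size_per_variant and the example rates are my own, chosen purely for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed in EACH variant for a two-sided two-proportion z-test.

    p_control and p_variant are the conversion rates you expect (assumptions
    you supply); alpha is the accepted false positive rate and power is the
    chance of detecting the uplift if it really exists.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p_control + p_variant) / 2            # average rate under "no difference"
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_variant * (1 - p_variant))
    ) ** 2
    return ceil(numerator / (p_variant - p_control) ** 2)


if __name__ == "__main__":
    # Hypothetical example: 10% baseline conversion, hoping for 12%.
    print(sample_size_per_variant(0.10, 0.12))
    # A smaller uplift (10% to 11%) needs far more users per variant.
    print(sample_size_per_variant(0.10, 0.11))
```

Comparing the required number with your real traffic before you start tells you quickly whether the experiment can finish in the time you have available.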

Do not give in to impatience…

Time is also a big factor. Essentially, the more time the better. Stopping experiments as soon as they are positive may be detrimental. Martin Goodson of Qubit has written an interesting paper on effective A/B testing titled “Most winning A/B test results are illusory”. He stresses that ending an A/B test as soon as you see a positive result “will result in false positives”.

These last two factors interplay with each other. The more people in your experiment, the less time you need to run it for, and vice versa. Sometimes we are restricted by one or the other. Or both. So perhaps then you should really reconsider using A/B testing altogether.

In Mats Einarsen’s blog post on A/B testing, he demonstrates the problem with not allowing enough time or not repeat testing by linking to code that runs an A/A test. In these experiments, both variants are the same but we still see false positives appearing (up to 25%).

The paper from Goodson is a good starting point if you’d like to know more about best practices for A/B testing. He outlines 4 key features of running a successful A/B test:
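The effect is easy to reproduce. Below is a minimal sketch of an A/A simulation. It is my own illustration and not the code linked from that blog post; the traffic numbers, the 10% conversion rate and the pooled z-test are all assumptions. Each simulated test gives both arms the same conversion rate and checks for significance at regular "peeks"; it then compares how often any peek looked positive with how often the final result did.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled z-test for two arms with n users each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:  # no conversions (or all conversions) so far: nothing to test
        return False
    z = ((conv_b - conv_a) / n) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def simulate_aa_tests(n_tests=400, users_per_arm=1000, peek_every=100,
                      rate=0.10, seed=1):
    """Return (share of tests with a positive at ANY peek, share positive at the end)."""
    rng = random.Random(seed)
    positive_at_any_peek = 0
    positive_at_end = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        peeked_positive = False
        for user in range(1, users_per_arm + 1):
            # Both arms convert at the same rate: any "winner" is a false positive.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if user % peek_every == 0 and is_significant(conv_a, conv_b, user):
                peeked_positive = True
        positive_at_any_peek += peeked_positive
        positive_at_end += is_significant(conv_a, conv_b, users_per_arm)
    return positive_at_any_peek / n_tests, positive_at_end / n_tests


if __name__ == "__main__":
    any_peek, at_end = simulate_aa_tests()
    print(f"false positives if you stop at the first positive peek: {any_peek:.1%}")
    print(f"false positives if you only look at the end: {at_end:.1%}")
```

Because the final check is also one of the peeks, the "any peek" rate can never be lower than the end-of-test rate. The gap between the two is the price of stopping early.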

– use a valid hypothesis

– do a power calculation first to estimate sample size

– do not stop the test early if you use ‘classical methods’ of testing

– perform a second ‘validation’ test repeating your original test to check that the effect is real

I’d add a fifth one, which is…

– think about whether you really need to run an A/B test in the first place

Okay, so it’s a fancy technique and all the big e-commerce companies are doing it BUT is it appropriate for your company/situation? If you have a small number of users, or not enough time to really test properly then perhaps use a different approach, perhaps focus more on using data analytics to build a case for change. Don’t fall victim to fancy software solutions that promise optimisation bliss because, if you aren’t careful the result will probably be the same…or worse.

Further reading:

http://blog.rjmetrics.com/2014/07/07/state-your-hypothesis-a-scientific-approach-to-ab-testing/

http://www.cennydd.com/blog/statistical-significance-other-ab-pitfalls

http://www.evanmiller.org/how-not-to-run-an-ab-test.html

http://www.einarsen.no/is-your-ab-testing-effort-just-chasing-statistical-ghosts/