NM Incite

The Social Marketer

Is Your Social Media Data Misinforming Your Marketing Strategy?

on August 9, 2012 - 9 Comments

In our last few blog posts, we’ve been describing how marketers are turning to social media research to uncover deep insights about their consumers. As with any research, the quality of the insights depends on the quality – not necessarily quantity – of the underlying data.

Let’s say you want to benchmark your brand vs. competitors and vs. itself so you can track brand health over time. Your initial findings show dramatic spikes in buzz for your brand, suggesting certain promotions are working. Months later you want to invest more in this research and analyze this data for qualitative insights, but wait, where are the messages? You realize that 50% of the data you were basing decisions on was SPAM generated by bots, 3% of the messages were duplicates of each other and 10% had absolutely nothing to do with your brand. Lots of social media data for the analysis? Yes. Relevant, trustworthy, reliable findings? No.

All the data in the world is meaningless unless you know it is clean, reliable and accurate. You need the reassurance that what you’re collecting won’t skew your findings and result in poor decision-making. To help, we put together a short guide of what to look for when evaluating your social data options. Social media’s big advantage is that it represents authentic consumer expressions. However, there are a number of data issues to watch out for. Let me take you through some pitfalls and how to work around them.

How do I know what true, quality social media data looks like?
To be a serious marketing player in social media research, you need to ensure you’re getting the best combination of reach and accuracy. Reach means pulling in every single piece of content possibly available. Accuracy means balancing the breadth with relevancy. Here are the critical questions you should ask a company:

• How is the data collected? Is it all machine-based or is there a human element?
• How flexible is the data-mining technology?
• What types of SPAM detection filters are employed?
• Does the data come from third-party sources?
• How are new sites detected? How quickly are they brought online?
• How flexible are keyword searches and search filters?

Data Collection
Most data-mining applications work the same way: they send out spiders that crawl the online world and pull back everything they can get, but they often can’t get to everything. That’s why you need a human component to identify the sites that web crawlers don’t capture. Many popular sites also frequently change the way companies can collect their data, which can impact data integrity. To avoid this, you need a team on hand for real-time manual adjustments so there’s no interruption in data collection.

The flexibility of the data-mining technology also comes into play. Many platforms employ a one-size-fits-all approach, meaning that when the crawler starts on a particular board, it won’t collect all the messages, just the ones it sees first until it hits its message threshold. For these instances, there should be a specialized team that can reconfigure the tool to capture all the messages.
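To illustrate the message-threshold problem, here is a minimal sketch in Python. It is not any vendor's crawler: the board is simulated as an in-memory list of pages, and the names `crawl_board`, `max_messages` and `coverage` are hypothetical, chosen only for this example.

```python
from typing import Iterable, List, Optional


def crawl_board(pages: Iterable[List[str]],
                max_messages: Optional[int] = None) -> List[str]:
    """Collect messages page by page.

    max_messages is a hypothetical per-board cap; None means "no cap".
    """
    collected: List[str] = []
    for page in pages:
        for message in page:
            # A one-size-fits-all crawler stops as soon as the cap is hit,
            # silently dropping everything that comes later in the thread.
            if max_messages is not None and len(collected) >= max_messages:
                return collected
            collected.append(message)
    return collected


def coverage(collected: List[str], total: int) -> float:
    """Share of the board's messages that were actually captured."""
    return len(collected) / total if total else 0.0


# Simulated board: 5 pages of 10 messages each (made-up data).
board = [[f"msg-{p}-{i}" for i in range(10)] for p in range(5)]
total_messages = sum(len(page) for page in board)

capped = crawl_board(board, max_messages=20)  # fixed threshold
full = crawl_board(board)                     # threshold lifted for this board

print(coverage(capped, total_messages))  # 0.4
print(coverage(full, total_messages))    # 1.0
```

In this toy setup the capped crawl keeps only the first 40% of the board, which is the kind of gap a per-site configuration is meant to close.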

Be wary of companies that rely on third-party sources for data; research indicates that 80% of players in the space use third-party commodity data providers. Third parties aren’t always reliable; they can drop sources without notice, which will skew your findings. They also apply a ‘lazy man’s’ approach to data collection: they pull in as much volume as they can using collection services without considering the relevance or reach of a source, and may miss highly relevant, industry-specific sources that collection services don’t capture. The signal-to-noise ratio doesn’t look good here.

Data Hygiene
Effective SPAM detection is critical. All data collection should use a machine-learning algorithm. Machine-learning technologies are fed thousands of messages that qualify as SPAM; they then learn what different types of SPAM messages look like and weed them out of the dataset. This detection is effective for blogs, boards and groups, but Twitter is a whole other beast. Twitter SPAM detection should be rule-based. Rule-based detection weeds out messages from bots by checking a user’s profile for certain characteristics, such as no followers or a handle with no associated name.
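As a rough illustration of the rule-based approach, the sketch below flags a profile using the two characteristics mentioned above. The `Profile` fields, the function names and the choice to treat either rule alone as sufficient are assumptions made for this example; a production rule set would likely be larger and tuned against real data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Profile:
    # Hypothetical fields; real profile data will look different.
    handle: str
    display_name: str
    followers: int


def looks_like_bot(profile: Profile) -> bool:
    """Apply two simple profile rules (assumed: either one is enough)."""
    has_no_followers = profile.followers == 0
    has_no_name = not profile.display_name.strip()
    return has_no_followers or has_no_name


def filter_messages(messages: List[Tuple[Profile, str]]) -> List[Tuple[Profile, str]]:
    """Keep only messages whose author does not trip a bot rule."""
    return [(profile, text) for profile, text in messages
            if not looks_like_bot(profile)]


# Made-up sample data for illustration only.
messages = [
    (Profile("coffee_fan", "Dana", 120), "Loving my vanilla latte today"),
    (Profile("x9k2_deals", "", 15), "CLICK HERE for free lattes!!!"),
    (Profile("promo4421", "Promo", 0), "Win a free latte now"),
]

for profile, text in filter_messages(messages):
    print(profile.handle, "->", text)
```

Only the first message survives here; the second fails the "no associated name" rule and the third fails the "no followers" rule.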

Companies that offer flexible keyword (classifier) tools and filtering options provide an extra layer of accuracy and relevancy protection. For example, research indicates that in healthcare, 92% of cleaned messages are still irrelevant, so you need this flexibility. Not all SPAM can be removed and, believe it or not, not all messages that contain your product or brand name are relevant, especially if the name is a common term (e.g., analysis on Snickers candy bar). Tools that use Boolean logic allow you to get extremely specific about what you want to pull in. An added bonus is the ability to use proximity operators: you can tell the tool that you want “x” to appear within a certain number of words of “z.” For example, if you want information on vanilla lattes, you tell the tool that “vanilla” has to appear within 3 words of “latte” to account for phrases such as “I had a latte; it was vanilla,” or “that vanilla spice latte was so delicious!” Tools that let you apply segment filters give you even more relevance: segment filters restrict a search to sources that are known to focus on specific topics or that are composed of certain target demographics. See figure below:

Social Media Data Hygiene
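The proximity operator described above can be sketched in a few lines of Python. This is an illustrative approximation, not how any particular listening tool implements it: the `tokenize` and `within` helpers are hypothetical, and "within 3 words" is interpreted here as a difference of at most 3 in word position.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(text: str, term_a: str, term_b: str, max_distance: int) -> bool:
    """True if term_a and term_b occur at most max_distance words apart."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    # Order does not matter: "latte ... vanilla" counts as well as
    # "vanilla ... latte".
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


samples = [
    "I had a latte; it was vanilla,",
    "that vanilla spice latte was so delicious!",
    "Vanilla ice cream is great, but I would rather have an espresso than a latte.",
]

for sample in samples:
    print(within(sample, "vanilla", "latte", 3), "-", sample)
```

The first two phrases match because the two terms sit 3 and 2 words apart; the third mentions both words but in unrelated parts of the sentence, so it is left out.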

 

Finally, another bonus is analyst input. If there are lots of people working with the social media data on a daily basis, it’s easier to identify whether sites are being crawled appropriately and whether the data is entirely clean. With this constant feedback, the crawlers and machine-learning SPAM detection can be updated quickly.

How has the quality of data your company worked with affected your results?
 

To learn more about generating insights through social media,
download our white paper: The Customer-First Imperative
  • matt_pierson

    Johann, these are great points. In fact, they were great points when I made them 6 months ago and gave NMI a shoutout:

    http://measurematt.tumblr.com/post/17432329170/why-radian6-is-wrong-for-you-part-i
    http://measurematt.tumblr.com/post/17432340159/why-radian6-is-wrong-for-you-part-ii

    • Johann Dudley

      I hadn’t seen the articles, but definitely agree with you that owning your own content roadmap and the accountability for site collection is critical to ensure greatest relevancy, breadth and depth in content coverage.

  • matt_pierson

    That said, I think your definition of reach is confusing, since marketers define reach in a much different way. Why not use the industry standards of precision and recall instead of reach and accuracy?

    • http://www.nmincite.com/ NM Incite

      Thanks for your message, Matt. You are right that in this context, we are using “reach” as a synonym for “coverage,” rather than in the sense of media reach (size of audience exposed). The term here is used specifically in content acquisition parlance to refer to the depth to which a content crawler can collect data within a certain board or forum. Many 3rd party content aggregators have a fixed crawler depth that prevents them from reaching all content within a thread. We custom set crawler depth (reach) per site to ensure greatest possible coverage. As you point out, owning your own content collection destiny gives maximum flexibility.

  • owenayuk

    matt, it seems you are good at this, but i agree with nmincite
    http://www.loadedspot.com

  • Alta

    Glad to see some statistics term (signal-to-noise) that 99.99% of online marketers do not understand. How about an article discussing statistical measures used in marketing?

    • http://www.nmincite.com/ NM Incite

      Today, marketers are increasingly being asked to base their decisions on numbers: product data is coming in from a variety of sources (social media, surveys, commercial ratings, etc.) and marketers are expected to take the numbers and make an intelligent decision based on the findings. So as you point out, statistics is starting to become a huge part of marketing, and the marketers that have a grasp on this will be well equipped. Here’s a great article that talks about the importance of data in marketing:
      http://blogs.hbr.org/cs/2012/08/marketers_flunk_the_big_data_test.html

  • Ashish Pandey

    thankx for the post , its very nice and easy to understand social media on a daily basis

    • http://www.nmincite.com/ NM Incite

      Thanks – if there are other topics you’re interested in, please let us know!