JL Forums

Calling all Savvy Spinning Mathematicians


Offline HoneyJo

Re: Calling all Savvy Spinning Mathematicians
« Reply #15 on: August 02, 2012, 08:51:41 PM »
Thanks Meg and holy smokes,

Above my head! You're gonna have to wait for Miss TBS (Meg)!

HJ
'I haven't lost my mind, it's backed up on my hard-drive somewhere!'
American Freelance Writer

Offline Meg

« Reply #16 on: August 02, 2012, 10:16:34 PM »
Thanks Meg and holy smokes,

I was expecting to get in the millions, but had no idea that the variations would run into these kinds of crazy figures, and that is without the word/phrase level! Such a document goes a veeeeeeeeeeery long way. I have privately requested the services of our favorite mathematician, to calculate the figures from a few different angles :-)

 

LOL, you're making my brain spin! Millions of variations and now from different angles! I'm getting dizzy.

« Reply #17 on: August 02, 2012, 11:46:24 PM »
Hehe - different angles meaning how many variations above 75%, how many above 90%, that sort of thing :-)

« Reply #18 on: September 28, 2012, 09:13:02 AM »
Referring to the second question in this thread, when faced with a problem like this, I always like to think of buckets.  For example, just consider the 1st paragraph of the first seed article for a moment:

  • There are 5 sentences, so think 5 buckets - one for each sentence
  • Each sentence is rewritten 4 times, so, counting the original, you have 5 variations of each sentence
  • So, put those 5 variations of each sentence into their respective buckets
  • You now have 5 buckets, each with 5 sentences in them
  • To create your paragraph, you select 1 sentence from each bucket
  • Hence, when you select a sentence from the 1st bucket, there are 5 possible different ways to do this
  • Similarly with all the other buckets, there are 5 possible different ways to select each sentence
  • Thus, the 1st paragraph has 5 x 5 x 5 x 5 x 5 = 3,125 different ways of being created (i.e. 5 to the power 5)

Similarly:

  • The 2nd paragraph can be created in 1,953,125 different ways (i.e. 5 to the power 9)
  • The 3rd paragraph can be created in 3,125 different ways (i.e. 5 to the power 5)
  • The 4th paragraph can be created in 625 different ways (i.e. 5 to the power 4)
  • The 5th paragraph can be created in 78,125 different ways (i.e. 5 to the power 7)
  • The 6th paragraph can be created in 1,953,125 different ways (i.e. 5 to the power 9)
  • The 7th paragraph can be created in 3,125 different ways (i.e. 5 to the power 5)
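
The bucket counting above can be sketched in a few lines of Python. This is only an illustration: the sentence counts per paragraph (5, 9, 5, 4, 7, 9, 5) are read off the exponents quoted in the list, and the variable names are made up for the example.

```python
# Sentences in each of the 7 paragraphs, inferred from the exponents in the post
# (an assumption: the seed article itself is not shown in this thread).
sentences_per_paragraph = [5, 9, 5, 4, 7, 9, 5]

# 1 original sentence + 4 rewrites = 5 versions sitting in each sentence bucket.
versions_per_sentence = 5

# One independent choice per bucket, so the choices multiply:
# a paragraph with n sentences can be built in 5 ** n ways.
paragraph_variants = [versions_per_sentence ** n for n in sentences_per_paragraph]

for number, variants in enumerate(paragraph_variants, start=1):
    print(f"Paragraph {number}: {variants:,} variants")
```

Running it should reproduce the seven figures in the list above.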


Now let's look at the finished article.  This article is created by combining 7 paragraphs.

There are 7 paragraphs so think 7 buckets, 1 for each paragraph.  Get where this is leading us...

  • The 1st paragraph bucket has 3,125 different variations in it
  • The 2nd paragraph bucket has 1,953,125 different variations in it
  • The 3rd paragraph bucket has 3,125 different variations in it
  • The 4th paragraph bucket has 625 different variations in it
  • The 5th paragraph bucket has 78,125 different variations in it
  • The 6th paragraph bucket has 1,953,125 different variations in it
  • The 7th paragraph bucket has 3,125 different variations in it


So to create the article you choose 1 variation from each successive bucket, giving the total number of different articles possible as:
3,125 x 1,953,125 x 3,125 x 625 x 78,125 x 1,953,125 x 3,125

This results in 5 to the power 44 (about 5.7 x 10^30), a number that is very roughly 5 million, million, million, million, million.  (I will not use billions or trillions here as, depending on what part of the world you originate from, the terms billion and trillion can mean different quantities.)
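
For anyone who wants to check the multiplication, here is a minimal Python sketch. It simply multiplies the seven paragraph figures quoted above; the variable names are illustrative only.

```python
import math

# Variants per paragraph, as listed in the post.
paragraph_variants = [3125, 1953125, 3125, 625, 78125, 1953125, 3125]

# One choice per paragraph bucket, so the article total is the product.
total_articles = math.prod(paragraph_variants)

# The exponents 5 + 9 + 5 + 4 + 7 + 9 + 5 add up to 44, so this is 5 ** 44.
print(total_articles == 5 ** 44)

# Python integers are exact, so the full figure can be printed as well.
print(f"{total_articles:,}")
print(f"{total_articles:.2e}")  # roughly 5.68e+30
```

Note that `math.prod` needs Python 3.8 or later.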

If, in effect, you have 5 seed articles, then think of 5 even larger buckets, one for each seed article, and repeat the maths.  You end up with a very, very, very large number which is in the region of 1.5 googol - yes, a googol was a very impressive number long before Google was a search engine giant!

However, I find the number of variations somewhat misleading.  If you go back to basics, you find that you have written only 25 different versions of the first sentence (5 seed articles x 5 versions each).  So, if you use the spintax to pump out 1,000 different articles, each variation of the first sentence will be used 1,000 / 25 = 40 times on average.  This is why I like to spin at the word or phrase level along with sentence and paragraph spinning.
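
The reuse argument above is easy to sanity-check. The sketch below is an illustration under stated assumptions: it takes the 25 versions and 1,000 articles from the post, assumes each version is picked uniformly at random, and uses hypothetical function names (`average_reuse`, `simulate_reuse`) that do not come from any spinning tool.

```python
import random
from collections import Counter


def average_reuse(num_articles: int, num_versions: int) -> float:
    """Average number of articles that share any one version of a sentence."""
    return num_articles / num_versions


def simulate_reuse(num_articles: int, num_versions: int, seed: int = 0) -> Counter:
    """Pick one version per article uniformly at random and count the picks."""
    rng = random.Random(seed)
    return Counter(rng.randrange(num_versions) for _ in range(num_articles))


print(average_reuse(1000, 25))  # 40.0

# In a random run individual versions land above or below 40,
# but the counts always add up to the number of articles.
counts = simulate_reuse(1000, 25)
print(min(counts.values()), max(counts.values()), sum(counts.values()))
```

However large the article total gets, the first sentence can only ever appear in 25 forms, which is the point being made about spinning at the word or phrase level too.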

If I have misunderstood your original question, or if I have committed basic arithmetic errors anywhere, my apologies.  I am sorry for having consigned vital Internet bytes to the futility of my musings  :-[
« Last Edit: September 28, 2012, 09:49:32 AM by Madeira »

Offline Meg

« Reply #19 on: September 28, 2012, 10:13:37 AM »
Referring to the second question in this thread, when faced with a problem like this, I always like to think of buckets.


It's good to have a puzzle or two to work out every so often. It keeps the old brain ticking over! It's also good to know that there are others who will take a look at puzzles of this kind.  ;D

Offline snm

« Reply #20 on: October 08, 2012, 05:28:52 AM »
My guess is spinning will have to incorporate some more features now to beat the search engines. Variations are going to be passe. The spun versions must read very differently in every way possible.

Regards
SNM

Offline andrewwilson

« Reply #21 on: October 08, 2012, 07:39:14 AM »
Yes, I have been rabbiting on about 'information load' for a couple of years to anyone who will listen. It is not enough to spin the words; we need to spin the information and ideas.

While I had been banging on about this and trying to incorporate it in my own work, preferring to vary info as a priority over varying words, it was not until I was working with a colleague on a 'duplication detection engine', one specifically aimed at picking up spun content, that I understood how important it was. The Duplication Detection Engine is not yet ready for production use, but we found in our early testing that we could detect spun articles with only two samples, and with three we could be certain. If we can do this then the boffins in Silicon Valley are doing better than we are. ;)

Some people have countered by suggesting that the resources needed to achieve the goal of finding spun content are too great. I now know that this is bunkum. In testing we were able to run through tens of thousands of articles in an unoptimised database and generate the required output in fractions of a second. The format of that output, which is different, as far as we know, to any other dupe tool sold into the webmaster/SEO/IM world, means that comparisons against our reference database are too fast to bother with measuring.

It is no good trying to move ideas around within the text or changing the order of paragraphs; we will pick it up every time.

As it happens, the best strategy to defeat such a duplication detector is to do as Article Builder does, where each paragraph is different with each iteration and within each iteration the words are spun as well. Some vendors of very highly spun content are on the right track, but I have yet to see any others where all the info load changes every time. With their output it is very obvious to anyone reading it that one is reading two very different versions of the same seed content; Article Builder does not give that kind of result.

Offline snm

« Reply #22 on: October 08, 2012, 11:12:35 PM »
Andrew,

Thanks for a great post.

Now the fact is, in real life, people don't use their own brains in a multitude of ways. They choose similar expressions, their sentence structures are similar, the way they build up their logic has its own style, and even long written texts bear the stamp of their particular authors. A computer program trying to be unique would introduce too much variety from that viewpoint.

So a computerised duplicate detection tool could actually hit legitimately written pieces.

That's my fear.

Regards
SNM


Offline Meg

« Reply #23 on: October 09, 2012, 04:19:56 AM »
This is a very interesting conversation. The difference, I think, with Article Builder is that several people are contributing to the text for any one single "article". (This is just my understanding, I do NOT KNOW this for certain.) So, say one person creates a seed article, then a 2nd, 3rd, 4th, etc person creates their own version of that article. I seem to recall Jon saying something like 10 articles to start off with. Then each paragraph is written separately 10 times and each sentence and word/phrase ditto. What this means is that any 1 article ends up being a MIXTURE of writers and styles, not just sentence and paragraph synonyms written by the same person. For any single individual to do that would probably mean them trying to write "in the style of" one author, then rewriting "in the style of" a different author? That would be very difficult.

Offline andrewwilson

« Reply #24 on: October 09, 2012, 05:30:01 AM »
SNM, I have no way of knowing this, but here's what I think:

At the moment it is not hard to get 'ordinary' spun stuff indexed and ranked. Much of the spun content in the world is of absolutely awful quality and it was this that was the driver behind my Duplicate Detection Engine. It is Dead Eye Dick on spun content done in the normal way. Assuming that Google's boffins are better than Jacek and me, they have the capability to do what we have done, but for now they are only playing with it.

While you are correct about picking up patterns according to style and usage, this was an option that we had but we felt it gave no advantage and I kinda think that the Googlegods will do the same thing because it gives little incremental advantage.

Why bother with something that gives (for example) a 1% increase in performance but costs an additional 5% in resources to achieve? After all, if one can reliably filter all the spun articles in the world once one has seen one example of an article, the bar has already been raised enough to put a whole industry of spammers and defacers out of business. :)

Bottom line, over the next few years we will see more content generation along the lines of Article Builder. We will see people providing services and tools to create AB style content but with a true beginning, middle and end, and the folks selling article spinning will fall by the wayside. The thing is that I don't think this will happen overnight, but will take a couple of years or so of Google raising the bar incrementally. The responses will also be incremental.

To be honest, I can't see much wrong with a service that provides AB style content. Doing so raises the average quality on the WWW, and developments in the field will enhance it further. For now AB is a collaborator with Google, unplanned and unwitting maybe, but a helper nonetheless.

Offline snm

« Reply #25 on: October 09, 2012, 06:58:51 AM »
Andrew,

My definition of spinning is not creating rubbish. But that may be my own view. To me, whether we realise it or not, we spin content in our minds all the time. As children we are taught not to re-use the same expression at more than one place in a text (because that would become monotonous to the reader) but to say it in different ways. In a way that is spinning. But only in our own mind while creating the composition.

If the content creation process mimics the way content is created in the normal course, but with more efficiency, there can be no quarrel about that. AB creates a tips library and builds it up. The same build-up can also be done in a few other ways and I have seen some people doing that, though not on the same scale as AB nor certainly with the same success.

I gather that Google is using a basket of context-specific words to judge whether two texts are spun versions of each other. I think the context and content related words might be the key differentiators.

My feeling is spinning is here to stay but with some modifications to the methods. And there would always be a cat and mouse game.

Regards
SNM

Offline andrewwilson

« Reply #26 on: October 09, 2012, 09:09:14 AM »
SNM, in all things there is a range of capabilities and performance. I am sure that we'd all agree that the majority of stuff we see as spun is rubbish. To be honest, anyone using the automated functions of TBS without then reworking the output is making rubbish and we know that this is what many, if not most users do. That's life, it is normal and is hardly a judgement of the tool - after all the tool does not make rubbish, people do. ;)
Our Duplication Detection Engine was not designed to catch spun content, but I was amazed at just how capable it was at that. Interestingly it also seems that one can draw inferences as to quality as well; that is what we are working on now - how to classify the signals such that we can have a set of 'volume controls' to enable us to set thresholds for duplication and quality signals.

While you might be correct to suggest that we normally produce content by a process of choosing options, given what you said, using those inferior choices rather than discarding them leads to a reduction in quality - it is why we make the choices in normal writing.

The interesting thing about our research and the developing tool is that one does not have to consider words, meaning or context at all when doing the filtering, indeed to do so makes the task much harder than it needs to be. I do not get to see inside of the Googlegods' lair but I'd be surprised if the guys there did not come to the same conclusions as we did in our research.

Identifying spun content (indeed any kind of duplication) is easier if one ignores words, context and meaning except in so far as the words are building blocks of larger units. The 'what are you measuring?' is the secret sauce but suffice it to say that our inspiration was research in a field unrelated to vocabulary. For sure others will have read the same paper as we did and maybe had similar 'sparks'.

I think I can say that, given what I know now, if search engines chose to give the project adequate priority then spun content could disappear from the net overnight - if I know how to do it, so do they. ;) The only thing we have not done is to try to find the source document from a set of samples - that is a feature that the Googlegods would need to consider but is irrelevant to us.
So, no cat and mouse here, just a gradually raised bar, just as with Google's other initiatives - no major initiative has come without a previous iteration and warning signals.

Offline snm

« Reply #27 on: October 09, 2012, 11:48:19 PM »
Andrew,

I must say your project seems to be very interesting.

But Google also has a conflict of interest. If all duplication vanishes from the internet, how will that affect the ad revenues? There would be very few ad placeholders available in the content network. Back in June 2003 when Adsense was launched the internet did have a significant amount of duplicate content, as far as I remember. Of course with Adsense the duplication became more rewarding.

But billions of dollars of Google revenue come this way. It would be interesting to see how this aspect plays out.

Regards
SNM


Offline andrewwilson

« Reply #28 on: October 10, 2012, 12:04:07 AM »
The internet thrives upon duplication and Google no less. As I noted we see quality signals from the tool we made, so I am sure that Google does no less. The chances are though that if ALL spun content was removed from the internet the overall quality of experience for all internet users would rise. If I am right, and I bet I am, then Google, and others, will simply do as I noted upthread and gradually raise the bar enforcing an improvement in quality and pushing folks toward a quality level similar to that offered by AB and successor/competing services.

The bar in many ways has been rising for years. Nowadays we can not get away with stuff that worked back in 2005. Spinspam will be a target and it will be eliminated. Hell, spinspam will be eliminated by webmasters if they have the tools to do so - no need for Google to do anything.

Imagine a site that takes UGC on a large scale (Empower, WordPress, Ezine Articles and thousands more) that can now simply filter any and all spun content, at whatever level of quality or duplication they are comfortable with, without any marginal cost to them for doing so. What will THEY do?
Bye Bye spinspam.
« Last Edit: October 10, 2012, 12:06:16 AM by andrewwilson »

Offline snm

« Reply #29 on: October 10, 2012, 05:48:01 AM »
Andrew,

Think of spin as a faster way to generate higher output. This of course means high regard for quality and a modification to the existing methods. That aspect may not go away.

Regards
SNM