Thoughts on Fully Searchable Flash
[update] There does appear to be an FAQ available here: http://www.adobe.com/devnet/flashplayer/articles/swf_searchability.html. It doesn’t go into a lot of technical detail, though.
OK, the news today is that Adobe is working with Google and Yahoo! to make sure they are able to index SWF content fully. That’s pretty exciting and a real step in the right direction, but it does leave me with a bunch of questions and concerns.
Reportedly Google and Yahoo! have been given a ‘special, search-engine optimized Flash Player’ that is able to index all data, not just static text as was the case with the Search Engine SDK Macromedia released back in 2002. Google implemented that SDK, which meant you were able to run search queries specifically on Flash content, e.g. “peter elst” filetype:swf.
With just the press release and no FAQ to go on, these questions spring to mind:
What exactly is getting indexed?
“… It will move through the states of your application, get data from the server when your application normally would, and it will capture all of the text and data that you’ve got inside of your Flash-based application.” (via Ryan Stewart)
My concern here is that URL requests to the backend will get indexed: having those URLs exposed in search results, or spider bots hitting them, could cause issues. It’s not like HTML content, where search engines can ignore form submit URLs; there is no such context in an HTTPService or URLRequest call.
You don’t have to do anything
“The best part? You don’t have to do anything. Any SWF you already have out there will be indexed by this new player.”
So I don’t have to do anything, and there’s no opt-in? What if I don’t want stuff indexed? I know several companies with the (albeit flawed) idea that having their content play back in an SWF on their site will drive traffic to it rather than allow it to be easily aggregated or screen scraped.
I’m pretty sceptical about this approach, to be honest, and it doesn’t sound like the best way to handle things. I believe the problem cannot be fully addressed without some sort of metadata scheme provided by the content author.
So how exactly will this content get handled?
“… Google is going to have their own rules for how this new Flash Player indexes and uses the content. So will Yahoo”
“You can poke the system, see what works and what doesn’t work. See how Google will handle deep linking and URL changes in Flash. It’s all up for grabs and it’s really exciting to think about what the Flash community can discover about SEOing SWF files.”
I think that is worrying; it almost reads like “we have no idea what the result is going to be here, just try it”. Now, remember this doesn’t appear to be opt-in. It doesn’t give me any control over how I optimize SEO with Flash; it is all just out there.
Whenever the Flash content and SEO question came up at conferences over the years, the answer was twofold:
1. Google indexes static content, try filetype:swf
2. We are talking to Google and others to come up with solutions that will benefit the Flash and AJAX community who are pretty much in the same boat.
It almost looks as though this is a backup or intermediary solution. Where does this leave deep linking? What about those of us already using things like SWFObject to do SEO? Will this ‘special Flash Player’ be licensed to others (Microsoft Live Search, anyone?)
Don’t get me wrong, I really welcome this move and can only applaud Adobe for taking steps like this, but without more details I’m not sure how excited or worried I should be. Adobe is pretty open about communicating its plans; usually that also means things get shown to subject matter experts under NDA, and based on their feedback an FAQ gets published alongside.
Compare this announcement to the way the Open Screen Project was announced, with comprehensive background information. This just seems like a blogging campaign linking to a relatively vague press release, with a lot of people jumping on the bandwagon without a whole lot of substance to the story.
In any case, if this works out it will be a huge benefit for Flash content and will remove some more barriers to entry for companies considering adopting it. Keep up the good work, Adobe; I look forward to hearing more about how this is going to work.
Read the press release here
This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 Belgium License.
Hey Peter, this post from Google looks like it might answer some of your questions - http://googlewebmastercentral.blogspot.com/2008/06/improved-flash-indexing.html
It’s not meant to be just a blogging campaign. It’s just that Google and Yahoo are the ones that control the rules for how stuff gets indexed. Literally all we’ve given them is a better way to crawl through SWF files. So there’s not much more info we can provide. In theory they can see anything a human can see - it’s just a question of what they do with the data.
=Ryan
rstewart@adobe.com
Cool, thanks Ryan, I was just reading that link.
It’s understandable that Google and Yahoo control how they handle things on their side; what I would be interested in seeing is how Adobe exposes the SWF content through that special Flash Player.
From what I can gather it is allowing them to navigate through SWF content indexing away as they go.
I guess what I’m saying is: why is there no client-side component to this? It’s tempting to say “it will just work”, but the reality is likely that it won’t, or that it won’t produce the expected results.
I would’ve loved to see an updated Search Engine SDK that spits out the information in a uniform way, rather than just leaving it up to the search engines to decide. It would have made sense to tie this into the browser back/forward button support we see in Flex.
There are a lot of unknowns. It’s not a beta but something that is, or soon will be, deployed live, which means we are all affected. That is a bit unsettling when there is so little information out there at this time about what effect it’ll have on our work, or how much time we’ll have to fix things (just think about email addresses obscured in an SWF to prevent spam; there are dozens of examples that use SWF on the assumption that it can’t be indexed or crawled).
In any case, I’m watching how this develops.
I had the same feeling as you Peter.
It’s a logical thing to do. Silverlight has indexable content - it would be a sure death if Flash couldn’t come up with a similar thing really fast, because it would mean that Silverlight content would more easily be located.
As to unwanted pages being listed: you can still use a robots.txt file or a nofollow meta tag to keep crawlers away from those pages. You should do that anyway.
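To make the robots.txt suggestion concrete, here is a minimal Python sketch that uses the standard library’s urllib.robotparser to check which URLs a compliant crawler would skip. The robots.txt rules, the /services/ and /media/mp3/ paths and the example.com host are all hypothetical stand-ins for the backend endpoints and MP3 files discussed in this thread. Bear in mind that robots.txt is advisory only: it asks well-behaved crawlers not to fetch those URLs but does nothing to stop anyone from requesting them directly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask all crawlers to stay away from a backend
# gateway and from MP3 files a SWF loads at runtime. Paths are illustrative.
ROBOTS_TXT = """\
User-agent: *
Disallow: /services/
Disallow: /media/mp3/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

for url in (
    "https://www.example.com/index.html",
    "https://www.example.com/services/gateway.php",
    "https://www.example.com/media/mp3/track01.mp3",
):
    # can_fetch() only reports what a compliant crawler should do;
    # it does not block access to the URL.
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "disallowed"
    print(url, "->", verdict)
```

The same rules apply whether a URL is linked from HTML or requested by a SWF, which is why robots.txt is worth having in place regardless of how the search player behaves.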
SWFObject has long served as the de facto standard for doing SEO for Flash, and does a very good job at that — webkitchen.be is a great example if you want to see that in action.
Talk is that Flash CS4 will start using SWFObject as the default when publishing HTML, and now there’s the big announcement that it’s all automatic and you don’t need to do anything. We’ll see.
This definitely makes sense marketing-wise, adding another checkbox to the Flash vs Silverlight comparison chart, and it’s obviously not an easy thing to accomplish for a binary file format.
On the face of it, there doesn’t appear to be a tie-in with existing functionality such as named anchors in Flash, browser back button/history support in Flex, the SWF metadata (anyone remember the title and description that get stored in RDF format in the SWF header? does that even get used?) or FLV metadata. There are numerous unconsolidated efforts going on, and I think it’s a missed opportunity in a sense.
Not a bad first move, but I would not say this is a complete solution to the Flash SEO question.
Scratch that: webkitchen.be switched to an HTML blog a little while back. Completely forgot.
Completely agree, Peter. To my eyes, “SEO for Flash” has always been little more than a hater’s tool for politicking the debate, and it’s a question we’ve had an answer to for years. This new tech looks like a solution in search of a problem.
Argh! Google has actually started indexing my Flash files and is revealing all the URLs of the pictures in Flash, and also the URLs of the MP3s I placed in Flash. I was hoping Flash would conceal them, because now anyone can download our music without paying for it.
Now I have to come up with an authentication system for Flash.
I had a post on the same topic as well Pete http://arulprasad.blogspot.com/2008/07/flash-content-to-be-indexed-better-by.html
@kristof - Do you mean that the files which are embedded inside the SWF are showing up, or the ones that are being loaded at runtime? If it’s showing files that are embedded in the SWF, that’s a bad, bad thing!
@Kristof: You could always see what flash loads. Just use Firebug, or even the standard “Activities” in Safari. And if you want to see the remoting calls, use something like Service Capture. On any site. Always has been.
This is something I always tell our clients: Everything that can be seen in the browser is public. Doesn’t matter if it’s loaded via html, Flash or javascript.
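To illustrate that point, here is a rough Python sketch (the function names swf_body and find_urls are hypothetical, made up for this example) that pulls URL-like strings out of a SWF file. It relies only on the SWF header layout: a 3-byte signature, a version byte and a 4-byte file length, with everything after those 8 bytes zlib-compressed when the signature is “CWS”. It is a crude byte scan, not a SWF parser. It does not handle LZMA-compressed (“ZWS”) files, and it only finds URLs stored as literal strings, so URLs built at runtime or read from XML won’t show up. AS3 stores strings length-prefixed rather than null-terminated, so a match may also run into adjacent printable bytes.

```python
import re
import sys
import zlib

# Matches http(s) URLs stored as printable ASCII inside the SWF body.
URL_PATTERN = re.compile(rb"https?://[\x21-\x7e]+")


def swf_body(data: bytes) -> bytes:
    """Return the uncompressed SWF body (everything after the 8-byte header)."""
    signature = data[:3]
    if signature == b"FWS":  # uncompressed SWF
        return data[8:]
    if signature == b"CWS":  # zlib-compressed SWF
        return zlib.decompress(data[8:])
    raise ValueError(f"unsupported SWF signature: {signature!r}")


def find_urls(data: bytes) -> list[str]:
    """Return the sorted, de-duplicated URL-like strings found in a SWF."""
    body = swf_body(data)
    return sorted({m.group().decode("ascii") for m in URL_PATTERN.finditer(body)})


if __name__ == "__main__":
    # Usage: python find_swf_urls.py movie.swf [other.swf ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            for url in find_urls(fh.read()):
                print(url)
```

Anything this finds, a crawler executing the SWF (or a curious visitor) can find too, which is the same reason tools like Firebug or Service Capture can see what a Flash movie loads.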
Howdy - I’m the player engineer who led the Flash search player development project.
As far as loading external files goes: Google is rolling out search in stages, so look for them to support it in the future (I don’t have any dates). This loading of .swf, .txt, .xml and .gif/.jpg/.png files (using Loader.load(), loadMovie(), etc.) will work much like it does in the web players. If you load an XML or text file, parse it in ActionScript and use the results to populate text fields, this will happen and the text should be found. If you use the info from a .txt or .xml file to create URLs dynamically, this will also happen, and the file at that URL will then be loaded if your code wants it to be.
Remember: this is a real Flash Player that executes all the ActionScript it encounters (AS1, 2 and 3) and executes actions on buttons and sprites that the Google search engine decides it wants to simulate a click on.
Google’s search engine contains the code that DRIVES the execution of the SWF application. The search player executes all the commands it receives (like moving forward in time through the app, returning all the text currently being displayed, and clicking on buttons and sprites that have actions attached to them). It’s the Google logic that decides when it has visited all the states in an app. I have no visibility into how this code works.
I’m tentatively scheduled to do a talk on this at MAX in San Francisco so come on by!
Although SWF files are indexed, there’s a lot of organisational mess in them (headers / body / text-based design elements), and as far as can be told from the Google search results at this point, regular sites are surely prioritised because of that mess.
It would be naive to expect any quick workarounds in these circumstances, and I couldn’t agree more with Peter’s comment above.
Until Adobe and the search giants have worked it out, one would be well advised to get a custom solution for indexing, for example an SEO-friendly Content Management System such as backbone3 with its Flash SEO package.