Ultrashock Forums > Community Essentials > News
Google learns to crawl Flash
2008-07-01, #1

From the official Google blog:

"Google has been developing a new algorithm for indexing textual content in Flash files of all kinds, from Flash menus, buttons and banners, to self-contained Flash websites. Recently, we've improved the performance of this Flash indexing algorithm by integrating Adobe's Flash Player technology.

In the past, web designers faced challenges if they chose to develop a site in Flash because the content they included was not indexable by search engines. They needed to make extra effort to ensure that their content was also presented in another way that search engines could find.

Now that we've launched our Flash indexing algorithm, web designers can expect improved visibility of their published Flash content, and you can expect to see better search results and snippets. There's more info on the Webmaster Central blog about the Searchable SWF integration."

More info:

Adobe Press Release: Adobe Advances Rich Media Search on the Web
FAQ on Google Webmaster Central Blog: Improved Flash indexing
Posted by nrg (Administrator, Belgium)

Ultrashock Member Comments:
kalitos (Portugal), 2008-07-01, #2
Really cool

Codemonkey (Super Moderator, Netherlands), 2008-07-01, #3
I first heard of Google's collaboration with Adobe at the Flash on the Beach conference in London last year. Back then it was still a long way off. Glad it is already becoming more tangible!

Q: How does Google "see" the contents of a Flash file?
We've developed an algorithm that explores Flash files in the same way that a person would, by clicking buttons, entering input, and so on. Our algorithm remembers all of the text that it encounters along the way, and that content is then available to be indexed. We can't tell you all of the proprietary details, but we can tell you that the algorithm's effectiveness was improved by utilizing Adobe's new Searchable SWF library.
Q: What are the current technical limitations of Google's ability to index Flash?
There are three main limitations at present, and we are already working on resolving them:

1. Googlebot does not execute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed.

uxte (Moderator, Finland), 2008-07-01, #4
Great news nrg

Nutrox (United Kingdom), 2008-07-01, #5
This bit doesn't sound very good...
"Googlebot does not execute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed."

Most people these days use SWFObject or something similar to add SWF files to HTML pages. Also, what about text that is loaded into the SWF from external files? If Googlebot can only "see" embedded/static text in SWF files then this seems pretty pointless to me, especially if any SWF file added to an HTML page with JavaScript is ignored.
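
To make the JavaScript limitation concrete, here is a rough Python sketch of what a crawler that never runs scripts could pick up from a page. It is only an illustration, not how Googlebot actually works: the `StaticSwfFinder` class, the `find_static_swfs` helper and both sample pages are made up for this example, and the second page assumes an SWFObject-style script embed.

```python
from html.parser import HTMLParser


class StaticSwfFinder(HTMLParser):
    """Collects .swf URLs written directly into <object>, <embed> or <param> markup."""

    def __init__(self):
        super().__init__()
        self.swf_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "object":
            candidate = attrs.get("data")
        elif tag == "embed":
            candidate = attrs.get("src")
        elif tag == "param" and (attrs.get("name") or "").lower() == "movie":
            candidate = attrs.get("value")
        else:
            candidate = None
        # Ignore any query string when checking the file extension.
        if candidate and candidate.lower().split("?")[0].endswith(".swf"):
            self.swf_urls.append(candidate)


def find_static_swfs(html):
    """Return SWF URLs visible without executing any JavaScript."""
    finder = StaticSwfFinder()
    finder.feed(html)
    return finder.swf_urls


# Hypothetical pages: one with static markup, one that embeds via script only.
static_page = '<object data="site.swf" type="application/x-shockwave-flash"></object>'
script_page = (
    '<div id="flash"></div>'
    '<script>swfobject.embedSWF("site.swf", "flash", "800", "600", "9.0.0");</script>'
)

print(find_static_swfs(static_page))  # ['site.swf']
print(find_static_swfs(script_page))  # []
```

The second page only mentions the SWF inside script text, so a scan of the static markup finds nothing, which is the situation the FAQ answer describes.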

nrg (Administrator, Belgium), 2008-07-01, #6
I see some bad things with this news:

1. Embedded email addresses inside SWFs that used to be safe from questionable robots are now up for grabs again.

2. It might be nice that Google can spider inside dynamic Flash content, but it cannot provide the deep-link URL back to it.

Nutrox (United Kingdom), 2008-07-01, #7
Yeah, deep links are another thing that Googlebot won't be aware of. I think we should just stick to the current Flash SEO techniques and continue to refine those; they seem to work well even if it does mean spending some extra dev time implementing them.

Anik (Super Moderator, Argentina), 2008-07-01, #8
Sounds good from one point of view, but not from another. First there are all the points already made, and then it might index silly text or other stuff inside your Flash that you don't want indexed.

rmduran (Guatemala), 2008-07-01, #9
I also see both good and bad news in this. After all, we've all learned to optimize quite well around the known Flash limitations. I, for one, used to see Flash as a way to hide content that I wanted to keep invisible. Ironic that I will now have to venture into new territory to achieve that.

ServerSide (Canada), 2008-07-01, #10
I'm also not interested in having Google index and provide direct links to the various .swf files that make up an application. For example, if I have a product.swf that is designed to be loaded into a container I don't want Google indexing and making this file available independently. Ironically I can see myself now hiding these Flash files from Google intentionally so they won't be indexed.

Nutrox (United Kingdom), 2008-07-01, #11
Google won't index the individual SWF files as far as I know, it will index the HTML page the base SWF file is in. I'm not even sure that Google can index externally loaded SWF files unless it somehow finds them in the site's directory.

tiran (United States), 2008-07-01, #12
Where does it say it can only see embedded static text? It says it can see "All of the text that users can see as they interact with your Flash file".

Also, it looks like you can keep hiding your stuff by embedding with JavaScript, but if you get that client who insists on paying you for something that can be indexed this way, you have an avenue. Maybe not the giant step forward it first seems, but some clients who think they are SEO experts and write Flash off immediately might be appeased a bit.

Nutrox (United Kingdom), 2008-07-01, #13
Quote (originally posted by tiran):
Where does it say it can only see embedded static text? It says it can see "All of the text that users can see as they interact with your Flash file"
If you are generating text fields at runtime, and loading external text at runtime, I can't see how Googlebot would be able to read that text. Also, what about loading delays? Even if Googlebot is able to read and follow all of the ActionScript code in a SWF in order to find out what external files are being loaded, I doubt it will hang around waiting for those files to load. Then there is the issue of "what" Googlebot should look at in external files, if you are loading an XML file then Googlebot will need to work out what is going to be shown to the user and what is going to be used internally by the SWF.

Maybe I am thinking about this in the wrong way, but I can't see Googlebot ever being able to read anything other than static text fields.

As for hiding stuff with JS, everyone should be adding Flash to their HTML pages with JS anyway in order to provide alternative content for users who don't have the Flash Player installed or don't have the required player version, and for browsers that are not capable of running Flash content etc.

nrg (Administrator, Belgium), 2008-07-01, #14
Quote (originally posted by Nutrox):
Google won't index the individual SWF files as far as I know, it will index the HTML page the base SWF file is in. I'm not even sure that Google can index externally loaded SWF files unless it somehow finds them in the site's directory.
I'm afraid you're wrong Si:

http://www.google.com/search?hl=en&s...filetype%3Aswf

nrg (Administrator, Belgium), 2008-07-01, #15
And have a look at the 800k+ results for mailto: inside swf files:

http://www.google.com/search?hl=en&s...wf&btnG=Search

Anik (Super Moderator, Argentina), 2008-07-01, #16
Damn, I am not liking this.

Nutrox (United Kingdom), 2008-07-01, #17
Yar, I understand that searching for SWF files by specifying the file extension will bring up results for individual SWF files (Google finds those via <object> tags, links, etc.), but I was referring to a normal/standard search. That said, if a standard search for something like "FWA Wallpapers" did bring up a result that pointed to FWA's main SWF file instead of the actual HTML page, then Google have a big problem on their hands IMO.

That could be fixed easily enough using .htaccess or something, but we as developers shouldn't be forced down that path by any search engine.

So yeah, searching for specific file extensions will no doubt bring up those files (not sure if "non-linked-to" files still appear though), but a standard search shouldn't really do that.

Nutrox (United Kingdom), 2008-07-01, #18
Last edited by Nutrox : 2008-07-01 at 12:14.
I don't know what the hell Google are thinking allowing mailto links in SWF files to be indexed though, that is completely out of order.

miko (Administrator, United States), 2008-07-01, #19
I found another article this afternoon that may be interesting. I agree with Nutrox about the mailto links. This is a very serious issue and should be addressed.

http://news.cnet.com/8301-10784_3-9982137-7.html

Sundev (United States), 2008-07-01, #20
Nutrox, from what I read in this article:

http://www.eweek.com/c/a/Enterprise-...-Flash-Search/

It seems that server side and dynamic information will be searchable, at least this article made it sound like that was the case:

"So what does that mean? We are giving a special, search-engine optimized Flash Player to Yahoo and Google, which is going to help them crawl through every bit of your SWF file. This Flash Player will act just like a person would in some cases. It will click on your buttons, it will move through the states of your application, get data from the server when your application normally would, and it will capture all of the text and data that you’ve got inside of your Flash-based application. We’ve basically provided a very powerful looking glass into SWF files so Google and Yahoo can pull out meaningful information."

Sundev (United States), 2008-07-01, #21
Yep, definitely, read the Adobe press release.

Nutrox (United Kingdom), 2008-07-01, #22
Last edited by Nutrox : 2008-07-02 at 00:06.
Ah... I see. I stand corrected. If Google will be using a special version of the Flash Player then I can see how it would be able to read files loaded in at runtime. I still have concerns about all of this though.


1. How will Google know what data is loaded for the user and what is loaded for internal use by the SWF? If Google is only interested in the contents of text fields then that is cool, but if it tries to read loaded files as-is then there will be problems.

2. How is Google going to bypass security sandboxes? If Adobe are going to let the "special" Flash Player load content regardless of cross-domain policies then they should allow us to add additional elements to cross-domain policy files that will block the "special" Flash Player from certain files and/or directories. I don't know about you guys but I certainly don't want Google to have unrestricted access to everything loaded into my SWF files.

3. Will Google be able to index code contained in the SWF? That wouldn't be good.

4. Will Google be able to index every image contained in and loaded into SWF files? That wouldn't be good either.

5. Will Google be able to index audio and video files contained in and loaded into SWF files? Again, that wouldn't be good.


I think Adobe need to let us know exactly what Google and Yahoo will be capable of accessing and indexing. The more I think about this "special" Flash Player and the level of access it will give Google and Yahoo, the more it worries me.
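
On point 2, for context: a cross-domain policy file as it stands only lists which domains may load data from a server, and as far as I know it has no element for singling out a particular crawler. Here is a minimal Python sketch that reads such a file. The policy contents, the example.com domains and the `allowed_domains` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy file contents; example.com is a placeholder domain.
POLICY = """<?xml version="1.0"?>
<cross-domain-policy>
  <allow-access-from domain="www.example.com"/>
  <allow-access-from domain="*.example.com"/>
</cross-domain-policy>"""


def allowed_domains(policy_xml):
    """Return the domain patterns named in allow-access-from elements."""
    root = ET.fromstring(policy_xml)
    return [el.get("domain") for el in root.iter("allow-access-from")]


print(allowed_domains(POLICY))  # ['www.example.com', '*.example.com']
```

Everything in the file is keyed on the requesting domain, which is why blocking a "special" player would need new elements along the lines suggested above.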



[edit]
If you don't trust Google and Yahoo with your SWF files, and you want to handle the SEO yourself, then you could block Google and Yahoo from accessing all your SWF files using a bit of .htaccess:

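Code:
RewriteEngine On
# Requests from Googlebot or Yahoo for paths ending in .swf (case-insensitive)...
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Yahoo)
RewriteCond %{REQUEST_URI} \.swf$ [NC]
# ...are answered with 403 Forbidden
RewriteRule .* - [L,F]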
At least I think that is correct. And yes, I probably am being a bit paranoid, but until Adobe answer some of the above concerns I think I am allowed to be.
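
As a sanity check on which requests the two RewriteCond lines would catch, here is a rough Python sketch of the same matching logic. It only mimics the two regular expressions and is not Apache itself. The `would_be_forbidden` helper and the request paths are made up for this example, and the user-agent strings are just samples.

```python
import re

# Same patterns as the two RewriteCond lines in the rule above.
BOT_PATTERN = re.compile(r"(Googlebot|Yahoo)")       # case-sensitive, as in the rule
SWF_PATTERN = re.compile(r"\.swf$", re.IGNORECASE)   # [NC] means case-insensitive


def would_be_forbidden(user_agent, request_path):
    """True when both conditions match, i.e. the rule would answer 403."""
    return bool(BOT_PATTERN.search(user_agent)) and bool(SWF_PATTERN.search(request_path))


# Sample user-agent strings, for illustration only.
googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
slurp = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
browser = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.0"

print(would_be_forbidden(googlebot, "/flash/main.swf"))  # True
print(would_be_forbidden(slurp, "/flash/MAIN.SWF"))      # True
print(would_be_forbidden(googlebot, "/index.html"))      # False
print(would_be_forbidden(browser, "/flash/main.swf"))    # False
```

Keep in mind that this kind of rule only stops crawlers that identify themselves honestly, since the user-agent header can be set to anything.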
nrg's Avatar nrg nrg is offline