I was stuck for a while when an application I maintain started throwing ArgumentError: marshal data too short errors in random places. Or at least the places looked random. Once a user ran into that problem, they could not use the application at all.

Marshal CC http://www.flickr.com/photos/qmnonic/
The logs showed that it happened when Rails was trying to create the session object. The session store was in ActiveRecord and the sessions table was not corrupted.
After watching it for a while it turned out that the places in the code where this exception was thrown were random, but there was a pattern: the page visited before was the same in each case.
It turned out that the application was storing a whole ActiveRecord object in the session. Like:
session[:some_info] = @variable
And later we were trying to use it this way:
@variable = Model.find session[:some_info]
Due to Rails magic, ActiveRecord’s find, when given an AR object, will return that object (of course if it is the same model). The code was working well (maybe not very efficiently, since you should avoid storing large objects in the session) until the object stored that way started to grow. The application was collecting some data, and the amount of stored data grew to the point where the string produced by Marshal.dump was larger than 64 kB. And that is the maximum size of the TEXT column used by default to store session data in MySQL.
When you try to store too much data in a TEXT column in MySQL, the excess data is truncated, so Marshal.load throws that exception.
To solve that error it is enough to store just the object id in the session (session[:some_info] = @variable.id).
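The failure is easy to reproduce in plain Ruby, without Rails or MySQL. This is only a sketch: the payload hash is a made-up stand-in for the session data, and cutting the string at 65,535 bytes (the maximum size of a MySQL TEXT column) imitates what the database does to the stored value.

```ruby
# Made-up stand-in for a session holding a large object.
payload = { :some_info => "x" * 70_000 }

dumped = Marshal.dump(payload)

# Imitate MySQL cutting the value to the TEXT column limit (65,535 bytes).
stored = dumped.byteslice(0, 65_535)

begin
  Marshal.load(stored)
  error_message = nil
rescue ArgumentError => e
  error_message = e.message
end

puts "dumped: #{dumped.bytesize} bytes, stored: #{stored.bytesize} bytes"
puts "Marshal.load failed with: #{error_message}"
```

The same payload loads fine when the whole string is passed to Marshal.load, so the data itself is not the problem, only the cut.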
Well… Long time… But finally a new post!
If you have drunk the Linux Kool-Aid, you can run into problems when you get a CHM file. It is an old format and I think there is a place where it can live (like many other proprietary data formats):

Place where we should send .chm files (c) Marcin Wichary
But still, you can get documentation in this format; chances are high if you are trying to interface with some .NET SOAP service. And on Linux there is no viewer for CHM.
Or rather no standalone viewer. There is a solution: the CHM Reader add-on for Firefox. It is not perfect (no global search), but it allows you to navigate through the file. PDF printouts are usually stripped of hyperlinks, and navigating through 800 pages without them is the reason why a PDF printout is not an option…
Just a quick note: if you are missing thumbnails for uploaded images in WordPress, your PHP installation is probably missing the php-gd module.
This module is installed by default on FreeBSD, but on Debian-based Linux you have to add it by hand (apt-get install php5-gd).
I had to move my blogs to a new, temporary host running Linux instead of FreeBSD. Then I noticed that when I wanted to publish a new post I could only insert an uploaded image in its original size, which is not very handy. It took 15 minutes of googling and finally a dive into wp-includes/media.php to check how these thumbnails are generated.
Many times before, when I was supposed to collect emails from users, I googled for some regexp to verify email syntax. At least I was aware that a regexp validating email address syntax is bullshit ;) I worked for a few years as a Unix sysadmin, mostly on mail servers, so I had some idea how bloated the email-related RFCs are ;) and how many exceptions they contain ;)

Step back and think again. Image CC by Vivianna_love
For some time I have been using the following regexp to verify email addresses:
/.+@.+[.].+/
This checks that:
- there is an @ sign somewhere inside
- there is at least one character before the @ sign
- there are at least 3 characters (including one dot) after the @ sign
Why such simple rules? I found a comment on StackOverflow saying that one should step back and think about why the email address is being checked at all.
I just want to help users and stop them from making simple mistakes in their emails. Like not providing @ at all. Or eating the .com in a gmail.com address. And that’s it: shortly after that the email will probably be verified with the only reliable method, by sending an email to this address.
And if you insist on verifying emails with a stricter regular expression, please remember that a plus sign is a perfectly valid character before the @. Many regexps found via Google forget that…
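Here is a small sketch of how that loose check behaves on a few addresses. The constant and method names are made up for this example; only the regexp itself comes from the text above.

```ruby
# The loose pattern from above: something, an @, something, a dot, something.
LOOSE_EMAIL = /.+@.+[.].+/

# Hypothetical helper name; true when the address passes the sanity check.
def plausible_email?(address)
  LOOSE_EMAIL.match?(address.to_s)
end

puts plausible_email?("user@gmail.com")       # true
puts plausible_email?("user+blog@gmail.com")  # true, plus sign is accepted
puts plausible_email?("user.gmail.com")       # false, no @ at all
puts plausible_email?("user@gmail")           # false, the .com got eaten
puts plausible_email?("@gmail.com")           # false, nothing before @
```

The pattern is not anchored, so it only says that something address-like is present. That is enough for catching typos, and the confirmation email does the real verification.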
I’m still using fixtures. Shame, I know.
Why do I use them instead of Factory Girl or a similar solution? Well, fixtures can be much closer to real data than mocks from a factory. How come, you ask? Fixtures are imaginary data, exactly like mocks from other sources!
My answer is: that depends on how you create fixtures. If you create them by hand, they are indeed disconnected from the real world (like all mocks).

What is Your real data/mocks ratio? CC by hsing
But I prefer to extract fixtures from a real (production) database. That way I can easily pick entries created by users which are edge cases. The only trouble is with creating the fixtures. For some time I have been using a modified extract_fixtures rake task. I have added some conditions to the extraction process (an SQL condition to select only particular records) and adjusted the syntax to recent rake.
This is especially useful when you are about to take over application code which has no tests. Extracting real data is a quick way to start writing integration tests (in such a case they are the most efficient in terms of time invested versus application code covered).
How to extract fixtures without pain?
Now I want to share with you the next improvement to this recipe. Old fixtures can be left unchanged (as long as you don’t use ERb to spice them up).
The rake task loads the old fixtures file and adds new records selected from the database according to the arguments you provide. What’s more, it keeps the old fixture names, so all your tests using some_table :fixture will work (but don’t you dare to create an attribute called fixtures_old_key_value :)) ). As a bonus, the attributes in your fixture file will be sorted by attribute name!
OK, some examples (written by hand, not real YML files, so I hope there are no mistakes ;)) :
cat test/fixtures/entries.yml
one:
  attr1: value
  attr2: value
  created_at: 2009-09-10 11:22

rake extract_fixtures[entries,1,"created_at<'2009-09-01'"]

cat test/fixtures/entries.yml
one:
  attr1: value
  attr2: value
  created_at: 2009-09-10 11:22
  id: 1
entries_2:
  attr1: value from DB
  attr2: Also value from DB
  created_at: 2009-08-01 22:11
  id: 2
All clear? There was a fixture named :one in the file, and after running the rake task we have added a new one selected from the database. The new fixture is named TABLE_NAME_ID, and the old fixture keeps its name.
You can add DISCARD=true and the old fixtures will be, well… discarded :)
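To make the naming rules concrete, here is a minimal sketch of the merge step in plain Ruby. It is not the code of the rake task: merge_fixtures is a hypothetical helper, the rows are given as plain hashes instead of being read from a database, and the discard argument only imitates what DISCARD=true is described to do.

```ruby
require 'yaml'

# Hypothetical helper, not the rake task itself.
# old_fixtures: Hash of fixture name => attributes (as loaded from the YAML file)
# rows:         Array of attribute Hashes, standing in for records from the DB
def merge_fixtures(old_fixtures, table, rows, discard = false)
  merged = discard ? {} : old_fixtures.dup
  # New records get the name TABLE_NAME_ID; old fixtures keep their names.
  rows.each { |row| merged["#{table}_#{row['id']}"] = row }
  # Sort attributes by name inside every fixture.
  merged.each_with_object({}) do |(name, attrs), out|
    out[name] = attrs.sort.to_h
  end
end

old  = { "one" => { "id" => 1, "attr2" => "value", "attr1" => "value" } }
rows = [{ "id" => 2, "attr2" => "Also value from DB", "attr1" => "value from DB" }]

merged = merge_fixtures(old, "entries", rows)
puts merged.to_yaml
```

Calling merge_fixtures(old, "entries", rows, true) leaves only entries_2 in the result.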
The rake task takes the following arguments:
- table name
- limit on the number of records extracted from the database
- WHERE condition
Rake has one limitation: it treats all commas as argument separators, so it is not possible to use the IN operator. Instead of rake extract_fixtures[entries,10,"id IN (1,2,3)"] write rake extract_fixtures[entries,10,"id=1 OR id=2 OR id=3"].
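If the list of ids is long, the comma-free condition can be generated instead of typed. A tiny sketch, where ids_condition is a hypothetical helper and not part of the rake task:

```ruby
# Hypothetical helper: turns a list of ids into a WHERE condition without commas.
def ids_condition(ids, column = "id")
  # Integer() rejects anything that is not a number before it lands in SQL.
  ids.map { |id| "#{column}=#{Integer(id)}" }.join(" OR ")
end

condition = ids_condition([1, 2, 3])
puts %(rake extract_fixtures[entries,10,"#{condition}"])
```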
You can download rake task file: http://nhw.pl/download/extract_fixtures.rake
I don’t know about you, but for me logs are the most powerful debugging tool. Placing many logger.debug or logger.info calls can quickly show what is happening inside a Rails application.
This approach is especially useful when something wrong is happening and the trigger is unknown. Placing many logging directives can provide data for analysing what the reason could be.

CC by Admond
The default Rails logger has one serious flaw which makes logs on production sites almost useless: messages are not grouped by request. If you have many Rails processes running and logging to a single file, some requests will be processed in parallel and you get mixed log entries. With the default log format there is no way to tell which entry comes from which process.
Since Rails 2.0 we have ActiveSupport::BufferedLogger, but it solves another problem, the number of disk writes and file locks: you can set after how many entries the log will be flushed to disk.
AnnotatedLogger
Here comes AnnotatedLogger to the rescue. The idea is to take each message and prefix it with the PID of the Rails process. As long as you don’t run Rails in multithreaded mode, this is a unique ID which makes log entries distinguishable.
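require 'logger'

class AnnotatedLogger < Logger
  # Define the prefixed methods once, when the class is loaded,
  # instead of on every AnnotatedLogger.new.
  # to_s guards against nil and non-String messages; &nil keeps Logger
  # from calling the block a second time and dropping the prefix.
  [:debug, :info, :warn, :error, :fatal].each do |m|
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def #{m}(arg = nil)
        pid = "%.5d:" % $$
        arg = yield if block_given?
        super("%s %s" % [pid, arg.to_s.gsub(/\\n/, "\\n" + pid)], &nil)
      end
    RUBY
  end
end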
Now, in the Rails::Initializer.run do |config| section of config/environment.rb, define AnnotatedLogger as the default Rails logger:
config.logger = AnnotatedLogger.new "log/#{RAILS_ENV}.log"
Of course you can add other data to the log entries (a timestamp?). Here is an example of log entries:
24551:Processing SearchController#processing (for [FILTERED] at 2009-09-15 15:11:24) [GET]
24551: Session ID: df260892836fc619ec666f894e7d8e88
24551: Parameters: {[FILTERED]}
24542: Airport Load (0.216460) SELECT * FROM [FILTERED]
24542: Completed in 0.24903 (4 reqs/sec) | Rendering: 0.01298 (5%) | DB: 0.22554 (90%) | 200 OK [FILTERED]
24551: Search Columns (0.004711) SHOW FIELDS FROM [FILTERED]
24551: Rendering template within layouts/blank
24551: Rendering search/processing
Without the PIDs you would assume that the Airport Load entry is part of SearchController#processing for the session with ID df260892836fc619ec666f894e7d8e88. In reality it is output from processing a different request.
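Once every line carries a PID, untangling the requests is a few lines of Ruby. A minimal sketch, using a shortened copy of the log lines above as input (in real life you would read them from the log file):

```ruby
# Shortened copy of the example log; the quoted heredoc keeps '#' literal.
log = <<~'LOG'
  24551:Processing SearchController#processing (for [FILTERED] at 2009-09-15 15:11:24) [GET]
  24551: Session ID: df260892836fc619ec666f894e7d8e88
  24542: Airport Load (0.216460) SELECT * FROM [FILTERED]
  24551: Rendering search/processing
LOG

# Group lines by the PID prefix (the digits before the first colon).
by_pid = log.lines.group_by { |line| line[/\A(\d+):/, 1] }

by_pid.each do |pid, lines|
  puts "== process #{pid} =="
  puts lines
end
```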
What else?
This is how I deal with logs from a Rails application. Do you have other ideas for making logging more usable, not only in the development environment?
PS
I have just had another idea: you could probably use BufferedLogger with auto flushing disabled and patch ActionController to flush all entries manually after a request has been processed. Then all messages would be dumped in a single block.
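A rough sketch of that idea in plain Ruby, just to show the shape of it. RequestBlockLogger is a hypothetical class written for this example; it does not use the BufferedLogger API, and the ActionController hook that would call flush_request after each request is not shown.

```ruby
require 'stringio'

# Hypothetical class: collects messages in memory and writes them out
# in one piece when the request is done.
class RequestBlockLogger
  def initialize(io)
    @io = io
    @buffer = []
  end

  def info(message)
    @buffer << message.to_s
  end

  # Meant to be called once, after the request has been processed.
  def flush_request
    return if @buffer.empty?
    @io.write(@buffer.join("\n") + "\n")   # one write call for the whole request
    @buffer.clear
  end
end

out = StringIO.new   # stands in for the log file
logger = RequestBlockLogger.new(out)
logger.info "Processing SearchController#processing"
logger.info "Rendering search/processing"
written_before_flush = out.string.dup
logger.flush_request
puts out.string
```

Nothing reaches the output until flush_request is called, so entries from one request stay together even when other processes write to the same file in the meantime.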
I’m using Webrat to keep some sanity when taking over maintenance of a new application. Customers often come to me with legacy code which somehow is not covered by tests.
In such a case integration tests are the way to go, since they provide the most bang for your buck: each written test can cover many parts of the application.
I had to create a test for downloading some data in CSV format (did you say binary? :) ). With the default matchers from Webrat you won’t be able to write effective assertions, and that is why I’m referring to such a file as binary.

So how to do it? Here is a quick tip
Use Webrat’s response_body to get the raw body returned by the application. Like this:
click_link "Get me some CSV data"
ret = CSV.parse response_body
assert_equal(
  2,
  ret[2][5].to_f,
  "In third row and sixth column You should have 2 and there is #{ret[2][5]}"
)
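If the indexing looks cryptic, here is the same parsing step outside of a test. The body string is made up and stands in for what response_body would return; CSV.parse gives back an array of rows, each row being an array of strings.

```ruby
require 'csv'

# Made-up response body standing in for what response_body would return.
body = "name,city,visits\nAnna,Warsaw,3\nJan,Krakow,2\n"

rows = CSV.parse(body)

# rows[0] is the header line, so rows[2][2] is the third column of the third line.
puts rows[2][2].to_f   # 2.0
```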
ActiveRecord, which is a core component of the Rails framework (at least until Rails 3.0 becomes reality), provides a lot of features which developers love.
Validations are one of those features. They are methods which provide an easy way to check whether a model is valid and to protect the consistency of our data in the database. Sounds good, but this is bullshit.
What AR really is ;) (c) CC sekimura
Active Record validations are prone to race conditions. Period. It does not make any sense to rely on them if you really have to have consistent data (I’m referring to unique constraints and validates_uniqueness_of). The only way to go is to put constraints at the database level. Or to write a lot of workaround code in Rails, which is error-prone as well.
What is a race condition? A race condition (or race hazard) occurs when the outcome of some operation depends on the timing of other operations.
Let’s take for example the creation of two records where one attribute should be unique. How does validates_uniqueness_of work?
First it checks in the DB (via SELECT) whether there already is a record with the same value of the unique attribute. If there is no such record, it runs an INSERT command to create the new record.
Now imagine two processes trying to create such a record at the same time. Since SELECT and INSERT are separate operations, the following sequence is quite possible (remember, we have two processes trying to do the same thing at once):
- Model.save in PROCESS 1
- Model.save in PROCESS 2
- SELECT FROM PROCESS 1 (result - no record in DB)
- SELECT FROM PROCESS 2 (result - no record in DB)
- INSERT FROM PROCESS 1
- INSERT FROM PROCESS 2
Now guess how many records will be created? :))
What is the takeaway from this rant?
ActiveRecord brings to the table a lot of improvements which every developer loves, but there is no silver bullet. Such a race condition can happen even in a low-traffic application (unless you run a single Rails process in non-threaded mode, but this is not a very useful setup :D).
If there is really a business need which requires you to have unique data, you have to implement constraints at the database level.
Use AR, since it is a wonderful tool, but use it properly. Or maybe: know the shortcomings of the tools you use.
If you use the awesome :) Paperclip library and, in an application served by Passenger, you get errors like:
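The interleaving described above can be replayed step by step in plain Ruby. This is only a simulation under stated assumptions: FakeTable is a hypothetical in-memory stand-in for a database table, exists? plays the SELECT done by the validation, insert plays the INSERT, and the unique_index flag imitates a unique constraint in the database.

```ruby
# Hypothetical in-memory stand-in for a database table; no real DB involved.
class FakeTable
  class DuplicateError < StandardError; end

  attr_reader :rows

  def initialize(unique_index: false)
    @rows = []
    @unique_index = unique_index
  end

  # Plays the role of the SELECT issued by the validation.
  def exists?(value)
    @rows.include?(value)
  end

  # Plays the role of INSERT; with a unique index duplicates are rejected here.
  def insert(value)
    raise DuplicateError, value if @unique_index && @rows.include?(value)
    @rows << value
  end
end

# Replays the interleaving: both processes SELECT first, then both INSERT.
def replay_race(table, value)
  p1_free = !table.exists?(value)   # SELECT from process 1
  p2_free = !table.exists?(value)   # SELECT from process 2
  table.insert(value) if p1_free    # INSERT from process 1
  table.insert(value) if p2_free    # INSERT from process 2
  :both_saved
rescue FakeTable::DuplicateError
  :second_insert_rejected
end

plain   = FakeTable.new
indexed = FakeTable.new(unique_index: true)

plain_result   = replay_race(plain, "john@example.com")     # validation only
indexed_result = replay_race(indexed, "john@example.com")   # validation plus unique index

puts "#{plain_result}: #{plain.rows.size} rows"
puts "#{indexed_result}: #{indexed.rows.size} rows"
```

In a real application the counterpart of unique_index would be a unique index created in the database (for example with add_index and :unique => true in a migration), with the application handling the database error raised on the second INSERT.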
Avatar /tmp/stream.1170.0 is not recognized by the 'identify' command.
most probably Passenger does not have its environment set up and is missing a path. On FreeBSD identify is placed in /usr/local/bin, and AFAIR this path is not included in PATH by default. As a result Passenger cannot find this utility.
You can try to set up the environment for Passenger (Apache?) or just add this in the appropriate environment file (in my case it was production.rb):
Paperclip.options[:command_path] = "/usr/local/bin"
