Fix a problem noted by Chris Mason . journal_get_descriptor_buffer() is used to obtain a regular old buffer_head against the blockdev mapping. The caller will populate that bh by hand and will then submit it for writing. But there are problems: a) The function sets bh->b_state nonatomically. But this buffer is accessible to other CPUs via pagecache lookup. b) The function sets the buffer dirty and then the caller populates it and then it is submitted for I/O. Wrong order: there's a window in which the VM could write the buffer before it is fully populated. c) The function fails to set the buffer uptodate after zeroing it. And one caller forgot to mark it uptodate as well. So if the VM happens to decide to write the containing page back __block_write_full_page() encounters a dirty, not uptodate buffer, which is an illegal state. This was generating buffer_error() warnings before we removed buffer_error(). Leaving the buffer not uptodate also means that a concurrent reader of /dev/hda1 could cause physical I/O against the buffer, scribbling on what we just put in it. So journal_get_descriptor_buffer() is changed to mark the buffer uptodate, under the buffer lock. I considered changing journal_get_descriptor_buffer() to return a locked buffer but there doesn't seem to be a need for this, and both callers end up using ll_rw_block() anyway, which requires that the buffer be unlocked again. Note that the journal_get_descriptor_buffer() callers dirty these buffers with set_buffer_dirty(). That's a bit naughty, because it could create dirty buffers against a clean page - an illegal state. They really should use mark_buffer_dirty() to dirty the page and inode as well. But all callers will immediately write and clean the buffer anyway, so we can safely leave this optimising cheat in place. Signed-off-by: Andrew Morton --- 25-akpm/fs/jbd/commit.c | 6 ++++-- 25-akpm/fs/jbd/journal.c | 12 +++++++++--- 25-akpm/fs/jbd/revoke.c | 4 ++-- 3 files changed, 15 insertions(+), 7 deletions(-) diff -puN fs/jbd/commit.c~jbd-descriptor-buffer-state-fix fs/jbd/commit.c --- 25/fs/jbd/commit.c~jbd-descriptor-buffer-state-fix Fri Jun 4 14:49:11 2004 +++ 25-akpm/fs/jbd/commit.c Fri Jun 4 14:59:10 2004 @@ -412,7 +412,8 @@ write_out_data: tagp = &bh->b_data[sizeof(journal_header_t)]; space_left = bh->b_size - sizeof(journal_header_t); first_tag = 1; - set_bit(BH_JWrite, &bh->b_state); + set_buffer_jwrite(bh); + set_buffer_dirty(bh); wbuf[bufs++] = bh; /* Record it so that we can wait for IO @@ -638,7 +639,8 @@ wait_for_iobuf: JBUFFER_TRACE(descriptor, "write commit block"); { struct buffer_head *bh = jh2bh(descriptor); - set_buffer_uptodate(bh); + + set_buffer_dirty(bh); sync_dirty_buffer(bh); if (unlikely(!buffer_uptodate(bh))) err = -EIO; diff -puN fs/jbd/journal.c~jbd-descriptor-buffer-state-fix fs/jbd/journal.c --- 25/fs/jbd/journal.c~jbd-descriptor-buffer-state-fix Fri Jun 4 14:49:11 2004 +++ 25-akpm/fs/jbd/journal.c Fri Jun 4 15:10:06 2004 @@ -585,9 +585,13 @@ int journal_bmap(journal_t *journal, uns * We play buffer_head aliasing tricks to write data/metadata blocks to * the journal without copying their contents, but for journal * descriptor blocks we do need to generate bona fide buffers. + * + * After the caller of journal_get_descriptor_buffer() has finished modifying + * the buffer's contents they really should run flush_dcache_page(bh->b_page). + * But we don't bother doing that, so there will be coherency problems with + * mmaps of blockdevs which hold live JBD-controlled filesystems. */ - -struct journal_head * journal_get_descriptor_buffer(journal_t *journal) +struct journal_head *journal_get_descriptor_buffer(journal_t *journal) { struct buffer_head *bh; unsigned long blocknr; @@ -599,8 +603,10 @@ struct journal_head * journal_get_descri return NULL; bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize); + lock_buffer(bh); memset(bh->b_data, 0, journal->j_blocksize); - bh->b_state |= (1 << BH_Dirty); + set_buffer_uptodate(bh); + unlock_buffer(bh); BUFFER_TRACE(bh, "return this buffer"); return journal_add_journal_head(bh); } diff -puN fs/jbd/revoke.c~jbd-descriptor-buffer-state-fix fs/jbd/revoke.c --- 25/fs/jbd/revoke.c~jbd-descriptor-buffer-state-fix Fri Jun 4 14:49:11 2004 +++ 25-akpm/fs/jbd/revoke.c Fri Jun 4 14:57:18 2004 @@ -522,7 +522,7 @@ void journal_write_revoke_records(journa kmem_cache_free(revoke_record_cache, record); } } - if (descriptor) + if (descriptor) flush_descriptor(journal, descriptor, offset); jbd_debug(1, "Wrote %d revoke records\n", count); } @@ -606,7 +606,7 @@ static void flush_descriptor(journal_t * header->r_count = htonl(offset); set_buffer_jwrite(bh); BUFFER_TRACE(bh, "write"); - set_buffer_uptodate(bh); + set_buffer_dirty(bh); ll_rw_block(WRITE, 1, &bh); } #endif _