From: Martin Schwidefsky

I got a bug report for s390 with an oops in mk_swap_pte. The cause is a
mismatch between the arch-independent swap entries and the swap entries
as coded in the pte. The swp_entry_t uses 27 bits for the offset and 5 bits
for the type. sys_swapon uses this definition to find out how many pages
there can be at maximum:

---
	p->lowest_bit = 1;
	maxpages = swp_offset(swp_entry(0,~0UL)) - 1;
	if (maxpages > swap_header->info.last_page)
		maxpages = swap_header->info.last_page;
---

Before it is clamped to last_page, maxpages is always about 0x7ffffff for
32 bit and 0x7ffffffffffffff for 64 bit. This is suboptimal because the
architecture may be more restrictive on the number of bits in the offset.
The current situation is:

             offset   type   max swap
             bits     bits   size
  alpha      24       5      64 GB
  arm        23       7      32 GB
  cris       20       7      4 GB
  h8300      ??       ?      ?
  i386       24       5      64 GB
  ia64       54       7      big
  m68k       20       8      4 GB
  mips-32    20       7      4 GB
  mips-64    24       8      64 GB
  parisc     24       5      64 GB
  ppc        24       5      64 GB
  ppc64      48?      6      big
  s390-32    19       6      2 GB
  s390-64    52       6      big
  sh         22       8      16 GB
  sparc-32   19       7      2 GB
  sparc-64   43       8      big
  v850       ??       ?      ?
  x86_64     40?      6      big

In my case the swap device had 2.5 GB, and mkswap happily created a swap
file of that size. sys_swapon didn't object either, but the first try to
create a swap entry for a page with an offset > 0x7ffff crashed the machine.
The same will happen on i386 with a swap device > 64 GB.

I created a patch that should fix the problem. It uses

	swp_type(pte_to_swp_entry(swp_entry_to_pte(swp_entry(~0UL,0))))

to find the highest possible swap type number and

	swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0,~0UL))))

to find the highest possible swap offset. This should work with the existing
__swp_entry/__swp_type/__swp_offset definitions for all architectures except
for s390, because I've added a BUG_ON in __swp_entry if the created swap pte
is dubious. Oh, well.

By the way, there is room for improvement for some architectures to increase
the maximum size of a single swap device by using some of the bits currently
used for the type.
The architecture-independent swp_entry_t limits the number of swap files
to 32 anyway, so there is no use for more than 5 type bits. While I was at
it I did this change for s390.

---

 25-akpm/include/asm-s390/pgtable.h |   32 ++++++++++++++------------------
 25-akpm/mm/swapfile.c              |    4 ++--
 2 files changed, 16 insertions(+), 20 deletions(-)

diff -puN include/asm-s390/pgtable.h~swp_entry-vs-swap_pte-fix include/asm-s390/pgtable.h
--- 25/include/asm-s390/pgtable.h~swp_entry-vs-swap_pte-fix	Wed Mar 24 14:08:14 2004
+++ 25-akpm/include/asm-s390/pgtable.h	Wed Mar 24 14:08:14 2004
@@ -744,11 +744,11 @@ extern inline pmd_t * pmd_offset(pgd_t *
  * Bit 30 and 31 are used to distinguish the different page types. For
  * a swapped page these bits need to be zero.
  * This leaves the bits 1-19 and bits 24-29 to store type and offset.
- * We use the 6 bits from 24-29 for the type and the 19 bits from 1-19
- * for the offset.
- * 0|      offset       |0110| type |00|
- * 0 0000000001111111111 2222 222222 33
- * 0 1234567890123456789 0123 456789 01
+ * We use the 5 bits from 25-29 for the type and the 20 bits from 1-19
+ * plus 24 for the offset.
+ * 0|      offset       |0110|o|type |00|
+ * 0 0000000001111111111 2222 2 22222 33
+ * 0 1234567890123456789 0123 4 56789 01
  *
  * 64 bit swap entry format:
  * A page-table entry has some bits we have to treat in a special way.
@@ -761,26 +761,22 @@ extern inline pmd_t * pmd_offset(pgd_t *
  * Bit 62 and 63 are used to distinguish the different page types. For
  * a swapped page these bits need to be zero.
  * This leaves the bits 0-51 and bits 56-61 to store type and offset.
- * We use the 6 bits from 56-61 for the type and the 52 bits from 0-51
- * for the offset.
- * |                      offset                       |0110| type |00|
- * 0000000000111111111122222222223333333333444444444455 5555 555566 66
- * 0123456789012345678901234567890123456789012345678901 2345 678901 23
+ * We use the 5 bits from 57-61 for the type and the 53 bits from 0-51
+ * plus 56 for the offset.
+ * |                      offset                       |0110|o|type |00|
+ * 0000000000111111111122222222223333333333444444444455 5555 5 55566 66
+ * 0123456789012345678901234567890123456789012345678901 2345 6 78901 23
  */
 extern inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
 {
 	pte_t pte;
-	pte_val(pte) = (type << 2) | (offset << 12) | _PAGE_INVALID_SWAP;
-#ifndef __s390x__
-	BUG_ON((pte_val(pte) & 0x80000901) != 0);
-#else /* __s390x__ */
-	BUG_ON((pte_val(pte) & 0x901) != 0);
-#endif /* __s390x__ */
+	pte_val(pte) = _PAGE_INVALID_SWAP | ((type & 0x1f) << 2) |
+		((offset & 1) << 7) | ((offset & 0xffffe) << 11);
 	return pte;
 }
 
-#define __swp_type(entry)	(((entry).val >> 2) & 0x3f)
-#define __swp_offset(entry)	((entry).val >> 12)
+#define __swp_type(entry)	(((entry).val >> 2) & 0x1f)
+#define __swp_offset(entry)	(((entry).val >> 11) | (((entry).val >> 7) & 1))
 #define __swp_entry(type,offset) ((swp_entry_t) { pte_val(mk_swap_pte((type),(offset))) })
 
 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) })
diff -puN mm/swapfile.c~swp_entry-vs-swap_pte-fix mm/swapfile.c
--- 25/mm/swapfile.c~swp_entry-vs-swap_pte-fix	Wed Mar 24 14:08:14 2004
+++ 25-akpm/mm/swapfile.c	Wed Mar 24 14:08:14 2004
@@ -1302,7 +1302,7 @@ asmlinkage long sys_swapon(const char __
 		if (!(p->flags & SWP_USED))
 			break;
 	error = -EPERM;
-	if (type >= MAX_SWAPFILES) {
+	if (type > swp_type(pte_to_swp_entry(swp_entry_to_pte(swp_entry(~0UL,0))))) {
 		swap_list_unlock();
 		goto out;
 	}
@@ -1424,7 +1424,7 @@ asmlinkage long sys_swapon(const char __
 	}
 
 	p->lowest_bit = 1;
-	maxpages = swp_offset(swp_entry(0,~0UL)) - 1;
+	maxpages = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0,~0UL)))) - 1;
 	if (maxpages > swap_header->info.last_page)
 		maxpages = swap_header->info.last_page;
 	p->highest_bit = maxpages - 1;

_
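
The round-trip probe used in sys_swapon above can be illustrated outside the
kernel. The following is only a minimal user-space sketch of the 32 bit s390
layout from the patch, not kernel code. All names ending in _model are
hypothetical, the generic entry is modelled as 5 type bits above 27 offset
bits as described in the changelog, and the value 0x600 for the invalid-swap
marker is an assumption read off the "0110" field in the comment diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the generic 32 bit swp_entry_t:
 * 5 type bits above 27 offset bits, as described in the changelog. */
typedef struct { uint32_t val; } swp_entry_model_t;

static swp_entry_model_t swp_entry_model(uint32_t type, uint32_t offset)
{
	swp_entry_model_t e;
	e.val = ((type & 0x1fu) << 27) | (offset & 0x7ffffffu);
	return e;
}

static uint32_t swp_type_model(swp_entry_model_t e)   { return e.val >> 27; }
static uint32_t swp_offset_model(swp_entry_model_t e) { return e.val & 0x7ffffffu; }

/* Assumption: the "0110" field of the diagram, i.e. 0x600. */
#define PAGE_INVALID_SWAP_MODEL 0x600u

/* Pack a generic entry into the patched 32 bit pte layout:
 * type in 5 bits, offset bit 0 in the "o" bit, offset bits 1-19 on top. */
static uint32_t entry_to_pte_model(swp_entry_model_t e)
{
	uint32_t type = swp_type_model(e);
	uint32_t offset = swp_offset_model(e);

	return PAGE_INVALID_SWAP_MODEL | ((type & 0x1fu) << 2) |
	       ((offset & 1u) << 7) | ((offset & 0xffffeu) << 11);
}

/* Unpack the pte layout again; offset bits that did not fit are gone. */
static swp_entry_model_t pte_to_entry_model(uint32_t pte)
{
	/* Same reserved bits the removed 32 bit BUG_ON checked. */
	assert((pte & 0x80000901u) == 0);
	return swp_entry_model((pte >> 2) & 0x1fu,
			       (pte >> 11) | ((pte >> 7) & 1u));
}

/* Probe the limits the way the patched sys_swapon does: push all-ones
 * through the pte encoding and see what survives. */
static uint32_t max_swap_offset_model(void)
{
	return swp_offset_model(pte_to_entry_model(
		entry_to_pte_model(swp_entry_model(0, ~0u))));
}

static uint32_t max_swap_type_model(void)
{
	return swp_type_model(pte_to_entry_model(
		entry_to_pte_model(swp_entry_model(~0u, 0))));
}
```

Under these assumptions max_swap_offset_model() yields 0xfffff (20 offset
bits) where the generic entry alone would allow 0x7ffffff, which is the gap
that the old sys_swapon check did not see.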